Development • Apr 8th '16

Filtering and Searching In Visible Content With AngularJS (Part 2/2)

In a previous post I illustrated that a regular AngularJS filter is seldom sufficient to implement a search because the displayed data often differs significantly from the data stored in the application model. Today I will present a solution to the problem that applies the filter to exactly what the user sees.

The problem can be seen in the fictitious subscriber list from the last part. Searching for "Italy" does not work because the raw data only contains the string "IT", not "Italy". In the last part we also saw that the problem can only be fixed by a custom filter that searches through the visible data, not the raw data from the model.

You can also follow along locally:

$ git clone -b visible git://git.guido-flohr.net/web/angular/angular-filter-visible-content.git
$ cd angular-filter-visible-content
$ npm start

This will start the final (working) version. The initial state is tagged with "wrapper-filter":

$ git checkout wrapper-filter

We will call our AngularJS filter "visibleFilter". Look into src/app/shared/visibleFilter.js:

'use strict';

angular.module('myApp')
.filter('visibleFilter', [
    '$filter',
function($filter) {
    return function (array, search) {
        return $filter('filter')(array, search);
    };
}]);

This filter is a mere wrapper around the standard AngularJS filter (called "filter"). Every AngularJS filter is called with the array to be filtered as the first argument plus the arguments specified in the HTML:

<div class="row table-row"
     ng-class="{'odd': $index % 2 === 1, 'even': $index % 2 === 0}"
     ng-repeat="subscriber in subscribers | visibleFilter: query">

The variable query is bound to the input field with the search query and the behavior of the application is exactly as before.

Extracting and Filtering Visible Content

The first step is to extract the content that is actually visible and filter that instead of the original raw data.

For that the filter has to know where the content is stored in the DOM. We therefore give the entire table body an id attribute subscribers, all table rows get a class name table-row, and all table cells get a class name table-cell:

<div id="subscribers">
  <div class="row table-row" data-pkey="{{subscriber.id}}"
       ng-class="{'odd': $index % 2 === 1, 'even': $index % 2 === 0}"
       ng-repeat="subscriber in subscribers 
                  | visibleFilter:query:'subscribers':'id'">
    <div class="col-md-3 table-cell">
      {{ subscriber.givenName }} {{ subscriber.surname }}
    </div>
    <div class="col-md-3 table-cell">
      <span ng-show="subscriber.showEmail">{{ subscriber.email }}</span>
      <span ng-hide="subscriber.showEmail">-</span>
    </div>
    <div class="col-md-2 table-cell">
      {{ subscriber.date | date }}
    </div>
    <div class="col-md-2 table-cell">
      {{ subscriber.postings | number }}
    </div>
    <div class="col-md-2 table-cell" ng-show="showCountry">
      {{ subscriber.country | isoCountry }}
    </div>
  </div>
</div>

There are more changes. Every table row gets a data attribute data-pkey that holds the subscriber id, the primary key of our model. The filter is also called with two more string arguments: "subscribers" is the id of the HTML element that contains the data, and "id" is the name of the property in our data records that contains the primary key. Remember that an example data record looks like this:

{
    "country" : "IT",
    "surname" : "Pirozzi",
    "id" : "jzhodg-6388-694720",
    "email" : "a.pirozzi@costa.it",
    "postings" : 3969,
    "givenName" : "Albano",
    "showEmail" : true,
    "date" : 1246609342000
}

The field "id" contains a unique subscriber id that is currently not displayed.

The biggest change is in the filter src/app/shared/visibleFilter.js:

'use strict';

angular.module('myApp')
.filter('visibleFilter', [
    '$filter',
function($filter) {
    return function (array, search, container, pkey) {
        var visible, subset, filtered, retval = [];

        if (array === undefined || search === undefined)
            return array;

        visible = extractTable(container);

        for (var i = 0; i < visible.length; ++i) {
            for (var j = 0; j < array.length; ++j) {
                if (array[j][pkey] === visible[i].pkey) {
                    array[j][':visible'] = true;
                    break;
                }
            }
            delete visible[i].pkey;
        }

        subset = array.filter(function(item) {
            return item[':visible'];
        });
        for (var i = 0; i < array.length; ++i) {
            delete array[i][':visible'];
        }

        filtered = $filter('filter')(visible, search);

        for (var i = 0, j = 0;
             i < visible.length && j < filtered.length;
             ++i) {
            if (visible[i] === filtered[j]) {
                retval.push(subset[i]);
                ++j;
            }
        }

        return retval;
    };

    function extractTable(container) {
        var elem = document.getElementById(container),
            rows, table = [];

        rows = elem.querySelectorAll('[data-pkey]');
        for (var i = 0; i < rows.length; ++i) {
            var cells = extractRow(rows[i]);

            cells.pkey = rows[i].dataset['pkey'];
            table.push(cells);
        }

        return table;
    }

    function extractRow(row) {
        var cells = row.querySelectorAll('.table-cell'),
             content = [];

        for (var i = 0; i < cells.length; ++i) {
            content.push(extractCell(cells[i]));
        }

        return content;
    }

    function extractCell(cell) {
        return cell.innerText || cell.textContent;
    }
}]);

In line 7 we can see the two new arguments: "container" for the HTML id attribute of the element that contains all content, and "pkey" for the name of the primary key property of our data model.

The early exit in lines 10 and 11 serves two purposes. In the first call to the filter, the argument array is still undefined and the filter should return immediately. Likewise, when the query input field is still empty and search is therefore undefined, the entire array should be returned as is. For performance reasons, the same should happen when the search query is the empty string, but that was forgotten here.
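The forgotten empty-string case could be covered by a small predicate like the following. This is only a sketch; the helper name isQueryEmpty is hypothetical and not part of the repository.

```javascript
'use strict';

// Hypothetical helper: true if there is nothing to search for, so that
// the filter can return its input array untouched.
function isQueryEmpty(search) {
    return search === undefined || search === null || search === '';
}

// Inside the filter, the early exit would then read:
//
//     if (array === undefined || isQueryEmpty(search))
//         return array;
```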

The real work is triggered in line 13, where the function extractTable gets called. Its definition starts at line 46 and should be pretty self-explanatory. The function grabs all descendant elements that have a data attribute data-pkey (alternatively you could use the class name table-row), iterates over the rows, and calls another function extractRow that gets the content from each row. That function is defined in line 61 and works in the same manner. It gets all descendant elements that have the class name table-cell and calls the function extractCell on each of them, which extracts the textual content of the cell, including that of nested elements, and returns it.

The resulting data structure visible in line 13 is an array of arrays, containing the scraped content of the table structured in rows and cells.

Is it really necessary to structure so deeply? Only rows get filtered. Why not just get the content for each row at once with the DOM property textContent? Imagine a subscriber with the name "Johnny Bad" and the email address "example@example.com". The aggregated content for the table row would be "Johnny Bad example@example.com". Now searching for "bad example" would produce a hit although it should not, just because the table cells are accidentally adjacent. If you think that this is a feature, then go for that approach. It does not change the concept.
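The difference between the two approaches can be tried out without any DOM. The following sketch uses a simple case-insensitive substring test as a stand-in for the standard filter; the function names are made up for this illustration.

```javascript
'use strict';

// Case-insensitive substring test, a simplified stand-in for the string
// matching of the standard AngularJS filter.
function contains(haystack, needle) {
    return haystack.toLowerCase().indexOf(needle.toLowerCase()) !== -1;
}

// The row matches if at least one single cell contains the search term.
function matchesPerCell(cells, search) {
    return cells.some(function (cell) {
        return contains(cell, search);
    });
}

// The row matches if the concatenated row text contains the search term.
function matchesPerRow(cells, search) {
    return contains(cells.join(' '), search);
}

var row = ['Johnny Bad', 'example@example.com'];

console.log(matchesPerCell(row, 'bad example')); // false
console.log(matchesPerRow(row, 'bad example'));  // true
```

Searching per cell avoids the accidental hit across the cell boundary, while a search for "johnny" or "example.com" succeeds with both variants.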

Line 54 is important. After getting the content of each row, the array gets an additional property pkey with the content of the attribute data-pkey. This attribute is the glue between the raw data model and the displayed content. With it, we can identify each visible row in our data model.

Now back to line 13. The variable visible is now populated with the visible table contents, and each row is annotated with the primary key of the underlying data model.

This brings us one step further to the solution of the problem. We can now identify the data rows that are currently visible by checking the primary key. That happens in the loop beginning in line 15.

The variable array contains the raw data that AngularJS calls the filter with; the variable visible contains the data that is currently displayed. The next step is to filter out those rows from array that do not correspond to a visible table row. That is easy. We have the property name of the primary key in array (in the variable pkey), and we have annotated each row of visible with a property pkey. Every row of array whose primary key matches that of a visible row is marked with a property :visible, which you have to rename in case your data contains such a key. Finally, in lines 25 to 27, array is filtered for all rows that have the property :visible, and the result is stored in the variable subset. This variable now holds the subset of array that corresponds to the currently visible rows, that is, the rows that have not been hidden as the result of a previous search.

The result of the operation is that visible and subset now have the same number of rows, and that they are sorted in the same order. The primary key that we had polluted our data with was deleted on the fly (line 22) so that it cannot produce false positives in the later search.

The original array is cleaned in lines 28 to 30, where the additional property :visible is removed. This is not a cosmetic measure but necessary since array has to be in its initial state for the next invocation of the filter.
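As an alternative to temporarily marking the rows in array, the subset could also be selected with a lookup table. The following is only a sketch under the assumption that the primary keys are unique strings; the function name selectVisible and the plain list of visible keys are made up for illustration and do not appear in the repository.

```javascript
'use strict';

// Returns the elements of array whose primary key occurs in visibleKeys,
// in the order given by visibleKeys. The input array is not modified, so
// no clean-up step is needed afterwards.
function selectVisible(array, visibleKeys, pkey) {
    var byKey = Object.create(null);

    array.forEach(function (item) {
        byKey[item[pkey]] = item;
    });

    return visibleKeys
        .filter(function (key) { return key in byKey; })
        .map(function (key) { return byKey[key]; });
}

var subscribers = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];

// Keys as they would be read from the data-pkey attributes.
console.log(selectVisible(subscribers, ['a', 'c'], 'id').length); // 2
```

Note that this variant orders the subset like the visible rows, whereas the listing above keeps the order of array. Both agree as long as the table is rendered in the order of the data.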

In line 32, the actual filtering takes place. It is not the raw data in array that gets filtered for the search term but the transformed data in visible. The filtered data is stored in a variable filtered. It is similar to visible, only with the rows that do not match the search query weeded out.

However, an AngularJS filter should return a subset of the original data. We must somehow transform the result of the filtering back into the required structure.

How can this be done? Let's look at what we have! We will simplify things and assume that we are only searching through the countries.

| array | visible | subset | filtered | result |
|---|---|---|---|---|
| `[ 'de' ]` | `[ 'GERMANY' ]` | `[ 'de' ]` | `[ 'GERMANY' ]` | `[ 'de' ]` |
| `[ 'es' ]` | `[ 'SPAIN' ]` | `[ 'es' ]` | `[ 'FRANCE' ]` | `[ 'fr' ]` |
| `[ 'fr' ]` | `[ 'FRANCE' ]` | `[ 'fr' ]` | | |
| `[ 'it' ]` | `[ 'ITALY' ]` | `[ 'it' ]` | | |
| `[ 'se' ]` | | | | |
| `[ 'uk' ]` | | | | |

This displays the state when the previous search term was "a" and the user has now typed an "n", searching for "an".

The column array contains the complete, raw data. It is actually passed in arbitrary order to the filter function.

The column visible contains the processed data. The first difference is that it has only rows that contain the last search term "a". For example, the rows for the UK and Sweden are missing. The second difference is that the data was modified by the view; in our case the country identifier is transformed into a full country name.

The column subset is the subset of array that corresponds to the visible rows. It was derived from array and visible by comparing the primary keys.

filtered is the subset of visible that matches the new search string "an". Only "Germany" and "France" contain that string.

The column result shows what we want to get. We must return rows in the raw data format (country codes not country names) that correspond to the rows in filtered. result has the same relationship to filtered as subset has to visible. For subset and visible we have used the primary keys of the data rows for identifying the rows but we had to delete them from the data because they would have contributed to the search producing false positives.

But we can make do without the primary keys here. The elements of filtered are the same as the corresponding elements in visible, they are the same objects, they point to the same thing. And visible and subset have the same number of elements, they are sorted the same, and they represent the same data. That means that whenever we have identified a row in visible as a hit for the new search query, we simply stuff the row from subset that has the same index into the result set.

With this knowledge we can construct the array result from filtered. The first element in filtered is (the JavaScript object) ['GERMANY']. We search for that object in visible and find it at position 0. We use that index 0 as the index into subset and have the first element of result, ['de'].

The next hit is ['FRANCE']. This can be found at position 2 in visible. Therefore subset[2], that is ['fr'] is the second element of result.

The real implementation starting at line 34 in the code listing above is a little smarter and utilizes the fact that filtered is just a subset of visible in the same sort order. It iterates with the loop variable i over visible and with j over filtered. i is incremented every time, j only when there is a match. In case of a match, subset[i] is copied into the result set.

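The following listing, a static stand-in for the original animation, illustrates this with visible holding the rows A to H (index i) and filtered holding the rows C, E, G, and H (index j):

i = 0  A    C  j = 0
i = 1  B    E  j = 1
i = 2  C    G  j = 2
i = 3  D    H  j = 3
i = 4  E
i = 5  F
i = 6  G
i = 7  H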
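The mapping step can also be tried in isolation. The sketch below repeats the loop from the filter as a standalone function (the name mapBack is made up here) and feeds it the country example from the table above. The standard filter is replaced by a plain Array.prototype.filter call, which likewise returns the very same row objects.

```javascript
'use strict';

// visible and subset are parallel arrays; filtered is a subsequence of
// visible that contains the very same objects (compared by identity).
function mapBack(visible, subset, filtered) {
    var result = [];

    for (var i = 0, j = 0; i < visible.length && j < filtered.length; ++i) {
        if (visible[i] === filtered[j]) {
            result.push(subset[i]);
            ++j;
        }
    }

    return result;
}

var visible = [['GERMANY'], ['SPAIN'], ['FRANCE'], ['ITALY']];
var subset = [['de'], ['es'], ['fr'], ['it']];

// Stand-in for $filter('filter')(visible, 'an').
var filtered = visible.filter(function (row) {
    return row[0].toLowerCase().indexOf('an') !== -1;
});

console.log(JSON.stringify(mapBack(visible, subset, filtered)));
// [["de"],["fr"]]
```

Because both index variables only ever move forward, the mapping needs a single pass over visible instead of one search per hit.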

By the way, the state after this step is tagged as "filter-transformed":

$ git checkout filter-transformed

Reset Filter On Query Change

The application has already improved a lot. A search for "Italy" now really displays all rows for subscribers from Italy. But there is no way back. If you delete characters one by one searching now for "Ital", "Ita", "It", "I" and finally for nothing, nothing happens. We still only see the subscribers from Italy and the only way to change this is to reload the page.

The reason is clear. The search is always performed over the currently visible content. Modifying the search can only reduce the number of hits, it can never extend it. Once that content has vanished, it has vanished for good.

So how could this be fixed? One solution would be to extract the content only once, initially, and then always search over the cached content. Although that approach basically works, it would be a little awkward, and it would break if the subscriber list could change dynamically.

Instead we will reset the filter (almost) every time that the search term has changed, forcing AngularJS to render the table for an instant with the entire data before we filter again. Effectively, we will always filter twice. The first time we will return the unfiltered array, so that the page can be re-rendered for the extraction of the visible content. The next time that visible content gets filtered. This is not as inefficient as it sounds because of the way that AngularJS implements the two-way data-binding. It calls the filter twice anyway, and we do not really lose that much.

This is how it can be done:

'use strict';

angular.module('myApp')
.filter('visibleFilter', [
    '$filter',
function($filter) {
    var lastSearches = {};

    return function (array, search, container, pkey) {
        var visible, subset, filtered, retval = [];

        if (array === undefined || search === undefined)
            return array;

        if (container in lastSearches
            && lastSearches[container] !== search
            && search.substr(0, lastSearches[container].length)
                !== lastSearches[container]) {
            lastSearches[container] = search;
            return array;
        }

        lastSearches[container] = search;

        visible = extractTable(container);
        
        ...
}

What has changed? In line 7 a new hash variable lastSearches is defined that will hold the last search term for every table in the page (there could be more than one).

The reset logic is contained in lines 15 to 21. First we check whether a previous query was modified (lines 15 and 16). In lines 17 and 18 we do a little optimization. The most common case is that one or more characters are added to the last search term. The user searches, for example, successively for "I", "It", "Ita", "Ital", and finally for "Italy". In that special case it is okay to search in the last result set because it cannot get any larger now. Therefore, an additional check has been added in lines 17 and 18, preventing the reset if the last search term is an initial substring of the new one.

Note that you actually have to disable this optimization if the content can dynamically change because then that assumption would no longer be true. But in this case, the filter would have to be re-evaluated anyway.
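The decision whether to reset can be expressed as a small pure function, which makes the individual cases easier to see. This is a sketch that mirrors the condition in the listing above; the name needsReset is hypothetical, and an undefined lastSearch stands for "no entry in lastSearches for this container yet".

```javascript
'use strict';

// Decides whether the table has to be rendered with the full data once
// more before the new search term can be applied.
function needsReset(lastSearch, search) {
    // No previous search for this table: nothing to reset.
    if (lastSearch === undefined)
        return false;

    // Unchanged term, for example the second call to the filter.
    if (lastSearch === search)
        return false;

    // Characters were appended: the result set can only shrink.
    if (search.slice(0, lastSearch.length) === lastSearch)
        return false;

    return true;
}

console.log(needsReset('Ita', 'Ital')); // false
console.log(needsReset('Ital', 'Ita')); // true
```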

On the other hand, you could go even a little further. First of all, the substring comparison could be done case-insensitively, and second, the substring does not necessarily have to be anchored to the beginning of the string. But the potential gain would be negligible because the extra optimization would almost never help, and besides, there is no function like strcasestr() in JavaScript and we would have to write extra code.
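For the curious, that extra code could look roughly like this. It is a minimal sketch that assumes the search itself is a plain case-insensitive substring match; the function name narrowsSearch is invented for this example.

```javascript
'use strict';

// True if the old search term occurs anywhere in the new one, ignoring
// case. Under the assumption stated above, every row that matches the
// new term then also matched the old one, so no reset is necessary.
function narrowsSearch(lastSearch, search) {
    return search.toLowerCase().indexOf(lastSearch.toLowerCase()) !== -1;
}

console.log(narrowsSearch('tal', 'Italy'));  // true
console.log(narrowsSearch('Italy', 'Ital')); // false
```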

So now, if the user has modified the search query in any other way than by appending characters, the filter is reset: it first returns the unfiltered array, the content is extracted again, and then the filter is applied once more. This happens fast enough that it is not visible to the user.

The state of the application after this step is tagged in git with "editable-query".

Ignore Hidden Content

The application has that stupid checkbox that hides the country column. Reload the application in the browser and check the box. The country names will disappear. Now search once more for "Italy". You will notice that the result will still be all subscribers from Italy although the string "Italy" is displayed nowhere on the page.

This happens because of the way the content is extracted from the HTML. See the functions extractRow and extractCell:

function extractRow(row) {
        var cells = row.querySelectorAll('.table-cell'),
             content = [];

        for (var i = 0; i < cells.length; ++i) {
            content.push(extractCell(cells[i]));
        }

        return content;
    }

    function extractCell(cell) {
        return cell.innerText || cell.textContent;
    }

The DOM property textContent just evaluates the markup. Content that is hidden by CSS is still considered text content. While there are many ways to hide content with CSS, we just fix the most common case here and check that the CSS property display is not set to none because AngularJS uses exactly that property for the implementation of ng-show and ng-hide. The two functions above have to be modified as follows:

function extractRow(row) {
        var cells = row.querySelectorAll('.table-cell'),
            content = [];

        for (var i = 0; i < cells.length; ++i) {
            if (cells[i].offsetParent !== null)
                content.push(extractCell(cells[i]));
        }

        return content;
    }

    function extractCell(cell) {
        var children = cell.childNodes, content = '';
        for (var i = 0; i < children.length; ++i) {
            switch(children[i].nodeType) {
            case 1:
                if (children[i].offsetParent !== null)
                    content += extractCell(children[i]);
                break;
            case 3:
                content += children[i].innerText || children[i].textContent;
                break;
            }
        }
        return content;
    }

The function extractCell() had to be changed to walk recursively over the cell's content. It iterates over all child nodes of the cell element (line 14). If a child is an element node (type 1) and it is visible (line 18), the function gets called recursively with the child. If the child is a text node (type 3), its text gets appended. Additionally, in line 6, the initial call to extractCell() is skipped if the cell is not visible.

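Note that the element property offsetParent is null if the element or one of its ancestors has the CSS property display set to "none"! Be aware that in most browsers it is also null for elements with the CSS property position set to "fixed".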

Now only content that is really displayed by the browser is taken into account for the search.
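The recursion can be checked outside of a browser with plain objects that stand in for DOM nodes. The following sketch is not taken from the repository: the function visibleText and the two factory helpers are made up, and the mock nodes only carry the properties that the extraction looks at.

```javascript
'use strict';

var ELEMENT_NODE = 1, TEXT_NODE = 3;

// Collects the text of all text nodes below node, skipping element
// children whose offsetParent is null (treated as hidden here).
function visibleText(node) {
    var children = node.childNodes, content = '';

    for (var i = 0; i < children.length; ++i) {
        if (children[i].nodeType === ELEMENT_NODE) {
            if (children[i].offsetParent !== null)
                content += visibleText(children[i]);
        } else if (children[i].nodeType === TEXT_NODE) {
            content += children[i].textContent;
        }
    }

    return content;
}

// Plain objects standing in for DOM nodes.
function text(value) {
    return { nodeType: TEXT_NODE, textContent: value, childNodes: [] };
}

function element(hidden, children) {
    return {
        nodeType: ELEMENT_NODE,
        offsetParent: hidden ? null : {},
        childNodes: children
    };
}

// Mimics the email cell of a subscriber who does not show the address.
var cell = element(false, [
    element(true, [text('a.pirozzi@costa.it')]), // hidden by ng-show
    element(false, [text('-')])                  // displayed by ng-hide
]);

console.log(visibleText(cell)); // -
```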

The current state is tagged as "ignore-hidden" in git.

Displaying the Match Count

One problem remains. Above the table a count of matching rows is displayed, and this is still non-functional. A close look at the HTML reveals why:

<em>
  Displaying {{ (subscribers | filter: query).length }}
  of {{ subscribers.length }} entries.
</em>

The standard filter is still used for computing the number of matches. It would be possible to just replace that with a call to the custom filter but that is not very smart. The count of matching rows is known during the computation and can simply be exported:

'use strict';

angular.module('myApp')
.filter('visibleFilter', [
    '$filter',
    '$rootScope',
function($filter, $rootScope) {
    var lastSearches = {};

    $rootScope.matches = {};

    return function (array, search, container, pkey) {
        var visible, subset, filtered, retval = [];

        if (array === undefined)
            return array;

        if (container !== undefined)
            $rootScope.matches[container] = array.length;

        if (search === undefined)
            return array;

        ...
        
        $rootScope.matches[container] = retval.length;

        return retval;
    };

The count will be stored per table in the root scope variable matches. Therefore $rootScope has to be injected (lines 6 and 7) into the filter component.

In line 19, it is set to the size of the initial array so that the code does not have to be duplicated for every early exit that follows.
Finally, before the filtered result set is returned, it is set to the real size of the result set (line 26).

The HTML also has to be modified a little:

<em>
  Displaying {{ matches['subscribers'] }}
  of {{ subscribers.length }} entries.
</em>

This state can be checked out with the head of the branch "visible":

$ git checkout visible

You can also see the application in action here.

Further Improvements

The solution presented here is still not perfect and may have to be modified here and there for the requirements of a particular project. And there is still room for improvements.

For example, the number of postings is formatted with the standard AngularJS number filter. That means that 1234 is displayed as "1,234" with a comma as the thousands separator. It would probably be better if a search for "1,234" and a search for "1234" produced the same result. That could be achieved relatively easily by a little change in the HTML:

<div class="col-md-2 table-cell" data-filter-alt="{{ subscriber.postings }}">
    {{ subscriber.postings | number }}
</div>

The unformatted number is stored in the data attribute filter-alt. The function extractCell() would have to be modified to additionally add the contents of this data attribute to the extracted content.
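One conceivable way to do that is sketched below. The function searchableStrings is hypothetical; it takes the text scraped from a cell plus an object that mimics the DOM property dataset, where the attribute data-filter-alt shows up under the name filterAlt.

```javascript
'use strict';

// Returns the strings that should be searched for one cell: the text
// that is displayed, plus the alternative text if the cell has one.
function searchableStrings(displayedText, dataset) {
    var strings = [displayedText];

    // data-filter-alt is exposed as dataset.filterAlt in the DOM.
    if (dataset.filterAlt !== undefined)
        strings.push(dataset.filterAlt);

    return strings;
}

console.log(searchableStrings('1,234', { filterAlt: '1234' }).length); // 2
console.log(searchableStrings('Albano Pirozzi', {}).length);           // 1
```

In the filter, the additional string could for example be pushed into the row's array of cell contents, so that a hit on either representation selects the row.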

Another similar improvement would be to allow changing the content altogether:

<div class="col-md-3 table-cell" 
     data-filter-content="{{ subscriber.givenName }} {{ subscriber.surname }}">
    {{ subscriber.givenName }}&nbsp;{{ subscriber.surname }}
</div>

Imagine that the subscriber's first and last names were displayed with a non-breaking space in order to avoid a line break. Searching for "FIRSTNAME LASTNAME" would then no longer work because nobody would type a non-breaking space in the search query.

There are two ways to fix that. In extractCell() all non-breaking spaces (or in general all sequences of whitespace characters) could be replaced by a single space. A more flexible solution is displayed above. The contents of the data attribute data-filter-content would then simply take precedence over the actual content of the cell.
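Both variants fit into a few lines. The sketch below is an assumption about how they could be combined, with invented function names; the second argument again mimics the DOM property dataset, where data-filter-content appears as filterContent.

```javascript
'use strict';

// Collapses every run of whitespace into one plain space. In JavaScript
// regular expressions, \s also matches the non-breaking space U+00A0.
function normalizeWhitespace(text) {
    return text.replace(/\s+/g, ' ').trim();
}

// The content of data-filter-content wins over the displayed text;
// otherwise the displayed text is searched with normalized whitespace.
function contentForSearch(displayedText, dataset) {
    if (dataset.filterContent !== undefined)
        return dataset.filterContent;

    return normalizeWhitespace(displayedText);
}

console.log(contentForSearch('Albano\u00a0Pirozzi', {}) === 'Albano Pirozzi');
// true
```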

Another special case can occur if the table content is user-editable. When editing inside a filtered result set, it is a good idea to copy the content of the original row into some hidden fields so that the row being edited does not suddenly disappear while it is edited. But in that case this hidden content should not be ignored.

You will find your own improvements in your own application.


Guido Flohr
