15

Benchmark compares QSA & .forEach vs a NodeIterator

toArray(document.querySelectorAll("div > a.klass")).forEach(function (node) {
  // do something with node
});

var filter = {
    acceptNode: function (node) {
        var condition = node.parentNode.tagName === "DIV" &&
            node.classList.contains("klass") &&
            node.tagName === "A";

        return condition ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT
    }  
}
// FIREFOX Y U SUCK
var iter = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, filter, false);
var node;
while (node = iter.nextNode()) {
    // do thing with node    
}

Now either NodeIterator's suck or I'm doing it wrong.

Question: When should I use a NodeIterator ?

In case you don't know, DOM4 specifies what NodeIterator is.

Raynos
  • 162,380
  • 56
  • 343
  • 392

2 Answers2

32

NodeIterator (and TreeWalker, for that matter) are almost never used, because of a variety of reasons. This means that information on the topic is scarce and answers like @gsnedders' come to be, which completely miss the mark. I know this question is almost a decade old, so excuse my necromancy.

  1. Initiation & Performance = It is true that the initiation of a NodeIterator is waaay slower than a method like querySelectorAll, but that is not the performance you should be measuring.

The thing about NodeIterators is that they are live-ish in the way that, just like an HTMLCollection or live NodeList, you can keep using the object after initiating it once.
The NodeList returned by querySelectorAll is static and will have to be re-initiated every time you need to match newly added elements.

This version of the jsPerf puts the NodeIterator in the preparation code. The actual test only tries to loop over all newly added elements with iter.nextNode(). You can see that the iterator is now orders of magnitudes faster.

  1. Selector performance = Okay, cool. Caching the iterator is faster. This version, however, shows another significant difference. I've added 10 classes (done[0-9]) that the selectors shouldn't be matching. The iterator loses about 10% of its speed, while the querySelectors lose 20%.

On the other hand, this version, shows what happens when you add another div > at the start of the selector. The iterator loses 33% of its speed, while the querySelectors got a speed INCREASE of 10%.

Removing the initial div > at the start of the selector like in this version shows that both methods become slower, because they match more than earlier versions. Like expected, the iterator is relatively more performant than the querySelectors in this case.

This means that filtering on basis of a node's own properties (its classes, attributes, etc.) is probably faster in a NodeIterator, while having a lot of combinators (>, +, ~, etc.) in your selector probably means querySelectorAll is faster.
This is especially true for the  (space) combinator. Selecting elements with querySelectorAll('article a') is way easier than manually looping over all parents of every a element, looking for one that has a tagName of 'ARTICLE'.

P.S. in §3.2, I give an example of how the exact opposite can be true if you want the opposite of what the space combinator does (exclude a tags with an article ancestor).

3 Impossible selectors

3.1 Simple hierarchical relationships

Of course, manually filtering elements gives you practically unlimited control. This means that you can filter out elements that would normally be impossible to match with CSS selectors. For example, CSS selectors can only "look back" in the way that selecting divs that are preceded by another div is possible with div + div. Selecting divs that are followed by another div is impossible.

However, inside a NodeFilter, you can achieve this by checking node.nextElementSibling.tagName === 'DIV'. The same goes for every selection CSS selectors can't make.

3.2 More global hierarchical relationships

Another thing I personally love about the usage of NodeFilters, is that when passed to a TreeWalker, you can reject a node and its whole sub-tree by returning NodeFilter.FILTER_REJECT instead of NodeFilter.FILTER_SKIP.

Imagine you want to iterate over all a tags on the page, except for ones with an article ancestor.
With querySelectors, you'd type something like

let a = document.querySelectorAll('a')
a = Array.prototype.filter.call(a, function (node) {
  while (node = node.parentElement) if (node.tagName === 'ARTICLE') return false
  return true
})

While in a NodeFilter, you'd only have to type this

return node.tagName === 'ARTICLE' ? NodeFilter.FILTER_REJECT : // ✨ Magic happens here ✨
       node.tagName === 'A'       ? NodeFilter.FILTER_ACCEPT :
                                    NodeFilter.FILTER_SKIP

In conclusion

You don't initiate the API every time you need to iterate over nodes of the same kind. Sadly, that assumption was made with the question being asked, and the +500 answer (giving it a lot more credit) doesn't even address the error or any of the perks NodeIterators have.

There's two main advantages NodeIterators have to offer:

  • Live-ishness, as discussed in §1
  • Advanced filtering, as discussed in §3
    (I can't stress enough how useful the NodeFilter.FILTER_REJECT example is)

However, don't use NodeIterators when any of the following is true:

  • Its instance is only going to be used once/a few times
  • Complex hierarchical relationships are queried that are possible with CSS selectors
    (i.e. body.no-js article > div > div a[href^="/"])

Sorry for the long answer :)
Gust van de Wal
  • 5,052
  • 1
  • 20
  • 45
  • 2
    This is a first-rate answer. It covers multiple cases with the pros and cons of each, explains *why* things work like they do, and includes an excellent summary at the end. More importantly, this was written towards the end of 2019. A LOT has happened to browsers and the web in the 8 years since the official “answer”. If possible, _this_ should become the official answer for 2020 and the foreseeable future. – Zearin Jan 10 '20 at 16:57
  • *"NodeIterator (and TreeWalker, for that matter) are almost never used"* - I've had the pleasure of using it in a few projects. I agree it is rare to need it, BUT sometimes you do need a to go **deep** into a node tree, and this really helps. I would say a developer might use these ~5 times total in their life. – vsync Feb 24 '20 at 20:05
  • 1
    @Gust https://jsperf.com/qsa-vs-node-iterator/17 doesn't actually benchmark the same things: the node iterator benchmark only iterates over the iterator on the _first_ iteration of the benchmark loop (the setup code is [only run once](https://benchmarkjs.com/docs#prototype_setup), not before each run); for all further runs `nextNode` returns `null` immediately. [Creating the iterator](https://jsperf.com/qsa-vs-node-iterator/27) is actually _really_ cheap, because it doesn't actually have to do much at all! – gsnedders Jun 14 '20 at 02:30
  • More generally, at this point the original benchmarks are rather worthless: JS VMs nowadays are smart enough (and know enough about the DOM objects) to realise the loops are no-ops and avoid the iteration entirely(!). – gsnedders Jun 14 '20 at 02:42
  • I've tried contacting JSPerf's creator regarding the timing inconsistenties. I'd noticed something was off when I moved things around, but he never responded and I couldn't find much useful information myself. Thanks for looking into this; will edit sometime soon! – Gust van de Wal Jun 14 '20 at 11:52
  • @GustvandeWal https://mrale.ph/blog/2012/12/15/microbenchmarks-fairy-tale.html is old but still fundamentality accurate; https://stackoverflow.com/questions/36159794/how-can-i-avoid-optimizing-away-my-javascript-test-cases is a question here about benchmarking. – gsnedders Jun 15 '20 at 01:12
  • If we go by the knowledge that creating the iterator is really cheap, then [this](https://jsperf.com/qsa-vs-node-iterator/31) would appear to be a better benchmark. Note that this takes the filter function out of the benchmark loop and shares it with the manual iteration. – gsnedders Jun 15 '20 at 01:14
14

It's slow for a variety of reasons. Most obviously is the fact that nobody uses it so quite simply far less time has been spent optimizing it. The other problem is it's massively re-entrant, every node having to call into JS and run the filter function.

If you look at revision three of the benchmark, you'll find I've added a reimplementation of what the iterator is doing using getElementsByTagName("*") and then running an identical filter on that. As the results show, it's massively quicker. Going JS -> C++ -> JS is slow.

Filtering the nodes entirely in JS (the getElementsByTagName case) or C++ (the querySelectorAll case) is far quicker than doing it by repeatedly crossing the boundary.

Note also selector matching, as used by querySelectorAll, is comparatively smart: it does right-to-left matching and is based on pre-computed caches (most browsers will iterate over a cached list of all elements with the class "klass", check if it's an a element, and then check if the parent is a div) and hence they won't even bother with iterating over the entire document.

Given that, when to use NodeIterator? Basically never in JavaScript, at least. In languages such as Java (undoubtedly the primary reason why there's an interface called NodeIterator), it will likely be just as quick as anything else, as then your filter will be in the same language as the filter. Apart from that, the only other time it makes sense is in languages where the memory usage of creating a Node object is far greater than the internal representation of the Node.

gsnedders
  • 5,322
  • 1
  • 29
  • 40