ElementTree - findall to recursively select all child elements

Question

Python code:

import xml.etree.ElementTree as ET
root = ET.parse("h.xml")
print(root.findall('saybye'))

h.xml code:

<hello>
  <saybye>
   <saybye>
   </saybye>
  </saybye>
  <saybye>
  </saybye>
</hello>

The code outputs:

[<Element 'saybye' at 0x7fdbcbbec690>, <Element 'saybye' at 0x7fdbcbbec790>]

The saybye element that is a child of another saybye is not selected here. How can I instruct findall to walk down the tree recursively and collect all three saybye elements?

score 15 · Answer 1 · answered Aug 09 '17 at 10:41

From version 2.7 on, you can use xml.etree.ElementTree.Element.iter:

import xml.etree.ElementTree as ET
root = ET.parse("h.xml")
# iter() returns an iterator, so wrap it in list() to print the elements
print(list(root.iter('saybye')))

See 19.7. xml.etree.ElementTree — The ElementTree XML API
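
To make the difference concrete, here is a minimal sketch that parses the question's document from a string (so no h.xml file is needed) and compares findall with iter. The variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Same structure as h.xml in the question, inlined so the example
# does not depend on a file on disk.
XML = """<hello>
  <saybye>
   <saybye>
   </saybye>
  </saybye>
  <saybye>
  </saybye>
</hello>"""

root = ET.fromstring(XML)

direct = root.findall('saybye')          # direct children of <hello> only
everything = list(root.iter('saybye'))   # all matching elements, in document order

print(len(direct))      # 2
print(len(everything))  # 3
```

Note that iter also yields the element it is called on if its tag matches, which does not matter here because the root is hello.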

answered Aug 09 '17 at 10:41

Ingo Schalk-Schupp

unfortunately they forgot the namespaces for that one – kassiopeia May 12 '18 at 16:53
@kassiopeia: I am not sure I understand what you mean. Could you help me out? – Ingo Schalk-Schupp May 12 '18 at 23:51
In Python 3, `find()`, `findall()`, `findtext()` and even `iterfind()` all take an optional `namespaces` argument that accepts a dictionary of namespaces. Only `iter()` does not. See: https://docs.python.org/3/library/xml.etree.elementtree.html#elementtree-objects – kassiopeia May 13 '18 at 08:32
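
A small sketch of what this comment describes, assuming a made-up namespace URI and the prefix g (neither comes from the question). With iter the tag has to be written in the `{uri}local` form, whereas findall can take a prefix mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, chosen only for illustration.
URI = 'http://example.com/greetings'
NS = {'g': URI}

XML = """<hello xmlns="http://example.com/greetings">
  <saybye><saybye/></saybye>
  <saybye/>
</hello>"""

root = ET.fromstring(XML)

# findall() accepts a prefix-to-URI mapping as its second argument.
via_findall = root.findall('.//g:saybye', NS)

# iter() has no such argument, so the qualified tag is spelled out.
via_iter = list(root.iter('{%s}saybye' % URI))

# The bare local name matches nothing, because every tag is namespace-qualified.
bare = list(root.iter('saybye'))

print(len(via_findall), len(via_iter), len(bare))  # 3 3 0
```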

score 13 · Answer 2 · edited Apr 16 '21 at 17:44

If you aren't afraid of a little XPath, you can use the // syntax, which matches descendants at any depth:

import xml.etree.ElementTree as ET
root = ET.parse("h.xml")
print(root.findall('.//saybye'))

Full XPath isn't supported, but here's the list of what is: https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax
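
One detail that may matter when the search starts from a saybye element rather than from the root: `.//saybye` selects descendants only, while iter also yields the starting element itself. A minimal sketch, using a compact inline copy of the question's document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<hello><saybye><saybye/></saybye><saybye/></hello>")

all_three = root.findall('.//saybye')      # every saybye below <hello>

outer = root.find('saybye')                # first direct child
below_outer = outer.findall('.//saybye')   # descendants of outer only
with_self = list(outer.iter('saybye'))     # outer itself plus its descendants

print(len(all_three), len(below_outer), len(with_self))  # 3 1 2
```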

edited Apr 16 '21 at 17:44

Pikamander2

answered Mar 07 '19 at 17:09

Lowell

score 4 · Accepted Answer · edited Apr 11 '17 at 12:01

Quoting the documentation for findall:

Element.findall() finds only elements with a tag which are direct children of the current element.

Since it finds only direct children, we need to repeat the search recursively inside each match, like this:

>>> import xml.etree.ElementTree as ET
>>> 
>>> def find_rec(node, element, result):
...     for item in node.findall(element):
...         result.append(item)
...         find_rec(item, element, result)
...     return result
... 
>>> find_rec(ET.parse("h.xml"), 'saybye', [])
[<Element 'saybye' at 0x7f4fce206710>, <Element 'saybye' at 0x7f4fce206750>, <Element 'saybye' at 0x7f4fce2067d0>]

Even better, make it a generator function, like this:

>>> def find_rec(node, element):
...     for item in node.findall(element):
...         yield item
...         for child in find_rec(item, element):
...             yield child
... 
>>> list(find_rec(ET.parse("h.xml"), 'saybye'))
[<Element 'saybye' at 0x7f4fce206a50>, <Element 'saybye' at 0x7f4fce206ad0>, <Element 'saybye' at 0x7f4fce206b10>]

In this manner you return `<saybye>` nodes which are either direct children of root or have `<saybye>` as parent because you don't traverse the whole tree. – Maksym Ganenko Apr 26 '19 at 12:06
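
The limitation raised in this comment can be reproduced with a document where a saybye sits under a different tag. In the sketch below, the wrapper element and the find_all function are hypothetical and not part of the original answer; find_all is one possible variant that descends into every child rather than only into matches.

```python
import xml.etree.ElementTree as ET

def find_rec(node, element):
    # Generator from the answer: recurses only into elements that matched.
    for item in node.findall(element):
        yield item
        for child in find_rec(item, element):
            yield child

def find_all(node, tag):
    # Hypothetical variant: visit every child, yield the ones that match.
    for child in node:
        if child.tag == tag:
            yield child
        yield from find_all(child, tag)

# One saybye is hidden under <wrapper>, which find_rec never enters.
root = ET.fromstring("<hello><wrapper><saybye/></wrapper><saybye/></hello>")

partial = list(find_rec(root, 'saybye'))
full = list(find_all(root, 'saybye'))

print(len(partial), len(full))  # 1 2
```

For the question's own h.xml both functions return the same three elements, since every saybye there is reachable through a chain of saybye parents.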

score 0 · Answer 4 · answered Nov 19 '16 at 15:12

Element.findall() finds only elements with a tag which are direct children of the current element.

We need to recursively traverse all children to find the elements whose tag matches the one you are looking for.

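def find_rec(node, element):
    # Collect node and every descendant whose tag equals element (post-order).
    def _find_rec(node, element, result):
        # Iterate the element directly; getchildren() was removed in Python 3.9.
        for el in node:
            _find_rec(el, element, result)
        if node.tag == element:
            result.append(node)
    res = []
    _find_rec(node, element, res)
    return res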
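
A property of this approach worth knowing: because children are visited before the node itself, the result comes back in post-order rather than document order. The sketch below uses a compact, hypothetical restatement of the same idea (named post_order, and iterating the element directly, which assumes nothing beyond the standard library) and compares it with iter.

```python
import xml.etree.ElementTree as ET

def post_order(node, tag, result=None):
    # Hypothetical compact form of the same idea: children first, then node.
    if result is None:
        result = []
    for child in node:   # iterating an Element yields its direct children
        post_order(child, tag, result)
    if node.tag == tag:
        result.append(node)
    return result

root = ET.fromstring("<hello><saybye><saybye/></saybye><saybye/></hello>")
outer, second = root.findall('saybye')
inner = outer.find('saybye')

found = post_order(root, 'saybye')        # inner, outer, second
in_doc_order = list(root.iter('saybye'))  # outer, inner, second
```

Both lists contain the same three elements; only the ordering differs, which matters if later code relies on document order.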
