
This is the XML document that I have:

<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>
  <Product Id="2">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberTwo</Attribute>
    </Attributes>
  </Product>
</products>

I'm trying to use XPath to select a Product by the value of its child Attributes/Attribute element whose Name is "Identifier" (e.g. "NumberOne"). In that case, my expected result would be:

<Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
</Product>

Based on this explanation, I tried to implement the query in Python using the lxml library:

found_products = xml_tree_from_string.xpath('//products//Product[c:Attributes[Attribute[@Name="Identifier" and text()="NumberOne"]]]', namespaces={"c": "http://some/path/to/entity/def"})

Unfortunately, this never returns a result, apparently because of the default namespace declared on the Attributes element.

What am I missing?

kjhughes
user2549803

2 Answers


What am I missing?

You're missing that Attribute is in the same namespace as Attributes, because a default namespace declaration is inherited by descendant XML elements.

So, just add a c: to Attribute in your XPath, and it should work as you observed in your comment to Jack's answer.
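A minimal sketch of that fix, reusing the XML and the `c` prefix from the question. The variable names (`root`, `original_result`, `found_ids`) are illustrative only:

```python
from lxml import etree

xml = b"""<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
    <Product Id="1_1">
      <Attribute Name="Whatever"></Attribute>
    </Product>
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberOne</Attribute>
    </Attributes>
  </Product>
  <Product Id="2">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberTwo</Attribute>
    </Attributes>
  </Product>
</products>"""

root = etree.fromstring(xml)
ns = {"c": "http://some/path/to/entity/def"}

# Original query: the inner Attribute step has no prefix, so it only
# matches elements in no namespace and finds nothing.
original_result = root.xpath(
    '//products//Product[c:Attributes[Attribute[@Name="Identifier" and text()="NumberOne"]]]',
    namespaces=ns,
)

# Fixed query: Attribute inherits the default namespace, so it needs c: too.
fixed_result = root.xpath(
    '//products//Product[c:Attributes[c:Attribute[@Name="Identifier" and text()="NumberOne"]]]',
    namespaces=ns,
)
found_ids = [product.get("Id") for product in fixed_result]

print(original_result)  # []
print(found_ids)        # ['1']
```

The prefix `c` is arbitrary; only the namespace URI it maps to has to match the one declared in the document.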

kjhughes

You need to first define a namespace map, declare a prefix for those namespaces that don't have one (as is the case here) and then apply xpath:

from lxml import etree

prods = """[your xml above]"""
doc = etree.XML(prods)  # parse first; the namespace map below is built from doc
# create the ns map; the default (unprefixed) namespace gets the made-up prefix "xx"
ns = {(k if k else "xx"): v for k, v in doc.xpath('//namespace::*')}
for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    print(etree.tostring(product).decode())

Output:

<Product xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"/>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>

To suppress the unused namespace declarations, clean up the whole tree once before the for loop:

etree.cleanup_namespaces(doc)  # note: the parameter is "doc", not "product"
for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    print(etree.tostring(product).decode())

Output:

<Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"/>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>
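
One caveat with building the prefix map dynamically: a dict can hold only one entry under the made-up prefix `xx`, so if a document declares different default namespaces at different points, all but one of them are lost. The sketch below illustrates this with a small hypothetical document (the `urn:example:*` URIs and the `f`/`s` prefixes are invented for the example):

```python
from lxml import etree

# Hypothetical document with two different default namespaces.
doc = etree.XML(
    b"<root>"
    b'<a xmlns="urn:example:first"><item/></a>'
    b'<b xmlns="urn:example:second"><item/></b>'
    b"</root>"
)

pairs = doc.xpath("//namespace::*")  # lxml returns (prefix, uri) tuples

# Both default namespaces show up in the raw namespace nodes...
default_uris = {uri for prefix, uri in pairs if not prefix}

# ...but the dict keeps only one of them under the single key "xx".
ns = {(prefix if prefix else "xx"): uri for prefix, uri in pairs}
collapsed_hits = doc.xpath("//xx:item", namespaces=ns)

# An explicit, hand-written map can address both namespaces.
explicit_ns = {"f": "urn:example:first", "s": "urn:example:second"}
explicit_hits = doc.xpath("//f:item | //s:item", namespaces=explicit_ns)

print(len(default_uris), len(collapsed_hits), len(explicit_hits))
```

When the namespaces are known in advance, as in the question, a hand-written map such as `{"c": "http://some/path/to/entity/def"}` avoids this problem.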
Jack Fleeting
  • Thanks a lot. Is it possible to keep the top level Product definition as it was in the original XML file? I mean without the xmlns definitions. Fun fact - alternatively my initial approach works if I add the c: prefix to the Attribute as well. – user2549803 Dec 20 '20 at 14:30
  • @user2549803 Yes, it's possible. See edit. – Jack Fleeting Dec 20 '20 at 14:35
  • @JackFleeting: Dynamically creating the namespace prefix map like this would be overkill in most situations, including this one, and requires a more sophisticated approach to account for the possibility of different default namespaces at different points in the XML hierarchy. – kjhughes Dec 20 '20 at 16:37
  • @kjhughes Absolutely right - except that when I go to the other extreme and suggest a purely `local-name()` based solution, I get whacked upside the head for taking a cavalier attitude to namespaces.. – Jack Fleeting Dec 20 '20 at 17:51
  • Oh no, wasn't suggesting that you defeat namespaces -- just that you use the namespace prefix mechanism directly without the partially general generation code you have. At least state its limitations so that future readers won't be surprised that it's not as general as it appears to be. Really, though, I'd just back off the generality and fix OP's XPath to include `c:` on the descendent elements of `Attribute` and call it done. Feel free to pull from my answer below and elaborate as needed. – kjhughes Dec 20 '20 at 18:00
  • @JackFleeting: Am I understanding you right that there is an equivalent solution to my XPath query that is based on local-name() ? Out of curiosity - what would that look like? Might come in handy some day :) – user2549803 Dec 20 '20 at 19:27
  • @user2549803: See **Defeating Namespaces** section of [my answer here](https://stackoverflow.com/a/40796315/290085), but be sure to read the part about why you should not do it. – kjhughes Dec 20 '20 at 19:36
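
For reference, a `local-name()`-based variant of the query could look like the sketch below. It matches elements by local name only and ignores namespaces entirely, which is why the comments above advise against it: it would also match same-named elements from unrelated namespaces. The XML is the one from the question; the variable names are illustrative.

```python
from lxml import etree

xml = b"""<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
    <Product Id="1_1">
      <Attribute Name="Whatever"></Attribute>
    </Product>
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberOne</Attribute>
    </Attributes>
  </Product>
  <Product Id="2">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberTwo</Attribute>
    </Attributes>
  </Product>
</products>"""

root = etree.fromstring(xml)

# No namespaces= argument: every step tests only the element's local name.
query = (
    '//Product[*[local-name()="Attributes"]'
    '/*[local-name()="Attribute"][@Name="Identifier" and text()="NumberOne"]]'
)
found_ids = [product.get("Id") for product in root.xpath(query)]

print(found_ids)  # ['1']
```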