7

I want to check to see if an XML document contains a 'person' element anywhere inside. I can check all the first-generation elements very simply:

NodeList nodeList = root.getChildNodes();
for(int i=0; i<nodeList.getLength(); i++){
  Node childNode = nodeList.item(i);
  if (childNode.getNodeName().equals("person")) {
     //do something with it
  }
}

I can add more loops to go into subelements, but I would have to know how many nested loops to put in to determine how far into the document to drill. I could nest 10 loops and still miss a person element nested 12 elements deep in a given document. I need to be able to pull out the element no matter how deeply nested it is.

Is there a way to harvest elements from an entire document? Like return the text values of all tags as an array, or iterate over them?

Something akin to python's elementtree 'findall' method perhaps:

for person in tree.findall('//person'):
   personlist.append(person)
directedition

5 Answers

10

As mmyers states, you could use recursion for this problem.

doSomethingWithAll(root.getChildNodes());

void doSomethingWithAll(NodeList nodeList)
{
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node childNode = nodeList.item(i);
        if (childNode.getNodeName().equals("person")) {
            //do something with it
        }

        NodeList children = childNode.getChildNodes();
        if (children != null)
        {
            doSomethingWithAll(children);
        }
    }
}
nobody
user125661
10

I see three possibilities (two of which others have answered):

  1. Use recursion.
  2. Use XPath (might be a bit overkill for this problem, but if you have a lot of queries like this it is definitely something to explore). Use kdgregory's help on that; a quick look at the API indicates that it is a bit painful to use directly.
  3. If what you have is in fact a Document (that is if root is a Document), you can use Document.getElementsByTagName
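
As a rough illustration of option 3, here is a minimal sketch that parses a string and collects the text of every matching element at any depth. The class name TagNameSearch, the method textOfAll, and the sample XML are made up for this example; only the JDK's DOM classes are used. Note that Element declares a getElementsByTagName method as well, so this should also work when root is an Element rather than a Document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class TagNameSearch {

    // Returns the text content of every element named tagName,
    // however deeply nested, in document order.
    public static List<String> textOfAll(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Searches all descendants, not just direct children.
            NodeList matches = doc.getElementsByTagName(tagName);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                result.add(matches.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample document: one person near the top, one nested deeper.
        String xml = "<root><person>Ann</person>"
                + "<group><team><person>Bob</person></team></group></root>";
        System.out.println(textOfAll(xml, "person"));
    }
}
```

No recursion or nested loops are needed here, since the DOM implementation walks the tree itself.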
Kathy Van Stone
4

That's what XPath is for. To get all elements named "person", here's the expression:

//person

It can be painful to use the JDK's XPath APIs directly. I prefer the wrappers that I wrote in the Practical XML library: http://practicalxml.sourceforge.net/

And here's a tutorial that I wrote (on JDK XPath in general, but mentions XPathWrapper): http://www.kdgregory.com/index.php?page=xml.xpath
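
For comparison, this is roughly what the same query looks like with the JDK's javax.xml.xpath classes used directly, without any wrapper. It is only a sketch: the class name XPathSearch, the method textOfAll, and the sample XML are invented for illustration, and the input is assumed to be a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class XPathSearch {

    // Evaluates an XPath expression against the document and returns
    // the text content of each matching node.
    public static List<String> textOfAll(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();

            // NODESET asks for all matches; the result must be cast.
            NodeList matches = (NodeList) xpath.evaluate(
                    expression, doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                result.add(matches.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><person>Ann</person>"
                + "<group><team><person>Bob</person></team></group></root>";
        System.out.println(textOfAll(xml, "//person"));
    }
}
```

The cast from Object and the result-type constant are part of what makes the raw API feel clumsy for a one-off query.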

kdgregory
2

Here is the formatted version:

Element root = xmlData.getDocumentElement();  
NodeList children = root.getChildNodes(); 

public void doSomethingWithAllToConsole(NodeList nodeList, String tabs)
{
    for(int i=0; i<nodeList.getLength(); i++){

      //print current node & values
      Node childNode = nodeList.item(i);
      if(childNode.getNodeType()==Node.ELEMENT_NODE){
          System.out.print(tabs + childNode.getNodeName());
          if(childNode.getFirstChild()!=null 
                  && childNode.getFirstChild().getNodeType()==Node.TEXT_NODE
                  && !StringUtil.isNullOrEmpty(childNode.getFirstChild().getNodeValue()) ){
              System.out.print(" = " + childNode.getFirstChild().getNodeValue());
          }
          System.out.println();
      }

      //recursively iterate through child nodes
      NodeList children = childNode.getChildNodes();
      if (children != null)
      {
          doSomethingWithAllToConsole(children, tabs+"\t");
      }
    }
}
devrys
parser
0

Apart from Document.getElementsByTagName() or XPath, you could also use jOOX, a library that I have created for simpler XML access and manipulation. jOOX wraps standard Java APIs and adds jQuery-like utility methods. Your Python code snippet would then translate to this Java code:

// Just looking for tag names
for (Element person : $(tree).find("person")) {
  personlist.add(person);
}

// Use XPath for more elaborate queries
for (Element person : $(tree).xpath("//person")) {
  personlist.add(person);
}
Lukas Eder