
I use Java to connect to a TCP port that streams XML documents one after another, each one beginning with an <?xml declaration. An example that demonstrates the format:

<?xml version="1.0"?>
<person>
    <name>Fred Bloggs</name>
</person>
<?xml version="1.0"?>
<person>
    <name>Peter Jones</name>
</person>

I'm using the org.xml.sax.* API. SAX parsing works perfectly for the first document but throws an exception when it reaches the start of the second document:

Exception in thread "main" org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.
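The failure can be reproduced without a socket. The sketch below is an illustration only: it assumes the JDK's bundled SAX parser obtained through javax.xml.parsers.SAXParserFactory (instead of XMLReaderFactory) and a hypothetical NameHandler that records the text of each <name> element. It parses both sample documents in a single parse() call. An XML declaration is only allowed at the very start of a document, so after the first root element closes, the parser sees the second <?xml ...?> as a processing instruction with the reserved target "xml" and stops, although the first document's events have already been delivered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ConcatenatedXmlDemo {

    // Hypothetical handler: collects the text of every <name> element it sees.
    public static class NameHandler extends DefaultHandler {
        public final List<String> names = new ArrayList<>();
        private StringBuilder text; // non-null only while inside <name>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("name".equals(qName)) text = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(text.toString());
                text = null;
            }
        }
    }

    // The two sample documents from the question, concatenated as they arrive on the socket.
    static final String TWO_DOCS =
            "<?xml version=\"1.0\"?>\n<person>\n    <name>Fred Bloggs</name>\n</person>\n"
          + "<?xml version=\"1.0\"?>\n<person>\n    <name>Peter Jones</name>\n</person>\n";

    // Parses everything in one call and returns the SAXParseException it throws,
    // or null if it unexpectedly parses cleanly.
    public static SAXParseException parseInOneGo(NameHandler handler) throws Exception {
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xr.setContentHandler(handler);
        try {
            xr.parse(new InputSource(new StringReader(TWO_DOCS)));
            return null;
        } catch (SAXParseException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        NameHandler handler = new NameHandler();
        SAXParseException e = parseInOneGo(handler);
        System.out.println("Names before failure: " + handler.names);
        System.out.println("Error: " + (e == null ? "none" : e.getMessage()));
    }
}
```

The exact message text depends on the parser implementation, so the sketch only relies on the exception type.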

The following skeleton class demonstrates the setup I'm using:

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

import java.net.Socket;

public class XMLTest extends DefaultHandler {

  public XMLTest() {
     super();
  }

  public static void main(String[] args) throws Exception {
    XMLReader xr = XMLReaderFactory.createXMLReader();

    XMLTest handler = new XMLTest();
    xr.setContentHandler(handler);
    xr.setErrorHandler(handler);

    xr.parse(new InputSource(new Socket("127.0.0.1", 4555).getInputStream()));
  }
}

I have no control over the format of the XML (it's a financial data feed), but I need to parse it efficiently and parse all of the documents. I've spent the afternoon and evening trying different approaches, but none have worked. Any help would be greatly appreciated.

jkt
  • You have to call parse for each separate document, which means you need to filter and break up the input stream on the ' – Steven D. Majewski Jul 21 '10 at 18:53
  • I had to do something like this and just replied (to myself) [here](http://stackoverflow.com/questions/6711766/multiple-xml-files-in-one-stream/), wrapping everything in its own Reader for simpler use – Filipe Pina Jul 27 '11 at 17:23
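Both comments suggest breaking the input up before parsing. As a variation that does not rely on each declaration starting its own line, here is a hypothetical splitter (a sketch, not the code from the linked answer) that scans characters and cuts the stream wherever <?xml appears, handing each complete document to a callback. It assumes the text "<?xml" never occurs inside comments, CDATA sections, or character data; the class and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class XmlDeclarationSplitter {

    private static final String MARKER = "<?xml";

    // Reads characters from `in` and passes each complete document to `onDocument`
    // as soon as the next "<?xml" marker arrives; the final document is emitted at end of stream.
    // Assumption: "<?xml" never appears inside comments, CDATA sections, or text content.
    public static void split(Reader in, Consumer<String> onDocument) throws IOException {
        BufferedReader reader = new BufferedReader(in); // buffered, so read() is not one system call per char
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            current.append((char) c);
            int markerStart = current.length() - MARKER.length();
            // Only the tail can contain a newly completed marker, so search from markerStart.
            if (markerStart > 0 && current.indexOf(MARKER, markerStart) == markerStart) {
                emitIfNotBlank(current.substring(0, markerStart), onDocument);
                current.setLength(0);
                current.append(MARKER); // the marker begins the next document
            }
        }
        emitIfNotBlank(current.toString(), onDocument);
    }

    private static void emitIfNotBlank(String doc, Consumer<String> onDocument) {
        if (!doc.trim().isEmpty()) {
            onDocument.accept(doc);
        }
    }

    public static void main(String[] args) throws IOException {
        String feed = "<?xml version=\"1.0\"?><person><name>Fred Bloggs</name></person>"
                    + "<?xml version=\"1.0\"?><person><name>Peter Jones</name></person>";
        List<String> docs = new ArrayList<>();
        split(new StringReader(feed), docs::add);
        System.out.println(docs.size()); // prints 2
    }
}
```

Each document can then be parsed with new InputSource(new StringReader(doc)). Because parse() throws checked exceptions and Consumer cannot, a real callback would have to wrap or otherwise handle SAXException and IOException. On a live socket, a document is only emitted once the following <?xml arrives.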



You'll want to split the stream on every <?xml version="1.0"?> and parse each document separately. A BufferedReader can help with this. Kickoff example:

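import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.StringReader;
import org.xml.sax.InputSource;

BufferedReader reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
StringBuilder builder = null;
for (String line; (line = reader.readLine()) != null;) {
    if (line.startsWith("<?xml")) {
        if (builder != null) {
            // new InputSource(String) expects a system ID (URI), so wrap the XML text in a StringReader.
            xr.parse(new InputSource(new StringReader(builder.toString())));
        }
        builder = new StringBuilder();
    }
    if (builder != null) {
        builder.append(line).append('\n'); // readLine() strips the line terminator
    }
}
if (builder != null) { // the last document has no <?xml line after it
    xr.parse(new InputSource(new StringReader(builder.toString())));
}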
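Putting the loop above into a complete, self-contained program may make it easier to try out. This is a sketch under a few assumptions: a hypothetical NameCollector handler stands in for the real ContentHandler, a StringReader stands in for the socket, and the parser comes from javax.xml.parsers.SAXParserFactory. It reuses a single XMLReader for every document, which the SAX XMLReader contract allows once the previous parse has finished.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SplitAndParse {

    // Hypothetical handler for the sample feed: records the text of each <name> element.
    public static class NameCollector extends DefaultHandler {
        public final List<String> names = new ArrayList<>();
        private StringBuilder text; // non-null only while inside <name>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("name".equals(qName)) text = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(text.toString());
                text = null;
            }
        }
    }

    // Splits the input before every line that starts with "<?xml" and parses each chunk
    // as a separate document, reusing one XMLReader.
    public static List<String> parseAll(Reader input)
            throws IOException, SAXException, ParserConfigurationException {
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        NameCollector handler = new NameCollector();
        xr.setContentHandler(handler);

        BufferedReader reader = new BufferedReader(input);
        StringBuilder builder = null;
        for (String line; (line = reader.readLine()) != null;) {
            if (line.startsWith("<?xml")) {
                if (builder != null) {
                    xr.parse(new InputSource(new StringReader(builder.toString())));
                }
                builder = new StringBuilder();
            }
            if (builder != null) { // ignore anything before the first declaration
                builder.append(line).append('\n');
            }
        }
        if (builder != null) { // the last document has no <?xml line after it
            xr.parse(new InputSource(new StringReader(builder.toString())));
        }
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<?xml version=\"1.0\"?>\n<person>\n    <name>Fred Bloggs</name>\n</person>\n"
                    + "<?xml version=\"1.0\"?>\n<person>\n    <name>Peter Jones</name>\n</person>\n";
        System.out.println(parseAll(new StringReader(feed))); // prints [Fred Bloggs, Peter Jones]
    }
}
```

Against the real feed, the Reader would be something like new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8). Note that each document is only parsed once the next <?xml line arrives, so on a live feed the most recent document waits for its successor.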
BalusC
  • When I do this with `input` set to `InputStream input = new Socket("127.0.0.1", 4500).getInputStream();`, I get the following exception: Exception in thread "main" java.io.FileNotFoundException: /Users/admin/IdeaProjects/XMLTest/< (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.<init>(FileInputStream.java:106) at java.io.FileInputStream.<init>(FileInputStream.java:66) It seems xr.parse() doesn't like strings, even when wrapped as an InputSource. – jkt Jul 21 '10 at 19:14
  • Do you consider yourself capable of interpreting stack traces? I don't see how `FileNotFoundException` is related to all this. I'd say your problem lies somewhere else, maybe in the step beyond parsing. The filename given in the exception message, `/Users/admin/IdeaProjects/XMLTest/ – BalusC Jul 21 '10 at 19:18
  • Hey, I can read stack traces; I only pasted the first few lines. The stack trace frame pointing to my code is `at XMLTest.main(XMLTest.java:42)`, and line 42 is `xr.parse(new InputSource(builder.toString()));` (which is from your example above). I appreciate your assistance with this. – jkt Jul 21 '10 at 19:34
  • The solution is to wrap the StringBuilder's contents in a StringReader, i.e.: `xr.parse(new InputSource(new StringReader(builder.toString())));` Thanks for your assistance! – jkt Jul 21 '10 at 19:42
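The FileNotFoundException in this thread comes down to which InputSource constructor is used. In this small sketch, the String constructor stores its argument as a system ID (a URI the parser is expected to open), which, judging by the stack trace, was resolved against the working directory. The Reader constructor instead supplies the XML text itself as a character stream.

```java
import java.io.StringReader;
import org.xml.sax.InputSource;

public class InputSourceDemo {
    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><person><name>Fred Bloggs</name></person>";

        // String constructor: the XML text is treated as a system ID (a location to open).
        InputSource asSystemId = new InputSource(xml);
        System.out.println(xml.equals(asSystemId.getSystemId()));   // true
        System.out.println(asSystemId.getCharacterStream() == null); // true

        // Reader constructor: the parser reads the XML text directly.
        InputSource asContent = new InputSource(new StringReader(xml));
        System.out.println(asContent.getSystemId() == null);        // true
        System.out.println(asContent.getCharacterStream() != null); // true
    }
}
```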