2

I have this very simple python code to read xml for the wikipedia api:

import urllib
from xml.dom import minidom

usock = urllib.urlopen("http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500")
xmldoc=minidom.parse(usock)
usock.close()
print xmldoc.toxml() 

But this code returns with these errors:

Traceback (most recent call last):
  File "/home/user/workspace/wikipediafoundations/src/list.py", line 5, in <module><br>
    xmldoc=minidom.parse(usock)<br>
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse<br>
    return expatbuilder.parse(file)<br>
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 928, in parse<br>
    result = builder.parseFile(file)<br>
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile<br>
    parser.Parse(buffer, 0)<br>
xml.parsers.expat.ExpatError: syntax error: line 1, column 62<br>

I have no clue as I just learning python. Is there a way to get an error with more detail? Does anyone know the solution? Also, please recommend a better language to do this in.

Thank You,
Venkat Rao

Daniel Vassallo
  • 326,724
  • 71
  • 495
  • 436
Venkat S. Rao
  • 1,080
  • 3
  • 13
  • 29

1 Answers1

9

The URL you're requesting is an HTML representation of the XML that would be returned:

http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500

So the XML parser fails. You can see this by pasting the above in a browser. Try adding a format=xml at the end:

http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500&format=xml

as documented on the linked page:

ars
  • 113,841
  • 22
  • 139
  • 133
  • 3
    @user, since @ars's answer solved you problem, **accept it** -- that is, clic on the checkmark-shaped icon on the left of his answer's text. This is fundamental SO etiquette! – Alex Martelli Aug 11 '10 at 04:36