6

I know how to parse a page using Python. My question is: which parsing technique is the fastest, and how much faster is it than the others?

The parsing techniques I know are XPath, DOM, BeautifulSoup, and Python's string `find` method.

Platinum Azure
codersofthedark
  • 5
    Pick a web page. Use the `timeit` module to test the execution times of the various mechanisms as they parse your selected source. Report which one is fastest. – larsks Dec 01 '11 at 13:54
  • Ha ha, I think I will now, because I am wondering how much parsing performance can vary between x86 and x64 ;) – codersofthedark Dec 01 '11 at 14:28
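
Following larsks' suggestion, here is a minimal sketch of such a `timeit` comparison. It uses only the standard library, so `xml.dom.minidom` stands in for "DOM" and `xml.etree.ElementTree` (which supports only a limited XPath subset) stands in for "XPath". The sample `PAGE` and the function names are assumptions made for illustration, and the input is assumed to be well-formed; real-world HTML usually is not, so treat the numbers as a rough guide for your own machine only.

```python
import timeit
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample page: small and well-formed so every parser accepts it.
PAGE = "<html><body>" + "".join(
    f'<p><a href="/item/{i}">item {i}</a></p>' for i in range(200)
) + "</body></html>"


def with_find(page):
    """Plain str.find: scan for each href=" marker and slice out the value."""
    links, pos = [], 0
    marker = 'href="'
    while True:
        start = page.find(marker, pos)
        if start == -1:
            return links
        start += len(marker)
        end = page.find('"', start)
        if end == -1:  # unterminated attribute: stop instead of looping
            return links
        links.append(page[start:end])
        pos = end


def with_dom(page):
    """DOM: build the full tree with minidom, then collect the <a> elements."""
    doc = xml.dom.minidom.parseString(page)
    return [a.getAttribute("href") for a in doc.getElementsByTagName("a")]


def with_xpath(page):
    """ElementTree: path expression selecting every <a> that has an href."""
    root = ET.fromstring(page)
    return [a.get("href") for a in root.findall(".//a[@href]")]


def benchmark(repeat=5, number=20):
    """Return the best observed time per call, in seconds, for each technique."""
    results = {}
    for func in (with_find, with_dom, with_xpath):
        timings = timeit.repeat(lambda: func(PAGE), repeat=repeat, number=number)
        results[func.__name__] = min(timings) / number
    return results


if __name__ == "__main__":
    for name, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print(f"{name:12s} {seconds * 1e6:10.1f} us per call")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes. Swap in your own page source for `PAGE` to get numbers that reflect your data.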

2 Answers

10

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Comparison

Acorn
1

lxml is written in C (it is built on the libxml2 and libxslt libraries), and if you are on x86 it is the best choice. As for techniques, there is no big difference between XPath and DOM; both are very fast. But if you use `find` or `findAll` in BeautifulSoup, it will be slower than the others. BeautifulSoup is written in Python. It needs a lot of memory to parse any data and, of course, it uses the standard search methods from the Python libraries.
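
To check this on your own machine, here is a rough sketch that times lxml's XPath against BeautifulSoup's search. It assumes the third-party packages `lxml` and `bs4` may or may not be installed and simply skips whichever is missing. The sample `PAGE` and the function names are made up for illustration, and `find_all` is the bs4 spelling of `findAll`.

```python
import importlib.util
import timeit

# Hypothetical sample page; replace with the source you actually parse.
PAGE = "<html><body>" + "".join(
    f'<p><a href="/item/{i}">item {i}</a></p>' for i in range(200)
) + "</body></html>"


def lxml_xpath(page):
    """Parse with lxml (C-backed) and select the href values with XPath."""
    import lxml.html
    return lxml.html.fromstring(page).xpath("//a/@href")


def bs4_find_all(page):
    """Parse with BeautifulSoup's pure-Python html.parser backend."""
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(page, "html.parser")
    return [a["href"] for a in soup.find_all("a")]


# Top-level module name -> function that needs that module.
CANDIDATES = {"lxml": lxml_xpath, "bs4": bs4_find_all}


def benchmark(number=10):
    """Return average seconds per call for each installed library."""
    results = {}
    for module, func in CANDIDATES.items():
        if importlib.util.find_spec(module) is None:
            continue  # library not installed: skip it
        total = timeit.timeit(lambda: func(PAGE), number=number)
        results[func.__name__] = total / number
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:14s} {seconds * 1e3:8.2f} ms per call")
```

Note that BeautifulSoup can also be told to use lxml as its underlying parser (`BeautifulSoup(page, "lxml")`), which narrows the gap in parse time, though the search itself still runs in Python.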

SkyFox
  • Well said, a library written in C is always a lot faster than a pure Python module. Thanks for the update that lxml is written in C. I wanted to know why you mentioned x86. Is it that on x64 something can perform better than lxml? If yes, which one and why? – codersofthedark Dec 01 '11 at 14:26
  • 2
    x86 and x64 make no difference in this context. I meant other platforms, like SPARC or ARM :) – SkyFox Dec 01 '11 at 14:27
  • aaw okies, that won't be a problem in my case :) – codersofthedark Dec 01 '11 at 14:30