
Which is better in terms of performance: using XPath to parse a big HTML file, or using preg_match to fetch the attributes and text I want from it?

The website receives a very large number of requests, so I need a way to parse the HTML very quickly while consuming as little CPU as possible.

Right now I'm using preg_match because I really care about performance. Does using regular expressions make a big difference in processing time, or should I just move to XPath because it makes no difference in terms of speed?
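A minimal sketch of how the two approaches could be compared on your own pages. It is written in Python rather than PHP, so `re.findall` plays the role of `preg_match_all`. Python's standard library has no HTML-aware XPath engine, so the stdlib `html.parser.HTMLParser` stands in for PHP's `DOMDocument::loadHTML` plus `DOMXPath`. The sample page, the `href` pattern and the helper names are hypothetical, and timings depend entirely on the machine and the real markup, so no numbers are claimed here.

```python
import re
import timeit
from html.parser import HTMLParser

# Hypothetical sample page with 1,000 links, standing in for the "big HTML file".
SAMPLE_HTML = "<html><body>" + "".join(
    f'<a href="/item/{i}" class="link">Item {i}</a>' for i in range(1000)
) + "</body></html>"

# Regex approach (analogue of PHP's preg_match_all): only correct while the
# markup keeps exactly this shape (lowercase tag, href first, double quotes).
HREF_RE = re.compile(r'<a\s+href="([^"]*)"')


def hrefs_with_regex(html: str) -> list[str]:
    return HREF_RE.findall(html)


# Parser approach (stand-in for DOMDocument + DOMXPath): the markup is
# tokenized, so attribute order, quoting style and tag case do not matter.
class HrefCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def hrefs_with_parser(html: str) -> list[str]:
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    # Both approaches agree on this well-behaved input...
    assert hrefs_with_regex(SAMPLE_HTML) == hrefs_with_parser(SAMPLE_HTML)
    # ...so the remaining question is speed, which only a measurement answers.
    for fn in (hrefs_with_regex, hrefs_with_parser):
        seconds = timeit.timeit(lambda: fn(SAMPLE_HTML), number=50)
        print(f"{fn.__name__}: {seconds:.4f}s for 50 runs")
```

If the regex version turns out faster on your real pages, keep in mind that the speed only comes with correct results while the markup keeps the exact shape the pattern assumes.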

Thanks.

Grego
  • It's not a matter of performance you need to worry about: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – PeeHaa Oct 01 '11 at 00:10
  • Hmm, I read it, but the point there is that you shouldn't use regex when you are doing it for the public. Here I know the HTML tags coming from the server are safe, and I can always use tidyHTML to correct them. – Grego Oct 01 '11 at 00:28
  • It's not a matter of safe/valid HTML. It's a matter of HTML not being a 'regular' language. Trying to parse it with regex will fail. It may seem to work at first glance, but believe me, it will fail. – PeeHaa Oct 01 '11 at 00:35
  • Yeah, I agree with you, but people who created libraries like XPath had to use something to parse the HTML, and that something is regular expressions. Remember that someone had to develop the library. – Grego Oct 01 '11 at 00:40
  • OK, didn't know that. Yeah, if that's the case, I would definitely use regex to parse those documents myself. – PeeHaa Oct 01 '11 at 00:49
  • If I were you, I would test it and see for myself. – Patrick Fisher Oct 01 '11 at 00:49
  • Grego, you make it sound like XPath implementations are just thin wrappers around regex cores. That is absolutely not true. Don't be fooled by the apparent similarity between regexes and XPath expressions; regexes are fundamentally incompatible with the task of parsing HTML, which is what XPath has to do. – Alan Moore Oct 01 '11 at 13:11
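The point PeeHaa and Alan Moore make can be illustrated with a small, hedged example (again Python rather than PHP, with the stdlib `html.parser` standing in for a DOM/XPath parser; the snippet and helper names are hypothetical). Even on valid, server-generated HTML, a pattern that assumes a fixed attribute order, quoting style and tag case misses real links and picks up a commented-out one, while the parser reports exactly the three real links:

```python
import re
from html.parser import HTMLParser

# Hypothetical snippet: every anchor is valid HTML, but three of them
# break an assumption baked into a "simple" regex.
TRICKY_HTML = """
<a href="/one">double quotes, href first</a>
<a class="x" href="/two">href is not the first attribute</a>
<A HREF='/three'>uppercase tag, single quotes</A>
<!-- <a href="/commented-out">not a real link</a> -->
"""

# The same kind of pattern one might pass to preg_match_all in PHP.
HREF_RE = re.compile(r'<a\s+href="([^"]*)"')


class HrefCollector(HTMLParser):
    """Collects href values from <a> start tags; comment content is not parsed as tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def hrefs_with_regex(html: str) -> list[str]:
    return HREF_RE.findall(html)


def hrefs_with_parser(html: str) -> list[str]:
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    print("regex: ", hrefs_with_regex(TRICKY_HTML))   # ['/one', '/commented-out']
    print("parser:", hrefs_with_parser(TRICKY_HTML))  # ['/one', '/two', '/three']
```

Each of these cases can be patched in the pattern, but every patch is one more assumption about the markup; the parser handles them because it tracks the document's structure rather than matching characters.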

0 Answers