
I need to use a regex to find matches in a string. I only want to capture the (.+?) part and would like to ignore everything currently marked with (*):

$regex='#<span class="(*)"><a href="/venues/(*)">(.+?)</a></span>#';

Instead of ignoring the (*) parts, it echoes out what they match.

How can I ignore these and only get (.+?)?

kaya3
weltschmerz
  • Please refrain from parsing HTML with RegEx as it will [drive you insane](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) instead. – Madara's Ghost May 19 '12 at 20:58

1 Answer


Parentheses mean capture: whatever is matched inside the () will be captured so you can use it later.

If you do not want something to be captured, because you don't want or need to use it later, just remove the parentheses.
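As a rough illustration of removing the parentheses, here is a minimal sketch. The sample markup, class names and venue names are made up for the example, and `[^"]*` is just one way to match an attribute value without capturing it (`.*?` would behave similarly here).

```php
<?php
// Hypothetical sample markup; the class names and venue slugs are invented.
$html = '<span class="example"><a href="/venues/blue-room">Blue Room</a></span>'
      . '<span class="other"><a href="/venues/red-hall">Red Hall</a></span>';

// [^"]* matches the attribute values but is not wrapped in parentheses,
// so nothing is captured for them. Only the link text is a capture group.
$regex = '#<span class="[^"]*"><a href="/venues/[^"]*">(.+?)</a></span>#';

preg_match_all($regex, $html, $matches);

// $matches[0] holds the full matches, $matches[1] the single capture group.
$names = $matches[1];
print_r($names);
```

With only one pair of parentheses in the pattern, `$matches[1]` is the only group to look at, so the class and href values never show up in the results.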

I should add that using regular expressions to extract data from HTML is generally not a good idea. You might want to use a DOM parser instead, with DOMDocument::loadHTML() for example.
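For comparison, a DOM-based version might look roughly like the sketch below. It assumes the same made-up sample markup as the question's pattern implies, and the XPath expression is only one possible way to select the venue links.

```php
<?php
// Hypothetical sample markup, shaped like the HTML in the question.
$html = '<span class="example"><a href="/venues/blue-room">Blue Room</a></span>'
      . '<span class="other"><a href="/venues/red-hall">Red Hall</a></span>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select every <a> directly inside a <span> whose href starts with /venues/.
$links = $xpath->query('//span/a[starts-with(@href, "/venues/")]');

$names = [];
foreach ($links as $link) {
    // textContent is the link text; the class and href are never involved.
    $names[] = $link->textContent;
}
print_r($names);
```

The class attribute does not need to be matched or skipped at all here, which is the main practical advantage over the regex.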

Pascal MARTIN
  • Thank you. The point is also that there will be something inside class="", e.g. class="example", and I need to tell the regex to ignore what is inside the quotes (in this case, example). Any idea how to do that? – weltschmerz May 19 '12 at 20:49
  • You can use something like `.*?` without having to capture what it matches ;-) – Pascal MARTIN May 19 '12 at 20:51
  • Awesome, thank you! That worked. I will add the solution to the question. – weltschmerz May 19 '12 at 20:53