I'm trying to write a regex that finds all HTML tags, where each opening tag must be paired with a closing tag of the same name. Here's what I mean (yes, I only want tag names of at most 3 letters):

preg_match_all("/\<[a-z]{1,3}\>(.*?)\<\/[a-z]{1,3}\>/", $string, $matches);

Where the two [a-z]{1,3} appear, I want them to match the same text, so that it doesn't pair <b> with </i>, etc. Thanks. Let me know if you need further explanation.

John Kugelman
David

3 Answers

Don't parse HTML with regex. Use PHP Tidy instead.

Vivin Paliath
  • I'm not really parsing HTML; it's just the closest example and the easiest way to show what I'm trying to do. – David Aug 25 '10 at 02:58
  • So you're parsing XML? :P Sorry, whenever I see `regex` and HTML I laugh. – Nick T Aug 25 '10 at 03:02
  • It doesn't matter if you're parsing HTML/XML or if you're checking for specific closing-tags. HTML and Regex go together like gasoline and milk. i.e., not recommended. :) – Vivin Paliath Aug 25 '10 at 03:03
  • @David: If it's so much *like* HTML, could you just use an *ML parser anyways? – Nick T Aug 25 '10 at 03:04

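You really shouldn't parse *ML with regex because of problems with nested elements, but if it helps, capture the tag name and refer back to it with \1:

preg_match_all('/<([a-z]{1,3})>(.*?)<\/\1>/', $string, $matches);

The pattern is single-quoted because a double-quoted PHP string would turn \1 into an octal escape (the byte 0x01) before the regex engine ever sees it.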
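As a rough illustration of how the backreference behaves, here is a small sketch. The sample strings are made up for this example and are not from the question. It also shows the nested-tag limitation that the comment below describes.

```php
<?php
// Hypothetical input: one matched pair, one mismatched pair, one matched pair.
$string = '<b>bold</b> <i>italic</b> <em>emphasis</em>';

// Single quotes keep \1 intact as a backreference to capture group 1.
preg_match_all('/<([a-z]{1,3})>(.*?)<\/\1>/', $string, $matches);

// $matches[1] holds the tag names, $matches[2] the text between the tags.
// The mismatched <i>...</b> pair is skipped.
print_r($matches[1]);
print_r($matches[2]);

// Nested tags of the same kind: the lazy (.*?) stops at the first </b>,
// so the outer element is cut short.
$nested = '<b>outer<b>inner</b></b>';
preg_match_all('/<([a-z]{1,3})>(.*?)<\/\1>/', $nested, $nestedMatches);
print_r($nestedMatches[0]);
```

With these inputs the first call should yield the names `b` and `em`, and the nested case should yield the single truncated match `<b>outer<b>inner</b>`.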
bcosca
  • Be aware that this won't handle tags that are enclosed in the same kind of tag. For example, given ``, it will match ``. – Alan Moore Aug 25 '10 at 06:58

As Vivin Paliath said, don't use regex for this. You can also try PHP 5's DOMDocument with XPath:

http://php.net/manual/en/class.domdocument.php
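A minimal sketch of that approach, assuming the input is close enough to HTML for `DOMDocument::loadHTML` to accept it. The sample markup and the variable names are hypothetical. The XPath filter keeps elements whose names are at most three characters long, mirroring the limit in the question.

```php
<?php
// Hypothetical sample markup, not taken from the question.
$html = '<p><b>bold</b> and <i>italic</i></p>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$found = [];
// Select every element whose tag name is at most three characters long.
// The <html> and <body> wrappers that loadHTML adds are filtered out by length.
foreach ($xpath->query('//*[string-length(name()) <= 3]') as $node) {
    $found[] = [$node->nodeName, $node->textContent];
}

print_r($found);
```

Because the parser builds a tree, nested elements of the same kind are each returned as their own node, which is the case the regex cannot handle.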

Jake N