I want to extract an attribute value from a simple text format I'm parsing. The value should be able to contain HTML inside the quotes, and that is where I'm stuck.

$line = 'attribute = "<p class=\"qwerty\">Hello World</p>" attribute2 = "value2"';



I've already trimmed the line (with a substring) down to the point where the value begins:

$line = '"<p class=\"qwerty\">Hello World</p>" attribute2 = "value2"';

My current regex works as long as there are no escaped quotes inside the text. Once I escape the quotes in the HTML, it stops at the first escaped quote instead of the closing one. Switching to a greedy .* overshoots in the other direction and runs to the end of the second attribute.

What I'm trying to obtain from the string above is:

$result = '<p class=\"qwerty\">Hello World</p>';



This is how far I've gotten with trial and error:

$value_regex = "/^\"(.+?)\"/";

if (preg_match($value_regex, $line, $matches)) {
    $result = $matches[1];
}

Thank you very much in advance!

Grozav Alex Ioan
1 Answer

You can use a negative lookbehind to avoid matching escaped quotes:

(?<!\\)"(.+?)(?<!\\)"

Here (?<!\\) is a negative lookbehind: a quote only matches if it is not immediately preceded by a backslash, so \" is skipped.

However, I would caution against using regex to parse the HTML itself; a DOM parser is better suited for that.
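To illustrate the DOM suggestion, here is a rough sketch. It assumes the value has already been extracted, that backslashes are only used to escape quotes (so stripslashes is enough to unescape it), and that PHP's dom extension is available. The variable names are illustrative only.

```php
<?php
// Value as extracted from the line, still carrying the escaped quotes.
$value = '<p class=\"qwerty\">Hello World</p>';

// Turn \" back into " before handing the fragment to the parser.
$html = stripslashes($value);

$doc = new DOMDocument();
// loadHTML accepts a fragment and wraps it in html/body elements as needed.
$doc->loadHTML($html);

// Read the attribute and the text through the DOM instead of a regex.
$p     = $doc->getElementsByTagName('p')->item(0);
$class = $p->getAttribute('class');
$text  = $p->textContent;
```

With the sample value, $class should hold the class name and $text the paragraph text.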


PHP Code:

$value_regex = '~(?<!\\\\)"(.+?)(?<!\\\\)"~';
if (preg_match($value_regex, $line, $matches)) 
     $result = $matches[1];
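
As a quick check, here is a minimal sketch that runs the pattern against the shortened line from the question. The second pattern is not the lookbehind approach; it is a commonly used alternative, included on the assumption that the input might also contain escaped backslashes (for example a value ending in \\), which the lookbehind version would misread as an escaped quote. The names $alt_regex and $alt are illustrative.

```php
<?php
// Shortened line from the question. In a single-quoted PHP string,
// \" stays as a literal backslash followed by a quote.
$line = '"<p class=\"qwerty\">Hello World</p>" attribute2 = "value2"';

// Lookbehind version: a quote only counts if no backslash precedes it.
$value_regex = '~(?<!\\\\)"(.+?)(?<!\\\\)"~';
$result = null;
if (preg_match($value_regex, $line, $matches)) {
    $result = $matches[1];
}

// Alternative (assumption): consume either a character that is neither a
// quote nor a backslash, or a backslash followed by any single character.
$alt_regex = '~"((?:[^"\\\\]|\\\\.)*)"~s';
$alt = null;
if (preg_match($alt_regex, $line, $m)) {
    $alt = $m[1];
}

var_dump($result === '<p class=\"qwerty\">Hello World</p>'); // bool(true)
var_dump($alt === $result);                                   // bool(true)
```

Both patterns stop at the first unescaped quote, so attribute2 is left untouched. Note that the captured value still contains the backslashes; unescape it separately if you need the raw HTML.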
anubhava