I am trying to extract the content="TEXT_TO_EXTRACT" value from the meta tag with property="og:description":

<meta data-vue-meta="true" property="og:description" content="Example text here"><meta data-vue-meta="true" name="twitter:card" content="summary_large_image"><meta data-vue-meta="true" name="twitter:title" content=""><meta data-vue-meta="true" name="twitter:description" content="">

It works fine if I use this preg_match():

preg_match('/property(\s+|)=(\s+|)\"(\s+|)og:description(\s+|)\"(\s+|)content(\s+|)=(\s+|)\"(.+?)\"/is', $html, $matches)

The extracted string is in $matches[8].

However, if content="" is empty, e.g.:

<meta data-vue-meta="true" property="og:description" content=""><meta data-vue-meta="true" name="twitter:card" content="summary_large_image"><meta data-vue-meta="true" name="twitter:title" content=""><meta data-vue-meta="true" name="twitter:description" content="">

It returns this:

"><meta data-vue-meta=

As you can see in this regexr.com example:

https://regexr.com/4rlgu

I'd like to get an empty string when content="" is empty.

Can anyone help?

user2972081
Someone closed this question while I was writing an answer, so I'm leaving it as a comment here: use `*` instead of `+` as the quantifier in the capturing group `(.+?)`. It must match between *zero* and unlimited characters, as few as possible, expanding as needed (lazy, non-greedy). To simplify (replacing `(\s+|)` with just `\s?` and keeping these only where spaces could actually occur), you get your match in the first group with: `/property\s?=\s?"og:description"\s+content\s?=\s?"(.*?)"/` – EricLavault Jan 05 '20 at 16:31
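
For illustration, here is a minimal sketch of the suggestion in the comment above, run against both inputs from the question. With `(.+?)` the group needs at least one character, so on an empty attribute it swallows the closing quote and runs on to the next `"`; with `(.*?)` it can match zero characters. The function name `extractOgDescription` is hypothetical, and the `is` flags are carried over from the pattern in the question; treat this as a sketch, not a general HTML parser.

```php
<?php
// Hypothetical helper: returns the og:description content, "" if the
// attribute is empty, or null if the tag is not found at all.
function extractOgDescription(string $html): ?string
{
    // Simplified pattern from the comment; (.*?) allows an empty value.
    $pattern = '/property\s?=\s?"og:description"\s+content\s?=\s?"(.*?)"/is';
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1]; // first (and only) capturing group
    }
    return null;
}

$filled = '<meta data-vue-meta="true" property="og:description" content="Example text here">';
$empty  = '<meta data-vue-meta="true" property="og:description" content="">'
        . '<meta data-vue-meta="true" name="twitter:card" content="summary_large_image">';

var_dump(extractOgDescription($filled)); // string(17) "Example text here"
var_dump(extractOgDescription($empty));  // string(0) ""
```

Returning null for "no match" keeps that case distinguishable from an empty content attribute, which is the case the question is about.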
