
Here's a simple example:

Text: <input name="zzz" value="18754" type="hidden"><input name="zzz" value="18311" type="hidden"><input name="zzz" value="17138" type="hidden">

Regex: /<input.*?value="(18754|17138)".*?>/

When the matches are replaced with an empty string, the result is an empty string. I expected the middle <input> to remain, since I am using non-greedy matching (.*?). Could anyone explain why it is removed?

Ree

3 Answers


There are two matches:

  1. <input name="zzz" value="18754" type="hidden">
  2. <input name="zzz" value="18311" type="hidden"><input name="zzz" value="17138" type="hidden">

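In the second case, the first .*? matches name="zzz" value="18311" type="hidden"><input name="zzz". Non-greedy only means the quantifier consumes as few characters as possible from the position where the match started; it is still free to run past a > when that is the only way to reach value="17138". So it's a valid match, and it's non-greedy.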
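To make the two matches visible, here is a minimal sketch in Python (my choice of language; the question does not name one, and the variable names are only for illustration). It runs the original pattern over the sample text and collects what each match covers.

```python
import re

# Sample text from the question: three <input> tags back to back.
text = (
    '<input name="zzz" value="18754" type="hidden">'
    '<input name="zzz" value="18311" type="hidden">'
    '<input name="zzz" value="17138" type="hidden">'
)

# The original pattern with lazy ".*?" quantifiers.
pattern = re.compile(r'<input.*?value="(18754|17138)".*?>')

# Collect the full text of every match.
matches = [m.group(0) for m in pattern.finditer(text)]
for matched in matches:
    print(matched)

# Replacing every match with an empty string leaves nothing behind.
remaining = pattern.sub('', text)
print(repr(remaining))
```

The second printed match spans both the 18311 tag and the 17138 tag, which is why nothing is left after the replacement.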

NPE

aix already explained why it matches the middle part.

To avoid this behaviour, get rid of the .*?, instead try this:

/<input[^>]*value="(18754|17138)"[^>]*>/


Instead of matching any character, match any character except ">", so a match can never extend beyond the end of the tag it started in.
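As a quick check, the following sketch applies the negated character class to the sample text from the question. Python is assumed here only for demonstration; the pattern itself is unchanged.

```python
import re

# Sample text from the question: three <input> tags back to back.
text = (
    '<input name="zzz" value="18754" type="hidden">'
    '<input name="zzz" value="18311" type="hidden">'
    '<input name="zzz" value="17138" type="hidden">'
)

# "[^>]*" cannot cross a ">", so each match stays inside a single tag.
pattern = re.compile(r'<input[^>]*value="(18754|17138)"[^>]*>')

result = pattern.sub('', text)
print(result)
```

Only the tag with value="18311" should be left in result.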

stema

aix's answer is correct -- the second match includes the 2nd and 3rd input tags.

One possible fix for your regex would be to change . to [^>], like this:

/<input[^>]*?value="(18754|17138)"[^>]*?>/

That will cause it to match any character except >. But that has the obvious problem of breaking whenever > shows up inside a quoted attribute value. As everyone always says: regexes aren't designed to work on HTML. Don't use them unless you have no other choice.
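Here is a rough sketch of both points, assuming Python and its standard-library html.parser module. The sample tag with name="a>b" and the InputFilter class are made up for illustration, and the filter only handles start tags, so treat it as a starting point rather than a complete solution.

```python
import re
from html.parser import HTMLParser

# Hypothetical input: the first tag has a ">" inside a quoted attribute value.
text = (
    '<input name="a>b" value="18754" type="hidden">'
    '<input name="zzz" value="18311" type="hidden">'
)

# The regex cannot get past the ">" in name="a>b", so the unwanted tag survives.
pattern = re.compile(r'<input[^>]*?value="(18754|17138)"[^>]*?>')
regex_result = pattern.sub('', text)


class InputFilter(HTMLParser):
    """Keeps the raw text of start tags, skipping <input> tags with unwanted values."""

    def __init__(self, unwanted_values):
        super().__init__()
        self.unwanted = set(unwanted_values)
        self.kept = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs parsed from the tag.
        if tag == "input" and dict(attrs).get("value") in self.unwanted:
            return
        self.kept.append(self.get_starttag_text())


parser = InputFilter({"18754", "17138"})
parser.feed(text)
parser.close()
parser_result = "".join(parser.kept)

print(regex_result == text)  # the regex removed nothing
print(parser_result)
```

The parser reads the quoted value as a whole, so the > inside name="a>b" does not confuse it.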

ean5533