0

I have a string like this

<tag1>
    <tag1>
        any text
    </tag1>
    text
</tag1>

and I want to find the <tag1> element that contains the shortest text in this string.

I used the regex <tag1>.*?</tag1>, but instead of <tag1>any text</tag1> I got <tag1> <tag1>any text</tag1>. Here is the example.

Why doesn't it work, and what am I doing wrong?

default locale

3 Answers

1

You can use this simple regex to solve your specific problem:

<tag1>[^<]*</tag1>
Sujith PS
0

I would be able to help you if those tags were not nested inside themselves (the same tag).

It is generally a bad idea to do this type of thing with regex. You should get a proper parser to fit your requirements.
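
As a rough illustration of the parser route, here is a minimal Python sketch. It assumes the input is well-formed XML, which the sample in the question happens to be; real-world HTML would need an HTML parser instead. The helper name full_text is made up for this example, and "shortest text" is interpreted as the shortest whitespace-normalized text of an element including its descendants.

```python
import xml.etree.ElementTree as ET

text = "<tag1>\n    <tag1>\n        any text\n    </tag1>\n    text\n</tag1>"

root = ET.fromstring(text)

def full_text(elem):
    # Hypothetical helper: join all text inside the element (descendants
    # included) and collapse runs of whitespace into single spaces.
    return " ".join("".join(elem.itertext()).split())

# root.iter("tag1") yields the root itself and every nested <tag1>.
shortest = min(root.iter("tag1"), key=lambda e: len(full_text(e)))
print(full_text(shortest))
```

Because the parser tracks nesting itself, this keeps working when the same tag is nested several levels deep.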

Vasili Syrakis
0

It does not work because the regex engine starts matching at the first <tag1> and then matches as little as possible, so it ends at the first </tag1>, resulting in "<tag1> <tag1>any text</tag1>". The lazy quantifier only affects where the match ends, not where it starts.
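
To make this visible, here is a small Python sketch (Python is my choice for the demo; the question does not name a language). It assumes the sample string from the question and uses re.DOTALL so that "." can cross line breaks.

```python
import re

text = "<tag1>\n    <tag1>\n        any text\n    </tag1>\n    text\n</tag1>"

# re.DOTALL lets "." match newlines, so the pattern can span lines.
lazy = re.search(r"<tag1>.*?</tag1>", text, re.DOTALL)

# The match starts at the outer opening tag and stops at the first
# closing tag, so it contains two opening tags and one closing tag.
print(lazy.start())
print(repr(lazy.group()))
```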

You can avoid matching tags by using a negated character class:

<tag1>[^<>]*</tag1>

See it on Regexr.
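
A quick Python sketch of the negated character class, assuming the sample string from the question. I added a capturing group around the character class (not part of the regex above) so the inner text can be compared by length. Note that this only finds elements with no child tags at all.

```python
import re

text = "<tag1>\n    <tag1>\n        any text\n    </tag1>\n    text\n</tag1>"

# [^<>]* cannot cross any tag, so only innermost elements can match.
# A negated class also matches newlines, so no DOTALL flag is needed.
matches = re.findall(r"<tag1>([^<>]*)</tag1>", text)

# Pick the shortest inner text after trimming surrounding whitespace.
shortest = min((m.strip() for m in matches), key=len)
print(shortest)
```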

The other possibility is to use a negative lookahead assertion and match the next character only if it is not the start of the opening tag.

(<tag1>)((?!\1).)*?</tag1>

See it on Regexr
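
The same pattern tried in Python, again assuming the sample string from the question. The re.DOTALL flag is my addition so that "." can match the line breaks in the sample.

```python
import re

text = "<tag1>\n    <tag1>\n        any text\n    </tag1>\n    text\n</tag1>"

# (?!\1) refuses to step onto another "<tag1>", so an attempt that starts
# at the outer tag fails and the engine retries from the inner tag.
pattern = re.compile(r"(<tag1>)((?!\1).)*?</tag1>", re.DOTALL)
m = pattern.search(text)

print(m.start())
print(repr(m.group(0)))
```

Unlike the negated character class, this version still allows other tags between the opening and closing <tag1>; it only forbids a nested opening <tag1>.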

stema