
Given this PCRE pattern:

/(<name>[^<>]*<\/name>[^<>]*<phone>[^<>]*<\/phone>)/

And this subject text:

<name>John Stevens</name>  <phone>888-555-1212</phone>
<name>Peter Wilson</name>  
<phone>888-555-2424</phone>

How can I get the regular expression to match the first name-phone pair but not the second? I don't want to match pairs that are separated by line breaks. I tried including an end-of-line anchor in the negated character class, like so: [^<>$]*, but nothing changed.

You can use the following online tools to test your expressions:
http://rubular.com/
http://www.regextester.com/
Thank you.

Andrew
    Inside a character class, the `$` loses its special meaning and becomes simply a literal dollar sign. What you want is: `[^<>\r\n]` as sawa suggests. – ridgerunner Apr 24 '11 at 04:14

3 Answers


I think this will do it:

/<name>[^<>]*<\/name>[^<>\r\n]*<phone>[^<>]*<\/phone>/

Whatever you put inside a character class [ ] must represent a single character. Within a class, $ is interpreted as a literal dollar sign, probably because $ as an end-of-line anchor is zero-width and so cannot stand for a character. Excluding \r and \n instead keeps the gap between the two tags on a single line. (Edited after comment by ridgerunner)
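To see the difference, here is a minimal sketch in Python rather than PCRE (an assumption on my part; the `re` module treats these character classes the same way, and the slashes need no escaping because there are no pattern delimiters). The variable names are only for illustration.

```python
import re

# The subject text from the question.
subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# `$` inside a class is a literal dollar sign, so newlines are still allowed
# between </name> and <phone>.
dollar_class = re.compile(r"<name>[^<>]*</name>[^<>$]*<phone>[^<>]*</phone>")

# Excluding \r and \n keeps the gap between the two tags on one line.
newline_class = re.compile(r"<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>")

dollar_matches = dollar_class.findall(subject)
newline_matches = newline_class.findall(subject)

print(len(dollar_matches))  # number of pairs matched with [^<>$]*
print(newline_matches)      # pairs matched with [^<>\r\n]*
```

With the subject text above, the first pattern should still pick up both pairs, while the second should only pick up the John Stevens pair.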

By the way, I took off the parentheses that surround your regex, because whatever it matches can already be referred to as the whole match.

sawa

If you don't want to match pairs separated by line breaks, then the following regex will do the job:

/(<name>[^<>]*<\/name>.*?<phone>[^<>]*<\/phone>)/

It matches only the first name-phone pair, because by default the dot . does not match a line break, whereas [^<>] does.
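A quick way to check this claim is the sketch below, written in Python as an assumption (its `re` module follows the same default, where the dot does not match `\n`). The variable names are only for illustration.

```python
import re

# The subject text from the question.
subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# By default the dot does not match a newline, but a negated class does.
dot_matches_newline = re.fullmatch(r".", "\n") is not None
class_matches_newline = re.fullmatch(r"[^<>]", "\n") is not None

# Lazy dot between the two tags: it cannot cross the line break.
lazy_dot = re.compile(r"(<name>[^<>]*</name>.*?<phone>[^<>]*</phone>)")
lazy_matches = lazy_dot.findall(subject)

print(dot_matches_newline, class_matches_newline)
print(lazy_matches)
```

Note that the dot does match `<` and `>`, so this version can run across other tags that sit on the same line.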

Tested it on http://rubular.com/r/amXvq20sl8

anubhava
  • Thank you. But I also needed to exclude `<>` to prevent capturing other tags. – Andrew Apr 24 '11 at 04:56
  • It wouldn't really hurt to make it `[^<>]*` above, however I think once we are already inside `` then to capture everything up to `' we just need `[ – anubhava Apr 24 '11 at 05:06
  • Right, and I like that change. What I omitted from the subject text is that there could be other tags between name and phone that I don't want to capture if they're there. ie `MarkBill888...`. The `.*` would capture both names on that same line. I know I could make it lazy instead of greedy, but that could negatively affect other parts of my pattern. I think the `\r\n` as stated above will work for me. With the addition of your change: `[^ – Andrew Apr 24 '11 at 13:54

Those sites don't seem to support the whole PCRE syntax. I used this site: http://lumadis.be/regex/test_regex.php

And this worked:

/^(<name>[^<>]*<\/name>[^<>$]*<phone>[^<>]*<\/phone>)/

This is probably better, since (?-s) turns off dot-all mode so that the dot cannot match a line break:

/(?-s)(<name>[^<>]*<\/name>.*<phone>[^<>]*<\/phone>)/
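To illustrate what dot-all mode changes, here is a rough sketch in Python (an assumption; it is not PCRE). It toggles the equivalent behaviour through the `re.DOTALL` flag instead of the inline (?-s) syntax, and the variable names are only for illustration.

```python
import re

# The subject text from the question.
subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

pattern = r"(<name>[^<>]*</name>.*<phone>[^<>]*</phone>)"

# Dot-all off (the default): `.*` stops at the end of the line.
default_matches = re.findall(pattern, subject)

# Dot-all on: the greedy `.*` runs across line breaks to the last <phone>.
dotall_matches = re.findall(pattern, subject, flags=re.DOTALL)

print(default_matches)
print(dotall_matches)
```

With dot-all off, only the John Stevens pair should match. With it on, the single match should stretch from the first name to the last phone number, which is exactly what the question wants to avoid.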

Christo