2

I have a text file full of names, and I want to match them all with a regex.

Each name is followed by the text fsa fwb fcc, e.g.:

">Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc

I want to use the following expression to match the names:

""">.+?""fsa fwb fcc"

In other words, match all text from "> up to fsa fwb fcc; I can then strip the excess from the match myself.

However, as "> occurs throughout the file, the match starts much earlier than I want. I have always wondered how to match from the LAST occurrence of something (in this case, ">) up to the specified end.
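The behaviour described above can be reproduced with a small sketch. It uses Python's `re` module rather than .NET, and the sample line (with an extra `">sometext` prefix) is a hypothetical input chosen to trigger the problem. The lazy `.+?` only minimises where the match ends; the engine still starts at the leftmost `">` it can.

```python
import re

# Hypothetical sample line: an earlier "> precedes the one in front of the name.
line = r'">sometext">Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc'

# Lazy quantifier: shortens the END of the match, not the START.
match = re.search(r'">.+?fsa fwb fcc', line)
naive = match.group(0)

# The match begins at the first ">, so it still contains "sometext".
print(naive)
```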

Cœur
John Cliven
  • In your particular case, [`RegexOptions.RightToLeft`](http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx) should do it. – Martin Ender Aug 15 '13 at 21:01
  • Don't parse HTML with regular expressions. – Mulan Aug 15 '13 at 21:01
  • And what naomik said. [This](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags?rq=1) is at the top of the related questions. ;) – Martin Ender Aug 15 '13 at 21:02
  • This isn't parsing, rather this is pattern matching. Given the requirements I doubt this can be accomplished as easily with an HTML parsing engine as it can be via pattern matching. Also I'm not sure \u0012 is a valid html character. – Ro Yo Mi Aug 16 '13 at 02:13
  • Thanks m.buettner, RegexOptions.RightToLeft works perfectly! Exactly what I was looking for. – John Cliven Aug 16 '13 at 11:54
  • @neomik, Denomales is correct, this is not a HTML file and the content is static, predictable, and does not vary, so REGEX seems fine for matching. – John Cliven Aug 16 '13 at 11:55

2 Answers

1

You can try this:

.+((fsa|fwb|fcc).+)$

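.+ greedily matches one or more characters at the start.

((fsa|fwb|fcc) opens the outer capturing group and captures one of the keywords (fsa, fwb, or fcc) in an inner group.

.+) matches one or more further characters and closes the outer group.

$ matches the end of the line.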

EDIT: As suggested by m.buettner, RegexOptions.RightToLeft should work for your case.
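RegexOptions.RightToLeft is specific to .NET. As a rough analogue in Python, and not what this answer proposes, the same "last occurrence before the end marker" idea can be sketched with plain string searches. The helper name `last_start_before_end` and the sample line are hypothetical.

```python
def last_start_before_end(text, start='">', end='fsa fwb fcc'):
    """Return the text between the last `start` that precedes the first `end`."""
    end_at = text.find(end)
    if end_at == -1:
        return None  # end marker not present
    # rfind searches backwards within text[0:end_at].
    start_at = text.rfind(start, 0, end_at)
    if start_at == -1:
        return None  # no start marker before the end marker
    return text[start_at + len(start):end_at]


# Hypothetical sample line with an earlier "> in front of the wanted one.
line = r'">sometext">Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc'
between = last_start_before_end(line)
print(between)
```

This handles only the first end marker in the text; for a file with many names it would need to be applied per line or in a loop.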

Rahul Tripathi
0

Description

It looks like your ending string is literally fsa fwb fcc, and the substring you're interested in starts directly after the last "> before that ending string.

This expression will:

  • find the substring between the last "> and the next fsa fwb fcc

">((?:(?!">).)*)fsa\sfwb\sfcc

Sample Text

">sometext">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
">sometext">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
">sometext">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc

Matches Found:

[0][0] = ">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[0][1] = A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"

[1][0] = ">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[1][1] = B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"

[2][0] = ">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[2][1] = C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"
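For readers who want to try the expression outside a regex tester, here is a minimal sketch using Python's `re` module (an assumption; the question itself targets .NET). The inner `(?:(?!">).)*` consumes one character at a time, but only while the next two characters are not `">`, so the match cannot span an earlier `">`.

```python
import re

# The three sample lines from above, as raw strings so the backslashes stay literal.
sample = "\n".join([
    r'">sometext">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
    r'">sometext">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
    r'">sometext">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
])

# "> followed by any run of characters that never starts another ">,
# then the literal ending text.
pattern = re.compile(r'">((?:(?!">).)*)fsa\sfwb\sfcc')

captures = [m.group(1) for m in pattern.finditer(sample)]
for text in captures:
    print(text)
```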

Or

If you want to go further and capture only the actual name, not the markup text (i.e. from the last "> up to the first \u0012 that follows it), then have a look at this expression:

">((?:(?!">).)*?)\\u0012(?:(?!">).)*fsa\sfwb\sfcc

Sample Text

">sometext">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
">sometext">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
">sometext">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc

Matches Found

[0][0] = ">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[0][1] = A Dave Smith

[1][0] = ">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[1][1] = B Dave Smith

[2][0] = ">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[2][1] = C Dave Smith
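The second expression can be checked the same way. This is again a sketch in Python's `re` module rather than .NET. Here `\\u0012` in the pattern matches a literal backslash followed by `u0012`, which assumes the file contains those six characters as text rather than an actual control character.

```python
import re

# The three sample lines from above, as raw strings so the backslashes stay literal.
sample = "\n".join([
    r'">sometext">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
    r'">sometext">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
    r'">sometext">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
])

# Group 1 is lazy, so it stops at the first literal \u0012 after the last ">.
pattern = re.compile(r'">((?:(?!">).)*?)\\u0012(?:(?!">).)*fsa\sfwb\sfcc')

names = [m.group(1) for m in pattern.finditer(sample)]
print(names)
```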
animuson
Ro Yo Mi
  • This is a really great explanation that is so thorough and works perfectly! I really appreciate that Denomales! – John Cliven Aug 16 '13 at 11:59