I have the following regex rule:

'/((f|ht)tp)(.*?)(.gif|.png|.jpg|.jpeg)/'

It works great, but I don't want it to match anything that is preceded by a newline and four or more spaces, i.e. something like this:

"\n    "

How can I do this?

Alan Moore
Frantisek

2 Answers

I have added a negative lookahead anchored at the beginning of the line. It checks for a newline character followed by 4 or more whitespace characters; if that sequence is present, the match fails.

'/^(?!\n\s{4,}).*((f|ht)tp)(.*?)(.gif|.png|.jpg|.jpeg)/'
Mike Brant
  • Have you tried it? It doesn't seem to work (the match passes), but I might be doing something wrong, although I'd say I'm not. – Frantisek Feb 20 '13 at 00:20
  • @RichardRodriguez Yeah just did a little testing and it seems I am able to get it to work when moving the line start anchor outside of the lookahead (not sure why it didn't work inside). Take a look at the revised answer. – Mike Brant Feb 20 '13 at 00:33
You don't need to include the linefeed itself in the lookahead; just use the start anchor (^) in multiline mode. Also, since \s can match all kinds of whitespace, including linefeeds and tabs, you're better off using a literal space character:

'/^(?! {4}).*(f|ht)tp(.*?)(.gif|.png|.jpg|.jpeg)/m'

Speaking of tabs, they can be used in place of the four spaces to create code blocks here on SO, so you might want to allow for that as well:

'/^(?! {4}|\t).*(f|ht)tp(.*?)(.gif|.png|.jpg|.jpeg)/m'
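As a rough check of the pattern above, here is a minimal Python sketch. Python's `re` module stands in for the regex engine, with `re.MULTILINE` playing the role of the `m` flag. The `find_image_lines` helper and the sample text are made up for illustration, and the dots before the extensions are escaped here (an assumption on my part; the patterns in this thread leave them unescaped).

```python
import re

# At each line start, reject lines that begin with four spaces or a tab,
# then look for an image URL somewhere on the line.
# Assumption: dots before the extensions are escaped, unlike the original.
PATTERN = re.compile(
    r'^(?! {4}|\t).*(f|ht)tp(.*?)(\.gif|\.png|\.jpg|\.jpeg)',
    re.MULTILINE,
)

def find_image_lines(text):
    """Return the matched text for each line that is not indented as code."""
    return [m.group(0) for m in PATTERN.finditer(text)]

sample = (
    "see http://example.com/a.png\n"
    "    http://example.com/b.gif\n"   # four spaces: skipped
    "\thttp://example.com/c.jpg\n"     # tab: skipped
    "  ftp://example.com/d.gif\n"      # only two spaces: still matched
)

print(find_image_lines(sample))
```

Note that the match still includes everything from the start of the line up to the end of the URL, because `.*` consumes the leading text.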

Finally, if you want the regex to match (as in consume) only the URL, you can use the match-start-reset operator, \K. It acts like a positive lookbehind, without the fixed-length limitation:

'/^(?! {4}|\t).*?\K(f|ht)tp(.*?)(.gif|.png|.jpg|.jpeg)/m'
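For anyone trying this outside a PCRE-based engine: Python's built-in `re` module does not support `\K`, so the sketch below gets a similar result by wrapping the URL in a named group instead. The names `URL_ON_LINE`, `extract_image_urls` and the sample text are hypothetical, and the extension dots are escaped as an assumption.

```python
import re

# No \K in Python's re module, so the URL is captured in a named group;
# the line-start check is the same negative lookahead as above.
# Assumption: dots before the extensions are escaped.
URL_ON_LINE = re.compile(
    r'^(?! {4}|\t).*?(?P<url>(?:f|ht)tp.*?(?:\.gif|\.png|\.jpg|\.jpeg))',
    re.MULTILINE,
)

def extract_image_urls(text):
    """Return only the URL part of each match, skipping code-block lines."""
    return [m.group('url') for m in URL_ON_LINE.finditer(text)]

sample = (
    "logo: http://example.com/logo.png here\n"
    "    http://example.com/in-code.gif\n"  # indented: skipped
)

print(extract_image_urls(sample))
```

Because every match has to begin at a line start, this returns at most one URL per line.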
Alan Moore