1

I'm trying to create a regular expression that excludes results if a substring is present.

Data Set:

 http://www.cnn.com/test1
 http://www.cnn.com/test3
 http://www.cnn.com/test5
 http://www.stackflow.com/test4
 http://www.cnn.com/test3
 http://www.cnn.com/test4

Criteria:

  • find all cnn.com URLs
  • exclude those that contain /test3

Results:

 http://www.cnn.com/test1
 http://www.cnn.com/test5
 http://www.cnn.com/test4
Lacer

3 Answers

1

Figured it out: (www\.cnn\.com)(?!/test3)
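For anyone who wants to check this against the data set from the question, here is a minimal Python sketch. The choice of Python and the variable names `urls`, `pattern`, and `matches` are my own assumptions, not part of the original answer; the dots are escaped so they only match literal dots.

```python
import re

# Data set copied from the question.
urls = [
    "http://www.cnn.com/test1",
    "http://www.cnn.com/test3",
    "http://www.cnn.com/test5",
    "http://www.stackflow.com/test4",
    "http://www.cnn.com/test3",
    "http://www.cnn.com/test4",
]

# Negative lookahead: match the host only when it is NOT
# immediately followed by "/test3".
pattern = re.compile(r"www\.cnn\.com(?!/test3)")

# Keep every URL in which the pattern matches somewhere.
matches = [u for u in urls if pattern.search(u)]
print(matches)
```

Note that the lookahead only inspects what comes directly after the host, so a path such as /test/test3 would still be kept.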

kayess
Lacer
0

If you want to avoid matching strings like http://www.cnn.com/test/test3, then you can use a negative lookbehind at the end of the string:

cnn\.com.*(?<!test3)$
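A quick way to see the difference from the lookahead approach is to run both patterns over a nested path. This is only a sketch in Python (my assumption, the question does not name a regex flavor), and the names `lookahead`, `lookbehind`, and `nested` are made up for illustration.

```python
import re

# Lookahead version: only checks the text right after the host.
lookahead = re.compile(r"cnn\.com(?!/test3)")

# Lookbehind version: rejects any string that ENDS in "test3".
lookbehind = re.compile(r"cnn\.com.*(?<!test3)$")

nested = "http://www.cnn.com/test/test3"

# The lookahead version still matches the nested path ...
print(lookahead.search(nested) is not None)
# ... while the lookbehind version rejects it.
print(lookbehind.search(nested) is None)

# Ordinary URLs from the question behave as expected.
print(lookbehind.search("http://www.cnn.com/test1") is not None)
print(lookbehind.search("http://www.cnn.com/test3") is None)
```

The trade-off is that the lookbehind version only looks at the end of the string, so a URL like http://www.cnn.com/test3/more would still match.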
Patrick Haugh
0

I'm guessing this would be fastest:

cnn\.com(?!\/test3)[a-zA-Z0-9-._~:?#@!$&'*+,;=`.\/\(\)\[\]]*

because you restrict the URL to allowed characters only.
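To show what the pattern actually captures, here is a small Python sketch (Python and the names `pattern`, `kept`, and `dropped` are assumptions on my part). It does not measure speed; it only shows that the match covers the rest of the URL and that /test3 is rejected.

```python
import re

# The pattern from this answer, written as a Python raw string.
pattern = re.compile(
    r"cnn\.com(?!\/test3)[a-zA-Z0-9-._~:?#@!$&'*+,;=`.\/\(\)\[\]]*"
)

kept = pattern.search("http://www.cnn.com/test1")
dropped = pattern.search("http://www.cnn.com/test3")

# The match runs from the host through the allowed URL characters.
print(kept.group(0))
# The lookahead fails for /test3, so there is no match at all.
print(dropped)
```

One thing to be aware of: the character class does not include %, so on a percent-encoded URL the match would stop at the first % sign.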

Community
Bram Vanroy