135

I'm setting up some goals in Google Analytics and could use a little regex help.

Let's say I have four URLs:

http://www.anydotcom.com/test/search.cfm?metric=blah&selector=size&value=1
http://www.anydotcom.com/test/search.cfm?metric=blah2&selector=style&value=1
http://www.anydotcom.com/test/search.cfm?metric=blah3&selector=size&value=1
http://www.anydotcom.com/test/details.cfm?metric=blah&selector=size&value=1

I want to create an expression that identifies any URL that contains the string selector=size but does NOT contain details.cfm.

I know that to find a string that does NOT contain another string I can use this expression:

(^((?!details.cfm).)*$)

But I'm not sure how to add in the selector=size portion.

Any help would be greatly appreciated!

Chris Stahl

5 Answers

181

This should do it:

^(?!.*details\.cfm).*selector=size.*$

^.*selector=size.*$ should be clear enough. The first bit, (?!.*details\.cfm), is a negative look-ahead: before matching the string, it checks that the string does not contain "details.cfm" (with any number of characters before it).
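
As a quick sanity check, the pattern can be run against the four URLs from the question. This is only a sketch in Python's `re` module, which is an assumption on my part (Google Analytics uses its own regex engine); the helper name `matching_urls` is made up for illustration.

```python
import re

# Pattern from the answer above: a negative look-ahead anchored at the start,
# followed by the required substring.
PATTERN = re.compile(r"^(?!.*details\.cfm).*selector=size.*$")

# The four URLs from the question.
URLS = [
    "http://www.anydotcom.com/test/search.cfm?metric=blah&selector=size&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah2&selector=style&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah3&selector=size&value=1",
    "http://www.anydotcom.com/test/details.cfm?metric=blah&selector=size&value=1",
]


def matching_urls(urls):
    """Return the URLs that contain selector=size but not details.cfm."""
    return [u for u in urls if PATTERN.match(u)]


if __name__ == "__main__":
    # Prints the first and third URL only.
    for url in matching_urls(URLS):
        print(url)
```

The second URL is rejected because it has selector=style, and the fourth because the look-ahead finds details.cfm before anything is consumed.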

Kobi
  • FYI, check out http://www.regexr.com/ for a nice way to test these expressions out. – Joshua Pinter Apr 08 '14 at 14:23
  • Always forget about negative lookahead and it's so useful – Alexei Blue Feb 20 '18 at 15:35
  • `"http://www.anydotcom.com/test/search.cfm?metric=blah&selector=sized&value=1" =~ /^(?!.*details\.cfm).*selector=size.*$/ #=> 0` is incorrect. (Note the string contains `"...selector=sized..."`.) Also, why `.*$` at the end? – Cary Swoveland Dec 12 '18 at 01:02
4
^(?=.*selector=size)(?:(?!details\.cfm).)+$

If your regex engine supported possessive quantifiers (though I suspect Google Analytics does not), then I guess this would perform better for large input sets:

^[^?]*+(?<!details\.cfm).*?selector=size.*$
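
To illustrate the first expression in this answer (the look-ahead for selector=size plus a per-character check for details.cfm), here is a small sketch in Python's `re` module. The choice of Python and the helper name `is_goal_url` are assumptions for illustration. The possessive-quantifier variant is left out because older Python versions do not support that syntax.

```python
import re

# First pattern from this answer: require selector=size somewhere in the string,
# then consume one character at a time, refusing to step onto "details.cfm".
TEMPERED = re.compile(r"^(?=.*selector=size)(?:(?!details\.cfm).)+$")


def is_goal_url(url):
    """True if url contains selector=size and never contains details.cfm."""
    return TEMPERED.match(url) is not None


if __name__ == "__main__":
    base = "http://www.anydotcom.com/test/"
    for path in (
        "search.cfm?metric=blah&selector=size&value=1",
        "search.cfm?metric=blah2&selector=style&value=1",
        "details.cfm?metric=blah&selector=size&value=1",
    ):
        print(path, is_goal_url(base + path))
```

Only the first path should report True: the second fails the positive look-ahead, and the third stops at details.cfm before reaching the end of the string.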
Tomalak
  • This assumes `selector=size` is always before `details.cfm`, which isn't the case in the last url. – Kobi Jun 01 '10 at 20:34
  • Just to clear this up, it wasn't me. I can't see why someone would down-vote two answers here, they are both correct. – Kobi Jun 01 '10 at 20:47
  • @Kobi: This should have been a look-ahead, corrected. Oh and by the way, I did not suspect it was your down-vote. – Tomalak Jun 01 '10 at 20:48
3

The regex could be (Perl syntax):

`/^[(^(?!.*details\.cfm).*selector=size.*)|(selector=size.*^(?!.*details\.cfm).*)]$/`
djipko
0

There is a problem with the regex in the accepted answer: it also matches abcselector=size, selector=sizeabc, etc.

A correct regex can be ^(?!.*\bdetails\.cfm\b).*\bselector=size\b.*$
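
The difference between the two patterns can be seen with a short comparison. This is a sketch in Python's `re` module (an assumption; the target engine may treat \b slightly differently), and the sample strings are made up for illustration.

```python
import re

# Accepted answer's pattern (no word boundaries).
LOOSE = re.compile(r"^(?!.*details\.cfm).*selector=size.*$")
# Pattern from this answer (with \b word boundaries).
STRICT = re.compile(r"^(?!.*\bdetails\.cfm\b).*\bselector=size\b.*$")

# Hypothetical query strings: exact parameter, longer value, longer key.
SAMPLES = [
    "search.cfm?metric=blah&selector=size&value=1",
    "search.cfm?metric=blah&selector=sized&value=1",
    "search.cfm?metric=blah&abcselector=size&value=1",
]


def matches(pattern, samples):
    """Return the samples that the given compiled pattern matches."""
    return [s for s in samples if pattern.match(s)]


if __name__ == "__main__":
    print("loose: ", matches(LOOSE, SAMPLES))
    print("strict:", matches(STRICT, SAMPLES))
```

The loose pattern accepts all three samples, while the strict one accepts only the first, because \b requires a non-word character (here `&`) on each side of selector=size.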

[Image: explanation of the regex at regex101]

Arvind Kumar Avinash
  • While you are not wrong, the regex as originally accepted met my need as your examples did not exist in the set of possible strings. – Chris Stahl Apr 01 '21 at 01:53
-1

I was looking for a way to avoid --line-buffered on a tail in a situation similar to the OP's, and Kobi's solution works great for me. In my case, I exclude lines containing either "bot" or "spider" while including lines containing ' / ' (requests for my root document).

My original command:

tail -f mylogfile | grep --line-buffered -v 'bot\|spider' | grep ' / '

It now becomes (with the -P Perl switch):

tail -f mylogfile | grep -P '^(?!.*(bot|spider)).*\s\/\s.*$'
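
The same single-pass filter can be tried outside of grep. Below is a sketch in Python's `re` module; the log lines are invented samples rather than real data, and `keep_lines` is a hypothetical helper name.

```python
import re

# Same expression as the grep -P command above: reject lines containing
# "bot" or "spider", and require a slash surrounded by whitespace.
FILTER = re.compile(r"^(?!.*(bot|spider)).*\s\/\s.*$")

# Hypothetical access-log lines, for illustration only.
LOG_LINES = [
    '1.2.3.4 "GET / HTTP/1.1" 200 "Mozilla/5.0"',
    '1.2.3.4 "GET / HTTP/1.1" 200 "Googlebot/2.1"',
    '1.2.3.4 "GET / HTTP/1.1" 200 "Baiduspider/2.0"',
    '1.2.3.4 "GET /about HTTP/1.1" 200 "Mozilla/5.0"',
]


def keep_lines(lines):
    """Return the lines that request ' / ' and mention neither bot nor spider."""
    return [line for line in lines if FILTER.search(line)]


if __name__ == "__main__":
    for line in keep_lines(LOG_LINES):
        print(line)
```

Only the first sample line survives: the second and third are dropped by the look-ahead, and the fourth has no standalone ' / '. Like the grep command, the match is case-sensitive.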
J. Scott Elblein
roon