162

Imagine you are trying to pattern match "stackoverflow".

You want the following:

 this is stackoverflow and it rocks [MATCH]

 stackoverflow is the best [MATCH]

 i love stackoverflow [MATCH]

 typostackoverflow rules [NO MATCH]

 i love stackoverflowtypo [NO MATCH]

I know how to match stackoverflow if it has spaces on both sides using:

/\s(stackoverflow)\s/

The same goes if it's at the start or end of a string:

/^(stackoverflow)\s/

/\s(stackoverflow)$/

But how do you specify "space or end of string" and "space or start of string" using a regular expression?

Patrick McDonald
anonymous-one

4 Answers

219

You can use any of the following:

\b      # A word boundary; it matches next to spaces as well as at the start and end of a line.
(^|\s)  # The | means "or"; () is a capturing group.


/\b(stackoverflow)\b/

Also, if you don't want to include the space in your match, you can use lookbehind/aheads.

(?<=\s|^)         #to look behind the match
(stackoverflow)   #the string you want. () optional
(?=\s|$)          #to look ahead.
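
As a rough illustration, the two variants above can be checked against the five sample strings from the question. This is a minimal sketch in Python, not part of the original answer; the names `SAMPLES`, `WORD_BOUNDARY`, `LOOKAROUND` and `matches` are made up for the example. It also assumes Python's `re` module, which rejects `(?<=\s|^)` because the two alternatives have different widths, so the start-of-string case is moved outside the lookbehind.

```python
import re

# The five sample strings from the question, in order.
SAMPLES = [
    "this is stackoverflow and it rocks",  # expected: match
    "stackoverflow is the best",           # expected: match
    "i love stackoverflow",                # expected: match
    "typostackoverflow rules",             # expected: no match
    "i love stackoverflowtypo",            # expected: no match
]

# Word-boundary variant.
WORD_BOUNDARY = re.compile(r"\b(stackoverflow)\b")

# Lookaround variant, adapted for Python: the lookbehind only checks for
# whitespace, and the start-of-string anchor sits next to it as an alternative.
LOOKAROUND = re.compile(r"(?:(?<=\s)|^)(stackoverflow)(?=\s|$)")


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


boundary_results = [matches(WORD_BOUNDARY, s) for s in SAMPLES]
lookaround_results = [matches(LOOKAROUND, s) for s in SAMPLES]

for sample, hit in zip(SAMPLES, lookaround_results):
    print(f"{sample!r}: {'MATCH' if hit else 'NO MATCH'}")
```

Both variants agree on these inputs. With the lookaround variant the overall match is just the word itself, with no surrounding whitespace included.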
Chuck Le Butt
Jacob Eggers
  • `\b` is a zero-width assertion; it never consumes any characters. There's no need to wrap it in a lookaround. – Alan Moore Jul 15 '11 at 21:41
  • good point. I was thinking about his original `\s`. I will adjust my answer. – Jacob Eggers Jul 15 '11 at 21:46
  • Note that in most regexp implementations, `\b` is **standard ASCII only**, that is to say, no unicode support. If you need to match unicode words you have no choice but to use this instead: http://stackoverflow.com/a/6713327/1329367 – Mahn Jan 27 '15 at 16:55
  • The easier way to exclude the group selection from the match is `(?:^|\s)` – user2426679 Oct 22 '15 at 16:48
  • for python, replace `(?<=\s|^)` with `(?:(?<=\s)|(?<=^))`. Otherwise, you get `error: look-behind requires fixed-width pattern` – user2426679 Aug 31 '16 at 20:06
  • Thanks for the look behind and look ahead solution. This makes results comparable to \b – Brian Risk Jun 15 '17 at 14:20
  • The `\b` would consider other characters -- such as "`.`" as word-breakers, whereas the asker specifically said "space". @gordy's solution seems better. – Mikhail T. Dec 01 '17 at 17:42
  • Beware: lookbehind is not implemented [in most browsers](https://caniuse.com/#feat=js-regexp-lookbehind) as of 2019. – user Apr 27 '19 at 22:43
  • See [this answer](https://stackoverflow.com/a/6713427/11069485) for a Python-friendly regex that's a bit neater than the one suggested by @user2426679 – Chris Wong May 26 '22 at 01:08
90

(^|\s) would match space or start of string and ($|\s) for space or end of string. Together it's:

(^|\s)stackoverflow($|\s)
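
Because this pattern consumes the surrounding whitespace, a replacement has to put it back. The following is a minimal Python sketch of that idea, offered as an illustration only; `PATTERN` and `replace_word` are hypothetical names, and a function is used for the replacement so the new text is inserted literally.

```python
import re

# Group 1 is the leading space (or empty at the start of the string);
# group 2 is the trailing space (or empty at the end of the string).
PATTERN = re.compile(r"(^|\s)stackoverflow($|\s)")


def replace_word(text, replacement):
    """Replace the whole word while keeping the whitespace the groups consumed."""
    return PATTERN.sub(lambda m: m.group(1) + replacement + m.group(2), text)


print(replace_word("i love stackoverflow", "SO"))
print(replace_word("typostackoverflow rules", "SO"))
```

One caveat that follows from consuming the separators: when two occurrences are divided by a single space, the first match takes that space, so the second occurrence is left untouched. A zero-width lookaround pattern avoids this.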
gordy
  • this is the only one that works for me. thank you @gordy – robsonrosa Jun 13 '14 at 18:47
  • If you use this pattern to replace, remember to keep the spaces in the replaced result by replacing with the pattern `$1string$2`. – Mahn Jan 27 '15 at 16:57
  • This is the only one that works for me too. Word boundaries never seem to do what I want. For one, they match some characters besides whitespace (like dashes). This solved it for me because I'd been trying to put `$` and `^` into a character class, but this shows they can just be put into a regular pattern group. – felwithe Jan 02 '19 at 14:20
  • This works quite nicely but if you are not interested in capturing the spaces use this: `(?:^|\s)stackoverflow(?:$|\s)` – Vlax Apr 12 '21 at 21:03
25

Here's what I would use:

 (?<!\S)stackoverflow(?!\S)

In other words, match "stackoverflow" if it's not preceded by a non-whitespace character and not followed by a non-whitespace character.

This is neater (IMO) than the "space-or-anchor" approach, and it doesn't assume the string starts and ends with word characters like the \b approach does.
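
To make the difference from `\b` concrete, here is a small Python sketch (an illustration with made-up names `WHITESPACE_DELIMITED`, `WORD_BOUNDARY` and `found`, not code from the original answer). Both lookarounds are fixed-width, so the pattern compiles unchanged in Python's `re` module.

```python
import re

# Negative lookarounds: no non-whitespace character directly before or after.
WHITESPACE_DELIMITED = re.compile(r"(?<!\S)stackoverflow(?!\S)")

# Word-boundary version, for comparison.
WORD_BOUNDARY = re.compile(r"\bstackoverflow\b")


def found(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


for text in ["i love stackoverflow", "i love stackoverflow.", "stackoverflow-typo rules"]:
    print(f"{text!r}: lookaround={found(WHITESPACE_DELIMITED, text)}, "
          f"word boundary={found(WORD_BOUNDARY, text)}")
```

The lookaround version accepts only whitespace or a string edge as a delimiter, while `\b` also treats punctuation such as `.` or `-` as a boundary. Since the lookarounds consume nothing, two occurrences separated by a single space are both matched.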

Alan Moore
  • good explanation on why to use this. i would have picked this however the string being tested is ALWAYS a single line. – anonymous-one Jul 17 '11 at 18:21
  • @LawrenceDol, did you mean `(?<=\S)...(?=\S)`? Note that the uppercase `\S` matches any character that's NOT whitespace. So the negative lookarounds will match if there IS a whitespace character there, or if there's no character at all. – Alan Moore Dec 20 '20 at 02:38
7

\b matches at word boundaries (without actually matching any characters), so the following should do what you want:

\bstackoverflow\b
Andrew Clark
  • For Python it helps to specify it as a [raw string](https://docs.python.org/3/reference/lexical_analysis.html#index-19), e.g. `mystr = r'\bstack overflow\b'` – Asclepius Mar 26 '19 at 15:33