Regex: Specify "space or start of string" and "space or end of string"

Question

Imagine you are trying to pattern match "stackoverflow".

You want the following:

 this is stackoverflow and it rocks [MATCH]

 stackoverflow is the best [MATCH]

 i love stackoverflow [MATCH]

 typostackoverflow rules [NO MATCH]

 i love stackoverflowtypo [NO MATCH]

I know how to parse out stackoverflow if it has spaces on both sides using:

/\s(stackoverflow)\s/

The same goes if it's at the start or end of a string:

/^(stackoverflow)\s/

/\s(stackoverflow)$/

But how do you specify "space or end of string" and "space or start of string" using a regular expression?

score 219 · Accepted Answer · edited Apr 17 '19 at 23:09

You can use any of the following:

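\b      # A word boundary. Zero-width; matches between a word character and a non-word character, or at either end of the string next to a word character.
(^|\s)  # Start of string or a whitespace character. | means "or"; () is a capturing group.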


/\b(stackoverflow)\b/

Also, if you don't want to include the space in your match, you can use lookbehind and lookahead assertions.

(?<=\s|^)         #to look behind the match
(stackoverflow)   #the string you want. () optional
(?=\s|$)          #to look ahead.
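
A minimal Python sketch of the lookaround version, run against the five strings from the question. As a comment on this answer points out, Python's `re` module rejects `(?<=\s|^)` because the two alternatives have different widths, so the lookbehind is rewritten here as `(?:(?<=\s)|^)`. The names `PATTERN`, `samples`, and `find_word` are made up for illustration.

```python
import re

# Lookbehind rewritten so that Python's `re` accepts it:
# either a whitespace character precedes the word, or we are at the start.
PATTERN = re.compile(r"(?:(?<=\s)|^)stackoverflow(?=\s|$)")

samples = [
    "this is stackoverflow and it rocks",
    "stackoverflow is the best",
    "i love stackoverflow",
    "typostackoverflow rules",
    "i love stackoverflowtypo",
]


def find_word(text):
    """Return the matched text (no surrounding spaces), or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


results = [find_word(s) for s in samples]
for sample, result in zip(samples, results):
    print(f"{sample!r}: {'MATCH' if result else 'NO MATCH'}")
```

Because the lookarounds consume nothing, the match itself is just the word, with no leading or trailing space to strip.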

edited Apr 17 '19 at 23:09

Chuck Le Butt

answered Jul 15 '11 at 21:32

Jacob Eggers

11

`\b` is a zero-width assertion; it never consumes any characters. There's no need to wrap it in a lookaround. – Alan Moore Jul 15 '11 at 21:41
good point. I was thinking about his original `\s`. I will adjust my answer. – Jacob Eggers Jul 15 '11 at 21:46
3

Note that in most regexp implementations, `\b` is **standard ASCII only**, that is to say, no unicode support. If you need to match unicode words you have no choice but to use this instead: http://stackoverflow.com/a/6713327/1329367 – Mahn Jan 27 '15 at 16:55
4

The easier way to exclude the group selection from the match is `(?:^|\s)` – user2426679 Oct 22 '15 at 16:48
8

for python, replace `(?<=\s|^)` with `(?:(?<=\s)|(?<=^))`. Otherwise, you get `error: look-behind requires fixed-width pattern` – user2426679 Aug 31 '16 at 20:06
Thanks for the look behind and look ahead solution. This makes results comparable to \b – Brian Risk Jun 15 '17 at 14:20
7

The `\b` would consider other characters -- such as "`.`" as word-breakers, whereas the asker specifically said "space". @gordy's solution seems better. – Mikhail T. Dec 01 '17 at 17:42
Beware: lookbehind is not implemented [in most browsers](https://caniuse.com/#feat=js-regexp-lookbehind) as of 2019. – user Apr 27 '19 at 22:43
See [this answer](https://stackoverflow.com/a/6713427/11069485) for a Python-friendly regex that's a bit neater than the one suggested by @user2426679 – Chris Wong May 26 '22 at 01:08

score 90 · Answer 2 · answered Jul 15 '11 at 21:28

(^|\s) matches a space or the start of the string, and ($|\s) matches a space or the end of the string. Together:

(^|\s)stackoverflow($|\s)
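
A short Python sketch of this pattern used for replacement, assuming the goal is to swap the word while keeping whatever whitespace surrounded it. The helper name `replace_word` is hypothetical. Both groups always take part in a match (each captures either one whitespace character or an empty string at an anchor), so they can be put back around the replacement.

```python
import re

# Group 1: start of string or one whitespace character.
# Group 2: end of string or one whitespace character.
PATTERN = re.compile(r"(^|\s)stackoverflow($|\s)")


def replace_word(text, replacement):
    """Replace the word and restore the delimiters the groups captured."""
    return PATTERN.sub(lambda m: m.group(1) + replacement + m.group(2), text)


print(replace_word("this is stackoverflow and it rocks", "SO"))
print(replace_word("i love stackoverflow", "SO"))
print(replace_word("typostackoverflow rules", "SO"))
```

Note that this pattern consumes the surrounding whitespace, so the matched text includes it unless you read the word from its own group.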

answered Jul 15 '11 at 21:28

gordy

6

this is the only one that works for me. thank you @gordy – robsonrosa Jun 13 '14 at 18:47
3

If you use this pattern to replace, remember to keep the spaces in the replaced result by replacing with the pattern `$1string$2`. – Mahn Jan 27 '15 at 16:57
1

This is the only one that works for me too. Word boundaries never seem to do what I want. For one, they match some characters besides whitespace (like dashes). This solved it for me because I'd been trying to put `$` and `^` into a character class, but this shows they can just be put into a regular pattern group. – felwithe Jan 02 '19 at 14:20
1

This works quite nicely but if you are not interested in capturing the spaces use this: `(?:^|\s)stackoverflow(?:$|\s)` – Vlax Apr 12 '21 at 21:03

score 25 · Answer 3 · edited Jul 15 '11 at 21:44

Here's what I would use:

 (?<!\S)stackoverflow(?!\S)

In other words, match "stackoverflow" if it's not preceded by a non-whitespace character and not followed by a non-whitespace character.

This is neater (IMO) than the "space-or-anchor" approach, and it doesn't assume the string starts and ends with word characters like the \b approach does.
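
One practical difference can be sketched in Python (the variable names and the sample text are illustrative, not from the answer): because the negative lookarounds consume no characters, two occurrences separated by a single space are both found, whereas a pattern that consumes the space finds only the first.

```python
import re

LOOKAROUND = re.compile(r"(?<!\S)stackoverflow(?!\S)")
CONSUMING = re.compile(r"(?:^|\s)stackoverflow(?:$|\s)")

text = "stackoverflow stackoverflow"

# The lookarounds only inspect neighbouring characters, so the shared
# space is still available when the second occurrence is tested.
lookaround_hits = LOOKAROUND.findall(text)

# Here the first match swallows the space, leaving nothing for the
# second occurrence to use as its leading delimiter.
consuming_hits = CONSUMING.findall(text)

print(lookaround_hits)
print(consuming_hits)

# Unlike \b, adjacent punctuation counts as "non-whitespace", so this fails.
dotted = LOOKAROUND.search("visit stackoverflow.com")
print(dotted)
```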

edited Jul 15 '11 at 21:44

answered Jul 15 '11 at 21:38

Alan Moore

1

good explanation on why to use this. i would have picked this however the string being tested is ALWAYS a single line. – anonymous-one Jul 17 '11 at 18:21
1

@LawrenceDol, did you mean `(?<=\S)...(?=\S)`? Note that the uppercase `\S` matches any character that's NOT whitespace. So the negative lookarounds will match if there IS a whitespace character there, or if there's no character at all. – Alan Moore Dec 20 '20 at 02:38

score 7 · Answer 4 · answered Jul 15 '11 at 21:32

\b matches at word boundaries (without actually matching any characters), so the following should do what you want:

\bstackoverflow\b
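
A small Python sketch of the word-boundary pattern, with sample strings chosen for illustration. It shows two things worth knowing: `\b` also matches next to punctuation such as `.`, not only next to whitespace, and in Python the pattern needs a raw string, because in an ordinary string literal `\b` is a backspace character.

```python
import re

# The raw string keeps \b as a regex word boundary.
WORD = re.compile(r"\bstackoverflow\b")

samples = [
    "this is stackoverflow and it rocks",
    "typostackoverflow rules",
    "i love stackoverflowtypo",
    "see stackoverflow.com",
]
matches = {s: WORD.search(s) is not None for s in samples}
for sample, matched in matches.items():
    print(f"{sample!r}: {matched}")

# Without the r prefix, "\b" is a backspace character, so this pattern
# looks for literal backspaces around the word and finds nothing.
not_raw_hit = re.search("\bstackoverflow\b", "i love stackoverflow")
print(not_raw_hit)
```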

answered Jul 15 '11 at 21:32

Andrew Clark

1

For Python it helps to specify it as a [raw string](https://docs.python.org/3/reference/lexical_analysis.html#index-19), e.g. `mystr = r'\bstack overflow\b'` – Asclepius Mar 26 '19 at 15:33
