32

Say for example I have the following string "one two(three) (three) four five" and I want to replace "(three)" with "(four)" but not within words. How would I do it?

Basically I want to do a regex replace and end up with the following string:

"one two(three) (four) four five"

I have tried the following regex but it doesn't work:

@"\b\(three\)\b"

Basically I am writing some search and replace code and am giving the user the usual options to match case, match whole word etc. In this instance the user has chosen to match whole words but I don't know what the text being searched for will be.

CroweMan
  • Anything either side of a ( or ) will automatically be a word boundary, because it's not in between two word characters – Gareth Aug 12 '10 at 13:48

4 Answers

65

Your problem stems from a misunderstanding of what \b actually means. Admittedly, it is not obvious.

The reason \b\(three\)\b doesn’t match the threes in your input string is the following:

  • \b means: the boundary between a word character and a non-word character (the start and end of the string count as non-word for this purpose).
  • Letters (e.g. a-z), digits and the underscore are considered word characters.
  • Punctuation marks such as ( are considered non-word characters.

Here is your input string again, stretched out a bit, and I’ve marked the places where \b matches:

 o n e   t w o ( t h r e e )   ( t h r e e )   f o u r   f i v e
↑     ↑ ↑     ↑ ↑         ↑     ↑         ↑   ↑       ↑ ↑       ↑

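As you can see here, there is a \b between “two” and “(three)”, but not before the second “(three)”. There is also no \b after either closing parenthesis, because “)” and the space that follows it are both non-word characters. So the trailing \b in your pattern fails for both occurrences, and the leading \b fails for the second one as well.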
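The marked positions can be checked directly. The following is a minimal sketch, assuming .NET's System.Text.RegularExpressions and C# top-level statements (C# 9 or later); the variable names are only illustrative. It lists every index at which \b matches in the input string and confirms that the original pattern finds nothing.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "one two(three) (three) four five";

// Each zero-width match of \b is one boundary position (0-based index).
int[] boundaries = Regex.Matches(input, @"\b")
    .Cast<Match>()
    .Select(m => m.Index)
    .ToArray();

Console.WriteLine(string.Join(", ", boundaries));
// 0, 3, 4, 7, 8, 13, 16, 21, 23, 27, 28, 32

// There is no boundary at index 14, 15 or 22, so the pattern from the
// question cannot match either "(three)".
Console.WriteLine(Regex.IsMatch(input, @"\b\(three\)\b")); // False
```

Index 7 is the boundary between “two” and “(”. Indexes 14 and 15 (either side of the space before the second “(three)”) are missing from the list.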

The moral of the story? “Whole-word search” doesn’t really make much sense if what you’re searching for is not just a word (a string of letters). Since you have punctuation characters (parentheses) in your search string, it is not as such a “word”. If you searched for a word consisting only of word characters, then \b would do what you expect.

You can, of course, use a different regex that matches the string only if it is surrounded by whitespace or occurs at the beginning or end of the string:

(^|\s)\(three\)(\s|$)
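One thing to watch when using this pattern for a replace: the two groups consume the surrounding whitespace, so the replacement has to put it back. A minimal sketch follows, assuming .NET's Regex and C# top-level statements. The lookaround variant is my own substitution, not part of the answer; it asserts the same condition without consuming anything.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "one two(three) (three) four five";

// Pattern from the answer: groups 1 and 2 capture the surrounding
// whitespace (or nothing at the string start/end), so the replacement
// restores them with $1 and $2.
string viaGroups = Regex.Replace(input, @"(^|\s)\(three\)(\s|$)", "$1(four)$2");

// Assumed alternative: lookarounds that only assert "no non-whitespace
// character on this side" and consume nothing.
string viaLookarounds = Regex.Replace(input, @"(?<!\S)\(three\)(?!\S)", "(four)");

Console.WriteLine(viaGroups);      // one two(three) (four) four five
Console.WriteLine(viaLookarounds); // one two(three) (four) four five
```

The two differ when matches are separated by a single space, as in "(three) (three)": the consuming version uses up the space with the first match and so skips the second, while the lookaround version replaces both.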

However, the problem with this is, of course, that if you search for “three” (without the parentheses), it won’t find the one in “(three)” because it doesn’t have spaces around it, even though it is actually a whole word.

I think most text editors (including Visual Studio) will use \b only if your search string actually starts and/or ends with a word character:

var pattern = Regex.Escape(searchString);
if (Regex.IsMatch(searchString, @"^\w"))
    pattern = @"\b" + pattern;
if (Regex.IsMatch(searchString, @"\w$"))
    pattern = pattern + @"\b";

That way they will find “(three)” even if you select “whole words only”.
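To see what this rule does with the string from the question, here is a small sketch that wraps the logic above in a helper. The name BuildWholeWordPattern is hypothetical, and the code assumes C# top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the body is the same logic as the snippet above.
static string BuildWholeWordPattern(string searchString)
{
    string pattern = Regex.Escape(searchString);
    if (Regex.IsMatch(searchString, @"^\w"))
        pattern = @"\b" + pattern;   // starts with a word character
    if (Regex.IsMatch(searchString, @"\w$"))
        pattern = pattern + @"\b";   // ends with a word character
    return pattern;
}

string input = "one two(three) (three) four five";

Console.WriteLine(BuildWholeWordPattern("(three)")); // \(three\)
Console.WriteLine(BuildWholeWordPattern("three"));   // \bthree\b

Console.WriteLine(Regex.Replace(input, BuildWholeWordPattern("(three)"), "(four)"));
// one two(four) (four) four five

Console.WriteLine(Regex.Replace(input, BuildWholeWordPattern("four"), "4"));
// one two(three) (three) 4 five
```

Note that under this rule a search for “(three)” gets no \b at all, so the occurrence in “two(three)” is replaced too.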

Timwi
  • It possibly doesn't make sense but that is how I would like it to work. Have you got any ideas how I could do this? Basically I would like to mimic the find and replace functionality within Visual Studio. – CroweMan Aug 12 '10 at 13:46
  • @CroweMan: You are contradicting yourself. You said, “I don't want "two(three)" to be replaced”, but Visual Studio does. – Timwi Aug 12 '10 at 13:52
  • Thank you very much. You are a star! – CroweMan Aug 12 '10 at 13:55
  • 1
    Please [be careful](http://stackoverflow.com/questions/4213800/is-there-something-like-a-counter-variable-in-regular-expression-replace/4214173#4214173) of `\b` style boundaries. – tchrist Nov 18 '10 at 16:18
7

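Here is a simple snippet you may be interested in. Note that it only behaves as a whole-word match when the search text starts and ends with a word character:

    // Escape the search text so that characters such as ( and ) are treated literally.
    string pattern = @"\b" + Regex.Escape(find) + @"\b";
    // Regex.Replace returns the new string; it does not modify stringToSearch.
    string result = Regex.Replace(stringToSearch, pattern, replace, RegexOptions.IgnoreCase);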

Source code: snip2code - C#: Replace an exact word in a sentence

Dominique Terrs
0

I recently came across a similar issue in javascript trying to match terms with a leading '$' character only as separate words, e.g. if $hot = 'FUZZ', then:

"some $hot $hotel bird$hot pellets" ---> "some FUZZ $hotel bird$hot pellets"

The regex /\b\$hot\b/g (my first guess) did not work, for the same reason the parens did not match in the original question: '$' is a non-word character, so when it is preceded by whitespace or the start of the string there is no word/non-word boundary in front of it.

However, the regex /\B\$hot\b/g does match, which shows that the positions not marked in @timwi's excellent example match \B. This was not intuitive to me because ") (" contains no regex word characters. But \B is simply the negation of the \b assertion: it matches at every position that is not a word boundary, so it does not need word characters on either side. It also matches between two non-word characters :)
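The same pattern carries over to .NET unchanged. A minimal sketch, assuming C# top-level statements and the example string from above:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "some $hot $hotel bird$hot pellets";

// \B requires that the position before '$' is NOT a word boundary, i.e. the
// preceding character is a non-word character or '$' is at the string start.
// The trailing \b rejects "$hotel"; the leading \B rejects "bird$hot".
string output = Regex.Replace(input, @"\B\$hot\b", "FUZZ");

Console.WriteLine(output); // some FUZZ $hotel bird$hot pellets
```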

jongala
-1

As Gopi said, but (theoretically) catching only (three) not two(three):

string input = "one two(three) (three) four five";

string output = input.Replace(" (three) ", " (four) ");

When I test that, I get: "one two(three) (four) four five". Just remember that whitespace is a character in the string too, so it can be matched and replaced like any other. If I did this:

//use same input
string output = input.Replace(" ", ";");

I'd get "one;two(three);(three);four;five"

AllenG
  • The problem is that the user is entering the text in a find and replace box and they have selected 'match whole words'. So I need to use something intelligent like regular expressions, and I can't just add a " " before or after the expression, as the preceding character could be a ',' or something else – CroweMan Aug 12 '10 at 13:45