Match pattern not preceded by character

Question

I want to make my regex match a pattern only if it is not preceded by a character, the ^ (circumflex) in my case.

My regex: /[^\^]\w+/g

Text to test it on: Test: ^Anotherword

Matches: "Test" and "Anotherword", even though the latter is preceded by a circumflex, which I was trying to prevent by inserting [^\^] at the start. I'm not only trying to avoid matching the circumflex, but also the word that comes after it: "Anotherword" should not be matched.

[^\^] - This is what should stop the regex from matching if a circumflex is in front of it.

\w+ - Match any word that is not preceded by a circumflex.

I cannot use lookbehind because of JavaScript limitations.
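To see why this pattern does not skip the caret, the following sketch (assuming Node.js or any modern JavaScript engine) runs it on the sample text and then wraps [^\^] in a capture group to show which character it consumes in each match:

```javascript
// Sample text from the question.
const text = "Test: ^Anotherword";

// The asker's pattern: one non-caret character, then one or more word chars.
const naive = text.match(/[^\^]\w+/g);
console.log(naive); // [ 'Test', 'Anotherword' ]

// Same pattern with both parts captured, to show what [^\^] consumed.
const re = /([^\^])(\w+)/g;
const parts = [];
let m;
while ((m = re.exec(text)) !== null) {
  parts.push([m[1], m[2]]);
}
// [^\^] eats the first letter of each word ("T", then "A"), so the caret
// in front of "Anotherword" is never tested against the class.
console.log(parts); // [ [ 'T', 'est' ], [ 'A', 'notherword' ] ]
```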

`([^^]|^)\w+` then if need be, write back `$1` on a replace. No, it works, but it's matching `T`, then `est` — , Sep 16 '16 at 00:09
That doesn't seem to work, am I doing something wrong? http://regexr.com/3e855 — The Coding Wombat, Sep 16 '16 at 00:12
Use `([^^\w]|^)\w+`, see http://regexr.com/3e85b. It basically injects a word boundary. — , Sep 16 '16 at 00:13
Yeah, so it injects a word boundary while excluding the `^` as well: `[^\w]\w` is equivalent to `\W\b\w`. Otherwise `[^^]` will match the `T` and `\w+` will match `est`. You can see it if you put capture groups around it. — , Sep 16 '16 at 00:20
That worked, thanks, you should post it as the official answer so I can accept it — The Coding Wombat, Sep 16 '16 at 00:51

score 1 · Accepted Answer · answered Sep 16 '16 at 01:08

Use ([^^\w]|^)\w+
(see http://regexr.com/3e85b)

It basically injects a word boundary while excluding the ^ as well:
[^\w]\w is equivalent to \W\b\w.
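The equivalence can be checked with a small sketch. The sample strings are arbitrary, chosen only for this comparison, and the code assumes a standard JavaScript engine:

```javascript
// Arbitrary sample strings, chosen only for this comparison.
const samples = ["Test: ^Anotherword", "a b", "foo_bar-baz", "x1 y2", "^^ab"];

// A non-word char directly followed by a word char always has a word boundary
// between them, so adding \b cannot change the result.
const withClass = samples.map((s) => s.match(/[^\w]\w/g));
const withBoundary = samples.map((s) => s.match(/\W\b\w/g));
const same = JSON.stringify(withClass) === JSON.stringify(withBoundary);
console.log(same); // true

// Adding ^ to the negated class drops the boundary that follows a caret.
// (The leading "Test" is not matched either: nothing precedes it, which is
// what the |^ alternative is for.)
const caretExcluded = "Test: ^Anotherword".match(/[^^\w]\w/g);
console.log(caretExcluded); // null
```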

Otherwise [^^] will match the T and \w+ will match est.

You can see it if you put capture groups around it.
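Following that suggestion, this sketch puts capture groups around both halves of the earlier attempt ([^^]|^)\w+ from the comments and of the accepted pattern. The helper name captures is hypothetical, introduced only for this illustration:

```javascript
const text = "Test: ^Anotherword";

// Hypothetical helper: collects [group 1, group 2] for every match of a /g regex.
function captures(re, s) {
  const out = [];
  let m;
  while ((m = re.exec(s)) !== null) {
    out.push([m[1], m[2]]);
  }
  return out;
}

// Earlier attempt from the comments, with \w+ captured as group 2:
// [^^] accepts letters, so it takes the "T" and the "A".
const loose = captures(/([^^]|^)(\w+)/g, text);
console.log(loose); // [ [ 'T', 'est' ], [ 'A', 'notherword' ] ]

// Accepted pattern: [^^\w] rejects letters and the caret, so only the
// start-of-string branch (which captures "") can precede "Test".
const strict = captures(/([^^\w]|^)(\w+)/g, text);
console.log(strict); // [ [ '', 'Test' ] ]
```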

Used it for this `/([^^\-\.\d])(\-?(\d+\.)?\d+\*\-?(\d+\.)?\d+)/g`, the thing I actually wanted to work, and it works using your solution. It worked without adding the `|^`, though. Could you explain what that does? – The Coding Wombat Sep 16 '16 at 01:17

Sure. In JS, when you use `(something|^)` in place of a lookbehind, it reads as something _OR_ beginning of string/line. That lets it match at the beginning, where no character is behind it. This would simulate, for example with a negative lookbehind, `(? – Sep 16 '16 at 02:20
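A sketch of what the |^ alternative changes, using a hypothetical input with one word at the very start of the string, one after a caret and one after a space:

```javascript
// Hypothetical input: one word at the very start, one after a caret,
// one after a space.
const text = "Test: ^Anotherword Third";

// Without |^ there is no character in front of "Test" for [^^\w] to consume.
const withoutStart = text.match(/[^^\w]\w+/g);
console.log(withoutStart); // [ ' Third' ]

// With |^ the empty start-of-input branch lets "Test" match as well.
// The consumed separator stays in the match (" Third"); that is why
// the comments suggest writing $1 back when replacing.
const withStart = text.match(/(?:[^^\w]|^)\w+/g);
console.log(withStart); // [ 'Test', ' Third' ]
```

This also explains why the pattern in the previous comment worked without |^: it only matters when the wanted text can start at position 0.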

score 0 · Answer 2 · answered Sep 16 '16 at 01:02

If matching the caret-prefixed word is not strictly forbidden, you can match both kinds of words and only capture the ones you want:

(?:\^\w+)|(\w+): matches both kinds of words, but no group is generated for ^Anotherword.

(?:\^\w+): matches a caret-prefixed word such as ^Kawabanga, but no group is generated.
(\w+): captures every other word in group 1.

In case you want ^Anotherword to have a group as well, simply remove the ?:.
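In JavaScript, this approach means keeping only the matches where group 1 is defined. A minimal sketch, assuming the same sample text as the question:

```javascript
// Sample text from the question.
const text = "Test: ^Anotherword";

// Caret-prefixed words are consumed by the non-capturing branch, so group 1
// is undefined for them; other words are captured in group 1.
const re = /(?:\^\w+)|(\w+)/g;
const words = [];
let m;
while ((m = re.exec(text)) !== null) {
  if (m[1] !== undefined) {
    words.push(m[1]);
  }
}
console.log(words); // [ 'Test' ]

// The full matches still include the caret-prefixed word, which is what
// the asker wanted to avoid.
const all = text.match(/(?:\^\w+)|(\w+)/g);
console.log(all); // [ 'Test', '^Anotherword' ]
```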

Marcs

That would work, but I was really looking for something that wouldn't match the second one (`^Anotherword`) – The Coding Wombat Sep 16 '16 at 01:14

score 0 · Answer 3 · answered Nov 21 '21 at 16:00

With the growing adoption of the ECMAScript 2018 standard, it makes sense to also consider the lookbehind approach:

const text = "One Test: ^Anotherword";

// Extracting words not preceded by ^:
console.log(text.match(/\b(?<!\^)\w+/g)); // => [ "One", "Test" ]

// Replacing words not preceded by ^ with some other text:
console.log(text.replace(/\b(?<!\^)\w+/g, '<SPAN>$&</SPAN>'));
// => <SPAN>One</SPAN> <SPAN>Test</SPAN>: ^Anotherword

The \b(?<!\^)\w+ regex matches one or more word chars (\w+) that have no word char (letter, digit or _) immediately on the left (achieved with a word boundary, \b) and no ^ char immediately on the left (achieved with the negative lookbehind (?<!\^)). Note that ^ is a special regex metacharacter that needs to be escaped if you want to match it as a literal caret.
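To see why the \b is needed in addition to the lookbehind, this sketch (assuming an ES2018-capable engine such as a current Node.js) compares the pattern with and without it:

```javascript
const text = "One Test: ^Anotherword";

// Lookbehind only: the position right after ^ is rejected, but the engine
// simply retries one character later, where the previous char is "A".
const noBoundary = text.match(/(?<!\^)\w+/g);
console.log(noBoundary); // [ 'One', 'Test', 'notherword' ]

// With \b, a match can only start where the previous char is not a word
// char, so the middle of "Anotherword" is no longer a candidate.
const withBoundary = text.match(/\b(?<!\^)\w+/g);
console.log(withBoundary); // [ 'One', 'Test' ]
```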

For older JavaScript environments, it is still necessary to use a workaround:

var text = "One Test: ^Anotherword";

// Extracting words not preceded by ^:
var regex = /(?:[^\w^]|^)(\w+)/g, result = [], m;
while (m = regex.exec(text)) {
    result.push(m[1]);
}
console.log(result); // => [ "One", "Test" ]

// Replacing words not preceded by ^ with some other text:
var regex = /([^\w^]|^)(\w+)/g;
console.log(text.replace(regex, '$1<SPAN>$2</SPAN>'));
// => <SPAN>One</SPAN> <SPAN>Test</SPAN>: ^Anotherword

The extraction and replacement regexes differ in the number of capturing groups: when extracting, we only need one group, and when replacing, we need both. If you decide to use a regex with two capturing groups for extraction, you need to collect the m[2] values instead.
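For instance, extraction with the two-group replacement regex could look like this sketch. It keeps the older-environment var style and also records what group 1 consumed; the separators array is an illustrative addition, not part of the original snippet:

```javascript
var text = "One Test: ^Anotherword";

// Two-group regex from the replacement example, reused for extraction:
// group 1 is the consumed separator ("" at the start), group 2 the word.
var regex = /([^\w^]|^)(\w+)/g, result = [], separators = [], m;
while ((m = regex.exec(text))) {
  separators.push(m[1]);
  result.push(m[2]); // the word is in m[2] here, not m[1]
}
console.log(result);     // [ 'One', 'Test' ]
console.log(separators); // [ '', ' ' ]
```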

The extraction pattern means:

(?:[^\w^]|^) - a non-capturing group matching
- [^\w^] - any char that is neither a word char nor ^
- | - or
- ^ - start of string
(\w+) - Group 1: one or more word chars.
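Since only a leading ^ negates a character class, [^\w^] here and the accepted answer's [^^\w] describe the same set of characters. This sketch checks that on a hypothetical handful of characters:

```javascript
// Hypothetical sample of characters to classify.
const chars = ["a", "Z", "5", "_", " ", ":", "^", "-", "\n"];

// Only a ^ right after [ negates the class; any later ^ is a literal caret.
const extractionClass = chars.filter((c) => /[^\w^]/.test(c));
const acceptedClass = chars.filter((c) => /[^^\w]/.test(c));

console.log(extractionClass); // [ ' ', ':', '-', '\n' ]
console.log(JSON.stringify(extractionClass) === JSON.stringify(acceptedClass)); // true
```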

Wonderful! I will keep the other answer as the accepted answer, though, since I asked for a non-lookbehind solution, and some people may be forced to use older JavaScript versions like I (thought I) was. — The Coding Wombat, Nov 22 '21 at 21:40
@TheCodingWombat The second snippet shows how to achieve what you needed without lookbehinds. — Wiktor Stribiżew, Nov 22 '21 at 21:57
