27

How would I write a regular expression that matches the character < only when it is not followed by a, em, or strong?

So <hello and <string would match, but <strong wouldn't.

zb226
Kyle
  • **See Also**: [A regex to match a substring that isn't followed by a certain other substring](https://stackoverflow.com/q/2631010/1366033) – KyleMit Dec 21 '21 at 13:29

5 Answers

52

Try this:

<(?!a|em|strong)
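
A quick way to check the pattern against the examples from the question is sketched below in JavaScript. The `samples` array is made up for illustration and only restates the question's inputs; other regex flavors with lookahead support should behave the same way, but that is an assumption rather than something tested here.

```javascript
// Negative lookahead: match '<' only when it is NOT followed by a, em, or strong.
const pattern = /<(?!a|em|strong)/;

// Hypothetical sample inputs, taken from the question.
const samples = ['<hello', '<string', '<strong'];

// No 'g' flag, so test() carries no state between calls.
const results = samples.map((s) => pattern.test(s));
console.log(results); // [ true, true, false ]
```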
Andrew Hare
  • +1 I think that does it for Perl-compatible regexp syntax. (For other syntaxes, it might be different) – David Z Apr 25 '10 at 01:00
  • Just in case someone is interested, `?!` initiates a negative lookahead. I found a good overview of lookarounds here: http://www.rexegg.com/regex-lookarounds.html – schnatterer Aug 18 '14 at 20:11
  • For a full function: `myString.replace(//g, '');` I also added in `\/?` to check for closing tags – SwiftNinjaPro Jan 09 '20 at 18:21
11

You use a negative lookahead, the simplest form of which (for this problem) is:

<(?!a|em|strong)

The one issue with that is that it also rejects any tag that merely begins with a, em, or strong, such as <applet>. A way to deal with that is to use \b, a zero-width assertion (meaning it consumes none of the input) that matches at a transition from a word character to a non-word character or vice versa. Word characters are [0-9a-zA-Z_]. So:

<(?!(a|em|strong)\b)
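
The difference between the two patterns can be seen with a small JavaScript sketch. The tag strings in `tags` are hypothetical inputs chosen to include names that only share a prefix with a, em, or strong.

```javascript
const loose = /<(?!a|em|strong)/;       // rejects anything that starts with a, em, or strong
const strict = /<(?!(a|em|strong)\b)/;  // rejects only the exact names a, em, and strong

// Hypothetical inputs: two tags that merely share a prefix, and two that should be rejected.
const tags = ['<applet>', '<embed>', '<a href="#">', '<strong>'];

const looseHits = tags.filter((t) => loose.test(t));
const strictHits = tags.filter((t) => strict.test(t));

console.log(looseHits);  // []
console.log(strictHits); // [ '<applet>', '<embed>' ]
```

With \b in place, <applet> and <embed> match again, while <a href="#"> and <strong> are still rejected because a word boundary follows the tag name.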
700 Software
cletus
3

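Although Andrew's answer is clearly superior, I had previously also gotten it to work with [^(?:a|em|strong)]. Be aware, though, that the square brackets make this a negated character class rather than a group: it matches any single character other than (, ?, :, |, ), a, e, m, s, t, r, o, n, or g, so <[^(?:a|em|strong)] rejects <string as well as <strong.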
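
To see how the bracket expression behaves next to the lookahead, here is a small JavaScript comparison. It assumes the bracket expression is used directly after `<`, and the `inputs` array is hypothetical, reusing the question's examples.

```javascript
// '<' followed by ONE character that is not any of: ( ? : a | e m s t r o n g )
const charClass = /<[^(?:a|em|strong)]/;
// '<' not followed by the strings a, em, or strong
const lookahead = /<(?!a|em|strong)/;

const inputs = ['<hello', '<string', '<strong'];

const classResults = inputs.map((s) => charClass.test(s));
const lookaheadResults = inputs.map((s) => lookahead.test(s));

console.log(classResults);     // [ true, false, false ]
console.log(lookaheadResults); // [ true, true, false ]
```

The character class rejects <string because s is one of the excluded characters, which is not what the question asks for.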

WoodrowShigeru
2

If your regex engine supports it, use a negative lookahead assertion: it looks ahead in the string and succeeds only if its pattern does not match there, without consuming any input. Thus, you want /<(?!(?:a|em|strong)\b)/: match a <, then succeed only if what follows is not a, em, or strong followed by a word boundary, \b.
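
The "doesn't consume any input" part can be illustrated with a short JavaScript sketch. The `input` string is a made-up example, and replacing with `&lt;` is just one possible use, not something the question asked for.

```javascript
const re = /<(?!(?:a|em|strong)\b)/g;
const input = '<hello> and <strong>';

// Each match is just the single '<' character; the looked-ahead text is not part of it.
const matches = [...input.matchAll(re)].map((m) => ({ text: m[0], index: m.index }));
console.log(matches); // [ { text: '<', index: 0 } ]

// Because only '<' is consumed, a replacement leaves the tag name untouched.
const escaped = input.replace(re, '&lt;');
console.log(escaped); // &lt;hello> and <strong>
```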

Antal Spector-Zabusky
0
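function strip_tags(str, keep){
    // Normalize `keep` (a tag name or an array of tag names) into a list of names.
    const names = keep ? [].concat(keep) : [];
    // Extra lookahead alternative; \b stops 'em' from also protecting <embed>.
    const alt = names.length ? '|(?:' + names.join('|') + ')\\b' : '';
    // The [^A-Za-z0-9_-] alternative keeps '<' from matching alone in front of '/name'.
    return str.replace(new RegExp('</?(?![^A-Za-z0-9_-]' + alt + ').*?>', 'g'), '');
}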

usage:

strip_tags('<html><a href="a">a</a> <strong>strong text</strong> and <em>italic text</em></html>', ['strong', 'em']);
//output: a <strong>strong text</strong> and <em>italic text</em>

I would also recommend that you strip the attributes from the tags you keep:

function strip_params(str){
    // Capture the whole tag name (note the +), then drop everything up to the closing '>'.
    return str.replace(/<([A-Za-z0-9_-]+).*?>/g, '<$1>');
}
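
A small check of the attribute-stripping idea, as a sketch: `stripAttributes` is a hypothetical stand-alone variant that captures one or more name characters, since a group that captures only a single character would shorten `<strong class="x">` to `<s>`. It assumes attribute values never contain `>`.

```javascript
// Hypothetical helper: keep the tag name, drop any attributes.
function stripAttributes(html) {
    // ([A-Za-z0-9_-]+) captures the whole tag name; .*?> lazily skips to the closing '>'.
    return html.replace(/<([A-Za-z0-9_-]+).*?>/g, '<$1>');
}

const cleaned = stripAttributes('<strong class="x">bold</strong> <a href="a">link</a>');
console.log(cleaned); // <strong>bold</strong> <a>link</a>
```

Closing tags are left alone because `/` is not a name character, so the pattern never matches at `</`.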
SwiftNinjaPro