Below is the latest version of the regular expression I am using and it is throwing the error "Invalid Regular Expression."

Any help with the formatting of the regular expression would be much appreciated!

Below is my code:

// This function gets all the text in the browser page
function getText() {
    return document.body.innerText;
}
var allText = getText(); // stores the page text in a variable

//regex set to rid text of all punctuation, symbols, numbers, and excess spaces
var matcher = new RegExp ("/(?<!\w)[a-zA-Z]+(?!\w)/", "g");

//cleanses text in browser of punctuation, symbols, numbers, and excess spaces
var newWords = allText.match(matcher);

//using a single space as the dividing tool, creates a list of all words
var Words=newWords.split(" ");
Lance

1 Answer

Instead of

//regex set to rid text of all punctuation, symbols, numbers, and excess spaces
var matcher = new RegExp ("/(?<!\w)[a-zA-Z]+(?!\w)/", "g");
//cleanses text in browser of punctuation, symbols, numbers, and excess spaces
var newWords = allText.match(matcher);
//using a single space as the dividing tool, creates a list of all words
var Words=newWords.split(" ");

Just use

var Words = allText.match(/\b[a-zA-Z]+\b/g); // OR...
// var Words = allText.match(/\b[A-Z]+\b/ig);

This returns every "word" made up solely of ASCII letters: String#match with a /g-based regex returns all substrings that match the pattern, which here is 1 or more ASCII letters between word boundaries.
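
A minimal sketch of that behavior, using a made-up sample string in place of document.body.innerText (the names sampleText and words are illustrative only). It also guards against the null that match returns when nothing matches:

```javascript
// Hypothetical input standing in for document.body.innerText.
var sampleText = "Hello, world! 42 cats & abc123 dogs.";

// With the g flag, match returns an array of every match,
// or null when there is no match at all, hence the "|| []" fallback.
var words = sampleText.match(/\b[a-zA-Z]+\b/g) || [];

console.log(words); // [ 'Hello', 'world', 'cats', 'dogs' ]
```

Note that "abc123" is skipped entirely: there is no word boundary between "c" and "1", so the letters-only run never ends on a \b.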

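JS engines that predate ES2018 do not support lookbehind (i.e. the (?<!...) or (?<=...) constructs) and reject such a pattern as invalid; a word boundary \b does the job here and works everywhere. Also, when a pattern is passed to new RegExp as a string, omit the surrounding slashes and double each backslash ("\\b"), because "\w" inside a string literal is just "w".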
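
As a hedged illustration, here is the word-boundary pattern in both literal and constructor form. When the pattern is passed to new RegExp as a string, the surrounding slashes are dropped and each backslash is doubled. The variable names literal and constructed are illustrative:

```javascript
// Literal form: the slashes are delimiters and backslashes are written once.
var literal = /\b[a-zA-Z]+\b/g;

// Constructor form: no surrounding slashes, and each backslash is doubled
// because the pattern is first parsed as an ordinary string literal.
var constructed = new RegExp("\\b[a-zA-Z]+\\b", "g");

// Both forms compile to the same pattern source.
console.log(literal.source === constructed.source); // true

console.log("one, two; 3".match(constructed)); // [ 'one', 'two' ]
```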

Note that you'd need something like .replace(/[^a-zA-Z]+/g, ' ') to rid the text of all punctuation, symbols, numbers, and excess spaces (\W alone leaves digits and underscores in place), but it seems you can just rely on .match(/\b[a-zA-Z]+\b/g).
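
For comparison, a rough sketch of the two routes on a made-up sample string. It assumes the negated class [^a-zA-Z] rather than \W for the replace step, since \W does not match digits or underscores; all variable names are illustrative:

```javascript
// Hypothetical input standing in for document.body.innerText.
var sampleText = "  Hello, world! 42 cats & dogs.  ";

// Route 1: collapse every run of non-letters into one space, trim, then split.
var cleaned = sampleText.replace(/[^a-zA-Z]+/g, " ").trim();
var viaSplit = cleaned === "" ? [] : cleaned.split(" ");

// Route 2: collect the letter-only words directly with match.
var viaMatch = sampleText.match(/\b[a-zA-Z]+\b/g) || [];

console.log(viaSplit); // [ 'Hello', 'world', 'cats', 'dogs' ]
console.log(viaMatch); // [ 'Hello', 'world', 'cats', 'dogs' ]
```

The two routes agree on this input, but they can differ on mixed tokens: for "abc123" the replace route keeps "abc", while the \b-based match skips the token altogether.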

Wiktor Stribiżew