
I'm trying to find every string in a program longer than N characters (5 in this test) but only on lines that don't contain the word "printf" (or maybe "printf\s*(")

These all fail

  /^.*(?!printf).+"[^"]{5,}".*$/,
  /^(?!printf)*"[^"]{5,}"/,
  /^((?!printf).)*"[^"]{5,}((?!printf).)*"$/,

I saw this, but it seems irrelevant. I saw this, which seems closer but doesn't work. This one too.

If I split it into two problems (first filter out all lines containing printf, then search the remaining lines for strings of 5 or more characters) it's easy, but I'd like to use this regex in VS Code or another editor that supports regular expressions, so I need to do it in one expression.

const regexs = [
  /^.*(?!printf).+"[^"]{5,}".*$/,
  /^(?!printf)*"[^"]{5,}"/,
  /^((?!printf).)*"[^"]{5,}((?!printf).)*"$/,
];

const lines = [
  'write("blueberry"); // yum',                    // match
  'printf("-%s-", "strawberry"); // whatever',     // do not match
  'x = 12; printf("lime"); write("coconut")',      // do not match
  'x = 12; write("coconut") printf("lime");',      // do not match
  'y = 34; write("banana")',                       // match
  'z = "pineapple";',                              // match
  'p = "seed";'                                    // do not match
];

for (const re of regexs) {
  console.log('--------------: ', re.toString());
  for (const line of lines) {
    console.log(re.test(line).toString().padEnd(7), line);
  }
}

PS: I'm not worried about strings in comments or multiline strings or escaped quotes or single quotes. I just need to be able to easily browse 200k lines of code for all strings longer than a certain size but without certain keywords in the line at a glance.

PPS: I get that the first 2 would not work for 4th line, just trying to get some to work for the other 5 lines first on my way to handling the 4th line as well.

For a more concrete example, replace 'printf' with 'localized': I'm searching for all strings of N or more characters on lines that don't contain the word 'localized', so I can see at a glance which lines in the code still need localization. I don't need to find every string because they generally come in batches, so seeing a few lines in a given file tells me where to look and will help find most cases. Lines that have already been localized contain the word 'localized'.

samanthaj

2 Answers


Try this: (?<!printf[^\n]*)"(?![^"\n]*printf)[^"\n]{5,}"(?![^\n]*printf)

Test on regex101.com
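
A quick check of this pattern against the sample lines from the question, as a minimal sketch. It assumes a JavaScript engine with lookbehind support (recent Node.js, for example); other regex flavors may reject the unbounded lookbehind. The helper name longStringsOutsidePrintf is made up for illustration.

```javascript
// Pattern from this answer; the g flag lets matchAll walk every string on a line.
const re = /(?<!printf[^\n]*)"(?![^"\n]*printf)[^"\n]{5,}"(?![^\n]*printf)/g;

// Hypothetical helper: returns the quoted strings the pattern finds in one line.
function longStringsOutsidePrintf(line) {
  return [...line.matchAll(re)].map((m) => m[0]);
}

// Sample lines from the question.
const lines = [
  'write("blueberry"); // yum',
  'printf("-%s-", "strawberry"); // whatever',
  'x = 12; printf("lime"); write("coconut")',
  'x = 12; write("coconut") printf("lime");',
  'y = 34; write("banana")',
  'z = "pineapple";',
  'p = "seed";',
];

for (const line of lines) {
  console.log(JSON.stringify(longStringsOutsidePrintf(line)).padEnd(16), line);
}
```

Unlike a whole-line pattern, this one matches the string itself, so an editor would highlight only the quoted text.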

namgold
  • The regex first checks that the line does not contain printf: ^(?!.*printf)
  • then it lazily skips as few text"string" pairs as possible: ([^"\n]*"[^"\n]*")*?
  • until it finds a string that has 5 or more characters: [^"\n]*"[^"\n]{5,}"

^(?!.*printf)([^"\n]*"[^"\n]*")*?[^"\n]*"[^"\n]{5,}"

See regex101
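
The three steps above can be tried on the question's sample lines with a short script. This is only a sketch: the helper name hasLongStringWithoutPrintf and the extra last sample line are my own additions for illustration.

```javascript
// Pattern from this answer: it decides per line, so test() is enough.
const lineRe = /^(?!.*printf)([^"\n]*"[^"\n]*")*?[^"\n]*"[^"\n]{5,}"/;

// Hypothetical helper name, used only for this illustration.
function hasLongStringWithoutPrintf(line) {
  return lineRe.test(line);
}

const samples = [
  'write("blueberry"); // yum',
  'printf("-%s-", "strawberry"); // whatever',
  'x = 12; printf("lime"); write("coconut")',
  'x = 12; write("coconut") printf("lime");',
  'y = 34; write("banana")',
  'z = "pineapple";',
  'p = "seed";',
  'log("hi", "longer text")', // added: short string first, long one second
];

for (const s of samples) {
  console.log(String(hasLongStringWithoutPrintf(s)).padEnd(7), s);
}
```

The added last line exercises the lazy group: it has to skip the short "hi" before it reaches a string that is long enough.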


If you want to see the affected lines in the PROBLEMS panel of VS Code, you can use a task.

On Windows I used the grep that comes with the Git install.

    {
      "label": "Find to localize",
      "type": "shell",
      "windows": {
        "command": "\"C:\\Program Files\\Git\\usr\\bin\\grep\""
      },
      "linux": {
        "command": "grep"
      },
      "args": [ "-nrP", "--file=${workspaceFolder}/.vscode/local5-grep.txt", "*" ],
      "options": { "cwd": "${workspaceFolder}" },
      "presentation": { "clear": true },
      "problemMatcher": {
        "owner": "localize",
        "fileLocation": ["relative", "${workspaceFolder}"],
        "pattern": [
          {
              "regexp": "^([^:]+):(\\d+):(.*)$",
              "file": 1,
              "line": 2,
              "message": 3
          }
        ]
      }
    }

Because the regular expression contains a lot of " characters, it is better to save it in a file.

I used .vscode/local5-grep.txt but you can use any file. Change the location in the task if needed.

The file .vscode/local5-grep.txt contains

^(?!.*printf)([^"\n]*"[^"\n]*")*?[^"\n]*"[^"\n]{5,}"

If only particular files are to be searched, change the "*" argument in the task.
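
To see how the problemMatcher pattern in the task splits grep's file:line:text output, here is a small sketch. The helper parseGrepLine and the sample output line are hypothetical; only the regular expression is taken from the task.

```javascript
// Same expression as "regexp" in the task, written as a JavaScript literal.
const grepLine = /^([^:]+):(\d+):(.*)$/;

// Hypothetical helper mirroring the "file", "line" and "message" group numbers.
function parseGrepLine(output) {
  const m = grepLine.exec(output);
  if (m === null) return null;
  return { file: m[1], line: Number(m[2]), message: m[3] };
}

// Made-up sample in the file:line:text shape that grep -n -r prints.
console.log(parseGrepLine('src/main.c:42:  write("blueberry"); // yum'));
```

Group 1 stops at the first colon, so any colons inside the matched source line end up in the message.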

rioV8