
I'm getting inconsistencies when comparing results from C++ MSVC and RegExr.

I'm trying to find references within a JSON schema:

"$ref":"X.json#/definitions/X"

To account for whitespace before and after the colon, on RegExr I'm using this regular expression:

"\$ref"\s*:\s*".*?"

In C++ I've escaped the string to:

    //Search the file for any "$ref": variables using regex
    std::regex regexExpression("\"\\$ref\"\\s*:\\s*\".*?\"");
    std::smatch matches;

    if(std::regex_search(fileContents, matches, regexExpression))
    {
        for (auto str : matches)
        {

On RegExr, this is matched:

"$ref" : "X.json#/definitions/X"

In the C++ program, however, it is not.

Any ideas?

APoster

3 Answers


You need to capture the sequences you want to get. This is done by using parentheses in the regex:

std::regex regexExpression("(\"\\$ref\")\\s*:\\s*(\".*?\")");
//                          ^          ^         ^       ^
// Note parentheses in these places
Some programmer dude
  • No, I'm afraid that's not right. That captures "$ref" and then "X.json#/definitions/X" separately. I need it to capture both at once. The version I posted does, but it doesn't handle whitespace around the colon properly, i.e. what \s* is supposed to handle. So mine will catch: "$ref":"X.json#/definitions/X" But not "$ref" : "X.json#/definitions/X" – APoster Aug 30 '18 at 17:30
  • @APoster So capture each separately, and concatenate them with a colon in between? – Some programmer dude Aug 30 '18 at 17:34

After calling std::regex_search, matches describes a single match only: the first one found in the string.

Iterating over std::smatch yields the whole match followed by its captured sub-expressions, not successive matches. As your regex contains no capture groups, the loop sees only that one whole match.

To find all matches in a string you need to repeatedly call std::regex_search, see https://en.cppreference.com/w/cpp/regex/regex_search

while(std::regex_search(fileContents, matches, regexExpression))
{
    std::cout << matches.str() << '\n';
    fileContents = matches.suffix();
}

You might also want to look at std::regex_iterator.
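As a rough sketch of the std::regex_iterator route, the helper below collects every match without consuming the source string. The name findRefs and the choice to return a std::vector<std::string> are assumptions made for illustration; the pattern is the one from the question.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns every
// "$ref" : "..." occurrence found in `text`, in order of appearance.
std::vector<std::string> findRefs(const std::string& text)
{
    // Raw string literal, so the pattern reads the same as on RegExr.
    static const std::regex refPattern(R"("\$ref"\s*:\s*".*?")");

    std::vector<std::string> refs;
    // A default-constructed sregex_iterator acts as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), refPattern), end;
         it != end; ++it)
    {
        refs.push_back(it->str());  // str() with no argument is the whole match
    }
    return refs;
}
```

Because the iterator walks over text directly, the original string is left untouched and no working copy is needed for later processing.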

Alan Birtles
  • This was the issue. I did initially try it, but I was getting stuck in an infinite loop. The important bit here to exit the loop is: fileContents = matches.suffix(); This moves the string forward. You have to use a copy of the original string if you have further processing to do on the source string though. – APoster Sep 03 '18 at 09:54

This looks like a bug in Visual Studio 2015; your code is just fine. It is also possible that the input string contains some characters that you are not aware of.

To debug it, I suggest you start removing parts from your expression, until it matches. This way, the cause for the mismatch will be easily identified.
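One possible way to automate that narrowing-down is sketched below. The helper firstFailingPattern and the list kRefPatternSteps are hypothetical names introduced here; the list simply contains growing prefixes of the pattern from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: tries each pattern in order against `text` and returns
// the index of the first one that fails to match, or -1 if all of them match.
int firstFailingPattern(const std::string& text,
                        const std::vector<std::string>& patterns)
{
    for (std::size_t i = 0; i < patterns.size(); ++i)
    {
        const std::regex re(patterns[i]);
        if (!std::regex_search(text, re))
            return static_cast<int>(i);
    }
    return -1;
}

// Prefixes of the pattern from the question, shortest first (an assumption
// about how one might split it; any other split works the same way).
const std::vector<std::string> kRefPatternSteps = {
    R"("\$ref")",
    R"("\$ref"\s*)",
    R"("\$ref"\s*:)",
    R"("\$ref"\s*:\s*)",
    R"("\$ref"\s*:\s*".*?")",
};
```

Calling firstFailingPattern(fileContents, kRefPatternSteps) then points at the first piece of the expression that the input does not satisfy, which is also a quick way to spot an unexpected character in the input.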

The interface of matches is described on cppreference under std::match_results::operator[]:

If n == 0, returns a reference to the std::sub_match representing the part of the target sequence matched by the entire matched regular expression.
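To make the indexing concrete, here is a small sketch that copies every entry of a std::smatch into a vector, entry 0 first. The helper name matchEntries is an assumption for illustration only.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: runs one regex_search and returns every entry of the
// resulting std::smatch as a string (entry 0 first, then any capture groups).
std::vector<std::string> matchEntries(const std::string& text,
                                      const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch matches;
    std::vector<std::string> entries;
    if (std::regex_search(text, matches, re))
    {
        for (const auto& sub : matches)
            entries.push_back(sub.str());
    }
    return entries;
}
```

With the pattern from the question, which has no capture groups, the result holds exactly one entry: the whole match. With the parenthesized variant it holds three: the whole match, then "$ref", then the quoted target.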

The way you escape quotes and backslashes is correct, but not very readable. A more readable way is to use the R"(text)" syntax:

std::regex regexExpression(R"("\$ref"\s*:\s*".*?")"); 
std::smatch matches;
if(std::regex_search(fileContents, matches, regexExpression)) 
{ 
   for (auto str : matches) 
   {

This is just a matter of style, not correctness, as your code is correct. You can try this out on wandbox.org

I have tested it on GCC-8.1, GCC-5.4.9, clang 3.8.9, and Visual Studio 2017. All of these compilers recognize the match.

Michael Veksler
  • Other than wandbox, you can also present future readers the differences between compilers, even between msvc 2015 - pre2018, on [godbolt](https://godbolt.org/) – sandthorn Aug 31 '18 at 05:21
  • @sandthorn In general I prefer godbolt, but unfortunately the publicly accessible *Compiler Explorer* does not allow code execution. It allows only compilation. – Michael Veksler Aug 31 '18 at 05:32