
Why does applying the regular expression (rx) to the data (d) give the output (o)?
Regular expression (rx):

s/(?<!\#include)[\s]*\<[\s]*([^\s\>]*)[\s]*\>/\<$1\>/g

Data (d):

#include  <a.h>  // 2 spaces after e

output (o):

#include <a.h>  // 1 space is still there

Expected output is:

#include<a.h>  // no space after include

ikegami
    tip: `[\s]` is pointless. `[]` is for grouping MULTIPLE characters into a single match point. `[\s]*` is functionally identical to `\s*`. – Marc B Jul 31 '13 at 15:15

3 Answers


The condition (?<!\#include) is true as soon as you've passed the first of the two spaces, therefore the match starts there.

#include  <a.h>
         ^^^^^^- matched by your regex.

That means the space is not removed by your replace operation.
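
To see this for yourself, here is a minimal sketch that reports where the question's pattern first matches. The helper name `first_match` is made up for illustration, and the redundant `[\s]` classes are written as `\s`:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start offset, matched text) of the first match
# of the question's pattern, or an empty list if nothing matches.
sub first_match {
    my ($line) = @_;
    return unless $line =~ /(?<!\#include)\s*<\s*([^\s>]*)\s*>/;
    # @- and @+ hold the start and end offsets of the last successful match
    return ($-[0], substr($line, $-[0], $+[0] - $-[0]));
}

my ($offset, $text) = first_match("#include  <a.h>");
print "offset=$offset text='$text'\n";   # offset=9 text=' <a.h>'
```

Offset 9 is the second space, so the space at offset 8 is never part of the match and survives the substitution.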

If you use a positive lookbehind assertion instead, you get the desired result:

s/(?<=#include)\s*<\s*([^\s>]*)\s*>/<$1>/g;

which can be rewritten to use the more efficient \K:

s/#include\K\s*<\s*([^\s>]*)\s*>/<$1>/g;
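
A quick way to compare the two substitutions is to wrap each in a small function and run them on the data from the question. The function names below are hypothetical; the patterns are the two shown above:

```perl
use strict;
use warnings;

# Positive lookbehind: the match may only start directly after "#include".
sub fix_with_lookbehind {
    my ($line) = @_;
    $line =~ s/(?<=#include)\s*<\s*([^\s>]*)\s*>/<$1>/g;
    return $line;
}

# \K variant: "#include" is matched but kept out of the replaced text.
# \K needs Perl 5.10 or later.
sub fix_with_keep {
    my ($line) = @_;
    $line =~ s/#include\K\s*<\s*([^\s>]*)\s*>/<$1>/g;
    return $line;
}

for my $line ("#include  <a.h>  // 2 spaces after e", "#include < b.h >") {
    print fix_with_lookbehind($line), "\n";
    print fix_with_keep($line), "\n";
}
```

Both functions should turn the first line into `#include<a.h>  // 2 spaces after e` and the second into `#include<b.h>`.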
Tim Pietzcker

(?<!\#include)[\s] is a whitespace character that is not directly preceded by #include. The first space in #include  <a.h> is directly preceded by #include, so it isn't matched. The second one isn't (it's preceded by the other space), so that's where the match starts.
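
The check at each position can be sketched like this. It only emulates the lookbehind with `substr` instead of running the real assertion, and the function name `lookbehind_passes` is made up for illustration:

```perl
use strict;
use warnings;

# Emulates (?<!\#include) at a given offset: the assertion passes when the
# text before that offset does not end with "#include".
sub lookbehind_passes {
    my ($line, $offset) = @_;
    return substr($line, 0, $offset) =~ /\#include\z/ ? 0 : 1;
}

my $line = "#include  <a.h>";
for my $offset (8, 9) {   # offsets of the first and second space
    printf "offset %d: %s\n", $offset,
        lookbehind_passes($line, $offset) ? "passes" : "fails";
}
```

This prints `offset 8: fails` and `offset 9: passes`, which is why the match can only begin at the second space.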

sepp2k

As an aside, you can use this pattern, which doesn't use a lookbehind:

s/(?:#include\K|\G)(?:\s+|(<|[^\s><]+))/$1/g

pattern details:

(?:              # open a non-capturing group
    #include\K   # match "#include", then \K drops it from the reported match
  |              # OR
    \G           # the position where the previous match ended
)                # close the non-capturing group
(?:
    \s+          # whitespace (spaces or tabs here)
  |              # OR
    (            # capturing group
        <
      |
        [^\s><]+ # content inside the brackets, excluding whitespace and brackets
    )
)

The search stops at the closing bracket because > is not matched by any branch of the pattern, so there are no more contiguous matches until the next #include.
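
Here is a rough sketch of the pattern in use. The function names are hypothetical, and the second function is my own variant rather than part of the answer: with no position set, \G also matches at the very start of the string, so the original pattern edits lines that don't begin with #include. Adding (?!\A) is one way to guard against that.

```perl
use strict;
use warnings;

# The pattern exactly as given above.
sub strip_include_spaces {
    my ($line) = @_;
    no warnings 'uninitialized';   # $1 is undef when the \s+ branch matches
    $line =~ s/(?:#include\K|\G)(?:\s+|(<|[^\s><]+))/$1/g;
    return $line;
}

# Assumed variant: (?!\A) stops \G from matching at the start of the string.
sub strip_include_spaces_guarded {
    my ($line) = @_;
    no warnings 'uninitialized';
    $line =~ s/(?:#include\K|\G(?!\A))(?:\s+|(<|[^\s><]+))/$1/g;
    return $line;
}

print strip_include_spaces("#include  < a.h >  // comment"), "\n";
print strip_include_spaces("int x;"), "\n";
print strip_include_spaces_guarded("int x;"), "\n";
```

The first call should print `#include<a.h >  // comment`: the spaces up to `a.h` are removed, but the space before `>` stays because nothing follows it that the pattern can match, so the regex engine backtracks out of that last step. The second call prints `intx;`, and the guarded variant leaves `int x;` alone.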

Casimir et Hippolyte