1

Here is a very simple regex:-

kabk
k(.*)k
result:
kabk has 1 group:
ab

What I normally do is the following, which is right but unnecessary.

k([^k]*)k

Can anyone explain to me how the regex parser work? Is the .* supposed to consume as much as possible.

ΩmegaMan
  • 26,526
  • 10
  • 91
  • 107
leon
  • 365
  • 3
  • 10
  • 2
    possible duplicate of [What do lazy and greedy mean in the context of regular expressions?](http://stackoverflow.com/questions/2301285/what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions) –  Mar 07 '14 at 04:35

4 Answers4

2

Yes, the * operator is greedy, and will capture as many valid characters as it can.

For example, the pattern k(.*)k applied to kkkkak will capture kkka.

You can make an operator non-greedy with the ? operator.

So the pattern k(.*?)k applied to kkkkak will capture nothing, because the smallest non-greedy match is to allow nothing between the first two k characters.

In reply to your comment, the existance of the final k in the pattern means that it needs to leave as much as possible to still match a k after consuming as much as possible.

2

Lets see k([^k]*)k (matching using negation) first on regex101.com debugger::

1   /k([^k]*)k/ kabk
2   /k([^k]*)k/ kabk
3   /k([^k]*)k/ kabk
4   /k([^k]*)k/ kabk
5   /k([^k]*)k/ kabk
6   /k([^k]*)k/ kabk
7   /k([^k]*)k/ kabk
#   Match found in 7 steps.

Now lets see k(.*)k (matching using geedy match) on same regex101.com debugger:

1   /k(.*)k/ kabk
2   /k(.*)k/ kabk
3   /k(.*)k/ kabk
4   /k(.*)k/ kabk
5   /k(.*)k/ kabk
6   /k(.*)k/ kabk BACKTRACK
7   /k(.*)k/ kabk BACKTRACK
8   /k(.*)k/ kabk
9   /k(.*)k/ kabk
#   Match found in 9 steps.

Clearly first regex is much more efficient in longer strings as there is no BACKTRACKing involved.

anubhava
  • 713,503
  • 59
  • 514
  • 593
1

In kabk, there's only two ks. So it's not proper example to learn about greedy match.

Let's try it with different example: kabkxyk

Greedy version match returns kabkxyk (.* advances before the last k): (Followng is javascript):

'kabkxyk'.match(/k.*k/)
// ["kabkxyk"]

while non-greedy version match returns kabk:

'kabkxyk'.match(/k.*?k/)
// ["kabk"]
falsetru
  • 336,967
  • 57
  • 673
  • 597
  • Thanks. if put greed aside, why not .* in my example consume all the character after the first 'k', which find no match. – leon Mar 07 '14 at 04:38
  • @leon, The regular expression `.*` match greedily **as long as the entire pattern matches**. – falsetru Mar 07 '14 at 04:40
0

Yes, .* is greedy. It matches as much as it can.

.*? is non-greedy, or parsimonious. It matches as little as it can.

If you match "bananas" against ba.*a, it will match banana.

If you match "bananas" against ba.*?a, it will match bana.

Andy Lester
  • 86,927
  • 13
  • 98
  • 148