117

What is the difference between:

(.+?)

and

(.*?)

when I use them in a PHP preg_match regex?

David19801

9 Answers

183

They are called quantifiers.

* 0 or more of the preceding expression

+ 1 or more of the preceding expression

By default, a quantifier is greedy, meaning it matches as many characters as possible.

Appending ? to a quantifier makes it "ungreedy" (lazy), meaning it matches as few characters as possible.

Example greedy/ungreedy

For example, on the string "abab":

a.*b will match "abab" (preg_match_all will return one match, "abab")

while a.*?b will match only the leading "ab" (preg_match_all will return two matches, each of them "ab")
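
The greedy/lazy behaviour described above can be checked with a short script. This is only a sketch in Python, whose `re` module is assumed here to behave like PHP's PCRE for these simple patterns; `re.findall` plays the role of `preg_match_all`.

```python
import re

text = "abab"

# Greedy: .* grabs as much as it can, so the whole string is one match.
greedy = re.findall(r"a.*b", text)

# Lazy: .*? stops at the first "b" it reaches, so scanning finds "ab" twice.
lazy = re.findall(r"a.*?b", text)

print(greedy)  # ['abab']
print(lazy)    # ['ab', 'ab']
```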

You can test your regexes online, e.g. on Regexr.

Morten Jensen
stema
  • "lazy" is the more common term for "ungreedy" – Walter Tross Mar 11 '17 at 10:40
  • The example is incorrect. `(.+?)` and `(.*?)` behave differently depending on their position in the regular expression: `a(.+?)`, `(.+?)b`, `a(.+?)b`, `a(.*?)`, `(.*?)b`, `a(.*?)b`. – Louis55 Nov 15 '18 at 08:36
  • Why wouldn't a.*b give back "ab"? Isn't it saying "word that has between a and b, 0 or more characters", therefore, ab has zero character between and could be a match. Why is this incorrect? – Hello World Jul 22 '20 at 03:40
  • @HelloWorld, this has to do with the greediness I explained above. `.*` will match as much as possible. If you want to stop as early as possible, then you have to make it ungreedy `.*?` – stema Jul 24 '20 at 07:16
23

The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match any character (.), except a newline by default.
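
To make the one-versus-zero distinction concrete for the two patterns in the question, here is a small sketch. It uses Python's `re` module as a stand-in for `preg_match` (an assumption; the two engines agree on these constructs), and the helper name `first_group` and the sample strings are made up for illustration.

```python
import re

def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# With a delimiter on both sides, the lazy quantifier stops at the first ">".
print(first_group(r"<(.*?)>", "<>"))   # '' : zero characters is acceptable
print(first_group(r"<(.+?)>", "<>"))   # None : at least one character is required
print(first_group(r"<(.*?)>", "<b>"))  # 'b'
print(first_group(r"<(.+?)>", "<b>"))  # 'b'

# With nothing after the group, lazy matching takes the bare minimum.
print(first_group(r"a(.*?)", "abc"))   # ''
print(first_group(r"a(.+?)", "abc"))   # 'b'
```

The two patterns only disagree when the group could be empty, which is why the position of the group in the pattern matters.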

Quentin
11

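In RegEx, {i,f} means "between i and f repetitions of the preceding element". Let's take a look at the following examples:

  • {3,7} means between 3 and 7 matches
  • {,10} means up to 10 matches with no lower limit (i.e. the lower limit is 0)
  • {3,} means at least 3 matches with no upper limit (i.e. the upper limit is infinity)
  • {,} means no upper limit or lower limit for the number of matches (i.e. the lower limit is 0 and the upper limit is infinity)
  • {5} means exactly 5

Not every engine accepts an omitted lower bound: PCRE, the engine behind PHP's preg functions, has traditionally treated forms such as {,10} as literal text, so {0,10} and {0,} are the safer spellings there.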

Most good languages contain abbreviations, so does RegEx:

  • + is the shorthand for {1,}
  • * is the shorthand for {0,}
  • ? is the shorthand for {0,1}

This means + requires at least 1 match, * accepts any number of matches (including none at all), and ? accepts either zero matches or exactly 1.
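
As a quick sanity check of these shorthands, the sketch below compares each one with its explicit brace form. It is written in Python (an assumption; the thread is about PHP) and spells the lower bound out as 0, which should work in both engines. The sample strings and the helper `full_matches` are illustrative only.

```python
import re

samples = ["", "a", "aa", "aaa", "b"]

def full_matches(pattern):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s)]

plus_short, plus_long = full_matches(r"a+"), full_matches(r"a{1,}")
star_short, star_long = full_matches(r"a*"), full_matches(r"a{0,}")
opt_short, opt_long = full_matches(r"a?"), full_matches(r"a{0,1}")

print(plus_short)  # ['a', 'aa', 'aaa']
print(star_short)  # ['', 'a', 'aa', 'aaa']
print(opt_short)   # ['', 'a']
```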

Credit: Codecademy.com

Miladiouss
11

+ matches at least one character

* matches any number (including 0) of characters

The ? indicates a lazy expression, so it will match as few characters as possible.

Xophmeister
10

A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.

So basically, if you use a + there must be at least one instance of the pattern; if you use * it will still match when there are no instances of it.

DaveRandom
9

Consider the following string:

ab

The pattern (ab.*) matches, and the capture group contains ab.

The pattern (ab.+) does not match at all, because .+ needs at least one more character after ab.

But if you change the string to the following, (ab.+) matches and captures aba:

aba
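
The same three cases can be reproduced in a few lines. This is a sketch using Python's `re` rather than PHP (an assumption that the engines agree for these patterns), with a hypothetical helper `captured` that returns group 1 or None.

```python
import re

def captured(pattern, text):
    """Return capture group 1 of the first match, or None when there is no match."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

print(captured(r"(ab.*)", "ab"))   # 'ab'  : .* is happy with zero extra characters
print(captured(r"(ab.+)", "ab"))   # None  : .+ needs one more character
print(captured(r"(ab.+)", "aba"))  # 'aba' : the trailing "a" satisfies .+
```
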
Azri Jamil
6

+ requires at least one occurrence; * allows zero as well.

jeroen
4

A star is very similar to a plus; the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.

Prashant G
Madara's Ghost
1

I think the previous answers fail to highlight a simple example:

for example we have an array:

numbers = [5, 15]

The following regex expression ^[0-9]+ matches: 15 only. However, ^[0-9]* matches both 5 and 15. The difference is that the + operator requires at least one duplicate of the preceding regex expression

kgui
  • Um, what?!? Why is this answer uv'ed at all? This is simply incorrect. Both patterns will definitely match strings `5` and `15`. – mickmackusa Jul 05 '21 at 09:26
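
The comment above is easy to verify. The sketch below, in Python rather than PHP (an assumption that both engines behave alike here), runs both patterns over the two numbers as strings, plus an empty string and "x", which are added purely for illustration to show where + and * actually differ.

```python
import re

# "5" and "15" come from the answer; "" and "x" are illustrative additions.
numbers = ["5", "15", "", "x"]

plus_hits = [s for s in numbers if re.search(r"^[0-9]+", s)]
star_hits = [s for s in numbers if re.search(r"^[0-9]*", s)]

print(plus_hits)  # ['5', '15']
print(star_hits)  # ['5', '15', '', 'x']
```

Both patterns match 5 and 15; only the * version also succeeds, with an empty match, on input that does not start with a digit.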