
From :h E65 we can see that Vim doesn't allow more than 9 capture groups in a substitution command.

For example the following command will work:

s/\v(a)(b)(c)(d)(e)(f)(g)(h)(i)/\9\8\7\6\5\4\3\2\1

But this one with one more capture group will fail:

s/\v(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)/\10\9\8\7\6\5\4\3\2\1

My question is not about why it fails (it's a Vim hard limit) but about why Vim has this limit at all.

Also, I'm aware that a real-life regex with more than 9 capture groups would probably be pretty monstrous to read and maintain, but I'm still curious.

statox
  • Maybe not related only to Vim: http://stackoverflow.com/a/10993346/2558252 – nobe4 Sep 22 '16 at 15:59
  • @nobe4: Interesting! So maybe people creating these tools considered that more than 9 groups were useless... – statox Sep 22 '16 at 16:01
  • I suppose this limit comes from vi, which inherited the limit from ed/sed. Some years ago I made a patch to support up to 99 groups, but it was not included. – Christian Brabandt Sep 22 '16 at 16:06
  • @ChristianBrabandt So maybe the question is not totally on topic for this site, but I'm still curious to know why ed didn't support more than 9. Probably because of memory limitations? And that sounds like a cool patch, too bad it wasn't included. – statox Sep 22 '16 at 16:08
  • @ChristianBrabandt A more useful addition would be to implement numeric flags like in sed: s/.../.../3 would replace only the 3rd occurrence of the pattern. This is probably the feature I miss the most in Vim. – Sato Katsura Sep 22 '16 at 16:18
  • @SatoKatsura There is an entry about this in the todo list that describes the problem pretty well; search for :s//N. I think a mapping to achieve this was posted on vim_use several years ago by A Politz. – Christian Brabandt Sep 22 '16 at 16:49
  • Supporting named captures would be another way to alleviate this problem. That being said, most times I've seen anywhere near 9 capture groups was when people didn't know they could use non-capturing groups -- \%(). – jamessan Sep 22 '16 at 20:06

1 Answer


The obvious reason is that groups with two or more digits are ambiguous: should \12 be taken as group 12, or as group 1 followed by the string 2?
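
To make the ambiguity concrete, here is a minimal sketch using Python's `re` module rather than Vim (an assumption made purely for illustration, since Vim itself stops at 9 groups). Python resolves the question by reading `\10` as group 10 and offering `\g<1>` as an explicit way to say "group 1, then a literal digit". The variable names are hypothetical.

```python
import re

# Ten single-letter capture groups: (a)(b)(c)...(j)
pattern = re.compile("".join(f"({c})" for c in "abcdefghij"))
text = "abcdefghij"

# Python's replacement syntax reads both digits, so \10 means group 10.
as_group_10 = pattern.sub(r"[\10]", text)

# \g<1> ends the group number explicitly; the 0 after it is a literal character.
as_group_1_then_0 = pattern.sub(r"[\g<1>0]", text)

print(as_group_10)         # [j]
print(as_group_1_then_0)   # [a0]

# With only nine groups, Python rejects \10 instead of silently
# falling back to "group 1 followed by 0".
nine_groups = re.compile("".join(f"({c})" for c in "abcdefghi"))
try:
    nine_groups.sub(r"\10", "abcdefghi")
    rejected = False
except re.error:
    rejected = True

print(rejected)            # True
```

A tool that only ever reads one digit after the backslash, as Vim does, never has to make this choice, at the cost of capping the number of groups at 9.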

There are other reasons related to efficiency (exponential matching time and the like). These were a show stopper when ed was written. Better algorithms have been discovered since then.

Sato Katsura
  • This is a good possibility, do you have any reference/reading regarding this? – nobe4 Sep 22 '16 at 16:11
  • @nobe4 For the ambiguity part: no, but IMO it's obvious. For the efficiency part, you'd have to read about the early implementations of regexps. It was a well-known problem at the time. I don't have exact citations, but they shouldn't be hard to find. – Sato Katsura Sep 22 '16 at 16:15
  • Indeed that sounds totally plausible. – statox Sep 22 '16 at 16:16
  • Yes, it's almost definitely that the parser was written to look for a single digit after backslash, and never changed. This was common enough, a long time ago. Other languages have come up with ways around this (for example, only considering \11 a reference to a capture if there are at least 11 of them, which is inconsistent but usually okay; and things like \g{11} for backreferences and ${11} for substitutions), but vim has never introduced any of those. – hobbs Sep 22 '16 at 17:07
  • They could maybe change the parser to look for [0-9a-zA-Z]; this would give an additional 52 groups without ambiguity by referencing \1, \a, or \B. – dan Oct 23 '22 at 15:48