Bash can natively match regular expressions in the [[ ]] construct, using the =~ operator. When it matches something, the result is stored in the BASH_REMATCH array variable. BASH_REMATCH contains, in order, the entire matched text, and then each matched subexpression:

$ foo=abcd
$ [[ $foo =~ (.(.))(.) ]]
$ printf "%s\n" "${BASH_REMATCH[@]}"
abc
ab
b
c

Is there something similar in Vim script, which would look like:

let foo = 'abcd'
let bar = groupmatch(foo, '\(.\(.\)\)\(.\)')

And echo bar would give:

['abc', 'ab', 'b', 'c']

It would be something like split(), but far, far more powerful. The (.(.))(.) is just a toy expression - the full power of regular expressions is available in [[ ]].

muru

1 Answer

Maybe you could try matchlist() and filter(), like this:

:let foo = 'abcd'
:let bar = filter(matchlist(foo, '\v(.(.))(.)'), 'v:val !=# ""')

The output of :echo bar should be:

['abc', 'ab', 'b', 'c']

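matchlist() by itself always returns a list of ten items when the match succeeds: the whole match followed by one item for each of the nine possible sub-expressions, whether or not the pattern actually contains that many. The filter() call above strips the empty strings, which also drops any sub-expression that legitimately matched an empty string: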

:echo matchlist('abcd', '\v(.(.))(.)')
['abc', 'ab', 'b', 'c', '', '', '', '', '', '']

More than nine sub-expressions cause it to error out:

:echo matchlist('abcdefghijklmn', '\v(.(.))(.(.))(.(.))(.(.))(.(.))(.(.))')
E872: (NFA regexp) Too many '('
E51: Too many (
[]

Whereas Bash can go further:

$ foo=abcdefghijklmn
$ [[ $foo =~ (.(.))(.(.))(.(.))(.(.))(.(.))(.(.)) ]]
$ printf "%s\n" "${#BASH_REMATCH[@]}" "${BASH_REMATCH[@]}"
13
abcdefghijkl
ab
b
cd
d
ef
f
gh
h
ij
j
kl
l
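
As a point of comparison with the fixed ten-item list from matchlist(), here is a minimal Bash sketch showing that BASH_REMATCH holds exactly one entry per group in the pattern (plus entry 0 for the whole match), and that a group matching the empty string keeps its position. The function name groupmatch is hypothetical, borrowed from the question; it is not a Bash builtin.

```bash
#!/usr/bin/env bash
# groupmatch is a hypothetical helper name, not part of Bash.
# It prints the number of BASH_REMATCH entries, then each entry on its own line.
groupmatch() {
    local text=$1 pattern=$2
    # An unquoted variable on the right of =~ is treated as a regex.
    # Return non-zero without printing anything if there is no match.
    [[ $text =~ $pattern ]] || return 1
    printf '%s\n' "${#BASH_REMATCH[@]}" "${BASH_REMATCH[@]}"
}

# Three groups, the second of which matches the empty string.
groupmatch abc '(a)(x*)(b)'
```

This should print 4, followed by ab, a, an empty line, and b, so the empty second group is still there at index 2.
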
user9433424
  • Nice. I'd given up on the match*() functions as being related to :match. So, if I read this correctly, as with regular expressions in Vim, this is limited to 9 groups, and the list returned always has 10 entries, no matter how many groups were actually used? – muru Mar 24 '16 at 18:50
  • @muru I'm no expert, but I think so. At least in the help, they talk about submatches, like "\1", "\2", etc. in :substitute. And afaik, with :substitute you can only refer to 10 submatches. – user9433424 Mar 24 '16 at 18:53
  • Yes, that's the correct error. Just a typo when I re-ran the commands for copying here. – muru Mar 24 '16 at 19:20