I have a string with 5 pieces of data delimited by underscores:

AAA_BBB_CCC_DDD_EEE

I want a different regex for each component. Each regex needs to return just that one component: the first would return just AAA, the second just BBB, and so on.

I am able to parse out AAA with the following:

^([^_]*)?

I see that I can use a lookbehind like this to find everything after the first underscore:

(?<=[^_]*_).*
BBB_CCC_DDD_EEE

But the following does not find just BBB:

(?<=[^_]*_)[^_]*(?=_)
George Hernando
  • 1
    what is your question please! – YCF_L Apr 11 '18 at 21:51
  • and is it Java or JS? 'cause `(?<=[^_]*_).*` wouldn't work in Java. – revo Apr 11 '18 at 21:54
  • 1
    You really don't want to split the string first? – dnault Apr 11 '18 at 21:56
  • I want a different regex for each component: 5 separate regex expressions, one to find each of the five components with no other text. I am configuring a tool based on Java that accepts only a single regex statement. – George Hernando Apr 11 '18 at 21:58
  • Does it have to match the whole expression or can you use a capturing group? – shmosel Apr 11 '18 at 22:09
  • It needs to match the whole expression – George Hernando Apr 11 '18 at 22:19
  • Possible duplicate of [Splitting a nested string keeping quotation marks](https://stackoverflow.com/questions/36292591/splitting-a-nested-string-keeping-quotation-marks) – Ethan Moore Apr 11 '18 at 23:14
  • are the strings between the "_" constant length? – AndresDLRG Apr 11 '18 at 23:21
  • 2
    This is arguably not what Regexes are for (especially considering Java Regexes don't support variable-length look-behind), and **objectively** much more **simply** achieved using `str.split("_")` (which would coincidentally also be much more readable). – ccjmne Apr 12 '18 at 00:03
  • @ccjmne Java supports variable-length lookbehinds but not infinite lookbehinds. – revo Apr 12 '18 at 06:48
  • Ah! Wow, I was really convinced it wasn't a thing in Java. Thanks @revo – ccjmne Apr 12 '18 at 09:53

2 Answers

Mixing lookbehind and lookahead

^([^_]+)? // 1st
(?<=_)[^_]+(?=_[^_]+_[^_]+_[^_]+$) // 2nd
(?<=_)[^_]+(?=_[^_]+_[^_]+$) // 3rd
(?<=_)[^_]+(?=_[^_]+$) // 4th
[^_]+$ // 5th
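
As a rough check, the lookaround patterns can be run against the sample string in Java. This is only a sketch: the class name `LookaroundComponents` and the helper `firstMatch` are made up for illustration, and the second pattern is written with a trailing lookahead in the same style as the third and fourth so that it cannot also match later components.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundComponents {
    // One pattern per component of a 5-part, underscore-delimited string.
    // The lookaheads count how many components must still follow the match.
    static final String[] PATTERNS = {
        "^[^_]+",                              // 1st
        "(?<=_)[^_]+(?=_[^_]+_[^_]+_[^_]+$)",  // 2nd: three components follow
        "(?<=_)[^_]+(?=_[^_]+_[^_]+$)",        // 3rd: two components follow
        "(?<=_)[^_]+(?=_[^_]+$)",              // 4th: one component follows
        "[^_]+$"                               // 5th
    };

    // Hypothetical helper: returns the first whole match, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "AAA_BBB_CCC_DDD_EEE";
        for (String p : PATTERNS) {
            System.out.println(p + " -> " + firstMatch(p, input));
        }
    }
}
```

Each pattern uses only a single-character lookbehind, so none of them runs into Java's restriction on unbounded lookbehinds.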

If the lengths of the strings between the "_" are known, a fixed-length lookbehind can be used instead (take the first match of each pattern):

1st match

^([^_]+)?

2nd match

(?<=_)[^_]+

3rd match

(?<=_[A-Za-z]{3}_)[^_]+

4th match

(?<=_[A-Za-z]{3}_[A-Za-z]{3}_)[^_]+

5th match

(?<=_[A-Za-z]{3}_[A-Za-z]{3}_[A-Za-z]{3}_)[^_]+

Each {3} expresses the length of the string between two "_".
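
A minimal Java sketch of the fixed-length variant follows, assuming every component is made of exactly `len` ASCII letters. The class `FixedLengthLookbehind` and its methods are hypothetical names. The patterns are written without `\K`, which `java.util.regex` does not accept and which the lookbehind already makes unnecessary; the intended component is the first match of each pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedLengthLookbehind {
    // Builds the pattern for the n-th component (1-based), assuming each
    // component is exactly `len` ASCII letters. Hypothetical helper.
    static String patternFor(int n, int len) {
        if (n == 1) {
            return "^[^_]+";
        }
        StringBuilder sb = new StringBuilder("(?<=_");
        // One fixed-length "letters + underscore" unit per extra component
        // that must sit between the lookbehind's first "_" and the match.
        for (int i = 0; i < n - 2; i++) {
            sb.append("[A-Za-z]{").append(len).append("}_");
        }
        return sb.append(")[^_]+").toString();
    }

    // Returns the first whole match, or null if the pattern does not match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "AAA_BBB_CCC_DDD_EEE";
        for (int n = 1; n <= 5; n++) {
            String p = patternFor(n, 3);
            System.out.println(p + " -> " + firstMatch(p, input));
        }
    }
}
```

Because every quantifier inside the lookbehind is a fixed `{3}`, its length is bounded, which is what Java requires.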

AndresDLRG

If your string always uses underscores as the delimiter, you can use a single regex that captures the value in a capturing group: repeat the pattern for what comes before it (in this case, one or more non-underscore characters followed by an underscore) using a quantifier such as {3}, which you can change.

The quantifier specifies how many components to skip before capturing your match. For your example string AAA_BBB_CCC_DDD_EEE you could use {0}, {1}, {2}, {3} or {4}.

^(?:[^_\n]+_){3}([0-9A-Za-z]+)(?:_[^_\n]+)*$

That would match:

  • ^ Assert position at start of the line
  • (?:[^_\n]+_){3} In a non-capturing group (?:, match any character that is not an underscore or a newline one or more times [^_\n]+, followed by an underscore, and repeat that n times (in this example n is 3)
  • ([0-9A-Za-z]+) Capture your characters in a group using, for example, a character class (or use [^_]+ to match anything that is not an underscore, but that will also match whitespace characters)
  • (?:_[^_\n]+)* After the captured value, a non-capturing group matches an underscore followed by one or more characters that are not an underscore or a newline; the group repeats zero or more times to get a full match
  • $ Assert position at the end of the line
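
The capturing-group approach could be sketched in Java as below. The class `NthComponentByGroup` and the method `component` are illustrative names only, and the `skip` parameter stands in for the quantifier value described above. The component is read from group 1 rather than from the whole match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthComponentByGroup {
    // Returns the component that follows `skip` leading components
    // (skip = 0 gives the first one), or null if the input does not match.
    static String component(String input, int skip) {
        String regex = "^(?:[^_\\n]+_){" + skip + "}([0-9A-Za-z]+)(?:_[^_\\n]+)*$";
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "AAA_BBB_CCC_DDD_EEE";
        for (int skip = 0; skip <= 4; skip++) {
            System.out.println("{" + skip + "} -> " + component(input, skip));
        }
    }
}
```

With the sample string, `skip` values 0 through 4 yield AAA through EEE, and a `skip` of 5 yields null because there are not enough underscores to satisfy the quantifier.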
The fourth bird