MID4BNW2Uq-01;Standard Offline - Acc 01;SA\;BATE:GOOGN

I'm trying to split the above line on semicolons like so: line.split(";", -1).

The resulting list that I need is:

1. MID4BNW2Uq-01
2. Standard Offline - Acc 01
3. SA\;BATE:GOOGN

But instead, I get one more element because of that ";" inside SA\;BATE:GOOGN:

1. MID4BNW2Uq-01
2. Standard Offline - Acc 01
3. SA\
4. BATE:GOOGN

I'm looking for a way to make the .split method match ";" BUT NOT "\;". In other words, split on the semicolon (;) only if there's no "\" right before it.

I've thought about using regex but I'm at a complete loss when it comes to it. Any help would be much appreciated. Thank you!

BDL
George Cimpoies

1 Answer


What you're looking for is a zero-length assertion called "negative lookbehind".

For example,

(?<!a)b

matches a "b" that is not preceded by an "a", using negative lookbehind.

Try splitting on this:

(?<!\\);

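The backslash is a special character in regular expressions, so it must be escaped with a second backslash. If the pattern is written as a Java string literal, each of those backslashes must be doubled again, giving "(?<!\\\\);".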
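
As a minimal sketch, assuming the question is about Java's String.split (which takes a regex and a limit), the split could look like the code below. The class name EscapedSplit and the helper splitUnescaped are made up for illustration and are not part of the original question.

```java
public class EscapedSplit {
    // Hypothetical helper: splits on ';' unless it is immediately
    // preceded by a backslash. The regex itself is (?<!\\); and every
    // backslash is doubled once more for the Java string literal.
    static String[] splitUnescaped(String line) {
        // Limit -1 keeps trailing empty strings, as in the question.
        return line.split("(?<!\\\\);", -1);
    }

    public static void main(String[] args) {
        // "\\;" in the literal is the two characters \; in the actual string.
        String line = "MID4BNW2Uq-01;Standard Offline - Acc 01;SA\\;BATE:GOOGN";
        for (String part : splitUnescaped(line)) {
            System.out.println(part);
        }
    }
}
```

With the sample line from the question this prints three lines, the last one being SA\;BATE:GOOGN. Note that the backslash stays in the result, and that this simple lookbehind also refuses to split after an escaped backslash (\\;), so it only fits input where a backslash is used solely to escape semicolons.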

neuhaus