-1

I'm trying to parse

|123|create|item|1497359166334|Sport|Some League|\|Team\| vs \|Team\||1497359216693|

with a regex (https://regex101.com/r/KLzIOa/1/).

I currently have

[^|]++

which parses everything correctly except \|Team\| vs \|Team\|

I would expect this to be parsed as |Team| vs |Team|.

If I change the regex to

[^\\|]++

it parses the teams separately instead of keeping them together with the escaped pipes.

Basically, I want to parse the fields between the pipes, but any escaped pipes should be captured as part of their field. So for my example I would expect:

["123", "create", "item", "1497359166334", "Sport", "Some League", "|Team| vs |Team|", "1497359216693"]
Jack Wilkinson

3 Answers

1

You can alternate between:

  • \\. - A literal backslash followed by any character, or
  • [^|\\]+ - Anything but a pipe or backslash

(?:\\.|[^|\\]+)+

https://regex101.com/r/KLzIOa/2

Note that there's no need for the possessive quantifier, because no backtracking will occur.

If you also want to replace \|s with |s, then do that afterwards: match \\\| and replace with |.
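As a rough illustration of the two steps (match, then unescape), here is a minimal Python sketch. The question does not name a language, so Python is an assumption, and the names LINE, FIELD and parse_fields are made up for this example.

```python
import re

# Sample line from the question (raw string keeps the backslashes literal).
LINE = r"|123|create|item|1497359166334|Sport|Some League|\|Team\| vs \|Team\||1497359216693|"

# Either an escaped character (backslash + any character) or a run of
# characters that are neither a pipe nor a backslash, repeated.
FIELD = re.compile(r"(?:\\.|[^|\\]+)+")

def parse_fields(line):
    # Step 1: collect the raw fields between unescaped pipes.
    raw_fields = FIELD.findall(line)
    # Step 2: turn each escaped pipe (backslash + pipe) into a plain pipe.
    return [re.sub(r"\\\|", "|", field) for field in raw_fields]

print(parse_fields(LINE))
# ['123', 'create', 'item', '1497359166334', 'Sport', 'Some League', '|Team| vs |Team|', '1497359216693']
```

The unescaping is done per field after matching, so the pipes inside the team names never act as separators.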

CertainPerformance
  • In this situation `\|Team\| vs \|Team\|` is captured. Would that mean I have to remove the `\` myself afterwards? – Jack Wilkinson Jan 18 '20 at 10:17
  • A regular expression can generally *match* substrings, or *replace* part of a match, but not do both at the same time. If you want to replace the `\|`s with `|`s, you'll have to do that afterwards. – CertainPerformance Jan 18 '20 at 10:18
1

To handle escaping, you should match a backslash and the character after it as a single "item".

(?:\\.|[^|])++

This conveniently also works for escaping the backslashes themselves!

To then remove the backslashes from the results, use a simple replacement:

Replace: \\(.)
With: $1
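A small Python sketch of this approach, under a couple of assumptions: Python is just one possible host language, the possessive ++ is written as a plain + (Python only supports possessive quantifiers from 3.11, and the plain form should give the same matches here because nothing follows the group), and the names ESCAPED_FIELD and split_and_unescape are invented for the example.

```python
import re

# A field is a run of "items": an escaped character (backslash + any
# character) or any single character that is not a pipe.
ESCAPED_FIELD = re.compile(r"(?:\\.|[^|])+")

def split_and_unescape(line):
    fields = ESCAPED_FIELD.findall(line)
    # Drop the escaping backslash and keep the character it protected.
    return [re.sub(r"\\(.)", r"\1", field) for field in fields]

# An escaped backslash directly before a separator, then an escaped pipe.
print(split_and_unescape(r"|a\\|b\|c|"))
```

In this input the first field ends with an escaped backslash, so the pipe after it is still treated as a separator, while the escaped pipe between b and c stays inside its field.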
Niet the Dark Absol
1

Use:

(?:\\\||[^|])+

Demo & explanation
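For a quick check outside regex101, the pattern can be tried in Python. This is only a sketch: Python is an assumed language, and PIPE_ONLY and split_pipe_only are hypothetical names.

```python
import re

# A field is a run of escaped pipes (backslash + pipe) or single
# characters that are not a pipe.
PIPE_ONLY = re.compile(r"(?:\\\||[^|])+")

def split_pipe_only(line):
    # Returns the raw fields; escaped pipes still carry their backslash.
    return PIPE_ONLY.findall(line)

print(split_pipe_only(r"|Some League|\|Team\| vs \|Team\||1497359216693|"))

# Only pipes are treated as escapable here, so a doubled backslash in
# front of a separator is read as a backslash plus an escaped pipe.
print(split_pipe_only(r"|a\\|b|"))
```

The second call returns a single field, so this variant fits inputs where backslashes are only ever used to escape pipes.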

Toto