-1

I need help writing a regex that can separate quoted strings from the commas between them.

This is the regex I have so far:

(?<string>(?<=")(?:[^""\\]|\\.)*(?="))|(?<comma>\,)

The idea is to capture each string enclosed in quotes in one group and each separating comma in another. An example input would be:

"Dog", "cat", "Dog, cat"

The wanted output:

string group (3 elements):

  • Dog
  • cat
  • Dog, cat

comma group (2 elements):

  • ,
  • ,

The problem is that my regex pattern also matches each separating comma as a string, since it sits between a closing quote and the next opening quote, so my output is:

string group (5 elements):

  • Dog
  • ,
  • cat
  • ,
  • Dog, cat

comma group (0 elements):

Michał Turczyn
  • 1
    I'd suggest using a CSV parser instead. – juharr Oct 15 '19 at 16:39
  • 4
    So, you want a CSV parser. Regex is horrible at this kind of thing, comparatively. .NET has a [perfectly adequate CSV parser](https://docs.microsoft.com/en-us/dotnet/api/microsoft.visualbasic.fileio.textfieldparser?view=netframework-4.8) built in. And if you do a search for "C# CSV Parser" you can find a ton of other libraries that can accomplish the task far more efficiently than RegEx. – Sam Axe Oct 15 '19 at 16:40

2 Answers

1

You can try:

using System.Linq;
using System.Text.RegularExpressions;
var testString = "\"Dog\", \"cat\", \"Dog, cat\"";
var splitted = Regex.Split(testString, @"""\s*,\s*""").Select(s => s.Trim('"'));

The pattern used is "\s*,\s*", which splits only on a comma that sits directly between a closing quote and the next opening quote (", "), with optional whitespace on either side of the comma.

It also trims the " from each element after splitting, because the first and last elements would otherwise keep an extra quote :)

Michał Turczyn
-1

You should ignore the commas altogether and just match runs of non-quote characters enclosed in quotes. By calling match.NextMatch() until match.Success is false, you should get them all.
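
As a rough sketch of that loop, assuming .NET's Regex.Match and Match.NextMatch(): the class name QuotedStrings and the helper Extract are made up for illustration, and the pattern is one possible choice that does not handle backslash-escaped quotes. Because the quote characters are part of the match itself, rather than sitting in lookarounds, the text between a closing quote and the next opening quote is never matched as a string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedStrings
{
    // Hypothetical helper: returns the contents of every double-quoted run.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();

        // The quotes are consumed by the match, so scanning resumes after
        // each closing quote and the separating commas are skipped.
        // Assumption: no escaped quotes inside the strings.
        Match match = Regex.Match(input, "\"(?<string>[^\"]*)\"");
        while (match.Success)
        {
            results.Add(match.Groups["string"].Value);
            match = match.NextMatch();
        }
        return results;
    }

    public static void Main()
    {
        // Prints Dog, cat and "Dog, cat" (without quotes), one per line.
        foreach (string s in Extract("\"Dog\", \"cat\", \"Dog, cat\""))
        {
            Console.WriteLine(s);
        }
    }
}
```

If the commas are needed as well, the same loop should work with a second alternative such as |(?<comma>,) appended to the pattern, checking match.Groups["comma"].Success on each iteration.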

Martin Maat