24

I'm checking line by line in C#

Example data:

bob jones,123,55.6,,,"Hello , World",,0
jim neighbor,432,66.5,,,Andy "Blank,,1
john smith,555,77.4,,,Some value,,2

The closest I've come is a regex that picks out commas outside of quotes, but it doesn't handle the second line.

Dale K
Chris Hayes

5 Answers

55

Stand back and be amazed!


Here is the regex you seek:

(?!\B"[^"]*),(?![^"]*"\B)


Here is a demonstration:

regex101 demo


  • It does not match the second line because the " you inserted does not have a closing quotation mark.
  • It will not match values like so: ,r"a string",10 because the letter on the edge of the " will create a word boundary, rather than a non-word boundary.
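
As a quick way to try the first regex outside regex101, here is a minimal Python sketch (Python rather than C#, only because it is easy to run; the pattern itself is unchanged). The variable names are mine, and the sample line is the first line of the question's data.

```python
import re

# Pattern copied verbatim from the answer above.
COMMA_OUTSIDE_QUOTES = re.compile(r'(?!\B"[^"]*),(?![^"]*"\B)')

# First sample line from the question.
line = 'bob jones,123,55.6,,,"Hello , World",,0'

# Split only on the commas the pattern accepts; the comma inside the
# quoted field is rejected by the second lookahead.
fields = COMMA_OUTSIDE_QUOTES.split(line)
print(fields)
# ['bob jones', '123', '55.6', '', '', '"Hello , World"', '', '0']
```

The quoted field keeps its quotes, so strip them afterwards if you only want the value.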

Alternative version

(".*?,.*?"|.*?(?:,|$))

This matches each field's content together with its comma, and it works with values that are full of punctuation marks.

regex101 demo
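
To see what the alternative version actually returns, here is a small Python sketch (the names `ALTERNATIVE` and `tokens` are mine, and the input is the question's first line). It only lists the matches; it is not a full parser.

```python
import re

# Alternative pattern copied verbatim from the answer above.
ALTERNATIVE = re.compile(r'(".*?,.*?"|.*?(?:,|$))')

line = 'bob jones,123,55.6,,,"Hello , World",,0'

# The pattern has a single capture group, so findall returns that group's text.
tokens = ALTERNATIVE.findall(line)
for token in tokens:
    print(repr(token))
```

Unquoted values come back with their trailing comma attached, while a quoted value that contains a comma comes back as one token, with the comma that follows it as a separate token. You may also get an empty match at the very end of the line, so some clean-up is needed before treating the tokens as fields.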

Vasili Syrakis
  • This doesn't work when the line contains punctuation such as question marks or periods. – Chris Hayes Jan 14 '14 at 20:14
  • Where are those punctuations located in the string as an example? – Vasili Syrakis Jan 14 '14 at 21:28
  • I've put in an alternative version, hopefully that works with your code... if not, we shall come up with something else. – Vasili Syrakis Jan 15 '14 at 00:55
  • To compile correctly in C#, escape each double quote by doubling it rather than with a backslash, e.g. "" instead of \" – Jason Foglia Jun 05 '14 at 15:48
  • This may help with finding the opposite - [removing commas inside quotes](http://stackoverflow.com/a/23205667/845584) – PeterX Feb 18 '15 at 05:40
  • The alternative version does not work with CSV files where the quotes are optional. The original looks better but fails to identify the first comma in this example: _"bob jones","","123",55.6,,,"Hello , World",,0_. Unfortunately I'm not proficient enough to offer a solution :-) – giacecco Jun 25 '15 at 16:53
  • You can also use similar regex to find other strings not in quotes (?!\B"[^"]*)SOMEOTHERSTRING(?![^"]*"\B) – Andrew Downes Aug 28 '17 at 13:15
  • Found a better working solution [here](https://stackoverflow.com/a/25544437/3459910) – winklerrr Oct 20 '17 at 13:06
  • There might be a more thorough solution, but this one saved my coding interview, thanks! – Phạm Huy Phát Mar 27 '22 at 05:53
2

The regexes below parse each field in a line, not an entire line.

Apply the methodical and desperate regex technique: divide and conquer.

Case: field does not contain a quote

  • abc,
  • abc(end of line)

[^,"]*(,|$)

Case: field contains exactly two quotes

  • abc"abc,"abc,
  • abc"abc,"abc(end of line)

[^,"]*"[^"]*"[^,"]*(,|$)

Case: field contains exactly one quote

  • abc"abc(end of line)
  • abc"abc, (with no further quote before the end of the line)

[^,"]*"[^,"]*$

[^,"]*"[^,"]*,(?!.*")

Now that we have all the cases, we then '|' everything together and enjoy the resultant monstrosity.
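
For what it's worth, here is one way the combined pattern could look, sketched in Python. This is my reading of the cases above, not necessarily what the author had in mind: the constant names and the `split_fields` helper are made up for illustration, I assume the one-quote patterns are meant to allow any number of characters after the quote, and I assume single-line input.

```python
import re

# One sub-pattern per case described above (names are illustrative).
NO_QUOTE = r'[^,"]*(?:,|$)'
TWO_QUOTES = r'[^,"]*"[^"]*"[^,"]*(?:,|$)'
ONE_QUOTE_AT_END = r'[^,"]*"[^,"]*$'
ONE_QUOTE_THEN_COMMA = r'[^,"]*"[^,"]*,(?!.*")'

# '|' everything together.
FIELD = re.compile("|".join(
    [NO_QUOTE, TWO_QUOTES, ONE_QUOTE_AT_END, ONE_QUOTE_THEN_COMMA]))


def split_fields(line):
    """Consume one field at a time from left to right."""
    fields, pos = [], 0
    while True:
        match = FIELD.match(line, pos)
        if match is None:
            raise ValueError(f"no case matches at offset {pos}")
        text = match.group(0)
        if not text.endswith(","):
            # The field ran to the end of the line: it is the last one.
            fields.append(text)
            return fields
        fields.append(text[:-1])  # drop the delimiter
        pos = match.end()


print(split_fields('bob jones,123,55.6,,,"Hello , World",,0'))
print(split_fields('jim neighbor,432,66.5,,,Andy "Blank,,1'))
```

With these assumptions the stray quote on the question's second line stays inside its field instead of swallowing the rest of the line. Fields with three or more quotes are not covered by any case and raise an error here.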

Community
twj
1

Try this pattern:

".*?"(*SKIP)(*FAIL)|,

Demo
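
Note that (*SKIP)(*FAIL) are PCRE backtracking verbs; Python's built-in re module does not support them, and as far as I know neither does .NET. A roughly equivalent trick that works without them is to let the regex match quoted strings first and capture only the commas that are left over. The sketch below is that substitute technique, not the pattern above; the names `TOKEN` and `split_outside_quotes` are mine.

```python
import re

# Either a quoted string (matched and ignored) or a comma (captured in group 1).
TOKEN = re.compile(r'".*?"|(,)')


def split_outside_quotes(line):
    """Split on commas that were not swallowed by a quoted string."""
    fields, start = [], 0
    for match in TOKEN.finditer(line):
        if match.group(1) is not None:  # a comma outside quotes
            fields.append(line[start:match.start()])
            start = match.end()
    fields.append(line[start:])  # whatever follows the last delimiter
    return fields


print(split_outside_quotes('bob jones,123,55.6,,,"Hello , World",,0'))
print(split_outside_quotes('jim neighbor,432,66.5,,,Andy "Blank,,1'))
```

An unclosed quote, as on the question's second line, simply never starts a quoted match, so every comma on that line is treated as a delimiter.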

alpha bravo
0
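import re

# Sample line from the question; any CSV-like string works here.
line = 'bob jones,123,55.6,,,"Hello , World",,0'
# Drop each comma followed by an odd number of quotes, i.e. a comma inside quotes.
print(re.sub(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)', "", line))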
Michał Perłakowski
Nithin
0

The top answer, by Vasili Syrakis, does not work with negative numbers inside quotation marks, such as:

bob jones,123,"-55.6",,,"Hello , World",,0
jim neighbor,432,66.5

The following regex works for this purpose:

,(?!(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))

But it did not work for this part of the input:

,Andy "Blank,
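
A small Python sketch of both observations, in case it helps to reproduce them (the pattern is the one from this answer; the variable names are mine). The regex accepts a comma only when an even number of quotes follows it, which is why a single unmatched quote throws it off.

```python
import re

# Pattern copied from this answer: a comma NOT followed by an odd number of quotes.
COMMA_OUTSIDE = re.compile(r',(?!(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))')

balanced = 'bob jones,123,"-55.6",,,"Hello , World",,0'
print(COMMA_OUTSIDE.split(balanced))
# ['bob jones', '123', '"-55.6"', '', '', '"Hello , World"', '', '0']

# The question's second line has one unmatched quote, so every comma
# before that quote sees an odd number of quotes ahead and is skipped.
unbalanced = 'jim neighbor,432,66.5,,,Andy "Blank,,1'
print(COMMA_OUTSIDE.split(unbalanced))
# ['jim neighbor,432,66.5,,,Andy "Blank', '', '1']
```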
StilesCrisis