53

I have some input that looks like the following:

A,B,C,"D12121",E,F,G,H,"I9,I8",J,K

The comma-separated values can be in any order. I'd like to split the string on commas; however, in the case where something is inside double quotation marks, I need it to both ignore commas and strip out the quotation marks (if possible). So basically, the output would be this list of strings:

['A', 'B', 'C', 'D12121', 'E', 'F', 'G', 'H', 'I9,I8', 'J', 'K']

I've had a look at some other answers, and I'm thinking a regular expression would be best, but I'm terrible at coming up with them.

Lightness Races in Orbit
XåpplI'-I0llwlg'I -

1 Answer

74

Lasse is right; this is comma-separated values (CSV) data, so you should use the csv module. A brief example:

from csv import reader

# Test input: reader accepts any iterable of strings, so a list stands in for a file
infile = ['A,B,C,"D12121",E,F,G,H,"I9,I8",J,K']
# Real input is probably more like
# infile = open('filename', 'r', newline='')
# or use 'with open(...) as infile:' and indent the rest

for line in reader(infile):
    print(line)
# For the test input, prints
# ['A', 'B', 'C', 'D12121', 'E', 'F', 'G', 'H', 'I9,I8', 'J', 'K']
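
If the input is a single string rather than a file, the same idea can be wrapped in a small helper. This is only a sketch: the name split_csv_line is hypothetical and not part of the csv module, and it assumes the string holds exactly one CSV row with no embedded newlines.

```python
import csv


def split_csv_line(line):
    """Split one CSV-formatted string into a list of fields.

    Hypothetical helper for illustration; not part of the csv module.
    """
    # csv.reader accepts any iterable of strings, so wrap the single line
    # in a list and take the one row the reader yields.
    return next(csv.reader([line]))


fields = split_csv_line('A,B,C,"D12121",E,F,G,H,"I9,I8",J,K')
print(fields)
# ['A', 'B', 'C', 'D12121', 'E', 'F', 'G', 'H', 'I9,I8', 'J', 'K']
```

The quoted comma in "I9,I8" is kept inside one field, and the surrounding quotation marks are stripped, which is what the question asks for.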
agf
  • 1
    I am not sure this answers the question. Would the output be what OP has asked for? Where is `reader` being used here, or how should it be? – heltonbiker Nov 09 '11 at 18:54
  • 1
    @heltonbiker Yes, it gives the desired output. Please look at the last line of my answer, or run the code yourself and test it. `csv.reader` is being used in the `for` line -- it reads a line from the input iterable, and transforms it into a list of cells. – agf Nov 09 '11 at 18:56
  • Fine, just the answer looked incomplete. Thanks for caring. – heltonbiker Nov 09 '11 at 18:57
  • 7
    @heltonbiker I've had that feeling with Python too -- it feels like you're not doing anything at all sometimes, and it still works :) – agf Nov 09 '11 at 18:58
  • 1
    If you came here looking for a regex for this answer, see [this answer](https://stackoverflow.com/questions/16710076/python-split-a-string-respect-and-preserve-quotes) – Austin A Nov 21 '17 at 16:32
  • @TahreemIqbal I'm not sure what you mean. Spaces are perfectly valid in a CSV file, and this handles them fine. Can you show me an example input and output, along with what you expect the output to be you're not getting? – agf Aug 01 '18 at 13:32
  • for input string ` infile = ['abc, def, "ghi, jkl", mno, pqr']` it gives this output: `['abc', ' def', ' "ghi', ' jkl"', ' mno', ' pqr']` – Tahreem Iqbal Aug 02 '18 at 06:00
  • @TahreemIqbal That's because your field isn't `"ghi, jkl"` it's ` "ghi, jkl"`. The leading space invalidates the quote. The space inside the quotes is handled fine. Edit: Can't get the second example to format correctly. – agf Aug 03 '18 at 13:58
  • 1
    You could fix the issue with spaces after the commas by passing `skipinitialspace=True` to `reader`. – Blckknght Jan 13 '19 at 18:27
  • not working for complex combinations such as: 'priceRanges: "[{\\"min\\":0,\\"max\\":889}]"' – Nir O. Sep 01 '21 at 00:33
  • @NirO. That appears to be JSON with another, escaped, snippet of JSON in a string inside of it. Not at all related to the CSV data this question was about. – agf Sep 01 '21 at 01:38
  • The csv module solution is inferior in any case, and I would turn to better solutions that cover more extensive use cases. When you have a mediocre solution and a good one, it's good practice to take the good one. – Nir O. Sep 02 '21 at 09:33
  • @NirO. I think you've got it backwards. If you have CSV data, then the built in CSV parser is better than something hand rolled. It is the good solution, for that use-case. Your data isn't CSV, so this answer just doesn't apply at all. – agf Sep 03 '21 at 01:53
  • in case you need to have a solution which works in every programming language having support for regular expressions, have a look here: https://stackoverflow.com/questions/17939307/regular-expression-for-comma-based-splitting-ignoring-commas-inside-quotes – Aydin K. Sep 25 '21 at 22:20
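
The exchange above about spaces after the commas can be reproduced with a short sketch. It reuses the input string from Tahreem Iqbal's comment and the `skipinitialspace=True` option that Blckknght suggests; the variable names are only illustrative.

```python
import csv

line = 'abc, def, "ghi, jkl", mno, pqr'

# Default dialect: the space before the opening quote means the field does
# not start with a quote character, so the quote is treated as literal text
# and the comma inside it splits the field.
default_row = next(csv.reader([line]))
print(default_row)
# ['abc', ' def', ' "ghi', ' jkl"', ' mno', ' pqr']

# skipinitialspace=True ignores whitespace immediately after each delimiter,
# so the quoted field is recognized and kept whole.
spaced_row = next(csv.reader([line], skipinitialspace=True))
print(spaced_row)
# ['abc', 'def', 'ghi, jkl', 'mno', 'pqr']
```

Only whitespace directly after a delimiter is skipped; the space inside the quoted field is preserved.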