
I am a Python dev who is stumped on how to achieve the following without writing a really ugly set of rules.

I have a list of strings, and I would like to split each string in the list by "," to achieve a list of lists of strings.

The problem is that one of the "columns" contains plaintext code delimited by double quotes (""), and I would like to ignore any "," inside that quoted string when splitting on commas.

For example (note that I am using escaped quotes inside the double quotes, so that Python does not treat them as string-delimiting quotes, which would make the string below invalid):

'This,is,an,""example, but I am not sure how to do it""'

Should split to

[ 'This', 'is', 'an', '""example, but I am not sure how to do it""' ]

If I do this naively, i.e. just splitting by comma, I would get

[ 'This', 'is', 'an', '""example', 'but I am not sure how to do it""' ]

Moreover, the csv contains a fixed number of columns, so I can easily write a rule that identifies the double quotes and then splits the string according to the commas before and after the double quoted substring, but is there a better way?
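For concreteness, the rule I have in mind would look roughly like the sketch below. The function name `split_with_quoted_column` is made up for illustration, and it assumes at most one quoted column per line, wrapped in `""` on each side as in the example above.

```python
def split_with_quoted_column(line, quote='""'):
    """Split on commas, keeping one quote-delimited column intact.

    Hypothetical helper: assumes at most one quoted column per line.
    """
    start = line.find(quote)   # start of the opening quote marker
    end = line.rfind(quote)    # start of the closing quote marker
    if start == -1 or start == end:
        # No quoted column (or an unmatched marker): plain split.
        return line.split(",")
    end += len(quote)          # move past the closing marker

    before = line[:start]
    after = line[end:]
    # Drop the single comma that separates the quoted column
    # from its neighbours, if there is one.
    if before.endswith(","):
        before = before[:-1]
    if after.startswith(","):
        after = after[1:]

    parts = before.split(",") if before else []
    parts.append(line[start:end])
    if after:
        parts.extend(after.split(","))
    return parts


example = 'This,is,an,""example, but I am not sure how to do it""'
print(split_with_quoted_column(example))
```

It works for the example, but it breaks down as soon as a line has two quoted columns, which is why I am looking for something cleaner.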

This would be a really cool thing to do with regular expressions, but I have no idea how to do that. Is there a way to pass a regular expression to the `.split(...)` method of a string object in Python, so that everything the regular expression matches is treated as a single object rather than being split by the same rule?

i.e.

'This,is,a,""test, because I don't know how to do it""'

splits to

[ 'This', 'is', 'a', <Python string object> ]

Where <Python string object> is a pointer to the string '""test, because I don't know how to do it""' in memory?

Adam Jaamour
user227837
    Use the [`csv`](https://docs.python.org/3/library/csv.html) module. Its reader classes deal with delimiters and quotechars out of the box. – user2390182 Nov 28 '17 at 10:26
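A minimal sketch of the `csv` suggestion, assuming each quoted column in the real data is wrapped in a single `"` character on each side (the usual CSV convention). The `lines` list here is illustrative data, not the asker's actual input.

```python
import csv

# Illustrative input: a list of strings, one CSV record per string.
# Assumption: the quoted column uses a single '"' on each side.
lines = [
    'This,is,an,"example, but I am not sure how to do it"',
    'This,is,a,"test, because I don\'t know how to do it"',
]

# csv.reader accepts any iterable of strings, not only file objects.
rows = list(csv.reader(lines))

for row in rows:
    print(row)
```

With the default dialect, the surrounding quotes are removed from the parsed field and the comma inside them is kept. If the raw data really has two quote characters on each side (`""...""`), the default settings are not designed for that layout, so it is worth checking the raw file before relying on this.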

0 Answers