I am a Python dev who is stumped on how to achieve the following without writing a really ugly set of rules.
I have a list of strings, and I would like to split each string in the list by "," to achieve a list of lists of strings.
The problem is that one of the "columns" contains plaintext code delimited by doubled double quotes (""), and I would like to ignore any "," inside that delimited span when splitting by comma.
For example (note that the quotes inside the string are doubled, so that Python does not treat them as string-delimiting quotes, which would make the line below an invalid string):
'This,is,an,""example, but I am not sure how to do it""'
Should split to
[ 'This', 'is', 'an', '""example, but I am not sure how to do it""' ]
If I do this naively, i.e. just splitting by comma, I would get
[ 'This', 'is', 'an', '""example', 'but I am not sure how to do it""' ]
Moreover, the CSV contains a fixed number of columns, so I could easily write a rule that identifies the double quotes and then splits the string at the commas before and after the double-quoted substring, but is there a better way?
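For reference, the rule-based approach I have in mind could be sketched roughly like this. It is only a sketch: `split_fixed`, `n_before` and `n_after` are names I made up for illustration, and it assumes that only the quoted column can contain commas and that every line has the expected number of columns.

```python
def split_fixed(line, n_before, n_after):
    """Split a line that has n_before plain columns, then one column that
    may contain commas, then n_after plain columns (hypothetical helper)."""
    # Peel off the first n_before columns; whatever remains stays in one piece.
    head = line.split(",", n_before)
    before, rest = head[:n_before], head[n_before]
    if n_after == 0:
        return before + [rest]
    # Peel the trailing columns off from the right, leaving the quoted
    # column (commas and all) as a single item.
    return before + rest.rsplit(",", n_after)


line = 'This,is,an,""example, but I am not sure how to do it""'
print(split_fixed(line, 3, 0))
```

It works for the example above, but it depends on knowing the column layout in advance and raises an IndexError on a line with too few columns, which is why I am hoping for something more general.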
This would be a really cool thing to do with regular expressions, but I have no idea how to do that. Is there a way to pass a regular expression to the ".split(...)" function on the string object in Python to tell it to treat the entire substring matched by the regular expression as a single object, rather than splitting that substring according to the same rule?
i.e.
'This,is,a,""test, because I don't know how to do it""'
splits to
[ 'This', 'is', 'a', <Python string object> ]
where <Python string object> is a pointer to the string '""test, because I don't know how to do it""' in memory.
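To make the regular-expression idea concrete: str.split only accepts a literal separator, so anything regex-based has to go through the re module. Below is a rough sketch of what I imagine, matching the fields with re.findall instead of splitting on the commas. The names `FIELD` and `split_fields` are just placeholders, and it assumes the quoted column always starts and ends with the doubled quotes and contains no doubled quotes of its own. Empty columns would be silently dropped.

```python
import re

# A field is either a span wrapped in doubled quotes (commas allowed inside)
# or a run of characters that contains no comma.
FIELD = re.compile(r'"".*?""|[^,]+')


def split_fields(line):
    # findall returns each matched field in order; the separating commas
    # match neither alternative and are skipped.
    return FIELD.findall(line)


line = 'This,is,an,""example, but I am not sure how to do it""'
print(split_fields(line))
# ['This', 'is', 'an', '""example, but I am not sure how to do it""']
```

The quoted alternative is listed first so that it wins whenever a field starts with the doubled quotes.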