
I have a number of supposed CSVs, but in fact some of their rows have different numbers of fields. I would like to find out which rows these are and look at them. If the CSVs weren't broken, I would just use pandas and do:

import pandas as pd

df = pd.read_csv("file.csv")

But this isn't suitable for the data cleaning and preprocessing I need to do.

How can I find the number of fields in each row in a "csv" file? Is it, for example, possible to just read in one row at a time, without remembering the number of fields from previous rows?
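One pandas-free way to do this is to let `csv.reader` parse the input and count the fields row by row; each row is parsed independently, so nothing is remembered from previous rows. A minimal sketch (the sample data is made up, and `StringIO` stands in for an open file):

```python
import csv
from io import StringIO

# Hypothetical sample standing in for a broken "csv" file.
data = "a,b,c\n1,2\n3,4,5,6\n"

# csv.reader yields one parsed row at a time, so each row's
# field count is independent of the rows before it.
with StringIO(data) as f:
    field_counts = [len(row) for row in csv.reader(f)]

print(field_counts)  # -> [3, 2, 4]
```

For a real file, replace `StringIO(data)` with `open("file.csv", newline="")`.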

graffe
  • Why the downvote? – graffe Jun 08 '18 at 20:54
  • You can visually get a list of all "bad" lines by calling `pd.read_csv('file.csv',error_bad_lines=False)`. I am not sure you can store it in a variable for further processing. – DYZ Jun 08 '18 at 21:07
  • [Possible duplicate](https://stackoverflow.com/questions/32334966/pandas-bad-lines-warning-capture). – DYZ Jun 08 '18 at 21:09

2 Answers


CSV is not a fully defined standard, so, staying close to RFC 4180, you can do something like this:

import re

with open('file.csv', 'r') as f:
    # Mask commas that appear inside double-quoted fields, then
    # count the remaining (field-separating) commas on each line.
    print([re.sub(r'("[^"]*),([^"]*")', r'\1<comma>\2', line).count(',')
           for line in f])

which counts the commas on each line after masking the ones enclosed in double quotes (the number of fields is the comma count plus one).
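For illustration, here is how the masking behaves on a single line (the sample lines are mine). One caveat worth knowing: because `[^"]*` also matches commas and `re.sub` does not rescan replaced text, a quoted field containing more than one comma only gets its last comma masked, so the count can come out too high on such lines.

```python
import re

pattern = r'("[^"]*),([^"]*")'

line = 'x,"a,b",y'
masked = re.sub(pattern, r'\1<comma>\2', line)
print(masked)                 # x,"a<comma>b",y
print(masked.count(',') + 1)  # 3 fields

# Caveat: with two commas inside the quotes, only the last one
# is masked, so this line would be over-counted.
tricky = 'x,"a,b,c",y'
print(re.sub(pattern, r'\1<comma>\2', tricky))  # x,"a,b<comma>c",y
```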

Diego Torres Milano

It seems the following works.

import csv

def f(s):
    # Parse with csv.reader so quoted commas are handled correctly,
    # then count the fields in each row. A list comprehension
    # materialises the result (map is lazy in Python 3).
    return [len(row) for row in csv.reader(s.split("\n"))]
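Building on this idea, here is a self-contained sketch of the question's actual goal, flagging the rows whose field count deviates (the sample data and the "expect the header's count" rule are my assumptions; note that splitting a string on `"\n"` breaks quoted fields containing embedded newlines, so for real files it is safer to pass the open file object to `csv.reader` directly):

```python
import csv
from io import StringIO

# Hypothetical broken file; StringIO stands in for open("file.csv", newline="").
data = 'a,b,c\n1,2,3\n4,5\n"x,y",z,7,8\n'

rows = list(csv.reader(StringIO(data)))
expected = len(rows[0])  # assume the header has the right field count

for i, row in enumerate(rows):
    if len(row) != expected:
        print(f"row {i}: {len(row)} fields: {row}")
```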
graffe