I am using the code below to read a CSV file into a DataFrame. However, I get the error

pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

so I changed pd.read_csv('D:/TRYOUT.csv') to pd.read_csv('D:/TRYOUT.csv', error_bad_lines=False), as suggested here. However, I now get the following error in the same line:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 1: invalid continuation byte

import pandas as pd

def ExcelFileReader():
    mergedf = pd.read_csv('D:/TRYOUT.csv', error_bad_lines=False)
    return mergedf
Henry Ecker
user131983

3 Answers

If you're on Windows, you probably need to use pd.read_csv(filename, encoding='latin-1').
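
As a minimal sketch of why this can help: the sample bytes below are hypothetical (not the asker's file) and are encoded as latin-1, so they contain the byte 0xf0 from the error message. Decoding them as UTF-8 fails, while passing encoding='latin-1' to pd.read_csv loads them.

```python
import io

import pandas as pd

# Hypothetical sample data, encoded as latin-1. The letter "ð" becomes the
# single byte 0xf0, which is not valid UTF-8 when followed by "r".
raw = "id,name\n1,Gu\u00f0r\u00fan\n".encode("latin-1")

# Decoding the same bytes as UTF-8 fails, as in the question.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Telling pandas the actual encoding lets the data load.
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(utf8_failed)
print(df)
```

Note that latin-1 maps every possible byte to a character, so it never raises a decode error. If the file is really in some other encoding, the read will succeed but some characters may come out wrong.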

maxymoo

I had a similar problem and had to use

utf-8-sig

as the encoding.

I used utf-8-sig because a Latin-only encoding such as latin-1 won't handle non-Latin characters correctly if you ever get them. There are a few ways of getting around the problem, so you can choose whichever best suits your needs.
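
To illustrate the difference, here is a small sketch using only Python's built-in codecs. The sample bytes are hypothetical: a UTF-8 file that starts with a byte order mark (BOM) and contains a non-Latin value. The same codec name can be passed to pd.read_csv as encoding='utf-8-sig'.

```python
# Hypothetical sample: UTF-8 text with a leading BOM and a non-Latin value.
text = "name\n\u6771\u4eac\n"
raw = b"\xef\xbb\xbf" + text.encode("utf-8")

as_utf8 = raw.decode("utf-8")          # decodes, but keeps the BOM as "\ufeff"
as_utf8_sig = raw.decode("utf-8-sig")  # decodes and strips the BOM
as_latin1 = raw.decode("latin-1")      # never fails, but the text is garbled

print(repr(as_utf8[:5]))
print(repr(as_utf8_sig[:5]))
print(as_latin1 == text)
```

This only helps when the file really is UTF-8. If the bytes are not valid UTF-8, utf-8-sig raises the same UnicodeDecodeError as utf-8.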

Hope that helps.

Vishal Jethwa

If you would like to skip the rows that raise errors and ignore the malformed data, you need to use:

pd.read_csv(file_path, encoding="utf8", error_bad_lines=False, encoding_errors="ignore")
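
In pandas 1.3 and later, error_bad_lines is deprecated in favour of on_bad_lines, and it was removed in pandas 2.0. A sketch of the equivalent call, assuming pandas 1.3 or newer and using hypothetical in-memory sample data in place of file_path:

```python
import io

import pandas as pd

# Hypothetical sample: the line "3,4,5" has too many fields, and the last
# line contains the byte 0xf0, which is not valid UTF-8 in this position.
raw = b"a,b\n1,2\n3,4,5\n6,\xf0x\n"

df = pd.read_csv(
    io.BytesIO(raw),
    encoding="utf8",
    on_bad_lines="skip",       # drop rows with too many fields
    encoding_errors="ignore",  # silently drop undecodable bytes
)
print(df)
```

Both options discard data silently, so it is worth checking how many rows were dropped before relying on the result.
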
j__carlson