609

I'm running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error...

File "C:\Importer\src\dfman\importer.py", line 26, in import_chr
     data = pd.read_csv(filepath, names=fields)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
     return _read(filepath_or_buffer, kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
     return parser.read()
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
     ret = self._engine.read(nrows)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 1028, in read
     data = self._reader.read(nrows)
File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas\parser.c:6745)
File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:6964)
File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas\parser.c:7780)
File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:8793)
File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas\parser.c:9484)
File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas\parser.c:10642)
File "parser.pyx", line 1046, in pandas.parser.TextReader._string_convert (pandas\parser.c:10853)
File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas\parser.c:15657)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 6: invalid continuation byte

The source/creation of these files all come from the same place. What's the best way to correct this to proceed with the import?

Tiago Martins Peres
  • 12,598
  • 15
  • 77
  • 116
TravisVOX
  • 17,832
  • 13
  • 32
  • 37
  • 1
    Adding this argument worked for me: ```encoding_errors='ignore'``` – matthewpark319 Dec 06 '21 at 22:11
  • 1
    Ignoring errors should be your very last desperate resort. Similarly, randomly guessing which encoding to use might superficially remove the error, but produce total garbage in the results. If you are new to text encodings, perhaps start your next adventure with the [Stack Overflow `character-encoding` tag info page](http://stackoverflow.com/tags/character-encoding/info) – tripleee Apr 06 '22 at 16:59

25 Answers

1095

read_csv takes an encoding option to deal with files in different encodings. I mostly use read_csv('file', encoding="ISO-8859-1"), or alternatively encoding="utf-8" for reading, and generally utf-8 for to_csv.

You can also use one of several alias options like 'latin' or 'cp1252' (Windows) instead of 'ISO-8859-1' (see the Python docs, which also list numerous other encodings you may encounter).

See the relevant Pandas documentation, the Python docs examples on csv files, and plenty of related questions here on SO. A good background resource is What every developer should know about unicode and character sets.

To detect the encoding (assuming the file contains non-ASCII characters), you can use enca (see man page) or file -i (Linux) or file -I (OS X) (see man page).
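As a minimal sketch (the file name and the ISO-8859-1 choice are placeholders for whatever encoding you detect):

import pandas as pd

# read with the encoding you detected, e.g. via `file -i data.csv` or enca
df = pd.read_csv('data.csv', encoding='ISO-8859-1')

# write back out as UTF-8 so later reads work with the default encoding
df.to_csv('data_utf8.csv', index=False, encoding='utf-8')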

Stefan
  • 38,755
  • 12
  • 69
  • 79
  • 16
    Since this is a Windows issue, `cp1252` might be preferable to `iso-8859-1`. – tzot Jun 07 '17 at 09:28
  • Maybe converting the file encoding to UTF-8 is easier using MS Excel. You may sometimes not know what encoding the file holds. – Rahul Bali Sep 16 '17 at 20:48
  • 30
    Thanks `pd.read_csv('immigration.csv', encoding = "ISO-8859-1", engine='python')` worked for me – Mona Jalal Apr 01 '18 at 19:03
  • 16
    Don't blindly assume a certain encoding is the right one just because no exception is thrown. You need to look at the strings and figure out whether the interpretation makes sense. For example, if you get "hors d’½uvre" instead of "hors d’œuvre" you probably need to switch from ISO-8859-1 to ISO-8859-15. – Joachim Wagner May 14 '18 at 08:03
  • 19
    for me the encoding was `ANSI`. To figure it out, I opened the csv in `notepad` then clicked `save as`; it shows the encoding beside the save button. – Vaibhav Vishal Dec 17 '18 at 12:11
  • 1
    On Windows 10, `encoding = "utf-8"` did not work, while `encoding = "ISO-8859-1"` did. Seems challenging to remember. Any way to set this as a default for `pandas`? – Wassadamo Jan 08 '19 at 18:22
  • @Wassadamo not that I'm aware of other than forking and modifying the source. You may find the `latin1` alias easier. – Stefan Mar 22 '19 at 14:49
  • 1
    Can someone please explain why utf-8 isn't working but ISO-8859-1? – Ben Jun 14 '19 at 06:21
  • 8
    @Ben Here's a good resource [What every developer should know about unicode and character sets](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – Stefan Jun 14 '19 at 13:53
  • 1
  • On MacOS, I used `file -I` as you mentioned and it stated the charset as `utf-16le`. When I read this csv using `read_csv` with the `encoding` parameter set to either `latin` or `utf-8` or any such value, my DataFrame always had NULL values. Only when I specified the option as `encoding="utf-16le"` was I able to get the correct values in my DataFrame. Hoping that this experience of mine can help someone else who might encounter a similar issue on MacOS – Sarfraaz Ahmed Feb 06 '21 at 14:13
  • Taking ```encoding = "ISO-8859-1"``` worked super fine for me – Prasad Nadiger Mar 21 '21 at 06:15
  • ISO-8859-1 has the unique property among Python encodings that it can read absolutely any binary data without throwing an error. Quite often, beginners will try different encodings without actually looking at their resulting data, then wonder why their results are junk when they picked the wrong one. – tripleee Apr 06 '22 at 17:03
185

Simplest of all Solutions:

import pandas as pd
df = pd.read_csv('file_name.csv', engine='python')

Alternate Solution:

  • Open the csv file in Sublime text editor or VS Code.
  • Save the file in utf-8 format.

In sublime, Click File -> Save with encoding -> UTF-8

Then, you can read your file as usual:

import pandas as pd
data = pd.read_csv('file_name.csv', encoding='utf-8')

Other encodings you can try are:

encoding = "cp1252"
encoding = "ISO-8859-1"
Gil Baggio
  • 11,049
  • 3
  • 46
  • 33
  • 22
    The question explains that there are 30,000 such files. Opening each file manually would not be practical. – Keith Jan 02 '18 at 20:06
  • 10
    well at least for one file, this seemed to work for me! – apil.tamang May 31 '18 at 14:43
  • 1
    The C engine is evidently more forgiving in what it accepts. For a particular CSV file that opens fine with `encoding='iso-8859-1'`, using instead `engine='python'` throws `_csv.Error: field larger than field limit (131072)`. – Greg Bacon Jan 10 '20 at 22:31
  • 4
    alternative solution to use save with encoding was really helpful! here's how to use it for VSCode https://stackoverflow.com/questions/30082741/change-the-encoding-of-a-file-in-visual-studio-code – brownmagik352 Feb 13 '20 at 00:20
  • 4
    Thanks for the tip, saving the CSV file via _VS Code_ did the trick for me. – talha06 Aug 10 '20 at 22:21
  • Yeah this trick works, both in Sublime and VS code..@talha06 – Gil Baggio Aug 14 '20 at 14:15
33

Pandas allows you to specify the encoding, but it does not allow you to ignore errors or automatically replace the offending bytes. So there is no one-size-fits-all method; the right approach depends on the actual use case.

  1. You know the encoding, and there is no encoding error in the file. Great: you just have to specify the encoding:

    file_encoding = 'cp1252'        # set file_encoding to the file encoding (utf8, latin1, etc.)
    pd.read_csv(input_file_and_path, ..., encoding=file_encoding)
    
  2. You do not want to be bothered with encoding questions, and only want that damn file to load, no matter if some text fields contain garbage. OK, you only have to use Latin1 encoding because it accepts any possible byte as input (and converts it to the Unicode character of the same code):

    pd.read_csv(input_file_and_path, ..., encoding='latin1')
    
  3. You know that most of the file is written with a specific encoding, but it also contains encoding errors. A real-world example is a UTF-8 file that has been edited with a non-UTF-8 editor and which contains some lines with a different encoding. Pandas has no provision for special error processing, but the Python open function does (assuming Python 3), and read_csv accepts a file-like object. Typical errors parameters to use here are 'ignore', which just suppresses the offending bytes, or (IMHO better) 'backslashreplace', which replaces the offending bytes with their Python backslashed escape sequence:

    file_encoding = 'utf8'        # set file_encoding to the file encoding (utf8, latin1, etc.)
    input_fd = open(input_file_and_path, encoding=file_encoding, errors = 'backslashreplace')
    pd.read_csv(input_fd, ...)
    
Serge Ballesta
  • 136,215
  • 10
  • 111
  • 230
20
with open('filename.csv') as f:
    print(f)

After executing this code you will find the encoding of 'filename.csv' (note that this is the encoding Python chose to open the file with, which may not be the file's actual encoding). Then execute code as follows:

data = pd.read_csv('filename.csv', encoding="encoding as you found earlier")

There you go.

bhavesh
  • 217
  • 2
  • 2
16

This is a more general script approach for the stated question.

import pandas as pd

encoding_list = ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737'
                 , 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862'
                 , 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950'
                 , 'cp1006', 'cp1026', 'cp1125', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254'
                 , 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr'
                 , 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2'
                 , 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2'
                 , 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9'
                 , 'iso8859_10', 'iso8859_11', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab'
                 , 'koi8_r', 'koi8_t', 'koi8_u', 'kz1048', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2'
                 , 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32'
                 , 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig']

for encoding in encoding_list:
    try:
        # probe only a few rows with each candidate encoding
        df = pd.read_csv(path, encoding=encoding, nrows=5)
    except Exception:
        continue
    print(encoding, ':\n', df.head())

One starts with all the standard encodings available for the Python version (in this case Python 3.7 standard encodings). A usable Python list of the standard encodings for the different Python versions is provided here: Helpful Stack overflow answer

Trying each encoding on a small chunk of the data and printing only the working encodings makes the output directly obvious. It also addresses the problem that an encoding like 'latin1' that runs through without any error does not necessarily produce the wanted outcome.

In case of the question, I would try this approach on one problematic CSV file and then maybe try to use the working encoding it finds for all the others.

Fledias weh
  • 171
  • 1
  • 4
12

Please try to add

import pandas as pd
df = pd.read_csv('file.csv', encoding='unicode_escape')

This will help. Worked for me. Also, make sure you're using the correct delimiter and column names.

You can start by loading just 1000 rows to load the file quickly, as in the sketch below.
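A minimal sketch of that (the file name and the row count are placeholders):

import pandas as pd

# load only the first 1000 rows as a quick check that the encoding works
df = pd.read_csv('file.csv', encoding='unicode_escape', nrows=1000)
print(df.head())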

Prakhar Rathi
  • 521
  • 6
  • 17
9

Try changing the encoding. In my case, encoding = "utf-16" worked.

df = pd.read_csv("file.csv",encoding='utf-16')

ah bon
  • 7,903
  • 7
  • 43
  • 86
Kunal Kumar
  • 959
  • 11
  • 12
  • 1
    `with open (file) as f: encoding = f.encoding` returned `cp1252` for the file I was working with. However, specifying `encoding = 'utf-16'` is what opened the file for me without any errors. Also, I set `engine = 'python'`. Thanks for the suggestion. – datalifenyc Jul 22 '20 at 11:59
  • 1
    utf-16 worked for me! All cases above - did not( – Victoria Dec 24 '20 at 11:02
8

In my case, a file has UCS-2 LE BOM encoding, according to Notepad++. For Python, that is encoding="utf_16_le".
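A minimal sketch (the file name is a placeholder; plain 'utf-16' also works and strips the BOM automatically):

import pandas as pd

df = pd.read_csv('file.csv', encoding='utf_16_le')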

Hope it helps someone find an answer a bit faster.

UserAG
  • 132
  • 6
6

Try specifying the engine='python'. It worked for me but I'm still trying to figure out why.

df = pd.read_csv(input_file_path, ..., engine='python')
Jan33
  • 119
  • 1
  • 6
  • This also worked for me. So did encoding = "ISO-8859-1". It's definitely an encoding issue. If a special character is encoded in ANSI, such as an ellipse character (i.e. "..."), and you try to read it in UTF-8, you may get an error. Bottom line is you must know the encoding the file was created with. – Sean McCarthy Aug 11 '19 at 16:03
  • "Encoded in ANSI" is not well-defined. The ANSI term on Windows is a misnomer, and doesn't name any specific encoding; rather, it selects a encoding defined elsewhere in the system's settings. – tripleee Apr 23 '22 at 08:50
5

In my case this worked for python 2.7:

data = read_csv(filename, encoding = "ISO-8859-1", dtype={'name_of_colum': unicode}, low_memory=False) 

And for python 3, only:

data = read_csv(filename, encoding = "ISO-8859-1", low_memory=False) 
5

You can try this.

import csv
import pandas as pd
df = pd.read_csv(filepath,encoding='unicode_escape')
Dileep Dominic
  • 401
  • 7
  • 19
4

Struggled with this a while and thought I'd post on this question as it's the first search result. Adding the encoding="iso-8859-1" argument to pandas read_csv didn't work, nor did any other encoding; it kept giving a UnicodeDecodeError.

If you're passing a file handle to pd.read_csv(), you need to put the encoding argument on the file open, not in read_csv. Obvious in hindsight, but a subtle error to track down.
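A minimal sketch of that (the path and encoding are placeholders):

import pandas as pd

# the encoding goes on open(); read_csv then receives an already-decoded stream
with open('file.csv', encoding='iso-8859-1') as f:
    df = pd.read_csv(f)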

Tshilidzi Mudau
  • 6,512
  • 6
  • 35
  • 45
J. Ternent
  • 96
  • 6
4

I am posting an answer to provide an updated solution and an explanation of why this problem can occur. Say you are getting this data from a database or an Excel workbook. If you have special characters like in La Cañada Flintridge city, then unless you export the data using UTF-8 encoding, you're going to introduce errors. La Cañada Flintridge city will become La Ca\xf1ada Flintridge city. If you use pandas.read_csv without any adjustments to the default parameters, you'll hit the following error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 5: invalid continuation byte

Fortunately, there are a few solutions.

Option 1, fix the exporting. Be sure to use UTF-8 encoding.

Option 2, if fixing the exporting problem is not available to you and you need to use pandas.read_csv, be sure to include the following parameter: engine='python'. By default, pandas uses engine='c', which is great for reading large clean files but will crash if anything unexpected comes up. In my experience, setting encoding='utf-8' has never fixed this UnicodeDecodeError. Also, you do not need to use error_bad_lines; however, that is still an option if you REALLY need it.

pd.read_csv(<your file>, engine='python')

Option 3: this is my personally preferred solution. Read the file using vanilla Python.

import pandas as pd

data = []

with open(<your file>, "rb") as myfile:
    # read the header separately
    # decode it as 'utf-8', remove any special characters, and split it on the comma (or delimiter)
    header = myfile.readline().decode('utf-8').replace('\r\n', '').split(',')
    # read the rest of the data
    for line in myfile:
        row = line.decode('utf-8', errors='ignore').replace('\r\n', '').split(',')
        data.append(row)

# save the data as a dataframe
df = pd.DataFrame(data=data, columns = header)

Hope this helps people encountering this issue for the first time.

UserAG
  • 132
  • 6
Jon
  • 1,997
  • 1
  • 21
  • 33
  • The correct solution in this case would be to establish the correct encoding. Based on a single character, it's hard to say authoritatively, but 0xF1 corresponds to **ñ** in e.g. Latin-1. https://tripleee.github.io/8bit/#f1 has a listing of which character this byte represents in all 8-bit encodings supported by Python. `errors="ignore"` is a supremely bad idea if you care about the integrity of your data. – tripleee Apr 22 '22 at 05:58
  • @tripleee good point, but that is a solution that is not always available. Sometimes you are given a data set with much say as to how it’s exported – Jon Apr 25 '22 at 21:44
  • I don't see how your comment is relevant. If you can export the file in a known encoding yourself, then of course you don't need to figure out the encoding. If you are given a file with an unknown encoding, figuring out which encoding it's in is vastly preferable over blindly guessing, or brutally removing any data which isn't UTF-8. This still assumes that the file is not corrupted or something, in which case you would need to find and fix the corruption manually _in addition to_ figuring out the correct encoding. – tripleee Apr 26 '22 at 06:35
3

I had trouble opening a CSV file in simplified Chinese downloaded from an online bank. I tried latin1, iso-8859-1, and cp1252, all to no avail.

But pd.read_csv("", encoding='gbk') simply does the job.

Luk Aron
  • 983
  • 6
  • 24
  • None of the encodings you tried have any support for representing Chinese text. GBK is still moderately popular in China, where it is a national standard, but not used much anywhere else. See also https://en.wikipedia.org/wiki/GBK_(character_encoding) – tripleee Apr 22 '22 at 06:01
3

Another important issue that I faced which resulted in the same error was:

_values = pd.read_csv("C:\Users\Mujeeb\Desktop\file.xlxs")

^This line resulted in the same error because I was reading an Excel file with the read_csv() method. Use read_excel() for reading .xlsx files.
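A minimal sketch (assuming the file really is an .xlsx workbook, which is an assumption here; the raw string also avoids the unescaped backslashes in the Windows path):

import pandas as pd

# read Excel workbooks with read_excel(), not read_csv();
# requires an Excel engine such as openpyxl to be installed
_values = pd.read_excel(r"C:\Users\Mujeeb\Desktop\file.xlsx")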

Community
  • 1
  • 1
Mujeeb Ishaque
  • 1,607
  • 20
  • 13
2

This answer seems to be the catch-all for CSV encoding issues. If you are getting a strange encoding problem with your header like this:

>>> f = open(filename,"r")
>>> reader = DictReader(f)
>>> next(reader)
OrderedDict([('\ufeffid', '1'), ... ])

Then you have a byte order mark (BOM) character at the beginning of your CSV file. This answer addresses the issue:

Python read csv - BOM embedded into the first key

The solution is to load the CSV with encoding="utf-8-sig":

>>> f = open(filename,"r", encoding="utf-8-sig")
>>> reader = DictReader(f)
>>> next(reader)
OrderedDict([('id', '1'), ... ])
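For pandas, the same codec applies (a minimal sketch, reusing the filename variable from above):

import pandas as pd

# utf-8-sig strips the BOM if one is present
df = pd.read_csv(filename, encoding="utf-8-sig")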

Hopefully this helps someone.

nbwoodward
  • 2,206
  • 1
  • 14
  • 21
2

I am posting an update to this old thread. I found one solution that worked, but it requires opening each file. I opened my csv file in LibreOffice and chose Save As > Edit filter settings. In the drop-down menu I chose UTF-8 encoding. Then I added encoding="utf-8-sig" to the read: data = pd.read_csv(r'C:\fullpathtofile\filename.csv', sep=',', encoding="utf-8-sig").

Hope this helps someone.

Nisse Engström
  • 4,636
  • 22
  • 26
  • 40
tshirtdr1
  • 39
  • 1
  • 6
1

I am using Jupyter Notebook, and in my case it was showing the file in the wrong format. The 'encoding' option was not working, so I saved the csv in utf-8 format, and it works.

1

Check the encoding before you pass to pandas. It will slow you down, but...

with open(path, 'r') as f:
    encoding = f.encoding 

df = pd.read_csv(path,sep=sep, encoding=encoding)

In python 3.7

DaveP
  • 200
  • 2
  • 13
  • This will just fail earlier if Python is unable to divine the encoding. On modern Python 3 on non-Windows platforms, this will default to UTF-8, and fail if the file is not compatible. On Windows, this will default to whatever the system is configured to default to, so it might work if you are lucky, or produce garbage if you are less lucky. – tripleee Apr 22 '22 at 06:02
1

You can always try to detect the encoding of the file first, with chardet or cchardet:

from pathlib import Path
import chardet

filename = "file_name.csv"
detected = chardet.detect(Path(filename).read_bytes())
# detected is something like {'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}

encoding = detected.get("encoding")
assert encoding, "Unable to detect encoding, is it a binary file?"

df = pd.read_csv(filename, encoding=encoding)

mikey
  • 1,442
  • 1
  • 13
  • 19
  • This helped me get out of tricky situation where I was assuming my json file was utf-8, but turns out it was utf-16 all along. Thanks for sharing! – Alain Apr 25 '22 at 21:19
0

Try this:

import pandas as pd
with open('filename.csv') as f:
    data = pd.read_csv(f)

Looks like it will take care of the encoding without expressing it explicitly through an argument.

Ke Xu
  • 1
0

You can try with:

df = pd.read_csv('./file_name.csv', encoding='gbk')

ah bon
  • 7,903
  • 7
  • 43
  • 86
0

Sometimes the problem is with the .csv file itself; the file may be corrupted. When faced with this issue, 'Save As' the file as csv again:

1. Open the xls/csv file
2. Go to -> File
3. Click -> Save As
4. Write the file name
5. Choose 'file type' as -> CSV [very important]
6. Click -> Ok
Om Rastogi
  • 123
  • 1
  • 7
0

Pandas does not automatically replace the offending bytes; you have to change the encoding yourself. In my case, changing the encoding parameter from encoding = "utf-8" to encoding = "utf-16" resolved the issue.
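A minimal sketch (the file name is a placeholder):

import pandas as pd

# the file turned out to be UTF-16 rather than UTF-8
df = pd.read_csv('file.csv', encoding='utf-16')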

Reza Rahimi
  • 489
  • 6
  • 6
0

In my case, I could not manage to overcome this issue using any of the methods provided above. Changing the encoding to utf-8, utf-16, iso-8859-1, or any other type somehow did not work.

But instead of using pd.read_csv(filename, delimiter=';'), I used:

pd.read_csv(open(filename, 'r'), delimiter=';')

and things seem to work just fine.