10

I need to import a CSV file in Python on Windows. My file is delimited by ';' and has strings with non-English symbols and commas (',').

I've read posts:

Importing a CSV file into a sqlite3 database table using Python

Python import csv to list

When I run:

with open('d:/trade/test.csv', 'r') as f1:
    reader1 = csv.reader(f1)
    your_list1 = list(reader1)

I get an issue: comma is changed to '-' symbol.

When I try:

df = pandas.read_csv(csvfile)

I got errors:

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 2.

Please help. I would prefer to use pandas as the code is shorter without listing all field names from the CSV file.

I understand there is a workaround of temporarily replacing the commas. Still, I would like to solve it with pandas parameters.

Alexei Martianov

5 Answers

15

Pandas solution: use read_csv with the regex separator [;,], which splits on both ';' and ','. You need to add engine='python'; otherwise pandas emits this warning:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

import pandas as pd
import io

temp=u"""a;b;c
1;1,8
1;2,1
1;3,6
1;4,3
1;5,7
"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep="[;,]", engine='python')
print(df)

   a  b  c
0  1  1  8
1  1  2  1
2  1  3  6
3  1  4  3
4  1  5  7
jezrael
2

Pandas documentation says for parameters:

pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

sep : str, default ‘,’

    Delimiter to use. If sep is None, will try to automatically determine this.

Pandas did not parse my file delimited by ';' because the default for sep is ',' rather than None (automatic detection). Passing sep=';' to read_csv fixed the issue.
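A minimal sketch of that fix, assuming a small in-memory sample in place of the real file (the column names and values are made up for illustration). The decimal=',' argument is an optional extra not mentioned above; it is only useful if the file also holds numbers written with a decimal comma.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real file: ';' separates the
# fields, while ',' appears inside text and as a decimal mark.
sample = "name;note;price\nМария;red, large;1,8\nJosé;blue, small;2,1\n"

# sep=';' splits on semicolons only, so commas inside fields are kept.
# decimal=',' (an addition, not part of the answer) parses "1,8" as 1.8.
df = pd.read_csv(io.StringIO(sample), sep=";", decimal=",")
print(df)
```

After testing, io.StringIO(sample) would be replaced with the file path.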

Alexei Martianov
1

Unless your CSV file is broken, you can let the csv module guess the format.

import csv

with open('d:/trade/test.csv', 'r') as f1:
    dialect = csv.Sniffer().sniff(f1.read(1024))
    f1.seek(0)
    r = csv.reader(f1, dialect=dialect)
    for row in r:
        print(row)
totoro
0

Try specifying the encoding. You will need to find out which encoding the file you are reading actually uses.

I have used ASCII for this example, but it could be different.

df = pd.read_csv(fname, encoding='ascii')
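As a rough illustration of why the encoding matters for non-English text, the sketch below builds a few bytes in memory and reads them back. The choice of cp1251 (Windows-1251) and the sample values are assumptions for the demo only; the real file may use a different encoding.

```python
import io
import pandas as pd

# Assumption: the data was saved as Windows-1251 (cp1251); yours may differ.
raw = "name;city\nАлексей;Москва\n".encode("cp1251")

# Passing the matching encoding lets pandas decode the non-English text.
df = pd.read_csv(io.BytesIO(raw), sep=";", encoding="cp1251")
print(df)
```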
Stephen Rauch
0

To avoid the warning below in your code:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'

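Pass arguments to read_csv by keyword. The second positional parameter of read_csv is sep, so an encoding passed positionally is interpreted as a multi-character (regex) separator, which triggers the warning. The two cases below show where the warning appears and where it does not.

CODE THAT THROWS THE WARNING:

selEncoding = "ISO-8859-1"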

dfCovid19DS = pd.read_csv(dsSrcPath, selEncoding)

CODE WITHOUT WARNING:

selEncoding = "ISO-8859-1"

dfCovid19DS = pd.read_csv(dsSrcPath, encoding = selEncoding)
Jeremy Caney
Ashish
  • Credit where due: This is essentially the same answer as @Santosh-Pathak gave two years ago ([reference](https://stackoverflow.com/a/53459498/3025856)). – Jeremy Caney Jun 15 '20 at 22:24