0

I am new to Python pandas and am working on a small application in which I want to read an Excel file containing data in Hindi.

The issue I am facing is that pandas is not able to read the Hindi words and puts arbitrary '?' symbols in their place.

I have tried setting the encoding to utf-8, but that does not work either.

My Excel Data :

[Image: screenshot of the Excel data]

Python Code :

import pandas as pd
df = pd.read_csv("Vegaretable_List.csv", encoding='utf-8')

Output :

['?? ' '??? ' '???? ' '????? ' '????']

Any help would be appreciated. Thanks in advance.

Avinash
  • You need to find out the encoding of your input file. It may be something other than UTF-8. You can also use this tool: https://r12a.github.io/app-conversion/ – Amin Baqershahi Jan 06 '21 at 06:27
  • You need an encoding converter such as the codecs module. See https://docs.python.org/3/library/codecs.html – harshil suthar Jan 06 '21 at 06:38
  • Try opening the file and saving it as `CSV UTF-8 (Comma delimited) (*.csv)` – Trenton McKinney Jan 06 '21 at 06:42

3 Answers

2

The problem shouldn't occur if the file is read in using the same encoding it was created with.

If you get "???", it means the CSV or Excel file was saved with a different encoding from the one you are using to read it.

Here is a table of the standard encodings.
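As a rough way to find out which encoding a file uses, the sketch below tries a few candidate encodings in turn and reports the first one that decodes the whole file without error. The function name `guess_encoding` and the candidate list are assumptions made for illustration, not part of pandas or of the original question. A successful decode does not prove the guess is right, so treat the result as a hint.

```python
from pathlib import Path

# Hypothetical candidate list: strict encodings first, permissive ones last,
# because cp1252 accepts almost any byte sequence.
CANDIDATE_ENCODINGS = ("utf-8-sig", "utf-16", "cp1252")


def guess_encoding(path, candidates=CANDIDATE_ENCODINGS):
    """Return the first candidate that decodes the whole file, or None."""
    raw = Path(path).read_bytes()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeError:
            # This candidate cannot decode the bytes; try the next one.
            continue
        return name
    return None
```

The returned name can then be passed as the `encoding` argument of `pd.read_csv`. If the text still shows '?' under every encoding, the characters were probably replaced when the file was saved, and the file needs to be exported again from the original spreadsheet.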

Alternatively, you could open your file in an appropriate program and re-save it as UTF-8, so that your existing code can read it.
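If the original encoding is known, the re-saving step can also be done in Python. This is a minimal sketch using only the standard library. The function name `reencode_to_utf8` and its parameters are hypothetical, and `source_encoding` must be the encoding the file was really written in.

```python
def reencode_to_utf8(src, dst, source_encoding):
    """Read `src` using `source_encoding` and write the same text to `dst` as UTF-8."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
    return dst
```

For example, `reencode_to_utf8("input.csv", "input_utf8.csv", "utf-16")` would produce a copy that `pd.read_csv(..., encoding='utf-8')` can read. Both file names here are placeholders.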


Trenton McKinney
Alfredo Maussa
0

Do not create a CSV file; instead, use the Excel file in .xlsx format. pandas will then read the Hindi text. I did this and it worked.

import pandas as pd
dataset = pd.read_excel("Data.xlsx")

Here, Data.xlsx contains all the Hindi text that you gave.

Best of luck

Flair
-1

Assuming that your Excel/CSV file has content similar to this:

मिशल
बहादुर
मेरी
जेन
जॉन
स्मिथ

The encoding type is correct. It's just that you have to iterate through the data to get it back.

For .CSV

import csv

with open('customers.csv', 'r', encoding='utf-8') as file:
    data = csv.reader(file)
    for row in data:
        print(row)

For .XLSX

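from openpyxl import load_workbook  # third-party package

# An .xlsx file is a zipped binary format, so it cannot be opened as UTF-8 text.
workbook = load_workbook('customers.xlsx', read_only=True)
# Iterate over the rows of the active sheet and print the cell values.
for row in workbook.active.iter_rows(values_only=True):
    print(row)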
Klein -_0