-1

I have imported an excel to a pandas dataframe, which I'm trying to translate and then export back to an excel.

For example purpose say this is my data set:

d = {"cool":"chill", "guy":"dude","cool guy":"bro"}```
data = [['cool guy'], ['cool'], ['guy']]
df = pd.DataFrame(data, columns = ['WORDS'])


print(df)
#    WORDS   
# 0  cool guy   
# 1  cool  
# 2  guy    

So the easiest solution would be to use pandas built in function replace. However if you use:

df['WORDS'] = df['WORDS'].replace(d, regex=True) 

The result is:

print(df)
#    WORDS   
# 0  chill dude   
# 1  chill  
# 2  dude 

(cool guy doesn't get translated correctly)

This could be solved by sorting the dictionary by the longest word first. I tried to use this function:

import re
def replace_words(col, dictionary):
    # sort keys by length, in reverse order
    for item in sorted(dictionary.keys(), key = len, reverse = True):
        col = re.sub(item, dictionary[item], col)
    return col

But..

df['WORDS'] = replace_words(df['WORDS'], d) 

Results in a type error: TypeError: expected string or bytes-like object

Trying to convert the row to a string did not help either

...*
col = re.sub(item, dictionary[item], [str(row) for row in col])

Does anyone have any solution or different approach I could try?

Wiktor Stribiżew
  • 561,645
  • 34
  • 376
  • 476

2 Answers2

1

Let us try replace

df.WORDS.replace(d)
Out[307]: 
0      bro
1    chill
2     dude
Name: WORDS, dtype: object
BENY
  • 296,997
  • 19
  • 147
  • 204
0
df['WORDS'] = df['WORDS'].apply(lambda x: d[x])

This will do the work.