2

The Reviews column contains 1500 rows of text. I am using this code, but it does not remove the ellipsis ("..."):

Df["Reviews"] = Df['Reviews'].apply(lambda x : " ".join(re.findall('[\w\.]+',x)))

Example text: "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"

Saud

3 Answers

2

You can try any of the approaches below.

With pandas replace (regex=True)

import pandas as pd
pd.set_option('display.max_colwidth', 400)
df = pd.DataFrame({'Reviews':['dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers']})
# Collapse any run of dots into a single dot; the raw string keeps \. from being read as a string escape
df['Reviews'] = df['Reviews'].replace(r'\.+', '.', regex=True)
print(df)
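
For context, the pattern in the question keeps the ellipsis because the character class [\w\.] explicitly allows dots, so re.findall returns "rentals..." as a single token. A minimal sketch of dropping the dot from the class follows; the shortened sample string is made up, and it assumes that losing all punctuation (including single full stops) is acceptable:

```python
import re

text = "loaners or rentals... so why even"  # shortened, hypothetical sample

# Pattern from the question: dots are inside the character class, so they survive
kept = " ".join(re.findall(r"[\w.]+", text))

# Without the dot in the class, only runs of word characters are kept
dropped = " ".join(re.findall(r"\w+", text))

print(kept)
print(dropped)
```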

With re.sub

import re
regex = r"[.]+"
test_str = "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"
subst = "."
# Collapse any run of dots into a single dot; no flags are needed for this pattern
result = re.sub(regex, subst, test_str)
if result:
    print(result)

With re.sub and a backreference

import re
# (\W)\1+ matches any non-word character repeated two or more times,
# so this also collapses repeated spaces, commas, and so on
regex = r"(\W)\1+"
test_str = "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"
subst = r"\1"
result = re.sub(regex, subst, test_str)
if result:
    print(result)
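
The snippets above collapse a run of dots into a single dot. If the goal is to drop the ellipsis entirely while keeping single full stops, a minimal sketch could look like the following. The helper name remove_ellipsis and the sample rows are hypothetical, and it assumes that two or more consecutive dots count as an ellipsis:

```python
import pandas as pd

def remove_ellipsis(series):
    """Replace runs of 2+ dots with a space, then tidy up the whitespace."""
    return (
        series
        .str.replace(r"\.{2,}", " ", regex=True)  # assumption: 2+ dots form an ellipsis
        .str.replace(r"\s+", " ", regex=True)     # collapse the extra spaces left behind
        .str.strip()
    )

# Hypothetical sample data standing in for the Reviews column
df = pd.DataFrame({"Reviews": ["rentals... so why", "Good car. Bad dealer.. really"]})
df["Reviews"] = remove_ellipsis(df["Reviews"])
print(df["Reviews"].tolist())
```

Replacing with a space rather than an empty string avoids gluing words together when the ellipsis has no surrounding whitespace.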
Always Sunny
0

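Series.str.replace should work for simple expressions. Pass regex=False so that the dots are matched literally (as a regex, "..." would match any three characters), and assign the result back to the column:

df["Reviews"] = df["Reviews"].str.replace("...", "", regex=False)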
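
One caveat worth illustrating: interpreted as a regular expression, the pattern "..." means "any three characters", not three literal dots. A small sketch with the standard re module shows the difference (the sample string is made up):

```python
import re

text = "rentals... so why"  # hypothetical sample string

# As a regex, each dot matches any character, so every 3-character chunk is removed
as_regex = re.sub("...", "", text)

# Escaping the dots, or using plain str.replace, targets only the literal ellipsis
as_literal = re.sub(re.escape("..."), "", text)
plain = text.replace("...", "")

print(repr(as_regex))
print(repr(as_literal))
print(repr(plain))
```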
Kyle
0

If you want to remove this specific word from each row, you don't need a regex. You can use the plain string method str.replace, as described in "How to strip a specific word from a string?":

Df["Reviews"] = Df['Reviews'].apply(lambda x:x.replace("ellipsis",""))
Loïc L.