How to use regular expression syntax to remove "ellipsis" from text in a given column?

Question

The Reviews column contains 1500 rows of text. I am using this code, but it does not remove the ellipsis:

Df["Reviews"] = Df['Reviews'].apply(lambda x : " ".join(re.findall('[\w\.]+',x)))

An example text would be: "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"

please [edit] your question when you want to add additional information — grooveplex, Feb 07 '19 at 16:25
Try this - https://stackoverflow.com/questions/7208861/replace-characters-not-working-in-python or try using "\" before the ellipsis and set regex = true. — Siddharth Thanga Mariappan, Feb 07 '19 at 16:28

Always Sunny · Accepted Answer · 2019-02-07T16:52:18.527

2

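You can try any of the approaches below. Each one collapses a run of dots into a single dot; use an empty replacement string if the dots should be removed entirely.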

With pandas (Series.replace with regex=True)

import pandas as pd
pd.set_option('display.max_colwidth', 400)
df = pd.DataFrame({'Reviews':['dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers']})
# Collapse every run of one or more dots into a single dot
df['Reviews'] = df['Reviews'].replace(r'\.+', '.', regex=True)
print(df)
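
The snippet above leaves a single dot where the ellipsis was. If the ellipsis should disappear completely, as the question asks, a minimal sketch could look like the following. It assumes that only runs of two or more dots count as an ellipsis (so ordinary periods survive) and that leftover whitespace should be tidied; the two-row sample frame is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real column holds about 1500 reviews.
df = pd.DataFrame({"Reviews": [
    "loaners or rentals... so why even be a dealership",
    "Great car. No complaints.....",
]})

# Assumption: two or more consecutive dots count as an ellipsis.
cleaned = (
    df["Reviews"]
    .str.replace(r"\.{2,}", " ", regex=True)  # swap each ellipsis for a space
    .str.replace(r"\s+", " ", regex=True)     # collapse doubled whitespace
    .str.strip()                              # drop leading/trailing spaces
)
print(cleaned.tolist())
```

Replacing with a space rather than an empty string keeps words from being glued together when the ellipsis is not followed by a space.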

With re.sub

import re
regex = r"[.]+"
test_str = "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"
subst = "."
# Replace every run of one or more dots with a single dot
result = re.sub(regex, subst, test_str)
print(result)

With re.sub and a backreference (collapses any repeated non-word character)

import re
regex = r"(\W)\1+"
test_str = "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"
subst = r"\1"
# Replace a non-word character repeated two or more times with one copy of it
result = re.sub(regex, subst, test_str)
print(result)
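
The two re.sub patterns differ in scope, which a short comparison may make clearer. This is only a sketch on an invented sample string: `[.]+` touches dots only, while `(\W)\1+` also collapses doubled spaces and repeated punctuation.

```python
import re

# Hypothetical sample with an ellipsis, a doubled space and repeated "!".
text = "rentals...  so why!! Really.."

# Collapses runs of dots only.
dots_only = re.sub(r"[.]+", ".", text)

# Collapses runs of any repeated non-word character (dots, spaces, "!", ...).
any_repeat = re.sub(r"(\W)\1+", r"\1", text)

print(dots_only)
print(any_repeat)
```

The broader pattern is convenient for general clean-up, but it also changes spacing and punctuation that may have been intentional.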

edited Feb 07 '19 at 16:52

answered Feb 07 '19 at 16:32

Always Sunny

interesting, let me try this – Saud Feb 07 '19 at 16:50
@Saud I've added a new answer with pandas, I urge you to try that – Always Sunny Feb 07 '19 at 16:52
please share the answer with pandas, that would be interesting – Saud Feb 07 '19 at 16:57
@Saud added answer with `pandas`, have a look again – Always Sunny Feb 07 '19 at 16:58
yes, pandas works, thanks – Saud Feb 07 '19 at 17:02
best of luck :) – Always Sunny Feb 07 '19 at 17:02

score 0 · Answer 2 · answered Feb 07 '19 at 16:26

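Series.str.replace should work for simple expressions, as long as the pattern is treated as a literal string. Interpreted as a regular expression, "..." matches any three characters, so pass regex=False (or escape the dots as r"\.\.\.") and assign the result back:

df["Reviews"] = df["Reviews"].str.replace("...", "", regex=False)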
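
To see why the regex flag matters here, a small sketch (the one-row series is invented for illustration) compares the two modes of Series.str.replace on the same pattern:

```python
import pandas as pd

# Hypothetical one-row sample.
s = pd.Series(["rentals... so why"])

# As a regex, "..." means "any three characters", so the text is
# consumed in chunks of three and only the leftover tail remains.
as_regex = s.str.replace("...", "", regex=True)

# As a literal string, only the three dots themselves are removed.
as_literal = s.str.replace("...", "", regex=False)

print(as_regex.iloc[0])
print(as_literal.iloc[0])
```

Passing regex explicitly avoids depending on the default, which has differed between pandas versions.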

Kyle

shouldn't it be regex = True ? – Siddharth Thanga Mariappan Feb 07 '19 at 16:31
it is not really working for me – Saud Feb 07 '19 at 16:34
I think `'\...'` is the correct pattern, and @Sid29 `.str.replace` defaults to `regex=True` – ALollz Feb 07 '19 at 16:40
the replace function doesn't work at all – Saud Feb 07 '19 at 16:47

score 0 · Answer 3 · answered Feb 07 '19 at 16:29

If you want to remove this specific word from each row, then you don't need to use RegEx. You can use str.replace as indicated here: How to strip a specific word from a string?

Df["Reviews"] = Df['Reviews'].apply(lambda x:x.replace("ellipsis",""))

Loïc L.

by ellipsis i mean "..." – Saud Feb 07 '19 at 16:49
