I have two regular expressions, and each works fine separately.

The first one removes any non-English chars from the sentence

import re

def remove_non_English_chars(line):
    return re.sub(r'[^\x00-\x7F]+', ' ', line)

The second removes any non-Arabic chars from the sentence

def remove_non_Arabic_chars(line):
    return re.sub(r'[^،-٩0-9]+',' ', line)

I want to combine these two regexes to remove any character that is neither English nor Arabic.

My attempt was the following:

def keep_En_Ar(line):
    return re.sub(r'^([^\x00-\x7F]+|[^،-٩0-9]+)',' ',line)

However, the non-English, non-Arabic characters are not removed. For example, when I try the below:

print(keep_En_Ar('here is a trial 123 ☺  ♈ محاولة ü '))

The result is that the ü and the ☺ are still there.
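For reference, here is a minimal sketch of the combined behaviour described above. It assumes that "English" means the ASCII range `\x00-\x7F` and "Arabic" means the same range used in the second function (U+060C `،` through U+0669 `٩`). The name `keep_en_ar_sketch` is hypothetical. Instead of alternating two negated classes, which matches a character that fails either test, it puts both ranges into a single negated class, so a character is replaced only when it falls outside both:

```python
import re

# Assumption: "English" = ASCII (\x00-\x7F); "Arabic" = U+060C..U+0669,
# the same range as '،-٩' above, written with escapes for readability.
NOT_EN_OR_AR = re.compile(r'[^\x00-\x7F\u060C-\u0669]+')

def keep_en_ar_sketch(line):
    """Replace each run of characters outside both ranges with one space."""
    return NOT_EN_OR_AR.sub(' ', line)

result = keep_en_ar_sketch('here is a trial 123 ☺  ♈ محاولة ü ')
print(result)
```

With this pattern the ASCII text, the digits, and the Arabic word should be kept, while ☺, ♈ and ü are each replaced by a space. There is no `^` anchor, so matches are not limited to the start of the string.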

M.A.G