I have two regular expressions, and each works fine on its own.
The first one removes any non-English chars from the sentence:
import re

def remove_non_English_chars(line):
    return re.sub(r'[^\x00-\x7F]+',' ', line)
The second removes any non-Arabic chars from the sentence:
def remove_non_Arabic_chars(line):
    return re.sub(r'[^،-٩0-9]+',' ', line)
I want to combine these two regexes to remove any character that is neither English nor Arabic.
My attempt was the following:
def keep_En_Ar(line):
    return re.sub(r'^([^\x00-\x7F]+|[^،-٩0-9]+)',' ',line)
However, the characters that are neither English nor Arabic are not being removed. For example, when I try the following:
print(keep_En_Ar('here is a trial 123 ☺ ♈ محاولة ü '))
The result is that the ü and the ☺ are still there.
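
For reference, here is a minimal sketch of the behaviour I am after. It assumes that "English" means the ASCII range \x00-\x7F and "Arabic" means the same ،-٩ range (U+060C to U+0669) used in the second function, and it merges both ranges into one negated character class instead of alternating two negated classes. The names KEEP_EN_AR and keep_en_ar_sketch are made up for illustration; this may not be the best way to do it.

```python
import re

# Assumption: "English" = ASCII (U+0000-U+007F), "Arabic" = U+060C-U+0669,
# i.e. the same ranges as in the two functions above, written as \u escapes.
# A single negated class matches only characters outside BOTH ranges.
KEEP_EN_AR = re.compile(r'[^\x00-\x7F\u060C-\u0669]+')

def keep_en_ar_sketch(line):
    # Replace each run of characters that are neither ASCII nor Arabic
    # with a single space, anywhere in the string (no ^ anchor).
    return KEEP_EN_AR.sub(' ', line)

result = keep_en_ar_sketch('here is a trial 123 ☺ ♈ محاولة ü ')
print(result)
```

With the alternation in my attempt, every character falls outside at least one of the two ranges, so either branch can match almost anything, and the leading ^ only lets the pattern match at the very start of the string. That seems to be why the ü and the ☺ survive.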