This does the job.
import pandas as pd
import re
data = [
[10, 1, '99-223344-1', 'GA', 'Abc'],
[15, 12, "No", 'MA', 'Xyz']
]
df = pd.DataFrame(data, columns=['Age Rank PhoneNumber State City'.split()])
print(df)
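def valphone(p):
    # p arrives as a one-element row here, so pull out the cell value
    p = p['PhoneNumber']
    # Accept only digits (0-9) and dashes; blank out anything else
    if re.match(r'[0-9-]+$', p):
        return p
    else:
        return ""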
print(df['PhoneNumber'])
df['PhoneNumber'] = df['PhoneNumber'].apply(valphone, axis=1)
print(df)
Output:
   Age Rank  PhoneNumber State City
0   10    1  99-223344-1    GA  Abc
1   15   12           No    MA  Xyz
   Age Rank  PhoneNumber State City
0   10    1  99-223344-1    GA  Abc
1   15   12                 MA  Xyz
I do have to admit to a bit of frustration with this. I EXPECTED to be able to do
df['PhoneNumber'] = df['PhoneNumber'].apply(valphone)
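because df['PhoneNumber'] should return a Series, and Series.apply passes the function one value at a time. However, that's not what happens here. The most likely cause is the extra pair of brackets in columns=['...'.split()]: split() already returns a list, so wrapping it in another list hands pandas a list of lists, which it turns into a one-level MultiIndex. With MultiIndex columns, df['PhoneNumber'] returns a one-column DataFrame instead of a Series, so I have to use the column reference inside the function.
Thus, YOU may need to do some experimentation. If df['PhoneNumber'] returns a Series for you (for example, after dropping the outer brackets from columns=), then remove the axis=1, because Series.apply would forward it to valphone as an unexpected keyword argument, and remove the p = p['PhoneNumber'] line in the function.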
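Here is a minimal sketch of that experiment, assuming a reasonably recent pandas. The names nested, flat, nested_is_frame, flat_is_series and cleaned are mine, purely for illustration. It builds the same data twice, once with the nested column list and once with a flat one, checks what df['PhoneNumber'] returns in each case, and then runs the simpler Series.apply version on the flat frame.

```python
import re
import pandas as pd

data = [
    [10, 1, '99-223344-1', 'GA', 'Abc'],
    [15, 12, 'No', 'MA', 'Xyz'],
]
names = 'Age Rank PhoneNumber State City'.split()

# Nested list of names: pandas builds one-level MultiIndex columns.
nested = pd.DataFrame(data, columns=[names])
# Flat list of names: pandas builds ordinary Index columns.
flat = pd.DataFrame(data, columns=names)

nested_is_frame = isinstance(nested['PhoneNumber'], pd.DataFrame)
flat_is_series = isinstance(flat['PhoneNumber'], pd.Series)

def valphone(p):
    # p is a single cell value here, not a row
    return p if re.match(r'[0-9-]+$', p) else ''

# No axis=1 and no column lookup inside the function.
flat['PhoneNumber'] = flat['PhoneNumber'].apply(valphone)
cleaned = flat['PhoneNumber'].tolist()
print(nested_is_frame, flat_is_series, cleaned)
```

If both flags come out True on your installation, the brackets are the whole story and the flat version is the one to keep.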
Followup
OK, assuming the presence of a "phone number validation" module, as is mentioned in the comments, this becomes:
import phonenumbers
...
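def valphone(p):
    p = p['PhoneNumber'] # May not be required
    try:
        # parse() needs a default region unless the number starts with '+';
        # "US" is an assumption here, adjust it for your data
        n = phonenumbers.parse(p, "US")
    except phonenumbers.NumberParseException:
        # Unparseable input such as "No" is treated as invalid
        return ''
    if phonenumbers.is_possible_number(n):
        return p
    else:
        return ''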
...
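Since the follow-up only swaps the validity check, one option is to pass the check in as a parameter so the apply logic stays the same. The sketch below assumes the column is already a Series. blank_invalid and looks_like_digits_and_dashes are hypothetical helper names, not part of pandas or phonenumbers, and the regex check is only a stand-in for whatever validator you actually use.

```python
import re
import pandas as pd

def blank_invalid(series, is_valid):
    """Return a copy of `series` with values failing `is_valid` replaced by ''."""
    return series.apply(lambda value: value if is_valid(value) else '')

def looks_like_digits_and_dashes(value):
    # Stand-in check; a phonenumbers-based predicate could be passed instead.
    return re.fullmatch(r'[0-9-]+', value) is not None

phones = pd.Series(['99-223344-1', 'No'], name='PhoneNumber')
result = blank_invalid(phones, looks_like_digits_and_dashes).tolist()
print(result)
```

With this shape, switching to the phone number module would only mean writing a different predicate and passing it as is_valid.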