0

I have a datframe

>temp
    Age Rank PhoneNumber State City
    10   1    99-22344-1 Ga    abc
    15   12    No        Ma    xyz

For the column(Phone Number), I want to strip all characters like - unless they are full phone numbers and if it says No or any word apart from a numeric, I want it to be a blank. How can I do this

My attempt is able to handle special chars but not words symbols like 'No'

temp['PhoneNumber '] = temp['PhoneNumber '].str.replace('[^\d]+', '')

Desired Output df -

>temp
    Age Rank PhoneNumber State City
    10   1    99223441    Ga    abc
    15   12               Ma    xyz
Joe Tha
  • 163
  • 9

2 Answers2

2

This does the job.

import pandas as pd
import re

data = [
    [10, 1, '99-223344-1', 'GA', 'Abc'],
    [15, 12, "No", 'MA', 'Xyz']
]

df = pd.DataFrame(data, columns=['Age Rank PhoneNumber State City'.split()])
print(df)

def valphone(p):
    p = p['PhoneNumber']
    if re.match(r'[123456789-]+$', p):
        return p
    else:
        return ""

print(df['PhoneNumber'])
df['PhoneNumber'] = df['PhoneNumber'].apply(valphone, axis=1)
print(df)

Output:

  Age Rank  PhoneNumber State City
0  10    1  99-223344-1    GA  Abc
1  15   12           No    MA  Xyz
  Age Rank  PhoneNumber State City
0  10    1  99-223344-1    GA  Abc
1  15   12                 MA  Xyz

I do have to admit to a bit of frustration with this. I EXPECTED to be able to do

df['PhoneNumber'] = df['PhoneNumber'].apply(valphone)

because df['PhoneNumber'] should return a Series, and the Series.apply function should pass me one value at a time. However, that's not what happens here, and I don't know why. df['PhoneNumber'] returns a DataFrame instead of a Series, so I have to use the column reference inside the function.

Thus, YOU may need to do some experimentation. If df['PhoneNumber'] returns a Series for you, then you don't need the axis=1, and you don't need the p = p['PhoneNumber'] line in the function.

Followup

OK, assuming the presence of a "phone number validation" module, as is mentioned in the comments, this becomes:

import phonenumbers
...
def valphone(p):
    p = p['PhoneNumber'] # May not be required
    n = phonenumbmers.parse(p)
    if phonenumbers.is_possible_number(n):
        return p
    else:
        return ''
...
Tim Roberts
  • 34,376
  • 3
  • 17
  • 24
  • tim, thanks for your answer. just curious, before i test. why does phonenumber row 1 still have hyphens in your output df? – Joe Tha May 05 '22 at 20:21
  • I left all the numbers alone. How will you decide whether a number is a "full phone number"? That's not an easy thing for a computer to decide. – Tim Roberts May 05 '22 at 20:27
  • import phonenumbers my_string_number = "+40021234567" my_number = phonenumbers.parse(my_string_number) print(phonenumbers.is_possible_number(my_number)) True – Joe Tha May 05 '22 at 20:36
  • If you already had a function to validate phone numbers, why didn't you say so, and why didn't you use it in your example instead of a regular expression? I'll modify the answer. – Tim Roberts May 05 '22 at 20:44
  • Have to admit. I did not already have that function. But you stated a point that really peaked my interest. "Thats not an easy thing for a comp to decide". I immediately started looking for a py library and thought id post it in the comments. Thank you Tim – Joe Tha May 05 '22 at 20:49
  • hey @tim ive asked a question on shutil here https://stackoverflow.com/questions/72149034/how-do-i-shutil-move-my-file-post-processing-in-python-based-on-filename no one has given a canonical answer yet. please do chime in if you have had any luck with this in the past – Joe Tha May 07 '22 at 05:52
1
temp['PhoneNumber'] = temp['PhoneNumber'].apply(str).str.findall(r'\d').str.join('')
Ze'ev Ben-Tsvi
  • 674
  • 2
  • 5
  • @zeev. thnks for your answer. can you please explain your function findall(r'\d') – Joe Tha May 05 '22 at 21:01
  • 1
    the findall method returns all matches (digits) to list, and join it to one string – Ze'ev Ben-Tsvi May 05 '22 at 21:03
  • Excellent. this is short and sweet and works too. Not sure what the protocol is cant mark both answers correct. but for anyone who comes to this question please note. this answer is accepted too – Joe Tha May 05 '22 at 21:19
  • There is an issue here when i cleared my console and reran it though. AttributeError: Can only use .str accessor with string values! – Joe Tha May 05 '22 at 21:50
  • 1
    @JoeTha I'm sorry for the late response, I've updated my answer to handle non strings values as well as strings – Ze'ev Ben-Tsvi May 06 '22 at 09:11
  • @zeev thank you. this was able to fix the error – Joe Tha May 06 '22 at 16:47