
I need to parse a name and extract the following information:

  • First Name

  • First Initial (if employee has initials for a first name like D.J., use both initials)

  • Last Name (including the suffix, such as Jr. or III, if the employee has one)


So here's the interface I'm working with:

Input:

names = ["D.J. Richies III", "John Doe", "A.J. Hardie Jr."]
for name in names:
   print(parse_name(name))

Expected Output:

{'FirstName': 'D.J.', 'FirstInitial': 'D.J.', 'LastName': 'Richies III' }
{'FirstName': 'John', 'FirstInitial': 'J.', 'LastName': 'Doe' }
{'FirstName': 'A.J.', 'FirstInitial': 'A.J.', 'LastName': 'Hardie Jr.' }

I'm not very good at regex, and it's probably overkill for this anyway. My only guess so far is:

if name[1] == ".":  # we have a name like D.J.?
E_net4 - Krabbe mit Hüten
y2k
  • I18n: Do you consider systems where the family name comes first and the given name second? – Boldewyn Nov 12 '09 at 07:33
  • The underlying problem (regardless of implementation language) is not as obviously solvable as it may seem - see this duplicate: http://stackoverflow.com/questions/103422/simple-way-to-parse-a-persons-name-into-its-component-parts – Daniel Earwicker Nov 12 '09 at 07:34
  • Nah, I don't believe that is in the context of my requirements. That's an interesting point though, for a more complex name parser. – y2k Nov 12 '09 at 07:34
  • The most complex names I'll have to handle are the ones in my A.J. Hardie Jr. and D.J. Richies III examples. – y2k Nov 12 '09 at 07:36

4 Answers


I found this library quite useful for parsing names. https://code.google.com/p/python-nameparser/

It can also deal with names formatted as "Lastname, Firstname".
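For simple inputs, the "Lastname, Firstname" format can also be normalized without the library. The following is a minimal standard-library sketch and not how python-nameparser works internally; `normalize_name` is a hypothetical helper, and it assumes at most one comma separating the last name from the first.

```python
def normalize_name(raw):
    """Turn 'Last, First' into 'First Last'; leave other names unchanged."""
    if "," in raw:
        # Assumption: the first comma separates the last name from the first name.
        last, first = raw.split(",", 1)
        return "{} {}".format(first.strip(), last.strip())
    return raw.strip()


for raw in ["Doe, John", "Richies III, D.J.", "John Doe"]:
    print(normalize_name(raw))
```

A suffix set off by its own comma (for example "Hardie, Jr., A.J.") is not handled by this sketch; that is the kind of case where a dedicated library is the better choice.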

Hamish Currie

There is no general solution; the right approach depends on the constraints you impose. For the specs you have given, here is a simple solution that produces exactly the output you want:

def parse_name(name):
   fl = name.split()
   first_name = fl[0]
   last_name = ' '.join(fl[1:])
   if "." in first_name:
      first_initial = first_name
   else:
      first_initial = first_name[0]+"."

   return {'FirstName':first_name, 'FirstInitial':first_initial, 'LastName':last_name}

names = ["D.J. Richies III", "John Doe", "A.J. Hardie Jr."]
for name in names:
   print(parse_name(name))

output:

{'LastName': 'Richies III', 'FirstInitial': 'D.J.', 'FirstName': 'D.J.'}
{'LastName': 'Doe', 'FirstInitial': 'J.', 'FirstName': 'John'}
{'LastName': 'Hardie Jr.', 'FirstInitial': 'A.J.', 'FirstName': 'A.J.'}
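
The function above indexes `fl[0]` directly, so an empty or whitespace-only string raises an `IndexError`. A slightly more defensive sketch of the same approach is shown here; raising `ValueError` for empty input and returning an empty last name for single-word names are assumptions, not part of the stated requirements.

```python
def parse_name(name):
    parts = name.split()  # split() with no argument also collapses repeated spaces
    if not parts:
        # Assumption: empty input is an error rather than an empty record.
        raise ValueError("name is empty")
    first_name = parts[0]
    last_name = " ".join(parts[1:])  # empty string for single-word names
    # A dot in the first name is taken to mean it already consists of initials.
    first_initial = first_name if "." in first_name else first_name[0] + "."
    return {"FirstName": first_name,
            "FirstInitial": first_initial,
            "LastName": last_name}


print(parse_name("  John   Doe "))
print(parse_name("Cher"))
```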
Anurag Uniyal

Well, for your simple example names, you can do something like this.

def parse_name(name):
    # partition() splits on the first space into (first name, " ", everything else)
    firstName, _, lastName = name.partition(" ")
    # now figure out the first initial
    # we're assuming that if it has a dot it's an initialized name,
    # but this may not hold in general
    if "." in firstName:
        firstInitial = firstName
    else:
        firstInitial = firstName[0] + "."
    return {"FirstName": firstName, "FirstInitial": firstInitial, "LastName": lastName}

I haven't tested it, but a function like that should do the job on the input example you provided.
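
Since the snippet is untested, a quick check against the expected output from the question may be useful. This sketch assumes the snippet is wrapped in a function named `parse_name`, as the question's interface suggests; the `expected` table is copied from the question.

```python
def parse_name(name):
    # partition() splits on the first space only, so suffixes stay with the last name
    firstName, _, lastName = name.partition(" ")
    if "." in firstName:
        firstInitial = firstName
    else:
        firstInitial = firstName[0] + "."
    return {"FirstName": firstName, "FirstInitial": firstInitial, "LastName": lastName}


# Expected results, taken from the question
expected = {
    "D.J. Richies III": {"FirstName": "D.J.", "FirstInitial": "D.J.", "LastName": "Richies III"},
    "John Doe": {"FirstName": "John", "FirstInitial": "J.", "LastName": "Doe"},
    "A.J. Hardie Jr.": {"FirstName": "A.J.", "FirstInitial": "A.J.", "LastName": "Hardie Jr."},
}

for name, want in expected.items():
    assert parse_name(name) == want, name
print("all examples match")
```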

Daniel G

This is basically the same solution as the one Anurag Uniyal provided, only a little more compact:

import re

def parse_name(name):
    first_name, last_name = name.split(' ', 1)
    first_initial = re.search("^[A-Z.]+", first_name).group()
    if not first_initial.endswith("."):
        first_initial += "."
    return {"FirstName": first_name,
            "FirstInitial": first_initial,
            "LastName": last_name}
Pär Wieslander
  • Interesting use of regex. This will probably handle and adapt to more cases than Anurag's would. Thanks for the solution. – y2k Nov 12 '09 at 07:55
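
To see which cases the regex version covers: the pattern `^[A-Z.]+` takes the leading run of capital letters and dots, so it returns no match for a lowercase first name, and `name.split(' ', 1)` fails to unpack for a single-word name. The sketch below guards both cases; the fallback to the uppercased first character is an assumption, not part of the original answer.

```python
import re


def parse_name(name):
    # partition() never raises, even when there is no space in the name
    first_name, _, last_name = name.partition(" ")
    match = re.search(r"^[A-Z.]+", first_name)
    # Assumption: when nothing matches (e.g. lowercase input), use the first character.
    first_initial = match.group() if match else first_name[:1].upper()
    if not first_initial.endswith("."):
        first_initial += "."
    return {"FirstName": first_name,
            "FirstInitial": first_initial,
            "LastName": last_name}


for name in ["A.J. Hardie Jr.", "John Doe", "john doe", "Cher"]:
    print(parse_name(name))
```

Note that a first name beginning with several capitals, such as "JD", yields "JD." because the whole leading run of capitals is matched.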