How to replace several fixed sub-strings in a series of strings efficiently

Asked Jun 16 '21 at 11:25

Active Jun 16 '21 at 12:15

Viewed 29 times

Given a list of names, I want to squash names like "Dos Santos", "Van der Ploeg" or "Abdel-Hamid" to "DosSantos", "VanderPloeg" and "AbdelHamid", respectively.

However, plain removal of " " or "-" will not do, for I need to preserve double names such as "Baker-Wilson" or "Ann-Catherine".

A simple regex substitution would do if I had just single prefixes, but that does not work for double prefixes like "De la", "Van der" or "Bin Abdul" which contain spaces themselves.

Using Python 3: What would be a good (=efficient) way to do this transformation using standard libraries? For each name loop over all known prefixes and just substitute what I find, or is there a more clever way?

edited Jun 16 '21 at 12:15

asked Jun 16 '21 at 11:25

countermode

How to replace several fixed sub-strings in a series of strings efficiently

0 Answers0