75

I have to replace north, south, etc. with N, S, etc. in address fields.

If I have

list = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
address = "123 north anywhere street"

Can I iterate over my dictionary to replace the matching words in my address field?

for dir in list[]:
   address.upper().replace(key,value)

I know I'm not even close! But any input on whether dictionary values can be used like this would be appreciated.

georg
user1947457
  • 3
    this is pretty tricky if matches can overlap. See [this question](http://stackoverflow.com/questions/10931150/phps-strtr-for-python) – georg Jan 04 '13 at 11:45
  • A BIG part of the problem is that the string `replace()` method _returns_ a copy of string with occurrences replaced -- it doesn't do it in-place. – martineau Jan 04 '13 at 13:26
  • 1
    You can simply use [str.translate](https://docs.python.org/3/library/stdtypes.html#str.translate). – Neel Patel Sep 29 '19 at 11:54
  • 2
    See https://stackoverflow.com/questions/2400504/easiest-way-to-replace-a-string-using-a-dictionary-of-replacements for the best solution – Ethan Bradford Feb 28 '20 at 01:19
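To illustrate martineau's point about `replace()` returning a copy, here is a minimal sketch (the variable `unchanged` is only there for illustration):

```python
address = "123 north anywhere street"

# String methods never modify the original string; they return a new one.
# The result of this call is simply thrown away.
address.upper().replace("NORTH", "N")
unchanged = address  # still the original text

# Rebinding the name is what keeps the result.
address = address.upper().replace("NORTH", "N")
print(unchanged)
print(address)
```

Note that calling `upper()` on the whole address also uppercases the rest of the string, which several answers below work around.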

12 Answers

59
address = "123 north anywhere street"

for word, initial in {"NORTH":"N", "SOUTH":"S" }.items():
    address = address.replace(word.lower(), initial)
print(address)

Nice, concise, and readable too.

Django Doctor
  • This seems to be the standard approach. I was curious how the XML parsers do it, and the same approach is seen in: `import xml.sax.saxutils as su; print(inspect.getsource(su.escape))` which leads us to `print(inspect.getsource(su.__dict_replace))` – C8H10N4O2 Mar 06 '18 at 18:03
22

You are close, actually:

dictionary = {"NORTH":"N", "SOUTH":"S" } 
for key in dictionary.iterkeys():
    address = address.upper().replace(key, dictionary[key])

Note: for Python 3 users, you should use .keys() instead of .iterkeys():

dictionary = {"NORTH":"N", "SOUTH":"S" } 
for key in dictionary.keys():
    address = address.upper().replace(key, dictionary[key])
CharlesG
Samuele Mattiuzzo
  • 1
    Very simple and effective way to replace against a dictionary. For me, this was enough: – Alexandre Andrade May 08 '18 at 03:17
  • Concise and simple to understand. Exactly enough for me. – msarafzadeh Jun 13 '19 at 11:17
  • 12
    How is this correct? `address.upper().replace(...)` doesn't modify anything in place, it just returns a value, and it's not being assigned to anything. – Enrico Borba Nov 03 '19 at 02:05
  • 2
    If you want you can iterate through the dictionary's key and values at the same time using `for key, value in dictionary.items()`. I don't know whether it has advantages in terms of performance, but I think it is more pythonic – gionni Nov 25 '20 at 11:29
  • The downside of the for loop is that it creates replacement ordering problems, e.g. when you have the string `Do you like café? No, I prefer tea.` and you do .replace("café", "tea") and .replace("tea", "café"), you will get `Do you like café? No, I prefer café.`. If the replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café". See, for example, this question: https://stackoverflow.com/a/15221068/13968392 – mouwsy Nov 06 '21 at 21:34
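The ordering problem described in the last comment can be reproduced with a short sketch. The names `swaps`, `looped` and `single_pass` are illustrative, and the single-pass variant assumes a plain regex alternation is acceptable for your keys:

```python
import re

text = "Do you like café? No, I prefer tea."
swaps = {"café": "tea", "tea": "café"}

# Sequential replace: the second pass rewrites what the first pass produced.
looped = text
for old, new in swaps.items():
    looped = looped.replace(old, new)

# Single pass: every match is looked up once, so replaced text is never rescanned.
pattern = re.compile("|".join(re.escape(key) for key in swaps))
single_pass = pattern.sub(lambda m: swaps[m.group(0)], text)

print(looped)
print(single_pass)
```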
19

One option I don't think anyone has yet suggested is to build a regular expression containing all of the keys and then simply do one replace on the string:

>>> import re
>>> l = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
>>> pattern = '|'.join(sorted(re.escape(k) for k in l))
>>> address = "123 north anywhere street"
>>> re.sub(pattern, lambda m: l.get(m.group(0).upper()), address, flags=re.IGNORECASE)
'123 N anywhere street'
>>> 

This has the advantage that the regular expression can ignore the case of the input string without modifying it.

If you want to operate only on complete words then you can do that too with a simple modification of the pattern:

>>> pattern = r'\b({})\b'.format('|'.join(sorted(re.escape(k) for k in l)))
>>> address2 = "123 north anywhere southstreet"
>>> re.sub(pattern, lambda m: l.get(m.group(0).upper()), address2, flags=re.IGNORECASE)
'123 N anywhere southstreet'
Duncan
  • I am quite new to the regular expression and was hoping if you can explain what exactly is happening with lambda and group function. I noticed you also did sorted function. I have multiple keys for which the words are to be replaced by their value, in that case will the sorted function affect anything? Is it really necessary so for example there could be some words in the text file which are present on different intervals/lines – trillion Sep 04 '20 at 10:08
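On the question about the lambda and `group`: `re.sub` accepts a function as its replacement argument and calls it once per match, passing the match object. A sketch that spells the lambda out as a named function (`mapping` and `abbreviate` are illustrative names, not part of the answer):

```python
import re

mapping = {'NORTH': 'N', 'SOUTH': 'S', 'EAST': 'E', 'WEST': 'W'}
pattern = '|'.join(re.escape(key) for key in sorted(mapping))

def abbreviate(match):
    # match.group(0) is the exact text that matched, e.g. "north".
    found = match.group(0)
    # The dictionary keys are uppercase, so normalise before the lookup.
    return mapping[found.upper()]

result = re.sub(pattern, abbreviate, "123 north anywhere street", flags=re.IGNORECASE)
print(result)
```

As far as I can tell, `sorted` only fixes the order of the alternatives inside the pattern; it does not depend on where the words appear in the text. The order would matter only if one key were a prefix of another, in which case sorting by length (longest first) is the safer choice.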
10

You are probably looking for iteritems() (items() in Python 3):

d = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
address = "123 north anywhere street"

# items() works in both Python 2 and Python 3
for k,v in d.items():
    address = address.upper().replace(k, v)

address is now '123 N ANYWHERE STREET'


Well, if you want to preserve case, whitespace and nested words (e.g. Southstreet should not be converted to Sstreet), consider using this simple generator expression:

import re

l = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}

address = "North 123 East Anywhere Southstreet    West"

new_address = ''.join(l[p.upper()] if p.upper() in l else p for p in re.split(r'(\W+)', address))

new_address is now

N 123 E Anywhere Southstreet    W
sloth
  • But this would end up changing the entire case of the address – Abhijit Jan 04 '13 at 11:46
  • Depends on whether the question is *iterate over a dictionary* or *do all the work for me*. – sloth Jan 04 '13 at 11:53
  • @Abhijit Nonetheless, I added an example of how to preserve case, whitespace and nested matches. – sloth Jan 04 '13 at 12:31
  • @Dominic - great suggestion about unintentionally skewing addresses such as Southstreet Rd. In rethinking this, is there a way to ignore the replace if I have an address such as South St.? Is there a RE that would ignore the replace in this case? – user1947457 Jan 07 '13 at 02:02
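One possible way to handle the "South St." case raised in the last comment is a negative lookahead that skips a direction when a street type follows it. This is only a sketch: the `street_types` list is a hypothetical, incomplete assumption, and real addresses will need more care.

```python
import re

directions = {'NORTH': 'N', 'SOUTH': 'S', 'EAST': 'E', 'WEST': 'W'}
# Hypothetical list of street types; extend it for real data.
street_types = ['St', 'Street', 'Rd', 'Road', 'Ave', 'Avenue']

keys = '|'.join(re.escape(k) for k in directions)
suffixes = '|'.join(re.escape(s) for s in street_types)

# Match a whole-word direction unless the next word is a street type.
pattern = re.compile(
    r'\b(?:{})\b(?!\s+(?:{})\b)'.format(keys, suffixes),
    re.IGNORECASE,
)

def abbreviate(address):
    return pattern.sub(lambda m: directions[m.group(0).upper()], address)

print(abbreviate("123 South St."))
print(abbreviate("123 South Main St."))
```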
10

"Translating" a string with a dictionary is a very common requirement. I propose a function that you might want to keep in your toolkit:

def translate(text, conversion_dict, before=None):
    """
    Translate words from a text using a conversion dictionary

    Arguments:
        text: the text to be translated
        conversion_dict: the conversion dictionary
        before: a function to transform the input
        (by default it converts the text to lowercase)
    """
    # if empty:
    if not text: return text
    # preliminary transformation:
    before = before or str.lower
    t = before(text)
    for key, value in conversion_dict.items():
        t = t.replace(key, value)
    return t

Then you can write:

>>> a = {'hello':'bonjour', 'world':'tout-le-monde'}
>>> translate('hello world', a)
'bonjour tout-le-monde'
fralau
5

I would suggest using a regular expression instead of a simple replace. With replace, you risk replacing parts of words, which may not be what you want.

import json
import re

with open('filePath.txt') as f:
   data = f.read()

with open('filePath.json') as f:
   glossar = json.load(f)

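for word, initial in glossar.items():
   # re.escape keeps regex metacharacters in a glossary key from being read as pattern syntax
   data = re.sub(r'\b' + re.escape(word) + r'\b', initial, data)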

print(data)
Trafalgar
4
def replace_values_in_string(text, args_dict):
    for key in args_dict.keys():
        text = text.replace(key, str(args_dict[key]))
    return text
Artem Malikov
3

Try,

import re
l = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}

address = "123 north anywhere street"

for k, v in l.items():
    t = re.compile(re.escape(k), re.IGNORECASE)
    address = t.sub(v, address)
print(address)
Adem Öztaş
2

All of these answers are good, but they are missing Python's %-style string substitution. It's simple and quick, but requires your string to contain the right placeholders.

address = "123 %(direction)s anywhere street"
print(address % {"direction": "N"})
cacti5
2

Neither replace() nor format() is very precise:

data =  '{content} {address}'
for k,v in {"{content}":"some {address}", "{address}":"New York" }.items():
    data = data.replace(k,v)
# results: some New York New York

'{ {content} {address}'.format(**{'content':'str1', 'address':'str2'})
# results: ValueError: unexpected '{' in field name

It is better to translate with re.sub() if you need precise replacement:

import re
def translate(text, kw, ignore_case=False):
    search_keys = map(lambda x:re.escape(x), kw.keys())
    if ignore_case:
        kw = {k.lower():kw[k] for k in kw}
        regex = re.compile('|'.join(search_keys), re.IGNORECASE)
        res = regex.sub( lambda m:kw[m.group().lower()], text)
    else:
        regex = re.compile('|'.join(search_keys))
        res = regex.sub( lambda m:kw[m.group()], text)

    return res

#'score: 99.5% name:%(name)s' %{'name':'foo'}
res = translate( 'score: 99.5% name:{name}', {'{name}':'foo'})
print(res)

res = translate( 'score: 99.5% name:{NAME}', {'{name}':'foo'}, ignore_case=True)
print(res)
ahuigo
2

If you're looking for a concise way, you can go for reduce from functools:

from functools import reduce

str_to_replace = "The string for replacement."
replacement_dict = {"The ": "A new ", "for ": "after "}

str_replaced = reduce(lambda x, y: x.replace(*y), [str_to_replace, *list(replacement_dict.items())])
print(str_replaced)
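A variant of the same idea, offered as a sketch: `reduce` takes an optional third argument as the starting value, which avoids building a combined list. The parameter names `text` and `pair` are illustrative.

```python
from functools import reduce

str_to_replace = "The string for replacement."
replacement_dict = {"The ": "A new ", "for ": "after "}

# The third argument is the initial accumulator; each step applies one (old, new) pair.
str_replaced = reduce(
    lambda text, pair: text.replace(*pair),
    replacement_dict.items(),
    str_to_replace,
)
print(str_replaced)
```

Like the plain loop, this applies the replacements one after another, so the ordering caveats mentioned in the comments above still apply.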
m7s
0

The advantage of Duncan's approach is that it is careful not to overwrite previous replacements. For example, if you have {"Shirt": "Tank Top", "Top": "Sweater"}, the other approaches replace "Shirt" with "Tank Sweater".

The following code extends that approach, but sorts the keys so that the longest one is always tried first, and it uses named groups to look up the replacement while matching case-insensitively.

import re
root_synonyms = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
# put the longest search term first. This means the system does not replace "top" before "tank top"
synonym_keys = sorted(root_synonyms.keys(),key=len,reverse=True)
# the groups will be named w0, w1, ... . Determine what each of them should become
number_mapping = {f'w{i}':root_synonyms[key] for i,key in enumerate(synonym_keys) }
# make a regex for each term where "tank top" or "tank  top" are the same
# (each word is escaped separately; a '\s+' replacement template in re.sub raises re.error on Python 3.7+)
search_terms = [r'\s+'.join(re.escape(part) for part in k.split()) for k in synonym_keys]
# wrap each search term in a named group (w0, w1, ...) bounded by word boundaries
search_terms = [f'(?P<w{i}>\\b{key}\\b)' for i,key in enumerate(search_terms)]
# make one huge regex
search_terms = '|'.join(search_terms)
# compile it for speed
search_re = re.compile(search_terms,re.IGNORECASE)

query = "123 north anywhere street"
result = re.sub(search_re,lambda x: number_mapping[x.lastgroup],query)
print(result)
Jelmer