Easiest way to replace a string using a dictionary of replacements?

Question

Consider:

dict = {
'Спорт':'Досуг',
'russianA':'englishA'
}

s = 'Спорт russianA'

I'd like to replace all dict keys with their respective dict values in s.

This might not be so straightforward. You should probably have an explicit tokenizer (for example `{'cat': 'russiancat'}` and "caterpillar"). Also overlapping words (`{'car':'russiancar', 'pet' : 'russianpet'}` and 'carpet'). — Joe, Mar 08 '10 at 10:15
Also see http://code.activestate.com/recipes/81330-single-pass-multiple-replace/ — ChristopheD, Mar 08 '10 at 13:12
As an aside: I think `dict` is best avoided as a variable name, because a variable of this name would shadow the built-in function of the same name. — jochen, Nov 15 '12 at 18:11

Max Shawabkeh · Accepted Answer · 2010-03-08T13:48:51.730

104

Using re:

import re

s = 'Спорт not russianA'
d = {
'Спорт':'Досуг',
'russianA':'englishA'
}

# re.escape keeps any regex metacharacters in the keys literal.
pattern = re.compile(r'\b(' + '|'.join(re.escape(k) for k in d) + r')\b')
result = pattern.sub(lambda x: d[x.group()], s)
# Output: 'Досуг not englishA'

This will match whole words only. If you don't need that, use the pattern:

pattern = re.compile('|'.join(d.keys()))

Note that in this case you should sort the words descending by length if some of your dictionary entries are substrings of others.
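A minimal sketch of that substring variant, with the keys sorted longest first and escaped before the pattern is built. The function name `replace_substrings` and the sample dictionary are hypothetical, chosen only for illustration:

```python
import re

def replace_substrings(text, replacements):
    """Replace every key of `replacements` found in `text` in a single pass."""
    if not replacements:
        return text  # an empty alternation would match the empty string everywhere
    # Longest keys first, so 'carpet' is tried before its prefix 'car'.
    keys = sorted(replacements, key=len, reverse=True)
    # re.escape keeps metacharacters such as '.' or '$' literal.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

print(replace_substrings('carpet car', {'car': 'auto', 'carpet': 'rug'}))
# rug auto
```

Without the sort, an alternation such as `car|carpet` would match `car` inside `carpet` first, because the regex engine tries the alternatives from left to right.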

edited Mar 08 '10 at 13:48

answered Mar 08 '10 at 10:20

Max Shawabkeh


24

In case the dictionary keys contain characters like "^", "$" and "/", the keys need to be escaped before the regular expression is assembled. To do this, `.join(d.keys())` could be replaced by `.join(re.escape(key) for key in d.keys())`. – jochen Nov 15 '12 at 18:05
Please note that the first example (`Досуг not englishA`) only works in Python 3. In Python 2 it still returns "Спорт not englishA". – 林果皞 Dec 30 '14 at 10:56
It seems to fail when a word in the dict contains a dot: `https://regex101.com/r/bliVUS/1`. I need to remove the `\b` at the end, but I'm not sure that's correct. – Peter.k Mar 14 '19 at 14:33
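One possible way around the `\b` problem with keys that start or end in a non-word character is to replace `\b` with lookarounds that only require "no word character next to the match". This is a sketch under that assumption, not part of the accepted answer; the name `replace_whole_tokens` and the sample data are hypothetical:

```python
import re

def replace_whole_tokens(text, replacements):
    """Single-pass replacement that matches keys only as whole tokens."""
    if not replacements:
        return text
    keys = sorted(replacements, key=len, reverse=True)
    alternation = '|'.join(re.escape(k) for k in keys)
    # (?<!\w) and (?!\w) assert that no word character touches the match.
    # Unlike \b, they still succeed when the key itself ends in '.'.
    pattern = re.compile(r'(?<!\w)(?:' + alternation + r')(?!\w)')
    return pattern.sub(lambda m: replacements[m.group(0)], text)

print(replace_whole_tokens('e.g. cats', {'e.g.': 'for example', 'cat': 'dog'}))
# for example cats
```

With `\b` on both sides, `e.g.` followed by a space would not match, because there is no word boundary between the final dot and the space.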

score 25 · Answer 2 · edited Jun 26 '13 at 12:56


You could use the reduce function:

from functools import reduce  # needed on Python 3; a built-in on Python 2
reduce(lambda x, y: x.replace(y, dict[y]), dict, s)

edited Jun 26 '13 at 12:56

MvG


answered Mar 08 '10 at 10:19

codeape


17

Unlike the solution by @Max Shawabkeh, `reduce` applies the substitutions one after another. As a consequence, swapping words with a dictionary such as `{'red': 'green', 'green': 'red'}` does not work with the `reduce`-based approach, and overlapping matches are transformed in an unpredictable way. – jochen Nov 15 '12 at 18:10
2

A good example of why repeated `.replace()` calls may have unintended consequences: `html.replace('"', '&quot;').replace('&', '&amp;')`. Try it on `html = '"foo"'`. – zigg Jun 26 '13 at 13:07
This is unnecessarily complex and unreadable compared to the unfolded loop as shown in answers by [ChristopheD](https://stackoverflow.com/a/2401481/216074), or [user2769207](https://stackoverflow.com/a/18748467/216074). – poke Aug 07 '17 at 11:50
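The pitfalls raised in these comments can be reproduced with a short sketch. It assumes Python 3.7+, where dicts iterate in insertion order, and it assumes the HTML example was meant to use the entities `&quot;` and `&amp;`. The helper name `replace_sequentially` is hypothetical:

```python
from functools import reduce

def replace_sequentially(text, replacements):
    """Apply str.replace once per key, each call working on the previous result."""
    return reduce(lambda acc, key: acc.replace(key, replacements[key]),
                  replacements, text)

swap = {'red': 'green', 'green': 'red'}
swapped = replace_sequentially('red green', swap)
# Step 1 turns 'red' into 'green'   -> 'green green'
# Step 2 turns every 'green' into 'red' -> 'red red'
print(swapped)  # red red

# The second replace also rewrites the '&' that the first one introduced.
escaped = '"foo"'.replace('"', '&quot;').replace('&', '&amp;')
print(escaped)  # &amp;quot;foo&amp;quot;
```

In both cases a later replacement acts on the output of an earlier one, which is exactly what a single-pass substitution avoids.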

score 20 · Answer 3 · answered Mar 08 '10 at 13:15


Solution found here (I like its simplicity):

def multipleReplace(text, wordDict):
    for key in wordDict:
        text = text.replace(key, wordDict[key])
    return text

answered Mar 08 '10 at 13:15

ChristopheD


11

Again, as @jochen described, this risks a bad translation if there is a key that is also a value. A single-pass replacement would be best. – Chris Feb 17 '13 at 16:03

score 5 · Answer 4 · answered Mar 08 '10 at 10:16


One way, without re:

d = {
'Спорт':'Досуг',
'russianA':'englishA'
}

s = 'Спорт russianA'.split()
for n,i in enumerate(s):
    if i in d:
        s[n]=d[i]
print(' '.join(s))

answered Mar 08 '10 at 10:16

ghostdog74


3

This will fail if the dict has spaces in its keys. – James Feb 02 '16 at 01:48
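A short sketch of where the split-based approach falls short. The helper name `replace_words` and the sample inputs are hypothetical:

```python
def replace_words(text, replacements):
    """Split on whitespace, look each token up, and rejoin with single spaces."""
    return ' '.join(replacements.get(word, word) for word in text.split())

# A key containing a space is never a single token, so it is never replaced.
multi = replace_words('I love New York', {'New York': 'NYC'})
print(multi)   # I love New York

# Punctuation stays attached to the token, so 'a,' does not match the key 'a'.
punct = replace_words('a, c', {'a': 'b', 'c': 'd'})
print(punct)   # a, d

# Runs of whitespace collapse to one space when the tokens are rejoined.
spaced = replace_words('a   c', {'a': 'b'})
print(spaced)  # b c
```

If any of these cases matter, a regex-based single-pass replacement is likely the safer choice.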

score 3 · Answer 5 · answered Mar 08 '10 at 10:18

Almost the same as ghostdog74's answer, though written independently. One difference: using d.get() instead of d[] handles items that are not in the dict.

>>> d = {'a':'b', 'c':'d'}
>>> s = "a c x"
>>> foo = s.split()
>>> ret = []
>>> for item in foo:
...   ret.append(d.get(item,item)) # Try to get from dict, otherwise keep value
... 
>>> " ".join(ret)
'b d x'

score 1 · Answer 6 · answered Sep 11 '13 at 18:19


I used this in a similar situation (my string was all in uppercase):

def translate(string, wdict):
    for key in wdict:
        string = string.replace(key, wdict[key].lower())
    return string.upper()

hope that helps in some way... :)

answered Sep 11 '13 at 18:19

user2769207


2

It's very similar to ChristopheD's solution. Do you disagree with him? – hynekcer Sep 11 '13 at 21:59

score 1 · Answer 7 · answered Nov 20 '17 at 22:50


With the caveat that it fails if a key contains a space, this is a compressed solution similar to ghostdog74's and extraneon's answers:

d = {
'Спорт':'Досуг',
'russianA':'englishA'
}

s = 'Спорт russianA'

' '.join(d.get(i,i) for i in s.split())

answered Nov 20 '17 at 22:50

Anton vBR

