80

Consider:

dict = {
'Спорт':'Досуг',
'russianA':'englishA'
}

s = 'Спорт russianA'

I'd like to replace all dict keys with their respective dict values in s.

SilentGhost
meder omuraliev
  • This might not be so straightforward. You should probably have an explicit tokenizer (for example `{'cat': 'russiancat'}` and "caterpillar"). Also overlapping words (`{'car':'russiancar', 'pet' : 'russianpet'}` and 'carpet'). – Joe Mar 08 '10 at 10:15
  • Also see http://code.activestate.com/recipes/81330-single-pass-multiple-replace/ – ChristopheD Mar 08 '10 at 13:12
  • As an aside: I think `dict` is best avoided as a variable name, because a variable of this name would shadow the built-in function of the same name. – jochen Nov 15 '12 at 18:11
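The tokenizer concern raised in the comments can be seen in a small sketch, assuming Python 3. The variable names `words`, `naive` and `bounded` are illustrative only and do not come from the question:

```python
import re

# Named `words` rather than `dict` to avoid shadowing the built-in.
words = {'cat': 'russiancat'}
text = 'cat caterpillar'

# Plain substring replacement also rewrites the start of 'caterpillar'.
naive = text.replace('cat', words['cat'])

# Anchoring on word boundaries only touches the standalone word.
bounded = re.sub(r'\bcat\b', words['cat'], text)

print(naive)    # russiancat russiancaterpillar
print(bounded)  # russiancat caterpillar
```

Whether the substring behaviour is a bug or a feature depends on the data, which is why an explicit decision about tokenization is worth making up front.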

7 Answers

104

Using re:

import re

s = 'Спорт not russianA'
d = {
'Спорт':'Досуг',
'russianA':'englishA'
}

pattern = re.compile(r'\b(' + '|'.join(d.keys()) + r')\b')
result = pattern.sub(lambda x: d[x.group()], s)
# Output: 'Досуг not englishA'

This will match whole words only. If you don't need that, use the pattern:

pattern = re.compile('|'.join(d.keys()))

Note that in this case you should sort the words descending by length if some of your dictionary entries are substrings of others.
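A minimal sketch of the substring variant with keys sorted by length and escaped before the pattern is assembled, assuming Python 3. The helper name `replace_all` and the sample mapping are hypothetical and not part of the answer above:

```python
import re

def replace_all(text, mapping):
    """Replace every key of `mapping` found in `text` in a single pass."""
    if not mapping:
        # An empty alternation would match the empty string everywhere.
        return text
    # Longest keys first, so that 'carpet' is tried before 'car'.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape keeps characters such as '.', '^' or '$' literal.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group()], text)

d = {'car': 'auto', 'carpet': 'rug', 'a.b': 'X'}
print(replace_all('carpet car a.b aXb', d))  # rug auto X aXb
```

Without the sort, `car` could match first inside `carpet`; without the escaping, the key `a.b` would also match `aXb`.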

Max Shawabkeh
  • In case the dictionary keys contain characters like "^", "$" and "/", the keys need to be escaped before the regular expression is assembled. To do this, `.join(d.keys())` could be replaced by `.join(re.escape(key) for key in d.keys())`. – jochen Nov 15 '12 at 18:05
  • Please note that the first example (Досуг not englishA) only works in Python 3. In Python 2 it still returns "Спорт not englishA". – 林果皞 Dec 30 '14 at 10:56
  • It seems to fail when a word in the dict contains a dot - `https://regex101.com/r/bliVUS/1` - I need to remove the `\b` at the end, but I'm not sure that's correct. – Peter.k Mar 14 '19 at 14:33
25

You could use the reduce function:

from functools import reduce  # reduce is not a built-in in Python 3
reduce(lambda x, y: x.replace(y, dict[y]), dict, s)
MvG
codeape
  • Unlike the solution by @Max Shawabkeh, `reduce` applies the substitutions one after another. As a consequence, swapping words with a dictionary such as `{'red': 'green', 'green': 'red'}` does not work with the `reduce`-based approach, and overlapping matches are transformed in an unpredictable way. – jochen Nov 15 '12 at 18:10
  • A good example of why repeated `.replace()` calls may have unintended consequences: `html.replace('"', '&quot;').replace('&', '&amp;')`; try it on `html = '"foo"'`. – zigg Jun 26 '13 at 13:07
  • This is unnecessarily complex and unreadable compared to the unfolded loop as shown in answers by [ChristopheD](https://stackoverflow.com/a/2401481/216074), or [user2769207](https://stackoverflow.com/a/18748467/216074). – poke Aug 07 '17 at 11:50
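The ordering problem described in these comments can be reproduced with a short sketch, assuming Python 3.7+ (where dictionaries keep insertion order). The names `swap`, `sequential` and `single_pass` are illustrative:

```python
import re
from functools import reduce

swap = {'red': 'green', 'green': 'red'}
text = 'red green'

# Sequential replacement: the second pass rewrites the output of the first.
sequential = reduce(lambda acc, key: acc.replace(key, swap[key]), swap, text)

# Single pass: each position of the original text is matched at most once.
pattern = re.compile('|'.join(re.escape(k) for k in swap))
single_pass = pattern.sub(lambda m: swap[m.group()], text)

print(sequential)   # red red
print(single_pass)  # green red
```

The sequential version first turns `red` into `green` and then turns every `green`, including the new one, back into `red`.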
20

Solution found here (I like its simplicity):

def multipleReplace(text, wordDict):
    for key in wordDict:
        text = text.replace(key, wordDict[key])
    return text
ChristopheD
  • Again, as @jochen described, this risks a bad translation if there is a key that is also a value. A single-pass replacement would be best. – Chris Feb 17 '13 at 16:03
5

One way, without re:

d = {
'Спорт':'Досуг',
'russianA':'englishA'
}

s = 'Спорт russianA'.split()
for n,i in enumerate(s):
    if i in d:
        s[n]=d[i]
print(' '.join(s))  # parenthesized form works in both Python 2 and 3
ghostdog74
3

Almost the same as ghostdog74's answer, though created independently. One difference: using d.get() instead of d[] handles items that are not in the dict.

>>> d = {'a':'b', 'c':'d'}
>>> s = "a c x"
>>> foo = s.split()
>>> ret = []
>>> for item in foo:
...   ret.append(d.get(item,item)) # Try to get from dict, otherwise keep value
... 
>>> " ".join(ret)
'b d x'
extraneon
1

I used this in a similar situation (my string was all in uppercase):

def translate(string, wdict):
    for key in wdict:
        string = string.replace(key, wdict[key].lower())
    return string.upper()

hope that helps in some way... :)

1

With the warning that it fails if a key contains a space, this is a compressed solution similar to ghostdog74's and extraneon's answers:

d = {
'Спорт':'Досуг',
'russianA':'englishA'
}

s = 'Спорт russianA'

' '.join(d.get(i,i) for i in s.split())
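Splitting on whitespace also misses words with attached punctuation and collapses repeated spaces when joining. One possible variation, sketched here under the assumption of Python 3 (where `\w` is Unicode-aware for `str` patterns), looks up each run of word characters instead; the sample string with punctuation is made up for illustration:

```python
import re

d = {'Спорт': 'Досуг', 'russianA': 'englishA'}
s = 'Спорт,  russianA!'

# \w+ matches runs of word characters, so punctuation and the original
# spacing are left untouched; unknown words are returned unchanged.
result = re.sub(r'\w+', lambda m: d.get(m.group(), m.group()), s)
print(result)  # Досуг,  englishA!
```

Like the split-based version, this still cannot handle keys that contain spaces.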
Anton vBR