Extract subset of key-value pairs from dictionary?

Question

I have a big dictionary object with several key-value pairs (about 16), but I am only interested in 3 of them. What is the best (shortest, most efficient, most elegant) way to extract just those?

The best I know is:

bigdict = {'a':1,'b':2,....,'z':26} 
subdict = {'l':bigdict['l'], 'm':bigdict['m'], 'n':bigdict['n']}

I am sure there is a more elegant way than this.

score 526 · Accepted Answer · edited Jun 27 '19 at 07:07

You could try:

dict((k, bigdict[k]) for k in ('l', 'm', 'n'))

... or in ~~Python 3~~ Python versions 2.7 or later (thanks to Fábio Diniz for pointing out that it works in 2.7 too):

{k: bigdict[k] for k in ('l', 'm', 'n')}

Update: As Håvard S points out, I'm assuming that you know the keys are going to be in the dictionary - see his answer if you aren't able to make that assumption. Alternatively, as timbo points out in the comments, if you want a key that's missing in bigdict to map to None, you can do:

{k: bigdict.get(k, None) for k in ('l', 'm', 'n')}

If you're using Python 3, and you only want keys in the new dict that actually exist in the original one, you can use the fact that dictionary view objects implement some set operations:

{k: bigdict[k] for k in bigdict.keys() & {'l', 'm', 'n'}}
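
The variants above differ only in how they treat a wanted key that is missing from bigdict. A minimal sketch of the four behaviours, assuming Python 3; the sample data and the names `wanted`, `strict`, `with_none`, `only_present` and `via_view` are made up for illustration and are not from the question:

```python
# Illustrative data (an assumption): 'x' is deliberately absent from bigdict.
bigdict = {'l': 12, 'm': 13, 'n': 14, 'o': 15}
wanted = ('l', 'm', 'x')

# 1. Strict: raises KeyError as soon as a wanted key is missing.
strict, missing_key = None, None
try:
    strict = {k: bigdict[k] for k in wanted}
except KeyError as exc:
    missing_key = exc.args[0]

# 2. Lenient: missing keys are kept and mapped to None.
with_none = {k: bigdict.get(k) for k in wanted}

# 3. Filtering: missing keys are dropped; key order follows `wanted`.
only_present = {k: bigdict[k] for k in wanted if k in bigdict}

# 4. Key-view intersection: same contents as 3, but the intersection is a
#    set, so the order of the keys in the result is not guaranteed.
via_view = {k: bigdict[k] for k in bigdict.keys() & set(wanted)}

print(strict, missing_key)
print(with_none)
print(only_present)
print(via_view == only_present)
```

Which one fits depends on whether a missing key is a bug to surface (1), a value to default (2), or something to skip quietly (3 and 4).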

edited Jun 27 '19 at 07:07

poorva

answered Mar 18 '11 at 13:28

Mark Longair

6

Will fail if `bigdict` does not contain `k` – Håvard S Mar 18 '11 at 13:29
@Håvard S: I think from the OP's post, we can assume that all given elements are in `bigdict`. – phimuemue Mar 18 '11 at 13:32
"or in Python 3" or "or in Python >= 2.7"? – Fábio Diniz Mar 18 '11 at 13:36
12

`{k: bigdict.get(k,None) for k in ('l', 'm', 'n')}` will deal with the situation where a specified key is missing in the source dictionary by setting key in the new dict to None – timbo Dec 21 '13 at 22:44
Thanks, @timbo - I've added that to the answer too, hope that's OK – Mark Longair Dec 22 '13 at 09:37
9

@MarkLongair Depending on the use case `{k: bigdict[k] for k in ('l','m','n') if k in bigdict}` might be better, as it only stores the keys that actually have values. – Brian Wylie Mar 07 '14 at 22:20
Upvoting this _and_ the linked answer by @HåvardS, which is exactly what I was looking for. I love it when devs cite code properly. – Michael Scheper Mar 22 '16 at 17:08
I hope my edit is not too presumptuous. @BrifordWylie's version might be better if you want to avoid an essentially undocumented feature. – Apr 02 '16 at 16:12
@hop Thanks for the addition - I made a small change just to make it clear that it only works on Python 3. – Mark Longair Apr 03 '16 at 08:03
How can I check if `['l','m','n']` is a substring of `k`? – Arjun Jun 30 '16 at 16:53
6

`bigdict.keys() & {'l', 'm', 'n'}` ==> `bigdict.viewkeys() & {'l', 'm', 'n'}` for Python2.7 – kxr Aug 25 '16 at 15:58
`{ x : bigdict[x] for x in (1, 2, 3) if x in bigdict.keys() }` to avoid `KeyError` and the `None` values. – varun Mar 29 '18 at 09:00
1

The last solution is nice because you can just replace the '&' with a `-` to get an "all keys except" operation. Unfortunately that results in a dictionary with differently ordered keys (even in python 3.7 and 3.8) – naught101 Jun 19 '20 at 01:51
1

What if my `dict` is too big? – Adamantish Mar 17 '21 at 22:12
`dict.get(k)` will return `None` by default if `k` is not found, so there is no need to pass that default explicitly as a parameter – Clint Eastwood May 14 '21 at 15:06

score 137 · Answer 2 · edited Apr 02 '16 at 16:07

A bit shorter, at least:

wanted_keys = ['l', 'm', 'n'] # The keys you want
dict((k, bigdict[k]) for k in wanted_keys if k in bigdict)

edited Apr 02 '16 at 16:07

answered Mar 18 '11 at 13:28

Håvard S

9

+1 for alternate behavior of excluding a key if it is not in bigdict as opposed to setting it to None. – dhj Jun 12 '14 at 18:35
1

Alternatively: `dict((k, bigdict.get(k, defaultVal)) for k in wanted_keys)` if you must have all keys. – Thomas Andrews May 01 '18 at 20:57
4

This answer is saved by a "t". – sakurashinken May 29 '19 at 07:42
1

Also a bit shorter variant (syntax) of your solution is when using `{}`, i.e. `{k: bigdict[k] for k in wanted_keys if k in bigdict}` – Arty Oct 04 '21 at 03:42

theheadofabroom · Answer 3 · 2014-10-28T16:04:35.280

29

interesting_keys = ('l', 'm', 'n')
subdict = {x: bigdict[x] for x in interesting_keys if x in bigdict}

edited Oct 28 '14 at 16:04

answered Mar 18 '11 at 13:29

theheadofabroom

@loutre how else do you propose to ensure you extract all the data for the given keys? – theheadofabroom Aug 10 '20 at 09:18
1

sry I made a mistake. I was thinking you were looping on "bigdict". My bad. I delete my comment – loutre Aug 12 '20 at 08:13

Sklavit · Answer 4 · 2020-07-13T13:58:20.913

27

A bit of speed comparison for all mentioned methods:

UPDATED on 2020.07.13 (thx to @user3780389): ONLY for keys from bigdict.

 IPython 5.5.0 -- An enhanced Interactive Python.
Python 2.7.18 (default, Aug  8 2019, 00:00:00) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
import numpy.random as nprnd
  ...: keys = nprnd.randint(100000, size=10000)
  ...: bigdict = dict([(_, nprnd.rand()) for _ in range(100000)])
  ...: 
  ...: %timeit {key:bigdict[key] for key in keys}
  ...: %timeit dict((key, bigdict[key]) for key in keys)
  ...: %timeit dict(map(lambda k: (k, bigdict[k]), keys))
  ...: %timeit {key:bigdict[key] for key in set(keys) & set(bigdict.keys())}
  ...: %timeit dict(filter(lambda i:i[0] in keys, bigdict.items()))
  ...: %timeit {key:value for key, value in bigdict.items() if key in keys}
100 loops, best of 3: 2.36 ms per loop
100 loops, best of 3: 2.87 ms per loop
100 loops, best of 3: 3.65 ms per loop
100 loops, best of 3: 7.14 ms per loop
1 loop, best of 3: 577 ms per loop
1 loop, best of 3: 563 ms per loop

As expected, dictionary comprehensions are the fastest option.
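
For anyone who wants to repeat the comparison on Python 3 without IPython or NumPy, here is a rough sketch using only the standard library. The sizes, the number of runs and the `candidates` table are assumptions chosen so it finishes quickly; absolute numbers will differ from the ones above and from machine to machine.

```python
import random
import timeit

# Assumed sizes, smaller than in the answer so the sketch runs quickly.
N_ITEMS, N_KEYS = 10_000, 1_000
bigdict = {i: random.random() for i in range(N_ITEMS)}
keys = [random.randrange(N_ITEMS) for _ in range(N_KEYS)]  # every key exists

# Each candidate builds the same sub-dictionary in a different way.
candidates = {
    "dict comprehension": lambda: {k: bigdict[k] for k in keys},
    "dict() + generator": lambda: dict((k, bigdict[k]) for k in keys),
    "dict() + map": lambda: dict(map(lambda k: (k, bigdict[k]), keys)),
    "key-view intersection": lambda: {k: bigdict[k] for k in bigdict.keys() & set(keys)},
}

# Sanity check first: all candidates must agree before timing means anything.
results = {name: build() for name, build in candidates.items()}

timings = {name: timeit.timeit(build, number=20) for name, build in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:24s} {seconds:.4f} s for 20 runs")
```

The two slow variants from the answer are left out on purpose: they iterate over all of bigdict and test `key in keys` against a non-set sequence, so each test is a linear scan.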

edited Jul 13 '20 at 13:58

answered Mar 29 '16 at 09:48

Sklavit

The first 3 operations are doing a different thing to the last two, and will result in an error if `key` doesn't exist in `bigdict`. – naught101 Jun 19 '20 at 01:56
2

nice. maybe worth adding `{key:bigdict[key] for key in bigdict.keys() & keys}` from the [accepted solution](https://stackoverflow.com/a/5352630/3780389) which accomplishes the filter while actually being faster (on my machine) than the first method you list which doesn't filter. In fact, `{key:bigdict[key] for key in set(keys) & set(bigdict.keys())} ` seems to be even faster for these very large sets of keys ... – teichert Jul 08 '20 at 18:30
@teichert You are missing that in the given speed comparison `bigdict.keys()` and `keys` are not sets, and with an explicit conversion to sets the accepted solution is not so fast. – Sklavit Oct 18 '21 at 13:58

score 16 · Answer 5 · answered Jul 12 '15 at 18:01

This answer uses a dictionary comprehension similar to the selected answer, but will not raise an exception on a missing item.

python 2 version:

{k:v for k, v in bigDict.iteritems() if k in ('l', 'm', 'n')}

python 3 version:

{k:v for k, v in bigDict.items() if k in ('l', 'm', 'n')}

answered Jul 12 '15 at 18:01

Meow

5

...but if the big dict is HUGE it will still be iterated over completely (this is an O(n) operation), while the inverse would just grab 3 items (each an O(1) operation). – wouter bolsterlee Oct 05 '15 at 16:08
1

The question is about a dictionary of only 16 keys – Meow Oct 06 '15 at 17:09

score 7 · Answer 6 · answered Mar 18 '11 at 13:29

Maybe:

subdict=dict([(x,bigdict[x]) for x in ['l', 'm', 'n']])

Python 3 even supports the following:

subdict={a:bigdict[a] for a in ['l','m','n']}

Note that you can check for existence in dictionary as follows:

subdict=dict([(x,bigdict[x]) for x in ['l', 'm', 'n'] if x in bigdict])

or, in Python 3:

subdict={a:bigdict[a] for a in ['l','m','n'] if a in bigdict}

answered Mar 18 '11 at 13:29

phimuemue

Fails if `a` is not in `bigdict` – Håvard S Mar 18 '11 at 13:31
the things that are said to work only in python 3, also work in 2.7 – Clint Eastwood May 14 '21 at 15:07

score 6 · Answer 7 · edited Mar 29 '20 at 11:09

You can also use map (which is a very useful function to get to know anyway):

sd = dict(map(lambda k: (k, l.get(k, None)), keys))  # l: the source dict, keys: the wanted keys

Example:

large_dictionary = {'a1':123, 'a2':45, 'a3':344}
list_of_keys = ['a1', 'a3']
small_dictionary = dict(map(lambda key: (key, large_dictionary.get(key, None)), list_of_keys))

PS: I borrowed the .get(key, None) from a previous answer :)

edited Mar 29 '20 at 11:09

petezurich

answered Feb 23 '14 at 08:03

halfdanrump

score 5 · Answer 8 · answered Sep 01 '21 at 00:04

An alternative approach for if you want to retain the majority of the keys while removing a few:

{k: bigdict[k] for k in bigdict.keys() if k not in ['l', 'm', 'n']}
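
A small sketch of the same "all keys except" idea, assuming Python 3.7+ (where dicts keep insertion order); the sample data and the names `excluded`, `kept_in_order` and `kept_via_view` are illustrative. It also shows why the key-view difference mentioned in the comments on the accepted answer can reorder keys.

```python
bigdict = {'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15}  # illustrative data
excluded = {'l', 'm', 'n'}  # a set makes each membership test O(1) on average

# Iterating over bigdict itself keeps its original insertion order.
kept_in_order = {k: v for k, v in bigdict.items() if k not in excluded}

# Key-view difference gives the same keys, but as a set, so the order in
# which they end up in the new dict is not guaranteed.
kept_via_view = {k: bigdict[k] for k in bigdict.keys() - excluded}

print(kept_in_order)                   # {'k': 11, 'o': 15}
print(kept_via_view == kept_in_order)  # True (dict equality ignores order)
```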

answered Sep 01 '21 at 00:04

Kevin Grimm

4

Even shorter: `{k: v for k, v in bigdict.items() if k not in ['l', 'm', 'n']}` – pierresegonne Oct 11 '21 at 13:42

pandamonium · Answer 9 · 2015-03-11T12:21:17.483

Okay, this is something that has bothered me a few times, so thank you Jayesh for asking it.

The answers above seem like as good a solution as any, but if you are using this all over your code, it makes sense to wrap the functionality, IMHO. Also, there are two possible use cases here: one where you care about whether all keywords are in the original dictionary, and one where you don't. It would be nice to treat both equally.

So, for my two-penneth worth, I suggest writing a sub-class of dictionary, e.g.

class my_dict(dict):
    def subdict(self, keywords, fragile=True):
        d = {}
        for k in keywords:
            try:
                d[k] = self[k]
            except KeyError:
                if fragile:
                    raise
        return d

Now you can pull out a sub-dictionary with

orig_dict.subdict(keywords)

Usage examples:

#
## our keywords are letters of the alphabet
keywords = 'abcdefghijklmnopqrstuvwxyz'
#
## our dictionary maps letters to their index
d = my_dict([(k,i) for i,k in enumerate(keywords)])
print('Original dictionary:\n%r\n\n' % (d,))
#
## constructing a sub-dictionary with good keywords
oddkeywords = keywords[::2]
subd = d.subdict(oddkeywords)
print('Dictionary from odd numbered keys:\n%r\n\n' % (subd,))
#
## constructing a sub-dictionary with mixture of good and bad keywords
somebadkeywords = keywords[1::2] + 'A'
try:
    subd2 = d.subdict(somebadkeywords)
    print("We shouldn't see this message")
except KeyError:
    print("subd2 construction fails:")
    print("\toriginal dictionary doesn't contain some keys\n\n")
#
## Trying again with fragile set to false
try:
    subd3 = d.subdict(somebadkeywords, fragile=False)
    print('Dictionary constructed using some bad keys:\n%r\n\n' % (subd3,))
except KeyError:
    print("We shouldn't see this message")

If you run all the above code, you should see (something like) the following output (sorry for the formatting):

Original dictionary:
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'k': 10, 'j': 9, 'm': 12, 'l': 11, 'o': 14, 'n': 13, 'q': 16, 'p': 15, 's': 18, 'r': 17, 'u': 20, 't': 19, 'w': 22, 'v': 21, 'y': 24, 'x': 23, 'z': 25}

Dictionary from odd numbered keys:
{'a': 0, 'c': 2, 'e': 4, 'g': 6, 'i': 8, 'k': 10, 'm': 12, 'o': 14, 'q': 16, 's': 18, 'u': 20, 'w': 22, 'y': 24}

subd2 construction fails:
original dictionary doesn't contain some keys

Dictionary constructed using some bad keys:
{'b': 1, 'd': 3, 'f': 5, 'h': 7, 'j': 9, 'l': 11, 'n': 13, 'p': 15, 'r': 17, 't': 19, 'v': 21, 'x': 23, 'z': 25}

Subclassing requires an existing dict object to be converted into the subclass type, which can be expensive. Why not just write a simple function `subdict(orig_dict, keys, …)`? — musiphil, Jul 17 '15 at 17:43
@musiphil: I doubt there's much difference in overhead. The nice thing about subclassing is the method is part of the class and doesn't need to be imported or in-lined. Only potential problem or limitation of the code in this answer is the result is *not* of type `my_dict`. — martineau, Sep 14 '21 at 21:37
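
musiphil's suggestion of a plain function could look roughly like the sketch below. The comment leaves the rest of the signature open, so the `fragile` parameter (borrowed from the answer's method) and its default are assumptions here, not something the comment specifies.

```python
def subdict(orig_dict, keys, fragile=True):
    """Return a new plain dict holding only `keys` taken from `orig_dict`.

    With fragile=True a missing key raises KeyError; with fragile=False
    missing keys are skipped silently. (Parameter name and default are
    assumptions made for this sketch.)
    """
    if fragile:
        return {k: orig_dict[k] for k in keys}
    return {k: orig_dict[k] for k in keys if k in orig_dict}


letters = {'a': 0, 'b': 1, 'c': 2}               # illustrative data
print(subdict(letters, 'ac'))                    # {'a': 0, 'c': 2}
print(subdict(letters, 'aZ', fragile=False))     # {'a': 0}
```

This works on any existing dict without converting it to a subclass first, at the cost of having to import the function where it is used.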

score 2 · Answer 10 · answered Apr 05 '13 at 08:13

Yet another one (I prefer Mark Longair's answer)

di = {'a':1,'b':2,'c':3}
req = ['a','c','w']
dict([i for i in di.iteritems() if i[0] in di and i[0] in req])

answered Apr 05 '13 at 08:13

georg

It's slow for **big**dicts – kxr Jan 28 '16 at 08:10

score 2 · Answer 11 · answered May 24 '20 at 21:10

solution

from operator import itemgetter
from typing import List, Dict, Union


def subdict(d: Union[Dict, List], columns: List[str]) -> Union[Dict, List[Dict]]:
    """Return a dict or list of dicts with subset of 
    columns from the d argument.
    """
    getter = itemgetter(*columns)

    if isinstance(d, list):
        result = []
        for subset in map(getter, d):
            record = dict(zip(columns, subset))
            result.append(record)
        return result
    elif isinstance(d, dict):
        return dict(zip(columns, getter(d)))

    raise ValueError('Unsupported type for `d`')

examples of use

# pure dict

d = dict(a=1, b=2, c=3)
print(subdict(d, ['a', 'c']))

>>> In [5]: {'a': 1, 'c': 3}

# list of dicts

d = [
    dict(a=1, b=2, c=3),
    dict(a=2, b=4, c=6),
    dict(a=4, b=8, c=12),
]

print(subdict(d, ['a', 'c']))

>>> In [5]: [{'a': 1, 'c': 3}, {'a': 2, 'c': 6}, {'a': 4, 'c': 12}]
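
One edge case worth checking before relying on the function above: `operator.itemgetter` returns a bare value rather than a tuple when it is given exactly one key (and it needs at least one key to be constructed), so `zip(columns, getter(d))` only behaves as intended for two or more columns. A sketch of a dict-only variant that sidesteps this; the name `subdict_simple` is hypothetical:

```python
from operator import itemgetter

d = dict(a=1, b=2, c=3)

# With two or more keys itemgetter returns a tuple...
assert itemgetter('a', 'c')(d) == (1, 3)
# ...but with a single key it returns the bare value, which zip() cannot iterate.
assert itemgetter('a')(d) == 1


def subdict_simple(d, columns):
    """Hypothetical dict-only variant: works for zero, one or many columns."""
    return {column: d[column] for column in columns}


print(subdict_simple(d, ['a']))       # {'a': 1}
print(subdict_simple(d, ['a', 'c']))  # {'a': 1, 'c': 3}
print(subdict_simple(d, []))          # {}
```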

score 1 · Answer 12 · answered Nov 11 '20 at 10:08

Using map (halfdanrump's answer) is best for me, though haven't timed it...

But if you go for a dictionary, and if you have a big_dict:

Make absolutely certain you loop through the req. This is crucial, and it affects the running time of the algorithm (big O, theta, you name it).
Write it generically enough to avoid errors if keys are not there.

so e.g.:

big_dict = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w']

{k: big_dict.get(k, None) for k in req}
# or
{k: big_dict[k] for k in req if k in big_dict}

Note that in the converse case, where req is big but big_dict is small, you should loop through big_dict instead.

In general, we are doing an intersection, and the complexity of the problem is O(min(len(big_dict), len(req))). Python's own implementation of intersection considers the size of the two sets, so it seems optimal. Also, being written in C and part of the core library, it is probably faster than most unoptimized Python statements. Therefore, a solution that I would consider is:

big_dict = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w',...................]

{k: big_dict[k] for k in set(req).intersection(big_dict.keys())}

It moves the critical operation inside Python's C code and will work for all cases.
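
The "loop over the smaller side" advice can also be made explicit in plain Python. The helper below is a sketch with a hypothetical name, not a measured recommendation. Whether `set.intersection` itself picks the smaller side when it is handed a key view rather than a real set is an implementation detail not verified here, which is one reason to spell the choice out.

```python
def subdict_by_smaller_side(big_dict, req):
    """Hypothetical helper: iterate over whichever side has fewer elements."""
    # Building the set already costs O(len(req)); it pays off when the same
    # `req` is reused or when big_dict is the smaller side.
    req = set(req)
    if len(req) <= len(big_dict):
        # Few wanted keys: one average-case O(1) dict lookup per wanted key.
        return {k: big_dict[k] for k in req if k in big_dict}
    # Few dict entries: one average-case O(1) set lookup per dict entry.
    return {k: v for k, v in big_dict.items() if k in req}


big_dict = {'a': 1, 'b': 2, 'c': 3}   # illustrative data
print(subdict_by_smaller_side(big_dict, ['a', 'c', 'w']))
print(subdict_by_smaller_side(big_dict, 'abcdefghijklmnopqrstuvwxyz'))
```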

score -1 · Answer 13 · answered Aug 26 '21 at 07:04

In case someone wants the first n items of the dictionary without knowing the keys:

n = 5 # First Five Items
ks = [*dikt.keys()][:n]
less_dikt = {i: dikt[i] for i in ks}
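
A possible alternative, assuming Python 3.7+ so that "first" means insertion order: `itertools.islice` stops after n items, so the full list of keys never has to be built. The sample `dikt` is illustrative.

```python
from itertools import islice

dikt = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7}  # illustrative
n = 5  # first five items

# islice yields at most n (key, value) pairs and then stops iterating.
less_dikt = dict(islice(dikt.items(), n))
print(less_dikt)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
```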

answered Aug 26 '21 at 07:04
