186

I have 2 CSV files: 'Data' and 'Mapping':

  • 'Mapping' file has 4 columns: Device_Name, GDN, Device_Type, and Device_OS. All four columns are populated.
  • 'Data' file has these same columns, with Device_Name column populated and the other three columns blank.
  • I want my Python code to open both files and for each Device_Name in the Data file, map its GDN, Device_Type, and Device_OS value from the Mapping file.

I know how to use dict when only 2 columns are present (1 is needed to be mapped) but I don't know how to accomplish this when 3 columns need to be mapped.

Following is the code using which I tried to accomplish mapping of Device_Type:

x = dict([])
with open("Pricing Mapping_2013-04-22.csv", "rb") as in_file1:
    file_map = csv.reader(in_file1, delimiter=',')
    for row in file_map:
       typemap = [row[0],row[2]]
       x.append(typemap)

with open("Pricing_Updated_Cleaned.csv", "rb") as in_file2, open("Data Scraper_GDN.csv", "wb") as out_file:
    writer = csv.writer(out_file, delimiter=',')
    for row in csv.reader(in_file2, delimiter=','):
         try:
              row[27] = x[row[11]]
         except KeyError:
              row[27] = ""
         writer.writerow(row)

It returns Attribute Error.

After some researching, I think I need to create a nested dict, but I don't have any idea how to do this.

smci
  • 29,564
  • 18
  • 109
  • 144
atams
  • 2,559
  • 3
  • 15
  • 14
  • `Device_Name` column is the key in both files, on this key I want to map Device_OS, GDN & Device_Type values from mapping file to data file. – atams May 02 '13 at 08:36
  • Do you want to be able to do something like `row[27] = x[row[11]]["Device_OS"]`? – Janne Karila May 02 '13 at 09:49
  • **See also:** [search for key in nested dict](https://stackoverflow.com/questions/7681301/search-for-a-key-in-a-nested-python-dictionary) -- [python dpath](https://stackoverflow.com/a/16508328/42223) – dreftymac Oct 30 '17 at 20:16
  • This doesn't need a nested dict, necessarily. You could use pandas, read_csv, make `Device_Name` the index, then you can directly `join` the two dataframes on their index `Device_Name`. – smci Dec 11 '19 at 07:30

10 Answers10

386

A nested dict is a dictionary within a dictionary. A very simple thing.

>>> d = {}
>>> d['dict1'] = {}
>>> d['dict1']['innerkey'] = 'value'
>>> d['dict1']['innerkey2'] = 'value2'
>>> d
{'dict1': {'innerkey': 'value', 'innerkey2': 'value2'}}

You can also use a defaultdict from the collections package to facilitate creating nested dictionaries.

>>> import collections
>>> d = collections.defaultdict(dict)
>>> d['dict1']['innerkey'] = 'value'
>>> d  # currently a defaultdict type
defaultdict(<type 'dict'>, {'dict1': {'innerkey': 'value'}})
>>> dict(d)  # but is exactly like a normal dictionary.
{'dict1': {'innerkey': 'value'}}

You can populate that however you want.

I would recommend in your code something like the following:

d = {}  # can use defaultdict(dict) instead

for row in file_map:
    # derive row key from something 
    # when using defaultdict, we can skip the next step creating a dictionary on row_key
    d[row_key] = {} 
    for idx, col in enumerate(row):
        d[row_key][idx] = col

According to your comment:

may be above code is confusing the question. My problem in nutshell: I have 2 files a.csv b.csv, a.csv has 4 columns i j k l, b.csv also has these columns. i is kind of key columns for these csvs'. j k l column is empty in a.csv but populated in b.csv. I want to map values of j k l columns using 'i` as key column from b.csv to a.csv file

My suggestion would be something like this (without using defaultdict):

a_file = "path/to/a.csv"
b_file = "path/to/b.csv"

# read from file a.csv
with open(a_file) as f:
    # skip headers
    f.next()
    # get first colum as keys
    keys = (line.split(',')[0] for line in f) 

# create empty dictionary:
d = {}

# read from file b.csv
with open(b_file) as f:
    # gather headers except first key header
    headers = f.next().split(',')[1:]
    # iterate lines
    for line in f:
        # gather the colums
        cols = line.strip().split(',')
        # check to make sure this key should be mapped.
        if cols[0] not in keys:
            continue
        # add key to dict
        d[cols[0]] = dict(
            # inner keys are the header names, values are columns
            (headers[idx], v) for idx, v in enumerate(cols[1:]))

Please note though, that for parsing csv files there is a csv module.

Inbar Rose
  • 39,034
  • 24
  • 81
  • 124
  • may be above code is confusing the question. My problem in nutshell: I have 2 files `a.csv` `b.csv`, `a.csv` has 4 columns `i j k l`, `b.csv` also has these columns. `i` is kind of key columns for these csvs'. `j k l` column is empty in `a.csv` but populated in `b.csv`. I want to map values of `j k l` columns using 'i` as key column from b.csv to a.csv file. – atams May 02 '13 at 12:04
71

UPDATE: For an arbitrary length of a nested dictionary, go to this answer.

Use the defaultdict function from the collections.

High performance: "if key not in dict" is very expensive when the data set is large.

Low maintenance: make the code more readable and can be easily extended.

from collections import defaultdict

target_dict = defaultdict(dict)
target_dict[key1][key2] = val
Junchen
  • 1,723
  • 2
  • 18
  • 25
33

For arbitrary levels of nestedness:

In [2]: def nested_dict():
   ...:     return collections.defaultdict(nested_dict)
   ...:

In [3]: a = nested_dict()

In [4]: a
Out[4]: defaultdict(<function __main__.nested_dict>, {})

In [5]: a['a']['b']['c'] = 1

In [6]: a
Out[6]:
defaultdict(<function __main__.nested_dict>,
            {'a': defaultdict(<function __main__.nested_dict>,
                         {'b': defaultdict(<function __main__.nested_dict>,
                                      {'c': 1})})})
andrew
  • 1,753
  • 19
  • 18
  • 4
    What the above answer does with a two-line function, you can also do with a one-line lambda, as in [this answer](https://stackoverflow.com/a/8702435/832230). – Asclepius Jun 03 '17 at 21:58
4

It is important to remember when using defaultdict and similar nested dict modules such as nested_dict, that looking up a nonexistent key may inadvertently create a new key entry in the dict and cause a lot of havoc.

Here is a Python3 example with nested_dict module:

import nested_dict as nd
nest = nd.nested_dict()
nest['outer1']['inner1'] = 'v11'
nest['outer1']['inner2'] = 'v12'
print('original nested dict: \n', nest)
try:
    nest['outer1']['wrong_key1']
except KeyError as e:
    print('exception missing key', e)
print('nested dict after lookup with missing key.  no exception raised:\n', nest)

# Instead, convert back to normal dict...
nest_d = nest.to_dict(nest)
try:
    print('converted to normal dict. Trying to lookup Wrong_key2')
    nest_d['outer1']['wrong_key2']
except KeyError as e:
    print('exception missing key', e)
else:
    print(' no exception raised:\n')

# ...or use dict.keys to check if key in nested dict
print('checking with dict.keys')
print(list(nest['outer1'].keys()))
if 'wrong_key3' in list(nest.keys()):

    print('found wrong_key3')
else:
    print(' did not find wrong_key3')

Output is:

original nested dict:   {"outer1": {"inner2": "v12", "inner1": "v11"}}

nested dict after lookup with missing key.  no exception raised:  
{"outer1": {"wrong_key1": {}, "inner2": "v12", "inner1": "v11"}} 

converted to normal dict. 
Trying to lookup Wrong_key2 

exception missing key 'wrong_key2' 

checking with dict.keys 

['wrong_key1', 'inner2', 'inner1']  
did not find wrong_key3
smci
  • 29,564
  • 18
  • 109
  • 144
Gerard G
  • 141
  • 2
  • 4
3
pip install addict
from addict import Dict

mapping = Dict()
mapping.a.b.c.d.e = 2
print(mapping)  # {'a': {'b': {'c': {'d': {'e': 2}}}}}

References:

  1. easydict GitHub
  2. addict GitHub
XerCis
  • 399
  • 2
  • 4
0

If you want to create a nested dictionary given a list (arbitrary length) for a path and perform a function on an item that may exist at the end of the path, this handy little recursive function is quite helpful:

def ensure_path(data, path, default=None, default_func=lambda x: x):
    """
    Function:

    - Ensures a path exists within a nested dictionary

    Requires:

    - `data`:
        - Type: dict
        - What: A dictionary to check if the path exists
    - `path`:
        - Type: list of strs
        - What: The path to check

    Optional:

    - `default`:
        - Type: any
        - What: The default item to add to a path that does not yet exist
        - Default: None

    - `default_func`:
        - Type: function
        - What: A single input function that takes in the current path item (or default) and adjusts it
        - Default: `lambda x: x` # Returns the value in the dict or the default value if none was present
    """
    if len(path)>1:
        if path[0] not in data:
            data[path[0]]={}
        data[path[0]]=ensure_path(data=data[path[0]], path=path[1:], default=default, default_func=default_func)
    else:
        if path[0] not in data:
            data[path[0]]=default
        data[path[0]]=default_func(data[path[0]])
    return data

Example:

data={'a':{'b':1}}
ensure_path(data=data, path=['a','c'], default=[1])
print(data) #=> {'a':{'b':1, 'c':[1]}}
ensure_path(data=data, path=['a','c'], default=[1], default_func=lambda x:x+[2])
print(data) #=> {'a': {'b': 1, 'c': [1, 2]}}
conmak
  • 867
  • 7
  • 12
0

This thing is empty nested list from which ne will append data to empty dict

ls = [['a','a1','a2','a3'],['b','b1','b2','b3'],['c','c1','c2','c3'], 
['d','d1','d2','d3']]

this means to create four empty dict inside data_dict

data_dict = {f'dict{i}':{} for i in range(4)}
for i in range(4):
    upd_dict = {'val' : ls[i][0], 'val1' : ls[i][1],'val2' : ls[i][2],'val3' : ls[i][3]}

    data_dict[f'dict{i}'].update(upd_dict)

print(data_dict)

The output

{'dict0': {'val': 'a', 'val1': 'a1', 'val2': 'a2', 'val3': 'a3'}, 'dict1': {'val': 'b', 'val1': 'b1', 'val2': 'b2', 'val3': 'b3'},'dict2': {'val': 'c', 'val1': 'c1', 'val2': 'c2', 'val3': 'c3'}, 'dict3': {'val': 'd', 'val1': 'd1', 'val2': 'd2', 'val3': 'd3'}}

Shah Vipul
  • 404
  • 4
  • 9
0
#in jupyter
import sys
!conda install -c conda-forge --yes --prefix {sys.prefix} nested_dict 
import nested_dict as nd
d = nd.nested_dict()

'd' can be used now to store the nested key value pairs.

0
travel_log = {
    "France" : {"cities_visited" : ["paris", "lille", "dijon"], "total_visits" : 10},
    "india" : {"cities_visited" : ["Mumbai", "delhi", "surat",], "total_visits" : 12}
}
4b0
  • 20,627
  • 30
  • 92
  • 137
  • 3
    Please add some details why this answer is better than the ones previously given. What's wrong with the other answers that prompted you to post yours? Giving context to an answer is the best way to help future readers. – 4b0 Mar 26 '21 at 08:51
0

You can initialize an empty NestedDict and then assign values to new keys.

from ndicts.ndicts import NestedDict

nd = NestedDict()
nd["level1", "level2", "level3"] = 0
>>> nd
NestedDict({'level1': {'level2': {'level3': 0}}})

ndicts is on Pypi

pip install ndicts
edd313
  • 503
  • 3
  • 11