My dictionary looks like this:

docScores = {0:[{u'word':2.3},{u'the':8.7},{u'if':4.1},{u'Car':1.7}],
             1:[{u'friend':1.2},{u'a':5.2},{u'you':3.8},{u'person':0.8}],
             ...
             29:[{u'yard':1.5},{u'gardening':2.8},{u'paint':3.7},{u'brush':1.6}]
            }

I want to sum the values of the inner dicts in each list and store the totals in a new dict keyed by the same document numbers, i.e. {0: 2.3+8.7+4.1+1.7, 1: 1.2+5.2+3.8+0.8, ...}. Roughly:

for x in docScores[0]: #{0:
    for x in docScores[0][0].values(): #{,2.3}.
        sum = sum+x #where sum = 0 before loop
        docSum[0] = sum
    repeat this loop for every document

Any variation that I have tried is giving me unexpected outputs. Can anyone give me the correct syntax for this?

adohertyd
  • Can you post what you expect the result to be for this? Do you want `{0: 2.3+8.7+4.1+1.7, 1: 1.2+5.2+3.8+0.8, ... 29: 1.5+2.8+3.7+1.6}`? – mgilson Aug 03 '12 at 14:12
  • Yes, that's exactly it. I'll edit my question accordingly. – adohertyd Aug 03 '12 at 14:13
  • This calls for one amazing list comprehension... It's a pity that I don't have time to write it now. – BrtH Aug 03 '12 at 14:13
  • Why the list of `dict`s, each holding a single element? That looks like a terrible waste of memory. – Fred Foo Aug 03 '12 at 14:15
  • @larsmans I'm still learning python and dicts have been my biggest stumbling block. I'm working on a large program and to get to the point I'm at now it was the only way I could get the values as they are. What would you recommend as an alternative that I could look into? – adohertyd Aug 03 '12 at 14:17
  • @adohertyd: instead of `[{u'word':2.3},{u'the':8.7}]`, I would suggest `{u'word':2.3, u'the':8.7}`. I.e., a single `dict` per document. – Fred Foo Aug 03 '12 at 14:20
  • @larsmans is *exactly* right. What's more, instead of each integer having its own key in the dictionary, just have it as a list. So instead of a dict of lists of dicts, you'll just have a list of dicts! – David Robinson Aug 03 '12 at 14:26

5 Answers

This dict comprehension works:

docScores = {0:[{u'word':2.3},{u'the':8.7},{u'if':4.1},{u'Car':1.7}],
             1:[{u'friend':1.2},{u'a':5.2},{u'you':3.8},{u'person':0.8}],
             29:[{u'yard':1.5},{u'gardening':2.8},{u'paint':3.7},{u'brush':1.6}]
            }

sum_d={k:sum(d.values()[0] for d in v) for k,v in docScores.items()}

print sum_d

Prints:

{0: 16.8, 1: 11.0, 29: 9.6}
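
On Python 3, `d.values()` returns a view that cannot be indexed, so the line above needs a small change. A minimal sketch of one way to write it, assuming it is fine to sum every value of each inner dict rather than only the first (the name `doc_scores` is just a renamed copy of the sample data):

```python
# Python 3 sketch of the same per-document sum.
doc_scores = {
    0: [{'word': 2.3}, {'the': 8.7}, {'if': 4.1}, {'Car': 1.7}],
    1: [{'friend': 1.2}, {'a': 5.2}, {'you': 3.8}, {'person': 0.8}],
    29: [{'yard': 1.5}, {'gardening': 2.8}, {'paint': 3.7}, {'brush': 1.6}],
}

# For each document, add up every value of every inner dict.
sum_d = {
    doc_id: sum(score for entry in entries for score in entry.values())
    for doc_id, entries in doc_scores.items()
}

# Round only for display; float addition can leave tiny rounding errors.
print({doc_id: round(total, 1) for doc_id, total in sum_d.items()})
```

Iterating over `entry.values()` instead of indexing also keeps working if an inner dict ever holds more than one word.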

However, changing your data structure may be easier. You could have a dict of dicts:

>>> NdocScores = {0:{u'word':2.3,u'the':8.7,u'if':4.1,u'Car':1.7},
...              1:{u'friend':1.2,u'a':5.2,u'you':3.8,u'person':0.8},
...              29:{u'yard':1.5,u'gardening':2.8,u'paint':3.7,u'brush':1.6}
...             }   

This allows each doc's data to be accessed directly:

>>> NdocScores[0]
{u'Car': 1.7, u'the': 8.7, u'word': 2.3, u'if': 4.1}
>>> NdocScores[0][u'Car']
1.7
>>> sum(NdocScores[1].values())
11.0

>>> NdocScores[29]
{u'gardening': 2.8, u'yard': 1.5, u'brush': 1.6, u'paint': 3.7}

Or, just have a list of dicts with the position in the list corresponding to the doc index:

>>> lofdicts=[v for k,v in sorted(NdocScores.items())]
>>> lofdicts
[{u'Car': 1.7, u'the': 8.7, u'word': 2.3, u'if': 4.1}, {u'a': 5.2, u'person': 0.8, u'you': 3.8, u'friend': 1.2}, {u'gardening': 2.8, u'yard': 1.5, u'brush': 1.6, u'paint': 3.7}]
>>> lofdicts[0]
{u'Car': 1.7, u'the': 8.7, u'word': 2.3, u'if': 4.1}
>>> sum(lofdicts[1].values())
11.0
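
If the data already exists in the original dict-of-lists-of-dicts form, it can be converted rather than rebuilt. A rough sketch, assuming each word appears at most once per document (the names `flat_scores` and `totals` are made up for illustration):

```python
# Hypothetical conversion from the question's layout to one dict per document.
doc_scores = {
    0: [{'word': 2.3}, {'the': 8.7}, {'if': 4.1}, {'Car': 1.7}],
    1: [{'friend': 1.2}, {'a': 5.2}, {'you': 3.8}, {'person': 0.8}],
    29: [{'yard': 1.5}, {'gardening': 2.8}, {'paint': 3.7}, {'brush': 1.6}],
}

# Merge each document's single-entry dicts into one dict.
# Assumption: a repeated word would silently keep only its last score.
flat_scores = {}
for doc_id, entries in doc_scores.items():
    merged = {}
    for entry in entries:
        merged.update(entry)
    flat_scores[doc_id] = merged

# With one dict per document, the per-document sum is a single call.
totals = {doc_id: sum(words.values()) for doc_id, words in flat_scores.items()}
```

If words can repeat within a document, the merge step would need to add scores instead of overwriting them.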
dawg
  • Why `d[d.keys()[0]]` instead of `d.values()[0]`? – David Robinson Aug 03 '12 at 14:24
  • Note that this solution only works on Python 2.7. Before 2.7, dict comprehensions didn't exist; in Python 3, `dict.values()` no longer returns an indexable object and `dict.iteritems()` no longer exists. – mgilson Aug 03 '12 at 14:35
2
new_dict={}

docScores = {0:[{u'word':2.3},{u'the':8.7},{u'if':4.1},{u'Car':1.7}],
             1:[{u'friend':1.2},{u'a':5.2},{u'you':3.8},{u'person':0.8}],
             29:[{u'yard':1.5},{u'gardening':2.8},{u'paint':3.7},{u'brush':1.6}]
            }

for k,v in docScores.items():
    new_dict[k]=sum( sum(d.values()) for d in v )

print (new_dict) #{0: 16.8, 1: 11.0, 29: 9.6}

As others have mentioned, you could make this into a dictionary comprehension (python 2.7+):

new_dict = {k : sum( sum(d.values()) for d in v ) for k,v in docScores.items() }

But at this point I think that the comprehension is getting very difficult to comprehend (and therefore I wouldn't do it).

Also, someone should probably point out that if all your dictionary keys are sequential integers starting from 0 and going to 29, you probably shouldn't be using a dictionary to store this data -- a list may be more appropriate.

EDIT

using a list:

new_list = [sum( sum(d.values()) for d in v ) for _,v in sorted(docScores.items()) ]
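
One caveat: with the sorted-items approach, a list position matches the document number only when the keys run from 0 upward without gaps. A small sketch for data with gaps (like the trimmed sample with keys 0, 1 and 29), assuming a missing document should simply get a total of 0.0:

```python
doc_scores = {
    0: [{'word': 2.3}, {'the': 8.7}, {'if': 4.1}, {'Car': 1.7}],
    1: [{'friend': 1.2}, {'a': 5.2}, {'you': 3.8}, {'person': 0.8}],
    29: [{'yard': 1.5}, {'gardening': 2.8}, {'paint': 3.7}, {'brush': 1.6}],
}

# One slot per document number from 0 to the largest key.
# Assumption: documents that are absent get a total of 0.0.
totals = [0.0] * (max(doc_scores) + 1)
for doc_id, entries in doc_scores.items():
    totals[doc_id] = sum(sum(d.values()) for d in entries)
```

Here `totals[29]` holds the sum for document 29, and the unused slots in between stay at 0.0.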
mgilson
  • I'm using Python 2.7 if that's going to make a difference? – adohertyd Aug 03 '12 at 14:18
  • @adohertyd -- I've updated my answer to be python2 and python3 compatible. – mgilson Aug 03 '12 at 14:20
  • @adohertyd -- one new update `sum(d[kk] for kk in d)` was stupid -- I don't know why I used that ... `sum(d.values())` is much better. – mgilson Aug 03 '12 at 14:21
  • thank you works great. How would I transform this to a list instead as you recommend? I'm more comfortable working with lists anyway – adohertyd Aug 03 '12 at 14:25
  • @adohertyd: You should generate the original data as a list rather than create it as a dict and transform it. However, `[docScores[i] for i in range(0, max(docScores.keys()) + 1)]` would do the trick. – David Robinson Aug 03 '12 at 14:27
  • @adohertyd -- I've updated the solution to use a list. Note that to transform a dict with your structure to a list, `[v for _,v in sorted(your_dict.items())]` will do the trick. – mgilson Aug 03 '12 at 14:30
1
>>> doc_scores = {
...     0: [{u'word': 2.3}, {u'the': 8.7}, {u'if': 4.1}, {u'Car': 1.7}],
...     1: [{u'friend': 1.2}, {u'a': 5.2}, {u'you': 3.8}, {u'person': 0.8}],
...     29: [{u'yard': 1.5}, {u'gardening': 2.8}, {u'paint': 3.7}, {u'brush': 1.6}]
... }
>>> dict((k, sum(n for d in v for n in d.itervalues()))
...      for k, v in doc_scores.iteritems())
{0: 16.8, 1: 11.0, 29: 9.6}

If you only have one value in each of the dicts in the lists you can shorten this:

>>> dict((k, sum(d.values()[0] for d in v)) for k, v in doc_scores.iteritems())
{0: 16.8, 1: 11.0, 29: 9.6}
jamylak
1

One more one-liner:

sum(reduce(lambda x, y: x+y, [d.values() for _,v in docScores.iteritems() for d in v]))
Denis
  • Oops, I did not read the problem correctly: this code sums all the coefficients across the whole structure, not per document. – Denis Aug 03 '12 at 14:22
0
docScores = {0:[{u'word':2.3},{u'the':8.7},{u'if':4.1},{u'Car':1.7}],
             1:[{u'friend':1.2},{u'a':5.2},{u'you':3.8},{u'person':0.8}],
             2:[{u'yard':1.5},{u'gardening':2.8},{u'paint':3.7},{u'brush':1.6}]
            }


result = dict(enumerate(sum(sum(word.values()) for word in word_list) for _, word_list in sorted(docScores.items())))
jsbueno