Python: Elegantly merge dictionaries with sum() of values

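I'm trying to merge logs from several servers. Each log is a list of tuples (date, count). A date may appear more than once, and I want the resulting dictionary to hold the sum of all counts from all servers.

Here's my attempt, with some data for example: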

from collections import defaultdict

a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
input=[a,b,c]

output=defaultdict(int)
for d in input:
    for item in d:
        output[item[0]]+=item[1]
print(dict(output))

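Which gives:

{'14.5': 100, '16.5': 100, '13.5': 100, '15.5': 200}

As expected.

I'm about to go bananas because of a colleague who saw this code. She insists that there must be a more Pythonic and elegant way to do it, without these nested for loops. Any ideas?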

It doesn't get much simpler than this, I think:

a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
input=[a,b,c]

from collections import Counter

print(sum(
    (Counter(dict(x)) for x in input),
    Counter()))

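Note that Counter (also known as a multiset) is the most natural data structure for your data: a type of set to which elements can belong more than once, or equivalently a map with the semantics Element -> OccurrenceCount. You could have used it in the first place, instead of lists of tuples.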
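As a minimal sketch of using a Counter from the start, the pairs can be added straight into one Counter instead of going through dict(x). The names logs and totals and the sample data are illustrative, not from the answer. One side effect worth noting: a date that repeats inside a single log is summed here, whereas dict(x) keeps only the last pair for that date.

```python
from collections import Counter

# Illustrative data; the second log repeats the date "15.5".
logs = [
    [("13.5", 100)],
    [("14.5", 100), ("15.5", 100), ("15.5", 50)],
]

totals = Counter()
for log in logs:
    for date, count in log:
        totals[date] += count  # a missing key counts as 0

# dict() keeps only the last pair for a repeated key.
via_dict = Counter(dict(logs[1]))

print(dict(totals))
print(dict(via_dict))
```

If dates never repeat within one server's log, both routes give the same totals.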

Also possible:

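from collections import Counter
from functools import reduce  # reduce is no longer a builtin in Python 3
from operator import add

# input is the list [a, b, c] defined above
print(reduce(add, (Counter(dict(x)) for x in input)))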

Using reduce(add, seq) instead of sum(seq, initialValue) is generally more flexible and lets you skip passing the redundant initial value.
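A small sketch of why the initial value matters for sum() and where reduce() has its own limit. The variable names are illustrative. sum() starts from the integer 0 by default, and adding 0 to a Counter is not supported; reduce() needs no start value, but without one it raises TypeError on an empty sequence.

```python
from collections import Counter
from functools import reduce
from operator import add

counters = [Counter(a=1), Counter(a=2, b=3)]

# Without a start value, sum() tries 0 + Counter and fails.
sum_failed = False
try:
    sum(counters)
except TypeError:
    sum_failed = True

with_start = sum(counters, Counter())   # explicit empty Counter as start
with_reduce = reduce(add, counters)     # no start value needed

# reduce() with no initial value cannot handle an empty sequence.
reduce_failed = False
try:
    reduce(add, [])
except TypeError:
    reduce_failed = True

print(with_start, with_reduce, sum_failed, reduce_failed)
```

So if the list of logs might be empty, passing Counter() as the third argument to reduce is the safer choice.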

Note that you could also use operator.and_ to find the intersection of the multisets instead of their sum.
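A short sketch of the intersection variant, with made-up data. For Counters, & keeps only the keys present in both operands and takes the minimum of the two counts.

```python
from collections import Counter
from functools import reduce
from operator import and_

# Illustrative logs; "17.5" appears on one server only.
logs = [
    [("15.5", 100), ("16.5", 100)],
    [("15.5", 70), ("16.5", 120), ("17.5", 5)],
]

common = reduce(and_, (Counter(dict(log)) for log in logs))
print(dict(common))
```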

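The variants above are terribly slow, because a new Counter is created at every step. Let's fix that.

We know that Counter + Counter returns a new Counter with the merged data. That is fine, but we want to avoid creating the extra objects. Let's use Counter.update instead: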

update(self, iterable=None, **kwds) unbound collections.Counter method

Like dict.update() but add counts instead of replacing them.
Source can be an iterable, a dictionary, or another Counter instance.
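To make the difference from dict.update concrete, here is a minimal sketch with illustrative values. It also shows one pitfall: when Counter.update is given a list of (date, count) tuples, it treats each tuple as an element to count, so the pairs need to be turned into a mapping first.

```python
from collections import Counter

plain = {"15.5": 100}
plain.update({"15.5": 100})       # dict.update replaces the value

counted = Counter({"15.5": 100})
counted.update({"15.5": 100})     # Counter.update adds to the value

# Pitfall: an iterable of tuples is counted element by element.
tuples_counted = Counter()
tuples_counted.update([("15.5", 100)])

print(plain["15.5"], counted["15.5"], dict(tuples_counted))
```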

That's what we want. Let's wrap it in a function compatible with reduce and see what happens.

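from functools import reduce  # reduce is no longer a builtin in Python 3

def updateInPlace(a,b):
    a.update(b)  # add b's counts into a without creating a new Counter
    return a

print(reduce(updateInPlace, (Counter(dict(x)) for x in input)))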

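This is only marginally slower than the OP's solution.

Benchmark: http://ideone.com/7izsx (updated with yet another solution, thanks to astynax)

(Also: if you desperately want a one-liner, you can replace updateInPlace with lambda x,y: x.update(y) or x, which works the same way and even proves to be a split second faster, but fails at readability. Don't :-))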
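One caveat about the in-place version, sketched below with illustrative names (update_in_place is just updateInPlace renamed): reduce hands the first element to the function as the accumulator, so that element gets modified. With a generator of freshly built Counters, as above, this is harmless. With Counters you want to keep, passing an empty Counter() as the initial value is one way to leave them untouched.

```python
from collections import Counter
from functools import reduce

def update_in_place(acc, other):
    acc.update(other)
    return acc

first = Counter({"13.5": 100})
second = Counter({"13.5": 50})

# No initial value: `first` itself becomes the accumulator.
total = reduce(update_in_place, [first, second])
first_was_mutated = total is first

# With an initial value, the inputs are left alone.
kept = Counter({"13.5": 100})
safe_total = reduce(update_in_place, [kept, second], Counter())

print(first_was_mutated, dict(kept), dict(safe_total))
```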

from collections import Counter

a = [("13.5",100)]
b = [("14.5",100), ("15.5", 100)]
c = [("15.5",100), ("16.5", 100)]

inp = [dict(x) for x in (a,b,c)]
count = Counter()
for y in inp:
    count += Counter(y)
print(count)

Output:

Counter({'15.5': 200, '14.5': 100, '16.5': 100, '13.5': 100})

EDIT: As Duncan suggested, you can replace these three lines with a single line:

count = Counter()
for y in inp:
    count += Counter(y)

Replace them with: count = sum((Counter(y) for y in inp), Counter()).

You could use groupby from itertools:

from itertools import groupby, chain

a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
# groupby only groups adjacent items, so sort by date first
input = sorted(chain(a,b,c), key=lambda x: x[0])

output = {}
for k, g in groupby(input, key=lambda x: x[0]):
    output[k] = sum(x[1] for x in g)

print(output)

Using groupby instead of two loops and a defaultdict will make your code clearer.
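A brief sketch of why the sorted() call in the groupby approach is needed, using illustrative data. groupby only merges consecutive items with the same key, so on unsorted input the same date can show up as several separate groups.

```python
from itertools import chain, groupby
from operator import itemgetter

# Illustrative logs; "15.5" appears in the first and the last log.
logs = [[("15.5", 100)], [("14.5", 100)], [("15.5", 100)]]
pairs = list(chain.from_iterable(logs))
by_date = itemgetter(0)

# Unsorted: "15.5" yields two separate groups.
unsorted_keys = [key for key, _ in groupby(pairs, key=by_date)]

# Sorted first: each date yields exactly one group.
totals = {
    key: sum(count for _, count in group)
    for key, group in groupby(sorted(pairs, key=by_date), key=by_date)
}

print(unsorted_keys)
print(totals)
```

The sort makes this approach O(n log n), which the dictionary-based versions avoid.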

You can use Counter or defaultdict, or you can try my variant:

def merge_with(d1, d2, fn=lambda x, y: x + y):
    res = d1.copy() # "= dict(d1)" for lists of tuples
    for key, val in d2.items(): # ".. in d2" for lists of tuples
        try:
            res[key] = fn(res[key], val)
        except KeyError:
            res[key] = val
    return res

>>> merge_with({'a':1, 'b':2}, {'a':3, 'c':4})
{'a': 4, 'c': 4, 'b': 2}

Or even more generic:

def make_merger(fappend=lambda x, y: x + y, fempty=lambda x: x):
    def inner(*dicts):
        res = dict((k, fempty(v)) for k, v
                   in dicts[0].items()) # ".. in dicts[0]" for lists of tuples
        for dic in dicts[1:]:
            for key, val in dic.items(): # ".. in dic" for lists of tuples
                try:
                    res[key] = fappend(res[key], val)
                except KeyError:
                    res[key] = fempty(val)
        return res
    return inner

>>> make_merger()({'a':1, 'b':2}, {'a':3, 'c':4})
{'a': 4, 'c': 4, 'b': 2}

>>> appender = make_merger(lambda x, y: x + [y], lambda x: [x])
>>> appender({'a':1, 'b':2}, {'a':3, 'c':4}, {'b':'BBB', 'c':'CCC'})
{'a': [1, 3], 'c': [4, 'CCC'], 'b': [2, 'BBB']}

Also, you can subclass dict and implement an __add__ method.
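One way the subclassing idea might look is sketched below. This is an assumption about the intent, not the answer's own code; the class name SummingDict is hypothetical. Defining __add__ lets the built-in sum() do the merging, as long as an empty SummingDict is passed as the start value.

```python
class SummingDict(dict):
    """Hypothetical dict subclass whose + adds values key by key."""

    def __add__(self, other):
        result = SummingDict(self)  # copy, so neither operand is modified
        for key, value in other.items():
            if key in result:
                result[key] = result[key] + value
            else:
                result[key] = value
        return result


a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]

total = sum((SummingDict(x) for x in (a, b, c)), SummingDict())
print(total)
```

Unlike Counter, this version works for any values that support +, such as lists or strings.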
