Python: Elegantly merge dictionaries with sum() of values
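I'm trying to merge logs from several servers. Each log is a list of (key, count) tuples, and I want to add up the counts for each key across all servers.
Here is my attempt, with some example data: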
```python
from collections import defaultdict

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
input = [a, b, c]

output = defaultdict(int)
for d in input:
    for item in d:
        output[item[0]] += item[1]
print(dict(output))
```
It gives:
```
{'14.5': 100, '16.5': 100, '13.5': 100, '15.5': 200}
```
As expected.
A colleague saw the code and it's driving me crazy: she insists there must be a more Pythonic and elegant way to do this, without these nested for loops. Any ideas?
I don't think it gets much simpler than this:
```python
from collections import Counter

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
input = [a, b, c]

print(sum((Counter(dict(x)) for x in input), Counter()))
```
Note that …
This is also possible:
```python
from collections import Counter
from functools import reduce
from operator import add

print(reduce(add, (Counter(dict(x)) for x in input)))
```
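All of the `Counter(dict(x))` variants assume that a key appears at most once within any single log, because `dict()` keeps only the last value for a repeated key. If that assumption might not hold for your logs, a sketch like the following keeps every count (the duplicated log below is hypothetical data, not from the question):

```python
from collections import Counter

# Hypothetical log in which the same key appears twice.
log = [("15.5", 100), ("15.5", 50)]

# dict(log) keeps only the last value for a repeated key, so the 100 is lost.
lossy = Counter(dict(log))

# Adding each (key, count) pair explicitly keeps every count.
safe = Counter()
for key, count in log:
    safe[key] += count

print(lossy)  # Counter({'15.5': 50})
print(safe)   # Counter({'15.5': 150})
```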
Using …
Note that you could also use …
The variants above are very slow, because a new Counter is created at every step. Let's fix that.
We know that …
```
update(self, iterable=None, **kwds) unbound collections.Counter method
    Like dict.update() but add counts instead of replacing them.
    Source can be an iterable, a dictionary, or another Counter instance.
```
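A small illustration of the difference described in that docstring: `Counter.update` adds counts into the existing object, whereas `dict.update` overwrites values and `+` returns a new `Counter` each time. The keys and counts are just the question's sample values.

```python
from collections import Counter

totals = Counter({"15.5": 100})
before = totals

# Counter.update adds to the existing counts and mutates the object in place.
totals.update({"15.5": 100, "16.5": 100})
print(totals)            # Counter({'15.5': 200, '16.5': 100})
print(totals is before)  # True: no new Counter was created

# A plain dict.update replaces the value instead of adding to it.
plain = {"15.5": 100}
plain.update({"15.5": 100})
print(plain)             # {'15.5': 100}

# By contrast, + builds and returns a brand-new Counter on every call.
combined = totals + Counter({"13.5": 100})
print(combined is totals)  # False
```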
That's exactly what we want. Let's wrap it in a function that is compatible with `reduce`:
```python
from collections import Counter
from functools import reduce

def updateInPlace(a, b):
    a.update(b)
    return a

print(reduce(updateInPlace, (Counter(dict(x)) for x in input)))
```
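This is only slightly slower than the OP's solution.
Benchmark: http://ideone.com/7izsx (updated with another solution, thanks to astynax)
(Again: if you desperately want a one-liner, you can use …)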
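To time the approaches on your own machine, a minimal harness using the standard `timeit` module might look like this. The synthetic data (50 logs of 200 distinct keys each) and the function names are illustrative assumptions; `update_in_place` mirrors `updateInPlace` above. Absolute timings will vary with hardware and Python version.

```python
import timeit
from collections import Counter, defaultdict
from functools import reduce
from operator import add

# Synthetic data (an assumption): 50 logs, each with 200 distinct keys and count 1.
logs = [[(str(i % 300), 1) for i in range(j, j + 200)] for j in range(50)]

def nested_loops():
    # The OP's original approach.
    output = defaultdict(int)
    for log in logs:
        for key, count in log:
            output[key] += count
    return dict(output)

def counter_sum():
    return sum((Counter(dict(x)) for x in logs), Counter())

def counter_reduce_add():
    return reduce(add, (Counter(dict(x)) for x in logs))

def update_in_place(acc, other):
    acc.update(other)
    return acc

def counter_reduce_update():
    return reduce(update_in_place, (Counter(dict(x)) for x in logs))

if __name__ == "__main__":
    for fn in (nested_loops, counter_sum, counter_reduce_add, counter_reduce_update):
        print(fn.__name__, timeit.timeit(fn, number=20))
```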
```python
from collections import Counter

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]

inp = [dict(x) for x in (a, b, c)]
count = Counter()
for y in inp:
    count += Counter(y)

print(count)
```
Output:
```
Counter({'15.5': 200, '14.5': 100, '16.5': 100, '13.5': 100})
```
Edit: As Duncan suggested, you can replace these three lines:
```python
count = Counter()
for y in inp:
    count += Counter(y)
```
with a single line:
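Assuming the suggestion is to fold the loop into `sum()` with an empty `Counter` as the start value (the same pattern as the first answer), a self-contained version looks like this:

```python
from collections import Counter

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
inp = [dict(x) for x in (a, b, c)]

# sum() starts from an empty Counter and adds one Counter per log.
count = sum((Counter(y) for y in inp), Counter())
print(count)
```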
You can use `groupby` from itertools:
```python
from itertools import groupby, chain

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]

input = sorted(chain(a, b, c), key=lambda x: x[0])

output = {}
for k, g in groupby(input, key=lambda x: x[0]):
    output[k] = sum(x[1] for x in g)

print(output)
```
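The `sorted()` call is essential here: `groupby` only groups *consecutive* items with equal keys. In the sample data the two `"15.5"` entries happen to be adjacent already, so the sketch below chains the logs in a different order to make the effect visible. It uses `operator.itemgetter(0)`, which is equivalent to `lambda x: x[0]`.

```python
from itertools import chain, groupby
from operator import itemgetter

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]

# Without sorting, '15.5' forms two separate groups because its entries are not adjacent.
unsorted_keys = [k for k, _ in groupby(chain(c, a, b), key=itemgetter(0))]
print(unsorted_keys)  # ['15.5', '16.5', '13.5', '14.5', '15.5']

# Sorting by the same key first makes equal keys adjacent, so each key forms one group.
pairs = sorted(chain(c, a, b), key=itemgetter(0))
totals = {k: sum(count for _, count in g) for k, g in groupby(pairs, key=itemgetter(0))}
print(totals)  # {'13.5': 100, '14.5': 100, '15.5': 200, '16.5': 100}
```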
Using …
You can use Counter or defaultdict, or you can try my variant:
```python
def merge_with(d1, d2, fn=lambda x, y: x + y):
    res = d1.copy()  # "= dict(d1)" for lists of tuples
    for key, val in d2.items():  # ".. in d2" for lists of tuples
        try:
            res[key] = fn(res[key], val)
        except KeyError:
            res[key] = val
    return res

>>> merge_with({'a': 1, 'b': 2}, {'a': 3, 'c': 4})
{'a': 4, 'b': 2, 'c': 4}
```
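Applied to the question's data, `merge_with` can be folded over the logs with `functools.reduce`. This is a minimal sketch that restates `merge_with` for Python 3 so the block runs on its own, and converts each list of tuples with `dict()` first (which, as with the `Counter` variants, assumes no key repeats within a single log):

```python
from functools import reduce

def merge_with(d1, d2, fn=lambda x, y: x + y):
    # Copy d1, then combine values for keys present in both dicts.
    res = dict(d1)
    for key, val in d2.items():
        try:
            res[key] = fn(res[key], val)
        except KeyError:
            res[key] = val
    return res

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]

# Merge the per-server logs pairwise, left to right.
output = reduce(merge_with, (dict(x) for x in (a, b, c)))
print(output)  # {'13.5': 100, '14.5': 100, '15.5': 200, '16.5': 100}
```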
Or, more generally:
```python
def make_merger(fappend=lambda x, y: x + y, fempty=lambda x: x):
    def inner(*dicts):
        res = dict((k, fempty(v)) for k, v in dicts[0].items())  # ".. in dicts[0]" for lists of tuples
        for dic in dicts[1:]:
            for key, val in dic.items():  # ".. in dic" for lists of tuples
                try:
                    res[key] = fappend(res[key], val)
                except KeyError:
                    res[key] = fempty(val)
        return res
    return inner

>>> make_merger()({'a': 1, 'b': 2}, {'a': 3, 'c': 4})
{'a': 4, 'b': 2, 'c': 4}
>>> appender = make_merger(lambda x, y: x + [y], lambda x: [x])
>>> appender({'a': 1, 'b': 2}, {'a': 3, 'c': 4}, {'b': 'BBB', 'c': 'CCC'})
{'a': [1, 3], 'b': [2, 'BBB'], 'c': [4, 'CCC']}
```
In addition, you can also …