Python merge dictionaries with custom merge function
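I want to merge two dictionaries A and B so that the result contains:
- all pairs from A whose key is unique to A
- all pairs from B whose key is unique to B
- f(valueA, valueB) for every key that exists in both A and B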
For example:
```python
def f(x, y):
    return x * y

A = {1:1, 2:3}
B = {7:3, 2:2}

C = merge(A, B)
```
Output:
```python
{1:1, 7:3, 2:6}
```
It feels like there should be a nice one-liner for this.
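For reference, on Python 3.9 or later (which added the `|` operator for dicts, PEP 584) a one-liner along these lines is possible. This is a sketch assuming Python 3.9+, not part of the question: with `|` the right-hand operand wins on duplicate keys, so a final dict holding `f(...)` for the shared keys overrides the plain values.

```python
def f(x, y):
    return x * y

A = {1: 1, 2: 3}
B = {7: 3, 2: 2}

# Requires Python 3.9+ for dict | dict (the right-hand operand wins on duplicate keys).
# The last operand maps each shared key to f(A's value, B's value), overriding both.
C = A | B | {k: f(A[k], B[k]) for k in A.keys() & B.keys()}
```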
Use dictionary views to achieve this:
```python
def merge(A, B, f):
    # Start with symmetric difference; keys either in A or B, but not both
    merged = {k: A.get(k, B.get(k)) for k in A.viewkeys() ^ B.viewkeys()}
    # Update with `f()` applied to the intersection
    merged.update({k: f(A[k], B[k]) for k in A.viewkeys() & B.viewkeys()})
    return merged
```
In Python 3, `dict.keys()` already returns a view, so use `.keys()` in place of `.viewkeys()` in the code above.
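A Python 3 sketch of the same function, with `.viewkeys()` replaced by `.keys()` (whose result supports the set operators `^` and `&` in Python 3):

```python
def merge(A, B, f):
    # Symmetric difference: keys in exactly one of A or B keep their own value
    merged = {k: A.get(k, B.get(k)) for k in A.keys() ^ B.keys()}
    # Intersection: keys in both get f(A's value, B's value)
    merged.update({k: f(A[k], B[k]) for k in A.keys() & B.keys()})
    return merged

C = merge({1: 1, 2: 3}, {7: 3, 2: 2}, lambda x, y: x * y)
```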
Demo:
```python
>>> def f(x, y):
...     return x * y
...
>>> A = {1:1, 2:3}
>>> B = {7:3, 2:2}
>>> merge(A, B, f)
{1: 1, 2: 6, 7: 3}
>>> merge(A, B, lambda a, b: '{} merged with {}'.format(a, b))
{1: 1, 2: '3 merged with 2', 7: 3}
```
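Here is my solution for the general case in Python 3.

I first wrote the `merge` function and then extended it to the more general `merge_with` function, which takes a function and any number of dicts. If any of these dicts share a key, the supplied function is applied to the values for that key.

`merge` can be redefined in terms of `merge_with`, and so can `merger`. The name `merger` means: merge them all and keep the rightmost value when there are duplicates. Likewise, `mergel` keeps the leftmost.

All the functions here (`merge`, `merge_with`, `mergel` and `merger`) are general in that they accept any number of dict arguments. In particular, `merge_with` must be given a function that is compatible with the data it will be applied to.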
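To make the "leftmost" and "rightmost" naming concrete, here is a plain-loop sketch of the same semantics without `merge_with`. The names `merger_loop` and `mergel_loop` are hypothetical, used only for illustration: keeping the rightmost value is what repeated `dict.update` does, and keeping the leftmost is the same thing applied to the dicts in reverse order.

```python
def merger_loop(*dicts):
    # Rightmost wins: later dicts overwrite earlier ones, like repeated dict.update
    result = {}
    for d in dicts:
        result.update(d)
    return result

def mergel_loop(*dicts):
    # Leftmost wins: process the dicts in reverse so earlier dicts overwrite later ones
    return merger_loop(*reversed(dicts))

squares = {k: k * k for k in range(4)}
cubes = {k: k ** 3 for k in range(2, 6)}
```

These should agree with `merger` and `mergel` on the tests below.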
```python
from functools import reduce
from operator import or_

def merge(*dicts):
    return { k: reduce(lambda d, x: x.get(k, d), dicts, None)
             for k in reduce(or_, map(lambda x: x.keys(), dicts), set()) }

def merge_with(f, *dicts):
    return { k: (lambda x: f(*x) if len(x)>1 else x[0])([ d[k] for d in dicts
                                                           if k in d ])
             for k in reduce(or_, map(lambda x: x.keys(), dicts), set()) }

mergel = lambda *dicts: merge_with(lambda *x: x[0], *dicts)

merger = lambda *dicts: merge_with(lambda *x: x[-1], *dicts)
```
Tests:
```python
>>> squares = { k:k*k for k in range(4) }
>>> squares
{0: 0, 1: 1, 2: 4, 3: 9}
>>> cubes = { k:k**3 for k in range(2,6) }
>>> cubes
{2: 8, 3: 27, 4: 64, 5: 125}
>>> merger(squares, cubes)
{0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
>>> merger(cubes, squares)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 64, 5: 125}
>>> mergel(squares, cubes)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 64, 5: 125}
>>> mergel(cubes, squares)
{0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
>>> merge(squares, cubes)
{0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
>>> merge(cubes, squares)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 64, 5: 125}
>>> merge_with(lambda x, y: x+y, squares, cubes)
{0: 0, 1: 1, 2: 12, 3: 36, 4: 64, 5: 125}
>>> merge_with(lambda x, y: x*y, squares, cubes)
{0: 0, 1: 1, 2: 32, 3: 243, 4: 64, 5: 125}
```
Update

After writing the above, I found there is another way to do it.
```python
from functools import reduce

def merge(*dicts):
    return reduce(lambda d1, d2: reduce(lambda d, t:
                                        dict(list(d.items())+[t]),
                                        d2.items(), d1),
                  dicts, {})

def merge_with(f, *dicts):
    return reduce(lambda d1, d2: reduce(lambda d, t:
                                        dict(list(d.items()) +
                                             [(t[0], f(d[t[0]], t[1])
                                               if t[0] in d else
                                               t[1])]),
                                        d2.items(), d1),
                  dicts, {})

mergel = lambda *dicts: merge_with(lambda x, y: x, *dicts)
merger = lambda *dicts: merge_with(lambda x, y: y, *dicts)
```
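Note that the definitions of `mergel` and `merger` in terms of `merge_with` have changed: they now pass different functions as the first argument, because `f` must now be binary. The tests above still pass. Below are some more tests showing how general these functions are.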
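Because `f` is now binary, the values for a key that occurs in several dicts are combined as a left fold: with three dicts, `f(f(v1, v2), v3)`. A minimal imperative sketch of that behaviour (the name `merge_with_loop` is hypothetical, not part of the answer):

```python
def merge_with_loop(f, *dicts):
    result = {}
    for d in dicts:
        for k, v in d.items():
            # First occurrence: store as-is; later occurrences: fold in with f(accumulated, new)
            result[k] = f(result[k], v) if k in result else v
    return result

squares = {k: k * k for k in range(4)}
cubes = {k: k ** 3 for k in range(2, 6)}
totals = merge_with_loop(lambda x, y: x + y, squares, cubes, squares)
```

For key 2 this computes `(4 + 8) + 4 = 16`, matching the three-dict test below.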
```python
>>> merge() == {}
True
>>> merge(squares) == squares
True
>>> merge(cubes) == cubes
True
>>> mergel() == {}
True
>>> mergel(squares) == squares
True
>>> mergel(cubes) == cubes
True
>>> merger() == {}
True
>>> merger(squares) == squares
True
>>> merger(cubes) == cubes
True
>>> merge_with(lambda x, y: x+y, squares, cubes, squares)
{0: 0, 1: 2, 2: 16, 3: 45, 4: 64, 5: 125}
>>> merge_with(lambda x, y: x*y, squares, cubes, squares)
{0: 0, 1: 1, 2: 128, 3: 2187, 4: 64, 5: 125}
```
Stealing this from @MartijnPieters:
```python
>>> def f(x, y):
...     return x * y
...
>>> A = {1:1, 2:3}
>>> B = {7:3, 2:2}
>>> {k: f(A[k], B[k]) if k in A and k in B else A.get(k, B.get(k))
...  for k in A.viewkeys() | B.viewkeys()}
{1: 1, 2: 6, 7: 3}
```
A different approach, which (IMHO) is more readable for users coming from a functional programming background:
```python
def merge_with(f):
    def merge(a,b):
        g = lambda l: [x for x in l if x is not None]
        keys = a.keys() | b.keys()
        return {key:f(*g([a.get(key), b.get(key)])) for key in keys}
    return merge
```
Applying this to the OP's example:
```python
A = {1:1, 2:3}
B = {7:3, 2:2}
merge_with(lambda x,y=1: x*y)(A,B)
```
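The `y=1` default works because a key present in only one dict calls `f` with a single argument. One caveat: filtering with `x is not None` also discards values that really are `None`, so such values are silently skipped (or `f` ends up called with no arguments at all). A sketch of the same closure using a private sentinel object instead; the name `_MISSING` is an assumption for illustration:

```python
_MISSING = object()  # hypothetical private sentinel; cannot collide with any real value

def merge_with(f):
    def merge(a, b):
        result = {}
        for key in a.keys() | b.keys():
            # Keep only the values actually present, even when a present value is None
            args = [v for v in (a.get(key, _MISSING), b.get(key, _MISSING))
                    if v is not _MISSING]
            result[key] = f(*args)
        return result
    return merge

A = {1: 1, 2: 3}
B = {7: 3, 2: 2}
C = merge_with(lambda x, y=1: x * y)(A, B)
```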
```python
dict(list(A.items()) + list(B.items()) + [(k,f(A[k],B[k])) for k in A.keys() & B.keys()])
```
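In my opinion, this is the shortest and most readable code in Python 3. I derived it from DhruvPathak's answer and realised that optimising it leads to kampu's answer, specialised for Python 3: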
```python
import itertools

dict(itertools.chain(A.items(), B.items(), ((k,f(A[k],B[k])) for k in A.keys() & B.keys())))
```
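Both one-liners rely on `dict()` keeping the last value it sees for a repeated key, so the `(k, f(A[k], B[k]))` pairs placed at the end override the plain values for shared keys. Wrapped as a function for reuse (the name `merge_chain` is hypothetical):

```python
import itertools

def merge_chain(A, B, f):
    # Shared keys are computed up front; their f(...) pairs come last in the chain,
    # and dict() keeps the last pair for a duplicated key, so they win.
    shared = A.keys() & B.keys()
    return dict(itertools.chain(A.items(), B.items(),
                                ((k, f(A[k], B[k])) for k in shared)))

# The "last pair wins" behaviour the trick depends on:
assert dict([(2, 3), (2, 2), (2, 6)]) == {2: 6}

C = merge_chain({1: 1, 2: 3}, {7: 3, 2: 2}, lambda x, y: x * y)
```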
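I compared the performance of all the answers here and got this ranking:
- mergeLZ: 34.0 ms (Lei Zhao, a rather clunky one-liner)
- mergeJK: 11.6 ms (jamylak)
- mergeMP: 11.5 ms (Martijn Pieters, almost a one-liner)
- mergeDP: 6.9 ms (DhruvPathak)
- mergeDS: 6.8 ms (the first one-liner above)
- mergeK3: 5.2 ms (kampu = the second one-liner above)
- mergeS3: 3.5 ms (imperative, not a one-liner)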
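The last one, mergeS3, is naive, imperative multi-line code. I am disappointed that the old way wins on performance. This test uses simple integer keys and values, but the ranking is very similar for large string keys and values. Obviously, mileage may vary with dictionary size and the amount of key overlap (1/3 in my test). By the way, Lei Zhao's second implementation, which I haven't tried to understand, seems to have abysmal performance, about 1000 times slower.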
Code:
```python
import functools
import itertools
import operator
import timeit

def t(x): # transform keys and values
    return x # str(x) * 8

def f(x,y): # merge values
    return x + y

N = 10000
A = {t(k*2): t(k*22) for k in range(N)}
B = {t(k*3): t(k*33) for k in range(N)}

def check(AB):
    assert(len(A) == N)
    assert(len(B) == N)
    assert(len(AB) == 16666)
    assert(AB[t(0)] == f(t(0), t(0)))
    assert(t(1) not in AB)
    assert(AB[t(2)] == t(1*22))
    assert(AB[t(3)] == t(1*33))
    assert(AB[t(4)] == t(2*22))
    assert(t(5) not in AB)
    assert(AB[t(6)] == f(t(3*22), t(2*33)))
    assert(t(7) not in AB)
    assert(AB[t(8)] == t(4*22))
    assert(AB[t(9)] == t(3*33))

def mergeLZ(): # Lei Zhao
    merged = {k: (lambda x: f(*x) if len(x)>1 else x[0])([ d[k] for d in [A, B]
                                                           if k in d ])
              for k in functools.reduce(operator.or_, map(lambda x: x.keys(), [A, B]), set()) }
    check(merged)

def mergeJK(): # jamylak
    merged = {k: f(A[k], B[k]) if k in A and k in B else A.get(k, B.get(k)) for k in A.keys() | B.keys()}
    check(merged)

def mergeMP(): # Martijn Pieters
    merged = {k: A.get(k, B.get(k)) for k in A.keys() ^ B.keys()}
    merged.update({k: f(A[k], B[k]) for k in A.keys() & B.keys()})
    check(merged)

def mergeDP(): # DhruvPathak
    merged = dict([(k,v) for k,v in A.items()] + [ (k,v) if k not in A else (k,f(A[k],B[k])) for k,v in B.items()])
    check(merged)

def mergeDS(): # more elegant (IMO) variation on DhruvPathak
    merged = dict(list(A.items()) + list(B.items()) + [(k,f(A[k],B[k])) for k in A.keys() & B.keys()])
    check(merged)

def mergeK3(): # kampu adapted to Python 3
    merged = dict(itertools.chain(A.items(), B.items(), ((k,f(A[k],B[k])) for k in A.keys() & B.keys())))
    check(merged)

def mergeS3(): # "naive" imperative way
    merged = A.copy()
    for k,v in B.items():
        if k in A:
            merged[k] = f(A[k], v)
        else:
            merged[k] = v
    check(merged)

for m in [mergeLZ, mergeJK, mergeMP, mergeDP, mergeDS, mergeK3, mergeS3]:
    print("{}: {:4.1f}ms".format(m.__name__, timeit.timeit(m, number=1000)))
```
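Since the ranking may shift with dictionary size and key overlap, here is a sketch of a harness that varies the overlap. It assumes a hypothetical helper `make_dicts(n, overlap)` that builds two dicts of `n` integer keys sharing roughly `overlap * n` of them, and times two of the approaches above (kampu's `chain` one-liner and the imperative loop) with `timeit.repeat`, keeping the best run. `timeit` returns the total seconds for `number` calls, so the sketch divides by `number` to get a per-call time. No timings are claimed here.

```python
import itertools
import timeit

def f(x, y):  # merge values, as in the benchmark above
    return x + y

def make_dicts(n, overlap):
    # Hypothetical helper: A has keys 0..n-1, B is shifted so about overlap*n keys are shared
    shift = int(n * (1 - overlap))
    A = {k: k for k in range(n)}
    B = {k + shift: k for k in range(n)}
    return A, B

def merge_chain(A, B):
    return dict(itertools.chain(A.items(), B.items(),
                                ((k, f(A[k], B[k])) for k in A.keys() & B.keys())))

def merge_loop(A, B):
    merged = A.copy()
    for k, v in B.items():
        merged[k] = f(A[k], v) if k in A else v
    return merged

def bench(n=10000, overlaps=(0.0, 1/3, 1.0), number=100):
    results = {}
    for overlap in overlaps:
        A, B = make_dicts(n, overlap)
        assert merge_chain(A, B) == merge_loop(A, B)
        for fn in (merge_chain, merge_loop):
            # Best of 3 repeats; each repeat is the total time of `number` calls
            best = min(timeit.repeat(lambda: fn(A, B), number=number, repeat=3))
            results[(fn.__name__, overlap)] = best / number * 1000  # ms per call
    return results

if __name__ == "__main__":
    for (name, overlap), ms in bench().items():
        print(f"{name} overlap={overlap:.2f}: {ms:.3f} ms")
```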
```python
from itertools import chain

intersection = set(A.keys()).intersection(B.keys())
C = dict(chain(A.items(), B.items(), ((k, f(A[k], B[k])) for k in intersection)))
```
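Technically this could be made into a one-liner. It works on both Python 2 and Python 3. If you only care about Python 3, you can rewrite the `intersection` line as: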
```python
intersection = A.keys() & B.keys()
```
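A quick check, assuming Python 3, that the two ways of computing the shared keys agree: `&` on two key views returns an ordinary `set`, the same thing the portable form builds.

```python
A = {1: 1, 2: 3}
B = {7: 3, 2: 2}

portable = set(A.keys()).intersection(B.keys())  # works on Python 2 and 3
py3_only = A.keys() & B.keys()                   # key views support set operators in Python 3

assert portable == py3_only == {2}
assert isinstance(py3_only, set)  # the & of two key views is a plain set
```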
For Python 2 only, use:
```python
>>> def f(x,y):
...     return x*y
...
>>> dict([(k,v) for k,v in A.items()] + [ (k,v) if k not in A else (k,f(A[k],B[k])) for k,v in B.items()])
{1: 1, 2: 6, 7: 3}
```
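```python
def merge_dict(dict1, dict2):
    # Copy so the caller's dict1 is not modified; update() lets dict2's values win on shared keys
    dict3 = dict1.copy()
    dict3.update(dict2)
    print('dict3 =', dict3)

dict1 = {1: 'red'}
dict2 = {2: 'black', 3: 'yellow'}
merge_dict(dict1, dict2)
```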
Output:
```python
dict3 = {1: 'red', 2: 'black', 3: 'yellow'}
```
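Note that `dict.update` alone does not apply a custom merge function: for a key present in both dicts, the second dict's value simply overwrites the first. A sketch that keeps the `update` style but applies `f` to shared keys (the name `merge_dict_with` is hypothetical):

```python
def merge_dict_with(dict1, dict2, f):
    merged = dict1.copy()
    # Combine shared keys with f before updating; keys only in dict2 are copied as-is
    merged.update({k: f(dict1[k], v) if k in dict1 else v for k, v in dict2.items()})
    return merged

C = merge_dict_with({1: 1, 2: 3}, {7: 3, 2: 2}, lambda x, y: x * y)
```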