About dictionaries: merging Python dictionaries with a custom merge function

Python merge dictionaries with custom merge function

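I want to merge two dictionaries A and B such that the result contains:

  • All pairs from A where the key is unique to A
  • All pairs from B where the key is unique to B
  • f(valueA, valueB) where the same key exists in both A and B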

For example:

def f(x, y):
    return x * y

A = {1:1, 2:3}
B = {7:3, 2:2}

C = merge(A, B)

Output:

{1:1, 7:3, 2:6}

It feels like there should be a nice one-liner to do this.


Use dictionary views to achieve this; the result of dict.viewkeys() acts like a set and lets you take intersections and symmetric differences:

def merge(A, B, f):
    # Start with symmetric difference; keys either in A or B, but not both
    merged = {k: A.get(k, B.get(k)) for k in A.viewkeys() ^ B.viewkeys()}
    # Update with `f()` applied to the intersection
    merged.update({k: f(A[k], B[k]) for k in A.viewkeys() & B.viewkeys()})
    return merged

In Python 3, the .viewkeys() method has been renamed to .keys(), replacing the old .keys() behavior (which in Python 2 returns a list).
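
For readers on Python 3 only, the same function might look like the sketch below. The name merge_py3 is made up here to keep it apart from the original; the logic is otherwise the one shown above, with .keys() in place of .viewkeys().

```python
def merge_py3(A, B, f):
    # Symmetric difference: keys that are in A or in B, but not in both
    merged = {k: A.get(k, B.get(k)) for k in A.keys() ^ B.keys()}
    # Intersection: keys present in both, combined with f()
    merged.update({k: f(A[k], B[k]) for k in A.keys() & B.keys()})
    return merged


result = merge_py3({1: 1, 2: 3}, {7: 3, 2: 2}, lambda x, y: x * y)
print(result)
```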

The merge() function above is a generic solution that works for any given f().

Demo:

>>> def f(x, y):
...     return x * y
...
>>> A = {1:1, 2:3}
>>> B = {7:3, 2:2}
>>> merge(A, B, f)
{1: 1, 2: 6, 7: 3}
>>> merge(A, B, lambda a, b: '{} merged with {}'.format(a, b))
{1: 1, 2: '3 merged with 2', 7: 3}


Here is my solution code in Python 3 for the general case.

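I first wrote the merge function and then extended it to the more general merge_with function, which takes a function and a variable number of dictionaries. If any keys are duplicated across those dictionaries, the supplied function is applied to the values of the duplicated keys.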

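The merge function can be redefined in terms of merge_with, as is done for the mergel and merger functions. The name merger means: merge them all and keep the rightmost value if there are duplicates. Likewise, mergel keeps the leftmost value.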

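All the functions here (merge, merge_with, mergel, and merger) are generic in that they accept an arbitrary number of dictionary arguments. Specifically, merge_with must be given, as its first argument, a function that is compatible with the data it will be applied to.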

from functools import reduce
from operator import or_

def merge(*dicts):
    return { k: reduce(lambda d, x: x.get(k, d), dicts, None)
             for k in reduce(or_, map(lambda x: x.keys(), dicts), set()) }

def merge_with(f, *dicts):
    return { k: (lambda x: f(*x) if len(x)>1 else x[0])([ d[k] for d in dicts
                                                          if k in d ])
             for k in reduce(or_, map(lambda x: x.keys(), dicts), set()) }

mergel = lambda *dicts: merge_with(lambda *x: x[0], *dicts)

merger = lambda *dicts: merge_with(lambda *x: x[-1], *dicts)

Tests

>>> squares = { k:k*k for k in range(4) }
>>> squares
{0: 0, 1: 1, 2: 4, 3: 9}
>>> cubes = { k:k**3 for k in range(2,6) }
>>> cubes
{2: 8, 3: 27, 4: 64, 5: 125}
>>> merger(squares, cubes)
{0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
>>> merger(cubes, squares)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 64, 5: 125}
>>> mergel(squares, cubes)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 64, 5: 125}
>>> mergel(cubes, squares)
{0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
>>> merge(squares, cubes)
{0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
>>> merge(cubes, squares)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 64, 5: 125}
>>> merge_with(lambda x, y: x+y, squares, cubes)
{0: 0, 1: 1, 2: 12, 3: 36, 4: 64, 5: 125}
>>> merge_with(lambda x, y: x*y, squares, cubes)
{0: 0, 1: 1, 2: 32, 3: 243, 4: 64, 5: 125}

UPDATE

After I wrote the above, I found there is another way to do it.

from functools import reduce

def merge(*dicts):
    return reduce(lambda d1, d2: reduce(lambda d, t:
                                        dict(list(d.items())+[t]),
                                        d2.items(), d1),
                  dicts, {})

def merge_with(f, *dicts):
    return reduce(lambda d1, d2: reduce(lambda d, t:
                                        dict(list(d.items()) +
                                             [(t[0], f(d[t[0]], t[1])
                                               if t[0] in d else
                                               t[1])]),
                                        d2.items(), d1),
                  dicts, {})

mergel = lambda *dicts: merge_with(lambda x, y: x, *dicts)
merger = lambda *dicts: merge_with(lambda x, y: y, *dicts)

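Note that the definitions of mergel and merger in terms of merge_with have changed: they now pass new functions as the first argument. The f function must now be binary. The tests given above still work. Here are some more tests to show the generality of these functions.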

>>> merge() == {}
True
>>> merge(squares) == squares
True
>>> merge(cubes) == cubes
True
>>> mergel() == {}
True
>>> mergel(squares) == squares
True
>>> mergel(cubes) == cubes
True
>>> merger() == {}
True
>>> merger(squares) == squares
True
>>> merger(cubes) == cubes
True
>>> merge_with(lambda x, y: x+y, squares, cubes, squares)
{0: 0, 1: 2, 2: 16, 3: 45, 4: 64, 5: 125}
>>> merge_with(lambda x, y: x*y, squares, cubes, squares)
{0: 0, 1: 1, 2: 128, 3: 2187, 4: 64, 5: 125}
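
The reduce-based merge_with above rebuilds the whole dictionary for every item it adds. As a rough sketch of the same left-to-right fold with a binary f, an explicit loop that updates one dictionary in place could look like this. The name merge_with_loop is hypothetical and not part of the answer above.

```python
def merge_with_loop(f, *dicts):
    """Fold any number of dicts left to right; f combines values of duplicate keys."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            # First occurrence is stored as-is; later ones are folded in with f
            merged[key] = f(merged[key], value) if key in merged else value
    return merged


squares = {k: k * k for k in range(4)}
cubes = {k: k ** 3 for k in range(2, 6)}
total = merge_with_loop(lambda x, y: x + y, squares, cubes, squares)
print(total)
```

With the same squares and cubes as in the tests, this should give the same result as the three-dictionary merge_with call above.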


Stealing this (A.get(k, B.get(k))) snippet from @MartijnPieters:

>>> def f(x, y):
        return x * y

>>> A = {1:1, 2:3}
>>> B = {7:3, 2:2}
>>> {k: f(A[k], B[k]) if k in A and k in B else A.get(k, B.get(k))
     for k in A.viewkeys() | B.viewkeys()}
{1: 1, 2: 6, 7: 3}


A different approach that is (IMHO) more readable for users who come from a functional programming background:

def merge_with(f):
    def merge(a,b):
        g = lambda l: [x for x in l if x is not None]  
        keys = a.keys() | b.keys()
        return {key:f(*g([a.get(key), b.get(key)])) for key in keys}
    return merge

Applying this to the OP's example:

A = {1:1, 2:3}
B = {7:3, 2:2}
merge_with(lambda x,y=1: x*y)(A,B)
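
One thing to be aware of with the version above, as far as I can tell: a value that really is None gets filtered out, and f needs a default for its second parameter. A possible variation, assuming a private sentinel object instead of None, is sketched below. The names _MISSING and merge_with_sentinel are made up for illustration.

```python
_MISSING = object()  # hypothetical sentinel: marks "key not present"


def merge_with_sentinel(f):
    def merge(a, b):
        result = {}
        for key in a.keys() | b.keys():
            x = a.get(key, _MISSING)
            y = b.get(key, _MISSING)
            if x is _MISSING:
                result[key] = y          # key only in b
            elif y is _MISSING:
                result[key] = x          # key only in a
            else:
                result[key] = f(x, y)    # key in both: combine
        return result
    return merge


A = {1: 1, 2: 3}
B = {7: 3, 2: 2}
C = merge_with_sentinel(lambda x, y: x * y)(A, B)
print(C)
```

Here f is a plain two-argument function and is only called for keys present in both dictionaries.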


dict(list(A.items()) + list(B.items()) + [(k,f(A[k],B[k])) for k in A.keys() & B.keys()])

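In my opinion, this is the shortest and most readable code in Python 3. I derived it from DhruvPathak's answer and realized that optimizing it leads to kampu's answer, specialized for Python 3: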

dict(itertools.chain(A.items(), B.items(), ((k,f(A[k],B[k])) for k in A.keys() & B.keys())))
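
Both one-liners depend on dict() keeping the last value when a key appears more than once, so the f(...) pairs have to come last. A small sketch of that behavior, reusing the OP's A, B and f (the names shared_pairs and wrong_order are only for illustration):

```python
from itertools import chain


def f(x, y):
    return x * y


A = {1: 1, 2: 3}
B = {7: 3, 2: 2}

# Pairs for the shared keys, computed with f
shared_pairs = [(k, f(A[k], B[k])) for k in A.keys() & B.keys()]

# dict() keeps the last value seen for a repeated key, so order matters
merged = dict(chain(A.items(), B.items(), shared_pairs))
wrong_order = dict(chain(shared_pairs, A.items(), B.items()))

print(merged)
print(wrong_order)
```

With the shared pairs first, the plain value from B overwrites the combined one for key 2.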

I compared the performance of all the answers here and got this ranking:

  • mergeLZ: 34.0ms (Lei Zhao, a rather bulky one-liner)
  • mergeJK: 11.6ms (jamylak)
  • mergeMP: 11.5ms (Martijn Pieters, almost a one-liner)
  • mergeDP: 6.9ms (DhruvPathak)
  • mergeDS: 6.8ms (the first one-liner above)
  • mergeK3: 5.2ms (kampu = the second one-liner above)
  • mergeS3: 3.5ms (imperative, not a one-liner)

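The last one, mergeS3, is naive, imperative, multi-line code. I am disappointed that the old way wins on performance. This test uses simple integer keys and values, but the ranking is very similar for large string keys and values. Obviously, mileage may vary with dictionary size and the amount of key overlap (1/3 in my test). By the way, Lei Zhao's second implementation, which I haven't tried to understand, seems to perform abysmally, around 1000 times slower.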

Code:

import functools
import itertools
import operator
import timeit

def t(x): # transform keys and values
    return x # str(x) * 8

def f(x,y): # merge values
    return x + y

N = 10000
A = {t(k*2): t(k*22) for k in range(N)}
B = {t(k*3): t(k*33) for k in range(N)}

def check(AB):
    assert(len(A) == N)
    assert(len(B) == N)
    assert(len(AB) == 16666)
    assert(AB[t(0)] == f(t(0), t(0)))
    assert(t(1) not in AB)
    assert(AB[t(2)] == t(1*22))
    assert(AB[t(3)] == t(1*33))
    assert(AB[t(4)] == t(2*22))
    assert(t(5) not in AB)
    assert(AB[t(6)] == f(t(3*22), t(2*33)))
    assert(t(7) not in AB)
    assert(AB[t(8)] == t(4*22))
    assert(AB[t(9)] == t(3*33))

def mergeLZ(): # Lei Zhao
    merged = {k: (lambda x: f(*x) if len(x)>1 else x[0])([ d[k] for d in [A, B]
                                                          if k in d ])
             for k in functools.reduce(operator.or_, map(lambda x: x.keys(), [A, B]), set()) }
    check(merged)
def mergeJK(): # jamylak
    merged = {k: f(A[k], B[k]) if k in A and k in B else A.get(k, B.get(k)) for k in A.keys() | B.keys()}
    check(merged)
def mergeMP(): # Martijn Pieters
    merged = {k: A.get(k, B.get(k)) for k in A.keys() ^ B.keys()}
    merged.update({k: f(A[k], B[k]) for k in A.keys() & B.keys()})
    check(merged)
def mergeDP(): # DhruvPathak
    merged = dict([(k,v) for k,v in A.items()] + [ (k,v) if k not in A else (k,f(A[k],B[k])) for k,v in B.items()])
    check(merged)
def mergeDS(): # more elegant (IMO) variation on DhruvPathak
    merged = dict(list(A.items()) + list(B.items()) + [(k,f(A[k],B[k])) for k in A.keys() & B.keys()])
    check(merged)
def mergeK3(): # kampu adapted to Python 3
    merged = dict(itertools.chain(A.items(), B.items(), ((k,f(A[k],B[k])) for k in A.keys() & B.keys())))
    check(merged)
def mergeS3(): #"naive" imperative way
    merged = A.copy()
    for k,v in B.items():
        if k in A:
            merged[k] = f(A[k], v)
        else:
            merged[k] = v
    check(merged)

for m in [mergeLZ, mergeJK, mergeMP, mergeDP, mergeDS, mergeK3, mergeS3]:
    print("{}: {:4.1f}ms".format(m.__name__, timeit.timeit(m, number=1000)))
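
The 16666 in check() and the 1/3 overlap mentioned earlier follow from how the keys are built: A uses multiples of 2, B uses multiples of 3, so the shared keys are the multiples of 6 that fall inside A's range. A quick sketch to confirm the arithmetic (the variable names are only for illustration, and t() is taken to be the identity as in the benchmark):

```python
N = 10000
A_keys = {k * 2 for k in range(N)}   # 0, 2, 4, ..., 19998
B_keys = {k * 3 for k in range(N)}   # 0, 3, 6, ..., 29997

shared = A_keys & B_keys             # multiples of 6 up to 19998
union_size = len(A_keys | B_keys)    # N + N - len(shared)
overlap_fraction = len(shared) / N

print(len(shared), union_size, overlap_fraction)
```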


from itertools import chain

intersection = set(A.keys()).intersection(B.keys())
C = dict(chain(A.items(), B.items(), ((k, f(A[k], B[k])) for k in intersection)))

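This can technically be made into a one-liner. It works in both Py2 and Py3. If you only care about Py3, you can rewrite the "intersection" line as: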

intersection = A.keys() & B.keys()

(For Py2 only, use A.viewkeys() & B.viewkeys() instead.)


>>> def f(x,y):
...     return x*y
...
>>> dict([(k,v) for k,v in A.items()] + [ (k,v) if k not in A else (k,f(A[k],B[k])) for k,v in B.items()])
{1: 1, 2: 6, 7: 3}

def merge_dict(dict1, dict2):
    # update() mutates dict1 in place; on duplicate keys the value from dict2 wins
    dict1.update(dict2)
    print('dict3 =', dict1)

dict1 = {1: 'red'}
dict2 = {2: 'black', 3: 'yellow'}
merge_dict(dict1, dict2)

Output:

dict3 = {1: 'red', 2: 'black', 3: 'yellow'}