Python: Recursively merge dicts so that elements with shared keys are combined into a list

I have two dicts that I want to merge:

a = {"name": "john",
     "phone": "123123123",
     "owns": {"cars": "Car 1", "motorbikes": "Motorbike 1"}}

b = {"name": "john",
     "phone": "123",
     "owns": {"cars": "Car 2"}}

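If a and b have a common key at the same nesting level, the result should be a list containing both values, assigned as the value of the shared key.

The result should look like this: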

{"name": "john",
 "phone": ["123123123", "123"],
 "owns": {"cars": ["Car 1", "Car 2"], "motorbikes": "Motorbike 1"}}

Using a.update(b) does not work, because it overwrites a's shared values with b's shared values, which gives:

{'name': 'john', 'phone': '123', 'owns': {'cars': 'Car 2'}}

The goal is to merge the dicts without overwriting anything, keeping all the information associated with a given key (in either dict).

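With recursion, you can build a dictionary comprehension that accomplishes this.

This solution also takes into account that you may later want to merge more than two dictionaries, and flattens the lists of values in that case.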

def update_merge(d1, d2):
    if isinstance(d1, dict) and isinstance(d2, dict):
        # Unpack d1 and d2 into a new dictionary (**d1, **d2) to keep non-shared keys
        # Then unpack a dict that handles the shared keys:
        # if both values are equal, keep that value;
        # otherwise, merge the two values recursively
        return {
            **d1, **d2,
            **{k: d1[k] if d1[k] == d2[k] else update_merge(d1[k], d2[k])
               for k in {*d1} & {*d2}}
        }
    else:
        # This case happens when values are merged
        # It bundles the values in a list, making sure
        # to flatten them if they are already lists
        return [
            *(d1 if isinstance(d1, list) else [d1]),
            *(d2 if isinstance(d2, list) else [d2])
        ]

Example:

a = {"name": "john", "phone": "123123123",
     "owns": {"cars": "Car 1", "motorbikes": "Motorbike 1"}}
b = {"name": "john", "phone": "123", "owns": {"cars": "Car 2"}}

update_merge(a, b)
# {'name': 'john',
# 'phone': ['123123123', '123'],
# 'owns': {'cars': ['Car 1', 'Car 2'], 'motorbikes': 'Motorbike 1'}}

Example merging more than two objects:

a = {"name":"john"}
b = {"name":"jack"}
c = {"name":"joe"}

d = update_merge(a, b)
d = update_merge(d, c)

d # {'name': ['john', 'jack', 'joe']}
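
The step-by-step merge above can also be folded over any number of dicts with functools.reduce. The following is only a sketch, not part of the original answer: merge_all is a hypothetical helper name, and update_merge is repeated solely so that the snippet runs on its own.

```python
from functools import reduce


def update_merge(d1, d2):
    # Same logic as the answer's function, repeated so the snippet is self-contained
    if isinstance(d1, dict) and isinstance(d2, dict):
        return {
            **d1, **d2,
            **{k: d1[k] if d1[k] == d2[k] else update_merge(d1[k], d2[k])
               for k in {*d1} & {*d2}}
        }
    return [
        *(d1 if isinstance(d1, list) else [d1]),
        *(d2 if isinstance(d2, list) else [d2]),
    ]


def merge_all(*dicts):
    # Hypothetical helper: fold update_merge over the dicts from left to right
    return reduce(update_merge, dicts)


result = merge_all({"name": "john"}, {"name": "jack"}, {"name": "joe"})
print(result)  # {'name': ['john', 'jack', 'joe']}
```

Note that reduce raises a TypeError when called with no dicts at all, so a caller that may pass an empty sequence would need to handle that case separately.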

Using sets and a few other tools, you can also merge an arbitrary number of dictionaries:

from functools import reduce
import operator

# Usage: merge(a, b, ...)
def merge(*args):
    # Make a copy of the input dicts; this can be removed if you don't care about
    # modifying the original dicts.
    args = list(map(dict.copy, args))

    # Dict to store the result.
    out = {}

    for k in reduce(operator.and_, map(dict.keys, args)):  # Python 3 only, see footnotes.
        # Use `.pop()` so that after all elements of shared keys have been combined,
        # `args` becomes a list of disjoint dicts that we can merge easily.
        vs = [d.pop(k) for d in args]

        if isinstance(vs[0], dict):
            # Recursively merge nested dicts
            common = merge(*vs)
        else:
            # Use a set to collect unique values
            common = set(vs)
            # If there is only one unique value, store it as is; otherwise use a list
            common = next(iter(common)) if len(common) == 1 else list(common)

        out[k] = common

    # Merge the rest of the now disjoint dicts into `out`
    for arg in args:
        out.update(arg)

    return out

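This assumes that every dictionary to be merged has the same "structure"; for example, "owns" cannot be a list in a and a dict in b. Every element of the dicts also needs to be hashable, because this method uses sets to aggregate the unique values.

The following works only in Python 3, because in Python 2 dict.keys() returns a plain old list.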
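
The hashability requirement comes from the set(vs) call. As a rough sketch (not part of the original answer), the de-duplication step could instead compare values by equality, which also keeps them in first-seen order. unique_values is a hypothetical helper name introduced only for this illustration.

```python
def unique_values(values):
    # Hypothetical helper: drop duplicates using == instead of hashing,
    # so unhashable values such as lists are accepted.
    # Quadratic in len(values), which is fine for a handful of dicts.
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen


try:
    set([["Car 1"], ["Car 2"]])
except TypeError as exc:
    # Lists are unhashable, so they cannot be collected in a set
    print("set() fails:", exc)

print(unique_values([["Car 1"], ["Car 2"], ["Car 1"]]))  # [['Car 1'], ['Car 2']]
print(unique_values(["john", "john"]))                   # ['john']
```

The trade-off is speed: membership tests on a list are linear, whereas a set gives constant-time lookups for hashable values.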

reduce(operator.and_, map(dict.keys, args))

An alternative is to add an extra map() that converts the lists to sets:

reduce(operator.and_, map(set, map(dict.keys, args)))
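
To make the expression concrete, here is a small illustration of how reducing with operator.and_ yields the keys shared by all inputs. The dict contents are made up for the example. In Python 3, dict.keys() returns a set-like view, so the extra map(set, ...) is optional there.

```python
from functools import reduce
import operator

# Made-up sample data: only "name" appears in all three dicts
args = [
    {"name": "john", "phone": "123123123", "owns": {}},
    {"name": "john", "phone": "123"},
    {"name": "jack", "email": "jack@example.com"},
]

# Python 3: key views support `&` directly
shared = reduce(operator.and_, map(dict.keys, args))

# Portable form: convert each key collection to a set first
shared_portable = reduce(operator.and_, map(set, map(dict.keys, args)))

print(shared)           # {'name'}
print(shared_portable)  # {'name'}
```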

You can use itertools.groupby and recursion:

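import itertools

a = {"name": "john", "phone": "123123123", "owns": {"cars": "Car 1", "motorbikes": "Motorbike 1"}}
b = {"name": "john", "phone": "123", "owns": {"cars": "Car 2"}}

def condense(r):
    # Collapse a list of identical values into a single value
    return r[0] if len(set(r)) == 1 else r

def update_dict(c, d):
    # Sort the items of both dicts by key (groupby needs sorted input), then group the values per key
    pairs = sorted(list(c.items()) + list(d.items()), key=lambda x: x[0])
    _v = {j: [v for _, v in h] for j, h in itertools.groupby(pairs, key=lambda x: x[0])}
    # Recurse when every value under a key is a dict; otherwise condense the values
    return {j: update_dict(*e) if all(isinstance(i, dict) for i in e) else condense(e)
            for j, e in _v.items()}

print(update_dict(a, b))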

Output:

{'name': 'john', 'owns': {'cars': ['Car 1', 'Car 2'], 'motorbikes': 'Motorbike 1'}, 'phone': ['123123123', '123']}
