Calculate difference in keys contained in two Python dictionaries

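Suppose I have two Python dictionaries, dictA and dictB. I need to find out whether there are any keys that are present in dictB but not in dictA. What is the fastest way to go about it?

Should I convert the dictionary keys into a set and then go about it?

Interested in knowing your thoughts...

Thanks for your responses.

Apologies for not stating my question properly. My scenario is like this: I have a dictA which can be the same as dictB, or may have some keys missing as compared to dictB, or else the value of some keys might be different, which has to be set to that of dictA key's value.

The problem is that the dictionaries have no standard layout and can have values that are themselves dicts of dicts.

Say: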

dictA={'key1':a, 'key2':b, 'key3':{'key11':cc, 'key12':dd}, 'key4':{'key111':{....}}}
dictB={'key1':a, 'key2':newb, 'key3':{'key11':cc, 'key12':newdd, 'key13':ee}.......

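So the 'key2' value has to be reset to the new value, and 'key13' has to be added inside the dict. The key values do not have a fixed format: a value can be a simple value, a dict, or a dict of dicts.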

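You can use set operations on the keys:

diff = set(dictb.keys()) - set(dicta.keys())

Here is a class to find all the possibilities: what was added, what was removed, which key-value pairs are the same, and which key-value pairs have changed.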

class DictDiffer(object):
    """
    Calculate the difference between two dictionaries as:
    (1) items added
    (2) items removed
    (3) keys same in both but changed values
    (4) keys same in both and unchanged values
    """
    def __init__(self, current_dict, past_dict):
        self.current_dict, self.past_dict = current_dict, past_dict
        self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
        self.intersect = self.set_current.intersection(self.set_past)
    def added(self):
        return self.set_current - self.intersect
    def removed(self):
        return self.set_past - self.intersect
    def changed(self):
        return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
    def unchanged(self):
        return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])

Here is some sample output (shown for Python 3):

>>> a = {'a': 1, 'b': 1, 'c': 0}
>>> b = {'a': 1, 'b': 2, 'd': 0}
>>> d = DictDiffer(b, a)
>>> print("Added:", d.added())
Added: {'d'}
>>> print("Removed:", d.removed())
Removed: {'c'}
>>> print("Changed:", d.changed())
Changed: {'b'}
>>> print("Unchanged:", d.unchanged())
Unchanged: {'a'}

Available as a GitHub repo: https://github.com/hughdbrown/dictdiffer

If you want the difference recursively, I have written a package for Python: https://github.com/seperman/deepdiff

Installation

Install from PyPI:

pip install deepdiff

Example usage

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello","b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>>
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
---
+++
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3,"root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates (with the same dictionaries as above):

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains a dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
...
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>>
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Not sure whether it's "fast", but normally one can do this:

dicta = {"a":1,"b":2,"c":3,"d":4}
dictb = {"a":1,"d":2}
for key in dicta.keys():
    if key not in dictb:
        print(key)

As Alex Martelli wrote, if you simply want to check whether any key in B is not in A, any(True for k in dictB if k not in dictA) would be the way to go.

To find the keys that are missing:

diff = set(dictB)-set(dictA) #sets

C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=set(dictB)-set(dictA)"
10000 loops, best of 3: 107 usec per loop

diff = [ k for k in dictB if k not in dictA ] #lc

C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=[ k for k in dictB if k not in dictA ]"
10000 loops, best of 3: 95.9 usec per loop

So both solutions are pretty much the same speed.
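To repeat the comparison without shell quoting, here is a minimal sketch using the standard timeit module. The function names with_sets and with_listcomp are made up for illustration, the dictionaries mirror the ones in the commands above, and the loop count is an arbitrary choice. Absolute timings will differ by machine and Python version.

```python
import timeit

# Same test data as in the command-line runs above.
dictA = dict(zip(range(1000), range(1000)))
dictB = dict(zip(range(0, 2000, 2), range(1000)))

def with_sets():
    # Build two sets and subtract them.
    return set(dictB) - set(dictA)

def with_listcomp():
    # Probe dictA once per key of dictB.
    return [k for k in dictB if k not in dictA]

LOOPS = 1000  # arbitrary; raise it for more stable numbers
for fn in (with_sets, with_listcomp):
    seconds = timeit.timeit(fn, number=LOOPS)
    print(fn.__name__, round(seconds / LOOPS * 1e6, 1), "usec per loop")
```

Both functions find the same keys; only the container type of the result differs (a set versus a list).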


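If you really mean exactly what you say (that you only need to find out IF "there are any keys" in B and not in A, not WHICH ONES those might be, if any), the fastest way should be:

if any(True for k in dictB if k not in dictA): ...

If you actually need to find out WHICH KEYS, if any, are in B and not in A, and not just IF there are such keys, then the existing answers are quite appropriate (but I do suggest more precision in future questions if that is indeed what you mean ;-).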
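A small sketch of why this is fast: any() stops consuming the generator at the first key that is missing from dictA. The helper logged_keys and the sample data are hypothetical and exist only to make the early exit visible. It relies on dicts preserving insertion order (Python 3.7+).

```python
# Hypothetical sample data: dictB has one extra key, inserted first.
dictA = {i: None for i in range(1000)}
dictB = {-1: None}
dictB.update(dictA)

examined = []

def logged_keys(d):
    """Yield the keys of d, recording each one that gets examined."""
    for k in d:
        examined.append(k)
        yield k

has_extra = any(True for k in logged_keys(dictB) if k not in dictA)

print(has_extra)      # True
print(len(examined))  # 1, because any() stopped at the first missing key
```

When no key is missing, every key still has to be checked, so the worst case remains linear in the size of dictB.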

Use set():

set(dictA.keys()).intersection(dictB.keys())

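There is another question on Stack Overflow about this topic, and I have to admit that a simple solution is explained there: Python's datadiff library helps print the difference between two dictionaries.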

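The top answer by hughdbrown suggests using set difference, which is definitely the best approach:

diff = set(dictb.keys()) - set(dicta.keys())

The problem with this code is that it builds two lists just to create two sets, so it wastes 4N time and 2N space. It is also a bit more complicated than it needs to be.

Usually this is not a big deal, but if it is:

diff = dictb.keys() - dicta

You don't need to convert the right dict to a set; set difference takes any iterable (and a dict is an iterable of its keys).
You also don't need to convert the left dict to a set, because anything that conforms to collections.abc.Mapping has a KeysView that acts like a Set.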
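As a quick sketch of the view-based operations on Python 3 (the sample dictionaries are invented for illustration), the result of each operation is an ordinary set:

```python
# Hypothetical sample data.
dicta = {"key1": 1, "key2": 2}
dictb = {"key1": 1, "key2": 20, "key3": 3}

# The right operand can be any iterable; a dict iterates over its keys.
only_in_b = dictb.keys() - dicta
only_in_a = dicta.keys() - dictb
in_both = dicta.keys() & dictb.keys()

print(only_in_b)        # {'key3'}
print(only_in_a)        # set()
print(sorted(in_both))  # ['key1', 'key2']
```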

Python 2

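In Python 2, keys() returns a list of the keys, not a KeysView. So you have to ask for viewkeys() directly.

diff = dictb.viewkeys() - dicta

For dual-version 2.7/3.x code, you are hopefully using six or something similar, so you can use six.viewkeys(dictb):

diff = six.viewkeys(dictb) - dicta

In 2.4-2.6, there is no KeysView. But you can at least cut the cost from 4N to N by building your left set directly out of an iterator, instead of building a list first. Use the difference() method here, because the - operator on a plain set requires both operands to be sets:

diff = set(dictb).difference(dicta)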

Items

I have a dictA which can be the same as dictB or may have some keys missing as compared to dictB or else the value of some keys might be different

So you really don't need to compare the keys, but the items. An ItemsView is only a Set if the values are hashable, like strings. If they are, it's easy:

diff = dictb.items() - dicta.items()
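A minimal sketch of the items-based difference, and of what happens when a value is not hashable. The sample data uses placeholder strings in place of the question's undefined names, and the variable names are illustrative only.

```python
# Hypothetical flat dictionaries with hashable (string) values.
dicta = {"key1": "a", "key2": "b"}
dictb = {"key1": "a", "key2": "newb", "key13": "ee"}

# (key, value) pairs that are in dictb but not in dicta.
changed_or_added = dict(dictb.items() - dicta.items())
print(changed_or_added)  # contains key2 -> 'newb' and key13 -> 'ee'

# A nested dict value is unhashable, so the set operation fails.
nested = {"key3": {"key11": "cc"}}
try:
    nested.items() - dicta.items()
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```

This is why nested dictionaries, as in the question, need a recursive approach instead.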

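Recursive diff

Although the question isn't directly asking for a recursive diff, some of the example values are dicts, and it appears the expected output does recursively diff them. There are already multiple answers here showing how to do that.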
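For the update half of that scenario, a minimal recursive sketch might look like the following. The function name sync_from is hypothetical, the sample values are placeholder strings standing in for the question's undefined names, and this is only one reading of the requirement: dictA is brought in line with dictB, and keys that exist only in dictA are left alone.

```python
def sync_from(target, source):
    """Recursively copy new or changed values from source into target, in place."""
    for key, new_value in source.items():
        old_value = target.get(key)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            # Both sides are dicts: descend instead of overwriting.
            sync_from(old_value, new_value)
        elif key not in target or old_value != new_value:
            # Missing or changed: take the value from source.
            # Note: a nested dict taken from source is shared, not copied.
            target[key] = new_value
    return target

dictA = {"key1": "a", "key2": "b", "key3": {"key11": "cc", "key12": "dd"}}
dictB = {"key1": "a", "key2": "newb",
         "key3": {"key11": "cc", "key12": "newdd", "key13": "ee"}}

sync_from(dictA, dictB)
print(dictA == dictB)  # True
```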


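Here's a way that will work, allows for keys that evaluate to False, and still uses a generator expression to fall out early if possible. It's not exceptionally pretty though.

any(map(lambda x: True, (k for k in b if k not in a)))

EDIT:

THC4K posted a reply to my comment on another answer. Here's a better, prettier way to do the above:

any(True for k in b if k not in a)

Not sure how that never crossed my mind...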

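This is an old question and asks a little less than what I needed, so this answer actually solves more than the question asks. The answers to this question helped me solve the following:

1. (asked) Record differences between two dictionaries
2. Merge differences from #1 into a base dictionary
3. (asked) Merge differences between two dictionaries (treat dictionary #2 as if it were a diff dictionary)
4. Try to detect item movements as well as changes
5. (asked) Do all of this recursively

All this combined with JSON makes for pretty powerful config storage support.

The solution (also on GitHub):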

from collections import OrderedDict
from pprint import pprint

class izipDestinationMatching(object):
    __slots__ = ("attr", "value", "index")

    def __init__(self, attr, value, index):
        self.attr, self.value, self.index = attr, value, index

    def __repr__(self):
        return "izip_destination_matching: found match by '%s' = '%s' @ %d" % (self.attr, self.value, self.index)

def izip_destination(a, b, attrs, addMarker=True):
    """
    Returns zipped lists, but final size is equal to b with (if shorter) a padded with nulls
    Additionally also tries to find item reallocations by searching child dicts (if they are dicts) for attribute, listed in attrs)
    When addMarker == False (patching), final size will be the longer of a, b
    """
    for idx, item in enumerate(b):
        try:
            attr = next((x for x in attrs if x in item), None) # See if the item has any of the ID attributes
            match, matchIdx = next(((orgItm, idx) for idx, orgItm in enumerate(a) if attr in orgItm and orgItm[attr] == item[attr]), (None, None)) if attr else (None, None)
            if match and matchIdx != idx and addMarker: item[izipDestinationMatching] = izipDestinationMatching(attr, item[attr], matchIdx)
        except:
            match = None
        yield (match if match else a[idx] if len(a) > idx else None), item
    if not addMarker and len(a) > len(b):
        for item in a[len(b) - len(a):]:
            yield item, item

def dictdiff(a, b, searchAttrs=[]):
    """
    returns a dictionary which represents difference from a to b
    the return dict is as short as possible:
      equal items are removed
      added / changed items are listed
      removed items are listed with value=None
    Also processes list values where the resulting list size will match that of b.
    It can also search said list items (that are dicts) for identity values to detect changed positions.
    In case such identity value is found, it is kept so that it can be re-found during the merge phase
    @param a: original dict
    @param b: new dict
    @param searchAttrs: list of strings (keys to search for in sub-dicts)
    @return: dict / list / whatever input is
    """
    if not (isinstance(a, dict) and isinstance(b, dict)):
        if isinstance(a, list) and isinstance(b, list):
            return [dictdiff(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs)]
        return b
    res = OrderedDict()
    if izipDestinationMatching in b:
        keepKey = b[izipDestinationMatching].attr
        del b[izipDestinationMatching]
    else:
        keepKey = izipDestinationMatching
    # set(a) | set(b) replaces a.keys() + b.keys(), which only works on Python 2
    for key in sorted(set(a) | set(b)):
        v1 = a.get(key, None)
        v2 = b.get(key, None)
        if keepKey == key or v1 != v2: res[key] = dictdiff(v1, v2, searchAttrs)
    if len(res) <= 1: res = dict(res) # This is only here for pretty print (OrderedDict doesn't pprint nicely)
    return res

def dictmerge(a, b, searchAttrs=[]):
    """
    Returns a dictionary which merges differences recorded in b to base dictionary a
    Also processes list values where the resulting list size will match that of a
    It can also search said list items (that are dicts) for identity values to detect changed positions
    @param a: original dict
    @param b: diff dict to patch into a
    @param searchAttrs: list of strings (keys to search for in sub-dicts)
    @return: dict / list / whatever input is
    """
    if not (isinstance(a, dict) and isinstance(b, dict)):
        if isinstance(a, list) and isinstance(b, list):
            return [dictmerge(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs, False)]
        return b
    res = OrderedDict()
    # set(a) | set(b) replaces a.keys() + b.keys(), which only works on Python 2
    for key in sorted(set(a) | set(b)):
        v1 = a.get(key, None)
        v2 = b.get(key, None)
        #print "processing", key, v1, v2, key not in b, dictmerge(v1, v2)
        if v2 is not None: res[key] = dictmerge(v1, v2, searchAttrs)
        elif key not in b: res[key] = v1
    if len(res) <= 1: res = dict(res) # This is only here for pretty print (OrderedDict doesn't pprint nicely)
    return res

How about the standard approach (compare complete objects):

pydev->new pydev module->module:unittest

import unittest

class Test(unittest.TestCase):

    def testName(self):
        obj1 = {1:1, 2:2}
        obj2 = {1:1, 2:2}
        self.maxDiff = None # sometimes useful
        self.assertDictEqual(obj1, obj2)

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']

    unittest.main()

If on Python ≥ 2.7:

# viewkeys() is the Python 2.7 spelling; on Python 3 use keys() instead

# update different values in dictB
# I would assume only dictA should be updated,
# but the question specifies otherwise

for k in dictA.viewkeys() & dictB.viewkeys():
    if dictA[k] != dictB[k]:
        dictB[k]= dictA[k]

# add missing keys to dictA

dictA.update( (k,dictB[k]) for k in dictB.viewkeys() - dictA.viewkeys() )

Here's a solution that can compare more than two dicts:

def diff_dict(dicts, default=None):
    diff_dict = {}
    # set().union(*dicts) collects every key and works on Python 2 and 3
    for k in set().union(*dicts):
        # we can just use "values = [d.get(k, default) ..." below if
        # we don't care that d1[k]=default and d2[k]=missing will
        # be treated as equal
        if any(k not in d for d in dicts):
            diff_dict[k] = [d.get(k, default) for d in dicts]
        else:
            values = [d[k] for d in dicts]
            if any(v != values[0] for v in values):
                diff_dict[k] = values
    return diff_dict

Usage example:

import matplotlib.pyplot as plt
diff_dict([plt.rcParams, plt.rcParamsDefault, plt.matplotlib.rcParamsOrig])

My recipe for the symmetric difference between two dictionaries:

def find_dict_diffs(dict1, dict2):
    unequal_keys = []
    unequal_keys.extend(set(dict1.keys()).symmetric_difference(set(dict2.keys())))
    for k in dict1.keys():
        if dict1.get(k, 'N\\A') != dict2.get(k, 'N\\A'):
            unequal_keys.append(k)
    if unequal_keys:
        print('param', 'dict1\t', 'dict2')
        for k in set(unequal_keys):
            # assumes the values are strings, as in the sample data below
            print(str(k)+'\t'+dict1.get(k, 'N\\A')+'\t '+dict2.get(k, 'N\\A'))
    else:
        print('Dicts are equal')

dict1 = {1:'a', 2:'b', 3:'c', 4:'d', 5:'e'}
dict2 = {1:'b', 2:'a', 3:'c', 4:'d', 6:'f'}

find_dict_diffs(dict1, dict2)

The result is:

param dict1 dict2
1 a b
2 b a
5 e N\A
6 N\A f

As mentioned in other answers, unittest produces some nice output for comparing dicts, but in this example we don't want to have to build a whole test first.

Scraping the unittest source, it looks like you can get a fair solution with just this:

import difflib
import pprint

def diff_dicts(a, b):
    if a == b:
        return ''
    return '\n'.join(
        difflib.ndiff(pprint.pformat(a, width=30).splitlines(),
                      pprint.pformat(b, width=30).splitlines())
    )

so that

dictA = dict(zip(range(7), map(ord, 'python')))
dictB = {0: 112, 1: 'spam', 2: [1,2,3], 3: 104, 4: 111}
print(diff_dicts(dictA, dictB))

Results in:

  {0: 112,
-  1: 121,
-  2: 116,
+  1: 'spam',
+  2: [1, 2, 3],
   3: 104,
-  4: 111,
?        ^

+  4: 111}
?        ^

-  5: 110}

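Where:

'-' indicates key/values in the first but not the second dict
'+' indicates key/values in the second but not the first dict

As in unittest, the only caveat is that the final mapping can be thought to be a diff, due to the trailing comma/bracket.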

@Maxx has an excellent answer: use the unittest tools provided by Python:

import unittest

class Test(unittest.TestCase):
    def runTest(self):
        pass

    def testDict(self, d1, d2, maxDiff=None):
        self.maxDiff = maxDiff
        self.assertDictEqual(d1, d2)

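Then, anywhere in your code you can call:

try:
    Test().testDict(dict1, dict2)
except Exception as e:
    print(e)

The resulting output looks like the output from diff, pretty-printing the dictionaries with + or - prepended to each line that is different.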

Here is a solution to deep compare the keys of two dictionaries:

def compareDictKeys(dict1, dict2):
    if type(dict1) != dict or type(dict2) != dict:
        return False

    keys1, keys2 = dict1.keys(), dict2.keys()
    diff = set(keys1) - set(keys2) or set(keys2) - set(keys1)

    if not diff:
        for key in keys1:
            if (type(dict1[key]) == dict or type(dict2[key]) == dict) and not compareDictKeys(dict1[key], dict2[key]):
                diff = True
                break

    return not diff

If you want a built-in solution for a full comparison with arbitrary dict structures, @Maxx's answer is a good start.

import unittest

test = unittest.TestCase()
test.assertEqual(dictA, dictB)

Based on ghostdog74's answer,

dicta = {"a":1,"d":2}
dictb = {"a":5,"d":2}

for value in dicta.values():
    if value not in dictb.values():
        print(value)

will print the values of dicta that differ.

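Not sure if it is still relevant, but I came across this problem. In my situation I just needed to return a dictionary of the changes for all nested dictionaries and so on. I could not find a good solution out there, but I did end up writing a simple function to do this. Hope this helps,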
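The function itself is not included in this post. Purely as an illustration of the idea, and not the poster's actual code, a sketch of a function that returns a dictionary of changes for nested dictionaries could look like this. The name nested_changes and the choice to ignore removed keys are assumptions.

```python
_MISSING = object()  # sentinel so that a stored None is not mistaken for "absent"

def nested_changes(old, new):
    """Return the keys of `new` whose values were added or changed, recursively."""
    changes = {}
    for key, new_value in new.items():
        old_value = old.get(key, _MISSING)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            # Compare nested dicts key by key and keep only real differences.
            sub_changes = nested_changes(old_value, new_value)
            if sub_changes:
                changes[key] = sub_changes
        elif old_value is _MISSING or old_value != new_value:
            changes[key] = new_value
    return changes

old = {"a": 1, "b": {"c": 2, "d": 3}}
new = {"a": 1, "b": {"c": 2, "d": 4, "e": 5}, "f": 6}
print(nested_changes(old, new))  # {'b': {'d': 4, 'e': 5}, 'f': 6}
```

Keys that were removed from `new` are not reported in this sketch; reporting them would need a second pass over `old`.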

Try this to find the intersection, the keys that are in both dictionaries; if you want the keys that are not found in the second dictionary, just use not in...

intersect = filter(lambda x, dictB=dictB.keys(): x in dictB, dictA.keys())
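On Python 3, filter returns a lazy iterator rather than a list, so a sketch of the same idea might wrap it in list(). The sample dictionaries are invented for illustration; the second call shows the "not in" variant mentioned above.

```python
# Hypothetical sample data.
dictA = {"a": 1, "b": 2, "c": 3}
dictB = {"b": 20, "c": 30, "d": 40}

# Keys of dictA that are also in dictB (order follows dictA).
intersect = list(filter(lambda k: k in dictB, dictA))

# Keys of dictA that are not found in dictB.
missing_from_b = list(filter(lambda k: k not in dictB, dictA))

print(intersect)       # ['b', 'c']
print(missing_from_b)  # ['a']
```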