Calculate difference in keys contained in two Python dictionaries
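Suppose I have two Python dictionaries, dictA and dictB. I need to find out whether there are any keys that are present in dictB but not in dictA. What is the fastest way to go about it?
Should I convert the dictionary keys into a set first and then go about it?
Interested in knowing your thoughts...
Thanks for your responses.
Apologies for not stating my question properly. My scenario is like this: I have a dictA which can be the same as dictB or may have some keys missing as compared to dictB, or else the value of some keys might be different.
The problem is that the dictionaries have no standard shape: a value can be a plain value or a dict of dicts.
Say:

    dictA={'key1':a, 'key2':b, 'key3':{'key11':cc, 'key12':dd}, 'key4':{'key111':{....}}}
    dictB={'key1':a, 'key2':newb, 'key3':{'key11':cc, 'key12':newdd, 'key13':ee}.......

So the 'key2' value has to be reset to the new value, and 'key13' has to be added inside the dict. The key values do not have a fixed format: each can be a simple value, a dict, or a dict of dicts.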
You can use set operations on the keys:

    diff = set(dictb.keys()) - set(dicta.keys())
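As a small illustration of the set operations mentioned here, the sketch below runs them on two made-up dictionaries (the sample data is an assumption, not taken from the question):

```python
# Hypothetical sample data, chosen only for illustration.
dicta = {"a": 1, "b": 2, "c": 3}
dictb = {"b": 20, "c": 3, "d": 4}

only_in_b = set(dictb.keys()) - set(dicta.keys())       # keys present in dictb but not in dicta
only_in_a = set(dicta.keys()) - set(dictb.keys())       # keys present in dicta but not in dictb
in_both = set(dicta.keys()) & set(dictb.keys())         # keys shared by both dictionaries
in_exactly_one = set(dicta.keys()) ^ set(dictb.keys())  # symmetric difference

print(sorted(only_in_b))       # ['d']
print(sorted(only_in_a))       # ['a']
print(sorted(in_both))         # ['b', 'c']
print(sorted(in_exactly_one))  # ['a', 'd']
```

Note that this looks at keys only; "b" has different values in the two dictionaries and still counts as shared.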
Here is a class to find all the possibilities: what was added, what was removed, which key-value pairs are the same, and which key-value pairs have changed.

    class DictDiffer(object):
        """
        Calculate the difference between two dictionaries as:
        (1) items added
        (2) items removed
        (3) keys same in both but changed values
        (4) keys same in both and unchanged values
        """
        def __init__(self, current_dict, past_dict):
            self.current_dict, self.past_dict = current_dict, past_dict
            self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
            self.intersect = self.set_current.intersection(self.set_past)
        def added(self):
            return self.set_current - self.intersect
        def removed(self):
            return self.set_past - self.intersect
        def changed(self):
            return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
        def unchanged(self):
            return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])

Here is some sample output (Python 3):

    >>> a = {'a': 1, 'b': 1, 'c': 0}
    >>> b = {'a': 1, 'b': 2, 'd': 0}
    >>> d = DictDiffer(b, a)
    >>> print("Added:", d.added())
    Added: {'d'}
    >>> print("Removed:", d.removed())
    Removed: {'c'}
    >>> print("Changed:", d.changed())
    Changed: {'b'}
    >>> print("Unchanged:", d.unchanged())
    Unchanged: {'a'}

Available as a GitHub repo: https://github.com/hughdbrown/dictdiffer
In case you want the difference recursively, I have written a package for Python: https://github.com/seperman/deepdiff

Installation

Install from PyPI:

    pip install deepdiff

Example usage

Importing

    >>> from deepdiff import DeepDiff
    >>> from pprint import pprint
    >>> from __future__ import print_function # In case running on Python 2
Same object returns empty

    >>> t1 = {1:1, 2:2, 3:3}
    >>> t2 = t1
    >>> print(DeepDiff(t1, t2))
    {}

Type of an item has changed

    >>> t1 = {1:1, 2:2, 3:3}
    >>> t2 = {1:1, 2:"2", 3:3}
    >>> pprint(DeepDiff(t1, t2), indent=2)
    { 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                     'newvalue': '2',
                                     'oldtype': <class 'int'>,
                                     'oldvalue': 2}}}

Value of an item has changed

    >>> t1 = {1:1, 2:2, 3:3}
    >>> t2 = {1:1, 2:4, 3:3}
    >>> pprint(DeepDiff(t1, t2), indent=2)
    {'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

    >>> t1 = {1:1, 2:2, 3:3, 4:4}
    >>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff)
    {'dic_item_added': ['root[5]', 'root[6]'],
     'dic_item_removed': ['root[4]'],
     'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}
String difference

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":"world"}}
    >>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello","b":"world!"}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                          "root[4]['b']": { 'newvalue': 'world!',
                                            'oldvalue': 'world'}}}

String difference 2

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":"world!\nGoodbye!\n1\n2\nEnd"}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":"world\n1\n2\nEnd"}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                    '+++ \n'
                                                    '@@ -1,5 +1,4 @@\n'
                                                    '-world!\n'
                                                    '-Goodbye!\n'
                                                    '+world\n'
                                                    ' 1\n'
                                                    ' 2\n'
                                                    ' End',
                                            'newvalue': 'world\n1\n2\nEnd',
                                            'oldvalue': 'world!\n'
                                                        'Goodbye!\n'
                                                        '1\n'
                                                        '2\n'
                                                        'End'}}}
    >>>
    >>> print(ddiff['values_changed']["root[4]['b']"]["diff"])
    ---
    +++
    @@ -1,5 +1,4 @@
    -world!
    -Goodbye!
    +world
     1
     2
     End

Type change

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, 3]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":"world End"}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                          'newvalue': 'world End',
                                          'oldtype': <class 'list'>,
                                          'oldvalue': [1, 2, 3]}}}
List difference

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, 3, 4]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2]}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    {'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, 3]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 3, 2, 3]}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'iterable_item_added': {"root[4]['b'][3]": 3},
      'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                          "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates (with the same dictionaries as above):

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, 3]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 3, 2, 3]}}
    >>> ddiff = DeepDiff(t1, t2, ignore_order=True)
    >>> print(ddiff)
    {}

List that contains a dictionary:

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, {1:1, 2:2}]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello","b":[1, 2, {1:3}]}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'dic_item_removed': ["root[4]['b'][2][2]"],
      'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}
Sets:

    >>> t1 = {1, 2, 8}
    >>> t2 = {1, 2, 3, 5}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(DeepDiff(t1, t2))
    {'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named tuples:

    >>> from collections import namedtuple
    >>> Point = namedtuple('Point', ['x', 'y'])
    >>> t1 = Point(x=11, y=22)
    >>> t2 = Point(x=11, y=23)
    >>> pprint(DeepDiff(t1, t2))
    {'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

    >>> class ClassA(object):
    ...     a = 1
    ...     def __init__(self, b):
    ...         self.b = b
    ...
    >>> t1 = ClassA(1)
    >>> t2 = ClassA(2)
    >>>
    >>> pprint(DeepDiff(t1, t2))
    {'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

    >>> t2.c = "new attribute"
    >>> pprint(DeepDiff(t1, t2))
    {'attribute_added': ['root.c'],
     'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}
Not sure whether it's "fast", but normally one can do this:

    dicta = {"a":1,"b":2,"c":3,"d":4}
    dictb = {"a":1,"d":2}
    for key in dicta.keys():
        if key not in dictb:
            print(key)
As Alex Martelli wrote, if you simply want to check whether any key in B is not in A, the any() expression from his answer is the way to go.

To find the keys that are missing:

    diff = set(dictB)-set(dictA) #sets

    C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=set(dictB)-set(dictA)"
    10000 loops, best of 3: 107 usec per loop

    diff = [ k for k in dictB if k not in dictA ] #lc

    C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=[ k for k in dictB if k not in dictA ]"
    10000 loops, best of 3: 95.9 usec per loop

So those two solutions are pretty much the same speed.
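The same comparison can be run from a script with the standard timeit module. This is only a sketch: the function names and the repeat count are my own choices, and the absolute numbers depend on the machine and Python version.

```python
import timeit

# Same shape of data as the command-line runs: 1000 keys each, half of them shared.
dictA = dict(zip(range(1000), range(1000)))
dictB = dict(zip(range(0, 2000, 2), range(1000)))


def with_sets():
    return set(dictB) - set(dictA)


def with_list_comprehension():
    return [k for k in dictB if k not in dictA]


# Both approaches must agree on the result before their speed is worth comparing.
assert with_sets() == set(with_list_comprehension())

for func in (with_sets, with_list_comprehension):
    # number=1000 is an arbitrary choice made for this sketch.
    seconds = timeit.timeit(func, number=1000)
    print("%s: %.1f usec per call" % (func.__name__, seconds / 1000 * 1e6))
```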
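If you really mean exactly what you say (that you only need to find out IF "there are any keys" in B and not in A, not WHICH ONES those might be, if any), the fastest way should be:

    if any(True for k in dictB if k not in dictA): ...

If you actually need to find out WHICH KEYS, if any, are in B and not in A, and not just "IF" there are such keys, then the existing answers are quite appropriate (but I do suggest more precision in future questions if that's indeed what you mean ;-).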
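To make the early exit visible, here is a minimal sketch that counts membership tests. CountingDict is a hypothetical helper written for this illustration only, and the sample keys are made up.

```python
class CountingDict(dict):
    """A dict that counts membership tests (hypothetical helper, for illustration only)."""

    lookups = 0

    def __contains__(self, key):
        self.lookups += 1
        return super().__contains__(key)


dictA = CountingDict({"a": 1})
dictB = {"x": 0, "a": 1, "b": 2, "c": 3}   # "x" comes first and is missing from dictA

# any() stops at the first key of dictB that dictA lacks.
has_extra = any(True for k in dictB if k not in dictA)
print(has_extra, dictA.lookups)   # True 1

# Collecting every missing key has to test all four keys.
dictA.lookups = 0
missing = [k for k in dictB if k not in dictA]
print(missing, dictA.lookups)     # ['x', 'b', 'c'] 4
```

The saving only materializes when a missing key shows up early in the iteration; if dictB has no extra keys, any() still has to test every key.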
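Use a set:

    set(dictA.keys()).intersection(dictB.keys())

There is another question on Stack Overflow about this topic, and I have to admit that a simple solution is explained there: Python's datadiff library helps print the difference between two dictionaries.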
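The top answer by hughdbrown suggests using set difference, which is definitely the best approach:

    diff = set(dictb.keys()) - set(dicta.keys())

The problem with this code is that, in Python 2 where keys() returns a list, it builds two lists just to create two sets, so it wastes 4N time and 2N space. It is also a bit more complicated than it needs to be.

Usually this is not a big deal, but if it matters:

    diff = dictb.keys() - dicta

- You don't need to convert the right dict to a set; the key view's difference accepts any iterable (and a dict is an iterable of its keys).
- You also don't need to convert the left dict to a set, because anything that complies with collections.abc.Mapping has a KeysView that acts like a Set.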
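The two points above can be checked on Python 3 with a minimal sketch. ReadOnlyConfig is a hypothetical class invented for this example; it implements only the three abstract methods that collections.abc.Mapping requires, and the keys() view comes from the ABC.

```python
from collections.abc import Mapping, Set


class ReadOnlyConfig(Mapping):
    """Hypothetical minimal Mapping, used only to show what the ABC provides."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


dicta = {"a": 1, "b": 2}
dictb = ReadOnlyConfig({"b": 2, "c": 3, "d": 4})

keys_view = dictb.keys()            # a collections.abc.KeysView supplied by the Mapping mixin
print(isinstance(keys_view, Set))   # True

diff = keys_view - dicta            # the right operand is a plain dict, iterated for its keys
print(sorted(diff))                 # ['c', 'd']
```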
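Python 2

In Python 2, keys() returns a list of the keys, not a KeysView, so you have to ask for viewkeys() directly:

    diff = dictb.viewkeys() - dicta

For dual-version 2.7/3.x code, hopefully you're using six or something similar, so you can write:

    diff = six.viewkeys(dictb) - dicta

In 2.4-2.6, there is no KeysView, but you can at least build the left set directly from the dict instead of building a list first. The difference() method, unlike the - operator on a built-in set, accepts any iterable:

    diff = set(dictb).difference(dicta)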
Items
I have a dictA which can be the same as dictB or may have some keys missing as compared to dictB or else the value of some keys might be different
So you really don't need to compare the keys, only the items. If the values are hashable (like strings), then it's easy:

    diff = dictb.items() - dicta.items()
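A short Python 3 sketch of what the item difference returns, and of what happens when a value is not hashable. The sample data is made up for illustration.

```python
# Hypothetical sample data with hashable (string) values.
dicta = {"host": "localhost", "port": "8080"}
dictb = {"host": "localhost", "port": "9090", "mode": "debug"}

# (key, value) pairs of dictb that dicta lacks or holds with a different value.
diff = dictb.items() - dicta.items()
print(sorted(diff))   # [('mode', 'debug'), ('port', '9090')]

# With an unhashable value (a list here) the same expression raises TypeError.
raised = False
try:
    {"tags": ["x"]}.items() - {"tags": ["y"]}.items()
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

When values can be lists or nested dicts, a key-by-key comparison with != avoids the hashing requirement.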
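Recursive diff

Although the question isn't directly asking for a recursive diff, some of the example values are dicts, and it appears the expected output does diff them recursively. There are already multiple answers here showing how to do that.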
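For readers who want the idea in one place, here is a minimal Python 3 sketch of a recursive diff. It is not the code of any particular answer; the function name, the tuple layout, and the string stand-ins for the question's placeholder values (a, b, cc, ...) are all assumptions made for illustration.

```python
def recursive_diff(old, new, path=()):
    """Yield (path, kind, old_value, new_value) for every difference found."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Sort by repr so the report order is stable even for mixed key types.
        for key in sorted(old.keys() | new.keys(), key=repr):
            sub_path = path + (key,)
            if key not in old:
                yield sub_path, "added", None, new[key]
            elif key not in new:
                yield sub_path, "removed", old[key], None
            else:
                # Nested dicts are compared key by key, anything else by equality.
                yield from recursive_diff(old[key], new[key], sub_path)
    elif old != new:
        yield path, "changed", old, new


dictA = {"key1": "a", "key2": "b", "key3": {"key11": "cc", "key12": "dd"}}
dictB = {"key1": "a", "key2": "newb",
         "key3": {"key11": "cc", "key12": "newdd", "key13": "ee"}}

for change in recursive_diff(dictA, dictB):
    print(change)
# (('key2',), 'changed', 'b', 'newb')
# (('key3', 'key12'), 'changed', 'dd', 'newdd')
# (('key3', 'key13'), 'added', None, 'ee')
```

Lists and other containers are compared as whole values here; a library such as deepdiff goes further.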
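Here is a way that will work and that allows for keys that evaluate to False:

    any(map(lambda x: True, (k for k in b if k not in a)))

EDIT:

THC4k posted a reply to my comment on another answer. Here is a better, prettier way to do the above:

    any(True for k in b if k not in a)

Not sure how that never crossed my mind...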
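Why yield True instead of the key itself? A small sketch with made-up data shows the difference when the only extra key is falsy:

```python
a = {1: "one"}
b = {0: "zero", 1: "one"}   # the only extra key, 0, is falsy

naive = any(k for k in b if k not in a)       # tests the truth value of the key itself
robust = any(True for k in b if k not in a)   # tests only whether such a key exists

print(naive, robust)   # False True
```

The same applies to other falsy keys such as "" or None.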
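This is an old question and asks a little bit less than what I needed, so this answer actually solves more than the question asks. The answers to this question helped me solve the following:

All this combined with JSON makes for pretty powerful config storage support.

The solution (also on GitHub):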
    from collections import OrderedDict
    from pprint import pprint


    class izipDestinationMatching(object):
        __slots__ = ("attr", "value", "index")

        def __init__(self, attr, value, index):
            self.attr, self.value, self.index = attr, value, index

        def __repr__(self):
            return "izip_destination_matching: found match by '%s' = '%s' @ %d" % (self.attr, self.value, self.index)


    def izip_destination(a, b, attrs, addMarker=True):
        """
        Returns zipped lists, but final size is equal to b with (if shorter) a padded with nulls
        Additionally also tries to find item reallocations by searching child dicts (if they are dicts) for attribute, listed in attrs)
        When addMarker == False (patching), final size will be the longer of a, b
        """
        for idx, item in enumerate(b):
            try:
                attr = next((x for x in attrs if x in item), None)  # See if the item has any of the ID attributes
                match, matchIdx = next(((orgItm, orgIdx) for orgIdx, orgItm in enumerate(a) if attr in orgItm and orgItm[attr] == item[attr]), (None, None)) if attr else (None, None)
                if match and matchIdx != idx and addMarker:
                    item[izipDestinationMatching] = izipDestinationMatching(attr, item[attr], matchIdx)
            except Exception:
                # Items that are not dicts cannot be searched for ID attributes
                match = None
            yield (match if match else a[idx] if len(a) > idx else None), item
        if not addMarker and len(a) > len(b):
            for item in a[len(b) - len(a):]:
                yield item, item


    def dictdiff(a, b, searchAttrs=()):
        """
        returns a dictionary which represents difference from a to b
        the return dict is as short as possible:
          equal items are removed
          added / changed items are listed
          removed items are listed with value=None
        Also processes list values where the resulting list size will match that of b.
        It can also search said list items (that are dicts) for identity values to detect changed positions.
          In case such identity value is found, it is kept so that it can be re-found during the merge phase
        @param a: original dict
        @param b: new dict
        @param searchAttrs: list of strings (keys to search for in sub-dicts)
        @return: dict / list / whatever input is
        """
        if not (isinstance(a, dict) and isinstance(b, dict)):
            if isinstance(a, list) and isinstance(b, list):
                return [dictdiff(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs)]
            return b
        res = OrderedDict()
        if izipDestinationMatching in b:
            keepKey = b[izipDestinationMatching].attr
            del b[izipDestinationMatching]
        else:
            keepKey = izipDestinationMatching
        for key in sorted(set(a) | set(b)):  # union of keys, works on Python 2 and 3
            v1 = a.get(key, None)
            v2 = b.get(key, None)
            if keepKey == key or v1 != v2:
                res[key] = dictdiff(v1, v2, searchAttrs)
        if len(res) <= 1:
            res = dict(res)  # This is only here for pretty print (OrderedDict doesn't pprint nicely)
        return res


    def dictmerge(a, b, searchAttrs=()):
        """
        Returns a dictionary which merges differences recorded in b to base dictionary a
        Also processes list values where the resulting list size will match that of a
        It can also search said list items (that are dicts) for identity values to detect changed positions
        @param a: original dict
        @param b: diff dict to patch into a
        @param searchAttrs: list of strings (keys to search for in sub-dicts)
        @return: dict / list / whatever input is
        """
        if not (isinstance(a, dict) and isinstance(b, dict)):
            if isinstance(a, list) and isinstance(b, list):
                return [dictmerge(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs, False)]
            return b
        res = OrderedDict()
        for key in sorted(set(a) | set(b)):  # union of keys, works on Python 2 and 3
            v1 = a.get(key, None)
            v2 = b.get(key, None)
            #print("processing", key, v1, v2, key not in b, dictmerge(v1, v2))
            if v2 is not None:
                res[key] = dictmerge(v1, v2, searchAttrs)
            elif key not in b:
                res[key] = v1
        if len(res) <= 1:
            res = dict(res)  # This is only here for pretty print (OrderedDict doesn't pprint nicely)
        return res
What about the standard way (comparing the full objects)?

PyDev -> New PyDev Module -> Module: unittest

    import unittest


    class Test(unittest.TestCase):

        def testName(self):
            obj1 = {1:1, 2:2}
            obj2 = {1:1, 2:2}
            self.maxDiff = None  # sometimes is useful
            self.assertDictEqual(obj1, obj2)

    if __name__ == "__main__":
        #import sys;sys.argv = ['', 'Test.testName']
        unittest.main()
If on Python ≥ 2.7:

    # update different values in dictB
    # I would assume only dictA should be updated,
    # but the question specifies otherwise

    for k in dictA.viewkeys() & dictB.viewkeys():
        if dictA[k] != dictB[k]:
            dictB[k]= dictA[k]

    # add missing keys to dictA

    dictA.update( (k,dictB[k]) for k in dictB.viewkeys() - dictA.viewkeys() )
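On Python 3 the same idea can be written with keys(), which already returns a set-like view. The sketch below keeps the update direction of the snippet above and uses made-up sample data:

```python
# Hypothetical sample data.
dictA = {"key1": "a", "key2": "b"}
dictB = {"key1": "a", "key2": "newb", "key3": "c"}

# keys() & keys() builds a new set, so changing dictB's values inside the loop is safe.
for k in dictA.keys() & dictB.keys():
    if dictA[k] != dictB[k]:
        dictB[k] = dictA[k]          # same direction as above: dictA's value wins

# Copy over the keys that dictA is missing.
dictA.update((k, dictB[k]) for k in dictB.keys() - dictA.keys())

print(dictA == dictB)   # True
```

To let dictB's values win instead, as the question describes, swap the assignment to dictA[k] = dictB[k].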
Here is a solution that can compare more than two dicts:

    def diff_dict(dicts, default=None):
        diff = {}
        # union of all keys (works on Python 2 and 3)
        for k in set().union(*dicts):
            # we can just use "values = [d.get(k, default) ..." below if
            # we don't care that d1[k]=default and d2[k]=missing will
            # be treated as equal
            if any(k not in d for d in dicts):
                diff[k] = [d.get(k, default) for d in dicts]
            else:
                values = [d[k] for d in dicts]
                if any(v != values[0] for v in values):
                    diff[k] = values
        return diff

Usage example:

    import matplotlib.pyplot as plt

    diff_dict([plt.rcParams, plt.rcParamsDefault, plt.matplotlib.rcParamsOrig])
My recipe for the symmetric difference between two dictionaries:

    def find_dict_diffs(dict1, dict2):
        unequal_keys = []
        unequal_keys.extend(set(dict1.keys()).symmetric_difference(set(dict2.keys())))
        for k in dict1.keys():
            if dict1.get(k, r'N\A') != dict2.get(k, r'N\A'):
                unequal_keys.append(k)
        if unequal_keys:
            print('param', 'dict1\t', 'dict2')
            for k in set(unequal_keys):
                print(str(k) + '\t' + dict1.get(k, r'N\A') + '\t ' + dict2.get(k, r'N\A'))
        else:
            print('Dicts are equal')

    dict1 = {1:'a', 2:'b', 3:'c', 4:'d', 5:'e'}
    dict2 = {1:'b', 2:'a', 3:'c', 4:'d', 6:'f'}

    find_dict_diffs(dict1, dict2)

The result is:

    param dict1    dict2
    1     a        b
    2     b        a
    5     e        N\A
    6     N\A      f
As mentioned in other answers, unittest produces some nice output for comparing dicts, but in this example we don't want to have to build a whole test first.

Scraping the unittest source, it looks like you can get a fair solution with just this:

    import difflib
    import pprint

    def diff_dicts(a, b):
        if a == b:
            return ''
        return '\n'.join(
            difflib.ndiff(pprint.pformat(a, width=30).splitlines(),
                          pprint.pformat(b, width=30).splitlines())
        )

So

    dictA = dict(zip(range(7), map(ord, 'python')))
    dictB = {0: 112, 1: 'spam', 2: [1,2,3], 3: 104, 4: 111}
    print(diff_dicts(dictA, dictB))

results in:

      {0: 112,
    -  1: 121,
    -  2: 116,
    +  1: 'spam',
    +  2: [1, 2, 3],
       3: 104,
    -  4: 111,
    ?        ^

    +  4: 111}
    ?        ^

    -  5: 110}
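Where:
- "-" indicates key/values in the first but not the second dict
- "+" indicates key/values in the second but not the first dict
As in unittest, the only caveat is that the final mapping can be reported as a difference because of the trailing comma/bracket.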
@Maxx has an excellent answer: use the unittest tools provided by Python:

    import unittest


    class Test(unittest.TestCase):
        def runTest(self):
            pass

        def testDict(self, d1, d2, maxDiff=None):
            self.maxDiff = maxDiff
            self.assertDictEqual(d1, d2)

Then, anywhere in your code you can call:

    try:
        Test().testDict(dict1, dict2)
    except Exception as e:
        print(e)

The resulting output looks like diff output, with + or - prefixed to each line that differs.
Here is a solution for deep-comparing the keys of two dictionaries:

    def compareDictKeys(dict1, dict2):
        if type(dict1) != dict or type(dict2) != dict:
            return False

        keys1, keys2 = dict1.keys(), dict2.keys()
        diff = set(keys1) - set(keys2) or set(keys2) - set(keys1)

        if not diff:
            for key in keys1:
                if (type(dict1[key]) == dict or type(dict2[key]) == dict) and not compareDictKeys(dict1[key], dict2[key]):
                    diff = True
                    break

        return not diff
If you want a built-in solution for a full comparison of arbitrary dict structures, @Maxx's answer is a good start.

    import unittest

    test = unittest.TestCase()
    test.assertEqual(dictA, dictB)
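A minimal Python 3 sketch of using this outside a test run: the failed assertion raises AssertionError, and its message carries the diff. The sample dictionaries and the variable name report are assumptions made for illustration.

```python
import unittest

# Hypothetical nested sample data.
dictA = {"key1": "a", "key3": {"key11": "cc", "key12": "dd"}}
dictB = {"key1": "a", "key3": {"key11": "cc", "key12": "newdd"}}

test = unittest.TestCase()   # on Python 3 no test method name is required
test.maxDiff = None          # do not truncate long diffs

try:
    test.assertEqual(dictA, dictB)
    report = ""
except AssertionError as exc:
    report = str(exc)        # the message includes an ndiff-style listing of both dicts

print(report or "dicts are equal")
```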
Based on ghostdog74's answer,

    dicta = {"a":1,"d":2}
    dictb = {"a":5,"d":2}

    for value in dicta.values():
        if value not in dictb.values():
            print(value)

will print the values of dicta that differ.
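One caveat, sketched below with made-up data: the membership test ignores which key a value belongs to, so two swapped values go unnoticed. Comparing key by key, as in this Python 3 sketch, catches them.

```python
dicta = {"a": 2, "d": 1}
dictb = {"a": 1, "d": 2}   # same values as dicta, but attached to different keys

# Value membership finds nothing, because 1 and 2 both occur somewhere in dictb.
by_membership = [v for v in dicta.values() if v not in dictb.values()]
print(by_membership)   # []

# Key-by-key comparison reports (value in dicta, value in dictb) for each shared key that differs.
changed = {k: (dicta[k], dictb[k])
           for k in dicta.keys() & dictb.keys()
           if dicta[k] != dictb[k]}
print(sorted(changed.items()))   # [('a', (2, 1)), ('d', (1, 2))]
```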
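Not sure if it is still relevant, but I came across this problem. In my situation I just needed to return a dictionary of the changes for all nested dictionaries etc. I could not find a good solution out there, but I did end up writing a simple function to do this. Hope this helps,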
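The function itself is not reproduced in this copy of the thread. As a stand-in, here is a minimal Python 3 sketch of the idea described (a nested dictionary containing only the changes). It is not the author's code: the name dict_changes and the convention of reporting removed keys as None are assumptions made for illustration.

```python
def dict_changes(old, new):
    """Return a nested dict holding only what differs in `new` relative to `old`.

    Assumption of this sketch: keys that were removed are reported with the value None.
    """
    changes = {}
    for key in old.keys() | new.keys():
        if key not in new:
            changes[key] = None
        elif key not in old:
            changes[key] = new[key]
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            nested = dict_changes(old[key], new[key])
            if nested:                      # keep the branch only if something changed inside it
                changes[key] = nested
        elif old[key] != new[key]:
            changes[key] = new[key]
    return changes


old = {"key1": "a", "key2": "b", "key3": {"key11": "cc", "key12": "dd"}}
new = {"key1": "a", "key2": "newb",
       "key3": {"key11": "cc", "key12": "newdd", "key13": "ee"}}

result = dict_changes(old, new)
print(result == {"key2": "newb", "key3": {"key12": "newdd", "key13": "ee"}})   # True
```

Because None doubles as the "removed" marker, this sketch cannot distinguish a removed key from one whose new value really is None.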
Try this to find the intersection, that is, the keys that are in both dictionaries. If you want the keys not found in the second dictionary, just use not in...

    intersect = filter(lambda x, dictB=dictB.keys(): x in dictB, dictA.keys())
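On Python 3, filter() returns a lazy iterator rather than a list, so a list comprehension is often easier to read and inspect. A small sketch with made-up data:

```python
# Hypothetical sample data.
dictA = {"a": 1, "b": 2, "c": 3}
dictB = {"b": 20, "c": 30, "d": 40}

intersect = [k for k in dictA if k in dictB]        # keys in both, in dictA's order
only_in_a = [k for k in dictA if k not in dictB]    # the "not in" variant

print(intersect)   # ['b', 'c']
print(only_in_a)   # ['a']
```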