How to remove duplicate dictionaries from a list in Python?
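I have a list of dictionaries sorted by a specific key. Each dictionary contains 32 elements, and there are over 4000 dictionaries in the list. I need code to work through the list and return a new list with all duplicates removed.
The methods from these links:
- Removing duplicates in lists
- How do you remove duplicates from a list whilst preserving order?
don't help me, because a dictionary is unhashable.
Any thoughts? If you need more information, leave a comment and I will add it.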
Edit:
A duplicate dictionary would be any two dictionaries with the same value for `list[dictionary][key]`.
Alright, here is a detailed explanation for those who need it.
I have a list of dictionaries like this:
```python
[
    {
        "ID": "0001",
        "Organization": "SolarUSA",
        "Matchcode": "SolarUSA, Something Street, Somewhere State, Whatev Zip",
        "Owner": "Timothy Black",
    },
    {
        "ID": "0002",
        "Organization": "SolarUSA",
        "Matchcode": "SolarUSA, Something Street, Somewhere State, Whatev Zip",
        "Owner": "Johen Wilheim",
    },
    {
        "ID": "0003",
        "Organization": "Zapotec",
        "Matchcode": "Zapotec, Something Street, Somewhere State, Whatev Zip",
        "Owner": "Simeon Yurrigan",
    },
]
```
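In this list, the first and second dictionaries are duplicates because their `Matchcode` values are identical.
Now, this list is sorted by the following code:
```python
# sort_by is "Matchcode"
def sort(list_to_be_sorted, sort_by):
    return sorted(list_to_be_sorted, key=lambda k: k[sort_by])
```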
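So I have a neat list of dictionaries sorted by `Matchcode`. Now I just need to iterate over the list, accessing the `list[dictionary][key]` and deleting duplicates when two key values match.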
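Just as you can use a `tuple` to get a hashable equivalent of a list, you can use a `frozenset` to get a hashable equivalent of a dict. The only trick is that you need to pass `d.items()` rather than `d` to the constructor:
```python
>>> d = {'a': 1, 'b': 2}
>>> s = frozenset(d.items())
>>> hash(s)
-7588994739874264648
>>> dict(s) == d
True
```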
Then you can use your favourite of the solutions you have already seen. For example, dump them into a `set`:
```python
>>> unique_sets = set(frozenset(d.items()) for d in list_of_dicts)
>>> unique_dicts = [dict(s) for s in unique_sets]
```
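Or, preserving order and uniquifying by the value stored under one key:
```python
>>> sets = (frozenset(d.items()) for d in list_of_dicts)
>>> # A frozenset cannot be indexed, so look the key up through a dict view of it.
>>> unique_sets = unique_everseen(sets, key=lambda s: dict(s)[key])
>>> unique_dicts = [dict(s) for s in unique_sets]
```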
Of course, if you have nested lists or dicts inside, you will have to convert them recursively, just as you would for a list of lists.
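A minimal sketch of that recursive conversion, assuming the nested values are only dicts, lists, tuples, sets and already-hashable scalars. The names `freeze` and `unique_nested` are hypothetical helpers, not part of any library.

```python
def freeze(value):
    """Recursively convert dicts, lists and sets into hashable equivalents."""
    if isinstance(value, dict):
        # A dict becomes a frozenset of (key, frozen value) pairs.
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        # Lists and tuples become tuples of frozen elements (order is kept).
        return tuple(freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    # Anything else is assumed to be hashable already.
    return value


def unique_nested(list_of_dicts):
    """Drop exact duplicates, keeping the first occurrence of each dict."""
    seen = set()
    result = []
    for d in list_of_dicts:
        frozen = freeze(d)
        if frozen not in seen:
            seen.add(frozen)
            result.append(d)
    return result


# Made-up records with nested values.
records = [
    {"ID": "0001", "tags": ["a", "b"], "meta": {"x": 1}},
    {"ID": "0001", "tags": ["a", "b"], "meta": {"x": 1}},
    {"ID": "0002", "tags": ["a"], "meta": {"x": 1}},
]
deduped = unique_nested(records)
```

The frozen form is only used for the membership test, so the original dictionaries are returned unchanged.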
Use `itertools.groupby`:
```python
import itertools

data = [
    {
        "ID": "0001",
        "Organization": "SolarUSA",
        "Matchcode": "SolarUSA, Something Street, Somewhere State, Whatev Zip",
        "Owner": "Timothy Black",
    },
    {
        "ID": "0002",
        "Organization": "SolarUSA",
        "Matchcode": "SolarUSA, Something Street, Somewhere State, Whatev Zip",
        "Owner": "Johen Wilheim",
    },
    {
        "ID": "0003",
        "Organization": "Zapotec",
        "Matchcode": "Zapotec, Something Street, Somewhere State, Whatev Zip",
        "Owner": "Simeon Yurrigan",
    },
]

# Take the first dict of each run of equal Matchcode values.
print([next(g) for k, g in itertools.groupby(data, lambda x: x['Matchcode'])])
```
which gives the result
```python
[{'Owner': 'Timothy Black',
  'Organization': 'SolarUSA',
  'ID': '0001',
  'Matchcode': 'SolarUSA, Something Street, Somewhere State, Whatev Zip'},
 {'Owner': 'Simeon Yurrigan',
  'Organization': 'Zapotec',
  'ID': '0003',
  'Matchcode': 'Zapotec, Something Street, Somewhere State, Whatev Zip'}]
```
I believe this is what you are looking for.
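One caveat worth illustrating: `itertools.groupby` only merges consecutive items with equal keys, so this approach relies on the list already being sorted by `Matchcode`. A small sketch with shortened, made-up records shows the difference:

```python
import itertools
from operator import itemgetter

# Shortened, made-up records; the duplicates are deliberately not adjacent.
rows = [
    {"ID": "0001", "Matchcode": "SolarUSA"},
    {"ID": "0003", "Matchcode": "Zapotec"},
    {"ID": "0002", "Matchcode": "SolarUSA"},
]

by_matchcode = itemgetter("Matchcode")

# Without sorting, the two SolarUSA rows end up in separate groups.
unsorted_result = [next(group) for _, group in itertools.groupby(rows, by_matchcode)]

# sorted() is stable, so the first row of each group is the one that
# came first in the original list.
sorted_rows = sorted(rows, key=by_matchcode)
sorted_result = [next(group) for _, group in itertools.groupby(sorted_rows, by_matchcode)]

print([r["ID"] for r in unsorted_result])
print([r["ID"] for r in sorted_result])
```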
Edit: I prefer the `unique_justseen` solution. It is shorter and more descriptive.
This answer is not correct for the now-disambiguated question.
Do all the dicts have the same keys? If so, write a function like
```python
the_keys = ["foo", "bar"]

def as_values(d):
    return tuple(d[k] for k in the_keys)

unique_values = unique_everseen(list_of_dicts, key=as_values)
```
where `unique_everseen` is defined in http://docs.python.org/2/library/itertools.html. If the dicts are not that consistent, use a more general key, such as the `FrozenDict` I posted at https://stackoverflow.com/a/2704866/192839.
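Note that `unique_everseen` cannot be imported from `itertools` itself; it is one of the recipes listed in that documentation (the third-party `more_itertools` package also provides it). A simplified sketch of the idea, not the exact recipe text, with made-up sample data:

```python
def unique_everseen(iterable, key=None):
    """Yield elements in order, skipping any whose key has been seen before."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


the_keys = ["foo", "bar"]

def as_values(d):
    # A tuple of the values is hashable as long as the values themselves are.
    return tuple(d[k] for k in the_keys)

# Hypothetical sample data.
list_of_dicts = [
    {"foo": 1, "bar": 2},
    {"foo": 1, "bar": 2},
    {"foo": 1, "bar": 3},
]
unique_values = list(unique_everseen(list_of_dicts, key=as_values))
```

Because it is a generator, wrap the call in `list(...)` when a list is needed.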
Now that we can see that two dictionaries are duplicates if one particular key matches, the problem is very simple. Just iterate over the dictionaries, keep track of the keys you have seen, and finally build a new list out of the unique ones.
```python
import collections

def get_unique_items(list_of_dicts, key="Matchcode"):
    # Count how many times each key occurs.
    key_count = collections.defaultdict(lambda: 0)
    for d in list_of_dicts:
        key_count[d[key]] += 1

    # Now return a list of only those dicts with a unique key.
    return [d for d in list_of_dicts if key_count[d[key]] == 1]
```
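Note that I am using a `defaultdict` here, so every key starts with a count of 0 and no explicit existence check is needed.
On the other hand, if you just want to keep the first dictionary you see for each given key, regardless of whether duplicates appear later, a different approach is needed.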
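One possible way to keep only the first dictionary per key, sketched under the assumption of Python 3.7 or later, where a plain `dict` preserves insertion order. The function name `first_per_key` and the shortened records are made up for illustration.

```python
def first_per_key(list_of_dicts, key="Matchcode"):
    """Keep the first dict seen for each value of `key`, in original order."""
    first_seen = {}
    for d in list_of_dicts:
        # setdefault stores d only if this key value has not been stored yet.
        first_seen.setdefault(d[key], d)
    return list(first_seen.values())


rows = [
    {"ID": "0001", "Matchcode": "SolarUSA"},
    {"ID": "0002", "Matchcode": "SolarUSA"},
    {"ID": "0003", "Matchcode": "Zapotec"},
]
kept = first_per_key(rows)
```

Unlike `get_unique_items`, which drops every dictionary whose key occurs more than once, this keeps one representative per key.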
So I have a neat list of dictionaries sorted by Matchcode. Now I just need to iterate over the list, accessing the list[dictionary][key] and deleting duplicates when two key values match.
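I'm still not entirely sure what this means. It sounds like you are saying that the dicts will always be sorted by the same key you want to uniquify by. If so, you can just use `unique_justseen` from the itertools recipes, with the same key function you used for sorting.
Using the example from the edited question:
```python
>>> list(unique_justseen(list_of_dicts, key=itemgetter('Matchcode')))
[{'ID': '0001',
  'Matchcode': 'SolarUSA, Something Street, Somewhere State, Whatev Zip',
  'Organization': 'SolarUSA',
  'Owner': 'Timothy Black'},
 {'ID': '0003',
  'Matchcode': 'Zapotec, Something Street, Somewhere State, Whatev Zip',
  'Organization': 'Zapotec',
  'Owner': 'Simeon Yurrigan'}]
```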
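`unique_justseen` is an itertools recipe rather than a function in the module, so it has to be defined or imported from elsewhere (for example, the third-party `more_itertools` package). A minimal sketch built on `itertools.groupby`, using shortened made-up records:

```python
from itertools import groupby
from operator import itemgetter


def unique_justseen(iterable, key=None):
    """Yield the first element of each run of consecutive equal keys."""
    for _, group in groupby(iterable, key):
        yield next(group)


# Shortened records, already sorted by Matchcode.
rows = [
    {"ID": "0001", "Matchcode": "SolarUSA"},
    {"ID": "0002", "Matchcode": "SolarUSA"},
    {"ID": "0003", "Matchcode": "Zapotec"},
]
result = list(unique_justseen(rows, key=itemgetter("Matchcode")))
```

It only needs to remember the previous key, which is why it requires sorted (or at least grouped) input.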
If they are sorted by a different key from the one we are uniquifying on, then the fact that they are sorted is not relevant at all, and `unique_justseen` will not work:
```python
>>> list_of_dicts.sort(key=itemgetter('Owner'))
>>> list(unique_justseen(list_of_dicts, key=itemgetter('Matchcode')))
[{'ID': '0002',
  'Matchcode': 'SolarUSA, Something Street, Somewhere State, Whatev Zip',
  'Organization': 'SolarUSA',
  'Owner': 'Johen Wilheim'},
 {'ID': '0003',
  'Matchcode': 'Zapotec, Something Street, Somewhere State, Whatev Zip',
  'Organization': 'Zapotec',
  'Owner': 'Simeon Yurrigan'},
 {'ID': '0001',
  'Matchcode': 'SolarUSA, Something Street, Somewhere State, Whatev Zip',
  'Organization': 'SolarUSA',
  'Owner': 'Timothy Black'}]
```
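But then you just need to use the `unique_everseen` recipe instead:
```python
>>> list_of_dicts.sort(key=itemgetter('Owner'))
>>> list(unique_everseen(list_of_dicts, key=itemgetter('Matchcode')))
[{'ID': '0002',
  'Matchcode': 'SolarUSA, Something Street, Somewhere State, Whatev Zip',
  'Organization': 'SolarUSA',
  'Owner': 'Johen Wilheim'},
 {'ID': '0003',
  'Matchcode': 'Zapotec, Something Street, Somewhere State, Whatev Zip',
  'Organization': 'Zapotec',
  'Owner': 'Simeon Yurrigan'}]
```
(Of course, this time we get 0002 rather than 0001, because after sorting by `Owner` it is now the first entry for its `Matchcode` rather than the second.)
The fact that dictionaries are not hashable is not relevant here, because the recipes only store the result of the key function in their set. As long as the values stored under the key `key` are hashable, everything is fine.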
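A short illustration of that point, using a made-up record: hashing the dict itself fails, while the string stored under `Matchcode` can go into a set without trouble.

```python
record = {"ID": "0001", "Matchcode": "SolarUSA"}

# Dicts are mutable and define no hash, so hash() raises TypeError.
try:
    hash(record)
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# The value under "Matchcode" is a str, which is hashable.
seen = set()
seen.add(record["Matchcode"])

print(dict_is_hashable)
print(record["Matchcode"] in seen)
```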
Basically, you need something like `no_dup(extractor, lst)`, where `no_dup` might look like this:
```python
def no_dup(extractor, lst):
    """Keep only the first element encountered for each extracted value."""
    known = set()  # extracted values seen so far; they must be hashable
    res = []
    for item in lst:
        value = extractor(item)
        if value in known:
            continue
        known.add(value)
        res.append(item)
    return res
```
```python
seen_values = set()
without_duplicates = []
for d in list_of_dicts:
    value = d[key]
    if value not in seen_values:
        without_duplicates.append(d)
        seen_values.add(value)
```
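I'm not entirely clear on what exactly you are trying to achieve, but:
Removing all duplicate dictionary entries
As long as you don't mind merging all the dictionaries:
```python
import itertools
dict(itertools.chain(*map(lambda x: x.items(), list_of_dictionaries)))
```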
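To make the trade-off concrete: merging this way produces a single dictionary rather than a de-duplicated list, and where several dictionaries share a key, the value from the last one wins. A sketch with made-up data, using `itertools.chain.from_iterable` as an equivalent spelling of the call above:

```python
import itertools

list_of_dictionaries = [
    {"a": 1, "b": 2},
    {"b": 3, "c": 4},
    {"a": 1},
]

# chain.from_iterable flattens the (key, value) pairs of every dict;
# dict() then keeps the last value seen for each key.
merged = dict(itertools.chain.from_iterable(d.items() for d in list_of_dictionaries))
print(merged)
```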