How can I remove duplicate tuples from a list based on index value of tuple while maintaining the order of tuple?
I want to remove tuples that have the same value at index 0, keeping only the first occurrence. I looked at other similar questions but did not find the specific answer I need. Can someone help me? Below is my attempt.

```python
from itertools import groupby
import random

Newlist = []
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5), (5,4,3), (0,4,1)]

Newlist = [random.choice(tuple(g)) for _, g in groupby(abc, key=lambda x: x[0])]
print(Newlist)
```
My expected output: `[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]`
A simple approach is to loop over the list and keep track of the elements you have already found:

```python
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5), (5,4,3), (0,4,1)]

found = set()   # first elements seen so far
NewList = []
for a in abc:
    if a[0] not in found:
        NewList.append(a)
        found.add(a[0])

print(NewList)
# [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```
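The same loop can be wrapped in a small reusable function. This is only a sketch: the name `dedupe_by_index` and its `index` parameter are made up for illustration and do not come from the answer above.

```python
def dedupe_by_index(tuples, index=0):
    """Keep the first tuple seen for each value at position `index`."""
    found = set()   # values already seen at `index`
    result = []
    for t in tuples:
        if t[index] not in found:
            result.append(t)
            found.add(t[index])
    return result


abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]
print(dedupe_by_index(abc))           # [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
print(dedupe_by_index(abc, index=1))  # [(1, 2, 3), (2, 3, 4), (1, 0, 3), (2, 4, 5)]
```

Returning a new list instead of printing makes it easier to reuse with a different index.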
The itertools recipes (Python 2: itertools recipes, but there is basically no difference in this case) include a recipe for this that is a bit more general than @pault's implementation. It also uses a `set`:

Python 2:

```python
from itertools import ifilterfalse as filterfalse
```

Python 3:

```python
from itertools import filterfalse
```

```python
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
```
Usage:

```python
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5), (5,4,3), (0,4,1)]

Newlist = list(unique_everseen(abc, key=lambda x: x[0]))
print(Newlist)
# [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```
Due to
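Apart from that, the same limitation I already mentioned in the comments applies: this only works if the first element of each tuple is actually hashable (numbers, as in the given example, are of course hashable).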
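To make the hashability limitation concrete, here is a rough sketch of what happens with unhashable first elements and one possible fallback. The helper `unique_by_key_slow` is a made-up name, not part of the itertools recipes; it trades the `set` for a `list`, so each membership test is linear and the whole thing is quadratic.

```python
def unique_by_key_slow(iterable, key):
    """Order-preserving dedupe that only needs == on keys, not hashing."""
    seen = []  # list membership uses ==, so unhashable keys are fine
    for element in iterable:
        k = key(element)
        if k not in seen:
            seen.append(k)
            yield element


data = [([1], "a"), ([2], "b"), ([1], "c")]  # first elements are lists

try:
    {item[0] for item in data}  # what a set-based approach has to do
except TypeError as exc:
    print("set-based approach fails:", exc)

result = list(unique_by_key_slow(data, key=lambda x: x[0]))
print(result)  # [([1], 'a'), ([2], 'b')]
```

For hashable keys the set-based recipe is still the better choice.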
@Patrickhaugh claims:
but the question is explicitly about maintaining the order of the
tuples. I don't think there's a solution using groupby
I never miss an opportunity to use `groupby`:

```python
from itertools import groupby, chain

abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]

# The key is True only the first time a given k[0] is seen:
# s.get() returns None before s.setdefault() stores True for that value.
Newlist = list((lambda s: chain.from_iterable(
    g for f, g in groupby(abc, lambda k: s.get(k[0]) != s.setdefault(k[0], True)) if f
))({}))

print(Newlist)
```

Output:

```
% python3 test.py
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
%
```
Using `OrderedDict`:

```python
from collections import OrderedDict

abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5), (5,4,3), (0,4,1)]

d = OrderedDict()
for t in abc:
    d.setdefault(t[0], t)   # keeps only the first tuple for each key

abc_unique = list(d.values())
print(abc_unique)
```

Output:

```
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```
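On Python 3.7 and later a plain `dict` also preserves insertion order (this is guaranteed by the language), so the same idea should work without the import. A minimal sketch, assuming Python 3.7+:

```python
abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]

d = {}
for t in abc:
    d.setdefault(t[0], t)  # only the first tuple per key is stored

abc_unique = list(d.values())
print(abc_unique)  # [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```

On older interpreters, `OrderedDict` remains the safe choice.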
Simple but not very efficient:

```python
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5), (5,4,3), (0,4,1)]

# Keep t only if no earlier tuple shares its first element (quadratic time).
abc_unique = [t for i, t in enumerate(abc) if not any(t[0] == p[0] for p in abc[:i])]
print(abc_unique)
```

Output:

```
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```
To use `groupby` correctly, the sequence must be sorted by the grouping key first:

```python
>>> [next(g) for k, g in groupby(sorted(abc, key=lambda x: x[0]), key=lambda x: x[0])]
[(0, 2, 0), (1, 2, 3), (2, 3, 4), (5, 4, 3)]
```
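The reason sorting matters is that `groupby` only merges consecutive items with equal keys. A small sketch to illustrate (the helper name `first` is just for this example):

```python
from itertools import groupby

abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]


def first(t):
    return t[0]


# Without sorting, no two neighbours share a key, so every tuple is its own group.
unsorted_keys = [k for k, _ in groupby(abc, key=first)]
# After sorting, equal keys are adjacent and each key appears exactly once.
sorted_keys = [k for k, _ in groupby(sorted(abc, key=first), key=first)]

print(unsorted_keys)  # [1, 2, 1, 0, 2, 5, 0]
print(sorted_keys)    # [0, 1, 2, 5]
```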
Or, if you need exactly the order from the example (i.e. preserving the original order):

```python
>>> [t[2:] for t in sorted([next(g) for k, g in groupby(sorted([(t[0], i) + t for i, t in enumerate(abc)]), lambda x: x[0])], key=lambda x: x[1])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```
The trick here is to add a field that records the original position, so that the original order can be restored after the groupby() step.
Edit: a bit shorter:

```python
>>> [t[1:] for t in sorted([next(g)[1:] for k, g in groupby(sorted([(t[0], i) + t for i, t in enumerate(abc)]), lambda x: x[0])])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```
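The one-liners above are dense, so here is the same decorate, sort, group, restore idea spelled out step by step. The intermediate names (`decorated`, `firsts`, `result`) are my own and only for illustration.

```python
from itertools import groupby

abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]

# 1. Decorate: prefix each tuple with (key, original position).
decorated = [(t[0], i) + t for i, t in enumerate(abc)]

# 2. Sort by key, then position, so equal keys are adjacent, earliest first.
decorated.sort()

# 3. Take the first (earliest) entry of each key group.
firsts = [next(group) for _, group in groupby(decorated, key=lambda x: x[0])]

# 4. Restore the original order via the stored position, then drop both helper fields.
firsts.sort(key=lambda x: x[1])
result = [t[2:] for t in firsts]

print(result)  # [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```

Because of the two sorts this runs in O(n log n), whereas the set-based loop is linear, so it is mainly of interest if you specifically want to use `groupby`.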