Python: One-liner to remove duplicates, keep ordering of list

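I have the following list:

    ['Herb', 'Alec', 'Herb', 'Don']

I want to remove the duplicates while preserving the order, so the result would be:

    ['Herb', 'Alec', 'Don']

Here is how I would do it:

    l_new = []
    for item in l_old:
        if item not in l_new:
            l_new.append(item)

Is there a way to do this in a single line?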

You could use a set to remove the duplicates and then restore the ordering. It is just as slow as your original, yay :-)

    >>> sorted(set(l_old), key=l_old.index)
    ['Herb', 'Alec', 'Don']

Using pandas, create a Series from the list, drop the duplicates, and then convert it back to a list.

    >>> import pandas as pd
    >>> pd.Series(['Herb', 'Alec', 'Herb', 'Don']).drop_duplicates().tolist()
    ['Herb', 'Alec', 'Don']

Timings

@stefanbochmann's solution is clearly the winner for lists with many duplicates.

    from collections import OrderedDict

    my_list = ['Herb', 'Alec', 'Don'] * 10000

    %timeit pd.Series(my_list).drop_duplicates().tolist()
    # 100 loops, best of 3: 3.11 ms per loop

    %timeit list(OrderedDict().fromkeys(my_list))
    # 100 loops, best of 3: 16.1 ms per loop

    %timeit sorted(set(my_list), key=my_list.index)
    # 1000 loops, best of 3: 396 μs per loop

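For larger lists with no duplicates (e.g. a simple range of numbers), the pandas solution is very fast.

    # list(...) keeps this a real list on Python 3 as well, so that
    # my_list.index does a linear scan as in the timings below
    my_list = list(range(10000))

    %timeit pd.Series(my_list).drop_duplicates().tolist()
    # 100 loops, best of 3: 3.16 ms per loop

    %timeit list(OrderedDict().fromkeys(my_list))
    # 100 loops, best of 3: 10.8 ms per loop

    %timeit sorted(set(my_list), key=my_list.index)
    # 1 loop, best of 3: 716 ms per loop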
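The numbers above come from IPython's %timeit. Outside IPython, a roughly comparable measurement can be sketched with the standard-library timeit module. The helper name time_call and the repeat counts are assumptions made for illustration, pandas is left out to keep the sketch dependency-free, and no timing results are claimed here.

```python
import timeit
from collections import OrderedDict


def time_call(func, number=1, repeat=3):
    """Best observed time per call, in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


# The same two input shapes as in the timings above.
many_dupes = ['Herb', 'Alec', 'Don'] * 10000
no_dupes = list(range(10000))

candidates = {
    'OrderedDict.fromkeys': lambda data: list(OrderedDict.fromkeys(data)),
    'sorted(set, key=index)': lambda data: sorted(set(data), key=data.index),
}

results = {}
for label, data in [('many duplicates', many_dupes), ('no duplicates', no_dupes)]:
    for name, func in candidates.items():
        # The lambda is called right away, inside this iteration,
        # so it sees the current func and data.
        seconds = time_call(lambda: func(data))
        results[(label, name)] = seconds
        print(f'{label:16} {name:24} {seconds * 1000:.3f} ms')
```

Taking the minimum over several repeats follows the "best of 3" convention used by %timeit in the output above.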

You could use an OrderedDict, but I suggest sticking with your for-loop.

    >>> from collections import OrderedDict
    >>> data = ['Herb', 'Alec', 'Herb', 'Don']
    >>> list(OrderedDict.fromkeys(data))
    ['Herb', 'Alec', 'Don']
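On Python 3.7 and later the built-in dict is guaranteed to preserve insertion order, so the same idea can be sketched without the import. This is an addition rather than part of the answer above, and it assumes the items are hashable.

```python
data = ['Herb', 'Alec', 'Herb', 'Don']

# dict.fromkeys keeps one key per distinct value, in first-seen order
# (insertion order is a language guarantee from Python 3.7 onward).
unique_data = list(dict.fromkeys(data))
print(unique_data)  # ['Herb', 'Alec', 'Don']
```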

Just to reiterate: I seriously suggest sticking with your for-loop approach, and using a set to keep track of the items already seen:

    >>> data = ['Herb', 'Alec', 'Herb', 'Don']
    >>> seen = set()
    >>> unique_data = []
    >>> for x in data:
    ...     if x not in seen:
    ...         unique_data.append(x)
    ...         seen.add(x)
    ...
    >>> unique_data
    ['Herb', 'Alec', 'Don']
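If the loop is needed in more than one place, it could be wrapped in a small generator so that the call site becomes a one-liner. The name unique_in_order is hypothetical, and the sketch assumes hashable items.

```python
def unique_in_order(items):
    """Yield each item the first time it appears (hypothetical helper)."""
    seen = set()
    for x in items:
        if x not in seen:
            seen.add(x)  # set membership tests are O(1) on average
            yield x


data = ['Herb', 'Alec', 'Herb', 'Don']
unique_data = list(unique_in_order(data))
print(unique_data)  # ['Herb', 'Alec', 'Don']
```

Because it is a generator, it also works on iterables that are not lists, such as the lines of a file.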

And if you just want to be wacky (seriously, don't do this):

    >>> [t[0] for t in sorted(dict(zip(reversed(data), range(len(data), -1, -1))).items(), key=lambda t: t[1])]
    ['Herb', 'Alec', 'Don']

If you really don't care about optimization and such, you can use the following:

    s = ['Herb', 'Alec', 'Herb', 'Don']
    [x[0] for x in zip(s, range(len(s))) if x[0] not in s[:x[1]]]

Note that in my opinion you really should use the for loop in your question or the answer by @juanpa.arrivillaga.

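You can try this:

    l = ['Herb', 'Alec', 'Herb', 'Don']
    data = [i[-1] for i in sorted([({a: i for i, a in enumerate(l)}[a], a) for a in set({a: i for i, a in enumerate(l)}.keys())], key=lambda x: x[0])]

Output:

    ['Alec', 'Herb', 'Don']

This algorithm simply removes the first instance of a duplicated value. In other words, it keeps the last occurrence of each item, so the result differs from the order requested in the question.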
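As the output shows, the one-liner above keeps the last occurrence of each value. If that behavior is what is wanted, a shorter way to get it might look like the sketch below. It is an alternative technique rather than the answer's own code, and it assumes Python 3.7+ (ordered dict) and hashable items.

```python
l = ['Herb', 'Alec', 'Herb', 'Don']

# Deduplicate the reversed list (which keeps first occurrences there),
# then reverse again so the survivors are the last occurrences in l.
keep_last = list(dict.fromkeys(reversed(l)))[::-1]
print(keep_last)  # ['Alec', 'Herb', 'Don']
```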

    l_new = []
    for item in l_old:
        if item not in l_new:
            l_new.append(item)

In one line..ish:

    l_new = []
    [l_new.append(item) for item in l_old if item not in l_new]

Which has the following behavior:

    a = [1, 1, 2, 2, 3, 3, 4, 5, 5]
    b = []
    # the comprehension is used only for its side effect on b
    [b.append(item) for item in a if item not in b]
    print(b)  # [1, 2, 3, 4, 5]
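The `item not in b` test scans the list each time, so the comprehension above does quadratic work in the worst case. A commonly seen variant tracks seen items in a set instead and builds the result directly. This is a sketch of that idiom, not part of the answer above, and it assumes hashable items.

```python
a = [1, 1, 2, 2, 3, 3, 4, 5, 5]

seen = set()
# seen.add(x) returns None (falsy), so the `or` only reaches it, and
# records x as a side effect, when x has not been seen before.
b = [x for x in a if not (x in seen or seen.add(x))]
print(b)  # [1, 2, 3, 4, 5]
```

It is still a comprehension that relies on a side effect, so the plain for-loop remains the more readable choice.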
