Comparing first element of the consecutive lists of tuples in Python

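I have a list of tuples, each containing two elements. Several of the tuples share the same first element. I want to compare the first elements of these tuples and collect the corresponding second elements together. Here is my list: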

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

From it I want to build a list that looks like this:

NewList=[(2,3,4,5),(6,7,8),(9,10)]

I am hoping for an efficient way to do this.


You can use an OrderedDict to group the elements by the first subelement of each tuple:


myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

from collections import OrderedDict

od = OrderedDict()

for a,b in myList:
    od.setdefault(a,[]).append(b)

print(list(od.values()))
[[2, 3, 4, 5], [6, 7, 8], [9, 10]]

If you really want tuples:

print(list(map(tuple,od.values())))
[(2, 3, 4, 5), (6, 7, 8), (9, 10)]
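The OrderedDict approach can be wrapped in a small function that returns tuples directly. This is only a sketch; the name `group_by_first` is hypothetical and not part of any library:

```python
from collections import OrderedDict

def group_by_first(pairs):
    """Hypothetical helper: group the second elements of (key, value)
    pairs by key, keeping keys in order of first appearance."""
    groups = OrderedDict()
    for key, value in pairs:
        # setdefault inserts an empty list the first time a key is seen
        groups.setdefault(key, []).append(value)
    # convert each list of values to a tuple, as the question asks
    return [tuple(values) for values in groups.values()]

myList = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (2, 8), (3, 9), (3, 10)]
print(group_by_first(myList))  # [(2, 3, 4, 5), (6, 7, 8), (9, 10)]
```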

If you don't care about the order in which the groups appear and just want the most efficient way to group, you can use collections.defaultdict:


from collections import defaultdict

od = defaultdict(list)

for a,b in myList:
    od[a].append(b)

print(list(od.values()))
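Since Python 3.7, regular dicts (and therefore defaultdict, which is a dict subclass) preserve insertion order, so on a modern interpreter the defaultdict version also returns groups in order of each key's first appearance. A sketch assuming Python 3.7 or later; `group_by_first_dd` is a hypothetical name:

```python
from collections import defaultdict

def group_by_first_dd(pairs):
    """Hypothetical helper: group second elements by first element.
    Relies on dicts preserving insertion order (Python 3.7+)."""
    groups = defaultdict(list)  # a missing key starts as an empty list
    for key, value in pairs:
        groups[key].append(value)
    return [tuple(values) for values in groups.values()]

# Unsorted input: groups still come out in order of first occurrence
data = [(2, 6), (1, 2), (2, 7), (3, 9), (1, 3)]
print(group_by_first_dd(data))  # [(6, 7), (2, 3), (9,)]
```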

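Finally, if the data is ordered as in the input example (i.e. sorted), you can simply use itertools.groupby to group by the first subelement of each tuple and extract the second elements from each group: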


from itertools import groupby
from operator import itemgetter
print([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])

Output:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

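Again, groupby only works if the data is at least sorted by the first element: it merges only consecutive items with equal keys, so tuples sharing a first element must be adjacent.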
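A short sketch of what goes wrong on unsorted input, and the usual fix of sorting by the same key first (the small `unsorted` list is made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

unsorted = [(1, 2), (2, 6), (1, 3)]

# groupby only merges *consecutive* items with the same key,
# so key 1 shows up as two separate groups here
naive = [tuple(t[1] for t in g) for _, g in groupby(unsorted, key=itemgetter(0))]
print(naive)  # [(2,), (6,), (3,)]

# Sorting by the same key first makes equal keys adjacent.
# sorted() is stable, so values keep their original relative order.
fixed = [tuple(t[1] for t in g)
         for _, g in groupby(sorted(unsorted, key=itemgetter(0)), key=itemgetter(0))]
print(fixed)  # [(2, 3), (6,)]
```

The extra sort costs O(n log n), which is one reason the dict-based approaches are preferable when the input is not already sorted.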

Some timings on a reasonably sized list:


In [33]: myList = [(randint(1,10000),randint(1,10000)) for _ in range(100000)]

In [34]: myList.sort()

In [35]: timeit ([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])
10 loops, best of 3: 44.5 ms per loop

In [36]: %%timeit od = defaultdict(list)
for a,b in myList:
    od[a].append(b)
....:
10 loops, best of 3: 33.8 ms per loop

In [37]: %%timeit
dictionary = OrderedDict()
for x, y in myList:
    if x not in dictionary:
        dictionary[x] = [] # new empty list
    dictionary[x].append(y)
....:
10 loops, best of 3: 63.3 ms per loop

In [38]: %%timeit
od = OrderedDict()
for a,b in myList:
    od.setdefault(a,[]).append(b)
....:
10 loops, best of 3: 80.3 ms per loop

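If order matters and the data is sorted, use groupby. If you would have to convert every list in the defaultdict to a tuple anyway, groupby comes even closer to the defaultdict approach, because the defaultdict timing above does not include that conversion.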

If the data is not sorted, or you don't care about order at all, you won't find a faster way to group than the defaultdict approach.
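To reproduce a comparison like the one above outside IPython, the standard timeit module can be used. This is a sketch, not the exact benchmark from the session: all three hypothetical functions below convert their groups to tuples so they return identical results, which the assert checks before timing (the equality relies on dict insertion order, Python 3.7+). Absolute numbers will depend on your machine and Python version, and the run takes a few seconds.

```python
import timeit
from collections import OrderedDict, defaultdict
from itertools import groupby
from operator import itemgetter
from random import randint

def with_groupby(pairs):
    # requires pairs sorted (at least) by first element
    return [tuple(t[1] for t in g) for _, g in groupby(pairs, key=itemgetter(0))]

def with_defaultdict(pairs):
    groups = defaultdict(list)
    for a, b in pairs:
        groups[a].append(b)
    return [tuple(v) for v in groups.values()]

def with_ordereddict(pairs):
    groups = OrderedDict()
    for a, b in pairs:
        groups.setdefault(a, []).append(b)
    return [tuple(v) for v in groups.values()]

# Same shape of data as the IPython session: 100,000 random pairs, sorted
data = sorted((randint(1, 10000), randint(1, 10000)) for _ in range(100000))

# On sorted input all three approaches should produce the same result
assert with_groupby(data) == with_defaultdict(data) == with_ordereddict(data)

for func in (with_groupby, with_defaultdict, with_ordereddict):
    # best of 3 repeats, 5 calls each, reported per call
    best = min(timeit.repeat(lambda: func(data), number=5, repeat=3)) / 5
    print(f"{func.__name__}: {best * 1000:.1f} ms per call")
```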


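This looks like a job for a dictionary (if you don't know dictionaries yet, look them up on python.org). This is a very verbose example, not what I would write in everyday code, but it's better to be verbose than unclear: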


import collections

dictionary = collections.OrderedDict()
for x, y in myList:
    if x not in dictionary:
        dictionary[x] = []  # new empty list
    # append y to that list
    dictionary[x].append(y)


With that in mind, the most efficient way is probably this one-liner (assuming dictionary is an empty dict, i.e. dictionary = {} or dictionary = OrderedDict(), as in Padraic's excellent answer):

for x,y in myList: dictionary.setdefault(x,[]).append(y)

I'm not saying it's the most readable approach, but I like it :)

EDIT: Ha! Benchmarking proved me wrong; the setdefault approach is slower than the if not dictionary.has_key(x): dictionary[x]=[] approach:


>>> timeit.timeit("""for x,y in myList:
    if not dictionary.has_key(x):
        dictionary[x]=[]
    dictionary[x].append(y)""", """from collections import OrderedDict
myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]
dictionary=OrderedDict()""")
2.2573769092559814
>>> timeit.timeit("for x,y in myList: dictionary.setdefault(x,[]).append(y)", """from collections import OrderedDict
myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]
dictionary=OrderedDict()""")
3.3534231185913086

Of course, Padraic is still right: his defaultdict approach takes 0.82 seconds on my machine, so it is about three times faster.

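Also, as Padraic pointed out, dict.has_key(x) is deprecated (it was removed entirely in Python 3) and x in dict should be used instead; however, I could not measure a speed difference.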
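One plausible contributor to the setdefault slowdown (an explanation offered here, not something the benchmark above isolates): the default is an ordinary function argument, so a new empty list is built on every call, even when the key already exists. This sketch counts those allocations with a hypothetical `make_list` helper:

```python
calls = 0

def make_list():
    """Hypothetical factory that counts how often a default list is built."""
    global calls
    calls += 1
    return []

pairs = [(1, 2), (1, 3), (1, 4), (2, 5)]

d = {}
for x, y in pairs:
    # make_list() is evaluated before setdefault runs, on every iteration
    d.setdefault(x, make_list()).append(y)
print(calls)  # 4: one list per pair, although only 2 keys were new

calls = 0
d2 = {}
for x, y in pairs:
    if x not in d2:  # membership test first, so a list is built only when needed
        d2[x] = make_list()
    d2[x].append(y)
print(calls)  # 2
```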


The following should work:


import itertools

myList = [(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]
print([tuple(x[1] for x in g) for k, g in itertools.groupby(myList, key=lambda x: x[0])])

Output:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]
