What is itertools.groupby() used for?
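While reading the Python documentation, I came across itertools.groupby().
There doesn't seem to be much information about it here or in the documentation, so I decided to post my observations for comment.
Thanks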
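First, you can read the documentation for itertools.groupby().
I'll put what I consider the most important point first. I hope the reason becomes clear after the examples.
Always sort the items with the same key you use for grouping, to avoid unexpected results.
The return value is an iterable that resembles a dictionary, in that it yields (key, group) pairs.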
Example 1

    from itertools import groupby

    # Note that the tuple counts as one item in this list. I did not
    # specify any key, so each item in the list is a key on its own.
    c = groupby(['goat', 'dog', 'cow', 1, 1, 2, 3, 11, 10, ('persons', 'man', 'woman')])
    dic = {}
    for k, v in c:
        dic[k] = list(v)
    dic
Result

    {1: [1, 1],
     'goat': ['goat'],
     3: [3],
     'cow': ['cow'],
     ('persons', 'man', 'woman'): [('persons', 'man', 'woman')],
     10: [10],
     11: [11],
     2: [2],
     'dog': ['dog']}
Example 2

    # Notice that mulato, cow and cat don't show up. Only the last group with
    # a given key survives in the dict, replacing the earlier result.
    # The last group for 'c' wipes out the earlier 'c' group with two items.
    list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
                   'wombat', 'mongoose', 'malloo', 'camel']
    c = groupby(list_things, key=lambda x: x[0])
    dic = {}
    for k, v in c:
        dic[k] = list(v)
    dic
Result

    {'c': ['camel'],
     'd': ['dog', 'donkey'],
     'g': ['goat'],
     'm': ['mongoose', 'malloo'],
     'persons': [('persons', 'man', 'woman')],
     'w': ['wombat']}
Now for the sorted version

    # But observe the sorted version, where I sort the data first
    # on the same key I use for grouping.
    list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
                   'wombat', 'mongoose', 'malloo', 'camel']
    sorted_list = sorted(list_things, key=lambda x: x[0])
    print(sorted_list)
    print()
    c = groupby(sorted_list, key=lambda x: x[0])
    dic = {}
    for k, v in c:
        dic[k] = list(v)
    dic
Result

    ['cow', 'cat', 'camel', 'dog', 'donkey', 'goat', 'mulato', 'mongoose', 'malloo', ('persons', 'man', 'woman'), 'wombat']

    {'c': ['cow', 'cat', 'camel'],
     'd': ['dog', 'donkey'],
     'g': ['goat'],
     'm': ['mulato', 'mongoose', 'malloo'],
     'persons': [('persons', 'man', 'woman')],
     'w': ['wombat']}
Example 3

    things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "harley"),
              ("vehicle", "speed boat"), ("vehicle", "school bus")]
    dic = {}
    f = lambda x: x[0]
    for key, group in groupby(sorted(things, key=f), f):
        dic[key] = list(group)
    dic
Result

    {'animal': [('animal', 'bear'), ('animal', 'duck')],
     'plant': [('plant', 'cactus')],
     'vehicle': [('vehicle', 'harley'),
                 ('vehicle', 'speed boat'),
                 ('vehicle', 'school bus')]}
Now for the sorted version. I changed the tuples to lists. The result is the same either way.

    things = [["animal", "bear"], ["animal", "duck"], ["vehicle", "harley"], ["plant", "cactus"],
              ["vehicle", "speed boat"], ["vehicle", "school bus"]]
    dic = {}
    f = lambda x: x[0]
    for key, group in groupby(sorted(things, key=f), f):
        dic[key] = list(group)
    dic
Result

    {'animal': [['animal', 'bear'], ['animal', 'duck']],
     'plant': [['plant', 'cactus']],
     'vehicle': [['vehicle', 'harley'],
                 ['vehicle', 'speed boat'],
                 ['vehicle', 'school bus']]}
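The sort-then-group pattern used in the examples above can be wrapped in a small helper. This is only a sketch: the name group_into_dict is hypothetical (it is not part of itertools), and it assumes the keys are mutually comparable so that sorted() works.

```python
from itertools import groupby

def group_into_dict(items, key):
    """Sort by `key`, then group, so items with equal keys always end up together."""
    grouped = {}
    for k, group in groupby(sorted(items, key=key), key=key):
        # Convert the group to a list right away; the group iterator is only
        # valid until groupby advances to the next key.
        grouped[k] = list(group)
    return grouped

animals = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat',
           'wombat', 'mongoose', 'malloo', 'camel']
by_initial = group_into_dict(animals, key=lambda s: s[0])
print(by_initial['c'])  # ['cow', 'cat', 'camel']
print(by_initial['m'])  # ['mulato', 'mongoose', 'malloo']
```

Because sorted() is stable, items within each group keep their original relative order.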
As always, the function's documentation should be checked first. However, keep in mind that groupby only groups items when their key result is the same for consecutive items:

    from itertools import groupby

    for key, group in groupby([1,1,1,1,5,1,1,1,1,4]):
        print(key, list(group))
    # 1 [1, 1, 1, 1]
    # 5 [5]
    # 1 [1, 1, 1, 1]
    # 4 [4]
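If you want to do a full groupby, you can use sorted beforehand. groupby yields two items at a time, and the second item is an iterator (so you need to iterate over the second item!). That is why I explicitly converted the groups to a list in the previous example. If you advance the groupby iterator, the previously yielded group is discarded:

    it = groupby([1,1,1,1,5,1,1,1,1,4])
    key1, group1 = next(it)
    key2, group2 = next(it)
    print(key1, list(group1))
    # 1 []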
Even though group1 was not empty!
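One way to avoid losing a group is to convert it to a list while it is still the current group, before the groupby iterator moves on. A minimal sketch of that idea, using the same data as above (the variable name runs is just for illustration):

```python
from itertools import groupby

data = [1, 1, 1, 1, 5, 1, 1, 1, 1, 4]

# Each group is turned into a list immediately, before groupby advances,
# so its contents survive after the loop moves on to the next key.
runs = [(key, list(group)) for key, group in groupby(data)]

print(runs[0])  # (1, [1, 1, 1, 1])
print(runs[1])  # (5, [5])
```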
As mentioned earlier, you can use
- 埃多克斯1〔14〕
- 埃多克斯1〔15〕
- possibly more.
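When the input is not sorted, grouping can also be done in a single pass by accumulating items into a dictionary. The sketch below assumes collections.defaultdict as the container, which is my choice for illustration; the function name group_unsorted is hypothetical.

```python
from collections import defaultdict

def group_unsorted(items, key):
    """Group items by key in one pass; the input does not need to be sorted."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

things = [("animal", "bear"), ("vehicle", "harley"),
          ("animal", "duck"), ("vehicle", "school bus")]
result = group_unsorted(things, key=lambda pair: pair[0])
print(result['animal'])   # [('animal', 'bear'), ('animal', 'duck')]
print(result['vehicle'])  # [('vehicle', 'harley'), ('vehicle', 'school bus')]
```

Unlike groupby, this collects every item with the same key regardless of where it appears, but it holds all groups in memory at once.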
That said, groupby is great for checking local properties, that is, properties of runs of consecutive items:

    def all_equal(iterable):
        "Returns True if all the elements are equal to each other"
        g = groupby(iterable)
        return next(g, True) and not next(g, False)
And also:

    from operator import itemgetter

    def unique_justseen(iterable, key=None):
        "List unique elements, preserving order. Remember only the element just seen."
        # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
        # unique_justseen('ABBCcAD', str.lower) --> A B C A D
        return map(next, map(itemgetter(1), groupby(iterable, key)))
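Another local property that fits groupby's consecutive-run behaviour is run-length counting. The following is a sketch rather than an official recipe; the name run_lengths is hypothetical.

```python
from itertools import groupby

def run_lengths(iterable):
    """Yield (value, count) for each run of equal consecutive items."""
    for value, group in groupby(iterable):
        # Count the items in the run without building a list.
        yield value, sum(1 for _ in group)

encoded = list(run_lengths('AAAABBBCCDAABBB'))
print(encoded)
# [('A', 4), ('B', 3), ('C', 2), ('D', 1), ('A', 2), ('B', 3)]
```

Here the repeated 'A' and 'B' runs are reported separately, which is exactly the behaviour that makes sorting necessary when a full grouping is wanted.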