Group list by values
Suppose I have a list like this:

    list = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]

What is the most elegant way to group it in Python to get this output:

    list = [["A","C"], ["B"], ["D","E"]]

So the values are grouped by the second value, but the order is preserved.

    values = set(map(lambda x: x[1], list))
    newlist = [[y[0] for y in list if y[1] == x] for x in values]

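Because a set carries no ordering guarantee, the groups produced by the snippet above may come out in any order. Here is a minimal sketch of an order-preserving variant, assuming Python 3.7+ (where dict keeps insertion order). The names pairs, keys, and grouped are illustrative and not from the original answer.

```python
pairs = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]

# dict.fromkeys keeps the first occurrence of each key, in input order
# (insertion order is guaranteed for dict from Python 3.7 on).
keys = list(dict.fromkeys(key for _, key in pairs))

# Same nested comprehension as above: still one pass over pairs per key.
grouped = [[item for item, key in pairs if key == k] for k in keys]
print(grouped)
```

This keeps the quadratic behaviour of the original one-liner; it only makes the group order deterministic.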

    from operator import itemgetter
    from itertools import groupby

    lki = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
    lki.sort(key=itemgetter(1))

    glo = [[x for x,y in g] for k,g in groupby(lki,key=itemgetter(1))]

    print(glo)

Edit

Another solution that needs no imports, is more readable, and preserves order; it is 22% longer than the previous one:

    oldlist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]

    newlist, dicpos = [],{}
    for val,k in oldlist:
        if k in dicpos:
            # append, not extend: extend would split a multi-character string
            newlist[dicpos[k]].append(val)
        else:
            newlist.append([val])
            dicpos[k] = len(dicpos)

    print(newlist)

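Howard's answer is concise and elegant, but it is also O(n^2) in the worst case. For large lists with many distinct grouping keys, you'll want to sort the list first and then use itertools.groupby: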

    >>> from itertools import groupby
    >>> from operator import itemgetter
    >>> seq = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
    >>> seq.sort(key = itemgetter(1))
    >>> groups = groupby(seq, itemgetter(1))
    >>> [[item[0] for item in data] for (key, data) in groups]
    [['A', 'C'], ['B'], ['D', 'E']]

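Note that groupby only merges adjacent items, which is why the sort comes first, and that sorting orders the groups by key rather than by first appearance. A small sketch with made-up data (the list seq2 is hypothetical) that illustrates both effects:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical data: key 2 appears before key 0.
seq2 = [["A", 2], ["B", 0], ["C", 2], ["D", 0]]

# Without sorting, equal keys that are not adjacent end up in separate groups.
unsorted_groups = [[item for item, _ in g] for _, g in groupby(seq2, key=itemgetter(1))]
print(unsorted_groups)  # [['A'], ['B'], ['C'], ['D']]

# sorted() is stable, so items keep their relative order inside each group,
# but the groups themselves now follow key order (0 before 2).
sorted_groups = [
    [item for item, _ in g]
    for _, g in groupby(sorted(seq2, key=itemgetter(1)), key=itemgetter(1))
]
print(sorted_groups)  # [['B', 'D'], ['A', 'C']]
```

If the groups must appear in order of first occurrence, a dictionary-based approach is probably the better fit.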
Edit:

I changed this after seeing Eyequem's answer:
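
    >>> import collections
    >>> L1 = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]  # assumed: the list from the question
    >>> D1 = collections.defaultdict(list)
    >>> for element in L1:
    ...     D1[element[1]].append(element[0])
    ...
    >>> L2 = list(D1.values())  # groups follow first key appearance on Python 3.7+
    >>> print(L2)
    [['A', 'C'], ['B'], ['D', 'E']]
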
I don't know about elegant, but it certainly works:

    oldlist = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]
    # change into: list = [["A","C"], ["B"], ["D","E"]]

    order=[]
    dic=dict()
    for value,key in oldlist:
      try:
        dic[key].append(value)
      except KeyError:
        order.append(key)
        dic[key]=[value]
    newlist=list(map(dic.get, order))  # list() so the result is a real list on Python 3

    print(newlist)

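This preserves the order of each key's first occurrence, as well as the order of the items within each key. It requires the keys to be hashable, but otherwise assigns no meaning to them.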
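The same idea can be written more compactly on Python 3.7+, where a plain dict remembers insertion order, so the separate order list is not needed. This is a sketch under that assumption; the function name group_by_second is made up for illustration.

```python
def group_by_second(pairs):
    """Group the first elements of (value, key) pairs by key,
    keeping keys in order of first appearance."""
    groups = {}
    for value, key in pairs:
        # setdefault creates the list the first time a key is seen.
        groups.setdefault(key, []).append(value)
    return list(groups.values())


print(group_by_second([["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]))
```

As with the version above, any hashable key works, not just integers.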
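
    # Assumes the keys are non-negative integers; list is the question's list.
    max_key = max(key for (item, key) in list)  # renamed from len to avoid shadowing the builtin
    newlist = [[] for i in range(max_key + 1)]
    for item, key in list:
        newlist[key].append(item)
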
You can do it in a single list comprehension, which is perhaps more elegant, but it is O(n**2):

    [[item for (item,key) in list if key==i] for i in range(max(key for (item,key) in list)+1)]

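Both versions assume the keys are small non-negative integers, and any integer in the range that never occurs as a key produces an empty group. A quick illustration with hypothetical data (pairs is not from the original post):

```python
# Hypothetical data: key 1 never occurs.
pairs = [["A", 0], ["B", 2], ["C", 0]]

max_key = max(key for _, key in pairs)
grouped = [[item for item, key in pairs if key == i] for i in range(max_key + 1)]
print(grouped)  # [['A', 'C'], [], ['B']]

# Dropping the empty lists restores the compact form.
compact = [g for g in grouped if g]
print(compact)  # [['A', 'C'], ['B']]
```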