numpy.unique with order preserved
    ['b','b','b','a','a','c','c']

numpy.unique gives

    ['a','b','c']

How can I preserve the original order?

    ['b','a','c']
Great answers. Bonus question: why do none of these methods work on this dataset? http://www.uploadmb.com/dw.php?id=1364341573 Here is the question: "numpy sort weird behavior".
    import numpy as np

    a = np.array(['b','a','b','b','d','a','a','c','c'])
    # idx holds the position of the first occurrence of each unique value
    _, idx = np.unique(a, return_index=True)
    print(a[np.sort(idx)])

Output:

    ['b' 'a' 'd' 'c']
For large arrays, pandas.unique is faster, O(N):

    import numpy as np
    import pandas as pd

    a = np.random.randint(0, 1000, 10000)
    %timeit np.unique(a)
    %timeit pd.unique(a)

    1000 loops, best of 3: 644 us per loop
    10000 loops, best of 3: 144 us per loop
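Speed aside, pd.unique is relevant to the question because it returns values in order of first appearance instead of sorting them. A minimal sketch, assuming pandas is installed; the sample array is made up for illustration and is not from the original post:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the original post).
a = np.array([3, 1, 3, 2, 1])

in_order = pd.unique(a)   # hash-based, keeps order of first appearance
sorted_u = np.unique(a)   # sort-based, returns the values sorted

print(in_order)
print(sorted_u)
```

Passing an ndarray (rather than a plain list) keeps this compatible with recent pandas versions.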
Use the return_index option of np.unique, then argsort the returned indices:

    >>> u, ind = np.unique(['b','b','b','a','a','c','c'], return_index=True)
    >>> u[np.argsort(ind)]
    array(['b', 'a', 'c'],
          dtype='|S1')

The same idea as a list comprehension:

    a = ['b','b','b','a','a','c','c']
    [a[i] for i in sorted(np.unique(a, return_index=True)[1])]
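To see why sorting the indices and argsorting them lead to the same result, it may help to look at the intermediate values. A short sketch (the variable names are only illustrative): with return_index=True, np.unique also returns, for each sorted unique value, the position of its first occurrence in the input.

```python
import numpy as np

a = np.array(['b', 'b', 'b', 'a', 'a', 'c', 'c'])

# u is sorted; ind[k] is the index in `a` where u[k] first appears.
u, ind = np.unique(a, return_index=True)

via_argsort = u[np.argsort(ind)]  # reorder the unique values by first appearance
via_sort = a[np.sort(ind)]        # pick the first occurrences from `a`, in order

print(u, ind)
print(via_argsort, via_sort)
```

Both expressions select the same elements; one indexes the sorted unique values, the other indexes the original array.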
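If you want to remove duplicates from an iterable that is already sorted, you can use itertools.groupby:

    >>> from itertools import groupby
    >>> a = ['b','b','b','a','a','c','c']
    >>> [x[0] for x in groupby(a)]
    ['b', 'a', 'c']

This behaves more like the Unix 'uniq' command, because it assumes the list is already sorted: only consecutive repeats are collapsed. When you try it on an unsorted list, you get something like this:

    >>> b = ['b','b','b','a','a','c','c','a','a']
    >>> [x[0] for x in groupby(b)]
    ['b', 'a', 'c', 'a']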
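If you want to remove repeated entries (like the Unix tool uniq), this is one solution:

    import numpy as np

    def uniq(seq):
        """
        Like Unix tool uniq. Removes repeated entries.
        :param seq: numpy.array
        :return: seq
        """
        # Mark every position whose value differs from its predecessor.
        # The subtraction requires a numeric dtype.
        diffs = np.ones_like(seq)
        diffs[1:] = seq[1:] - seq[:-1]
        idx = diffs.nonzero()
        return seq[idx]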
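Because uniq subtracts neighbouring elements, it only works for numeric dtypes. A possible variant, sketched here under the assumption that the input is a 1-D array, compares neighbours with != instead, so it also handles string arrays like the one in the question. The name uniq_any is hypothetical:

```python
import numpy as np

def uniq_any(seq):
    """Drop consecutive repeats from a 1-D array (hypothetical helper)."""
    seq = np.asarray(seq)
    if seq.size == 0:
        return seq
    keep = np.ones(seq.shape[0], dtype=bool)  # always keep the first element
    keep[1:] = seq[1:] != seq[:-1]            # keep positions where the value changes
    return seq[keep]

print(uniq_any(['b', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a']))
```

Like the Unix tool, this removes only adjacent repeats, so a value that reappears later in the input is kept again.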
Use an OrderedDict (faster than a list comprehension):

    from collections import OrderedDict

    a = ['b','a','b','a','a','c','c']
    list(OrderedDict.fromkeys(a))
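On Python 3.7 and later, a plain dict also preserves insertion order as part of the language, so the same idea should work without the import. A minimal sketch, assuming the elements are hashable:

```python
a = ['b', 'a', 'b', 'a', 'a', 'c', 'c']

# dict keys are unique and keep insertion order (guaranteed since Python 3.7).
unique_in_order = list(dict.fromkeys(a))

print(unique_in_order)  # ['b', 'a', 'c']
```

This returns a Python list; wrap it in np.array(...) if an array is needed.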