Most efficient dictionary counter
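I'm looking for a more efficient implementation of a generic "dictionary counter". Currently, this naive function runs faster than the collections.Counter implementation: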
```python
from collections import defaultdict

def uniqueCounter(x):
    # Map each distinct item to its number of occurrences
    dx = defaultdict(int)
    for i in x:
        dx[i] += 1
    return dx
```
Edit: some representative sample inputs:
```python
import numpy as np

# list(...) keeps c1 reusable on Python 3, where zip returns a one-shot iterator
c1 = list(zip(np.random.randint(0, 2, 200000), np.random.randint(0, 2, 200000)))
c2 = np.random.randint(0, 2, 200000)
```

```
c1:
uniqueCounter timing:
10 loops, best of 3: 61.1 ms per loop
collections.Counter timing:
10 loops, best of 3: 113 ms per loop

c2:
uniqueCounter timing:
10 loops, best of 3: 57 ms per loop
collections.Counter timing:
10 loops, best of 3: 120 ms per loop
```
Try numpy.bincount, which counts the occurrences of each value in an array of non-negative integers:
```
In [19]: Counter(c2)
Out[19]: Counter({1: 100226, 0: 99774})

In [20]: uniqueCounter(c2)
Out[20]: defaultdict(<type 'int'>, {0: 99774, 1: 100226})

In [21]: np.bincount(c2)
Out[21]: array([ 99774, 100226])
```
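Note that np.bincount returns an array indexed by value rather than a dictionary. If a dict-like result is needed, a minimal sketch of the conversion could look like the following. The helper name `bincount_counter` is hypothetical (it is not part of NumPy), and the sketch assumes the input is a non-empty array of non-negative integers, which is what np.bincount requires.

```python
import numpy as np

def bincount_counter(values):
    """Count non-negative integers and return a plain dict {value: count}.

    Hypothetical helper for illustration; assumes a non-empty integer input.
    """
    arr = np.asarray(values)
    counts = np.bincount(arr)
    # Keep only the values that actually occur, so the keys match
    # what a Counter or defaultdict would contain.
    present = np.nonzero(counts)[0]
    return {int(v): int(counts[v]) for v in present}

sample = np.array([0, 1, 1, 0, 1, 3])
result = bincount_counter(sample)
print(result)
```

The conversion step adds some overhead on top of the bincount call itself, and that overhead grows with the largest value in the input, since bincount allocates one slot per integer up to the maximum.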
Some timings:
```
In [16]: %timeit np.bincount(c2)
1000 loops, best of 3: 2 ms per loop

In [17]: %timeit uniqueCounter(c2)
1 loops, best of 3: 161 ms per loop

In [18]: %timeit Counter(c2)
1 loops, best of 3: 362 ms per loop
```
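These timings cover only the flat integer input c2. For paired input like c1, np.bincount cannot be applied directly, because it expects a one-dimensional array of non-negative integers. One possible workaround, which is not part of the measurements above and has not been timed here, is to encode each pair as a single integer first. The sketch below assumes both columns are available as separate arrays of small non-negative integers; `pair_counter` is a hypothetical name.

```python
import numpy as np

def pair_counter(a, b):
    """Count (a[i], b[i]) pairs of small non-negative integers via np.bincount.

    Hypothetical helper for illustration; assumes non-empty integer inputs
    of equal length.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    width = int(b.max()) + 1      # number of codes reserved per value of a
    codes = a * width + b         # one unique integer per (a, b) pair
    counts = np.bincount(codes)
    present = np.nonzero(counts)[0]
    # Decode each code back into the (a, b) pair it represents.
    return {(int(c // width), int(c % width)): int(counts[c]) for c in present}

a = np.array([0, 0, 1, 1, 1])
b = np.array([0, 1, 1, 1, 0])
pairs = pair_counter(a, b)
print(pairs)
```

This only makes sense when the value ranges are small, as in the 0/1 sample data; for arbitrary hashable items the dictionary-based uniqueCounter remains the general approach.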