Python: Most efficient dictionary counter

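I'm looking for a more efficient implementation of a general-purpose "dictionary counter". Right now, this naive function returns its results faster than the collections.Counter implementation: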

from collections import defaultdict

def uniqueCounter(x):
    # Tally each element; a missing key starts at int() == 0
    dx = defaultdict(int)
    for i in x:
        dx[i] += 1
    return dx

Edit: some representative sample inputs:

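import numpy as np
# list() keeps c1 reusable under Python 3, where zip() returns a one-shot iterator
c1 = list(zip(np.random.randint(0, 2, 200000), np.random.randint(0, 2, 200000)))
c2 = np.random.randint(0, 2, 200000)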

c1:
uniqueCounter timing: 10 loops, best of 3: 61.1 ms per loop
collections.Counter timing: 10 loops, best of 3: 113 ms per loop

c2:
uniqueCounter timing: 10 loops, best of 3: 57 ms per loop
collections.Counter timing: 10 loops, best of 3: 120 ms per loop
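To reproduce a comparison like the one above, a minimal harness along these lines could be used. It is a sketch that assumes Python 3 and NumPy; the fixed seed and the use of timeit.repeat to mimic %timeit's "best of 3" are illustrative choices, not the asker's actual setup. Absolute numbers depend on the machine, and because recent CPython versions run Counter's counting loop in C, the ranking may not match the Python 2 timings shown.

```python
import timeit
from collections import Counter, defaultdict

import numpy as np


def uniqueCounter(x):
    # Naive counter from the question: one dict update per element
    dx = defaultdict(int)
    for i in x:
        dx[i] += 1
    return dx


# Same kind of input as c2 above; a fixed seed keeps runs comparable
np.random.seed(0)
c2 = np.random.randint(0, 2, 200000)

# Both approaches should agree before their speed is worth comparing
assert uniqueCounter(c2) == Counter(c2)

for name, func in [("uniqueCounter", uniqueCounter), ("collections.Counter", Counter)]:
    # Best of 3 repeats, 10 calls each, then divide to get time per call
    best = min(timeit.repeat(lambda: func(c2), number=10, repeat=3)) / 10
    print(f"{name}: {best * 1000:.1f} ms per loop")
```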


Try using numpy.bincount:

In [19]: Counter(c2)
Out[19]: Counter({1: 100226, 0: 99774})

In [20]: uniqueCounter(c2)
Out[20]: defaultdict(<type 'int'>, {0: 99774, 1: 100226})

In [21]: np.bincount(c2)
Out[21]: array([ 99774, 100226])
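np.bincount returns an array indexed by value rather than a mapping. If code downstream expects the same dict-like result as Counter, the array can be converted back. The helper name bincount_as_dict below is hypothetical, and the sketch assumes the input is 1-D and holds only non-negative integers, which is what np.bincount requires.

```python
from collections import Counter

import numpy as np


def bincount_as_dict(values):
    # Index i of the bincount result holds the number of occurrences of i
    counts = np.bincount(values)
    # Drop zero-count slots so the keys match what Counter would report
    return {value: count for value, count in enumerate(counts.tolist()) if count}


sample = [0, 1, 1, 3, 3, 3]
print(bincount_as_dict(sample))  # {0: 1, 1: 2, 3: 3}
assert bincount_as_dict(sample) == Counter(sample)
```

The conversion loop runs over the range of values (here 0..3), not over the input, so it stays cheap when the values are small integers like the 0/1 data in c2.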

Some timings:

In [16]: %timeit np.bincount(c2)
1000 loops, best of 3: 2 ms per loop

In [17]: %timeit uniqueCounter(c2)
1 loops, best of 3: 161 ms per loop

In [18]: %timeit Counter(c2)
1 loops, best of 3: 362 ms per loop
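The timings above cover c2 only. c1 holds pairs, which np.bincount cannot take directly. One option, sketched below under the assumption that both columns contain only 0/1 values, is to encode each pair (x, y) as the single integer 2*x + y and bincount those codes. For arbitrary values, np.unique with axis=0 and return_counts=True (available since NumPy 1.13) counts unique rows instead. Both helper names are hypothetical.

```python
import numpy as np


def count_binary_pairs(a, b):
    # Assumes a and b are equal-length arrays of 0/1 values, as in c1.
    # Encode each pair (x, y) as the single integer 2*x + y in 0..3
    codes = 2 * np.asarray(a) + np.asarray(b)
    # minlength=4 keeps a slot for every possible pair even if it never occurs
    counts = np.bincount(codes, minlength=4)
    return {(code // 2, code % 2): int(n) for code, n in enumerate(counts)}


def count_rows(pairs):
    # General alternative for arbitrary values: count unique rows
    arr = np.asarray(pairs)
    rows, counts = np.unique(arr, axis=0, return_counts=True)
    return {tuple(row.tolist()): int(n) for row, n in zip(rows, counts)}


a = np.array([0, 1, 1, 0, 1])
b = np.array([0, 1, 1, 1, 0])
print(count_binary_pairs(a, b))  # {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 2}
print(count_rows(list(zip(a, b))))  # {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 2}
```

The encoding trick keeps everything inside a single bincount call, but it only works when the range of each column is small and known in advance; np.unique sorts the rows, so it is more general but likely slower.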