Fast random weighted selection across all rows of a stochastic matrix

numpy.random.choice allows weighted selection from a vector, i.e.

arr = numpy.array([1, 2, 3])
weights = numpy.array([0.2, 0.5, 0.3])
choice = numpy.random.choice(arr, p=weights)

This selects 1 with probability 0.2, 2 with probability 0.5, and 3 with probability 0.3.

What if we want to do this quickly, in a vectorized fashion, for a 2-D array (matrix) whose rows are probability vectors? That is, we want a vector of choices from a stochastic matrix? Here is a very slow way to do it:

import numpy as np

m = 10
n = 100 # Or some very large number

items = np.arange(m)
prob_weights = np.random.rand(m, n)
prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)

choices = np.zeros((n,))
# This is slow, because of the loop in Python
for i in range(n):
    choices[i] = np.random.choice(items, p=prob_matrix[:,i])

print(choices)

array([ 4.,  7.,  8.,  1.,  0.,  4.,  3.,  7.,  1.,  5.,  7.,  5.,  3.,
        1.,  9.,  1.,  1.,  5.,  9.,  8.,  2.,  3.,  2.,  6.,  4.,  3.,
        8.,  4.,  1.,  1.,  4.,  0.,  1.,  8.,  5.,  3.,  9.,  9.,  6.,
        5.,  4.,  8.,  4.,  2.,  4.,  0.,  3.,  1.,  2.,  5.,  9.,  3.,
        9.,  9.,  7.,  9.,  3.,  9.,  4.,  8.,  8.,  7.,  6.,  4.,  6.,
        7.,  9.,  5.,  0.,  6.,  1.,  3.,  3.,  2.,  4.,  7.,  0.,  6.,
        3.,  5.,  8.,  0.,  8.,  3.,  4.,  5.,  2.,  2.,  1.,  1.,  9.,
        9.,  4.,  3.,  3.,  2.,  8.,  0.,  6.,  1.])

This post suggests that cumsum and bisect could be a potential approach, and a fast one. But while numpy.cumsum(arr, axis=1) can compute the cumulative sum along one axis of a numpy array, the bisect.bisect function only works on a single array at a time. Likewise, numpy.searchsorted only works on 1-D arrays.
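For a single probability vector, that cumsum-and-binary-search idea looks roughly like this (a minimal sketch using arr and weights from the first snippet above, not code from the linked post):

c = numpy.cumsum(weights)                 # CDF: [0.2, 0.7, 1.0]
u = numpy.random.rand()                   # one uniform draw in [0, 1)
choice = arr[numpy.searchsorted(c, u)]    # index of the first CDF value >= u

The difficulty is doing this search for every column of a matrix at once, without a Python loop.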

Is there a fast way to do this using only vectorized operations?


Here's a fully vectorized version that's pretty fast:

def vectorized(prob_matrix, items):
    s = prob_matrix.cumsum(axis=0)
    r = np.random.rand(prob_matrix.shape[1])
    k = (s < r).sum(axis=0)
    return items[k]

In theory, searchsorted is the right function to use for looking up the random values in the cumulative summed probabilities, but with m being relatively small, k = (s < r).sum(axis=0) ends up being a lot faster. Its time complexity is O(m), while the searchsorted method is O(log(m)), but that only matters for much larger m. Also, cumsum is O(m), so both vectorized and @perimosocordiae's improved are O(m). (If your m is, in fact, much larger, you'll have to run some tests to see how large m can be before this method is slower.)
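As a quick illustration of that equivalence (not part of the original answer), the index computed by (s < r).sum(axis=0) is the same as a per-column searchsorted, assuming the setup code from the question:

s = prob_matrix.cumsum(axis=0)
r = np.random.rand(prob_matrix.shape[1])
k_sum = (s < r).sum(axis=0)                        # count the CDF entries strictly below r, per column
k_search = np.array([np.searchsorted(s[:, j], r[j])
                     for j in range(prob_matrix.shape[1])])
print(np.array_equal(k_sum, k_search))             # True: searchsorted (side='left') counts the same entries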

Here are my timings for m = 10 and n = 10000 (using the functions original and improved from @perimosocordiae's answer):

In [115]: %timeit original(prob_matrix, items)
1 loops, best of 3: 270 ms per loop

In [116]: %timeit improved(prob_matrix, items)
10 loops, best of 3: 24.9 ms per loop

In [117]: %timeit vectorized(prob_matrix, items)
1000 loops, best of 3: 1 ms per loop

The full script where the functions are defined is:

import numpy as np


def improved(prob_matrix, items):
    # transpose here for better data locality later
    cdf = np.cumsum(prob_matrix.T, axis=1)
    # random numbers are expensive, so we'll get all of them at once
    ridx = np.random.random(size=n)
    # the one loop we can't avoid, made as simple as possible
    idx = np.zeros(n, dtype=int)
    for i, r in enumerate(ridx):
        idx[i] = np.searchsorted(cdf[i], r)
    # fancy indexing all at once is faster than indexing in a loop
    return items[idx]


def original(prob_matrix, items):
    choices = np.zeros((n,))
    # This is slow, because of the loop in Python
    for i in range(n):
        choices[i] = np.random.choice(items, p=prob_matrix[:,i])
    return choices


def vectorized(prob_matrix, items):
    s = prob_matrix.cumsum(axis=0)
    r = np.random.rand(prob_matrix.shape[1])
    k = (s < r).sum(axis=0)
    return items[k]


m = 10
n = 10000 # Or some very large number

items = np.arange(m)
prob_weights = np.random.rand(m, n)
prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)
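
As a sanity check (not part of the answer above), you can confirm that vectorized draws each item with the intended probability by repeating one column many times and comparing the empirical frequencies with that column:

reps = 200000
col = np.repeat(prob_matrix[:, :1], reps, axis=1)  # the same distribution in every column
draws = vectorized(col, items)
freq = np.bincount(draws, minlength=m) / reps
print(np.round(freq, 3))                 # empirical frequencies
print(np.round(prob_matrix[:, 0], 3))    # target probabilities; should match the line above closely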


I don't think it's possible to vectorize this completely, but you can still get a decent speedup by vectorizing as much as you can. Here's what I came up with:

def improved(prob_matrix, items):
    # transpose here for better data locality later
    cdf = np.cumsum(prob_matrix.T, axis=1)
    # random numbers are expensive, so we'll get all of them at once
    ridx = np.random.random(size=n)
    # the one loop we can't avoid, made as simple as possible
    idx = np.zeros(n, dtype=int)
    for i, r in enumerate(ridx):
        idx[i] = np.searchsorted(cdf[i], r)
    # fancy indexing all at once is faster than indexing in a loop
    return items[idx]
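
To make the transpose-then-cumsum step concrete (an illustration added here, not part of the answer), take a small 3x2 matrix whose columns sum to 1:

tiny = np.array([[0.2, 0.5],
                 [0.3, 0.1],
                 [0.5, 0.4]])
print(np.cumsum(tiny.T, axis=1))
# [[0.2 0.5 1. ]
#  [0.5 0.6 1. ]]

Row i of the result is the CDF of column i of the original matrix, so cdf[i] inside the loop is a contiguous, sorted 1-D array, which is exactly what np.searchsorted expects.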

Testing against the version in the question:

def original(prob_matrix, items):
    choices = np.zeros((n,))
    # This is slow, because of the loop in Python
    for i in range(n):
        choices[i] = np.random.choice(items, p=prob_matrix[:,i])
    return choices

And here's the speedup (using the setup code given in the question):

In [45]: %timeit original(prob_matrix, items)
100 loops, best of 3: 2.86 ms per loop

In [46]: %timeit improved(prob_matrix, items)
The slowest run took 4.15 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 157 μs per loop

I'm not sure why the timings for my version show such a large variance, but even the slowest run (~650 μs) is still almost 5x faster.
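
One way to look at that spread directly (a sketch added here, not part of the answer) is timeit.repeat, which returns each run's total time so the fastest and slowest runs can be compared:

import timeit

runs = timeit.repeat("improved(prob_matrix, items)",
                     setup="from __main__ import improved, prob_matrix, items",
                     repeat=7, number=100)
print(["%.0f us" % (t / 100 * 1e6) for t in runs])  # per-call time for each of the 7 runs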