Fast random weighted selection across all rows of a stochastic matrix
```python
import numpy

arr = numpy.array([1, 2, 3])
weights = numpy.array([0.2, 0.5, 0.3])
choice = numpy.random.choice(arr, p=weights)
```
selects 1 with probability 0.2, 2 with probability 0.5, and 3 with probability 0.3.
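As a quick sanity check of what `p=` does, here is a minimal sketch using NumPy's newer `Generator` API (`numpy.random.default_rng`; this is my substitution, since the question uses the legacy `numpy.random.choice`). With many draws, the empirical frequencies should approach the weights.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is reproducible

arr = np.array([1, 2, 3])
weights = np.array([0.2, 0.5, 0.3])  # non-negative, sums to 1

# Draw 100,000 weighted samples in a single call
samples = rng.choice(arr, size=100_000, p=weights)

# Empirical frequency of each value; should be close to `weights`
freqs = np.array([(samples == v).mean() for v in arr])
print(freqs)
```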
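What if we want to do this quickly, in a vectorized way, for a 2D array (matrix) in which each column is a probability vector? That is, we want a vector of choices from a stochastic matrix. Here is the very slow way: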
```python
import numpy as np

m = 10
n = 100  # Or some very large number
items = np.arange(m)
prob_weights = np.random.rand(m, n)
prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)

choices = np.zeros((n,))
# This is slow, because of the loop in Python
for i in range(n):
    choices[i] = np.random.choice(items, p=prob_matrix[:,i])
```
```
array([ 4.,  7.,  8.,  1.,  0.,  4.,  3.,  7.,  1.,  5.,
        7.,  5.,  3.,  1.,  9.,  1.,  1.,  5.,  9.,  8.,
        2.,  3.,  2.,  6.,  4.,  3.,  8.,  4.,  1.,  1.,
        4.,  0.,  1.,  8.,  5.,  3.,  9.,  9.,  6.,  5.,
        4.,  8.,  4.,  2.,  4.,  0.,  3.,  1.,  2.,  5.,
        9.,  3.,  9.,  9.,  7.,  9.,  3.,  9.,  4.,  8.,
        8.,  7.,  6.,  4.,  6.,  7.,  9.,  5.,  0.,  6.,
        1.,  3.,  3.,  2.,  4.,  7.,  0.,  6.,  3.,  5.,
        8.,  0.,  8.,  3.,  4.,  5.,  2.,  2.,  1.,  1.,
        9.,  9.,  4.,  3.,  3.,  2.,  8.,  0.,  6.,  1.])
```
This post suggests that,
Is there a fast way to do this using only vectorized operations?
Here is a fully vectorized version that is pretty fast:
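```python
def vectorized(prob_matrix, items):
    # Cumulative sums down each column (one CDF per column)
    s = prob_matrix.cumsum(axis=0)
    # One uniform draw in [0, 1) per column
    r = np.random.rand(prob_matrix.shape[1])
    # Count of CDF entries below r = index of the chosen item in each column
    k = (s < r).sum(axis=0)
    return items[k]
```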
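To see why counting `s < r` selects the right row, here is a small worked sketch (inverse-CDF sampling, one column at a time but all columns at once). It passes the uniform draws in explicitly so the result is deterministic; the name `vectorized_with_r` and its `r` parameter are hypothetical, added only for illustration. It also clamps `k`, an extra safeguard not in the original, for the rare case where rounding leaves the last cumulative sum slightly below a draw.

```python
import numpy as np

def vectorized_with_r(prob_matrix, items, r):
    # s[j, i] = probability that column i picks an item with index <= j
    s = prob_matrix.cumsum(axis=0)
    # r has one entry per column and broadcasts across rows; counting the
    # cumulative sums strictly below r[i] gives the first row with s >= r[i]
    k = (s < r).sum(axis=0)
    # Guard: if rounding makes s[-1] < r[i], k would equal m (out of bounds)
    k = np.minimum(k, len(items) - 1)
    return items[k]

items = np.array([10, 20, 30])
# Each column is a probability vector
prob_matrix = np.array([
    [0.2, 0.5, 0.0],
    [0.5, 0.5, 0.1],
    [0.3, 0.0, 0.9],
])
# Column CDFs: [0.2, 0.7, 1.0], [0.5, 1.0, 1.0], [0.0, 0.1, 1.0]
r = np.array([0.1, 0.6, 0.05])
print(vectorized_with_r(prob_matrix, items, r))  # [10 20 20]
```

In the third column the first item has probability 0, so its cumulative sum (0.0) is always below `r` and it is skipped.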
In theory,
Here are my timings with `m = 10` and `n = 10000`:
```
In [115]: %timeit original(prob_matrix, items)
1 loops, best of 3: 270 ms per loop

In [116]: %timeit improved(prob_matrix, items)
10 loops, best of 3: 24.9 ms per loop

In [117]: %timeit vectorized(prob_matrix, items)
1000 loops, best of 3: 1 ms per loop
```
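Outside IPython, `%timeit` is not available; the standard-library `timeit` module gives a comparable best-of-3 measurement. This is a minimal sketch under that assumption: `loop_choice` is a hypothetical stand-in for `original`, both functions are redefined so the snippet runs on its own, and absolute numbers will depend on your machine and NumPy version.

```python
import timeit
import numpy as np

def loop_choice(prob_matrix, items):
    # Baseline: one np.random.choice call per column, as in the question
    n = prob_matrix.shape[1]
    return np.array([np.random.choice(items, p=prob_matrix[:, i]) for i in range(n)])

def vectorized(prob_matrix, items):
    s = prob_matrix.cumsum(axis=0)
    r = np.random.rand(prob_matrix.shape[1])
    k = (s < r).sum(axis=0)
    return items[k]

m, n = 10, 10000
items = np.arange(m)
prob_weights = np.random.rand(m, n)
prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)

results = {}
for fn, number in ((loop_choice, 1), (vectorized, 100)):
    # Best of 3 repeats; each repeat calls fn `number` times
    best = min(timeit.repeat(lambda: fn(prob_matrix, items), repeat=3, number=number))
    results[fn.__name__] = best / number
    print(f"{fn.__name__}: {best / number * 1e3:.3f} ms per call")
```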
The complete script that defines the functions is:
```python
import numpy as np

def improved(prob_matrix, items):
    n = prob_matrix.shape[1]  # one sample per column
    # transpose here for better data locality later
    cdf = np.cumsum(prob_matrix.T, axis=1)
    # random numbers are expensive, so we'll get all of them at once
    ridx = np.random.random(size=n)
    # the one loop we can't avoid, made as simple as possible
    idx = np.zeros(n, dtype=int)
    for i, r in enumerate(ridx):
        idx[i] = np.searchsorted(cdf[i], r)
    # fancy indexing all at once is faster than indexing in a loop
    return items[idx]

def original(prob_matrix, items):
    n = prob_matrix.shape[1]  # one sample per column
    choices = np.zeros((n,))
    # This is slow, because of the loop in Python
    for i in range(n):
        choices[i] = np.random.choice(items, p=prob_matrix[:,i])
    return choices

def vectorized(prob_matrix, items):
    s = prob_matrix.cumsum(axis=0)
    r = np.random.rand(prob_matrix.shape[1])
    k = (s < r).sum(axis=0)
    return items[k]

m = 10
n = 10000  # Or some very large number
items = np.arange(m)
prob_weights = np.random.rand(m, n)
prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)
```
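Speed only matters if the samples are right. One simple check, sketched here under my own choice of test setup: repeat a single probability column many times, so every column has the same distribution, then compare how often each item is picked with that column. `vectorized` is copied from the script so the snippet stands alone.

```python
import numpy as np

def vectorized(prob_matrix, items):
    # Same function as in the script
    s = prob_matrix.cumsum(axis=0)
    r = np.random.rand(prob_matrix.shape[1])
    k = (s < r).sum(axis=0)
    return items[k]

np.random.seed(0)  # legacy global seed, since vectorized uses np.random.rand

m = 4
items = np.arange(m)
p = np.array([0.1, 0.2, 0.3, 0.4])

# 200,000 identical columns, each equal to p
n = 200_000
prob_matrix = np.tile(p[:, None], (1, n))

choices = vectorized(prob_matrix, items)
# Fraction of columns that picked each item; should be close to p
freqs = np.bincount(choices, minlength=m) / n
print(freqs)
```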
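I don't think full vectorization is possible, but you can still get a decent speedup by vectorizing as much as you can. Here's what I came up with: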
```python
def improved(prob_matrix, items):
    n = prob_matrix.shape[1]  # one sample per column
    # transpose here for better data locality later
    cdf = np.cumsum(prob_matrix.T, axis=1)
    # random numbers are expensive, so we'll get all of them at once
    ridx = np.random.random(size=n)
    # the one loop we can't avoid, made as simple as possible
    idx = np.zeros(n, dtype=int)
    for i, r in enumerate(ridx):
        idx[i] = np.searchsorted(cdf[i], r)
    # fancy indexing all at once is faster than indexing in a loop
    return items[idx]
```
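The remaining loop in `improved` exists because each column needs its own `searchsorted`. A common trick, swapped in here and not part of this answer, removes it: shift column i's CDF by i so that all CDFs concatenate into one globally sorted array, then call `np.searchsorted` once. This is a minimal sketch; the helper name `searchsorted_vectorized` is hypothetical, and the offsets cost some floating-point resolution for very wide matrices.

```python
import numpy as np

def searchsorted_vectorized(prob_matrix, items, rng=None):
    # Hypothetical helper: one item per column, no Python-level loop
    rng = np.random.default_rng() if rng is None else rng
    m, n = prob_matrix.shape
    # Row i holds the CDF of column i (same transpose as in `improved`)
    cdf = np.cumsum(prob_matrix.T, axis=1)
    # Cap rounding overshoot and pin each CDF's end at exactly 1,
    # so row i shifted by i stays within [i, i + 1]
    cdf = np.minimum(cdf, 1.0)
    cdf[:, -1] = 1.0
    offsets = np.arange(n)
    flat = (cdf + offsets[:, None]).ravel()  # globally non-decreasing
    # One uniform draw per column, shifted into that column's interval
    u = rng.random(n) + offsets
    # First position whose shifted CDF exceeds u (inverse-CDF lookup)
    pos = np.searchsorted(flat, u, side="right")
    # Back to within-column indices; clamp in case rounding pushed u to i + 1
    idx = np.minimum(pos - offsets * m, m - 1)
    return items[idx]

rng = np.random.default_rng(1)
p = np.array([0.1, 0.0, 0.6, 0.3])
n = 100_000
prob_matrix = np.tile(p[:, None], (1, n))
choices = searchsorted_vectorized(prob_matrix, np.arange(4), rng)
counts = np.bincount(choices, minlength=4)
print(counts / n)
```

Using `side="right"` means a zero-probability item (index 1 here), whose CDF value equals its predecessor's, is never selected.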
Tested against the version in the question:
```python
def original(prob_matrix, items):
    n = prob_matrix.shape[1]  # one sample per column
    choices = np.zeros((n,))
    # This is slow, because of the loop in Python
    for i in range(n):
        choices[i] = np.random.choice(items, p=prob_matrix[:,i])
    return choices
```
Here is the speedup (using the setup code given in the question):
```
In [45]: %timeit original(prob_matrix, items)
100 loops, best of 3: 2.86 ms per loop

In [46]: %timeit improved(prob_matrix, items)
The slowest run took 4.15 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 157 μs per loop
```
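I'm not sure why the timings for my version vary so much, but even the slowest run (about 650 μs) is still more than 4x faster than the original (2.86 ms).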
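A quick check of the figures, using only the numbers in the timing output above:

$$
157\,\mu\text{s} \times 4.15 \approx 652\,\mu\text{s} \approx 650\,\mu\text{s}
$$

$$
\frac{2.86\,\text{ms}}{0.65\,\text{ms}} = \frac{2860\,\mu\text{s}}{650\,\mu\text{s}} \approx 4.4,
\qquad
\frac{2860\,\mu\text{s}}{157\,\mu\text{s}} \approx 18.2
$$

So the slowest run is roughly 4.4x faster than `original`, and the best run roughly 18x faster.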