NumPy: Vectorize finding closest value in an array for each element in another array
Updates

I ran some small benchmarks comparing the non-vectorized solution with the vectorized one (the accepted answer).

[timings not preserved]

About a 10-fold speedup!

Clarification

[code block not preserved]

I ran the benchmarks as given in the answer by @cyborg below.

Case 1: If `known_array` were sorted

[timings not preserved]

Firstly, for large arrays the `diffs` method is actually slower. It also eats up a lot of RAM, and my system hung when I ran it on actual data.

Case 2: When `known_array` is not sorted, which represents the actual scenario

[timings not preserved]

I must also comment that the approach should be memory efficient as well. Otherwise my 8 GB of RAM is not sufficient. With that in mind, this is easy enough.
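To make the memory concern concrete: the all-differences approach builds a temporary matrix with `len(known_array) * len(test_array)` entries. Below is a minimal sketch of one way to bound that, assuming it is acceptable to process `test_array` in chunks. The helper name `nearest_chunked` and the `chunk_size` parameter are hypothetical and not part of the original post.

```python
import numpy as np


def nearest_chunked(known_array, test_array, chunk_size=1024):
    """Index of the closest known value for each test value, in chunks.

    Hypothetical helper: the temporary differences matrix holds at most
    len(known_array) * chunk_size elements instead of the full matrix.
    """
    known_array = np.asarray(known_array)
    test_array = np.asarray(test_array)
    indices = np.empty(len(test_array), dtype=np.intp)
    for start in range(0, len(test_array), chunk_size):
        chunk = test_array[start:start + chunk_size]
        # Shape (len(known_array), len(chunk)): one column per test value.
        differences = chunk[np.newaxis, :] - known_array[:, np.newaxis]
        indices[start:start + len(chunk)] = np.abs(differences).argmin(axis=0)
    return indices


rng = np.random.default_rng(0)
known = rng.random(1000)
test = rng.random(400)

chunked = nearest_chunked(known, test, chunk_size=64)
# Reference result from the unchunked all-differences computation.
full = np.abs(test[np.newaxis, :] - known[:, np.newaxis]).argmin(axis=0)
```

This only caps memory use. The amount of arithmetic is unchanged, so it does not address the speed problem for large arrays.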
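If the arrays are large, you should use `np.searchsorted`. For comparison, here is the all-differences approach first: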
```python
import numpy as np

np.random.seed(0)
known_array = np.random.rand(1000)
test_array = np.random.rand(400)
```

```python
%%time
# IPython cell magic: must be the first line of its own cell.
differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))
indices = np.abs(differences).argmin(axis=0)
residual = np.diagonal(differences[indices,])
```
Output:
```
CPU times: user 11 ms, sys: 15 ms, total: 26 ms
Wall time: 26.4 ms
```
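```python
%%time
# Sort once, remembering how to map sorted positions back to original indices.
index_sorted = np.argsort(known_array)
known_array_sorted = known_array[index_sorted]

# Insertion points; clamp values beyond the maximum so indexing stays in range.
idx1 = np.searchsorted(known_array_sorted, test_array)
idx1[idx1 == len(known_array_sorted)] = len(known_array_sorted) - 1
idx2 = np.clip(idx1 - 1, 0, len(known_array_sorted)-1)

# Distance to the right-hand and left-hand neighbours.
diff1 = known_array_sorted[idx1] - test_array
diff2 = test_array - known_array_sorted[idx2]

indices2 = index_sorted[np.where(diff1 <= diff2, idx1, idx2)]
residual2 = test_array - known_array[indices2]
```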
Output:
```
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 311 μs
```
We can check that the results are the same:
```python
assert np.all(residual == residual2)
assert np.all(indices == indices2)
```
tl;dr: use `np.searchsorted`. The benchmark script:
```python
from timeit import timeit

import numpy as np

known_array = np.arange(-10, 10)
test_array = np.random.randint(-10, 10, 1000)
number = 1000


def base(known_array, test_array):
    # Non-vectorized baseline: one argmin per test value.
    def find_nearest(known_array, value):
        idx = (np.abs(known_array - value)).argmin()
        return idx

    indices = np.zeros_like(test_array, dtype=known_array.dtype)
    for i in range(len(test_array)):
        indices[i] = find_nearest(known_array, test_array[i])
    return indices


def diffs(known_array, test_array):
    # All pairwise differences at once, then argmin down each column.
    differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))
    indices = np.abs(differences).argmin(axis=0)
    return indices


def searchsorted1(known_array, test_array):
    # Binary search, then compare the left and right neighbours.
    index_sorted = np.argsort(known_array)
    known_array_sorted = known_array[index_sorted]
    idx1 = np.searchsorted(known_array_sorted, test_array)
    idx1[idx1 == len(known_array)] = len(known_array)-1
    idx2 = np.clip(idx1 - 1, 0, len(known_array_sorted)-1)
    diff1 = known_array_sorted[idx1] - test_array
    diff2 = test_array - known_array_sorted[idx2]
    indices2 = index_sorted[np.where(diff1 <= diff2, idx1, idx2)]
    return indices2


def searchsorted2(known_array, test_array):
    # Binary search against the midpoints between neighbouring sorted values.
    index_sorted = np.argsort(known_array)
    known_array_sorted = known_array[index_sorted]
    known_array_middles = known_array_sorted[1:] - np.diff(known_array_sorted.astype('f'))/2
    idx1 = np.searchsorted(known_array_middles, test_array)
    indices = index_sorted[idx1]
    return indices


def time_f(func_name):
    return timeit(func_name + "(known_array, test_array)",
                  'from __main__ import known_array, test_array, ' + func_name,
                  number=number)


print('Speedups:')
base_time = time_f('base')
for func_name in ['diffs', 'searchsorted1', 'searchsorted2']:
    print(func_name + ' is x%.1f faster than base.' % (base_time / time_f(func_name)))
```
Output:
```
Speedups:
diffs is x29.9 faster than base.
searchsorted1 is x37.4 faster than base.
searchsorted2 is x64.3 faster than base.
```
For example, you can compute all the differences in one go with:
```python
differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))
```
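and then use `argmin` to find the index of the closest value in each column, and read off the corresponding residual: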
```python
indices = np.abs(differences).argmin(axis=0)
residual = np.diagonal(differences[indices,])
```
So, for
```python
>>> known_array = np.array([-24, -18, -13, -30, 29])
>>> test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])
```
one gets
```python
>>> indices
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
>>> residual
array([ 7, 17,  7, 17, 21,  9, 21,  7, 15, 21])
```
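As a cross-check, the same small example can be run through a midpoint-based `np.searchsorted` lookup, which never builds the full differences matrix. This is a minimal sketch rather than the answer's exact code: the variable names `order`, `known_sorted`, `midpoints` and `nearest` are illustrative, and the midpoints are computed as a plain average of neighbouring sorted values.

```python
import numpy as np

known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])

# Sort the known values, keeping the permutation back to original indices.
order = np.argsort(known_array)
known_sorted = known_array[order]

# Midpoints between neighbouring sorted values: a test value that falls
# between midpoints k-1 and k is closest to known_sorted[k].
midpoints = (known_sorted[1:] + known_sorted[:-1]) / 2

nearest = order[np.searchsorted(midpoints, test_array)]
residual = test_array - known_array[nearest]

print(nearest)
print(residual)
```

On this input it reproduces the indices and residuals shown above. A value that lies exactly on a midpoint (here `8`, halfway between `-13` and `29`) is assigned to the smaller neighbour, because `np.searchsorted` defaults to `side='left'`. With other data, ties may be broken differently from `argmin` on the differences matrix, which returns the first minimum in the original order of `known_array`.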