NumPy: Vectorize finding closest value in an array for each element in another array

Input

known_array: NumPy array consisting of scalar values only; shape: […]

test_array: NumPy array consisting of scalar values only; shape: […]

Output

indices: NumPy array with the same shape as test_array. For each value in test_array, it holds the index of the closest value in known_array.

residual: NumPy array with the same shape as test_array. For each value in test_array, it holds the difference from the closest value in known_array.

Example

[code not preserved]

Sample implementation (not fully vectorized)

[code not preserved]

What is the best way to speed up this task? Cython is an option, but I would always prefer to remove the for loop and keep the code pure NumPy.

NB: The following Stack Overflow questions were consulted:

Python/NumPy - Quickly find the index in an array closest to some value

Find the index of numerically closest value

Find nearest value in NumPy array

Finding the nearest value and return the index of array in Python

Finding nearest items across two lists/arrays in Python

Updates

I ran some small benchmarks comparing the non-vectorized solution with the vectorized one (the accepted answer).

[benchmark code and output not preserved]

About a 10-fold speedup!

Clarification

[…]

I ran the benchmarks as given in the answer by @cyborg below.

Case 1: known_array is sorted

[benchmark output not preserved]

First, for large arrays the diffs method is actually slower. It also eats up a lot of RAM, and my system hung when I ran it on actual data.

Case 2: known_array is not sorted, which represents the actual scenario

[benchmark output not preserved]

I must also note that the approach should be memory efficient; otherwise my 8 GB of RAM is not sufficient. In the base case, it is easily sufficient.
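To make the memory concern concrete, here is a rough sketch of how large a dense difference matrix gets when every test value is compared against every known value. The helper name diff_matrix_bytes and the array sizes are hypothetical, chosen only for illustration; they are not the asker's actual data.

```python
import numpy as np

def diff_matrix_bytes(n_known, n_test, dtype=np.float64):
    """Bytes needed for one dense (n_known, n_test) difference matrix."""
    return n_known * n_test * np.dtype(dtype).itemsize

# Hypothetical sizes, for illustration only.
small = diff_matrix_bytes(1_000, 400)
large = diff_matrix_bytes(100_000, 100_000)

print(f"{small / 1e6:.1f} MB")  # 3.2 MB
print(f"{large / 1e9:.1f} GB")  # 80.0 GB
```

Taking np.abs of that matrix allocates a second array of the same size, so the peak usage of a full pairwise approach is roughly double these figures.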


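If the arrays are large, you should use searchsorted. For comparison, the brute-force version that builds the full difference matrix comes first: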

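import numpy as np
np.random.seed(0)
known_array = np.random.rand(1000)
test_array = np.random.rand(400)

# %%time is an IPython cell magic, so the timed part below runs as its own cell.
%%time
# Full (len(known_array), len(test_array)) matrix of test - known differences.
differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))
# Row index of the smallest absolute difference in each column.
indices = np.abs(differences).argmin(axis=0)
# differences[indices,] has shape (n, n); its diagonal is differences[indices[j], j].
residual = np.diagonal(differences[indices,])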

Output:

CPU times: user 11 ms, sys: 15 ms, total: 26 ms
Wall time: 26.4 ms

searchsorted version:

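%%time

# Sort known_array once and remember where each sorted value came from.
index_sorted = np.argsort(known_array)
known_array_sorted = known_array[index_sorted]

# idx1: first sorted position whose value is >= the test value; idx2: its left neighbour.
idx1 = np.searchsorted(known_array_sorted, test_array)
idx2 = np.clip(idx1 - 1, 0, len(known_array_sorted)-1)
# searchsorted returns len(known_array_sorted) for values above the maximum; clip to stay in bounds.
idx1 = np.clip(idx1, 0, len(known_array_sorted)-1)

diff1 = known_array_sorted[idx1] - test_array
diff2 = test_array - known_array_sorted[idx2]

# Pick the closer neighbour and map back to positions in the unsorted known_array.
indices2 = index_sorted[np.where(diff1 <= diff2, idx1, idx2)]
residual2 = test_array - known_array[indices2]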

Output:

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 311 μs

We can check that the results are the same:

assert np.all(residual == residual2)
assert np.all(indices == indices2)
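The searchsorted steps above can be wrapped into a single function. The following is a minimal sketch, not the answerer's code: the name nearest_searchsorted is hypothetical, both neighbour indices are clipped so values outside the range of known_array do not index out of bounds, and ties between two equally distant neighbours go to the smaller known value, which can differ from what argmin picks on an unsorted array. It assumes signed or floating-point data.

```python
import numpy as np

def nearest_searchsorted(known_array, test_array):
    """Return (indices, residual) of the nearest known value for each test value."""
    known_array = np.asarray(known_array)
    test_array = np.asarray(test_array)

    # Sort once; `order` maps sorted positions back to original positions.
    order = np.argsort(known_array, kind="stable")
    known_sorted = known_array[order]
    last = len(known_sorted) - 1

    # Candidate neighbours on the right and left of each test value.
    right = np.searchsorted(known_sorted, test_array)
    left = np.clip(right - 1, 0, last)
    right = np.clip(right, 0, last)

    # Choose the right neighbour only when it is strictly closer.
    take_right = (known_sorted[right] - test_array) < (test_array - known_sorted[left])
    indices = order[np.where(take_right, right, left)]
    residual = test_array - known_array[indices]
    return indices, residual

# Compare against the brute-force distances on random data.
rng = np.random.default_rng(0)
known = rng.random(1000)
test = rng.random(400)
indices, residual = nearest_searchsorted(known, test)
brute = np.abs(test[None, :] - known[:, None]).min(axis=0)
assert np.allclose(np.abs(residual), brute)
```

The check compares absolute residuals rather than indices, so it stays valid even if two known values happen to be equally close to a test value.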


tl;dr: use numpy.searchsorted().

from timeit import timeit
import numpy as np

known_array = np.arange(-10, 10)
test_array = np.random.randint(-10, 10, 1000)
number = 1000

def base(known_array, test_array):
    # Baseline: one argmin per test value inside a Python loop.
    def find_nearest(known_array, value):
        idx = (np.abs(known_array - value)).argmin()
        return idx
    indices = np.zeros_like(test_array, dtype=known_array.dtype)
    for i in range(len(test_array)):
        indices[i] = find_nearest(known_array, test_array[i])
    return indices

def diffs(known_array, test_array):
    # Full pairwise difference matrix, then argmin per column.
    differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))
    indices = np.abs(differences).argmin(axis=0)
    return indices

def searchsorted1(known_array, test_array):
    # Binary search, then compare the two neighbouring candidates.
    index_sorted = np.argsort(known_array)
    known_array_sorted = known_array[index_sorted]
    idx1 = np.searchsorted(known_array_sorted, test_array)
    # Values above the maximum would index out of bounds; map them to the last element.
    idx1[idx1 == len(known_array)] = len(known_array)-1
    idx2 = np.clip(idx1 - 1, 0, len(known_array_sorted)-1)
    diff1 = known_array_sorted[idx1] - test_array
    diff2 = test_array - known_array_sorted[idx2]
    indices2 = index_sorted[np.where(diff1 <= diff2, idx1, idx2)]
    return indices2

def searchsorted2(known_array, test_array):
    # Binary search against the midpoints between consecutive sorted values.
    index_sorted = np.argsort(known_array)
    known_array_sorted = known_array[index_sorted]
    known_array_middles = known_array_sorted[1:] - np.diff(known_array_sorted.astype('f'))/2
    idx1 = np.searchsorted(known_array_middles, test_array)
    indices = index_sorted[idx1]
    return indices

def time_f(func_name):
    return timeit(func_name + "(known_array, test_array)",
                  'from __main__ import known_array, test_array, ' + func_name,
                  number=number)

print('Speedups:')
base_time = time_f('base')
for func_name in ['diffs', 'searchsorted1', 'searchsorted2']:
    print(func_name + ' is x%.1f faster than base.' % (base_time / time_f(func_name)))

Output:

Speedups:
diffs is x29.9 faster than base.
searchsorted1 is x37.4 faster than base.
searchsorted2 is x64.3 faster than base.
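The midpoint idea behind searchsorted2 may be easier to see on a tiny input. The sketch below uses made-up values and computes the midpoints as the average of consecutive sorted values, which is equivalent to the expression used in searchsorted2. Each test value is assigned to the sorted known value whose interval between midpoints it falls into.

```python
import numpy as np

# Illustrative values only; known_sorted must already be sorted.
known_sorted = np.array([0.0, 10.0, 20.0])
middles = (known_sorted[1:] + known_sorted[:-1]) / 2  # [5.0, 15.0]

test_array = np.array([-3.0, 4.0, 6.0, 14.0, 16.0, 25.0])

# Position among the midpoints == index of the nearest sorted value.
nearest = np.searchsorted(middles, test_array)

print(nearest.tolist())                # [0, 0, 1, 1, 2, 2]
print(known_sorted[nearest].tolist())  # [0.0, 0.0, 10.0, 10.0, 20.0, 20.0]
```

With the default side='left', a test value that lands exactly on a midpoint (5.0 here) is assigned to the lower neighbour. Because there is one fewer midpoint than known values, the result is always a valid index and no clipping is needed.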


For example, you can compute all the differences in one go with:

differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))

and use argmin and fancy indexing along with np.diagonal to get the desired indices and differences:

indices = np.abs(differences).argmin(axis=0)
residual = np.diagonal(differences[indices,])

So for

>>> known_array = np.array([-24, -18, -13, -30, 29])
>>> test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])

one gets

>>> indices
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
>>> residual
array([ 7, 17, 7, 17, 21, 9, 21, 7, 15, 21])
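As a possible refinement, and not part of the answer above, the residuals can be read out with paired fancy indexing instead of np.diagonal. Indexing with differences[indices,] first builds an (n, n) intermediate for n test values, whereas pairing each row index with its column index selects only the n needed entries. The sketch reuses the example arrays from above.

```python
import numpy as np

known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])

differences = test_array.reshape(1, -1) - known_array.reshape(-1, 1)
indices = np.abs(differences).argmin(axis=0)

# Select differences[indices[j], j] for each column j directly.
residual = differences[indices, np.arange(len(test_array))]

print(indices.tolist())   # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print(residual.tolist())  # [7, 17, 7, 17, 21, 9, 21, 7, 15, 21]
```

The full (m, n) difference matrix is still allocated, so this only removes the extra intermediate; it does not address the memory limits raised in the question.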
