python - NumPy: Vectorize finding closest value in an array for each element in another array

Input

known_array: NumPy array, consisting of scalar values only
test_array: NumPy array, consisting of scalar values only

Output

indices: NumPy array with the same shape as test_array; for each value in test_array, the index of the closest value in known_array
residual: NumPy array with the same shape as test_array; for each value in test_array, the difference from the closest value in known_array

My sample implementation is not fully vectorized: it processes each value of test_array inside a Python for loop.

What is the best way to speed up this task? Cython is an option, but I would always prefer to remove the for loop and keep the code pure NumPy.

NB: The following Stack Overflow questions were consulted:

  • Python/NumPy - quickly find the index in an array closest to some value
  • Find the index of numerically closest value
  • Find nearest value in numpy array
  • Finding the nearest value and return the index of array in python
  • Finding nearest items across two lists/arrays in python

Updates

I did some small benchmarks comparing the non-vectorized solution with the vectorized one (the accepted answer): about a 10-fold speedup!

I also ran the benchmarks given in the answer by @cyborg below.

Case 1: known_array is sorted. Firstly, for large arrays one of the methods is actually slower; it also eats up a lot of RAM, and my system hung when I ran it on actual data.

Case 2: known_array is not sorted, which represents the actual scenario.

I must also stress that the approach should be memory efficient; otherwise my 8 GB of RAM is not sufficient. With the base (loop) approach, it is easily sufficient.
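A loop-based baseline of the kind described in the question (one nearest-value search per element of test_array, inside a Python for loop) might look like the sketch below. The function name `nearest_loop` is hypothetical and 1-D inputs are assumed; it returns both outputs, with `residual = test_array - known_array[indices]`, the same sign convention the answers use.

```python
import numpy as np


def nearest_loop(known_array, test_array):
    """Hypothetical loop baseline: one argmin search per test value.

    Assumes 1-D arrays. Returns (indices, residual) with
    residual = test_array - known_array[indices].
    """
    known_array = np.asarray(known_array)
    test_array = np.asarray(test_array)
    indices = np.empty(test_array.shape, dtype=np.intp)
    for i, value in enumerate(test_array):
        # argmin returns the first matching index when distances tie
        indices[i] = np.abs(known_array - value).argmin()
    residual = test_array - known_array[indices]
    return indices, residual


known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])
indices, residual = nearest_loop(known_array, test_array)
print(indices.tolist())   # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print(residual.tolist())  # [7, 17, 7, 17, 21, 9, 21, 7, 15, 21]
```

Each iteration is already vectorized over known_array, so the cost that remains is the Python-level loop over test_array, which is what the answers below remove.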


    If the arrays are large, you should use searchsorted. For comparison, here is the broadcasting version first:

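    import numpy as np
    np.random.seed(0)
    known_array = np.random.rand(1000)
    test_array = np.random.rand(400)
    # The timed part below runs in its own IPython cell (%%time must be the cell's first line)

    %%time
    # differences[i, j] = test_array[j] - known_array[i], shape (1000, 400)
    differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))
    indices = np.abs(differences).argmin(axis=0)
    residual = np.diagonal(differences[indices,])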

    Output:

    CPU times: user 11 ms, sys: 15 ms, total: 26 ms
    Wall time: 26.4 ms
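The broadcasting version above materialises a len(known_array) × len(test_array) matrix, which is what exhausts RAM on large inputs. If you want to keep the brute-force argmin but bound its memory, one option (a swapped-in chunking technique, not part of this answer) is to process test_array in slices. A minimal sketch, assuming 1-D arrays; `nearest_chunked` and `chunk_size` are hypothetical names:

```python
import numpy as np


def nearest_chunked(known_array, test_array, chunk_size=1024):
    """Hypothetical chunked brute-force search.

    Peak temporary size is about len(known_array) * chunk_size elements
    instead of len(known_array) * len(test_array). Assumes 1-D arrays.
    """
    known_array = np.asarray(known_array)
    test_array = np.asarray(test_array)
    indices = np.empty(test_array.shape, dtype=np.intp)
    for start in range(0, test_array.size, chunk_size):
        chunk = test_array[start:start + chunk_size]
        # differences[i, j] = chunk[j] - known_array[i]
        differences = chunk[np.newaxis, :] - known_array[:, np.newaxis]
        indices[start:start + chunk.size] = np.abs(differences).argmin(axis=0)
    residual = test_array - known_array[indices]
    return indices, residual


rng = np.random.default_rng(0)
known_array = rng.random(1000)
test_array = rng.random(400)
indices, residual = nearest_chunked(known_array, test_array, chunk_size=64)
```

This still does O(m·n) work in total, so it only addresses memory; the searchsorted approach also reduces the time complexity.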

    searchsorted version:

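    %%time

    index_sorted = np.argsort(known_array)
    known_array_sorted = known_array[index_sorted]

    # idx1: first sorted element >= each test value; clamp it so test values
    # above the maximum do not index past the end of the array
    idx1 = np.searchsorted(known_array_sorted, test_array)
    idx1 = np.clip(idx1, 0, len(known_array_sorted)-1)
    # idx2: the left neighbour of idx1
    idx2 = np.clip(idx1 - 1, 0, len(known_array_sorted)-1)

    diff1 = known_array_sorted[idx1] - test_array
    diff2 = test_array - known_array_sorted[idx2]

    indices2 = index_sorted[np.where(diff1 <= diff2, idx1, idx2)]
    residual2 = test_array - known_array[indices2]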

    Output:

    CPU times: user 0 ns, sys: 0 ns, total: 0 ns
    Wall time: 311 μs

    We can check that the results are the same:

    assert np.all(residual == residual2)
    assert np.all(indices == indices2)
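The searchsorted steps above can be wrapped into a reusable function that returns both outputs. This is a sketch under stated assumptions (1-D, non-empty known_array; `nearest_searchsorted` is a hypothetical name). It takes absolute differences so the clamped edge cases compare correctly, and it keeps the answer's `diff1 <= diff2` rule, so on an exact tie the larger neighbour wins, which can differ from the first-occurrence rule of `argmin`.

```python
import numpy as np


def nearest_searchsorted(known_array, test_array):
    """Hypothetical sort + binary-search nearest lookup.

    O((m + n) log m) time, O(m + n) memory. Assumes known_array is 1-D
    and non-empty. On exact ties the larger neighbour is chosen.
    """
    known_array = np.asarray(known_array)
    test_array = np.asarray(test_array)
    index_sorted = np.argsort(known_array)
    known_sorted = known_array[index_sorted]
    last = known_sorted.size - 1
    # idx1: first sorted element >= value (clamped); idx2: its left neighbour
    idx1 = np.clip(np.searchsorted(known_sorted, test_array), 0, last)
    idx2 = np.clip(idx1 - 1, 0, last)
    diff1 = np.abs(known_sorted[idx1] - test_array)
    diff2 = np.abs(test_array - known_sorted[idx2])
    indices = index_sorted[np.where(diff1 <= diff2, idx1, idx2)]
    residual = test_array - known_array[indices]
    return indices, residual


rng = np.random.default_rng(1)
known_array = rng.random(1000)
test_array = rng.uniform(-0.5, 1.5, 500)  # includes values outside the known range
indices, residual = nearest_searchsorted(known_array, test_array)

# Brute-force reference for comparison
brute = np.abs(test_array[None, :] - known_array[:, None]).argmin(axis=0)
print(np.array_equal(indices, brute))
```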


    tl;dr: use numpy.searchsorted()

    from timeit import timeit
    import numpy as np

    known_array = np.arange(-10, 10)
    test_array = np.random.randint(-10, 10, 1000)
    number = 1000

    def base(known_array, test_array):
        def find_nearest(known_array, value):
            idx = (np.abs(known_array - value)).argmin()
            return idx
        # integer index array, independent of known_array's dtype
        indices = np.zeros_like(test_array, dtype=np.intp)
        for i in range(len(test_array)):
            indices[i] = find_nearest(known_array, test_array[i])
        return indices

    def diffs(known_array, test_array):
        differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))
        indices = np.abs(differences).argmin(axis=0)
        return indices

    def searchsorted1(known_array, test_array):
        index_sorted = np.argsort(known_array)
        known_array_sorted = known_array[index_sorted]
        idx1 = np.searchsorted(known_array_sorted, test_array)
        # clamp insertion points past the end to the last element
        idx1[idx1 == len(known_array)] = len(known_array)-1
        idx2 = np.clip(idx1 - 1, 0, len(known_array_sorted)-1)
        diff1 = known_array_sorted[idx1] - test_array
        diff2 = test_array - known_array_sorted[idx2]
        indices2 = index_sorted[np.where(diff1 <= diff2, idx1, idx2)]
        return indices2

    def searchsorted2(known_array, test_array):
        index_sorted = np.argsort(known_array)
        known_array_sorted = known_array[index_sorted]
        # midpoints between neighbouring sorted values (cast to float32 by astype('f'))
        known_array_middles = known_array_sorted[1:] - np.diff(known_array_sorted.astype('f'))/2
        idx1 = np.searchsorted(known_array_middles, test_array)
        indices = index_sorted[idx1]
        return indices

    def time_f(func_name):
        # globals=globals() lets timeit see the module-level names (Python 3.5+)
        return timeit(func_name + "(known_array, test_array)",
                      globals=globals(), number=number)

    print('Speedups:')
    base_time = time_f('base')
    for func_name in ['diffs', 'searchsorted1', 'searchsorted2']:
        print(func_name + ' is x%.1f faster than base.' % (base_time / time_f(func_name)))

    Output:

    Speedups:
    diffs is x29.9 faster than base.
    searchsorted1 is x37.4 faster than base.
    searchsorted2 is x64.3 faster than base.
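`searchsorted2` works because the nearest known value can only change at the midpoints between neighbouring sorted values, so a single binary search over those midpoints yields the nearest element directly. Below is a small worked example of that idea with made-up values (not from the benchmark). Unlike `searchsorted2`, it keeps the midpoints in float64 instead of casting to float32 with `astype('f')`, and it also computes the residual. A test value lying exactly on a midpoint goes to the lower neighbour, because `np.searchsorted` defaults to `side='left'`.

```python
import numpy as np

# Illustrative values only
known_array = np.array([5.0, 1.0, 9.0])
test_array = np.array([0.0, 3.5, 8.0, 10.0])

index_sorted = np.argsort(known_array)                    # [1, 0, 2]
known_sorted = known_array[index_sorted]                  # [1., 5., 9.]
# Decision boundaries: midpoints between neighbouring sorted values
middles = known_sorted[:-1] + np.diff(known_sorted) / 2   # [3., 7.]
# The interval a test value falls into is the sorted position of its nearest value
pos = np.searchsorted(middles, test_array)                # [0, 1, 2, 2]
indices = index_sorted[pos]                               # [1, 0, 2, 2]
residual = test_array - known_array[indices]              # [-1., -1.5, -1., 1.]
```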


    You could, for example, compute all the differences in one go:

    differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))

    and then use argmin together with fancy indexing and np.diagonal to get the desired indices and differences:

    indices = np.abs(differences).argmin(axis=0)
    residual = np.diagonal(differences[indices,])

    So with

    >>> known_array = np.array([-24, -18, -13, -30,  29])
    >>> test_array = np.array([-6,  4, -6,  4,  8, -4,  8, -6,  2,  8])

    one gets

    >>> indices
    array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
    >>> residual
    array([ 7, 17,  7, 17, 21,  9, 21,  7, 15, 21])
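The `np.diagonal` step above first builds `differences[indices,]`, an n × n array for n = len(test_array), only to keep its diagonal. A sketch of the same computation using paired fancy indexing instead, which reads exactly one element per column; it reuses the example arrays from this answer, and the final assertion checks that both forms agree:

```python
import numpy as np

known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])

# differences[i, j] = test_array[j] - known_array[i]
differences = test_array[np.newaxis, :] - known_array[:, np.newaxis]
indices = np.abs(differences).argmin(axis=0)
# Pair row indices[j] with column j directly, no (n, n) intermediate
residual = differences[indices, np.arange(test_array.size)]

assert np.array_equal(residual, np.diagonal(differences[indices]))
print(residual.tolist())  # [7, 17, 7, 17, 21, 9, 21, 7, 15, 21]
```

The (m, n) `differences` matrix itself is still allocated, so for large inputs the searchsorted approach remains the memory-friendly choice.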