Python: how to find the closest elements in two arrays?

I have two numpy arrays, for example X=[x1,x2,x3,x4] and y=[y1,y2,y3,y4]. Three of the elements are close to each other, and the fourth may or may not be close.

For example:

X [ 84.04467948 52.42447842 39.13555678 21.99846595]
y [ 78.86529444 52.42447842 38.74910101 21.99846595]

Or it could be:

X [ 84.04467948 60 52.42447842 39.13555678]
y [ 78.86529444 52.42447842 38.74910101 21.99846595]

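I want to define a function that finds the corresponding indices in the two arrays. In the first case:

y[0] corresponds to X[0],
y[1] corresponds to X[1],
y[2] corresponds to X[2],
y[3] corresponds to X[3].

In the second case:

y[0] corresponds to X[0],
y[1] corresponds to X[2],
y[2] corresponds to X[3],
and y[3] corresponds to X[1].

I have not been able to write a function that solves this completely. Please help.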

Related discussion

Combine this answer https://stackoverflow.com/a/8929827/3627387 with https://stackoverflow.com/a/12141207/3627387

Fixed:

def find_closest(alist, target):
    # Return the value in alist that is closest to target
    return min(alist, key=lambda x: abs(x - target))

X = [84.04467948, 52.42447842, 39.13555678, 21.99846595]
Y = [78.86529444, 52.42447842, 38.74910101, 21.99846595]

def list_matching(list1, list2):
    # Work on a copy so that matched values can be removed
    list1_copy = list1[:]
    pairs = []
    for i, e in enumerate(list2):
        elem = find_closest(list1_copy, e)
        # Pair the index in list2 with the index of the match in list1
        pairs.append([i, list1.index(elem)])
        # Remove the match so it cannot be selected twice
        list1_copy.remove(elem)
    return pairs
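One caveat of looking matches up by value is that `list1.index(elem)` always returns the first position of a value, so repeated values in `list1` would all map to the same index. The following is a minimal sketch of a variant that tracks indices instead of values; the name `list_matching_by_index` is hypothetical and not part of the answer above.

```python
def list_matching_by_index(list1, list2):
    # Indices of list1 that have not been matched yet
    remaining = list(range(len(list1)))
    pairs = []
    for i, e in enumerate(list2):
        if not remaining:
            # list2 is longer than list1: nothing left to match
            break
        # Pick the unmatched index whose value is closest to e
        k = min(remaining, key=lambda idx: abs(list1[idx] - e))
        pairs.append([i, k])
        remaining.remove(k)
    return pairs


X2 = [84.04467948, 60, 52.42447842, 39.13555678]
Y2 = [78.86529444, 52.42447842, 38.74910101, 21.99846595]
print(list_matching_by_index(X2, Y2))  # [[0, 0], [1, 2], [2, 3], [3, 1]]
```

Like the original, this is greedy in the order of `list2`, so the result can depend on which element is processed first.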

Related discussion

You can start by precomputing the distance matrix, as shown in this answer:

import numpy as np

X = np.array([84.04467948,60.,52.42447842,39.13555678])
Y = np.array([78.86529444,52.42447842,38.74910101,21.99846595])

dist = np.abs(X[:, np.newaxis] - Y)
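To make the broadcasting step concrete, here is a small sketch that builds the same matrix with explicit loops and compares the two. The name `dist_loop` is introduced only for this illustration.

```python
import numpy as np

X = np.array([84.04467948, 60., 52.42447842, 39.13555678])
Y = np.array([78.86529444, 52.42447842, 38.74910101, 21.99846595])

# X[:, np.newaxis] has shape (4, 1); subtracting Y (shape (4,))
# broadcasts to a (4, 4) matrix with dist[i, j] = |X[i] - Y[j]|
dist = np.abs(X[:, np.newaxis] - Y)

# The same matrix built with explicit loops, for comparison only
dist_loop = np.empty((X.shape[0], Y.shape[0]))
for i in range(X.shape[0]):
    for j in range(Y.shape[0]):
        dist_loop[i, j] = abs(X[i] - Y[j])

print(dist.shape)                    # (4, 4)
print(np.allclose(dist, dist_loop))  # True
```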


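Now you can compute the minimum along one axis (I chose 1, which corresponds to finding the closest element of Y for each X):

potentialClosest = dist.argmin(axis=1)

This may still contain duplicates (as in your case 2). To check for them, you can use np.unique to find all indices of Y that occur in potentialClosest:

closestFound, closestCounts = np.unique(potentialClosest, return_counts=True)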
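As an illustration, the sketch below runs these two steps on the second example from the question and prints the intermediate arrays. The variable `has_duplicates` is added here only for the demonstration.

```python
import numpy as np

X = np.array([84.04467948, 60., 52.42447842, 39.13555678])
Y = np.array([78.86529444, 52.42447842, 38.74910101, 21.99846595])

dist = np.abs(X[:, np.newaxis] - Y)

# For each element of X, the index of the closest element of Y
potentialClosest = dist.argmin(axis=1)
# Which Y indices were chosen, and how often
closestFound, closestCounts = np.unique(potentialClosest, return_counts=True)

print(potentialClosest)  # [0 1 1 2]
print(closestFound)      # [0 1 2]
print(closestCounts)     # [1 2 1]

# Fewer distinct partners than elements of X means some Y was chosen twice
has_duplicates = closestFound.shape[0] != X.shape[0]
print(has_duplicates)    # True
```

Here both X[1] (60) and X[2] (52.42447842) pick Y[1], which is the duplicate that has to be resolved.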


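You can now check for duplicates by testing whether closestFound.shape[0] == X.shape[0]. If that holds, you're golden: potentialClosest contains the partner of every element of X. In your case 2, however, one element occurs twice, so closestFound has only X.shape[0]-1 elements and closestCounts contains a 2 in addition to the 1s. For every element with a count of 1, the partner has already been found. For the two candidates that share a count of 2, you have to pick the closer one; the partner of the one with the larger distance is the single element of Y that is not in closestFound. It can be found as: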

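# Index of the one element of Y that was not chosen by any X
# (np.isin replaces np.in1d, which is deprecated in recent NumPy versions)
missingPartnerIndex = np.where(
    ~np.isin(np.arange(Y.shape[0]), closestFound)
)[0][0]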

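You can then do the matching in a loop (even though there is probably a better way with numpy). This solution is ugly, but it works. Any suggestions for improvement are welcome: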

partners = np.empty_like(X, dtype=int)
nonClosePartnerFound = False
for i in np.arange(X.shape[0]):
    if closestCounts[closestFound==potentialClosest[i]][0]==1:
        # A unique partner was found
        partners[i] = potentialClosest[i]
    else:
        # Partner is not unique
        if nonClosePartnerFound:
            partners[i] = potentialClosest[i]
        else:
            if np.argmin(dist[:, potentialClosest[i]]) == i:
                partners[i] = potentialClosest[i]
            else:
                partners[i] = missingPartnerIndex
                nonClosePartnerFound = True
print(partners)


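This answer only works if exactly one pair is not close. If that is not the case, you have to define how to find the right partner for multiple non-close elements. Sadly, it is neither a very general nor a very nice solution, but hopefully you will find it a useful starting point.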
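One possible definition for the case of several non-close elements is to match the globally closest pair first, then the next closest pair among the remaining elements, and so on. The sketch below takes that approach. It is a different technique (greedy matching over the sorted distance matrix) from the loop above, and the name `greedy_match` is hypothetical.

```python
import numpy as np

def greedy_match(X, Y):
    """For each element of X, return the index of its partner in Y
    (-1 if Y runs out of elements)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    dist = np.abs(X[:, np.newaxis] - Y)

    # All (i, j) pairs, ordered from smallest to largest distance
    order = np.argsort(dist, axis=None, kind="stable")
    rows, cols = np.unravel_index(order, dist.shape)

    partners = np.full(X.shape[0], -1, dtype=int)
    used_y = np.zeros(Y.shape[0], dtype=bool)
    for i, j in zip(rows, cols):
        # Accept a pair only if both of its elements are still unmatched
        if partners[i] == -1 and not used_y[j]:
            partners[i] = j
            used_y[j] = True
    return partners


X = [84.04467948, 60., 52.42447842, 39.13555678]
Y = [78.86529444, 52.42447842, 38.74910101, 21.99846595]
print(greedy_match(X, Y))  # [0 3 1 2]
```

For the second example this gives the same pairing as the loop above. It sorts all n*m distances, so it is meant for small arrays.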

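Related discussion

It seems the best approach is to pre-sort both arrays (n log(n)) and then walk through both arrays as in a merge. That is definitely faster than the n*n you mention in your comment.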
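The comment does not spell the traversal out, so the following is only one possible reading of it: sort both arrays, then advance a single pointer through the sorted X while walking through the sorted Y. The function name `nearest_by_merge` is hypothetical, and X is assumed to be non-empty. Note that this finds the nearest X for each Y and does not enforce a one-to-one pairing, so duplicates still have to be resolved afterwards.

```python
import numpy as np

def nearest_by_merge(X, Y):
    """For each element of Y, return the index of the nearest element of X.
    Assumes X is non-empty."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_order = np.argsort(X)  # indices of X in ascending order of value
    y_order = np.argsort(Y)  # indices of Y in ascending order of value

    nearest = np.empty(Y.shape[0], dtype=int)
    p = 0  # position in x_order; it never moves backwards
    for j in y_order:
        # Advance while the next sorted x is at least as close to Y[j]
        while (p + 1 < len(x_order)
               and abs(X[x_order[p + 1]] - Y[j]) <= abs(X[x_order[p]] - Y[j])):
            p += 1
        nearest[j] = x_order[p]
    return nearest


X = [84.04467948, 60., 52.42447842, 39.13555678]
Y = [78.86529444, 52.42447842, 38.74910101, 21.99846595]
print(nearest_by_merge(X, Y))  # [0 2 3 3]
```

In this example Y[2] and Y[3] both land on X[3], which shows why the sorted traversal alone does not answer the original question.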

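Related discussion

The code below simply prints the corresponding indices of the two arrays, as you did in your question, because I am not sure what output you want the function to provide.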

X1 = [84.04467948, 52.42447842, 39.13555678, 21.99846595]
Y1 = [78.86529444, 52.42447842, 38.74910101, 21.99846595]

X2 = [84.04467948, 60, 52.42447842, 39.13555678]
Y2 = [78.86529444, 52.42447842, 38.74910101, 21.99846595]

def find_closest(x_array, y_array):
    # Copy x_array as we will later remove an item with each iteration and
    # require the original later
    remaining_x_array = x_array[:]
    for y_index, y in enumerate(y_array):
        differences = []
        for x in remaining_x_array:
            differences.append(abs(y - x))
        # min_index_remaining is the index position of the closest x value
        # to the given y in remaining_x_array
        min_index_remaining = differences.index(min(differences))
        # related_x is the closest x value of the given y
        related_x = remaining_x_array[min_index_remaining]
        print('Y[%s] corresponds to X[%s]' % (y_index, x_array.index(related_x)))
        # Remove the corresponding x value in remaining_x_array so it
        # cannot be selected twice
        remaining_x_array.pop(min_index_remaining)

This then outputs the following:

find_closest(X1,Y1)
Y[0] corresponds to X[0]
Y[1] corresponds to X[1]
Y[2] corresponds to X[2]
Y[3] corresponds to X[3]

and

find_closest(X2,Y2)
Y[0] corresponds to X[0]
Y[1] corresponds to X[2]
Y[2] corresponds to X[3]
Y[3] corresponds to X[1]

Hope this helps.