C#: Linq and Binary Search - Improve this slow Where statement?

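I have two collections, each with about 40,000 items.

The elements in list 2 are linked to the elements in list 1 via a foreign key.

For each element in list 1, I want to find the corresponding element in list 2.

Something like this: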

foreach (var item in list1)
{
    var match = list2.Where(child => child.ID == item.ChildID).FirstOrDefault();
    item.Child = match;
}

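This works, but it is far too slow.

Both list 1 and list 2 are sorted by these keys when they come from the database. So list1 is sorted by ChildID and list2 is sorted by ID (the same values).

I think a binary search would speed this up dramatically, but I read somewhere that Linq would choose the most appropriate strategy for the list in the Where clause. Maybe I need to explicitly cast to a sorted list? Or maybe I need to implement a custom binary search algorithm with a comparer?

Any insights are appreciated.

Thanks.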

Related discussion

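I couldn't resist answering this :-)

The main reason your code is slow is that your items are read many times over. The art of speed is: only read memory when you need to, and when you do need to read it, read as little as possible.

Here is an example:

Code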

using System.Collections.Generic;
using System.Collections.ObjectModel;

// Master item that collects the detail rows belonging to it.
public class Item
{
    private int _id;
    private List<ItemDetails> _detailItems = new List<ItemDetails>();

    public Item(int id)
    {
        _id = id;
    }

    public void AddItemDetail(ItemDetails itemDetail)
    {
        _detailItems.Add(itemDetail);
    }

    public int Id
    {
        get { return _id; }
    }

    public ReadOnlyCollection<ItemDetails> DetailItems
    {
        get { return _detailItems.AsReadOnly(); }
    }
}

// Detail row that points at its master item through ParentId.
public class ItemDetails
{
    private int _parentId;

    public ItemDetails(int parentId)
    {
        _parentId = parentId;
    }

    public int ParentId
    {
        get { return _parentId; }
    }
}

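Sample code:

The main goal is to scan both lists and compare the item and the item detail at the current indexes. When the detail's ParentId equals the item's Id, add the detail to the item's list and move on to the next detail. If they are not equal, move on to the next parent item.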

// Requires: using System; using System.Collections.Generic; using System.Diagnostics;

// for performance tests
DateTime startDateTime;

// create 2 lists (master/child list)
List<Item> itemList = new List<Item>();
List<ItemDetails> itemDetailList = new List<ItemDetails>();

Debug.WriteLine("# Adding items");
startDateTime = DateTime.Now;

// add items (sorted by Id)
for (int i = 0; i < 400000; i++)
    itemList.Add(new Item(i));

// show how long it took
Debug.WriteLine("Total milliseconds:" + (DateTime.Now - startDateTime).TotalMilliseconds.ToString("0") + "ms");

// add some random details (also sorted, by ParentId)
Debug.WriteLine("# Adding itemdetails");
Random rnd = new Random(DateTime.Now.Millisecond);

startDateTime = DateTime.Now;

int index = 0;
for (int i = 0; i < 800000; i++)
{
    // when the random number is bigger than 2, index is increased by 1
    index += rnd.Next(5) > 2 ? 1 : 0;
    itemDetailList.Add(new ItemDetails(index));
}
Debug.WriteLine("Total milliseconds:" + (DateTime.Now - startDateTime).TotalMilliseconds.ToString("0") + "ms");

// show how many items the lists contain
Debug.WriteLine("ItemList Count:" + itemList.Count);
Debug.WriteLine("ItemDetailList Count:" + itemDetailList.Count);

// matching items
Debug.WriteLine("# Matching items");
startDateTime = DateTime.Now;

int itemIndex = 0;
int itemDetailIndex = 0;

int itemMaxIndex = itemList.Count;
int itemDetailMaxIndex = itemDetailList.Count;

// continue until we reach the end of either list
while ((itemIndex < itemMaxIndex) && (itemDetailIndex < itemDetailMaxIndex))
{
    // if the detail's ParentId matches the item's Id, add the detail to the item
    if (itemList[itemIndex].Id == itemDetailList[itemDetailIndex].ParentId)
    {
        itemList[itemIndex].AddItemDetail(itemDetailList[itemDetailIndex]);
        // move on to the next detail
        itemDetailIndex++;
    }
    else
    {
        // the detail's ParentId didn't match the item's Id, so check the next item
        itemIndex++;
    }
}

Debug.WriteLine("Total milliseconds:" + (DateTime.Now - startDateTime).TotalMilliseconds.ToString("0") + "ms");

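Results

I used 10 times as many items to get a more meaningful measurement:

# Adding items
Total milliseconds: 140ms
# Adding itemdetails
Total milliseconds: 203ms
ItemList Count: 400000
ItemDetailList Count: 800000
# Matching items
Total milliseconds: 265ms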
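The timings above come from subtracting DateTime.Now values, whose resolution depends on the system clock. As a minimal sketch, the same measurement could be taken with System.Diagnostics.Stopwatch, which is intended for measuring elapsed time. The helper name TimingSketch.Measure is hypothetical and not part of the original answer.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs the action once and returns the elapsed
    // wall-clock time in whole milliseconds.
    public static long Measure(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

It might be used by wrapping the matching loop in a lambda, for example `long ms = TimingSketch.Measure(() => { /* matching loop */ });`.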

This was typed quickly and could be cleaner, so I hope you can read it. Have fun playing with it.

Greetings, Jeroen.

Since both lists are sorted on the same value, you can loop through them in parallel:

int index1 = 0, index2 = 0;
while (index1 < list1.Count && index2 < list2.Count) {
    // skip parents whose ChildId is smaller than the current child's Id
    while (index1 < list1.Count && list1[index1].ChildId < list2[index2].Id) index1++;
    if (index1 < list1.Count) {
        // skip children whose Id is smaller than the current parent's ChildId
        while (index2 < list2.Count && list2[index2].Id < list1[index1].ChildId) index2++;
        if (index2 < list2.Count && list1[index1].ChildId == list2[index2].Id) {
            list1[index1].Child = list2[index2];
            index1++;
            index2++;
        }
    }
}

Or:

int index1 = 0, index2 = 0;
while (index1 < list1.Count && index2 < list2.Count) {
    if (list1[index1].ChildId == list2[index2].Id) {
        // match: link the pair and advance both lists
        list1[index1].Child = list2[index2];
        index1++;
        index2++;
    } else {
        // no match: advance whichever list is behind
        if (list1[index1].ChildId < list2[index2].Id) {
            index1++;
        } else {
            index2++;
        }
    }
}
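Here is a self-contained sketch of the second loop. Parent and Child are hypothetical stand-ins for the question's unnamed element types. Unlike the loops above, it advances only index1 on a match, an assumption made here so that several parents may share the same child.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the question's element types.
public class Child
{
    public int Id;
}

public class Parent
{
    public int ChildId;
    public Child Child;
}

public static class MergeJoinSketch
{
    // Links each parent to its child in a single pass over both lists.
    // Assumes parents is sorted by ChildId and children by Id, both ascending.
    public static void Link(IList<Parent> parents, IList<Child> children)
    {
        int index1 = 0, index2 = 0;
        while (index1 < parents.Count && index2 < children.Count)
        {
            int childId = parents[index1].ChildId;
            int id = children[index2].Id;
            if (childId == id)
            {
                parents[index1].Child = children[index2];
                index1++; // keep index2: the next parent may reference the same child
            }
            else if (childId < id)
            {
                index1++; // no child with this id exists, skip the parent
            }
            else
            {
                index2++; // the current parent does not reference this child
            }
        }
    }
}
```

Each index only ever moves forward, so the number of comparisons is at most the sum of the two list lengths.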

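Another efficient option is to build an index by putting one of the lists into a dictionary, although this does not take advantage of the lists being sorted: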

// build the index: child Id -> child
Dictionary<int, TypeOfChild> index = new Dictionary<int, TypeOfChild>();
foreach (TypeOfChild child in list2) {
    index.Add(child.Id, child);
}
// look up each parent's child in the index
foreach (TypeOfParent parent in list1) {
    TypeOfChild child;
    if (index.TryGetValue(parent.ChildId, out child)) {
        parent.Child = child;
    }
}
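The same index could also be built with LINQ's ToDictionary. The sketch below assumes hypothetical Parent and Child types and, like Dictionary.Add above, throws an ArgumentException if two children share the same Id.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for TypeOfChild and TypeOfParent.
public class Child
{
    public int Id;
}

public class Parent
{
    public int ChildId;
    public Child Child;
}

public static class DictionaryIndexSketch
{
    public static void Link(IEnumerable<Parent> parents, IEnumerable<Child> children)
    {
        // Build the index once: one hash insert per child.
        Dictionary<int, Child> index = children.ToDictionary(c => c.Id);

        foreach (Parent parent in parents)
        {
            // TryGetValue performs a single lookup; parents without a
            // matching child are left untouched.
            Child child;
            if (index.TryGetValue(parent.ChildId, out child))
            {
                parent.Child = child;
            }
        }
    }
}
```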

How about this?

var joined = list1.Join(list2, x => x.ChildID, x => x.ID, (x, y) => new { x, y });

foreach (var j in joined)
{
    j.x.Child = j.y;
}
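Note that Enumerable.Join is an inner join. It matches keys through a hash-based lookup over the second sequence rather than rescanning it for every element, and parents without a matching child do not appear in the result at all. A small sketch with hypothetical Parent and Child types that also counts how many parents were linked:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's element types.
public class Child
{
    public int ID;
}

public class Parent
{
    public int ChildID;
    public Child Child;
}

public static class JoinSketch
{
    // Links parents to children and returns the number of parents linked.
    public static int Link(IEnumerable<Parent> list1, IEnumerable<Child> list2)
    {
        var joined = list1.Join(
            list2,
            p => p.ChildID,   // key taken from each parent
            c => c.ID,        // key taken from each child
            (p, c) => new { Parent = p, Child = c });

        int linked = 0;
        foreach (var pair in joined)
        {
            pair.Parent.Child = pair.Child;
            linked++;
        }
        return linked;
    }
}
```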

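I have run into this problem before: LINQ-based searching is very slow compared with DB-based searching because it does not use any index.

Have you considered using a Dictionary instead of a List?

You could store list2 in a Dictionary and call ContainsKey instead of Where; if the key does exist, use the indexer to get the value.

Sample code: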

Dictionary<int, Child> list2 = ...;

...

foreach (var item in list1)
{
    if (list2.ContainsKey(item.ChildID))
        item.Child = list2[item.ChildID];
}

At the cost of the extra memory needed for the index, access through the index is much faster than searching the list.
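The question also asks about binary search. Enumerable.Where on a List<T> walks the list element by element, so a binary search has to be requested explicitly. Below is a minimal hand-written sketch, assuming list2 is sorted by Id in ascending order; the Child type and the FindById name are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the question's child type.
public class Child
{
    public int Id;
}

public static class BinarySearchSketch
{
    // Returns the child with the given id, or null if there is none.
    // Assumes sortedChildren is sorted by Id in ascending order.
    public static Child FindById(IList<Child> sortedChildren, int id)
    {
        int low = 0;
        int high = sortedChildren.Count - 1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2; // avoids overflow of low + high
            int midId = sortedChildren[mid].Id;
            if (midId == id)
            {
                return sortedChildren[mid];
            }
            if (midId < id)
            {
                low = mid + 1;  // target can only be in the upper half
            }
            else
            {
                high = mid - 1; // target can only be in the lower half
            }
        }
        return null;
    }
}
```

Each lookup needs a logarithmic number of comparisons, so linking every parent this way costs O(n log n) overall, while the parallel loop over two sorted lists stays linear.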