C#: Split List into Sublists with LINQ

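Is there any way I can separate a List<SomeObject> into several separate lists of SomeObject, using the item index as the delimiter of each split?

Let me exemplify:

I have a List<SomeObject> and I need a List<List<SomeObject>> or List<SomeObject>[], so that each of these resulting lists will contain a group of 3 items of the original list (sequentially).

E.g.:

Original list: [a, g, e, w, p, s, q, f, x, y, i, m, c]
Resulting lists: [a, g, e], [w, p, s], [q, f, x], [y, i, m], [c]

I'd also need the size of the resulting lists to be a parameter of this function.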

Try the following code:

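using System.Collections.Generic;
using System.Linq;

// Returns List<List<T>>: a List<List<T>> cannot be converted to IList<IList<T>>.
public static List<List<T>> Split<T>(IList<T> source)
{
    return source
        .Select((x, i) => new { Index = i, Value = x }) // pair each item with its index
        .GroupBy(x => x.Index / 3)                        // indexes 0-2 -> group 0, 3-5 -> group 1, ...
        .Select(x => x.Select(v => v.Value).ToList())     // drop the index, keep the values
        .ToList();
}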

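The idea is to first group the elements by their index. Integer division by 3 maps indexes 0-2 to key 0, 3-5 to key 1, and so on, so each key identifies a group of 3 consecutive items. Then each group is converted to a list, and the IEnumerable of lists to a List of Lists.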
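A minimal sketch of the same index-division idea with the chunk size passed in as a parameter, as the question asks. The names GroupByChunking and SplitBy are hypothetical, and the result is returned as List<List<T>> because a List<List<T>> cannot be converted to IList<IList<T>>:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByChunking
{
    // Hypothetical helper: chunk by integer-dividing each item's index by chunkSize.
    // GroupBy reads the whole source before yielding its first group, so this is not lazy.
    public static List<List<T>> SplitBy<T>(IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        return source
            .Select((value, index) => new { Index = index, Value = value })
            .GroupBy(x => x.Index / chunkSize)            // chunkSize = 3: indexes 0-2 -> key 0, 3-5 -> key 1, ...
            .Select(g => g.Select(x => x.Value).ToList())
            .ToList();
    }
}
```

For the question's input, SplitBy("agewpsqfxyimc", 3) should produce five lists, the last one holding only 'c'.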

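Comments

GroupBy does an implicit sort. That can kill performance. What we need is some kind of inverse of SelectMany.
@Justice, it would be nice to have a built-in IEnumerable partitioning system.
@JaredPar: this is easy to do with extension methods. I suspect it isn't in there partly because it wouldn't integrate nicely with SQL. myEnumerable.InGroupsOf(3).Select(subEnumerable => subEnumerable.Sum()).Average(), plus overloads, would be nice.
@Justice, GroupBy could be implemented with hashing. How do you know GroupBy's implementation "can kill performance"?
GroupBy doesn't return anything until it has enumerated all the elements. That's why it's slow. The lists the OP wants are contiguous, so a better method could yield the first sublist [a,g,e] before enumerating any more of the original list.
Take the extreme example of an infinite IEnumerable: GroupBy(x => f(x)).First() will never yield a group. The OP asked about lists, but if we write code that works on IEnumerable and makes only a single pass, we get the performance advantage.
Is there any example code that uses Split?
It's worth noting that .GroupBy(x => x.Index % 3) would split the whole collection evenly into 3 parts, so with 30 items you'd get 3 lists of 10 items. With 30 items, the current example gives you 10 lists of 3.
@Nick But order isn't preserved your way. It's still good to know, but you'd be grouping them into (0,3,6,9,...), (1,4,7,10,...), (2,5,8,11,...). If order doesn't matter that's fine, but in this case it sounds like it does.
@Reafexus thanks for pointing that out.
Check MoreLINQ's usage in this post: stackoverflow.com/questions/13731796/create-batches-in-linq
Would this simple code also work: int i = 0; return source.GroupBy(x => (i++ / 3)).ToList()? It works fine for me.
Cannot convert generic List<List<T>> to generic IList<IList<T>>?
Change IList<IList<T>> Split<T>(IList<T> source) to List<List<T>> Split<T>(IList<T> source).
I had success with .GroupBy(x => Math.Round(x.Index / chunkSize))
On the last .ToList(): Error 45 Cannot implicitly convert type 'System.Collections.Generic.List<System.Collections.Generic.List<T>>' to 'System.Collections.Generic.IList<System.Collections.Generic.IList<T>>'. An explicit conversion exists (are you missing a cast?)
@Yfeldblum please see my answer, without GroupBy: stackoverflow.com/a/53532961/3360759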
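The difference between x.Index / 3 and x.Index % 3 discussed in these comments can be checked with a small sketch. DivVsMod.Describe is a hypothetical helper; it groups Enumerable.Range(0, count), so each value is equal to its own index:

```csharp
using System;
using System.Linq;

public static class DivVsMod
{
    // Groups 0..count-1 by the given key and renders the groups as "a b c | d e f | ...".
    // Because the values come from Enumerable.Range(0, count), key(i) effectively acts on the index.
    public static string Describe(int count, Func<int, int> key) =>
        string.Join(" | ", Enumerable.Range(0, count)
            .GroupBy(key)
            .Select(g => string.Join(" ", g)));
}
```

Describe(9, i => i / 3) returns "0 1 2 | 3 4 5 | 6 7 8" (consecutive chunks of 3), while Describe(9, i => i % 3) returns "0 3 6 | 1 4 7 | 2 5 8": always 3 buckets, with the items interleaved.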

This question is a bit old, but I just wrote this, and I think it's a little more elegant than the other proposed solutions:

/// <summary>
/// Break a list of items into chunks of a specific size
/// </summary>
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
while (source.Any())
{
yield return source.Take(chunksize);
source = source.Skip(chunksize);
}
}

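In general, the approach suggested by CaseyB works fine; in fact, if you are passing in a List<T>, it is hard to fault it. Perhaps I would change it to: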

public static IEnumerable<IEnumerable<T>> ChunkTrivialBetter<T>(this IEnumerable<T> source, int chunksize)
{
var pos = 0;
while (source.Skip(pos).Any())
{
yield return source.Skip(pos).Take(chunksize);
pos += chunksize;
}
}

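This avoids massive call chains. Nonetheless, this approach has a general flaw: it materializes two enumerations per chunk. To highlight the issue, try running: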

foreach (var item in Enumerable.Range(1, int.MaxValue).Chunk(8).Skip(100000).First())
{
Console.WriteLine(item);
}
// wait forever
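Short of waiting forever, the repeated work can be made visible by counting how many elements the source produces while the Take/Skip loop runs. In this sketch SkipTakeCost, CountedRange and the Reads counter are hypothetical instrumentation, and NaiveChunk restates the Take/Skip loop from CaseyB's answer:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeCost
{
    // Hypothetical instrumentation: incremented every time the source produces an element.
    public static int Reads;

    public static IEnumerable<int> CountedRange(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Reads++;
            yield return i;
        }
    }

    // Same Take/Skip loop as CaseyB's Chunk: each Any() and each chunk restarts the source.
    public static IEnumerable<IEnumerable<T>> NaiveChunk<T>(this IEnumerable<T> source, int size)
    {
        while (source.Any())
        {
            yield return source.Take(size);
            source = source.Skip(size);
        }
    }
}
```

Chunking nine counted items into threes with SkipTakeCost.CountedRange(9).NaiveChunk(3).Select(c => c.ToList()).ToList() leaves Reads well above 9, because every Any() call and every chunk starts again from the first element and walks past the prefix that was already skipped.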

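To overcome this, we can try Cameron's approach, which passes the above test with flying colors because it walks the enumeration only once.

The trouble is that it has a different flaw: it materializes every item in each chunk, so memory use can run very high.

To illustrate that, try running: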

foreach (var item in Enumerable.Range(1, int.MaxValue)
.Select(x => x + new string('x', 100000))
.Clump(10000).Skip(100).First())
{
Console.Write('.');
}
// OutOfMemoryException

Finally, any implementation should be able to handle out-of-order iteration of chunks, for example:

Enumerable.Range(1,3).Chunk(2).Reverse().ToArray() // should return [3],[1,2]

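Many highly optimal solutions, like my first revision of this answer, failed there. The same issue can be seen in casperOne's optimized answer.

To address all these issues, you can use the following: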

namespace ChunkedEnumerator
{
public static class Extensions
{
class ChunkedEnumerable<T> : IEnumerable<T>
{
class ChildEnumerator : IEnumerator<T>
{
ChunkedEnumerable<T> parent;
int position;
bool done = false;
T current;

public ChildEnumerator(ChunkedEnumerable<T> parent)
{
this.parent = parent;
position = -1;
parent.wrapper.AddRef();
}

public T Current
{
get
{
if (position == -1 || done)
{
throw new InvalidOperationException();
}
return current;

}
}

public void Dispose()
{
if (!done)
{
done = true;
parent.wrapper.RemoveRef();
}
}

object System.Collections.IEnumerator.Current
{
get { return Current; }
}

public bool MoveNext()
{
position++;

if (position + 1 > parent.chunkSize)
{
done = true;
}

if (!done)
{
done = !parent.wrapper.Get(position + parent.start, out current);
}

return !done;

}

public void Reset()
{
// per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx
throw new NotSupportedException();
}
}

EnumeratorWrapper<T> wrapper;
int chunkSize;
int start;

public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start)
{
this.wrapper = wrapper;
this.chunkSize = chunkSize;
this.start = start;
}

public IEnumerator<T> GetEnumerator()
{
return new ChildEnumerator(this);
}

System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}

}

class EnumeratorWrapper<T>
{
public EnumeratorWrapper (IEnumerable<T> source)
{
SourceEumerable = source;
}
IEnumerable<T> SourceEumerable {get; set;}

Enumeration currentEnumeration;

class Enumeration
{
public IEnumerator<T> Source { get; set; }
public int Position { get; set; }
public bool AtEnd { get; set; }
}

public bool Get(int pos, out T item)
{

if (currentEnumeration != null && currentEnumeration.Position > pos)
{
currentEnumeration.Source.Dispose();
currentEnumeration = null;
}

if (currentEnumeration == null)
{
currentEnumeration = new Enumeration { Position = -1, Source = SourceEumerable.GetEnumerator(), AtEnd = false };
}

item = default(T);
if (currentEnumeration.AtEnd)
{
return false;
}

while(currentEnumeration.Position < pos)
{
currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext();
currentEnumeration.Position++;

if (currentEnumeration.AtEnd)
{
return false;
}

}

item = currentEnumeration.Source.Current;

return true;
}

int refs = 0;

// needed for dispose semantics
public void AddRef()
{
refs++;
}

public void RemoveRef()
{
refs--;
if (refs == 0 && currentEnumeration != null)
{
var copy = currentEnumeration;
currentEnumeration = null;
copy.Source.Dispose();
}
}
}

public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
if (chunksize < 1) throw new InvalidOperationException();

var wrapper = new EnumeratorWrapper<T>(source);

int currentPos = 0;
T ignore;
try
{
wrapper.AddRef();
while (wrapper.Get(currentPos, out ignore))
{
yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos);
currentPos += chunksize;
}
}
finally
{
wrapper.RemoveRef();
}
}
}

class Program
{
static void Main(string[] args)
{
int i = 10;
foreach (var group in Enumerable.Range(1, int.MaxValue).Skip(10000000).Chunk(3))
{
foreach (var n in group)
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
if (i-- == 0) break;
}

var stuffs = Enumerable.Range(1, 10).Chunk(2).ToArray();

foreach (var idx in new [] {3,2,1})
{
Console.Write("idx " + idx + " ");
foreach (var n in stuffs[idx])
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
}

/*

10000001 10000002 10000003
10000004 10000005 10000006
10000007 10000008 10000009
10000010 10000011 10000012
10000013 10000014 10000015
10000016 10000017 10000018
10000019 10000020 10000021
10000022 10000023 10000024
10000025 10000026 10000027
10000028 10000029 10000030
10000031 10000032 10000033
idx 3 7 8
idx 2 5 6
idx 1 3 4
*/

Console.ReadKey();

}

}
}

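You could also introduce a round of optimisations for out-of-order iteration of chunks, which is out of scope here.

As to which method you should choose? It totally depends on the problem you are trying to solve. If you are not concerned with the first flaw, the simple answer is incredibly appealing.

Note that, as with most methods, this is not safe for multithreading and things can get weird; if you wish to make it thread-safe, you would need to amend EnumeratorWrapper.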

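You could use a number of queries that use Take and Skip, but I believe that would add too many iterations over the original list.

Rather, I think you should create an iterator of your own, like so: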

public static IEnumerable<IEnumerable<T>> GetEnumerableOfEnumerables<T>(
IEnumerable<T> enumerable, int groupSize)
{
// The list to return.
List<T> list = new List<T>(groupSize);

// Cycle through all of the items.
foreach (T item in enumerable)
{
// Add the item.
list.Add(item);

// If the list has the number of elements, return that.
if (list.Count == groupSize)
{
// Return the list.
yield return list;

// Set the list to a new list.
list = new List<T>(groupSize);
}
}

// Return the remainder if there is any,
if (list.Count != 0)
{
// Return the list.
yield return list;
}
}

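You can then call this, and it is LINQ-enabled, so you can perform other operations on the resulting sequences.

In light of Sam's answer, I felt there was an easier way to do this without:

- Iterating through the list again (which I didn't do originally)
- Materializing the items in groups before releasing the chunk (for large chunks of items, there would be memory issues)
- All of the code that Sam posted

That said, here's another pass, which I've codified in an extension method to IEnumerable<T> called Chunk: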

public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source,
int chunkSize)
{
// Validate parameters.
if (source == null) throw new ArgumentNullException("source");
if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize",
"The chunkSize parameter must be a positive value.");

// Call the internal implementation.
return source.ChunkInternal(chunkSize);
}

Nothing surprising up there, just basic error checking.

Moving on to ChunkInternal:

private static IEnumerable<IEnumerable<T>> ChunkInternal<T>(
this IEnumerable<T> source, int chunkSize)
{
// Validate parameters.
Debug.Assert(source != null);
Debug.Assert(chunkSize > 0);

// Get the enumerator. Dispose of when done.
using (IEnumerator<T> enumerator = source.GetEnumerator())
do
{
// Move to the next element. If there's nothing left
// then get out.
if (!enumerator.MoveNext()) yield break;

// Return the chunked sequence.
yield return ChunkSequence(enumerator, chunkSize);
} while (true);
}

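Basically, it gets the IEnumerator<T> and manually iterates through each item. Before each chunk it calls MoveNext to check whether there are any items left to enumerate; if there aren't, it breaks out with yield break.

Once it detects there are items in the sequence, it delegates the responsibility for the inner IEnumerable<T> implementation to ChunkSequence: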

private static IEnumerable<T> ChunkSequence<T>(IEnumerator<T> enumerator,
int chunkSize)
{
// Validate parameters.
Debug.Assert(enumerator != null);
Debug.Assert(chunkSize > 0);

// The count.
int count = 0;

// There is at least one item. Yield and then continue.
do
{
// Yield the item.
yield return enumerator.Current;
} while (++count < chunkSize && enumerator.MoveNext());
}

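Since MoveNext was already called on the IEnumerator<T> passed to ChunkSequence, it yields the item returned by Current and then increments the count, making sure never to return more than chunkSize items and moving to the next item in the sequence after every iteration (short-circuiting once the number of items yielded reaches the chunk size).

If there are no items left, the ChunkInternal method will make another pass in the outer loop, but when MoveNext is called a second time, it will still return false, as per the documentation (emphasis mine):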

If MoveNext passes the end of the collection, the enumerator is
positioned after the last element in the collection and MoveNext
returns false. When the enumerator is at this position, subsequent
calls to MoveNext also return false until Reset is called.

At this point, the loop will break, and the sequence of sequences will terminate.
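The documented behaviour this relies on (once MoveNext has returned false, further calls keep returning false) is easy to observe. A minimal sketch using a one-element List<int> accessed through IEnumerator<int>; MoveNextContract and ProbeAfterEnd are hypothetical names:

```csharp
using System.Collections.Generic;

public static class MoveNextContract
{
    // Calls MoveNext three times on a one-element list and reports each result.
    public static bool[] ProbeAfterEnd()
    {
        using (IEnumerator<int> e = new List<int> { 42 }.GetEnumerator())
        {
            bool first = e.MoveNext();   // true: positioned on 42
            bool second = e.MoveNext();  // false: moved past the last element
            bool third = e.MoveNext();   // still false: the enumerator stays past the end
            return new[] { first, second, third };
        }
    }
}
```

ProbeAfterEnd() should return { true, false, false }, which is why the extra outer MoveNext call in ChunkInternal after the last chunk simply ends the loop.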

Here's a simple test:

static void Main()
{
string s ="agewpsqfxyimc";

int count = 0;

// Group by three.
foreach (IEnumerable<char> g in s.Chunk(3))
{
// Print out the group.
Console.Write("Group: {0} - ", ++count);

// Print the items.
foreach (char c in g)
{
// Print the item.
Console.Write(c + ", ");
}

// Finish the line.
Console.WriteLine();
}
}

Output:

Group: 1 - a, g, e,
Group: 2 - w, p, s,
Group: 3 - q, f, x,
Group: 4 - y, i, m,
Group: 5 - c,

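An important note: this won't work if you don't drain the entire child sequence, or if you break at any point in the parent sequence. That's an important caveat, but if your use case is that you will consume every element of the sequence of sequences, this will work for you.

Additionally, it will do strange things if you play with the order, just as Sam's did at one point.

Comments

I think this is the best solution... the only problem is that a list doesn't have Length... it has Count. But that's easy to change. We could make this even better by not constructing lists at all, but returning IEnumerables that hold a reference to the main list plus an offset/length combination. Then, if the group size is big, we don't waste memory. Comment if you want me to write it up.
@Amir I'd like to see that written up.
This is nice and fast. Cameron posted a very similar one after yours; the only caveat is that it buffers chunks, which can lead to out-of-memory errors if chunks and item sizes are large. See my answer for an alternative, albeit much hairier, answer.
@SamSaffron Yes, if you have a large number of items in the List<T>, you're obviously going to have memory issues because of the buffering. In retrospect I should have noted that in the answer, but at the time the focus seemed to be on too many iterations. That said, your solution is indeed hairier. I haven't tested it, but now it has me wondering whether there's a less hairy solution.
@casperOne yeah... Google gave me this page when I was searching for a way to split enumerables. For my specific use case I am splitting an insanely large list of records returned from the db; if I materialized them into a list it would blow up (in fact Dapper has a buffered: false option just for this use case).
Would love to see a cleaner solution that keeps the same performance; it's a very interesting problem.
@SamSaffron I'll work on it today. Do you need it to be thread-safe? I ran into problems pulling things from a database a long time ago. Chunking in the database (if your table allows it) helped a lot: I had a super-deep table with hundreds of millions of rows and a natural partition on it, so I could chunk at the database level without the queries going crazy.
Thread safety is not really an issue I need to solve. My solution is quite easily adaptable to thread safety, but multithreading would probably cost a fair bit of performance unless other tweaks are made. I found this problem quite interesting, which is why I spent an hour or two on it.
Initially I did pass the enumerator around, and my implementation was very simple, but Enumerable.Range(0, 100).Chunk(3).Reverse().ToArray() was really hard to fix.
I originally needed thread safety because my LINQ query was context.Select(GetJobData).Clump(10).AsParallel().Select(ProcessJobs); I needed to clump the jobs so each thread would run for a reasonable amount of time instead of doing lots of thread switching. :P
@Sam Saffron: "Enumerable.Range(0, 100).Chunk(3).Reverse().ToArray()" can be fixed simply with "Enumerable.Range(0, 100).Chunk(3).Select(e => e.ToList()).Reverse().ToArray()", as long as you're dealing with small data sets, since it obviously materializes all the data in memory up front.
@Sam Saffron: if you need to fetch large data sets from a database, I'd consider digging into expression trees (e.g. LINQ to SQL, if it's that kind of database) and working out a solution that maps each "chunk" or "partition" onto the query underneath... which, of course, is not simple stuff ^^
In terms of the efficiency/simplicity trade-off, this is clearly the winner. But there are too many comments :)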

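Well, here's my take on it:

- Completely lazy: works on infinite enumerables
- No intermediate copying/buffering
- O(n) execution time
- Works also when inner sequences are only partially consumed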

public static IEnumerable<IEnumerable<T>> Chunks<T>(this IEnumerable<T> enumerable,
int chunkSize)
{
if (chunkSize < 1) throw new ArgumentException("chunkSize must be positive");

using (var e = enumerable.GetEnumerator())
while (e.MoveNext())
{
var remaining = chunkSize; // elements remaining in the current chunk
var innerMoveNext = new Func<bool>(() => --remaining > 0 && e.MoveNext());

yield return e.GetChunk(innerMoveNext);
while (innerMoveNext()) {/* discard elements skipped by inner iterator */}
}
}

private static IEnumerable<T> GetChunk<T>(this IEnumerator<T> e,
Func<bool> innerMoveNext)
{
do yield return e.Current;
while (innerMoveNext());
}

Example usage

var src = new [] {1, 2, 3, 4, 5, 6};

var c3 = src.Chunks(3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = src.Chunks(4); // {{1, 2, 3, 4}, {5, 6}};

var sum = c3.Select(c => c.Sum()); // {6, 15}
var count = c3.Count(); // 2
var take2 = c3.Select(c => c.Take(2)); // {{1, 2}, {4, 5}}

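Explanation

The code works by nesting two yield-based iterators.

The outer iterator must keep track of how many elements have been effectively consumed by the inner (chunk) iterator. This is done by closing over remaining with innerMoveNext(). Unconsumed elements of a chunk are discarded before the next chunk is yielded by the outer iterator. This is necessary because otherwise you get inconsistent results when the inner enumerables are not (completely) consumed (e.g. c3.Count() would return 6).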
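To see why the discard loop matters, here is a compact restatement of the two iterators above with a hypothetical drainSkipped switch, so the behaviour with and without the loop can be compared (the class name LazyChunkDemo is also made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyChunkDemo
{
    // Restatement of the nested-iterator Chunks; drainSkipped is a hypothetical switch
    // that turns the "discard elements skipped by inner iterator" loop on or off.
    public static IEnumerable<IEnumerable<T>> Chunks<T>(this IEnumerable<T> source, int chunkSize, bool drainSkipped)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        using (var e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                var remaining = chunkSize; // elements left in the current chunk
                Func<bool> innerMoveNext = () => --remaining > 0 && e.MoveNext();

                yield return GetChunk(e, innerMoveNext);

                if (drainSkipped)
                    while (innerMoveNext()) { /* skip what the caller did not read */ }
            }
        }
    }

    private static IEnumerable<T> GetChunk<T>(IEnumerator<T> e, Func<bool> innerMoveNext)
    {
        do yield return e.Current;
        while (innerMoveNext());
    }
}
```

For new[] { 1, 2, 3, 4, 5, 6 }, Chunks(3, drainSkipped: true).Count() gives 2, while Chunks(3, drainSkipped: false).Count() gives 6, because each unread chunk leaves the shared enumerator on its first element.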

Note: The answer has been updated to address the shortcomings pointed out by @aolszowka.

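Comments

Very nice. My "correct" solution was far more complicated than this. This is the #1 answer IMHO.
This suffers from unexpected (from an API standpoint) behavior when ToArray() is called, and it is also not thread-safe.
@aolszowka: could you please elaborate?
@3dGrabber Perhaps it was how I refactored your code (sorry, it's a bit too long to paste here; basically, instead of an extension method I passed in the sourceEnumerator). The test case I used was something like this: int[] arrayToSort = new int[] { 9, 7, 2, 6, 3, 4, 8, 5, 1, 10, 11, 12, 13 }; var source = Chunkify(arrayToSort, 3).ToArray(); which resulted in source indicating 13 chunks (the number of elements). That made sense to me, since unless you query the inner enumerations, the enumerator isn't advanced.
@3dGrabber (cont.) Also, because the inner enumerations aren't produced until they're accessed, attempting any kind of threading across the inner enumerations could lead to a race condition where one thread calls sourceEnumerator.MoveNext() before another thread reads sourceEnumerator.Current. At least that's what I thought; let me know if I'm wrong, I'm willing to learn!
@aolszowka: very valid points. I've added a warning and a usage section. The code assumes that you iterate over the inner enumerable. With your solution you forfeit the laziness, though. I think it should be possible to get the best of both worlds with a custom, caching IEnumerator. If I find a solution I'll post it here...
yield return enumerator.GetChunk(chunkSize).ToArray(); is a simple solution at the cost of buffering. Another option is to throw an exception when the previous inner enumerable hasn't been exhausted by the time the next outer element is yielded. (I suspect you were thinking of buffering only when needed, which would be cool too.)
Fixed the problem and updated the answer.
@3dGrabber I tried to use this (because it's elegant) in a non-lazy setting, to split larger collections of complex objects (basically a get followed by .ToList()), but couldn't get it to return more than the first chunk. No custom enumerator. I realize this is vague, but any idea why that would happen with a straight (non-generic) copy of this?
@3dGrabber never mind, sorry for the bother. In my case I had extended innerMoveNext to do some logging, and could see that the call to while (innerMoveNext()) {/* discard elements skipped by inner iterator */} was the problem; as soon as I removed it, it worked for me. I'm not familiar enough with your source to know whether that line serves some purpose, but in my case it seemed redundant given the enumeration in GetChunk.
After spending some precious time on this (ignore my earlier, now hidden comments), I agree with CaseyB that this is a great, elegant, and (ultimately) simple solution. But I think the discard elements... comment line serves no measurable purpose (because of the do/while in the "inner" loop, remaining is always less than 0), and it obscures the issue: try changing int chunkSize to uint and enjoy the party. (Otherwise, if you "partially consume", you have to stop before that line anyway.) Posting this just in case people lose a day or two trying to understand it, as I did.
To see the purpose of while (innerMoveNext()), comment it out and run c3.Count() from the example.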

Completely lazy, no counting or copying:

public static class EnumerableExtensions
{

public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int len)
{
if (len < 1)
throw new ArgumentOutOfRangeException(nameof(len));

using (var enumer = source.GetEnumerator())
{
while (enumer.MoveNext())
{
yield return Take(enumer.Current, enumer, len);
}
}
}

private static IEnumerable<T> Take<T>(T head, IEnumerator<T> tail, int len)
{
while (true)
{
yield return head;
if (--len == 0)
break;
if (tail.MoveNext())
head = tail.Current;
else
break;
}
}
}

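I think the following suggestion would be the fastest. I am sacrificing the laziness of the source enumerable for the ability to use Array.Copy and to know the length of each sublist ahead of time.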

public static IEnumerable<T[]> Chunk<T>(this IEnumerable<T> items, int size)
{
T[] array = items as T[] ?? items.ToArray();
for (int i = 0; i < array.Length; i+=size)
{
T[] chunk = new T[Math.Min(size, array.Length - i)];
Array.Copy(array, i, chunk, 0, chunk.Length);
yield return chunk;
}
}

We can improve @JaredPar's solution to do true lazy evaluation. We use a GroupAdjacentBy method that yields groups of consecutive elements with the same key:

sequence
.Select((x, i) => new { Value = x, Index = i })
.GroupAdjacentBy(x=>x.Index/3)
.Select(g=>g.Select(x=>x.Value))

Because the groups are yielded one by one, this solution works efficiently with long or infinite sequences.
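GroupAdjacentBy is not part of System.Linq (MoreLINQ offers a comparable GroupAdjacent). A minimal sketch of such a helper, assuming it only needs to yield runs of consecutive elements that share a key; for simplicity each run is returned as a List<T> rather than an IGrouping, and the class name AdjacentGrouping is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AdjacentGrouping
{
    // Hypothetical helper: yields each run of consecutive elements with equal keys.
    // A run is buffered only until the key changes, so the source is streamed run by run.
    public static IEnumerable<List<T>> GroupAdjacentBy<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        List<T> run = null;
        TKey runKey = default(TKey);

        foreach (var item in source)
        {
            var key = keySelector(item);
            if (run != null && comparer.Equals(key, runKey))
            {
                run.Add(item);      // same key as the previous element: extend the current run
                continue;
            }

            if (run != null)
                yield return run;   // key changed: the previous run is complete

            run = new List<T> { item };
            runKey = key;
        }

        if (run != null)
            yield return run;       // flush the last run
    }
}
```

With this helper, the pipeline from the answer can take the first two chunks of Enumerable.Range(0, int.MaxValue) without reading past the element that starts the third chunk.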

System.Interactive provides Buffer() for this purpose. Some quick testing shows performance similar to Sam's solution.

I wrote a Clump extension method several years ago. Works great, and is the fastest implementation here. :P

/// <summary>
/// Clumps items into same size lots.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="source">The source list of items.</param>
/// <param name="size">The maximum size of the clumps to make.</param>
/// <returns>A list of list of items, where each list of items is no bigger than the size given.</returns>
public static IEnumerable<IEnumerable<T>> Clump<T>(this IEnumerable<T> source, int size)
{
if (source == null)
throw new ArgumentNullException("source");
if (size < 1)
throw new ArgumentOutOfRangeException("size","size must be greater than 0");

return ClumpIterator<T>(source, size);
}

private static IEnumerable<IEnumerable<T>> ClumpIterator<T>(IEnumerable<T> source, int size)
{
Debug.Assert(source != null,"source is null.");

T[] items = new T[size];
int count = 0;
foreach (var item in source)
{
items[count] = item;
count++;

if (count == size)
{
yield return items;
items = new T[size];
count = 0;
}
}
if (count > 0)
{
if (count == size)
yield return items;
else
{
T[] tempItems = new T[count];
Array.Copy(items, tempItems, count);
yield return tempItems;
}
}
}

Here's a list-splitting routine I wrote a couple of months ago:

public static List<List<T>> Chunk<T>(
List<T> theList,
int chunkSize
)
{
List<List<T>> result = theList
.Select((x, i) => new {
data = x,
indexgroup = i / chunkSize
})
.GroupBy(x => x.indexgroup, x => x.data)
.Select(g => new List<T>(g))
.ToList();

return result;
}

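This is an old question, but this is what I ended up with. It enumerates the enumerable only once, but does create a list for each partition. It doesn't suffer from unexpected behavior when ToArray() is called, as some of the implementations do: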

public static IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int chunkSize)
{
if (source == null)
{
throw new ArgumentNullException("source");
}

if (chunkSize < 1)
{
throw new ArgumentException("Invalid chunkSize:" + chunkSize);
}

using (IEnumerator<T> sourceEnumerator = source.GetEnumerator())
{
IList<T> currentChunk = new List<T>();
while (sourceEnumerator.MoveNext())
{
currentChunk.Add(sourceEnumerator.Current);
if (currentChunk.Count == chunkSize)
{
yield return currentChunk;
currentChunk = new List<T>();
}
}

if (currentChunk.Any())
{
yield return currentChunk;
}
}
}

I've found this little snippet does the job quite nicely.

public static IEnumerable<List<T>> Chunked<T>(this List<T> source, int chunkSize)
{
var offset = 0;

while (offset < source.Count)
{
yield return source.GetRange(offset, Math.Min(source.Count - offset, chunkSize));
offset += chunkSize;
}
}

We found David B's solution worked best, but we adapted it into a more general solution:

list.GroupBy(item => item.SomeProperty)
.Select(group => new List<T>(group))
.ToArray();

Old code, but this is what I've been using:

public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
{
var toReturn = new List<T>(max);
foreach (var item in source)
{
toReturn.Add(item);
if (toReturn.Count == max)
{
yield return toReturn;
toReturn = new List<T>(max);
}
}
if (toReturn.Any())
{
yield return toReturn;
}
}

How about this one?

var input = new List<string> {"a","g","e","w","p","s","q","f","x","y","i","m","c" };
var k = 3;

// ceiling(input.Count / k) chunks; 0 chunks for an empty list
var res = Enumerable.Range(0, (input.Count + k - 1) / k)
.Select(i => input.GetRange(i * k, Math.Min(k, input.Count - i * k)))
.ToList();

As far as I know, GetRange() is linear in the number of items taken, so this should perform well.
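The number of chunks is the ceiling of Count / k. A small sketch of that arithmetic (ChunkMath.ChunkCount is a hypothetical name), written as count / k plus one when there is a remainder, which avoids the int overflow that (count + k - 1) / k can hit near int.MaxValue and gives 0 for an empty list:

```csharp
using System;

public static class ChunkMath
{
    // Ceiling division for a non-negative count and a positive chunk size k.
    public static int ChunkCount(int count, int k)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        if (k < 1) throw new ArgumentOutOfRangeException(nameof(k));

        // full chunks, plus one partial chunk if anything is left over
        return count / k + (count % k == 0 ? 0 : 1);
    }
}
```

For the 13-item example with k = 3 this gives 13 / 3 = 4 full chunks plus one partial chunk, i.e. 5.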

The following is the most compact solution I could come up with that is O(N).

public static IEnumerable<T[]> Chunk<T>(IEnumerable<T> source, int chunksize)
{
var list = source as IList<T> ?? source.ToList();
for (int start = 0; start < list.Count; start += chunksize)
{
T[] chunk = new T[Math.Min(chunksize, list.Count - start)];
for (int i = 0; i < chunk.Length; i++)
chunk[i] = list[start + i];

yield return chunk;
}
}

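If the list is a System.Collections.Generic.List<T>, you can use its CopyTo method to copy elements of your list into separate sub-arrays: the CopyTo(int index, T[] array, int arrayIndex, int count) overload lets you specify the start element and the number of elements to copy.

You could also make 3 clones of your original list and use RemoveRange on each one to shrink it to the size you want.

Or just create a helper method to do it for you.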
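A minimal sketch of such a helper built on List<T>.CopyTo(int index, T[] array, int arrayIndex, int count); CopyToChunking and SplitWithCopyTo are hypothetical names, and the last array is shorter when the list length is not a multiple of the chunk size:

```csharp
using System;
using System.Collections.Generic;

public static class CopyToChunking
{
    // Hypothetical helper: copies each window of chunkSize elements into its own array.
    public static List<T[]> SplitWithCopyTo<T>(List<T> list, int chunkSize)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var result = new List<T[]>();
        for (int start = 0; start < list.Count; start += chunkSize)
        {
            var chunk = new T[Math.Min(chunkSize, list.Count - start)];
            list.CopyTo(start, chunk, 0, chunk.Length); // source index, target, target index, count
            result.Add(chunk);
        }
        return result;
    }
}
```

Each chunk is a copy, so later changes to the original list do not show up in the arrays.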

This is an old question, but I have a different approach. I use Skip to move to the desired offset and Take to extract the desired number of elements:

public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source,
int chunkSize)
{
if (chunkSize <= 0)
throw new ArgumentOutOfRangeException(nameof(chunkSize), $"{nameof(chunkSize)} should be > 0");

var nbChunks = (int)Math.Ceiling((double)source.Count()/chunkSize);

return Enumerable.Range(0, nbChunks)
.Select(chunkNb => source.Skip(chunkNb*chunkSize)
.Take(chunkSize));
}

Just putting in my two cents. If you want to "bucket" the list (visualized left to right), you could do the following:

public static List<List<T>> Buckets<T>(this List<T> source, int numberOfBuckets)
{
List<List<T>> result = new List<List<T>>();
for (int i = 0; i < numberOfBuckets; i++)
{
result.Add(new List<T>());
}

int count = 0;
while (count < source.Count())
{
var mod = count % numberOfBuckets;
result[mod].Add(source[count]);
count++;
}
return result;
}

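Using modular (hash-based) partitioning. Note that this creates ceil(count / chunkSize) buckets of uneven size rather than chunks of exactly chunkSize items, it does not keep consecutive items together, and because string hash codes are randomized per process on .NET Core, the bucket assignment can change between runs: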

public IEnumerable<IEnumerable<string>> Split(IEnumerable<string> input, int chunkSize)
{
var chunks = (int)Math.Ceiling((double)input.Count() / (double)chunkSize);
return Enumerable.Range(0, chunks).Select(id => input.Where(s => Math.Abs(s.GetHashCode() % chunks) == id)); // Math.Abs: hash codes can be negative
}

Another way is to use the Rx Buffer operator:

//using System.Linq;
//using System.Reactive.Linq;
//using System.Reactive.Threading.Tasks;

var observableBatches = anEnumerable.ToObservable().Buffer(size);

var batches = aList.ToObservable().Buffer(size).ToList().ToTask().GetAwaiter().GetResult();

For anyone interested in a packaged/maintained solution, the MoreLINQ library provides the Batch extension method, which matches your requested behavior:

IEnumerable<char> source = "Example string";
IEnumerable<IEnumerable<char>> chunksOfThreeChars = source.Batch(3);

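Batch is implemented similarly to Cameron MacFarland's answer, with the addition of an overload for transforming the chunk/batch before returning it, and it performs quite well.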
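On .NET 6 and later, System.Linq itself also includes Enumerable.Chunk(size), which returns IEnumerable<TSource[]> and makes the last chunk shorter when needed. A minimal usage sketch, assuming a .NET 6+ target; the wrapper class BuiltInChunkDemo is only there to keep the snippet self-contained:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BuiltInChunkDemo
{
    // Requires .NET 6 or later, where Enumerable.Chunk is available.
    public static List<string> ChunksOfThree(string text) =>
        text.Chunk(3)                          // IEnumerable<char[]>
            .Select(chunk => new string(chunk))
            .ToList();
}
```

ChunksOfThree("agewpsqfxyimc") should yield "age", "wps", "qfx", "yim", "c", matching the grouping asked for in the question.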

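I took the primary answer and turned it into an IoC-style container that decides where to split: the caller passes a predicate, and each item for which it returns true starts a new group. (Who is really looking to split on only 3 items when reading this post while searching for an answer?)

This method allows one to split on any kind of item, as needed.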

public static List<List<T>> SplitOn<T>(List<T> main, Func<T, bool> splitOn)
{
int groupIndex = 0;

return main.Select( item => new
{
Group = (splitOn.Invoke(item) ? ++groupIndex : groupIndex),
Value = item
})
.GroupBy( it2 => it2.Group)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}

So for the OP, the code would be:

var it = new List<string>()
{"a","g","e","w","p","s","q","f","x","y","i","m","c" };

int index = 0;
var result = SplitOn(it, (itm) => (index++ % 3) == 0 );

This one performs like Sam Saffron's approach:

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size),"Size must be greater than zero.");

return BatchImpl(source, size).TakeWhile(x => x.Any());
}

static IEnumerable<IEnumerable<T>> BatchImpl<T>(this IEnumerable<T> source, int size)
{
var values = new List<T>();
var group = 1;
var disposed = false;
var e = source.GetEnumerator();

try
{
while (!disposed)
{
yield return GetBatch(e, values, group, size, () => { e.Dispose(); disposed = true; });
group++;
}
}
finally
{
if (!disposed)
e.Dispose();
}
}

static IEnumerable<T> GetBatch<T>(IEnumerator<T> e, List<T> values, int group, int size, Action dispose)
{
var min = (group - 1) * size + 1;
var max = group * size;
var hasValue = false;

while (values.Count < min && e.MoveNext())
{
values.Add(e.Current);
}

for (var i = min; i <= max; i++)
{
if (i <= values.Count)
{
hasValue = true;
}
else if (hasValue = e.MoveNext())
{
values.Add(e.Current);
}
else
{
dispose();
}

if (hasValue)
yield return values[i - 1];
else
yield break;
}
}


Using infinite generators:

a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1)))
.Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1)))
.Where((x, i) => i % 3 == 0)

Demo code: https://ideone.com/gkml7m

using System;
using System.Collections.Generic;
using System.Linq;

public class Test
{
private static void DoIt(IEnumerable<int> a)
{
Console.WriteLine(String.Join(" ", a));

foreach (var x in a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1))).Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1))).Where((x, i) => i % 3 == 0))
Console.WriteLine(String.Join(" ", x));

Console.WriteLine();
}

public static void Main()
{
DoIt(new int[] {1});
DoIt(new int[] {1, 2});
DoIt(new int[] {1, 2, 3});
DoIt(new int[] {1, 2, 3, 4});
DoIt(new int[] {1, 2, 3, 4, 5});
DoIt(new int[] {1, 2, 3, 4, 5, 6});
}
}


1

1 2

1 2 3
1 2 3

1 2 3 4
1 2 3

1 2 3 4 5
1 2 3

1 2 3 4 5 6
1 2 3
4 5 6

But actually, I'd prefer to write the corresponding method without LINQ.
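The answer does not show that non-LINQ version. A minimal sketch, assuming the same semantics as the Zip pipeline above (consecutive, non-overlapping groups of three, with an incomplete trailing group dropped); Triples.CompleteTriples is a hypothetical name:

```csharp
using System.Collections.Generic;

public static class Triples
{
    // Yields consecutive, non-overlapping groups of three; like the Zip pipeline,
    // an incomplete trailing group is dropped.
    public static IEnumerable<T[]> CompleteTriples<T>(IEnumerable<T> source)
    {
        var buffer = new T[3];
        int filled = 0;
        foreach (var item in source)
        {
            buffer[filled++] = item;
            if (filled == 3)
            {
                yield return buffer;
                buffer = new T[3];   // fresh array so callers may keep earlier triples
                filled = 0;
            }
        }
    }
}
```

For { 1, 2, 3, 4, 5, 6, 7 } it yields [1, 2, 3] and [4, 5, 6], and the trailing 7 is dropped, just as the demo output above drops incomplete groups.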

Throwing in my two cents...

By using a list type for the source being chunked, I found another very compact solution:

public static IEnumerable<IEnumerable<TSource>> Chunk<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
    if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

    // copy the source into a list
    var chunkList = source.ToList();

    // return chunks of 'chunkSize' items
    while (chunkList.Count > chunkSize)
    {
        yield return chunkList.GetRange(0, chunkSize);
        chunkList.RemoveRange(0, chunkSize);
    }

    // return the rest, unless the source was empty
    if (chunkList.Count > 0)
        yield return chunkList;
}