How do you split a list into evenly sized chunks?
I have a list of arbitrary length, and I need to split it up into chunks of equal size and operate on them. There are some obvious ways to do this, like keeping a counter and two lists, and when the second list fills up, adding it to the first list and emptying the second list for the next round of data, but this is potentially extremely expensive.

I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.

I was looking for something useful in `itertools`, but couldn't find anything obviously useful.

Related question: What is the most "pythonic" way to iterate over a list in chunks?
Here's a generator that yields the chunks you want:

```python
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
```

```python
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
```
If you're using Python 2, you should use `xrange()` instead of `range()`:

```python
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n):
        yield l[i:i + n]
```
Also, you can simply use a list comprehension instead of writing a function. Python 3:

```python
[l[i:i + n] for i in range(0, len(l), n)]
```

Python 2 version:

```python
[l[i:i + n] for i in xrange(0, len(l), n)]
```
If you want something super simple:

```python
def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in xrange(0, len(l), n))
```

For Python 3.x, use `range()` instead of `xrange()`.
Directly from the (old) Python documentation (recipe for itertools):

```python
from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
```

The current version, as suggested by J.F. Sebastian:

```python
#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
```
I guess Guido's time machine works, worked, will work, will have worked, was working again.

These solutions work because `[iter(iterable)]*n` (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. `zip_longest` then effectively performs a round-robin of "each" iterator; but because they are all the same iterator, each such call advances it, so each round of the zip produces one tuple of n consecutive items.
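A minimal demonstration of that shared-iterator mechanism (the variable names here are illustrative, not from the recipe):

```python
from itertools import zip_longest

it = iter(range(7))
columns = [it] * 3          # three references to the SAME iterator
# each zip_longest round advances "each" column, i.e. the one shared
# iterator three times, so consecutive items land in a single tuple
grouped = list(zip_longest(*columns, fillvalue=None))
print(grouped)  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```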
I know this is kind of old, but I don't know why nobody has mentioned `numpy.array_split`:

```python
import numpy as np
lst = range(50)

In [26]: np.array_split(lst, 5)
Out[26]:
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
 array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
 array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
```
I'm surprised nobody has thought of using `iter`'s two-argument form:

```python
from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())
```

Demo:

```python
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
```

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:

```python
from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)
```

Demo:

```python
>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
```

Like the `zip_longest`-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. Combining the two approaches above, this one comes pretty close:

```python
_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)
```

Demo:

```python
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
```

I believe this is the shortest chunker proposed that offers optional padding.

As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:

```python
_no_padding = object()

def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))
```

Demo:

```python
>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
```
Here's a generator that works on arbitrary iterables:

```python
import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))
```

Example:

```python
>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
```
```python
def chunk(input, size):
    return map(None, *([iter(input)] * size))  # Python 2 only: map(None, ...) pads with None
```
Simple yet elegant:

```python
l = range(1, 1000)
print [l[x:x+10] for x in xrange(0, len(l), 10)]
```

or if you prefer:

```python
chunks = lambda l, n: [l[x:x+n] for x in xrange(0, len(l), n)]
chunks(l, 10)
```
Critique of the other answers here:

None of these answers produce evenly sized chunks; they all leave a runt chunk at the end, so they're not completely balanced. If you were using these functions to distribute work, you've built in the prospect of one worker likely finishing well before the others, so it would sit around doing nothing while the others kept working hard.

For example, the current top answer ends with:

```python
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
```

I just hate that runt at the end!
Others, like `list(grouper(3, xrange(7)))` and `chunk(xrange(7), 3)`, both return `[(0, 1, 2), (3, 4, 5), (6, None, None)]`. The `None`s are just padding, and rather inelegant in my opinion.
Why can't we divide these items more evenly?
My solution: here's a balanced solution, adapted from a function I've used in production (note: in Python 3, replace `xrange` with `range`):

```python
def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in xrange(maxbaskets)] # in Python 3 use range
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return filter(None, baskets)
```
And I created a generator that does the same, if you put it into a list:

```python
def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in xrange(baskets):
        yield [items[y_i] for y_i in xrange(x_i, item_count, baskets)]
```
And finally, since I see that none of the above functions return elements in a contiguous order (as they were given):

```python
def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in xrange(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [items.next() for _ in xrange(length)]
```
To test them out:

```python
print(baskets_from(xrange(6), 8))
print(list(iter_baskets_from(xrange(6), 8)))
print(list(iter_baskets_contiguous(xrange(6), 8)))
print(baskets_from(xrange(22), 8))
print(list(iter_baskets_from(xrange(22), 8)))
print(list(iter_baskets_contiguous(xrange(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(xrange(26), 5))
print(list(iter_baskets_from(xrange(26), 5)))
print(list(iter_baskets_contiguous(xrange(26), 5)))
```

Which prints out:

```python
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]
```
Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are divided as evenly as one may divide a list of discrete elements.
I saw the most awesome Python-ish answer in a duplicate of this question:

```python
from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
```

You can create n-tuples for any n. If `a = range(1, 15)`, then the result will be:

```python
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
```

If the list is divided evenly, then you can replace `zip_longest` with `zip`; otherwise the trailing triplet `(13, 14, None)` would be lost. Python 3 is used above; for Python 2, use `izip_longest`.
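To make that trade-off concrete, here is a small comparison sketch (the variable names are illustrative; Python 3):

```python
from itertools import zip_longest

a = list(range(1, 15))  # 14 items: not evenly divisible by 3

i = iter(a)
truncated = list(zip(i, i, i))          # zip drops the incomplete final group
i = iter(a)
padded = list(zip_longest(i, i, i))     # zip_longest pads it with None instead

print(truncated[-1])  # (10, 11, 12) -- items 13 and 14 are gone
print(padded[-1])     # (13, 14, None)
```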
If you know the list size:

```python
def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]
```

If you don't (an iterator):

```python
def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion
```

In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of the given size (i.e. there is no incomplete last chunk).

If you had a chunk size of 3, for example, you could do:

```python
zip(*[iterable[i::3] for i in range(3)])
```

Source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/
I would use this when my chunk size is a fixed number I can type, e.g. "3", and would never change.
A generator expression:

```python
def chunks(seq, n):
    # returns a generator of n-sized slices
    return (seq[i:i+n] for i in xrange(0, len(seq), n))
```

e.g.

```python
print list(chunks(range(1, 1000), 10))
```
The toolz library has the `partition` function for this:

```python
from toolz.itertoolz.core import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]
```
I like the Python docs' version proposed by tzot and J.F. Sebastian a lot, but it has two shortcomings:

- it is not very explicit
- I usually don't want a fill value in the last chunk

I'm using this one a lot in my code:

```python
from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield tuple(islice(iterable, n)) or iterable.next()
```

UPDATE: a lazy-chunks version:

```python
from itertools import chain, islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield chain([next(iterable)], islice(iterable, n-1))
```

(Note that both versions terminate by letting `next()` raise `StopIteration` inside the generator; under PEP 479, Python 3.7+ turns that into a `RuntimeError`, so there you need a `try/except StopIteration: return` around it.)
Now, I think we need a recursive generator, just in case...

In Python 2:

```python
def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
```

In Python 3:

```python
def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)
```

Also, in case of massive Alien invasion, a decorated recursive generator might become handy:

```python
def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
```
I was curious about the performance of the different approaches, so here it is:

Tested on Python 3.5.1.

```python
import time
batch_size = 7
arr_len = 298937

#---------slice-------------
print(" slice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break
    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]
print(time.time() - start)

#-----------index-----------
print(" index")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print(" batches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------
from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)

print(" batches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

print(" chunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print(" grouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)
```
Results:

```
 slice
31.18285083770752
 index
0.02184295654296875
 batches 1
0.03503894805908203
 batches 2
0.22681021690368652
 chunks
0.019841909408569336
 grouper
0.006506919860839844
```
You can also use the `get_chunks` function of the `utilspie` library:

```python
>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]
```

You can install `utilspie` via pip:

```
sudo pip install utilspie
```

Disclaimer: I am the creator of the utilspie library.
```python
[AA[i:i+SS] for i in range(len(AA))[::SS]]
```

where AA is the array and SS is the chunk size. For example:

```python
>>> AA = range(10, 21); SS = 3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3
```
Code:

```python
def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)
```

Result:

```python
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```
Heh, a one-line version:

```python
In [48]: chunk = lambda ulist, step: map(lambda i: ulist[i:i+step], xrange(0, len(ulist), step))

In [49]: chunk(range(1, 100), 10)
Out[49]:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]
```
Without calling len(), which is good for large lists:

```python
def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]
```

And this one is for iterables:

```python
from itertools import islice

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))
```

The functional flavour of the above:

```python
from itertools import islice, repeat, takewhile

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                      for start in repeat(iter(l))))
```

Or:

```python
from itertools import imap, islice, repeat  # Python 2

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next, ())
```

Or:

```python
def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool, imap(tuple, continuous_slices))
```
```python
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop
```

Usage:

```python
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq
```
Another more explicit version:

```python
def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3))
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList
```
One more solution:

```python
def make_chunks(data, chunk_size):
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print chunk
...
[1, 2]
[3, 4]
[5, 6]
[7]
>>>
```
At this point, I think we need the obligatory anonymous-recursive function.

```python
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])
```
Consider using matplotlib.cbook pieces.

For example:

```python
import numpy as np
import matplotlib.cbook as cbook

segments = cbook.pieces(np.arange(20), 3)
for s in segments:
    print s
```
```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK)]
```
See this example:

```python
>>> orange = range(1, 1001)
>>> otuples = list(zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>>
```
Python 3
Since everybody here is talking about iterators, `boltons` has a perfect method for that, called `iterutils.chunked_iter`:

```python
from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))
```

Output:

```python
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]
```

But if you don't need to be careful with memory, you can use the old way and store the full list in the first place with `iterutils.chunked`.
I realize this question is old (I stumbled over it on Google), but surely something like the following is far simpler and clearer than any of the huge complex suggestions, and uses only slicing:

```python
def chunker(iterable, chunksize):
    for i, c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]

>>> for chunk in chunker(range(0, 100), 10):
...     print list(chunk)
...
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...
```
You can use numpy's `array_split` function, e.g. `np.array_split(np.array(data), 20)`, to split into 20 nearly equal-size chunks.

To make sure the chunks are exactly equal in size, use `np.split`.
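A quick sketch of the difference between the two (the data here is hypothetical; note that `np.split` raises a `ValueError` when the array does not divide evenly):

```python
import numpy as np

data = np.arange(12)

# nearly equal: chunk lengths may differ by one
nearly = np.array_split(data, 5)
print([len(c) for c in nearly])  # [3, 3, 2, 2, 2]

# exactly equal: only allowed here because 12 % 4 == 0
exact = np.split(data, 4)
print([c.tolist() for c in exact])  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
```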
As per this answer, the top-voted answer leaves a "runt" at the end. Here's my solution to really get chunks that are as evenly sized as possible, with no runts. It basically tries to pick exactly the fractional spot where it should split the list, but just rounds it off to the nearest integer:

```python
from __future__ import division # not needed in Python 3

def n_even_chunks(l, n):
    """Yield n as even chunks as possible from l."""
    last = 0
    for i in range(1, n+1):
        cur = int(round(i * (len(l) / n)))
        yield l[last:cur]
        last = cur
```

Demonstration:

```python
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
 [56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
 [67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
 [78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
 [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63],
 [64, 65, 66, 67, 68, 69, 70, 71, 72],
 [73, 74, 75, 76, 77, 78, 79, 80, 81],
 [82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]
```

Compare to the top-voted `chunks` answer:

```python
>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
 [66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
 [77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
 [88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53],
 [54, 55, 56, 57, 58, 59, 60, 61, 62],
 [63, 64, 65, 66, 67, 68, 69, 70, 71],
 [72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89],
 [90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
```
Here is a list of additional approaches:

Given:

```python
import itertools as it
import collections as ct

import more_itertools as mit

iterable = range(11)
n = 3
```

Code

The Standard Library:

```python
list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
```

```python
d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
```

```python
dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
```

more_itertools:

```python
list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
```

References

- `zip_longest` (related post, related post)
- `setdefault` (ordered results require Python 3.6+)
- `collections.defaultdict` (ordered results require Python 3.6+)
- `more_itertools.chunked` (related post)
- `more_itertools.sliced`
- `more_itertools.grouper` (related post)
- `more_itertools.windowed` (see also `stagger`, `zip_offset`)

+ a third-party library that implements itertools recipes and more.
```python
def chunks(iterable, n):
    """assumes n is an integer > 0"""
    iterable = iter(iterable)
    while True:
        result = []
        for i in range(n):
            try:
                a = next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1 = (i*i for i in range(10))
g2 = chunks(g1, 3)
print g2        # '<generator object chunks at 0x0337B9B8>'
print list(g2)  # '[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
```
Using a list comprehension:

```python
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
k = 5 # chunk size
print [tuple(l[x:y]) for (x, y) in [(x, x+k) for x in range(0, len(l), k)]]
```
Here's an idea using itertools.groupby:

```python
import itertools

def chunks(l, n):
    c = itertools.count()
    return (it for _, it in itertools.groupby(l, lambda x: next(c)//n))
```

This returns a generator of generators. If you want a list of lists, just replace the last line with

```python
return [list(it) for _, it in itertools.groupby(l, lambda x: next(c)//n)]
```

Example returning a list:

```python
>>> chunks('abcdefghij', 4)
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]
```

(So yes, this suffers from the "runt problem", which may or may not be a problem in a given situation.)
Letting r be the chunk size and L the initial list, you can do:

```python
chunkL = [ [i for i in L[r*k:r*(k+1)]] for k in range(len(L)/r)]
```
```python
>>> f = lambda x, n, acc=[]: f(x[n:], n, acc+[(x[:n])]) if x else acc
>>> f("Hallo Welt", 3)
['Hal', 'lo ', 'Wel', 't']
>>>
```

If you are into brackets - I picked up a book on Erlang :)
I don't think I saw this option, so just to add another one :)

```python
def chunks(iterable, chunk_size):
    i = 0
    while i < len(iterable):
        yield iterable[i:i+chunk_size]
        i += chunk_size
```
```python
def chunk(lst):
    out = []
    for x in xrange(2, len(lst) + 1):
        if not len(lst) % x:
            factor = len(lst) / x
            break
    while lst:
        out.append([lst.pop(0) for x in xrange(factor)])
    return out
```
I have one solution below which does work, but more important than that solution are a few comments on the other approaches. First, a good solution shouldn't require that one loop through the sub-iterators in order. If I run

```python
g = paged_iter(list(range(50)), 11)
i0 = next(g)
i1 = next(g)
list(i1)
list(i0)
```
The appropriate output for the last command is

```python
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

not

```python
[]
```
yet that is what most of the itertools-based solutions here return. This isn't just the usual boring restriction about accessing iterators in order. Imagine a consumer trying to clean up poorly-entered data which reversed the appropriate order of blocks of 5, i.e., the data looks like [B5, A5, D5, C5] and should look like [A5, B5, C5, D5] (where A5 is just five elements, not a sublist). This consumer would look at the claimed behavior of the grouping function and not hesitate to write a loop like

```python
i = 0
out = []
for it in paged_iter(data, 5):
    if i % 2 == 0:
        swapped = it
    else:
        out += list(it)
        out += list(swapped)
    i = i + 1
```
This will produce mysteriously wrong results if you sneakily assume that the sub-iterators are always fully consumed in order. It gets even worse if you want to interleave elements from the chunks.

Second, a decent number of the suggested solutions implicitly rely on the fact that iterators have a deterministic order (they don't, e.g. set), and while some of the solutions using islice may be okay, it worries me.

Third, the itertools grouper approach works, but the recipe relies on internal behavior of the zip_longest (or zip) functions that is not part of their published behavior. In particular, the grouper function only works because in zip_longest(i0...in), the next function is always called in the order next(i0), next(i1), ... next(in) before starting over. As grouper passes n copies of the same iterator object, it relies on this behavior.
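That left-to-right advancement can be observed directly; the helper class below is my own illustration (this demonstrates CPython behavior, not a documented guarantee):

```python
class LoggingIter:
    """Iterator that records the order in which next() is called on it."""
    def __init__(self, name, log, stop_after=2):
        self.name, self.log = name, log
        self.n, self.stop_after = 0, stop_after

    def __iter__(self):
        return self

    def __next__(self):
        self.log.append(self.name)
        self.n += 1
        if self.n > self.stop_after:
            raise StopIteration
        return self.n

log = []
list(zip(LoggingIter('a', log), LoggingIter('b', log)))
print(log)  # ['a', 'b', 'a', 'b', 'a'] -- columns advanced strictly left to right
```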
Finally, while the solution below can be improved if you make the assumption criticized above, namely that the sub-iterators are accessed in order and fully perused, without this assumption one MUST, implicitly (via call chains) or explicitly (via deques or other data structures), store the elements of each sub-iterator somewhere. So don't bother wasting time (as I did) assuming one could get around this with some clever trick.
```python
import collections

def paged_iter(iterat, n):
    itr = iter(iterat)
    deq = None
    try:
        while(True):
            deq = collections.deque(maxlen=n)
            for q in range(n):
                deq.append(next(itr))
            yield (i for i in deq)
    except StopIteration:
        yield (i for i in deq)
```
The answer above (by Koffein) has a little problem: the list is always split into an equal number of splits, rather than an equal number of items per partition. This is my version. The "// chs + 1" takes into account that the number of items may not be divisible exactly by the partition size, so the last partition will only be partially filled.

```python
# Given 'l' is your list

chs = 12 # Your chunksize
partitioned = [l[i*chs:(i*chs)+chs] for i in range((len(l) // chs)+1)]
```
I have written a small library expressly for this purpose, available here. The library's `chunked` function is particularly efficient because it is implemented as a generator:

```python
import iterlib

print list(iterlib.chunked(xrange(1, 1000), 10))
# prints [(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20), ...]
```
No magic, but simple and correct:

```python
def chunks(iterable, n):
    """Yield successive n-sized chunks from iterable."""
    values = []
    for i, item in enumerate(iterable, 1):
        values.append(item)
        if i % n == 0:
            yield values
            values = []
    if values:
        yield values
```
Not exactly the same, but still nice:

```python
def chunks(l, chunks):
    return zip(*[iter(l)]*chunks)

l = range(1, 1000)
print chunks(l, 10) -> [ ( 1..10 ), ( 11..20 ), .., ( 991..999 ) ]
```
- works with any iterable
- inner data are generator objects (not lists)
- one-liner

```python
In [259]: get_in_chunks = lambda itr, n: ( (v for _, v in g) for _, g in itertools.groupby(enumerate(itr), lambda (ind, _): ind/n))

In [260]: list(list(x) for x in get_in_chunks(range(30), 7))
Out[260]:
[[0, 1, 2, 3, 4, 5, 6],
 [7, 8, 9, 10, 11, 12, 13],
 [14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27],
 [28, 29]]
```
Like @AaronHall, I got here looking for roughly evenly-sized chunks. There are different interpretations of that. In my case, if the desired size is N, I would want each group to have size >= N. Thus, the orphans created in most of the above solutions should be redistributed to the other groups.

This can be done using:

```python
def nChunks(l, n):
    """ Yield n successive chunks from l.
    Works for lists, pandas dataframes, etc
    """
    newn = int(1.0 * len(l) / n + 0.5)
    for i in xrange(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]
```

(from Splitting a list into N parts of approximately equal length) by simply calling it as `nChunks(l, l/n)` or `nChunks(l, floor(l/n))`.
I came up with the following solution without creating a temporary list object; it should work with any iterable object. Please note that this version is for Python 2.x:

```python
def chunked(iterable, size):
    stop = []
    it = iter(iterable)

    def _next_chunk():
        try:
            for _ in xrange(size):
                yield next(it)
        except StopIteration:
            stop.append(True)
            return

    while not stop:
        yield _next_chunk()

for it in chunked(xrange(16), 4):
    print list(it)
```

Output:

```
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[]
```

As you can see, if len(iterable) % size == 0, we get an extra empty iterator object at the end. But I do not think that is a big problem.
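If that trailing empty chunk does matter for your use case, one way to avoid it (my own sketch, not part of the answer above) is to peek at one element before yielding each chunk:

```python
from itertools import chain, islice

def chunked_nonempty(iterable, size):
    it = iter(iterable)
    while True:
        try:
            head = next(it)   # peek; raises StopIteration when exhausted
        except StopIteration:
            return            # so an empty chunk is never yielded
        yield chain([head], islice(it, size - 1))

# as with the other lazy versions, each chunk must be consumed
# before advancing the outer generator
result = [list(c) for c in chunked_nonempty(range(16), 4)]
print(result)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
```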
Since I had to do something like this, here's my solution, given a generator and a batch size:

```python
def pop_n_elems_from_generator(g, n):
    elems = []
    try:
        for idx in xrange(0, n):
            elems.append(g.next())
        return elems
    except StopIteration:
        return elems
```
I don't like the idea of splitting elements purely by chunk size, e.g. a script can divide 101 elements into chunks of [50, 50, 1]. For my needs, I needed to split proportionally, keeping the order intact. First I wrote my own script, which works fine and is very simple. But I have since seen this answer, where the script is better than mine, so I recommend it instead. Here's my script:

```python
def proportional_dividing(N, n):
    """
    N - length of array (bigger number)
    n - number of chunks (smaller number)
    output - arr, containing N numbers, divided roundly into n chunks
    """
    arr = []
    if N == 0:
        return arr
    elif n == 0:
        arr.append(N)
        return arr
    r = N // n
    for i in range(n-1):
        arr.append(r)
    arr.append(N-r*(n-1))

    last_n = arr[-1]
    # last number always will be r <= last_n < 2*r
    # when last_n == r it's ok, but when last_n > r ...
    if last_n > r:
        # ... and if the difference is too big (bigger than 1), then
        if abs(r-last_n) > 1:
            # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7] # N=29, n=12
            # we need to give the excess back to the first elements
            diff = last_n - r
            for k in range(diff):
                arr[k] += 1
            arr[-1] = r
            # and we receive [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
    return arr

def split_items(items, chunks):
    arr = proportional_dividing(len(items), chunks)
    splitted = []
    for chunk_size in arr:
        splitted.append(items[:chunk_size])
        items = items[chunk_size:]
    print(splitted)
    return splitted

items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
chunks = 3
split_items(items, chunks)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l','m'], 3)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l','m','n'], 3)
split_items(range(100), 4)
split_items(range(99), 4)
split_items(range(101), 4)
```

Output:

```python
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'g', 'k', 'l', 'm']]
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'g'], ['k', 'l', 'm', 'n']]
[range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 99)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 101)]
```
This works in v2/v3, is inlineable, generator-based and uses only the standard library:

```python
import itertools

def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))
```
You can use dask to split your list into evenly sized chunks. Dask has the added benefit of conserving memory, which is best for very large data. For best results you should load your list directly into a dask dataframe, to conserve memory if your list is very large. Depending on exactly what you want to do with the list, dask has a full API of functions you can use: http://docs.dask.org/en/latest/dataframe-api.html

```python
import pandas as pd
import dask.dataframe as dd

split = 4
my_list = range(100)
df = dd.from_pandas(pd.DataFrame(my_list), npartitions=split)
my_list = [df.get_partition(n).compute().iloc[:, 0].tolist() for n in range(split)]

# [[1,2,3,..],[26,27,28...],[51,52,53...],[76,77,78...]]
```
Using Python's list comprehensions:

```python
[range(t, t+10) for t in range(1, 1000, 10)]

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30], ....
 .... [981, 982, 983, 984, 985, 986, 987, 988, 989, 990],
 [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
```

Visit this link to learn about list comprehensions.
```python
def chunked(iterable, size):
    chunk = ()

    for item in iterable:
        chunk += (item,)
        if len(chunk) % size == 0:
            yield chunk
            chunk = ()

    if chunk:
        yield chunk
```
Yes, it's an old question, but I had to post this one, because it's even a little shorter than the similar ones. Yes, the result looks scrambled, but if all you need is roughly even lengths...

```python
>>> n = 3 # number of groups
>>> biglist = range(30)
>>>
>>> [biglist[i::n] for i in xrange(n)]
[[0, 3, 6, 9, 12, 15, 18, 21, 24, 27],
 [1, 4, 7, 10, 13, 16, 19, 22, 25, 28],
 [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]]
```
```python
def split(arr, size):
    L = len(arr)
    assert 0 < size <= L
    s, r = divmod(L, size)
    t = s + 1
    a = ([arr[p:p+t] for p in range(0, r*t, t)] +
         [arr[p:p+s] for p in range(r*t, L, s)])
    return a
```

Inspired by http://wordaline.org/articles/sliting-a-list-equally-with-python
Does nobody use the tee() function from itertools?
http://docs.python.org/2/library/itertools.html itertools.tee
```python
>>> import itertools
>>> itertools.tee([1, 2, 3, 4, 5, 6], 3)
(<itertools.tee object at 0x02932DF0>, <itertools.tee object at 0x02932EB8>, <itertools.tee object at 0x02932EE0>)
```
This will split the list into 3 iterators; looping over each iterator will yield a sub-list of equal length.
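A caveat worth verifying before relying on this: `tee` does not partition its input; each of the returned iterators independently replays the whole sequence:

```python
import itertools

parts = itertools.tee([1, 2, 3, 4, 5, 6], 3)
replayed = [list(p) for p in parts]
print(replayed)
# [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]]
```

So to get three equal sub-lists, slicing (e.g. `[lst[i::3] for i in range(3)]`, as in earlier answers) is needed instead.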