Best method for changing a list while iterating over it

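I have several places in a Python script (v2.6) where I need to modify a list in place. I need to pop values from the list in response to interactive input from the user, and would like to know the cleanest way of doing this. Currently I have the very dirty solutions of a) setting items in the list that I want to remove to False and removing them with a filter or list comprehension, or b) generating an entirely new list while going through the loop, which seems to be needlessly adding variables to the namespace and taking up memory.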

An example of this problem is as follows:

    for i, folder in enumerate(to_run_folders):
        if get_size(folder) < byte_threshold:
            ans = raw_input(('The folder {0}/ is less than {1}MB.' + \
                        ' Would you like to exclude it from' + \
                        ' compression? ').format(folder, megabyte_threshold))
            if 'y' in ans.strip().lower():
                to_run_folders.pop(i)

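I would like to look at each folder in the list. If the current folder is smaller than a certain size, I want to ask the user whether they wish to exclude it. If they do, the folder is popped from the list.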

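The problem with this routine is that if I iterate over the list itself, I get unexpected behavior and early termination. If I iterate over a copy made by slicing, then pop does not remove the right value, because the indices have shifted, and the problem compounds as more items are popped. I need this kind of dynamic list adjustment in other areas of my script as well. Is there a clean way to get this kind of functionality?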
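The skipping described above can be reproduced without any user input. The sketch below is written for Python 3 (the question itself targets Python 2.6) and uses a made-up list of folder sizes in place of get_size(); the names sizes and threshold are illustrative assumptions only.

```python
# Hypothetical folder sizes; every entry below the threshold should be removed.
sizes = [10, 20, 500, 30, 40, 600]
threshold = 100

buggy = list(sizes)
for i, size in enumerate(buggy):
    if size < threshold:
        # Removing index i shifts the next element into slot i,
        # so the loop never looks at that element.
        buggy.pop(i)

# Building a new list sidesteps the index shifting entirely.
expected = [size for size in sizes if size >= threshold]

print(buggy)     # [20, 500, 40, 600]  (20 and 40 were skipped)
print(expected)  # [500, 600]
```

Every removal makes the loop jump over the element that follows it, which is why runs of small folders are only partly cleaned up.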

You can loop over the list backwards, or use a view object.

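See https://stackoverflow.com/a/181062/711085 for how to loop over a list backwards. Basically, use reversed(yourList) (which creates a view-like iterator that walks the list backwards without copying it).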

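If you need the index, you could do reversed(list(enumerate(yourList))), but that creates a temporary list in memory: reversed needs a sequence, so enumerate has to run to completion before reversed can start. You will need to either do index arithmetic yourself, or do this: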

    for i in xrange(len(yourList)-1, -1, -1):
        item = yourList[i]
        ...

Even cleaner: reversed knows about range, so you can do this in Python 3, or in Python 2 if you use xrange instead:

    for i in reversed(range(len(yourList))):
        item = yourList[i]
        ...

(Proof: you can do next(reversed(range(10**10))), but this will crash your computer if you are using Python 2.)
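Put together, deleting while walking the indices backwards might look like the following. This is a minimal Python 3 sketch; remove_odds_backwards and the sample data are illustrative names, not part of the original answer.

```python
def remove_odds_backwards(items):
    """Delete odd values in place, visiting indices from last to first."""
    for i in reversed(range(len(items))):
        if items[i] % 2 != 0:
            # Only elements at higher indices shift, and those were already visited.
            del items[i]
    return items


numbers = list(range(10))
remove_odds_backwards(numbers)
print(numbers)  # [0, 2, 4, 6, 8]

# reversed() needs a sequence, so enumerate() has to be turned into a list first.
pairs = list(reversed(list(enumerate("abc"))))
print(pairs)  # [(2, 'c'), (1, 'b'), (0, 'a')]
```

Because the range object is reversed lazily, no copy of the list or of its indices is built along the way.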

You could loop over the list backwards:

    x = range(10)
    l = len(x)-1  # max index

    for i, v in enumerate(reversed(x)):
        if v % 2:
            x.pop(l-i)  # l-i is the forward index

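OK, I have measured the solutions. The reversed solutions are about the same. The forward while loop is about 4 times slower. BUT! Patrik's dirty solution is about 80 times faster for a list of 100,000 random integers [bug in Patrik2 corrected]: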

    import timeit
    import random

    def solution_ninjagecko1(lst):
        for i in xrange(len(lst)-1, -1, -1):
            if lst[i] % 2 != 0:    # simulation of the choice
                del lst[i]
        return lst

    def solution_jdi(lst):
        L = len(lst) - 1
        for i, v in enumerate(reversed(lst)):
            if v % 2 != 0:
                lst.pop(L-i)    # L-i is the forward index
        return lst

    def solution_Patrik(lst):
        for i, v in enumerate(lst):
            if v % 2 != 0:    # simulation of the choice
                lst[i] = None
        return [v for v in lst if v is not None]

    def solution_Patrik2(lst):
        ##buggy lst = [v for v in lst if v % 2 != 0]
        ##buggy return [v for v in lst if v is not None]
        # ... corrected to a single pass that keeps the even items,
        # so the result matches the other solutions
        return [v for v in lst if v % 2 == 0]

    def solution_pepr(lst):
        i = 0    # indexing the processed item
        n = 0    # enumerating the original position
        while i < len(lst):
            if lst[i] % 2 != 0:    # simulation of the choice
                del lst[i]         # i unchanged if item deleted
            else:
                i += 1             # i moved to the next
            n += 1
        return lst

    def solution_pepr_reversed(lst):
        i = len(lst) - 1    # indexing the processed item backwards
        while i >= 0:       # >= so that index 0 is examined too
            if lst[i] % 2 != 0:    # simulation of the choice
                del lst[i]         # lower, unvisited indices are not disturbed
            i -= 1                 # i moved to the previous
        return lst

    def solution_steveha(lst):
        def should_keep(x):
            return x % 2 == 0
        return filter(should_keep, lst)

    orig_lst = range(30)
    print 'range() generated list of the length', len(orig_lst)
    print orig_lst[:20] + ['...']    # to have some fun :)

    lst = orig_lst[:]    # copy of the list
    print solution_ninjagecko1(lst)

    lst = orig_lst[:]    # copy of the list
    print solution_jdi(lst)

    lst = orig_lst[:]    # copy of the list
    print solution_Patrik(lst)

    lst = orig_lst[:]    # copy of the list
    print solution_pepr(lst)

    orig_lst = [random.randint(1, 1000000) for n in xrange(100000)]
    print '\nrandom list of the length', len(orig_lst)
    print orig_lst[:20] + ['...']    # to have some fun :)

    lst = orig_lst[:]    # copy of the list
    t = timeit.timeit('solution_ninjagecko1(lst)',
                      'from __main__ import solution_ninjagecko1, lst',
                      number=1)
    print 'solution_ninjagecko1: ', t

    lst = orig_lst[:]    # copy of the list
    t = timeit.timeit('solution_jdi(lst)',
                      'from __main__ import solution_jdi, lst',
                      number=1)
    print 'solution_jdi: ', t

    lst = orig_lst[:]    # copy of the list
    t = timeit.timeit('solution_Patrik(lst)',
                      'from __main__ import solution_Patrik, lst',
                      number=1)
    print 'solution_Patrik: ', t

    lst = orig_lst[:]    # copy of the list
    t = timeit.timeit('solution_Patrik2(lst)',
                      'from __main__ import solution_Patrik2, lst',
                      number=1)
    print 'solution_Patrik2: ', t

    lst = orig_lst[:]    # copy of the list
    t = timeit.timeit('solution_pepr_reversed(lst)',
                      'from __main__ import solution_pepr_reversed, lst',
                      number=1)
    print 'solution_pepr_reversed: ', t

    lst = orig_lst[:]    # copy of the list
    t = timeit.timeit('solution_pepr(lst)',
                      'from __main__ import solution_pepr, lst',
                      number=1)
    print 'solution_pepr: ', t

    lst = orig_lst[:]    # copy of the list
    t = timeit.timeit('solution_steveha(lst)',
                      'from __main__ import solution_steveha, lst',
                      number=1)
    print 'solution_steveha: ', t

It prints the following on my console:

    c:\tmp\_Python\Patrick\so10305762>python a.py
    range() generated list of the length 30
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, '...']
    [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
    [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
    [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
    [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

    random list of the length 100000
    [915411, 954538, 794388, 847204, 846603, 454132, 866165, 640004, 930488, 609138,
    333405, 986073, 318301, 728151, 996047, 117633, 455353, 581737, 55350, 485030,
    '...']
    solution_ninjagecko1: 2.41921752625
    solution_jdi: 2.45477176569
    solution_Patrik: 0.0468565138865
    solution_Patrik2: 0.024270403082
    solution_pepr_reversed: 2.43338888043
    solution_pepr: 9.11879694207

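So I tried a longer list. Using only twice as many elements makes a big difference (on my old computer). Patrik's dirty solution behaves very nicely. It is about 200 times faster than the reversed solutions: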

    random list of the length 200000
    [384592, 170167, 598270, 832363, 123557, 81804, 319315, 445945, 178732, 726600,
    516835, 392267, 552608, 40807, 349215, 208111, 880032, 520614, 384119, 350090,
    '...']
    solution_ninjagecko1: 17.362140719
    solution_jdi: 17.86837545
    solution_Patrik: 0.0957998851809
    solution_Patrik2: 0.0500024444448
    solution_pepr_reversed: 17.5078452708
    solution_pepr: 52.175648581
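The gap widens with list length because each del lst[i] or lst.pop(i) has to shift every element that follows index i, so the deleting loops do work that grows roughly with the square of the list length, while a list comprehension makes a single pass. Below is a minimal Python 3 sketch of the same kind of comparison (the benchmark above is Python 2). The function names, the seed and the list size are chosen for illustration, and absolute timings will differ from machine to machine.

```python
import random
import timeit


def delete_backwards(lst):
    # In place: each del shifts the tail that follows index i.
    for i in range(len(lst) - 1, -1, -1):
        if lst[i] % 2 != 0:
            del lst[i]
    return lst


def rebuild(lst):
    # Not in place: one pass that builds a new list of the kept items.
    return [v for v in lst if v % 2 == 0]


random.seed(0)  # fixed seed so repeated runs use the same data
data = [random.randint(1, 1000000) for _ in range(20000)]

# Both approaches must agree before their timings are worth comparing.
assert delete_backwards(list(data)) == rebuild(list(data))

for func in (delete_backwards, rebuild):
    # list(data) gives every run a fresh copy; the copy cost is included for both.
    seconds = timeit.timeit(lambda: func(list(data)), number=3)
    print(func.__name__, seconds)
```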

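[Added after ninjagecko's comments]

The corrected Patrik2 solution is about twice as fast as the two-stage Patrik solution.

To simulate less frequent deletion of elements, the test if v % 2 != 0: was changed to if v % 100 == 0:. About 1% of the items should then be deleted. Obviously, this takes less time. For a list of 500,000 random integers, the results are as follows: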

    random list of the length 500000
    [403512, 138135, 552313, 427971, 42358, 500926, 686944, 304889, 916659, 112636,
    791585, 461948, 82622, 522768, 485408, 774048, 447505, 830220, 791421, 580706,
    '...']
    solution_ninjagecko1: 6.79284210703
    solution_jdi: 6.84066913532
    solution_Patrik: 0.241242951269
    solution_Patrik2: 0.162481823807
    solution_pepr_reversed: 6.92106007886
    solution_pepr: 7.12900522273

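Patrik's solution is still about 30 times faster.

[Added 2012/04/25]

Here is another solution that works in place, loops forward, and is as fast as Patrik's solution. It does not shift the whole tail each time an element is deleted. Instead, it moves the wanted elements to their final positions and then cuts off the unused tail of the list.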

    def solution_pepr2(lst):
        i = 0
        for v in lst:
            lst[i] = v          # moving the element (sometimes unnecessary)
            if v % 100 != 0:    # simulation of the choice
                i += 1          # here will be the next one
        lst[i:] = []            # cutting the tail of the length of the skipped
        return lst

    # The following one only adds the enumerate to simulate the situation when
    # it is needed, i.e. slightly slower but with the same complexity.
    def solution_pepr2enum(lst):
        i = 0
        for n, v in enumerate(lst):
            lst[i] = v          # moving the element (sometimes unnecessary)
            if v % 100 != 0:    # simulation of the choice
                i += 1          # here will be the next one
        lst[i:] = []            # cutting the tail of the length of the skipped
        return lst
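The same compaction idea can take the keep-or-discard decision as a function, which gives it a filter()-like shape while still modifying the list in place. This is a minimal sketch in Python 3; filter_in_place is a hypothetical name, not something from the thread or the standard library.

```python
def filter_in_place(keep, lst):
    """Keep only the items for which keep(item) is true, reusing the same list."""
    write = 0  # next slot for an item that survives
    for item in lst:
        if keep(item):
            # write never runs ahead of the read position,
            # so no unread element is overwritten.
            lst[write] = item
            write += 1
    del lst[write:]  # drop the leftover tail in one step
    return lst


values = [100, 7, 200, 33, 300, 41]
alias = values  # a second reference to the same list object
filter_in_place(lambda v: v % 100 != 0, values)
print(values)           # [7, 33, 41]
print(alias is values)  # True
```

Each element is visited once and the tail is cut once, so the cost stays linear in the list length, at the price of one function call per element.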

Compared with the solutions above, using the v % 100 != 0 test:

    random list of the length 500000
    [533094, 600755, 58260, 295962, 347612, 851487, 523927, 665648, 537403, 238660,
    781030, 940052, 878919, 565870, 717745, 408465, 410781, 560173, 51010, 730322,
    '...']
    solution_ninjagecko1: 1.38956896051
    solution_jdi: 1.42314502685
    solution_Patrik: 0.135545530079
    solution_Patrik2: 0.0926935780151
    solution_pepr_reversed: 1.43573239178
    solution_steveha: 0.122824246805
    solution_pepr2: 0.0938177241656
    solution_pepr2enum: 0.11096263294
Related discussion

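Interesting. Unfortunately you did not check the reversed(range(len(yourList))) solution (though if you had, it would be about the same as the first solution). I don't think the benchmark is fair, though; in these benchmarks you delete half of the elements. In that case I would just do [x for i,x in enumerate(lst) if i%2!=0] and ignore the in-place requirement; this would be about twice as fast as the fastest solution you benchmarked. Also, the solution you gave is not the "dirty" solution, since it is not in place and uses [...].
Indeed, Patrik's method is about 4 times faster if you remove only 20% of the elements in the list, and about 2 times faster if you remove only 10% of them, according to your test, but unfortunately I have no time to check them further.
I was surprised too. I tried removing about 1% of the elements at random (see the edited text). For 500,000 elements, Patrik's approach is still 30 times faster. The question is: why should we insist on an in-place solution?
Bug in Patrik2 corrected: it is now about twice as fast.
This is kind of a silly test. It is weighted by the number of actual deletions. When I replace the inner loop logic with pass, and replace the test with if False and so on, to prevent the delete operation from happening, @ninjagecko's comes out first, then mine, then the others.
I just added solution_steveha(), which is a trivial use of filter(). On my computer it was the second fastest, about half as fast as the fastest (Patrik2). I didn't add benchmark numbers because the times on my computer are all different.
@steveha: But again, try this test with no elements being deleted at all. Patrik's falls apart when it has to process a larger data set.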
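@jdi, the simple, clean solution using filter() is faster than solution_Patrik(). It is slower than solution_Patrik2(), but that one is a list comprehension that filters with its own inline expression; the speed win is due to the lack of a function call, and is therefore unrealistic. It would be impractical to write a list comprehension that calls raw_input() as part of a chain of logic; if you tried it, it would surely be ugly. I prefer simple and clean solutions even when they are slower, but I have shown that this one is not actually slower.
@steveha: The filter() solution is basically the same as the list comprehension. It is also not an in-place solution (same as solution_Patrik2). It is more general, and so it pays for the function call. I agree that combining it with raw_input makes it interesting. However, the goal of filtering out the unwanted elements is not that boring ;)
I have added a new, fast, in-place, forward-moving solution. Have a look ;)
The overhead of a function call makes the code run slower. However, the complexity of the filter logic (it calls raw_input()) means the filtering will almost certainly be packaged as a function. So I don't see the value in benchmarking solutions that don't call a function. I personally prefer the list comprehension solution, but the filter() solution is the fastest one in the benchmarks that calls a function to do the filtering. If you make solution_pepr2() call a function, it gets slower; I tested and confirmed this.
Of course it is possible to avoid the function in this case; just for fun, I wrote that version and added it to my answer. Yuck! The function is better.
@steveha: I like all solutions that one can learn from. They force you to think about the problem. I tried to make them comparable with respect to measured time. I agree that this is usually not the most important thing, but sometimes it may be. Actually, pepr2 can be slightly modified to be syntactically similar to filter(). The main difference is that it works in place, which may be a plus or a minus depending on the situation.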

Currently I have the very dirty solutions of a) setting items in the list that I want to remove to False and removing them with a filter or list comprehension or b) generating an entirely new list while going through the loop, which seems to be needlessly adding variables to the namespace and taking up memory.

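Actually, that is not such a dirty solution. How long is the list, typically? Even creating a new list should not consume much memory, because the list contains only references.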
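If other parts of the script hold a reference to the same list, a new list can still be built and then assigned back through a slice, which replaces the contents of the existing list object. A small Python 3 sketch with made-up folder names (the length check stands in for whatever real test decides which folders to keep):

```python
folders = ["a", "bb", "ccc", "dddd"]
same_list = folders  # another name bound to the same list object

# The comprehension builds a temporary list of references to the kept items;
# the slice assignment then replaces the contents of the original object.
folders[:] = [name for name in folders if len(name) >= 3]

print(folders)               # ['ccc', 'dddd']
print(same_list is folders)  # True
```

The temporary list holds references to the existing strings rather than copies of them, which is why its memory cost is small.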

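You can also loop with a while loop and do the enumeration yourself, performing del lst[n] if the user decides so (possibly counting the positions in the original list separately).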
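A minimal Python 3 sketch of that while-loop approach, keeping a separate counter for the position each item had in the original list. The names prune, should_remove and drop_short are hypothetical, and the length test stands in for the user's decision.

```python
def prune(items, should_remove):
    """Delete items in place; should_remove gets the original position and the item."""
    i = 0         # index into the shrinking list
    original = 0  # position the item had before any deletion
    while i < len(items):
        if should_remove(original, items[i]):
            del items[i]  # the next item slides into slot i, so i stays put
        else:
            i += 1
        original += 1
    return items


removed_positions = []


def drop_short(position, name):
    # Stand-in for asking the user: remove names shorter than three characters.
    if len(name) < 3:
        removed_positions.append(position)
        return True
    return False


names = ["ab", "abc", "a", "abcd", "x"]
prune(names, drop_short)
print(names)              # ['abc', 'abcd']
print(removed_positions)  # [0, 2, 4]
```

This is correct but, like any loop built on del lst[i], it shifts the tail of the list on every deletion.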

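The best way to handle this, and the most "Pythonic" way, is in fact to loop over your list and build a new list that contains only the folders you want. Here is how I would do it: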

    def want_folder(fname):
        # Folders at or above the threshold are always kept.
        if get_size(fname) >= byte_threshold:
            return True
        ans = raw_input(('The folder {0}/ is less than {1}MB.' + \
                    ' Would you like to exclude it from' + \
                    ' compression? ').format(fname, megabyte_threshold))
        return 'y' not in ans.strip().lower()

    to_run_folders = [fname for fname in to_run_folders if want_folder(fname)]

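If your list is truly large, then maybe you need to worry about the performance of this solution and use dirty tricks. But if your list is that large, it would be somewhat crazy to have a human answer yes/no questions about every file that might show up.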

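Is performance an actual problem, or just a nagging worry? I am pretty sure the code above is fast enough for practical use, and it is easier to understand and modify than tricky code.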
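For readers on Python 3, where raw_input() became input(), the same idea might look like the sketch below. The get_size and ask parameters are hypothetical hooks, added here so the example runs without a real directory tree or a person at the keyboard; they are not part of the original answer.

```python
def make_want_folder(get_size, byte_threshold, ask=input):
    """Return a predicate that keeps large folders and asks about small ones."""
    def want_folder(fname):
        if get_size(fname) >= byte_threshold:
            return True
        answer = ask('The folder {0}/ is below the size threshold. '
                     'Exclude it from compression? '.format(fname))
        return 'y' not in answer.strip().lower()
    return want_folder


# Stand-ins for the real size lookup and the interactive prompt.
fake_sizes = {'big': 5000, 'small': 10, 'tiny': 1}
canned_answers = iter(['yes', 'no'])

want_folder = make_want_folder(fake_sizes.get, 1000,
                               ask=lambda prompt: next(canned_answers))
to_run_folders = ['big', 'small', 'tiny']
to_run_folders = [fname for fname in to_run_folders if want_folder(fname)]
print(to_run_folders)  # ['big', 'tiny']
```

Passing the prompt in as a parameter is a design choice made for this sketch: it keeps the filtering logic in one small function and lets it be exercised without interactive input.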

EDIT: @jdi suggested, in the comments, using itertools.ifilter() or filter().

I tested, and this should actually be faster than what I showed above:

    to_run_folders = filter(want_folder, to_run_folders)

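I just copied @pepr's benchmarking code and tested the solution using filter() as shown here. It was second fastest overall, with only Patrik2 being faster. Patrik2 was twice as fast, but again, any data set small enough that it is practical to have a human answer yes/no questions is probably small enough that a factor of two won't matter much.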
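One caveat for later Python versions: in Python 3, filter() returns a lazy iterator rather than a list, so the filter() one-liner would need list() around it to produce a list. A short sketch with a stand-in predicate (is_even is hypothetical):

```python
def is_even(n):
    return n % 2 == 0


numbers = [1, 2, 3, 4, 5, 6]

lazy = filter(is_even, numbers)         # an iterator in Python 3, nothing evaluated yet
evens = list(filter(is_even, numbers))  # list() forces the evaluation

print(evens)                   # [2, 4, 6]
print(isinstance(lazy, list))  # False
```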

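EDIT: Just for fun, I went ahead and wrote a version that is a pure list comprehension. It has only a single expression to evaluate and no call to a Python-level function.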

    to_run_folders = [fname for fname in to_run_folders
            if get_size(fname) >= mb_threshold or
                    'y' not in raw_input(('The folder {0}/ is less than {1}MB.' +
                    ' Would you like to exclude it from compression? '
                    ).format(fname, mb_threshold)).strip().lower()
            ]

Yuck! I much prefer making a function.
