Best method for changing a list while iterating over it
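I have several instances in a Python script (v2.6) where I need to modify a list in place. I need to pop values off the list in response to interactive input from the user, and I would like to know the cleanest way to do this. Currently I have the very dirty solutions of a) setting items in the list that I want to remove to False and removing them with a filter or list comprehension, or b) generating an entirely new list while going through the loop, which seems to needlessly add variables to the namespace and take up memory.
An example of this problem is as follows: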
```python
for i, folder in enumerate(to_run_folders):
    if get_size(folder) < byte_threshold:
        ans = raw_input(('The folder {0}/ is less than {1}MB.' + \
                    ' Would you like to exclude it from' + \
                    ' compression? ').format(folder, megabyte_threshold))
        if 'y' in ans.strip().lower():
            to_run_folders.pop(i)
```
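I would like to look at each folder in the list. If the current folder is smaller than a certain size, I want to ask the user whether they want to exclude it. If they do, pop the folder from the list.
The problem with this routine is that if I iterate over the list itself, I get unexpected behavior and early termination. If I iterate over a copy made by slicing, pop doesn't remove the right value, because the indices have shifted, and the problem compounds as more items are popped. I also need this kind of dynamic list adjustment in other areas of my script. Is there a clean method for this kind of functionality?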
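Both symptoms can be reproduced without any user input. A minimal sketch, assuming a hypothetical `should_remove` predicate in place of the interactive prompt (the function names are illustrative, not from the question); it runs on Python 2.6+ and 3:

```python
def pop_while_iterating(lst, should_remove):
    """Buggy on purpose: pops from lst while enumerating lst itself."""
    for i, item in enumerate(lst):
        if should_remove(item):
            lst.pop(i)  # later items shift left, so the next one is skipped
    return lst


def pop_while_iterating_copy(lst, should_remove):
    """Also buggy: iterates over a copy but pops using the copy's indices."""
    for i, item in enumerate(lst[:]):
        if should_remove(item):
            lst.pop(i)  # i is stale as soon as anything has been popped
    return lst


def is_odd(x):
    return x % 2 != 0


print(pop_while_iterating([1, 3, 5, 6], is_odd))       # [3, 6]: 3 was skipped
print(pop_while_iterating_copy([1, 2, 3, 4], is_odd))  # [2, 3]: 4 popped instead of 3
```

In the first function the loop also ends early, because the list shrinks underneath the iterator.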
You can loop over the list backwards, or use a view object.
See https://stackoverflow.com/a/181062/711085 for how to loop over a list backwards. Basically use
If you need the index, you can do
```python
for i in xrange(len(yourList)-1, -1, -1):
    item = yourList[i]
    ...
```
Or, even cleaner:
```python
for i in reversed(range(len(yourList))):
    item = yourList[i]
    ...
```
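The backward loop works with any keep/remove decision. A sketch, assuming a hypothetical `should_remove(item)` predicate that stands in for the interactive prompt: `del lst[i]` only shifts the items after `i`, which have already been visited, so no index that is still needed goes stale.

```python
def remove_backwards(lst, should_remove):
    """Delete, in place, every item for which should_remove(item) is true."""
    for i in reversed(range(len(lst))):
        if should_remove(lst[i]):
            del lst[i]  # only already-visited items (after i) shift left
    return lst


folders = ['a', 'bb', 'ccc', 'dddd']
same = folders  # a second name for the same list object
remove_backwards(folders, lambda name: len(name) < 3)  # hypothetical size test
print(folders)          # ['ccc', 'dddd']
print(same is folders)  # True: the list was modified in place
```

One consequence for the interactive case: the user is asked about the folders in reverse order.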
(Proof: you can do
You can loop backwards
Backwards:
```python
x = range(10)
l = len(x)-1  # max index
for i, v in enumerate(reversed(x)):
    if v % 2:
        x.pop(l-i)  # l-i is the forward index
```
OK, I have measured the solutions. The reversed solutions are about the same. The forward
```python
import timeit
import random

def solution_ninjagecko1(lst):
    for i in xrange(len(lst)-1, -1, -1):
        if lst[i] % 2 != 0:    # simulation of the choice
            del lst[i]
    return lst

def solution_jdi(lst):
    L = len(lst) - 1
    for i, v in enumerate(reversed(lst)):
        if v % 2 != 0:
            lst.pop(L-i)  # L-i is the forward index
    return lst

def solution_Patrik(lst):
    for i, v in enumerate(lst):
        if v % 2 != 0:         # simulation of the choice
            lst[i] = None
    return [v for v in lst if v is not None]

def solution_Patrik2(lst):
    ##buggy lst = [v for v in lst if v % 2 != 0]
    ##buggy return [v for v in lst if v is not None]
    # ... corrected to keep the even values, like the other solutions
    return [v for v in lst if v % 2 == 0]

def solution_pepr(lst):
    i = 0                      # indexing the processed item
    n = 0                      # enumerating the original position
    while i < len(lst):
        if lst[i] % 2 != 0:    # simulation of the choice
            del lst[i]         # i unchanged if item deleted
        else:
            i += 1             # i moved to the next
        n += 1
    return lst

def solution_pepr_reversed(lst):
    i = len(lst) - 1           # indexing the processed item backwards
    while i >= 0:              # >= so that the first item is checked too
        if lst[i] % 2 != 0:    # simulation of the choice
            del lst[i]         # items before i do not move
        i -= 1                 # i moved to the previous
    return lst

def solution_steveha(lst):
    def should_keep(x):
        return x % 2 == 0
    return filter(should_keep, lst)

orig_lst = range(30)
print 'range() generated list of the length', len(orig_lst)
print orig_lst[:20] + ['...']   # to have some fun :)

lst = orig_lst[:]  # copy of the list
print solution_ninjagecko1(lst)

lst = orig_lst[:]  # copy of the list
print solution_jdi(lst)

lst = orig_lst[:]  # copy of the list
print solution_Patrik(lst)

lst = orig_lst[:]  # copy of the list
print solution_pepr(lst)

orig_lst = [random.randint(1, 1000000) for n in xrange(100000)]
print ' random list of the length', len(orig_lst)
print orig_lst[:20] + ['...']   # to have some fun :)

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_ninjagecko1(lst)',
                  'from __main__ import solution_ninjagecko1, lst',
                  number=1)
print 'solution_ninjagecko1: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_jdi(lst)',
                  'from __main__ import solution_jdi, lst',
                  number=1)
print 'solution_jdi: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_Patrik(lst)',
                  'from __main__ import solution_Patrik, lst',
                  number=1)
print 'solution_Patrik: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_Patrik2(lst)',
                  'from __main__ import solution_Patrik2, lst',
                  number=1)
print 'solution_Patrik2: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_pepr_reversed(lst)',
                  'from __main__ import solution_pepr_reversed, lst',
                  number=1)
print 'solution_pepr_reversed: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_pepr(lst)',
                  'from __main__ import solution_pepr, lst',
                  number=1)
print 'solution_pepr: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_steveha(lst)',
                  'from __main__ import solution_steveha, lst',
                  number=1)
print 'solution_steveha: ', t
```
It prints on my console:
```
c:\tmp\_Python\Patrick\so10305762>python a.py
range() generated list of the length 30
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, '...']
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
 random list of the length 100000
[915411, 954538, 794388, 847204, 846603, 454132, 866165, 640004, 930488, 609138, 333405, 986073, 318301, 728151, 996047, 117633, 455353, 581737, 55350, 485030, '...']
solution_ninjagecko1: 2.41921752625
solution_jdi: 2.45477176569
solution_Patrik: 0.0468565138865
solution_Patrik2: 0.024270403082
solution_pepr_reversed: 2.43338888043
solution_pepr: 9.11879694207
```
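So I tried longer lists. Merely doubling the length makes a big difference (on my old computer). Patrik's dirty solution behaves very well. It is about 200 times faster than the reversed solutions: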
```
random list of the length 200000
[384592, 170167, 598270, 832363, 123557, 81804, 319315, 445945, 178732, 726600, 516835, 392267, 552608, 40807, 349215, 208111, 880032, 520614, 384119, 350090, '...']
solution_ninjagecko1: 17.362140719
solution_jdi: 17.86837545
solution_Patrik: 0.0957998851809
solution_Patrik2: 0.0500024444448
solution_pepr_reversed: 17.5078452708
solution_pepr: 52.175648581
```
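A rough model of why the gap widens with length: in CPython, `del lst[i]` shifts every element to the right of `i` one slot to the left, so deleting back to front costs work proportional to the survivors to the right of each deleted index, while the list-comprehension approach touches each element once. The sketch below only counts those shifts under that assumption (deleting every odd index stands in for the "odd value" choice); it does not predict real timings, which also depend on memory and cache effects, and the timings above grew even faster than this model's factor of four.

```python
def shifts_for_backward_del(n, remove):
    """Count element shifts when deleting indices of range(n) back to front.

    Assumes each `del lst[i]` moves every element currently after i.
    `remove(i)` decides whether the item at original index i is deleted.
    """
    shifts = 0
    survivors_right = 0  # kept elements currently to the right of i
    for i in range(n - 1, -1, -1):
        if remove(i):
            shifts += survivors_right
        else:
            survivors_right += 1
    return shifts


def odd_index(i):
    return i % 2 == 1


print(shifts_for_backward_del(1000, odd_index))  # 124750
print(shifts_for_backward_del(2000, odd_index))  # 499500, about 4x as many
```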
[Added after Ninjagecko's comment]
The corrected Patrik2 solution is about twice as fast as the Patrik solution.
To simulate deleting elements only rarely, like
```
random list of the length 500000
[403512, 138135, 552313, 427971, 42358, 500926, 686944, 304889, 916659, 112636, 791585, 461948, 82622, 522768, 485408, 774048, 447505, 830220, 791421, 580706, '...']
solution_ninjagecko1: 6.79284210703
solution_jdi: 6.84066913532
solution_Patrik: 0.241242951269
solution_Patrik2: 0.162481823807
solution_pepr_reversed: 6.92106007886
solution_pepr: 7.12900522273
```
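Patrik's solution is still about 30 times faster than the reversed ones.
[Added 2012/04/25]
Another working solution loops forward and is as fast as Patrik's. It does not shift the whole tail when an element is removed. Instead, it moves the wanted elements to their final positions and then cuts off the unused tail of the list.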
```python
def solution_pepr2(lst):
    i = 0
    for v in lst:
        lst[i] = v              # moving the element (sometimes unnecessary)
        if v % 100 != 0:        # simulation of the choice
            i += 1              # here will be the next one
    lst[i:] = []                # cut off the tail (as long as the number of skipped items)
    return lst

# The following one only adds the enumerate to simulate the situation when
# it is needed -- i.e. slightly slower but with the same complexity.
def solution_pepr2enum(lst):
    i = 0
    for n, v in enumerate(lst):
        lst[i] = v              # moving the element (sometimes unnecessary)
        if v % 100 != 0:        # simulation of the choice
            i += 1              # here will be the next one
    lst[i:] = []                # cut off the tail (as long as the number of skipped items)
    return lst
```
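The same compaction idea can take an arbitrary decision function, which is what the folder question needs. A sketch, assuming a hypothetical `keep(item)` callable in place of the `v % 100 != 0` simulation; unlike `solution_pepr2`, it writes only the kept items, and it visits items front to back, so interactive prompts would appear in the original order:

```python
def keep_in_place(lst, keep):
    """Compact lst in place so that it holds only items where keep(item) is true."""
    write = 0  # index of the next free slot for a kept item
    for item in lst:  # safe: writes land at or before the read position
        if keep(item):
            lst[write] = item
            write += 1
    del lst[write:]  # cut off the unused tail
    return lst


numbers = list(range(1, 11))
alias = numbers
keep_in_place(numbers, lambda v: v % 2 == 0)
print(numbers)           # [2, 4, 6, 8, 10]
print(alias is numbers)  # True: same list object
```

The list length never changes inside the loop, which is why iterating over `lst` while assigning into it is safe here, unlike `pop()`.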
Compared with the solutions above:
```
random list of the length 500000
[533094, 600755, 58260, 295962, 347612, 851487, 523927, 665648, 537403, 238660, 781030, 940052, 878919, 565870, 717745, 408465, 410781, 560173, 51010, 730322, '...']
solution_ninjagecko1: 1.38956896051
solution_jdi: 1.42314502685
solution_Patrik: 0.135545530079
solution_Patrik2: 0.0926935780151
solution_pepr_reversed: 1.43573239178
solution_steveha: 0.122824246805
solution_pepr2: 0.0938177241656
solution_pepr2enum: 0.11096263294
```
Currently I have the very dirty solutions of a) setting items in the list that I want to remove to False and removing them with a filter or list comprehension or b) generating an entirely new list while going through the loop, which seems to be needlessly adding variables to the namespace and taking up memory.
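Actually, that is not such a dirty solution. How long is the list, typically? Even creating a new list should not consume much memory, since the list contains only references.
You can also …
The best way to handle this, and the most "Pythonic", is actually to loop over your list and build a new list containing only the folders you want. Here is how I would do it: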
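To illustrate the point about references: a filtered list stores pointers to the same objects rather than copies of them. A small sketch with made-up folder names:

```python
import sys

folders = ['folder_%03d' % n for n in range(1000)]  # hypothetical folder names
kept = [f for f in folders if not f.endswith('7')]

# Every entry in the new list is the very same string object as in the old one.
same_objects = set(map(id, kept)) <= set(map(id, folders))
print(same_objects)  # True: no string was copied

# sys.getsizeof reports only the list's own pointer array, not the strings it refers to.
print(sys.getsizeof(kept) < sum(sys.getsizeof(f) for f in kept))  # True on CPython
```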
```python
def want_folder(fname):
    if get_size(fname) >= byte_threshold:
        return True
    ans = raw_input(('The folder {0}/ is less than {1}MB.' + \
                ' Would you like to exclude it from' + \
                ' compression? ').format(fname, megabyte_threshold))
    return 'y' not in ans.strip().lower()

to_run_folders = [fname for fname in to_run_folders if want_folder(fname)]
```
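One caveat with `to_run_folders = [...]`: it rebinds the name to a new list, so any other variable or object still holding the old list keeps the old contents. If the list really has to change in place, as the question asks, assigning to the full slice replaces the contents of the existing object. A sketch, where `keep` plays the role of `want_folder` and the second reference (`config`) is hypothetical:

```python
def filter_in_place(lst, keep):
    """Replace the contents of lst (same object) with the items keep() accepts."""
    lst[:] = [item for item in lst if keep(item)]
    return lst


to_run = ['a', 'big_folder', 'b']
config = {'folders': to_run}  # hypothetical second reference to the same list

filter_in_place(to_run, lambda name: len(name) > 1)
print(config['folders'])  # ['big_folder']: the other reference sees the change
```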
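If your list is really large, you might need to worry about the performance of this solution and use dirty tricks. But if your list is that large, it may be a bit crazy to have a human answer yes/no questions about every file that might come up.
Is performance an actual problem or just a nagging worry? I'm pretty sure the code above is fast enough for practical use, and it is easier to understand and modify than tricky code.
Edit: @jdi suggested in the comments using
I tested it, and this should actually be faster than what I showed above: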
```python
to_run_folders = filter(want_folder, to_run_folders)
```
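The `filter` version relies on Python 2, where `filter()` returns a list. On Python 3 it returns a lazy iterator, so the predicate (and therefore any prompt inside it) only runs when the result is consumed. A small sketch that records when a stand-in predicate is called; wrapping the call in `list()` gives a list on both versions:

```python
calls = []


def keep_even(x):
    calls.append(x)  # record each call to see when the predicate runs
    return x % 2 == 0


result = filter(keep_even, [1, 2, 3, 4])
# Python 2: calls == [1, 2, 3, 4] already; Python 3: calls == [] so far
result = list(result)
print(result)  # [2, 4]
print(calls)   # [1, 2, 3, 4] on both versions
```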
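I just copied @pepr's benchmark code and used
Edit: just for fun, I went ahead and wrote a pure list-comprehension version. It has a single expression to evaluate and no call to a helper function such as `want_folder`.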
```python
to_run_folders = [fname for fname in to_run_folders
        if get_size(fname) >= byte_threshold or
        'y' not in raw_input(('The folder {0}/ is less than {1}MB.' +
        ' Would you like to exclude it from compression? '
        ).format(fname, megabyte_threshold)).strip().lower()
]
```
Yuck! I much prefer making a function.