Fast way to remove a few items from a list/queue
This is a follow-up to a similar question, which asked for the best way to write

    for item in somelist:
        if determine(item):
            code_to_remove_item

The consensus seems to be something like

    somelist[:] = [x for x in somelist if not determine(x)]
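For readers unsure what the slice assignment buys over a plain rebinding, here is a small sketch. The predicate `determine` and the sample data are made up for illustration; the point is only that `somelist[:] = ...` mutates the existing list object, so other names bound to it see the change, while `somelist = ...` just rebinds one name.

```python
def determine(x):
    # Hypothetical predicate for this demo: remove negative numbers.
    return x < 0

# Slice assignment: the existing list object is modified in place.
somelist = [3, -1, 4, -1, 5]
alias = somelist
somelist[:] = [x for x in somelist if not determine(x)]

# Plain assignment: only the name is rebound, the old list is untouched.
other = [3, -1, 4]
other_alias = other
other = [x for x in other if not determine(x)]

print(alias)        # [3, 4, 5]
print(other_alias)  # [3, -1, 4]
```

Either way a new filtered list is built first; the slice form then copies its references back into the original object.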
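However, I think that if you are only removing a few items, most of the items are being copied into the same object, and that may be slow. In an answer to another related question, someone suggests:

    for item in reversed(somelist):
        if determine(item):
            somelist.remove(item)

However, here, ...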
Update: I also ran some timing tests, with the following code:

    import timeit

    setup = """
    import random
    random.seed(1)
    b = [(random.random(),random.random()) for i in xrange(1000)]
    c = []
    def tokeep(x):
        return (x[1]>.45) and (x[1]<.5)
    """
    listcomp = """
    c[:] = [x for x in b if tokeep(x)]
    """
    filt = """
    c = filter(tokeep, b)
    """
    print "list comp =", timeit.timeit(listcomp, setup, number = 10000)
    print "filtering =", timeit.timeit(filt, setup, number = 10000)
which gives:

    list comp = 4.01255393028
    filtering = 3.59962391853
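The benchmark above is Python 2 code (`xrange`, the `print` statement). A rough Python 3 port is sketched below, in case anyone wants to rerun it. One thing changes the comparison: in Python 3 `filter` returns a lazy iterator, so it has to be wrapped in `list()` or it does almost no work. The function names `with_listcomp` and `with_filter` are my own, and the repeat count is lowered to keep the run short.

```python
import random
import timeit

random.seed(1)
b = [(random.random(), random.random()) for _ in range(1000)]

def tokeep(x):
    return 0.45 < x[1] < 0.5

def with_listcomp(items):
    return [x for x in items if tokeep(x)]

def with_filter(items):
    # filter() is lazy in Python 3; list() forces the actual filtering.
    return list(filter(tokeep, items))

if __name__ == "__main__":
    for name, fn in (("list comp", with_listcomp), ("filtering", with_filter)):
        elapsed = timeit.timeit(lambda: fn(b), number=1000)
        print(name, "=", elapsed)
```

Timings will differ from the Python 2 numbers quoted above, so treat any result as specific to your interpreter and machine.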
The list comprehension is the asymptotically optimal solution:

    somelist = [x for x in somelist if not determine(x)]
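It makes only one pass over the list, so it runs in O(n) time. Since you need to call determine() on every object, any algorithm will require at least O(n) operations. The list comprehension does have to do some copying, but it only copies references to the objects, not the objects themselves.
In Python, removing an item from a list is O(n), so anything with a remove, pop, or del inside the loop will be O(n**2).
Also, in CPython list comprehensions are faster than for loops.
A deque is optimized for removal at the head and tail, not for arbitrary removal in the middle. The removal itself is fast, but you still have to traverse the list to the removal point. If you are traversing the entire length anyway, then filtering a deque and filtering a list (using ...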
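To make the "references, not objects" point concrete, here is a small sketch. The `Payload` class and the predicate are invented for the demo; the identity checks show that the filtered list holds the very same objects as the original.

```python
class Payload:
    """Hypothetical stand-in for a large object stored in the list."""
    def __init__(self, tag):
        self.tag = tag

def determine(obj):
    # Hypothetical predicate: remove objects tagged "drop".
    return obj.tag == "drop"

somelist = [Payload("keep"), Payload("drop"), Payload("keep")]
kept = [x for x in somelist if not determine(x)]

# The new list is a different list object holding the same element objects.
print(kept is somelist)        # False
print(kept[0] is somelist[0])  # True
print(kept[1] is somelist[2])  # True
```

So the cost of the copy is one pointer per surviving element, regardless of how big each element is.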
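If you want to see the difference on your own machine, a rough timing sketch follows. The predicate and function names are assumptions for the demo. The `del` version walks indices backwards so that deletions do not shift items it has not visited yet, but each `del` still has to move the tail of the list.

```python
import timeit

def determine(x):
    # Hypothetical predicate for this demo: remove even numbers.
    return x % 2 == 0

def remove_with_del(items):
    # Iterate from the end so earlier indices stay valid after each delete.
    for i in range(len(items) - 1, -1, -1):
        if determine(items[i]):
            del items[i]  # shifts every later element: O(n) per delete
    return items

def remove_with_listcomp(items):
    # One pass to build the filtered list, then copy it back in place.
    items[:] = [x for x in items if not determine(x)]
    return items

if __name__ == "__main__":
    for n in (1_000, 10_000):
        for fn in (remove_with_del, remove_with_listcomp):
            elapsed = timeit.timeit(lambda: fn(list(range(n))), number=20)
            print(n, fn.__name__, elapsed)
```

With only a handful of deletions the `del` loop may well be competitive; the quadratic behaviour shows up when many items are removed from a long list.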
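You can avoid the copying like this, but I have no particular reason to believe it is faster than a straightforward list comprehension; it probably is not:

    # L is the list being filtered in place; items equal to 'a' or 'c' are dropped.
    write_i = 0
    for read_i in range(len(L)):
        L[write_i] = L[read_i]
        if L[read_i] not in ['a', 'c']:
            write_i += 1
    del L[write_i:]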
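The read/write-index loop above can be wrapped into a reusable function. This is only a sketch: the name `remove_in_place` and the `determine` parameter are mine, and it writes a kept item only when it is actually kept, which is a small variation on the snippet above rather than the same code.

```python
def remove_in_place(items, determine):
    """Remove every x with determine(x) true, without building a new list."""
    write_i = 0
    for read_i in range(len(items)):
        item = items[read_i]
        if not determine(item):
            # Compact kept items towards the front of the list.
            items[write_i] = item
            write_i += 1
    # Everything from write_i on is leftover; drop it in one go.
    del items[write_i:]
    return items

L = ['a', 'b', 'c', 'd', 'a', 'e']
remove_in_place(L, lambda x: x in ('a', 'c'))
print(L)  # ['b', 'd', 'e']
```

This does a single pass and one tail deletion, so it stays O(n) while allocating no second list.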
Since ...

    for idx, item in enumerate(somelist):
        if determine(item):
            del somelist[idx]

But: you should not modify a list while iterating over it. It will bite you, sooner or later. First use ...
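The pitfall with deleting while iterating is easy to reproduce. In this sketch the predicate and sample data are made up; the loop has the same shape as the `enumerate`/`del` loop above.

```python
def determine(x):
    # Hypothetical predicate for this demo: remove even numbers.
    return x % 2 == 0

def delete_while_iterating(items):
    # Mutates the list that enumerate() is still walking over.
    for idx, item in enumerate(items):
        if determine(item):
            del items[idx]
    return items

def filter_copy(items):
    # Builds a new list instead of mutating during iteration.
    return [x for x in items if not determine(x)]

print(delete_while_iterating([2, 4, 6, 1, 8]))  # [4, 1]: the 4 was skipped
print(filter_copy([2, 4, 6, 1, 8]))             # [1]
```

After each deletion the following element slides into the index that was just visited, so the iterator steps over it without ever testing it.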
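If you need to remove items in O(1), you can use hashmaps.
I took a stab at this. My solution is slower, but requires less memory overhead (i.e. it does not create a new array). It might even be faster in some circumstances!
This code has been edited since it was first posted.
I had problems with timeit; I may be doing this wrong.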
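A minimal sketch of the hash-map idea, assuming the items are hashable and unique (the keys here are invented). A `dict` gives average O(1) deletion by key, and since Python 3.7 it also preserves insertion order, so it can stand in for an ordered collection when you remove by value rather than by position.

```python
# Keys act as an insertion-ordered set; the values are unused here.
pending = dict.fromkeys(["a", "b", "c", "d", "e"])

for key in ("a", "c", "zzz"):
    # pop() with a default removes the key if present and ignores it otherwise.
    pending.pop(key, None)

print(list(pending))  # ['b', 'd', 'e']
```

The trade-off is that you lose positional indexing and duplicate entries, so this fits queue-like workloads better than general lists.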
    import timeit
    setup = """

    import random
    random.seed(1)
    global b
    setup_b = [(random.random(), random.random()) for i in xrange(1000)]
    c = []
    def tokeep(x):
        return (x[1]>.45) and (x[1]<.5)

    # define and call to turn into psyco bytecode (if using psyco)
    b = setup_b[:]
    def listcomp():
        c[:] = [x for x in b if tokeep(x)]
    listcomp()

    b = setup_b[:]
    def filt():
        c = filter(tokeep, b)
    filt()

    b = setup_b[:]
    def forfilt():
        marked = (i for i, x in enumerate(b) if tokeep(x))
        shift = 0
        for n in marked:
            del b[n - shift]
            shift += 1
    forfilt()

    b = setup_b[:]
    def forfiltCheating():
        marked = (i for i, x in enumerate(b) if (x[1] > .45) and (x[1] < .5))
        shift = 0
        for n in marked:
            del b[n - shift]
            shift += 1
    forfiltCheating()

    """

    listcomp = """
    b = setup_b[:]

    listcomp()
    """

    filt = """
    b = setup_b[:]

    filt()
    """

    forfilt = """
    b = setup_b[:]

    forfilt()
    """

    forfiltCheating = '''
    b = setup_b[:]

    forfiltCheating()
    '''

    psycosetup = '''

    import psyco
    psyco.full()

    '''

    print "list comp =", timeit.timeit(listcomp, setup, number = 10000)
    print "filtering =", timeit.timeit(filt, setup, number = 10000)
    print 'forfilter = ', timeit.timeit(forfilt, setup, number = 10000)
    print 'forfiltCheating = ', timeit.timeit(forfiltCheating, setup, number = 10000)

    print ' now with psyco '
    print "list comp =", timeit.timeit(listcomp, psycosetup + setup, number = 10000)
    print "filtering =", timeit.timeit(filt, psycosetup + setup, number = 10000)
    print 'forfilter = ', timeit.timeit(forfilt, psycosetup + setup, number = 10000)
    print 'forfiltCheating = ', timeit.timeit(forfiltCheating, psycosetup + setup, number = 10000)
Here are the results:

    list comp = 6.56407690048
    filtering = 5.64738512039
    forfilter = 7.31555104256
    forfiltCheating = 4.8994679451

     now with psyco
    list comp = 8.0485959053
    filtering = 7.79016900063
    forfilter = 9.00477004051
    forfiltCheating = 4.90830993652

I must be doing something wrong with psyco, because it actually runs slower.
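One thing that may be worth checking in `forfilt`: `marked` is a generator, so it keeps reading the list while the loop is deleting from it, and the `n - shift` correction then no longer lines up. The sketch below (Python 3, with function names and sample data of my own) compares that lazy shape with a variant that collects the indices first and deletes from the end.

```python
def is_even(x):
    # Hypothetical predicate for this demo.
    return x % 2 == 0

def remove_lazy(items, should_remove):
    # Same structure as forfilt: the generator reads items during deletion.
    marked = (i for i, x in enumerate(items) if should_remove(x))
    shift = 0
    for n in marked:
        del items[n - shift]
        shift += 1
    return items

def remove_eager(items, should_remove):
    # Collect all indices before mutating, then delete from the back
    # so the remaining indices stay valid.
    marked = [i for i, x in enumerate(items) if should_remove(x)]
    for i in reversed(marked):
        del items[i]
    return items

print(remove_lazy([2, 1, 4, 3], is_even))   # [4, 3]
print(remove_eager([2, 1, 4, 3], is_even))  # [1, 3]
```

The eager variant gives up some of the memory savings, since it stores one index per removed item, but that is still small when only a few items are removed.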