Python multiprocessing vs threading for CPU-bound work on Windows and Linux
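So I knocked together some test code to see how the multiprocessing module would scale on CPU-bound work compared to threading. On Linux, I get the performance increase I'd expect: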
```
linux (dual quad core xeon):
serialrun took 1192.319 ms
parallelrun took 346.727 ms
threadedrun took 2108.172 ms
```
My dual-core MacBook Pro shows the same behavior:
```
osx (dual core macbook pro)
serialrun took 2026.995 ms
parallelrun took 1288.723 ms
threadedrun took 5314.822 ms
```
I then tried it on a Windows machine and got some very different results.
```
windows (i7 920):
serialrun took 1043.000 ms
parallelrun took 3237.000 ms
threadedrun took 2343.000 ms
```
Why, oh why, is the multiprocessing approach so much slower on Windows?
Here's the test code:
```python
#!/usr/bin/env python

import multiprocessing
import threading
import time

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%s took %0.3f ms' % (func.func_name, (t2-t1)*1000.0)
        return res
    return wrapper


def counter():
    for i in xrange(1000000):
        pass

@print_timing
def serialrun(x):
    for i in xrange(x):
        counter()

@print_timing
def parallelrun(x):
    proclist = []
    for i in xrange(x):
        p = multiprocessing.Process(target=counter)
        proclist.append(p)
        p.start()

    for i in proclist:
        i.join()

@print_timing
def threadedrun(x):
    threadlist = []
    for i in xrange(x):
        t = threading.Thread(target=counter)
        threadlist.append(t)
        t.start()

    for i in threadlist:
        i.join()

def main():
    serialrun(50)
    parallelrun(50)
    threadedrun(50)

if __name__ == '__main__':
    main()
```
The Python documentation for multiprocessing blames the lack of os.fork() for the problems on Windows. That may apply here.
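To see how much of the gap is pure process start-up cost, one can time processes whose target does nothing. This is a minimal sketch in Python 3 syntax (the test code in the question is Python 2); `noop`, `time_startup`, and the count of 10 are illustrative names and values, not taken from the original post.

```python
import multiprocessing
import time


def noop():
    """Do nothing, so the measured time is almost entirely start-up and teardown."""
    pass


def time_startup(n):
    """Start and join n do-nothing processes; return elapsed milliseconds."""
    t1 = time.time()
    procs = [multiprocessing.Process(target=noop) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return (time.time() - t1) * 1000.0


if __name__ == '__main__':
    # The guard matters where there is no fork(): each child re-imports this
    # module, and unguarded process creation would run again in every child.
    n = 10  # illustrative count
    print('%d empty processes took %0.3f ms' % (n, time_startup(n)))
```

Comparing this figure across platforms should show whether process creation alone accounts for the difference seen in `parallelrun`.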
See what happens when you import psyco. First, easy_install it:
```
C:\Users\hughdbrown>\Python26\scripts\easy_install.exe psyco
Searching for psyco
Best match: psyco 1.6
Adding psyco 1.6 to easy-install.pth file

Using c:\python26\lib\site-packages
Processing dependencies for psyco
Finished processing dependencies for psyco
```
Add this to the top of your Python script:
```python
import psyco
psyco.full()
```
Without psyco, I get these results:
```
serialrun took 1191.000 ms
parallelrun took 3738.000 ms
threadedrun took 2728.000 ms
```
With psyco, I get these results:
```
serialrun took 43.000 ms
parallelrun took 3650.000 ms
threadedrun took 265.000 ms
```
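Parallel is still slow, but the others burn rubber.
Edit: also, try it with the multiprocessing pool. (This is my first time trying this, and it is so fast that I figure I must be missing something.)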
```python
@print_timing
def parallelpoolrun(reps):
    pool = multiprocessing.Pool(processes=4)
    result = pool.apply_async(counter, (reps,))
```
Results:
```
C:\Users\hughdbrown\Documents\python\StackOverflow>python 1289813.py
serialrun took 57.000 ms
parallelrun took 3716.000 ms
parallelpoolrun took 128.000 ms
threadedrun took 58.000 ms
```
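One likely reason the pool version looks so fast: `apply_async` returns immediately, `result` is never waited on, and `counter()` takes no arguments, so the call with `(reps,)` would fail inside the worker without the error ever surfacing. Below is a hedged sketch, in Python 3 syntax, of a version that does the same amount of work as `serialrun` and waits for it. `counter_task` and `run_in_pool` are hypothetical names, and the pool is passed in so the caller decides how it is created.

```python
import multiprocessing


def counter_task(_):
    """Same busy loop as counter(); the ignored argument lets it be used with map()."""
    for i in range(1000000):
        pass
    return True


def run_in_pool(pool, reps):
    """Run counter_task reps times on the pool and block until every call is done."""
    # map() waits for all results and re-raises any exception from a worker,
    # unlike an apply_async() whose result is never collected.
    return pool.map(counter_task, range(reps))


if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    try:
        results = run_in_pool(pool, 50)
        print('%d tasks completed' % len(results))
    finally:
        pool.close()
        pool.join()
```

Timing `run_in_pool` with the same decorator should give a number that is comparable to the other three runs.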
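Under UNIX variants, processes are much lighter. Windows processes are heavy and take a long time to start up. Threads are the recommended way of doing multiprocessing on Windows.
It's been said that creating processes on Windows is more expensive than on Linux. If you search around the site, you will find some information. Here's one I found easily.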
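Just starting the pool takes a long time. I have found in "real world" programs that if I can keep a pool open and reuse it for many different jobs, passing the reference down through method calls (usually using map_async), then on Linux I can save a few percent, but on Windows I can often halve the time taken. Linux is always quicker for my particular problems, but even on Windows I get a net benefit from multiprocessing.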
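The reuse pattern described above can be sketched as follows: create the pool once, then submit several batches to it with `map_async`, so the start-up cost is paid a single time. This is a sketch in Python 3 syntax; `square`, `run_batches`, and the batch contents are hypothetical stand-ins for real work.

```python
import multiprocessing


def square(x):
    """Stand-in for a real CPU-bound task."""
    return x * x


def run_batches(pool, batches):
    """Submit every batch to the same pool, then collect the results in order."""
    # map_async() returns immediately, so all batches are queued before any wait.
    pending = [pool.map_async(square, batch) for batch in batches]
    # get() blocks until that batch is finished and re-raises worker errors.
    return [p.get() for p in pending]


if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)  # pay the start-up cost once
    try:
        print(run_batches(pool, [[1, 2, 3], [4, 5], [6]]))
    finally:
        pool.close()
        pool.join()
```

Passing `pool` as an argument, rather than creating it inside `run_batches`, is what lets many call sites share the same worker processes.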
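Currently, your counter() function is not modifying much state. Try changing counter() so that it modifies many pages of memory. Then run a CPU-bound loop. See if there is still a large disparity between Linux and Windows.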
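A sketch of what such a variant of `counter()` might look like: it writes one byte into every page of a large buffer before running the CPU-bound loop. The name `counter_touching_pages`, the 4096-byte page size, and the 1024-page buffer are assumptions chosen for illustration; the actual page size depends on the platform.

```python
PAGE_SIZE = 4096   # assumed page size; the real value is platform-dependent
NUM_PAGES = 1024   # illustrative: about 4 MB of memory

# Module-level buffer, so it already exists when a child process is created.
buf = bytearray(PAGE_SIZE * NUM_PAGES)


def counter_touching_pages():
    """Write one byte per page, then run the same busy loop as counter()."""
    touched = 0
    for offset in range(0, len(buf), PAGE_SIZE):
        # Writing dirties the page; under fork(), a copy-on-write page is copied here.
        buf[offset] = 1
        touched += 1
    for i in range(1000000):
        pass
    return touched
```

It can be passed as the `target` in `parallelrun` and `threadedrun` in place of `counter`.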
I'm not running Python 2.6 right now, so I can't try it myself.