Dead simple example of using Multiprocessing Queue, Pool and Locking
I tried reading the documentation at http://docs.python.org/dev/library/multiprocessing.html, but I'm still struggling with multiprocessing Queue, Pool and Locking. For now I was able to build the example below.
Regarding Queue and Pool, I'm not sure whether I understood the concept the right way, so correct me if I'm wrong. What I'm trying to achieve is to process 2 requests at a time (the data list has 8 entries in this example). What should I use: a Pool that creates 2 processes which can handle two different queues (2 at maximum), or should I just use Queue to process 2 inputs each time? The Lock would be for printing the output correctly.
```python
import multiprocessing
import time

data = (
    ['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
    ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7'],
)

def mp_handler(var1):
    for indata in var1:
        p = multiprocessing.Process(target=mp_worker, args=(indata[0], indata[1]))
        p.start()

def mp_worker(inputs, the_time):
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)

if __name__ == '__main__':
    mp_handler(data)
```
For your problem, the best solution is to use a `Pool`.
Here is a slightly rearranged version of your program, this time with only 2 processes corralled into a `Pool`:
```python
import multiprocessing
import time

data = (
    ['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
    ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7'],
)

def mp_worker(args):
    # Pool.map passes each element of `data` as one argument,
    # so unpack the [inputs, the_time] pair here.
    inputs, the_time = args
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)

def mp_handler():
    p = multiprocessing.Pool(2)
    p.map(mp_worker, data)

if __name__ == '__main__':
    mp_handler()
```
Note that `mp_worker` receives each `[inputs, the_time]` pair as a single argument, because `Pool.map` passes one element of `data` per call, and that the pool keeps exactly 2 worker processes busy at a time.
Output:
```
Process a   Waiting 2 seconds
Process b   Waiting 4 seconds
Process a   DONE
Process c   Waiting 6 seconds
Process b   DONE
Process d   Waiting 8 seconds
Process c   DONE
Process e   Waiting 1 seconds
Process e   DONE
Process f   Waiting 3 seconds
Process d   DONE
Process g   Waiting 5 seconds
Process f   DONE
Process h   Waiting 7 seconds
Process g   DONE
Process h   DONE
```
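On Python 3 you can avoid the single-tuple argument altogether: `Pool.starmap` unpacks each pair into separate parameters. A minimal sketch under that assumption, reusing the same `data`:

```python
import multiprocessing
import time

data = (
    ['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
    ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7'],
)

def mp_worker(inputs, the_time):
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)

def mp_handler():
    with multiprocessing.Pool(2) as p:
        # starmap unpacks each ['a', '2'] pair into (inputs, the_time)
        p.starmap(mp_worker, data)

# Run with: mp_handler()  (inside an `if __name__ == '__main__':` guard)
```

The `with` block also takes care of closing the pool when the work is done.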
Edit, per the @thales comment below:
If you want "one lock per pool batch", so that your processes run in tandem pairs, à la:

A waiting, B waiting → A done, B done → C waiting, D waiting → C done, D done → …

then change the handler function to launch a pool (of 2 processes) for each pair of data:
```python
def mp_handler():
    # Pair up the data: (data[0], data[1]), (data[2], data[3]), ...
    subdata = zip(data[0::2], data[1::2])
    for task1, task2 in subdata:
        p = multiprocessing.Pool(2)
        p.map(mp_worker, (task1, task2))
        p.close()  # no more tasks for this pool
        p.join()   # release the two workers before starting the next pair
```
Now your output is:
```
Process a   Waiting 2 seconds
Process b   Waiting 4 seconds
Process a   DONE
Process b   DONE
Process c   Waiting 6 seconds
Process d   Waiting 8 seconds
Process c   DONE
Process d   DONE
Process e   Waiting 1 seconds
Process f   Waiting 3 seconds
Process e   DONE
Process f   DONE
Process g   Waiting 5 seconds
Process h   Waiting 7 seconds
Process g   DONE
Process h   DONE
```
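The same pairwise behavior can also be had with a single pool reused across pairs, since each blocking `map()` call finishes one pair before the next begins. A sketch, assuming the `data` layout from above:

```python
import multiprocessing
import time

data = (
    ['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
    ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7'],
)

def mp_worker(args):
    inputs, the_time = args
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)

def mp_handler():
    with multiprocessing.Pool(2) as p:
        for pair in zip(data[0::2], data[1::2]):
            # map() blocks until both tasks of this pair are done
            p.map(mp_worker, pair)

# Run with: mp_handler()  (inside an `if __name__ == '__main__':` guard)
```

This avoids paying the process start-up cost for every pair.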
Here is my personal go-to for this topic.

Gist here (pull requests welcome!): https://gist.github.com/thorsummoner/b5b1dfcff7e7fd334ec
```python
import multiprocessing
import sys

THREADS = 3

# Used to prevent multiple workers from mixing their output
GLOBALLOCK = multiprocessing.Lock()


def func_worker(args):
    """This function will be called by each worker.

    This function can not be a class method.
    """
    # Expand list of args into named args.
    str1, str2 = args
    del args

    # Work
    # ...

    # Serial-only portion
    GLOBALLOCK.acquire()
    print(str1)
    print(str2)
    GLOBALLOCK.release()


def main(argp=None):
    """Multiprocessing spawn example."""
    # Create the number of workers you want
    pool = multiprocessing.Pool(THREADS)

    # Define two jobs, each with two args.
    func_args = [
        ('Hello', 'World'),
        ('Goodbye', 'World'),
    ]

    try:
        # map_async(...).get(timeout) instead of plain map(), so that
        # KeyboardInterrupt can reach the main process; 9999999 is just
        # a very large timeout in seconds.
        pool.map_async(func_worker, func_args).get(9999999)
    except KeyboardInterrupt:
        # Allow ^C to interrupt from any worker.
        sys.stdout.write('\033[0m')
        sys.stdout.write('User Interrupt\n')
    pool.close()


if __name__ == '__main__':
    main()
```
This might not be 100% related to the question, but on my search for an example of using multiprocessing with a queue this shows up first on Google.

This is a basic example class that lets you instantiate it, put items in a queue, and wait until the queue is finished. That's all I needed.
```python
from multiprocessing import JoinableQueue, Process


class Renderer:
    def __init__(self, nb_workers=2):
        self.queue = JoinableQueue()
        self.processes = [Process(target=self.upload) for _ in range(nb_workers)]
        for p in self.processes:
            p.start()

    def render(self, item):
        self.queue.put(item)

    def upload(self):
        while True:
            item = self.queue.get()
            if item is None:
                break

            # process your item here

            self.queue.task_done()

    def terminate(self):
        """Wait until the queue is empty and terminate the processes."""
        self.queue.join()
        for p in self.processes:
            p.terminate()


r = Renderer()
r.render(item1)
r.render(item2)
r.terminate()
```
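Since the worker loop already exits on a `None` item, a gentler alternative to `terminate()` is to feed one `None` sentinel per worker and `join()` the processes instead of killing them. A sketch of that variant (the `stop` method name is my own):

```python
from multiprocessing import JoinableQueue, Process


class Renderer:
    def __init__(self, nb_workers=2):
        self.queue = JoinableQueue()
        self.processes = [Process(target=self.upload) for _ in range(nb_workers)]
        for p in self.processes:
            p.start()

    def render(self, item):
        self.queue.put(item)

    def upload(self):
        while True:
            item = self.queue.get()
            if item is None:
                self.queue.task_done()  # acknowledge the sentinel too
                break
            # process your item here
            self.queue.task_done()

    def stop(self):
        """Drain the queue, then let each worker exit via the None sentinel."""
        for _ in self.processes:
            self.queue.put(None)  # one sentinel per worker
        self.queue.join()         # every item (and sentinel) acknowledged
        for p in self.processes:
            p.join()              # workers leave their loop cleanly
```

Unlike `terminate()`, this never kills a worker in the middle of processing an item.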
For everyone using editors like Komodo Edit (Win10), add `sys.stdout.flush()` to:
```python
def mp_worker(args):
    inputs, the_time = args
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)
    sys.stdout.flush()
```
or as the first line:

```python
if __name__ == '__main__':
    sys.stdout.flush()
```
This helps to see what is going on while the script runs, rather than having to stare at a black command-line box.
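On Python 3 the same effect needs no separate `flush()` call, since `print` accepts a `flush` keyword. A small sketch of the worker written that way:

```python
import time

def mp_worker(args):
    inputs, the_time = args
    # flush=True forces each line out immediately, even when stdout
    # is block-buffered (e.g. inside an editor's output pane)
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time), flush=True)
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs, flush=True)
```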
Here is an example from my code (it uses a thread pool; just change the executor class and you have a process pool):

```python
from concurrent.futures import ThreadPoolExecutor

def execute_run(rp):
    ...  # do something

pool = ThreadPoolExecutor(6)
for mat in TESTED_MATERIAL:
    for en in TESTED_ENERGIES:
        for ecut in TESTED_E_CUT:
            rp = RunParams(
                simulations, DEST_DIR, PARTICLE, mat,
                960, 0.125,
                ecut, en
            )
            pool.submit(execute_run, rp)
pool.shutdown(wait=True)
```

Basically:

- `pool = ThreadPoolExecutor(6)` creates a pool of 6 threads.
- The nested `for` loops then add tasks to the pool.
- `pool.submit(execute_run, rp)` adds a task to the pool; the first argument is the function to call in the thread/process, and the remaining arguments are passed on to the called function.
- `pool.shutdown(wait=True)` waits for all tasks to complete (`concurrent.futures` executors have no `join` method; `shutdown` plays that role).
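As a self-contained sketch of the same pattern with processes instead of threads (the `square` task is made up for illustration):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as pool:
        # submit() schedules one call per task and returns a Future
        futures = [pool.submit(square, n) for n in range(5)]
        results = [f.result() for f in futures]
    # leaving the `with` block implies shutdown(wait=True)
    print(results)  # [0, 1, 4, 9, 16]
```

Collecting `Future.result()` in submission order keeps the results aligned with the inputs, even though the tasks may finish out of order.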