Dead simple example of using Multiprocessing Queue, Pool and Locking
I tried reading the documentation at http://docs.python.org/dev/library/multiprocessing.html, but I'm still struggling with multiprocessing Queue, Pool and Locking. For now I was able to build the example below.
Regarding Queue and Pool, I'm not sure whether I understood the concept the right way, so correct me if I'm wrong. What I'm trying to achieve is to process 2 requests at a time (the data list has 8 entries in this example). What should I use: a Pool that creates 2 processes which can handle two different queues (2 at maximum), or should I just use Queue to process 2 inputs each time? The Lock would be for printing the output correctly.
```python
import multiprocessing
import time

data = (
    ['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
    ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7'],
)

def mp_handler(var1):
    for indata in var1:
        p = multiprocessing.Process(target=mp_worker, args=(indata[0], indata[1]))
        p.start()

def mp_worker(inputs, the_time):
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)

if __name__ == '__main__':
    mp_handler(data)
```
For your problem, the best solution is to use a `Pool`.
Here is a slightly rearranged version of your program, this time with only 2 processes corralled into a `Pool`:
```python
import multiprocessing
import time

data = (
    ['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
    ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7'],
)

def mp_worker(args):
    # Pool.map passes each element of `data` as one argument,
    # so unpack the [inputs, the_time] pair here.
    inputs, the_time = args
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)

def mp_handler():
    p = multiprocessing.Pool(2)
    p.map(mp_worker, data)

if __name__ == '__main__':
    mp_handler()
```
Note that `mp_worker` receives each `[inputs, the_time]` pair as a single argument, because `Pool.map` passes one element of `data` per call, and that the pool keeps exactly 2 worker processes busy at a time.
Output:
```
Process a   Waiting 2 seconds
Process b   Waiting 4 seconds
Process a   DONE
Process c   Waiting 6 seconds
Process b   DONE
Process d   Waiting 8 seconds
Process c   DONE
Process e   Waiting 1 seconds
Process e   DONE
Process f   Waiting 3 seconds
Process d   DONE
Process g   Waiting 5 seconds
Process f   DONE
Process h   Waiting 7 seconds
Process g   DONE
Process h   DONE
```
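On Python 3 you can avoid the single-tuple argument altogether: `Pool.starmap` unpacks each pair into separate parameters. A minimal sketch under that assumption, reusing the same `data`:

```python
import multiprocessing
import time

data = (
    ['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
    ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7'],
)

def mp_worker(inputs, the_time):
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)

def mp_handler():
    with multiprocessing.Pool(2) as p:
        # starmap unpacks each ['a', '2'] pair into (inputs, the_time)
        p.starmap(mp_worker, data)

# Run with: mp_handler()  (inside an `if __name__ == '__main__':` guard)
```

The `with` block also takes care of closing the pool when the work is done.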
Edit, per the @thales comment below:
If you want "one lock per pool batch", so that your processes run in tandem pairs, à la:

A waiting, B waiting → A done, B done → C waiting, D waiting → C done, D done → …

then change the handler function to launch a pool (of 2 processes) for each pair of data:
```python
def mp_handler():
    # Pair up the data: (data[0], data[1]), (data[2], data[3]), ...
    subdata = zip(data[0::2], data[1::2])
    for task1, task2 in subdata:
        p = multiprocessing.Pool(2)
        p.map(mp_worker, (task1, task2))
        p.close()  # no more tasks for this pool
        p.join()   # release the two workers before starting the next pair
```
Now your output is:
```
Process a   Waiting 2 seconds
Process b   Waiting 4 seconds
Process a   DONE
Process b   DONE
Process c   Waiting 6 seconds
Process d   Waiting 8 seconds
Process c   DONE
Process d   DONE
Process e   Waiting 1 seconds
Process f   Waiting 3 seconds
Process e   DONE
Process f   DONE
Process g   Waiting 5 seconds
Process h   Waiting 7 seconds
Process g   DONE
Process h   DONE
```
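The same pairwise behavior can also be had with a single pool reused across pairs, since each blocking `map()` call finishes one pair before the next begins. A sketch, assuming the `data` layout from above:

```python
import multiprocessing
import time

data = (
    ['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
    ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7'],
)

def mp_worker(args):
    inputs, the_time = args
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)

def mp_handler():
    with multiprocessing.Pool(2) as p:
        for pair in zip(data[0::2], data[1::2]):
            # map() blocks until both tasks of this pair are done
            p.map(mp_worker, pair)

# Run with: mp_handler()  (inside an `if __name__ == '__main__':` guard)
```

This avoids paying the process start-up cost for every pair.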
Here is my personal go-to for this topic.

Gist here (pull requests welcome!): https://gist.github.com/thorsummoner/b5b1dfcff7e7fd334ec
```python
import multiprocessing
import sys

THREADS = 3

# Used to prevent multiple workers from mixing their output
GLOBALLOCK = multiprocessing.Lock()


def func_worker(args):
    """This function will be called by each worker.

    This function can not be a class method.
    """
    # Expand list of args into named args.
    str1, str2 = args
    del args

    # Work
    # ...

    # Serial-only portion
    GLOBALLOCK.acquire()
    print(str1)
    print(str2)
    GLOBALLOCK.release()


def main(argp=None):
    """Multiprocessing spawn example."""
    # Create the number of workers you want
    pool = multiprocessing.Pool(THREADS)

    # Define two jobs, each with two args.
    func_args = [
        ('Hello', 'World'),
        ('Goodbye', 'World'),
    ]

    try:
        # map_async(...).get(timeout) instead of plain map(), so that
        # KeyboardInterrupt can reach the main process; 9999999 is just
        # a very large timeout in seconds.
        pool.map_async(func_worker, func_args).get(9999999)
    except KeyboardInterrupt:
        # Allow ^C to interrupt from any worker.
        sys.stdout.write('\033[0m')
        sys.stdout.write('User Interrupt\n')
    pool.close()


if __name__ == '__main__':
    main()
```
This might not be 100% related to the question, but on my search for an example of using multiprocessing with a queue this shows up first on Google.

This is a basic example class that lets you instantiate it, put items in a queue, and wait until the queue is finished. That's all I needed.
```python
from multiprocessing import JoinableQueue, Process


class Renderer:
    def __init__(self, nb_workers=2):
        self.queue = JoinableQueue()
        self.processes = [Process(target=self.upload) for _ in range(nb_workers)]
        for p in self.processes:
            p.start()

    def render(self, item):
        self.queue.put(item)

    def upload(self):
        while True:
            item = self.queue.get()
            if item is None:
                break

            # process your item here

            self.queue.task_done()

    def terminate(self):
        """Wait until the queue is empty and terminate the processes."""
        self.queue.join()
        for p in self.processes:
            p.terminate()


r = Renderer()
r.render(item1)
r.render(item2)
r.terminate()
```
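Since the worker loop already exits on a `None` item, a gentler alternative to `terminate()` is to feed one `None` sentinel per worker and `join()` the processes instead of killing them. A sketch of that variant (the `stop` method name is my own):

```python
from multiprocessing import JoinableQueue, Process


class Renderer:
    def __init__(self, nb_workers=2):
        self.queue = JoinableQueue()
        self.processes = [Process(target=self.upload) for _ in range(nb_workers)]
        for p in self.processes:
            p.start()

    def render(self, item):
        self.queue.put(item)

    def upload(self):
        while True:
            item = self.queue.get()
            if item is None:
                self.queue.task_done()  # acknowledge the sentinel too
                break
            # process your item here
            self.queue.task_done()

    def stop(self):
        """Drain the queue, then let each worker exit via the None sentinel."""
        for _ in self.processes:
            self.queue.put(None)  # one sentinel per worker
        self.queue.join()         # every item (and sentinel) acknowledged
        for p in self.processes:
            p.join()              # workers leave their loop cleanly
```

Unlike `terminate()`, this never kills a worker in the middle of processing an item.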
For everyone using editors like Komodo Edit (Win10), add `sys.stdout.flush()` to:
```python
def mp_worker(args):
    inputs, the_time = args
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)
    sys.stdout.flush()
```
or as the first line:

```python
if __name__ == '__main__':
    sys.stdout.flush()
```
This helps to see what is going on while the script runs, rather than having to stare at a black command-line box.
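On Python 3 the same effect needs no separate `flush()` call, since `print` accepts a `flush` keyword. A small sketch of the worker written that way:

```python
import time

def mp_worker(args):
    inputs, the_time = args
    # flush=True forces each line out immediately, even when stdout
    # is block-buffered (e.g. inside an editor's output pane)
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time), flush=True)
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs, flush=True)
```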
Here is an example from my code (it uses a thread pool; just change the executor class and you have a process pool):

```python
from concurrent.futures import ThreadPoolExecutor

def execute_run(rp):
    ...  # do something

pool = ThreadPoolExecutor(6)
for mat in TESTED_MATERIAL:
    for en in TESTED_ENERGIES:
        for ecut in TESTED_E_CUT:
            rp = RunParams(
                simulations, DEST_DIR, PARTICLE, mat,
                960, 0.125,
                ecut, en
            )
            pool.submit(execute_run, rp)
pool.shutdown(wait=True)
```

Basically:

- `pool = ThreadPoolExecutor(6)` creates a pool of 6 threads.
- The nested `for` loops then add tasks to the pool.
- `pool.submit(execute_run, rp)` adds a task to the pool; the first argument is the function to call in the thread/process, and the remaining arguments are passed on to the called function.
- `pool.shutdown(wait=True)` waits for all tasks to complete (`concurrent.futures` executors have no `join` method; `shutdown` plays that role).
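As a self-contained sketch of the same pattern with processes instead of threads (the `square` task is made up for illustration):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as pool:
        # submit() schedules one call per task and returns a Future
        futures = [pool.submit(square, n) for n in range(5)]
        results = [f.result() for f in futures]
    # leaving the `with` block implies shutdown(wait=True)
    print(results)  # [0, 1, 4, 9, 16]
```

Collecting `Future.result()` in submission order keeps the results aligned with the inputs, even though the tasks may finish out of order.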