python multiprocessing.Pool kill *specific* long running or hung process

I need to execute a pool of many parallel database connections and queries. I would like to use multiprocessing.Pool or the concurrent.futures ProcessPoolExecutor. Python 2.7.5

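In some cases, a query request takes too long or never finishes (a hung/zombie process). I would like to kill the specific process in the multiprocessing.Pool or concurrent.futures ProcessPoolExecutor that has timed out.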

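Here is an example of how to kill and re-spawn the entire process pool. Ideally I would avoid that CPU thrashing, since I only want to kill the one specific long-running process that has not returned data after the timeout has elapsed.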

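For some reason, the code below does not seem able to terminate/join the process Pool after all results have been returned and completed. It may have to do with killing worker processes when a timeout occurs. However, the Pool does create new workers when they are killed, and the results are as expected.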

from multiprocessing import Pool
import time
import numpy as np
from threading import Timer
import thread, time, sys

def f(x):
    time.sleep(x)
    return x

if __name__ == '__main__':
    pool = Pool(processes=4, maxtasksperchild=4)

    results = [(x, pool.apply_async(f, (x,))) for x in np.random.randint(10, size=10).tolist()]

    while results:
        try:
            x, result = results.pop(0)
            start = time.time()
            print result.get(timeout=5), '%d done in %f Seconds!' % (x, time.time()-start)

        except Exception as e:
            print str(e)
            print '%d Timeout Exception! in %f' % (x, time.time()-start)
            for p in pool._pool:
                if p.exitcode is None:
                    p.terminate()

    pool.terminate()
    pool.join()

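I do not fully understand your question. You say you want to stop one specific process, yet in the exception handler you call terminate on all jobs. I am not sure why you do that. Also, I am fairly sure that using the internal variables of multiprocessing.Pool is not very safe. Having said all that, I think your question is why this program does not finish when a timeout happens. If that is the problem, then the following does the trick: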

from multiprocessing import Pool
import time
import numpy as np
from threading import Timer
import thread, time, sys

def f(x):
    time.sleep(x)
    return x

if __name__ == '__main__':
    pool = Pool(processes=4, maxtasksperchild=4)

    results = [(x, pool.apply_async(f, (x,))) for x in np.random.randint(10, size=10).tolist()]

    result = None
    start = time.time()
    while results:
        try:
            x, result = results.pop(0)
            print result.get(timeout=5), '%d done in %f Seconds!' % (x, time.time()-start)
        except Exception as e:
            print str(e)
            print '%d Timeout Exception! in %f' % (x, time.time()-start)
            for i in reversed(range(len(pool._pool))):
                p = pool._pool[i]
                if p.exitcode is None:
                    p.terminate()
                del pool._pool[i]

    pool.terminate()
    pool.join()

The point is that you need to remove the items from the pool. Just calling terminate on them is not enough.
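For reference, here is a minimal sketch (not taken from the answer above) of what terminating a single process involves, using a plain multiprocessing.Process instead of a Pool worker. It is written for Python 3 rather than the 2.7 used in this thread, and it assumes a Unix-like system where the "fork" start method is available. The names sleeper and terminate_and_reap are made up for illustration. It shows that exitcode stays None while the process is running, and that a join() after terminate() is what actually reaps the child.

```python
import multiprocessing as mp
import time

# Assumption: Unix-like system where the "fork" start method is available.
CTX = mp.get_context("fork")


def sleeper(seconds):
    # Stand-in for a long-running or hung query (hypothetical helper).
    time.sleep(seconds)


def terminate_and_reap(proc, grace=5.0):
    """Terminate one process, wait up to `grace` seconds, return its exit code.

    The return value is None if the process is still alive after the wait,
    for example because it ignores SIGTERM.
    """
    if proc.exitcode is None:   # None means the process has not exited yet
        proc.terminate()        # sends SIGTERM on Unix
    proc.join(grace)            # reap the child so it does not stay a zombie
    return proc.exitcode


if __name__ == "__main__":
    p = CTX.Process(target=sleeper, args=(60,))
    p.start()
    print(p.exitcode)             # None while the process is running
    print(terminate_and_reap(p))  # negative signal number after SIGTERM
```

Inside a Pool the same calls apply to each entry of pool._pool, but that list is a private attribute, so anything built on it may break between Python versions.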

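I also ran into this problem.

The original code and the version edited by @stacksia have the same issue: in both cases, when just one of the processes reaches the timeout, all currently running processes get killed (that is, by the time the loop over pool._pool finishes).

Find my solution below. It involves creating a .pid file for each worker process, as suggested by @luart. It works as long as there is a way to tag each worker process (in the code below, x does this job).
If someone has a better solution (such as keeping the PID in memory), please share it.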

#!/usr/bin/env python

from multiprocessing import Pool
import time, os, sys  # sys is needed for sys.exit() in f()
import subprocess

def f(x):
    PID = os.getpid()
    print 'Started:', x, 'PID=', PID

    pidfile = "/tmp/PoolWorker_"+str(x)+".pid"

    if os.path.isfile(pidfile):
        print "%s already exists, exiting" % pidfile
        sys.exit()

    file(pidfile, 'w').write(str(PID))

    # Do the work here
    time.sleep(x*x)

    # Delete the PID file
    os.remove(pidfile)

    return x*x

if __name__ == '__main__':
    pool = Pool(processes=3, maxtasksperchild=4)

    results = [(x, pool.apply_async(f, (x,))) for x in [1,2,3,4,5,6]]

    pool.close()

    while results:
        print results
        try:
            x, result = results.pop(0)
            start = time.time()
            print result.get(timeout=3), '%d done in %f Seconds!' % (x, time.time()-start)

        except Exception as e:
            print str(e)
            print '%d Timeout Exception! in %f' % (x, time.time()-start)

            # We know which process gave us an exception: it is "x", so let's kill it!

            # First, let's get the PID of that process:
            pidfile = '/tmp/PoolWorker_'+str(x)+'.pid'
            PID = None
            if os.path.isfile(pidfile):
                PID = str(open(pidfile).read())
                print x, 'pidfile=', pidfile, 'PID=', PID

            # Now, let's check if there is indeed such a process running:
            for p in pool._pool:
                print p, p.pid
                if str(p.pid)==PID:
                    print 'Found it still running!', p, p.pid, p.is_alive(), p.exitcode

                    # We can also double-check how long it's been running with the system 'ps' command:
                    tt = str(subprocess.check_output('ps -p "'+str(p.pid)+'" o etimes=', shell=True)).strip()
                    print 'Run time from OS (may be way off the real time..) = ', tt

                    # Now, kill it:
                    p.terminate()
                    pool._pool.remove(p)
                    pool._repopulate_pool()

                    # Let's not forget to remove the pidfile
                    os.remove(pidfile)

                    break

    pool.terminate()
    pool.join()
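
As one possible in-memory alternative to the .pid files, here is a rough sketch that drops multiprocessing.Pool and runs each task in its own multiprocessing.Process, keeping the Process objects in a dict in the parent. This is a different technique from the Pool-based code above, not a fix for it. It is written for Python 3 (the thread uses 2.7) and assumes a Unix-like system with the "fork" start method. The names run_with_timeouts, _run_task and slow_square are invented for this example. Only the process that exceeds its own deadline is terminated, and the other tasks keep running.

```python
import multiprocessing as mp
import time

# Assumption: Unix-like system where the "fork" start method is available.
CTX = mp.get_context("fork")


def _run_task(func, arg, conn):
    """Child side: run func(arg) and send the result back to the parent."""
    try:
        conn.send(func(arg))
    finally:
        conn.close()


def run_with_timeouts(func, args, max_procs=3, timeout=3.0, poll=0.05):
    """Run func(a) for each a in args, with at most max_procs at a time.

    Returns {index: result}. A task that runs longer than `timeout` seconds,
    or that exits without sending a result, is recorded as None.
    """
    pending = list(enumerate(args))
    running = {}   # index -> (Process, parent end of the pipe, start time)
    results = {}
    while pending or running:
        # Start new tasks while there are free slots.
        while pending and len(running) < max_procs:
            idx, arg = pending.pop(0)
            parent_conn, child_conn = CTX.Pipe(duplex=False)
            proc = CTX.Process(target=_run_task, args=(func, arg, child_conn))
            proc.start()
            child_conn.close()   # the parent only needs the reading end
            running[idx] = (proc, parent_conn, time.time())
        # Collect finished tasks and terminate only the overdue ones.
        for idx, (proc, conn, started) in list(running.items()):
            if conn.poll():
                try:
                    results[idx] = conn.recv()
                except EOFError:          # child exited without a result
                    results[idx] = None
            elif time.time() - started > timeout:
                proc.terminate()          # kills this one process only
                results[idx] = None
            else:
                continue                  # still running and within its deadline
            proc.join()
            conn.close()
            del running[idx]
        time.sleep(poll)
    return results


def slow_square(x):
    # Stand-in for a database query (hypothetical helper).
    time.sleep(x)
    return x * x


if __name__ == "__main__":
    # The last task sleeps far longer than the timeout, so it should map to None.
    print(run_with_timeouts(slow_square, [1, 2, 10], timeout=3.0))
```

The trade-off is that a new process is started for every task, so this loses the worker reuse a Pool gives you. For database queries that take seconds, that startup cost is probably small by comparison.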

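Many people suggest using pebble. It looks nice, but it is only available for Python 3. If someone has a way to get pebble working on python 2.6, that would be great.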
