Using Python's Multiprocessing module to execute simultaneous and separate SEAWAT/MODFLOW model runs

I'm trying to complete 100 model runs on my 8-processor, 64-bit Windows 7 machine. I'd like to run 7 instances of the model concurrently to cut down my total run time (each model run takes roughly 9.5 minutes). I've looked at several threads pertaining to Python's Multiprocessing module, but am still missing something.

Using the multiprocessing module

How to spawn parallel child processes on a multi-processor system?

Python multiprocessing queue

My process:

I have 100 different parameter sets I'd like to run through SEAWAT/MODFLOW to compare the results. I have pre-built the model input files for each model run and stored them in their own directories. What I would like to be able to do is run 7 models at a time until all realizations have been completed. There needn't be communication between processes or display of results. So far I have only been able to spawn the models sequentially:

import os,subprocess
import multiprocessing as mp

ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
files = []
for f in os.listdir(ws + r'\fieldgen\reals'):
    if f.endswith('.npy'):
        files.append(f)

## def work(cmd):
##     return subprocess.call(cmd, shell=False)

def run(f,def_param=ws):
    real = f.split('_')[2].split('.')[0]
    print 'Realization %s' % real

    mf2k = r'c:\modflow\mf2k.1_19\bin\mf2k.exe '
    mf2k5 = r'c:\modflow\MF2005_1_8\bin\mf2005.exe '
    seawatV4 = r'c:\modflow\swt_v4_00_04\exe\swt_v4.exe '
    seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '

    exe = seawatV4x64
    swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real

    os.system( exe + swt_nam )


if __name__ == '__main__':
    p = mp.Pool(processes=mp.cpu_count()-1) #-leave 1 processor available for system and other processes
    tasks = range(len(files))
    results = []
    for f in files:
        r = p.map_async(run(f), tasks, callback=results.append)

I changed the if __name__ == 'main': section to the following in hopes that it would fix the lack of parallelism I feel is being imparted on the above script by the for loop. However, the model fails to even run (no Python error):

if __name__ == '__main__':
    p = mp.Pool(processes=mp.cpu_count()-1) #-leave 1 processor available for system and other processes
    p.map_async(run,((files[f],) for f in range(len(files))))

Any help is greatly appreciated!

Edit 3/26/2012 13:31 EST

Using the "manual pool" method in @J.F.Sebastian's answer below, I get parallel execution of my external .exe. Model realizations are called up in batches of 8 at a time, but it doesn't wait for those 8 runs to complete before calling up the next batch, and so on:

from __future__ import print_function
import os,subprocess,sys
import multiprocessing as mp
from Queue import Queue
from threading import Thread

def run(f,ws):
    real = f.split('_')[-1].split('.')[0]
    print('Realization %s' % real)
    seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
    swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
    subprocess.check_call([seawatV4x64, swt_nam])

def worker(queue):
   """Process files from the queue."""
    for args in iter(queue.get, None):
        try:
            run(*args)
        except Exception as e: # catch exceptions to avoid exiting the
                               # thread prematurely
            print('%r failed: %s' % (args, e,), file=sys.stderr)

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\reals')
    q = Queue()
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    # start threads
    threads = [Thread(target=worker, args=(q,)) for _ in range(8)]
    for t in threads:
        t.daemon = True # threads die if the program dies
        t.start()

    for _ in threads: q.put_nowait(None) # signal no more files
    for t in threads: t.join() # wait for completion

if __name__ == '__main__':

    mp.freeze_support() # optional if the program is not frozen
    main()

There is no error traceback available. The run() function performs its duty when called on a single model realization file, just as it does with multiple files. The only difference is that with multiple files it is called len(files) times, yet each of the instances immediately closes and only one model run is allowed to finish, at which point the script exits gracefully (exit code 0).

Adding some print statements to main() reveals some information about active thread counts as well as thread status (note that this is a test on only 8 of the realization files to make the screenshot more manageable; theoretically all 8 files should be run concurrently, however the behavior continues where they spawn and, except for one, immediately die):

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\test')
    q = Queue()
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    # start threads
    threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count())]
    for t in threads:
        t.daemon = True # threads die if the program dies
        t.start()
    print('Active Count a',threading.activeCount())
    for _ in threads:
        print(_)
        q.put_nowait(None) # signal no more files
    for t in threads:
        print(t)
        t.join() # wait for completion
    print('Active Count b',threading.activeCount())

[screenshot: console output of thread status]

**读取"D:\\Data\\Users...的行是我手动停止模型从运行到完成时抛出的错误信息。一旦我停止运行模型,剩余的线程状态行将被报告,脚本将退出。

Edit 3/26/2012 16:24 EST

SEAWAT does allow concurrent execution; I've done it in the past, spawning instances manually using iPython and launching from each model file folder. This time around, I'm launching all model runs from a single location, namely the directory where my script resides. It looks like the culprit may be the way SEAWAT saves some of its output. When SEAWAT is run, it immediately creates files pertaining to the model run. One of these files was not being saved to the directory in which the model realization is located, but to the top directory where the script sits. This prevents any subsequent threads from saving the same file name in the same location (which they all want to do, since these file names are generic and non-specific to each realization). The SEAWAT windows were not staying open long enough for me to read, or even see, that there was an error message; I only realized this when I went back and tried to run the code using iPython, which directly displays the printout from SEAWAT instead of opening a new window to run the program.
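
(For anyone debugging the same vanishing-console problem, a minimal sketch of one way to keep the output around: redirect each run's stdout/stderr to a per-realization log file. The helper and the log file name below are illustrative, not something SEAWAT defines.)

import os
import subprocess

def run_logged(exe, nam_file, workdir):
    # Hypothetical helper: capture the console output that would otherwise
    # vanish when the SEAWAT window closes.
    log_path = os.path.join(workdir, 'seawat_run.log')  # illustrative name
    with open(log_path, 'w') as log:
        subprocess.check_call([exe, nam_file], cwd=workdir,
                              stdout=log, stderr=subprocess.STDOUT)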

I'm accepting @J.F.Sebastian's answer, as it's likely that once I resolve this model-executable issue, the threading code he provided will get me where I need to be.

Final code

Added the cwd argument to subprocess.check_call so that each instance of SEAWAT is started in its own directory. Very key.

from __future__ import print_function
import os,subprocess,sys
import multiprocessing as mp
from Queue import Queue
from threading import Thread
import threading

def run(f,ws):
    real = f.split('_')[-1].split('.')[0]
    print('Realization %s' % real)
    seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
    cwd = ws + r'\reals\real%s\ss' % real
    swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
    subprocess.check_call([seawatV4x64, swt_nam],cwd=cwd)

def worker(queue):
   """Process files from the queue."""
    for args in iter(queue.get, None):
        try:
            run(*args)
        except Exception as e: # catch exceptions to avoid exiting the
                               # thread prematurely
            print('%r failed: %s' % (args, e,), file=sys.stderr)

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\reals')
    q = Queue()
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    # start threads
    threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count()-1)]
    for t in threads:
        t.daemon = True # threads die if the program dies
        t.start()
    for _ in threads: q.put_nowait(None) # signal no more files
    for t in threads: t.join() # wait for completion

if __name__ == '__main__':
    mp.freeze_support() # optional if the program is not frozen
    main()


I don't see any computations in the Python code. If you just need to execute several external programs in parallel, it is sufficient to use subprocess to run the programs and the threading module to keep a constant number of processes running, but the simplest code is using multiprocessing.Pool:

#!/usr/bin/env python
import os
import multiprocessing as mp

def run(filename_def_param):
    filename, def_param = filename_def_param # unpack arguments
    ... # call external program on `filename`

def safe_run(*args, **kwargs):
   """Call run(), catch exceptions."""
    try: run(*args, **kwargs)
    except Exception as e:
        print("error: %s run(*%r, **%r)" % (e, args, kwargs))

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    workdir = os.path.join(ws, r'fieldgen\reals')
    files = ((os.path.join(workdir, f), ws)
             for f in os.listdir(workdir) if f.endswith('.npy'))

    # start processes
    pool = mp.Pool() # use all available CPUs
    pool.map(safe_run, files)

if __name__=="__main__":
    mp.freeze_support() # optional if the program is not frozen
    main()

If there are many files, then pool.map() could be replaced by for _ in pool.imap_unordered(safe_run, files): pass.
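
That substitution is a one-line change inside main() above; a minimal sketch, reusing mp, safe_run, and files as defined there:

    # Lazily consume results in completion order; nothing is accumulated,
    # which keeps memory use flat when there are many files.
    pool = mp.Pool()
    for _ in pool.imap_unordered(safe_run, files):
        pass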

There is also multiprocessing.dummy.Pool, which provides the same interface as multiprocessing.Pool but uses threads instead of processes, which might be more appropriate in this case.
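
Switching the Pool example above to threads is then just a different import; a sketch, keeping safe_run and files as defined there:

from multiprocessing.dummy import Pool  # same API as mp.Pool, backed by threads

pool = Pool(8)             # 8 worker threads instead of worker processes
pool.map(safe_run, files)  # identical call to the process-based version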

You don't need to keep some CPUs free. Just use a command that starts your executables with a low priority (on Linux it is the nice program).
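
For example, a sketch of a hypothetical wrapper that starts the child below normal priority on either platform (0x4000 is the value of Windows' BELOW_NORMAL_PRIORITY_CLASS creation flag; the nice level of 10 is an arbitrary choice):

import os
import subprocess

BELOW_NORMAL_PRIORITY_CLASS = 0x4000  # Windows process creation flag

def check_call_low_priority(cmd):
    """Like subprocess.check_call(), but at below-normal priority."""
    if os.name == 'nt':
        subprocess.check_call(cmd, creationflags=BELOW_NORMAL_PRIORITY_CLASS)
    else:
        subprocess.check_call(['nice', '-n', '10'] + list(cmd))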

ThreadPoolExecutor example

concurrent.futures.ThreadPoolExecutor would be both simple and sufficient, but it requires a third-party dependency on Python 2.x (the backport is published on PyPI as futures; it has been in the stdlib since Python 3.2).

#!/usr/bin/env python
import os
import concurrent.futures

def run(filename, def_param):
    ... # call external program on `filename`

# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\reals')
files = (os.path.join(wdir, f) for f in os.listdir(wdir) if f.endswith('.npy'))

# start threads
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    future_to_file = dict((executor.submit(run, f, ws), f) for f in files)

    for future in concurrent.futures.as_completed(future_to_file):
        f = future_to_file[future]
        if future.exception() is not None:
            print('%r generated an exception: %s' % (f, future.exception()))
        # run() doesn't return anything so `future.result()` is always `None`

Or if we ignore the exceptions raised by run():

from itertools import repeat

... # the same

# start threads
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    executor.map(run, files, repeat(ws))
    # run() doesn't return anything so `map()` results can be ignored

subprocess + threading solution (manual pool)

#!/usr/bin/env python
from __future__ import print_function
import os
import subprocess
import sys
from Queue import Queue
from threading import Thread

def run(filename, def_param):
    ... # define exe, swt_nam
    subprocess.check_call([exe, swt_nam]) # run external program

def worker(queue):
   """Process files from the queue."""
    for args in iter(queue.get, None):
        try:
            run(*args)
        except Exception as e: # catch exceptions to avoid exiting the
                               # thread prematurely
            print('%r failed: %s' % (args, e,), file=sys.stderr)

# start threads
q = Queue()
threads = [Thread(target=worker, args=(q,)) for _ in range(8)]
for t in threads:
    t.daemon = True # threads die if the program dies
    t.start()

# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\reals')
for f in os.listdir(wdir):
    if f.endswith('.npy'):
        q.put_nowait((os.path.join(wdir, f), ws))

for _ in threads: q.put_nowait(None) # signal no more files
for t in threads: t.join() # wait for completion


Here is my way to maintain a minimum number x of threads in memory. It's a combination of the threading and multiprocessing modules. It may be unusual compared to the other techniques respected fellow members have explained above, but it may be worth considering. For the sake of explanation, I'm taking the scenario of crawling a minimum of 5 websites at a time.

So here it is:

#importing dependencies.
from multiprocessing import Process
from threading import Thread
import threading

# Crawler function
def crawler(domain):
    # define crawler technique here.
    output.write(scrapeddata +"
"
)
    pass

Next is the threadController function. This function controls the flow of threads into main memory. It keeps activating threads to maintain the threadNum "minimum" limit, i.e. 5, and it won't exit until all active threads (activeCount) have finished.

It maintains a minimum of threadNum (5) startProcess-function threads (these threads will eventually start the Processes from processList while joining them with a timeout of 60 seconds). After threadController is started, there are 2 threads that are not included in the above limit of 5, namely the main thread and the threadController thread itself. That's why threading.activeCount() != 2 is used.

def threadController():
    print"Thread count before child thread starts is:-", threading.activeCount(), len(processList)
    # staring first thread. This will make the activeCount=3
    Thread(target = startProcess).start()
    # loop while thread List is not empty OR active threads have not finished up.
    while len(processList) != 0 or threading.activeCount() != 2:
        if (threading.activeCount() < (threadNum + 2) and # if count of active threads are less than the Minimum AND
            len(processList) != 0):                            # processList is not empty
                Thread(target = startProcess).start()         # This line would start startThreads function as a seperate thread **

StartProcess函数作为一个单独的线程,将从ProcessList启动进程。此函数(**started as a different thread)的目的是它将成为进程的父线程。所以当它将以60秒的超时连接它们时,这将停止StartProcess线程向前移动,但这不会停止ThreadController的执行。这样,线程控制器就可以按需要工作了。

def startProcess():
    pr = processList.pop(0)
    pr.start()
    pr.join(60.00) # joining the process with a timeout of 60 seconds, as a float.

if __name__ == '__main__':
    # a file holding a list of domains
    domains = open("Domains.txt","r").read().split("
"
)
    output = open("test.txt","a")
    processList = [] # process list
    threadNum = 5 # number of thread initiated processes to be run at one time

    # making process List
    for r in range(0, len(domains), 1):
        domain = domains[r].strip()
        p = Process(target = crawler, args = (domain,))
        processList.append(p) # building the list of worker processes.

    # starting the threadController as a separate thread.
    mt = Thread(target = threadController)
    mt.start()
    mt.join() # won't let go next until threadController thread finishes.

    output.close()
    print"Done"

Apart from maintaining a minimum number of threads in memory, my aim was also to have something that could avoid stuck threads or processes in memory. I did this using the timeout feature. My apologies for any typing mistakes.

I hope this construct helps anyone in this world. Regards, Vikas Gautam