Using Python's Multiprocessing module to execute simultaneous and separate SEAWAT/MODFLOW model runs

I'm trying to complete 100 model runs on my 8-processor, 64-bit Windows 7 machine. I'd like to run 7 instances of the model concurrently to cut down my total run time (each model run takes roughly 9.5 minutes). I've looked at several threads pertaining to Python's Multiprocessing module, but am still missing something.

Using the multiprocessing module

How to spawn parallel child processes on a multi-processor system?

Python multiprocessing queue

My process:

I have 100 different parameter sets I'd like to run through SEAWAT/MODFLOW to compare the results. I have pre-built the model input files for each model run and stored them in their own directories. What I would like to be able to do is run 7 models at a time until all realizations have been completed. There needn't be communication between processes or display of results. So far I have only been able to spawn the models sequentially:

import os,subprocess
import multiprocessing as mp

ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
files = []
for f in os.listdir(ws + r'\fieldgen\reals'):
    if f.endswith('.npy'):
        files.append(f)

## def work(cmd):
##     return subprocess.call(cmd, shell=False)

def run(f,def_param=ws):
    real = f.split('_')[2].split('.')[0]
    print 'Realization %s' % real

    mf2k = r'c:\modflow\mf2k.1_19\bin\mf2k.exe '
    mf2k5 = r'c:\modflow\MF2005_1_8\bin\mf2005.exe '
    seawatV4 = r'c:\modflow\swt_v4_00_04\exe\swt_v4.exe '
    seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '

    exe = seawatV4x64
    swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real

    os.system( exe + swt_nam )


if __name__ == '__main__':
    p = mp.Pool(processes=mp.cpu_count()-1) #-leave 1 processor available for system and other processes
    tasks = range(len(files))
    results = []
    for f in files:
        r = p.map_async(run(f), tasks, callback=results.append)

I changed the if __name__ == 'main': section to the following in hopes that it would fix the lack of parallelism I feel is being imparted on the above script by the for loop. However, the model fails to even run (no Python error):

if __name__ == '__main__':
    p = mp.Pool(processes=mp.cpu_count()-1) #-leave 1 processor available for system and other processes
    p.map_async(run,((files[f],) for f in range(len(files))))

Any help is greatly appreciated!

Edit 3/26/2012 13:31 EST

Using the "manual pool" method in @J.F.Sebastian's answer below, I get parallel execution of my external .exe. Model realizations are called up in batches of 8 at a time, but it doesn't wait for those 8 runs to complete before calling up the next batch, and so on:

from __future__ import print_function
import os,subprocess,sys
import multiprocessing as mp
from Queue import Queue
from threading import Thread

def run(f,ws):
    real = f.split('_')[-1].split('.')[0]
    print('Realization %s' % real)
    seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
    swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
    subprocess.check_call([seawatV4x64, swt_nam])

def worker(queue):
   """Process files from the queue."""
    for args in iter(queue.get, None):
        try:
            run(*args)
        except Exception as e: # catch exceptions to avoid exiting the
                               # thread prematurely
            print('%r failed: %s' % (args, e,), file=sys.stderr)

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\reals')
    q = Queue()
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    # start threads
    threads = [Thread(target=worker, args=(q,)) for _ in range(8)]
    for t in threads:
        t.daemon = True # threads die if the program dies
        t.start()

    for _ in threads: q.put_nowait(None) # signal no more files
    for t in threads: t.join() # wait for completion

if __name__ == '__main__':

    mp.freeze_support() # optional if the program is not frozen
    main()

There is no error traceback available. The run() function performs its duty when called on a single model realization file, just as it does with multiple files. The only difference is that with multiple files it is called len(files) times, yet each of the instances immediately closes and only one model run is allowed to finish, at which point the script exits gracefully (exit code 0).

Adding some print statements to main() reveals some information about active thread counts as well as thread status (note that this is a test on only 8 of the realization files to make the screenshot more manageable; theoretically all 8 files should be run concurrently, however the behavior continues where they spawn and, except for one, immediately die):

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\test')
    q = Queue()
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    # start threads
    threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count())]
    for t in threads:
        t.daemon = True # threads die if the program dies
        t.start()
    print('Active Count a',threading.activeCount())
    for _ in threads:
        print(_)
        q.put_nowait(None) # signal no more files
    for t in threads:
        print(t)
        t.join() # wait for completion
    print('Active Count b',threading.activeCount())

[screenshot: console output of thread status]

**读取"D:\\Data\\Users...的行是我手动停止模型从运行到完成时抛出的错误信息。一旦我停止运行模型,剩余的线程状态行将被报告,脚本将退出。

Edit 3/26/2012 16:24 EST

SEAWAT does allow concurrent execution; I've done it in the past, spawning instances manually using iPython and launching from each model file folder. This time around, I'm launching all model runs from a single location, namely the directory where my script resides. It looks like the culprit may be the way SEAWAT saves some of its output. When SEAWAT is run, it immediately creates files pertaining to the model run. One of these files was not being saved to the directory in which the model realization is located, but to the top directory where the script sits. This prevents any subsequent threads from saving the same file name in the same location (which they all want to do, since these file names are generic and non-specific to each realization). The SEAWAT windows were not staying open long enough for me to read, or even see, that there was an error message; I only realized this when I went back and tried to run the code using iPython, which directly displays the printout from SEAWAT instead of opening a new window to run the program.
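
(For anyone debugging the same vanishing-console problem, a minimal sketch of one way to keep the output around: redirect each run's stdout/stderr to a per-realization log file. The helper and the log file name below are illustrative, not something SEAWAT defines.)

import os
import subprocess

def run_logged(exe, nam_file, workdir):
    # Hypothetical helper: capture the console output that would otherwise
    # vanish when the SEAWAT window closes.
    log_path = os.path.join(workdir, 'seawat_run.log')  # illustrative name
    with open(log_path, 'w') as log:
        subprocess.check_call([exe, nam_file], cwd=workdir,
                              stdout=log, stderr=subprocess.STDOUT)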

I'm accepting @J.F.Sebastian's answer, as it's likely that once I resolve this model-executable issue, the threading code he provided will get me where I need to be.

Final code

Added the cwd argument to subprocess.check_call so that each instance of SEAWAT is started in its own directory. Very key.

from __future__ import print_function
import os,subprocess,sys
import multiprocessing as mp
from Queue import Queue
from threading import Thread
import threading

def run(f,ws):
    real = f.split('_')[-1].split('.')[0]
    print('Realization %s' % real)
    seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
    cwd = ws + r'\reals\real%s\ss' % real
    swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
    subprocess.check_call([seawatV4x64, swt_nam],cwd=cwd)

def worker(queue):
   """Process files from the queue."""
    for args in iter(queue.get, None):
        try:
            run(*args)
        except Exception as e: # catch exceptions to avoid exiting the
                               # thread prematurely
            print('%r failed: %s' % (args, e,), file=sys.stderr)

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\reals')
    q = Queue()
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    # start threads
    threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count()-1)]
    for t in threads:
        t.daemon = True # threads die if the program dies
        t.start()
    for _ in threads: q.put_nowait(None) # signal no more files
    for t in threads: t.join() # wait for completion

if __name__ == '__main__':
    mp.freeze_support() # optional if the program is not frozen
    main()


I don't see any computations in the Python code. If you just need to execute several external programs in parallel, it is sufficient to use subprocess to run the programs and the threading module to keep a constant number of processes running, but the simplest code is using multiprocessing.Pool:

#!/usr/bin/env python
import os
import multiprocessing as mp

def run(filename_def_param):
    filename, def_param = filename_def_param # unpack arguments
    ... # call external program on `filename`

def safe_run(*args, **kwargs):
   """Call run(), catch exceptions."""
    try: run(*args, **kwargs)
    except Exception as e:
        print("error: %s run(*%r, **%r)" % (e, args, kwargs))

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    workdir = os.path.join(ws, r'fieldgen\reals')
    files = ((os.path.join(workdir, f), ws)
             for f in os.listdir(workdir) if f.endswith('.npy'))

    # start processes
    pool = mp.Pool() # use all available CPUs
    pool.map(safe_run, files)

if __name__=="__main__":
    mp.freeze_support() # optional if the program is not frozen
    main()

If there are many files, then pool.map() could be replaced by for _ in pool.imap_unordered(safe_run, files): pass.
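
That substitution is a one-line change inside main() above; a minimal sketch, reusing mp, safe_run, and files as defined there:

    # Lazily consume results in completion order; nothing is accumulated,
    # which keeps memory use flat when there are many files.
    pool = mp.Pool()
    for _ in pool.imap_unordered(safe_run, files):
        pass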

There is also multiprocessing.dummy.Pool, which provides the same interface as multiprocessing.Pool but uses threads instead of processes, which might be more appropriate in this case.
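
Switching the Pool example above to threads is then just a different import; a sketch, keeping safe_run and files as defined there:

from multiprocessing.dummy import Pool  # same API as mp.Pool, backed by threads

pool = Pool(8)             # 8 worker threads instead of worker processes
pool.map(safe_run, files)  # identical call to the process-based version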

You don't need to keep some CPUs free. Just use a command that starts your executables with a low priority (on Linux it is the nice program).
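
For example, a sketch of a hypothetical wrapper that starts the child below normal priority on either platform (0x4000 is the value of Windows' BELOW_NORMAL_PRIORITY_CLASS creation flag; the nice level of 10 is an arbitrary choice):

import os
import subprocess

BELOW_NORMAL_PRIORITY_CLASS = 0x4000  # Windows process creation flag

def check_call_low_priority(cmd):
    """Like subprocess.check_call(), but at below-normal priority."""
    if os.name == 'nt':
        subprocess.check_call(cmd, creationflags=BELOW_NORMAL_PRIORITY_CLASS)
    else:
        subprocess.check_call(['nice', '-n', '10'] + list(cmd))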

ThreadPoolExecutor example

concurrent.futures.ThreadPoolExecutor would be both simple and sufficient, but it requires a third-party dependency on Python 2.x (the backport is published on PyPI as futures; it has been in the stdlib since Python 3.2).

#!/usr/bin/env python
import os
import concurrent.futures

def run(filename, def_param):
    ... # call external program on `filename`

# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\reals')
files = (os.path.join(wdir, f) for f in os.listdir(wdir) if f.endswith('.npy'))

# start threads
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    future_to_file = dict((executor.submit(run, f, ws), f) for f in files)

    for future in concurrent.futures.as_completed(future_to_file):
        f = future_to_file[future]
        if future.exception() is not None:
            print('%r generated an exception: %s' % (f, future.exception()))
        # run() doesn't return anything so `future.result()` is always `None`

Or if we ignore the exceptions raised by run():

from itertools import repeat

... # the same

# start threads
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    executor.map(run, files, repeat(ws))
    # run() doesn't return anything so `map()` results can be ignored

subprocess + threading solution (manual pool)

#!/usr/bin/env python
from __future__ import print_function
import os
import subprocess
import sys
from Queue import Queue
from threading import Thread

def run(filename, def_param):
    ... # define exe, swt_nam
    subprocess.check_call([exe, swt_nam]) # run external program

def worker(queue):
   """Process files from the queue."""
    for args in iter(queue.get, None):
        try:
            run(*args)
        except Exception as e: # catch exceptions to avoid exiting the
                               # thread prematurely
            print('%r failed: %s' % (args, e,), file=sys.stderr)

# start threads
q = Queue()
threads = [Thread(target=worker, args=(q,)) for _ in range(8)]
for t in threads:
    t.daemon = True # threads die if the program dies
    t.start()

# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\reals')
for f in os.listdir(wdir):
    if f.endswith('.npy'):
        q.put_nowait((os.path.join(wdir, f), ws))

for _ in threads: q.put_nowait(None) # signal no more files
for t in threads: t.join() # wait for completion


Here is my way to maintain a minimum number x of threads in memory. It's a combination of the threading and multiprocessing modules. It may be unusual compared to the other techniques respected fellow members have explained above, but it may be worth considering. For the sake of explanation, I'm taking the scenario of crawling a minimum of 5 websites at a time.

So here it is:

#importing dependencies.
from multiprocessing import Process
from threading import Thread
import threading

# Crawler function
def crawler(domain):
    # define crawler technique here.
    output.write(scrapeddata +"
"
)
    pass

Next is the threadController function. This function controls the flow of threads into main memory. It keeps activating threads to maintain the threadNum "minimum" limit, i.e. 5, and it won't exit until all active threads (activeCount) have finished.

It maintains a minimum of threadNum (5) startProcess-function threads (these threads will eventually start the Processes from processList while joining them with a timeout of 60 seconds). After threadController is started, there are 2 threads that are not included in the above limit of 5, namely the main thread and the threadController thread itself. That's why threading.activeCount() != 2 is used.

def threadController():
    print"Thread count before child thread starts is:-", threading.activeCount(), len(processList)
    # staring first thread. This will make the activeCount=3
    Thread(target = startProcess).start()
    # loop while thread List is not empty OR active threads have not finished up.
    while len(processList) != 0 or threading.activeCount() != 2:
        if (threading.activeCount() < (threadNum + 2) and # if count of active threads are less than the Minimum AND
            len(processList) != 0):                            # processList is not empty
                Thread(target = startProcess).start()         # This line would start startThreads function as a seperate thread **

StartProcess函数作为一个单独的线程,将从ProcessList启动进程。此函数(**started as a different thread)的目的是它将成为进程的父线程。所以当它将以60秒的超时连接它们时,这将停止StartProcess线程向前移动,但这不会停止ThreadController的执行。这样,线程控制器就可以按需要工作了。

def startProcess():
    pr = processList.pop(0)
    pr.start()
    pr.join(60.00) # joining the process with a timeout of 60 seconds, as a float.

if __name__ == '__main__':
    # a file holding a list of domains
    domains = open("Domains.txt","r").read().split("
"
)
    output = open("test.txt","a")
    processList = [] # process list
    threadNum = 5 # number of thread initiated processes to be run at one time

    # making process List
    for r in range(0, len(domains), 1):
        domain = domains[r].strip()
        p = Process(target = crawler, args = (domain,))
        processList.append(p) # building the list of worker processes.

    # starting the threadController as a separate thread.
    mt = Thread(target = threadController)
    mt.start()
    mt.join() # won't let go next until threadController thread finishes.

    output.close()
    print"Done"

Apart from maintaining a minimum number of threads in memory, my aim was also to have something that could avoid stuck threads or processes in memory. I did this using the timeout feature. My apologies for any typing mistakes.

I hope this construct helps anyone in this world. Regards, Vikas Gautam