Python: execute cat subprocess in parallel
I'm running several `cat | zgrep` commands on a remote server and collecting their output individually for further processing:
```python
import multiprocessing as mp
import subprocess

class MainProcessor(mp.Process):
    def __init__(self, peaks_array):
        super(MainProcessor, self).__init__()
        self.peaks_array = peaks_array

    def run(self):
        for peak_arr in self.peaks_array:
            peak_processor = PeakProcessor(peak_arr)
            peak_processor.start()

class PeakProcessor(mp.Process):
    def __init__(self, peak_arr):
        super(PeakProcessor, self).__init__()
        self.peak_arr = peak_arr

    def run(self):
        command = 'ssh remote_host cat files_to_process | zgrep --mmap "regex"'
        log_lines = subprocess.check_output(command, shell=True).split('\n')
        process_data(log_lines)
```
However, this results in sequential execution of the subprocess (`ssh ... cat ...`) commands: the second peak waits for the first one to finish, and so on.

How can I modify this code so that the subprocess calls run in parallel, while still being able to collect the output of each call individually?
You don't need either `multiprocessing` or threading to run subprocesses in parallel, e.g.:
```python
#!/usr/bin/env python
from subprocess import Popen

# run commands in parallel
processes = [Popen("echo {i:d}; sleep 2; echo {i:d}".format(i=i), shell=True)
             for i in range(5)]

# collect statuses
exitcodes = [p.wait() for p in processes]
```
It runs the 5 shell commands simultaneously. Note: neither threads nor the `multiprocessing` module are used here. `Popen` does not wait for a command to complete, so there is no need to append `&` to the shell commands; call `.wait()` explicitly to get the exit status.
It is convenient, but not necessary, to use threads to collect output from the subprocesses:
```python
#!/usr/bin/env python
from multiprocessing.dummy import Pool  # thread pool
from subprocess import Popen, PIPE, STDOUT

# run commands in parallel
processes = [Popen("echo {i:d}; sleep 2; echo {i:d}".format(i=i), shell=True,
                   stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
             for i in range(5)]

# collect output in parallel
def get_lines(process):
    return process.communicate()[0].splitlines()

outputs = Pool(len(processes)).map(get_lines, processes)
```
Related: Python threading multiple bash subprocesses?
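The same `Popen` + thread-pool pattern could be wrapped in a helper for the question's use case. A minimal sketch, with local `echo` commands standing in for the `ssh ... zgrep` calls (the remote host, file list, and regex in the comment are placeholders, and `run_commands` is a hypothetical helper name):

```python
#!/usr/bin/env python
from multiprocessing.dummy import Pool  # thread pool
from subprocess import Popen, PIPE, STDOUT

def run_commands(commands):
    """Start all commands at once, then collect each command's output lines."""
    processes = [Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE,
                       stderr=STDOUT, close_fds=True)
                 for cmd in commands]

    def get_lines(process):
        return process.communicate()[0].splitlines()

    # one worker thread per process, so all outputs are read concurrently
    return Pool(len(processes)).map(get_lines, processes)

# Local stand-in commands; in the question's setting each entry would be
# something like 'ssh remote_host "cat files | zgrep regex"' (hypothetical).
outputs = run_commands(['echo a', 'echo b'])
```

`map` preserves order, so each entry of `outputs` corresponds to the command at the same index, which keeps the per-call outputs separate as the question requires.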
Here's a code example that gets output from several subprocesses concurrently in the same thread, using `asyncio`:
```python
#!/usr/bin/env python3
import asyncio
import sys
from asyncio.subprocess import PIPE, STDOUT

async def get_lines(shell_command):
    p = await asyncio.create_subprocess_shell(
        shell_command, stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    return (await p.communicate())[0].splitlines()

async def main():
    # get the commands' output in parallel
    coros = [get_lines('"{e}" -c "print({i:d}); import time; time.sleep({i:d})"'
                       .format(i=i, e=sys.executable))
             for i in range(5)]
    print(await asyncio.gather(*coros))

# Since Python 3.8 the proactor event loop (needed for subprocess pipes
# on Windows) is the default, so asyncio.run() works on all platforms.
asyncio.run(main())
```
An alternative approach (rather than the other suggestion of putting the shell processes in the background) is to use multithreading.
The `run` method that you have would then do something like this:
```python
# start_new_thread requires an args tuple; in Python 3 the module is _thread
thread.start_new_thread(myFuncThatDoesZGrep, ())
```
To collect results, you can do something like this:
```python
class MyThread(threading.Thread):
    def run(self):
        self.finished = False
        # Your code to run the command here.
        blahBlah()
        # When finished....
        self.finished = True
        self.results = []
```
Run the thread as described in the link on multithreading above. When your thread object has `mythread.finished == True`, you can collect the results via `mythread.results`.
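A minimal runnable version of this pattern, assuming the per-thread work is just running a shell command (`echo` stands in for the `zgrep` call; the `MyThread` constructor and the command strings are illustrative, not from the original answer):

```python
#!/usr/bin/env python
import threading
from subprocess import check_output

class MyThread(threading.Thread):
    def __init__(self, command):
        super(MyThread, self).__init__()
        self.command = command
        self.finished = False
        self.results = []

    def run(self):
        # Run the shell command and keep its output lines on the thread object.
        self.results = check_output(self.command, shell=True).splitlines()
        self.finished = True

threads = [MyThread('echo {0}'.format(i)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # block until each thread is done, instead of polling .finished
outputs = [t.results for t in threads]
```

Calling `join()` is usually simpler than polling the `finished` flag; the flag is useful when the main thread must keep doing other work while the commands run.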