Multiprocessing a file in Python, then writing the result to disk
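I want to do the following:
- Read data from a csv file
- Process each row of that csv (assume this is a long network operation)
- Write the result to another file
I tried gluing together this answer and this answer, with little success. The code for the second queue is never called, so nothing is written to disk. How do I let the processes know that there is a second queue?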
Note that I do not have to be
My code so far:
```python
import multiprocessing
import os
import time

in_queue = multiprocessing.Queue()
out_queue = multiprocessing.Queue()

def worker_main(in_queue, out_queue):
    print (os.getpid(),"working")
    while True:
        item = in_queue.get(True)
        print (os.getpid(),"got", item)
        time.sleep(1) #long network processing
        print (os.getpid(),"done", item)
        # put the processed items to be written to disk
        out_queue.put("processed:" + str(item))


pool = multiprocessing.Pool(3, worker_main,(in_queue,out_queue))

for i in range(5): # let's assume this is the file reading part
    in_queue.put(i)

with open('out.txt', 'w') as file:

    while not out_queue.empty():
        try:
            value = q.get(timeout = 1)
            file.write(value + '\n')
        except Exception as qe:
            print ("Empty Queue or dead process")
```
The first problem I ran into when trying to run your code was:
```
An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom
in the main module
```
I had to wrap the module-level code in the `if __name__ == '__main__':` idiom.
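Since your goal is to iterate over the lines of a file, `Pool.imap()` looks like a good fit. It is a lazier version of `Pool.map()`, whose documentation says: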
This method chops the iterable into a number of chunks which it
submits to the process pool as separate tasks.
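To illustrate the chunking mentioned in the quote, here is a minimal sketch of `imap()` with an explicit `chunksize`. The names `slow_square` and `squares_in_order` are hypothetical stand-ins, not part of the question's code, and the chunk size of 4 is an arbitrary choice for the demo.

```python
import multiprocessing
import time

def slow_square(n):
    # Hypothetical stand-in for a slow per-item operation.
    time.sleep(0.01)
    return n * n

def squares_in_order(numbers, workers=3, chunksize=4):
    # imap yields results in input order; chunksize controls how many
    # items are sent to a worker process as one task.
    with multiprocessing.Pool(workers) as pool:
        return list(pool.imap(slow_square, numbers, chunksize=chunksize))

if __name__ == '__main__':
    print(squares_in_order(range(10)))
```

A larger `chunksize` tends to reduce inter-process overhead when each item is cheap. For a slow network call per row, the default of 1 is probably fine.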
Here is a working example:
```python
import multiprocessing
import os
import time

def worker_main(item):
    print(os.getpid(), "got", item)
    time.sleep(1) #long network processing
    print(os.getpid(), "done", item)
    # return the processed item so it can be written to disk
    return "processed:" + str(item)

if __name__ == '__main__':
    with multiprocessing.Pool(3) as pool:
        with open('out.txt', 'w') as file:
            # range(5) simulating a 5 row csv file.
            for proc_row in pool.imap(worker_main, range(5)):
                file.write(proc_row + '\n')

# printed output:
# 1368 got 0
# 9228 got 1
# 12632 got 2
# 1368 done 0
# 1368 got 3
# 9228 done 1
# 9228 got 4
# 12632 done 2
# 1368 done 3
# 9228 done 4
```
The resulting out.txt:
```
processed:0
processed:1
processed:2
processed:3
processed:4
```
Note that I did not have to use queues either.
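The same pattern can be applied to an actual csv file. The following is only a sketch under assumptions: `process_row` is a hypothetical placeholder for the long network operation, `process_csv` is a hypothetical helper, and the file names and sample rows are made up for the demo.

```python
import csv
import multiprocessing
import os
import tempfile

def process_row(row):
    # Hypothetical stand-in for the long network operation on one row.
    return row + ["processed"]

def process_csv(in_path, out_path, workers=3):
    # Rows are handed to the pool through imap, and the results come
    # back in input order, so the parent process can write them directly.
    with multiprocessing.Pool(workers) as pool, \
         open(in_path, newline="") as src, \
         open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for result in pool.imap(process_row, csv.reader(src)):
            writer.writerow(result)

if __name__ == "__main__":
    # Build a small sample input file in a temporary folder (demo data only).
    folder = tempfile.mkdtemp()
    in_path = os.path.join(folder, "in.csv")
    out_path = os.path.join(folder, "out.csv")
    with open(in_path, "w", newline="") as f:
        csv.writer(f).writerows([["a", "1"], ["b", "2"], ["c", "3"]])
    process_csv(in_path, out_path)
    with open(out_path, newline="") as f:
        print(f.read())
```

Only the parent process touches the output file here, which avoids having several workers write to the same file at once.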