Sharing a dictionary and an array among a pool of processes
I have been trying to create a dictionary that uses a device MAC ID as the key and, as the value, a list of the records corresponding to that MAC. Something like this:
```
{00-00-0A-14-01-06: [['CMTS-51-55_10.20', '10.20.1.1', '342900', 'Cable6/0/0', '110', 'Cable6/0/0-upstream0', '129', 'Cable6/0/0-downstream', '00-00-0A-14-01-06', '10.20.1.6', '11', '1', '1424419744000', '692306', 'SignalingDown', '1', '118800000', '990000', '0', '0', '0', '342900'],
                     ['CMTS-51-55_10.20', '10.20.1.1', '343800', 'Cable6/0/0', '110', 'Cable6/0/0-upstream0', '129', 'Cable6/0/0-downstream', '00-00-0A-14-01-06', '10.20.1.6', '11', '1', '1424420644000', '692306', 'SignalingDown', '1', '118800000', '990000', '0', '0', '0', '343800'],
                     ['CMTS-51-55_10.20', '10.20.1.1', '342900', 'Cable6/0/0', '110', 'Cable6/0/0-upstream0', '129', 'Cable6/0/0-downstream', '00-00-0A-14-01-06', '10.20.1.6', '11', '1', '1424419744000', '377773', 'SignalingUp', '2', '118800000', '990000', '0', '0', '0', '342900']]}
```
These values are retrieved from files stored across multiple folders; a folder can contain many files. I hand the list of folders to a pool of processes, so that all the files in one folder are handled by a single process.

Inside each process I maintain a local dictionary (collections.defaultdict), fill it with the complete information, and then copy that information into a shared dictionary (manager.dict) that I pass as an argument to the pool. I also pass a character Array to share some template information between the child processes and the main process.

I have been trying to test the shared objects in the multiprocessing part, but I cannot seem to get it to work. Could someone please help me?
```python
#!/usr/local/bin/pypy
from multiprocessing import Process
from multiprocessing import Pool, Manager, Value, Array
import collections
from collections import defaultdict
import itertools
import os

def info(title):
    print title
    print 'module name:', __name__
    if hasattr(os, 'getppid'):  # only available on Unix
        print 'parent process:', os.getppid()
    print 'process id:', os.getpid()

def f(template, mydict):
    name = 'bob'
    info('function f')
    resultDeltaArray = collections.defaultdict(list)
    resultDeltaArray['b'].append("hi")
    resultDeltaArray['b'].append("bye")
    resultDeltaArray['c'].append("bye")
    resultDeltaArray['c'].append("bye")
    template = "name"
    print resultDeltaArray
    #print "template1", template
    for k, v in resultDeltaArray.viewitems():
        mydict[k] = v
    print 'hello', name
    #mydict = resultDeltaArray
    for k, v in mydict.items():
        print mydict[k]
        #del mydict[k]

if __name__ == '__main__':
    info('main line')
    manager = Manager()
    mydict = manager.dict()
    template = Array('c', 50)
    #mydict[''] = []
    #print mydict
    todopool = Pool(2)
    todopool.map_async(f, itertools.repeat(template), itertools.repeat(mydict))
    #print "hi"
    #p = Process(target=f, args=('bob', template, mydict))
    #p.start()
    #p.join()
    print mydict
    mydict.clear()
    print mydict
    print "template2", template
```
This code is only meant to test the multiprocessing part; it is not the actual implementation.

When I run it, it just hangs, doing nothing after printing:
```
main line
module name: __main__
parent process: 27301
process id: 27852
```
When I try to interrupt the process with Ctrl-C, it gets stuck again after printing:
```
Traceback (most recent call last):
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-2:
Traceback (most recent call last):
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/pool.py", line 85, in worker
    self.run()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/pool.py", line 85, in worker
    task = get()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/queues.py", line 374, in get
    racquire()
KeyboardInterrupt
    task = get()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/queues.py", line 376, in get
    return recv()
```
Am I using these correctly? Does the Pool object not allow a multiprocessing Array or a manager.dict as an argument? Is there another way to do the same thing?
Dicts (implemented as in-memory hash tables) are not designed to facilitate sharing between processes, which by nature do not share memory.

Consider using threads, which do share memory.
The documentation also shows how to share data across processes with queues and pipes, though that may not be what you want (a shared key/value store): https://docs.python.org/2.7/library/multiprocessing.html#exchanging-objects-between-processes