How to pass a function with more than one argument to python concurrent.futures.ProcessPoolExecutor.map()?
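I would like `concurrent.futures.ProcessPoolExecutor.map()` to call a function that takes two or more arguments. In the example below I resorted to a lambda and defined `ref` as a list of the same length as `numberlist`, filled with one identical value.
First question: is there a better way to do this? `numberlist` may hold millions to billions of elements, so `ref` would have to match it in size, and that wastes memory I would like to keep free. I did it this way because I read that `map` stops as soon as the shortest iterable is exhausted.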
```python
import concurrent.futures as cf

nmax = 10
numberlist = range(nmax)
ref = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
workers = 3


def _findmatch(listnumber, ref):
    print('def _findmatch(listnumber, ref):')
    x = ''
    listnumber = str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x


a = map(lambda x, y: _findmatch(x, y), numberlist, ref)
for n in a:
    print(n)
    if str(ref[0]) in n:
        print('match')

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(lambda x, y: _findmatch(x, ref), numberlist, ref):
        print(type(n))
        print(n)
        if str(ref[0]) in n:
            print('match')
```
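Running the code above, I found that the built-in `map` gave the result I wanted. However, when I applied the same approach to `concurrent.futures.ProcessPoolExecutor.map()`, Python 3.5 failed with this error: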
```
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 241, in _feed
    obj = ForkingPickler.dumps(obj)
  File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7fd2a14db0d0>: attribute lookup <lambda> on __main__ failed
```
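Question 2: why does this error occur, and how do I get `concurrent.futures.ProcessPoolExecutor.map()` to call a function with more than one argument?
To answer your second question first: you get an exception because a lambda function like the one you are using cannot be pickled. Python uses the pickle protocol to serialize the data passed between the main process and the `ProcessPoolExecutor` worker processes, so this is a problem. It is also unclear why you need a lambda at all. Yours takes two arguments, just like the original function, so you can pass `_findmatch` directly and it should work: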
```python
with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_findmatch, numberlist, ref):
        ...
```
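To see the pickling limitation in isolation, here is a minimal sketch that needs no worker processes. The helper `can_pickle` and the simplified `find_match` are illustrative names that are not part of the original code, and the sketch assumes it is run as a script so that `find_match` lives at the top level of `__main__`.

```python
import pickle


def find_match(number, ref):
    """Return str(number) if str(ref) occurs in it, otherwise ''."""
    number, ref = str(number), str(ref)
    return number if ref in number else ''


def can_pickle(obj):
    """Hypothetical helper: report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A top-level function is pickled by reference (module name plus function name),
# so the worker process can look it up again by importing that name.
print('named function picklable:', can_pickle(find_match))

# A lambda has no importable name, so the lookup that pickle relies on fails.
print('lambda picklable:', can_pickle(lambda x, y: find_match(x, y)))
```

This is the same check that fails inside `multiprocessing` in the traceback above; the executor only surfaces it later, when it tries to send the task to a worker.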
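As for your first question, about passing a constant second argument without building a giant list, there are several ways to solve it. One approach is to use `itertools.repeat`, which creates an iterable that yields the same value indefinitely.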
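As a rough illustration of why no giant list is needed, the sketch below uses the built-in `map`, which stops as soon as its shortest iterable runs out. `find_match` is a simplified stand-in for `_findmatch`, and the range of 20 numbers is an arbitrary assumption for the demo.

```python
import itertools


def find_match(number, ref):
    """Return str(number) if str(ref) occurs in it, otherwise ''."""
    number, ref = str(number), str(ref)
    return number if ref in number else ''


numbers = range(20)
# repeat(5) yields 5 on every iteration and holds only that single value.
constant_ref = itertools.repeat(5)

# map() ends when `numbers` is exhausted, even though `constant_ref` never ends.
results = list(map(find_match, numbers, constant_ref))
matches = [r for r in results if r]
print(len(results), matches)
```

`ProcessPoolExecutor.map` accepts its iterables in the same positional way, so the endless iterator can be passed there too.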
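A better approach, though, may be to write an extra function that passes the constant argument for you. (Perhaps that is what you were trying to do with the lambda?) It should work as long as that function is defined at the top level of the module: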
```python
def _helper(x):
    return _findmatch(x, 5)

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_helper, numberlist):
        ...
```
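(1) There is no need to build a list. You can use `itertools.repeat` to create an iterator that simply repeats the same value.
(2) You need to pass a named function to `map`, because it is sent to the worker process for execution. `map` uses the pickle protocol to send things, and lambdas cannot be pickled, so they cannot be used there. The lambda was unnecessary anyway: all it did was call a two-parameter function with two parameters, so remove it completely.
The working code is: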
```python
import concurrent.futures as cf
import itertools

nmax = 10
numberlist = range(nmax)
workers = 3


def _findmatch(listnumber, ref):
    print('def _findmatch(listnumber, ref):')
    x = ''
    listnumber = str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x


with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(_findmatch, numberlist, itertools.repeat(5)):
        print(type(n))
        print(n)
        #if str(ref[0]) in n:
        #    print('match')
```
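Regarding your first question: do I understand correctly that you want to pass an argument whose value is only known at the time you call `map`, but is the same for every call of the mapped function? If so, I would call `map` with a function derived from a "template function" that has the second argument (`ref` in your example) baked in using `functools.partial`: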
```python
from functools import partial

refval = 5

def _findmatch(ref, listnumber):  # arguments swapped
    ...

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(partial(_findmatch, refval), numberlist):
        ...
```
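Re. question 2, first part: I have not found the exact piece of code that tries to pickle (serialize) the function that is to be executed in parallel, but it sounds natural that this has to happen. Not only the arguments but also the function itself must somehow be transferred to the workers, and it presumably has to be serialized for that transfer. In fact, partial functions can be pickled while lambdas cannot.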
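A small sketch of that last point, under the assumption that the wrapped function is a top-level function in a script: a `functools.partial` object survives a pickle round trip and keeps its bound argument. The names `find_match`, `bound` and `roundtrip` are illustrative, not from the original post.

```python
import pickle
from functools import partial


def find_match(ref, number):
    """Return str(number) if str(ref) occurs in it, otherwise ''."""
    number, ref = str(number), str(ref)
    return number if ref in number else ''


def roundtrip(obj):
    """Hypothetical helper: serialize and restore obj, as a worker transfer would."""
    return pickle.loads(pickle.dumps(obj))


# Bake the constant first argument into a new callable.
bound = partial(find_match, 5)

if __name__ == "__main__":
    # The partial pickles as a reference to find_match plus the bound arguments.
    clone = roundtrip(bound)
    print(clone.args, [clone(n) for n in (5, 7, 15)])
```

The guard keeps the round trip from running when the file is imported, which is also the usual precaution for code that starts worker processes.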
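Re. question 2, second part: if you want to call a function with more than one argument in `ProcessPoolExecutor.map`, pass the function as the first argument, followed by an iterable of its first arguments, then an iterable of its second arguments, and so on. In your case: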
```python
for n in executor.map(_findmatch, numberlist, ref):
    ...
```