Python: Why can't generators be pickled?

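Python's pickle (I'm talking about standard Python 2.5 / 2.6 / 2.7) cannot pickle locks, file objects, etc.

It also cannot pickle generators and lambda expressions (or any other anonymous code), because pickle really only stores name references.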
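To make the "name references" point concrete, here is a minimal sketch, assuming CPython 3 (the question is about 2.x, where the exception types differ slightly). The helper `pickling_error` is a hypothetical name introduced only for this illustration.

```python
import pickle


def pickling_error(obj):
    """Hypothetical helper: return the exception pickle.dumps(obj) raises, or None."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return exc
    return None


# A named builtin pickles fine: the payload holds only its module and name,
# not its code.
payload = pickle.dumps(len)
print(b"len" in payload)

# A generator has no importable name and its state lives in a frame.
print(repr(pickling_error(x for x in range(3))))

# A lambda has no name that pickle could look up again on load.
print(repr(pickling_error(lambda x: x + 1)))
```

The builtin round-trips because unpickling simply looks the name up again; the generator and the lambda have nothing comparable to look up.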

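In the case of locks and OS-dependent features, the reason you cannot pickle them is obvious and makes sense.

But why can't you pickle generators?

Note: just for the sake of clarity, I'm interested in the fundamental reason (or the assumptions and choices that went into that design decision), not in "because it gives you a Pickle error".

I realize this question is a bit broad, so here is a rule of thumb for whether you have answered it: "If these assumptions were lifted, or the type of allowed generator were somehow more restricted, would pickling generators work again?"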


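There is a lot of information available about this. For the "official word" on the issue, read the (closed) Python bugtracker issue.

The core reasoning, by one of the people who made the decision, is detailed on this blog: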

Since a generator is essentially a souped-up function, we would need to save its bytecode, which is not guarantee to be backward-compatible between Python’s versions, and its frame, which holds the state of the generator such as local variables, closures and the instruction pointer. And this latter is rather cumbersome to accomplish, since it basically requires to make the whole interpreter picklable. So, any support for pickling generators would require a large number of changes to CPython’s core.

Now if an object unsupported by pickle (e.g., a file handle, a socket, a database connection, etc) occurs in the local variables of a generator, then that generator could not be pickled automatically, regardless of any pickle support for generators we might implement. So in that case, you would still need to provide custom __getstate__ and __setstate__ methods. This problem renders any pickling support for generators rather limited.

It also mentions two suggested workarounds:

Anyway, if you need for a such feature, then look into Stackless Python which does all the above. And since Stackless’s interpreter is picklable, you also get process migration for free. This means you can interrupt a tasklet (the name for Stackless’s green threads), pickle it, send the pickle to a another machine, unpickle it, resume the tasklet, and voilà you’ve just migrated a process. This is freaking cool feature!

But in my humble opinion, the best solution to this problem to the rewrite the generators as simple iterators (i.e., one with a __next__ method). Iterators are easy and efficient space-wise to pickle because their state is explicit. You would still need to handle objects representing some external state explicitly however; you cannot get around this.
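The last point of the quote, that external state still has to be handled explicitly, can be sketched as follows. This is only an illustration, assuming Python 3 and a text file that still exists at the same path when the object is unpickled; `LineReader` and its attributes are hypothetical names, not anything from the blog post.

```python
import pickle


class LineReader:
    """Hypothetical iterator over a file's lines that can be pickled mid-iteration."""

    def __init__(self, path):
        self.path = path
        self.offset = 0                # position to resume from
        self._file = open(path, "r")   # external state: pickle cannot store this

    def __iter__(self):
        return self

    def __next__(self):
        line = self._file.readline()
        if not line:
            self._file.close()
            raise StopIteration
        self.offset = self._file.tell()
        return line.rstrip("\n")

    def __getstate__(self):
        # Drop the unpicklable file handle; keep the path and offset.
        state = self.__dict__.copy()
        del state["_file"]
        return state

    def __setstate__(self, state):
        # Rebuild the external resource from the explicit state.
        self.__dict__.update(state)
        self._file = open(self.path, "r")
        self._file.seek(self.offset)
```

After reading one line, `pickle.loads(pickle.dumps(reader))` yields a new reader that continues with the second line. A generator doing the same job would keep the file handle in its frame, where no `__getstate__` hook can reach it.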


You actually can, depending on the implementation. PyPy and Stackless Python both allow this (to some extent):

Python 2.7.1 (dcae7aed462b, Aug 17 2011, 09:46:15)
[PyPy 1.6.0 with GCC 4.0.1] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``Not your usual analyses.''
>>>> import pickle
>>>> gen = (x for x in range(100))
>>>> next(gen)
0
>>>> pickled = pickle.dumps(gen)
>>>> next(pickle.loads(pickled))
1

In CPython it is also possible to create an iterator object to simulate a picklable generator.
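A minimal sketch of such an iterator object, assuming Python 3 (`__next__`; in Python 2 the method would be named `next`). The class `CountUpTo` is a hypothetical stand-in for the generator expression used in the PyPy session above.

```python
import pickle


class CountUpTo:
    """Hypothetical iterator standing in for `(x for x in range(limit))`."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0  # explicit position; a generator hides this in its frame

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


gen = CountUpTo(100)
first = next(gen)                # 0
pickled = pickle.dumps(gen)      # works: the state is just two integers
resumed = pickle.loads(pickled)
second = next(resumed)           # 1, matching the PyPy session
```

Because all of the state lives in ordinary instance attributes, pickle's default handling of instances is enough and no custom hooks are needed.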