Python: How can data remain persistent across multiple calls of a decorated function?

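The following function is meant to be used as a decorator that stores the results of values that have already been computed. If the arguments have been seen before, the function returns the value stored in the cache dictionary: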

def cached(f):
    f.cache = {}
    def _cachedf(*args):
        if args not in f.cache:
            f.cache[args] = f(*args)

        return f.cache[args]

    return _cachedf
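One detail of this first version is easy to miss, so here is a minimal sketch of it. The dictionary is attached to the original function `f`, not to the wrapper that replaces it, so the wrapper only reaches it through a closure over `f`. The function `square` is a hypothetical example and not part of the question.

```python
def cached(f):
    f.cache = {}  # stored on the ORIGINAL function object
    def _cachedf(*args):
        if args not in f.cache:
            f.cache[args] = f(*args)
        return f.cache[args]
    return _cachedf

@cached
def square(x):  # hypothetical example function
    return x * x

result = square(3)

# After decoration the name `square` refers to the wrapper _cachedf,
# which has no `cache` attribute of its own.
wrapper_has_cache = hasattr(square, "cache")

# The wrapper closes over `f`, so the original function (and its
# cache attribute) is reachable through the closure cell.
original = square.__closure__[0].cell_contents
print(result, wrapper_has_cache, original.cache)
```

So even the attribute-based version relies on a closure to stay connected to its cache.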

I realized (by accident) that cache does not need to be an attribute of the function object. In fact, the following code works as well:

def cached(f):
    cache = {} # <---- not an attribute this time!
    def _cachedf(*args):
        if args not in cache:
            cache[args] = f(*args)

        return cache[args]
    return _cachedf

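I am struggling to understand how the cache object can persist across multiple calls. I tried calling several cached functions multiple times and could not find any conflicts or problems.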

Can anyone please help me understand how the cache variable still exists even after the _cachedf function is returned?

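You are creating a closure here: the function _cachedf() closes over the variable cache from the enclosing scope. This keeps cache alive for as long as the function object lives.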
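As a rough illustration of why no conflicts show up between several cached functions: every call to cached() creates a fresh cache, and each returned wrapper closes over its own one. The helper `closure_vars` and the functions `double` and `negate` below are hypothetical names introduced only for this sketch.

```python
def cached(f):
    cache = {}
    def _cachedf(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return _cachedf

def closure_vars(fn):
    """Map each free variable name of fn to the object in its closure cell."""
    names = fn.__code__.co_freevars
    return {name: cell.cell_contents for name, cell in zip(names, fn.__closure__)}

@cached
def double(x):  # hypothetical example function
    return 2 * x

@cached
def negate(x):  # hypothetical example function
    return -x

double(1)
double(2)
negate(1)

double_cache = closure_vars(double)["cache"]
negate_cache = closure_vars(negate)["cache"]
print(double_cache)                  # entries created by double only
print(negate_cache)                  # entries created by negate only
print(double_cache is negate_cache)  # two distinct dictionaries
```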

Edit: Maybe I should add some more details on how this works in Python and how CPython implements it.

Let's look at a simpler example:

def f():
    a = []
    def g():
        a.append(1)
        return len(a)
    return g

Example usage in the interactive interpreter:

>>> h = f()
>>> h()
1
>>> h()
2
>>> h()
3

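When the module containing the function f() is compiled, the compiler sees that the function g() references the name a from the enclosing scope, and it records this external reference in the code object corresponding to the function f() (specifically, it adds the name a to f.__code__.co_cellvars).

So what happens when the function f() is called? The first line creates a new list object and binds it to the name a. The next line creates a new function object (using a code object created during the compilation of the module) and binds it to the name g. The body of g() is not executed at this point, and finally the function object is returned.

Since the code object of f() carries a note that the name a is referenced by local functions, a "cell" for this name is created when f() is entered. This cell contains the reference to the actual list object that a is bound to, and the function g() gets a reference to this cell. That way, the list object and the cell are kept alive even after the function f() exits.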
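The cell mechanism described above can be inspected directly in CPython. The following is a small sketch using the same f() and g(); the variable names `h`, `h2`, and `cell` are chosen only for illustration.

```python
def f():
    a = []
    def g():
        a.append(1)
        return len(a)
    return g

h = f()

# The compiler recorded `a` as a cell variable of f and a free variable of g.
print(f.__code__.co_cellvars)
print(h.__code__.co_freevars)

# The returned function carries a tuple of cells; the cell holds the list.
cell = h.__closure__[0]
h()
h()
print(cell.cell_contents)

# Each call of f() creates a new list and a new cell.
h2 = f()
print(h2.__closure__[0] is cell)
```

The list survives because the cell refers to it, and the cell survives because the function object returned by f() refers to the cell.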


Can anyone please help me understand how the cache variable still exists even after the _cachedf function is returned?

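It has to do with Python's reference-counting garbage collector. The cache variable is kept alive and accessible because the function _cachedf holds a reference to it, and the caller of cached holds a reference to that function. When you call the function again, you are still using the same function object that was originally created, so you still have access to the cache.

You won't lose the cache until all references to it are destroyed. You can use the del statement to remove a reference.

For example: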

>>> import time
>>> def cached(f):
...     cache = {} # <---- not an attribute this time!
...     def _cachedf(*args):
...         if args not in cache:
...             cache[args] = f(*args)
...         return cache[args]
...     return _cachedf
...
...
>>> def foo(duration):
...     time.sleep(duration)
...     return True
...
...
>>> bob = cached(foo)
>>> bob(2) # Takes two seconds
True
>>> bob(2) # returns instantly
True
>>> del bob # Deletes reference to bob (aka _cachedf) which holds ref to cache
>>> bob = cached(foo)
>>> bob(2) # takes two seconds
True
>>>
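The lifetime rule in the session above can also be observed without timing anything, by watching the cache through a weak reference. This is only a sketch under a few assumptions: the `Cache` subclass is hypothetical and exists because plain dict instances cannot be weakly referenced, `alias` is an extra name added for illustration, and the immediate cleanup shown is CPython behaviour (gc.collect() is called to be safe on other implementations).

```python
import gc
import weakref

class Cache(dict):
    """Hypothetical dict subclass; plain dicts do not support weak references."""

def cached(f):
    cache = Cache()
    def _cachedf(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return _cachedf

def foo(x):
    return x * 2

bob = cached(foo)
bob(2)

# Find the cache cell by name and watch the cache without keeping it alive.
cells = dict(zip(bob.__code__.co_freevars, bob.__closure__))
cache_ref = weakref.ref(cells["cache"].cell_contents)
del cells

alias = bob  # a second reference to the same wrapper function
del bob
alive_while_aliased = cache_ref() is not None  # the alias still keeps everything alive

del alias    # last reference to the wrapper is gone
gc.collect()
alive_after_all_deleted = cache_ref() is not None

print(alive_while_aliased, alive_after_all_deleted)
```

Deleting one name is not enough on its own; the cache goes away only when the last reference to the wrapper function disappears.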

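For the record, what you are trying to achieve is called memoization, and there is a more complete memoizing decorator available on the Decorator Pattern page that does the same thing but uses a decorator class. Your code and the class-based decorator are essentially the same, except that the class-based decorator checks for hashability before storing.

Edit (2017-02-02): @simingie comments that cached(foo)(2) always incurs a delay.

This is because cached(foo) returns a new function with a fresh cache. When cached(foo)(2) is called, a fresh (empty) cache is created and the cached function is then called immediately.

Since the cache is empty, the value is not found and the underlying function runs again. Instead, do cached_foo = cached(foo) and then call cached_foo(2) multiple times. This incurs the delay only on the first call. Also, if used as a decorator, it will work as expected: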
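For comparison, a class-based memoizing decorator with a hashability check might look roughly like the sketch below. It is not the code from the Decorator Pattern page; the class name `Memoized` and the example function `total` are assumptions made for illustration.

```python
class Memoized:
    """Hypothetical class-based memoizing decorator (illustrative sketch)."""

    def __init__(self, func):
        self.func = func
        self.cache = {}  # lives as long as the Memoized instance does

    def __call__(self, *args):
        try:
            hash(args)
        except TypeError:
            # Unhashable arguments (e.g. a list) cannot be dictionary keys,
            # so skip the cache and call the function directly.
            return self.func(*args)
        if args not in self.cache:
            self.cache[args] = self.func(*args)
        return self.cache[args]

calls = []  # records every real invocation of the wrapped function

@Memoized
def total(values):  # hypothetical example function
    calls.append(values)
    return sum(values)

total((1, 2, 3))  # computed and stored
total((1, 2, 3))  # served from the cache
total([1, 2, 3])  # unhashable argument: computed, not stored
total([1, 2, 3])  # computed again
print(len(calls), total.cache)
```

Here the cache is an instance attribute instead of a closure variable, but the lifetime rule is the same: it exists as long as something references the decorator object.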

@cached
def my_long_function(arg1, arg2):
    return long_operation(arg1, arg2)

my_long_function(1, 2) # incurs delay
my_long_function(1, 2) # doesn't
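The difference between cached(foo)(2) and a reused wrapper can be checked by counting how often the underlying function really runs. This is a minimal sketch; `slow_square` and the `calls` list are hypothetical stand-ins for a slow function and its timing.

```python
def cached(f):
    cache = {}
    def _cachedf(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return _cachedf

calls = []  # one entry per real execution of slow_square

def slow_square(x):  # hypothetical stand-in for an expensive function
    calls.append(x)
    return x * x

# A new wrapper, and therefore a new empty cache, on every line:
cached(slow_square)(2)
cached(slow_square)(2)
runs_with_fresh_wrappers = len(calls)

# One wrapper reused: only the first call reaches slow_square.
cached_square = cached(slow_square)
cached_square(2)
cached_square(2)
runs_with_reused_wrapper = len(calls) - runs_with_fresh_wrappers

print(runs_with_fresh_wrappers, runs_with_reused_wrapper)
```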

If you are not familiar with decorators, take a look at this answer to understand what the above code means.