What is the global interpreter lock (GIL) in CPython?

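What is the global interpreter lock, and why is it a problem?

A lot of noise has been made around removing the GIL from Python, and I'd like to understand why that is so important. I have never written a compiler or an interpreter myself, so don't be frugal with details; I'll probably need them to understand.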

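Python's GIL is intended to serialize access to interpreter internals from different threads. On multi-core systems, this means that multiple threads can't make effective use of multiple cores. (If the GIL didn't lead to this problem, most people wouldn't care about it; it is only being raised as an issue because multi-core systems are increasingly common.) If you want to understand it in detail, you can watch this video or look at this set of slides. It might be too much information, but then you did ask for details :-)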

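Note that Python's GIL is only really an issue for CPython, the reference implementation. Jython and IronPython don't have a GIL. As a Python developer, you don't generally come across the GIL unless you're writing a C extension. C extension writers need to release the GIL when their extensions do blocking I/O, so that other threads in the Python process get a chance to run.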

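Suppose you have multiple threads which don't really touch each other's data. Those threads should execute as independently as possible. If you have a "global lock" which you need to acquire in order to (say) call a function, that lock can end up as a bottleneck. You can wind up not getting much benefit from having multiple threads in the first place.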

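To put it into a real-world analogy: imagine 100 developers working at a company with only a single coffee mug. Most of the developers would spend their time waiting for coffee instead of coding.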
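The bottleneck described above can be sketched in a few lines. This is only a model of the idea, not of the GIL itself: `GLOBAL_LOCK`, `work`, and `run` are hypothetical names, and `time.sleep` stands in for real work done while holding a lock.

```python
import threading
import time

GLOBAL_LOCK = threading.Lock()  # hypothetical "single coffee mug"


def work(lock, hold_seconds=0.05):
    # Simulate a task that must hold `lock` for its whole duration.
    with lock:
        time.sleep(hold_seconds)


def run(locks):
    # Start one thread per lock and return the total wall-clock time.
    threads = [threading.Thread(target=work, args=(lock,)) for lock in locks]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


N = 4
# Every thread competes for the same lock, so the tasks run one after another.
shared_elapsed = run([GLOBAL_LOCK] * N)
# Each thread has its own lock, so the tasks overlap.
independent_elapsed = run([threading.Lock() for _ in range(N)])

print(f"one global lock:   {shared_elapsed:.2f}s")
print(f"independent locks: {independent_elapsed:.2f}s")
```

With one shared lock the total time grows with the number of threads; with independent locks it stays close to the duration of a single task.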

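None of this is specific to Python, and I don't know the details of what Python needed a GIL for in the first place. Hopefully, though, it gives you a better idea of the general concept.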

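Let's first understand what the Python GIL provides:

Every operation/instruction is executed in the interpreter. The GIL ensures that the interpreter is held by a single thread at any particular instant. Your Python program with multiple threads runs in a single interpreter, and at any particular instant that interpreter is held by one thread. This means that only the thread holding the interpreter is running at any instant.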

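Why this is a problem:

Your machine may have multiple cores/processors. Multiple cores allow multiple threads to execute simultaneously, i.e. several threads can be running at any particular instant. But since the interpreter is held by a single thread, the other threads do nothing even though they have access to a core. So the multiple cores give you no advantage, because at any instant only a single core is in use: the one used by the thread currently holding the interpreter. As a result, your program will take about as long to execute as if it were single-threaded.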
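A rough way to observe this yourself is to time the same CPU-bound work run sequentially and in two threads. This is a minimal sketch, assuming a standard (GIL-enabled) CPython build; `count_down` and `N` are hypothetical names, and the exact timings depend on your machine and Python version.

```python
import threading
import time


def count_down(n):
    # Pure-Python, CPU-bound loop; it holds the GIL while it runs.
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps


N = 2_000_000

# Run the work twice, one call after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same two calls in two threads.
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
start = time.perf_counter()
t1.start()
t2.start()
t1.join()
t2.join()
threaded = time.perf_counter() - start

print(f"sequential:  {sequential:.3f}s")
print(f"two threads: {threaded:.3f}s")
```

On a GIL-enabled build the two numbers are typically close to each other, even on a multi-core machine.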

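However, potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL (taken from here). So for such operations, a multithreaded program will still be faster than a single-threaded one despite the presence of the GIL. The GIL is therefore not always a bottleneck.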
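The I/O case can be illustrated with `time.sleep`, which releases the GIL while waiting and is used here as a stand-in for a blocking I/O call. `fake_io`, `DELAY`, and `N` are hypothetical names chosen for this sketch.

```python
import threading
import time

DELAY = 0.2  # seconds each simulated I/O call blocks
N = 5        # number of simulated I/O calls


def fake_io(seconds):
    # Stand-in for blocking I/O; the GIL is released while sleeping.
    time.sleep(seconds)


# Sequential: the waits add up.
start = time.perf_counter()
for _ in range(N):
    fake_io(DELAY)
sequential_elapsed = time.perf_counter() - start

# Threaded: the waits overlap.
threads = [threading.Thread(target=fake_io, args=(DELAY,)) for _ in range(N)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_elapsed = time.perf_counter() - start

print(f"sequential: {sequential_elapsed:.2f}s")
print(f"threaded:   {threaded_elapsed:.2f}s")
```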

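Edit: The GIL is an implementation detail of CPython. IronPython and Jython don't have a GIL, so truly multithreaded programs should be possible in them, although I have never used PyPy or Jython and am not sure of this.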

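Python doesn't allow multithreading in the truest sense of the word. It has a multithreading package, but if you want to use multiple threads to speed your code up, it is usually not a good idea to use it. Python has a construct called the Global Interpreter Lock (GIL).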

https://www.youtube.com/watch?v=ph374fjqfpe

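The GIL makes sure that only one of your "threads" can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL on to the next thread. This happens very quickly, so to the human eye it may seem that your threads are executing in parallel, but they are really just taking turns using the same CPU core. All this GIL passing adds overhead to execution. This means that if you want to make your code run faster, using the threading package is often not a good idea.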
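In CPython 3.2 and later, how often a running thread is asked to give up the GIL is governed by a "switch interval", which can be read and changed through the `sys` module. The snippet is a small sketch of that API; the value 0.001 is an arbitrary choice for illustration, not a recommendation.

```python
import sys

# How long a thread may keep the GIL before another waiting thread
# requests it (in seconds).
original = sys.getswitchinterval()
print(f"switch interval: {original * 1000:.1f} ms")

# The interval can be tuned; here it is changed and then restored.
sys.setswitchinterval(0.001)
changed = sys.getswitchinterval()
sys.setswitchinterval(original)
```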

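There are reasons to use Python's threading package. If you want to run some things simultaneously and efficiency is not a concern, then it is perfectly fine and convenient. Or, if you are running code that needs to wait for something (such as some I/O), it can make a lot of sense. But the threading library won't let you use extra CPU cores.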

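Multithreading can be outsourced to the operating system (by doing multiprocessing), to some external application that calls your Python code (e.g. Spark or Hadoop), or to some code that your Python code calls (e.g. you could have your Python code call a C function that does the expensive multithreaded work).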

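Whenever two threads have access to the same variable, you have a problem. In C++, for instance, the way to avoid the problem is to define a mutex lock that prevents two threads from, say, entering the setter of an object at the same time.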

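Multithreading is possible in Python, but two threads cannot execute at the same time at a granularity finer than one Python instruction. The running thread acquires a global lock called the GIL.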
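One consequence of that granularity is that a single Python statement can still be interrupted, because it may compile to several bytecode instructions. The sketch uses `dis` to list the instructions behind `counter += 1` (the exact names vary between CPython versions) and then shows the usual fix, an explicit `threading.Lock`. All function and variable names here are hypothetical.

```python
import dis
import threading

counter = 0


def unsafe_increment():
    global counter
    counter += 1  # load, add, store: more than one bytecode instruction


# List the bytecode instructions the statement compiles to.
ops = [ins.opname for ins in dis.get_instructions(unsafe_increment)]
print(ops)

lock = threading.Lock()


def safe_worker(n):
    global counter
    for _ in range(n):
        # The lock makes the read-modify-write sequence atomic.
        with lock:
            counter += 1


threads = [threading.Thread(target=safe_worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

So the GIL protects the interpreter's internals, but it does not replace locks around your own compound operations.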

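This means that if you start writing multithreaded code in order to take advantage of your multi-core processor, your performance won't improve. The usual workaround is to go multiprocess.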

Note that it is possible to release the GIL if you are inside a method written in C, for example.

The use of a GIL is not inherent to Python but to some of its interpreters, including the most common one, CPython. (Edited, see comments.)

The GIL issue is still valid in Python 3000.

Python 3.7 documentation

I would also like to highlight the following quote from the Python threading documentation:

CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

This links to the glossary entry for global interpreter lock, which explains that the GIL makes threaded parallelism in Python unsuitable for CPU-bound tasks:

The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.

However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally-intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.

Past efforts to create a "free-threaded" interpreter (one which locks shared data at a much finer granularity) have not been successful because performance suffered in the common single-processor case. It is believed that overcoming this performance issue would make the implementation much more complicated and therefore costlier to maintain.

This quote also implies that dicts, and thus variable assignment, are thread safe as a CPython implementation detail:

Is Python variable assignment atomic?
Thread Safety in Python's dictionary
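A small sketch of what that implicit safety means in practice: several threads store into one shared dict without any explicit lock, and no entries are lost or corrupted. This relies on a CPython implementation detail, as the quote says, and it only covers individual operations such as a single item assignment; compound read-modify-write sequences still need a lock. `shared` and `fill` are hypothetical names.

```python
import threading

shared = {}


def fill(worker_id, n):
    # Each single item assignment is safe against concurrent access in CPython.
    for i in range(n):
        shared[(worker_id, i)] = i


WORKERS = 4
ITEMS = 5_000

threads = [threading.Thread(target=fill, args=(w, ITEMS)) for w in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))
```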

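Next, the documentation for the multiprocessing package explains how it overcomes the GIL by spawning processes while exposing an interface similar to that of threading: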

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

The documentation for concurrent.futures.ProcessPoolExecutor explains that it uses multiprocessing as a backend:

The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.

This should be contrasted with the other Executor subclass, ThreadPoolExecutor, which uses threads instead of processes:

ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously.

From this we conclude that ThreadPoolExecutor is only suitable for I/O-bound tasks, while ProcessPoolExecutor can also handle CPU-bound tasks.
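Because both classes implement the same Executor interface, switching between them is a one-name change. The following is a minimal sketch, with `is_prime`, `count_primes`, and `run_with` as hypothetical helper names; the process-based run sits under the `__main__` guard because worker processes may re-import the main module.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def is_prime(n):
    # Trial division; deliberately CPU-bound.
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def count_primes(limit):
    # Number of primes strictly below `limit`.
    return sum(1 for n in range(limit) if is_prime(n))


def run_with(executor_cls, limits):
    # Same code path for both executors; only the class differs.
    with executor_cls(max_workers=2) as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    limits = [20_000, 20_000]
    # Threads: results are correct, but the work is serialized by the GIL.
    print(run_with(ThreadPoolExecutor, limits))
    # Processes: each worker has its own interpreter and its own GIL.
    print(run_with(ProcessPoolExecutor, limits))
```

Note that with ProcessPoolExecutor the function and its arguments must be picklable, as the quoted documentation points out.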

The following question asks why the GIL exists in the first place: Why the Global Interpreter Lock?

Process vs thread experiments

In Multiprocessing vs Threading Python, I did an experimental analysis of processes vs threads in Python.

Quick preview of the results: [plot not reproduced here]

I would like to share an example concerning multithreading for visual effects. Here is a classic deadlock situation:

static void MyCallback(const Context &context) {
    // Take the mutex associated with this context first.
    Auto<Lock> lock(GetMyMutexFromContext(context));
    ...
    EvalMyPythonString(str); // A function that takes the GIL
    ...
}

Now consider the sequence of events that results in a deadlock.

╔═══╦════════════════════════════════════════╦══════════════════════════════════════╗
║   ║ Main Thread                            ║ Other Thread                         ║
╠═══╬════════════════════════════════════════╬══════════════════════════════════════╣
║ 1 ║ Python Command acquires GIL            ║ Work started                         ║
║ 2 ║ Computation requested                  ║ MyCallback runs and acquires MyMutex ║
║ 3 ║                                        ║ MyCallback now waits for GIL         ║
║ 4 ║ MyCallback runs and waits for MyMutex  ║ waiting for GIL                      ║
╚═══╩════════════════════════════════════════╩══════════════════════════════════════╝
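The same lock-order inversion can be reproduced in pure Python with two ordinary locks. This is only an illustrative sketch: `gil_stand_in` and `my_mutex` are hypothetical stand-ins for the real GIL and MyMutex, and the second acquisition uses a timeout so the script reports the deadlock instead of hanging forever.

```python
import threading

gil_stand_in = threading.Lock()  # hypothetical stand-in for the GIL
my_mutex = threading.Lock()      # hypothetical stand-in for MyMutex

both_hold_first = threading.Barrier(2)    # both threads hold their first lock
both_tried_second = threading.Barrier(2)  # both threads have tried the second
acquired_second = {}


def worker(name, first, second):
    with first:
        # Wait until the other thread also holds its first lock.
        both_hold_first.wait()
        # A real deadlock would block forever here; the timeout lets us observe it.
        got = second.acquire(timeout=0.2)
        acquired_second[name] = got
        if got:
            second.release()
        # Keep holding `first` until the other thread has finished its attempt.
        both_tried_second.wait()


# Main thread: holds the "GIL", then wants the mutex.
main = threading.Thread(target=worker, args=("main", gil_stand_in, my_mutex))
# Other thread: holds the mutex, then wants the "GIL".
other = threading.Thread(target=worker, args=("other", my_mutex, gil_stand_in))
main.start()
other.start()
main.join()
other.join()

print(acquired_second)
```

Neither thread manages to take its second lock, which is exactly the situation in rows 3 and 4 of the table. The usual remedies are to acquire locks in a consistent order or to release one lock before waiting for the other.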

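Why Python (CPython and others) uses the GIL

From http://wiki.python.org/moin/GlobalInterpreterLock

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.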

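How can it be removed from Python?

Like Lua, maybe Python could start multiple VMs, but Python doesn't do that; I guess there are some other reasons.

In NumPy and some other Python extension libraries, releasing the GIL to other threads can sometimes boost the efficiency of the whole program.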
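The standard library offers a dependency-free way to see this effect: `hashlib` is documented to release the GIL while hashing data larger than 2047 bytes, so it is used here as a stand-in for an extension library such as NumPy. `DATA`, `N`, and `digest` are hypothetical names, and the timings are printed rather than promised, since they depend on the machine and the Python build.

```python
import hashlib
import threading
import time

DATA = b"x" * (8 * 1024 * 1024)  # 8 MiB buffer, well above the 2047-byte threshold
N = 4


def digest(results, i):
    # The C implementation releases the GIL while hashing the large buffer.
    results[i] = hashlib.sha256(DATA).hexdigest()


# Sequential hashing.
sequential_results = [None] * N
start = time.perf_counter()
for i in range(N):
    digest(sequential_results, i)
sequential_elapsed = time.perf_counter() - start

# Threaded hashing: the threads can overlap while the GIL is released.
threaded_results = [None] * N
threads = [threading.Thread(target=digest, args=(threaded_results, i)) for i in range(N)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_elapsed = time.perf_counter() - start

print(f"sequential: {sequential_elapsed:.3f}s")
print(f"threaded:   {threaded_elapsed:.3f}s")
```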