How to use threading in Python?
I am trying to understand threading in Python. I've looked at the documentation and examples but, quite frankly, many examples are overly sophisticated and I'm having trouble understanding them.
How do you clearly show the tasks being divided for multithreading?
Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.
The code below comes from an article/blog post that you should definitely check out (no affiliation): Parallelism in one line: A Better Model for Day to Day Threading Tasks. I'll summarize below; it ends up being just a few lines of code:
```python
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)
results = pool.map(my_function, my_array)
```
Which is the multithreaded version of:
```python
results = []
for item in my_array:
    results.append(my_function(item))
```
Description
Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.
Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.
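As a quick illustration (my own sketch, with a made-up list), here is the plain, single-threaded built-in `map`:

```python
# map applies a function to each element of a sequence.
# In Python 3 it returns a lazy iterator, so wrap it in list() to collect results.
urls = ['http://www.python.org', 'http://www.python.org/about/']
lengths = list(map(len, urls))
print(lengths)  # the length of each URL string
```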
Implementation
Parallel versions of the map function are provided by two libraries: multiprocessing, and also its little-known but equally fantastic stepchild: multiprocessing.dummy.
multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.
```python
import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
    'http://www.python.org',
    'http://www.python.org/about/',
    'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
    'http://www.python.org/doc/',
    'http://www.python.org/download/',
    'http://www.python.org/getit/',
    'http://www.python.org/community/',
    'https://wiki.python.org/moin/',
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()
```
And here are the timing results:
```
Single thread: 14.4 seconds
       4 Pool:  3.1 seconds
       8 Pool:  1.4 seconds
      13 Pool:  1.3 seconds
```
Passing multiple arguments (works like this only in Python 3.3 and later):
To pass multiple arrays:
```python
results = pool.starmap(function, zip(list_a, list_b))
```
Or to pass a constant and an array:
```python
results = pool.starmap(function, zip(itertools.repeat(constant), list_a))
```
If you are using an earlier version of Python, you can pass multiple arguments via this workaround.
(Thanks to user136036 for the helpful comment.)
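For illustration, here is a minimal sketch of one common pre-3.3 workaround (my own; `function_star` is a made-up wrapper name, not necessarily the linked solution). Since `pool.map` passes a single argument to the target, you can zip the arguments and unpack them in a small module-level wrapper:

```python
def function_star(packed_args):
    # Unpack the tuple that pool.map passes as a single argument.
    # Keep this wrapper at module level so multiprocessing can pickle it.
    return function(*packed_args)

results = pool.map(function_star, zip(list_a, list_b))
```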
Here is a simple example: you need to try a few alternative URLs and return the contents of the first one to respond.
```python
import Queue
import threading
import urllib2

# Called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = Queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args=(q, u))
    t.daemon = True
    t.start()

s = q.get()
print s
```
In this case, threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, in order to put its contents on the queue; each thread is a daemon (it won't keep the process up if the main thread ends, which is more common than not); the main thread starts all subthreads, does a `get` on the queue to wait until one of them has done a `put`, then emits the result and terminates (which takes down the subthreads as well, since they're daemon threads).
Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn't use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there's a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work's results, by the way, and they're intrinsically thread-safe, so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.
NOTE: For actual parallelization in Python, you should use the multiprocessing module to fork multiple processes that execute in parallel (due to the global interpreter lock, Python threads provide interleaving, but they are in fact executed serially, not in parallel, and are only useful when interleaving I/O operations).
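A minimal sketch of that multiprocessing route (my own illustration, assuming a picklable, module-level `work` function):

```python
from multiprocessing import Pool

def work(x):
    # CPU-bound work; each call runs in a separate process and can use another core
    return x * x

if __name__ == '__main__':
    pool = Pool(4)  # four worker processes
    print(pool.map(work, range(10)))
    pool.close()
    pool.join()
```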
However, if you are just looking for interleaving (or are doing I/O operations that can be parallelized despite the global interpreter lock), then the threading module is the place to start. As a really simple example, let's consider the problem of summing a big range by summing subranges in parallel:
```python
import threading

class SummingThread(threading.Thread):
    def __init__(self, low, high):
        super(SummingThread, self).__init__()
        self.low = low
        self.high = high
        self.total = 0

    def run(self):
        for i in range(self.low, self.high):
            self.total += i

thread1 = SummingThread(0, 500000)
thread2 = SummingThread(500000, 1000000)
thread1.start()  # This actually causes the thread to run
thread2.start()
thread1.join()   # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print result
```
Note that the above is a very dumb example, as it does absolutely no I/O and will be executed serially, albeit interleaved (with the added overhead of context switching), in CPython due to the global interpreter lock.
As mentioned before, CPython can use threads only for I/O waits because of the GIL. If you want to benefit from multiple cores for CPU-bound tasks, use multiprocessing:
```python
from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
```
Just a note: a queue is not required for threading.
This is the simplest example I could imagine that shows multiple threads running concurrently.
```python
import threading
from random import randint
from time import sleep

def print_number(number):
    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"

thread_list = []

for i in range(1, 10):
    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"
```
The answer from Alex Martelli helped me. However, here is a modified version that I thought was more useful (at least to me).
Updated: works in both Python 2 and Python 3.
```python
try:
    # For Python 3
    import queue
    from urllib.request import urlopen
except ImportError:
    # For Python 2
    import Queue as queue
    from urllib2 import urlopen

import threading

worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Load up a queue with your data; this will handle locking
q = queue.Queue()
for url in worker_data:
    q.put(url)

# Define a worker function
def worker(url_queue):
    queue_full = True
    while queue_full:
        try:
            # Get your data off the queue, and do some work
            url = url_queue.get(False)
            data = urlopen(url).read()
            print(len(data))
        except queue.Empty:
            queue_full = False

# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
    t = threading.Thread(target=worker, args=(q,))
    t.start()
```
I found this very useful: create as many threads as there are cores and let them execute a (large) number of tasks (in this case, calling a shell program):
```python
import Queue
import threading
import multiprocessing
import subprocess

q = Queue.Queue()
for i in range(30):  # Put 30 tasks in the queue
    q.put(i)

def worker():
    while True:
        item = q.get()
        # Execute a task: call a shell program and wait until it completes
        subprocess.call("echo " + str(item), shell=True)
        q.task_done()

cpus = multiprocessing.cpu_count()  # Detect number of cores
print("Creating %d threads" % cpus)
for i in range(cpus):
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()

q.join()  # Block until all tasks are done
```
Given a function `f`, thread it like this:
```python
import threading
threading.Thread(target=f).start()
```
To pass arguments to `f`:
```python
threading.Thread(target=f, args=(a, b, c)).start()
```
Python 3 has the facility of launching parallel tasks. This makes our work easier.
It has thread pools and process pools.
The following gives an insight:
ThreadPoolExecutor example
```python
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
```
ProcessPoolExecutor example
```python
import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()
```
For me, the perfect example of threading is monitoring asynchronous events. Look at this code.
```python
# thread_test.py
import threading
import time

class Monitor(threading.Thread):
    def __init__(self, mon):
        threading.Thread.__init__(self)
        self.mon = mon

    def run(self):
        while True:
            if self.mon[0] == 2:
                print "Mon = 2"
                self.mon[0] = 3
```
You can play with this code by opening an IPython session and doing something like:
```
>>> from thread_test import Monitor
>>> a = [0]
>>> mon = Monitor(a)
>>> mon.start()
>>> a[0] = 2
Mon = 2
>>> a[0] = 2
Mon = 2
```
Wait a few minutes:
```
>>> a[0] = 2
Mon = 2
```
Most documentation and tutorials use Python's `threading` and `Queue` modules, and they can seem overwhelming for beginners.
Perhaps consider Python 3's `concurrent.futures.ThreadPoolExecutor` module instead:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_url(url):
    # Your actual program here. Use threading.Lock() if necessary.
    return ""

# List of URLs to fetch
urls = ["url1", "url2"]

with ThreadPoolExecutor(max_workers=5) as executor:
    # Create threads
    futures = {executor.submit(get_url, url) for url in urls}

    # as_completed() gives you the threads once finished
    for f in as_completed(futures):
        # Get the results
        rs = f.result()
```
Using the brand-new concurrent.futures module:
```python
def sqr(val):
    import time
    time.sleep(0.1)
    return val * val

def process_result(result):
    print(result)

def process_these_asap(tasks):
    import concurrent.futures

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = []
        for task in tasks:
            futures.append(executor.submit(sqr, task))

        for future in concurrent.futures.as_completed(futures):
            process_result(future.result())
        # Or instead of all this just do:
        # results = executor.map(sqr, tasks)
        # list(map(process_result, results))

def main():
    tasks = list(range(10))
    print('Processing {} tasks'.format(len(tasks)))
    process_these_asap(tasks)
    print('Done')
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main())
```
The executor approach may seem familiar to all those who have gotten their hands dirty with Java before.
Also, on a side note: to keep the universe sane, don't forget to close your pools/executors if you don't use a `with` context (which is so awesome that it does it for you).
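As a minimal sketch of that manual cleanup (reusing the `sqr` task from the example above), in case a `with` block doesn't fit your code:

```python
import concurrent.futures

executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
future = executor.submit(sqr, 3)
print(future.result())  # 9

# Without a with block, shut the executor down yourself;
# wait=True blocks until all pending futures have finished.
executor.shutdown(wait=True)
```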
Here is a very simple example of a CSV import using threading. (The libraries included may differ depending on the purpose.)
Helper function:
```python
from threading import Thread
from project import app
import csv

def import_handler(csv_file_name):
    thr = Thread(target=dump_async_csv_data, args=[csv_file_name])
    thr.start()

def dump_async_csv_data(csv_file_name):
    with app.app_context():
        with open(csv_file_name) as File:
            reader = csv.DictReader(File)
            for row in reader:
                # DB operation/query
                pass
```
Driver function:
```python
import_handler(csv_file_name)
```
Here is multithreading with a simple example which will be helpful. You can run it and easily understand how multithreading works in Python. I used a lock to prevent other threads from accessing until the previous threads finished their work. By the use of this line of code,
```python
tLock = threading.BoundedSemaphore(value=4)
```
you can allow a number of processes at a time and keep the rest of the threads, which will run later or after the previous processes have completed.
```python
import threading
import time

# tLock = threading.Lock()
tLock = threading.BoundedSemaphore(value=4)

def timer(name, delay, repeat):
    print "Timer:", name, "Started"
    tLock.acquire()
    print name, "has acquired the lock"
    while repeat > 0:
        time.sleep(delay)
        print name, ":", str(time.ctime(time.time()))
        repeat -= 1
    print name, "is releasing the lock"
    tLock.release()
    print "Timer:", name, "Completed"

def Main():
    t1 = threading.Thread(target=timer, args=("Timer1", 2, 5))
    t2 = threading.Thread(target=timer, args=("Timer2", 3, 5))
    t3 = threading.Thread(target=timer, args=("Timer3", 4, 5))
    t4 = threading.Thread(target=timer, args=("Timer4", 5, 5))
    t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5))

    t1.start()
    t2.start()
    t3.start()
    t4.start()
    t5.start()

    print "Main Complete"

if __name__ == "__main__":
    Main()
```
I saw a lot of examples here where no real work was being performed, and they were mostly CPU-bound. Here is an example of a CPU-bound task that computes all the prime numbers between 10 million and 10.05 million. I have used all four methods here (plus a sequential baseline):
```python
import math
import timeit
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def time_stuff(fn):
    """Measure the execution time of a function."""
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{} seconds".format(t1 - t0))
    return wrapper

def find_primes_in(nmin, nmax):
    """Compute a list of prime numbers between the given minimum and maximum arguments."""
    primes = []

    # Loop from minimum to maximum
    for current in range(nmin, nmax + 1):

        # Take the square root of the current number
        sqrt_n = int(math.sqrt(current))
        found = False

        # Check if any number from 2 to the square root + 1 divides the current number
        for number in range(2, sqrt_n + 1):

            # If divisible, we have found a factor, hence this is not a prime number; move to the next one
            if current % number == 0:
                found = True
                break

        # If not divisible, add this number to the list of primes found so far
        if not found:
            primes.append(current)

    # I am merely printing the length of the array containing all the primes, but feel free to do what you want
    print(len(primes))

@time_stuff
def sequential_prime_finder(nmin, nmax):
    """Use the main process and main thread to compute everything in this case."""
    find_primes_in(nmin, nmax)

@time_stuff
def threading_prime_finder(nmin, nmax):
    """
    If the minimum is 1000 and the maximum is 2000 and we have 4 workers:
    1000 - 1250 to worker 1
    1250 - 1500 to worker 2
    1500 - 1750 to worker 3
    1750 - 2000 to worker 4
    So let's split the min and max values according to the number of workers.
    """
    nrange = nmax - nmin
    threads = []
    for i in range(8):
        start = int(nmin + i * nrange / 8)
        end = int(nmin + (i + 1) * nrange / 8)

        # Start the thread with the min and max split up to compute
        # Parallel computation will not work here due to the GIL since this is a CPU-bound task
        t = threading.Thread(target=find_primes_in, args=(start, end))
        threads.append(t)
        t.start()

    # Don't forget to wait for the threads to finish
    for t in threads:
        t.join()

@time_stuff
def processing_prime_finder(nmin, nmax):
    """Split the min/max interval similar to the threading method above, but use processes this time."""
    nrange = nmax - nmin
    processes = []
    for i in range(8):
        start = int(nmin + i * nrange / 8)
        end = int(nmin + (i + 1) * nrange / 8)
        p = multiprocessing.Process(target=find_primes_in, args=(start, end))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

@time_stuff
def thread_executor_prime_finder(nmin, nmax):
    """
    Split the min/max interval similar to the threading method, but use a thread pool executor this time.
    This method is slightly faster than pure threading, as the pools manage threads more efficiently.
    It is still slow due to the GIL limitations, since we are doing a CPU-bound task.
    """
    nrange = nmax - nmin
    with ThreadPoolExecutor(max_workers=8) as e:
        for i in range(8):
            start = int(nmin + i * nrange / 8)
            end = int(nmin + (i + 1) * nrange / 8)
            e.submit(find_primes_in, start, end)

@time_stuff
def process_executor_prime_finder(nmin, nmax):
    """
    Split the min/max interval similar to the threading method, but use the process pool executor.
    This is the fastest method recorded so far, as it manages processes efficiently and overcomes GIL limitations.
    RECOMMENDED METHOD FOR CPU-BOUND TASKS
    """
    nrange = nmax - nmin
    with ProcessPoolExecutor(max_workers=8) as e:
        for i in range(8):
            start = int(nmin + i * nrange / 8)
            end = int(nmin + (i + 1) * nrange / 8)
            e.submit(find_primes_in, start, end)

def main():
    nmin = int(1e7)
    nmax = int(1.05e7)
    print("Sequential Prime Finder Starting")
    sequential_prime_finder(nmin, nmax)
    print("Threading Prime Finder Starting")
    threading_prime_finder(nmin, nmax)
    print("Processing Prime Finder Starting")
    processing_prime_finder(nmin, nmax)
    print("Thread Executor Prime Finder Starting")
    thread_executor_prime_finder(nmin, nmax)
    print("Process Executor Finder Starting")
    process_executor_prime_finder(nmin, nmax)

if __name__ == "__main__":
    main()
```
And here are the results on my Mac OS X four-core machine:
```
Sequential Prime Finder Starting
9.708213827005238 seconds
Threading Prime Finder Starting
9.81836523200036 seconds
Processing Prime Finder Starting
3.2467174359990167 seconds
Thread Executor Prime Finder Starting
10.228896902000997 seconds
Process Executor Finder Starting
2.656402041000547 seconds
```
None of the solutions above actually used multiple cores on my GNU/Linux server (where I don't have administrator rights). They just ran on a single core. I used the lower-level `os.fork` interface to spawn multiple processes; this is the code that worked for me:
```python
from os import fork

values = ['different', 'values', 'for', 'threads']

for i in range(len(values)):
    p = fork()
    if p == 0:
        # In the child process (fork() returned 0): do the work, then stop forking
        my_function(values[i])
        break
```
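One caveat with the sketch above: the parent never waits for its children, so they can linger as zombies. A variant (my own illustration, still using the undefined placeholder `my_function`) that exits each child explicitly and reaps them with `os.wait()`:

```python
import os

values = ['different', 'values', 'for', 'threads']

children = []
for value in values:
    pid = os.fork()
    if pid == 0:
        # Child process: do the work, then exit so the child doesn't keep looping
        my_function(value)
        os._exit(0)
    children.append(pid)

# Parent process: reap every child so none are left as zombies
for _ in children:
    os.wait()
```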
```python
import threading
import requests

def send():
    r = requests.get('https://www.stackoverlow.com')

thread = []
t = threading.Thread(target=send)  # Pass the function itself, not send()
thread.append(t)
t.start()
```