How to enable requests async mode?
For this code:
```
import sys
import gevent
from gevent import monkey

monkey.patch_all()

import requests
import urllib2


def worker(url, use_urllib2=False):
    if use_urllib2:
        content = urllib2.urlopen(url).read().lower()
    else:
        content = requests.get(url, prefetch=True).content.lower()
    # extract the text between the <title> tags
    title = content.split('<title>')[1].split('</title>')[0].strip()

urls = ['http://www.mail.ru'] * 5

def by_requests():
    jobs = [gevent.spawn(worker, url) for url in urls]
    gevent.joinall(jobs)

def by_urllib2():
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)

if __name__ == '__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print 'by requests: %s seconds' % t.timeit(number=3)
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
    print 'by urllib2: %s seconds' % t.timeit(number=3)
    sys.exit(0)
```
I get this result:
```
by requests: 18.3397213892 seconds
by urllib2: 2.48605842363 seconds
```
In a sniffer it looks like this: [packet capture screenshot]
I am sorry, Kenneth Reitz. His library is wonderful.
I was being silly. I need to enable the monkey patch for httplib, like this:
```
gevent.monkey.patch_all(httplib=True)
```
because the patch for httplib is disabled by default.
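Putting the fix together, a minimal sketch of the patched benchmark might look like the following. This assumes an old gevent 0.13.x release, where patch_all() still accepts the httplib keyword; later gevent versions dropped gevent.httplib, and there this call raises an error.

```
# Sketch only: assumes gevent 0.13.x, where patch_all(httplib=True) is valid.
import gevent
from gevent import monkey

# Patch httplib as well, so the sockets used by requests cooperate with gevent.
monkey.patch_all(httplib=True)

import requests

def worker(url):
    # Looks blocking, but yields to the gevent hub while waiting on the network.
    return requests.get(url).content

urls = ['http://www.mail.ru'] * 5
jobs = [gevent.spawn(worker, url) for url in urls]
gevent.joinall(jobs)
```

With httplib patched, the five downloads overlap instead of running back to back, which is what made the unpatched version so slow.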
As Kenneth pointed out, another thing we can do is to let the requests library handle the asynchronous part itself, through its async module.
Doing it that way means we cannot "thread" the callback part. But that should be fine, because the main gain is expected from the HTTP requests themselves, due to the request/response latency.
```
import sys
import gevent
from gevent import monkey

monkey.patch_all()

import requests
from requests import async
import urllib2


def call_back(resp):
    content = resp.content
    # extract the text between the <title> tags
    title = content.split('<title>')[1].split('</title>')[0].strip()
    return title

def worker(url, use_urllib2=False):
    if use_urllib2:
        content = urllib2.urlopen(url).read().lower()
        title = content.split('<title>')[1].split('</title>')[0].strip()
    else:
        # here url is the whole list of URLs; async.map sends them concurrently
        rs = [async.get(u) for u in url]
        resps = async.map(rs)
        for resp in resps:
            call_back(resp)

urls = ['http://www.mail.ru'] * 5

def by_requests():
    worker(urls)

def by_urllib2():
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)

if __name__ == '__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print 'by requests: %s seconds' % t.timeit(number=3)
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
    print 'by urllib2: %s seconds' % t.timeit(number=3)
    sys.exit(0)
```
Here is one of my results:
```
by requests: 2.44117593765 seconds
by urllib2: 4.41298294067 seconds
```
requests has gevent support integrated into its codebase:
http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests
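For reference, the pattern from that documentation boils down to something like the sketch below. Note that the requests.async module only existed in requests before 1.0; it was later split out into the separate grequests project.

```
from requests import async  # only available in requests < 1.0

urls = ['http://www.mail.ru'] * 5

# Build unsent request objects, then let async.map() send them all
# concurrently on top of gevent.
rs = [async.get(u) for u in urls]
for resp in async.map(rs):
    print resp.status_code, resp.url
```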
From the requests docs, under "Blocking Or Non-Blocking?":
If you are concerned about the use of blocking IO, there are lots of projects out there that combine Requests with one of Python's asynchronicity frameworks. Two excellent examples are grequests and requests-futures.
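As a rough sketch of what those two options look like (based on the projects' documented APIs, not on code from this thread):

```
# grequests: gevent-based, essentially the old requests.async split out.
import grequests

urls = ['http://www.mail.ru'] * 5
rs = (grequests.get(u) for u in urls)     # unsent requests
responses = grequests.map(rs)             # send them concurrently
print [r.status_code for r in responses]

# requests-futures: the same idea built on concurrent.futures thread pools.
from requests_futures.sessions import FuturesSession

session = FuturesSession(max_workers=5)
futures = [session.get(u) for u in urls]  # each call returns a Future at once
print [f.result().status_code for f in futures]
```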
I ran your code on my machine and got:
```
by requests: 3.7847161293 seconds
by urllib2: 4.92611193657 seconds
by requests: 2.90777993202 seconds
by urllib2: 7.99798607826 seconds
```