Python: how to enable async mode for requests?

For this code:

import sys

import gevent
from gevent import monkey

# Patch the standard library before the HTTP libraries are imported.
monkey.patch_all()

import requests
import urllib2

def worker(url, use_urllib2=False):
    # Fetch the page with the chosen library and extract its <title> text.
    if use_urllib2:
        content = urllib2.urlopen(url).read().lower()
    else:
        content = requests.get(url, prefetch=True).content.lower()
    title = content.split('<title>')[1].split('</title>')[0].strip()

urls = ['http://www.mail.ru']*5

def by_requests():
    # One greenlet per URL, fetched with requests.
    jobs = [gevent.spawn(worker, url) for url in urls]
    gevent.joinall(jobs)

def by_urllib2():
    # One greenlet per URL, fetched with urllib2.
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)

if __name__=='__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print 'by requests: %s seconds'%t.timeit(number=3)
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
    print 'by urllib2: %s seconds'%t.timeit(number=3)
    sys.exit(0)

The result:

by requests: 18.3397213892 seconds
by urllib2: 2.48605842363 seconds

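In a sniffer it looks like this:

[Image: sniffer capture of the ten requests]

Explanation: the first 5 requests are sent by the requests library, the next 5 by urllib2.
Red is the time when work was frozen, dark is when data was being received... wtf?!

If the socket library is patched and both libraries should therefore behave the same way, how is this possible?
How can I use requests asynchronously without requests.async?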


Sorry, Kenneth Reitz. His library is wonderful.

I was being stupid. I need to enable the monkey patch for httplib, like this:

gevent.monkey.patch_all(httplib=True)

Because patching of httplib is disabled by default.
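
As a rough illustration of why it matters which modules get patched and when, here is a toy model of monkey patching in general. It is not gevent code and does not try to reproduce the timings above: `fake_socket`, `blocking_recv` and `cooperative_recv` are hypothetical names made up for the sketch. It only shows that a patch replaces an attribute on a module, so code that looks the function up later sees the replacement while code that captured a reference earlier does not. That is the usual reason for calling `monkey.patch_all()` before importing `requests` and `urllib2`, as the code in the question does.

```python
import types

# Hypothetical stand-in for a low-level module such as socket.
fake_socket = types.ModuleType("fake_socket")


def blocking_recv():
    return "blocking"


def cooperative_recv():
    return "cooperative"


fake_socket.recv = blocking_recv


def late_binding_client():
    # Looks the function up through the module at call time.
    return fake_socket.recv()


# A direct reference captured before the patch is applied.
early_reference = fake_socket.recv


def early_binding_client():
    return early_reference()


def patch():
    # Swap the module attribute; this is the core of a monkey patch.
    fake_socket.recv = cooperative_recv


patch()
print(late_binding_client())   # goes through the patched attribute
print(early_binding_client())  # still calls the original function
```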


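As Kenneth pointed out, another thing we can do is let the requests module handle the asynchronous part. I have changed your code accordingly. Again, for me, the results consistently show the requests module performing better than urllib2.

Doing it this way means we cannot "thread" the callback part. That should be fine, though, because the main gain is expected from the HTTP requests themselves, where the request/response latency is.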

import sys

import gevent
from gevent import monkey

# Patch the standard library before the HTTP libraries are imported.
monkey.patch_all()

import requests
from requests import async
import urllib2

def call_back(resp):
    # Extract the <title> text from a finished response.
    content = resp.content
    title = content.split('<title>')[1].split('</title>')[0].strip()
    return title

def worker(url, use_urllib2=False):
    if use_urllib2:
        # Here url is a single URL string.
        content = urllib2.urlopen(url).read().lower()
        title = content.split('<title>')[1].split('</title>')[0].strip()

    else:
        # Here url is the whole list; requests.async sends them concurrently.
        rs = [async.get(u) for u in url]
        resps = async.map(rs)
        # The callbacks run one after another once the responses are in.
        for resp in resps:
            call_back(resp)

urls = ['http://www.mail.ru']*5

def by_requests():
    worker(urls)
def by_urllib2():
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)

if __name__=='__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print 'by requests: %s seconds'%t.timeit(number=3)
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
    print 'by urllib2: %s seconds'%t.timeit(number=3)
    sys.exit(0)

Here is one of my results:

by requests: 2.44117593765 seconds
by urllib2: 4.41298294067 seconds
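
The point about latency being where the gain comes from can be sketched without any network access. The example below is only an analogy: it uses the standard library's `asyncio` in Python 3 rather than gevent or requests.async, and `fake_fetch`, the 0.2 second `LATENCY` and the example URLs are assumptions made up for the illustration. The waits overlap in `fetch_all`, and the callbacks then run one after another, which costs little because they are cheap compared to the wait.

```python
import asyncio
import time

LATENCY = 0.2  # assumed per-request latency in seconds


async def fake_fetch(url):
    # Stand-in for a network round trip; yields to other tasks while waiting.
    await asyncio.sleep(LATENCY)
    return "<html><title>%s</title></html>" % url


def call_back(body):
    # Runs after the response has arrived.
    return body.split("<title>")[1].split("</title>")[0].strip()


async def fetch_all(urls):
    # All waits overlap; results come back in the order of urls.
    bodies = await asyncio.gather(*(fake_fetch(u) for u in urls))
    # The callbacks then run sequentially.
    return [call_back(b) for b in bodies]


async def fetch_sequentially(urls):
    titles = []
    for u in urls:
        body = await fake_fetch(u)  # each wait finishes before the next starts
        titles.append(call_back(body))
    return titles


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


urls = ["http://example.invalid/%d" % i for i in range(5)]
titles, concurrent_seconds = timed(fetch_all(urls))
_, sequential_seconds = timed(fetch_sequentially(urls))
print("concurrent: %.2f s, sequential: %.2f s"
      % (concurrent_seconds, sequential_seconds))
```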


Requests has gevent support integrated into its codebase:

http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests


From the Requests docs, "Blocking or Non-Blocking?":

If you are concerned about the use of blocking IO, there are lots of projects out there that combine Requests with one of Python's asynchronicity frameworks. Two excellent examples are grequests and requests-futures.
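
To give a feel for the thread-pool approach that requests-futures builds on (`concurrent.futures`), here is a minimal Python 3 sketch. It uses neither of the projects named above, only the standard library: `urllib.request` stands in for Requests, and `TitleHandler`, `fetch_title` and `fetch_titles` are hypothetical names for this example. A throwaway local server is started so the sketch does not depend on an external site.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, build_opener


class TitleHandler(BaseHTTPRequestHandler):
    """Hypothetical local page so the sketch needs no external network."""

    def do_GET(self):
        body = b"<html><head><title> Demo Page </title></head></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Ignore any proxy settings from the environment for the local request.
opener = build_opener(ProxyHandler({}))


def fetch_title(url):
    # An ordinary blocking call; it only blocks its own worker thread.
    with opener.open(url, timeout=5) as resp:
        content = resp.read().decode("utf-8").lower()
    return content.split("<title>")[1].split("</title>")[0].strip()


def fetch_titles(urls, max_workers=5):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns immediately with a Future for each request.
        futures = [pool.submit(fetch_title, u) for u in urls]
        # result() waits for each Future, preserving the order of urls.
        return [f.result() for f in futures]


server = ThreadingHTTPServer(("127.0.0.1", 0), TitleHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    titles = fetch_titles([url] * 5)
finally:
    server.shutdown()
    server.server_close()
print(titles)
```

With threads there is nothing to monkey patch: the blocking call stays blocking, and the pool simply runs several of them side by side.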


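I ran your code on my machine (python 2.7.1, gevent 0.13.0, requests 0.10.6). It turns out that the time with the requests module is always a second or two better. Which versions are you using? Upgrading might simply solve the problem for you.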

by requests: 3.7847161293 seconds
by urllib2: 4.92611193657 seconds

by requests: 2.90777993202 seconds
by urllib2: 7.99798607826 seconds