
Asynchronous Requests with Python requests

I tried the sample provided within the documentation of the requests library for Python.

With async.map(rs) I get the response codes, but I want to get the content of each page requested. This, for example, does not work:

out = async.map(rs)
print out[0].content


Note

The below answer is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I've left this answer as is to reflect the original question, which was about using requests < v0.13.0.

To do multiple tasks with async.map asynchronously you have to:

  • Define a function for what you want to do with each object (your task)
  • Add that function as an event hook in your request
  • Call async.map on a list of all the requests/actions

Example:

    from requests import async
    # If using requests > v0.13.0, use
    # from grequests import async

    urls = [
        'http://python-requests.org',
        'http://httpbin.org',
        'http://python-guide.org',
        'http://kennethreitz.com'
    ]

    # A simple task to do to each response object
    def do_something(response):
        print(response.url)

    # A list to hold our things to do via async
    async_list = []

    for u in urls:
        # The "hooks = {..." part is where you define what you want to do
        #
        # Note the lack of parentheses following do_something; this is
        # because the response will be used as the first argument automatically
        action_item = async.get(u, hooks={'response': do_something})

        # Add the task to our list of things to do via async
        async_list.append(action_item)

    # Do our list of things to do via async
    async.map(async_list)


    async is now an independent module: grequests.

    See here: https://github.com/kennethreitz/grequests

    And there: Ideal method for sending multiple HTTP requests over Python?

    Installation:

    $ pip install grequests

    Usage:

    Build a stack:

    import grequests

    urls = [
        'http://www.heroku.com',
        'http://tablib.org',
        'http://httpbin.org',
        'http://python-requests.org',
        'http://kennethreitz.com'
    ]

    rs = (grequests.get(u) for u in urls)

    Send the stack:

    grequests.map(rs)

    The result looks like:

    [<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

    grequests doesn't seem to set a limit for concurrent requests, i.e. when multiple requests are sent to the same server.


    I tested both requests-futures and grequests. grequests is faster, but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own and simply wrap requests into a ThreadPoolExecutor; it was almost as fast as grequests, but without external dependencies.

    import requests
    import concurrent.futures

    def get_urls():
        return ["url1", "url2"]

    def load_url(url, timeout):
        return requests.get(url, timeout=timeout)

    # Counters must be initialized before the loop increments them
    resp_err = 0
    resp_ok = 0

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                resp_err = resp_err + 1
            else:
                resp_ok = resp_ok + 1
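    Since the snippet above elides the network part, here is a self-contained sketch of the same ThreadPoolExecutor pattern with a sleep-based stand-in for requests.get (the load function, tags, and delays are made up for illustration). It shows that as_completed yields futures in completion order, not submission order:

    ```python
    import concurrent.futures
    import time

    def load(tag, delay):
        # stand-in for requests.get(url, timeout=...): sleeps instead of doing I/O
        time.sleep(delay)
        return tag

    # With two workers both tasks run concurrently; as_completed
    # yields the faster one first even though it was submitted second.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(load, "slow", 0.2),
                   executor.submit(load, "fast", 0.05)]
        done_order = [f.result() for f in concurrent.futures.as_completed(futures)]

    print(done_order)  # -> ['fast', 'slow']
    ```

    The same completion-order iteration is what lets the answer above count successes and errors as each response arrives.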


    Maybe requests-futures is another choice.

    from requests_futures.sessions import FuturesSession

    session = FuturesSession()
    # first request is started in background
    future_one = session.get('http://httpbin.org/get')
    # second request is started immediately
    future_two = session.get('http://httpbin.org/get?foo=bar')
    # wait for the first request to complete, if it hasn't already
    response_one = future_one.result()
    print('response one status: {0}'.format(response_one.status_code))
    print(response_one.content)
    # wait for the second request to complete, if it hasn't already
    response_two = future_two.result()
    print('response two status: {0}'.format(response_two.status_code))
    print(response_two.content)

    It is also recommended in the official documentation. If you don't want to involve gevent, it's a good option.
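    The futures returned by FuturesSession are standard concurrent.futures.Future objects, so the same submit/result pattern can be sketched with a plain ThreadPoolExecutor and a stand-in for the HTTP call (fake_get and the dict it returns are invented for illustration; a real session would return Response objects):

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def fake_get(url):
        # made-up stand-in for requests.get(url); returns a plain dict
        return {"url": url, "status_code": 200}

    executor = ThreadPoolExecutor(max_workers=2)
    # both calls are started in the background immediately
    future_one = executor.submit(fake_get, "http://httpbin.org/get")
    future_two = executor.submit(fake_get, "http://httpbin.org/get?foo=bar")

    # .result() blocks until the corresponding call has completed
    response_one = future_one.result()
    response_two = future_two.result()
    print(response_one["status_code"], response_two["url"])
    executor.shutdown()
    ```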


    I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

    list_of_requests = ['http://moop.com', 'http://doop.com', ...]

    from simple_requests import Requests
    for response in Requests().swarm(list_of_requests):
        print(response.content)

    The docs are here: http://pythonhosted.org/simple-requests/


    from threading import Thread

    threads = list()

    for requestURI in requests:
        t = Thread(target=self.openURL, args=(requestURI,))
        t.start()
        threads.append(t)

    for thread in threads:
        thread.join()

    ...

    def openURL(self, requestURI):
        o = urllib2.urlopen(requestURI, timeout=600)
        o...
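    The fragment above depends on surrounding class context (self, the requests list). A self-contained sketch of the same fan-out/join pattern, with a made-up stand-in for urllib2.urlopen so it runs without a network (the "fetched:" prefix and URIs are invented):

    ```python
    import threading

    results = []
    results_lock = threading.Lock()

    def open_url(uri):
        # stand-in for urllib2.urlopen(uri, timeout=600); just echoes the URI
        data = "fetched:" + uri
        with results_lock:  # guard the shared list while threads append
            results.append(data)

    threads = []
    for uri in ["http://a.example", "http://b.example"]:
        t = threading.Thread(target=open_url, args=(uri,))
        t.start()
        threads.append(t)

    # join() blocks until every worker thread has finished
    for t in threads:
        t.join()

    print(sorted(results))
    ```

    Starting all threads first and joining afterwards is what makes the requests overlap; joining inside the first loop would serialize them.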


    I have been using Python requests for async calls against GitHub's gist API for some time.

    For an example, see the code here:

    https://github.com/davidthewatson/flasgist/blob/master/views.py_l60-72

    This style of Python may not be the clearest example, but I can assure you that the code works. Let me know if this is confusing to you and I will document it.


    If you want to use asyncio, then requests-async provides async/await functionality for requests - https://github.com/encode/requests-async
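    With asyncio, the general fan-out pattern looks like the sketch below. fake_get is a made-up stand-in for an awaitable HTTP call (such as the one requests-async exposes), so no network access is assumed:

    ```python
    import asyncio

    async def fake_get(url):
        # stand-in for an awaitable HTTP call, e.g. await session.get(url)
        await asyncio.sleep(0.01)
        return "content of " + url

    async def main():
        urls = ["http://httpbin.org", "http://python-requests.org"]
        # gather runs the coroutines concurrently and preserves input order
        return await asyncio.gather(*(fake_get(u) for u in urls))

    results = asyncio.run(main())
    print(results)
    ```

    Unlike as_completed-style iteration, asyncio.gather returns the results in the same order as the inputs regardless of which request finishes first.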


    I have also tried some things using the asynchronous methods in Python; however, I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, in Twisted:

    http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html