Asynchronous Requests with Python requests
I tried the sample provided within the documentation of the requests library for Python. Using:

```python
out = async.map(rs)
print out[0].content
```
Note

The answer below is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace `requests` with `grequests` below and it should work.

I've left this answer as is to reflect the original question, which was about using requests < v0.13.0. To do multiple tasks with `async.map` asynchronously, you define a function for what you want to do with each response object, add that function as an event hook in your request, and then call `async.map` on a list of all the requests.

Example:
```python
from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    #
    # Note the lack of parentheses following do_something, this is
    # because the response will be used as the first argument automatically
    action_item = async.get(u, hooks={'response': do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)
```
See also: https://github.com/kennethreitz/grequests

And also: Ideal method for sending multiple HTTP requests over Python?
Installation:

```shell
$ pip install grequests
```
Usage:

Build a stack:
```python
import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)
```
Send the stack:

```python
grequests.map(rs)
```
The result looks like:

```python
[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
```
grequests doesn't seem to set a limit on concurrent requests by default, i.e. when multiple requests are sent to the same server. (If memory serves, `grequests.map` does accept a `size` argument that caps the size of the underlying gevent pool.)
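Independent of grequests, a concurrency cap can also be enforced with a standard-library semaphore. A minimal sketch, assuming a hypothetical `fetch` stand-in rather than a real HTTP call (a real version would call e.g. `requests.get(url)` where noted):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

limit = threading.BoundedSemaphore(2)   # allow at most 2 in-flight requests
lock = threading.Lock()
in_flight = 0
peak = 0

def fetch(url):
    # stand-in for a real request; tracks concurrency for demonstration
    global in_flight, peak
    with limit:                         # blocks while 2 fetches are running
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        # a real request (e.g. requests.get(url)) would go here
        with lock:
            in_flight -= 1
    return url

with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(fetch, ["u%d" % i for i in range(10)]))

print(peak <= 2)  # True
```

Even though the pool has 8 workers, the semaphore guarantees no more than 2 "requests" are ever in flight at once.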
I tested both requests-futures and grequests. grequests is faster, but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to simply wrap my own requests in a ThreadPoolExecutor, and it was almost as fast as grequests, but without external dependencies.
```python
import requests
import concurrent.futures

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# initialize the counters before the loop that updates them
resp_ok = 0
resp_err = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1
```
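As a footnote to the snippet above: when you don't need `as_completed`-style handling of whichever result finishes first, `executor.map` is a simpler variant that yields results in input order. A network-free sketch (the `load` stand-in below is illustrative, not a real HTTP call):

```python
from concurrent.futures import ThreadPoolExecutor

def load(url):
    # stand-in for requests.get(url).text so the sketch runs offline
    return "body-of-" + url

urls = ["url1", "url2", "url3"]

with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map returns results in the same order as the input iterable
    bodies = list(executor.map(load, urls))

print(bodies[0])  # body-of-url1
```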
Maybe requests-futures is another option.
```python
from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second request is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)
```
It is also recommended in the official documentation. If you don't want to involve gevent, it's a good option.
I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.
```python
from simple_requests import Requests

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

for response in Requests().swarm(list_of_requests):
    print response.content
```
The docs are here: http://pythonhosted.org/simple-requests/
```python
from threading import Thread
import urllib2

threads = list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout=600)
    o...
```
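The fragment above elides its surrounding class and the body of `openURL`. A self-contained sketch of the same fan-out/join pattern, with a hypothetical stand-in for the network call (a real version would use `urllib.request.urlopen` or `requests.get`):

```python
from threading import Thread

results = {}

def open_url(request_uri):
    # stand-in for the real network call; stores a fake response body
    results[request_uri] = "response-for-" + request_uri

request_uris = ["http://a.example", "http://b.example"]

threads = [Thread(target=open_url, args=(uri,)) for uri in request_uris]
for t in threads:
    t.start()
for t in threads:   # wait until every download thread has finished
    t.join()

print(len(results))  # 2
```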
I have been using Python requests for async calls against GitHub's gist API for some time.

For an example, see the code here:
https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72
This style of Python may not be the clearest example, but I can assure you that the code works. Let me know if this is confusing to you and I will document it.
If you want to use asyncio, then
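The sentence above is cut off in the source, but an asyncio-based approach typically fans requests out with `asyncio.gather`. A minimal stdlib sketch; the `fetch` coroutine is a stand-in for a real async HTTP call (e.g. via aiohttp, which is an assumption, not part of the original answer):

```python
import asyncio

async def fetch(url):
    # stand-in for an async HTTP call: simulate latency, echo the URL
    await asyncio.sleep(0.01)
    return "body-of-" + url

async def main(urls):
    # gather runs all fetches concurrently and preserves input order
    return await asyncio.gather(*(fetch(u) for u in urls))

bodies = asyncio.run(main(["http://a.example", "http://b.example"]))
print(bodies[0])  # body-of-http://a.example
```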
I have also tried some things using the asynchronous methods in Python; however, I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, in Twisted.
http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html