Timeout for python requests.get entire response
I am collecting statistics on a list of websites and I am using requests for it for simplicity. Here is my code:

    data = []
    websites = ['http://google.com', 'http://bbc.co.uk']
    for w in websites:
        r = requests.get(w, verify=False)
        data.append((r.url, len(r.content), r.elapsed.total_seconds(),
                     str([(l.status_code, l.url) for l in r.history]),
                     str(r.headers.items()), str(r.cookies.items())))
Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.

This question has been of interest before, but none of the answers are clean. I will be putting some bounty on it to get a nice answer.

I hear that maybe not using requests is a good idea, but then how should I get the nice things requests offers (the ones in the tuple)?
Set the timeout parameter:
    r = requests.get(w, verify=False, timeout=10)
As long as you don't set stream=True on that request, this will cause the call to requests.get() to time out if the connection takes more than ten seconds, or if the server doesn't send data for more than ten seconds.
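For the loop in the question, the exception to handle is requests.exceptions.Timeout (with requests.exceptions.RequestException as the catch-all parent), so one slow site doesn't abort the whole collection. A minimal sketch; the non-routable 10.255.255.1 address is just a stand-in for a site that hangs:

```python
import requests

websites = ['http://10.255.255.1']   # placeholder for a site that hangs (non-routable)
data = []
for w in websites:
    try:
        r = requests.get(w, timeout=2)   # connect/read timeout, not a cap on the whole download
        data.append((r.url, len(r.content), r.elapsed.total_seconds()))
    except requests.exceptions.Timeout:
        print('timed out:', w)
    except requests.exceptions.RequestException as e:
        print('failed:', w, type(e).__name__)

print('collected', len(data), 'results')
```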
What about using eventlet? If you want to time the request out after 10 seconds, even if data is being received, this snippet will work for you:

    import requests
    import eventlet
    eventlet.monkey_patch()

    with eventlet.Timeout(10):
        requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip", verify=False)
UPDATE: https://docs.python-requests.org/en/master/user/advanced/#timeouts

In the new version of requests:

If you specify a single value for the timeout, like this:
    r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
    r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response by passing None as a timeout value and then retrieving a cup of coffee.
    r = requests.get('https://github.com', timeout=None)
My old (probably outdated) answer (posted a long time ago):

There are other ways to overcome this problem:

1. Use the TimeoutSauce internal class

From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
    import requests
    from requests.adapters import TimeoutSauce

    class MyTimeout(TimeoutSauce):
        def __init__(self, *args, **kwargs):
            connect = kwargs.get('connect', 5)
            read = kwargs.get('read', connect)
            super(MyTimeout, self).__init__(connect=connect, read=read)

    requests.adapters.TimeoutSauce = MyTimeout

This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)
2. Use kevinburke's fork of requests: https://github.com/kevinburke/requests/tree/connect-timeout

Its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:

    r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

    r = requests.get('https://github.com', timeout=(3.05, 27))
kevinburke has requested it to be merged into the main requests project, but it hasn't been accepted yet.
To create a timeout you can use signals.

The best way to solve this case is probably to set an exception-raising handler for the alarm signal, schedule the alarm, and run your code inside a try/except/finally block so the alarm is always cancelled afterwards.

Here is some example code:
    import signal
    from time import sleep

    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass

    def _timeout(signum, frame):
        """ Raise a TimeoutException.

        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.
        """
        # Raise TimeoutException with system default timeout message
        raise TimeoutException()

    # Set the handler for the SIGALRM signal:
    signal.signal(signal.SIGALRM, _timeout)
    # Send the SIGALRM signal in 10 seconds:
    signal.alarm(10)

    try:
        # Do our code:
        print('This will take 11 seconds...')
        sleep(11)
        print('done!')
    except TimeoutException:
        print('It timed out!')
    finally:
        # Abort the sending of the SIGALRM signal:
        signal.alarm(0)
There are some caveats to this: it is not threadsafe (signals are always delivered to the main thread), and SIGALRM is not available on Windows.

But all of this is in the standard Python library! Apart from the sleep import it is only one import. If you are going to use timeouts in many places, you can easily put the TimeoutException, _timeout and the signalling into a function and just call that. Or you can make a decorator and put it on functions, see the answer linked below.
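A sketch of that decorator idea (the names timeout and TimeoutException here are my own, and like the code above this stays Unix-only and main-thread-only):

```python
import signal
from functools import wraps
from time import sleep

class TimeoutException(Exception):
    """Raised when the decorated call exceeds its deadline."""
    pass

def timeout(seconds=10):
    """Abort the wrapped call with TimeoutException after `seconds` whole seconds."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            def _handler(signum, frame):
                raise TimeoutException()
            old_handler = signal.signal(signal.SIGALRM, _handler)
            signal.alarm(seconds)
            try:
                return func(*args, **kwargs)
            finally:
                signal.alarm(0)                             # cancel any pending alarm
                signal.signal(signal.SIGALRM, old_handler)  # restore the previous handler
        return wrapper
    return decorator

@timeout(1)
def slow_call():
    sleep(2)        # gets interrupted after about 1 second

@timeout(1)
def fast_call():
    return 'ok'
```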
You can also set this up as a "context manager" to use it with the with statement:
    import signal

    class Timeout():
        """ Timeout for use with the `with` statement. """

        class TimeoutException(Exception):
            """ Simple Exception to be called on timeouts. """
            pass

        def _timeout(signum, frame):
            """ Raise a TimeoutException.

            This is intended for use as a signal handler.
            The signum and frame arguments passed to this are ignored.
            """
            raise Timeout.TimeoutException()

        def __init__(self, timeout=10):
            self.timeout = timeout
            signal.signal(signal.SIGALRM, Timeout._timeout)

        def __enter__(self):
            signal.alarm(self.timeout)

        def __exit__(self, exc_type, exc_value, traceback):
            signal.alarm(0)
            return exc_type is Timeout.TimeoutException

    # Demonstration:
    from time import sleep

    print('This is going to take maximum 10 seconds...')
    with Timeout(10):
        sleep(15)
        print('No timeout?')
    print('Done')
One possible downside with this context manager approach is that you can't know if the code actually timed out or not.
Sources and recommended reading:

- The documentation on signals
- This answer on timeouts by @David Narayan. He has organized the code above as a decorator.
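One way around that downside (my own variation, not part of the answer above) is to have __exit__ swallow the exception and record a flag the caller can inspect afterwards:

```python
import signal
from time import sleep

class Timeout:
    """`with` block with a deadline; inspect `.timed_out` afterwards (Unix, main thread only)."""

    class TimeoutException(Exception):
        pass

    @staticmethod
    def _handler(signum, frame):
        raise Timeout.TimeoutException()

    def __init__(self, timeout=10):
        self.timeout = timeout
        self.timed_out = False

    def __enter__(self):
        signal.signal(signal.SIGALRM, Timeout._handler)
        signal.alarm(self.timeout)
        return self                 # so `with Timeout(1) as t:` works

    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)             # always cancel the pending alarm
        if exc_type is Timeout.TimeoutException:
            self.timed_out = True   # remember that we hit the deadline
            return True             # suppress the exception
        return False

with Timeout(1) as t:
    sleep(2)                        # longer than the deadline
print('timed out?', t.timed_out)
```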
As of January 2019, you can use the timeout argument of requests:

    requests.get(url, timeout=10)
Note:

    timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
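That per-read (rather than total) behaviour can be demonstrated with nothing but the standard library: a toy server that trickles one byte at a time never trips a read timeout larger than the gap between bytes, even though the whole transfer takes much longer than the timeout:

```python
import socket
import threading
import time

def trickle(server):
    """Accept one client and send 8 bytes, one every 0.1 s (about 0.8 s in total)."""
    conn, _ = server.accept()
    for _ in range(8):
        conn.sendall(b'x')
        time.sleep(0.1)
    conn.close()

server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
threading.Thread(target=trickle, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
client.settimeout(0.5)          # per-recv() deadline, like requests' read timeout

start = time.monotonic()
received = b''
while len(received) < 8:
    received += client.recv(1)  # each wait is ~0.1 s, so the 0.5 s deadline never fires
elapsed = time.monotonic() - start
client.close()
server.close()
print('received %d bytes in %.1f s with no timeout raised' % (len(received), elapsed))
```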
This may be overkill, but the Celery distributed task queue has good support for timeouts.

In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task when the time limit is exceeded.

Under the covers, this uses the same signals approach referenced in the answer above, but in a more usable and manageable way. And if the list of websites you are monitoring is long, you might benefit from its primary feature: all kinds of ways to manage the execution of large numbers of tasks.
I believe you can use multiprocessing and not depend on a 3rd-party package:
    import multiprocessing
    import requests

    def call_with_timeout(func, args, kwargs, timeout):
        manager = multiprocessing.Manager()
        return_dict = manager.dict()

        # define a wrapper of `return_dict` to store the result.
        def function(return_dict):
            return_dict['value'] = func(*args, **kwargs)

        p = multiprocessing.Process(target=function, args=(return_dict,))
        p.start()

        # Force a max. `timeout` or wait for the process to finish
        p.join(timeout)

        # If thread is still active, it didn't finish: raise TimeoutError
        if p.is_alive():
            p.terminate()
            p.join()
            raise TimeoutError
        else:
            return return_dict['value']

    call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60)
The timeout passed in kwargs is the timeout to get any response from the server; the timeout argument is the timeout to get the complete response.
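A lighter stdlib variant of the same idea is concurrent.futures; this is a sketch using a thread instead of a process, so a hung call is abandoned rather than killed:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_timeout(func, args=(), kwargs=None, timeout=60):
    """Run func in a worker thread and give up after `timeout` seconds.

    Unlike the multiprocessing version, the worker cannot be terminated,
    only abandoned: a hung call keeps its thread alive in the background.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(func, *args, **(kwargs or {}))
        return future.result(timeout=timeout)   # raises FutureTimeout if too slow
    finally:
        pool.shutdown(wait=False)               # don't block on an abandoned worker

print(call_with_timeout(pow, args=(2, 10), timeout=5))   # prints 1024
```

It is called the same way as the version above, e.g. call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60).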
Sorry, but I wonder why nobody proposed the following simpler solution? :-o

    ## request
    requests.get('http://www.mypage.com', timeout=20)

This code works for socket errors 11004 and 10060......
    # -*- encoding:UTF-8 -*-
    __author__ = 'ACE'

    import requests
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *


    class TimeOutModel(QThread):
        Existed = pyqtSignal(bool)
        TimeOut = pyqtSignal()

        def __init__(self, fun, timeout=500, parent=None):
            """
            @param fun: function or lambda
            @param timeout: ms
            """
            super(TimeOutModel, self).__init__(parent)
            self.fun = fun

            self.timeer = QTimer(self)
            self.timeer.setInterval(timeout)
            self.timeer.timeout.connect(self.time_timeout)
            self.Existed.connect(self.timeer.stop)
            self.timeer.start()

            self.setTerminationEnabled(True)

        def time_timeout(self):
            self.timeer.stop()
            self.TimeOut.emit()
            self.quit()
            self.terminate()

        def run(self):
            self.fun()


    bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip")

    a = QApplication([])
    z = TimeOutModel(bb, 500)
    print 'timeout'
    a.exec_()
Despite the question being about requests, I find this very easy to do with pycurl CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.

No threading or signals required:
    import pycurl
    import StringIO
    import traceback

    url = 'http://www.example.com/example.zip'
    timeout_ms = 1000
    raw = StringIO.StringIO()
    c = pycurl.Curl()
    c.setopt(pycurl.TIMEOUT_MS, timeout_ms)  # total timeout in milliseconds
    c.setopt(pycurl.WRITEFUNCTION, raw.write)
    c.setopt(pycurl.NOSIGNAL, 1)
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.HTTPGET, 1)
    try:
        c.perform()
    except pycurl.error:
        traceback.print_exc()  # error generated on timeout
        pass  # or just pass if you don't want to print the error
timeout = (connect timeout, data read timeout), or give a single argument (timeout=1):
    import requests

    try:
        req = requests.request('GET', 'https://www.google.com', timeout=(1, 1))
        print(req)
    except requests.ReadTimeout:
        print("READ TIME OUT")
Well, I tried many solutions on this page and still face instabilities, random hangs, and poor connection performance.

I'm now using curl and I'm really happy about its "max-time" feature and about the global performance, even with such a poor implementation:
    content = commands.getoutput('curl -m6 -Ss "http://mywebsite.xyz"')
Here, I defined a max-time parameter of 6 seconds, encompassing both connection and transfer time.

I am sure curl has a nice Python binding, if you prefer to stick to the Pythonic syntax :)
Set stream=True and enforce the deadline yourself while iterating over the content:
    from time import time
    from requests import get, exceptions

    try:
        start = time()
        timeout = 5
        with get(config['source']['online'], stream=True, timeout=timeout) as r:
            r.raise_for_status()
            content = bytes()
            content_gen = r.iter_content(1024)
            while True:
                if time() - start > timeout:
                    raise TimeoutError('Time out! ({} seconds)'.format(timeout))
                try:
                    content += next(content_gen)
                except StopIteration:
                    break
            data = content.decode().split('\n')
            if len(data) in [0, 1]:
                raise ValueError('Bad requests data')
    except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt,
            TimeoutError) as e:
        print(e)
        with open(config['source']['local']) as f:
            data = [line.strip() for line in f.readlines()]
This is discussed here: https://redd.it/80kp1h

If you're using the option stream=True, you can do this:
    import time
    import requests

    r = requests.get(
        'http://url_to_large_file',
        timeout=1,  # relevant only for underlying socket
        stream=True)

    with open('/tmp/out_file.txt', 'wb') as f:
        start_time = time.time()
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
            if time.time() - start_time > 8:
                raise Exception('Request took longer than 8s')
The solution does not need signals or multiprocessing.

Another solution (taken from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads):

Before the download, you can find out the content size:
    import requests

    TOO_LONG = 10 * 1024 * 1024  # 10 MB

    big_url = "http://ipv4.download.thinkbroadband.com/1GB.zip"
    r = requests.get(big_url, stream=True)
    print(r.headers['content-length'])  # 1073741824

    if int(r.headers['content-length']) < TOO_LONG:
        # download the content:
        content = r.content
But be careful: a sender can set an incorrect value in the "content-length" response field.

There is a package called timeout-decorator that you can use to time out any Python function.
    import time
    import timeout_decorator

    @timeout_decorator.timeout(5)
    def mytest():
        print("Start")
        for i in range(1, 10):
            time.sleep(1)
            print("{} seconds have passed".format(i))
It uses the signals approach that some of the answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-threaded environment).
If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g.:

- closes the underlying socket, and ideally
- triggers an exception if requests retries the operation

Note that depending on the system libraries you may be unable to set a deadline on DNS resolution.
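As a sketch of that watchdog mechanism on a bare stdlib socket (actually reaching the socket buried inside requests/urllib3 is messier): a threading.Timer shuts the socket down after the deadline, which unblocks a recv() stuck in the main thread. shutdown(), unlike close(), reliably wakes a recv() that is blocked in another thread:

```python
import socket
import threading

server, client = socket.socketpair()   # connected pair; the server end never sends

# After 0.5 s, shut the client socket down from the watchdog thread.
watchdog = threading.Timer(0.5, lambda: client.shutdown(socket.SHUT_RDWR))
watchdog.start()

data = client.recv(1024)   # would block forever without the watchdog
watchdog.cancel()
print('recv unblocked with %r, as if the peer had closed' % data)
server.close()
client.close()
```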
I came up with a more direct solution that is admittedly ugly but fixes the real problem. It goes a bit like this:
    resp = requests.get(some_url, stream=True)
    resp.raw._fp.fp._sock.settimeout(read_timeout)
    # This will load the entire response even though stream is set
    content = resp.content
You can read the full explanation here.