Timeout for python requests.get entire response
I am collecting statistics on a list of websites and I am using requests for it for simplicity. Here is my code:

    data = []
    websites = ['http://google.com', 'http://bbc.co.uk']
    for w in websites:
        r = requests.get(w, verify=False)
        data.append((r.url, len(r.content), r.elapsed.total_seconds(),
                     str([(l.status_code, l.url) for l in r.history]),
                     str(r.headers.items()), str(r.cookies.items())))
Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.

This question has been of interest before, but none of the answers are clean. I will be putting some bounty on it to get a nice answer.

I hear that maybe not using requests is a good idea, but then how should I get the nice things requests offers (the ones in the tuple)?
Set the timeout parameter:
    r = requests.get(w, verify=False, timeout=10)
As long as you don't set stream=True on that request, this will cause the call to requests.get() to time out if the connection takes more than ten seconds, or if the server doesn't send data for more than ten seconds.
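For the loop in the question, the exception to handle is requests.exceptions.Timeout (with requests.exceptions.RequestException as the catch-all parent), so one slow site doesn't abort the whole collection. A minimal sketch; the non-routable 10.255.255.1 address is just a stand-in for a site that hangs:

```python
import requests

websites = ['http://10.255.255.1']   # placeholder for a site that hangs (non-routable)
data = []
for w in websites:
    try:
        r = requests.get(w, timeout=2)   # connect/read timeout, not a cap on the whole download
        data.append((r.url, len(r.content), r.elapsed.total_seconds()))
    except requests.exceptions.Timeout:
        print('timed out:', w)
    except requests.exceptions.RequestException as e:
        print('failed:', w, type(e).__name__)

print('collected', len(data), 'results')
```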
What about using eventlet? If you want to time the request out after 10 seconds, even if data is being received, this snippet will work for you:

    import requests
    import eventlet
    eventlet.monkey_patch()

    with eventlet.Timeout(10):
        requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip", verify=False)
UPDATE: https://docs.python-requests.org/en/master/user/advanced/#timeouts

In the new version of requests:

If you specify a single value for the timeout, like this:
    r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
    r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response by passing None as a timeout value and then retrieving a cup of coffee.
    r = requests.get('https://github.com', timeout=None)
My old (probably outdated) answer (posted a long time ago):

There are other ways to overcome this problem:

1. Use the TimeoutSauce internal class

From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
    import requests
    from requests.adapters import TimeoutSauce

    class MyTimeout(TimeoutSauce):
        def __init__(self, *args, **kwargs):
            connect = kwargs.get('connect', 5)
            read = kwargs.get('read', connect)
            super(MyTimeout, self).__init__(connect=connect, read=read)

    requests.adapters.TimeoutSauce = MyTimeout

This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)
2. Use kevinburke's fork of requests: https://github.com/kevinburke/requests/tree/connect-timeout

Its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:

    r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

    r = requests.get('https://github.com', timeout=(3.05, 27))
kevinburke has requested it to be merged into the main requests project, but it hasn't been accepted yet.
To create a timeout you can use signals.

The best way to solve this case is probably to set an exception-raising handler for the alarm signal, schedule the alarm, and run your code inside a try/except/finally block so the alarm is always cancelled afterwards.

Here is some example code:
    import signal
    from time import sleep

    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass

    def _timeout(signum, frame):
        """ Raise a TimeoutException.

        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.
        """
        # Raise TimeoutException with system default timeout message
        raise TimeoutException()

    # Set the handler for the SIGALRM signal:
    signal.signal(signal.SIGALRM, _timeout)
    # Send the SIGALRM signal in 10 seconds:
    signal.alarm(10)

    try:
        # Do our code:
        print('This will take 11 seconds...')
        sleep(11)
        print('done!')
    except TimeoutException:
        print('It timed out!')
    finally:
        # Abort the sending of the SIGALRM signal:
        signal.alarm(0)
There are some caveats to this: it is not threadsafe (signals are always delivered to the main thread), and SIGALRM is not available on Windows.

But all of this is in the standard Python library! Apart from the sleep import it is only one import. If you are going to use timeouts in many places, you can easily put the TimeoutException, _timeout and the signalling into a function and just call that. Or you can make a decorator and put it on functions, see the answer linked below.
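A sketch of that decorator idea (the names timeout and TimeoutException here are my own, and like the code above this stays Unix-only and main-thread-only):

```python
import signal
from functools import wraps
from time import sleep

class TimeoutException(Exception):
    """Raised when the decorated call exceeds its deadline."""
    pass

def timeout(seconds=10):
    """Abort the wrapped call with TimeoutException after `seconds` whole seconds."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            def _handler(signum, frame):
                raise TimeoutException()
            old_handler = signal.signal(signal.SIGALRM, _handler)
            signal.alarm(seconds)
            try:
                return func(*args, **kwargs)
            finally:
                signal.alarm(0)                             # cancel any pending alarm
                signal.signal(signal.SIGALRM, old_handler)  # restore the previous handler
        return wrapper
    return decorator

@timeout(1)
def slow_call():
    sleep(2)        # gets interrupted after about 1 second

@timeout(1)
def fast_call():
    return 'ok'
```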
You can also set this up as a "context manager" to use it with the with statement:
    import signal

    class Timeout():
        """ Timeout for use with the `with` statement. """

        class TimeoutException(Exception):
            """ Simple Exception to be called on timeouts. """
            pass

        def _timeout(signum, frame):
            """ Raise a TimeoutException.

            This is intended for use as a signal handler.
            The signum and frame arguments passed to this are ignored.
            """
            raise Timeout.TimeoutException()

        def __init__(self, timeout=10):
            self.timeout = timeout
            signal.signal(signal.SIGALRM, Timeout._timeout)

        def __enter__(self):
            signal.alarm(self.timeout)

        def __exit__(self, exc_type, exc_value, traceback):
            signal.alarm(0)
            return exc_type is Timeout.TimeoutException

    # Demonstration:
    from time import sleep

    print('This is going to take maximum 10 seconds...')
    with Timeout(10):
        sleep(15)
        print('No timeout?')
    print('Done')
One possible downside with this context manager approach is that you can't know if the code actually timed out or not.
Sources and recommended reading:

- The documentation on signals
- This answer on timeouts by @David Narayan. He has organized the code above as a decorator.
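One way around that downside (my own variation, not part of the answer above) is to have __exit__ swallow the exception and record a flag the caller can inspect afterwards:

```python
import signal
from time import sleep

class Timeout:
    """`with` block with a deadline; inspect `.timed_out` afterwards (Unix, main thread only)."""

    class TimeoutException(Exception):
        pass

    @staticmethod
    def _handler(signum, frame):
        raise Timeout.TimeoutException()

    def __init__(self, timeout=10):
        self.timeout = timeout
        self.timed_out = False

    def __enter__(self):
        signal.signal(signal.SIGALRM, Timeout._handler)
        signal.alarm(self.timeout)
        return self                 # so `with Timeout(1) as t:` works

    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)             # always cancel the pending alarm
        if exc_type is Timeout.TimeoutException:
            self.timed_out = True   # remember that we hit the deadline
            return True             # suppress the exception
        return False

with Timeout(1) as t:
    sleep(2)                        # longer than the deadline
print('timed out?', t.timed_out)
```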
As of January 2019, you can use the timeout argument of requests:

    requests.get(url, timeout=10)
Note:

    timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
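That per-read (rather than total) behaviour can be demonstrated with nothing but the standard library: a toy server that trickles one byte at a time never trips a read timeout larger than the gap between bytes, even though the whole transfer takes much longer than the timeout:

```python
import socket
import threading
import time

def trickle(server):
    """Accept one client and send 8 bytes, one every 0.1 s (about 0.8 s in total)."""
    conn, _ = server.accept()
    for _ in range(8):
        conn.sendall(b'x')
        time.sleep(0.1)
    conn.close()

server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
threading.Thread(target=trickle, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
client.settimeout(0.5)          # per-recv() deadline, like requests' read timeout

start = time.monotonic()
received = b''
while len(received) < 8:
    received += client.recv(1)  # each wait is ~0.1 s, so the 0.5 s deadline never fires
elapsed = time.monotonic() - start
client.close()
server.close()
print('received %d bytes in %.1f s with no timeout raised' % (len(received), elapsed))
```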
This may be overkill, but the Celery distributed task queue has good support for timeouts.

In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task when the time limit is exceeded.

Under the covers, this uses the same signals approach referenced in the answer above, but in a more usable and manageable way. And if the list of websites you are monitoring is long, you might benefit from its primary feature: all kinds of ways to manage the execution of large numbers of tasks.
I believe you can use multiprocessing and not depend on a 3rd-party package:
    import multiprocessing
    import requests

    def call_with_timeout(func, args, kwargs, timeout):
        manager = multiprocessing.Manager()
        return_dict = manager.dict()

        # define a wrapper of `return_dict` to store the result.
        def function(return_dict):
            return_dict['value'] = func(*args, **kwargs)

        p = multiprocessing.Process(target=function, args=(return_dict,))
        p.start()

        # Force a max. `timeout` or wait for the process to finish
        p.join(timeout)

        # If thread is still active, it didn't finish: raise TimeoutError
        if p.is_alive():
            p.terminate()
            p.join()
            raise TimeoutError
        else:
            return return_dict['value']

    call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60)
The timeout passed in kwargs is the timeout to get any response from the server; the timeout argument is the timeout to get the complete response.
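A lighter stdlib variant of the same idea is concurrent.futures; this is a sketch using a thread instead of a process, so a hung call is abandoned rather than killed:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_timeout(func, args=(), kwargs=None, timeout=60):
    """Run func in a worker thread and give up after `timeout` seconds.

    Unlike the multiprocessing version, the worker cannot be terminated,
    only abandoned: a hung call keeps its thread alive in the background.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(func, *args, **(kwargs or {}))
        return future.result(timeout=timeout)   # raises FutureTimeout if too slow
    finally:
        pool.shutdown(wait=False)               # don't block on an abandoned worker

print(call_with_timeout(pow, args=(2, 10), timeout=5))   # prints 1024
```

It is called the same way as the version above, e.g. call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60).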
Sorry, but I wonder why nobody proposed the following simpler solution? :-o

    ## request
    requests.get('http://www.mypage.com', timeout=20)

This code works for socket errors 11004 and 10060......
    # -*- encoding:UTF-8 -*-
    __author__ = 'ACE'

    import requests
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *


    class TimeOutModel(QThread):
        Existed = pyqtSignal(bool)
        TimeOut = pyqtSignal()

        def __init__(self, fun, timeout=500, parent=None):
            """
            @param fun: function or lambda
            @param timeout: ms
            """
            super(TimeOutModel, self).__init__(parent)
            self.fun = fun

            self.timeer = QTimer(self)
            self.timeer.setInterval(timeout)
            self.timeer.timeout.connect(self.time_timeout)
            self.Existed.connect(self.timeer.stop)
            self.timeer.start()

            self.setTerminationEnabled(True)

        def time_timeout(self):
            self.timeer.stop()
            self.TimeOut.emit()
            self.quit()
            self.terminate()

        def run(self):
            self.fun()


    bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip")

    a = QApplication([])
    z = TimeOutModel(bb, 500)
    print 'timeout'
    a.exec_()
Despite the question being about requests, I find this very easy to do with pycurl CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.

No threading or signals required:
    import pycurl
    import StringIO
    import traceback

    url = 'http://www.example.com/example.zip'
    timeout_ms = 1000
    raw = StringIO.StringIO()
    c = pycurl.Curl()
    c.setopt(pycurl.TIMEOUT_MS, timeout_ms)  # total timeout in milliseconds
    c.setopt(pycurl.WRITEFUNCTION, raw.write)
    c.setopt(pycurl.NOSIGNAL, 1)
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.HTTPGET, 1)
    try:
        c.perform()
    except pycurl.error:
        traceback.print_exc()  # error generated on timeout
        pass  # or just pass if you don't want to print the error
timeout = (connect timeout, data read timeout), or give a single argument (timeout=1):
    import requests

    try:
        req = requests.request('GET', 'https://www.google.com', timeout=(1, 1))
        print(req)
    except requests.ReadTimeout:
        print("READ TIME OUT")
Well, I tried many solutions on this page and still face instabilities, random hangs, and poor connection performance.

I'm now using curl and I'm really happy about its "max-time" feature and about the global performance, even with such a poor implementation:
    content = commands.getoutput('curl -m6 -Ss "http://mywebsite.xyz"')
Here, I defined a max-time parameter of 6 seconds, encompassing both connection and transfer time.

I am sure curl has a nice Python binding, if you prefer to stick to the Pythonic syntax :)
Set stream=True and enforce the deadline yourself while iterating over the content:
    from time import time
    from requests import get, exceptions

    try:
        start = time()
        timeout = 5
        with get(config['source']['online'], stream=True, timeout=timeout) as r:
            r.raise_for_status()
            content = bytes()
            content_gen = r.iter_content(1024)
            while True:
                if time() - start > timeout:
                    raise TimeoutError('Time out! ({} seconds)'.format(timeout))
                try:
                    content += next(content_gen)
                except StopIteration:
                    break
            data = content.decode().split('\n')
            if len(data) in [0, 1]:
                raise ValueError('Bad requests data')
    except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt,
            TimeoutError) as e:
        print(e)
        with open(config['source']['local']) as f:
            data = [line.strip() for line in f.readlines()]
This is discussed here: https://redd.it/80kp1h

If you're using the option stream=True, you can do this:
    import time
    import requests

    r = requests.get(
        'http://url_to_large_file',
        timeout=1,  # relevant only for underlying socket
        stream=True)

    with open('/tmp/out_file.txt', 'wb') as f:
        start_time = time.time()
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
            if time.time() - start_time > 8:
                raise Exception('Request took longer than 8s')
The solution does not need signals or multiprocessing.

Another solution (taken from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads):

Before the download, you can find out the content size:
    import requests

    TOO_LONG = 10 * 1024 * 1024  # 10 MB

    big_url = "http://ipv4.download.thinkbroadband.com/1GB.zip"
    r = requests.get(big_url, stream=True)
    print(r.headers['content-length'])  # 1073741824

    if int(r.headers['content-length']) < TOO_LONG:
        # download the content:
        content = r.content
But be careful: a sender can set an incorrect value in the "content-length" response field.

There is a package called timeout-decorator that you can use to time out any Python function.
    import time
    import timeout_decorator

    @timeout_decorator.timeout(5)
    def mytest():
        print("Start")
        for i in range(1, 10):
            time.sleep(1)
            print("{} seconds have passed".format(i))
It uses the signals approach that some of the answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-threaded environment).
If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g.:

- closes the underlying socket, and ideally
- triggers an exception if requests retries the operation

Note that depending on the system libraries you may be unable to set a deadline on DNS resolution.
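As a sketch of that watchdog mechanism on a bare stdlib socket (actually reaching the socket buried inside requests/urllib3 is messier): a threading.Timer shuts the socket down after the deadline, which unblocks a recv() stuck in the main thread. shutdown(), unlike close(), reliably wakes a recv() that is blocked in another thread:

```python
import socket
import threading

server, client = socket.socketpair()   # connected pair; the server end never sends

# After 0.5 s, shut the client socket down from the watchdog thread.
watchdog = threading.Timer(0.5, lambda: client.shutdown(socket.SHUT_RDWR))
watchdog.start()

data = client.recv(1024)   # would block forever without the watchdog
watchdog.cancel()
print('recv unblocked with %r, as if the peer had closed' % data)
server.close()
client.close()
```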
I came up with a more direct solution that is admittedly ugly but fixes the real problem. It goes a bit like this:
    resp = requests.get(some_url, stream=True)
    resp.raw._fp.fp._sock.settimeout(read_timeout)
    # This will load the entire response even though stream is set
    content = resp.content
You can read the full explanation here.