Why doesn't requests.get() return? What is the default timeout that requests.get() uses?
In my script,

```python
import requests

print("requesting..")

# This call never returns!
r = requests.get(
    "http://www.justdial.com",
    proxies={'http': '222.255.169.74:8080'},
)
print(r.ok)
```

What could be the possible reason(s)? Is there any remedy?
What is the default timeout that get uses?
The default timeout is None, which means requests will wait (hang) until the connection is closed.
What happens when you pass in a timeout value?

```python
r = requests.get(
    'http://www.justdial.com',
    proxies={'http': '222.255.169.74:8080'},
    timeout=5
)
```
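If the server does not respond within those 5 seconds, requests raises an exception (the documentation excerpt below shows it); in practice you usually want to catch it rather than let the script die. A minimal sketch, reusing the URL and proxy from the question:

```python
import requests

try:
    r = requests.get(
        'http://www.justdial.com',
        proxies={'http': '222.255.169.74:8080'},
        timeout=5,
    )
    print(r.ok)
except requests.exceptions.Timeout:
    # covers both connect and read timeouts
    print("no response within 5 seconds")
except requests.exceptions.RequestException as exc:
    # any other requests failure (bad proxy, connection refused, ...)
    print("request failed:", exc)
```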
From the requests documentation:
You can tell Requests to stop waiting for a response after a given
number of seconds with the timeout parameter:
```python
>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
```

Note: timeout is not a time limit on the entire response download; rather,
an exception is raised if the server has not issued a response for
timeout seconds (more precisely, if no bytes have been received on the
underlying socket for timeout seconds).
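Since timeout only limits the silence between received bytes (not the whole transfer), one way to cap the total download time is to stream the response and check a wall-clock deadline yourself. This is not from the quoted documentation, just a sketch; get_with_deadline is a hypothetical helper name:

```python
import time
import requests

def get_with_deadline(url, deadline=10, per_read_timeout=5):
    """Fetch url, aborting if the entire download takes longer than `deadline` seconds."""
    start = time.monotonic()
    # timeout= bounds the connect time and the silence between chunks;
    # the loop bounds the total wall-clock time of the download.
    with requests.get(url, stream=True, timeout=per_read_timeout) as r:
        chunks = []
        for chunk in r.iter_content(chunk_size=8192):
            if time.monotonic() - start > deadline:
                raise TimeoutError(f"download exceeded {deadline} seconds")
            chunks.append(chunk)
    return b"".join(chunks)
```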
Even with timeout set, the call can still hang in some cases. There are a few ways to also enforce a read timeout:

1. Use the TimeoutSauce internal class

From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
```python
import requests
from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        if kwargs['connect'] is None:
            kwargs['connect'] = 5
        if kwargs['read'] is None:
            kwargs['read'] = 5
        super(MyTimeout, self).__init__(*args, **kwargs)

requests.adapters.TimeoutSauce = MyTimeout
```

This code should cause us to set the read timeout as equal to the
connect timeout, which is the timeout value you pass on your
Session.get() call. (Note that I haven't actually tested this code, so
it may need some quick debugging, I just wrote it straight into the
GitHub window.)
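Assuming the patch above works as intended, every subsequent call made without an explicit timeout should pick up the 5-second connect/read defaults; an untested usage sketch:

```python
# the MyTimeout patch above must already have been applied
import requests

r = requests.get("http://www.justdial.com")  # no timeout= argument, the 5 s defaults apply
print(r.status_code)
```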
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout

From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
```python
r = requests.get('https://github.com', timeout=5)
```

The timeout value will be applied to both the connect and the read
timeouts. Specify a tuple if you would like to set the values
separately:

```python
r = requests.get('https://github.com', timeout=(3.05, 27))
```
Note: the change has since been merged into the main Requests project.
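With the two-value form now in mainline requests, you can also tell the two failure modes apart: requests raises ConnectTimeout and ReadTimeout (both subclasses of Timeout). A small sketch:

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

try:
    r = requests.get('https://github.com', timeout=(3.05, 27))
except ConnectTimeout:
    print("could not connect within 3.05 seconds")
except ReadTimeout:
    print("connected, but the server sent nothing for 27 seconds")
```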
3. Use eventlet or signals, as already mentioned in the similar question about timing out the entire response.
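For reference, a minimal signal-based sketch of that approach (Unix only, main thread only; the 10-second limit is arbitrary):

```python
import signal
import requests

class WholeRequestTimeout(Exception):
    pass

def _raise_timeout(signum, frame):
    raise WholeRequestTimeout("entire request exceeded the deadline")

signal.signal(signal.SIGALRM, _raise_timeout)
signal.alarm(10)               # hard cap of 10 seconds for the whole request
try:
    r = requests.get('http://www.justdial.com')
    print(r.status_code)
except WholeRequestTimeout:
    print("request aborted after 10 seconds")
finally:
    signal.alarm(0)            # always cancel the pending alarm
```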
Reviewed all the answers and came to the conclusion that the problem still exists. On some sites requests may hang indefinitely, and using multiprocessing seems like overkill. Here is my approach (Python 3.5+):
```python
import asyncio
import aiohttp

async def get_http(url):
    async with aiohttp.ClientSession(conn_timeout=1, read_timeout=3) as client:
        try:
            async with client.get(url) as response:
                content = await response.text()
                return content, response.status
        except Exception:
            pass

loop = asyncio.get_event_loop()
task = loop.create_task(get_http('http://example.com'))
loop.run_until_complete(task)
result = task.result()
if result is not None:
    content, status = task.result()
    if status == 200:
        print(content)
```
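One caveat: conn_timeout and read_timeout were later deprecated in aiohttp in favour of aiohttp.ClientTimeout passed through the timeout argument. A roughly equivalent sketch for current aiohttp versions (the specific limits are just examples):

```python
import asyncio
import aiohttp

async def get_http(url):
    # total= caps the whole request; connect= and sock_read= roughly replace
    # the old conn_timeout / read_timeout arguments
    timeout = aiohttp.ClientTimeout(total=10, connect=1, sock_read=3)
    async with aiohttp.ClientSession(timeout=timeout) as client:
        try:
            async with client.get(url) as response:
                return await response.text(), response.status
        except asyncio.TimeoutError:
            return None

result = asyncio.run(get_http('http://example.com'))
if result is not None:
    content, status = result
    if status == 200:
        print(content)
```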
I wanted a default timeout easily added to a bunch of code (assuming that a timeout solves your problem).

This is the solution I picked up from a ticket submitted to the Requests repository.

Credit: https://github.com/kennethreitz/requests/issues/2011#issuecomment-477784399

The solution is the last couple of lines here, but I show more code for better context. I like to use a session for retry behaviour.
```python
import requests
import functools
from requests.adapters import HTTPAdapter, Retry

def requests_retry_session(
    retries=10,
    backoff_factor=2,
    status_forcelist=(500, 502, 503, 504),
    session=None,
) -> requests.Session:
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    # set default timeout
    for method in ('get', 'options', 'head', 'post', 'put', 'patch', 'delete'):
        setattr(session, method, functools.partial(getattr(session, method), timeout=30))
    return session
```
Then you can do something like this:

```python
requests_session = requests_retry_session()
r = requests_session.get(url=url, ...
```
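An alternative to wrapping each session method with functools.partial is to push the default timeout into a custom transport adapter, so it also covers session.request() and prepared requests. A sketch; TimeoutHTTPAdapter is my own name, not part of requests:

```python
import requests
from requests.adapters import HTTPAdapter

class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that supplies a default timeout unless the caller passes one."""
    def __init__(self, *args, timeout=30, **kwargs):
        self._default_timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self._default_timeout
        return super().send(request, **kwargs)

session = requests.Session()
adapter = TimeoutHTTPAdapter(timeout=30)
session.mount('http://', adapter)
session.mount('https://', adapter)
r = session.get('http://example.com')   # implicitly uses timeout=30
```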