python: Why doesn't requests.get() return?

Why doesn't requests.get() return? What is the default timeout that requests.get() uses?

In my script, requests.get never returns:

import requests

print("requesting..")

# This call never returns!
r = requests.get(
    "http://www.justdial.com",
    proxies={'http': '222.255.169.74:8080'},
)

print(r.ok)

What could be the possible reason(s)? Is there any remedy? What is the default timeout that get uses?


What is the default timeout that get uses?

The default timeout is None, which means it will wait (hang) until the connection is closed.

What happens when you pass in a timeout value?

r = requests.get(
    'http://www.justdial.com',
    proxies={'http': '222.255.169.74:8080'},
    timeout=5
)


From the requests documentation:

You can tell Requests to stop waiting for a response after a given
number of seconds with the timeout parameter:

>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

Note:

timeout is not a time limit on the entire response download; rather,
an exception is raised if the server has not issued a response for
timeout seconds (more precisely, if no bytes have been received on the
underlying socket for timeout seconds).
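For example, once a timeout is set, the hang turns into an exception that can be caught. A minimal sketch, assuming a 5-second limit and reusing the URL and proxy from the question purely for illustration:

import requests

try:
    r = requests.get(
        "http://www.justdial.com",
        proxies={'http': '222.255.169.74:8080'},
        timeout=5,
    )
    print(r.ok)
except requests.exceptions.Timeout:
    # Raised for both connect and read timeouts.
    print("request timed out after 5 seconds")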

Even when timeout is 1 second, requests.get() can still take a very long time to return. There are several ways to overcome this problem:

1. Use the TimeoutSauce internal class

From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896

import requests
from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        if kwargs['connect'] is None:
            kwargs['connect'] = 5
        if kwargs['read'] is None:
            kwargs['read'] = 5
        super(MyTimeout, self).__init__(*args, **kwargs)

requests.adapters.TimeoutSauce = MyTimeout

This code should cause us to set the read timeout as equal to the
connect timeout, which is the timeout value you pass on your
Session.get() call. (Note that I haven't actually tested this code, so
it may need some quick debugging, I just wrote it straight into the
GitHub window.)

2. Use kevinburke's fork of requests: https://github.com/kevinburke/requests/tree/connect-timeout

Documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst

If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read
timeouts. Specify a tuple if you would like to set the values
separately:

r = requests.get('https://github.com', timeout=(3.05, 27))

Note: the change has since been merged into the main requests project.
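Since the tuple form is now part of mainline requests, the two values surface as different exceptions. A hedged sketch (host and limits are simply those from the example above):

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

try:
    # 3.05 s to establish the connection, then at most 27 s of
    # silence between bytes received from the server.
    r = requests.get('https://github.com', timeout=(3.05, 27))
except ConnectTimeout:
    print('could not connect within 3.05 seconds')
except ReadTimeout:
    print('connected, but the server stopped sending data for 27 seconds')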

3. Use eventlet or signal, as already mentioned in the similar question: Timeout for python requests.get entire response
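As an illustration of the signal-based variant, here is a minimal sketch; it is Unix-only, must run in the main thread, and the 10-second limit plus the HardTimeout helper are arbitrary choices made for this example:

import signal

import requests


class HardTimeout(Exception):
    """Raised when the whole request exceeds the wall-clock limit."""


def _raise_timeout(signum, frame):
    raise HardTimeout()


# SIGALRM exists only on Unix, and the handler must be set in the main thread.
signal.signal(signal.SIGALRM, _raise_timeout)
signal.alarm(10)  # hard limit of 10 seconds for the entire request
try:
    r = requests.get('http://www.justdial.com')
    print(r.ok)
except HardTimeout:
    print('request did not finish within 10 seconds')
finally:
    signal.alarm(0)  # always cancel the pending alarm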


I reviewed all the answers and came to the conclusion that the problem still exists. On some sites requests can hang indefinitely, and using multiprocessing seems to be overkill. Here is my approach (Python 3.5+):

import asyncio

import aiohttp


async def get_http(url):
    # Give up on connecting after 1 s and on reading after 3 s.
    async with aiohttp.ClientSession(conn_timeout=1, read_timeout=3) as client:
        try:
            async with client.get(url) as response:
                content = await response.text()
                return content, response.status
        except Exception:
            pass


loop = asyncio.get_event_loop()
task = loop.create_task(get_http('http://example.com'))
loop.run_until_complete(task)
result = task.result()
if result is not None:
    content, status = result
    if status == 200:
        print(content)
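Note that conn_timeout and read_timeout are deprecated in newer aiohttp releases. A hedged equivalent of the session above using the current ClientTimeout API (aiohttp 3.3+, Python 3.7+ for asyncio.run, same 1 s / 3 s limits) might look like this:

import asyncio

import aiohttp


async def get_http(url):
    # ClientTimeout replaces the deprecated conn_timeout/read_timeout kwargs.
    timeout = aiohttp.ClientTimeout(connect=1, sock_read=3)
    async with aiohttp.ClientSession(timeout=timeout) as client:
        try:
            async with client.get(url) as response:
                return await response.text(), response.status
        except Exception:
            return None


print(asyncio.run(get_http('http://example.com')))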


I wanted a default timeout that could easily be added to a bunch of code (assuming that a timeout solves your problem).

This is the solution I picked up from a ticket submitted to the requests repository.

Credit: https://github.com/kennethreitz/requests/issues/2011#issuecomment-477784399

The solution is the last couple of lines here, but I show more code for better context. I like to use a session for retry behaviour.

import requests
import functools
from requests.adapters import HTTPAdapter, Retry


def requests_retry_session(
        retries=10,
        backoff_factor=2,
        status_forcelist=(500, 502, 503, 504),
        session=None,
        ) -> requests.Session:
    session = session or requests.Session()
    retry = Retry(
            total=retries,
            read=retries,
            connect=retries,
            backoff_factor=backoff_factor,
            status_forcelist=status_forcelist,
            )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    # set default timeout
    for method in ('get', 'options', 'head', 'post', 'put', 'patch', 'delete'):
        setattr(session, method, functools.partial(getattr(session, method), timeout=30))
    return session
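The functools.partial wrappers in the last loop are what set the default: each HTTP verb method on the session is replaced by a partial with timeout=30 pre-filled, so calls that omit timeout get 30 seconds, while an explicit timeout= passed at call time still overrides it (keyword arguments supplied when a partial is called replace the pre-filled ones).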

Then you can do things like this:

requests_session = requests_retry_session()
r = requests_session.get(url=url,...