Python requests.get fails to get a response from a URL I can open in my browser
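I'm learning how to use Python requests (Python 3), and I'm trying to write a simple requests.get call to fetch the HTML from several websites. It works for most of them, but there is one I'm having trouble with.
When I call http://es.rs-online.com/, everything works fine: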
    In [1]: import requests
       ...: html = requests.get("http://es.rs-online.com/")

    In [2]: html
    Out[2]: <Response [200]>
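However, when I try the same thing with http://es.farnell.com/, Python seems unable to resolve the address and keeps processing forever. If I set a timeout, no matter how long: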
    import requests
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
    html = requests.get("http://es.farnell.com/", headers=headers, timeout=5, allow_redirects=True)
after 5 seconds I get the expected timeout error:
    ReadTimeout: HTTPConnectionPool(host='es.farnell.com', port=80): Read timed out. (read timeout=5)
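Does anyone know what the problem might be?
The problem is in your headers. Keep in mind that some websites are stricter than others about the headers you send. To fix this, replace your current headers with: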
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1", "DNT": "1", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate"}
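To see how a header set differs from what requests sends by default, it can help to inspect the prepared request without touching the network. This is a minimal sketch, assuming the `requests` package is installed; `preview_headers` is a hypothetical helper name used for illustration, not part of the library.

```python
import requests


def preview_headers(url, headers=None):
    """Return the headers requests would send for a GET, without sending it."""
    with requests.Session() as session:
        request = requests.Request("GET", url, headers=headers)
        # prepare_request merges the session's default headers with ours
        prepared = session.prepare_request(request)
        return dict(prepared.headers)


# With no custom headers, the User-Agent identifies the client as python-requests
default_headers = preview_headers("http://es.farnell.com/")
print(default_headers["User-Agent"])
```

Calling `preview_headers` again with the dictionary above shows the merged result, which makes it easier to check that every browser-like header is actually set.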
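I also recommend sending the GET request to the HTTPS address used in the code below.
Altogether, your code should look like this: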
    import requests

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1", "DNT": "1", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate"}

    html = requests.get("https://es.farnell.com", headers=headers)
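Since the snippet above has no timeout, a site that stops answering could make the call hang again. The sketch below keeps the same headers but adds a (connect, read) timeout tuple and basic error handling. `fetch_html`, `BROWSER_HEADERS`, and the timeout values are assumptions made for illustration, and whether the site accepts these headers today is not something this sketch can guarantee.

```python
import requests

# Browser-like headers taken from the answer above
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}


def fetch_html(url, headers=None, timeout=(5, 15)):
    """Return the page body as text, or None if the request fails.

    timeout is (connect seconds, read seconds); the values are arbitrary.
    """
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.exceptions.Timeout as exc:
        print(f"Timed out fetching {url}: {exc}")
        return None
    except requests.exceptions.RequestException as exc:
        print(f"Request to {url} failed: {exc}")
        return None
    return response.text


# Example call (performs a real network request):
# html = fetch_html("https://es.farnell.com", headers=BROWSER_HEADERS)
```

Catching `Timeout` separately from the general `RequestException` makes it easier to tell a server that never answers apart from other failures such as a malformed URL or an HTTP error status.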
Hope this helps.