Python 3 urllib HTTP Error 412: Precondition Failed
I am trying to parse the HTML of a website. I wrote this code:
```python
import urllib.request

def parse(url):
    response = urllib.request.urlopen(url)
    html = response.read()
    strHTML = html.decode()
    return strHTML

website = "http://www.manarat.ac.bd/"
string = parse(website)
```
But it fails with this error:
Traceback (most recent call last):
  File "C:\Users\pupewekate\Videos\rAW\2.py", line 11, in <module>
    string = parse(website)
  File "C:\Users\pupewekate\Videos\rAW\2.py", line 5, in parse
    response = urllib.request.urlopen(url)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 412: Precondition Failed
Is there a solution?
You can use the requests module, which is easier to work with. If you would rather stay with urllib, you can use:
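```python
import urllib.request

def parse(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3;Win64;x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
    # urlopen() takes no headers argument, so attach them to a Request object
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request)
    # Return the decoded page so the caller gets the HTML as a str
    return response.read().decode()

website = "http://www.manarat.ac.bd/"
string = parse(website)
```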
The website checks the User-Agent header. If it does not recognize the value, it returns status code 412:
```python
import requests

print(requests.get('http://www.manarat.ac.bd/'))
# <Response [412]>

print(requests.get('http://www.manarat.ac.bd/', headers={'User-Agent': 'Chrome'}))
# <Response [200]>
```
For how to set the User-Agent in urllib, see this answer.
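For reference, here is a minimal sketch of one way to set the User-Agent with urllib alone and to report the status code instead of crashing with a traceback. The helper names `build_request` and `fetch_html` and the `DEFAULT_USER_AGENT` constant are made up for this illustration; the default value `"Chrome"` is simply the one that returned 200 in the requests test above, and the site may not keep accepting it.

```python
import urllib.error
import urllib.request

# Hypothetical default: the value that got a 200 in the requests test above.
DEFAULT_USER_AGENT = "Chrome"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=DEFAULT_USER_AGENT):
    """Fetch url and return the decoded HTML, or None on an HTTP error status."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request) as response:
            # Use the charset declared by the server, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # Statuses such as 412 are raised as HTTPError; report and move on.
        print(f"Request failed: HTTP {err.code} {err.reason}")
        return None
```

Calling `fetch_html("http://www.manarat.ac.bd/")` should then return the page as a string if the server accepts the header, or print the status and return `None` if it does not.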