Python 3 urllib HTTP Error 412: Precondition Failed

I am trying to parse a website's HTML. I wrote this code:

import urllib.request

def parse(url):
    response = urllib.request.urlopen(url)
    html = response.read()
    strHTML = html.decode()
    return strHTML

website = "http://www.manarat.ac.bd/"
string = parse(website)

But it fails with this error:

Traceback (most recent call last):
  File "C:\Users\pupewekate\Videos
AW\2.py", line 11, in <module>
    string = parse(website)
  File "C:\Users\pupewekate\Videos
AW\2.py", line 5, in parse
    response = urllib.request.urlopen(url)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 412: Precondition Failed

Is there a solution?


You can use the requests module, since it is easier to work with. If you would rather stay with urllib, you can use:

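import urllib.request

def parse(url):
    # Send a browser-like User-Agent instead of urllib's default one.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
    # urlopen() has no headers argument, so attach the headers to a Request object.
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read().decode()

website = "http://www.manarat.ac.bd/"
string = parse(website)
print(string)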

The site checks the User-Agent header. If it does not recognize the value, it returns status code 412:

import requests

print(requests.get('http://www.manarat.ac.bd/'))
# <Response [412]>

print(requests.get('http://www.manarat.ac.bd/', headers={'User-Agent': 'Chrome'}))
# <Response [200]>
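
To see why the plain urllib call is likely to be rejected, it can help to check which User-Agent urllib sends when none is set. The snippet below is a minimal sketch; `default_user_agent` is a hypothetical helper name, not part of urllib. It reads the default headers from a freshly built opener, which normally yields a value of the form `Python-urllib/<version>`.

```python
import urllib.request

def default_user_agent():
    """Return the User-Agent string urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request.
    return dict(opener.addheaders).get("User-agent")

print(default_user_agent())
```

A server that only accepts browser-like values would presumably not recognize this string, which matches the 412 response seen above.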

For how to set the User-Agent in urllib, see this answer.
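
As a rough sketch of one way to do this with the standard library alone: build a `urllib.request.Request` with a `headers` dictionary and pass it to `urlopen`. The names `build_request`, `fetch`, and `DEFAULT_USER_AGENT` are made up for illustration, and the User-Agent value is only an assumption; whether a given server accepts it has to be tested.

```python
import urllib.error
import urllib.request

# Assumed value for illustration; use whatever string the server accepts.
DEFAULT_USER_AGENT = "Mozilla/5.0"

def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Fetch url and return the decoded body, or None on an HTTP error status."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # Statuses such as 412 are raised as HTTPError by urlopen.
        print("HTTP error", err.code, err.reason)
        return None
```

Calling `fetch("http://www.manarat.ac.bd/")` would then return the page text, or print the status code and return `None` if the server still refuses the request.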