About beautifulsoup: Python 3.5 urllib.request 403 Forbidden Error


import urllib.request
import urllib
from bs4 import BeautifulSoup


url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")

print(soup.title)

I'm trying to scrape the website above, but the code keeps throwing a 403 Forbidden error.

Any ideas?

C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py"
Traceback (most recent call last):
  File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7, in <module>
    page = urllib.request.urlopen(url)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
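The traceback shows urlopen raising urllib.error.HTTPError for the 403 response. The sketch below reproduces that locally and catches the error so the status code can be inspected instead of crashing the script. It assumes, as the answer suggests, that the site filters on the User-Agent header; PickyHandler, start_server and fetch_status are hypothetical names for illustration, and the local server only mimics that assumed behaviour, not the real site.

```python
import http.server
import threading
import urllib.error
import urllib.request


class PickyHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server that rejects urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        # Assumption: the real site filters on User-Agent; this only mimics that idea.
        status = 403 if agent.startswith("Python-urllib") else 200
        body = b"Forbidden" if status == 403 else b"<title>OK</title>"
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_server():
    """Start the picky server on a free localhost port and return its base URL."""
    server = http.server.HTTPServer(("127.0.0.1", 0), PickyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return "http://127.0.0.1:%d/" % server.server_port


def fetch_status(url):
    """Return the HTTP status code, catching HTTPError instead of letting it propagate."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; err.code holds the status.
        return err.code
```

Usage: fetch_status(start_server()) returns 403, mirroring the traceback above, because urlopen sends its default Python-urllib User-Agent.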


import requests
from bs4 import BeautifulSoup


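url = "https://www.brightscope.com/ratings"
# Send a browser-like User-Agent; the default one identifies the client as a script
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")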

print(soup.title)

Output:

BrightScope Ratings

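First, use requests instead of urllib.

Then, pass headers to the request. Without them the site blocks you: the default User-Agent (Python-urllib/3.5 for urllib, python-requests/<version> for requests) identifies the client as a script, and the site rejects such clients.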
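If switching to requests is not an option, the same header fix can be applied with the standard library alone. This is a minimal sketch, assuming the site only checks the User-Agent header. It reads urllib's default agent string and builds a urllib.request.Request that carries a browser-like User-Agent instead. The helper names build_request and fetch_html are hypothetical.

```python
import urllib.request

# urllib's default User-Agent is "Python-urllib/<major>.<minor>", which marks
# the request as coming from a script.
DEFAULT_AGENT = dict(urllib.request.build_opener().addheaders)["User-agent"]


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a Request that sends the given User-Agent instead of the default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent="Mozilla/5.0"):
    """Fetch url with the given User-Agent and return the decoded HTML text."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

The result can be fed to BeautifulSoup as in the question, e.g. BeautifulSoup(fetch_html("https://www.brightscope.com/ratings"), "html.parser").title, provided the site still accepts this header.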