beautifulsoup: Scraping from PythonAnywhere

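I have a free account on PythonAnywhere, and I am trying to run the following script, which works fine locally.

I would like to know whether the error I am getting has a technical cause, or whether PythonAnywhere simply does not allow scraping certain websites from its platform.

Do you know of any other free hosting site from which I can scrape anything?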

import requests
from bs4 import BeautifulSoup as bs

def scrapMarketwatch(address):
    # Fetch the page at `address`, parse the HTML and print the parsed document
    r = requests.get(address)
    c = r.content
    sup = bs(c, "html.parser")
    print(sup)


scrapMarketwatch('http://www.marketwatch.com/investing/future/sp%20500%20futures')

print('\n\n\n PARAGRAPH\n SPACE\n\n\n')

scrapMarketwatch('https://www.bloomberg.com/quote/USDJPY:CUR')

I get the following error:

  File "/usr/local/lib/python3.6/dist-packages/requests/packages/urllib3/util/retry.py", line 376, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bloomberg.com', port=443): Max retries exceeded with url: /quote/USDJPY:CUR (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden',)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sylvester83/scrapit/try2.py", line 20, in <module>
    scrapMarketwatch('https://www.bloomberg.com/quote/USDJPY:CUR')
  File "/home/sylvester83/scrapit/try2.py", line 10, in scrapMarketwatch
    r = requests.get(address)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 485, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.bloomberg.com', port=443): Max retries exceeded with url: /quote/USDJPY:CUR (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden',)))


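PythonAnywhere free accounts can only access external sites that are on its whitelist. The allowed sites are ones that offer a machine-readable API. You can ask for other sites to be added, but the request will not be granted if you want to scrape them.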
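To tell this kind of block apart from an ordinary network failure, one option is to look for the "Tunnel connection failed: 403" text that appears in the traceback from the question. The sketch below is only an illustration: the helper name `looks_like_proxy_block` is made up here, and it assumes the proxy keeps reporting the refusal with that same message.

```python
def looks_like_proxy_block(exc):
    """Return True if exc, or an exception it wraps, reports a 403 from the proxy tunnel.

    Hypothetical helper: it matches the message text seen in the question's
    traceback, which is an assumption about how the proxy reports a refusal.
    """
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if current is None or id(current) in seen:
            continue
        seen.add(id(current))
        if "Tunnel connection failed: 403" in str(current):
            return True
        # Follow chained exceptions and any exceptions carried in args.
        stack.append(current.__cause__)
        stack.append(current.__context__)
        stack.extend(a for a in current.args if isinstance(a, BaseException))
    return False


# Possible use around the call from the question (needs the requests package):
#
#     try:
#         r = requests.get(address)
#     except requests.exceptions.ProxyError as e:
#         if looks_like_proxy_block(e):
#             print("The proxy refused this host; it is probably not whitelisted.")
#         else:
#             raise
```

With a check like this the script can report that the host was refused by the proxy instead of printing the full traceback, while any other connection error is still raised as usual.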