Trying to get the HTML from a website
```python
def main:
    with open(sourcefile, 'r', encoding='utf-8') as main_file:
        for line in main_file:
            htmlcontent = reader(line)

def reader(line):
    with urllib.request.urlopen(line) as url_file:
        try:
            url_file.read().decode('UTF-8')
        except urllib.error.URLError as url_err:
            print('Error opening url: ', url, url_err)
        except UnicodeDecodeError as decode_err:
            print('Error decoding url: ', url, decode_err)
    return url_file
```
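Hi everyone, I'm very new to Python and I have a question about reading the HTML code from a website. I'm using the code shown above, and I'm simply trying to return the HTML code from a website. The variable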
You are better off using the requests module. It takes one line of code:
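```python
import requests

# requests needs the scheme (http:// or https://); a bare host name raises MissingSchema
html = requests.get("http://www.domain.tld").text
```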
This saves the website's content in a variable.
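```python
# Python 3: urlopen lives in urllib.request (plain urllib.urlopen is Python 2 only)
import urllib.request

url = "http://www.domain.tld"  # the scheme is required
seed_url = urllib.request.urlopen(url)
html_content = seed_url.read()  # raw bytes; call .decode() to get a str
seed_url.close()

print(html_content)
```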
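For the `reader` function in the question, a few things probably keep it from returning the HTML: the decoded text is thrown away and the response object is returned instead, the `urlopen` call sits outside the `try` so a `URLError` is never caught, and each line read from the file still carries its trailing newline. Below is a minimal sketch of one way to rearrange it with the standard library only. The `timeout` parameter, the UTF-8 fallback, returning `None` on failure, and passing `sourcefile` as an argument are assumptions added for illustration, not part of the original post.

```python
import urllib.error
import urllib.request


def reader(line, timeout=10):
    """Return the decoded HTML for the URL in `line`, or None on failure."""
    url = line.strip()  # lines read from a file end with '\n'
    try:
        # urlopen is the call that can raise URLError, so it goes inside the try
        with urllib.request.urlopen(url, timeout=timeout) as url_file:
            # Prefer the charset the server declares; UTF-8 fallback is an assumption
            charset = url_file.headers.get_content_charset() or 'utf-8'
            return url_file.read().decode(charset)
    except UnicodeDecodeError as decode_err:
        # Listed first because UnicodeDecodeError is a subclass of ValueError
        print('Error decoding url:', url, decode_err)
    except (urllib.error.URLError, ValueError) as url_err:
        # ValueError is raised for URLs without a scheme, e.g. "www.domain.tld"
        print('Error opening url:', url, url_err)
    return None


def main(sourcefile):
    """Fetch every URL listed (one per line) in `sourcefile`."""
    with open(sourcefile, 'r', encoding='utf-8') as main_file:
        for line in main_file:
            if not line.strip():
                continue  # skip blank lines
            htmlcontent = reader(line)
            if htmlcontent is not None:
                print(len(htmlcontent), 'characters read from', line.strip())
```

Calling `main('urls.txt')` (a hypothetical file name) would then print one line per URL that could be fetched and an error message for each one that could not.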