Download file from web in Python 3
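I am creating a program that downloads a .jar (Java) file from a web server by reading the URL specified in the .jad file of the same game/application. I'm using Python 3.2.1.
I've managed to extract the URL of the JAR file from the JAD file (every JAD file contains the URL of its JAR file), but as you may imagine, the extracted value is a string (`str`).
Here's the relevant function: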
```python
def downloadFile(URL=None):
    import httplib2
    h = httplib2.Http(".cache")
    resp, content = h.request(URL, "GET")
    return content

downloadFile(URL_from_file)
```
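However, I always get an error saying that the type in the function above has to be bytes, not string. I've tried URL.encode("utf-8") and bytes(URL, encoding='utf-8'), but I always get the same or a similar error.
So basically my question is: how do I download a file from a server when the URL is stored in a string?
If you want to read the contents of a web page into a variable, just call read() on the response returned by urllib.request.urlopen: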
```python
import urllib.request
...
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
```
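The hard-coded 'utf-8' in the decode step is a guess. When the server declares a charset in its Content-Type header, that value can be used instead. A minimal sketch, assuming a hypothetical helper named `fetch_text` and UTF-8 as the fallback (both are illustrative choices, not part of the answer above):

```python
import urllib.request


def fetch_text(url, fallback_encoding='utf-8'):
    """Return the body at `url` as `str` (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        data = response.read()  # a `bytes` object
        # Charset declared in the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset()
    # Assumption: fall back to UTF-8 when no charset is declared.
    return data.decode(charset or fallback_encoding)
```

As with the snippet above, this only makes sense for text responses; binary data such as a .jar file should stay as `bytes`.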
The easiest way to download and save a file is to use the urllib.request.urlretrieve function:
```python
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
urllib.request.urlretrieve(url, file_name)
```
```python
import urllib.request
...
# Download the file from `url`, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
file_name, headers = urllib.request.urlretrieve(url)
```
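But keep in mind that urlretrieve is considered a legacy interface and might become deprecated.
So the most correct way to do this is to use urllib.request.urlopen to get a file-like object that represents the HTTP response, and copy it to a real file using shutil.copyfileobj: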
```python
import urllib.request
import shutil
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
```
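Note that urlopen takes the URL as an ordinary `str`, so no conversion to bytes is needed. The same approach can be wrapped in a reusable function. This is only a sketch: the name `download_to_file` and the 64 KiB chunk size are assumptions made for illustration.

```python
import shutil
import urllib.request


def download_to_file(url, file_name, chunk_size=64 * 1024):
    """Stream `url` (a plain str) to `file_name`; hypothetical helper."""
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
        # copyfileobj reads and writes up to `chunk_size` bytes at a time,
        # so the whole download is never held in memory at once.
        shutil.copyfileobj(response, out_file, chunk_size)
    return file_name
```

It could then be called as `download_to_file(URL_from_file, 'game.jar')`, where 'game.jar' is a placeholder file name.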
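If this seems too complicated, you may want to go simpler and store the whole download in a `bytes` object, then write it to a file. This works well only for small files, because the entire download is held in memory.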
```python
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read()  # a `bytes` object
    out_file.write(data)
```
It is also possible to decompress .gz data on the fly:
```python
import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at `url`
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64)  # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`.
```
Whenever I need something related to HTTP requests, I use the requests package.
First, install requests:
```
$ pip install requests
```
Then the code:
```python
from requests import get  # to make GET request


def download(url, file_name):
    # open in binary mode
    with open(file_name, "wb") as file:
        # get request
        response = get(url)
        # write to file
        file.write(response.content)
```
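I hope I understood the question correctly, which is: how do you download a file from a server when the URL is stored in a string?
I download files and save them locally using the code below: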
```python
import requests

url = 'https://www.python.org/static/img/python-logo.png'
# raw string, so the backslashes are not treated as escape sequences
fileName = r'D:\Python\dwnldPythonLogo.png'
req = requests.get(url)
# `with` closes the file even if a write fails
with open(fileName, 'wb') as file:
    for chunk in req.iter_content(100000):
        file.write(chunk)
```
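Here we can use urllib's legacy interface in Python 3:
The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.
Example (2 lines of code):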
```python
import urllib.request

url = 'https://www.python.org/static/img/python-logo.png'
urllib.request.urlretrieve(url, "logo.png")
```
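You can use the wget package from PyPI, named after the popular shell download tool: https://pypi.python.org/pypi/wget. This is the simplest method, since you do not need to open the destination file yourself.
Yes, requests is definitely a great package for anything related to HTTP requests. But we also need to be careful about the encoding of the incoming data. Below is an example that explains the difference: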
```python
from requests import get

# Case 1: the response is binary data (e.g. an image)
url = 'some_image_url'
response = get(url)
with open('output', 'wb') as file:
    file.write(response.content)  # `bytes`

# Case 2: the response is text
# If the response encoding was reported as iso-8859-1,
# we may have to override it.
url = 'some_page_url'
response = get(url)
# override the encoding with the educated guess provided by chardet
response.encoding = response.apparent_encoding
with open('output', 'w', encoding='utf-8') as file:
    # `.text` is a `str` decoded with response.encoding;
    # writing `.content` (bytes) to a text-mode file would raise TypeError
    file.write(response.text)
```
```python
from urllib import request


def get(url):
    with request.urlopen(url) as r:
        return r.read()


def download(url, file=None):
    if not file:
        file = url.split('/')[-1]
    with open(file, 'wb') as f:
        f.write(get(url))
```
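One caveat with `url.split('/')[-1]`: it keeps any query string in the file name and yields an empty name when the URL ends in `/`. A sketch of a more defensive variant, assuming a hypothetical helper `filename_from_url` and an arbitrary fallback name 'download' (neither comes from the answer above):

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default='download'):
    """Guess a local file name from the path component of `url`."""
    # urlsplit separates the path from the query string and fragment.
    path = urlsplit(url).path
    # Decode percent-escapes, then keep only the last path segment.
    name = posixpath.basename(unquote(path))
    # Assumption: use `default` when the URL has no final segment.
    return name or default
```

The result could be passed as the `file` argument of `download` instead of relying on its built-in default.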