downloading a large file in chunks with gzip encoding (Python 3.4)

If I request a file and specify gzip as the encoding, how do I handle the response?

Normally, when I have a large file, I do the following:

while True:
    chunk = resp.read(CHUNK)
    if not chunk:
        break
    writer.write(chunk)
    writer.flush()

where CHUNK is the chunk size in bytes, writer is a file object returned by open(), and resp is the response object returned by a urllib request.
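The loop above is a plain chunked copy between two file-like objects; the standard library's shutil.copyfileobj does the same job. A minimal sketch, where copy_in_chunks is a hypothetical helper, the chunk size is an assumed value, and io.BytesIO objects stand in for resp and writer so it runs offline:

```python
import io
import shutil

CHUNK = 64 * 1024  # chunk size in bytes (assumed value)


def copy_in_chunks(src, dst, size=CHUNK):
    """Copy src to dst in fixed-size pieces (hypothetical helper)."""
    while True:
        chunk = src.read(size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        dst.write(chunk)
    dst.flush()


# Stand-ins for the urllib response and the open() file object.
resp = io.BytesIO(b"x" * 200000)
writer = io.BytesIO()
copy_in_chunks(resp, writer)

# Standard-library equivalent of the same loop.
resp2 = io.BytesIO(b"x" * 200000)
writer2 = io.BytesIO()
shutil.copyfileobj(resp2, writer2, CHUNK)
```

Only one chunk is held in memory at a time, which is why this pattern works for large files.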

So in most cases, when the response headers report "gzip" as the content encoding, I do the following:

decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
data = decomp.decompress(resp.read())
writer.write(data)
writer.flush()
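The 16 + zlib.MAX_WBITS argument tells zlib to expect a gzip header and trailer instead of a raw zlib stream (zlib.MAX_WBITS | 32 would accept either format). This one-shot version works, but resp.read() pulls the whole compressed body into memory first. A minimal round-trip check, assuming gzip.compress as a stand-in for the server's gzip-encoded body:

```python
import gzip
import zlib

payload = b"hello gzip " * 1000
compressed = gzip.compress(payload)  # what a gzip-encoded response body looks like

# 16 + MAX_WBITS: gzip header/trailer expected, 15-bit window.
decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
data = decomp.decompress(compressed) + decomp.flush()
```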

Or:

f = gzip.GzipFile(fileobj=buf)
writer.write(f.read())

where buf is a BytesIO object.
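For the GzipFile variant, buf has to hold the entire compressed body and be positioned at its start; io.BytesIO(resp.read()) does both. A sketch with a hypothetical gunzip_buffered helper, using gzip.compress to fake the body. Note that the compressed body and the full decompressed result from f.read() are both in memory at the same time, which is the kind of usage that runs out of memory on 32-bit Python:

```python
import gzip
import io


def gunzip_buffered(body):
    """Decompress a complete gzip body held in memory (hypothetical helper)."""
    buf = io.BytesIO(body)  # position 0, ready to be read
    with gzip.GzipFile(fileobj=buf) as f:
        return f.read()  # the whole decompressed file in one bytes object


payload = b"abc" * 10000
result = gunzip_buffered(gzip.compress(payload))
```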

However, if I try to decompress the gzip response chunk by chunk, I run into problems:

while True:
    chunk = resp.read(CHUNK)
    if not chunk:
        break
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    data = decomp.decompress(chunk)
    writer.write(data)
    writer.flush()
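This loop fails because it builds a new decompressobj for every chunk: only the first chunk begins with the gzip header, and each later chunk is a fragment of the compressed stream that a fresh decompressor cannot interpret on its own. A small offline reproduction, assuming an in-memory gzip body split into 1024-byte chunks; fresh_decompressor_per_chunk is a hypothetical helper that mimics the loop:

```python
import gzip
import zlib

payload = b"".join(str(i).encode() for i in range(20000))
compressed = gzip.compress(payload)
chunks = [compressed[i:i + 1024] for i in range(0, len(compressed), 1024)]


def fresh_decompressor_per_chunk(chunks):
    """Mimic the loop above: a new decompressobj for each chunk."""
    out = []
    for chunk in chunks:
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            out.append(decomp.decompress(chunk))
        except zlib.error as exc:  # typically "incorrect header check"
            return b"".join(out), exc
    return b"".join(out), None


broken, error = fresh_decompressor_per_chunk(chunks)
```

The first chunk decompresses fine; the second one is rejected because it does not start with a gzip header, so the output is truncated.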

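Is there a way to decompress gzip data while it arrives in small chunks? Or do I need to write the whole file to disk, decompress it, and then move it to the final filename? Part of the problem is that, with 32-bit Python, I can run out of memory.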

Thanks

I think I found a solution that I'd like to share.

import zlib

def _chunk(response, size=4096):
    """Download a web response in pieces, decompressing gzip on the fly."""
    method = response.headers.get("content-encoding")
    if method == "gzip":
        # One decompressor for the whole stream, created outside the loop.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        b = response.read(size)
        while b:
            data = d.decompress(b)
            if data:
                yield data
            b = response.read(size)
        # Emit anything still buffered inside the decompressor.
        tail = d.flush()
        if tail:
            yield tail
    else:
        while True:
            chunk = response.read(size)
            if not chunk:
                break
            yield chunk

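If anyone has a better solution, please add it. Basically, my mistake was where I created the zlib.decompressobj(): I was creating it inside the loop, so every chunk got a fresh decompressor.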
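One alternative (not the poster's method): gzip.GzipFile can wrap the response object directly and decompress sequentially as it reads, since Python 3.2 added support for unseekable files, and shutil.copyfileobj then moves the decompressed data in fixed-size pieces. A sketch under those assumptions; NonSeekableStream is a hypothetical stand-in for the urllib response so it runs offline, and gunzip_to_file is a hypothetical helper name:

```python
import gzip
import io
import shutil


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for an HTTP response: readable, not seekable."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        chunk = self._buf.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


def gunzip_to_file(response, writer, size=4096):
    """Stream-decompress a gzip response into writer, size bytes at a time."""
    with gzip.GzipFile(fileobj=response) as gz:
        shutil.copyfileobj(gz, writer, size)
    writer.flush()


payload = b"".join(str(i).encode() for i in range(20000))
response = NonSeekableStream(gzip.compress(payload))
writer = io.BytesIO()
gunzip_to_file(response, writer)
```

Like the generator approach, this keeps only a bounded amount of data in memory; the trade-off is that the caller no longer checks the content-encoding header itself, so it should only be used when the header says "gzip".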

This seems to work in both Python 2 and 3, so that's a plus.