Setting up 'encoding' in Python's gzip.open() doesn't seem to work

Even though I try to specify the encoding in Python's gzip.open(), it always seems to use cp1252.py to decode the file contents. My code:

with gzip.open('file.gz', 'rt', 'cp1250') as f:
    content = f.read()

This raises:

File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 52893: character maps to <undefined>
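The byte named in the error is a useful clue. A quick check shows that 0x8f has no mapping in cp1252 but decodes cleanly in cp1250 (to 'Ź', U+0179), which suggests the file really is cp1250 text that is simply being read with the wrong codec. The helper try_decode is a hypothetical name used only for this illustration:

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


raw = b"\x8f"  # the byte reported in the traceback

# cp1252 leaves 0x8f undefined, so decoding fails and we get None.
print("cp1252:", try_decode(raw, "cp1252"))
# cp1250 maps 0x8f to U+0179; ascii() keeps the output console-safe.
print("cp1250:", ascii(try_decode(raw, "cp1250")))
```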


Python 3.x

gzip.open is defined as:

gzip.open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)

Therefore, gzip.open('file.gz', 'rt', 'cp1250') passes the following arguments:
- filename='file.gz'
- mode='rt'
- compresslevel='cp1250'
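One way to confirm this mapping without opening a real file is to ask inspect.signature how the arguments bind to gzip.open's parameters. Signature.bind only performs the binding, it does not call the function, so 'file.gz' does not need to exist:

```python
import gzip
import inspect

sig = inspect.signature(gzip.open)

# The call from the question: the third positional argument lands in compresslevel.
bound = sig.bind("file.gz", "rt", "cp1250")
print(dict(bound.arguments))
# {'filename': 'file.gz', 'mode': 'rt', 'compresslevel': 'cp1250'}

# The keyword form puts the codec name where it belongs.
fixed = sig.bind("file.gz", "rt", encoding="cp1250")
print(dict(fixed.arguments))
# {'filename': 'file.gz', 'mode': 'rt', 'encoding': 'cp1250'}
```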

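This is clearly wrong, since the intent was to use the 'cp1250' encoding. Because compresslevel is ignored when reading, the misplaced string raises no error; meanwhile encoding keeps its default of None, so the text layer falls back to the platform's default encoding, which is cp1252 on this Windows system. The encoding argument can be passed either as the fourth positional argument or as a keyword argument: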

gzip.open('file.gz', 'rt', 5, 'cp1250')  # 4th positional argument

gzip.open('file.gz', 'rt', encoding='cp1250') # keyword argument
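A self-contained check of the keyword form: the sketch below writes a small cp1250-encoded gzip file into a temporary directory and reads it back. The sample text and the roundtrip helper are made up for illustration; the temporary file.gz only stands in for the real file. The leading 'Ź' encodes to byte 0x8f, the same byte that broke the cp1252 read.

```python
import gzip
import os
import tempfile

# Hypothetical sample text; every character exists in cp1250.
SAMPLE = "Źdźbło, příliš žluťoučký kůň\n"


def roundtrip(text: str, encoding: str = "cp1250") -> str:
    """Write text to a gzip file with the given encoding, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.gz")
        # Text mode: the str is encoded with `encoding`, then compressed.
        with gzip.open(path, "wt", encoding=encoding) as f:
            f.write(text)
        # Pass encoding by keyword so it cannot be mistaken for compresslevel.
        with gzip.open(path, "rt", encoding=encoding) as f:
            return f.read()


print(roundtrip(SAMPLE) == SAMPLE)  # True
```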

Python 2.x

The Python 2 version of gzip.open accepts neither an encoding argument nor text mode, so the data must be decoded explicitly after it has been read:

with gzip.open('file.gz', 'rb') as f:
    data = f.read()

decoded_data = data.decode('cp1250')
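For large files, reading everything into memory before decoding may be undesirable. A minimal sketch, assuming the content is line-oriented text, wraps the binary gzip stream in a codecs stream reader so that lines are decoded as they are read. The function name iter_decoded_lines is hypothetical; codecs.getreader and context-manager support for gzip files both exist in Python 2.7 and Python 3, so the same code is expected to run on either.

```python
import codecs
import gzip


def iter_decoded_lines(path, encoding="cp1250"):
    """Yield decoded text lines from a gzip file opened in binary mode."""
    with gzip.open(path, "rb") as f:
        # getreader returns a StreamReader class; the instance decodes
        # bytes from the underlying gzip stream incrementally.
        reader = codecs.getreader(encoding)(f)
        for line in reader:
            yield line  # each line keeps its trailing newline
```

Usage would look like `for line in iter_decoded_lines('file.gz'): ...`, processing one decoded line at a time.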