Convert gzipped data fetched by urllib2 to HTML
I currently use mechanize to read gzipped web pages, like this:

    import mechanize

    br = mechanize.Browser()
    br.set_handle_gzip(True)
    response = br.open(url)
    data = response.read()

How can I decompress gzipped data fetched with urllib2 into HTML text?

    import urllib2

    req = urllib2.Request(url)
    opener = urllib2.build_opener()
    response = opener.open(req)
    data = response.read()
    # .get() avoids a KeyError when the server sends no Content-Encoding header
    if response.info().get('Content-Encoding') == 'gzip':
        # HOW TO DECOMPRESS DATA TO HTML

Try this:

    import gzip
    import StringIO

    # Wrap the compressed bytes in a file-like object so GzipFile can read them
    buf = StringIO.StringIO(data)
    gzipper = gzip.GzipFile(fileobj=buf)
    html = gzipper.read()

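The snippet above targets Python 2. As a rough sketch of the same idea on Python 3, where `urllib2` became `urllib.request` and the body is `bytes`, something like the following could work. The helper name `decode_body` and the default `charset` are assumptions made for illustration, and the compressed payload is built locally so the example runs without a network connection.

```python
import gzip
import io


def decode_body(body, content_encoding=None, charset="utf-8"):
    """Return the response body as text, gunzipping it first if needed.

    Hypothetical helper: the name and the utf-8 default are assumptions.
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        # GzipFile needs a file-like object, so wrap the raw bytes
        with gzip.GzipFile(fileobj=io.BytesIO(body)) as gz:
            body = gz.read()
    return body.decode(charset)


# Stand-in for a compressed HTTP response body
raw = gzip.compress(b"<html><body>hello</body></html>")
html = decode_body(raw, "gzip")
```

With a real response from `urllib.request.urlopen`, the arguments would presumably be `response.read()` and `response.headers.get("Content-Encoding")`. Servers normally send gzip only when the request carries an `Accept-Encoding: gzip` header, so that header has to be added to the request for this branch to be taken.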
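With mechanize, a helper like this decompresses the response and hands it back to the browser:

    import gzip

    def ungzip(r, b):
        # r is the mechanize response, b is the mechanize.Browser instance
        headers = r.info()
        if ('Content-Encoding' in headers.keys() and headers['Content-Encoding'] == 'gzip') or \
           ('content-encoding' in headers.keys() and headers['content-encoding'] == 'gzip'):
            # The response is file-like, so GzipFile can read from it directly
            gz = gzip.GzipFile(fileobj=r, mode='rb')
            html = gz.read()
            gz.close()
            headers['Content-type'] = 'text/html; charset=utf-8'
            # Replace the compressed body and give the response back to the browser
            r.set_data(html)
            b.set_response(r)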
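The helper above checks the header name in two spellings. On Python 3, the headers object returned by `urllib.request` is an `http.client.HTTPMessage`, a subclass of `email.message.Message`, whose lookups ignore case, so a single check is enough there. The sketch below illustrates this with a plain `Message`; the function name `is_gzipped` is hypothetical, and whether the mechanize headers object behaves identically is an assumption that should be verified.

```python
from email.message import Message


def is_gzipped(headers):
    """Return True if the Content-Encoding header says gzip (hypothetical helper)."""
    # Message.get() matches header names case-insensitively
    encoding = headers.get("Content-Encoding", "")
    return encoding.strip().lower() == "gzip"


# Stand-in for response.headers, with the name stored in lower case
headers = Message()
headers["content-encoding"] = "gzip"
gzipped = is_gzipped(headers)
```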