urllib2.quote does not work properly
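I am trying to fetch the HTML of a page whose URL contains diacritics (í, ...). The problem is
that, as far as I understand it, quote should convert a URL containing diacritics into a valid URL.
Here is an example: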
    import urllib2

    url = 'http://www.example.com/vydavatelství/'
    print urllib2.quote(url)
    # Output: http%3A//www.example.com/vydavatelstv%C3%AD/
The problem is that, for some reason, it also changes the `http://` part (to `http%3A//`), which results in:
        response = urllib2.urlopen(req)
      File "C:\Python27\lib\urllib2.py", line 154, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Python27\lib\urllib2.py", line 437, in open
        response = meth(req, response)
      File "C:\Python27\lib\urllib2.py", line 550, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python27\lib\urllib2.py", line 475, in error
        return self._call_chain(*args)
      File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
        result = func(*args)
      File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 400: Bad Request
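- TL;DR -
Two things. First, make sure you include your shebang at the top of your Python script. Second, tell quote which characters are safe, that is, which ones it should leave unescaped: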
    url = 'http://www.example.com/vydavatelství/'
    urllib2.quote(url, ':/')
    # Output: http://www.example.com/vydavatelstv%C3%AD/
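For readers on Python 3, where `quote` lives in `urllib.parse`, the same idea can be sketched roughly as below. This is only an illustration: the helper name `quote_url_path` is hypothetical, and quoting just the path component (instead of widening `safe`) is a design choice of this sketch, not something the answer above prescribes.

```python
from urllib.parse import quote, urlsplit, urlunsplit

url = 'http://www.example.com/vydavatelství/'

# With the default safe='/', the ':' after the scheme is escaped too.
broken = quote(url)

# Marking ':' and '/' as safe leaves the scheme separator intact.
fixed = quote(url, safe=':/')


def quote_url_path(raw_url):
    """Hypothetical helper: percent-encode only the path component."""
    parts = urlsplit(raw_url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


print(broken)               # http%3A//www.example.com/vydavatelstv%C3%AD/
print(fixed)                # http://www.example.com/vydavatelstv%C3%AD/
print(quote_url_path(url))  # http://www.example.com/vydavatelstv%C3%AD/
```

Quoting only the path avoids having to list every delimiter in `safe`, which may matter once a URL also carries a port, a query string, or a fragment.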
- A bit more -
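So the first problem here is that the documentation for urllib2 is poor. Going by the link Kamal provided, I found no mention of quote in the documentation.
That said, let me explain.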
    urllib.parse.quote(string, safe='/', encoding=None, errors=None)
    ## string: the string you're trying to encode
    ## safe: string containing characters to leave unescaped. Default is '/'
    ## encoding: the encoding to use for the URL. Default is utf-8
    ## errors: specifies how errors are handled. Default is 'strict', which throws a UnicodeEncodeError, I think.
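To make the `encoding` and `errors` parameters concrete, here is a small sketch against the Python 3 signature shown above. The sample string and the choice of `latin-1` and `ascii` are assumptions picked purely for illustration.

```python
from urllib.parse import quote

segment = 'vydavatelství'

# Default: the string is encoded as UTF-8, so 'í' becomes two escaped bytes.
utf8_quoted = quote(segment)

# An explicit encoding changes which bytes get escaped.
latin1_quoted = quote(segment, encoding='latin-1')

# errors='strict' (the default) raises when a character cannot be encoded.
try:
    quote(segment, encoding='ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors='replace' substitutes '?', which is then escaped as %3F.
replaced = quote(segment, encoding='ascii', errors='replace')

print(utf8_quoted)    # vydavatelstv%C3%AD
print(latin1_quoted)  # vydavatelstv%ED
print(strict_failed)  # True
print(replaced)       # vydavatelstv%3F
```

Which encoding is right depends on what the target server expects; UTF-8 is the default and is usually a reasonable first guess.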