How to compare unicode and str in Python

My code:

a = '汉'
b = u'汉'

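These two are the same Chinese character, but a == b is obviously False. How do I fix this? Note that I can't convert a to utf-8, because I have no access to the code that produces it. I need to convert b to the encoding that a uses.

So my question is: how do I convert b to the encoding of a?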

Related discussion

If you don't know the encoding of a, you need to:

1. Detect the encoding of a.
2. Encode b using the detected encoding.

First, to detect the encoding of a, we use chardet.

$ pip install chardet

Now let's use it:

>>> import chardet
>>> a = '汉'
>>> chardet.detect(a)
{'confidence': 0.505, 'encoding': 'utf-8'}
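
The reported confidence here is only about 0.5, and chardet can also report an encoding of None when it cannot make a guess at all. Below is a minimal sketch of a guard for those cases. The helper name pick_encoding and the 0.5 threshold are illustrative assumptions, not part of chardet.

```python
def pick_encoding(detection, min_confidence=0.5):
    """Return the detected encoding name, or None if the guess is unusable.

    `detection` is a dict shaped like the result of chardet.detect().
    """
    encoding = detection.get('encoding')
    confidence = detection.get('confidence') or 0.0
    # Reject a missing guess or one below the (assumed) confidence threshold.
    if encoding is None or confidence < min_confidence:
        return None
    return encoding


print(pick_encoding({'confidence': 0.505, 'encoding': 'utf-8'}))  # utf-8
print(pick_encoding({'confidence': 0.0, 'encoding': None}))       # None
```

When the helper returns None, it is probably safer to treat the encoding as unknown than to encode b with a weak guess.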

So, to actually do what you asked:

>>> encoding = chardet.detect(a)['encoding']
>>> b = u'汉'
>>> b_encoded = b.encode(encoding)
>>> a == b_encoded
True
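
If chardet is unavailable, or its guess is unreliable for such a short input, one alternative is to try a short list of likely encodings and keep the first one under which b encodes to exactly the bytes in a. This is a sketch of that swapped-in technique, not what chardet does. The function name find_matching_encoding and the candidate list are assumptions chosen for illustration. It is written so that it also runs on Python 3, where bytes plays the role of Python 2's str.

```python
# -*- coding: utf-8 -*-

def find_matching_encoding(raw, text, candidates=('utf-8', 'gbk', 'big5', 'utf-16')):
    """Return the first candidate encoding under which `text` encodes to `raw`."""
    for encoding in candidates:
        try:
            if text.encode(encoding) == raw:
                return encoding
        except UnicodeEncodeError:
            # `text` cannot be represented in this encoding; try the next one.
            continue
    return None


b = u'汉'
a = b.encode('gbk')  # stand-in for a byte string whose encoding is unknown
print(find_matching_encoding(a, b))  # gbk
```

This only works when you already expect a and b to hold the same text, as in the question. It cannot identify the encoding of arbitrary bytes.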

Alternatively, if you know the encoding of a, decode it with str.decode and compare the result with b:

>>> a = '汉'
>>> b = u'汉'
>>> a.decode('utf-8') == b
True

Note: replace utf-8 with whatever encoding your source file actually uses.
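
Because the right codec depends on how a was produced, decoding with the wrong one can raise UnicodeDecodeError instead of simply comparing unequal. Here is a small sketch of a comparison helper that treats such a failure as a mismatch. The name same_text is hypothetical, and returning False on a decode error is a design assumption. The byte strings are built with encode() so the example also runs on Python 3.

```python
# -*- coding: utf-8 -*-

def same_text(raw, text, encoding='utf-8'):
    """Decode the byte string `raw` with `encoding` and compare it to `text`."""
    try:
        return raw.decode(encoding) == text
    except UnicodeDecodeError:
        # `raw` is not valid in this encoding, so it cannot match `text`.
        return False


b = u'汉'
print(same_text(b.encode('utf-8'), b))        # True
print(same_text(b.encode('gbk'), b, 'gbk'))   # True
print(same_text(b.encode('gbk'), b))          # False: GBK bytes are not valid UTF-8
```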