Escaped Unicode to Emoji in Python

I'm trying to convert escaped Unicode into emoji.

Example:

>>> emoji = "😀"
>>> emoji_text = "\\ud83d\\ude00"
>>> print(emoji)
😀
>>> print(emoji_text)
\ud83d\ude00

Instead of \ud83d\ude00, I want to print 😀.

I found a simple trick that works but isn't practical:

>>> import json
>>> json.loads('"\\ud83d\\ude00"')
'😀'

Related discussion

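Your example is close to the string output JSON produces with ensure_ascii=True, except that JSON requires double quotes around the string. It contains Unicode escapes for the high/low surrogates that represent a Unicode character above U+FFFF.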
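To make the JSON connection concrete, here is a minimal sketch of the round trip. It assumes the escaped text contains no unescaped double quotes, stray backslashes, or control characters, since any of those would make the quoted string invalid JSON.

```python
import json

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# writing a character above U+FFFF as a surrogate pair of \uXXXX escapes.
encoded = json.dumps("😀")
print(encoded)  # "\ud83d\ude00"  (the double quotes are part of the output)

# The escaped text from the question lacks the surrounding quotes,
# so add them before handing it to json.loads.
emoji_text = "\\ud83d\\ude00"
decoded = json.loads('"' + emoji_text + '"')
print(decoded)  # 😀
```

This is why the trick in the question works: json.loads recombines the two surrogate escapes into a single character.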

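Note that the unicode-escape codec alone cannot be used for the conversion. It creates a Unicode string containing surrogate code points, which is illegal. You won't be able to print the string or encode it for serialization.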

>>> s = "\\ud83d\\ude00"
>>> s = s.encode('ascii').decode('unicode-escape')
>>> s
'\ud83d\ude00'
>>> print(s) # UnicodeEncodeError: surrogates not allowed
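One possible workaround, which is not part of the answer above, is to pass the surrogate-containing string through UTF-16 using the "surrogatepass" error handler, so that the decoder pairs the surrogates up. The helper name unescape_via_utf16 is hypothetical, and the sketch assumes the input is pure ASCII (encode('ascii') raises otherwise).

```python
def unescape_via_utf16(text):
    """Hypothetical helper: decode \\uXXXX escapes, then merge surrogate pairs."""
    # Step 1: unicode-escape yields a str that may hold lone surrogate code points.
    with_surrogates = text.encode("ascii").decode("unicode-escape")
    # Step 2: "surrogatepass" lets the surrogates be written out as UTF-16
    # code units; decoding that UTF-16 combines each pair into one character.
    return with_surrogates.encode("utf-16", "surrogatepass").decode("utf-16")

result = unescape_via_utf16("Hello\\u9a6c\\u514b\\ud83d\\ude00")
print(result)  # Hello马克😀
```

Unlike the bare unicode-escape result, this string can be printed and encoded normally, provided every surrogate in the input is correctly paired.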

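The following code replaces Unicode surrogate escapes with the character for their code point. If you have other, non-surrogate Unicode escapes, it replaces those with their characters as well.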

import re

def process(m):
    '''process(m) -> Unicode character

    m is a regular expression match object whose groups are either:
      1: high Unicode surrogate, 4-digit hex code d800-dbff
      2: low Unicode surrogate, 4-digit hex code dc00-dfff
      3: None
    OR
      1: None
      2: None
      3: Unicode 4-digit hex code 0000-d7ff,e000-ffff
    '''
    if m.group(3) is None:
        # Construct the code point from the UTF-16 surrogate pair.
        # The 0x10000 offset must be added, not OR-ed: hi << 10 can
        # itself set bit 16 (for characters at U+20000 and above).
        hi = int(m.group(1), 16) & 0x3FF
        lo = int(m.group(2), 16) & 0x3FF
        cp = 0x10000 + (hi << 10 | lo)
    else:
        cp = int(m.group(3), 16)
    return chr(cp)

s = "Hello\\u9a6c\\u514b\\ud83d\\ude00"
s = re.sub(r'\\u(d[89ab][0-9a-f]{2})\\u(d[cdef][0-9a-f]{2})|\\u([0-9a-f]{4})', process, s)
print(s)

Output:

Hello马克😀
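The surrogate arithmetic can also be checked on its own. The sketch below uses a hypothetical function, combine_surrogates, that follows the standard UTF-16 rule: take the low 10 bits of each surrogate, join them into a 20-bit value, and add 0x10000. The offset is added rather than OR-ed because the shifted high bits can overlap bit 16.

```python
def combine_surrogates(high, low):
    """Hypothetical helper: map a UTF-16 surrogate pair to its code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # 10 bits from each surrogate form a 20-bit value, offset by 0x10000.
    return 0x10000 + (((high - 0xD800) << 10) | (low - 0xDC00))

# The pair from the question: 0xD83D, 0xDE00
print(hex(combine_surrogates(0xD83D, 0xDE00)))  # 0x1f600

# A plane-2 character, where the high bits reach bit 16
print(hex(combine_surrogates(0xD840, 0xDC00)))  # 0x20000
```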