Escaped Unicode to Emoji in Python
I am trying to convert escaped Unicode into emojis.
Example:
    >>> emoji = "😀"
    >>> emoji_text = "\\ud83d\\ude00"
    >>> print(emoji)
    😀
    >>> print(emoji_text)
    \ud83d\ude00
Instead of "\ud83d\ude00", I want to print 😀.
I found a simple trick that works but is not practical:
    >>> import json
    >>> json.loads('"\\ud83d\\ude00"')
    '😀'
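Your example is close to JSON's escaped string output (what json.dumps produces with ensure_ascii=True), apart from the missing double quotes around the string. It contains the escaped high and low surrogates that JSON uses for characters above U+FFFF, which is why the json.loads trick works.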
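To illustrate the relationship with JSON, here is a small round trip. It only uses the standard `json` module; the variable name `escaped` is just for this example.

```python
import json

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# writing characters above U+FFFF as a pair of surrogate escapes.
escaped = json.dumps("\U0001F600")
print(escaped)  # "\ud83d\ude00" (including the double quotes)

# json.loads reverses this and merges the surrogate pair back into one character.
print(json.loads(escaped) == "\U0001F600")  # True
```

This is why adding the double quotes by hand makes `json.loads` accept the string, although text that itself contains quotes or control characters would presumably need extra escaping first.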
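Note that the unicode-escape codec cannot do the conversion on its own. It produces a string that contains surrogate code points, which cannot be printed or encoded: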
    >>> s = "\\ud83d\\ude00"
    >>> s = s.encode('ascii').decode('unicode-escape')
    >>> s
    '\ud83d\ude00'
    >>> print(s)  # UnicodeEncodeError: surrogates not allowed
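One possible way around that error, offered as a sketch rather than as the method this answer uses, is to round-trip the string through UTF-16 with the `surrogatepass` error handler. The encoder then writes the surrogates as plain 16-bit code units, and the decoder pairs them up again. The helper name `unescape_via_utf16` is made up for illustration, and the sketch assumes the input is pure ASCII.

```python
def unescape_via_utf16(text):
    """Hypothetical helper: decode backslash-u escapes, then merge surrogate pairs."""
    # Step 1: interpret the escapes; this can leave surrogate code points behind.
    with_surrogates = text.encode('ascii').decode('unicode-escape')
    # Step 2: 'surrogatepass' lets the UTF-16 encoder emit the surrogates as-is;
    # decoding those bytes as UTF-16 combines each pair into one code point.
    return with_surrogates.encode('utf-16', 'surrogatepass').decode('utf-16')

result = unescape_via_utf16("\\ud83d\\ude00")
print(result == "\U0001F600")  # True
print(len(result))             # 1
```

Keep in mind that `unicode-escape` also interprets other backslash sequences such as `\n`, so this is only suitable when that behavior is acceptable.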
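The following code replaces each escaped Unicode surrogate pair with the character for the code point it encodes. If the string also contains other, non-surrogate Unicode escapes, it replaces those with their characters as well.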
    import re

    def process(m):
        '''process(m) -> str (the character for the matched escape)

        m is a regular expression match object whose groups are either:
           1: high Unicode surrogate, 4-digit hex code d800-dbff
           2: low Unicode surrogate, 4-digit hex code dc00-dfff
           3: None
        OR
           1: None
           2: None
           3: Unicode 4-digit hex code 0000-d7ff, e000-ffff
        '''
        if m.group(3) is None:
            # Construct the code point from the UTF-16 surrogate pair.
            hi = int(m.group(1), 16) & 0x3FF
            lo = int(m.group(2), 16) & 0x3FF
            # Add rather than OR: hi << 10 can overlap the 0x10000 bit
            # (for example, the pair d840 dc00 must give U+20000).
            cp = 0x10000 + (hi << 10) + lo
        else:
            cp = int(m.group(3), 16)
        return chr(cp)

    s = "Hello\\u9a6c\\u514b\\ud83d\\ude00"
    s = re.sub(r'\\u(d[89ab][0-9a-f]{2})\\u(d[cdef][0-9a-f]{2})|\\u([0-9a-f]{4})', process, s)
    print(s)
Output:

    Hello马克😀
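The surrogate arithmetic can be checked in isolation. The sketch below uses a hypothetical helper, `combine_surrogates`, that takes the two 16-bit code units as integers. Each surrogate contributes its low 10 bits, and the offset 0x10000 is added because the shifted high bits can themselves reach bit 16.

```python
def combine_surrogates(hi_unit, lo_unit):
    """Hypothetical helper: merge a UTF-16 surrogate pair into one code point."""
    hi = hi_unit & 0x3FF  # low 10 bits of the high surrogate (d800-dbff)
    lo = lo_unit & 0x3FF  # low 10 bits of the low surrogate (dc00-dfff)
    return 0x10000 + (hi << 10) + lo

# d83d de00: hi = 0x03d, lo = 0x200, so 0x10000 + 0xf400 + 0x200
print(hex(combine_surrogates(0xD83D, 0xDE00)))  # 0x1f600
# d840 dc00: hi = 0x040, lo = 0x000, so 0x10000 + 0x10000
print(hex(combine_surrogates(0xD840, 0xDC00)))  # 0x20000
```

The second case shows why addition is the safe choice: a bitwise OR of 0x10000 with 0x10000 would give 0x10000 instead of 0x20000.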