What is the difference between utf8mb4 and utf8 charsets in MySQL?
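I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious to know what the difference is between the utf8mb4 and utf8 charsets in MySQL.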
UTF-8 is a variable-length encoding: storing one code point takes one to four bytes. However, MySQL's encoding called "utf8" (an alias of "utf8mb3") stores at most three bytes per code point.
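So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, which is called the "Basic Multilingual Plane" (BMP). See also Comparison of Unicode encodings.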
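The byte counts above follow from UTF-8's standard code-point ranges. Here is a minimal Python sketch of the length rule and the three-byte cutoff. The helper names `utf8_length` and `fits_utf8mb3` are hypothetical and are not MySQL functions.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one code point (standard UTF-8 ranges)."""
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3  # still inside the BMP: fits in utf8mb3
    if code_point <= 0x10FFFF:
        return 4  # supplementary planes: needs utf8mb4
    raise ValueError(f"not a Unicode code point: {code_point:#x}")


def fits_utf8mb3(code_point: int) -> bool:
    """utf8mb3 allows at most 3 bytes per character, i.e. BMP code points only."""
    return utf8_length(code_point) <= 3


for ch in ["A", "é", "中", "😀"]:
    cp = ord(ch)
    # Cross-check the range table against Python's own UTF-8 encoder.
    assert utf8_length(cp) == len(ch.encode("utf-8"))
    print(f"U+{cp:04X} {ch!r}: {utf8_length(cp)} byte(s), fits utf8mb3: {fits_utf8mb3(cp)}")
```

Running it reports the emoji (U+1F600) as the only character in the list that does not fit in utf8mb3.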
This is what the MySQL documentation (an earlier version of the same page) has to say:
The character set named utf8[/utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character [and] supports supplemental characters:
For a BMP character, utf8[/utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8[/utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8[/utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8[/utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8[/utf8mb3] data from older versions of MySQL.
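To illustrate the quoted claims, here is a rough Python model. `encode_utf8mb3` and `encode_utf8mb4` are hypothetical helpers, not part of any MySQL client API. The `ValueError` only stands in for "cannot store the character at all" and does not reproduce MySQL's actual error or warning behavior.

```python
def encode_utf8mb4(text: str) -> bytes:
    # utf8mb4 is ordinary UTF-8: 1 to 4 bytes per character.
    return text.encode("utf-8")


def encode_utf8mb3(text: str) -> bytes:
    # Hypothetical model of utf8mb3: the same UTF-8 bytes, but any character
    # that would need 4 bytes (code point above U+FFFF) is rejected.
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):X} at index {i} is outside the BMP")
    return text.encode("utf-8")


bmp_text = "naïve 中文"
# For BMP-only text both models produce byte-for-byte identical output.
assert encode_utf8mb3(bmp_text) == encode_utf8mb4(bmp_text)

emoji_text = "ok 👍"
print(len(encode_utf8mb4(emoji_text)))  # 3 ASCII bytes + 4 bytes for the emoji = 7
try:
    encode_utf8mb3(emoji_text)
except ValueError as err:
    print("utf8mb3 cannot store it:", err)
```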
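So if you want your columns to be able to store characters outside the BMP (and you usually do), such as emoji, use "utf8mb4". See also What are the most common non-BMP Unicode characters in actual use?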
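Before choosing a character set, it can help to check whether existing text actually contains characters outside the BMP. Here is a minimal sketch that assumes the data is already available as Python strings. `non_bmp_characters` and `needs_utf8mb4` are hypothetical names.

```python
from typing import Iterable, List, Tuple


def non_bmp_characters(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for every code point above U+FFFF."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


def needs_utf8mb4(texts: Iterable[str]) -> bool:
    """True if any string contains a character a utf8/utf8mb3 column cannot store."""
    return any(non_bmp_characters(t) for t in texts)


sample_rows = ["plain ASCII", "Ünïcödé 日本語", "party time 🎉", "math 𝔸"]
for row in sample_rows:
    codes = [f"U+{ord(ch):X}" for _, ch in non_bmp_characters(row)]
    print(repr(row), codes if codes else "BMP only")

print("use utf8mb4:", needs_utf8mb4(sample_rows))
```

Python strings are indexed by code point, so each hit is one character that a utf8/utf8mb3 column could not hold. Emoji sequences built with joiners or skin-tone modifiers consist of several code points, and each non-BMP part is reported separately.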
Mathias Bynens' excellent write-up on how to support full Unicode in MySQL databases may also shed some light on this.
From the MySQL 8.0 Reference Manual:
utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.
utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character.
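The two definitions differ only in the maximum number of bytes per character. That maximum gives a simple upper bound on how much space a string of n characters can take. Here is a minimal Python sketch. `MAX_BYTES_PER_CHAR`, `worst_case_bytes` and `encoded_bytes` are hypothetical helper names, and the sketch ignores MySQL's own length prefixes and row-format overhead.

```python
MAX_BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}


def worst_case_bytes(n_chars: int, charset: str) -> int:
    """Upper bound on the bytes needed for n_chars characters in the charset."""
    return n_chars * MAX_BYTES_PER_CHAR[charset]


def encoded_bytes(text: str) -> int:
    # Actual UTF-8 size; identical under utf8mb3 and utf8mb4 for BMP-only text.
    return len(text.encode("utf-8"))


text = "héllo 😀"
n = len(text)  # number of code points, i.e. characters
print(n, encoded_bytes(text), worst_case_bytes(n, "utf8mb4"))
# The actual size lies between n (all ASCII) and the utf8mb4 bound.
assert n <= encoded_bytes(text) <= worst_case_bytes(n, "utf8mb4")
```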
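In MySQL, utf8 is an alias for utf8mb3. So, regardless of what this alias refers to, you can deliberately choose the character set yourself.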