On encoding: clarification on Joel Spolsky's Unicode article

encoding unicode utf-8


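I am reading Joel Spolsky's popular Unicode article, and there is one example I don't understand.

What do "hex min" and "hex max" mean? What do those values represent? The minimum and maximum of what?

Binary digits can only be 1 or 0. Why do I see tons of the letter "v" here?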

http://www.joelonsoftware.com/articles/unicode.html

[image: table with "hex min", "hex max" and binary byte sequence columns]

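Related discussion

The hex min/max values define a range of Unicode characters (Unicode numbers are usually written in hexadecimal).

The v's stand for the bits of the original number.

So the first line says: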

The Unicode characters in the range 0 (hex 00) to 127 (hex 7F) (a 7
bit number) are represented by a 1 byte bit string starting with '0'
followed by all 7 bits of the Unicode number.

The second line says:

The Unicode numbers in the range 128 (hex 0080) to 2047 (hex 07FF) (an 11
bit number) are represented by a 2 byte bit string where the first
byte starts with '110' followed by the first 5 of the 11 bits, and the
second byte starts with '10' followed by the remaining 6 of the 11 bits.

And so on.
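To make the bit layout concrete, here is a minimal Python sketch that fills in the v positions by hand. The names `utf8_encode` and `bits` are made up for this illustration; in real code you would simply call `str.encode("utf-8")`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (illustration only)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point <= 0x7F:
        # 0vvvvvvv: all 7 bits fit in one byte
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110vvvvv 10vvvvvv: top 5 bits, then the low 6 bits
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 1110vvvv 10vvvvvv 10vvvvvv: 4 + 6 + 6 bits
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv: 3 + 6 + 6 + 6 bits
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def bits(data: bytes) -> str:
    """Show each byte as 8 binary digits so the prefixes are visible."""
    return " ".join(f"{b:08b}" for b in data)


print(bits(utf8_encode(0x41)))  # 01000001
print(bits(utf8_encode(0xE9)))  # 11000011 10101001
```

For U+00E9 the 11-bit value is 00011101001: the first five bits (00011) follow the '110' prefix and the remaining six (101001) follow the '10' prefix.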

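Hope that makes sense.

Related discussion

Note that the table in Joel's article includes code points that do not exist in Unicode and never will. In fact, UTF-8 never needs more than 4 bytes, although the underlying UTF-8 scheme could be extended further, as the table shows.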

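A more nuanced version of the table can be found in "How to know how many bytes each character uses?" It points out some of the gaps. For example, the bytes 0xC0, 0xC1 and 0xF5..0xFF can never appear in valid UTF-8. You can also find information about invalid UTF-8 in "Really Good, Bad UTF-8 example test data".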
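As a rough check of that claim, the following Python sketch encodes every code point that UTF-8 can represent and collects the byte values that never show up. The variable names are chosen for this illustration only.

```python
# Every code point except the surrogate range, which UTF-8 cannot encode.
all_text = "".join(
    chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF
)

# Byte values that occur somewhere in the UTF-8 encoding of that text.
used_bytes = set(all_text.encode("utf-8"))
never_used = sorted(set(range(256)) - used_bytes)

print([f"0x{b:02X}" for b in never_used])

# Python's strict decoder rejects these bytes, for example the
# overlong two-byte form of U+0000:
try:
    bytes([0xC0, 0x80]).decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

0xC0 and 0xC1 could only begin overlong two-byte encodings of values below 0x80, and 0xF5..0xFF could only begin sequences for values above U+10FFFF, which is why none of them are used.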

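In the table you show, the hex min and hex max values are the smallest and largest U+WXYZ values that can be represented using the number of bytes shown in the "byte sequence in binary" column. Note that the largest code point in Unicode is U+10FFFF (which is defined/reserved as a noncharacter). It is the largest value that can be represented with the UTF-16 surrogate encoding scheme, which uses only 4 bytes (two UTF-16 code units).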
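The boundaries are easy to confirm in Python. This is a small sketch; `encoded_lengths` is a hypothetical helper name, and "utf-16-be" is used so that no byte order mark is counted.

```python
import sys


def encoded_lengths(code_point):
    """Return (UTF-8 byte count, UTF-16 byte count) for one code point."""
    ch = chr(code_point)
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# The hex min / hex max boundaries of each UTF-8 length, up to U+10FFFF.
for cp in (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    utf8_len, utf16_len = encoded_lengths(cp)
    print(f"U+{cp:04X}: {utf8_len} UTF-8 bytes, {utf16_len} UTF-16 bytes")

# The largest code point takes the surrogate pair DBFF DFFF in UTF-16.
print(chr(0x10FFFF).encode("utf-16-be").hex())  # dbffdfff
print(hex(sys.maxunicode))                      # 0x10ffff
```

Calling `chr(0x110000)` raises `ValueError`, since nothing above U+10FFFF is a code point.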