What are the differences between utf8_general_ci and utf8_unicode_ci?
Possible Duplicate:
What's the difference between utf8_general_ci and utf8_unicode_ci
I have two Unicode options that look promising for a MySQL database:
utf8_general_ci    unicode (multilingual), case-insensitive
utf8_unicode_ci    unicode (multilingual), case-insensitive
Can you explain the difference between utf8_general_ci and utf8_unicode_ci? When designing a database, what are the implications of choosing one over the other?
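Roughly, utf8_general_ci does the following:
- Convert to Unicode Normalization Form D (canonical decomposition)
- Remove any combining characters
- Convert to upper case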
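This does not work correctly on Unicode, because it does not understand Unicode casing. Unicode casing alone is far more complicated than an ASCII-minded approach can handle. For example: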
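- The lowercase of "ẞ" is "ß", but the uppercase of "ß" is "SS".
- There are two lowercase Greek sigmas (σ and the word-final ς), but only one uppercase one (Σ).
- Letters like "ø" do not decompose into "o" plus a diacritic, which means they will not sort correctly.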
There are many other subtleties.
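The casing facts in that list can be checked directly against Python's Unicode tables. This is a small sketch assuming Python 3's built-in str.upper/str.lower (which apply Unicode's full case mappings) and the standard unicodedata module; the FACTS dictionary is just a hypothetical container for the results.

```python
import unicodedata

# Each entry pairs a description with the value Python's Unicode data produces.
FACTS = {
    # Capital sharp s (U+1E9E) lowercases to "ß" (U+00DF) ...
    "capital sharp s, lowercased": "\u1e9e".lower(),
    # ... but the full uppercase mapping of "ß" is two letters, so no 1:1 round trip.
    "sharp s, uppercased": "\u00df".upper(),
    # Medial sigma and final sigma both uppercase to the same capital sigma.
    "sigma, uppercased": "\u03c3".upper(),
    "final sigma, uppercased": "\u03c2".upper(),
    # "ø" (U+00F8) has no decomposition mapping, so NFD leaves it as one code point.
    "o with stroke, decomposition": unicodedata.decomposition("\u00f8"),
    "o with stroke, NFD": unicodedata.normalize("NFD", "\u00f8"),
}

for label, value in FACTS.items():
    print(f"{label:<30} {value!r}")
```

Because "ø" survives NFD unchanged, a "decompose, then strip marks" approach cannot relate it to "o", and because σ and ς collapse to one capital, upper-casing loses information that a correct lowercase mapping would need context to restore.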
The cost of EDOCX1[1] is somewhat higher than
Source: http://forums.mysql.com/read.php?103,187048,188748#msg-188748
From "Unicode Character Sets" in the MySQL documentation:
For any Unicode character set, operations performed using the _general_ci collation are faster than those for the _unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_ci supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages "ß" is equal to "ss". utf8_unicode_ci also supports contractions and ignorable characters. utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.
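The difference between one-to-one comparison and expansions/ignorable characters can be illustrated with a toy model. This is a sketch under stated assumptions: the tables EXPANSIONS and IGNORABLE and the functions one_to_one_key and expanding_key are hypothetical, tiny stand-ins, not MySQL's weight tables (utf8_unicode_ci is based on the Unicode Collation Algorithm's full tables). Contractions are not modeled here.

```python
from __future__ import annotations

import unicodedata

# Hypothetical toy tables for illustration only.
EXPANSIONS = {"\u00df": "ss", "\u00e6": "ae", "\u0153": "oe"}  # ß, æ, œ
IGNORABLE = {"\u00ad"}  # SOFT HYPHEN, treated as if it were absent


def _fold_char(ch: str) -> str:
    """Accent- and case-insensitive key for a single character."""
    # Keep only the first code point of the NFD form (the base letter), then lowercase it.
    return unicodedata.normalize("NFD", ch)[0].lower()


def one_to_one_key(s: str) -> list[str]:
    """general_ci-style toy key: exactly one key element per input character."""
    return [_fold_char(ch) for ch in s]


def expanding_key(s: str) -> list[str]:
    """unicode_ci-style toy key: supports expansions and ignorable characters."""
    key: list[str] = []
    for ch in s:
        if ch in IGNORABLE:
            continue  # ignorable: contributes nothing to the comparison
        lowered = ch.lower()
        if lowered in EXPANSIONS:
            key.extend(EXPANSIONS[lowered])  # expansion: one character -> several
        else:
            key.append(_fold_char(ch))
    return key


pairs = [("Stra\u00dfe", "strasse"), ("co\u00adoperate", "cooperate"), ("R\u00e9sum\u00e9", "resume")]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: one-to-one equal={one_to_one_key(a) == one_to_one_key(b)}, "
          f"expanding equal={expanding_key(a) == expanding_key(b)}")
```

Both keys treat "Résumé" and "resume" as equal, but only the expanding key equates "Straße" with "strasse" and ignores the soft hyphen in "co-operate". The extra table lookups per character are also a rough intuition for why the _unicode_ci collations are described as slower.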