In what JS engines, specifically, are toLowerCase & toUpperCase locale-sensitive?

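In the code of some libraries (for example AngularJS; the link points to specific lines in its code), I see custom case-conversion functions used instead of the standard ones. The justification is the assumption that, in browsers with a Turkish locale, the standard functions don't work as expected: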

console.log("SCRIPT".toLowerCase()); // "scrıpt"
console.log("script".toUpperCase()); // "SCRİPT"
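The output shown above is what the Turkish case rules produce. Engines that implement the ECMAScript Internationalization API (ECMA-402) let you request those rules explicitly by passing a locale tag to the toLocale* methods, so both behaviours can be compared without changing OS settings. A minimal sketch, assuming an engine with Turkish locale data (for example a current browser, or Node.js with full ICU, the default since Node.js 13); `compareTurkishCasing` is a hypothetical helper name.

```javascript
// Compare locale-independent and Turkish-locale case conversion.
// Assumes Intl/ICU data for "tr" is available in the engine.
function compareTurkishCasing(word) {
  return {
    lower: word.toLowerCase(),             // locale-independent
    upper: word.toUpperCase(),             // locale-independent
    trLower: word.toLocaleLowerCase("tr"), // Turkish rules: I -> ı (U+0131)
    trUpper: word.toLocaleUpperCase("tr"), // Turkish rules: i -> İ (U+0130)
  };
}

console.log(compareTurkishCasing("SCRIPT")); // lower "script", trLower "scrıpt"
console.log(compareTurkishCasing("script")); // upper "SCRIPT", trUpper "SCRİPT"
```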

But is this true, or was it ever? Do browsers really behave this way? If so, which ones? What about node.js? Other JS engines?

Doesn't the existence of the toLocaleLowerCase and toLocaleUpperCase methods imply that toLowerCase and toUpperCase are locale-invariant?

Specifically, for which browsers does the Angular team keep the code `if ('i' !== 'I'.toLowerCase())...`?
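For context, the kind of guard the question refers to can be sketched as follows. This is an illustrative approximation, not AngularJS's actual source; the names `asciiLowerCase`, `asciiUpperCase`, `lowercase` and `uppercase` are hypothetical. The fallback maps only the ASCII letters A-Z and a-z by setting or clearing bit 0x20, so it can never produce ı or İ.

```javascript
// Hypothetical ASCII-only converters: they touch only A-Z / a-z,
// so the result never depends on the host locale.
function asciiLowerCase(s) {
  // 'A' (0x41) | 0x20 === 'a' (0x61)
  return s.replace(/[A-Z]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) | 0x20));
}

function asciiUpperCase(s) {
  // 'a' (0x61) & ~0x20 === 'A' (0x41)
  return s.replace(/[a-z]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) & ~0x20));
}

// Use the native methods unless the engine maps 'I' to something other than 'i'.
let lowercase = (s) => s.toLowerCase();
let uppercase = (s) => s.toUpperCase();
if ("i" !== "I".toLowerCase()) {
  lowercase = asciiLowerCase;
  uppercase = asciiUpperCase;
}

console.log(lowercase("SCRIPT")); // "script"
console.log(uppercase("script")); // "SCRIPT"
```

The trade-off of the fallback is that non-ASCII letters such as "É" pass through unchanged, which is acceptable for identifiers like HTML tag or attribute names but not for general text.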

If your browser (device) uses a Turkish or Azerbaijani locale, please run this snippet, and let me know if you find that the problem really exists.

if ('i' !== 'I'.toLowerCase()) {
document.write('Ooops! toLowerCase is locale-sensitive in your browser. ' +
'Please write your user-agent in the comments to this question: ' +
navigator.userAgent);
} else {
document.write('toLowerCase isn\'t locale-sensitive in your browser. ' +
'Everything works as expected!');
}

<html lang="tr">

Note: please be aware that I couldn't test this!

According to the ECMAScript specification:

String.prototype.toLowerCase ( )

[...]

For the purposes of this operation, the 16-bit code units of the
Strings are treated as code points in the Unicode Basic Multilingual
Plane. Surrogate code points are directly transferred from S to L
without any mapping.

The result must be derived according to the case mappings in the
Unicode character database (this explicitly includes not only the
UnicodeData.txt file, but also the SpecialCasings.txt file that
accompanies it in Unicode 2.1.8 and later).

[...]

String.prototype.toLocaleLowerCase ( )

This function works exactly the same as toLowerCase except that its
result is intended to yield the correct result for the host
environment’s current locale, rather than a locale-independent result.
There will only be a difference in the few cases (such as Turkish)
where the rules for that language conflict with the regular Unicode
case mappings.

[...]
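A quick check of what the quoted spec text means in practice, assuming a modern engine (since ES2015 these methods operate on full code points and full Unicode mappings): toLowerCase and toUpperCase apply the unconditional SpecialCasing entries regardless of locale, so some results even change length, while "I" always lowercases to plain "i".

```javascript
// Locale-independent conversions, including SpecialCasing's unconditional rules.
const samples = {
  plainI: "I".toLowerCase(),       // "i" in every locale
  dottedI: "\u0130".toLowerCase(), // U+0130 İ -> "i" + U+0307 COMBINING DOT ABOVE
  sharpS: "\u00DF".toUpperCase(),  // U+00DF ß -> "SS"
};

for (const [name, value] of Object.entries(samples)) {
  // Print each result with its code points so combining marks become visible.
  const codePoints = [...value].map(
    (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
  console.log(name, JSON.stringify(value), codePoints.join(" "));
}
```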

According to the Unicode Character Database's special casing file (SpecialCasing.txt):

[...]

Format

The entries in this file are in the following machine-readable format:

<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>

Unconditional mappings
[…]

Preserve canonical equivalence for I with dot. Turkic is handled
below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
[…]

Language-Sensitive Mappings
These are characters whose full case mappings depend on language and perhaps also
context (which characters come before or after). For more information
see the header of this file and the Unicode Standard.

Lithuanian

Lithuanian retains the dot in a lowercase i when followed by accents.

Remove DOT ABOVE after "i" with upper or titlecase

0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE

Introduce an explicit dot above when lowercasing capital I's and J's
whenever there are more accents above.
(of the accents used in Lithuanian: grave, acute, tilde above, and ogonek)

0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I
004A; 006A 0307; 004A; 004A; lt More_Above; # LATIN CAPITAL LETTER J
012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WITH OGONEK
00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE
00CD; 0069 0307 0301; 00CD; 00CD; lt; # LATIN CAPITAL LETTER I WITH ACUTE
0128; 0069 0307 0303; 0128; 0128; lt; # LATIN CAPITAL LETTER I WITH TILDE

Turkish and Azeri

I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
The following rules handle those cases.

0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE

When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
This matches the behavior of the canonically equivalent I-dot_above

0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE

When lowercasing, unless an I is before a dot_above, it turns into a dotless i.

0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I

When uppercasing, i turns into a dotted capital I

0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

Note: the following case is already in the UnicodeData.txt file.

0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I

EOF
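The machine-readable line format quoted above is straightforward to parse. A minimal sketch; the function name `parseSpecialCasingLine` and the returned field names are illustrative assumptions, and it handles one line at a time (blank and comment-only lines return `null`).

```javascript
// Parse one data line of SpecialCasing.txt:
// <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
function parseSpecialCasingLine(line) {
  const hashAt = line.indexOf("#");
  const data = (hashAt === -1 ? line : line.slice(0, hashAt)).trim();
  if (data === "") return null; // blank or comment-only line

  const comment = hashAt === -1 ? "" : line.slice(hashAt + 1).trim();
  // Split on ";" and drop the empty field after the trailing semicolon.
  const fields = data.split(";").map((f) => f.trim());
  if (fields[fields.length - 1] === "") fields.pop();

  // Turn a space-separated list of hex code points into a string ("" stays "").
  const toText = (hex) =>
    hex === "" ? "" : String.fromCodePoint(...hex.split(/\s+/).map((h) => parseInt(h, 16)));

  return {
    code: toText(fields[0]),
    lower: toText(fields[1]),
    title: toText(fields[2]),
    upper: toText(fields[3]),
    conditions: fields[4] ? fields[4].split(/\s+/) : [], // e.g. ["tr", "Not_Before_Dot"]
    comment,
  };
}

const rule = parseSpecialCasingLine(
  "0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I"
);
console.log(rule.lower, rule.conditions); // lower is "ı", conditions are tr and Not_Before_Dot
```

An empty `lower` field (as in the `After_I` rule for U+0307) correctly becomes an empty string, meaning the character is deleted when lowercasing.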

Also, according to JavaScript for Absolute Beginners (by Terry McNavage):

>"I".toLowerCase() //"i"
>"i".toUpperCase() //"I"
>"I".toLocaleLowerCase() //"<dotless-i>"
>"i".toLocaleUpperCase() //"<dotted-I>"

Note: toLocaleLowerCase() and toLocaleUpperCase() convert case based on your OS settings. You'd have to change those settings to Turkish for the previous sample to work. Or just take my word for it!
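Engines that implement the ECMAScript Internationalization API (ECMA-402) also accept an explicit locale tag, which avoids changing OS settings and makes the context-dependent Turkish rules from SpecialCasing.txt easy to observe. A sketch, assuming Turkish locale data is available (a current browser, or Node.js with full ICU); the object name `turkish` is just for illustration.

```javascript
// Explicit-locale conversions (ECMA-402). Without an argument, these methods
// would use the host's current locale instead.
const turkish = {
  dotlessLower: "I".toLocaleLowerCase("tr"),       // Not_Before_Dot: I -> ı (U+0131)
  dottedUpper: "i".toLocaleUpperCase("tr"),        // i -> İ (U+0130)
  dottedCapital: "\u0130".toLocaleLowerCase("tr"), // İ -> plain i, no combining dot
  iPlusDot: "I\u0307".toLocaleLowerCase("tr"),     // After_I: I + U+0307 -> i
};

console.log("I".toLowerCase(), turkish.dotlessLower); // i ı
```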

According to bobince's comment on the question about converting a JavaScript string to lowercase:

Accept-Language and navigator.language are two completely separate
settings. Accept-Language reflects the user's chosen preferences for
what languages they want to receive in web pages (and this setting is
unfortunately inaccessible to JS). navigator.language merely reflects
which localisation of the web browser was installed, and should
generally not be used for anything. Both of these values are unrelated
to the system locale, which is the bit that decides what
toLocaleLowerCase() will do; that's an OS-level setting out of scope
of the browser's prefs.

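So setting lang="tr-TR" on the html element doesn't reproduce the real test case: the locale that matters is an OS-level setting, and that is what you need to reproduce the special-casing example.
I think the only locale-specific cases relevant to toLowerCase() and toUpperCase() are the lowercase dotted i and the uppercase dotless I.
Based on those trusted/official sources, I think you are right: 'i' !== 'I'.toLowerCase() should always evaluate to false.
But, as I said, I couldn't test it here.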
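To see which locale the no-argument toLocale* methods will actually pick up, one option is to ask the Intl API for its resolved default locale; the page's lang attribute plays no part in it. A sketch, assuming an engine with Intl support; `describeHostCasing` is a hypothetical helper, and treating the locale reported by `Intl.DateTimeFormat` as the one `toLocaleLowerCase()` uses is an assumption that fits ICU-based engines such as V8 but is not guaranteed by the spec.

```javascript
// Report the host's default locale next to the three ways of lowercasing "I".
function describeHostCasing() {
  const defaultLocale = new Intl.DateTimeFormat().resolvedOptions().locale;
  return {
    defaultLocale,                                // host-dependent, e.g. "en-US" or "tr-TR"
    invariant: "I".toLowerCase(),                 // always "i"
    hostLocale: "I".toLocaleLowerCase(),          // "ı" only if the host locale is Turkish/Azeri
    explicitTurkish: "I".toLocaleLowerCase("tr"), // "ı" regardless of host settings
  };
}

console.log(describeHostCasing());
```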

Comments

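Sorry, but my question isn't a theoretical one; it's about reality. There are practical reasons why these checks exist in the code, and I'm asking what those reasons are. I'd be glad to hear from someone with first-hand experience of this particular issue.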

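@thorn, yes, my answer has a theoretical basis. But it has a practical one too! When I point to third-party experience, like that of bobince and Terry (even though it isn't my own), that is about reality. Well, anyway, I think this is a useful contribution in some way.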