What are the rules for JavaScript's automatic semicolon insertion (ASI)?
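Well, first I should ask whether this is browser-dependent.
I have read that if an invalid token is found, but the code up until that invalid token is valid, a semicolon is inserted before the token if it is preceded by a line break.
However, the common example cited for bugs caused by semicolon insertion is: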
```javascript
return
_a+b;
```
...which doesn't seem to follow this rule, since _a would be a valid token.
On the other hand, breaking up call chains works as expected:
```javascript
$('#myButton')
.click(function(){alert("Hello!")});
```
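Both observations from the question can be reproduced in a small sketch. The `sum` function and the `button` object are hypothetical stand-ins; `button` replaces the jQuery call with a plain object whose `click` method returns `this`, so no library is needed:

```javascript
// `return` is a restricted production: a line break right after it
// ends the statement, so `_a + b` is never evaluated.
function sum(_a, b) {
  return
  _a + b
}

// A line break before `.click(...)` does not end the statement:
// `button .click(...)` is still valid grammar, so no semicolon is inserted.
const button = {
  handlers: [],
  click(handler) {
    this.handlers.push(handler)
    return this
  },
}

const chained = button
  .click(() => "Hello!")

console.log(sum(1, 2))          // undefined
console.log(chained === button) // true
```

The difference is that `_a` is allowed after `return` by the grammar, but the restricted production forbids a line break in between, whereas nothing forbids a line break before `.`.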
Does anyone have a more in-depth description of the rules?
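First of all, you should know which statements are affected by automatic semicolon insertion (also known as ASI for brevity):

- empty statement
- `var` statement
- expression statement
- `do-while` statement
- `continue` statement
- `break` statement
- `return` statement
- `throw` statement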
The concrete rules of ASI are described in the specification, §11.9.1 Rules of Automatic Semicolon Insertion.

Three cases are described:
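1. When an offending token is encountered that is not allowed by the grammar, a semicolon is inserted before it if:
   - the token is separated from the previous token by at least one `LineTerminator`, or
   - the token is `}`.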
e.g.:

```javascript
{ 1
2 } 3
```

is transformed to

```javascript
{ 1
;2 ;} 3;
```
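Both conditions of the first rule can be checked with a direct `eval`, which returns the completion value of the last statement it runs. A sketch for experimentation only; the variable names are hypothetical:

```javascript
// Rule 1: `2` follows a line break, so a semicolon is inserted before it;
// `}` is also an offending token, so another semicolon goes before it.
// eval returns the completion value of the last statement: `3`.
const withLineBreak = eval("{ 1\n2 } 3")

// Same tokens on one line: `2` is neither preceded by a LineTerminator
// nor a `}`, so no semicolon can be inserted and parsing fails.
let sameLineError = null
try {
  eval("{ 1 2 } 3")
} catch (e) {
  sameLineError = e
}

console.log(withLineBreak)                        // 3
console.log(sameLineError instanceof SyntaxError) // true
```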
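2. When the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete program, a semicolon is automatically inserted at the end of the input stream.

e.g.:

```javascript
a = b
++c
```

is transformed to:

```javascript
a = b;
++c;
```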
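Running the example confirms that `++` is applied to `c` as a prefix operator rather than to `b` as a postfix one. The initial values are arbitrary choices for this sketch:

```javascript
let a
let b = 1
let c = 1

// No semicolons written: this is parsed as `a = b; ++c;`.
// The line break before `++` forbids reading it as `b++`.
a = b
++c

console.log(a, b, c) // 1 1 2
```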
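3. When a token is allowed by some production of the grammar, but the production is a restricted production and the token (the restricted token) is separated from the previous token by at least one `LineTerminator`, a semicolon is automatically inserted before the restricted token.

Restricted productions: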
```
UpdateExpression :
    LeftHandSideExpression [no LineTerminator here] ++
    LeftHandSideExpression [no LineTerminator here] --

ContinueStatement :
    continue ;
    continue [no LineTerminator here] LabelIdentifier ;

BreakStatement :
    break ;
    break [no LineTerminator here] LabelIdentifier ;

ReturnStatement :
    return ;
    return [no LineTerminator here] Expression ;

ThrowStatement :
    throw [no LineTerminator here] Expression ;

ArrowFunction :
    ArrowParameters [no LineTerminator here] => ConciseBody

YieldExpression :
    yield [no LineTerminator here] * AssignmentExpression
    yield [no LineTerminator here] AssignmentExpression
```
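Several of these restricted productions can be observed by asking the `Function` constructor to parse a body without running it. `parses` and `breakLabel` are hypothetical helpers written for this sketch:

```javascript
// Hypothetical helper: true if `body` parses as a function body.
// The Function constructor parses the code without running it.
function parses(body) {
  try {
    new Function(body)
    return true
  } catch (e) {
    if (e instanceof SyntaxError) return false
    throw e
  }
}

// ThrowStatement: throw [no LineTerminator here] Expression
// A line break after `throw` yields `throw;`, which is not a valid statement.
console.log(parses("throw new Error('x')"))  // true
console.log(parses("throw\nnew Error('x')")) // false

// ArrowFunction: ArrowParameters [no LineTerminator here] => ConciseBody
console.log(parses("return (x) => x"))  // true
console.log(parses("return (x)\n=> x")) // false

// BreakStatement: break [no LineTerminator here] LabelIdentifier
// With a line break this becomes `break; outer;`: it still parses,
// but only the inner loop is exited, so the outer loop keeps running.
function breakLabel() {
  let innerIterations = 0
  outer: for (let i = 0; i < 3; i++) {
    for (let j = 0; j < 3; j++) {
      innerIterations++
      break
      outer
    }
  }
  return innerIterations
}

console.log(breakLabel()) // 3 (it would be 1 with `break outer` on one line)
```

The `break` case is the sneakiest of the three: instead of a syntax error, the program silently does something different.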
The classic example, with a `ReturnStatement`:

```javascript
return
  "something";
```

is transformed to

```javascript
return;
  "something";
```
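The same transformation is visible at run time. `something` and `somethingFixed` are hypothetical function names for this sketch:

```javascript
// ASI inserts a semicolon right after `return`; the string literal
// becomes a separate, unreachable expression statement.
function something() {
  return
    "something"
}

// Keeping the expression on the same line as `return` avoids the problem.
function somethingFixed() {
  return "something"
}

console.log(something())      // undefined
console.log(somethingFixed()) // something
```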
直接来自ECMA-262,第五版ECMAScript规范:
7.9.1 Rules of Automatic Semicolon Insertion
There are three basic rules of semicolon insertion:
1. When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
   - The offending token is separated from the previous token by at least one LineTerminator.
   - The offending token is }.
2. When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.
3. When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation "[no LineTerminator here]" within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.

However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see 12.6.3).
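The overriding condition at the end of the quoted rules can be checked with a parse-only helper. `isValidBody` is hypothetical and relies on the `Function` constructor parsing a body without running it; the inputs follow the examples in the same edition's examples section (§7.9.2):

```javascript
// Hypothetical helper: true if `body` parses as a function body.
function isValidBody(body) {
  try {
    new Function(body)
    return true
  } catch (e) {
    if (e instanceof SyntaxError) return false
    throw e
  }
}

// A semicolon is never inserted into a `for (...)` header,
// even though `)` after a line break is an offending token.
console.log(isValidBody("for (a; b\n) {}"))  // false
console.log(isValidBody("for (a; b;\n) {}")) // true

// A semicolon is never inserted if it would be parsed as an empty statement.
console.log(isValidBody("if (a > b)\nelse c = d"))   // false
console.log(isValidBody("if (a > b) ;\nelse c = d")) // true
```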
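I couldn't understand those 3 rules in the spec too well (I was hoping for something in plainer English), but this is what I gathered from JavaScript: The Definitive Guide, 6th Edition, David Flanagan, O'Reilly, 2011:

Quote: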
JavaScript does not treat every line break as a semicolon: it usually treats line breaks as semicolons only if it can’t parse the code without the semicolons.
Another quote, for the code

```javascript
var a
a
=
3
console.log(a)
```
JavaScript does not treat the second line break as a semicolon because it can continue parsing the longer statement a = 3;
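Flanagan's example can be run directly; here it is wrapped in a hypothetical `flanaganExample` function so that `a` stays local:

```javascript
// Parsed as `var a; a = 3;`: the first line break must become a
// semicolon (`var a a` is invalid), but the following ones need not,
// because `a = 3` can continue across them.
function flanaganExample() {
  var a
  a
  =
  3
  return a
}

console.log(flanaganExample()) // 3
```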
And:
two exceptions to the general rule that JavaScript interprets line breaks as semicolons when it cannot parse the second line as a continuation of the statement on the first line. The first exception involves the return, break, and continue statements
... If a line break appears after any of these words ... JavaScript will always interpret that line break as a semicolon.
... The second exception involves the ++ and -- operators ... If you want to use either of these operators as postfix operators, they must appear on the same line as the expression they apply to. Otherwise, the line break will be treated as a semicolon, and the ++ or -- will be parsed as a prefix operator applied to the code that follows. Consider this code, for example:
```javascript
x
++
y
```

It is parsed as `x; ++y;`, not as `x++; y`.
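Running the snippet shows which variable is incremented. The starting values are arbitrary choices for this sketch:

```javascript
let x = 1
let y = 1

// Parsed as `x; ++y;`, not `x++; y;`
x
++
y

console.log(x, y) // 1 2
```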
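So, to simplify it, I think this means:

In general, JavaScript will treat a line as a continuation of the code as long as that makes sense, except in 2 cases: (1) after certain keywords such as `return`, `break`, and `continue`, and (2) when it sees `++` or `--` on a new line, in which case it adds a `;` at the end of the previous line.

The part about "treat it as a continuation of the code as long as it makes sense" makes it feel like the greedy matching of regular expressions.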
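The "greedy" behaviour can surprise in the other direction too: a line that starts with `(` continues the previous line as a function call, because that is valid grammar and no semicolon is needed. `greedy`, `double`, and `result` are hypothetical names for this sketch:

```javascript
function greedy() {
  const double = (n) => n * 2
  // Meant as two statements, but `(21)` continues the previous line,
  // so this is parsed as `const result = double(21)`.
  const result = double
  (21)
  return result
}

console.log(greedy()) // 42
```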
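With the above said, for a `return` followed by a line break, the JavaScript interpreter will insert a `;` (quoting again: "If a line break appears after any of these words [such as `return`] ... JavaScript will always interpret that line break as a semicolon"). For this reason, the classic example of

```javascript
return
{
  foo: 1
}
```

will not work as expected, because the JavaScript interpreter will treat it as:

```javascript
return;   // returning nothing
{
  foo: 1
}
```

There should be no line break immediately after `return`:

```javascript
return {
  foo: 1
}
```

for it to work properly. And you may insert the `;` yourself if you follow the rule of using a `;` after every statement:

```javascript
return {
  foo: 1
};
```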
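The two versions can be compared side by side. In the broken one, `{ foo: 1 }` is not an object literal at all: after the inserted semicolon it is parsed as a block containing the label `foo:` and the expression statement `1`. The function names are hypothetical:

```javascript
function broken() {
  return
  {
    foo: 1
  }
}

function fixed() {
  return {
    foo: 1
  }
}

console.log(broken())    // undefined
console.log(fixed().foo) // 1
```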
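Regarding semicolon insertion and the `var` statement, beware of forgetting the comma when a `var` statement spans multiple lines. Somebody found this in my code yesterday:

```javascript
var srcRecords = src.records
    srcIds = [];
```

It ran, but the effect was that the `srcIds` declaration/assignment was global: because of automatic semicolon insertion, the `var` statement was considered complete at the end of the first line, so its local declaration no longer applied to `srcIds`. (This happens in sloppy mode; in strict mode the assignment throws a `ReferenceError` instead.)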
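A sketch reproducing that bug in both modes. It uses the `Function` constructor because functions created that way are sloppy-mode unless their own body starts with a `"use strict"` directive; `src` is a hypothetical stand-in object and the helper names are invented for the example:

```javascript
// Missing comma: ASI ends the `var` statement after `src.records`,
// so `srcIds = []` is a plain assignment to an undeclared name.
const body = `
  var srcRecords = src.records
      srcIds = []
  return srcRecords
`

const sloppyFn = new Function("src", body)
const strictFn = new Function("src", '"use strict"' + body)

// Strict mode: assigning to an undeclared name throws a ReferenceError.
// (Run this first, before the sloppy version creates the global.)
let strictError = null
try {
  strictFn({ records: [] })
} catch (e) {
  strictError = e
}

// Sloppy mode: the assignment silently creates a global variable.
sloppyFn({ records: [] })
const leaked = Array.isArray(globalThis.srcIds)
delete globalThis.srcIds // clean up the accidental global

// With the comma restored, both names are local to the function.
const withComma = new Function(
  "src",
  '"use strict"\nvar srcRecords = src.records,\n    srcIds = []\nreturn srcIds'
)

console.log(strictError instanceof ReferenceError) // true
console.log(leaked)                                // true
console.log(Array.isArray(withComma({ records: [] }))) // true
```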
The most contextual description of JavaScript's automatic semicolon insertion I have found comes from the book Crafting Interpreters.
JavaScript’s "automatic semicolon insertion" rule is the odd one. Where other languages assume most newlines are meaningful and only a few should be ignored in multi-line statements, JS assumes the opposite. It treats all of your newlines as meaningless whitespace unless it encounters a parse error. If it does, it goes back and tries turning the previous newline into a semicolon to get something grammatically valid.
He goes on to describe it the way you would describe a code smell.
This design note would turn into a design diatribe if I went into complete detail about how that even works, much less all the various ways that that is a bad idea. It’s a mess. JavaScript is the only language I know where many style guides demand explicit semicolons after every statement even though the language theoretically lets you elide them.