What is the expected behaviour for multiple set-returning functions in the SELECT clause?

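I'm trying to get a "cross join" from the results of two set-returning functions, but in some cases I don't get the "cross join"; see the examples below.

Behaviour 1: When the sets have the same length, the items of each set are matched one by one.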

postgres=# SELECT generate_series(1,3), generate_series(5,7) ORDER BY 1,2;
 generate_series | generate_series
-----------------+-----------------
               1 |               5
               2 |               6
               3 |               7
(3 rows)

Behaviour 2: When the sets have different lengths, the sets are "cross joined".

postgres=# SELECT generate_series(1,2), generate_series(5,7) ORDER BY 1,2;
 generate_series | generate_series
-----------------+-----------------
               1 |               5
               1 |               6
               1 |               7
               2 |               5
               2 |               6
               2 |               7
(6 rows)

I think I'm not understanding something here. Can someone explain this behaviour?

Another example, even weirder:

postgres=# SELECT generate_series(1,2) x, generate_series(1,4) y ORDER BY x,y;
 x | y
---+---
 1 | 1
 1 | 3
 2 | 2
 2 | 4
(4 rows)

I'm looking for an answer to the question in the title, ideally with a link to the documentation.


Postgres 10 or newer

Null values are added for the smaller sets. Demo with generate_series():

SELECT generate_series( 1,  2) AS row2
     , generate_series(11, 13) AS row3
     , generate_series(21, 24) AS row4;

row2 | row3 | row4
-----+------+-----
   1 |   11 |   21
   2 |   12 |   22
NULL |   13 |   23
NULL | NULL |   24

dbfiddle here

The manual for Postgres 10:

If there is more than one set-returning function in the query's select
list, the behavior is similar to what you get from putting the
functions into a single LATERAL ROWS FROM( ... ) FROM-clause item. For
each row from the underlying query, there is an output row using the
first result from each function, then an output row using the second
result, and so on. If some of the set-returning functions produce
fewer outputs than others, null values are substituted for the missing
data, so that the total number of rows emitted for one underlying row
is the same as for the set-returning function that produced the most
outputs. Thus the set-returning functions run "in lockstep" until they
are all exhausted, and then execution continues with the next
underlying row.

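This ends the traditionally odd behavior.

Postgres 9.6 or older

The number of result rows (somewhat surprisingly!) is the least common multiple of the sizes of all sets in the same SELECT list. (It only acts like a CROSS JOIN if no two set sizes share a common divisor greater than 1!) Demo: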
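
The manual's comparison with ROWS FROM can be tried directly. The following is only a sketch, assuming Postgres 9.4 or newer (where ROWS FROM is available); the table alias t and the column alias list are illustrative, not part of the original answer. It should give the same NULL-padded four rows as the Postgres 10 demo above:

```sql
-- Same three series, moved from the SELECT list into the FROM clause.
-- ROWS FROM steps through the functions in lockstep and pads
-- the shorter sets with NULL.
SELECT *
FROM   ROWS FROM (
          generate_series( 1,  2)
        , generate_series(11, 13)
        , generate_series(21, 24)
       ) AS t(row2, row3, row4);
```

Unlike the SELECT-list form, this behaves the same before and after Postgres 10.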

SELECT generate_series( 1,  2) AS row2
     , generate_series(11, 13) AS row3
     , generate_series(21, 24) AS row4;

row2 | row3 | row4
-----+------+-----
   1 |   11 |   21
   2 |   12 |   22
   1 |   13 |   23
   2 |   11 |   24
   1 |   12 |   21
   2 |   13 |   22
   1 |   11 |   23
   2 |   12 |   24
   1 |   13 |   21
   2 |   11 |   22
   1 |   12 |   23
   2 |   13 |   24

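dbfiddle here

This is documented in the Postgres 9.6 manual, in the chapter on SQL functions returning sets, along with the recommendation to avoid it: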

Note: The key problem with using set-returning functions in the select
list, rather than the FROM clause, is that putting more than one
set-returning function in the same select list does not behave very
sensibly. (What you actually get if you do so is a number of output
rows equal to the least common multiple of the numbers of rows
produced by each set-returning function.) The LATERAL syntax produces
less surprising results when calling multiple set-returning functions,
and should usually be used instead.

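Bold emphasis mine.

A single set-returning function is fine (though still cleaner in the FROM list), but multiple set-returning functions in the same SELECT list are now discouraged. This was a useful feature before we had LATERAL joins. Now it's merely historical ballast.

Related:

  • Parallel unnest() and sort order in PostgreSQL
  • Unnest multiple arrays in parallel
  • What is the difference between LATERAL and a subquery in PostgreSQL?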
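
For the cross join the question was after, putting both functions into the FROM list avoids the SELECT-list rules altogether. A minimal sketch, with the aliases x and y chosen for illustration:

```sql
-- Each function is its own FROM item, so every x is paired with every y,
-- regardless of the set sizes or the Postgres version.
SELECT x, y
FROM   generate_series(1, 2) AS x
CROSS  JOIN generate_series(5, 7) AS y
ORDER  BY x, y;
```

This returns 6 rows (2 x 3). With generate_series(1, 3) and generate_series(5, 7) it returns 9 rows, where the SELECT-list form from the question returns only 3.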

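The documentation contains only one note on this issue. I'm not sure whether it explains the described behaviour. Perhaps more important is that using functions this way is deprecated: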

Currently, functions returning sets can also be called in the select list of a query. For each row that the query generates by itself, the function returning set is invoked, and an output row is generated for each element of the function's result set. Note, however, that this capability is deprecated and might be removed in future releases.


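I cannot find any documentation for this. However, I can describe the behaviour that I observe.

Each set-generating function returns a finite number of rows. Postgres seems to run the set-generating functions until all of them are on their last row at the same time or, more likely, stops when all of them are back at their first row. Technically, this would be the least common multiple (LCM) of the series lengths.

I'm not sure why this is the case. And, as I said in a comment, I think it is generally better to put the functions in the FROM clause.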
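
The LCM observation can be checked against the three examples in the question. This is only a sketch of the arithmetic, and it assumes Postgres 13 or newer, where the built-in lcm() function exists; the column aliases are illustrative:

```sql
-- Set sizes taken from the three queries in the question.
SELECT lcm(3, 3) AS same_length        -- generate_series(1,3), generate_series(5,7)
     , lcm(2, 3) AS no_common_divisor  -- generate_series(1,2), generate_series(5,7)
     , lcm(2, 4) AS one_divides_other; -- generate_series(1,2), generate_series(1,4)
```

The result is 3, 6 and 4, which matches the row counts reported in the question for the old behaviour.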