DISTINCT ON in an aggregate function in postgres
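For my problem, we have a schema in which one photo has many tags and also many comments. So if I write a query that fetches all the comments and all the tags, the rows are multiplied together. If one photo has 2 tags and 13 comments, I get 26 rows for that one photo: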
    SELECT
      tag.name,
      comment.comment_id
    FROM photo
      LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
      LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
      LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
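To reproduce the row multiplication, here is a minimal sketch. The table definitions are assumptions (only the columns used in the queries of this thread, with guessed types and constraints); the real schema is not shown in the question.

```sql
-- Hypothetical minimal schema: column types and constraints are assumptions.
CREATE TABLE photo (
  photo_id int PRIMARY KEY
);
CREATE TABLE tag (
  tag_id int PRIMARY KEY,
  name   text NOT NULL
);
CREATE TABLE photo_tag (
  photo_id int REFERENCES photo,
  tag_id   int REFERENCES tag,
  PRIMARY KEY (photo_id, tag_id)
);
CREATE TABLE comment (
  comment_id int PRIMARY KEY,
  photo_id   int REFERENCES photo,
  comment    text
);

-- One photo with 2 tags and 13 comments.
INSERT INTO photo VALUES (1);
INSERT INTO tag VALUES (1, 'suburban'), (2, 'city');
INSERT INTO photo_tag VALUES (1, 1), (1, 2);
INSERT INTO comment
SELECT g, 1, 'comment ' || g::text
FROM generate_series(1, 13) AS g;

-- Each comment row is paired with each tag row: 13 * 2 = 26.
SELECT count(*) AS row_count
FROM photo
  LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
  LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
  LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id;
```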
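This is fine for most things, but it means that if I GROUP BY the photo and then json_agg() the tags, every tag is repeated once per comment: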
    SELECT json_agg(tag.name) AS tags
    FROM photo
      LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
      LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
      LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
    GROUP BY photo.photo_id
Instead, I want an array containing only 'suburban' and 'city', like this:
    [
      {"tag_id":1,"name":"suburban"},
      {"tag_id":2,"name":"city"}
    ]
So how can I simulate DISTINCT ON inside an aggregate function in Postgres?
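These problems with duplicated rows come up whenever you have a central table and want to left join it to many rows in table A and also to many rows in table B. If you are not careful, it can especially throw off aggregates such as count() and sum(). One way around it is to aggregate the tags and the comments separately, each in its own CTE, and then join the results: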
    WITH tags AS (
      SELECT photo.photo_id, json_agg(row_to_json(tag.*)) AS tags
      FROM photo
        LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
        LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
      GROUP BY photo.photo_id
    ),
    comments AS (
      SELECT photo.photo_id, json_agg(row_to_json(comment.*)) AS comments
      FROM photo
        LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
      GROUP BY photo.photo_id
    )
    SELECT COALESCE(tags.photo_id, comments.photo_id) AS photo_id,
           tags.tags,
           comments.comments
    FROM tags
      FULL OUTER JOIN comments ON tags.photo_id = comments.photo_id
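To see how joining both child tables at once inflates aggregates, a small sketch can compare a plain count() with count(DISTINCT ...). The output column names are made up for illustration; the tables and join conditions are the ones used throughout this thread.

```sql
SELECT photo.photo_id,
       count(comment.comment_id)          AS inflated_comment_count, -- counts every joined row
       count(DISTINCT comment.comment_id) AS comment_count,          -- counts each comment once
       count(DISTINCT photo_tag.tag_id)   AS tag_count
FROM photo
  LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
  LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
GROUP BY photo.photo_id;
```

For a photo with 2 tags and 13 comments, the first count sees all 26 joined rows, while the DISTINCT variants report 13 and 2.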
Edit: If you really want to join everything without CTEs, this looks like it gives the correct result:
    SELECT photo.photo_id,
           to_json(array_agg(DISTINCT tag.*)) AS tags,
           to_json(array_agg(DISTINCT comment.*)) AS comments
    FROM photo
      LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
      LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
      LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
    GROUP BY photo.photo_id
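The cheapest and simplest way to get distinct results is not to multiply rows in the first place: aggregate first, then join. See:
- Two SQL LEFT JOINS produce incorrect result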
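Best for returning a few selected rows
Assuming you do not actually want to retrieve the whole table, but only one or a few selected photos at a time, with aggregated details, the most elegant and probably fastest way is to use LATERAL subqueries: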
    SELECT *
    FROM   photo p
    CROSS  JOIN LATERAL (
       SELECT json_agg(c) AS comments
       FROM   comment c
       WHERE  photo_id = p.photo_id
       ) c1
    CROSS  JOIN LATERAL (
       SELECT json_agg(t) AS tags
       FROM   photo_tag pt
       JOIN   tag       t USING (tag_id)
       WHERE  pt.photo_id = p.photo_id
       ) t
    WHERE  p.photo_id = 2;  -- arbitrary selection
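This returns whole rows from comment and tag, aggregated into separate JSON arrays. Rows are not multiplied, but they are only as distinct as they are in the base tables.
To additionally fold duplicates in the underlying data, see below.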
Notes:
- LATERAL and json_agg() require Postgres 9.3 or later.
- json_agg(c) is short for json_agg(c.*).
- We do not need LEFT JOIN, because an aggregate function like json_agg() always returns a row.
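The last note can be checked with a short sketch. It assumes the comment and photo tables used throughout this thread; the COALESCE variant is an optional addition for callers who prefer an empty array over NULL.

```sql
-- Without GROUP BY, an aggregate returns exactly one row, even when no
-- input rows qualify; json_agg() then yields NULL.
SELECT json_agg(c) AS comments
FROM   comment c
WHERE  false;

-- So CROSS JOIN LATERAL keeps photos that have no comments. COALESCE
-- turns the NULL into an empty JSON array.
SELECT p.photo_id, c1.comments
FROM   photo p
CROSS  JOIN LATERAL (
   SELECT COALESCE(json_agg(c), '[]') AS comments
   FROM   comment c
   WHERE  c.photo_id = p.photo_id
   ) c1;
```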
Typically, you only want a subset of columns, at least excluding the redundant photo_id:
    SELECT *
    FROM   photo p
    CROSS  JOIN LATERAL (
       SELECT json_agg(json_build_object('comment_id', comment_id
                                       , 'comment', comment)) AS comments
       FROM   comment
       WHERE  photo_id = p.photo_id
       ) c
    CROSS  JOIN LATERAL (
       SELECT json_agg(t) AS tags
       FROM   photo_tag pt
       JOIN   tag       t USING (tag_id)
       WHERE  pt.photo_id = p.photo_id
       ) t
    WHERE  p.photo_id = 2;
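json_build_object() was introduced with Postgres 9.4. Related:
- Return as array of JSON objects in SQL (Postgres)
It also lets you choose JSON key names freely; you do not have to stick to the column names.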
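As a small illustration of free key names, the sketch below maps the columns to the keys 'id' and 'text'. Those two key names are arbitrary examples, not something used elsewhere in this thread.

```sql
-- Keys in the JSON objects are chosen independently of the column names.
SELECT json_agg(json_build_object('id', comment_id
                                , 'text', comment)) AS comments
FROM   comment
WHERE  photo_id = 2;
```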
Best for returning the whole table
To return all rows, this is more efficient:
    SELECT p.*
         , COALESCE(c1.comments, '[]') AS comments
         , COALESCE(t.tags, '[]') AS tags
    FROM   photo p
    LEFT   JOIN (
       SELECT photo_id
            , json_agg(json_build_object('comment_id', comment_id
                                       , 'comment', comment)) AS comments
       FROM   comment c
       GROUP  BY 1
       ) c1 USING (photo_id)
    LEFT   JOIN (  -- plain subquery: it does not reference p, so LATERAL is not needed
       SELECT photo_id
            , json_agg(t) AS tags
       FROM   photo_tag pt
       JOIN   tag       t USING (tag_id)
       GROUP  BY 1
       ) t USING (photo_id);
Once we retrieve enough rows, this becomes cheaper than LATERAL subqueries.
Note the USING clause in the join condition.
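One practical effect of joining with USING, sketched under the assumption that the point concerns duplicate join columns: the shared column appears only once in the joined row, so SELECT * stays free of a second photo_id. The comment_count column is a made-up example.

```sql
-- With USING, photo_id appears once in the result of SELECT *.
SELECT *
FROM   photo p
LEFT   JOIN (
   SELECT photo_id, count(*) AS comment_count
   FROM   comment
   GROUP  BY 1
   ) c USING (photo_id);

-- With ON c.photo_id = p.photo_id instead, SELECT * would return
-- photo_id twice (once from p and once from c).
```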
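Also removing existing duplicates in the base tables
You cannot simply apply DISTINCT inside json_agg(), because the data type json has no equality operator. See:
- How to query a json column for empty objects?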
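A minimal sketch of the limitation, assuming the comment table from this thread: DISTINCT on json values raises an error, while jsonb values can be compared.

```sql
-- Fails: DISTINCT needs an equality operator, which json does not have.
-- SELECT json_agg(DISTINCT to_json(c.comment)) FROM comment c;

-- Works: jsonb values can be compared
-- (jsonb_agg() and to_jsonb() need Postgres 9.5 or later).
SELECT jsonb_agg(DISTINCT to_jsonb(c.comment)) AS comments
FROM   comment c
WHERE  c.photo_id = 2;
```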
There are various better ways:
    SELECT *
    FROM   photo p
    CROSS  JOIN LATERAL (
       SELECT json_agg(to_json(c1.comment))                      AS comments1
            , json_agg(json_build_object('comment', c1.comment)) AS comments2
            , json_agg(to_json(c1))                              AS comments3
       FROM  (
          SELECT DISTINCT c.comment  -- folding dupes here
          FROM   comment c
          WHERE  c.photo_id = p.photo_id
       -- ORDER  BY comment          -- any particular order?
          ) c1
       ) c2
    CROSS  JOIN LATERAL (
       SELECT jsonb_agg(DISTINCT t) AS tags  -- demonstrating jsonb_agg
       FROM   photo_tag pt
       JOIN   tag       t USING (tag_id)
       WHERE  pt.photo_id = p.photo_id
       ) t
    WHERE  p.photo_id = 2;
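As noted in the comments, json_agg does not serialize a row as an object; it builds a JSON array from the values passed to it. You need row_to_json to turn the row into a JSON object, and then json_agg to aggregate the objects into an array: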
    -- json has no equality operator, so cast to jsonb to let DISTINCT compare values
    SELECT json_agg(DISTINCT row_to_json(comment)::jsonb) AS tags
    FROM photo
      LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
      LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
      LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
    GROUP BY photo.photo_id