SQL Server: How to Join to first row
I'll use a concrete, but hypothetical, example.
Each order normally has only one line item:
Orders:
    OrderGUID   OrderNumber
    =========   ============
    {FFB2...}   STL-7442-1
    {3EC6...}   MPT-9931-8A
LineItems:
    LineItemGUID   Order ID   Quantity   Description
    ============   ========   ========   =================================
    {098FBE3...}   1          7          prefabulated amulite
    {1609B09...}   2          32         spurving bearing
But occasionally there will be an order with two line items:
    LineItemID   Order ID    Quantity   Description
    ==========   ========    ========   =================================
    {A58A1...}   6,784,329   5          pentametric fan
    {0E9BC...}   6,784,329   5          differential girdlespring
Normally, when showing the orders to the user:
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN LineItems
        ON Orders.OrderID = LineItems.OrderID
I want to show the single item on the order. But with this occasional order containing two (or more) items, the order appears duplicated:
    OrderNumber   Quantity   Description
    ===========   ========   ====================
    STL-7442-1    7          prefabulated amulite
    MPT-9931-8A   32         spurving bearing
    KSG-0619-81   5          pentametric fan
    KSG-0619-81   5          differential girdlespring
What I really want is for SQL Server to just pick one, since that is good enough:
    OrderNumber   Quantity   Description
    ===========   ========   ====================
    STL-7442-1    7          prefabulated amulite
    MPT-9931-8A   32         spurving bearing
    KSG-0619-81   5          pentametric fan
If I get adventurous, I might show the user an ellipsis to indicate that there is more than one:
    OrderNumber   Quantity   Description
    ===========   ========   ====================
    STL-7442-1    7          prefabulated amulite
    MPT-9931-8A   32         spurving bearing
    KSG-0619-81   5          pentametric fan, ...
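So the question is how to either
- eliminate "duplicate" rows, or
- join to only one of the rows, to avoid the duplication
First attempt
My first naive attempt was to join only to the "TOP 1" line item: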
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN (
            SELECT TOP 1 LineItems.Quantity, LineItems.Description
            FROM LineItems
            WHERE LineItems.OrderID = Orders.OrderID) LineItems2
        ON 1=1
But that gives the error:
The column or prefix 'Orders' does not
match with a table name or alias name
used in the query.
Presumably because the inner select doesn't see the outer table.
    SELECT  Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM    Orders
    JOIN    LineItems
    ON      LineItems.LineItemGUID =
            (
            SELECT  TOP 1 LineItemGUID
            FROM    LineItems
            WHERE   OrderID = Orders.OrderID
            )
In SQL Server 2005 and above, you can use CROSS APPLY instead of the join:
    SELECT  Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
    FROM    Orders
    CROSS APPLY
            (
            SELECT  TOP 1 LineItems.Quantity, LineItems.Description
            FROM    LineItems
            WHERE   LineItems.OrderID = Orders.OrderID
            ) LineItems2
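Please note that TOP 1 without ORDER BY is not deterministic: the query returns one line item per order, but which one is not defined.
Multiple invocations of the query can give you different line items for the same order, even if the underlying data did not change.
If you want a deterministic result, you should add an ORDER BY clause to the innermost query.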
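As a minimal sketch of that advice, the CROSS APPLY query can be given an ORDER BY inside the applied subquery. The table variables, the integer keys, and the sample rows below are assumptions made so the script runs on its own; the question itself uses GUID keys.

```sql
-- Hypothetical sample data; the int keys and column sizes are assumptions.
DECLARE @Orders TABLE (OrderID int PRIMARY KEY, OrderNumber varchar(20));
DECLARE @LineItems TABLE (LineItemID int PRIMARY KEY, OrderID int,
                          Quantity int, Description varchar(50));

INSERT INTO @Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
INSERT INTO @LineItems VALUES
    (10, 1, 7,  'prefabulated amulite'),
    (20, 2, 32, 'spurving bearing'),
    (30, 3, 5,  'pentametric fan'),
    (31, 3, 5,  'differential girdlespring');

SELECT  o.OrderNumber, li.Quantity, li.Description
FROM    @Orders AS o
CROSS APPLY
        (
        SELECT  TOP 1 l.Quantity, l.Description
        FROM    @LineItems AS l
        WHERE   l.OrderID = o.OrderID
        ORDER BY l.LineItemID   -- repeatable pick: the lowest LineItemID wins
        ) AS li;
```

Swapping CROSS APPLY for OUTER APPLY would also keep orders that have no line items at all, with NULL in the line item columns.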
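I know this question was answered a while ago, but when dealing with large data sets, nested queries can prove costly. Here is a different solution, where the nested query runs only once instead of once for each row returned.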
    SELECT
        Orders.OrderNumber,
        LineItems.Quantity,
        LineItems.Description
    FROM
        Orders
        INNER JOIN (
            -- one row per order: the highest LineItemID for that order
            SELECT
                Orders.OrderNumber,
                MAX(LineItems.LineItemID) AS LineItemID
            FROM
                Orders
                INNER JOIN LineItems
                ON Orders.OrderID = LineItems.OrderID
            GROUP BY Orders.OrderNumber
        ) AS Items ON Orders.OrderNumber = Items.OrderNumber
        INNER JOIN LineItems
        ON Items.LineItemID = LineItems.LineItemID
You could do:
    SELECT
        Orders.OrderNumber,
        LineItems.Quantity,
        LineItems.Description
    FROM
        Orders
        INNER JOIN LineItems
        ON Orders.OrderID = LineItems.OrderID
    WHERE
        LineItems.LineItemID = (
            SELECT MIN(LineItemID)
            FROM LineItems
            WHERE OrderID = Orders.OrderID
        )
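This requires an index (or primary key) on LineItems.LineItemID and an index on LineItems.OrderID, or it will be slow.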
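One way to support the correlated MIN lookup above might be a single composite index, so that the smallest LineItemID for a given OrderID can be found with a seek. The index name is hypothetical, and the statement assumes the LineItems table has the OrderID and LineItemID columns used in the query.

```sql
-- Hypothetical index name; assumes LineItems(OrderID, LineItemID) as used in the query above.
CREATE INDEX IX_LineItems_OrderID_LineItemID
    ON LineItems (OrderID, LineItemID);
```

Whether this helps in practice depends on the data and on the indexes that already exist, so it is worth checking the execution plan before and after.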
@Quassnoi's answer is good. In some cases (especially if the outer table is big), a query using windowed functions might be more efficient, like this:
    SELECT  Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
    FROM    Orders
    LEFT JOIN
            (
            SELECT  LineItems.Quantity, LineItems.Description, OrderId,
                    ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY (SELECT NULL)) AS RowNum
            FROM    LineItems
            ) LineItems2
    ON      LineItems2.OrderId = Orders.OrderID AND RowNum = 1
Sometimes you just need to test which query gives better performance.
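A rough way to run such a comparison is to switch on SQL Server's I/O and timing statistics and execute the candidates back to back. This sketch assumes the Orders and LineItems tables from the question exist; the aliases are only for illustration.

```sql
-- Assumes the Orders and LineItems tables from the question exist.
SET STATISTICS IO ON;    -- report page reads per table in the Messages output
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

-- Candidate 1: correlated TOP 1 via CROSS APPLY
SELECT  o.OrderNumber, li.Quantity, li.Description
FROM    Orders AS o
CROSS APPLY
        (
        SELECT  TOP 1 l.Quantity, l.Description
        FROM    LineItems AS l
        WHERE   l.OrderID = o.OrderID
        ) AS li;

-- Candidate 2: ROW_NUMBER over the whole LineItems table
SELECT  o.OrderNumber, li.Quantity, li.Description
FROM    Orders AS o
LEFT JOIN
        (
        SELECT  l.OrderID, l.Quantity, l.Description,
                ROW_NUMBER() OVER (PARTITION BY l.OrderID ORDER BY (SELECT NULL)) AS RowNum
        FROM    LineItems AS l
        ) AS li
ON      li.OrderID = o.OrderID AND li.RowNum = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Note that the two candidates are not strictly equivalent: the LEFT JOIN version keeps orders without line items, while CROSS APPLY drops them.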
Another approach, using a common table expression:
    WITH firstOnly AS (
        SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description,
               ROW_NUMBER() OVER (PARTITION BY Orders.OrderID ORDER BY Orders.OrderID) lp
        FROM Orders
            JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    )
    SELECT *
    FROM firstOnly
    WHERE lp = 1
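The same windowed approach could also cover the ellipsis idea from the question: a COUNT(*) OVER the same partition says how many line items each order has. This is only a sketch; the table variables, integer keys, sample rows, and the names ranked, rn, and item_count are assumptions.

```sql
-- Hypothetical sample data; the int keys and column sizes are assumptions.
DECLARE @Orders TABLE (OrderID int PRIMARY KEY, OrderNumber varchar(20));
DECLARE @LineItems TABLE (LineItemID int PRIMARY KEY, OrderID int,
                          Quantity int, Description varchar(50));

INSERT INTO @Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
INSERT INTO @LineItems VALUES
    (10, 1, 7,  'prefabulated amulite'),
    (20, 2, 32, 'spurving bearing'),
    (30, 3, 5,  'pentametric fan'),
    (31, 3, 5,  'differential girdlespring');

WITH ranked AS (
    SELECT  o.OrderNumber, l.Quantity, l.Description,
            -- number the line items of each order, lowest LineItemID first
            ROW_NUMBER() OVER (PARTITION BY o.OrderID ORDER BY l.LineItemID) AS rn,
            -- total number of line items on the same order
            COUNT(*) OVER (PARTITION BY o.OrderID) AS item_count
    FROM    @Orders AS o
    JOIN    @LineItems AS l ON l.OrderID = o.OrderID
)
SELECT  OrderNumber,
        Quantity,
        Description + CASE WHEN item_count > 1 THEN ', ...' ELSE '' END AS Description
FROM    ranked
WHERE   rn = 1;
```

With this sample data, only KSG-0619-81 has more than one line item, so only its description gets the trailing ", ...".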
Or, in the end, maybe you would like to show all of the joined rows?
Here is a comma-separated version:
    SELECT *
    FROM Orders o
        CROSS APPLY (
            SELECT CAST((SELECT l.Description + ','
                         FROM LineItems l
                         WHERE l.OrderID = o.OrderID   -- correlate on the outer alias o
                         FOR XML PATH('')) AS nvarchar(MAX)) l
        ) lines
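On SQL Server 2017 and later, STRING_AGG is a possible alternative to the FOR XML PATH trick for building the comma-separated list. The sketch below uses made-up table variables, integer keys, and sample rows so that it runs on its own.

```sql
-- Hypothetical sample data; the int keys and column sizes are assumptions.
DECLARE @Orders TABLE (OrderID int PRIMARY KEY, OrderNumber varchar(20));
DECLARE @LineItems TABLE (LineItemID int PRIMARY KEY, OrderID int,
                          Quantity int, Description varchar(50));

INSERT INTO @Orders VALUES (1, 'STL-7442-1'), (3, 'KSG-0619-81');
INSERT INTO @LineItems VALUES
    (10, 1, 7, 'prefabulated amulite'),
    (30, 3, 5, 'pentametric fan'),
    (31, 3, 5, 'differential girdlespring');

-- One row per order, with all descriptions joined in LineItemID order.
SELECT  o.OrderNumber,
        STRING_AGG(l.Description, ', ') WITHIN GROUP (ORDER BY l.LineItemID) AS Descriptions
FROM    @Orders AS o
JOIN    @LineItems AS l ON l.OrderID = o.OrderID
GROUP BY o.OrderNumber;
```

Unlike the FOR XML PATH version, this leaves no trailing comma and does not XML-escape characters such as & or <.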
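Correlated subqueries are subqueries that depend on the outer query. It's like a for loop in SQL: the subquery runs once for each row in the outer query: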
    SELECT * FROM users
    JOIN widgets ON widgets.id = (
        SELECT id FROM widgets
        WHERE widgets.user_id = users.id
        ORDER BY created_at DESC
        LIMIT 1   -- PostgreSQL/MySQL syntax; SQL Server uses SELECT TOP 1 id ... instead
    )
EDIT: never mind, Quassnoi has a better answer.
For SQL2K, something like this:
    SELECT
      Orders.OrderNumber
    , LineItems.Quantity
    , LineItems.Description
    FROM (
      SELECT
        Orders.OrderID
      , Orders.OrderNumber
      , FirstLineItemID = (
          SELECT TOP 1 LineItemID
          FROM LineItems
          WHERE LineItems.OrderID = Orders.OrderID
          ORDER BY LineItemID -- or whatever else
          )
      FROM Orders
      ) Orders
    JOIN LineItems
      ON LineItems.OrderID = Orders.OrderID
     AND LineItems.LineItemID = Orders.FirstLineItemID
From SQL Server 2012 onwards, I think this will do the trick:
    SELECT DISTINCT
        o.OrderNumber,
        FIRST_VALUE(li.Quantity) OVER (PARTITION BY o.OrderNumber ORDER BY li.Description) AS Quantity,
        FIRST_VALUE(li.Description) OVER (PARTITION BY o.OrderNumber ORDER BY li.Description) AS Description
    FROM Orders AS o
        INNER JOIN LineItems AS li ON o.OrderID = li.OrderID
My favorite way to run this query is with a NOT EXISTS clause. I believe this is the most efficient way to run this sort of query:
    SELECT o.OrderNumber,
           li.Quantity,
           li.Description
    FROM Orders AS o
        INNER JOIN LineItems AS li
        ON li.OrderID = o.OrderID
    WHERE NOT EXISTS (
        SELECT 1
        FROM LineItems AS li_later
        WHERE li_later.OrderID = o.OrderID
          AND li_later.LineItemGUID > li.LineItemGUID
    )
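But I have not tested this method against the other methods suggested here.
I solved a similar problem by using LEFT JOIN and GROUP BY Orders.OrderNumber. Is there a reason not to do it this way?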
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        LEFT JOIN LineItems
        ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber
I'll answer your question using the example from your own question:
    Orders             LineItems
    +-------------+    +---------+----------+---------------+
    | OrderNumber |    | OrderID | Quantity | Description   |
    +-------------+    +---------+----------+---------------+
    | 22586       |    | 22586   | 17       | Trunion       |
    +-------------+    | 22586   | 3        | Girdle Spring |
                       +---------+----------+---------------+
Joining the two together on OrderNumber gives:
    OrderNumber   Quantity   Description
    -----------   --------   -------------
    22586         17         Trunion
    22586         3          Girdle Spring

    2 row(s) affected
Where we wanted it to return only one row:
    OrderNumber   Quantity   Description
    -----------   --------   -------------
    22586         17         Trunion

    1 row(s) affected
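That is why I use GROUP BY Orders.OrderNumber, which returns only one row per OrderNumber.
Tried the CROSS APPLY: it works nicely, but takes slightly longer. I adjusted the line item columns to use MAX and added a GROUP BY, which kept the speed and dropped the extra record.
Here's the adjusted query: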
    SELECT Orders.OrderNumber, MAX(LineItems.Quantity), MAX(LineItems.Description)
    FROM Orders
        INNER JOIN LineItems
        ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber
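One caveat with applying MAX to each column separately: each aggregate is computed on its own, so the Quantity and the Description in the result may come from different line items. A small sketch with made-up rows (the values are hypothetical and chosen to show the effect):

```sql
-- Hypothetical rows, chosen so the two MAX values come from different line items.
DECLARE @LineItems TABLE (OrderID int, Quantity int, Description varchar(50));

INSERT INTO @LineItems VALUES
    (22586, 17, 'Girdle Spring'),
    (22586, 3,  'Trunion');

SELECT  OrderID,
        MAX(Quantity)    AS Quantity,      -- 17, taken from the 'Girdle Spring' row
        MAX(Description) AS Description    -- 'Trunion', taken from the other row
FROM    @LineItems
GROUP BY OrderID;
```

The result pairs 17 with 'Trunion', a combination that matches neither stored row. If the displayed values have to belong to the same line item, one of the TOP 1 or ROW_NUMBER approaches is probably the safer choice.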