Retrieving the last record in each group - MySQL
There is a table `messages` that contains the following data:

    Id   Name   Other_Columns
    -------------------------
    1    A      A_data_1
    2    A      A_data_2
    3    A      A_data_3
    4    B      B_data_1
    5    B      B_data_2
    6    C      C_data_1
If I run a query that groups the rows by `Name`, I get this result:

    1    A      A_data_1
    4    B      B_data_1
    6    C      C_data_1
What query will return the following result?

    3    A      A_data_3
    5    B      B_data_2
    6    C      C_data_1

That is, the last record in each group should be returned.
At present, this is the query that I use:

    SELECT *
    FROM (SELECT * FROM messages ORDER BY id DESC) AS x
    GROUP BY name

But this looks highly inefficient. Are there other ways to achieve the same result?
MySQL 8.0 now supports window functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:

    WITH ranked_messages AS (
      SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
      FROM messages AS m
    )
    SELECT * FROM ranked_messages WHERE rn = 1;
Below is the original answer I wrote for this question in 2009:
I write the solution this way:

    SELECT m1.*
    FROM messages m1 LEFT JOIN messages m2
      ON (m1.name = m2.name AND m1.id < m2.id)
    WHERE m2.id IS NULL;
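Regarding performance, one solution or the other can be better, depending on the nature of your data. So you should test both queries and use the one that performs better on your database.
For example, I have a copy of the StackOverflow August data dump. I'll use it for benchmarking.
I'll write a query to find the most recent post for a given user ID (mine).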
First, using the technique shown by @Eric, with the `GROUP BY` in a subquery:

    SELECT p1.postid
    FROM Posts p1
    INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
                FROM Posts pi GROUP BY pi.owneruserid) p2
      ON (p1.postid = p2.maxpostid)
    WHERE p1.owneruserid = 20860;

    1 row in set (1 min 17.89 sec)
Even the `EXPLAIN` analysis takes over 16 seconds:

    +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
    | id | select_type | table      | type   | possible_keys              | key         | key_len | ref          | rows    | Extra       |
    +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
    |  1 | PRIMARY     | <derived2> | ALL    | NULL                       | NULL        | NULL    | NULL         |   76756 |             |
    |  1 | PRIMARY     | p1         | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY     | 8       | p2.maxpostid |       1 | Using where |
    |  2 | DERIVED     | pi         | index  | NULL                       | OwnerUserId | 8       | NULL         | 1151268 | Using index |
    +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
    3 rows in set (16.09 sec)
Now produce the same query result using my technique with `LEFT JOIN`:

    SELECT p1.postid
    FROM Posts p1 LEFT JOIN posts p2
      ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
    WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

    1 row in set (0.28 sec)

    +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
    | id | select_type | table | type | possible_keys              | key         | key_len | ref   | rows | Extra                                |
    +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
    |  1 | SIMPLE      | p1    | ref  | OwnerUserId                | OwnerUserId | 8       | const | 1384 | Using index                          |
    |  1 | SIMPLE      | p2    | ref  | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8       | const | 1384 | Using where; Using index; Not exists |
    +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
    2 rows in set (0.00 sec)
Here is the DDL for my `Posts` table:

    CREATE TABLE `posts` (
      `PostId` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
      `PostTypeId` BIGINT(20) UNSIGNED NOT NULL,
      `AcceptedAnswerId` BIGINT(20) UNSIGNED DEFAULT NULL,
      `ParentId` BIGINT(20) UNSIGNED DEFAULT NULL,
      `CreationDate` datetime NOT NULL,
      `Score` INT(11) NOT NULL DEFAULT '0',
      `ViewCount` INT(11) NOT NULL DEFAULT '0',
      `Body` text NOT NULL,
      `OwnerUserId` BIGINT(20) UNSIGNED NOT NULL,
      `OwnerDisplayName` VARCHAR(40) DEFAULT NULL,
      `LastEditorUserId` BIGINT(20) UNSIGNED DEFAULT NULL,
      `LastEditDate` datetime DEFAULT NULL,
      `LastActivityDate` datetime DEFAULT NULL,
      `Title` VARCHAR(250) NOT NULL DEFAULT '',
      `Tags` VARCHAR(150) NOT NULL DEFAULT '',
      `AnswerCount` INT(11) NOT NULL DEFAULT '0',
      `CommentCount` INT(11) NOT NULL DEFAULT '0',
      `FavoriteCount` INT(11) NOT NULL DEFAULT '0',
      `ClosedDate` datetime DEFAULT NULL,
      PRIMARY KEY (`PostId`),
      UNIQUE KEY `PostId` (`PostId`),
      KEY `PostTypeId` (`PostTypeId`),
      KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
      KEY `OwnerUserId` (`OwnerUserId`),
      KEY `LastEditorUserId` (`LastEditorUserId`),
      KEY `ParentId` (`ParentId`),
      CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
    ) ENGINE=InnoDB;
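To repeat this kind of comparison on your own data, a small timing harness can help. The sketch below is only an illustration: it runs the `ROW_NUMBER()` and `LEFT JOIN` queries against an in-memory SQLite database through Python's `sqlite3` module (chosen so the example is self-contained; it is not MySQL, and window functions need SQLite 3.25 or newer). The synthetic data, the index name `messages_name_id`, and the helper `timed` are assumptions made for the example, so the timings say nothing about how MySQL would behave.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)")
conn.execute("CREATE INDEX messages_name_id ON messages (name, id)")

# Hypothetical synthetic data: 200 groups with 50 rows each.
data = [(f"group_{i % 200}", f"data_{i}") for i in range(10_000)]
conn.executemany("INSERT INTO messages (name, other_columns) VALUES (?, ?)", data)

window_query = """
WITH ranked_messages AS (
  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
  FROM messages AS m
)
SELECT id, name FROM ranked_messages WHERE rn = 1 ORDER BY id
"""

left_join_query = """
SELECT m1.id, m1.name
FROM messages m1
LEFT JOIN messages m2 ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL
ORDER BY m1.id
"""

def timed(query):
    """Run the query once and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(query).fetchall()
    return rows, time.perf_counter() - start

window_rows, window_secs = timed(window_query)
join_rows, join_secs = timed(left_join_query)

print(f"window function: {len(window_rows)} rows in {window_secs:.4f}s")
print(f"left join:       {len(join_rows)} rows in {join_secs:.4f}s")
```

Checking that both queries return the same rows before comparing timings guards against benchmarking two queries that do not actually answer the same question.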
UPD 2017-03-31: version 5.7.5 of MySQL enabled the ONLY_FULL_GROUP_BY switch by default (hence, non-deterministic GROUP BY queries became disabled). Moreover, the GROUP BY implementation was updated, and the solution might no longer work as expected even with the switch disabled. One needs to check.
Bill Karwin's solution above works fine when the item count within groups is rather small, but the performance of the query becomes bad when the groups are rather large, since the solution requires about […] of only […].
I ran my tests on an InnoDB table of […] rows with […]. The table contains test results for functional tests and has […] as the primary key. Thus, […] is a group, and I was searching for the last […].
Bill's solution had already been running for several hours on my Dell E4310, and I do not know when it is going to finish, even though it operates on a covering index (hence […]).
I have a couple of other solutions that are based on the same ideas:
- If the underlying index is a BTREE index (which is usually the case), the largest […] pair is the last value within each […], that is, the first for each […] if we walk through the index in descending order;
- If we read values that are covered by an index, the values are read in the order of the index;
- Each index implicitly contains the primary key columns appended to it (that is, the primary key is in the covering index). In the solutions below I operate directly on the primary key; in your case, you will just need to add the primary key columns to the result.
- In many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since MySQL needs only a single primary-key fetch for each row in the subquery result, the subquery will be put first in the join, and the rows will be output in the order of the ids in the subquery (if we omit an explicit ORDER BY for the join).
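The last point in the list above can be sketched as follows. This is not the answer's own query (that was not preserved here); it is a minimal illustration under assumed names: a hypothetical table `results` with a composite primary key `(group_id, item_id)`, run against in-memory SQLite through Python's `sqlite3` module so that it is self-contained. The subquery collects only key values, and the outer join fetches each full row by primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: (group_id, item_id) is the composite primary key.
conn.execute("""
CREATE TABLE results (
  group_id INTEGER NOT NULL,
  item_id  INTEGER NOT NULL,
  payload  TEXT,
  PRIMARY KEY (group_id, item_id)
)
""")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(1, 10, "a"), (1, 11, "b"), (2, 7, "c"), (2, 9, "d"), (3, 5, "e")],
)

# The subquery reads only key columns, so an engine can answer it from the
# primary-key index; the join then looks up one full row per group.
query = """
SELECT r.group_id, r.item_id, r.payload
FROM (
  SELECT group_id, MAX(item_id) AS item_id
  FROM results
  GROUP BY group_id
) AS last_ids
JOIN results AS r
  ON r.group_id = last_ids.group_id AND r.item_id = last_ids.item_id
ORDER BY r.group_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 11, 'b'), (2, 9, 'd'), (3, 5, 'e')]
```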
"3 ways MySQL uses indexes" is a great article for understanding some of the details.
Solution 1
This one is incredibly fast; it takes about 0.8 secs on my 18M+ rows:
[…]
If you want to change the order to ASC, put it in a subquery, return the ids only, and use that as the subquery to join to the rest of the columns:
[…]
This one takes about 1.2 secs on my data.
Solution 2
Here is another solution that takes about 19 seconds on my table:
[…]
It returns tests in descending order as well. It is much slower since it does a full index scan, but it is here to give you an idea of how to output the N max rows for each group.
The disadvantage of the query is that its result cannot be cached by the query cache.
Use your subquery to return the correct grouping, because you're already halfway there.
Try this:

    SELECT a.*
    FROM messages a
    INNER JOIN
        (SELECT name, MAX(id) AS maxid
         FROM messages
         GROUP BY name) AS b
      ON a.id = b.maxid
If it's not `id` that you want the max of:

    SELECT a.*
    FROM messages a
    INNER JOIN
        (SELECT name, MAX(other_col) AS other_col
         FROM messages
         GROUP BY name) AS b
      ON a.name = b.name
     AND a.other_col = b.other_col

This way, you avoid correlated subqueries and/or sorting in your subqueries, which tend to be very slow and inefficient.
I arrived at a different solution, which is to get the IDs of the last post within each group, then select from the messages table using the result of the first query as the argument for a […] construct:
[…]
I don't know how this performs compared to some of the other solutions, but it worked spectacularly for my table with 3+ million rows (4 second execution with 1200+ results).
This should work on both MySQL and SQL Server.
Solution by subquery (fiddle link):
[…]
Solution by join condition (fiddle link):
[…]
The reason for this post is only to give the fiddle links. The same SQL is already provided in other answers.
I've not yet tested with a large DB, but I think this could be faster than joining tables:
[…]
Here is my solution:

    SELECT
      DISTINCT NAME,
      MAX(MESSAGES) OVER(PARTITION BY NAME) MESSAGES
    FROM MESSAGE;
Here are two suggestions. First, if MySQL supports ROW_NUMBER(), it's very simple:

    WITH Ranked AS (
      SELECT Id, Name, OtherColumns,
        ROW_NUMBER() OVER (
          PARTITION BY Name
          ORDER BY Id DESC
        ) AS rk
      FROM messages
    )
    SELECT Id, Name, OtherColumns
    -- select from the CTE, not the base table: rk only exists in Ranked
    FROM Ranked
    WHERE rk = 1;
I'm assuming that by "last" you mean last in Id order. If not, change the ORDER BY clause of the ROW_NUMBER() window accordingly.
Second, if ROW_NUMBER() isn't available, this is often a good way to proceed:

    SELECT
      Id, Name, OtherColumns
    FROM messages
    WHERE NOT EXISTS (
      SELECT * FROM messages AS M2
      WHERE M2.Name = messages.Name
      AND M2.Id > messages.Id
    )
In other words, select the messages for which there is no later-Id message with the same Name.
A safe and fast approach is the following:

    SELECT * FROM messages a
    WHERE Id = (SELECT MAX(Id) FROM messages WHERE a.Name = Name)

Result:

    Id  Name  Other_Columns
    3   A     A_data_3
    5   B     B_data_2
    6   C     C_data_1
Here is another way to get the last related record: use […] with an ORDER BY, and […] to pick one of the records from the list.
[…]
The query above groups all the […] values that are in the same […] group and, using […], joins all the […] values of a specific group in descending order with the provided separator (in my case I used […]). Applying […] to this list then picks the first one.
    SELECT
      column1,
      column2
    FROM
      table_name
    WHERE id IN
      (SELECT
        MAX(id)
      FROM
        table_name
      GROUP BY column1)
    ORDER BY column1;
Try this:

    SELECT jos_categories.title AS name,
           joined.catid,
           joined.title,
           joined.introtext
    FROM   jos_categories
           INNER JOIN (SELECT *
                       FROM   (SELECT `title`,
                                      catid,
                                      `created`,
                                      introtext
                               FROM   `jos_content`
                               WHERE  `sectionid` = 6
                               ORDER  BY `id` DESC) AS yes
                       GROUP  BY `yes`.`catid` DESC
                       ORDER  BY `yes`.`created` DESC) AS joined
             ON (joined.catid = jos_categories.id)
You can view it here as well:
http://sqlfiddle.com/#!9/9/ef42b
First solution

    SELECT d1.ID, Name, City FROM Demo_User d1
    INNER JOIN
    (SELECT MAX(ID) AS ID FROM Demo_User GROUP BY NAME) AS P ON (d1.ID = P.ID);

Second solution

    SELECT * FROM (SELECT * FROM Demo_User ORDER BY ID DESC) AS T GROUP BY NAME;
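Clearly there are lots of different ways of getting the same results; your question seems to be what an efficient way is of getting the last result in each group in MySQL. If you are working with huge amounts of data, and assuming you are using InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc), then there might not be an efficient way of doing this.
We sometimes need to do this with tables that have even more than 60 million rows.
For these examples I will use data with only about 1.5 million rows, where the queries need to find results for all groups in the data. In our actual cases we would often need to return data from about 2,000 groups (which hypothetically would not require examining very much of the data).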
I will use the following tables:

    CREATE TABLE temperature(
      id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      groupID INT UNSIGNED NOT NULL,
      recordedTimestamp TIMESTAMP NOT NULL,
      recordedValue INT NOT NULL,
      INDEX groupIndex(groupID, recordedTimestamp),
      PRIMARY KEY (id)
    );

    CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id));
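The temperature table is populated with about 1.5 million random records, in 100 different groups. The selected_group table is populated with those 100 groups (in our cases this would normally be less than 20% of all groups).
As this data is random, multiple rows can have the same recordedTimestamp. What we want is a list of all the selected groups, in order of groupID, with the last recordedTimestamp for each group; if a group has more than one matching row like that, we want the one with the last matching id.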
If MySQL hypothetically had a last() function that returned values from the last row according to a special ORDER BY clause, then we could simply write:

    SELECT
      LAST(t1.id) AS id,
      t1.groupID,
      LAST(t1.recordedTimestamp) AS recordedTimestamp,
      LAST(t1.recordedValue) AS recordedValue
    FROM selected_group g
    INNER JOIN temperature t1 ON t1.groupID = g.id
    ORDER BY t1.recordedTimestamp, t1.id
    GROUP BY t1.groupID;
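This would only need to examine a few hundred rows in this case, as it doesn't use any of the normal GROUP BY functions. It would execute in 0 seconds and hence be highly efficient. Note that in MySQL an ORDER BY clause normally follows the GROUP BY clause; here, however, the ORDER BY clause determines the order for the last() function. If it came after the GROUP BY, it would be ordering the groups. If no GROUP BY clause is present, the last values would be the same in all of the returned rows.
However, MySQL does not have this, so let's look at different ideas based on what it does have, and show that none of them is efficient.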
Example 1

    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM selected_group g
    INNER JOIN temperature t1 ON t1.id = (
      SELECT t2.id
      FROM temperature t2
      WHERE t2.groupID = g.id
      ORDER BY t2.recordedTimestamp DESC, t2.id DESC
      LIMIT 1
    );

This examined 3009254 rows and took about 0.859 seconds on 5.7.21, and slightly longer on 8.0.4-rc.
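To see the tie-breaking rule of Example 1 on a tiny data set, here is a minimal sketch. It assumes an in-memory SQLite database driven from Python's `sqlite3` module (so the column types are SQLite's, not MySQL's) and five made-up rows; only the query itself comes from Example 1, with an `ORDER BY` added to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temperature (
  id INTEGER PRIMARY KEY,
  groupID INTEGER NOT NULL,
  recordedTimestamp TEXT NOT NULL,
  recordedValue INTEGER NOT NULL
);
CREATE INDEX groupIndex ON temperature (groupID, recordedTimestamp);
CREATE TABLE selected_group (id INTEGER PRIMARY KEY);
""")
# Made-up sample rows; ids 2 and 3 share a timestamp within group 1.
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2018-01-01 10:00:00", 20),
        (2, 1, "2018-01-01 11:00:00", 21),
        (3, 1, "2018-01-01 11:00:00", 22),
        (4, 2, "2018-01-01 09:00:00", 15),
        (5, 3, "2018-01-01 12:00:00", 30),  # group 3 is not selected below
    ],
)
conn.executemany("INSERT INTO selected_group VALUES (?)", [(1,), (2,)])

# Latest timestamp wins; among equal timestamps the highest id wins.
query = """
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
  SELECT t2.id
  FROM temperature t2
  WHERE t2.groupID = g.id
  ORDER BY t2.recordedTimestamp DESC, t2.id DESC
  LIMIT 1
)
ORDER BY t1.groupID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Group 1 returns id 3 rather than id 2, because both share the latest timestamp and the higher id breaks the tie; group 3 is absent because it is not in `selected_group`.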
Example 2

    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM temperature t1
    INNER JOIN (
      SELECT MAX(t2.id) AS id
      FROM temperature t2
      INNER JOIN (
        SELECT t3.groupID, MAX(t3.recordedTimestamp) AS recordedTimestamp
        FROM selected_group g
        INNER JOIN temperature t3 ON t3.groupID = g.id
        GROUP BY t3.groupID
      ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
      GROUP BY t2.groupID
    ) t5 ON t5.id = t1.id;

This examined 1505331 rows and took about 1.25 seconds on 5.7.21, and slightly longer on 8.0.4-rc.
Example 3

    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM temperature t1
    WHERE t1.id IN (
      SELECT MAX(t2.id) AS id
      FROM temperature t2
      INNER JOIN (
        SELECT t3.groupID, MAX(t3.recordedTimestamp) AS recordedTimestamp
        FROM selected_group g
        INNER JOIN temperature t3 ON t3.groupID = g.id
        GROUP BY t3.groupID
      ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
      GROUP BY t2.groupID
    )
    ORDER BY t1.groupID;

This examined 3009685 rows and took about 1.95 seconds on 5.7.21, and slightly longer on 8.0.4-rc.
Example 4

    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM selected_group g
    INNER JOIN temperature t1 ON t1.id = (
      SELECT MAX(t2.id)
      FROM temperature t2
      WHERE t2.groupID = g.id AND t2.recordedTimestamp = (
        SELECT MAX(t3.recordedTimestamp)
        FROM temperature t3
        WHERE t3.groupID = g.id
      )
    );

This examined 6137810 rows and took about 2.2 seconds on 5.7.21, and slightly longer on 8.0.4-rc.
Example 5

    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM (
      SELECT
        t2.id,
        t2.groupID,
        t2.recordedTimestamp,
        t2.recordedValue,
        ROW_NUMBER() OVER (
          PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC
        ) AS rowNumber
      FROM selected_group g
      INNER JOIN temperature t2 ON t2.groupID = g.id
    ) t1 WHERE t1.rowNumber = 1;

This examined 6017808 rows and took about 4.2 seconds on 8.0.4-rc.
Example 6

    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM (
      SELECT
        last_value(t2.id) OVER w AS id,
        t2.groupID,
        last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp,
        last_value(t2.recordedValue) OVER w AS recordedValue
      FROM selected_group g
      INNER JOIN temperature t2 ON t2.groupID = g.id
      WINDOW w AS (
        PARTITION BY t2.groupID
        ORDER BY t2.recordedTimestamp, t2.id
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
      )
    ) t1
    GROUP BY t1.groupID;

This examined 6017908 rows and took about 17.5 seconds on 8.0.4-rc.
Example 7

    SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
    FROM selected_group g
    INNER JOIN temperature t1 ON t1.groupID = g.id
    LEFT JOIN temperature t2
      ON t2.groupID = g.id
      AND (
        t2.recordedTimestamp > t1.recordedTimestamp
        OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
      )
    WHERE t2.id IS NULL
    ORDER BY t1.groupID;

This one was taking forever, so I had to kill it.
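Hi @Vijay Dev, if your messages table contains Id, which is an auto-increment primary key, then to fetch the latest record on the basis of the primary key your query should read as follows:

    SELECT m1.* FROM messages m1 INNER JOIN (SELECT MAX(Id) AS lastmsgId FROM messages GROUP BY Name) m2 ON m1.Id = m2.lastmsgId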
If you want the last row for each `Name`:
Query

    SELECT t1.Id,
           t1.Name,
           t1.Other_Columns
    FROM
    (
         SELECT Id,
                Name,
                Other_Columns,
        (
            CASE Name
            WHEN @curA
            THEN @curRow := @curRow + 1
            ELSE @curRow := 1 AND @curA := Name END
        ) + 1 AS rn
        FROM messages t,
        (SELECT @curRow := 0, @curA := '') r
        ORDER BY Name, Id DESC
    ) t1
    WHERE t1.rn = 1
    ORDER BY t1.Id;
Based on your question, the query below will work fine.

    SELECT M1.*
    FROM MESSAGES M1,
    (
      SELECT SUBSTR(Others_data,1,2), MAX(Others_data) AS Max_Others_data
      FROM MESSAGES
      GROUP BY 1
    ) M2
    WHERE M1.Others_data = M2.Max_Others_data
    ORDER BY Others_data;
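Is there any way we could use this method to delete duplicates in a table? The result set is basically a collection of unique records, so if we could delete all records that are not in the result set, we would effectively have no duplicates. I tried this, but MySQL gave a 1093 error.

    DELETE FROM messages WHERE id NOT IN
     (SELECT m1.id
      FROM messages m1 LEFT JOIN messages m2
      ON (m1.name = m2.name AND m1.id < m2.id)
      WHERE m2.id IS NULL)

Is there a way to save the output to a temp variable and then delete with NOT IN (temp variable)? @Bill, thanks for a very useful solution.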
EDIT: I think I found the solution:

    DROP TABLE IF EXISTS UniqueIDs;
    CREATE TEMPORARY TABLE UniqueIDs (id INT(11));

    INSERT INTO UniqueIDs
        (SELECT T1.ID FROM TABLE T1 LEFT JOIN TABLE T2 ON
        (T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 #Comparison Fields
        AND T1.ID < T2.ID)
        WHERE T2.ID IS NULL);

    DELETE FROM TABLE WHERE id NOT IN (SELECT ID FROM UniqueIDs);
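MySQL error 1093 is raised when a statement modifies a table and also selects from that same table in a subquery, which is why staging the ids in a separate table helps. The sketch below walks through that temporary-table approach on the six sample rows from the question. It is an illustration only: it runs on in-memory SQLite via Python's `sqlite3` module, and SQLite does not have the 1093 restriction, so it shows the steps and the end state rather than reproducing the MySQL error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
     (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1")],
)

# Step 1: stage the ids of the rows to keep (the last row of each name).
conn.execute("CREATE TEMPORARY TABLE UniqueIDs (id INTEGER PRIMARY KEY)")
conn.execute("""
    INSERT INTO UniqueIDs
    SELECT m1.id
    FROM messages m1
    LEFT JOIN messages m2 ON (m1.name = m2.name AND m1.id < m2.id)
    WHERE m2.id IS NULL
""")

# Step 2: delete everything else; the subquery now reads only UniqueIDs.
conn.execute("DELETE FROM messages WHERE id NOT IN (SELECT id FROM UniqueIDs)")
conn.commit()

remaining = conn.execute("SELECT id, name FROM messages ORDER BY id").fetchall()
print(remaining)  # [(3, 'A'), (5, 'B'), (6, 'C')]
```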
    SELECT * FROM messages GROUP BY name DESC
How about this:

    SELECT DISTINCT ON (name) *
    FROM messages
    ORDER BY name, id DESC;
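I had a similar problem (on PostgreSQL, though) with a 1M-record table. This solution takes 1.7 s, versus 44 s for the one with LEFT JOIN. In my case I also had to filter the counterpart of your name field against NULL values, which gave even better performance (0.2 s).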
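If performance is really your concern, you can introduce a new column on the table called `IsLastInGroup`.
Set it to true on the rows that are last in their group, and maintain it with every row insert/update/delete. Writes will be slower, but reads will benefit. It depends on your use case, and I recommend it only if your workload is read-focused.
So your query will look like this:

    SELECT * FROM Messages WHERE IsLastInGroup = 1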
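One way the flag could be maintained on insert is sketched below. This is a hedged illustration, not the answer's own code: the helper `add_message` is hypothetical, the database is in-memory SQLite driven from Python's `sqlite3` module, and only the insert path is covered (updates and deletes would need similar handling). The key point is that clearing the old flag and inserting the new row happen in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Messages (
  Id INTEGER PRIMARY KEY,
  Name TEXT NOT NULL,
  Other_Columns TEXT,
  IsLastInGroup INTEGER NOT NULL DEFAULT 0
)
""")

def add_message(name, other_columns):
    """Insert a row and move the IsLastInGroup flag to it (hypothetical helper)."""
    with conn:  # both statements are committed together, or rolled back together
        conn.execute(
            "UPDATE Messages SET IsLastInGroup = 0 WHERE Name = ? AND IsLastInGroup = 1",
            (name,),
        )
        conn.execute(
            "INSERT INTO Messages (Name, Other_Columns, IsLastInGroup) VALUES (?, ?, 1)",
            (name, other_columns),
        )

for name, data in [("A", "A_data_1"), ("A", "A_data_2"), ("A", "A_data_3"),
                   ("B", "B_data_1"), ("B", "B_data_2"), ("C", "C_data_1")]:
    add_message(name, data)

rows = conn.execute(
    "SELECT Id, Name, Other_Columns FROM Messages WHERE IsLastInGroup = 1 ORDER BY Id"
).fetchall()
print(rows)  # [(3, 'A', 'A_data_3'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')]
```

The read query stays trivial, at the cost of one extra `UPDATE` per insert; an index on `(Name, IsLastInGroup)` would be a natural addition if that update becomes a bottleneck.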
    SELECT * FROM table_name WHERE primary_key IN (SELECT MAX(primary_key) FROM table_name GROUP BY column_name)
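We will look at how to use MySQL to get the last record in each group of records. For example, suppose you have a result set of posts.
I want to get the last post in each category, that is, Title 3, Title 5 and Title 6. To get the posts by category, you would use the MySQL GROUP BY keyword.
But that is not the result we get back from this query.
In practice, GROUP BY returns the first record it finds in each group (and only when ONLY_FULL_GROUP_BY is disabled; MySQL does not guarantee which row it picks).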
    SELECT *
    FROM posts
    WHERE id IN (
      SELECT MAX(id)
      FROM posts
      GROUP BY category_id
    );

This will return the post with the highest ID in each group.
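As a quick check of the `IN (SELECT MAX(id) ...)` approach, the sketch below builds a small `posts` table and runs the query. The sample rows are made up for illustration (six posts in three categories, arranged so that Title 3, Title 5 and Title 6 are the last in their categories), the `title` column name is an assumption, and the database is in-memory SQLite driven from Python's `sqlite3` module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER)")
# Made-up sample rows: three posts in category 1, two in category 2, one in category 3.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, "Title 1", 1), (2, "Title 2", 1), (3, "Title 3", 1),
     (4, "Title 4", 2), (5, "Title 5", 2), (6, "Title 6", 3)],
)

# The subquery yields one id per category (the highest); the outer query fetches those rows.
query = """
SELECT *
FROM posts
WHERE id IN (
  SELECT MAX(id)
  FROM posts
  GROUP BY category_id
)
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(3, 'Title 3', 1), (5, 'Title 5', 2), (6, 'Title 6', 3)]
```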
You can group with a count and also get the last item of each group, like this:

    SELECT
        user,
        COUNT(user) AS count,
        MAX(id) AS last
    FROM request
    GROUP BY user
Have you seen https://github.com/fhulufhelo/get-last-record-in-each-mysql-group? It worked for me.
    // Join each company to its latest revision (highest revision id per company).
    $sql = "SELECT c.id, c.name, c.email, r.id, r.companyid, r.name, r.email
            FROM companytable c
            LEFT JOIN (
              SELECT * FROM revisiontable
              WHERE id IN (SELECT MAX(id) FROM revisiontable GROUP BY companyid)
            ) r ON c.id = r.companyid";