Finding duplicate rows in SQL Server
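I have a SQL Server database of organizations, and it contains many duplicate rows. I want to run a select statement that returns all of them along with the number of dupes, but that also returns the ids associated with each organization.
A statement like: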
SELECT orgName, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
will return something like:
orgName        | dupes
ABC Corp       | 7
Foo Federation | 5
Widget Company | 2
But I would also like to get their IDs. Is there any way to do this? Maybe something like:
orgName        | dupeCount | id
ABC Corp       | 1         | 34
ABC Corp       | 2         | 5
...
Widget Company | 1         | 10
Widget Company | 2         | 2
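The reason is that there is also a separate users table that links to these organizations, and I would like to unify them (that is, remove the dupes so that users link to the same organization instead of to dupe organizations). I want to do the reassignment manually so that I don't mess anything up, so I still need a statement that returns the IDs of all the dupe organizations, which lets me go through the list of users.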
SELECT o.orgName, oc.dupeCount, o.id
FROM organizations o
INNER JOIN (
    SELECT orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc ON o.orgName = oc.orgName
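To see what the join against the grouped subquery returns, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption because the two dialects differ in places. The sample rows are hypothetical. The query is the one above, plus an ORDER BY so that the output is predictable.

```python
import sqlite3


def dupes_with_ids(conn):
    """Return (orgName, dupeCount, id) for every row whose orgName occurs more than once."""
    sql = """
        SELECT o.orgName, oc.dupeCount, o.id
        FROM organizations o
        INNER JOIN (
            SELECT orgName, COUNT(*) AS dupeCount
            FROM organizations
            GROUP BY orgName
            HAVING COUNT(*) > 1
        ) oc ON o.orgName = oc.orgName
        ORDER BY o.orgName, o.id
    """
    return conn.execute(sql).fetchall()


# Hypothetical sample data: two ABC Corp rows, two Widget Company rows, one unique org.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT)")
conn.executemany(
    "INSERT INTO organizations (id, orgName) VALUES (?, ?)",
    [(34, "ABC Corp"), (5, "ABC Corp"), (10, "Widget Company"),
     (2, "Widget Company"), (7, "Foo Federation")],
)

for row in dupes_with_ids(conn):
    print(row)
```

Every duplicated row keeps its own id, and the group's count is repeated on each row. That is the list of ids the question asks for.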
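You can run the following query and use the MAX(ID) it returns for each duplicated orgName:
SELECT orgName, COUNT(*) AS dupes, MAX(ID) AS maxId
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
However, you will have to run this query several times, because each run returns only one id per duplicated name.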
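The "run it several times" step can be sketched as a loop. The sketch assumes the MAX(ID) rows returned by each run are deleted before the next run. It uses Python's sqlite3 module as a stand-in for SQL Server, and the data is hypothetical. Because ABC Corp has three copies, two passes are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT)")
conn.executemany(
    "INSERT INTO organizations (id, orgName) VALUES (?, ?)",
    [(1, "ABC Corp"), (2, "ABC Corp"), (3, "ABC Corp"),
     (4, "Widget Company"), (5, "Widget Company"), (6, "Foo Federation")],
)

MAX_ID_QUERY = """
    SELECT orgName, COUNT(*) AS dupes, MAX(id) AS maxId
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
"""

passes = 0
while True:
    rows = conn.execute(MAX_ID_QUERY).fetchall()
    if not rows:
        break  # no orgName occurs more than once any more
    passes += 1
    # Each pass yields only one id (the highest) per duplicated name; remove those rows.
    conn.executemany("DELETE FROM organizations WHERE id = ?",
                     [(max_id,) for _, _, max_id in rows])

remaining = conn.execute("SELECT id, orgName FROM organizations ORDER BY id").fetchall()
print(passes, remaining)
```

The number of passes equals the largest group size minus one. For that reason, the ROW_NUMBER() and MIN(id) variants in this thread, which list every extra row in a single query, are usually more convenient.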
You can do this:
SELECT o.id, o.orgName, d.intCount
FROM (
    SELECT orgName, COUNT(*) AS intCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
If you only want to return the records that can be deleted (leaving one of each), you can use:
SELECT id, orgName
FROM (
    SELECT orgName, id,
           ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
    FROM organizations
) AS d
WHERE intRow != 1
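Here is a minimal sketch of the ROW_NUMBER() query above. It uses Python's sqlite3 module as a stand-in for SQL Server and hypothetical data. It assumes the bundled SQLite is version 3.25 or later, which is when window functions were added. The lowest id in each orgName group is numbered 1 and kept. Everything else is reported as deletable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT)")
conn.executemany(
    "INSERT INTO organizations (id, orgName) VALUES (?, ?)",
    [(1, "ABC Corp"), (2, "ABC Corp"), (3, "ABC Corp"),
     (4, "Widget Company"), (5, "Widget Company"), (6, "Foo Federation")],
)

# Number the rows inside each orgName group by id; rows numbered 2, 3, ... are the extras.
deletable = conn.execute("""
    SELECT id, orgName
    FROM (
        SELECT orgName, id,
               ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
        FROM organizations
    ) AS d
    WHERE intRow != 1
    ORDER BY id
""").fetchall()
print(deletable)
```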
Edit: SQL Server 2000 does not have the ROW_NUMBER() function. Instead, you can use:
SELECT o.id, o.orgName, d.intCount
FROM (
    SELECT orgName, COUNT(*) AS intCount, MIN(id) AS minId
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id
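The MIN(id) variant needs no window functions, so it also runs on engines without ROW_NUMBER(). This sketch uses Python's sqlite3 as a stand-in for SQL Server 2000, with hypothetical data. It returns the same extra rows as the ROW_NUMBER() version, together with each group's count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT)")
conn.executemany(
    "INSERT INTO organizations (id, orgName) VALUES (?, ?)",
    [(1, "ABC Corp"), (2, "ABC Corp"), (3, "ABC Corp"),
     (4, "Widget Company"), (5, "Widget Company"), (6, "Foo Federation")],
)

# Join each row to its group's MIN(id); every row that is not the minimum is an extra copy.
extras = conn.execute("""
    SELECT o.id, o.orgName, d.intCount
    FROM (
        SELECT orgName, COUNT(*) AS intCount, MIN(id) AS minId
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName
    WHERE d.minId != o.id
    ORDER BY o.id
""").fetchall()
print(extras)
```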
The solution marked as correct did not work for me, but I found this answer very useful: Get a list of duplicate rows in MySQL
SELECT n1.*
FROM myTable n1
INNER JOIN myTable n2 ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id
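One property of the self-join is worth checking. A row with k copies matches the other k - 1 copies, so it appears k - 1 times in the output. This sketch uses Python's sqlite3 as a stand-in, with hypothetical data in which orgName plays the role of repeatedCol. It compares the raw self-join with a SELECT DISTINCT version. The DISTINCT is an addition, not part of the linked answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT)")
conn.executemany(
    "INSERT INTO organizations (id, orgName) VALUES (?, ?)",
    [(1, "ABC Corp"), (2, "ABC Corp"), (3, "ABC Corp"),
     (4, "Widget Company"), (5, "Widget Company"), (6, "Foo Federation")],
)

# Self-join as in the answer: each row is paired with every *other* row of the same name.
raw = conn.execute("""
    SELECT n1.*
    FROM organizations n1
    INNER JOIN organizations n2 ON n2.orgName = n1.orgName
    WHERE n1.id <> n2.id
""").fetchall()

# DISTINCT collapses the repeated matches so that each duplicated row is listed once.
distinct = conn.execute("""
    SELECT DISTINCT n1.*
    FROM organizations n1
    INNER JOIN organizations n2 ON n2.orgName = n1.orgName
    WHERE n1.id <> n2.id
    ORDER BY n1.id
""").fetchall()

print(len(raw), distinct)
```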
You can try this. It numbers the rows within each orgName group and returns every row after the first:
WITH CTE AS (
    SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id)
    FROM organizations
)
SELECT * FROM CTE WHERE RN > 1
GO
SELECT * FROM [Employees];

WITH mycte AS (
    SELECT Name, EmailId,
           ROW_NUMBER() OVER (PARTITION BY Name, EmailId ORDER BY id) AS Duplicate
    FROM [Employees]
)
SELECT * FROM mycte;

SELECT Name, EmailId, COUNT(name) AS Duplicate
FROM [Employees]
GROUP BY Name, EmailId
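Both queries above treat a row as a duplicate only when *both* Name and EmailId match. This sketch uses Python's sqlite3 as a stand-in for SQL Server, hypothetical Employees rows, and an assumed id column. It adds a HAVING clause and a `Duplicate > 1` filter so that only the repeated combinations are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, Name TEXT, EmailId TEXT)")
conn.executemany(
    "INSERT INTO Employees (Name, EmailId) VALUES (?, ?)",
    [
        ("Ann", "ann@example.com"),
        ("Ann", "ann@example.com"),    # same Name and EmailId: a duplicate
        ("Ann", "ann.b@example.com"),  # same Name, different EmailId: not a duplicate
        ("Bob", "bob@example.com"),
    ],
)

# GROUP BY both columns, keeping only combinations that occur more than once.
grouped = conn.execute("""
    SELECT Name, EmailId, COUNT(Name) AS Duplicate
    FROM Employees
    GROUP BY Name, EmailId
    HAVING COUNT(Name) > 1
""").fetchall()

# ROW_NUMBER() per (Name, EmailId); every row numbered above 1 is an extra copy.
extra_copies = conn.execute("""
    WITH mycte AS (
        SELECT id, Name, EmailId,
               ROW_NUMBER() OVER (PARTITION BY Name, EmailId ORDER BY id) AS Duplicate
        FROM Employees
    )
    SELECT id, Name, EmailId FROM mycte WHERE Duplicate > 1
""").fetchall()

print(grouped)
print(extra_copies)
```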
If you want to delete the duplicates:
WITH CTE AS (
    SELECT orgName, id,
           RN = ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id)
    FROM organizations
)
DELETE FROM CTE WHERE RN > 1
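SQL Server lets you delete through a CTE, but SQLite does not. The target of a SQLite DELETE must be a table. The following sketch therefore swaps in a different but equivalent technique. It keeps MIN(id) per orgName, which is the row the CTE above numbers 1, and deletes every other row. It uses Python's sqlite3 with hypothetical data. In the question's scenario, the users pointing at the deleted ids would need to be reassigned first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT)")
conn.executemany(
    "INSERT INTO organizations (id, orgName) VALUES (?, ?)",
    [(1, "ABC Corp"), (2, "ABC Corp"), (3, "ABC Corp"),
     (4, "Widget Company"), (5, "Widget Company"), (6, "Foo Federation")],
)

# Keep the lowest id per orgName (the row ROW_NUMBER() ... ORDER BY id numbers 1)
# and delete every other row in the group.
conn.execute("""
    DELETE FROM organizations
    WHERE id NOT IN (
        SELECT MIN(id) FROM organizations GROUP BY orgName
    )
""")
conn.commit()

remaining = conn.execute("SELECT id, orgName FROM organizations ORDER BY id").fetchall()
print(remaining)
```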
SELECT *
FROM (
    SELECT orgName, id,
           ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id DESC) Rownum
    FROM organizations
) tbl
WHERE Rownum > 1
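PARTITION BY first groups the records by orgName, and ROW_NUMBER() then numbers the rows in each group sequentially. Here the numbering is by descending id, so the row with the highest id gets 1. The rows with Rownum > 1 are therefore the duplicate records that can be deleted.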
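To make the numbering visible, this sketch prints every row with its Rownum before filtering. It uses Python's sqlite3 (3.25 or later, for window functions) as a stand-in for SQL Server, with hypothetical data. Because the order is `id DESC`, the newest row in each group is the one kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT)")
conn.executemany(
    "INSERT INTO organizations (id, orgName) VALUES (?, ?)",
    [(1, "ABC Corp"), (2, "ABC Corp"), (3, "ABC Corp"),
     (4, "Widget Company"), (5, "Widget Company"), (6, "Foo Federation")],
)

# Show the numbering for every row: highest id in each orgName group gets Rownum 1.
rows = conn.execute("""
    SELECT orgName, id,
           ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id DESC) AS Rownum
    FROM organizations
    ORDER BY orgName, Rownum
""").fetchall()
for row in rows:
    print(row)

# The same filter as the answer: everything numbered above 1 is a deletable duplicate.
dupes = [row for row in rows if row[2] > 1]
print(dupes)
```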
SELECT a.orgName, b.duplicate, a.id
FROM organizations a
INNER JOIN (
    SELECT orgName, COUNT(*) AS duplicate
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) b ON a.orgName = b.orgName
GROUP BY a.orgName, b.duplicate, a.id
SELECT column_name, COUNT(column_name)
FROM TABLE_NAME
GROUP BY column_name
HAVING COUNT(column_name) > 1;
Source: https://stackoverflow.com/a/59242/1465252
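SELECT orgname, COUNT(*) AS dupes, id
FROM organizations
WHERE orgname IN (
    SELECT orgname
    FROM organizations
    GROUP BY orgname
    HAVING (COUNT(*) > 1)
)
GROUP BY orgname, id
-- Because the outer query groups by (orgname, id), dupes counts rows per id (normally 1);
-- the IN subquery is what restricts the output to duplicated names.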
There are several ways to select the duplicates.
For my solutions, first consider this table:
CREATE TABLE #Employee
(
    ID INT,
    FIRST_NAME NVARCHAR(100),
    LAST_NAME NVARCHAR(300)
)

INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );
First solution:
SELECT DISTINCT * FROM #Employee;

WITH #DeleteEmployee AS (
    SELECT ID, FIRST_NAME, LAST_NAME,
           ROW_NUMBER() OVER (PARTITION BY ID, FIRST_NAME, LAST_NAME ORDER BY ID) AS RNUM
    FROM #Employee
)
SELECT * FROM #DeleteEmployee
WHERE RNUM > 1;

SELECT DISTINCT * FROM #Employee
Second solution: using a temporary IDENTITY column
SELECT DISTINCT * FROM #Employee;

ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)

SELECT *
FROM #Employee
WHERE UNIQ_ID < (
    SELECT MAX(UNIQ_ID)
    FROM #Employee a2
    WHERE #Employee.ID = a2.ID
      AND #Employee.FIRST_NAME = a2.FIRST_NAME
      AND #Employee.LAST_NAME = a2.LAST_NAME
)

ALTER TABLE #Employee DROP COLUMN UNIQ_ID

SELECT DISTINCT * FROM #Employee
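The temporary UNIQ_ID column exists only to tell otherwise identical rows apart. In SQLite, every ordinary table already has an implicit `rowid` that can serve the same role. This sketch relies on that SQLite-specific feature and uses Python's sqlite3 as a stand-in for SQL Server. The table is named Employee because `#` temp-table names are T-SQL only. The rows are the same six from the answer. The query returns every copy except the last-inserted one in each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key, as with #Employee; SQLite's implicit rowid stands in for UNIQ_ID.
conn.execute("CREATE TABLE Employee (ID INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Ardalan", "Shahgholi"), (2, "name1", "lname1"), (3, "name2", "lname2"),
     (2, "name1", "lname1"), (3, "name2", "lname2"), (4, "name3", "lname3")],
)

# A row is an "older copy" if an identical row with a larger rowid exists.
older_copies = conn.execute("""
    SELECT e.ID, e.FIRST_NAME, e.LAST_NAME
    FROM Employee e
    WHERE e.rowid < (
        SELECT MAX(a2.rowid)
        FROM Employee a2
        WHERE e.ID = a2.ID
          AND e.FIRST_NAME = a2.FIRST_NAME
          AND e.LAST_NAME = a2.LAST_NAME
    )
    ORDER BY e.ID
""").fetchall()
print(older_copies)
```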
At the end of either solution, use this command:
DROP TABLE #Employee
I think I know what you need.
I had to combine several of the answers, and I think I got the solution that was asked for:
SELECT o.id, o.orgName, oc.dupeCount, oc.id, oc.orgName
FROM organizations o
INNER JOIN (
    SELECT MAX(id) AS id, orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc ON o.orgName = oc.orgName
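Taking the MAX(id) gives you both the id of the duplicate and the id of the original, which is what was asked for:
id, org name, duplicate count (left out in this case)
id of the duplicate, duplicate org name, duplicate count (left out again because it does not help in this case)
The only unfortunate thing is that the result comes out in this form:
id, name, dupId, name
Hope it still helps.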
/* To get duplicate data in a table */
SELECT COUNT(EmpCode), EmpCode
FROM tbl_Employees
WHERE STATUS = 1
GROUP BY EmpCode
HAVING COUNT(EmpCode) > 1
Suppose we have a table 'Student' with 2 columns:
- student_id int
- student_name varchar

Records:
+------------+---------------------+
| student_id | student_name |
+------------+---------------------+
| 101 | usman |
| 101 | usman |
| 101 | usman |
| 102 | usmanyaqoob |
| 103 | muhammadusmanyaqoob |
| 103 | muhammadusmanyaqoob |
+------------+---------------------+
Now we want to see the duplicate records.
Use this query:
SELECT student_name, student_id, COUNT(*) AS c
FROM student
GROUP BY student_id, student_name
HAVING COUNT(*) > 1;
+---------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               | 101        | 3 |
| muhammadusmanyaqoob | 103        | 2 |
+---------------------+------------+---+
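The example can be reproduced end to end with the six Student records shown above. This sketch uses Python's sqlite3 as a stand-in for SQL Server. It adds an ORDER BY so the rows come back in the order of the result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER, student_name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(101, "usman"), (101, "usman"), (101, "usman"),
     (102, "usmanyaqoob"),
     (103, "muhammadusmanyaqoob"), (103, "muhammadusmanyaqoob")],
)

# Group on both columns and keep only the combinations that occur more than once.
result = conn.execute("""
    SELECT student_name, student_id, COUNT(*) AS c
    FROM student
    GROUP BY student_id, student_name
    HAVING COUNT(*) > 1
    ORDER BY student_id
""").fetchall()
print(result)
```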
Here is another way to get the duplicate records, which also restricts the check using a related table (studmisc):
SELECT x.studid, y.stdname, y.dupecount
FROM student AS x
INNER JOIN (
    SELECT a.stdname, COUNT(*) AS dupecount
    FROM student AS a
    INNER JOIN studmisc AS b ON a.studid = b.studid
    WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
    GROUP BY a.stdname
    HAVING (COUNT(*) > 1)
) AS y ON x.stdname = y.stdname
INNER JOIN studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname
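The result of the above query lists every duplicated name, together with each student's unique ID and the number of times the name occurs.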
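Try the following. Note that it groups by both orgName and id, so it only finds rows where both values repeat. If id is a unique key, it returns nothing: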
SELECT orgName, id, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName, id
HAVING COUNT(*) > 1;
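To show why adding id to the GROUP BY matters, this sketch runs that query next to the plain orgName grouping. It uses Python's sqlite3 as a stand-in for SQL Server. The data is hypothetical, and id is assumed to be the primary key. Because every (orgName, id) pair is then unique, the first query finds nothing, while the second finds the duplicated name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT)")
conn.executemany(
    "INSERT INTO organizations (id, orgName) VALUES (?, ?)",
    [(1, "ABC Corp"), (2, "ABC Corp"), (3, "Foo Federation")],
)

# Grouping by the unique id makes every group a single row.
by_name_and_id = conn.execute("""
    SELECT orgName, id, COUNT(*) AS dupes
    FROM organizations
    GROUP BY orgName, id
    HAVING COUNT(*) > 1
""").fetchall()

# Grouping by the name alone finds the duplicate.
by_name = conn.execute("""
    SELECT orgName, COUNT(*) AS dupes
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
""").fetchall()

print(by_name_and_id)  # [] because every (orgName, id) pair is unique
print(by_name)
```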