Which method performs better: .Any() vs .Count() > 0?
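In
I was recently told that if I want to check whether a collection contains one or more items, I should use
Secondly, some collections have a property (not an extension method) that is
Yea/nay?
If you are starting with something that has a
For just
Of course, if you have used LINQ to filter it etc. (
In general with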
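The difference for a lazy sequence can be made visible by counting how many elements the sequence actually hands out. The following is a minimal sketch, not part of the original answer; `EnumerationProbe`, `Numbers`, `YieldedByAny` and `YieldedByCount` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: counts how many elements a lazy sequence yields.
public static class EnumerationProbe
{
    // Number of elements handed out by Numbers() since the last reset.
    public static int Yielded;

    // A lazy iterator; it is not an ICollection<T>, so Count() has no shortcut.
    public static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Yielded++;
            yield return i;
        }
    }

    // How many elements Any() pulls from a sequence of length n.
    public static int YieldedByAny(int n)
    {
        Yielded = 0;
        _ = Numbers(n).Any();
        return Yielded;
    }

    // How many elements Count() > 0 pulls from a sequence of length n.
    public static int YieldedByCount(int n)
    {
        Yielded = 0;
        _ = Numbers(n).Count() > 0;
        return Yielded;
    }
}
```

For a 1000-element sequence, `YieldedByAny(1000)` returns 1 while `YieldedByCount(1000)` returns 1000: `Any()` stops after the first successful `MoveNext`, whereas `Count()` has to walk the whole iterator.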
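Note: I wrote this answer when Entity Framework 4 was current. The point of this answer was not to get into trivial
While I agree with the most up-voted answer and comments, especially on the point
Here is the query with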
```csharp
// Variant using !Any(...)
con = db.Contacts.
    Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
        && !a.NewsletterLogs.Any(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr)
    ).OrderBy(a => a.ContactId).
    Skip(position - 1).
    Take(1).FirstOrDefault();
```
```csharp
// Variant using Count(...) == 0
con = db.Contacts.
    Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
        && a.NewsletterLogs.Count(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr) == 0
    ).OrderBy(a => a.ContactId).
    Skip(position - 1).
    Take(1).FirstOrDefault();
```
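I need to find a way to see exactly what SQL both LINQ queries produce, but it is obvious that in some cases
EDIT: Here is the generated SQL. Beauties, as you can see ;)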
SQL generated for the Any() query:
```sql
exec sp_executesql N'SELECT TOP (1)
[Project2].[ContactId] AS [ContactId],
[Project2].[CompanyId] AS [CompanyId],
[Project2].[ContactName] AS [ContactName],
[Project2].[FullName] AS [FullName],
[Project2].[ContactStatusId] AS [ContactStatusId],
[Project2].[Created] AS [Created]
FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number]
    FROM ( SELECT
        [Extent1].[ContactId] AS [ContactId],
        [Extent1].[CompanyId] AS [CompanyId],
        [Extent1].[ContactName] AS [ContactName],
        [Extent1].[FullName] AS [FullName],
        [Extent1].[ContactStatusId] AS [ContactStatusId],
        [Extent1].[Created] AS [Created]
        FROM [dbo].[Contact] AS [Extent1]
        WHERE ([Extent1].[CompanyId] = @p__linq__0) AND ([Extent1].[ContactStatusId] <= 3) AND ( NOT EXISTS (SELECT
            1 AS [C1]
            FROM [dbo].[NewsletterLog] AS [Extent2]
            WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId])
        ))
    )  AS [Project2]
)  AS [Project2]
WHERE [Project2].[row_number] > 99
ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4
```
SQL generated for the Count() query:
```sql
exec sp_executesql N'SELECT TOP (1)
[Project2].[ContactId] AS [ContactId],
[Project2].[CompanyId] AS [CompanyId],
[Project2].[ContactName] AS [ContactName],
[Project2].[FullName] AS [FullName],
[Project2].[ContactStatusId] AS [ContactStatusId],
[Project2].[Created] AS [Created]
FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number]
    FROM ( SELECT
        [Project1].[ContactId] AS [ContactId],
        [Project1].[CompanyId] AS [CompanyId],
        [Project1].[ContactName] AS [ContactName],
        [Project1].[FullName] AS [FullName],
        [Project1].[ContactStatusId] AS [ContactStatusId],
        [Project1].[Created] AS [Created]
        FROM ( SELECT
            [Extent1].[ContactId] AS [ContactId],
            [Extent1].[CompanyId] AS [CompanyId],
            [Extent1].[ContactName] AS [ContactName],
            [Extent1].[FullName] AS [FullName],
            [Extent1].[ContactStatusId] AS [ContactStatusId],
            [Extent1].[Created] AS [Created],
            (SELECT
                COUNT(1) AS [A1]
                FROM [dbo].[NewsletterLog] AS [Extent2]
                WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId])) AS [C1]
            FROM [dbo].[Contact] AS [Extent1]
        )  AS [Project1]
        WHERE ([Project1].[CompanyId] = @p__linq__0) AND ([Project1].[ContactStatusId] <= 3) AND (0 = [Project1].[C1])
    )  AS [Project2]
)  AS [Project2]
WHERE [Project2].[row_number] > 99
ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4
```
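It seems that a pure Where with EXISTS performs much worse than calculating the Count and then doing a Where with Count == 0.
If you spot an error in my findings, please let me know. Regardless of the Any vs Count discussion, the takeaway from all of this is that any more complex LINQ query is better off rewritten as a stored procedure.
Since this is a rather popular topic and the answers differ, I had to take a fresh look at the problem.
Test environment: EF 6.1.3, SQL Server, 300k records
Table model: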
```csharp
using System.ComponentModel.DataAnnotations; // provides [Key]

class TestTable
{
    [Key]
    public int Id { get; set; }

    public string Name { get; set; }

    public string Surname { get; set; }
}
```
Test code:
```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // TestContext is the EF DbContext exposing TestTables; its definition is not shown here.
        using (var context = new TestContext())
        {
            // Write the generated SQL and timings to the console.
            context.Database.Log = Console.WriteLine;

            context.TestTables.Where(x => x.Surname.Contains("Surname")).Any(x => x.Id > 1000);
            context.TestTables.Where(x => x.Surname.Contains("Surname") && x.Name.Contains("Name")).Any(x => x.Id > 1000);
            context.TestTables.Where(x => x.Surname.Contains("Surname")).Count(x => x.Id > 1000);
            context.TestTables.Where(x => x.Surname.Contains("Surname") && x.Name.Contains("Name")).Count(x => x.Id > 1000);

            Console.ReadLine();
        }
    }
}
```
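Results:
Any(): ~3 ms
Count(): ~230 ms for the first query, ~400 ms for the second
Remarks:
In my case, EF did not generate SQL like the one @Ben mentioned in his post.
EDIT: This was fixed in EF version 6.1.1, and this answer is no longer accurate.
For SQL Server and EF4-6, Count() performs about two times faster than Any().
When you run Table.Any(), it generates something like this (warning: don't hurt your brain trying to understand it):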
```sql
SELECT
CASE WHEN ( EXISTS (SELECT
    1 AS [C1]
    FROM [Table] AS [Extent1]
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
    1 AS [C1]
    FROM [Table] AS [Extent2]
)) THEN cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
```
That requires two scans of the rows matching your condition.
I don't like to write
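```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryExtensions
{
    // Expresses "does any row match?" while issuing a COUNT query underneath.
    public static bool Exists<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate)
    {
        return source.Count(predicate) > 0;
    }
}
```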
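It depends: how big is the data set, and what are your performance requirements?
If it is nothing gigantic, use the most readable form, which for me is Any, because it is shorter and reads as intent rather than as an equation.
You can run a simple test to figure this out: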
```csharp
// Requires: using System.Diagnostics; using System.Linq;
var query = /* make any query here */;

// Time the Count() > 0 check.
var timeCount = new Stopwatch();
timeCount.Start();
if (query.Count() > 0)
{
}
timeCount.Stop();
var testCount = timeCount.Elapsed;

// Time the Any() check.
var timeAny = new Stopwatch();
timeAny.Start();
if (query.Any())
{
}
timeAny.Stop();
var testAny = timeAny.Elapsed;
```
Then compare the values of testCount and testAny.
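A single Stopwatch run can be dominated by JIT compilation or first-query overhead, so it may help to warm up once and average over several runs. The helper below is a minimal sketch under that assumption; `Bench` and `Average` are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper: one warm-up call, then an average over repeated runs.
public static class Bench
{
    public static TimeSpan Average(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        // Warm-up run, not measured (JIT compilation, caches, connection setup).
        action();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        // Average elapsed time per call.
        return TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / iterations);
    }
}
```

With a query in hand, `Bench.Average(() => query.Any(), 100)` and `Bench.Average(() => query.Count(), 100)` can then be compared directly.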
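About the Count() method: if the IEnumerable is an ICollection, we do not need to iterate over all the items, because we can read the ICollection's Count property. If the IEnumerable is not an ICollection, we must iterate over all the items using a while loop with MoveNext. Take a look at the .NET Framework code: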
```csharp
public static int Count<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
        throw Error.ArgumentNull("source");

    ICollection<TSource> collectionoft = source as ICollection<TSource>;
    if (collectionoft != null)
        return collectionoft.Count;

    ICollection collection = source as ICollection;
    if (collection != null)
        return collection.Count;

    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        checked
        {
            while (e.MoveNext()) count++;
        }
    }
    return count;
}
```
Reference: Reference Source, Enumerable
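For contrast, an existence check only needs a single MoveNext call. The following is a simplified sketch of that idea, not a copy of the framework source; `AnySketch` and `HasAny` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of an Any()-style check: stop after the first element.
public static class AnySketch
{
    public static bool HasAny<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            // One successful MoveNext proves the sequence is non-empty.
            return e.MoveNext();
        }
    }
}
```

Because it never advances past the first element, the cost does not grow with the length of the sequence, which is why this kind of check suits sequences that are not an ICollection.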
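If you are using Entity Framework and have a huge table with many records, Any() will be much faster. I remember one time I wanted to check whether a table was empty, and it had millions of rows. Count() > 0 took 20-30 seconds to complete; Any() was instant.
Any() can be a performance enhancement because it may not have to iterate the collection to get the number of items; it only has to hit one of them. Or, for LINQ to Entities for example, the generated SQL will be IF EXISTS(...) rather than SELECT COUNT ... or even SELECT * ....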