C#: LINQ – Full Outer Join

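I have a list of people's IDs and their first names, and a list of people's IDs and their surnames. Some people don't have a first name and some don't have a surname; I'd like to do a full outer join on the two lists.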

So the following lists:

ID  FirstName
--  ---------
 1  John
 2  Sue

ID  LastName
--  --------
 1  Doe
 3  Smith

Should produce:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue
 3             Smith

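I'm new to LINQ (so forgive me if I'm being lame) and have found quite a few solutions for "LINQ outer joins" which all look quite similar, but really seem to be left outer joins.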

My attempts so far go something like this:

private void OuterJoinTest()
{
    List<FirstName> firstNames = new List<FirstName>();
    firstNames.Add(new FirstName { ID = 1, Name ="John" });
    firstNames.Add(new FirstName { ID = 2, Name ="Sue" });

    List<LastName> lastNames = new List<LastName>();
    lastNames.Add(new LastName { ID = 1, Name ="Doe" });
    lastNames.Add(new LastName { ID = 3, Name ="Smith" });

    var outerJoin = from first in firstNames
        join last in lastNames
        on first.ID equals last.ID
        into temp
        from last in temp.DefaultIfEmpty()
        select new
        {
            id = first != null ? first.ID : last.ID,
            firstname = first != null ? first.Name : string.Empty,
            surname = last != null ? last.Name : string.Empty
        };
    }
}

public class FirstName
{
    public int ID;

    public string Name;
}

public class LastName
{
    public int ID;

    public string Name;
}

But this returns:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue

What am I doing wrong?


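Update 1: providing a truly generalized extension method FullOuterJoin
Update 2: optionally accepting a custom IEqualityComparer for the key type
Update 3: this implementation has recently become part of MoreLinq - Thanks guys!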

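Edit: Added FullOuterGroupJoin (ideone). I reused the GetOuter<> implementation, which makes this a fraction less performant than it could be, but right now I'm aiming for "high-level" code, not bleeding-edge optimization.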

See it live on http://ideone.com/o36nwc

static void Main(string[] args)
{
    var ax = new[] {
        new { id = 1, name ="John" },
        new { id = 2, name ="Sue" } };
    var bx = new[] {
        new { id = 1, surname ="Doe" },
        new { id = 3, surname ="Smith" } };

    ax.FullOuterJoin(bx, a => a.id, b => b.id, (a, b, id) => new {a, b})
        .ToList().ForEach(Console.WriteLine);
}

Prints:

{ a = { id = 1, name = John }, b = { id = 1, surname = Doe } }
{ a = { id = 2, name = Sue }, b =  }
{ a = , b = { id = 3, surname = Smith } }

You can also supply defaults: http://ideone.com/kg4kqo

    ax.FullOuterJoin(
            bx, a => a.id, b => b.id,
            (a, b, id) => new { a.name, b.surname },
            new { id = -1, name    ="(no firstname)" },
            new { id = -2, surname ="(no surname)" }
        )

Prints:

{ name = John, surname = Doe }
{ name = Sue, surname = (no surname) }
{ name = (no firstname), surname = Smith }

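Explanation of terms used:

Joining is a term borrowed from relational database design:

  • A join repeats elements from a as many times as there are elements in b with the corresponding key (i.e. nothing if b were empty). Database lingo calls this an inner (equi)join.
  • An outer join also includes elements from a for which no corresponding element exists in b (i.e. results even if b were empty). This is usually referred to as a left join.
  • A full outer join includes records from a as well as from b when no corresponding element exists in the other (i.e. results even if a were empty).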
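To make the three definitions concrete, here is a small sketch built only from the standard Join, GroupJoin, SelectMany and DefaultIfEmpty operators, using the question's sample data as value tuples. The class name JoinKinds and the string formatting are illustrative choices and do not come from the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: inner, left and full outer joins on the question's data.
public static class JoinKinds
{
    static readonly (int Id, string First)[] A = { (1, "John"), (2, "Sue") };
    static readonly (int Id, string Last)[] B = { (1, "Doe"), (3, "Smith") };

    // Inner (equi)join: one row per matching (a, b) pair; keys 2 and 3 are dropped.
    public static List<string> Inner() =>
        A.Join(B, a => a.Id, b => b.Id, (a, b) => $"{a.Id} {a.First} {b.Last}").ToList();

    // Left outer join: every element of A; the last name is null when B has no match.
    public static List<string> Left() =>
        A.GroupJoin(B, a => a.Id, b => b.Id, (a, bs) => new { a, bs })
         .SelectMany(x => x.bs.Select(b => b.Last).DefaultIfEmpty(),
                     (x, last) => $"{x.a.Id} {x.a.First} {last}")
         .ToList();

    // Full outer join: the left join plus the B elements whose key never occurs in A.
    public static List<string> Full()
    {
        var keysA = new HashSet<int>(A.Select(a => a.Id));
        var rightOnly = B.Where(b => !keysA.Contains(b.Id))
                         .Select(b => $"{b.Id}  {b.Last}");
        return Left().Concat(rightOnly).ToList();
    }
}
```

Interpolating a null string produces an empty string, so Sue's left-join row is "2 Sue " with a trailing space.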

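Something not usually seen in RDBMS is a group join [1]:

  • A group join does the same as described above, but instead of repeating elements from a for multiple corresponding b, it groups the records that share the corresponding key. This is often more convenient when you wish to enumerate through "joined" records based on a common key.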
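The contrast between a plain join and a group join shows up clearly when one key has several matches. The sketch below uses made-up nickname data. GroupJoinDemo, Flat and Grouped are illustrative names. Flat is a regular Join, so John appears once per nickname and Sue disappears. Grouped is a GroupJoin, so each element of A appears exactly once with its matches collected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative group join: each element of A appears once, paired with the
// (possibly empty) group of B elements sharing its key. Nicknames are made up.
public static class GroupJoinDemo
{
    static readonly (int Id, string Name)[] A = { (1, "John"), (2, "Sue") };
    static readonly (int Id, string Nick)[] B = { (1, "Johnny"), (1, "Jack") };

    // Plain join: John is repeated once per matching nickname, Sue is dropped.
    public static List<string> Flat() =>
        A.Join(B, a => a.Id, b => b.Id, (a, b) => a.Name + ": " + b.Nick).ToList();

    // Group join: one row per element of A, with all of its matches grouped.
    public static List<string> Grouped() =>
        A.GroupJoin(B, a => a.Id, b => b.Id,
                    (a, bs) => a.Name + ": " + string.Join(", ", bs.Select(b => b.Nick)))
         .ToList();
}
```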

See also GroupJoin, which contains some general background explanations as well.

[1] (I believe Oracle and MSSQL have proprietary extensions for this.)

Full code

A generalized "drop-in" extension class for this:

using System;
using System.Collections.Generic;
using System.Linq;

internal static class MyExtensions
{
    internal static IEnumerable<TResult> FullOuterGroupJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA,
        Func<TB, TKey> selectKeyB,
        Func<IEnumerable<TA>, IEnumerable<TB>, TKey, TResult> projection,
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   let xa = alookup[key]
                   let xb = blookup[key]
                   select projection(xa, xb, key);

        return join;
    }

    internal static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA,
        Func<TB, TKey> selectKeyB,
        Func<TA, TB, TKey, TResult> projection,
        TA defaultA = default(TA),
        TB defaultB = default(TB),
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   from xa in alookup[key].DefaultIfEmpty(defaultA)
                   from xb in blookup[key].DefaultIfEmpty(defaultB)
                   select projection(xa, xb, key);

        return join;
    }
}


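I don't know if this covers all cases, but logically it seems correct. The idea is to take a left outer join and a right outer join, then take the union of the results.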

var firstNames = new[]
{
    new { ID = 1, Name ="John" },
    new { ID = 2, Name ="Sue" },
};
var lastNames = new[]
{
    new { ID = 1, Name ="Doe" },
    new { ID = 3, Name ="Smith" },
};
var leftOuterJoin =
    from first in firstNames
    join last in lastNames on first.ID equals last.ID into temp
    from last in temp.DefaultIfEmpty()
    select new
    {
        first.ID,
        FirstName = first.Name,
        LastName = last?.Name,
    };
var rightOuterJoin =
    from last in lastNames
    join first in firstNames on last.ID equals first.ID into temp
    from first in temp.DefaultIfEmpty()
    select new
    {
        last.ID,
        FirstName = first?.Name,
        LastName = last.Name,
    };
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);

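This works as written because it is LINQ to Objects. With LINQ to SQL or another provider, the query processor might not support safe navigation (?.) or other operations. In that case you have to use the conditional operator to get the values conditionally.

i.e.,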

var leftOuterJoin =
    from first in firstNames
    join last in lastNames on first.ID equals last.ID into temp
    from last in temp.DefaultIfEmpty()
    select new
    {
        first.ID,
        FirstName = first.Name,
        LastName = last != null ? last.Name : default,
    };


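I think there are problems with most of these, including the accepted answer, because they don't work well with LINQ over IQueryable. Either they make too many server round trips and return too much data, or they do too much client-side execution.

For IEnumerable, I don't like sehe's answer or similar ones because of their excessive memory use (a simple two-list test with 10,000,000 elements ran LINQPad out of memory on my 32 GB machine).

Also, most of the others don't actually implement a proper full outer join. They use a Union with a right join instead of a Concat with a right anti-semi-join. Union not only eliminates the duplicate inner-join rows from the result, but also any genuine duplicates that existed originally in the left or right data.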
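To see the duplicate problem concretely, here is a small LINQ-to-Objects sketch. The data is hypothetical: "Sue" genuinely appears twice in the left input. The sketch compares Union of a left and a right join with Concat of a left join and a right anti-semi-join. Rows are value tuples, so Union compares them field by field. The names UnionVsConcat, ViaUnion and ViaConcat are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionVsConcat
{
    // Hypothetical data: the (2, "Sue") row genuinely occurs twice on the left.
    static readonly (int Id, string Name)[] FirstNames = { (1, "John"), (2, "Sue"), (2, "Sue") };
    static readonly (int Id, string Name)[] LastNames = { (1, "Doe"), (3, "Smith") };

    static IEnumerable<(int Id, string First, string Last)> LeftJoin() =>
        from f in FirstNames
        join l in LastNames on f.Id equals l.Id into g
        from l in g.DefaultIfEmpty()          // default tuple: (0, null)
        select (Id: f.Id, First: f.Name, Last: l.Name);

    static IEnumerable<(int Id, string First, string Last)> RightJoin() =>
        from l in LastNames
        join f in FirstNames on l.Id equals f.Id into g
        from f in g.DefaultIfEmpty()
        select (Id: l.Id, First: f.Name, Last: l.Name);

    // Left join UNION right join: Union de-duplicates, so the two Sue rows collapse.
    public static List<(int Id, string First, string Last)> ViaUnion() =>
        LeftJoin().Union(RightJoin()).ToList();

    // Left join CONCAT right anti-semi-join: keeps every left row, then adds only
    // the right rows whose key never appears on the left.
    public static List<(int Id, string First, string Last)> ViaConcat()
    {
        var leftKeys = new HashSet<int>(FirstNames.Select(f => f.Id));
        var rightOnly = LastNames
            .Where(l => !leftKeys.Contains(l.Id))
            .Select(l => (Id: l.Id, First: (string)null, Last: l.Name));
        return LeftJoin().Concat(rightOnly).ToList();
    }
}
```

ViaUnion returns three rows with a single Sue. ViaConcat returns four rows and keeps both Sue rows, which is what a relational FULL OUTER JOIN would produce for this data.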

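So here are my extensions, which handle all of these issues. They generate SQL that is as good as implementing the join in LINQ directly, and they execute on the server. On enumerables they are faster and use less memory than the others: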

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class Ext {
    public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        return from left in leftItems
               join right in rightItems on leftKeySelector(left) equals rightKeySelector(right) into temp
               from right in temp.DefaultIfEmpty()
               select resultSelector(left, right);
    }

    public static IEnumerable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        return from right in rightItems
               join left in leftItems on rightKeySelector(right) equals leftKeySelector(left) into temp
               from left in temp.DefaultIfEmpty()
               select resultSelector(left, right);
    }

    public static IEnumerable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    public static IEnumerable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) where TLeft : class {

        var hashLK = new HashSet<TKey>(from l in leftItems select leftKeySelector(l));
        return rightItems.Where(r => !hashLK.Contains(rightKeySelector(r))).Select(r => resultSelector((TLeft)null,r));
    }

    public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector)  where TLeft : class {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;

    public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

        var sampleAnonLR = new { left = (TLeft)null, rightg = (IEnumerable<TRight>)null };
        var parmP = Expression.Parameter(sampleAnonLR.GetType(),"p");
        var parmC = Expression.Parameter(typeof(TRight),"c");
        var argLeft = Expression.PropertyOrField(parmP,"left");
        var newleftrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, parmC), parmP, parmC), sampleAnonLR, (TRight)null, (TResult)null);

        return leftItems.AsQueryable().GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
    }

    public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

        var sampleAnonLR = new { leftg = (IEnumerable<TLeft>)null, right = (TRight)null };
        var parmP = Expression.Parameter(sampleAnonLR.GetType(),"p");
        var parmC = Expression.Parameter(typeof(TLeft),"c");
        var argRight = Expression.PropertyOrField(parmP,"right");
        var newrightrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, parmC, argRight), parmP, parmC), sampleAnonLR, (TLeft)null, (TResult)null);

        return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
    }

    public static IQueryable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    private static Expression<Func<TP, TResult>> CastSBody<TP, TResult>(LambdaExpression ex, TP unusedP, TResult unusedRes) => (Expression<Func<TP, TResult>>)ex;

    public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

        var sampleAnonLgR = new { leftg = (IEnumerable<TLeft>)null, right = (TRight)null };
        var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(),"lgr");
        var argLeft = Expression.Constant(null, typeof(TLeft));
        var argRight = Expression.PropertyOrField(parmLgR,"right");
        var newrightrs = CastSBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, argRight), parmLgR), sampleAnonLgR, (TResult)null);

        return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
    }

    public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }
}

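The difference a right anti-semi-join makes is mostly moot with LINQ to Objects or in the source. It does matter on the server (SQL) side in the final query, where it removes an unnecessary JOIN.

The hand-coded Expression manipulation that merges an Expression<Func<>> into a lambda could be improved with LinqKit, but it would be nice if the language/compiler offered some help for that. The FullOuterJoinDistinct and RightOuterJoin functions are included for completeness, but I have not re-implemented FullOuterGroupJoin yet.

I wrote another version of a full outer join for IEnumerable for cases where the key is orderable. It is about 50% faster than combining the left outer join with the right anti-semi-join, at least on small collections. It sorts each collection and then goes through it just once.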
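That orderable-key version is not included in the answer. The following is a hedged sketch of one way a sort-merge full outer join can look; it is not the author's code. The name FullOuterJoinSorted and its signature are assumptions, and the 50% figure above is the author's measurement, which this sketch does not demonstrate. The approach is to sort both sides once by key and then walk them in step. Runs of equal keys are cross-multiplied, so duplicate keys behave as in a relational join.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeJoin
{
    // Hypothetical sort-merge full outer join (sketch). Both inputs are sorted
    // once with the same comparer that drives the merge.
    public static IEnumerable<TResult> FullOuterJoinSorted<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKey,
        Func<TRight, TKey> rightKey,
        Func<TLeft, TRight, TResult> resultSelector,
        IComparer<TKey> comparer = null)
    {
        comparer = comparer ?? Comparer<TKey>.Default;
        var l = left.OrderBy(leftKey, comparer).ToList();
        var r = right.OrderBy(rightKey, comparer).ToList();
        int i = 0, j = 0;

        while (i < l.Count || j < r.Count)
        {
            int cmp = i >= l.Count ? 1
                    : j >= r.Count ? -1
                    : comparer.Compare(leftKey(l[i]), rightKey(r[j]));

            if (cmp < 0)
            {
                // Left key is smaller: no partner on the right.
                yield return resultSelector(l[i++], default(TRight));
            }
            else if (cmp > 0)
            {
                // Right key is smaller: no partner on the left.
                yield return resultSelector(default(TLeft), r[j++]);
            }
            else
            {
                // Equal keys: find the run of this key on each side and emit the cross product.
                var key = leftKey(l[i]);
                int iEnd = i, jEnd = j;
                while (iEnd < l.Count && comparer.Compare(leftKey(l[iEnd]), key) == 0) iEnd++;
                while (jEnd < r.Count && comparer.Compare(rightKey(r[jEnd]), key) == 0) jEnd++;
                for (int x = i; x < iEnd; x++)
                    for (int y = j; y < jEnd; y++)
                        yield return resultSelector(l[x], r[y]);
                i = iEnd;
                j = jEnd;
            }
        }
    }
}
```

Buffering the sorted inputs in lists is a simplification. OrderBy must buffer its whole input anyway, so the main cost is the sort.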


Here's an extension method:

public static IEnumerable<KeyValuePair<TLeft, TRight>> FullOuterJoin<TLeft, TRight>(this IEnumerable<TLeft> leftItems, Func<TLeft, object> leftIdSelector, IEnumerable<TRight> rightItems, Func<TRight, object> rightIdSelector)
{
    var leftOuterJoin = from left in leftItems
        join right in rightItems on leftIdSelector(left) equals rightIdSelector(right) into temp
        from right in temp.DefaultIfEmpty()
        select new { left, right };

    var rightOuterJoin = from right in rightItems
        join left in leftItems on rightIdSelector(right) equals leftIdSelector(left) into temp
        from left in temp.DefaultIfEmpty()
        select new { left, right };

    var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);

    return fullOuterJoin.Select(x => new KeyValuePair<TLeft, TRight>(x.left, x.right));
}


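As you've found, LINQ doesn't have an "outer join" construct. The closest you can get is a left outer join, using the query you stated. To this, you can add any elements of the last-name list that aren't represented in the join: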

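// Keep a separate reference to the left join: a lambda that captured `outerJoin`
// itself would see the reassigned (concatenated) sequence and recurse into it.
var leftJoin = outerJoin;
outerJoin = leftJoin.Concat(lastNames
    .Where(l => !leftJoin.Any(o => o.id == l.ID))
    .Select(l => new
    {
        id = l.ID,
        firstname = String.Empty,
        surname = l.Name
    }));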
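The Any filter above rescans the whole left join for every last name, which is quadratic in the input sizes. Below is a sketch of the same "left join plus missing last names" idea that builds the set of first-name IDs once with a HashSet. The FirstName and LastName classes are redeclared here to mirror the question's classes. LeftJoinPlusMissing and the tuple result are illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Redeclared to mirror the question's FirstName/LastName classes.
public class FirstName { public int ID; public string Name; }
public class LastName { public int ID; public string Name; }

public static class LeftJoinPlusMissing
{
    public static List<(int Id, string First, string Last)> Run(
        IEnumerable<FirstName> firstNames, IEnumerable<LastName> lastNames)
    {
        // Left outer join, as in the question.
        var leftJoin =
            from first in firstNames
            join last in lastNames on first.ID equals last.ID into temp
            from last in temp.DefaultIfEmpty()
            select (Id: first.ID, First: first.Name, Last: last?.Name ?? string.Empty);

        // Build the set of first-name IDs once, so each lookup below is O(1)
        // instead of a scan over the whole join.
        var firstIds = new HashSet<int>(firstNames.Select(f => f.ID));

        // Last names whose ID never matched a first name.
        var missing = lastNames
            .Where(l => !firstIds.Contains(l.ID))
            .Select(l => (Id: l.ID, First: string.Empty, Last: l.Name));

        return leftJoin.Concat(missing).ToList();
    }
}
```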

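I guess @sehe's approach is stronger, but until I understand it better, I find myself leap-frogging off of @MichaelSander's extension. I modified it to match the syntax and return type of the built-in Enumerable.Join() method described here. I appended the "distinct" suffix in response to @cadrell0's comment under @JeffMercado's solution.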

using System;
using System.Collections.Generic;
using System.Linq;

public static class MyExtensions {

    public static IEnumerable<TResult> FullJoinDistinct<TLeft, TRight, TKey, TResult> (
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector
    ) {

        var leftJoin =
            from left in leftItems
            join right in rightItems
              on leftKeySelector(left) equals rightKeySelector(right) into temp
            from right in temp.DefaultIfEmpty()
            select resultSelector(left, right);

        var rightJoin =
            from right in rightItems
            join left in leftItems
              on rightKeySelector(right) equals leftKeySelector(left) into temp
            from left in temp.DefaultIfEmpty()
            select resultSelector(left, right);

        return leftJoin.Union(rightJoin);
    }

}

In the example, you would use it like this:

var test =
    firstNames
    .FullJoinDistinct(
        lastNames,
        f=> f.ID,
        j=> j.ID,
        (f,j)=> new {
            ID = f == null ? j.ID : f.ID,
            leftName = f == null ? null : f.Name,
            rightName = j == null ? null : j.Name
        }
    );

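In the future, as I learn more, I have a feeling I'll migrate to @sehe's logic given its popularity. Even then I'll have to be careful, because I feel it is important to have at least one overload that matches the syntax of the existing ".Join()" method if feasible, for two reasons:

  • Consistency in methods helps save time, avoid errors, and avoid unintended behavior.
  • If there is ever an out-of-the-box ".FullJoin()" method in the future, I would imagine it will try to keep to the syntax of the existing ".Join()" method if it can. If it does and you want to migrate to it, you can simply rename your functions without changing the parameters or worrying about different return types breaking your code.
  • I'm still new to generics, extensions, Func statements, and other features, so feedback is certainly welcome.

EDIT: It didn't take me long to realize there was a problem with my code. I was doing a .Dump() in LINQPad and looking at the return type. It was just IEnumerable, so I tried to match it. But when I actually did a .Where() or .Select() on my extension, I got an error: "'System.Collections.IEnumerable' does not contain a definition for 'Select' and ...". So in the end I was able to match the input syntax of .Join(), but not the return behavior.

EDIT: Added "TResult" to the return type of the function. I missed that when reading the Microsoft article, and of course it makes sense. With this fix, the return behavior now seems to be in line with my goals after all.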


My solution, for the case where the keys are unique in both enumerables:

     private static IEnumerable<TResult> FullOuterJoin<Ta, Tb, TKey, TResult>(
                IEnumerable<Ta> a, IEnumerable<Tb> b,
                Func<Ta, TKey> key_a, Func<Tb, TKey> key_b,
                Func<Ta, Tb, TResult> selector)
            {
                var alookup = a.ToLookup(key_a);
                var blookup = b.ToLookup(key_b);
                var keys = new HashSet<TKey>(alookup.Select(p => p.Key));
                keys.UnionWith(blookup.Select(p => p.Key));
                return keys.Select(key => selector(alookup[key].FirstOrDefault(), blookup[key].FirstOrDefault()));
            }

So:

        var ax = new[] {
            new { id = 1, first_name ="ali" },
            new { id = 2, first_name ="mohammad" } };
        var bx = new[] {
            new { id = 1, last_name ="rezaei" },
            new { id = 3, last_name ="kazemi" } };

        var list = FullOuterJoin(ax, bx, a => a.id, b => b.id, (a, b) => "f: " + a?.first_name + " l: " + b?.last_name).ToArray();

Output:

    f: ali l: rezaei
    f: mohammad l:
    f:  l: kazemi

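I decided to add this as a separate answer, as I'm not positive it is tested enough. This is a re-implementation of the FullOuterJoin method that uses a simplified, customized version of LINQKit's Invoke/Expand for Expression, so that it should work with Entity Framework. There's not much explanation, as it is pretty much the same as my previous answer.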

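    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Linq.Expressions;
    using System.Reflection;

    public static class Ext {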
        private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;

        public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
            this IQueryable<TLeft> leftItems,
            IQueryable<TRight> rightItems,
            Expression<Func<TLeft, TKey>> leftKeySelector,
            Expression<Func<TRight, TKey>> rightKeySelector,
            Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

            // (lrg,r) => resultSelector(lrg.left, r)
            var sampleAnonLR = new { left = (TLeft)null, rightg = (IEnumerable<TRight>)null };
            var parmP = Expression.Parameter(sampleAnonLR.GetType(),"lrg");
            var parmC = Expression.Parameter(typeof(TRight),"r");
            var argLeft = Expression.PropertyOrField(parmP,"left");
            var newleftrs = CastSMBody(Expression.Lambda(resultSelector.Apply(argLeft, parmC), parmP, parmC), sampleAnonLR, (TRight)null, (TResult)null);

            return leftItems.GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
        }

        public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
            this IQueryable<TLeft> leftItems,
            IQueryable<TRight> rightItems,
            Expression<Func<TLeft, TKey>> leftKeySelector,
            Expression<Func<TRight, TKey>> rightKeySelector,
            Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

            // (lgr,l) => resultSelector(l, lgr.right)
            var sampleAnonLR = new { leftg = (IEnumerable<TLeft>)null, right = (TRight)null };
            var parmP = Expression.Parameter(sampleAnonLR.GetType(),"lgr");
            var parmC = Expression.Parameter(typeof(TLeft),"l");
            var argRight = Expression.PropertyOrField(parmP,"right");
            var newrightrs = CastSMBody(Expression.Lambda(resultSelector.Apply(parmC, argRight), parmP, parmC), sampleAnonLR, (TLeft)null, (TResult)null);

            return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right })
                             .SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
        }

        private static Expression<Func<TParm, TResult>> CastSBody<TParm, TResult>(LambdaExpression ex, TParm unusedP, TResult unusedRes) => (Expression<Func<TParm, TResult>>)ex;

        public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
            this IQueryable<TLeft> leftItems,
            IQueryable<TRight> rightItems,
            Expression<Func<TLeft, TKey>> leftKeySelector,
            Expression<Func<TRight, TKey>> rightKeySelector,
            Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

            // newrightrs = lgr => resultSelector((TLeft)null, lgr.right)
            var sampleAnonLgR = new { leftg = (IEnumerable<TLeft>)null, right = (TRight)null };
            var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(),"lgr");
            var argLeft = Expression.Constant(null, typeof(TLeft));
            var argRight = Expression.PropertyOrField(parmLgR,"right");
            var newrightrs = CastSBody(Expression.Lambda(resultSelector.Apply(argLeft, argRight), parmLgR), sampleAnonLgR, (TResult)null);

            return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
        }

        public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
            this IQueryable<TLeft> leftItems,
            IQueryable<TRight> rightItems,
            Expression<Func<TLeft, TKey>> leftKeySelector,
            Expression<Func<TRight, TKey>> rightKeySelector,
            Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

            return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
        }

        public static Expression Apply(this LambdaExpression e, params Expression[] args) {
            var b = e.Body;

            foreach (var pa in e.Parameters.Cast<ParameterExpression>().Zip(args, (p, a) => (p, a))) {
                b = b.Replace(pa.p, pa.a);
            }

            return b.PropagateNull();
        }

        public static Expression Replace(this Expression orig, Expression from, Expression to) => new ReplaceVisitor(from, to).Visit(orig);
        public class ReplaceVisitor : System.Linq.Expressions.ExpressionVisitor {
            public readonly Expression from;
            public readonly Expression to;

            public ReplaceVisitor(Expression _from, Expression _to) {
                from = _from;
                to = _to;
            }

            public override Expression Visit(Expression node) => node == from ? to : base.Visit(node);
        }

        public static Expression PropagateNull(this Expression orig) => new NullVisitor().Visit(orig);
        public class NullVisitor : System.Linq.Expressions.ExpressionVisitor {
            public override Expression Visit(Expression node) {
                if (node is MemberExpression nme && nme.Expression is ConstantExpression nce && nce.Value == null)
                    return Expression.Constant(null, nce.Type.GetMember(nme.Member.Name).Single().GetMemberType());
                else
                    return base.Visit(node);
            }
        }

        public static Type GetMemberType(this MemberInfo member) {
            switch (member) {
                case FieldInfo mfi:
                    return mfi.FieldType;
                case PropertyInfo mpi:
                    return mpi.PropertyType;
                case EventInfo mei:
                    return mei.EventHandlerType;
                default:
                    throw new ArgumentException("MemberInfo must be of type FieldInfo, PropertyInfo or EventInfo", nameof(member));
            }
        }
    }
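The core of this answer is Apply. Instead of wrapping resultSelector in Expression.Invoke (as the previous answer does), which many LINQ providers cannot translate, Apply uses ReplaceVisitor to splice the argument expressions directly into the lambda's body. Below is a stripped-down sketch of just that substitution step, without the PropagateNull pass. InlineDemo, Substitute, Inline and Compose are illustrative names and are not part of the answer.

```csharp
using System;
using System.Linq.Expressions;

public static class InlineDemo
{
    // Replaces one node (here: a lambda parameter) with another expression.
    sealed class Substitute : ExpressionVisitor
    {
        private readonly Expression target;
        private readonly Expression replacement;

        public Substitute(Expression target, Expression replacement)
        {
            this.target = target;
            this.replacement = replacement;
        }

        public override Expression Visit(Expression node) =>
            node == target ? replacement : base.Visit(node);
    }

    // Returns the lambda's body with each parameter replaced by the matching argument.
    public static Expression Inline(LambdaExpression lambda, params Expression[] args)
    {
        if (args.Length != lambda.Parameters.Count)
            throw new ArgumentException("Argument count does not match the lambda's parameters.");
        var body = lambda.Body;
        for (int i = 0; i < args.Length; i++)
            body = new Substitute(lambda.Parameters[i], args[i]).Visit(body);
        return body;
    }

    // Builds p => selector(p * 2, 1) with the selector's body spliced in,
    // so the resulting tree contains no InvocationExpression.
    public static Expression<Func<int, int>> Compose(Expression<Func<int, int, int>> selector)
    {
        var p = Expression.Parameter(typeof(int), "p");
        var body = Inline(selector,
                          Expression.Multiply(p, Expression.Constant(2)),
                          Expression.Constant(1));
        return Expression.Lambda<Func<int, int>>(body, p);
    }
}
```

For example, composing (a, b) => a + b yields a tree equivalent to p => p * 2 + 1, which evaluates to 11 for p = 5.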

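Performs an in-memory streaming enumeration over both inputs and invokes the selector for each row. If there is no correlation at the current iteration, one of the selector arguments will be null.

Example: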

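       var result = left.FullOuterJoin(
             right,
             x => x.Key,
             x => x.Key,
             (l, r) => new { LeftKey = l?.Key, RightKey = r?.Key });

  • Requires an IComparer for the correlated key type; Comparer<TValue>.Default is used if none is provided.

  • Relies on "OrderBy" being applied to the input enumerables (the method below does this itself, using the key selectors).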

      /// <summary>
      /// Performs a full outer join on two <see cref="IEnumerable{T}" />.
      /// </summary>
      /// <typeparam name="TLeft"></typeparam>
      /// <typeparam name="TValue"></typeparam>
      /// <typeparam name="TRight"></typeparam>
      /// <typeparam name="TResult"></typeparam>
      /// <param name="left"></param>
      /// <param name="right"></param>
      /// <param name="leftKeySelector"></param>
      /// <param name="rightKeySelector"></param>
      /// <param name="selector">Expression defining result type</param>
      /// <param name="keyComparer">A comparer if there is no default for the type</param>
      /// <returns></returns>
      [System.Diagnostics.DebuggerStepThrough]
      public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TValue, TResult>(
          this IEnumerable<TLeft> left,
          IEnumerable<TRight> right,
          Func<TLeft, TValue> leftKeySelector,
          Func<TRight, TValue> rightKeySelector,
          Func<TLeft, TRight, TResult> selector,
          IComparer<TValue> keyComparer = null)
          where TLeft: class
          where TRight: class
          where TValue : IComparable
      {

          keyComparer = keyComparer ?? Comparer<TValue>.Default;

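          // Sort with the same comparer that drives the merge below.
          using (var enumLeft = left.OrderBy(leftKeySelector, keyComparer).GetEnumerator())
          using (var enumRight = right.OrderBy(rightKeySelector, keyComparer).GetEnumerator())
          {

              var hasLeft = enumLeft.MoveNext();
              var hasRight = enumRight.MoveNext();
              while (hasLeft || hasRight)
              {

                  // Only read Current from an enumerator that is still positioned on an item.
                  var currentLeft = hasLeft ? enumLeft.Current : default(TLeft);
                  var valueLeft = hasLeft ? leftKeySelector(currentLeft) : default(TValue);

                  var currentRight = hasRight ? enumRight.Current : default(TRight);
                  var valueRight = hasRight ? rightKeySelector(currentRight) : default(TValue);

                  // Compare may return any negative or positive number, so normalize it.
                  // Keys are assumed unique within each input: with duplicates, matches
                  // are consumed one-to-one and surplus rows get a null partner.
                  int compare =
                      !hasLeft ? 1
                      : !hasRight ? -1
                      : System.Math.Sign(keyComparer.Compare(valueLeft, valueRight));

                  switch (compare)
                  {
                      case 0:
                          // The keys match: emit an inner-join row and advance both sides.
                          yield return selector(currentLeft, currentRight);
                          hasLeft = enumLeft.MoveNext();
                          hasRight = enumRight.MoveNext();
                          break;
                      case -1:
                          // The left key is smaller: it has no partner on the right.
                          yield return selector(currentLeft, default(TRight));
                          hasLeft = enumLeft.MoveNext();
                          break;
                      case 1:
                          // The right key is smaller: it has no partner on the left.
                          yield return selector(default(TLeft), currentRight);
                          hasRight = enumRight.MoveNext();
                          break;
                  }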
              }

          }

      }


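I like sehe's answer, but it does not use deferred execution (the input sequences are eagerly enumerated by the calls to ToLookup). So, after looking at the .NET sources for LINQ to Objects, I came up with this: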

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class LinqExtensions
    {
        public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
            this IEnumerable<TLeft> left,
            IEnumerable<TRight> right,
            Func<TLeft, TKey> leftKeySelector,
            Func<TRight, TKey> rightKeySelector,
            Func<TLeft, TRight, TKey, TResult> resultSelector,
            IEqualityComparer<TKey> comparator = null,
            TLeft defaultLeft = default(TLeft),
            TRight defaultRight = default(TRight))
        {
            if (left == null) throw new ArgumentNullException("left");
            if (right == null) throw new ArgumentNullException("right");
            if (leftKeySelector == null) throw new ArgumentNullException("leftKeySelector");
            if (rightKeySelector == null) throw new ArgumentNullException("rightKeySelector");
            if (resultSelector == null) throw new ArgumentNullException("resultSelector");

            comparator = comparator ?? EqualityComparer<TKey>.Default;
            return FullOuterJoinIterator(left, right, leftKeySelector, rightKeySelector, resultSelector, comparator, defaultLeft, defaultRight);
        }

        // Kept separate so the argument checks above run eagerly while the join itself is deferred
        private static IEnumerable<TResult> FullOuterJoinIterator<TLeft, TRight, TKey, TResult>(
            IEnumerable<TLeft> left,
            IEnumerable<TRight> right,
            Func<TLeft, TKey> leftKeySelector,
            Func<TRight, TKey> rightKeySelector,
            Func<TLeft, TRight, TKey, TResult> resultSelector,
            IEqualityComparer<TKey> comparator,
            TLeft defaultLeft,
            TRight defaultRight)
        {
            var leftLookup = left.ToLookup(leftKeySelector, comparator);
            var rightLookup = right.ToLookup(rightKeySelector, comparator);
            var keys = leftLookup.Select(g => g.Key).Union(rightLookup.Select(g => g.Key), comparator);

            foreach (var key in keys)
                foreach (var leftValue in leftLookup[key].DefaultIfEmpty(defaultLeft))
                    foreach (var rightValue in rightLookup[key].DefaultIfEmpty(defaultRight))
                        yield return resultSelector(leftValue, rightValue, key);
        }
    }

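    This implementation has the following important properties:

    • Deferred execution: the input sequences are not enumerated until the output sequence is enumerated.
    • Each input sequence is enumerated only once.
    • The order of the input sequences is preserved, in the sense that tuples are yielded in the order of the left sequence, followed by the right sequence (for keys not present in the left sequence).

    These properties matter because they are what someone who is new to FullOuterJoin but experienced with LINQ will expect.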
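A minimal sketch for checking the first and third properties on the question's data. To compile on its own it carries a condensed copy of the lookup-based FullOuterJoin above (argument checks, comparer and default-value parameters dropped); `DeferredJoinDemo`, `Tracked` and `Run` are hypothetical names used only for this illustration, and `Tracked` simply counts how often an input sequence is started.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: condensed copy of the lookup-based FullOuterJoin plus a check.
public static class DeferredJoinDemo
{
    public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TKey, TResult> resultSelector)
    {
        // Iterator method: none of this runs until the result is enumerated.
        var leftLookup = left.ToLookup(leftKeySelector);
        var rightLookup = right.ToLookup(rightKeySelector);
        var keys = leftLookup.Select(g => g.Key).Union(rightLookup.Select(g => g.Key));

        foreach (var key in keys)
            foreach (var l in leftLookup[key].DefaultIfEmpty())
                foreach (var r in rightLookup[key].DefaultIfEmpty())
                    yield return resultSelector(l, r, key);
    }

    // Hypothetical helper: calls onStart each time the wrapped sequence is enumerated.
    static IEnumerable<T> Tracked<T>(IEnumerable<T> source, Action onStart)
    {
        onStart();
        foreach (var item in source)
            yield return item;
    }

    // Returns how many input enumerations happened before and after enumerating
    // the join, plus the joined rows formatted as "id|first|last".
    public static (int before, int after, List<string> rows) Run()
    {
        int starts = 0;
        var firstNames = new[] { (ID: 1, Name: "John"), (ID: 2, Name: "Sue") };
        var lastNames = new[] { (ID: 1, Name: "Doe"), (ID: 3, Name: "Smith") };

        var query = Tracked(firstNames, () => starts++)
            .FullOuterJoin(
                Tracked(lastNames, () => starts++),
                f => f.ID,
                l => l.ID,
                (f, l, id) => $"{id}|{f.Name}|{l.Name}");

        int before = starts;        // query built, inputs not touched yet
        var rows = query.ToList();  // each input is enumerated exactly once here
        return (before, starts, rows);
    }
}
```

`Run()` should report 0 enumerations before `ToList()` and 2 after, with rows in left-then-right key order: `1|John|Doe`, `2|Sue|`, `3||Smith`. Keeping the `yield` body in a separate method from the argument checks, as the answer does, is what lets the null checks fire immediately while the lookups are still built lazily.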


    Full outer join of two or more tables: first extract the column you want to join on.

    var DatesA = from A in db.T1 select A.Date;
    var DatesB = from B in db.T2 select B.Date;
    var DatesC = from C in db.T3 select C.Date;            

    var Dates = DatesA.Union(DatesB).Union(DatesC);

    Then use left outer joins between the extracted column and each of the main tables.

    var Full_Outer_Join =

    (from A in Dates
    join B in db.T1
    on A equals B.Date into AB

    from ab in AB.DefaultIfEmpty()
    join C in db.T2
    on A equals C.Date into ABC

    from abc in ABC.DefaultIfEmpty()
    join D in db.T3
    on A equals D.Date into ABCD

    from abcd in ABCD.DefaultIfEmpty()
    select new { A, ab, abc, abcd })
    .AsEnumerable();
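The same two steps (union of the keys, then one left outer join per source) can be tried in memory with LINQ to Objects. This is a sketch on the question's FirstName/LastName data rather than the `db.T1` to `db.T3` tables; `KeyUnionJoinDemo` and `Run` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class FirstName { public int ID; public string Name; }
public class LastName { public int ID; public string Name; }

// Hypothetical demo class for the "union the keys, then left join" approach.
public static class KeyUnionJoinDemo
{
    public static List<string> Run()
    {
        var firstNames = new List<FirstName>
        {
            new FirstName { ID = 1, Name = "John" },
            new FirstName { ID = 2, Name = "Sue" },
        };
        var lastNames = new List<LastName>
        {
            new LastName { ID = 1, Name = "Doe" },
            new LastName { ID = 3, Name = "Smith" },
        };

        // Step 1: the distinct set of join keys taken from every source.
        var ids = firstNames.Select(f => f.ID).Union(lastNames.Select(l => l.ID));

        // Step 2: left outer join each source onto that key set;
        // a missing side comes back as null from DefaultIfEmpty().
        var rows =
            from id in ids
            join f in firstNames on id equals f.ID into fs
            from f in fs.DefaultIfEmpty()
            join l in lastNames on id equals l.ID into ls
            from l in ls.DefaultIfEmpty()
            select $"{id}|{f?.Name}|{l?.Name}";

        return rows.ToList();
    }
}
```

Because every key comes from the union, no row can be lost the way the question's single left join loses ID 3; `Run()` yields `1|John|Doe`, `2|Sue|`, `3||Smith`.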


    I wrote this extension class for an app about 6 years ago and have used it in many solutions since then without problems. Hope it helps.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class JoinExtensions
    {
        public static IEnumerable<TResult> FullOuterJoin<TOuter, TInner, TKey, TResult>(
            this IEnumerable<TOuter> outer,
            IEnumerable<TInner> inner,
            Func<TOuter, TKey> outerKeySelector,
            Func<TInner, TKey> innerKeySelector,
            Func<TOuter, TInner, TResult> resultSelector)
            where TInner : class
            where TOuter : class
        {
            var innerLookup = inner.ToLookup(innerKeySelector);
            var outerLookup = outer.ToLookup(outerKeySelector);

            // Inner items whose key has no match on the outer side (the "right-only" rows)
            var innerOnlyItems = inner
                .Where(innerItem => !outerLookup.Contains(innerKeySelector(innerItem)))
                .Select(innerItem => resultSelector(null, innerItem));

            // Every outer item, paired with its matching inner items or with a single null
            return outer
                .SelectMany(outerItem =>
                {
                    var innerItems = innerLookup[outerKeySelector(outerItem)];

                    return innerItems.Any() ? innerItems : new TInner[] { null };
                }, resultSelector)
                .Concat(innerOnlyItems);
        }


        public static IEnumerable<TResult> LeftJoin<TOuter, TInner, TKey, TResult>(
            this IEnumerable<TOuter> outer,
            IEnumerable<TInner> inner,
            Func<TOuter, TKey> outerKeySelector,
            Func<TInner, TKey> innerKeySelector,
            Func<TOuter, TInner, TResult> resultSelector)
        {
            return outer.GroupJoin(
                inner,
                outerKeySelector,
                innerKeySelector,
                (o, i) =>
                    new { o = o, i = i.DefaultIfEmpty() })
                    .SelectMany(m => m.i.Select(inn =>
                        resultSelector(m.o, inn)
                        ));

        }



        public static IEnumerable<TResult> RightJoin<TOuter, TInner, TKey, TResult>(
            this IEnumerable<TOuter> outer,
            IEnumerable<TInner> inner,
            Func<TOuter, TKey> outerKeySelector,
            Func<TInner, TKey> innerKeySelector,
            Func<TOuter, TInner, TResult> resultSelector)
        {
            return inner.GroupJoin(
                outer,
                innerKeySelector,
                outerKeySelector,
                (i, o) =>
                    new { i = i, o = o.DefaultIfEmpty() })
                    .SelectMany(m => m.o.Select(outt =>
                        resultSelector(outt, m.i)
                        ));

        }

    }
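A usage sketch of this FullOuterJoin on the question's data. To stay self-contained it repeats only that method (logic unchanged, comments added); `JoinExtensionsDemo` and `Run` are hypothetical names. The `class` constraints are what allow `null` to stand in for the missing side, so the result selector has to be null-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class FirstName { public int ID; public string Name; }
public class LastName { public int ID; public string Name; }

// Hypothetical demo class holding a copy of the answer's FullOuterJoin.
public static class JoinExtensionsDemo
{
    public static IEnumerable<TResult> FullOuterJoin<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> resultSelector)
        where TInner : class
        where TOuter : class
    {
        var innerLookup = inner.ToLookup(innerKeySelector);
        var outerLookup = outer.ToLookup(outerKeySelector);

        // Inner items whose key never appears on the outer side.
        var innerOnly = inner
            .Where(innerItem => !outerLookup.Contains(innerKeySelector(innerItem)))
            .Select(innerItem => resultSelector(null, innerItem));

        // Every outer item, paired with its matches or with a single null.
        return outer
            .SelectMany(outerItem =>
            {
                var innerItems = innerLookup[outerKeySelector(outerItem)];
                return innerItems.Any() ? innerItems : new TInner[] { null };
            }, resultSelector)
            .Concat(innerOnly);
    }

    public static List<string> Run()
    {
        var firstNames = new List<FirstName>
        {
            new FirstName { ID = 1, Name = "John" },
            new FirstName { ID = 2, Name = "Sue" },
        };
        var lastNames = new List<LastName>
        {
            new LastName { ID = 1, Name = "Doe" },
            new LastName { ID = 3, Name = "Smith" },
        };

        // Either side may be null, so take the ID from whichever one exists.
        return firstNames
            .FullOuterJoin(lastNames, f => f.ID, l => l.ID,
                (f, l) => $"{f?.ID ?? l.ID}|{f?.Name}|{l?.Name}")
            .ToList();
    }
}
```

Outer rows come first in outer order, then the inner-only rows, so `Run()` gives `1|John|Doe`, `2|Sue|`, `3||Smith`. Unlike the lookup-based version earlier in the thread, this one builds its lookups as soon as it is called and enumerates `inner` twice.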


    I really hate these LINQ expressions; this is why SQL exists:

    select isnull(fn.id, ln.id) as id, fn.firstname, ln.lastname
       from firstnames fn
       full join lastnames ln on ln.id=fn.id

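    Create this as an SQL view in the database and import it as an entity.

    Of course, a (distinct) union of a left join and a right join would also do it, but that's silly.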
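For comparison, here is a sketch of that (distinct) union of a left and a right join in LINQ to Objects, on the question's data. It relies on anonymous types comparing by value, so `Union` drops the matched rows that both joins produce; like SQL `UNION`, it would also collapse rows that are genuinely identical. `UnionJoinDemo` and `Run` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class FirstName { public int ID; public string Name; }
public class LastName { public int ID; public string Name; }

// Hypothetical demo class: full outer join as left join UNION right join.
public static class UnionJoinDemo
{
    public static List<string> Run()
    {
        var firstNames = new List<FirstName>
        {
            new FirstName { ID = 1, Name = "John" },
            new FirstName { ID = 2, Name = "Sue" },
        };
        var lastNames = new List<LastName>
        {
            new LastName { ID = 1, Name = "Doe" },
            new LastName { ID = 3, Name = "Smith" },
        };

        // Left outer join: every first name, with its last name if there is one.
        var leftJoin =
            from f in firstNames
            join l in lastNames on f.ID equals l.ID into ls
            from l in ls.DefaultIfEmpty()
            select new { ID = f.ID, First = f.Name, Last = l?.Name };

        // Right outer join: every last name, with its first name if there is one.
        // Same property names, types and order, so it is the same anonymous type.
        var rightJoin =
            from l in lastNames
            join f in firstNames on l.ID equals f.ID into fs
            from f in fs.DefaultIfEmpty()
            select new { ID = l.ID, First = f?.Name, Last = l.Name };

        // Anonymous types compare by value, so Union removes the duplicate matched rows.
        return leftJoin.Union(rightJoin)
            .Select(r => $"{r.ID}|{r.First}|{r.Last}")
            .ToList();
    }
}
```

`Run()` returns `1|John|Doe`, `2|Sue|`, `3||Smith`: the matched row for ID 1 appears in both joins but is kept only once.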