How to Left Outer Join two DataTables in c#?
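How can I left outer join (I think it is a left outer join, but I am not 100% sure) two DataTables with the following tables and conditions, while keeping all columns from both tables?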
dtblLeft:

    id  col1  anotherColumn2
    1   1     any2
    2   1     any2
    3   2     any2
    4   3     any2
    5   3     any2
    6   3     any2
    7         any2
dtblRight:

    col1  col2   anotherColumn1
    1     Hi     any1
    2     Bye    any1
    3     Later  any1
    4     Never  any1
dtblJoined:

    id  col1  col2   anotherColumn1  anotherColumn2
    1   1     Hi     any1            any2
    2   1     Hi     any1            any2
    3   2     Bye    any1            any2
    4   3     Later  any1            any2
    5   3     Later  any1            any2
    6   3     Later  any1            any2
    7                                any2
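Conditions:
- In dtblLeft, col1 is not required to have unique values.
- In dtblRight, col1 has unique values.
- If dtblLeft is missing a foreign key in col1, or has one that does not exist in dtblRight, then empty or null fields are inserted.
- The join is on col1.
I could use regular DataTable operations, LINQ, or anything else.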
I tried this, but it removes duplicates:

    [code not preserved]
Edit 1:
This is close to what I want, but it only has the columns from one of the tables (found at this link):

    dtblJoined = (from t1 in dtblA.Rows.Cast<DataRow>()
                  join t2 in dtblB.Rows.Cast<DataRow>() on t1["col1"] equals t2["col1"]
                  select t1).CopyToDataTable();
Edit 2:
An answer from this link seems to work for me, but I had to change it as follows:

    DataTable targetTable = dtblA.Clone();
    var dt2Columns = dtblB.Columns.OfType<DataColumn>().Select(dc =>
        new DataColumn(dc.ColumnName, dc.DataType, dc.Expression, dc.ColumnMapping));
    var dt2FinalColumns = from dc in dt2Columns.AsEnumerable()
                          where targetTable.Columns.Contains(dc.ColumnName) == false
                          select dc;
    targetTable.Columns.AddRange(dt2FinalColumns.ToArray());
    var rowData = from row1 in dtblA.AsEnumerable()
                  join row2 in dtblB.AsEnumerable()
                      on row1["col1"] equals row2["col1"]
                  select row1.ItemArray.Concat(row2.ItemArray.Where(r2 => row1.ItemArray.Contains(r2) == false)).ToArray();
    foreach (object[] values in rowData)
        targetTable.Rows.Add(values);
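I also found this link, and I might try it out since it looks more concise.
Edit 3 (11/18/2013):
Updated the tables to reflect more situations.
Thanks for all your help. Here is what I came up with based on multiple resources: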

    public static class DataTableHelper
    {
        public enum JoinType
        {
            /// <summary>
            /// Same as regular join. Inner join produces only the set of records that match in both Table A and Table B.
            /// </summary>
            Inner = 0,
            /// <summary>
            /// Same as Left Outer join. Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.
            /// </summary>
            Left = 1
        }

        /// <summary>
        /// Joins the passed in DataTables on the colToJoinOn.
        /// <para>Returns an appropriate DataTable with zero rows if the colToJoinOn does not exist in both tables.</para>
        /// </summary>
        /// <param name="dtblLeft"></param>
        /// <param name="dtblRight"></param>
        /// <param name="colToJoinOn"></param>
        /// <param name="joinType"></param>
        /// <returns></returns>
        /// <remarks>
        /// <para>http://stackoverflow.com/questions/2379747/create-combined-datatable-from-two-datatables-joined-with-linq-c-sharp?rq=1</para>
        /// <para>http://msdn.microsoft.com/en-us/library/vstudio/bb397895.aspx</para>
        /// <para>http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html</para>
        /// <para>http://stackoverflow.com/questions/406294/left-join-and-left-outer-join-in-sql-server</para>
        /// </remarks>
        public static DataTable JoinTwoDataTablesOnOneColumn(DataTable dtblLeft, DataTable dtblRight, string colToJoinOn, JoinType joinType)
        {
            //Change column name to a temp name so the LINQ for getting row data will work properly.
            string strTempColName = colToJoinOn + "_2";
            if (dtblRight.Columns.Contains(colToJoinOn))
                dtblRight.Columns[colToJoinOn].ColumnName = strTempColName;

            //Get columns from dtblLeft
            DataTable dtblResult = dtblLeft.Clone();

            //Get columns from dtblRight
            var dt2Columns = dtblRight.Columns.OfType<DataColumn>().Select(dc =>
                new DataColumn(dc.ColumnName, dc.DataType, dc.Expression, dc.ColumnMapping));

            //Get columns from dtblRight that are not in dtblLeft
            var dt2FinalColumns = from dc in dt2Columns.AsEnumerable()
                                  where !dtblResult.Columns.Contains(dc.ColumnName)
                                  select dc;

            //Add the rest of the columns to dtblResult
            dtblResult.Columns.AddRange(dt2FinalColumns.ToArray());

            //No reason to continue if the colToJoinOn does not exist in both DataTables.
            if (!dtblLeft.Columns.Contains(colToJoinOn) ||
                (!dtblRight.Columns.Contains(colToJoinOn) && !dtblRight.Columns.Contains(strTempColName)))
            {
                //Undo the temporary rename so dtblRight is not left modified on this early exit.
                if (dtblRight.Columns.Contains(strTempColName))
                    dtblRight.Columns[strTempColName].ColumnName = colToJoinOn;

                if (!dtblResult.Columns.Contains(colToJoinOn))
                    dtblResult.Columns.Add(colToJoinOn);
                return dtblResult;
            }

            switch (joinType)
            {
                default:
                case JoinType.Inner:
                    #region Inner
                    //get row data
                    //To use the DataTable.AsEnumerable() extension method you need to add a reference to the System.Data.DataSetExtensions assembly in your project.
                    var rowDataLeftInner = from rowLeft in dtblLeft.AsEnumerable()
                                           join rowRight in dtblRight.AsEnumerable()
                                               on rowLeft[colToJoinOn] equals rowRight[strTempColName]
                                           select rowLeft.ItemArray.Concat(rowRight.ItemArray).ToArray();

                    //Add row data to dtblResult
                    foreach (object[] values in rowDataLeftInner)
                        dtblResult.Rows.Add(values);
                    #endregion
                    break;

                case JoinType.Left:
                    #region Left
                    var rowDataLeftOuter = from rowLeft in dtblLeft.AsEnumerable()
                                           join rowRight in dtblRight.AsEnumerable()
                                               on rowLeft[colToJoinOn] equals rowRight[strTempColName] into gj
                                           from subRight in gj.DefaultIfEmpty()
                                           select rowLeft.ItemArray.Concat((subRight == null)
                                               ? (dtblRight.NewRow().ItemArray)
                                               : subRight.ItemArray).ToArray();

                    //Add row data to dtblResult
                    foreach (object[] values in rowDataLeftOuter)
                        dtblResult.Rows.Add(values);
                    #endregion
                    break;
            }

            //Change column name back to original
            dtblRight.Columns[strTempColName].ColumnName = colToJoinOn;

            //Remove extra column from result
            dtblResult.Columns.Remove(strTempColName);

            return dtblResult;
        }
    }
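Edit 3:
This method now works correctly, and it is still fast when the tables have over 2000 rows. Any recommendations, suggestions, or improvements would be appreciated.
Edit 4:
I hit a scenario that made me realize the previous version was really doing an inner join. The function has been modified to fix that problem. I used the information at this link to figure it out.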
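To illustrate the difference that Edit 4 describes, here is a minimal sketch that rebuilds the sample tables from the question and counts the rows each kind of join yields. The names `JoinDemo`, `BuildLeft`, `BuildRight`, `CountInner`, and `CountLeftOuter` are hypothetical and used only for illustration, and the sketch assumes col1 is an int column in both tables.

```csharp
using System;
using System.Data;
using System.Linq;

public static class JoinDemo
{
    // Sample dtblLeft from the question; row 7 has no value in col1.
    public static DataTable BuildLeft()
    {
        var t = new DataTable("dtblLeft");
        t.Columns.Add("id", typeof(int));
        t.Columns.Add("col1", typeof(int));
        t.Columns.Add("anotherColumn2", typeof(string));
        t.Rows.Add(1, 1, "any2");
        t.Rows.Add(2, 1, "any2");
        t.Rows.Add(3, 2, "any2");
        t.Rows.Add(4, 3, "any2");
        t.Rows.Add(5, 3, "any2");
        t.Rows.Add(6, 3, "any2");
        t.Rows.Add(7, DBNull.Value, "any2");
        return t;
    }

    // Sample dtblRight from the question; col1 is unique here.
    public static DataTable BuildRight()
    {
        var t = new DataTable("dtblRight");
        t.Columns.Add("col1", typeof(int));
        t.Columns.Add("col2", typeof(string));
        t.Columns.Add("anotherColumn1", typeof(string));
        t.Rows.Add(1, "Hi", "any1");
        t.Rows.Add(2, "Bye", "any1");
        t.Rows.Add(3, "Later", "any1");
        t.Rows.Add(4, "Never", "any1");
        return t;
    }

    // Plain join: left rows without a match on the right are dropped.
    public static int CountInner(DataTable left, DataTable right)
    {
        return (from l in left.AsEnumerable()
                join r in right.AsEnumerable()
                    on l.Field<int?>("col1") equals r.Field<int?>("col1")
                select l).Count();
    }

    // join ... into ... DefaultIfEmpty(): every left row is kept at least once.
    public static int CountLeftOuter(DataTable left, DataTable right)
    {
        return (from l in left.AsEnumerable()
                join r in right.AsEnumerable()
                    on l.Field<int?>("col1") equals r.Field<int?>("col1") into matches
                from m in matches.DefaultIfEmpty()
                select l).Count();
    }
}
```

With the sample data, `CountInner` returns 6 and `CountLeftOuter` returns 7: the only difference is the row with id 7, whose col1 is empty and therefore survives only in the left outer join.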
This is just an inner join between the two tables:
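
    // a is the left table and b is the right table from the question.
    // col1 can be empty in the left table, so it is read as int?;
    // col2 lives in the right table and holds strings.
    var query = (from x in a.AsEnumerable()
                 join y in b.AsEnumerable()
                     on x.Field<int?>("col1") equals y.Field<int?>("col1")
                 select new
                 {
                     col1 = y.Field<int>("col1"),
                     col2 = y.Field<string>("col2")
                 }).ToList();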
Produces:

    col1  col2
    1     Hi
    1     Hi
    2     Bye
    3     Later
    3     Later
    3     Later
You can use LINQ and do something like this:
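
    // Left outer join: every row of dtblB is kept; col2 is null when dtblA has no match.
    // Assumes col1 is an int column in both tables and col2 is a string column in dtblA.
    var dtblJoined = from dB in dtblB.AsEnumerable()
                     join dA in dtblA.AsEnumerable()
                         on dB.Field<int?>("col1") equals dA.Field<int?>("col1") into dAB
                     from d in dAB.DefaultIfEmpty()
                     select new
                     {
                         col1 = dB.Field<int?>("col1"),
                         col2 = d == null ? null : d.Field<string>("col2")
                     };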
This returns an IEnumerable rather than a DataTable, but I think it gets you closer to what you are looking for. It may need a little tweaking, though.
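Since the query above yields an IEnumerable of anonymous objects, one way to get back to a DataTable is to create the result columns by hand and add one row per item. The following is only a sketch under assumptions: `JoinToTable`, `LeftJoinToTable`, `outer`, and `inner` are hypothetical names, col1 is assumed to be an int column in both tables, and col2 is assumed to be a string column in the joined (`inner`) table.

```csharp
using System;
using System.Data;
using System.Linq;

public static class JoinToTable
{
    // Left outer join of 'outer' with 'inner' on col1, materialized as a DataTable.
    public static DataTable LeftJoinToTable(DataTable outer, DataTable inner)
    {
        var result = new DataTable("dtblJoined");
        result.Columns.Add("col1", typeof(int));
        result.Columns.Add("col2", typeof(string));

        var query = from o in outer.AsEnumerable()
                    join i in inner.AsEnumerable()
                        on o.Field<int?>("col1") equals i.Field<int?>("col1") into matches
                    from m in matches.DefaultIfEmpty()
                    select new
                    {
                        col1 = o.Field<int?>("col1"),
                        col2 = m == null ? null : m.Field<string>("col2")
                    };

        foreach (var item in query)
        {
            // Map C# null to DBNull.Value, the DataTable representation of a missing value.
            result.Rows.Add((object)item.col1 ?? DBNull.Value,
                            (object)item.col2 ?? DBNull.Value);
        }
        return result;
    }
}
```

This sketch copies only col1 and col2. Keeping every column from both tables, as the question asks, would mean adding the remaining columns to `result` and to the projection in the same way.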