Pandas DataFrame.assign arguments
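Question
How can DataFrame.assign be used to return a copy of the original DataFrame with multiple new columns added?
Desired result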
    >>> import pandas as pd
    >>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
    >>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
       A   B   C   D
    0  1  11   1  22
    1  2  12   4  24
    2  3  13   9  26
    3  4  14  16  28
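Attempt
The example above does not produce the desired result; it fails with an error.
Background
The assign function in pandas returns a copy of the DataFrame with the newly assigned column added, e.g.: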
    >>> df = df.assign(C=df.B * 2)
    >>> df
       A   B   C
    0  1  11  22
    1  2  12  24
    2  3  13  26
    3  4  14  28
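To make the "returns a copy" behaviour concrete, here is a minimal sketch, assuming the same `df` as in the question. The name `out` is only illustrative. It checks that the original frame is left untouched unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# assign returns a new DataFrame; df itself is not modified
out = df.assign(C=df.B * 2)

print(list(df.columns))   # the original columns only
print(list(out.columns))  # the original columns plus 'C'
```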
The 0.19.2 documentation for this function implies that more than one column can be added to the DataFrame:
Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.
In addition:
Parameters:
kwargs : keyword, value pairs; keywords are the column names.
The source code for the function states that it accepts a dictionary:
    def assign(self, **kwargs):
        """
        .. versionadded:: 0.16.0

        Parameters
        ----------
        kwargs : keyword, value pairs
            keywords are the column names. If the values are callable,
            they are computed on the DataFrame and assigned to the new
            columns. If the values are not callable, (e.g. a Series,
            scalar, or array), they are simply assigned.

        Notes
        -----
        Since ``kwargs`` is a dictionary, the order of your arguments
        may not be preserved. To make things predictable, the columns
        are inserted in alphabetical order, at the end of your
        DataFrame. Assigning multiple columns within the same
        ``assign`` is possible, but you cannot reference other columns
        created within the same ``assign`` call.
        """
        data = self.copy()

        # do all calculations first...
        results = {}
        for k, v in kwargs.items():
            if callable(v):
                results[k] = v(data)
            else:
                results[k] = v

        # ... and then assign
        for k, v in sorted(results.items()):
            data[k] = v

        return data
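The docstring above says that callable values are computed on the DataFrame, and that columns created in the same call cannot be referenced. A minimal sketch of both points follows, assuming the question's `df`. The column name `E` and the variable names are made up for illustration, and chaining a second `assign` call is just one way to build on a freshly created column.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# A callable value receives the DataFrame and returns the new column
squared = df.assign(C=lambda d: d.A ** 2)

# To build on a column created by assign, chain a second call;
# this lambda sees the frame that already contains C
chained = squared.assign(E=lambda d: d.C + d.B)

print(chained)
```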
You can create multiple columns by supplying each new column as a keyword argument:

    df = df.assign(C=df['A']**2, D=df.B*2)
I got your example dictionary to work by using ** to unpack it into keyword arguments:

    df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
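It seems like assign should be able to take a dictionary, but based on the source code you posted, it does not currently appear to support that.
The resulting output: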
       A   B   C   D
    0  1  11   1  22
    1  2  12   4  24
    2  3  13   9  26
    3  4  14  16  28
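Why the unpacking is needed can be shown without pandas at all. The sketch below uses a hypothetical stand-in, `assign_like`, which only mimics the `**kwargs` signature of the posted source. It is not pandas code, and the values are placeholders.

```python
def assign_like(**kwargs):
    """Hypothetical stand-in that accepts keyword arguments only, like the posted assign."""
    # Mirror the posted source: process the entries in sorted key order
    return dict(sorted(kwargs.items()))

new_cols = {'D': 2, 'C': 1}

# Passing the dict positionally fails: no positional parameter can receive it
try:
    assign_like(new_cols)
except TypeError as exc:
    print('TypeError:', exc)

# Unpacking with ** turns each key into a keyword argument
print(assign_like(**new_cols))
```

Note that `**` unpacking requires the dictionary keys to be strings, since each key becomes a keyword name.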