Pandas DataFrame to list
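I am extracting a subset of data from one column based on a condition being met in another column.
I can get the correct values back, but the result is a pandas.core.frame.DataFrame. How do I convert it to a list?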

    import pandas as pd

    tst = pd.read_csv('C:\\SomeCSV.csv')

    lookupValue = tst['SomeCol'] == "SomeValue"
    ID = tst[lookupValue][['SomeCol']]
    # How to convert ID to a list?

Use .values.tolist(): .values gives a NumPy array, and .tolist() turns that array into a list.
For example:

    import pandas as pd
    df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9],
                       'b':[3,5,6,2,4,6,7,8,7,8,9]})

Result:

    >>> df['a'].values.tolist()
    [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

Or you can just use

    >>> df['a'].tolist()
    [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

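As a quick check that these spellings agree, here is a minimal sketch. It also uses Series.to_numpy(), which the answer above does not mention; it is included on the assumption that a reasonably recent pandas is installed.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]})

# Three ways to turn one column (a Series) into a plain Python list.
via_tolist = df['a'].tolist()              # shortest form
via_numpy = df['a'].to_numpy().tolist()    # explicit NumPy round trip
via_builtin = list(df['a'])                # iterate the Series directly

print(via_tolist == via_numpy == via_builtin)
```

For a single column, the shortest form, df['a'].tolist(), is usually all that is needed.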
To drop duplicates, you can do one of the following:

    >>> df['a'].drop_duplicates().values.tolist()
    [1, 3, 5, 7, 4, 6, 8, 9]
    >>> list(set(df['a'])) # as pointed out by EdChum
    [1, 3, 4, 5, 6, 7, 8, 9]

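The two approaches differ in ordering: drop_duplicates() keeps values in order of first appearance, while a set has no defined order. The sketch below makes the choice explicit. Series.unique() and the explicit sorted() call are additions for illustration and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]})

# Keeps the first occurrence of each value, in original order.
ordered = df['a'].drop_duplicates().tolist()

# Series.unique() also preserves order of first appearance.
ordered_unique = df['a'].unique().tolist()

# A set is unordered, so sort explicitly when a sorted result is wanted.
sorted_unique = sorted(set(df['a']))

print(ordered)
print(sorted_unique)
```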
I'd like to clarify a few things:
Demo code:

    import pandas as pd

    df = pd.DataFrame({'colA':[1,2,1],
                       'colB':[4,5,6]})
    filter_value = 1

    print("df")
    print(df)
    print(type(df))

    # Boolean mask: True for the rows to keep
    rows_to_keep = df['colA'] == filter_value
    print("\ndf['colA'] == filter_value")
    print(rows_to_keep)
    print(type(rows_to_keep))

    # Single brackets select one column -> Series
    result = df[rows_to_keep]['colB']
    print("\ndf[rows_to_keep]['colB']")
    print(result)
    print(type(result))

    # Double brackets select a list of columns -> DataFrame
    result = df[rows_to_keep][['colB']]
    print("\ndf[rows_to_keep][['colB']]")
    print(result)
    print(type(result))

    # squeeze() collapses the one-column DataFrame to a Series
    result = df[rows_to_keep][['colB']].squeeze()
    print("\ndf[rows_to_keep][['colB']].squeeze()")
    print(result)
    print(type(result))

    # .loc selects rows and column in one step (no chained indexing)
    result = df.loc[rows_to_keep, 'colB']
    print("\ndf.loc[rows_to_keep, 'colB']")
    print(result)
    print(type(result))

    result = df.loc[df['colA'] == filter_value, 'colB']
    print("\ndf.loc[df['colA'] == filter_value, 'colB']")
    print(result)
    print(type(result))

    ID = df.loc[rows_to_keep, 'colB'].tolist()
    print("\ndf.loc[rows_to_keep, 'colB'].tolist()")
    print(ID)
    print(type(ID))

    ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
    print("\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()")
    print(ID)
    print(type(ID))

Result:

    df
       colA  colB
    0     1     4
    1     2     5
    2     1     6
    <class 'pandas.core.frame.DataFrame'>

    df['colA'] == filter_value
    0     True
    1    False
    2     True
    Name: colA, dtype: bool
    <class 'pandas.core.series.Series'>

    df[rows_to_keep]['colB']
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df[rows_to_keep][['colB']]
       colB
    0     4
    2     6
    <class 'pandas.core.frame.DataFrame'>

    df[rows_to_keep][['colB']].squeeze()
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df.loc[rows_to_keep, 'colB']
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df.loc[df['colA'] == filter_value, 'colB']
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df.loc[rows_to_keep, 'colB'].tolist()
    [4, 6]
    <class 'list'>

    df.loc[df['colA'] == filter_value, 'colB'].tolist()
    [4, 6]
    <class 'list'>

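One caveat about squeeze() that the demo does not cover: when the filter matches exactly one row, a one-column DataFrame has shape 1x1, and squeeze() with no arguments collapses it to a scalar rather than a Series. A minimal sketch, reusing the demo's column names; restricting the squeeze to the column axis is a suggestion here, not something the answer states:

```python
import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 1],
                   'colB': [4, 5, 6]})

# Only one row has colA == 2, so this is a 1x1 DataFrame.
one_row = df.loc[df['colA'] == 2, ['colB']]

scalar = one_row.squeeze()         # collapses both axes -> a scalar
series = one_row.squeeze(axis=1)   # collapses only the column axis -> Series
as_list = series.tolist()          # a list with a single element

print(type(series))
print(as_list)
```

Selecting the column with single brackets, or with .loc and a column label, avoids the problem because the result is a Series from the start.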
You can use Series.tolist().
For example:

    import pandas as pd
    df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})

Run:

    >>> df['a'].tolist()

You will get

    [1, 2, 3]

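The solutions above work well if all the data has the same dtype. NumPy arrays are homogeneous containers, so when you call df.values the result is an array with a single dtype: if the data mixes int and float columns, the columns lose their original dtypes. Consider this df and its dtypes:

         a  b
    0  1.0  4
    1  2.0  5
    2  3.0  6

    a    float64
    b      int64
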
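To see the upcast concretely, here is a small sketch, assuming a frame built to match the dtypes shown above (column a as float64, column b as int64):

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})

print(df.dtypes)         # a is float64, b is int64
print(df.values.dtype)   # one dtype for the whole array: float64

rows = df.values.tolist()
print(rows)              # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

The integers in column b come back as floats because the intermediate NumPy array can hold only one dtype.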
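So, if you want to preserve the original dtypes, you can do something like this:

    # to_csv(None, ...) returns the CSV text as a string; split it into rows
    row_list = df.to_csv(None, header=False, index=False).split('\n')
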
This returns each row as a string (the trailing empty string comes from the final newline):

    ['1.0,4', '2.0,5', '3.0,6', '']

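Then split each row to get a list of lists. Each element is still a string after splitting, so it has to be converted to the required data type.

    def f(row_str):
        row_list = row_str.split(',')
        return [float(row_list[0]), int(row_list[1])]

    # row_list[:-1] skips the trailing empty string; list() is needed on Python 3
    df_list_of_list = list(map(f, row_list[:-1]))

    [[1.0, 4], [2.0, 5], [3.0, 6]]
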
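As an alternative to the CSV round trip, here is a sketch using DataFrame.itertuples(), which iterates row by row and so avoids the single-dtype NumPy array. This is not the method from the answer above, and the frame is the same assumed example with a float column a and an integer column b.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})

# index=False leaves the index out of each row;
# name=None yields plain tuples instead of namedtuples.
rows = [list(t) for t in df.itertuples(index=False, name=None)]

print(rows)
```

Each inner list keeps a float for column a and an integer for column b, with no string parsing or per-column conversion function.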