
Pandas DataFrame to list


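I am pulling a subset of data from one column based on whether a condition in another column is met.

I can get the correct values, but they come back as a pandas.core.frame.DataFrame. How do I convert that to a list?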


import pandas as pd

tst = pd.read_csv('C:\\SomeCSV.csv')

lookupValue = tst['SomeCol'] == "SomeValue"
ID = tst[lookupValue][['SomeCol']]
#How To convert ID to a list

Related discussion

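I want to clarify a few things:

1. As other answers have pointed out, the simplest thing to do is use pandas.Series.tolist(). I'm not sure why the top-voted answer leads with pandas.Series.values.tolist(), since, as far as I can tell, it adds syntax and confusion with no added benefit.

2. tst[lookupValue][['SomeCol']] is a DataFrame (as stated in the question), not a Series (as stated in a comment on the question). This is because tst[lookupValue] is a DataFrame, and slicing it with [['SomeCol']] asks for a list of columns (a list that happens to have length 1), so a DataFrame is returned. If you remove the extra set of brackets, as in tst[lookupValue]['SomeCol'], you are asking for that single column rather than a list of columns, so you get a Series back.

3. You need a Series to use pandas.Series.tolist(), so in this case you should definitely skip the second set of brackets. If you ever end up with a one-column DataFrame that isn't this easy to avoid, you can use pandas.DataFrame.squeeze() to convert it to a Series.

4. tst[lookupValue]['SomeCol'] gets a subset of a particular column via chained slicing. It slices once to get a DataFrame with only certain rows left, and then slices again to get a particular column. You can get away with it here since you are only reading, not writing, but the proper way to do it is tst.loc[lookupValue, 'SomeCol'] (which returns a Series).

5. Using the syntax from #4, you can reasonably do everything in one line: ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()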
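A small sketch of points 1 and 3, using a hypothetical in-memory DataFrame in place of the question's CSV (the column name OtherCol and all values are made up for illustration). It checks that .tolist() and .values.tolist() give the same list, and shows one squeeze() caveat: a one-column frame that happens to contain a single row is squeezed all the way to a scalar, while squeeze(axis=1) only drops the column axis and keeps a Series.

```python
import pandas as pd

# Hypothetical stand-in for the question's CSV data.
tst = pd.DataFrame({'SomeCol': ['SomeValue', 'Other', 'SomeValue'],
                    'OtherCol': [10, 20, 30]})
mask = tst['SomeCol'] == 'SomeValue'

# Point 1: Series.tolist() and Series.values.tolist() produce the same list.
series = tst.loc[mask, 'SomeCol']
direct = series.tolist()
via_values = series.values.tolist()
print(direct)

# Point 3: squeeze() turns a one-column DataFrame (two rows here) into a Series.
one_col = tst.loc[mask, ['SomeCol']]
squeezed = one_col.squeeze()
print(type(squeezed))

# Caveat: with only one matching row the frame is 1x1, and squeeze()
# collapses both axes, returning a bare scalar instead of a Series.
single_row = tst.loc[tst['OtherCol'] == 10, ['SomeCol']]
scalar = single_row.squeeze()
print(repr(scalar))

# squeeze(axis=1) squeezes only the column axis, so a Series survives.
still_series = single_row.squeeze(axis=1)
print(still_series.tolist())
```

If the number of matching rows is not known in advance, squeeze(axis=1), or simply selecting with a single column label in .loc, avoids the scalar surprise.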

Demo code:


import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 1],
                   'colB': [4, 5, 6]})
filter_value = 1

print("df")
print(df)
print(type(df))

# Boolean mask: True for rows where colA equals filter_value
rows_to_keep = df['colA'] == filter_value
print("\ndf['colA'] == filter_value")
print(rows_to_keep)
print(type(rows_to_keep))

# Chained slicing with a single column label -> Series
result = df[rows_to_keep]['colB']
print("\ndf[rows_to_keep]['colB']")
print(result)
print(type(result))

# Double brackets select a list of columns -> DataFrame
result = df[rows_to_keep][['colB']]
print("\ndf[rows_to_keep][['colB']]")
print(result)
print(type(result))

# squeeze() turns the one-column DataFrame back into a Series
result = df[rows_to_keep][['colB']].squeeze()
print("\ndf[rows_to_keep][['colB']].squeeze()")
print(result)
print(type(result))

# Preferred: a single .loc call selects rows and column together -> Series
result = df.loc[rows_to_keep, 'colB']
print("\ndf.loc[rows_to_keep, 'colB']")
print(result)
print(type(result))

# Same thing with the mask written inline
result = df.loc[df['colA'] == filter_value, 'colB']
print("\ndf.loc[df['colA'] == filter_value, 'colB']")
print(result)
print(type(result))

# .tolist() converts the Series into a plain Python list
ID = df.loc[rows_to_keep, 'colB'].tolist()
print("\ndf.loc[rows_to_keep, 'colB'].tolist()")
print(ID)
print(type(ID))

ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
print("\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()")
print(ID)
print(type(ID))

Result:

df
   colA  colB
0     1     4
1     2     5
2     1     6
<class 'pandas.core.frame.DataFrame'>

df['colA'] == filter_value
0     True
1    False
2     True
Name: colA, dtype: bool
<class 'pandas.core.series.Series'>

df[rows_to_keep]['colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df[rows_to_keep][['colB']]
   colB
0     4
2     6
<class 'pandas.core.frame.DataFrame'>

df[rows_to_keep][['colB']].squeeze()
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[df['colA'] == filter_value, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB'].tolist()
[4, 6]
<class 'list'>

df.loc[df['colA'] == filter_value, 'colB'].tolist()
[4, 6]
<class 'list'>
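Point 4 says chained slicing is only safe for reading. A minimal sketch of why, reusing the demo's df: selecting rows with a boolean mask produces a new object, so assigning into df[rows_to_keep]['colB'] never reaches df, while a single .loc call writes into df itself. The warnings filter is only there to keep the demo output quiet; pandas normally warns about this chained assignment.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 1], 'colB': [4, 5, 6]})
rows_to_keep = df['colA'] == 1

# Chained assignment: df[rows_to_keep] is a new object built from the mask,
# so setting 'colB' on it leaves df untouched. pandas warns about this;
# the warning is silenced here only to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[rows_to_keep]['colB'] = 0
after_chained = df['colB'].tolist()
print(after_chained)

# One .loc call with both the row mask and the column label modifies df.
df.loc[rows_to_keep, 'colB'] = 0
print(df['colB'].tolist())
```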

You can use pandas.Series.tolist.

For example:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

Run:

>>> df['a'].tolist()

and you will get

[1, 2, 3]
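A slightly longer sketch of the same call, using the answer's df. It also shows to_list(), which pandas provides as an alias of tolist(), that the list elements are native Python ints rather than NumPy scalars, and how the call combines with a boolean filter like the one in the question (the condition b > 4 is an arbitrary example).

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

values = df['a'].tolist()
# to_list() is an alias of tolist(); both return a plain Python list.
same_values = df['a'].to_list()

# Elements are native Python ints, not numpy.int64 scalars.
print(values, type(values[0]))

# Filter rows with a boolean mask, pick one column, then convert.
filtered = df.loc[df['b'] > 4, 'a'].tolist()
print(filtered)
```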

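Solutions that go through df.values are fine if all of the data is of the same dtype. NumPy arrays are homogeneous containers, and df.values returns a NumPy array. So if the data contains both ints and floats, everything is upcast to float in the output, and the int columns lose their original dtype. Consider this df: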


     a  b
0  1.0  4
1  2.0  5
2  3.0  6

a    float64
b      int64
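The upcast described above can be checked directly. A minimal sketch that rebuilds the frame shown (a as float64, b as int64) and converts it through df.values: because the array must have one dtype, the int column comes back as floats.

```python
import pandas as pd

# The mixed-dtype frame shown above: 'a' is float64, 'b' is int64.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})

# df.values must choose a single common dtype, so int64 is upcast to float64.
print(df.values.dtype)
rows = df.values.tolist()
print(rows)
```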

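So, if you want to keep the original data types, you can do something like this:

row_list = df.to_csv(None, header=False, index=False).split('\n')

This returns each row as a string:

['1.0,4', '2.0,5', '3.0,6', '']

Then split each row to get a list of lists. Each element after the split is a (Unicode) string, so it has to be converted back to the required data type.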


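# Convert one CSV row string back into [float, int]
def f(row_str):
    fields = row_str.split(',')
    return [float(fields[0]), int(fields[1])]

# Skip the trailing empty string left by the final newline
df_list_of_list = list(map(f, row_list[:-1]))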

[[1.0, 4], [2.0, 5], [3.0, 6]]
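A possible alternative that avoids the CSV round-trip and the hand-written per-column conversion: DataFrame.itertuples(index=False, name=None) yields one plain tuple per row, with each value read from its own column, so no common dtype is forced on the row. This is a sketch on the same example frame, not a claim about what the original answer intended.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})

# index=False drops the row index; name=None yields plain tuples
# instead of namedtuples. Each value comes from its own column.
rows = [list(row) for row in df.itertuples(index=False, name=None)]
print(rows)
```

Unlike the string-splitting version, this does not need to know the column types in advance, so it keeps working if columns are added or reordered.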