Python: pandas Series containing arrays

I have a pandas DataFrame column that looks something like this:

Out[67]:
0 ["cheese","milk...
1 ["yogurt","cheese...
2 ["cheese","cream"...
3 ["milk","cheese"...

Ultimately I want this to be one simple list, but while trying to flatten it I noticed that pandas treats ["cheese","milk","cream"] as a str, not a list.
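One quick way to confirm this is to inspect the Python type of each cell. The snippet below is only a sketch: the Series `s` and its contents are hypothetical sample data standing in for the real column.

```python
import pandas as pd

# Hypothetical sample data: each cell is a string that merely looks like a list.
s = pd.Series(['["cheese","milk","cream"]', '["yogurt","cheese"]'])

# Collect the distinct Python type names found in the column.
cell_types = {type(value).__name__ for value in s}
print(cell_types)

# A cell holding text is a str even though it starts with "[".
first = s.iloc[0]
print(isinstance(first, str), isinstance(first, list))
```

If the set contains only `str`, the column holds text that has to be parsed before it can be flattened.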

How can I flatten it so that I end up with:

["cheese","milk","yogurt","cheese","cheese"...]

[Edit] So the answer given below is:

s = s.str.strip("[]")
df = s.str.split(',', expand=True)
df = df.applymap(lambda x: x.replace("'", '').strip())
l = df.values.flatten()
print (l.tolist())

This is great: the question is answered and the answer has been accepted, but I feel it is a rather inelegant solution.

Related discussion

You can use numpy.ndarray.flatten and then flatten the nested lists. See:

print(df)
                  a
0    [cheese, milk]
1  [yogurt, cheese]
2   [cheese, cream]

print(df.a.values)
[[['cheese', 'milk']]
 [['yogurt', 'cheese']]
 [['cheese', 'cream']]]

l = df.a.values.flatten()
print(l)
[['cheese', 'milk'] ['yogurt', 'cheese'] ['cheese', 'cream']]

# Flatten the nested lists with a list comprehension
print([item for sublist in l for item in sublist])
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
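When the column really does hold Python lists, the same flattening can be sketched without going through NumPy. The example below assumes a hypothetical DataFrame `df` with a list-valued column `a`, and uses `itertools.chain.from_iterable` and `Series.explode` (the latter needs pandas 0.25 or newer). Neither is part of the answer above; they are alternatives offered for comparison.

```python
from itertools import chain

import pandas as pd

# Hypothetical frame whose cells are real Python lists (not strings).
df = pd.DataFrame(
    {"a": [["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]]}
)

# Option 1: chain the sublists together lazily, then materialise a list.
flat = list(chain.from_iterable(df["a"]))
print(flat)

# Option 2: let pandas put each list element on its own row.
exploded = df["a"].explode().tolist()
print(exploded)
```

Both options keep the original element order. `chain.from_iterable` works on any iterable of lists, while `explode` stays inside pandas if further Series operations follow.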

Edit:

You can try:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Remove the surrounding [ and ]
s = s.str.strip('[]')
print(s)
0      'cheese', 'milk'
1    'yogurt', 'cheese'
2     'cheese', 'cream'
dtype: object

# Split each string on commas into separate columns
df = s.str.split(',', expand=True)
# Remove the ' characters and strip surrounding whitespace
# (applymap was renamed to DataFrame.map in pandas 2.1)
df = df.applymap(lambda x: x.replace("'", '').strip())
print(df)
        0       1
0  cheese    milk
1  yogurt  cheese
2  cheese   cream

l = df.values.flatten()
print(l.tolist())
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
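Since the question calls the string-splitting steps inelegant, here is a shorter sketch of a different technique: parsing each cell with the standard library's `ast.literal_eval` and then chaining the resulting lists. It assumes every cell is a valid Python list literal; a malformed cell would raise an exception. The sample Series is the same one used above.

```python
import ast
from itertools import chain

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Turn each string into a real list; literal_eval only accepts Python
# literals, so it does not execute arbitrary code the way eval would.
parsed = s.apply(ast.literal_eval)

# Concatenate the per-row lists into one flat list.
flat = list(chain.from_iterable(parsed))
print(flat)
```

This handles rows of different lengths and commas inside quoted items, cases where splitting on ',' would need extra care. If the cells use JSON-style double quotes, as in the question's sample, `json.loads` should work the same way.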