Python: making a pandas DataFrame from a .npy file

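I am trying to build a pandas DataFrame from a .npy file which, when read with np.load, returns a numpy array containing a dictionary. My first instinct was to extract the dictionary and then create the DataFrame with pd.from_dict, but this fails every time because I cannot seem to get the dictionary out of the array that np.load returns. It looks like np.array([dictionary, dtype=object]), but I cannot get at the dictionary by indexing the array or anything like that. I have also tried np.load('filename').item(), but pandas still does not recognize the result as a dictionary.

I also tried pd.read_pickle, and that did not work either.

How can I get this .npy dictionary into my DataFrame? Here is the code that keeps failing: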

import pandas as pd
import numpy as np
import os

targetdir = '../test_dir/'

filenames = []
successful = []
unsuccessful = []
for dirs, subdirs, files in os.walk(targetdir):
    for name in files:
        filenames.append(name)
        path_to_use = os.path.join(dirs, name)
        if path_to_use.endswith('.npy'):
            try:
                file_dict = np.load(path_to_use).item()
                df = pd.from_dict(file_dict)
                #df = pd.read_pickle(path_to_use)
                successful.append(path_to_use)
            except:
                unsuccessful.append(path_to_use)
                continue

print(str(len(successful)) + " files were loaded successfully!")
print("The following files were not loaded:")
for item in unsuccessful:
    print(item + "\n")

print(df)

Related discussion

Suppose that, after loading the .npy file, the item (np.load(path_to_use).item()) looks similar to this:

{'user_c': 'id_003', 'user_a': 'id_001', 'user_b': 'id_002'}
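
A minimal sketch of why indexing the loaded array fails, assuming the file was written by passing a dict straight to np.save. The sample dict and the temporary file name are made up for illustration. The dict comes back wrapped in a 0-dimensional object array, so there is no axis to index along. Recent NumPy versions also need allow_pickle=True to load pickled objects.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data standing in for the contents of the real file.
sample = {'user_c': 'id_003', 'user_a': 'id_001', 'user_b': 'id_002'}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'users.npy')
    np.save(path, sample)  # the dict is wrapped in a 0-d object array and pickled
    # Recent NumPy versions refuse to unpickle objects unless allow_pickle=True.
    arr = np.load(path, allow_pickle=True)

print(arr.shape, arr.dtype)  # a 0-d array has the empty shape ()

try:
    arr[0]  # a 0-d array has no axis to index along
except IndexError as exc:
    print('indexing failed:', exc)

as_dict = arr.item()  # unwraps the single element
same_dict = arr[()]   # indexing with the empty tuple also works
print(type(as_dict))
```

Only pass allow_pickle=True for files you trust, since unpickling can execute arbitrary code.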

So, if you need to turn the dictionary above into a DataFrame laid out like this:

  user_name user_id
0    user_c  id_003
1    user_a  id_001
2    user_b  id_002

you can use the following, where x is the array returned by np.load:

df = pd.DataFrame(list(x.item().items()), columns=['user_name', 'user_id'])
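
A self-contained sketch of the same construction, using a hypothetical dict in place of the loaded file. On Python 3 the dict method is items(); Python 2's iteritems() no longer exists.

```python
import pandas as pd

# Hypothetical dict, standing in for np.load(path, allow_pickle=True).item().
user_ids = {'user_c': 'id_003', 'user_a': 'id_001', 'user_b': 'id_002'}

# dict.items() yields (key, value) pairs; each pair becomes one row.
df = pd.DataFrame(list(user_ids.items()), columns=['user_name', 'user_id'])
print(df)
```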

If you have a list of dictionaries like the following:

users = [{'u_name': 'user_a', 'u_id': 'id_001'}, {'u_name': 'user_b', 'u_id': 'id_002'}]

you can simply use

df = pd.DataFrame(users)

to get a DataFrame like this:

     u_id  u_name
0  id_001  user_a
1  id_002  user_b
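
If such a list of dictionaries was itself saved to a .npy file, the load step differs slightly. The sketch below assumes the list was stored as a 1-D object array. The file name is hypothetical. In that case np.load returns an array of dicts, and tolist() turns it back into a plain list.

```python
import os
import tempfile

import numpy as np
import pandas as pd

users = [{'u_name': 'user_a', 'u_id': 'id_001'},
         {'u_name': 'user_b', 'u_id': 'id_002'}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'users.npy')
    np.save(path, np.array(users, dtype=object))  # 1-D object array of dicts
    loaded = np.load(path, allow_pickle=True)

# tolist() converts the object array back into a plain list of dicts.
df = pd.DataFrame(loaded.tolist())
print(df)
```

The column order in the printed frame can depend on the pandas version, so select columns by name rather than by position.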

It seems you have a dictionary similar to this:

data = {
    'Center': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    'Vpeak': [1.1, 2.2],
    'ID': ['id_001', 'id_002']
}

In that case, you can simply use:

df = pd.DataFrame(data)  # in your case: pd.DataFrame(file_dict), with file_dict = np.load(path_to_use).item()

to get a DataFrame like this:

            Center      ID  Vpeak
0  [0.1, 0.2, 0.3]  id_001    1.1
1  [0.4, 0.5, 0.6]  id_002    2.2

If the dict contains ndarrays, do some preprocessing like the following first, then use the result to create the df:

for key in data:
    if isinstance(data[key], np.ndarray):
        data[key] = data[key].tolist()

df = pd.DataFrame(data)
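
Putting the pieces together, here is one possible sketch of the directory walk from the question. It assumes each .npy file holds a single pickled dict of equal-length columns. The helper names npy_dict_to_frame and load_npy_tree, and the demo file halos.npy, are hypothetical. pandas exposes from_dict only as pd.DataFrame.from_dict, not as a top-level pd.from_dict, so the sketch calls the pd.DataFrame constructor directly. It also records the exception for each failed file instead of discarding it.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def npy_dict_to_frame(path):
    """Load a .npy file holding a pickled dict and return a DataFrame."""
    data = np.load(path, allow_pickle=True).item()
    # Turn ndarray values into lists so a 2-D array becomes one list per row.
    columns = {key: value.tolist() if isinstance(value, np.ndarray) else value
               for key, value in data.items()}
    return pd.DataFrame(columns)


def load_npy_tree(target_dir):
    """Walk target_dir and try to convert every .npy file found."""
    frames, failed = {}, {}
    for root, _subdirs, files in os.walk(target_dir):
        for name in files:
            if not name.endswith('.npy'):
                continue
            path = os.path.join(root, name)
            try:
                frames[path] = npy_dict_to_frame(path)
            except Exception as exc:  # keep the reason instead of hiding it
                failed[path] = exc
    return frames, failed


# Demo with a made-up file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, 'halos.npy'), {
        'Center': np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]),
        'Vpeak': np.array([1.1, 2.2]),
        'ID': ['id_001', 'id_002'],
    })
    frames, failed = load_npy_tree(tmp)

for path, frame in frames.items():
    print(os.path.basename(path))
    print(frame)
for path, exc in failed.items():
    print('not loaded:', path, repr(exc))
```

Keeping one DataFrame per file avoids the original loop's problem of overwriting df on every iteration. If the files share the same columns, pd.concat(frames.values()) would combine them.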