Python: save dictionaries through numpy.save

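I have a large data set (millions of rows) in memory, in the form of numpy arrays and dictionaries.

Once this data is constructed, I want to store it in files, so that later I can load those files into memory quickly without rebuilding the data from scratch.

The np.save and np.load functions do the job smoothly for numpy arrays, but I am facing problems with dict objects.

See the sample below. d2 is the dictionary loaded from the file. See Out[28]: it has been loaded into d2 as a numpy array, not as a dict, so further dict operations such as get do not work.

Is there a way to load the data from the file as a dict (instead of a numpy array)?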

In [25]: d1={'key1':[5,10], 'key2':[50,100]}

In [26]: np.save("d1.npy", d1)

In [27]: d2=np.load("d1.npy")

In [28]: d2
Out[28]: array({'key2': [50, 100], 'key1': [5, 10]}, dtype=object)

In [30]: d1.get('key1') #original dict before saving into file
Out[30]: [5, 10]

In [31]: d2.get('key2') #dictionary loaded from the file
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-31-23e02e45bf22> in <module>()
----> 1 d2.get('key2')

AttributeError: 'numpy.ndarray' object has no attribute 'get'
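
For reference, the array shown in Out[28] is a 0-dimensional object array that wraps the dict, so the dict can be pulled back out with the array's item() method. Below is a minimal sketch of that approach. It assumes a recent NumPy release, where np.load refuses to unpickle object arrays unless allow_pickle=True is passed explicitly; the temporary directory is only an illustrative choice so that the example does not overwrite an existing d1.npy.

```python
import os
import tempfile

import numpy as np

d1 = {'key1': [5, 10], 'key2': [50, 100]}

# Illustrative location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "d1.npy")

# np.save wraps the dict in a 0-d object array and pickles it.
np.save(path, d1)

# Object arrays are pickled, so loading them requires allow_pickle=True
# on recent NumPy versions. Only do this for files you trust.
loaded = np.load(path, allow_pickle=True)

# item() returns the single Python object held by the 0-d array.
d2 = loaded.item()

print(type(d2))
print(d2.get('key2'))
```

Because loading relies on pickle under the hood, the same caution applies as with any pickle file: only load data from sources you trust.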

You can use the pickle module. Example code:

from __future__ import print_function  # must precede every other import
from six.moves import cPickle as pickle  # for performance; on Python 3 a plain "import pickle" is equivalent
import numpy as np

def save_dict(di_, filename_):
    # Pickle files must be opened in binary mode.
    with open(filename_, 'wb') as f:
        pickle.dump(di_, f)

def load_dict(filename_):
    with open(filename_, 'rb') as f:
        ret_di = pickle.load(f)
    return ret_di

if __name__ == '__main__':
    g_data = {
        'm': np.random.rand(4, 4),
        'n': np.random.rand(2, 2, 2)
    }
    save_dict(g_data, './data.pkl')
    g_data2 = load_dict('./data.pkl')
    # Element-wise comparison: each print shows a boolean array.
    print(g_data['m'] == g_data2['m'])
    print(g_data['n'] == g_data2['n'])

You can also save multiple Python objects in a single pickled file. In that case, each pickle.load call loads one object.
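
As a rough illustration of the multi-object case, the sketch below dumps several objects into one file and reads them back one pickle.load call at a time. The helper names save_objects and load_objects, the file name objects.pkl, and the sample data are all hypothetical and chosen just for this example; reading until EOFError is one common way to detect the end of the file, not the only one.

```python
import os
import pickle
import tempfile


def save_objects(objects, filename):
    """Write each object to the same file with its own pickle.dump call."""
    with open(filename, 'wb') as f:
        for obj in objects:
            pickle.dump(obj, f)


def load_objects(filename):
    """Read objects back one by one until the file is exhausted."""
    loaded = []
    with open(filename, 'rb') as f:
        while True:
            try:
                # Each call returns the next pickled object in the file.
                loaded.append(pickle.load(f))
            except EOFError:
                # pickle.load raises EOFError when no data is left.
                break
    return loaded


# Hypothetical sample data and file location.
path = os.path.join(tempfile.mkdtemp(), 'objects.pkl')
first = {'key1': [5, 10], 'key2': [50, 100]}
second = [1, 2, 3]

save_objects([first, second], path)
restored = load_objects(path)
print(restored)
```

The objects come back in the order they were written, so the reader has to know (or record) that order to tell them apart.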