Pandas msgpack vs pickle
According to the pandas documentation on msgpack:
This is a lightweight portable binary format, similar to binary JSON,
that is highly space efficient, and provides good performance both on
the writing (serialization), and reading (deserialization).
However, I find that its performance does not seem to stack up against pickle.
df = pd.DataFrame(np.random.randn(10000, 100))

>>> %timeit df.to_pickle('test.p')
10 loops, best of 3: 22.4 ms per loop

>>> %timeit df.to_msgpack('test.msg')
10 loops, best of 3: 36.4 ms per loop

>>> %timeit pd.read_pickle('test.p')
100 loops, best of 3: 10.5 ms per loop

>>> %timeit pd.read_msgpack('test.msg')
10 loops, best of 3: 24.6 ms per loop
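The timings above use IPython's %timeit magic, which is not available in a plain script. A minimal sketch of the pickle half as a standalone script follows, assuming pandas and NumPy are installed. The temporary directory and the averaging over five calls are my own choices, not part of the original measurement. The msgpack half is left out because to_msgpack and read_msgpack were deprecated in pandas 0.25 and removed in pandas 1.0, so they will not run on current versions.

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd

# Same shape as the frame in the question.
df = pd.DataFrame(np.random.randn(10000, 100))

with tempfile.TemporaryDirectory() as tmp:
    # Written to a throwaway directory instead of the working directory.
    path = os.path.join(tmp, "test.p")

    # Average seconds per call over five repetitions (an arbitrary count).
    write_s = timeit.timeit(lambda: df.to_pickle(path), number=5) / 5
    read_s = timeit.timeit(lambda: pd.read_pickle(path), number=5) / 5

    # Round-trip once more to check that nothing was lost.
    restored = pd.read_pickle(path)

print(f"to_pickle:   {write_s * 1e3:.1f} ms per call")
print(f"read_pickle: {read_s * 1e3:.1f} ms per call")
```

Absolute numbers will differ from the ones quoted above, since they depend on the machine, the disk, and the pandas version.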
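Question: Apart from the potential security issues with pickle, what are the benefits of msgpack over pickle? Is pickle still the preferred method of serializing data, or do better alternatives currently exist?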
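For readers unfamiliar with the security issue mentioned in the question, here is a small illustration using only the standard library. Payload is a hypothetical class made up for this sketch, and it calls the harmless built-in sorted. The point is that a pickle stream can name a callable that the loader will invoke, which is why unpickling data from an untrusted source is unsafe.

```python
import pickle


class Payload:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # Tell pickle to rebuild this object by calling sorted([3, 1, 2]).
        # A malicious payload could name any importable callable instead.
        return (sorted, ([3, 1, 2],))


data = pickle.dumps(Payload())

# Loading calls the function recorded in the byte stream.
result = pickle.loads(data)
print(type(result).__name__, result)  # prints: list [1, 2, 3]
```

The loaded object is a plain list rather than a Payload instance, because the loader ran the recorded call instead of restoring the original object.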
Pickle is better for the following:
MsgPack is better for the following:
As @Jeff mentioned above, this blog post may be of interest.