Python: Writing to a numpy array from a dictionary

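I have a dictionary of file header values (time, number of frames, year, month, etc.) that I would like to write into a numpy array. My current code is as follows: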

arr=np.array([(k,)+v for k,v in fileheader.iteritems()],dtype=["a3,a,i4,i4,i4,i4,f8,i4,i4,i4,i4,i4,i4,a10,a26,a33,a235,i4,i4,i4,i4,i4,i4"])

But I get the error "can only concatenate tuple (not "int") to tuple".

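Basically, the end result needs to be arrays storing the overall file header information (512 bytes) and each frame's data (header and data, 49408 bytes per frame). Is there an easier way to do this?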

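Edit: To clarify (for myself as well), I need to write the data from each frame of the file into an array. I was given MATLAB code as a base. Here is a rough idea of the code given to me: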

data.frame=zeros([512 96])
frame=uint8(fread(fid,[data.numbeams,512],'uint8'))
data.frame=frame


How do I translate "frame" into Python?

Related discussion

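You would probably be better off keeping the header data in a dict. Do you really need it as an array? (If so, why? There are some advantages to having the header in a numpy array, but it is more complex than a simple dict and not as flexible.)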

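One drawback of a dict is that its keys have no predictable order (plain dicts only guarantee insertion order from Python 3.7 onward). If you need to write the header back to disk in a regular order (similar to a C struct), you need to store the order of the fields separately from their values. If that is the case, you might consider an ordered dict (collections.OrderedDict), or just put together a simple class to hold your header data and store the order there.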
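
A minimal sketch of keeping the field order together with the values, assuming a hypothetical three-field header (`name`, `num_frames`, `gain`) and a made-up little-endian layout with no padding. None of these names or formats come from the actual file format in the question.

```python
import struct
from collections import OrderedDict

# Hypothetical header fields; an OrderedDict remembers insertion order
header = OrderedDict([
    ('name', b'abc'),
    ('num_frames', 456),
    ('gain', 3.45),
])

# Assumed layout: 3-byte string, 32-bit int, 64-bit float.
# '<' means little-endian with no alignment padding, like a packed C struct.
layout = '<3sid'

# The values come out in the same order the fields were inserted
packed = struct.pack(layout, *header.values())

# Reading the bytes back restores the values in field order
unpacked = struct.unpack(layout, packed)
restored = OrderedDict(zip(header.keys(), unpacked))
```

The layout string has to list the formats in the same order as the fields, which is the bookkeeping a structured numpy array would otherwise do for you.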

Unless there is a good reason to put it into a numpy array, you may not want to.

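However, a structured array will preserve the order of your header and make it easier to write a binary representation of it to disk, but it is inflexible in other ways.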

If you did want to make the header an array, you would do something like this:

import numpy as np

# Lists can be modified, but preserve order. That's important in this case.
names = ['Name1', 'Name2', 'Name3']
# It's "S3" instead of "a3" for a string field in numpy, by the way
formats = ['S3', 'i4', 'f8']

# It's often cleaner to specify the dtype this way instead of as a giant string
dtype = dict(names=names, formats=formats)

# Before Python 3.7, a plain dict won't preserve the order we're specifying things in!!
# If we iterate through it, things may be in any order.
header = dict(Name1='abc', Name2=456, Name3=3.45)

# Therefore, we'll be sure to pass things in in order...
# Also, np.array will expect a tuple instead of a list for a structured array...
values = tuple(header[name] for name in names)
header_array = np.array(values, dtype=dtype)

# We can access field in the array like this...
print(header_array['Name2'])

# And dump it to disk (similar to a C struct) with
header_array.tofile('test.dat')

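To check that the bytes on disk really behave like a C struct, a round trip through `tofile` and `np.fromfile` can be sketched as follows. This assumes the same placeholder field names as the example above, and the temporary file path is purely illustrative.

```python
import os
import tempfile

import numpy as np

# Placeholder field names and formats, mirroring the example above
dtype = np.dtype([('Name1', 'S3'), ('Name2', '<i4'), ('Name3', '<f8')])

# A one-element structured array holding a single header record
header_array = np.array([(b'abc', 456, 3.45)], dtype=dtype)

# Write the raw bytes to a temporary file (illustrative path only)
path = os.path.join(tempfile.mkdtemp(), 'test.dat')
header_array.tofile(path)

# Read the record back using the same dtype; count=1 reads one record
restored = np.fromfile(path, dtype=dtype, count=1)
```

The file contains only the raw field bytes, with no names or type information, so the same dtype has to be supplied again when reading.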

On the other hand, if you just want access to the values in the header, just keep it as a dict. It is simpler that way.

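Based on what it sounds like you are doing, I would do something like this. I use numpy arrays to read in the headers, but the header values are actually stored as class attributes (as well as in the header array).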

This looks more complicated than it actually is.

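I am just defining two new classes, one for the parent file and one for a frame. You could do the same thing with a bit less code, but this gives you a foundation for more complex things.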

import numpy as np

class SonarFile(object):
    # These define the format of the file header
    header_fields = ('num_frames', 'name1', 'name2', 'name3')
    # '>u4' is a big-endian unsigned 32-bit integer
    header_formats = ('i4', 'f4', 'S10', '>u4')

    def __init__(self, filename):
        # Open in binary mode, since this is a binary file
        self.infile = open(filename, 'rb')
        dtype = dict(names=list(self.header_fields),
                     formats=list(self.header_formats))

        # Read in the header as a numpy array (count=1 is important here!)
        self.header = np.fromfile(self.infile, dtype=dtype, count=1)

        # Store the position so we can "rewind" to the end of the header
        self.header_length = self.infile.tell()

        # You may or may not want to do this (If the field names can have
        # spaces, it's a bad idea). It will allow you to access things with
        # sonar_file.name1 instead of sonar_file.header['name1'], though.
        # The [0] pulls the scalar out of the one-element header array.
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

    # __iter__ is a special function that defines what should happen when we
    # try to iterate through an instance of this class.
    def __iter__(self):
        """Iterate through each frame in the dataset."""
        # Rewind to the end of the file header
        self.infile.seek(self.header_length)

        # Iterate through frames...
        for _ in range(self.num_frames):
            yield Frame(self.infile)

    def close(self):
        self.infile.close()

class Frame(object):
    header_fields = ('width', 'height', 'name')
    header_formats = ('i4', 'i4', 'S20')
    data_format = 'f4'

    def __init__(self, infile):
        dtype = dict(names=list(self.header_fields),
                     formats=list(self.header_formats))
        self.header = np.fromfile(infile, dtype=dtype, count=1)

        # See discussion above...
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

        # I'm assuming that the size of the frame is in the frame header...
        ncols, nrows = int(self.width), int(self.height)

        # Read the data in
        self.data = np.fromfile(infile, self.data_format, count=ncols * nrows)

        # And reshape it into a 2d array.
        # I'm assuming C-order, instead of Fortran order.
        # If it's fortran order, just do "data.reshape((ncols, nrows)).T"
        self.data = self.data.reshape((nrows, ncols))

You would use it like this:

dataset = SonarFile('input.dat')

for frame in dataset:
    im = frame.data
    # Do something...


Related discussion

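I suppose the header information does not need to be in an array. However, I do need the frame information in an array to create images. Bear with me here: I was thrown in at the deep end and tasked with translating MATLAB code to do image processing on the data. Here is what I know: the file header is 512 bytes, each frame is 49408 bytes, of which 256 are the frame header, and the person who wrote the MATLAB code set up an initial array of zeros of size [512, 96] (this is a 96-beam sonar). I need to process every frame of every file.
Do you need to write it back to the original file format?
No, I would like to eventually export the final data to a .txt file. Right now, I need to read each file and the associated image data; it is in binary format. Our end result (down the line) is to take these image files (collected with a sonar camera) and automatically locate targets in the sonar beam (for visualization, it is essentially a white circle on a black background). If that makes any sense, I will ultimately need to store the coordinate locations of the detected targets in a file. Thanks so much for your help, I am a complete beginner!
Oh, as a side note: this program has already been (semi-)successfully written in MATLAB, and we are trying to convert it to Python.
See the update. Hope it helps! There is more than one way to do this, and you could do what I have shown with less code, but it would be less flexible if you need to update it in the future. I usually find that doing something along these lines is simplest when reading in similar data. It is also simple to add writing to disk in the same format. (For whatever it's worth, I am a marine geophysicist as well, and I seem to wind up reading random binary data formats more often than I would like.)
This is great, I will try it out! Thanks again!
So far, success! No errors or anything. So, should I discard the original dictionary I wrote for the header? And how do I display the arrays created for the file and the initial frame?
By display, do you mean "make a pseudocolor plot" or just print out the values? If you want to plot the data, use matplotlib's imshow. To print it, just call print(data), which will display a summary.
One more question: the actual image data contained in each frame begins after the 256 bytes of frame header information. How do I make sure that the data read into the array is everything from byte 256 onward (i.e. only the image data)?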
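
One way to read only the image data, sketched from the figures quoted in this thread: a 512-byte file header, then frames of 49408 bytes, each made of a 256-byte frame header followed by 512 × 96 = 49152 bytes of uint8 samples. The function name `read_frames`, the constants, and the assumption that each run of 96 consecutive bytes is one row of beams are illustrative guesses, not confirmed by the file format. The number of frames is passed in by the caller.

```python
import numpy as np

FILE_HEADER_BYTES = 512    # overall file header, per the thread
FRAME_HEADER_BYTES = 256   # per-frame header, per the thread
NUM_SAMPLES, NUM_BEAMS = 512, 96  # 512 * 96 = 49152 data bytes per frame

def read_frames(filename, num_frames):
    """Yield the image data of each frame as a (512, 96) uint8 array."""
    with open(filename, 'rb') as infile:
        # Skip the file header
        infile.seek(FILE_HEADER_BYTES)
        for _ in range(num_frames):
            # Skip the frame header (whence=1: relative to current position)
            infile.seek(FRAME_HEADER_BYTES, 1)
            # Read exactly one frame's worth of samples
            data = np.fromfile(infile, dtype=np.uint8,
                               count=NUM_SAMPLES * NUM_BEAMS)
            # Assumed layout: 96 consecutive bytes form one row
            yield data.reshape((NUM_SAMPLES, NUM_BEAMS))
```

If the images come out scrambled, the samples may be stored in the other order, in which case `data.reshape((NUM_BEAMS, NUM_SAMPLES)).T` would be the thing to try.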

The problem seems to be that v is an int rather than a tuple. Try:

arr=np.array([(k,v) for k,v in fileheader.iteritems()],dtype=["a3,a,i4,i4,i4,i4,f8,i4,i4,i4,i4,i4,i4,a10,a26,a33,a235,i4,i4,i4,i4,i4,i4"])
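
The error itself can be reproduced without numpy. The sketch below uses a made-up two-entry `fileheader` and Python 3's `items()` in place of `iteritems()`. It only shows why `(k,) + v` fails and what `(k, v)` produces. Whether the resulting pairs fit the 23-field dtype from the question is a separate matter that it does not address.

```python
# Hypothetical header with int values, standing in for the real one
fileheader = {'frames': 100, 'year': 2012}

# (k,) is a tuple and v is an int, so "+" cannot concatenate them
try:
    rows = [(k,) + v for k, v in fileheader.items()]
    message = None
except TypeError as exc:
    message = str(exc)

# Building the tuple directly works: one (key, value) pair per entry
pairs = [(k, v) for k, v in fileheader.items()]
```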