Python: Reading from a file in segments


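I wrote a script that reads data from two different files and proceeds accordingly. However, when I wrote it I was under the impression that the first file I read from would only ever contain two lines. Unfortunately, that has changed.

My code extracts the first two lines and passes the data to another function, which then performs the computation by passing it through several other functions.

Right now I am doing something like this: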

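try:
    file = open(myfile, 'r')
    for line in file:
        if line[0] != '|':
            # A name line: drop the trailing newline.
            name = line.strip('\n')
        else:
            # A data line: drop the surrounding pipes and the newline.
            data = line.strip('|\n')
except IOError:
    # The handler was not shown in the original post; re-raise as a placeholder.
    raise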

Normally, the file looks like this:

Samantha
|j&8ju820kahu9|


Unfortunately, I can now get a file with more than two lines, like this:

Andy
|o81kujd0-la88js|
Mathew
|a992kma82nf-x01j4|
Andrew
|01ks83nnz;a82jlad|

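Is there a way to extract two lines at a time from a file, process them, and then extract the next two? That is: grab the first two lines, assign them to name and data, pass them to my function, which eventually prints what is needed, then fetch the next two lines, and so on.

Please advise.

Yes, because the file object is also an iterator: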

with open(filename, 'r') as f:
    # zip draws from the same iterator twice, so l1 and l2 are consecutive lines.
    for l1, l2 in zip(f, f):
        # ... do something with l1 and l2
        pass


As far as I know, this is the shortest and most Pythonic way.
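
To see why this works: zip takes one item from each of its arguments in turn, and since both arguments are the same iterator, each tuple ends up holding two consecutive lines. The sketch below is only an illustration. It uses io.StringIO as a stand-in for an open file, filled with the sample data from the question, and the names sample and pairs are made up for the example.

```python
import io

# In-memory stand-in for the real file (sample data from the question).
sample = io.StringIO(
    "Andy\n"
    "|o81kujd0-la88js|\n"
    "Mathew\n"
    "|a992kma82nf-x01j4|\n"
    "Andrew\n"
    "|01ks83nnz;a82jlad|\n"
)

pairs = []
# Both arguments are the same iterator, so each step consumes two lines.
for l1, l2 in zip(sample, sample):
    name = l1.strip("\n")   # name line: drop the newline
    data = l2.strip("|\n")  # data line: drop the pipes and the newline
    pairs.append((name, data))

print(pairs)
```

Note that if the file has an odd number of lines, zip silently discards the final unpaired line.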


Sure.

with open(...) as f:
    while True:
        try:
            line_1 = next(f)
        except StopIteration:
            break  # Clean end of file.
        try:
            line_2 = next(f)
        except StopIteration:
            # A first line without a partner: the line count is odd.
            complain("The file did not contain an even number of lines")
            break
        # ... do something with the pair of lines


Your solution could be:

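data = {}
with open(filename) as f:
    # zip(f, f) pairs each name line with the data line that follows it.
    for name, value in zip(f, f):
        # Strip the newline from the name and the pipes/newline from the value.
        data[name.strip('\n')] = value.strip('|\n')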

For an explanation of how zip behaves with iterators, see the documentation.

Also, here is a recipe from the itertools documentation:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

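As a rough illustration of how the recipe applies to this problem, calling grouper with n=2 yields (name, value) pairs. Unlike plain zip, it pads an unpaired final line with fillvalue, so an odd line count can be detected. The sample list lines and the name records are made up for this example.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Hypothetical sample: three lines, so the last name has no data line.
lines = ["Andy\n", "|o81kujd0-la88js|\n", "Mathew\n"]

records = []
for name, value in grouper(lines, 2):
    if value is None:
        # The fill value marks a name with no matching data line.
        records.append((name.strip(), None))
    else:
        records.append((name.strip(), value.strip("|\n")))

print(records)
```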

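You can use list slice notation, list[::], to skip list elements while iterating. If your file is small, you can simply read it into memory with readlines().

Consider something like the following. Also, don't use file as the name of a file handle: in Python 2 it shadows the built-in file type.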

In [9]: a = my_file.readlines()
In [10]: for i, line in enumerate(a[::2]):
    ...:     data_line = a[2*i + 1]  # the data line follows its name line
    ...:     name = line.strip('\n')
    ...:     data = data_line.strip("|\n")
    ...:     print(name)
    ...:     print(data)
    ...:
Andy
o81kujd0-la88js
Mathew
a992kma82nf-x01j4
Andrew
01ks83nnz;a82jlad

In [11]:

(Personally, though, I would do something along the lines of a regular-expression match.)
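
The answer does not say what such a regular expression would look like, so the following is only one possible sketch. It assumes the whole file fits in memory, that every name line is followed directly by a data line wrapped in pipes, and that the data itself contains no pipe character. The names text, RECORD, and records are invented for the example.

```python
import re

# Stand-in for the result of f.read() (sample data from the question).
text = (
    "Andy\n"
    "|o81kujd0-la88js|\n"
    "Mathew\n"
    "|a992kma82nf-x01j4|\n"
    "Andrew\n"
    "|01ks83nnz;a82jlad|\n"
)

# One record: a name line, then a data line enclosed in pipes.
RECORD = re.compile(
    r"^(?P<name>[^|\n]+)\n\|(?P<data>[^|\n]*)\|$",
    re.MULTILINE,
)

records = [(m.group("name"), m.group("data")) for m in RECORD.finditer(text)]
print(records)
```

One trade-off of this approach is that lines not matching the pattern are skipped silently rather than reported.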

Try this:

from itertools import islice

with open(filename, 'r') as infile:
    # Take at most N lines from the file iterator.
    current_slice = islice(infile, N)
    for line in current_slice:
        print(line)

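Here N is the number of lines to process, and current_slice is an iterator that yields the next N lines of the file and can be used in a loop. With N = 2 this gives you two lines at a time. Instead of printing, you can do your processing and then move on to the next two lines.

Another option is: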
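
Moving on to the next lines means calling islice again on the same file object, since each call picks up where the previous one stopped. A minimal sketch of that loop follows. It uses io.StringIO as a stand-in for the open file, and the names chunk and chunks are illustrative only.

```python
import io
from itertools import islice

N = 2  # number of lines per chunk

# In-memory stand-in for the real file.
infile = io.StringIO(
    "Andy\n"
    "|o81kujd0-la88js|\n"
    "Mathew\n"
    "|a992kma82nf-x01j4|\n"
)

chunks = []
while True:
    # islice consumes at most N lines from the shared file iterator.
    chunk = list(islice(infile, N))
    if not chunk:
        break  # nothing left to read
    # Drop pipes and newlines; real processing would go here instead.
    chunks.append([line.strip("|\n") for line in chunk])

print(chunks)
```

If the file's line count is not a multiple of N, the final chunk is simply shorter, so its length can be checked before processing.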

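from itertools import zip_longest  # izip_longest on Python 2; used by grouper

with open(filename) as f:
    # grouper is the itertools recipe; each group holds N lines, padded with ''.
    for lines in grouper(f, N, ''):
        for line in lines:
            # process the N lines here
            pass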
