Python: read subprocess stdout line by line

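My Python script uses subprocess to call a Linux utility that is very noisy. I want to store all of the output in a log file and show some of it to the user. I thought the following would work, but the output doesn't show up in my application until the utility has produced a significant amount of output.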


#fake_utility.py, just generates lots of output over time
import time
i = 0
while True:
    print hex(i)*512
    i += 1
    time.sleep(0.5)

#filters output
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
for line in proc.stdout:
    #the real code does filtering here
    print "test:", line.rstrip()

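The behavior I really want is for the filter script to print each line as it is received from the subprocess, sort of like what tee does, but in Python code.

What am I missing? Is this even possible?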

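Update:

If sys.stdout.flush() is added to fake_utility.py, the code has the desired behavior in Python 3.1. I'm using Python 2.6. You would think that using proc.stdout.xreadlines() would work the same way as py3k, but it doesn't.

Update 2:

Here is the minimal working code.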


#fake_utility.py, just generates lots of output over time
import sys, time
for i in range(10):
    print i
    sys.stdout.flush()
    time.sleep(0.5)

#display output line by line
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
#works in python 3.0+
#for line in proc.stdout:
for line in iter(proc.stdout.readline,''):
    print line.rstrip()
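On Python 3 the pipe is opened in binary mode, so readline() returns bytes and the sentinel passed to iter() has to be b'' rather than ''; with '' the loop never terminates because b'' != ''. A minimal Python 3 sketch of the same idea, assuming the child is another Python interpreter started via sys.executable; CHILD is a hypothetical inline stand-in for fake_utility.py:

```python
import subprocess
import sys

# Hypothetical stand-in for fake_utility.py: prints a few lines, flushing each.
CHILD = (
    "import sys, time\n"
    "for i in range(3):\n"
    "    print(i)\n"
    "    sys.stdout.flush()\n"
    "    time.sleep(0.1)\n"
)


def read_lines(cmd):
    """Yield decoded lines from cmd's stdout as soon as each one arrives."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    # In binary mode readline() returns b'' only at EOF, so b'' is the sentinel.
    for line in iter(proc.stdout.readline, b''):
        yield line.decode('utf-8').rstrip()
    proc.stdout.close()
    proc.wait()  # reap the child once its output is exhausted


if __name__ == '__main__':
    for line in read_lines([sys.executable, '-c', CHILD]):
        print('test:', line)
```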


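It's been a long time since I last used Python, but I think the problem is the statement for line in proc.stdout, which (in Python 2) reads ahead into a large internal buffer before handing you any lines. The solution is to use readline() instead: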


#filters output
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
while True:
    line = proc.stdout.readline()
    if line != '':
        #the real code does filtering here
        print "test:", line.rstrip()
    else:
        break

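Of course, you still have to deal with the subprocess's buffering.

Note: according to the documentation, the iterator-based solution should be equivalent to using readline(), except for the read-ahead buffer, but (or precisely because of that) the proposed change did produce different results for me (Python 2.5 on Windows XP).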
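When the child is itself a Python script that you can't or don't want to edit, one way to deal with its buffering is to set the PYTHONUNBUFFERED environment variable for it, which has the same effect as running it with -u. A minimal Python 3 sketch, assuming a Python child; CHILD is a hypothetical program that never calls flush(). For non-Python utilities this variable has no effect.

```python
import os
import subprocess
import sys

# Hypothetical child that prints without ever calling sys.stdout.flush().
CHILD = "import time\nfor i in range(3):\n    print(i)\n    time.sleep(0.1)\n"


def unbuffered_env():
    """Copy of the current environment with Python's stdio buffering disabled."""
    env = dict(os.environ)
    env['PYTHONUNBUFFERED'] = '1'  # same effect as starting the child with -u
    return env


def run_unbuffered(code):
    proc = subprocess.Popen([sys.executable, '-c', code],
                            stdout=subprocess.PIPE, env=unbuffered_env())
    lines = []
    for line in iter(proc.stdout.readline, b''):
        # Each line is available as soon as the child prints it.
        lines.append(line.decode('utf-8').rstrip())
    proc.stdout.close()
    proc.wait()
    return lines


if __name__ == '__main__':
    print(run_unbuffered(CHILD))
```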

Comments:

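For file.readline() vs. for line in file, see bugs.python.org/issue3907 (in short: it works on Python 3; use io.open() on Python 2.6+).
A more pythonic EOF test, per the "Programming Recommendations" in PEP 8 (python.org/dev/peps/pep-0008), would be "if not line:".
This script doesn't use open(); where would you put io.open()? Is there a workaround for 2.5?
@naxa: for pipes: for line in iter(proc.stdout.readline, ''):.
@J.F.Sebastian: Have you tried this solution on Python 3? I have code that used to work on Python 2(.7) with the iter(proc.stdout.readline, '') approach; now that I've switched to Python 3.4 the code goes pear-shaped: the loop never returns, and RAM usage oscillates between ~0 and 3 GB.
Yes. 1. On Python 3 you can use for line in proc.stdout (there is no read-ahead bug). 2. On Python 3, '' != b''. Don't blindly copy-paste code; think about what it does and how it works.
@J.F.Sebastian: Sure, the iter(f.readline, b'') solution is fairly obvious (and it also works on Python 2, if anyone is interested). My comment wasn't meant to blame your solution (sorry if it came across that way; rereading it now, I see that it does!), but to describe how severe the symptoms are in this case: most Py2/3 problems result in exceptions, whereas here a well-behaved loop becomes endless, and garbage collection struggles against a flood of newly created objects, producing long-period, large-amplitude oscillations in memory usage.
@Jan-Philip Gehrcke: Whether to use '' or b'' depends on the universal_newlines parameter, which enables text mode. It isn't obvious, and it behaves differently on Python 2 and 3. You should be careful if you write single-source Python 2/3 compatible code that uses the subprocess module.
@J.F.Sebastian: I agree there is a lot to consider when using subprocess, but b'' suits most use cases, because on both Python 2 and 3 the sensible default is to treat subprocess.PIPE as a byte stream rather than implicitly decoding/encoding. I'd say b'' is recommended even on Python 2, because it is semantically better (explicit). Indeed, on Python 3, b'' is wrong with universal_newlines=True (it makes the stdout/stderr attributes TextIOWrapper objects). On Python 2, b'' works regardless of universal_newlines.
Python hangs on readline().
How do you check whether proc has terminated before trying to read another line from stdout?
Does this care how often the called process sends output? Could it run indefinitely, for months, printing only one line every 30 seconds? I don't understand how readline() determines when the program's output is actually finished...
I'd suggest adding sys.stdout.flush() before the break; otherwise the output gets mixed up.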
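The PEP 8 EOF test and the question of how to tell whether proc has terminated fit together: readline() blocks for as long as it takes for the next line to arrive (half a second or thirty seconds alike) and returns an empty value only once the child has closed its end of the pipe, normally when it exits. After the loop, proc.wait() reaps the child and returns its exit status; proc.poll() would return None while it is still running. A minimal Python 3 sketch; CHILD is a hypothetical program that exits with status 3:

```python
import subprocess
import sys

# Hypothetical child: prints two lines, then exits with status 3.
CHILD = "import sys\nprint('a')\nprint('b')\nsys.exit(3)\n"


def filter_output(cmd):
    """Print each line of cmd's output; return (lines, exit status)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    lines = []
    while True:
        # Blocks until a full line arrives or the pipe is closed.
        line = proc.stdout.readline()
        if not line:  # b'' means EOF: the child closed its stdout
            break
        text = line.decode('utf-8').rstrip()
        lines.append(text)
        print('test:', text)
    proc.stdout.close()
    # wait() blocks until the child has exited and returns its exit status.
    returncode = proc.wait()
    return lines, returncode


if __name__ == '__main__':
    print(filter_output([sys.executable, '-c', CHILD]))
```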

A little late to the party, but I'm surprised not to see what I think is the simplest solution:


import io
import subprocess

proc = subprocess.Popen(["prog","arg"], stdout=subprocess.PIPE)
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):  # or another encoding
    print(line, end="")  # do something with line


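Actually, if you've sorted out the iterator, buffering may now be your problem. You can tell the Python interpreter in the subprocess not to buffer its output.

proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)

becomes

proc = subprocess.Popen(['python','-u', 'fake_utility.py'],stdout=subprocess.PIPE)

I needed this when calling Python from within Python.

You want to pass these extra arguments to subprocess.Popen:

bufsize=1, universal_newlines=True

Then you can iterate as in your example. (Tested with Python 3.5.)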
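Combining those arguments with the goal from the question (log everything, show only some lines) gives a small tee-like filter. With universal_newlines=True the pipe yields str lines, and bufsize=1 requests line buffering on the parent's side of the pipe; the child must still flush its own output. A minimal Python 3 sketch; tee, log_path, and the keep predicate are hypothetical names chosen for illustration:

```python
import subprocess
import sys


def tee(cmd, log_path, keep):
    """Write every line of cmd's stdout to log_path; print lines where keep(line) is true."""
    shown = []
    with open(log_path, 'w') as log, subprocess.Popen(
            cmd, stdout=subprocess.PIPE, bufsize=1,
            universal_newlines=True) as proc:
        # In text mode each line is a str that still ends with '\n'.
        for line in proc.stdout:
            log.write(line)
            if keep(line):
                shown.append(line.rstrip('\n'))
                print('test:', line, end='')
    # Leaving the Popen context waits for the child, so returncode is set.
    return shown, proc.returncode
```

For example, tee([sys.executable, 'fake_utility.py'], 'utility.log', keep=lambda line: 'error' in line) would log everything and echo only lines containing "error".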

I tried this on Python 3 and it worked well (source):


import subprocess
import threading
import time


def output_reader(proc):
    # Runs in a background thread: print each line as soon as it arrives.
    for line in iter(proc.stdout.readline, b''):
        print('got line: {0}'.format(line.decode('utf-8')), end='')


def main():
    proc = subprocess.Popen(['python', 'fake_utility.py'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)

    t = threading.Thread(target=output_reader, args=(proc,))
    t.start()

    try:
        # The main thread is free to do other work while the reader
        # thread prints the child's output; here it just waits.
        time.sleep(5)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')
    t.join()


if __name__ == '__main__':
    main()
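A variation on the threaded approach, assuming you want the filtering to happen in the main thread: the reader thread only moves lines into a queue.Queue, and the main thread takes them with a timeout, so it never blocks indefinitely on a silent child. This is a sketch of a common pattern rather than the answer's own code; filter_with_timeout and CHILD (a stand-in for fake_utility.py) are hypothetical:

```python
import queue
import subprocess
import sys
import threading

# Hypothetical stand-in for fake_utility.py.
CHILD = "import time\nfor i in range(3):\n    print(hex(i), flush=True)\n    time.sleep(0.1)\n"


def enqueue_output(stream, q):
    """Reader thread: push each line into q, then None once EOF is reached."""
    for line in iter(stream.readline, b''):
        q.put(line.decode('utf-8').rstrip())
    stream.close()
    q.put(None)  # sentinel: no more output


def filter_with_timeout(cmd, timeout=5.0):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    q = queue.Queue()
    t = threading.Thread(target=enqueue_output, args=(proc.stdout, q), daemon=True)
    t.start()
    received = []
    try:
        while True:
            try:
                line = q.get(timeout=timeout)
            except queue.Empty:
                print('no output for', timeout, 'seconds')
                break
            if line is None:
                break
            received.append(line)  # the real filtering would go here
            print('got line:', line)
    finally:
        proc.terminate()  # harmless if the child has already exited
        proc.wait()
    t.join(timeout=1)
    return received


if __name__ == '__main__':
    filter_with_timeout([sys.executable, '-c', CHILD])
```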

The following modification of Rômulo's answer works for me on Python 2 and 3 (2.7.12 and 3.6.1):


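import os
import subprocess

command = ['python', 'fake_utility.py']  # hypothetical: the command to run
process = subprocess.Popen(command, stdout=subprocess.PIPE)
while True:
    line = process.stdout.readline()
    # readline() returns bytes on Python 3, so test for emptiness instead of
    # comparing with '' (b'' != '' there, and the loop would never end)
    if line:
        os.write(1, line)  # write the raw bytes to file descriptor 1 (stdout)
    else:
        break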

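You can also read the lines without a loop. Works in Python 3.6. Note that readlines() blocks until the child closes its stdout (normally when it exits), so the lines arrive all at once at the end rather than as they are produced.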


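import subprocess

command = ['python', 'fake_utility.py']  # hypothetical: the command to run
process = subprocess.Popen(command, stdout=subprocess.PIPE)
# Blocks until the child closes its stdout, then returns every line at once.
list_of_byte_strings = process.stdout.readlines()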