Python - How to read HTML line by line

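I'm trying to write a program that takes an HTML file and outputs each line. I'm doing something wrong, because my code outputs each character instead. How can I get all the lines of the HTML into a list?

This is the code so far: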

f = open("/home/tony/Downloads/page1/test.html","r")
htmltext = f.read()
f.close()

for t in htmltext:
    print(t + "\n")


You can use f.readlines() instead of f.read(). This method returns a list of all the lines in the file.

with open("/home/tony/Downloads/page1/test.html","r") as f:
    for line in f.readlines():
        print(line)

Alternatively, you can use list(f):

with open("/home/tony/Downloads/page1/test.html","r") as f:
    f_lines = list(f)  # list of lines, each ending with its newline
for line in f_lines:
    print(line)

Source: https://docs.python.org/3.5/tutorial/inputout.html
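
Both f.readlines() and list(f) build the full list of lines in memory. If you only need one pass over the file, you can also loop over the file object itself, which yields one line at a time. The following is a minimal sketch of that idea; the helper name `iter_html_lines` and the temporary sample file are made up for illustration and are not part of the original question.

```python
import os
import tempfile


def iter_html_lines(path):
    """Yield the lines of the file at `path` one at a time (hypothetical helper)."""
    with open(path, "r") as f:
        for line in f:  # a file object iterates over its lines
            yield line


# Illustrative sample file; substitute your own path.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("<html>\n<body>\n</body>\n</html>\n")
    sample_path = tmp.name

lines = list(iter_html_lines(sample_path))
for line in lines:
    print(line, end="")  # each line already ends with "\n"

os.remove(sample_path)
```

Wrapping the generator in list(...) gives the same list as f.readlines(); leaving it unwrapped avoids holding the whole file in memory.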


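f.read() reads the whole file up to EOF and returns it as a single string, so looping over the result gives you one character at a time. What you need is the f.readlines() method: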

with open("/home/tony/Downloads/page1/test.html","r") as f:
    for line in f.readlines():
        print(line) # The newline is included in line
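
Because each line keeps its trailing newline, print(line) leaves a blank line after every line of output. Here is a rough sketch of two common ways to get the lines without the newline, assuming a small sample file created only for this illustration: strip it from each line, or read the whole file and call str.splitlines().

```python
import os
import tempfile

# Illustrative sample file; substitute your own path.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("<html>\n<body>\n</body>\n</html>\n")
    sample_path = tmp.name

# Option 1: strip the trailing newline from each line.
with open(sample_path, "r") as f:
    stripped = [line.rstrip("\n") for line in f]

# Option 2: read everything, then split on line boundaries.
with open(sample_path, "r") as f:
    split = f.read().splitlines()

os.remove(sample_path)

for line in stripped:
    print(line)  # print adds exactly one newline per line
```

Either list can be used in place of the one returned by f.readlines() when the newline characters are not wanted.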