About parsing: Python lxml web scraping

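from lxml import html
import requests

# Download the problem page and parse it into an element tree.
page = requests.get('https://projecteuler.net/problem=1')
tree = html.fromstring(page.content)
# Select the text nodes that are direct children of the problem div.
text = tree.xpath('//div[@class="problem_content"]/text()')
print(text)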

I have this code, and I want to get the text that describes the problem, which in this case is:

"If we list all the natural numbers below 10 that are multiples of 3
or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000."

However, what I get is:

['\n\n', '\n', '\n']

It turns out that the text itself is contained in <p> elements, so the xpath line should look something like:

text=tree.xpath('//div[@role="problem"]/p/text()')
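
To see why the first query returned only whitespace, here is a minimal offline sketch. The SAMPLE markup is a hypothetical stand-in that assumes the problem div carries both class="problem_content" and role="problem" and wraps each sentence in a <p>; the real page may differ in detail. The names SAMPLE, direct, paragraphs and description are only for illustration.

```python
from lxml import html

# Hypothetical stand-in for the page markup, so the sketch runs without a network call.
SAMPLE = """<html><body>
<div class="problem_content" role="problem">
<p>If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.</p>
<p>Find the sum of all the multiples of 3 or 5 below 1000.</p>
</div>
</body></html>"""

tree = html.fromstring(SAMPLE)

# text() directly under the div only matches the whitespace between its <p> children.
direct = tree.xpath('//div[@role="problem"]/text()')

# Stepping into each <p> reaches the actual sentences.
paragraphs = tree.xpath('//div[@role="problem"]/p/text()')

# Join the paragraphs into one description string.
description = "\n".join(p.strip() for p in paragraphs)

print(direct)
print(description)
```

If a paragraph contains nested markup (for example a link or emphasis), p/text() would skip the text inside those child elements. Calling text_content() on each <p> element is one way to collect everything, at the cost of losing the tag boundaries.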