Python lxml web scraping
    from lxml import html
    import requests

    page = requests.get('https://projecteuler.net/problem=1')
    tree = html.fromstring(page.content)
    text = tree.xpath('//div[@class="problem_content"]/text()')
    print(text)
I have this code, and I want to get the text that describes the problem, which in this case is:
"If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000."
However, what I get back is:
    [' ', ' ', ' ']
It turns out the text itself is contained in <p> tags inside the div, so the XPath line should look something like:
    text = tree.xpath('//div[@role="problem"]/p/text()')
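To see why the first query returns only whitespace, here is a small offline sketch. The SAMPLE markup is a hypothetical, simplified stand-in for the page (the real HTML may differ), and the sentences in it are shortened. It only assumes the div has the class and role attributes used in the queries above.

```python
from lxml import html

# Hypothetical, simplified markup for illustration; the real page may differ.
SAMPLE = """
<div class="problem_content" role="problem">
<p>If we list all the natural numbers below 10.</p>
<p>Find the sum of all the multiples of 3 or 5 below 1000.</p>
</div>
""".strip()

tree = html.fromstring(SAMPLE)

# text() selects only text nodes that are direct children of the div.
# Here those are just the line breaks around the <p> elements.
direct_text = tree.xpath('//div[@role="problem"]/text()')

# Stepping into the <p> children reaches the actual sentences.
paragraph_text = tree.xpath('//div[@role="problem"]/p/text()')

# Alternative: text_content() gathers the text of all descendants,
# which also covers paragraphs that contain nested inline tags.
div = tree.xpath('//div[@role="problem"]')[0]
full_text = ' '.join(div.text_content().split())

print(direct_text)
print(paragraph_text)
print(full_text)
```

If the paragraphs on the real page contain nested elements (links, emphasis and so on), /p/text() would skip the text inside them, so text_content() or the XPath //div[@role="problem"]//text() may be the safer choice.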