selenium: Python Loop Overwriting Last HTML Write

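This script loops at "while True:". It is written to scrape data from multiple pages by clicking the "Next" button at the bottom, but I can't figure out how to structure the code so that it keeps writing to the HTML file as it paginates. Instead, it overwrites the previously written HTML results. I'd appreciate your help. Thanks!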

while True:
    time.sleep(10)

    golds = driver.find_elements_by_css_selector(".widgetContainer #widgetContent > div.singleCell")
    print("found %d golds" % len(golds))  

    template ="""\
        <tr class="border">
            <td class="image"><img src="{0}"></td>\
            <td class="title">{2}</td>\
            <td class="price">{3}</td>
        </tr>"""


    lines = []

    for gold in golds:
        goldInfo = {}

        goldInfo['title'] = gold.find_element_by_css_selector('#dealTitle > span').text
        goldInfo['link'] = gold.find_element_by_css_selector('#dealTitle').get_attribute('href')
        goldInfo['image'] = gold.find_element_by_css_selector('#dealImage img').get_attribute('src')

        try:
            goldInfo['price'] = gold.find_element_by_css_selector('.priceBlock > span').text
        except NoSuchElementException:
            goldInfo['price'] = 'No price display'

        line = template.format(goldInfo['image'], goldInfo['link'], goldInfo['title'], goldInfo['price'])
        lines.append(line)

    try:
        #clicks next button
        driver.find_element_by_link_text("Next→").click()
    except NoSuchElementException:
        break

    time.sleep(10)

    html ="""\
        <html>
            <body>
                <table>
                    <tr class='headers'>
                        <td class='image'></td>
                        <td class='title'>Product</td>
                        <td class='price'>Price / Deal</td>
                    </tr>
                </table>
                <table class='data'>
                    {0}
                </table>
            </body>
        </html>\
   """


    f = open('./result.html', 'w')
    f.write(html.format('
'
.join(lines)))
f.close()


Where you open the file at the end of the script, take a look at the different modes: https://docs.python.org/2/library/functions.html#open

The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending

And then there is more:

Modes 'r+', 'w+' and 'a+' open the file for updating (reading and writing); note that 'w+' truncates the file. Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.

So you have a few options. You probably want to use 'a', since you want to append data to the file.
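To see the difference between the two modes concretely, here is a minimal sketch. It reopens a file once per page, as the loop in the question does, first with 'w' and then with 'a'. The `write_pages` helper and the page strings are made up for illustration, and a temporary directory stands in for `./result.html`.

```python
import os
import tempfile


def write_pages(path, pages, mode):
    """Reopen `path` once per page with the given mode, then return its contents."""
    for page in pages:
        with open(path, mode) as f:
            f.write(page + "\n")
    with open(path) as f:
        return f.read()


# Hypothetical page contents standing in for the scraped rows.
pages = ["page 1 rows", "page 2 rows", "page 3 rows"]

with tempfile.TemporaryDirectory() as tmp:
    overwritten = write_pages(os.path.join(tmp, "w.html"), pages, "w")
    appended = write_pages(os.path.join(tmp, "a.html"), pages, "a")

print(repr(overwritten))  # only the last page is left
print(repr(appended))     # all three pages are kept
```

With 'w' each reopen truncates the file, so only the last page survives. With 'a' every page is kept, but so is anything a previous run of the script left in the file.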

Or you can move the file open outside of the loop, so that you are not constantly reopening the file.

f = open('./result.html', 'w')
while True:
  # do stuff
  f.write(...)
f.close()
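As a rough sketch of this second option, assuming the rows for each page have already been collected: `write_report`, `HEADER`, `FOOTER`, and the sample rows below are hypothetical names, not part of the original script. The page wrapper is written once around the loop, since writing the full `html` template on every iteration would repeat the `<html>` and `<table>` tags for each page.

```python
import os
import tempfile

# Page wrapper written once, instead of once per loop iteration.
HEADER = "<html>\n<body>\n<table class='data'>\n"
FOOTER = "</table>\n</body>\n</html>\n"


def write_report(path, pages):
    """Open the file once and write the header, each page's rows, then the footer."""
    # 'w' truncates only once, before the loop; `with` closes the file on exit.
    with open(path, "w") as f:
        f.write(HEADER)
        for rows in pages:  # stands in for the `while True` / "Next" loop
            f.write("\n".join(rows) + "\n")
        f.write(FOOTER)


# Hypothetical rows for two pages; real rows would come from template.format(...).
pages = [
    ["<tr><td>gold A</td></tr>", "<tr><td>gold B</td></tr>"],
    ["<tr><td>gold C</td></tr>"],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.html")
    write_report(path, pages)
    with open(path) as f:
        result = f.read()

print(result)
```

In the real script, the body of the `for` loop would be the scraping code that builds `lines` for the current page and then clicks "Next".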


You should open the file in append mode:

f = open('./result.html', 'a')