Python Loop Overwriting Last HTML Write
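This script loops at `while True:`. It was written to scrape data from multiple pages by clicking the "Next" button at the bottom, but I can't figure out how to structure the code so that it keeps writing to the HTML file as it pages through the results. Instead, it overwrites the HTML that was written previously. Any help is appreciated. Thanks!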
```python
while True:
    time.sleep(10)
    golds = driver.find_elements_by_css_selector(".widgetContainer #widgetContent > div.singleCell")
    print("found %d golds" % len(golds))

    template = """\
        <tr class="border">
            <td class="image"><img src="{0}"></td>\
            <td class="title">{2}</td>\
            <td class="price">{3}</td>
        </tr>"""

    lines = []

    for gold in golds:
        goldInfo = {}

        goldInfo['title'] = gold.find_element_by_css_selector('#dealTitle > span').text
        goldInfo['link'] = gold.find_element_by_css_selector('#dealTitle').get_attribute('href')
        goldInfo['image'] = gold.find_element_by_css_selector('#dealImage img').get_attribute('src')

        try:
            goldInfo['price'] = gold.find_element_by_css_selector('.priceBlock > span').text
        except NoSuchElementException:
            goldInfo['price'] = 'No price display'

        line = template.format(goldInfo['image'], goldInfo['link'], goldInfo['title'], goldInfo['price'])
        lines.append(line)

    try:
        #clicks next button
        driver.find_element_by_link_text("Next→").click()
    except NoSuchElementException:
        break
    time.sleep(10)

    html = """\
    <html>
        <body>
            <table>
                <tr class='headers'>
                    <td class='image'></td>
                    <td class='title'>Product</td>
                    <td class='price'>Price / Deal</td>
                </tr>
            </table>
            <table class='data'>
                {0}
            </table>
        </body>
    </html>\
    """

    f = open('./result.html', 'w')
    f.write(html.format(' '.join(lines)))
    f.close()
```
When you open the file at the end of the script, take a look at the different modes: https://docs.python.org/2/library/functions.html#open
The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending
And then there's more:
Modes 'r+', 'w+' and 'a+' open the file for updating (reading and writing); note that 'w+' truncates the file. Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
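To make the difference between the two quoted modes concrete, here is a minimal sketch. It writes to a scratch file in a temporary directory (an assumption made so the demo does not touch a real `result.html`), and the page strings are placeholders rather than scraped data.

```python
import os
import tempfile

# Hypothetical scratch file so the demo does not touch a real result.html.
path = os.path.join(tempfile.mkdtemp(), "demo.html")

# 'w' truncates: each open discards whatever was written before.
for page in ("page 1", "page 2"):
    with open(path, "w") as f:
        f.write(page + "\n")
with open(path) as f:
    after_w = f.read()

# 'a' appends: each open adds to the end of the existing content.
for page in ("page 3", "page 4"):
    with open(path, "a") as f:
        f.write(page + "\n")
with open(path) as f:
    after_a = f.read()

print(repr(after_w))  # only the last 'w' write survives
print(repr(after_a))  # the 'a' writes are added after it
```

Reopening with 'w' inside a loop is what makes each page replace the previous one in the question's script.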
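So you have a few options. You could open the file in append mode ('a'), or you could move the opening of the file outside the loop so that it is not needlessly reopened on every iteration: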
```python
f = open('./result.html', 'w')
while True:
    # do stuff
    f.write(...)
f.close()
```
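A slightly fuller sketch of the open-once approach follows. The Selenium scraping is replaced by a hypothetical `pages` list so the example runs on its own; `write_report`, `HEADER`, `FOOTER`, and `ROW` are illustrative names, not part of the original script, and the output goes to a temporary directory rather than `./result.html`.

```python
import os
import tempfile

HEADER = "<html>\n<body>\n<table class='data'>\n"
FOOTER = "</table>\n</body>\n</html>\n"
ROW = '<tr class="border"><td class="title">{0}</td><td class="price">{1}</td></tr>\n'


def write_report(path, pages):
    """Open the file once, then write every page's rows into it."""
    with open(path, "w") as f:      # truncated once, before the loop
        f.write(HEADER)
        for page in pages:          # stands in for the `while True:` loop
            for title, price in page:
                f.write(ROW.format(title, price))
        f.write(FOOTER)             # written once, after the last page
    # leaving the `with` block closes the file


# Hypothetical data in place of the scraped elements.
pages = [[("Ring", "$10")], [("Coin", "$20"), ("Bar", "$30")]]
report_path = os.path.join(tempfile.mkdtemp(), "result.html")
write_report(report_path, pages)

with open(report_path) as f:
    report = f.read()
print(report)
```

Writing the table header and footer outside the page loop also keeps the output a single HTML document, which would not be the case if the full template were written once per page.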
You should open the file in append mode:
```python
f = open('./result.html', 'a')
```
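One caveat with append mode: if the whole `html` template is appended on every iteration, the file ends up with one `<html>` wrapper per page, and rows from earlier runs of the script stay in the file. A possible way around both, sketched below under assumptions, is to write the opening markup once with 'w', append only the rows for each page, and append the closing markup at the end. The `scraped_pages` list and the temporary path are hypothetical stand-ins for the scraped `lines` and `./result.html`.

```python
import os
import tempfile

# Hypothetical scratch path; the question uses './result.html'.
path = os.path.join(tempfile.mkdtemp(), "result.html")

# Start each run with a fresh file that holds only the opening markup.
with open(path, "w") as f:
    f.write("<html><body><table class='data'>\n")

# Hypothetical per-page rows standing in for the scraped `lines` list.
scraped_pages = [["<tr><td>Ring</td></tr>"], ["<tr><td>Coin</td></tr>"]]

for lines in scraped_pages:
    # 'a' keeps what earlier iterations wrote and adds to the end.
    with open(path, "a") as f:
        f.write("\n".join(lines) + "\n")

# Close the markup once, after the last page.
with open(path, "a") as f:
    f.write("</table></body></html>\n")

with open(path) as f:
    result = f.read()
print(result)
```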