Use an already open webpage (with Selenium) with BeautifulSoup?
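I have a web page open and have logged in using WebDriver code. I'm using WebDriver for this because the page requires a login and various other actions before it is ready to scrape.

The goal is to get data from this open page. I need to find links and open them, so there will be a lot of back and forth between Selenium WebDriver and BeautifulSoup.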
I looked at the BS4 documentation and tried opening the page by its URL, which gives:
OSError: [Errno 22] Invalid argument: 'https://m/search.mp?ss=Pr+Dn+Ts'
I assume this is because it is not a...
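You are trying to open a web page by its URL as if it were a local file. To download the page over HTTP, use `urlopen`: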
```python
import bs4
from urllib.request import urlopen  # Python 3
# from urllib2 import urlopen  # Python 2

url = "your target url here"
soup = bs4.BeautifulSoup(urlopen(url), "html.parser")
```
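To make the cause of the error concrete: `open()` only understands file-system paths, so a URL string is handed to the OS as a file name and rejected (on Windows this typically shows up as `OSError: [Errno 22] Invalid argument`). A minimal sketch of a hypothetical helper, `load_html`, that accepts either a local path or an http(s) URL is below. Note that it fetches a fresh copy of the page and does not share the Selenium browser's login session.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def load_html(source):
    """Hypothetical helper: return HTML from an http(s) URL or a local file path.

    Only http/https URLs go through urlopen(); anything else is treated as a path.
    (A Windows path such as C:\\page.html parses with scheme "c", so it still
    falls through to open().)
    """
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source) as response:
            # Use the charset from the Content-Type header, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    with open(source, encoding="utf-8") as f:
        return f.read()
```

The returned string can then be passed to `bs4.BeautifulSoup(load_html(source), "html.parser")`.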
Alternatively, use `requests` ("HTTP for Humans"):
```python
import bs4
import requests

url = "your target url here"
response = requests.get(url)
soup = bs4.BeautifulSoup(response.content, "html.parser")
```
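A plain `requests.get(url)` does not carry the login performed in the Selenium browser, which the question says the page needs. One commonly used workaround, sketched here under the assumption that the site keeps its login state in cookies only, is to copy the driver's cookies into a `requests.Session`. `session_from_driver` is a hypothetical name; `driver.get_cookies()` is Selenium's method and returns cookies visible to the page the driver is currently on.

```python
import requests


def session_from_driver(driver):
    """Hypothetical helper: build a requests.Session carrying a Selenium driver's cookies.

    Assumes login state lives in cookies; get_cookies() returns a list of dicts
    with at least "name" and "value" for the driver's current domain.
    """
    session = requests.Session()
    for cookie in driver.get_cookies():
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain") or "",  # requests expects a string, not None
            path=cookie.get("path") or "/",
        )
    return session
```

For example: `response = session_from_driver(driver).get(url)`, then `bs4.BeautifulSoup(response.content, "html.parser")`. Sites that also check headers or JavaScript-generated tokens may still reject this session.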
Also note that it is strongly recommended to specify a parser explicitly; I've used `html.parser` here.
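Why this matters: if no parser is named, BeautifulSoup uses the best one installed (lxml, then html5lib, then Python's built-in `html.parser`) and emits a warning, and those parsers can build different trees from the same malformed markup. A minimal sketch that pins the parser in one place; `PARSER` and `make_soup` are hypothetical names:

```python
from bs4 import BeautifulSoup

# Python's built-in parser needs no extra install, so every machine running
# this code parses pages the same way.
PARSER = "html.parser"


def make_soup(markup):
    """Parse an HTML string or bytes with the explicitly chosen parser."""
    return BeautifulSoup(markup, PARSER)


html = (
    "<html><head><title>Search results</title></head>"
    "<body><a href='/item/1'>First</a> <a href='/item/2'>Second</a></body></html>"
)
soup = make_soup(html)
print(soup.title.string)                        # Search results
print([a["href"] for a in soup.find_all("a")])  # ['/item/1', '/item/2']
```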
I want to use the exact same page (the same browser instance).
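A common approach is to get the page source from the driver with `driver.page_source` and pass it to BeautifulSoup: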
```python
from bs4 import BeautifulSoup
from selenium import webdriver

url = "your target url here"

driver = webdriver.Firefox()
driver.get(url)

# wait for the page to load (and log in, click through, etc.) before reading the source
source = driver.page_source
driver.quit()  # remove this line to leave the browser open

soup = BeautifulSoup(source, "html.parser")
```
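To keep working in the same browser instance, leave out `driver.quit()` and re-read `driver.page_source` after every navigation. The sketch below assumes the driver is already logged in and showing the page whose links you want; `links_on_current_page`, `scrape_linked_pages` and the `parse_page` callback are hypothetical names. It collects every href before navigating, because once `driver.get` runs, `page_source` reflects the new page. For pages that build content with JavaScript you would normally add an explicit wait (Selenium's `WebDriverWait` with `expected_conditions`) before reading `page_source`; that is left out here.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def links_on_current_page(driver, parser="html.parser"):
    """Parse the page the (still open) driver is showing; return absolute link URLs."""
    soup = BeautifulSoup(driver.page_source, parser)
    base = driver.current_url  # resolve relative hrefs against the current page
    return [urljoin(base, a["href"]) for a in soup.find_all("a", href=True)]


def scrape_linked_pages(driver, parse_page, limit=None):
    """Visit each link on the current page with the SAME driver instance.

    parse_page is a hypothetical callback: it receives a BeautifulSoup object
    for each visited page and returns whatever data you want to keep.
    """
    links = links_on_current_page(driver)  # collect before navigating away
    if limit is not None:
        links = links[:limit]
    results = {}
    for link in links:
        driver.get(link)  # same browser, so the login session is reused
        results[link] = parse_page(BeautifulSoup(driver.page_source, "html.parser"))
    return results
```

For example, after logging in and navigating to the search page: `titles = scrape_linked_pages(driver, lambda soup: soup.title.string if soup.title else None)`.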