Python - which is considered better for scraping: Selenium or BeautifulSoup with Selenium?

This question concerns Python 3.6.3, BS4 and Selenium 3.8 on Windows 10.

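I'm trying to scrape pages with dynamic content. What I want to collect is numbers and text (e.g. http://www.oddsportal.com). From what I understand, Requests + BeautifulSoup won't do the job, because the dynamic content is hidden: it is loaded by JavaScript, so it is not in the raw HTML. So I have to use another tool, such as Selenium WebDriver.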

So, given that I'll be using Selenium WebDriver anyway, would you recommend ignoring BeautifulSoup and sticking with Selenium WebDriver functions, such as

elem = driver.find_element_by_name("q")

Or is using Selenium + BeautifulSoup considered better practice?

Which of these two routes do you think offers the more convenient functionality?

Thanks.

Related discussion

BeautifulSoup

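BeautifulSoup is a powerful library for parsing HTML and XML. It does not download pages itself; it is usually paired with an HTTP client such as Python's urllib.request (or Requests), which works well for retrieving static pages.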
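A minimal sketch of that static-page workflow, assuming `bs4` is installed: `urllib.request` downloads the HTML and BeautifulSoup parses it. The helper names `fetch_html` and `extract_links` and the sample markup are hypothetical illustrations, not part of the original answer.

```python
# Minimal sketch (assumes bs4 is installed); helper names and sample markup are hypothetical.
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page with urllib.request. Only the static HTML is returned; no JavaScript runs."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Parse HTML with BeautifulSoup and return (text, href) pairs for every <a> that has an href."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser backend, no lxml needed
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    # Offline demo on an inline snippet; pass fetch_html(url) instead to parse a real static page.
    sample = '<p><a href="/a">First</a> <a href="/b">Second</a> <a>no href</a></p>'
    print(extract_links(sample))  # [('First', '/a'), ('Second', '/b')]
```

On a JavaScript-heavy page, `fetch_html` returns only the initial HTML, which is exactly why the question reaches for Selenium.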

Selenium

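Selenium is one of the most widely used and effective web automation tools. Because it drives a real browser, it can interact with dynamic pages, content and elements, including content rendered by JavaScript.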
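A minimal Selenium sketch of that kind of interaction, assuming Chrome and a matching chromedriver are installed. The function name `get_rendered_text` is hypothetical. It uses `find_element`-style locators (`By.CSS_SELECTOR`) with an explicit wait, which work in both Selenium 3.x and 4.x. The Selenium imports sit inside the function so the module can be imported without a browser present.

```python
# Minimal sketch; assumes Chrome + chromedriver. The function name is hypothetical.
def get_rendered_text(url, css_selector, timeout=10):
    """Open url in Chrome, wait until JavaScript has rendered css_selector, return its text."""
    # Imported lazily so this module loads even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes chromedriver is available on PATH
    try:
        driver.get(url)
        # Block until the element is present and visible (or raise TimeoutException).
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        driver.quit()  # always close the browser, even on errors
```

Usage, with a placeholder URL and selector: `get_rendered_text("https://example.com", "h1")`. Waiting for visibility rather than mere presence matters here because Selenium's `.text` returns an empty string for hidden elements.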

Conclusion

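To build a robust and efficient framework for scraping pages with dynamic content, you should integrate Selenium and BeautifulSoup: use Selenium to navigate and interact with the dynamic elements, and BeautifulSoup to extract the content efficiently.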
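A minimal sketch of that division of labour, assuming Chrome, chromedriver and `bs4` are installed. Selenium loads the page and waits for the dynamic content, then hands `driver.page_source` to BeautifulSoup for parsing. The helper names `parse_table` and `scrape_dynamic_table` and the default `"table"` selector are hypothetical. Adjust the selector to the real markup of the target site.

```python
# Minimal sketch; helper names and the default selector are hypothetical.
from bs4 import BeautifulSoup


def parse_table(html, table_css="table"):
    """Return the text of every cell, row by row, from the first table matching table_css."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one(table_css)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


def scrape_dynamic_table(url, table_css="table", timeout=10):
    """Let Selenium render the page, then let BeautifulSoup do the parsing."""
    # Selenium is imported lazily so parse_table can be used (and tested) without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes chromedriver is available on PATH
    try:
        driver.get(url)
        # Wait until JavaScript has inserted the table into the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, table_css))
        )
        # One round trip: grab the rendered HTML and parse it locally.
        return parse_table(driver.page_source, table_css)
    finally:
        driver.quit()
```

The design choice is to fetch `page_source` once and parse it locally. This avoids a separate WebDriver command for every element read, which tends to make a noticeable difference on large tables.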

An example

Here is an example of scraping with Selenium and BeautifulSoup.

Selenium has many selectors:

find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

# and

find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector

So most of the time you don't need BeautifulSoup.

The xpath and css_selector variants are especially useful.
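For comparison, here is a Selenium-only sketch that uses CSS selectors and XPath directly, with no BeautifulSoup. The `find_element_by_*` / `find_elements_by_*` helpers listed above exist in Selenium 3.x, the version in the question, but were removed in Selenium 4.3. The `find_elements(By.CSS_SELECTOR, ...)` and `find_elements(By.XPATH, ...)` forms work in both. `driver` is assumed to be an already-open WebDriver, and the function name `read_rows` and its default selectors are hypothetical.

```python
# Minimal sketch; read_rows and its default selectors are hypothetical.
def read_rows(driver, row_css="table tr", cell_xpath="./td"):
    """Collect cell text per row using only Selenium locators on an already-open driver."""
    from selenium.webdriver.common.by import By  # lazy import: module loads without Selenium

    rows = []
    # CSS selector picks the rows; same as find_elements_by_css_selector in Selenium 3.
    for row in driver.find_elements(By.CSS_SELECTOR, row_css):
        # Relative XPath ("./td") searches only inside the current row element.
        cells = [cell.text for cell in row.find_elements(By.XPATH, cell_xpath)]
        if cells:
            rows.append(cells)
    return rows
```

This keeps everything in one library. Each `.text` access is a separate WebDriver command, so for a handful of elements it is perfectly adequate, while very large tables may be faster to parse from `driver.page_source`.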