PyQt4 to PyQt5 -> mainFrame() deprecated, need fix to load web pages

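I'm working through Sentdex's PyQt4 YouTube tutorial, but following along with PyQt5 instead. It's a simple web scraping app. I followed Sentdex's tutorial up to this point:

[image]

Now I'm trying to write the same application with PyQt5, and this is what I have: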

import os
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl, QEventLoop
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from bs4 import BeautifulSoup
import requests


class Client(QWebEnginePage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self):
        self.app.quit()


url = 'https://pythonprogramming.net/parsememcparseface/'
client_response = Client(url)

#I think the issue is here at LINE 26
source = client_response.mainFrame().toHtml()

soup = BeautifulSoup(source,"html.parser")
js_test = soup.find('p', class_='jstest')
print(js_test.text)

When I run this, I get the message:

source = client_response.mainFrame().toHtml()
AttributeError: 'Client' object has no attribute 'mainFrame'

I've tried several different solutions, but none of them worked. Any help would be greatly appreciated.

EDIT

Logging QUrl(url) at line 15 returns this value:

PyQt5.QtCore.QUrl('https://pythonprogramming.net/parsememcparseface/')

When I try source = client_response.load(QUrl(url)) at line 26 instead, I get the message:

  File "test3.py", line 28, in <module>
    soup = BeautifulSoup(source,"html.parser")
  File "/Users/MYNAME/.venv/qtproject/lib/python3.6/site-packages/bs4/__init__.py", line 192, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'NoneType' has no len()

When I try source = client_response.url(), I get:

    soup = BeautifulSoup(source,"html.parser")
  File "/Users/MYNAME/.venv/qtproject/lib/python3.6/site-packages/bs4/__init__.py", line 192, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'QUrl' has no len()


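You have to call QWebEnginePage::toHtml() inside the class definition. QWebEnginePage::toHtml() is asynchronous: it does not return the HTML, but takes a function or lambda as its argument, and that callable must in turn accept one argument of type str (this is the argument that contains the page's HTML). Sample code is below.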

import bs4 as bs
import sys
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl


class Page(QWebEnginePage):
    def __init__(self, url):
        # A QApplication must exist before the page is created.
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = ''
        self.loadFinished.connect(self._on_load_finished)
        self.load(QUrl(url))
        # Blocks here until Callable() calls self.app.quit().
        self.app.exec_()

    def _on_load_finished(self):
        # toHtml() returns nothing; the HTML is passed to the callback later.
        self.toHtml(self.Callable)
        print('Load finished')

    def Callable(self, html_str):
        # Store the page HTML, then stop the event loop.
        self.html = html_str
        self.app.quit()


def main():
    page = Page('https://pythonprogramming.net/parsememcparseface/')
    soup = bs.BeautifulSoup(page.html, 'html.parser')
    js_test = soup.find('p', class_='jstest')
    print(js_test.text)


if __name__ == '__main__':
    main()
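
If the callback style is hard to picture, here is a rough toy sketch of the idea that runs without Qt. FakeEventLoop and FakePage are hypothetical stand-ins made up for illustration; they are not part of PyQt5 and only imitate the one behaviour that matters here: the call returns nothing right away, and the HTML only shows up once the event loop runs the callback.

```python
from collections import deque


class FakeEventLoop:
    """Hypothetical stand-in for an event loop: a queue of pending calls."""

    def __init__(self):
        self._pending = deque()

    def post(self, func, *args):
        # Remember the call; it runs later, inside run().
        self._pending.append((func, args))

    def run(self):
        # Execute queued calls in order until none are left.
        while self._pending:
            func, args = self._pending.popleft()
            func(*args)


class FakePage:
    """Hypothetical stand-in for a page object (not the real Qt class)."""

    def __init__(self, loop, html):
        self._loop = loop
        self._html = html

    def to_html(self, callback):
        # Schedule delivery of the HTML and return nothing, which is
        # why assigning the return value to a variable gives None.
        self._loop.post(callback, self._html)


received = []
loop = FakeEventLoop()
page = FakePage(loop, "<p class='jstest'>hello</p>")

result = page.to_html(received.append)
print(result)    # None: nothing is returned directly
print(received)  # []: the callback has not run yet

loop.run()
print(received)  # ["<p class='jstest'>hello</p>"]
```

This is why the sample above quits the application from inside the callback: that is the first point at which the HTML is actually available.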


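Never too late... I ran into the same problem and found it described here: http://pyqt.sourceforge.net/docs/pyqt5/gotchas.html (Crashes On Exit)

I followed the advice to put the QApplication in a global variable (I know it's dirty... I'll be punished for it), and it works "fine". I can loop without any crashes.

Hope this helps.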