BeautifulSoup (Python 3): extracting the game name (and other elements) from Howlongtobeat.com
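I would like to know how to extract a game's name with Beautiful Soup.
I think I am getting something wrong on the HTML side.
Here is what I have so far: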
from requests import get
url = 'https://howlongtobeat.com/game.php?id=38050'
response = get(url)
from bs4 import BeautifulSoup
html_soup = BeautifulSoup(response.text, 'html.parser')
game_length = html_soup.find_all('div', class_='game_times')
length = (game_length[-1].find_all({'li': ' short time_100 shadow_box'})[-1].contents[3].get_text())
print(length)
game_name = html_soup.find_all('div', class_='profile_header_game')
game = (game_name[].find({"profile_header shadow_text"})[].contents[].get_text())
print(game)
I can get the length, but why not the game name?
print(length) prints:
31 Hours
But print(game) gives:
game_name = html_soup.find_all('div', class_='profile_header_game')
game = (game_name[].find({"profile_header shadow_text"})[].contents[].get_text())
  File "", line 1
    game = (game_name[].find({"profile_header shadow_text"})[].contents[].get_text())
                      ^
SyntaxError: invalid syntax
print(game)
Traceback (most recent call last):
  File "", line 1, in <module>
NameError: name 'game' is not defined
What am I doing wrong?
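The SyntaxError shown above points at the empty square brackets: `game_name[]`, `[]` after `find(...)`, and `contents[]` all have nothing between the brackets, so Python rejects the line before it runs, and `game` is never assigned (hence the later NameError). A minimal sketch using only the standard library illustrates this; the helper name `parses` is hypothetical and exists only for this illustration.

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# An empty subscript is rejected by the parser before anything runs.
print(parses("game = game_name[]"))    # False
# Supplying an index makes the same expression syntactically valid.
print(parses("game = game_name[0]"))   # True
print(parses("game = game_name[-1]"))  # True
```

Whether index `0` or `-1` is the right one depends on which matching element is wanted; the sketch only shows that some index has to be there.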
There seem to be a few syntax problems in the code. Here is a corrected version:
from bs4 import BeautifulSoup
import requests

url = 'https://howlongtobeat.com/game.php?id=38050'
response = requests.get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

# Collect a (label, time) pair from every <li> inside div.game_times.
game_times_tag = html_soup.find('div', class_='game_times')
game_time_list = []
for li_tag in game_times_tag.find_all('li'):
    title = li_tag.find('h5').text.strip()
    play_time = li_tag.find('div').text.strip()
    game_time_list.append((title, play_time))

for game_time in game_time_list:
    print(game_time)

# The game name is the text of the div carrying both header classes.
profile_header_tag = html_soup.find("div", {"class": "profile_header shadow_text"})
game_name = profile_header_tag.text.strip()
print(game_name)

Shorter version:
game_length = html_soup.select('div.game_times li div')[-1].text
game_name = html_soup.select('div.profile_header')[0].text
developer = html_soup.find_all('strong', string='\nDeveloper:\n')[0].next_sibling
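For trying the selectors without hitting the site, here is a rough offline sketch. The `SAMPLE_HTML` markup, the `profile_info` class, the sample names and the helper `text_or_none` are all assumptions made up for illustration; they only mimic the structure the code above relies on, and the live page may look different. It uses `select_one` and a regex match so that a missing element or extra whitespace gives `None` instead of an IndexError.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure used in this thread;
# the real page may differ or change over time.
SAMPLE_HTML = """
<div class="profile_header shadow_text">
Example Game
</div>
<div class="game_times">
  <ul>
    <li class="short time_100 shadow_box"><h5>Main Story</h5><div>12 Hours</div></li>
    <li class="short time_100 shadow_box"><h5>Completionist</h5><div>31 Hours</div></li>
  </ul>
</div>
<div class="profile_info"><strong>
Developer:
</strong>
Example Studio
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


# select_one returns the first match or None, unlike select(...)[0],
# which raises IndexError when nothing matches.
game_name = text_or_none(soup.select_one("div.profile_header"))

# One (label, time) pair per <li> under div.game_times.
game_times = [
    (text_or_none(li.find("h5")), text_or_none(li.find("div")))
    for li in soup.select("div.game_times li")
]

# Match the label with a regex so the surrounding newlines do not matter.
label = soup.find("strong", string=re.compile(r"Developer:"))
developer = label.next_sibling.strip() if label is not None else None

print(game_name)
print(game_times)
print(developer)
```

On the real page, `SAMPLE_HTML` would be replaced by `response.text` from the request above; the `None` checks are there because the site's markup is outside your control.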