How to get the URL and the title from the <a> tags with BeautifulSoup
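I'm writing a script that gets all the links from the div elements with class="pntc-txt", and from each <a> tag I want to get the URL and the title.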
    import urllib.request
    from bs4 import *

    sock = urllib.request.urlopen("http://as.com/tag/moto_gp/a/")
    htmlSource = sock.read()
    sock.close()
    soup = BeautifulSoup(htmlSource)

    for div in soup.findAll('div', {'class': 'pntc-txt'}):
        a = div.findAll('a')
        print(a)
Try this:
    import requests
    from bs4 import BeautifulSoup

    # Fetch the page with requests instead of urllib
    srcCode = requests.get("http://as.com/tag/moto_gp/a/")
    plainText = srcCode.text
    soup = BeautifulSoup(plainText, "html.parser")

    for div in soup.find_all('div', {'class': 'pntc-txt'}):
        for each in div.find_all('a'):  # every <a> tag inside the div
            href = each.get('href')
            print(href)         # the link URL
            print(each.string)  # the text inside the tag
            print(each)         # the whole tag
Note: I also removed the urllib part that read the HTML page and used the requests package instead.
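If you want to check the extraction logic without hitting the network, here is a minimal sketch that runs the same kind of loop over an inline HTML string. The markup in SAMPLE_HTML and the helper name extract_links are made up for illustration; the real page may be structured differently. It is also an assumption that "title" could mean either the title attribute or the link text, so the sketch collects both.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="pntc-txt">
  <a href="/moto/one.html" title="First story">First headline</a>
  <a href="/moto/two.html">Second headline</a>
</div>
<div class="other"><a href="/ignored.html">Ignored</a></div>
"""


def extract_links(html):
    """Collect href, title attribute and link text for every <a> that has
    an href and sits inside a div with class "pntc-txt".
    (extract_links is a hypothetical helper, not part of BeautifulSoup.)"""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", class_="pntc-txt"):
        for a in div.find_all("a", href=True):  # skip anchors without href
            links.append({
                "href": a["href"],
                "title": a.get("title"),         # None if the attribute is missing
                "text": a.get_text(strip=True),  # visible link text
            })
    return links


for link in extract_links(SAMPLE_HTML):
    print(link["href"], link["title"], link["text"])
```

Using get("title") rather than a["title"] avoids a KeyError on links that have no title attribute, and get_text(strip=True) still returns the text when the link contains nested tags, where .string would give None.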