Python: how to get the URL and title from <a> tags

How to get the URL and the title from the <a> tags with BeautifulSoup

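I'm writing a script that gets all the links from the divs with class="pntc-txt". After getting the links, I want to take from each <a> tag the href attribute and the text between the tags (as in <a>Something</a>), so that I can later insert the URL and the text into a database. Here is the code I have so far: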

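import urllib.request
from bs4 import BeautifulSoup

# Download the raw HTML of the page
sock = urllib.request.urlopen("http://as.com/tag/moto_gp/a/")
htmlSource = sock.read()
sock.close()

# Name the parser explicitly so bs4 does not have to pick one
soup = BeautifulSoup(htmlSource, "html.parser")

# Prints the list of <a> tags per div, not yet the href or the text
for div in soup.find_all('div', {'class': 'pntc-txt'}):
    a = div.find_all('a')
    print(a)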


Try this:

import requests
from bs4 import BeautifulSoup

# Download the page; requests decodes the response body to text
srcCode = requests.get("http://as.com/tag/moto_gp/a/")
plainText = srcCode.text

# Name the parser explicitly so bs4 does not have to pick one
soup = BeautifulSoup(plainText, "html.parser")

for div in soup.find_all('div', {'class': 'pntc-txt'}):
    for each in div.find_all('a'):   # every <a> tag inside the div
        href = each.get('href')
        print(href)          # the URL from the href attribute
        print(each.string)   # the text inside the tag
        print(each)          # the whole tag

Note: I also removed the urllib part that read the HTML page and used the requests package instead.
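
Since the question also mentions putting the URL and text into a database, here is a rough sketch of how the whole thing could fit together. It is not part of the original answer and rests on a few assumptions: the sample HTML, the `extract_links` and `save_links` helpers, and the `links` table are all made up for illustration, and SQLite is used only because it ships with Python. It uses `get_text()` rather than `.string`, because `.string` can return None when the link contains more than one child node, and `urljoin` to resolve relative hrefs against the page URL.

```python
import sqlite3
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<div class="pntc-txt"><a href="/moto/first.html">First <b>headline</b></a></div>
<div class="pntc-txt"><a href="http://example.com/second.html">Second headline</a></div>
<div class="other"><a href="/ignored.html">Ignored</a></div>
"""


def extract_links(html, base_url):
    """Return (url, title) pairs for every <a href> inside div.pntc-txt."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", class_="pntc-txt"):
        for a in div.find_all("a", href=True):   # skip anchors without href
            url = urljoin(base_url, a["href"])   # resolve relative links
            title = a.get_text(" ", strip=True)  # also handles nested tags
            links.append((url, title))
    return links


def save_links(conn, links):
    """Store the pairs in a hypothetical `links` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY, title TEXT)"
    )
    # Parameterized query: never build SQL by concatenating scraped text.
    conn.executemany(
        "INSERT OR REPLACE INTO links (url, title) VALUES (?, ?)", links
    )
    conn.commit()


links = extract_links(SAMPLE_HTML, "http://as.com/tag/moto_gp/a/")
conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
save_links(conn, links)
rows = conn.execute("SELECT url, title FROM links ORDER BY url").fetchall()
print(rows)
```

To run this against the real page, you would pass `requests.get(url).text` as the `html` argument and swap `":memory:"` for a file path or for your own database driver.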