Passing Selenium data results into Pandas
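I am trying to automate a search that returns a table of information. I can print the results using .text, but my question is how to get those results into a pandas DataFrame. I ask for two reasons: I want to write the results to a CSV file, and I need them in pandas for later data analysis. Any help would be appreciated. My code is below: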
```python
import time
from selenium import webdriver
import pandas as pd

search = ['0501020210597400','0501020210597500','0501020210597600']
df = pd.DataFrame(search)

chrome_path = '[Chrome Path]'  # placeholder: path to the chromedriver executable
driver = webdriver.Chrome(chrome_path)
driver.get('https://enquiry.mpsj.gov.my/v2/service/cuk_search/')

x = 0
while x < (len(df.index)):
    # Type the next account number into the search box and submit the form
    search_box = driver.find_element_by_name('sel_value')
    new_line = (df[0][x]).format(x)
    search_box.send_keys(new_line)
    search_box.submit()
    time.sleep(5)
    # Print the text of every result row
    table = driver.find_elements_by_class_name('tr-body')
    for data in table:
        print(data.text)
    driver.find_element_by_name('sel_value').clear()
    x += 1
driver.close()
```
Instead of Selenium, you can use requests and send a POST to get the information.
```python
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

search = ['0501020210597400','0501020210597500','0501020210597600']

headers = {'Referer' : 'https://enquiry.mpsj.gov.my/v2/service/cuk_search/1',
           'User-Agent' : 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
          }
output = []
dfHeaders = ['No.', 'No. Akaun', 'Nama Di Bil', 'Jumlah Perlu Dibayar', '']

with requests.Session() as s:
    for item in search:
        # Load the search page first to obtain the ACCESS_KEY form value
        r = s.get('https://enquiry.mpsj.gov.my/v2/service/cuk_search/1', headers = headers)
        soup = bs(r.content, 'lxml')
        key = soup.select_one('[name=ACCESS_KEY]')['value']
        # Submit the search form for this account number
        body = {'sel_input': 'no_akaun', 'sel_value': item, 'ACCESS_KEY': key}
        res = s.post('https://enquiry.mpsj.gov.my/v2/service/cuk_search_submit/', data = body)
        soup = bs(res.content, 'lxml')
        table = soup.select_one('.tbl-list')
        rows = table.select('.tr-body')
        for row in rows:
            # Collect the non-empty cell texts of each result row
            cols = row.find_all('td')
            cols = [cell.text.strip() for cell in cols]
            output.append([cell for cell in cols if cell])

df = pd.DataFrame(output, columns = dfHeaders)
print(df)
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8-sig',index = False )
```
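If you would rather stay with Selenium, the same pattern should carry over: collect one list of cell texts per table row, then build the DataFrame once at the end. The following is only a minimal sketch. It assumes each tr-body row is made up of td cells, the helper name rows_to_frame is hypothetical, and the sample rows are made up so the snippet runs without a browser.

```python
import pandas as pd


def rows_to_frame(rows, columns=None):
    """Build a DataFrame from a list of rows, each a list of cell strings.

    Hypothetical helper for illustration; not part of pandas or Selenium.
    """
    # Strip whitespace and drop empty cells before building the frame.
    cleaned = [[cell.strip() for cell in row if cell.strip()] for row in rows]
    return pd.DataFrame(cleaned, columns=columns)


# Inside the Selenium loop, instead of print(data.text), one could collect
# the cells of each row (assuming each result row holds <td> cells):
#     for tr in driver.find_elements_by_class_name('tr-body'):
#         collected.append([td.text for td in tr.find_elements_by_tag_name('td')])

# Made-up rows standing in for scraped cell text.
collected = [
    ['1', ' 0501020210597400 ', 'NAME A', '10.00', ''],
    ['2', '0501020210597500', 'NAME B', '20.00', ''],
]
df = rows_to_frame(
    collected,
    columns=['No.', 'No. Akaun', 'Nama Di Bil', 'Jumlah Perlu Dibayar'],
)
```

Building the frame once from an accumulated list is usually simpler than appending to a DataFrame inside the loop. From there, df.to_csv works the same way as in the requests version.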
To load a CSV file into a DataFrame, you can do the following:
```python
df = pd.read_csv('example.csv')
```
See the online documentation for pandas.read_csv: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
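One thing to watch when reading such a file back: the account numbers in the question start with a zero, and read_csv infers an integer column by default, which drops that leading zero. A small sketch of the difference, using an in-memory buffer in place of a file and made-up CSV content:

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file such as example.csv.
csv_text = "No. Akaun,Jumlah Perlu Dibayar\n0501020210597400,10.50\n"

# Default type inference parses the account number as an integer.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps the value exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'No. Akaun': str})
```

Passing dtype for just the identifier column leaves the remaining columns to be inferred as usual.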
To write data to CSV, see the Stack Overflow question on writing a pandas DataFrame to a CSV file.
The solution given there is:
```python
df.to_csv(file_name, sep='\t')
```
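Note that sep='\t' produces a tab-separated file, and that to_csv also writes the index as a first column unless index=False is passed. The sketch below uses made-up data and returns the text in memory rather than writing to file_name, to show that reading the data back needs the same separator:

```python
import io

import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'No. Akaun': ['0501020210597400'], 'Nama Di Bil': ['NAME A']})

# With no path argument, to_csv returns the text instead of writing a file.
tsv_text = df.to_csv(sep='\t', index=False)

# Read it back with the same separator; dtype=str keeps leading zeros.
round_trip = pd.read_csv(io.StringIO(tsv_text), sep='\t', dtype=str)
```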