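Contents
Python: fetching data from an ajax POST request
Requirements:
1 Analysis of the ajax process:
1.1 First, open the session with the URL of the shop's category page:
1.2 Collect the data the ajax request needs:
2. Page analysis
2.1 When you click "next page", the URL in the address bar does not change and the page does not reload; analysis shows it uses ajax
2.2 Get the request URL and request headers, which are needed later:
2.3 Then identify the data needed in the final result:
3. Start coding in Python:
4 Results: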
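Python: fetching data from an ajax POST request
Tip: dict() cannot turn a string into a dictionary (it raises an error). eval() can evaluate a string that contains a Python dict literal, and many books do not cover it, but it executes whatever code the string contains and fails on the JSON literals true, false and null. For a JSON response, json.loads() or response.json() is the safer choice. For eval() usage, see: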
https://www.runoob.com/python/python-func-eval.html
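To make the tip concrete, here is a minimal sketch comparing the three approaches on a small JSON string. The string is made up for illustration (the `hasNext` key is hypothetical and not taken from the site's real response).

```python
import json

# A made-up JSON string shaped like an ajax response (not real site data).
raw = '{"data": {"productList": [], "hasNext": true}}'

# dict() expects key/value pairs, not JSON text, so it raises ValueError.
try:
    dict(raw)
    dict_error = None
except ValueError as exc:
    dict_error = exc

# eval() treats the text as Python code; the JSON literal `true` is not a
# Python name, so it raises NameError here.
try:
    eval(raw)
    eval_error = None
except NameError as exc:
    eval_error = exc

# json.loads() parses JSON safely and maps true/false/null to True/False/None.
parsed = json.loads(raw)

print(type(dict_error).__name__)
print(type(eval_error).__name__)
print(parsed["data"]["hasNext"])
```

eval() only appears to work when the response happens to contain nothing but strings, numbers, lists and objects, which is why it can look fine on one page and break on another.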
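Requirements:
http://shop.11st.co.kr/stores/522047/category is the shop's category page; the goal is to collect the URL of every product in the shop.
1 Analysis of the ajax process:
1.1 First, open the session with the URL of the shop's category page:
```
session_url = "http://shop.11st.co.kr/stores/522047/category"
```
1.2 Collect the data the ajax request needs:
Request URL: jump_url; form data to send: form_data; method: POST; request headers: headers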
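2. Page analysis
2.1 When you click "next page", the URL in the address bar does not change and the page does not reload; analysis shows it uses ajax
method:StoreSearchListingAjax
Press F12, select XHR in the Network tab, and click "next page". One new XHR request appears: http://shop.11st.co.kr/storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax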
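As a small illustration, the captured XHR address can be split into its endpoint and query parameters with the standard library. This is only a sketch for inspecting the URL; the variable names are chosen here for illustration.

```python
from urllib.parse import urlsplit, parse_qs

xhr_url = ("http://shop.11st.co.kr/storesAjax/StoreListingAjaxAction.tmall"
           "?method=StoreSearchListingAjax")

parts = urlsplit(xhr_url)
# Endpoint without the query string.
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
# parse_qs returns a dict mapping each parameter name to a list of values.
query = parse_qs(parts.query)

print(endpoint)
print(query["method"][0])
```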
2.2 Get the request URL and request headers, which are needed later:
jump_url:
```
http://shop.11st.co.kr/storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax
```
headers:
```
POST /storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax HTTP/1.1
Host: shop.11st.co.kr
Connection: keep-alive
Content-Length: 137
Accept: application/json, text/javascript, */*; q=0.01
Origin: http://shop.11st.co.kr
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: http://shop.11st.co.kr/stores/522047/category
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Cookie: WMONID=QQRH5vHq0a_; PCID=15879036685077771016115; XSRF-TOKEN=cbfcf578-5b14-5ed1-380f-308df58ad410; TD=TB_ACT_DATA%7C0%3AN%3A-1%3A0%3AN%3A-1; _ga=GA1.3.1915814488.1587903671; _gid=GA1.3.1451948013.1587903671; PCID_FRV=true; RAKE_SID=15879036717188511103499; RAKE_SID_XSITE=15879036717188511103499; TP=scrnChk%7CY%23TB_DATA_CHK%7CN%3AY%23GLOBAL_DOMESTIC_ACCESS%7CY; DMP_UID=(DMPC)4d9b4410-f650-4dc5-bb6c-6f7bb58d62dc; AUID=AUID_iET0kpU0K6EvJpF84MPHow; TT=GLOBAL_CHINESE_IP_YN%7CY%2311ST_EN_CURR%7CCNY%23GLOBAL_DELIVERY%7C222%23GLOBAL_CHARSET%7Czh; JSESSIONID=m1O2eCNosbWjDxfe6IhayxSVgTg2P8w1h_urNbRD-KFHFV8Gn5kP!-1198550641
```
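Retyping headers by hand is error-prone, so one option is to paste the raw text from the browser and convert it into a dict. The helper below is a sketch, not part of the original code: `parse_raw_headers` and its `skip` parameter are hypothetical names, and the sample is a shortened copy of the headers above.

```python
def parse_raw_headers(raw_text, skip=("content-length", "host")):
    """Turn 'Name: value' lines copied from the browser into a dict.

    Headers named in `skip` are dropped because the HTTP client normally
    fills them in itself.
    """
    headers = {}
    for line in raw_text.strip().splitlines():
        # Split on the first colon only, so values such as URLs stay intact.
        name, sep, value = line.partition(":")
        if not sep or " " in name.strip():
            continue  # request line or malformed line, not a header
        if name.strip().lower() in skip:
            continue
        headers[name.strip()] = value.strip()
    return headers


raw = """
POST /storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax HTTP/1.1
Host: shop.11st.co.kr
Content-Length: 137
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
Referer: http://shop.11st.co.kr/stores/522047/category
"""

headers = parse_raw_headers(raw)
for name, value in headers.items():
    print(f"{name}: {value}")
```

The resulting dict can be passed as the `headers` argument of a request.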
Form data sent by the ajax request:
```
searchKwd:
storeId: 522047
storeNo: 522047
encSellerNo: 19wqwPhwf0bYTT5rhUwvVA==
sortCd: NP
filter:
pageNo: 2
pageTypeCd: 02
trTypeCd: STP06
```
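To see how these fields travel over the wire, the sketch below encodes them as `application/x-www-form-urlencoded` with the standard library. The encoded body for `pageNo=2` should come out at 137 bytes, matching the `Content-Length: 137` captured in the headers, and it grows once the page number has two digits. That is one reason not to hard-code `Content-Length` in a script.

```python
from urllib.parse import urlencode

form_data = {
    "searchKwd": "",
    "storeId": "522047",
    "storeNo": "522047",
    "encSellerNo": "19wqwPhwf0bYTT5rhUwvVA==",
    "sortCd": "NP",
    "filter": "",
    "pageNo": "2",
    "pageTypeCd": "02",
    "trTypeCd": "STP06",
}

# urlencode percent-encodes the '=' padding in encSellerNo as %3D.
body = urlencode(form_data)
# Same form with a two-digit page number.
body_page_10 = urlencode({**form_data, "pageNo": "10"})

print(body)
print(len(body), len(body_page_10))
```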
2.3 Then identify the data needed in the final result:
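The code in section 3 reads each product link from `data` → `productList` → `prdDtlUrl`. The sketch below walks that structure on a made-up payload: the example.com URLs and the extra `prdNm` field are placeholders, and `extract_product_urls` is a hypothetical helper name.

```python
import json

# Made-up payload with the same nesting that the code in section 3 reads.
sample_response = """
{"data": {"productList": [
    {"prdDtlUrl": "http://example.com/product/1", "prdNm": "placeholder A"},
    {"prdDtlUrl": "http://example.com/product/2", "prdNm": "placeholder B"}
]}}
"""


def extract_product_urls(payload):
    """Return the prdDtlUrl of every entry in payload['data']['productList']."""
    product_list = payload.get("data", {}).get("productList", [])
    return [item["prdDtlUrl"] for item in product_list if "prdDtlUrl" in item]


urls = extract_product_urls(json.loads(sample_response))
for url in urls:
    print(url)
```

Using `.get()` with defaults means a page with no products yields an empty list instead of a `KeyError`.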
3. Start coding in Python:
```python
# -*- coding: utf-8 -*-
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
}

form_data = {
    'searchKwd': '',
    'storeId': '522047',
    'storeNo': '522047',
    'encSellerNo': '19wqwPhwf0bYTT5rhUwvVA==',
    'sortCd': 'NP',
    'filter': '',
    'pageNo': '',  # filled in for each page below
    'pageTypeCd': '02',
    'trTypeCd': 'STP06',
}

first_url = "http://shop.11st.co.kr/stores/522047/category"
jump_url = "http://shop.11st.co.kr/storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax"

s = requests.Session()
# Open the category page first so the session picks up the site's cookies.
s.get(first_url)

prdDtlUrl_list = []


def getProDtlUrl_list(s, prdDtlUrl_list):
    totalpage = int(9301 / 30 + 1)  # hard-coded page count (311)
    # for pageNo in range(1, totalpage + 1):  # all pages
    for pageNo in range(1, 3):  # test with 2 pages first
        form_data['pageNo'] = str(pageNo)
        response = s.post(jump_url, data=form_data, headers=headers)
        response.raise_for_status()
        # The body is JSON, so parse it with response.json() (equivalent to
        # json.loads(response.content)). eval() would execute whatever the
        # server sends and fails on true/false/null.
        dict_content = response.json()
        productList = dict_content['data']['productList']
        for d in productList:
            prdDtlUrl_list.append(d['prdDtlUrl'])


getProDtlUrl_list(s, prdDtlUrl_list)
print('Extracted product URLs =================')
for url in prdDtlUrl_list:
    print(url)
```
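The paging loop hard-codes the item count and page size. A slightly more defensive variant, sketched below, computes the page count with `math.ceil` and stops as soon as a page comes back empty. The names `total_pages`, `collect_urls` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for the real POST request so the sketch runs without network access.

```python
import math


def total_pages(total_items, page_size):
    """Number of pages needed to show total_items at page_size per page."""
    return math.ceil(total_items / page_size)


def collect_urls(fetch_page, last_page):
    """Call fetch_page(page_no) for pages 1..last_page and gather prdDtlUrl values.

    fetch_page must return the productList for that page (a list of dicts).
    Stops early when a page is empty.
    """
    urls = []
    for page_no in range(1, last_page + 1):
        product_list = fetch_page(page_no)
        if not product_list:
            break
        urls.extend(d["prdDtlUrl"] for d in product_list)
    return urls


def fake_fetch(page_no):
    """Stand-in for the real request: two pages of placeholder data."""
    pages = {
        1: [{"prdDtlUrl": "http://example.com/p/1"}, {"prdDtlUrl": "http://example.com/p/2"}],
        2: [{"prdDtlUrl": "http://example.com/p/3"}],
    }
    return pages.get(page_no, [])


print(total_pages(9301, 30))
print(collect_urls(fake_fetch, 5))
```

In a real run, `fetch_page` would wrap the session's POST call and return `response.json()['data']['productList']`.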
4 Results:
Using regular expressions:
```python
import re

import requests

process = requests.Session()
url = 'http://shop.11st.co.kr/stores/522047/category'
url1 = 'http://shop.11st.co.kr/storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax'
# Content-Length is left out: requests computes it from the body, and the
# body length changes with the page number.
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cookie': 'WMONID=zErTjIsNZ9l; PCID=15879009766040741912504; XSRF-TOKEN=2b308af8-e144-4c2f-216c-c6a9e24bad38; TD=TB_ACT_DATA%7C0%3AN%3A-1%3A0%3AN%3A-1; TP=scrnChk%7CY%23TB_DATA_CHK%7CN%3AY; _ga=GA1.3.406950129.1587900979; _gid=GA1.3.567725348.1587900979; PCID_FRV=true; RAKE_SID=15879009798878793472311; RAKE_SID_XSITE=15879009798878793472311; recopick_uid=41130255.1587901035216; plab.uid=753a5979-7ecb-491e-9bc2-12c09575be23; plab.h.11st_web=; _ascend_uid=3724501418_1587901035:1587901035331; DMP_UID=(DMPC)20a03a6b-3f3c-4e03-b9db-e1eeebe83f29; AUID=AUID_KkOd8Kfs5H-UandUvp-ohg; JSESSIONID=JXi2XhQJcEyhR3CjZDzRtp5E5rk7o1YWd2xdNh9DJP_M1x_aSW_I!-1198550641',
    'Host': 'shop.11st.co.kr',
    'Origin': 'http://shop.11st.co.kr',
    'Pragma': 'no-cache',
    'X-Requested-With': 'XMLHttpRequest',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
    'Referer': 'http://shop.11st.co.kr/stores/522047/category',
}
data = {
    'searchKwd': '',
    'storeId': '522047',
    'storeNo': '522047',
    'encSellerNo': '19wqwPhwf0bYTT5rhUwvVA==',
    'sortCd': 'NP',
    'filter': '',
    'pageTypeCd': '02',
    'trTypeCd': 'STP06',
}
# 'pageNo' is added for each request inside get_urls().

# Each regex match already starts with 'prdNo=', so the prefix ends at '&'.
# (Appending a match to a prefix ending in '&prdNo=' would duplicate it.)
detail_prefix = 'http://www.11st.co.kr/product/SellerProductDetail.tmall?method=getSellerProductDetail&'


def get_urls():
    # Load the category page first to open the session.
    response = process.get(url)
    response.encoding = 'utf-8'
    # Unused alternative pattern for the links on the category page itself:
    # '<a href="http://www.11st.co.kr/product/.*?" id=".*?" data-ga-event-category'
    for i in range(2, 301):
        data['pageNo'] = str(i)
        response = process.post(url1, data=data, headers=headers)
        urls = re.findall('prdNo=.*?&trTypeCd=STP06', response.text)
        for index in range(len(urls)):
            urls[index] = detail_prefix + urls[index]
            print(urls[index])


get_urls()
```
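The regex step can be tried offline on a small piece of text. The sketch below uses a capture group to pull out only the `prdNo` value and removes duplicates while keeping order. The sample text and product numbers are invented, and `BASE` and `extract_detail_urls` are hypothetical names. The real response may escape `&` differently, in which case the pattern would need adjusting.

```python
import re

BASE = ("http://www.11st.co.kr/product/SellerProductDetail.tmall"
        "?method=getSellerProductDetail&prdNo=")

# Invented snippet that imitates links inside a response body.
sample_text = (
    'href="...getSellerProductDetail&prdNo=111&trTypeCd=STP06" '
    'href="...getSellerProductDetail&prdNo=222&trTypeCd=STP06" '
    'href="...getSellerProductDetail&prdNo=111&trTypeCd=STP06"'
)


def extract_detail_urls(text):
    """Return de-duplicated detail URLs built from the prdNo values in text."""
    # The group captures only the value between 'prdNo=' and the next '&'.
    prd_nos = re.findall(r"prdNo=([^&]+)&trTypeCd=STP06", text)
    # dict.fromkeys keeps first-seen order while dropping repeats.
    unique = list(dict.fromkeys(prd_nos))
    return [BASE + prd_no + "&trTypeCd=STP06" for prd_no in unique]


detail_urls = extract_detail_urls(sample_text)
for u in detail_urls:
    print(u)
```

Capturing just the identifier avoids having to keep the URL prefix and the matched text in sync.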