In many cases we want a program to be able to update itself, so today we will use Lanzou Cloud (lanzous.com) to implement self-updating for a Python program (for now, we will only cover requesting the Lanzou Cloud direct download link).
Analysis

First, register a Lanzou Cloud account; that part is straightforward, so it is only mentioned here. We will use a shared file link (a cookies.txt file) as our example to analyze how to obtain its direct address.
What we need to obtain is the link address highlighted in red.
Opening developer mode, we can see an iframe tag.
It points us to another link address.
This is the real location of the content highlighted in red.
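As a minimal sketch of this step (using a made-up HTML snippet, not the live page markup), the iframe's src attribute can be pulled out with a plain regular expression:

```python
import re

# A made-up stand-in for the share page's html (the real markup may differ)
html = '<iframe class="ifr2" src="/fn?A1b2C3d4" frameborder="0"></iframe>'

# Capture whatever the src attribute points at
src = re.search(r'<iframe[^>]*src="([^"]+)"', html).group(1)
url2 = 'https://www.lanzous.com' + src
print(url2)  # https://www.lanzous.com/fn?A1b2C3d4
```

The full script below does the same thing with BeautifulSoup, which is more robust against attribute-order changes than a regex.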
If we request this inner page directly, we only see a timeout message and nothing else, which tells us the real content is loaded in the background through an API.
Continuing to search through the network requests, we finally find the address in ajaxm.php.
Requesting this address (now expired), we can see the download succeeds.
So let's look at how this request fetches the data, and then simulate the same request ourselves.
We can see it is submitted as a form: action, sign, and ves are the form parameters. As long as we send this form to the ajaxm.php endpoint, we can get the Lanzou Cloud direct link.
```python
# coding=utf-8
from bs4 import BeautifulSoup
import requests
import re
import json


def GetHeaders(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
        'referer': url,  # Lanzou Cloud share file link address
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }
    return headers


def GetRealAddress(url):
    # Get the html of the sharing page
    res = session.get(url, headers=GetHeaders(url))
    # Use BeautifulSoup to parse the html and find the page embedded in the iframe
    soup = BeautifulSoup(res.text, 'html.parser')
    url2 = 'https://www.lanzous.com/' + soup.find('iframe')['src']
    res2 = session.get(url2, headers=GetHeaders(url2))
    # Extract the request parameters with a regular expression
    params = re.findall(r'var [\w] = \'([\w]+?)\';', res2.text)
    # Request the download address
    url3 = 'https://www.lanzous.com/ajaxm.php'
    data = {
        'action': 'downprocess',
        'sign': params[1],
        'ves': 1,
    }
    res3 = session.post(url3, headers=GetHeaders(url2), data=data)
    res3 = json.loads(res3.content)
    # Build the final redirect address; the interface sometimes answers
    # "Timeout, please refresh", so retry until we get a usable response
    while True:
        try:
            url4 = res3['dom'] + '/file/' + res3['url']
            break
        except KeyError:
            print('Timed out, trying to crawl again for you, please wait...')
            res3 = session.post(url3, headers=GetHeaders(url2), data=data)
            res3 = json.loads(res3.content)
    print(url4)
    # A HEAD request is not redirected by requests by default, so the
    # Location header holds the real file address
    headers2 = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    }
    res4 = session.head(url4, headers=headers2)
    file_address = res4.headers['Location']
    return file_address


if __name__ == '__main__':
    session = requests.session()
    address = GetRealAddress('https://fightmountain.lanzous.com/ixtAme2m0kd')
    r = requests.get(address)
    with open('cookies.txt', 'wb') as f:
        f.write(r.content)
```
In this script we mainly use a session to maintain state, use a POST request to obtain the download information, and finally follow the redirect to get the direct link.
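The final step, reading the Location header without actually downloading the file, can be sketched against a throwaway local server standing in for the real download host (the handler class, target URL, and file path below are all invented for illustration):

```python
import http.server
import threading
import urllib.request
import urllib.error

# A tiny local server that answers HEAD with a 302 redirect,
# standing in for the real download host
class RedirectHandler(http.server.BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(302)
        self.send_header('Location', 'https://example.com/file/abc123')
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Refuse to follow the redirect; we only want the Location header
class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, *args, **kwargs):
        return None

opener = urllib.request.build_opener(NoRedirect)
req = urllib.request.Request(f'http://127.0.0.1:{server.server_port}/', method='HEAD')
try:
    resp = opener.open(req)
    location = resp.headers['Location']
except urllib.error.HTTPError as e:  # urllib raises on the unfollowed 302
    location = e.headers['Location']

print(location)  # https://example.com/file/abc123
server.shutdown()
```

This mirrors what `session.head` does in the script above: requests does not follow redirects for HEAD requests by default, so `res4.headers['Location']` already holds the final file address.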
Reference

Python crawls the Lanzou Cloud direct link (get the real file address)