Parsing and downloading Lanzou Cloud direct links with Python, Requests, and BS4


In many cases a program needs a self-update feature, so today we will use Lanzou Cloud to implement self-updating for a Python program (for now, we will only get as far as requesting the Lanzou Cloud direct link address).


Analysis

We can first sign up for a Lanzou Cloud account (registration itself won't be covered here). As an example, we will analyze the share address of a site's cookies file uploaded to Lanzou Cloud.

What we need to get is the link address highlighted in red.
Opening the browser's developer tools, we can see an iframe tag.

It leads us to another link address
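As a minimal sketch of this step (reusing the share link and URL prefix that appear in the full code later in this article), the iframe address can be extracted with BS4 like this:

import requests
from bs4 import BeautifulSoup

share_url = 'https://fightmountain.lanzous.com/ixtAme2m0kd'
headers = {'User-Agent': 'Mozilla/5.0', 'referer': share_url}
html = requests.get(share_url, headers=headers).text
# The share page embeds the real content page in an iframe
iframe_src = BeautifulSoup(html, 'html.parser').find('iframe')['src']
print('https://www.lanzous.com/' + iframe_src)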

This is the real location of the content highlighted in red.
If we request this page directly, we only see a timeout message and no other content, which tells us that the data is loaded in the background through some interface.

We can continue searching the network requests, and finally find the address in ajaxm.php.

Requesting this address on its own is invalid, yet in the browser's network panel we can see the download request succeed.

So let's look at how this request fetches the data, and then simulate the same request to fetch it ourselves.

We can see that the data is submitted as a form: action, sign, and ves are the form parameters. As long as we send this form to the ajaxm.php address, we can get the Lanzou Cloud direct link.
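As a minimal sketch of this step, assuming the iframe page's HTML is already in hand (request_direct_link is a hypothetical helper name; the regex and parameter layout mirror the full code below):

import re

def request_direct_link(session, iframe_url, iframe_html):
    # The form values live in inline JS as single-letter variables
    # (var x = '...'); in this page layout the second match is the sign
    params = re.findall(r"var [\w] = '([\w]+?)';", iframe_html)
    data = {'action': 'downprocess', 'sign': params[1], 'ves': 1}
    # Send the referer header the same way the browser does
    res = session.post('https://www.lanzous.com/ajaxm.php',
                       data=data, headers={'referer': iframe_url})
    info = res.json()
    return info['dom'] + '/file/' + info['url']

On success the JSON response carries dom and url, which concatenate into the direct link; on a timeout it carries an inf message instead, which the full code below handles by retrying.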

Code implementation
#coding=utf-8
from bs4 import BeautifulSoup
import requests
import re
import json


def GetHeaders(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
        'referer': url,  # Lanzou Cloud share file link address
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }
    return headers


def GetRealAddress(url):
    # Get the html of the sharing page
    res = session.get(url, headers=GetHeaders(url))
    # Use BeautifulSoup to process the html and get the iframe address
    soup = BeautifulSoup(res.text, 'html.parser')
    url2 = 'https://www.lanzous.com/' + soup.find('iframe')['src']
    res2 = session.get(url2, headers=GetHeaders(url2))
    # Extract the form parameters from the inline JavaScript with a regex
    params = re.findall(r"var [\w] = '([\w]+?)';", res2.text)
    # Request the download address
    url3 = 'https://www.lanzous.com/ajaxm.php'
    data = {
        'action': 'downprocess',
        'sign': params[1],
        'ves': 1,
    }
    res3 = session.post(url3, headers=GetHeaders(url2), data=data)
    res3 = json.loads(res3.content)
    # Assemble the redirect address; retry while the interface reports a timeout
    try:
        url4 = res3['dom'] + '/file/' + res3['url']
    except KeyError:
        while True:
            res3 = session.post(url3, headers=GetHeaders(url2), data=data)
            res3 = json.loads(res3.content)
            if res3.get("inf") == "Timeout, please refresh":
                print("Trying to crawl again for you, please wait...")
            else:
                url4 = res3['dom'] + '/file/' + res3['url']
                break
    print(url4)
    # head() does not follow the 302 redirect, so Location holds the file address
    headers2 = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    }
    res4 = session.head(url4, headers=headers2)
    file_address = res4.headers['Location']
    return file_address


if __name__ == '__main__':
    session = requests.session()
    address = GetRealAddress("https://fightmountain.lanzous.com/ixtAme2m0kd")
    r = requests.get(address)
    with open("cookies.txt", 'wb') as f:
        f.write(r.content)

In this code we mainly use a session to maintain state across requests, POST the form to get the JSON data, and finally request the redirect address to obtain the final direct link.
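One design note, as a sketch: requests.head() does not follow redirects by default, so the 302 response still exposes the final file address in its Location header (resolve_final_address is a hypothetical helper name; direct_link is the address assembled above):

import requests

def resolve_final_address(direct_link):
    # head() leaves the 302 unfollowed, so Location still
    # holds the real downloadable file URL
    res = requests.head(direct_link, headers={'User-Agent': 'Mozilla/5.0'})
    return res.headers['Location']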

References

Python crawls the Lanzou Cloud direct link (getting the real file address)
