Using Python with Requests and BS4 to parse and download Lanzou Cloud direct links

In many cases a program needs a self-update feature, so today we will use Lanzou Cloud to implement self-updating for a Python program (for now, we only cover requesting the Lanzou Cloud direct link).


Approach

First, register a Lanzou Cloud account (only mentioned in passing here). As the example to analyze, we will use a share link that points to a cookies file.

What we need to obtain is the link address highlighted in red in the screenshot.
We can open developer mode and see the iframe tag.
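
As a minimal sketch of this step (using the same example share link as the full code below; the User-Agent is an arbitrary desktop browser string), extracting the iframe address looks like this:

import requests
from bs4 import BeautifulSoup

# A sketch: fetch the share page and pull out the iframe's src attribute.
# The share URL is the example link used in the full code below.
share_url = 'https://fightmountain.lanzous.com/ixtAme2m0kd'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
html = requests.get(share_url, headers=headers).text
soup = BeautifulSoup(html, 'html.parser')
iframe = soup.find('iframe')
print('https://www.lanzous.com/' + iframe['src'])  # the inner page address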

It leads us to another link address

This is the real location of the highlighted link.
If we open this inner page directly, we only see a timeout message and no other content, so we can infer that the actual data is loaded in the background through an interface.

We can keep searching through the network requests, and finally find the address in ajaxm.php.

Requesting this address (the link shown has since expired), we can see that the download succeeds.

So let's look at how the page requests this data, and then simulate the same request ourselves.

We can see that the data is submitted as a form: action, sign and ves are the form fields. As long as we POST this form to the ajaxm.php endpoint, we can get the Lanzou Cloud direct link.
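
In isolation, simulating that form submission looks roughly like this (a sketch: sign_value and the referer are placeholders; in practice both come from the iframe page fetched earlier):

import requests

# A sketch of the form POST to ajaxm.php. The sign value and referer below
# are placeholders: both come from the inner iframe page in practice.
sign_value = '...'  # scraped from the inner page's JavaScript
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'referer': 'https://www.lanzous.com/<iframe src>',  # the inner page URL
}
data = {'action': 'downprocess', 'sign': sign_value, 'ves': 1}
res = requests.post('https://www.lanzous.com/ajaxm.php', headers=headers, data=data)
info = res.json()
# On success the JSON contains 'dom' and 'url'; joined with '/file/' they
# form the address that redirects to the direct link
print(info)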

Code implementation

#coding=utf-8
from bs4 import BeautifulSoup
import requests
import re
import time
import json

def GetHeaders(url):
	headers = {
	    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
	    'referer': url,# the Lanzou Cloud share page URL
	    'Accept-Language': 'zh-CN,zh;q=0.9',
	}
	return headers

def GetRealAddress(url):
	# Get the HTML of the share page
	res = session.get(url,headers=GetHeaders(url))
	# Use BeautifulSoup to parse the HTML and get the inner page address from the iframe
	soup = BeautifulSoup(res.text,'html.parser')
	url2 = 'https://www.lanzous.com/'+soup.find('iframe')['src']
	#print(url2)
	res2 = session.get(url2,headers=GetHeaders(url2))

	# Extract the request parameters from the page's JavaScript with a regex;
	# the script defines several six-character variables, and the second match is the sign
	params = re.findall(r'var [\w]{6} = \'([\w]+?)\';',res2.text)
	#print(params)

	# Request download address
	url3 = 'https://www.lanzous.com/ajaxm.php'
	data = {
	    'action':'downprocess',
	    'sign':params[1],
	    'ves':1,
	}
	res3 = session.post(url3,headers=GetHeaders(url2),data=data)
	res3 = json.loads(res3.content)


	# Build the final redirect address; if the interface reports a timeout,
	# keep retrying the POST until it returns valid data
	try:
		url4 = res3['dom']+'/file/'+res3['url']
	except KeyError:
		while True:
			res3 = session.post(url3,headers=GetHeaders(url2),data=data)
			res3 = json.loads(res3.content)
			#print(res3)
			if res3.get("inf") == "Timeout, please refresh":
				print("Retrying the request for you, please wait...")
				time.sleep(1)
			else:
				break
		url4 = res3['dom']+'/file/'+res3['url']
	print(url4)
	headers2 = {
	    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0',
	    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
	}
	# head() does not follow redirects by default, so the Location header
	# carries the real file address
	res4 = session.head(url4, headers=headers2, allow_redirects=False)
	file_address = res4.headers['Location']

	return file_address

if __name__ == '__main__':
	session = requests.session()
	address = GetRealAddress("https://fightmountain.lanzous.com/ixtAme2m0kd")
	r = requests.get(address) 
	with open("cookies.txt",'wb') as f:
	    f.write(r.content)

To sum up, we use a requests session to maintain state, POST the form to the interface to get the JSON data, and finally issue a HEAD request against the redirect address to obtain the final direct link.
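
For larger files, it can be worth streaming the download instead of buffering the whole response in memory. A minimal sketch, reusing GetRealAddress from the code above (the chunk size is an arbitrary choice):

import requests

def download(direct_link, filename, chunk_size=8192):
	# Stream the response so large files are written to disk in chunks
	with requests.get(direct_link, stream=True) as r:
		r.raise_for_status()
		with open(filename, 'wb') as f:
			for chunk in r.iter_content(chunk_size=chunk_size):
				f.write(chunk)

# Usage: download(GetRealAddress("https://fightmountain.lanzous.com/ixtAme2m0kd"), "cookies.txt")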

Reference documents

Python crawls the Lanzou Cloud direct link (getting the real file address)
