Whether a page is static or loads its content asynchronously, its information has so far been obtained by requesting a URL with the GET method. But how do you get information that sits behind a login form? This section explains the POST method of the Requests library, shows how to fill in a form by inspecting its source code and reverse engineering the submitted fields, and simulates logging in to a website by submitting Cookie information.
The main knowledge points of this section are as follows:
Form interaction: use the POST method of the Requests library to interact with forms
Cookie: understand the basic concept of cookies
Simulated login: learn to use Cookie information to simulate logging in to a website
9.1 Simulated login
Sometimes form fields are encrypted or otherwise obfuscated, which makes the form harder to construct. In that case, you can submit Cookie information to simulate the login instead.
9.1.1 Cookie overview
A cookie is data that a website stores on the user's local device in order to identify the user and track sessions. Online shopping sites, for example, track users' cookie information to recommend goods that match their interests. Because cookies hold the information that identifies the user, we can likewise simulate logging in to a website by submitting them.
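The round trip described above can be sketched with the standard library alone. This is only an illustration: the cookie name `session_id` and its value are made up, and a real site sets many more cookies than this.

```python
from http.cookies import SimpleCookie

# Server side (hypothetical): issue a cookie that identifies the session.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"  # made-up name and value
server_cookie["session_id"]["path"] = "/"
set_cookie_header = server_cookie["session_id"].OutputString()

# Client side: keep the name/value pair and send it back on later requests.
client_cookie = SimpleCookie()
client_cookie.load(set_cookie_header)
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in client_cookie.items()
)

print("Set-Cookie:", set_cookie_header)  # what the server sends
print("Cookie:", cookie_header)          # what the client sends back
```

Whoever presents the same `Cookie` header is treated as the same user, which is why submitting a copied cookie can stand in for a login.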
9.1.2 Submitting cookies to simulate login
Next, take yaozh.com as an example: find the Cookie information and submit it to simulate logging in to the site.
(1) Open yaozh.com, open the Chrome developer tools, and select the Network tab.
(2) Log in manually with your account and password. Many requests will now appear in the Network tab.
(3) You do not need to inspect the request for the login page itself. Instead, look directly at a request made after login and find its Cookie value, as shown in the following figure.
(1) Pass the cookie in the request headers
import requests

member_url = 'https://www.yaozh.com/member/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36',
    # Cookie value copied from the browser after logging in
    'Cookie': '_ga=GA1.2.1508206450.1629445703; UtzD_f52b_saltkey=NzQM8wJu;UtzD_f52b_lastvisit=1629442261; yaozh_uidhas=1; UtzD_f52b_ulastactivity=1629445859%7C0; _gid=GA1.2.468845100.1630220751; yaozh_userId=828458; PHPSESSID=q3p104a4m5oidc0ftanp7n4rk7; Hm_lvt_65968db3ac154c3089d7f9a4cbb98c94=1629705497,1629792799,1630220751,1630292865; yaozh_mylogin=1630304933; UtzD_f52b_creditnotice=0D0D2D0D0D0D0D0D0D721338; UtzD_f52b_creditbase=0D0D6D0D0D0D0D0D0;UtzD_f52b_creditrule=%E6%AF%8F%E5%A4%A9%E7%99%BB%E5%BD%95;acw_tc=707c9f9816303118236322722e19d2bb3bf070bb34e00eed1b4400b5b6ab67;UtzD_f52b_lastact=1630311824%09uc.php%09; Hm_lpvt_65968db3ac154c3089d7f9a4cbb98c94=1630311838'
}
# Request the member page; the cookie identifies the logged-in user
response = requests.get(member_url, headers=headers)
print(response.text)
Note that a Cookie value copied directly from the browser is sometimes incorrect. If the request fails, check the value against the one shown for the website in the browser.
(2) Pass the cookies directly through the cookies parameter
import requests

member_url = 'https://www.yaozh.com/member/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'
}
cookies = '_ga=GA1.2.1508206450.1629445703; UtzD_f52b_saltkey=NzQM8wJu; UtzD_f52b_lastvisit=1629442261; yaozh_uidhas=1; UtzD_f52b_ulastactivity=1629445859%7C0; _gid=GA1.2.468845100.1630220751; yaozh_userId=828458; PHPSESSID=q3p104a4m5oidc0ftanp7n4rk7; Hm_lvt_65968db3ac154c3089d7f9a4cbb98c94=1629705497,1629792799,1630220751,1630292865; yaozh_mylogin=1630304933; UtzD_f52b_creditnotice=0D0D2D0D0D0D0D0D0D721338; UtzD_f52b_creditbase=0D0D6D0D0D0D0D0D0; UtzD_f52b_creditrule=%E6%AF%8F%E5%A4%A9%E7%99%BB%E5%BD%95; acw_tc=707c9f9816303118236322722e19d2bb3bf070bb34e00eed1b4400b5b6ab67; UtzD_f52b_lastact=1630311824%09uc.php%09; Hm_lpvt_65968db3ac154c3089d7f9a4cbb98c94=1630311838'

# The cookies parameter needs a dictionary. Equivalent loop version:
# cook_dict = {}
# cookies_list = cookies.split('; ')
# for cookie in cookies_list:
#     cook_dict[cookie.split('=', 1)[0]] = cookie.split('=', 1)[1]

# Dictionary comprehension: split each pair on the first '=' only
cook_dict = {cookie.split('=', 1)[0]: cookie.split('=', 1)[1] for cookie in cookies.split('; ')}

response = requests.get(member_url, headers=headers, cookies=cook_dict)
data = response.content.decode()
print(data)
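Copied cookie strings are not always tidy. The string in example (1), for instance, has some pairs separated by `;` with no following space, which a split on `'; '` would not separate. A more tolerant conversion might look like the sketch below; the helper name `cookie_string_to_dict` and the sample values are made up for illustration.

```python
def cookie_string_to_dict(raw):
    """Convert a 'name=value; name2=value2' string into a dictionary."""
    result = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # skip empty pieces, e.g. after a trailing ';'
        name, _, value = pair.partition('=')  # split on the first '=' only
        result[name] = value
    return result


# Made-up sample values, not real cookies
sample = 'token=abc=123;user=42; theme=dark; '
cook_dict = cookie_string_to_dict(sample)
print(cook_dict)
```

The resulting dictionary can be passed to the `cookies` parameter of `requests.get` in the same way as the one built above.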
9.2 Form interaction
This section explains how to use the POST method of the Requests library: inspect the web page source code of the form, work out through reverse engineering which fields the form submits, and then submit those fields to interact with the form.
9.2.1 POST method
The POST method of the Requests library is simple to use: pass a dictionary to the data parameter. When the request is sent, the dictionary is automatically form-encoded, which fills in the form.
import requests

url = 'https://example.com/login'  # placeholder: replace with the URL the form submits to
params = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 'value3'
}
html = requests.post(url, data=params)  # POST method
print(html.text)
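To see roughly what "form-encoded" means, the standard library's `urllib.parse.urlencode` produces the same kind of body that is sent with the `application/x-www-form-urlencoded` content type. This is a sketch for illustration with made-up field names and values, not a description of the internals of Requests.

```python
from urllib.parse import urlencode

# Hypothetical form fields, mirroring a dictionary passed to data=
params = {
    'key1': 'value1',
    'key2': 'value with spaces',
    'key3': 'a&b=c',
}

# Pairs are joined with '&', and special characters are escaped
body = urlencode(params)
print(body)
```

Spaces become `+`, while `&` and `=` inside a value are percent-escaped so they cannot be confused with the separators.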
9.2.2 View the web page source code and submit the form
(1) Open yaozh.com, go to the login page, right-click and choose "Inspect" in Chrome, and find the login form element, as shown in the following figure:
Several parameters deserve attention here. The first two are username and pwd; the other two are formhash and backurl in the last two lines. These four parameters make up the form data of the login request, so all of them are important.
(2) Build the code
import requests

url = 'https://www.yaozh.com/login'  # login URL
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
}
data = {  # parameters submitted by the form
    'username': '18303056364',
    'pwd': 'qfkjr8yn',
    'formhash': '5AE08D06CB',
    'backurl': 'https%3A%2F%2Fwww.yaozh.com%2F'
}
session = requests.Session()  # create a session so cookies persist between requests
r = session.post(url, data=data, headers=headers)  # log in with the form parameters
member_url = 'https://www.yaozh.com/member/'  # page to visit after login
r2 = session.get(member_url, headers=headers)  # visit the personal center with the session's cookies
print(r2.text)
One thing to note: the values of formhash and backurl must be obtained before logging in. They can be found in the source code of the page while you are not logged in. The formhash value changes over time.
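Because formhash changes, it may be more convenient to read it from the login page each time than to copy it by hand. The sketch below collects hidden input fields with the standard library's `html.parser`. The class name `HiddenInputParser` and the sample markup are assumptions for illustration; the real login page may be structured differently.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        if attributes.get('type') == 'hidden' and attributes.get('name'):
            self.fields[attributes['name']] = attributes.get('value') or ''


# Hypothetical markup standing in for the source of the login page
sample_html = '''
<form method="post">
  <input type="text" name="username">
  <input type="password" name="pwd">
  <input type="hidden" name="formhash" value="5AE08D06CB">
  <input type="hidden" name="backurl" value="https%3A%2F%2Fwww.yaozh.com%2F">
</form>
'''

parser = HiddenInputParser()
parser.feed(sample_html)
hidden_fields = parser.fields
print(hidden_fields)
```

In practice the HTML would come from requesting the login page with the same session before posting, for example `session.get(url, headers=headers).text`, and the extracted values would be merged into the form data together with username and pwd.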