Abstract
This post starts a systematic study of requests, a third-party HTTP library for Python. Python's built-in urllib and the separate urllib3 package are not covered here.
The reference book was written by another expert.
System environment: macOS, Python 3
References: Requests API; Reference Book Download
Preparatory knowledge
Message structure of HTTP
Reference material: https://www.runoob.com/http/http-messages.html
There are two types of messages: client-to-server, server-to-client
A message has four parts: start line, header fields, blank line, and message body.
Message type | Start line | Header fields | Blank line | Message body |
---|---|---|---|---|
Client request message | Request line | Request headers | Blank line | Request body |
Server response message | Response line | Response headers | Blank line | Response body |
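To make the four parts concrete, here is a minimal sketch that splits a raw message at the blank line. The helper name `parse_http_message` and the sample messages are made up for illustration; real parsers handle many more edge cases.

```python
def parse_http_message(raw: str) -> dict:
    """Split a raw HTTP message into start line, header fields, and body."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        # Split on the first colon only; values may contain colons themselves.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {"start_line": start_line, "headers": headers, "body": body}


# Hypothetical messages, written by hand for illustration.
raw_request = "GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 15\r\n"
    "\r\n"
    "<html>ok</html>"
)

request = parse_http_message(raw_request)
response = parse_http_message(raw_response)
print(request)
print(response)
```

The GET request has an empty body, while the response carries the page after the blank line.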
HTTP Request Message
Reference material: https://www.runoob.com/http/http-methods.html
- Request line
HTTP/1.0 defines three request methods: GET, POST, and HEAD.
HTTP/1.1 adds five more: OPTIONS, PUT, DELETE, TRACE, and CONNECT. PATCH was standardized later, in RFC 5789.
No. | Method | Description |
---|---|---|
1 | GET | Requests the specified page and returns the entity body. |
2 | HEAD | Like a GET request, except that the response has no body; it is used to fetch only the headers. |
3 | POST | Submits data to the specified resource for processing (for example, submitting a form or uploading a file); the data is carried in the request body. A POST request may create new resources and/or modify existing ones. |
4 | PUT | Replaces the contents of the specified document with the data sent by the client. |
5 | DELETE | Asks the server to delete the specified page. |
6 | CONNECT | Reserved in HTTP/1.1 for proxy servers that can switch the connection to a tunnel. |
7 | OPTIONS | Lets the client query which capabilities the server supports. |
8 | TRACE | Echoes back the request as received by the server; mainly used for testing or diagnostics. |
9 | PATCH | Complements PUT; used to apply a partial update to a known resource. |
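The difference between GET and HEAD in the table can be observed with a throwaway local server. This is a sketch built only on the standard library; `DemoHandler`, `fetch`, and `PAGE` are names invented for this example, and the server listens on a random local port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html>hello</html>"  # hypothetical page content


class DemoHandler(BaseHTTPRequestHandler):
    def _send_head(self):
        # Status line and headers are identical for GET and HEAD.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_head()
        self.wfile.write(PAGE)  # GET also sends the body

    def do_HEAD(self):
        self._send_head()  # HEAD stops after the headers

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(port, method):
    """Send one request and return (status, Content-Length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    get_result = fetch(port, "GET")
    head_result = fetch(port, "HEAD")
finally:
    server.shutdown()
    server.server_close()

print("GET :", get_result)
print("HEAD:", head_result)
```

Both responses report the same status and `Content-Length`, but only the GET response actually contains the page bytes.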
- Request Header
Header | Description |
---|---|
Accept | The content types the browser is willing to accept in the response. |
Accept-Encoding | The content encodings the browser accepts for the response. |
Accept-Language | The natural languages the browser accepts. |
Connection | The connection mode the browser wants for this request. |
Cookie | Locally stored data that the browser sends to identify this request and support session tracking. It usually consists of key-value pairs and is often used to keep a user authenticated. |
Host | The host name of the server the browser is requesting. |
User-Agent | Identifies the browser. Browsers on different operating systems, of different versions, and from different vendors send different values. |
- Request Body
Usually only the POST and PUT methods carry a request body.
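As a rough illustration of a request that carries a body, the sketch below assembles a form-style POST message by hand. The path `/login`, the host, and the form fields are hypothetical; in practice a library such as requests builds this for you.

```python
from urllib.parse import urlencode

form = {"user": "alice", "lang": "zh-cn"}  # hypothetical form fields
body = urlencode(form).encode("utf-8")     # the request body as bytes

lines = [
    "POST /login HTTP/1.1",                             # request line
    "Host: www.example.com",                            # request headers
    "Content-Type: application/x-www-form-urlencoded",
    f"Content-Length: {len(body)}",
    "",                                                 # ends the last header line
]
# One more CRLF produces the blank line, then the body follows.
message = "\r\n".join(lines).encode("ascii") + b"\r\n" + body
print(message.decode("ascii"))
```

`Content-Length` tells the server how many bytes of body follow the blank line.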
HTTP response message
Reference material: https://www.runoob.com/http/http-status-codes.html
- Response line
The response line (also called the status line) carries a status code of three decimal digits; the first digit defines the class of the status code.
Responses are divided into five categories: informational (100-199), success (200-299), redirection (300-399), client error (400-499), and server error (500-599).
Class | Description |
---|---|
1xx | Informational: the server has received the request and the requester should continue. |
2xx | Success: the request was successfully received and processed. |
3xx | Redirection: further action is needed to complete the request. |
4xx | Client error: the request contains a syntax error or cannot be fulfilled. |
5xx | Server error: the server failed while processing the request. |
Common HTTP status codes:
Status code | Meaning |
---|---|
200 | The request succeeded |
301 | The resource (web page, etc.) has moved permanently to another URL |
404 | The requested resource (web page, etc.) does not exist |
500 | Internal server error |
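The first-digit rule can be sketched in a few lines. The `classify` helper and the `CLASSES` mapping are illustrative names; the reason phrases come from the standard library's `http.HTTPStatus`.

```python
from http import HTTPStatus

# The first digit of the status code selects the class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def classify(code: int) -> str:
    """Return the class of a status code based on its first digit."""
    return CLASSES.get(code // 100, "unknown")


for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", classify(code))
```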
- Response Header
Header | Description |
---|---|
Connection | The connection mode used for this HTTP connection. |
Content-Encoding | The encoding applied to the response entity. When sending a request, the browser lists the content encodings it supports in the Accept-Encoding header. The server picks one of them, encodes the response entity with it, and names the chosen encoding in the Content-Encoding response header. The browser then decodes the response body according to Content-Encoding. |
Content-Type | The media type of the response entity and, for text, its character encoding; the browser uses it to decide how to read and decode the response entity. |
Date | The time at which the response was generated, in GMT. |
Server | Identifies the software of the server that produced the response. |
Transfer-Encoding | The transfer encoding applied to the response entity. |
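The Accept-Encoding / Content-Encoding exchange described in the table can be simulated without any network. This is a simplified sketch: `server_respond` and `client_decode` are invented helpers, the toy server only knows gzip, and quality values such as `q=0.8` are ignored.

```python
import gzip

SUPPORTED = ("gzip",)  # encodings this toy server can produce


def server_respond(accept_encoding: str, payload: bytes):
    """Pick an encoding the client accepts and encode the payload with it."""
    offered = [token.split(";")[0].strip() for token in accept_encoding.split(",")]
    for encoding in offered:
        if encoding in SUPPORTED:
            return {"Content-Encoding": encoding}, gzip.compress(payload)
    return {}, payload  # nothing in common: send the payload as-is


def client_decode(headers: dict, body: bytes) -> bytes:
    """Decode the body according to the Content-Encoding response header."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


page = b"<html>" + b"hello " * 50 + b"</html>"
headers, wire_body = server_respond("gzip, deflate, br", page)
decoded = client_decode(headers, wire_body)
print(headers, len(wire_body), len(decoded))
```

requests performs the decoding step for you, so `response.content` normally already holds the decoded bytes.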
- Response Body
The body of the page, for example the HTML document.
Install requests Library
Install directly in terminal.
pip3 install requests
Check whether your terminal uses the system bash or zsh: the two shells do not share environment variables, so make sure you are in the one you configured previously.
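One way to confirm that the package landed in the interpreter you actually run is to ask that interpreter directly. This small check uses only the standard library and does not import requests itself.

```python
import importlib.util
import sys

# Which Python is running this script?
print("Interpreter:", sys.executable)

# find_spec returns None when the package cannot be found on this interpreter's path.
installed = importlib.util.find_spec("requests") is not None
print("requests installed:", installed)
```

If this prints `False` right after a successful install, the `pip3` on your shell's PATH probably belongs to a different Python than the one shown.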
Basic use of requests

    # Import the package
    import requests

    # Target URL
    url = "https://www.baidu.com"
    # Request method
    method = "get"
    # Send the request and get the response
    response = requests.request(method, url)

    # List everything available on the response object
    attrs = dir(response)
    print("_______________All attributes of the response result_______________")
    for i, name in enumerate(attrs):
        if i % 5 == 0:
            print()
        print(name.ljust(25), end=' ')

    print("\n_______________Destination Address_______________")
    print(response.url)

    print("\n_______________Response Header_______________")
    headers = response.headers
    for key in headers:
        print(key.ljust(20) + ':' + headers[key])

    # Encoding
    print("\n_______________Encoding Method of Response Body_______________")
    print(f"Original encoding: {response.encoding}")
    response.encoding = "utf-8"
    print(f"Modified encoding: {response.encoding}")

    # Content of the response body
    print("\n_______________Content of Response Body_______________")
    print("Via the text attribute:\n" + response.text[:100] + "...")
    print("Via the content attribute:\n", response.content[:100])

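Why change `response.encoding` before reading `response.text`? Roughly speaking, `content` is the raw bytes and `text` is those bytes decoded with `encoding`. The sketch below imitates that with plain `bytes.decode`; the sample string is made up, and ISO-8859-1 stands in for a wrong guess (requests falls back to it for text responses whose Content-Type header names no charset).

```python
# Stand-in for response.content: UTF-8 bytes sent by the server.
raw = "百度一下，你就知道".encode("utf-8")

# Stand-in for response.text when the encoding is guessed wrongly.
wrong = raw.decode("iso-8859-1")

# Stand-in for response.text after setting response.encoding = "utf-8".
right = raw.decode("utf-8")

print(repr(wrong))
print(right)
```

The bytes never change; only the encoding used to turn them into text does.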
Common use of requests
Reference material: https://docs.python-requests.org/en/latest/_modules/requests/api/#head
Requests to a server are usually written as follows:

    import requests

    url = "https://www.baidu.com"
    r = requests.get(url)
    r = requests.post(url)
    r = requests.put(url)
    r = requests.options(url)
    ...

Internally, however, each of these helpers executes the following statement:

    method = "post"
    requests.request(method, url)
    ...

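The delegation can be observed offline by temporarily swapping out `requests.api.request`, the function shown in the API source linked above. This sketch depends on that internal module layout, so treat it as an illustration rather than a supported API; `fake_request` and `calls` are names invented here, and no network traffic is sent.

```python
from unittest import mock

import requests

url = "https://www.baidu.com"
calls = []


def fake_request(method, url, **kwargs):
    # Record what the helper passed along instead of sending anything.
    calls.append((method, url))
    return None


# While patched, requests.get / post / put / options all end up in fake_request.
with mock.patch("requests.api.request", new=fake_request):
    requests.get(url)
    requests.post(url)
    requests.put(url)
    requests.options(url)

print(calls)
```

Each helper passes its own method name as the first argument and forwards the URL unchanged.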
Add Request Header Information

    import requests

    header = {
        'Host': 'www.baidu.com',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36',
        'Connection': 'Keep-Alive',
        'Content-Type': 'text/plain; Charset=UTF-8',
        'Accept-Language': 'zh-cn',
        'Cookie': 'BAIDUID=EB6B88EE649F5D3157DC4B26CBF117BD:FG=1;',
    }

    r = requests.get("https://www.baidu.com", headers=header)
    c = requests.request("get", "https://www.baidu.com", headers=header)

    # Both calls are equivalent
    print(r.text[:222])
    print(c.text[:222])

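To inspect a request without sending it, one option is to build it with `requests.Request` and call `prepare()`. The header values below are placeholders chosen for this sketch, not values the site requires.

```python
import requests

headers = {
    "User-Agent": "demo-client/0.1",  # hypothetical value
    "Accept-Language": "zh-cn",
}

# Build the request object and prepare it; nothing is sent over the network.
prepared = requests.Request("get", "https://www.baidu.com", headers=headers).prepare()

print(prepared.method)  # the method name is normalised to upper case
print(prepared.url)
for name, value in prepared.headers.items():
    print(f"{name}: {value}")

# Header lookup on the prepared request is case-insensitive.
print(prepared.headers["user-agent"])
```

When a request is really sent through `requests.get` or a `Session`, the session's default headers are merged in as well, so the server may see more headers than the ones listed here.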