Using the request module of urllib

1: urlopen

urlopen is the most basic way to send an HTTP request.

from urllib import request
response=request.urlopen('http://www.python.org')  # only the request module was imported, so call it as request.urlopen
print(response.read().decode('utf-8'))

note:

1: urllib.request.urlopen returns an object of <class 'http.client.HTTPResponse'>. Calling read() on it returns bytes; after decode(), the result is <class 'str'>.

2: The HTTPResponse object returned by urlopen provides methods such as read(), readinto(), getheader(name), getheaders() and fileno(), and attributes such as msg, version, status, reason, debuglevel and closed.

3: Signature of urlopen:

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
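The notes above can be tried without touching the network. The following is a minimal sketch, not a definitive recipe: HelloHandler is a hypothetical local stand-in server, and the no_proxy line is an assumption that only matters if your environment defines a proxy. It shows the timeout parameter and a few of the HTTPResponse attributes listed in note 2.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request

# Assumption: keep any system-wide proxy away from the local test server.
os.environ['no_proxy'] = '127.0.0.1'


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server that always answers 'hello'."""

    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request log


server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

# timeout is given in seconds; the with block closes the response afterwards
with request.urlopen(url, timeout=5) as response:
    status = response.status                           # numeric status code
    reason = response.reason                           # reason phrase
    content_type = response.getheader('Content-Type')  # one response header
    raw = response.read()                              # bytes
    text = raw.decode('utf-8')                         # str

server.shutdown()
server.server_close()
print(status, reason, content_type, type(raw).__name__, type(text).__name__, text)
```

Replacing url with a real address such as 'http://www.python.org' works the same way, as long as the machine has network access.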

 

 

2: Request

Request is a class in the request module, which can add headers and other information to the request.

from urllib import request
req=request.Request('http://python.org')  # use a new name so the request module is not overwritten
response=request.urlopen(req)
print(response.read().decode('utf-8'))

note:

1: We still use urlopen() to send the request, but this time we pass a Request object instead of the URL itself. The benefit is that the request becomes an independent object whose parameters can be configured richly and flexibly.

2: Parameter list for Request:

class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
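A Request object can be inspected before it is sent, which makes the effect of each parameter easy to see. This is a small sketch that sends nothing over the network; the agent string 'demo-agent/0.1' is a made-up value used only for illustration.

```python
from urllib import request

# A Request with only a URL defaults to GET.
req = request.Request('http://httpbin.org/get')
print(req.get_method())       # GET
print(req.full_url)
print(req.host)

# Supplying data switches the default method to POST.
post_req = request.Request('http://httpbin.org/post', data=b'name=value')
print(post_req.get_method())  # POST

# method= overrides the default; headers can be passed in or added later.
put_req = request.Request(
    'http://httpbin.org/put',
    data=b'name=value',
    headers={'User-Agent': 'demo-agent/0.1'},  # hypothetical agent string
    method='PUT',
)
put_req.add_header('Accept', 'application/json')
print(put_req.get_method())   # PUT
print(put_req.header_items())
```

Note that Request stores header names in capitalized form, so the header added as 'User-Agent' is looked up as 'User-agent' with get_header().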

3: A complete example:

Next, we construct a request with four parameters. url is the request URL; headers specifies the User-Agent and Host; data is converted into a byte stream with the urlencode() and bytes() methods (the reverse of read() and decode()); and method sets the request method to POST.

from urllib import request,parse
url='http://httpbin.org/post'
headers={
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36',
    'Host':'httpbin.org'
}
form={  # named form rather than dict so the built-in dict type is not shadowed
    'name':'lingxiaoyun'
}
data=bytes(parse.urlencode(form),encoding='utf-8')
req=request.Request(url=url,data=data,headers=headers,method='POST')
response=request.urlopen(req)
print(response.read().decode('utf-8'))



Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "lingxiaoyun"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Content-Length": "16",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-5e632f08-460bfac8eda3a85c06b9b917"
  },
  "json": null,
  "origin": "119.134.98.185",
  "url": "http://httpbin.org/post"
}
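The conversion of data in both directions can be checked offline. A minimal sketch follows; the body string is an assumption, a shortened copy of the output shown above rather than a live response.

```python
import json
from urllib import parse

form = {'name': 'lingxiaoyun'}

# dict -> query string -> bytes, as done before sending the request
query = parse.urlencode(form)
data = query.encode('utf-8')   # same result as bytes(query, encoding='utf-8')
print(query, len(data))        # the length matches Content-Length in the output

# bytes -> str -> dict, the reverse direction
decoded = parse.parse_qs(data.decode('utf-8'))
print(decoded)

# The response body is JSON, so json.loads turns it into a dict.
# Assumption: body is a shortened copy of the output shown above.
body = '{"form": {"name": "lingxiaoyun"}, "url": "http://httpbin.org/post"}'
result = json.loads(body)
print(result['form']['name'])
```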

 

3: Advanced usage of the request module.

The request module has a BaseHandler class, which is the parent class of all the other Handler classes. These Handler subclasses are various "processors" that specialize in tasks such as handling cookies, login authentication and proxy settings.

HTTPDefaultErrorHandler: used to handle HTTP response errors, which are raised as exceptions of type HTTPError.
HTTPRedirectHandler: used to handle redirection.
HTTPCookieProcessor: used to process cookies.
ProxyHandler: used to set the proxy. If no proxies are passed in, the proxy settings are read from the environment.
HTTPPasswordMgr: used to manage passwords. It maintains a table of user names and passwords.
HTTPBasicAuthHandler: used to manage authentication. If a link needs authentication when it is opened, this handler can solve the authentication problem.
In addition, there are other Handler classes, which are not listed here. For details, please refer to the official documentation: https://docs.python.org/3/library/urllib.request.html#urllib.request.BaseHandler.
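The relationship between these classes can be checked directly. The short sketch below only uses issubclass() on names from urllib.request. It also shows one detail worth knowing: HTTPPasswordMgr is a helper that stores credentials for an authentication handler, not a handler itself.

```python
from urllib import request

handler_classes = [
    request.HTTPDefaultErrorHandler,
    request.HTTPRedirectHandler,
    request.HTTPCookieProcessor,
    request.ProxyHandler,
    request.HTTPBasicAuthHandler,
]
for cls in handler_classes:
    print(cls.__name__, issubclass(cls, request.BaseHandler))

# HTTPPasswordMgr stores credentials for an auth handler but is not a handler.
is_handler = issubclass(request.HTTPPasswordMgr, request.BaseHandler)
print('HTTPPasswordMgr', is_handler)
```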

 

To use these handlers, we need the OpenerDirector class, or opener for short. We build an opener with handlers as arguments and then call the opener's open() method, which returns the same type of result as urlopen(). In short: use handlers to build an opener, then call open() to request the page.
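The handler-to-opener flow can be sketched against a local server, so nothing here depends on an outside site. DemoHandler and its paths /old and /missing are hypothetical names made up for this illustration. The opener's default handlers follow the redirect and turn the 404 into an HTTPError.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /old redirects to /, anything else is a 404."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/')
            self.send_header('Content-Length', '0')
            self.end_headers()
        elif self.path == '/':
            body = b'<html>home</html>'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the default request log


server = HTTPServer(('127.0.0.1', 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

# ProxyHandler({}) means "use no proxy"; build_opener adds the default handlers.
opener = build_opener(ProxyHandler({}))

with opener.open(base + '/old', timeout=5) as response:
    final_url = response.url             # URL after the 302 was followed
    html = response.read().decode('utf-8')

try:
    opener.open(base + '/missing', timeout=5)
    error_code = None
except HTTPError as e:                   # raised for the 404 response
    error_code = e.code
    e.close()

server.shutdown()
server.server_close()
print(final_url, html, error_code)
```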

Example 1: Authentication

from urllib.request import HTTPPasswordMgrWithDefaultRealm,HTTPBasicAuthHandler,build_opener
from urllib.error import URLError
username='Put the user name here'
password='Fill in the password here'
url='Put the URL that requires login here'
p=HTTPPasswordMgrWithDefaultRealm()  # Create an HTTPPasswordMgrWithDefaultRealm object
p.add_password(None,url,username,password)  # Add the user name and password for this URL
auth_handler=HTTPBasicAuthHandler(p)  # Create an HTTPBasicAuthHandler from the password manager
opener=build_opener(auth_handler)  # Build an opener

try:
    result=opener.open(url)
    html=result.read().decode('utf-8', errors='ignore')
    print(html)
except URLError as e:
    print(e.reason)
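To see what the authentication handler does, the same steps can be run against a local server that asks for HTTP Basic authentication. This is a sketch under stated assumptions: AuthHandler, the realm "demo" and the credentials demo/secret are all invented for the illustration.

```python
import base64
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import (HTTPBasicAuthHandler, HTTPPasswordMgrWithDefaultRealm,
                            ProxyHandler, build_opener)

USERNAME, PASSWORD = 'demo', 'secret'   # hypothetical credentials
EXPECTED = 'Basic ' + base64.b64encode(b'demo:secret').decode('ascii')


class AuthHandler(BaseHTTPRequestHandler):
    """Hypothetical local server protected by HTTP Basic authentication."""

    def do_GET(self):
        if self.headers.get('Authorization') == EXPECTED:
            body = b'welcome'
            self.send_response(200)
        else:
            body = b'login required'
            self.send_response(401)
            self.send_header('WWW-Authenticate', 'Basic realm="demo"')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request log


server = HTTPServer(('127.0.0.1', 0), AuthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

manager = HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, url, USERNAME, PASSWORD)   # None matches any realm
opener = build_opener(ProxyHandler({}), HTTPBasicAuthHandler(manager))

# The first attempt gets a 401; the handler retries with an Authorization header.
with opener.open(url, timeout=5) as response:
    body = response.read().decode('utf-8')

server.shutdown()
server.server_close()
print(body)
```

The opener makes two requests behind the scenes, and the caller only sees the final successful response.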

 

Example 2: Proxy

from urllib.error import URLError
from urllib.request import ProxyHandler,build_opener

proxy_handler=ProxyHandler(
    {
        'http':'http://70.165.64.33:48678',
        'https':'https://70.165.64.33:48678'
    }
)
opener=build_opener(proxy_handler)
try:
    response=opener.open('https://www.baidu.com')
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)

Build the handler, create the opener, and open.

You can find proxies to use at this website: https://ip.ihuan.me/
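Public proxies come and go, so here is a sketch that shows what ProxyHandler changes without needing one. FakeProxy is a hypothetical local server standing in for a proxy; it does not forward anything, it only records what it receives. The host example.invalid never resolves, so the response can only have come through the configured proxy address.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

seen = []   # (request target, Host header) pairs received by the fake proxy


class FakeProxy(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a proxy: records the request, returns a fixed body."""

    def do_GET(self):
        seen.append((self.path, self.headers.get('Host')))
        body = b'served by proxy'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request log


server = HTTPServer(('127.0.0.1', 0), FakeProxy)
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy_handler = ProxyHandler({'http': 'http://127.0.0.1:%d' % server.server_port})
opener = build_opener(proxy_handler)

# The connection goes to the proxy address, not to example.invalid.
with opener.open('http://example.invalid/page', timeout=5) as response:
    body = response.read().decode('utf-8')

server.shutdown()
server.server_close()
print(body)
print(seen)
```

The recorded request target is the full URL rather than just the path, which is how an HTTP client addresses a proxy.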

 

Example 3: Cookies

First, how to obtain the cookies of a website:

from http import cookiejar
from urllib import request
cookie=cookiejar.CookieJar()#Create a CookieJar object first
handler=request.HTTPCookieProcessor(cookie)#Build handler
opener=request.build_opener(handler)#Building opener
response=opener.open('http://www.baidu.com')
for item in cookie:
    print(item.name+'='+item.value)
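The same opener also sends stored cookies back on later requests. The sketch below illustrates this with a hypothetical local server, CookieServer, which sets a made-up cookie session=abc123 and echoes whatever Cookie header it receives.

```python
import threading
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class CookieServer(BaseHTTPRequestHandler):
    """Hypothetical local server: sets a cookie and echoes back any Cookie header."""

    def do_GET(self):
        body = self.headers.get('Cookie', 'no cookie sent').encode('utf-8')
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123; Path=/')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request log


server = HTTPServer(('127.0.0.1', 0), CookieServer)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

cookie = cookiejar.CookieJar()
opener = request.build_opener(request.ProxyHandler({}),
                              request.HTTPCookieProcessor(cookie))

with opener.open(url, timeout=5) as response:
    first = response.read().decode('utf-8')    # jar was empty, nothing sent
with opener.open(url, timeout=5) as response:
    second = response.read().decode('utf-8')   # stored cookie is sent back

server.shutdown()
server.server_close()
stored = {item.name: item.value for item in cookie}
print(first, '|', second, '|', stored)
```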

 

We can also output it as a file:

# Save the cookies to the file cookies.txt
from http import cookiejar
from urllib import request
filename='cookies.txt'
cookie=cookiejar.MozillaCookieJar(filename)
handler=request.HTTPCookieProcessor(cookie)
opener=request.build_opener(handler)
response=opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True,ignore_expires=True)

Here CookieJar needs to be replaced with MozillaCookieJar, which is used when generating files. It is a subclass of CookieJar that can handle cookie-related file operations such as reading and saving cookies, and it saves cookies in the cookies.txt format of the Mozilla browser.

 

You can also generate a file in the LWP (libwww-perl) cookie format. When creating the CookieJar, change it to:

 cookie=cookiejar.LWPCookieJar(filename)

 

To read and use cookies from a file:

cookie=cookiejar.LWPCookieJar()
cookie.load('cookies.txt',ignore_discard=True,ignore_expires=True)
handler=request.HTTPCookieProcessor(cookie)
opener=request.build_opener(handler)
response=opener.open('http://www.baidu.com')
print(response.read().decode('utf-8'))

LWPCookieJar is used above, which requires that a cookie file in the LWPCookieJar format has already been generated.

Of course, you can also use:

cookie=cookiejar.MozillaCookieJar()

This requires that a cookie file of the corresponding type already exists in the same folder as the program.
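The save and load steps for both file formats can be tried as a round trip without visiting any site. In this sketch the cookie is built by hand with a hypothetical helper, make_cookie, as a stand-in for one received from a website, and the files go into a temporary folder.

```python
import os
import tempfile
from http import cookiejar


def make_cookie(name, value):
    """Build a session cookie by hand (stand-in for one received from a site)."""
    return cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain='example.com', domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


results = {}
with tempfile.TemporaryDirectory() as folder:
    for jar_class in (cookiejar.MozillaCookieJar, cookiejar.LWPCookieJar):
        filename = os.path.join(folder, jar_class.__name__ + '.txt')

        jar = jar_class(filename)
        jar.set_cookie(make_cookie('session', 'abc123'))
        # ignore_discard=True also writes session cookies, which are skipped otherwise
        jar.save(ignore_discard=True, ignore_expires=True)

        # The file is loaded with the same class that wrote it.
        loaded = jar_class()
        loaded.load(filename, ignore_discard=True, ignore_expires=True)
        results[jar_class.__name__] = [(c.name, c.value) for c in loaded]

print(results)
```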

 

 

Tags: Python Windows encoding JSON

Posted on Sat, 07 Mar 2020 21:16:11 -0500 by sv4rog