Automatic login in Python for websites that require a verification code

This example logs in to a website that requires a username, password, and verification code. It uses Python's urllib2 to log in to the site directly and cookielib to handle the site's cookies.
I had heard that Python is very convenient for writing web crawlers. Recently my workplace needed exactly that: logging in to the XX website to download some documents. So I tried it myself, and the results were good.

How cookies work:
Cookies are generated by the server and sent to the browser, which stores them locally (for example, in a text file). The next time the browser requests the same website, it sends the cookies back, so the server can tell whether the user is already authenticated or needs to log in again.

Python provides the basic cookielib library. Cookies are saved automatically the first time a page is visited, and every later page is then requested with the logged-in cookies.
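
As a minimal sketch of this idea, the snippet below builds an opener whose cookie jar is filled automatically from the server's responses and whose cookies are sent back on later requests. It is written for Python 3, where cookielib and urllib2 became http.cookiejar and urllib.request. The helper name build_cookie_opener is hypothetical, and no request is actually sent.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return an opener together with the cookie jar it stores cookies in."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor saves cookies from responses into the jar and
    # attaches matching cookies to every later request made by this opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = build_cookie_opener()
# Before any page has been fetched, the jar holds no cookies.
print(len(jar))
```

Every page fetched with opener.open(...) afterwards shares the same jar, which is what keeps the session logged in.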

Principle:

(1) Enable cookie support
(2) Get past hotlink protection by masquerading as a browser
(3) Request the verification code link and download the verification code image locally
(4) Recognize the verification code. Many recognition schemes are available online, and Python has its own image processing libraries; this example calls the OCR recognition interface of the LocoySpider ("locomotive") collector.
(5) Handle the form: use a capture tool such as Fiddler to find the parameters that must be submitted
(6) Build the data to submit, generate the HTTP request, and send it
(7) Determine whether the login succeeded based on the returned JS page
(8) Download other pages after a successful login
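
Steps (2), (5), and (6) can be sketched roughly as follows, again in Python 3 rather than the Python 2 used in the full listing. The form field names are the ones that appear in the code in this article; the function name build_login_request, the example.com URL, and the sample credentials are hypothetical. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(url, username, password, authcode):
    """Build (but do not send) a POST request carrying the login form."""
    fields = {
        "login_id": username,
        "opl": "op_login",
        "login_passwd": password,
        "login_check": authcode,
    }
    # Form data must be URL-encoded and converted to bytes for a POST body.
    body = urlencode(fields).encode("utf-8")
    # Browser-like headers, so the request does not look like a script.
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
        "Referer": url,
    }
    return Request(url, data=body, headers=headers)


req = build_login_request("http://example.com/login.jsp", "alice", "secret", "1234")
print(req.get_method())
print(req.data)
```

Because the request has a body, urllib treats it as a POST; passing it to an opener that has a cookie processor attached would then store the session cookie.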

In this example, several accounts take turns logging in, and each account downloads three pages per login.
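
The rotation can be pictured with a small planning function. This is only an illustration in Python 3 under assumed names (plan_downloads, user_a, user_b, and the id range 1 to 8 are all hypothetical); it performs no logins or downloads and just shows which account would fetch which record ids.

```python
from itertools import cycle


def plan_downloads(accounts, start_id, end_id, pages_per_login=3):
    """Assign record ids to accounts in rotation, a few ids per login."""
    plan = []
    next_id = start_id
    # cycle() repeats the account list endlessly; stop once all ids are assigned.
    for account in cycle(accounts):
        if next_id > end_id:
            break
        last = min(next_id + pages_per_login, end_id + 1)
        plan.append((account, list(range(next_id, last))))
        next_id += pages_per_login
    return plan


for account, ids in plan_downloads(["user_a", "user_b"], 1, 8):
    print(account, ids)
```

With two accounts and ids 1 to 8, the first account takes 1 to 3, the second 4 to 6, and the first account returns for 7 and 8.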

The target website is not disclosed here.

Here is part of the code:

#!/usr/bin/env python
#-*- coding: utf-8 -*-
 
import os
import urllib2
import urllib
import cookielib
import xml.etree.ElementTree as ET
 
 
#-----------------------------------------------------------------------------
# Log in to www.***.com.cn
def ChinaBiddingLogin(url, username, password):
    # Enable cookie support for urllib2
    cookiejar=cookielib.CookieJar()
    urlopener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
    urllib2.install_opener(urlopener)
     
    urlopener.addheaders.append(('Referer', 'http://www.chinabidding.com.cn/zbw/login/login.jsp'))
    urlopener.addheaders.append(('Accept-Language', 'zh-CN'))
    urlopener.addheaders.append(('Host', 'www.chinabidding.com.cn'))
    urlopener.addheaders.append(('User-Agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'))
    urlopener.addheaders.append(('Connection', 'Keep-Alive'))
 
 
    print 'XXX Login......'
 
 
    imgurl=r'http://www.*****.com.cn/zbw/login/image.jsp'
    DownloadFile(imgurl, urlopener)
    authcode=raw_input('Please enter the authcode:')
    #authcode=VerifyingCodeRecognization(r"http://192.168.0.106/images/code.jpg")
 
 
    # Send login/password to the site and get the session cookie
    values={'login_id':username, 'opl':'op_login', 'login_passwd':password, 'login_check':authcode}
    urlcontent=urlopener.open(urllib2.Request(url, urllib.urlencode(values)))
    page=urlcontent.read(500000)
 
 
    # Make sure we are logged in, check the returned page content
    if page.find('login.jsp')!=-1:
        print 'Login failed with username=%s, password=%s and authcode=%s' \
                % (username, password, authcode)
        return False
    else:
        print 'Login succeeded!'
        return True
 
 
#-----------------------------------------------------------------------------
# Download fileUrl and save it to a fixed local path (the verification code image)
# Note: fileUrl must point to a valid file
def DownloadFile(fileUrl, urlopener):
    isDownOK=False
 
    try:
        if fileUrl:
            # The image is binary data, so open the file in 'wb' mode
            outfile=open(r'/var/www/images/code.jpg', 'wb')
            outfile.write(urlopener.open(urllib2.Request(fileUrl)).read())
            outfile.close()
 
            isDownOK=True
        else:
            print 'ERROR: fileUrl is NULL!'
    except IOError:
        # urllib2.URLError is a subclass of IOError
        isDownOK=False
 
    return isDownOK
 
 
#------------------------------------------------------------------------------
# Verification code recognition
def VerifyingCodeRecognization(imgurl):
    url=r'http://192.168.0.119:800/api?'
    user='admin'
    pwd='admin'
    model='ocr'
    ocrfile='cbi'
 
 
    values={'user':user, 'pwd':pwd, 'model':model, 'ocrfile':ocrfile, 'imgurl':imgurl}
    data=urllib.urlencode(values)
 
 
    try:
        url+=data
        urlcontent=urllib2.urlopen(url)
    except IOError:
        print '***ERROR: invalid URL (%s)' % url
        # Without a response there is nothing to parse
        return None
 
 
    page=urlcontent.read(500000)
 
 
    # Parse the xml data and get the verifying code
    root=ET.fromstring(page)
    node_find=root.find('AddField')
    authcode=node_find.attrib['data']
 
 
    return authcode
 
 
#------------------------------------------------------------------------------
# Read users from the configuration file (one "username password" pair per line)
def ReadUsersFromFile(filename):
    users={}
    with open(filename, 'r') as f:
        for eachLine in f:
            info=eachLine.split()
            if len(info)==2:
                users[info[0]]=info[1]
 
    return users
 
 
#------------------------------------------------------------------------------
def main():
    login_page=r'http://www.***.com.cnlogin/login.jsp'
    download_page=r'http://www.***.com.cn***/***?record_id='
 
 
    start_id=8593330
    end_id=8595000
 
 
    now_id=start_id
    Users=ReadUsersFromFile('users.conf')
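    # Rotate through the accounts until every record id up to end_id is downloaded
    while now_id<=end_id:
        for key in Users:
            if now_id>end_id:
                break
            if not ChinaBiddingLogin(login_page, key, Users[key]):
                continue
            # Each account downloads at most three pages per login
            for i in range(3):
                if now_id>end_id:
                    break
                pageUrl=download_page+'%d' % now_id
                urlcontent=urllib2.urlopen(pageUrl)
 
                filepath='./download/%s.html' % now_id
                f=open(filepath, 'w')
                f.write(urlcontent.read(500000))
                f.close()
 
                now_id+=1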
#------------------------------------------------------------------------------
 
 
if __name__=='__main__':
    main()
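
To illustrate the XML parsing step in VerifyingCodeRecognization, here is a small Python 3 sketch. The listing only shows that the code is read from the data attribute of an AddField element; the root element name Result and the sample value x7k2 are assumptions made up for this example, not the OCR service's documented format.

```python
import xml.etree.ElementTree as ET


def extract_authcode(xml_text):
    """Return the 'data' attribute of the first AddField child, or None."""
    root = ET.fromstring(xml_text)
    node = root.find("AddField")
    if node is None:
        # The response did not contain the expected element.
        return None
    return node.attrib.get("data")


# Hypothetical response body; the real service may return other fields too.
sample = '<Result><AddField data="x7k2"/></Result>'
print(extract_authcode(sample))
```

Returning None for an unexpected response lets the caller fall back to typing the code in by hand instead of failing with an exception.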

Tags: Python JSP xml network

Posted on Tue, 17 Mar 2020 14:07:14 -0400 by Browzer