Growth Path of a Python Crawler Engineer (8): Selenium WebDriver

Introduction to Selenium WebDriver

  • Selenium WebDriver is a browser automation tool that drives a real browser, either locally or on a remote machine, so it simulates user behavior more closely than most other approaches.

  • The goal of WebDriver is to provide a well-designed, object-oriented API with better support for the problems of testing modern, advanced web applications.

  • Selenium WebDriver handles dynamic pages well, where elements of the page change without the page itself being reloaded.

Selenium WebDriver principle

  • WebDriver follows a client/server (C/S) design.

  • WebDriver starts the target browser and binds it to a specified port. The launched browser instance then acts as WebDriver's remote server.

  • The client (our script) sends HTTP requests to the server. The server performs the requested operations and returns the result values and other information.

  • On the server side, the remote server relies on a native browser driver component (for example, chromedriver.exe) to wait for the client's requests and respond to them.
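
To make the client/server exchange more concrete, the sketch below builds (but does not send) the kind of HTTP requests a client issues. It assumes the W3C WebDriver wire protocol and a driver server listening on chromedriver's default port 9515. The helper names and the session id "abc123" are made up for illustration.

```python
import json

# Assumption: chromedriver is listening locally on its default port.
SERVER = "http://127.0.0.1:9515"


def new_session_request(browser_name="chrome"):
    """Describe the request that asks the server to start a browser session."""
    body = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return "POST", SERVER + "/session", json.dumps(body)


def navigate_request(session_id, url):
    """Describe the request that tells an existing session to open a URL."""
    return "POST", f"{SERVER}/session/{session_id}/url", json.dumps({"url": url})


# "abc123" is a placeholder; a real id comes back in the new-session response.
method, endpoint, payload = navigate_request("abc123", "http://www.baidu.com")
print(method, endpoint, payload)
```

In normal use the selenium library builds and sends these requests for you; a call such as dr.get(...) corresponds to the second request.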

Selenium WebDriver installation

To install the selenium library, run:

pip install selenium

If output like the following appears, the installation succeeded:

Successfully installed selenium
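
As an extra check, the installation can also be verified from Python itself. This is a minimal sketch; the function name selenium_available is only an illustrative choice.

```python
import importlib.util


def selenium_available():
    """Return True if the selenium package can be imported in this environment."""
    return importlib.util.find_spec("selenium") is not None


if selenium_available():
    import selenium
    print("selenium version:", selenium.__version__)
else:
    print("selenium is not installed; run: pip install selenium")
```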

Download the driver that matches your browser version

chromedriver.exe

http://chromedriver.storage.googleapis.com/index.html

After downloading, put chromedriver.exe in the Scripts directory under the Python installation directory (or in any other directory on your PATH).
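
Placing the driver in the Scripts directory works when that directory is on the PATH, which is how the driver executable gets found. The sketch below checks this with the standard library; find_driver is a hypothetical helper name.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    # On Windows, shutil.which also tries extensions such as .exe.
    return shutil.which(name)


path = find_driver()
print(path if path else "chromedriver not found on PATH")
```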

Using Selenium WebDriver

Browser common operations

Launch browser

Each browser is launched in a slightly different way:

from selenium import webdriver
#Google browser
dr = webdriver.Chrome()
#Firefox
dr = webdriver.Firefox()

Open web page

Running this code opens the given URL in the browser:

dr.get('http://www.baidu.com')

Close browser

You usually need to close the browser when the work is done. There are two ways to do so:

  • close(): closes the current browser window
  • quit(): closes every browser window, ends the WebDriver session, releases the connection to the driver server, and frees all resources

from selenium import webdriver
#Google browser
dr = webdriver.Chrome()
#Open web page
dr.get('http://www.baidu.com')
#Exit browser
dr.quit()
#dr.close()
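
If the script raises an exception before reaching quit(), the browser and driver process can be left running. One way to guard against that is a small context manager, sketched below. The names browser and FakeDriver are illustrative, and FakeDriver only stands in for a real driver so the pattern can be tried without a browser.

```python
from contextlib import contextmanager


@contextmanager
def browser(factory):
    """Create a driver with factory() and always call quit() on it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for webdriver.Chrome, used here only for demonstration."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


with browser(FakeDriver) as dr:
    pass  # with a real driver: dr.get('http://www.baidu.com')
print(dr.closed)
```

With a real browser, pass webdriver.Chrome (or webdriver.Firefox) as the factory instead of FakeDriver.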

Maximize browser

from selenium import webdriver
import time
#Google browser
dr = webdriver.Chrome()
#Open web page
dr.get('http://www.baidu.com')
#Maximize browser
dr.maximize_window()
#Pause so the result can be observed
time.sleep(3)
#Exit browser
dr.quit()
#dr.close()

Custom browser size

from selenium import webdriver
import time
#Google browser
dr = webdriver.Chrome()
#Open web page
dr.get('http://www.baidu.com')

#Set the window to 240 x 320 pixels
dr.set_window_size(240, 320)

#Pause so the result can be observed
time.sleep(3)
#Exit browser
dr.quit()
#dr.close()

Print information for the current page

Print the title, URL, and source code of the current page (the source is long, so its output is not shown here)

from selenium import webdriver
import time
#Google browser
dr = webdriver.Chrome()
#Open web page
dr.get('http://www.baidu.com')

print('current_url:',dr.current_url)
print('source:',dr.page_source)
print('title:',dr.title)

#Pause so the result can be observed
time.sleep(3)
#Exit browser
dr.quit()
#dr.close()
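
Because page_source can be very long, it may help to print only the beginning of it. The helper below is a small sketch; the name preview and the default limit of 200 characters are arbitrary choices.

```python
def preview(text, limit=200):
    """Return text unchanged if it is short, else its first `limit` characters plus a marker."""
    if len(text) <= limit:
        return text
    return text[:limit] + "... [%d more characters]" % (len(text) - limit)


print(preview("<html><head><title>demo</title></head></html>", limit=12))
```

With a live driver this would be used as print('source:', preview(dr.page_source)).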

Browser forward and backward

back() returns to the previous page in the browsing history, and forward() moves to the next page again, undoing a back()

from selenium import webdriver
import time

dr = webdriver.Chrome()

first_url = 'http://www.baidu.com'
dr.get(first_url)
print('current_url',dr.current_url)
time.sleep(1)

second_url = 'http://www.news.baidu.com'
dr.get(second_url)
print('current_url',dr.current_url)
time.sleep(1)

dr.back()
print("backing to %s"%(first_url))

time.sleep(1)
dr.forward()
print("forward to %s"%(second_url))
time.sleep(1)
dr.quit()
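
The back and forward behavior can be pictured as moving an index along a list of visited pages. The class below is a toy model written for this explanation, not part of Selenium.

```python
class History:
    """Toy model of a browser tab's session history."""

    def __init__(self):
        self.pages = []
        self.index = -1

    def get(self, url):
        # Visiting a new page discards any "forward" entries.
        self.pages = self.pages[: self.index + 1] + [url]
        self.index += 1

    def back(self):
        if self.index > 0:
            self.index -= 1

    def forward(self):
        if self.index < len(self.pages) - 1:
            self.index += 1

    @property
    def current_url(self):
        return self.pages[self.index]


h = History()
h.get("first")
h.get("second")
h.back()
print(h.current_url)
h.forward()
print(h.current_url)
```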

Locating a single object

  • find_element() locates a single object.
  • Locating and operating on objects is the core of WebDriver. Every operation depends on first locating the right object, so locating objects correctly is especially important.

Demo document

Open the Baidu homepage → right-click → view page source

WebDriver provides a series of methods for locating objects, including:

by_id

Locates an element by the value of its id attribute

#Locate by id
print(dr.find_element_by_id('head').text)

by_name

Locates an element by the value of its name attribute

#Locate by name
dr.find_element_by_name('mp')

by_class_name

Locates an element by the value of its class attribute

#Locate by class name ('s_ipt' is assumed to be the class of the search box)
dr.find_element_by_class_name('s_ipt')

by_tag_name

Locates an element by its tag name (for example, the tag name of <a> is a)

#Locate by tag name
print(dr.find_element_by_tag_name('div').text)

by_link_text

Locates a link by its visible text

For example, to match the news link on the Baidu homepage:

#Locate by link text; the text must match exactly ('新闻' means News)
dr.find_element_by_link_text("新闻")

by_css_selector

Locates an element with a CSS selector. If you are not familiar with CSS selectors, read an introduction to CSS first, or simply use the methods above.

#Get the first div element
dr.find_element_by_css_selector('div')

by_xpath

Locates an element by its path in the document tree. If you are not familiar with XPath, read an introduction to XPath and the lxml library first, or use the other methods.

dr.find_element_by_xpath('/html/body/form/div/label')
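
The find_element_by_* helpers in this section belong to the Selenium 3 API. They were deprecated in Selenium 4 and removed in release 4.3, where the same lookups go through find_element(by, value). The sketch below maps each helper to its locator strategy string. The strings are the values of the constants in selenium.webdriver.common.by.By, while locate is a hypothetical convenience function.

```python
# Legacy helper suffix -> locator strategy string (the values of By.ID, By.NAME, ...).
STRATEGIES = {
    "id": "id",
    "name": "name",
    "class_name": "class name",
    "tag_name": "tag name",
    "link_text": "link text",
    "css_selector": "css selector",
    "xpath": "xpath",
}


def locate(driver, how, value):
    """Call driver.find_element(...) using a legacy-style strategy name."""
    return driver.find_element(STRATEGIES[how], value)

# With a real Selenium 4 driver:
#   from selenium.webdriver.common.by import By
#   dr.find_element(By.ID, 'head')   # same lookup as locate(dr, 'id', 'head')
```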

get attribute

Get the id attribute of the div tag

dr.find_element_by_tag_name('div').get_attribute('id')

Get the text content of the div tag

print(dr.find_element_by_tag_name('div').text)

Locating a group of objects

find_elements() locates a group of objects, which is useful for batch operations. It takes the same locators as the single-object find_element() described above, but returns a list of all matching elements.
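
As a small illustration of a batch operation, the sketch below collects the text of every link on a page. The function name link_texts is illustrative, and "tag name" is the locator strategy string behind by_tag_name.

```python
def link_texts(driver):
    """Collect the visible text of every <a> element on the current page."""
    # find_elements returns a list; it is empty when nothing matches,
    # whereas find_element raises NoSuchElementException.
    links = driver.find_elements("tag name", "a")
    return [link.text for link in links if link.text]

# With a live driver:
#   for text in link_texts(dr):
#       print(text)
```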

Simulate user actions

element.click()

Simulates a user click on the element

#Click the news link ('新闻' means News)
dr.find_element_by_link_text("新闻").click()

element.send_keys

Simulates the user typing python into the text box named q

dr.find_element_by_name('q').send_keys('python')

element.clear()

Simulates the user clearing the input

dr.find_element_by_name('q').clear()
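
send_keys() appends to whatever the input already contains, so clear() and send_keys() are often used together. The sketch below shows that combination. replace_text and FakeBox are illustrative names, and FakeBox only imitates an input element so the example runs without a browser.

```python
def replace_text(element, text):
    """Clear an input element, then type text into it."""
    element.clear()
    element.send_keys(text)
    return element


class FakeBox:
    """Imitates a text input that already holds some text."""

    def __init__(self, value=""):
        self.value = value

    def clear(self):
        self.value = ""

    def send_keys(self, text):
        self.value += text


box = replace_text(FakeBox("old query"), "python")
print(box.value)
```

With a live driver, the element would come from a call such as dr.find_element_by_name('q').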

Simulate keyboard input

Simulates keys that the user presses on the keyboard

from selenium.webdriver.common.keys import Keys

#Simulate Ctrl+A (select all)
dr.find_element_by_id('p').send_keys(Keys.CONTROL, 'a')
#Simulate Ctrl+C (copy)
dr.find_element_by_id('p').send_keys(Keys.CONTROL, 'c')
#Simulate Ctrl+V (paste)
dr.find_element_by_id('p').send_keys(Keys.CONTROL, 'v')

Tags: Selenium Google Python Firefox

Posted on Fri, 13 Mar 2020 03:36:09 -0400 by m@tt