Python crawler: Selenium page waiting

Selenium page waiting

Cookie operation
Get all cookies
Get a cookie by name
Delete a cookie
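A minimal sketch of the three cookie operations listed above, written as helpers that take an already-created driver. get_cookies, get_cookie and delete_cookie are WebDriver methods; the helper names cookie_map and remove_cookie are made up for illustration.

```python
def cookie_map(driver):
    """Return {name: value} for every cookie of the current page."""
    # get_cookies() returns a list of dicts, one per cookie
    return {cookie["name"]: cookie["value"] for cookie in driver.get_cookies()}


def remove_cookie(driver, name):
    """Delete the cookie called `name`; return its old value, or None if absent."""
    # get_cookie(name) returns one cookie dict, or None when no such cookie exists
    cookie = driver.get_cookie(name)
    if cookie is None:
        return None
    driver.delete_cookie(name)
    return cookie["value"]
```

With a live session, these would be called after driver.get has loaded a page, since cookies belong to the current domain.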
Page waiting
Selenium was not designed for crawlers, but because what you see is what you can crawl, it is a convenient way to scrape data. Its drawback is speed: it has to open the page and load the corresponding elements. There are two reasons a page needs to wait. First, after a page opens, its internal elements load gradually, so looking up an element immediately after opening the page raises an exception. Second, many pages load data with Ajax. On 12306, for example, data is loaded only after you select the date, enter the departure and destination, and click query.
Previously, page waiting used Python's built-in time module: import time and call time.sleep() to force a wait. The drawback is that a wait that is too long wastes time, while a wait that is too short raises an error because the page has not finished loading. Either way, the program is hard to tune.
More and more web pages now use Ajax, so the program cannot determine when an element has finished loading. If the page loads so slowly that a DOM element has not yet appeared when your code tries to use it, Selenium raises NoSuchElementException. To solve this problem, Selenium provides two waiting methods: implicit waiting and explicit waiting.
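The difference between a forced sleep and a conditional wait can be sketched without a browser. The helper below is hypothetical and not part of Selenium; it only illustrates the polling idea that both Selenium waits rely on: check repeatedly, return as soon as the check succeeds, and give up after a maximum time.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, or raises TimeoutError
    once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # ready early: no time is wasted
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # short pause between checks
```

Unlike time.sleep(10), which always costs 10 seconds, wait_until(check, timeout=10) costs only as long as the page actually needs.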

Implicit waiting

An implicit wait is set by calling driver.implicitly_wait. With driver.implicitly_wait(10), Selenium waits up to 10 seconds when an element is not yet available, but it does not necessarily wait the full 10 seconds: execution continues as soon as the element has loaded and been located. If the element is still not found after 10 seconds, an exception is raised. time.sleep, by contrast, always waits the full time that was set.

from selenium import webdriver
import time

driver = webdriver.Edge()
# The earlier approach: a fixed wait with the time module

# Implicit wait
driver.implicitly_wait(10)
# This sets a maximum wait of 10 seconds. The script does not always wait the full 10 seconds: once the input box is located, the data is entered
# If the located tag does not exist, an error is reported after 10 seconds
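The skeleton above can be filled in roughly as follows. This is a sketch under assumptions: type_into is a hypothetical helper, the URL and element id are supplied by the caller, and the locator strategy is passed as the plain string "id", which is the value of Selenium's By.ID constant.

```python
def type_into(driver, url, element_id, text, max_wait=10):
    """Open `url` and type `text` into the element whose id is `element_id`."""
    # Set once; it applies to every later find_element call on this driver
    driver.implicitly_wait(max_wait)
    driver.get(url)
    # Returns as soon as the element exists; raises NoSuchElementException
    # if it is still missing after max_wait seconds
    box = driver.find_element("id", element_id)  # "id" is the value of By.ID
    box.send_keys(text)
    return box
```

With a real browser this would be called as type_into(webdriver.Edge(), some_url, some_id, 'python'), where some_url and some_id stand for a page and an input box of your choice.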

Take 12306 as an example. A person first enters the origin, destination and departure date, and then clicks query, and a Selenium script that simulates a person follows the same steps. After the site opens, a pop-up window appears; you can locate the "OK" button or the "cross" in the pop-up and click it. In a normal browser, the page is cached and the origin and destination are prefilled with the previously entered values, so you can click query directly. In a browser opened by Selenium, the origin and destination are empty, and clicking query only prompts for input.
Click the cross to close the pop-up window

Click OK to close the pop-up window

Locate and click the query button.

Explicit wait

An explicit wait does not perform the element lookup until a given condition holds. You also specify a maximum time; if the condition is still not met when that time is exceeded, an exception (TimeoutException) is raised. Explicit waits are written with WebDriverWait together with the expected conditions in selenium.webdriver.support.expected_conditions.
An explicit wait sets a condition and a maximum time. As soon as the condition is met, the wait ends and execution continues; if it is never met, the script waits until the time runs out and an error is raised.
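To make the mechanism visible, here is a sketch of a hand-written condition. WebDriverWait.until accepts any callable that takes the driver and returns a truthy value when the wait should end. value_contains and wait_for_stations are hypothetical names; value_contains is only a simplified stand-in for EC.text_to_be_present_in_element_value, and the element ids are the 12306 ones used in this article.

```python
def value_contains(locator, text):
    """Build a wait condition: true once the located element's value contains `text`."""
    def condition(driver):
        value = driver.find_element(*locator).get_attribute("value")
        return value is not None and text in value
    return condition


def wait_for_stations(driver, origin, destination, timeout=1000):
    """Block until both station fields hold the expected text."""
    # Imported here so the sketch can be loaded even without Selenium installed
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # until() polls the condition and raises TimeoutException after `timeout` seconds
    wait.until(value_contains(("id", "fromStationText"), origin))
    wait.until(value_contains(("id", "toStationText"), destination))
```

In practice the ready-made conditions in expected_conditions are usually enough; a custom callable is useful only when none of them fits.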

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Edge()
# Implicit wait, wait for the page to load

# Handle the pop-up window
# Find the cross and click it
# driver.find_element(By.ID, 'gb_closeDefaultWarningWindowDialog_id').click()
# Or find OK and click it

# Explicit wait: it needs the imports above and a condition, and execution continues only once the condition is met
# Wait for the origin input; fromStationText is the id of the origin field
# Wait for the user to type Changsha into the field. If nothing or something else is typed, the script keeps waiting until the timeout
WebDriverWait(driver, 1000).until(
    EC.text_to_be_present_in_element_value((By.ID, 'fromStationText'), 'Changsha'))
# Wait for the destination input; toStationText is the id of the destination field
# Wait for the user to enter the destination Beijing. The maximum wait is 1000 seconds; after that, an error is raised
WebDriverWait(driver, 1000).until(
    EC.text_to_be_present_in_element_value((By.ID, 'toStationText'), 'Beijing'))
# Query button
que_btn = driver.find_element(By.ID, 'query_ticket')
# Click query
# If a normal click gets no response, the click can also be executed with JavaScript
# Pass the located element que_btn in as an argument; arguments[0] refers to the first argument, que_btn
driver.execute_script('arguments[0].click()', que_btn)
# An explicit wait is conditional: execution continues only once the conditions are met.
# After the user enters the origin Changsha and the destination Beijing, the query button is clicked automatically.
# With any other input, the script stays on the input page until the timeout expires and an error is raised
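The arguments mechanism used for the JavaScript click can be sketched as two small helpers. js_click and js_set_value are hypothetical names; execute_script itself is a WebDriver method, and every extra argument passed to it is exposed to the script as arguments[0], arguments[1], and so on.

```python
def js_click(driver, element):
    """Click `element` by running JavaScript in the page instead of element.click()."""
    # `element` becomes arguments[0] inside the script
    driver.execute_script("arguments[0].click();", element)


def js_set_value(driver, element, text):
    """Set the value of an input element through JavaScript."""
    # `element` is arguments[0], `text` is arguments[1]
    driver.execute_script("arguments[0].value = arguments[1];", element, text)
```

A JavaScript click bypasses the checks Selenium makes for a real user click (visibility, overlap by other elements), which is why it can succeed where element.click() gets no response.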

Summary: this case covered how to implement implicit and explicit waits, the scenarios each one suits, and how to perform a click with JavaScript.
Some other waiting conditions:
presence_of_element_located: an element is present in the DOM.
presence_of_all_elements_located: at least one element matching the locator is present; all matching elements are returned.
element_to_be_clickable: an element is visible and enabled, so it can be clicked.

For more conditions, please refer to:

Open multiple windows and switch pages

Sometimes a window has several tabs, and you need to switch between them. Selenium provides driver.switch_to.window for this (older code uses the deprecated switch_to_window). The handle of the page to switch to is found in driver.window_handles.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Edge()
# driver.get('')
# Calling driver.get twice opens Baidu first and then replaces it with Douban in the same window
# To open multiple windows, use JavaScript; execute_script executes JS code
# Mind the quotation marks: single quotes on the outside, double quotes on the inside

# Two pages are now open. By default, operations act on the first page

# driver.find_element(By.ID, 'kw').send_keys('python')
# driver.close()  # This acts on the first page

# To operate on the second page, switch to it
# To a person, the view is already on the second page, but Selenium is still on the first
print('Before switching:', driver.current_url)  # Before switching:
# Treat the tabs as a list: -1 is the last one, and index 1 also works here
# The outdated switch_to_window method does the same thing in older Selenium versions
driver.switch_to.window(driver.window_handles[-1])
print('After switching:', driver.current_url)  # After switching:
# Calling close at this point closes the second page
# It is best to avoid close. After closing a page, switch first before opening another one
# Switch Selenium back to the first tab and then open the page, otherwise an error occurs
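The open, switch, close and switch-back sequence can be wrapped in two helpers. This is a sketch: the function names are hypothetical, and like the code above it assumes the newest window is the last entry of driver.window_handles, which is the usual behavior but is not guaranteed by the WebDriver specification.

```python
def open_in_new_window(driver, url):
    """Open `url` in a new tab or window via JavaScript and switch to it."""
    # Passing the URL as arguments[0] avoids nesting quotes inside the script string
    driver.execute_script("window.open(arguments[0]);", url)
    # Selenium stays on the old window until told otherwise
    driver.switch_to.window(driver.window_handles[-1])
    return driver.current_window_handle


def close_and_return_to_first(driver):
    """Close the current window, then switch back to the first one."""
    driver.close()
    # After close() the driver still points at the closed window,
    # so switch before issuing any other command
    driver.switch_to.window(driver.window_handles[0])
    return driver.current_window_handle
```

Keeping the switch inside the same helper as close makes it harder to forget, which is the error case the comments above warn about.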

Tags: Python Selenium crawler Python crawler

Posted on Fri, 19 Nov 2021 11:28:08 -0500 by jskywalker