Automatic sign in with Python crawler -- selenium mode

Automatic check in with Python crawler (1) -- selenium mode

tags: Python, crawler, automatic check-in, selenium, XPath

Automatic check-in is a simple application of crawling, but that does not make it any less practical. There will be two posts in total, one implemented with selenium and one with requests. Each approach has its advantages and disadvantages, and you can choose according to your actual situation. It also works well as crawler practice.

What are the application scenarios

There are many application scenarios for automatic check-in, such as Tieba check-in, forum check-in, daily website check-in, and even the scenario I will use next: gym check-in.

Due to the epidemic, the gym I usually go to was closed for nearly three months, so it put forward a rather harsh compensation scheme: check in on its website every day for four consecutive months and get an extra four months for free. It is also meant to filter out people who rarely go to the gym, but it is very unfriendly to most regulars, who are busy and still have to remember to check in every day. Can such a small thing stump a programmer?

What is selenium

Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, etc. Selenium is a complete web application testing system that includes the Selenium IDE, Selenium Remote Control, and Selenium Grid.
Selenium drives a real browser, serves as an automated testing tool, and supports multiple browsers. In crawlers, it is mainly used to solve JavaScript rendering problems.

In short, selenium is a tool that lets you use code to simulate operating a browser. Here is some code to get a feel for it:

from selenium import webdriver  # import the library

browser = webdriver.Chrome()  # create a Chrome browser instance
url = 'https://www.baidu.com'
browser.get(url)  # open the preset URL in the browser
print(browser.page_source)  # print the page source
browser.close()  # close the browser

When the code above runs, it automatically opens the Chrome browser, loads the Baidu home page, prints its source code, and then closes the browser. Simple and clear, isn't it?
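
One detail worth a sketch: close() only closes the current window, while quit() ends the whole browser session. The helper below is my own illustration (the name fetch_page_source is hypothetical, not part of selenium). It takes an already-created driver and always calls quit(), even when loading the page fails.

```python
def fetch_page_source(driver, url):
    """Load url in an existing WebDriver and return the page source.

    quit() runs in the finally block, so the browser and its driver
    process are shut down even if get() raises.
    """
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


# Hypothetical usage (needs Chrome and a matching driver installed):
# from selenium import webdriver
# html = fetch_page_source(webdriver.Chrome(), 'https://www.baidu.com')
```

Passing the driver in, instead of creating it inside the function, also makes it easy to switch browsers later.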

Installation is also very simple, just like any other Python library:

pip install selenium

Web page analysis

One of the unavoidable steps for a crawler is analyzing the page. First, here is the page to be crawled: link

This is a mobile web page, not optimized for desktop, which is one of the reasons I chose selenium first.

The first step is the web login.

Two input boxes and one button, so three lines of code:

from selenium.webdriver.common.by import By

# Find the input boxes and send the user name and password to simulate login.
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which was removed in Selenium 4.3.
browser.find_element(By.XPATH, '//*[@id="login_box"]/ul/li[1]/div/input').send_keys(username)
browser.find_element(By.XPATH, '//*[@id="login_box"]/ul/li[2]/div/input').send_keys(password)
# After entering the user name and password, click the login button
browser.find_element(By.XPATH, '//*[@id="login_box"]/div[2]/button[1]').click()

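I use XPath to locate page elements. selenium supports many other locator strategies as well, passed as the first argument of find_element (the older find_element_by_* helpers were removed in Selenium 4.3), such as:

browser.find_element(By.CLASS_NAME, ...)
browser.find_element(By.CSS_SELECTOR, ...)
browser.find_element(By.ID, ...)
browser.find_element(By.NAME, ...)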

But I have to say, the combination of XPath and Chrome is really pleasant to use: Chrome's developer tools can copy an element's XPath directly.
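
To see what the XPath expressions for the login form actually select, here is a minimal sketch using the standard library's xml.etree.ElementTree, which supports a small subset of XPath. The markup is a made-up stand-in for the login form (the real page is not reproduced in this post), and the name attributes are my own assumption.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the login form; the real page's markup is not shown here.
PAGE = """
<body>
  <div id="login_box">
    <ul>
      <li><div><input name="username"/></div></li>
      <li><div><input name="password"/></div></li>
    </ul>
    <div><p>tips</p></div>
    <div><button>Login</button><button>Register</button></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE.strip())

# ElementTree paths are relative, so the leading '//' becomes './/'.
user_input = root.find('.//*[@id="login_box"]/ul/li[1]/div/input')
password_input = root.find('.//*[@id="login_box"]/ul/li[2]/div/input')
login_button = root.find('.//*[@id="login_box"]/div[2]/button[1]')

print(user_input.get("name"))      # username
print(password_input.get("name"))  # password
print(login_button.text)           # Login
```

li[1] and li[2] pick the first and second list item, and div[2]/button[1] picks the first button inside the second div child of the login box. Indexes in XPath start at 1, not 0.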

After that, the rest of the operation is not difficult: keep selecting elements, clicking, jumping, and clicking. One thing to pay attention to is using selenium's waits sensibly, so that the page's clicks and jumps go smoothly.

Three waiting methods of selenium

Forced wait

The first is the simplest and crudest way: forced waiting. time.sleep(2) waits 2 seconds whether or not the browser has finished loading.

In the beginning, you can use this method to get the program running first, and then optimize it.

Implicit wait

The second way is implicit waiting: implicitly_wait(xx).

Implicit waiting sets a maximum waiting time for locating elements. If the element is found within that time, the next step runs immediately; otherwise find_element keeps retrying until the time expires and then raises NoSuchElementException.
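
A minimal sketch of where the implicit wait goes: it is set once on the driver and then applies to every later element lookup. The helper name open_with_implicit_wait and the 10-second default are my own choices for illustration.

```python
def open_with_implicit_wait(driver, url, seconds=10):
    """Set a driver-wide implicit wait, then load url.

    After this call, every find_element on this driver keeps retrying
    for up to `seconds` before raising NoSuchElementException.
    """
    driver.implicitly_wait(seconds)
    driver.get(url)
    return driver


# Hypothetical usage (needs Chrome and a matching driver installed):
# from selenium import webdriver
# browser = open_with_implicit_wait(webdriver.Chrome(), url, seconds=5)
```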

Note that there are drawbacks: the timeout is set once and applies to every element lookup for the rest of the session, and it only waits for an element to be present in the DOM, not for it to be visible or clickable. What if I want to go on only when the element I want is actually ready, or wait for something other than an element? There is a way: another waiting method provided by selenium, explicit waiting.

Explicit wait

The third way is explicit waiting with WebDriverWait. Together with this class's until() and until_not() methods, it can wait flexibly according to a condition. The idea is: the program checks every xx seconds; if the condition holds, it executes the next step; otherwise it keeps waiting until the maximum time is exceeded, and then a TimeoutException is thrown.
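
The polling idea itself fits in a few lines of plain Python. This is only a sketch of the mechanism, not selenium's actual code: wait_until is a hypothetical name, the condition takes no arguments, and it raises the built-in TimeoutError instead of selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout, poll_frequency=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


# Demo: a condition that only becomes true on the third check.
calls = []


def ready_on_third_call():
    calls.append(1)
    return "ready" if len(calls) >= 3 else False


result = wait_until(ready_on_third_call, timeout=2, poll_frequency=0.01)
print(result, len(calls))  # ready 3
```

Compared with a fixed time.sleep, the loop returns as soon as the condition holds, so the worst case is the timeout while the usual case is much shorter.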

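Applied to our code, it looks like this:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Wait up to 3 seconds, checking every 0.3 seconds, for the user name input to appear in the DOM
locator = (By.XPATH, '//*[@id="login_box"]/ul/li[1]/div/input')
WebDriverWait(browser, 3, 0.3).until(EC.presence_of_element_located(locator))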

The WebDriverWait class is as follows:

class WebDriverWait(object):
    def __init__(self, driver, timeout, poll_frequency=POLL_FREQUENCY, ignored_exceptions=None):
        """Constructor, takes a WebDriver instance and timeout in seconds.

           :Args:
            - driver - Instance of WebDriver (Ie, Firefox, Chrome or Remote)
            - timeout - Number of seconds before timing out
            - poll_frequency - sleep interval between calls
              By default, it is 0.5 second.
            - ignored_exceptions - iterable structure of exception classes ignored during calls.
              By default, it contains NoSuchElementException only.

           Example:
            from selenium.webdriver.support.ui import WebDriverWait \n
            element = WebDriverWait(driver, 10).until(lambda x: x.find_element_by_id("someId")) \n
            is_disappeared = WebDriverWait(driver, 30, 1, (ElementNotVisibleException)).\ \n
                        until_not(lambda x: x.find_element_by_id("someId").is_displayed())
        """

until

method: during the waiting period, this passed-in callable is called at every polling interval until its return value is no longer falsy
message: if the wait times out, a TimeoutException is raised and message is passed to it
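
Since the method argument is just a callable that receives the driver, you can write your own conditions. Here is a small sketch: text_present is a hypothetical helper of mine, not part of selenium's expected_conditions. It builds a condition that is satisfied once the located element has non-empty text.

```python
def text_present(by, value):
    """Build a wait condition: return the element once it has non-empty text."""

    def condition(driver):
        element = driver.find_element(by, value)
        # Falsy result means "keep waiting"; the element itself means "done".
        return element if element.text.strip() else False

    return condition


# Hypothetical usage with a real driver:
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.ui import WebDriverWait
# element = WebDriverWait(browser, 3, 0.3).until(text_present(By.ID, 'name-a'))
```

If the element is not in the page yet, find_element raises NoSuchElementException, which WebDriverWait ignores by default, so the wait simply keeps polling.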

until_not

The opposite of until: until continues execution when an element appears or a condition is met,
while until_not continues when an element disappears or a condition no longer holds. The parameters are the same, so I will not repeat them.

Final implementation

Knowing all this, we can optimize our code and replace most of the time.sleep calls with explicit waits. The final code is as follows:

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# url = "xxxxxx"  # address of the check-in site; must be set before calling AutoSign
# username = "xxxxxx"
# password = "xxxxxx"


# Simulate a browser opening the website
def AutoSign(username, password):
    chrome_options = Options()
    # Use the browser without an interface (headless)
    chrome_options.add_argument('--headless')
    browser = webdriver.Chrome(options=chrome_options)
    browser.get(url)
    locator = (By.XPATH, '//*[@id="login_box"]/ul/li[1]/div/input')
    # Originally a fixed 2-second delay so the page could load its elements and
    # later lookups would not fail; replaced by an explicit wait
    # time.sleep(2)
    WebDriverWait(browser, 3, 0.3).until(EC.presence_of_element_located(locator))
    # Find the input boxes and send the user name and password to simulate login
    browser.find_element(By.XPATH, '//*[@id="login_box"]/ul/li[1]/div/input').send_keys(username)
    browser.find_element(By.XPATH, '//*[@id="login_box"]/ul/li[2]/div/input').send_keys(password)
    # After entering the user name and password, click the login button
    browser.find_element(By.XPATH, '//*[@id="login_box"]/div[2]/button[1]').click()
    WebDriverWait(browser, 3, 0.3).until(EC.visibility_of_element_located((By.ID, 'name-a')))
    time.sleep(0.5)
    browser.execute_script("window.scrollTo(0,1000);")
    # Click the sign-in entry on the page shown after login to jump to the sign-in page
    browser.find_element(By.XPATH, '//*[@id="member"]/div[6]/div/div[2]/div[1]/div/div/h3').click()
    # time.sleep(1)
    # browser.execute_script("window.scrollTo(0,1000);")
    # time.sleep(1)
    browser.find_element(By.XPATH, '//*[@id="member"]/div[6]/div/div[2]/div[2]/div/div/a[1]/div').click()
    # time.sleep(2)
    WebDriverWait(browser, 2, 0.3).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="sign"]/div[3]/div')))
    # Click the sign-in button to perform the check-in
    browser.find_element(By.XPATH, '//*[@id="sign"]/div[3]/div').click()
    time.sleep(0.5)
    # This print proves little. To really check whether the script succeeded,
    # wrap the call in try/except and handle the exception
    print("Sign in succeeded")
    # Script ran successfully, exit the browser
    browser.quit()

All the conditions and poll_frequency values in the code above need to be adjusted flexibly according to how the page actually loads, including the network conditions, in order to find the best settings.
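
As the comment near the final print suggests, try/except is a more honest way to tell whether the run succeeded. A minimal sketch, where run_sign_in is a hypothetical wrapper of mine that takes the sign-in function as an argument:

```python
def run_sign_in(sign_in, username, password):
    """Run sign_in(username, password) and report whether it raised."""
    try:
        sign_in(username, password)
    except Exception as exc:  # e.g. selenium's TimeoutException or NoSuchElementException
        print(f"Sign in failed: {type(exc).__name__}: {exc}")
        return False
    print("Sign in succeeded")
    return True


# Hypothetical usage with the function from this post:
# run_sign_in(AutoSign, username, password)
```

Because any failed wait or lookup raises an exception, a False result here means some step of the flow did not complete, and the message says which exception stopped it.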

Other related materials have been synced to my blog. You are welcome to visit my personal blog to learn more.

15 June 2020, 01:11 | Views: 9082
