Personal blog https://blog.fmujie.cn/
Simply put, a teacher at the library has a project for a company that needs a retrieval system, and the data for it has to be crawled from the Internet. Accounts for the data source are expensive, so buying another one wasn't worth it, and the plan was to find a senior student to do the job. The senior is busy preparing for the postgraduate entrance exam (● '◡ ●), so it landed on me. Ha ha! At first I thought crawling would be easy, and then I actually started doing it. Still, an automated Python crawler with Selenium feels like cheating: it controls the browser for you and grabs the data efficiently. As you can imagine, it gets a bit slow once you crawl a lot (later on I cried over how much I had to do).
UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
pip install selenium
I use Windows and the Chrome browser, so I downloaded chromedriver_win32. After extracting it you get a .exe executable. Add its path to the system's PATH environment variable and you are set (this step is optional, since the code below passes the path explicitly).
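Whether the driver is actually visible through the environment variable can be checked from Python before starting a browser. A small sketch using only the standard library; `resolve_driver_path` is a hypothetical helper, not part of Selenium:

```python
import shutil


def resolve_driver_path(explicit_path=None, name="chromedriver"):
    """Return a driver path, or None if nothing was found.

    An explicitly given path wins; otherwise PATH is searched.
    On Windows, shutil.which also tries the .exe extension.
    """
    if explicit_path:
        return explicit_path
    return shutil.which(name)


path = resolve_driver_path()
if path is None:
    print("chromedriver not found on PATH; pass its location explicitly")
else:
    print("using driver at", path)
```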
<input type="text" name="user-name" id="passwd-id" />
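# Find the element by its id attribute
element = driver.find_element_by_id("passwd-id")
# Find the element by its name attribute
element = driver.find_element_by_name("user-name")
# Find all elements with the given tag name (returns a list)
elements = driver.find_elements_by_tag_name("input")
# XPath also works and is the recommended approach
element = driver.find_element_by_xpath("//input[@id='passwd-id']")
# Note: Selenium 4.3+ removed the find_element_by_* helpers;
# use driver.find_element(By.ID, "passwd-id") and friends there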
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from lxml import etree

# Path to the chromedriver executable
driver_path = r"Your chromedriver.exe position"
# Start Chrome through the driver (create the browser object)
# (Selenium 3 style; Selenium 4 passes the path through a Service object instead)
driver = webdriver.Chrome(executable_path=driver_path)
# Open the Baidu homepage as a simple first test
driver.get('https://www.baidu.com/')
# Set the width and height of the browser window
driver.set_window_size(1500, 1000)
# Implicit wait: gives the page time to load on a slow network,
# so that the source grabbed later is not incomplete
driver.implicitly_wait(3)
# Explicit wait (the smarter kind): wait up to 5 seconds for an element to appear.
# 'kwssss' is presumably a deliberately wrong id, so the wait times out
# and the exception is printed
try:
    element = WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.ID, 'kwssss'))
    )
    print(element)
except TimeoutException as ex:
    print(ex)
# Grab the page source and parse it with lxml
html = etree.HTML(driver.page_source)
# Baidu test: grab the input box. send_keys types text into the input;
# wait three seconds, then clear it
# inputTag = driver.find_element_by_id('kw')
# inputTag = driver.find_element_by_name('wd')
# inputTag = driver.find_element_by_class_name('s_ipt')
inputTag = driver.find_element(By.ID, 'kw')
inputTag.send_keys('python')
time.sleep(3)
inputTag.clear()
# Alternatively, skip the clear, grab the search button and click it to search
# submitBtn = driver.find_element(By.ID, 'su')
# submitBtn = driver.find_element_by_xpath("//*[@id='su']")
# submitBtn.click()
# Selenium action chain test
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

inputTag = driver.find_element(By.ID, 'kw')              # Grab the input box by id
submitBtn = driver.find_element(By.ID, 'su')             # Grab the search button by id
actions = ActionChains(driver)                           # Create the action chain
actions.move_to_element(inputTag)                        # Move the mouse over the input box
actions.send_keys_to_element(inputTag, 'Python')         # Type the text
actions.move_to_element(submitBtn)                       # Move the mouse over the search button
actions.click(submitBtn)                                 # Click it
actions.perform()                                        # Run the chain; nothing above executes until this call
# Extract data using XPath syntax
html.xpath("Xpath Route/text()")
1. If you only want to parse data out of the page, it is recommended to hand the page source to lxml for parsing. lxml is implemented in C, so parsing is more efficient.
2. If you want to operate on an element, such as entering a value in a text box or clicking a button, you must find the element with the methods Selenium provides.
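Point 1 can be tried without a browser at all. The sketch below hands a hard-coded HTML string to lxml (the input tag from earlier, wrapped in a made-up form with a made-up label); in the real crawler the string would be `driver.page_source`:

```python
from lxml import etree

# Stand-in for driver.page_source; the <form> and <label> are made up.
page_source = """
<html><body>
  <form>
    <label>User name</label>
    <input type="text" name="user-name" id="passwd-id" />
  </form>
</body></html>
"""

html = etree.HTML(page_source)

# Attribute values come back as a list of strings
names = html.xpath("//input[@id='passwd-id']/@name")
# text() returns the text nodes of the matched elements
labels = html.xpath("//label/text()")

print(names)
print(labels)
```

The results are plain Python strings, so nothing here can be clicked or typed into; that is what point 2 needs Selenium for.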
Common form elements:
input with type = 'text/password/email/number'
button, input[type='submit']
checkbox: input[type='checkbox']
select: drop-down list
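Each of these element types can be matched with a short XPath expression. A sketch on a made-up form, parsed with lxml so that it runs without a browser; the field names and the dictionary labels are only for illustration, and the same expressions can be given to Selenium's XPath lookup when the element has to be operated on:

```python
from lxml import etree

# Made-up form containing one or two of each common element type
form_html = """
<form>
  <input type="text" name="user" />
  <input type="password" name="pwd" />
  <input type="checkbox" name="remember" />
  <select name="city"><option>A</option><option>B</option></select>
  <input type="submit" value="Go" />
  <button type="button">Cancel</button>
</form>
"""

form = etree.HTML(form_html)

xpaths = {
    "text-like inputs": (
        "//input[@type='text' or @type='password' "
        "or @type='email' or @type='number']"
    ),
    "buttons": "//button | //input[@type='submit']",
    "checkboxes": "//input[@type='checkbox']",
    "drop-down lists": "//select",
}

# Count how many elements each expression matches
counts = {label: len(form.xpath(expr)) for label, expr in xpaths.items()}
for label, n in counts.items():
    print(label, n)
```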
#####To be continued
#####git full project link