
[Python crawler] A detailed guide to Selenium, from introduction to practice [2]

2022-01-30 13:55:12 Dream, killer



Waiting for elements

Many pages use AJAX, so the elements of a page do not all load at the same time. To avoid errors caused by locating an element that is still loading, you can make the script wait for elements, which makes it more stable. Waits in WebDriver are divided into explicit waits and implicit waits.

Explicit wait

Explicit wait: set a timeout and check at regular intervals whether the element exists. If it does, execution continues; if the maximum time (the timeout) is exceeded, a TimeoutException is thrown. An explicit wait uses WebDriverWait together with until or until_not. The details follow.

WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None)

  • driver: the browser driver
  • timeout: the timeout, in seconds
  • poll_frequency: the interval between checks; the default is 0.5 seconds
  • ignored_exceptions: the exceptions to ignore. If one of them is raised while until or until_not is running, the wait is not interrupted and polling continues. By default only NoSuchElementException is ignored.

until(method, message='')
until_not(method, message='')

  • method: the expected-condition check. While waiting, this method is called at every polling interval until it returns a truthy value, for example until the element appears. until_not is the opposite: the following code continues once the element disappears or the condition no longer holds.
  • message: if the wait times out, a TimeoutException is thrown with message as its content.
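
The polling behaviour described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Selenium's actual source: `wait_until` and `WaitTimeout` are hypothetical names introduced here so the loop can run without a browser, and the "driver" is just a placeholder value.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(method, driver, timeout, poll_frequency=0.5,
               ignored_exceptions=(), message=''):
    """Call method(driver) repeatedly until it returns a truthy value.

    Exceptions listed in ignored_exceptions are swallowed and treated
    as "condition not met yet"; anything else propagates immediately.
    """
    end_time = time.monotonic() + timeout
    while True:
        try:
            value = method(driver)
            if value:
                return value
        except tuple(ignored_exceptions):
            pass  # not ready yet, keep polling
        if time.monotonic() > end_time:
            raise WaitTimeout(message)
        time.sleep(poll_frequency)


# Usage with a placeholder driver: the condition succeeds on the third poll.
calls = {'count': 0}


def ready_on_third_call(_driver):
    calls['count'] += 1
    return 'element' if calls['count'] >= 3 else False


result = wait_until(ready_on_third_call, driver=None,
                    timeout=1, poll_frequency=0.01)
print(result, calls['count'])  # element 3
```

The return value of the condition is passed back to the caller, which is why an explicit wait can hand you the located element directly.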

The expected-condition checks passed as method are provided by expected_conditions. Some common ones are listed below.

First define a locator:

from selenium.webdriver.common.by import By
from selenium import webdriver

driver = webdriver.Chrome()
locator = (By.ID, 'kw')
# Assumption: 'kw' is the id of the search box on Baidu's home page, so load that page first
driver.get('https://www.baidu.com')
element = driver.find_element(By.ID, 'kw')

Common methods and what they check:

  • title_is('use Baidu Search'): whether the title of the current page equals the expected string
  • title_contains('Baidu'): whether the title of the current page contains the expected string
  • presence_of_element_located(locator): whether the element has been added to the DOM tree; this does not mean the element is visible
  • visibility_of_element_located(locator): whether the element is visible, meaning it is not hidden and its width and height are both non-zero
  • visibility_of(element): same as the previous method, but takes an element instead of a locator
  • text_to_be_present_in_element(locator, 'Baidu'): whether the element's text contains the expected string
  • text_to_be_present_in_element_value(locator, 'Certain value'): whether the element's value attribute contains the expected string
  • frame_to_be_available_and_switch_to_it(locator): whether the frame can be switched into; if so, switches into it and returns True, otherwise returns False
  • invisibility_of_element_located(locator): whether the element is either absent from the DOM tree or invisible
  • element_to_be_clickable(locator): whether the element is visible and enabled, so that it can be clicked
  • staleness_of(element): waits until the element has been removed from the DOM tree
  • element_to_be_selected(element): whether the element is selected; usually used with drop-down lists
  • element_selection_state_to_be(element, True): whether the element's selected state matches the expectation; the first parameter is the element, the second is True or False
  • element_located_selection_state_to_be(locator, True): same as the previous method, but takes a locator instead of an element
  • alert_is_present(): whether an alert is present on the page
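
Each of these expected conditions is, in effect, a callable that receives the driver and returns a truthy or falsy value, so you can write your own in the same shape. The following is a minimal sketch: `title_contains_any` is a hypothetical custom condition, and `FakeDriver` is a stand-in that models only the `title` attribute so the example runs without a browser.

```python
def title_contains_any(*fragments):
    """Hypothetical custom condition: true if the title contains any fragment."""
    def _predicate(driver):
        return any(fragment in driver.title for fragment in fragments)
    return _predicate


class FakeDriver:
    """Stand-in for a WebDriver; only the `title` attribute is modelled."""

    def __init__(self, title):
        self.title = title


condition = title_contains_any('Baidu', 'CSDN')
print(condition(FakeDriver('CSDN Blog')))  # True
print(condition(FakeDriver('Example')))    # False
```

With a real driver, such a callable could be passed to until in the same way as the built-in conditions, for example `WebDriverWait(driver, 5).until(title_contains_any('Baidu'))`.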

Let's write a simple example. Here we try to locate an element that does not exist on the page, and the exception that is thrown carries exactly the message we specified.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
element = WebDriverWait(driver, 5, 0.5).until(
    EC.presence_of_element_located((By.ID, 'kw')),
    message='Time out!'
)

Implicit wait

An implicit wait also specifies a timeout. If the specified element has not loaded by then, a NoSuchElementException is thrown. Apart from the different exception, there is one more difference: an implicit wait is global. During the run, if an element can be located right away, execution is unaffected; if it cannot, the driver keeps polling for the element until it is found or the timeout expires, at which point the exception is thrown.

Use implicitly_wait() to set an implicit wait; it is much simpler to use than an explicit wait. Example: open a personal home page, set an implicit wait of 5 s, locate a non-existent element by id, and finally print the exception thrown and the elapsed time.

from time import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://blog.csdn.net/qq_43965708')

start = time()
# Applies to every subsequent element lookup on this driver
driver.implicitly_wait(5)
try:
    driver.find_element(By.ID, 'kw')
except Exception as e:
    print(e)
    print(f'Elapsed: {time() - start}')

When execution reaches the find_element call for 'kw', the implicit wait kicks in: the driver polls for 5 s, still cannot locate the element, and throws the exception.

Forced wait

Use time.sleep() for a forced wait. It sleeps for a fixed time, which hurts the efficiency of the script. Taking the example above, replace the implicit wait with a forced wait.

from time import time, sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://blog.csdn.net/qq_43965708')

start = time()
# Always blocks for the full 5 s, whether or not the element is ready
sleep(5)
try:
    driver.find_element(By.ID, 'kw')
except Exception as e:
    print(e)
    print(f'Elapsed: {time() - start}')

It is worth noting that when the element cannot be located, an implicit wait and a forced wait take the same amount of time. But if the element finishes loading after 2 s, the implicit wait lets the following code run right away, whereas sleep keeps waiting for another 3 s.
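
The 2 s versus 5 s difference can be reproduced without a browser. The sketch below is only a simulation: `FakeClock`, `polling_wait`, and `fixed_wait` are hypothetical names, and the clock is simulated so the comparison runs instantly and deterministically.

```python
class FakeClock:
    """Simulated clock: sleep() advances time instead of blocking."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def polling_wait(is_ready, clock, timeout=5.0, poll=0.5):
    """Poll until is_ready() is true or the timeout expires; return elapsed time."""
    start = clock.time()
    while not is_ready() and clock.time() - start < timeout:
        clock.sleep(poll)
    return clock.time() - start


def fixed_wait(clock, seconds=5.0):
    """Sleep unconditionally; return elapsed time."""
    start = clock.time()
    clock.sleep(seconds)
    return clock.time() - start


load_time = 2.0  # assumption: the element appears 2 s after the page opens

poll_clock = FakeClock()
polled = polling_wait(lambda: poll_clock.time() >= load_time, poll_clock)

sleep_clock = FakeClock()
slept = fixed_wait(sleep_clock)

print(polled, slept)  # 2.0 5.0
```

If the element never appears, `polling_wait` also runs for the full 5 s, which matches the observation that the two approaches cost the same when the lookup fails.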

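Locating a group of elements

The previous chapter described 8 methods for locating a single element. To locate a group of elements, just change element to elements in the method name. This is typically used for batch operations on elements. Note that the find_elements_by_* helpers listed below were deprecated in Selenium 4.0 and later removed; in Selenium 4 the same lookups are written find_elements(By.ID, ...), find_elements(By.XPATH, ...), and so on.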

  • find_elements_by_id()
  • find_elements_by_name()
  • find_elements_by_class_name()
  • find_elements_by_tag_name()
  • find_elements_by_xpath()
  • find_elements_by_css_selector()
  • find_elements_by_link_text()
  • find_elements_by_partial_link_text()

Here we use the blog expert column on the CSDN home page as an example and locate the names of the three experts with an XPath lookup. Looking at the page source for the expert name section, can you work out how to locate this group of names with XPath?

from selenium import webdriver
from selenium.webdriver.common.by import By

# Set up a headless browser
option = webdriver.ChromeOptions()
option.add_argument('--headless')

driver = webdriver.Chrome(options=option)
driver.get('https://blog.csdn.net/')

# Every <p class="name"> element on the page
p_list = driver.find_elements(By.XPATH, "//p[@class='name']")
name = [p.text for p in p_list]
print(name)
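
To see what the XPath expression selects without opening a browser, here is a small sketch using the standard library's ElementTree, which supports a limited XPath subset (hence the leading `.`). The markup is hypothetical and only loosely modelled on an expert list; the real CSDN page structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, not copied from the CSDN page
SNIPPET = """
<div class="experts">
  <div class="item"><p class="name">Expert A</p><p class="desc">Python</p></div>
  <div class="item"><p class="name">Expert B</p><p class="desc">Java</p></div>
  <div class="item"><p class="name">Expert C</p><p class="desc">Go</p></div>
</div>
"""

root = ET.fromstring(SNIPPET.strip())
# Select every <p> whose class attribute is exactly "name", at any depth
p_list = root.findall(".//p[@class='name']")
names = [p.text for p in p_list]
print(names)  # ['Expert A', 'Expert B', 'Expert C']
```

The `p` elements with class `desc` are skipped, which is the same filtering the Selenium lookup relies on.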

Switching

Window switching

When Selenium operates a page, clicking a link may open a new page (in a new tab). At that point Selenium is still working on the previous page, so we need to switch windows before we can locate elements on the newest page.

Window switching uses the switch_to.window() method.

First, look at the following code.

Code flow: first open the CSDN home page and save the handle of the current window, then click [CSDN official blog] on the left to open a new tab and save the current window handle again. This checks whether Selenium automatically moves to the newly opened window.

from selenium import webdriver
from selenium.webdriver.common.by import By

handles = []
driver = webdriver.Chrome()
driver.get('https://blog.csdn.net/')
# Set an implicit wait
driver.implicitly_wait(3)
# Get the handle of the current window
handles.append(driver.current_window_handle)
# Click the link in the left sidebar, which opens a new tab
driver.find_element(By.XPATH, '//*[@id="mainContent"]/aside/div[1]/div').click()
# Get the handle of the current window again
handles.append(driver.current_window_handle)

print(handles)
# Get the handles of all open windows
print(driver.window_handles)

You can see that both handles in the first list are the same, which shows that Selenium is still operating on the CSDN home page and did not switch to the new page. Use switch_to.window() to switch.

from selenium import webdriver
from selenium.webdriver.common.by import By

handles = []
driver = webdriver.Chrome()
driver.get('https://blog.csdn.net/')
# Set an implicit wait
driver.implicitly_wait(3)
# Get the handle of the current window
handles.append(driver.current_window_handle)
# Click the link in the left sidebar, which opens a new tab
driver.find_element(By.XPATH, '//*[@id="mainContent"]/aside/div[1]/div').click()
# Switch to the most recently opened window
driver.switch_to.window(driver.window_handles[-1])
# Get the handle of the current window again
handles.append(driver.current_window_handle)

print(handles)
print(driver.window_handles)

After the click, the code above uses switch_to to change windows. In this setup the handle list returned by window_handles is ordered by when each page appeared, so the most recently opened page is last; combining driver.window_handles[-1] with switch_to.window() therefore jumps to the newest page.
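
Using `window_handles[-1]` relies on the handles coming back in opening order, which the WebDriver specification does not promise. As an alternative that does not depend on ordering, a minimal sketch is to compare the handle lists taken before and after the click. `find_new_handle` is a hypothetical helper and the handle strings below are made up.

```python
def find_new_handle(handles_before, handles_after):
    """Return the single handle present after the click but not before it."""
    known = set(handles_before)
    new_handles = [h for h in handles_after if h not in known]
    if len(new_handles) != 1:
        raise RuntimeError(
            f'expected exactly one new window, found {len(new_handles)}')
    return new_handles[0]


# Made-up handle strings standing in for driver.window_handles snapshots
before = ['CDwindow-AAA']
after = ['CDwindow-AAA', 'CDwindow-BBB']
print(find_new_handle(before, after))  # CDwindow-BBB
```

With a real driver you would snapshot `driver.window_handles` before the click, snapshot it again afterwards, and pass the result to `driver.switch_to.window()`.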

If several windows are open and you need to jump back to one that was opened earlier, record a key (an alias) and a value (the handle) for each window as you open it, store them in a dictionary, and later look up the handle by its key.
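
The alias-to-handle dictionary could look like the sketch below. `WindowRegistry` is a hypothetical helper, and `StubDriver` only imitates the two driver members it needs (`current_window_handle` and `switch_to.window`) so the example runs without a browser.

```python
class WindowRegistry:
    """Hypothetical helper mapping a readable alias to a window handle."""

    def __init__(self, driver):
        self.driver = driver
        self.handles = {}

    def remember(self, alias):
        """Save the handle of the window the driver is currently focused on."""
        self.handles[alias] = self.driver.current_window_handle

    def switch(self, alias):
        """Switch the driver to the window saved under alias."""
        self.driver.switch_to.window(self.handles[alias])


class _StubSwitchTo:
    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        self._driver.current_window_handle = handle


class StubDriver:
    """Stand-in for a WebDriver, for demonstration only."""

    def __init__(self, handle):
        self.current_window_handle = handle
        self.switch_to = _StubSwitchTo(self)


driver = StubDriver('handle-home')
windows = WindowRegistry(driver)
windows.remember('home')

# Pretend a new tab was opened and the driver switched to it
driver.current_window_handle = 'handle-blog'
windows.remember('blog')

windows.switch('home')
print(driver.current_window_handle)  # handle-home
```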

Frame switching

Many pages nest content with frame/iframe. Selenium cannot directly locate elements inside such an embedded page; use the switch_to.frame() method to switch the current context into the page embedded in the frame/iframe.

switch_to.frame() can locate the frame directly by its id or name attribute. If the iframe has no id or name, locate it with XPath first and pass the resulting element. Let's write a page that contains an iframe to test with.

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
    <style>
        div p {
            color: red;
            animation: change 2s infinite;
        }
        @keyframes change {
            from {
                color: red;
            }
            to {
                color: blue;
            }
        }
    </style>
</head>

<body>
    <div>
        <p> official account :Python New horizons </p>
        <p>CSDN:Dream , Killer</p>
        <p> WeChat :python-sun</p>
    </div>
    <iframe id="CSDN_info" name="Dream , Killer" src="https://blog.csdn.net/qq_43965708" width="400" height="200"></iframe>
<!-- Variant without id/name, for the XPath approach: <iframe src="https://blog.csdn.net/qq_43965708" width="400" height="200"></iframe> -->
</body>
</html>

Now let's locate the CSDN button in the toolbar of the embedded page; clicking it jumps to the CSDN home page.

from pathlib import Path

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Load the local HTML file
driver.get('file:///' + str(Path(Path.cwd(), 'iframe test .html')))

# 1. Locate by id
driver.switch_to.frame('CSDN_info')
# 2. Locate by name
# driver.switch_to.frame('Dream , Killer')
# 3. Locate by XPath
# iframe_label = driver.find_element(By.XPATH, '/html/body/iframe')
# driver.switch_to.frame(iframe_label)

driver.find_element(By.XPATH, '//*[@id="csdn-toolbar"]/div/div/div[1]/div/a/img').click()

All three approaches shown above can locate the iframe.
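
The example above builds the local file URL by string concatenation, which leaves spaces in the file name unescaped. A possible alternative, sketched here, is `Path.as_uri()`, which produces a properly encoded file URL from an absolute path. The file name `iframe test.html` and the temporary directory are assumptions made only so the snippet is self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical test page written to a temporary directory
    page = Path(tmp, 'iframe test.html')
    page.write_text('<html><body><p>demo</p></body></html>', encoding='utf-8')

    # as_uri() needs an absolute path; resolve() guarantees that
    url = page.resolve().as_uri()
    print(url)
```

The resulting string can be passed to `driver.get()` in place of the concatenated one.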



copyright notice
Author: Dream, killer. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201301355103188.html
