[Python crawler] detailed explanation of selenium from introduction to actual combat [2]
2022-01-30 13:55:12 【Dream, killer】
Waiting for elements
Many pages use ajax technology, so the elements on a page are not all loaded at the same time. To avoid errors caused by locating elements that are still loading, you can set a wait, which makes the script more stable. Waiting in webdriver comes in two forms: explicit waits and implicit waits.
Explicit waiting
Explicit waiting: set a timeout and check at regular intervals whether the element exists. If it does, execution continues; if the maximum time (the timeout) is exceeded, a timeout exception (TimeoutException) is thrown. Explicit waiting uses WebDriverWait together with until or until_not. The details follow.
WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None)
- driver: the browser driver
- timeout: the timeout, in seconds
- poll_frequency: the interval between checks, 0.5 seconds by default
- ignored_exceptions: the exceptions to ignore. If one of them is raised while until or until_not is running, the wait is not interrupted and polling continues. By default only NoSuchElementException is ignored.
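How these four parameters interact can be pictured as a simple polling loop. The sketch below is a simplified model written for illustration, not Selenium's actual source; `poll_until` and `WaitTimeout` are made-up names standing in for `WebDriverWait(...).until(...)` and `TimeoutException`, and the condition is a plain function so that it runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Illustrative stand-in for selenium's TimeoutException."""


def poll_until(condition, timeout, poll_frequency=0.5, ignored_exceptions=()):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in ignored_exceptions count as "not ready yet";
    any other exception propagates immediately.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored_exceptions:
            pass  # treat as "condition not met" and keep polling
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout}s")
        time.sleep(poll_frequency)


# Demo: the "element" only becomes available on the third check.
attempts = []


def ready_on_third_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise LookupError("element not found yet")
    return "element"


print(poll_until(ready_on_third_call, timeout=1, poll_frequency=0.01,
                 ignored_exceptions=(LookupError,)))  # element
```

The first two calls raise an ignored exception and are retried; the third returns a truthy value, which is handed back to the caller, much as `until` returns the located element.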
until(method, message='')
until_not(method, message='')
- method: the expected-condition callable. During the wait it is called at every polling interval until it returns a truthy value, for example until the element appears. until_not is the opposite: it continues with the following code once the element disappears or the condition no longer holds.
- message: if the wait times out, TimeoutException is thrown with message as its text.
The condition callables passed as method are provided by the expected_conditions module. Some common ones are listed below.
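First define a locator and an element:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Assumption: the target page is Baidu, whose search box has id 'kw'
driver.get('https://www.baidu.com')
# A locator is a (strategy, value) tuple
locator = (By.ID, 'kw')
# find_element(By.ID, ...) replaces find_element_by_id, which was removed in Selenium 4.3
element = driver.find_element(By.ID, 'kw')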
Method | Description |
---|---|
title_is('use Baidu Search') | Checks whether the title of the current page equals the expected string |
title_contains('Baidu') | Checks whether the title of the current page contains the expected string |
presence_of_element_located(locator) | Checks whether the element has been added to the DOM tree; this does not mean the element is visible |
visibility_of_element_located(locator) | Checks whether the element is visible, meaning it is not hidden and its width and height are both non-zero |
visibility_of(element) | Same as the method above, but takes an element instead of a locator |
text_to_be_present_in_element(locator, 'Baidu') | Checks whether the text of the element contains the expected string |
text_to_be_present_in_element_value(locator, 'some value') | Checks whether the value attribute of the element contains the expected string |
frame_to_be_available_and_switch_to_it(locator) | Checks whether the frame can be switched into; if so, switches into it and returns True, otherwise returns False |
invisibility_of_element_located(locator) | Checks whether the element is absent from the DOM tree or invisible |
element_to_be_clickable(locator) | Checks whether the element is visible and enabled, so that it can be clicked |
staleness_of(element) | Waits until the element is removed from the DOM tree |
element_to_be_selected(element) | Checks whether the element is selected; usually used with drop-down lists |
element_selection_state_to_be(element, True) | Checks whether the selected state of the element matches the expectation; takes the element and True/False |
element_located_selection_state_to_be(locator, True) | Same as the method above, but takes a locator instead of an element |
alert_is_present() | Checks whether an alert is present on the page |
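Every entry in the table is a callable that `until` invokes with the driver as its only argument, so you can also write a condition of your own in the same shape. The sketch below assumes a hypothetical helper `title_matches`, which is not part of selenium, and uses a tiny fake driver so that it runs without a browser.

```python
import re


def title_matches(pattern):
    """Build a condition in the style of expected_conditions.

    title_matches is a hypothetical helper, not a selenium function.
    The returned function takes the driver and returns True once
    driver.title matches the regular expression, else False.
    """
    regex = re.compile(pattern)

    def _predicate(driver):
        return regex.search(driver.title) is not None

    return _predicate


class FakeDriver:
    """Minimal stand-in exposing only the attribute the condition reads."""
    title = "Baidu Search"


condition = title_matches(r"Baidu")
print(condition(FakeDriver()))  # True
```

With a real browser the same condition would be passed as `WebDriverWait(driver, 5).until(title_matches(r"Baidu"))`.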
Let's write a simple example. It tries to locate an element that does not exist on the page, and the exception that is thrown carries exactly the message we specified.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# No page has been opened, so an element with id 'kw' cannot be found;
# after 5 seconds until() raises TimeoutException with our message.
element = WebDriverWait(driver, 5, 0.5).until(
    EC.presence_of_element_located((By.ID, 'kw')),
    message='Timed out!')
Implicit waiting
Implicit waiting also specifies a timeout. If the specified element still has not loaded when the time is up, NoSuchElementException is thrown. Besides the different exception, there is one more difference: an implicit wait is global. While the script runs, if an element can be located, code execution is not affected; if it cannot, the driver keeps polling for the element until it is found, and throws the exception once the specified time is exceeded.
Use implicitly_wait() to set an implicit wait. It is much simpler to use than an explicit wait. Example: open a personal homepage, set an implicit wait of 5s, locate a non-existent element by id, and finally print the exception thrown and the elapsed time.
from time import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://blog.csdn.net/qq_43965708')
start = time()
driver.implicitly_wait(5)
try:
    # find_element(By.ID, ...) replaces find_element_by_id (removed in Selenium 4.3)
    driver.find_element(By.ID, 'kw')
except Exception as e:
    print(e)
    print(f'Time consumed: {time() - start}')
When the code reaches driver.find_element(By.ID, 'kw'), the implicit wait kicks in: the element is polled for 5s, still cannot be located, and the exception is thrown.
Forced waiting
Use time.sleep() for a forced wait. It sleeps for a fixed time, which hurts the efficiency of the code. Taking the example above as a reference, replace the implicit wait with a forced wait.
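from time import time, sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://blog.csdn.net/qq_43965708')
start = time()
# Forced wait: always sleeps the full 5 seconds
sleep(5)
try:
    # find_element(By.ID, ...) replaces find_element_by_id (removed in Selenium 4.3)
    driver.find_element(By.ID, 'kw')
except Exception as e:
    print(e)
    print(f'Time consumed: {time() - start}')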
It is worth noting that when the element cannot be located, an implicit wait and a forced wait take about the same time. But if the element finishes loading after 2s, the implicit wait lets the following code run right away, whereas sleep keeps waiting for the remaining 3s.
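The timing difference can be simulated without a browser. The sketch below is only a model: `fixed_wait`, `polling_wait` and `appears_after` are illustrative names, the "element" is a function that starts returning True after a delay, and the durations are scaled down tenfold from the text (0.2s load time, 0.5s limit). A real implicit wait is carried out by the browser driver, so this imitates its timing rather than its implementation.

```python
import time


def fixed_wait(is_ready, seconds):
    """Forced wait: always sleep the full duration, then check once."""
    start = time.monotonic()
    time.sleep(seconds)
    return is_ready(), time.monotonic() - start


def polling_wait(is_ready, timeout, poll_frequency=0.05):
    """Polling wait: return as soon as is_ready() is true or time runs out."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if is_ready():
            return True, time.monotonic() - start
        time.sleep(poll_frequency)
    return is_ready(), time.monotonic() - start


def appears_after(delay):
    """Simulate an element that finishes loading `delay` seconds from now."""
    ready_at = time.monotonic() + delay
    return lambda: time.monotonic() >= ready_at


# Scaled down 10x from the text: the element loads after 0.2s, the limit is 0.5s.
found_fixed, t_fixed = fixed_wait(appears_after(0.2), 0.5)
found_poll, t_poll = polling_wait(appears_after(0.2), 0.5)
print(f"fixed:   found={found_fixed} after {t_fixed:.2f}s")
print(f"polling: found={found_poll} after {t_poll:.2f}s")
```

Both strategies find the element, but the polling version returns shortly after it appears, while the fixed sleep always pays the full duration.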
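Locate a set of elements
The previous chapter described 8 methods for locating a single element. To locate a set of elements, just change element to elements in the method name. This is generally used for batch operations on elements. Note that the find_elements_by_* methods listed here were deprecated in Selenium 4 and removed in Selenium 4.3; on current versions the equivalent calls are find_elements(By.ID, ...), find_elements(By.XPATH, ...) and so on.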
find_elements_by_id()
find_elements_by_name()
find_elements_by_class_name()
find_elements_by_tag_name()
find_elements_by_xpath()
find_elements_by_css_selector()
find_elements_by_link_text()
find_elements_by_partial_link_text()
Here we take the blog expert column on the CSDN home page as an example and use an xpath locator (find_elements_by_xpath in older Selenium versions) to locate the names of the three experts. In the page source, each expert name sits in a p element whose class is name (the screenshot from the original post is not reproduced here). Can you think of how to locate this group of names with xpath?
from selenium import webdriver
from selenium.webdriver.common.by import By

# Set up a headless browser
option = webdriver.ChromeOptions()
option.add_argument('--headless')
driver = webdriver.Chrome(options=option)
driver.get('https://blog.csdn.net/')
# find_elements(By.XPATH, ...) is the current form of find_elements_by_xpath
p_list = driver.find_elements(By.XPATH, "//p[@class='name']")
name = [p.text for p in p_list]
print(name)
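Because locating a set of elements returns an ordinary Python list (an empty one when nothing matches, rather than raising NoSuchElementException), batch operations are plain list processing. A minimal sketch, assuming a hypothetical helper `element_texts` and using `SimpleNamespace` objects in place of real WebElement instances:

```python
from types import SimpleNamespace


def element_texts(elements):
    """Return the stripped, non-empty .text of each element, in list order."""
    return [e.text.strip() for e in elements if e.text.strip()]


# Stand-ins for the WebElement objects a locator call would return.
fake_elements = [
    SimpleNamespace(text=" Alice "),
    SimpleNamespace(text=""),
    SimpleNamespace(text="Bob"),
]
print(element_texts(fake_elements))  # ['Alice', 'Bob']
print(element_texts([]))             # []
```

Filtering out empty strings is a design choice for this sketch; drop the condition if blank entries should be kept.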
Switching operation
Window switch
When selenium operates a page, clicking a link may jump to a new page (open a new tab). At that point selenium is actually still on the previous page, so we need to switch before we can locate elements on the newest page. Window switching uses the switch_to.window() method.
First, look at the following code.
Code flow: first open the 【CSDN home page】 and save the handle of the current window, then click 【CSDN official blog】 on the left to jump to a new tab and save the current handle again, to verify whether selenium automatically moves to the newly opened window.
from selenium import webdriver
from selenium.webdriver.common.by import By

handles = []
driver = webdriver.Chrome()
driver.get('https://blog.csdn.net/')
# Set an implicit wait
driver.implicitly_wait(3)
# Get the handle of the current window
handles.append(driver.current_window_handle)
# Click the link on the left, which opens a new tab
# (find_element(By.XPATH, ...) replaces find_element_by_xpath, removed in Selenium 4.3)
driver.find_element(By.XPATH, '//*[@id="mainContent"]/aside/div[1]/div').click()
# Get the handle of the current window again
handles.append(driver.current_window_handle)
print(handles)
# Get the handles of all open windows
print(driver.window_handles)
You can see that both handles in the first list are the same, which shows that selenium is still operating on the CSDN home page and did not switch to the new page. Use switch_to.window() to switch.
from selenium import webdriver
from selenium.webdriver.common.by import By

handles = []
driver = webdriver.Chrome()
driver.get('https://blog.csdn.net/')
# Set an implicit wait
driver.implicitly_wait(3)
# Get the handle of the current window
handles.append(driver.current_window_handle)
# Click the link on the left, which opens a new tab
# (find_element(By.XPATH, ...) replaces find_element_by_xpath, removed in Selenium 4.3)
driver.find_element(By.XPATH, '//*[@id="mainContent"]/aside/div[1]/div').click()
# Switch to the most recently opened window
driver.switch_to.window(driver.window_handles[-1])
# Get the handle of the current window
handles.append(driver.current_window_handle)
print(handles)
print(driver.window_handles)
In the code above, after the click opens the new tab, switch_to is used to switch windows. The handle list returned by window_handles normally follows the order in which the pages were opened, so the newly opened page is the last one. Combining driver.window_handles[-1] with switch_to therefore jumps to the most recently opened page.
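Indexing with `[-1]` depends on the order of the handle list, which the WebDriver specification does not promise. A more defensive alternative is to compare the handle lists taken before and after the click. The sketch below uses a hypothetical helper `newly_opened` and made-up handle strings; real values come from `driver.window_handles`.

```python
def newly_opened(before, after):
    """Return handles present in `after` but not in `before`, keeping order."""
    known = set(before)
    return [handle for handle in after if handle not in known]


# Handle strings are made up; real ones come from driver.window_handles.
before_click = ["CDwindow-A"]
after_click = ["CDwindow-A", "CDwindow-B"]
print(newly_opened(before_click, after_click))  # ['CDwindow-B']
```

In a script this would be used as: save `before = driver.window_handles`, perform the click, then call `driver.switch_to.window(newly_opened(before, driver.window_handles)[0])`.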
If several windows are open and you need to jump back to one opened earlier, record each window when you open it as a key (an alias) and a value (its handle) in a dictionary, and later look up the handle by its key.
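The alias dictionary described above could look like the following minimal sketch. `WindowBook` is a hypothetical helper, and `FakeDriver` is a stand-in that mimics only `current_window_handle` and `switch_to.window()` so that the example runs without a browser; with Selenium you would pass the real driver instead.

```python
class WindowBook:
    """Map human-readable aliases to window handles for one driver."""

    def __init__(self, driver):
        self.driver = driver
        self.handles = {}

    def remember(self, alias):
        """Store the handle of the window the driver is currently on."""
        self.handles[alias] = self.driver.current_window_handle

    def switch(self, alias):
        """Switch the driver to the window saved under `alias`."""
        self.driver.switch_to.window(self.handles[alias])


class _FakeSwitchTo:
    """Imitates driver.switch_to for the demo."""

    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        self._driver.current_window_handle = handle


class FakeDriver:
    """Stand-in driver; handle strings are made up."""

    def __init__(self):
        self.current_window_handle = "handle-1"
        self.switch_to = _FakeSwitchTo(self)


driver = FakeDriver()
book = WindowBook(driver)
book.remember("home")
driver.switch_to.window("handle-2")  # pretend a new tab was opened
book.remember("official_blog")
book.switch("home")
print(driver.current_window_handle)  # handle-1
```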
Form switching
Many pages also nest frame/iframe forms. selenium cannot locate elements inside such an embedded page directly; you need the switch_to.frame() method to switch the current operating context to the page embedded in the frame/iframe.
switch_to.frame() can take the id or name attribute of the frame directly. If the iframe has no id or name, locate it with xpath first and pass the element. Let's write a page that contains an iframe for testing.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
    <style>
        div p {
            color: red;
            animation: change 2s infinite;
        }
        @keyframes change {
            from {
                color: red;
            }
            to {
                color: blue;
            }
        }
    </style>
</head>
<body>
    <div>
        <p>Official account: Python New horizons</p>
        <p>CSDN: Dream , Killer</p>
        <p>WeChat: python-sun</p>
    </div>
    <iframe src="https://blog.csdn.net/qq_43965708" width="400" height="200"></iframe>
    <!-- <iframe id="CSDN_info" name="Dream , Killer" src="https://blog.csdn.net/qq_43965708" width="400" height="200"></iframe> -->
</body>
</html>
Now let's locate the CSDN button inside the iframe (marked with a red box in the original screenshot); clicking it jumps to the CSDN home page.
from pathlib import Path

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Read the local html file
driver.get('file:///' + str(Path(Path.cwd(), 'iframe test .html')))
# 1. Switch by id (requires the iframe variant that has id="CSDN_info")
driver.switch_to.frame('CSDN_info')
# 2. Switch by name
# driver.switch_to.frame('Dream , Killer')
# 3. Locate the iframe by xpath, then switch to the element
# iframe_label = driver.find_element(By.XPATH, '/html/body/iframe')
# driver.switch_to.frame(iframe_label)
# find_element(By.XPATH, ...) replaces find_element_by_xpath (removed in Selenium 4.3)
driver.find_element(By.XPATH, '//*[@id="csdn-toolbar"]/div/div/div[1]/div/a/img').click()
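All three of these methods can locate the iframe. Methods 1 and 2 rely on the iframe having an id or name, as in the commented-out iframe line of the test page; method 3 also works for the iframe without them.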
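After switch_to.frame(), the driver stays inside the iframe until it is switched back, which selenium does with switch_to.default_content(). One way to make sure that always happens is a small context manager. The sketch below assumes a hypothetical helper `inside_frame` and a recording fake driver so that it runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block.

    frame_reference is whatever switch_to.frame() accepts: an id or
    name string, a zero-based index, or an iframe WebElement.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Return to the top-level document even if the block raised.
        driver.switch_to.default_content()


class _RecordingSwitchTo:
    """Imitates driver.switch_to and records the calls it receives."""

    def __init__(self):
        self.calls = []

    def frame(self, ref):
        self.calls.append(f"frame:{ref}")

    def default_content(self):
        self.calls.append("default_content")


class FakeDriver:
    """Stand-in driver for the demo."""

    def __init__(self):
        self.switch_to = _RecordingSwitchTo()


driver = FakeDriver()
with inside_frame(driver, "CSDN_info"):
    pass  # locate and click elements inside the iframe here
print(driver.switch_to.calls)  # ['frame:CSDN_info', 'default_content']
```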
If you are new to Python or want to get started with it, you can search WeChat for 【Python New horizons】 to learn and discuss together. We all started out as beginners; sometimes a simple problem can hold you up for a long time, and a hint from someone else can make it suddenly click. I sincerely hope we can all make progress together.
copyright notice
author[Dream, killer],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/01/202201301355103188.html