[Python crawler] Selenium explained in detail, from introduction to hands-on practice [1]
2022-01-30 09:35:21 【Dream, killer】
Little knowledge, big challenge! This article is taking part in the 「A programmer must have a little knowledge」 creative activity.
This article has taken part in the 「Digging force Star Program」 for a chance at a creative gift bag and the creation incentive fund.
Brief introduction
Selenium is one of the most widely used open-source suites for automated testing of web UIs (user interfaces). The languages it supports include C#, Java, Perl, PHP, Python and Ruby, and Selenium WebDriver is currently most popular with Python and C#. Selenium test scripts can be written in any supported language and run directly in most modern web browsers. Selenium is also a powerful tool for web crawling: because it drives a real browser that runs the page's JavaScript, it can get past the anti-crawling measures of most web pages. It is not a cure-all, though, and its most obvious drawback is that it is slow. Now let's start learning.
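Installing Selenium
Open cmd (the Windows command prompt) and run the following command to install it. The -i option points pip at the Douban PyPI mirror, which is usually faster from mainland China; omit it to install from the default PyPI index.
pip install -i https://pypi.douban.com/simple selenium
When it finishes, run pip show selenium to check whether the installation succeeded.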
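If you would rather check from Python itself, the standard library's importlib.metadata can report an installed package's version. A minimal sketch; installed_version is an illustrative helper name, not part of Selenium:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("selenium:", installed_version("selenium") or "not installed")
```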
Installing the browser driver
Each browser needs its own driver. The common browsers and their driver download links are listed below; some of these sites may need a VPN or proxy to open from mainland China.
- Firefox driver: Firefox (geckodriver)
- Chrome driver: Chrome (ChromeDriver)
- IE driver: IE (IEDriverServer)
- Edge driver: Edge (msedgedriver)
- PhantomJS driver: PhantomJS (no longer maintained, and not supported by Selenium 4)
- Opera driver: Opera
Here I install the Chrome driver as a demonstration. Chrome still has some bugs when used for Selenium automated testing. Everyday use is fine, but if you run into a rare error, try Firefox; after all, it is the browser Selenium officially recommends.
Checking the browser version
Type chrome://settings/ into a new tab to open the settings page, then choose 【About Chrome】 to see your version information. My version here is 94, so I download the matching version of ChromeDriver.
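The rule of thumb is that ChromeDriver's major version must match Chrome's. As a minimal sketch (the helper names are illustrative), you can compare the version shown on the About page with the version line that chromedriver --version typically prints. If you are on Selenium 4.6 or later, its built-in Selenium Manager can also fetch a matching driver automatically.

```python
import re


def major_version(text: str) -> int:
    """Extract the major number from text such as '94.0.4606.81'."""
    match = re.search(r"(\d+)\.\d+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1))


def driver_matches_browser(browser_version: str, driver_output: str) -> bool:
    """True when the driver's major version equals the browser's."""
    return major_version(browser_version) == major_version(driver_output)


print(driver_matches_browser("94.0.4606.81", "ChromeDriver 94.0.4606.61 (abc)"))
```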
Downloading the driver
Open the Chrome driver download page, click the version that matches your browser, and download the package for your operating system.
When the download finishes, the archive contains just one file, chromedriver.exe (on Windows). Save chromedriver.exe anywhere you like and add the folder that contains it to the Path environment variable (This PC >> right-click Properties >> Advanced system settings >> Advanced >> Environment Variables >> System variables >> Path). Be careful not to overwrite the existing Path value; if you do overwrite it, don't close the window, and search online for how to restore it. Once it is added, test it with the following code.
from selenium import webdriver

# Launch Chrome; if a browser window opens, the driver is set up correctly
driver = webdriver.Chrome()
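If webdriver.Chrome() complains that it cannot find the driver, it helps to check whether the executable is really reachable through PATH. A small sketch using the standard library's shutil.which, which searches PATH much as the shell does (on Windows it also tries extensions such as .exe); locate_driver is an illustrative helper name:

```python
import shutil
from typing import Optional


def locate_driver(name: str = "chromedriver",
                  search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the driver executable, or None if not found.

    search_path defaults to the PATH environment variable.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    found = locate_driver()
    print(found or "chromedriver is not on PATH")
```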
Locating page elements
Opening a page
Before using Selenium to locate page elements, you should understand basic page layout and what the various tags mean; if you haven't come across them before, here is a brief introduction. Take CSDN, which we all know, as an example: open its home page and press 【F12】 to open the developer tools. The Elements panel shows the page's HTML code, and our job is to locate the elements we need in that code and extract them. To locate and extract information from a page, we first use webdriver to open the page and then locate elements in it.
from selenium import webdriver
# Chrome browser
driver = webdriver.Chrome()
driver.get('https://www.csdn.net/')
After running the code above, you will find that the browser opens the CSDN home page and then closes immediately. To keep the browser from closing automatically, add the following code.
# Keep the browser open after the script finishes
option = webdriver.ChromeOptions()
option.add_experimental_option("detach", True)
# Pass option to Chrome (options= replaces the deprecated chrome_options= argument)
driver = webdriver.Chrome(options=option)
Combined with the earlier code, the browser no longer closes automatically:
from selenium import webdriver

# Keep the browser open after the script finishes
option = webdriver.ChromeOptions()
option.add_experimental_option("detach", True)
# Note that the options are passed in here
driver = webdriver.Chrome(options=option)
driver.get('https://www.csdn.net/')
Below are several common ways to locate page elements.
Locating by id
An element's id is unique, like a person's ID card. Suppose there is an input tag like the following.
<input id="toolbar-search-input" autocomplete="off" type="text" value="" placeholder="C++ Where is the difficult ?">
Because the id is unique, we can locate the element by id alone and ignore its other attributes. We use By.ID (written find_element_by_id in Selenium 3).
from selenium.webdriver.common.by import By

driver.find_element(By.ID, "toolbar-search-input")
Locating by name
The name attribute gives an element a name, which does not have to be unique on the page; if several elements share it, find_element returns the first one. Suppose there is a meta tag like the following.
<meta name="keywords" content="CSDN Blog ,CSDN college ,CSDN Forum ,CSDN live broadcast ">
We can locate the meta tag with By.NAME (find_element_by_name in Selenium 3).
from selenium.webdriver.common.by import By

driver.find_element(By.NAME, "keywords")
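Locating by class
The class attribute gives an element its class name, which does not have to be unique on the page. Suppose there is a div tag like the following.
<div class="toolbar-search-container">
We can locate the div tag with By.CLASS_NAME (find_element_by_class_name in Selenium 3). It accepts a single class name: for an element such as <div class="toolbar-search onlySearch">, pass just one of the names, or use a CSS selector such as .toolbar-search.onlySearch.
from selenium.webdriver.common.by import By

driver.find_element(By.CLASS_NAME, "toolbar-search-container")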
Locating by tag
A tag usually defines a kind of element rather than one particular element, so locating a specific element by tag name rarely succeeds: a page typically uses the same tag, such as <div> or <input>, many times. Here we again use the div from above as an example.
<div class="toolbar-search-container">
We can use By.TAG_NAME (find_element_by_tag_name in Selenium 3) to locate a div tag, but find_element returns the first <div> on the page, which is usually not the one we want.
from selenium.webdriver.common.by import By

driver.find_element(By.TAG_NAME, "div")
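To make the difference between unique and non-unique attributes concrete, here is a toy, standard-library-only model of these four strategies. It is not how Selenium works internally (the browser does the real matching, and Selenium raises NoSuchElementException when nothing matches); it simply collects every start tag from an HTML string and returns either the first match, like find_element, or all matches, like find_elements. The sample markup is illustrative.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records (tag, attributes) for every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))


def find_all(html, by, value):
    """Return every (tag, attrs) pair matching one of: id, name, class, tag."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    matches = []
    for tag, attrs in collector.elements:
        if by == "id":
            ok = attrs.get("id") == value
        elif by == "name":
            ok = attrs.get("name") == value
        elif by == "class":
            # class may hold several space-separated names; match any one of them
            ok = value in (attrs.get("class") or "").split()
        elif by == "tag":
            ok = tag == value
        else:
            raise ValueError(f"unsupported strategy: {by}")
        if ok:
            matches.append((tag, attrs))
    return matches


def find_first(html, by, value):
    """Like find_element, but returns None instead of raising when nothing matches."""
    matches = find_all(html, by, value)
    return matches[0] if matches else None


PAGE = """
<div id="csdn-toolbar">
  <div class="toolbar-search onlySearch">
    <div class="toolbar-search-container">
      <input id="toolbar-search-input" type="text">
    </div>
  </div>
</div>
"""

print(find_first(PAGE, "tag", "div"))  # the outer toolbar div, not the container
```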
Locating by XPath
XPath is a language for locating elements in an XML document, and it offers many ways to express a location. Let's look at several of them through examples.
<html>
  <head>...</head>
  <body>
    <div id="csdn-toolbar">
      <div class="toolbar-inside">
        <div class="toolbar-container">
          <div class="toolbar-container-left">...</div>
          <div class="toolbar-container-middle">
            <div class="toolbar-search onlySearch">
              <div class="toolbar-search-container">
                <input id="toolbar-search-input" autocomplete="off" type="text" value="" placeholder="C++ Where is the difficult ?">
Using the markup above, here are four ways to locate the input tag on the last line. XPath expressions are varied and not unique, so choose one that suits the situation.
from selenium.webdriver.common.by import By

# Absolute path (hierarchy)
driver.find_element(By.XPATH, "/html/body/div/div/div/div[2]/div/div/input[1]")
# Element attribute
driver.find_element(By.XPATH, "//*[@id='toolbar-search-input']")
# Hierarchy + element attribute
driver.find_element(By.XPATH, "//div[@id='csdn-toolbar']/div/div/div[2]/div/div/input[1]")
# Logical operator
driver.find_element(By.XPATH, "//*[@id='toolbar-search-input' and @autocomplete='off']")
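You can try XPath expressions offline before pointing Selenium at a live page. Below is a sketch using the standard library's xml.etree.ElementTree on a well-formed copy of the excerpt above; the closing tags were added for illustration, and the real page contains much more. ElementTree implements only a subset of XPath: paths are relative to an element, and attribute tests and positions work, but and/or operators do not, so the fourth expression is left out.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the excerpt above; ElementTree needs every tag closed.
SAMPLE = """
<html>
  <head/>
  <body>
    <div id="csdn-toolbar">
      <div class="toolbar-inside">
        <div class="toolbar-container">
          <div class="toolbar-container-left"/>
          <div class="toolbar-container-middle">
            <div class="toolbar-search onlySearch">
              <div class="toolbar-search-container">
                <input id="toolbar-search-input" autocomplete="off" type="text"
                       value="" placeholder="C++ Where is the difficult ?"/>
              </div>
            </div>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE.strip())  # root is the <html> element

# ElementTree paths are relative to root, so "/html/..." becomes "body/..."
# and a leading "//" becomes ".//".
queries = {
    "absolute": "body/div/div/div/div[2]/div/div/input[1]",
    "attribute": ".//*[@id='toolbar-search-input']",
    "hierarchy + attribute": ".//div[@id='csdn-toolbar']/div/div/div[2]/div/div/input[1]",
}

for label, path in queries.items():
    element = root.find(path)
    print(f"{label:22} -> {element.get('placeholder')}")
```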
Locating by CSS selector
CSS uses selectors to bind styles to page elements, and a selector can flexibly match any attribute of an element. CSS locators are generally faster than XPath, but they take a little more effort to learn. Common CSS selector syntax:
Method | Example | Description |
---|---|---|
.class | .toolbar-search-container | Selects all elements whose class includes toolbar-search-container |
#id | #toolbar-search-input | Selects the element with id = 'toolbar-search-input' |
* | * | Selects all elements |
element | input | Selects all <input> elements |
element>element | div>input | Selects all <input> elements whose parent is a <div> |
element+element | div+input | Selects each <input> that immediately follows a <div> at the same level |
[attribute=value] | [type='text'] | Selects all elements with type = 'text' |
A simple example that again locates the input tag from the example above:
from selenium.webdriver.common.by import By

driver.find_element(By.CSS_SELECTOR, '#toolbar-search-input')
driver.find_element(By.CSS_SELECTOR, 'html>body>div>div>div>div>div>div>input')
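Because both languages describe the same DOM, many simple CSS selectors have direct XPath equivalents. The hypothetical helper below (not part of Selenium, which accepts either form directly) translates a few forms from the table, namely #id, .class, tag, [attr='value'] and the > child combinator, into XPath so the correspondence is easy to see. It supports one class per step and deliberately rejects anything else, including +.

```python
import re

# One selector step: optional tag (or *), then optional #id, .class, [attr='value'].
STEP = re.compile(
    r"^(?P<tag>[a-zA-Z][\w-]*|\*)?"
    r"(?:#(?P<id>[\w-]+))?"
    r"(?:\.(?P<cls>[\w-]+))?"
    r"(?:\[(?P<attr>[\w-]+)='(?P<val>[^']*)'\])?$"
)


def step_to_xpath(step):
    """Translate one simple CSS step (no combinators) into an XPath step."""
    step = step.strip()
    match = STEP.match(step)
    if not step or match is None:
        raise ValueError(f"unsupported selector step: {step!r}")
    predicates = []
    if match.group("id"):
        predicates.append(f"@id='{match.group('id')}'")
    if match.group("cls"):
        # Standard XPath idiom for "the class list contains this name"
        predicates.append(
            "contains(concat(' ', normalize-space(@class), ' '), "
            f"' {match.group('cls')} ')"
        )
    if match.group("attr"):
        predicates.append(f"@{match.group('attr')}='{match.group('val')}'")
    tag = match.group("tag") or "*"
    return tag + "".join(f"[{p}]" for p in predicates)


def css_to_xpath(selector):
    """Translate simple steps joined by '>' (child combinator) into XPath."""
    return "//" + "/".join(step_to_xpath(s) for s in selector.split(">"))


print(css_to_xpath("#toolbar-search-input"))
print(css_to_xpath("html>body>div>div>div>div>div>div>input"))
print(css_to_xpath("input[type='text']"))
```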
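Locating by link text
Link text locators are meant specifically for text links, and they only match <a> elements. Suppose we want to locate the text below; this works only if it belongs to a link (for example, if this <div> is the only content of an <a> tag, the <a> is what gets returned).
<div class="practice-box" data-v-04f46969="">Join in! Practice every day</div>
We use By.LINK_TEXT (find_element_by_link_text in Selenium 3) and give the complete text of the link.
from selenium.webdriver.common.by import By

driver.find_element(By.LINK_TEXT, "Join in! Practice every day")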
Locating by partial link text
partial_link means "partial link text". When a link's text is very long, you can give just part of it. Using the same example:
<div class="practice-box" data-v-04f46969="">Join in! Practice every day</div>
We use By.PARTIAL_LINK_TEXT (find_element_by_partial_link_text in Selenium 3) and give part of the text.
from selenium.webdriver.common.by import By

driver.find_element(By.PARTIAL_LINK_TEXT, "Join in")
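A toy, standard-library model of the two link-text strategies may help. It only approximates the browser, which compares rendered text: it gathers the text of each <a> element, then compares it exactly or as a substring. Notice that the <div> outside any link is never considered. The markup and href are illustrative.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) for every <a> element, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._current = None   # [href, text-so-far] while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = [dict(attrs).get("href"), ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, text = self._current
            # Collapse whitespace, roughly like the browser's rendered text
            self.links.append((href, " ".join(text.split())))
            self._current = None


def find_link(html, text, partial=False):
    """Return the href of the first link whose text matches, else None."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    for href, link_text in collector.links:
        matched = (text in link_text) if partial else (link_text == text)
        if matched:
            return href
    return None


PAGE = """
<div class="practice-box">Join in! Practice every day</div>
<a href="/practice"> Join in! Practice every day </a>
"""

print(find_link(PAGE, "Join in! Practice every day"))  # exact text
print(find_link(PAGE, "Join in", partial=True))         # partial text
```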
Browser control
Changing the browser window size
webdriver provides the set_window_size() method for changing the size of the browser window.
from selenium import webdriver
# Chrome browser
driver = webdriver.Chrome()
driver.get('https://www.csdn.net/')
# Set the browser window to 600 pixels wide and 800 pixels high
driver.set_window_size(600, 800)
You can also use the maximize_window() method to maximize the browser window (true full-screen mode is the separate fullscreen_window() method).
from selenium import webdriver
# Chrome browser
driver = webdriver.Chrome()
driver.get('https://www.csdn.net/')
# Maximize the browser window
driver.maximize_window()
Browser back and forward
webdriver provides the back() and forward() methods for going back and forward between pages. In the next example we ① open the CSDN home page, ② open a CSDN personal home page, ③ call back() to return to the CSDN home page, and ④ call forward() to go to the personal home page again.
from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
# Visit the CSDN home page
driver.get('https://www.csdn.net/')
sleep(2)
# Visit a CSDN personal home page
driver.get('https://blog.csdn.net/qq_43965708')
sleep(2)
# Go back to the CSDN home page
driver.back()
sleep(2)
# Go forward to the personal home page again
driver.forward()
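The behavior of back() and forward() follows the tab's session history. As a rough mental model (a toy class, not Selenium code), history is a list plus a cursor: get() drops any forward entries and appends the new page, while back() and forward() only move the cursor.

```python
class TabHistory:
    """Toy model of one tab's session history (illustrative only)."""

    def __init__(self):
        self.entries = []   # visited URLs, oldest first
        self.index = -1     # position of the current page

    def get(self, url):
        # Loading a new page discards everything "forward" of the current one.
        del self.entries[self.index + 1:]
        self.entries.append(url)
        self.index += 1

    def back(self):
        if self.index > 0:
            self.index -= 1

    def forward(self):
        if self.index < len(self.entries) - 1:
            self.index += 1

    @property
    def current_url(self):
        return self.entries[self.index] if self.index >= 0 else None


tab = TabHistory()
tab.get("https://www.csdn.net/")
tab.get("https://blog.csdn.net/qq_43965708")
tab.back()
print(tab.current_url)   # https://www.csdn.net/
tab.forward()
print(tab.current_url)   # https://blog.csdn.net/qq_43965708
```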
Observant readers will notice that the second get() opens the new page in the original tab instead of a new one. If you want to open a link in a new tab, change the code to execute a JavaScript statement that opens a new tab. Note that the driver keeps working with the original tab until you switch windows, which is covered below.
# Open in the current tab
driver.get('https://blog.csdn.net/qq_43965708')
# Open in a new tab
js = "window.open('https://blog.csdn.net/qq_43965708')"
driver.execute_script(js)
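Building the script by pasting the URL into the JavaScript source breaks if the URL contains a quote character. execute_script() also accepts extra arguments, which the script sees as arguments[0], arguments[1] and so on, so a slightly safer variant passes the URL separately. The helper name open_in_new_tab is illustrative, and RecordingDriver is a stand-in that only records calls so the sketch runs without a browser. (Selenium 4 also offers driver.switch_to.new_window('tab'), which opens a blank tab and switches to it.)

```python
def open_in_new_tab(driver, url):
    """Ask the browser to open url in a new tab via JavaScript.

    The URL is passed as a script argument instead of being pasted into the
    JavaScript source, so quotes in the URL cannot break the script.
    The driver keeps controlling the current tab afterwards.
    """
    driver.execute_script("window.open(arguments[0], '_blank');", url)


class RecordingDriver:
    """Stand-in for a WebDriver that just records execute_script calls."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))


fake = RecordingDriver()
open_in_new_tab(fake, "https://blog.csdn.net/qq_43965708")
print(fake.calls)
```

With a real browser you would call open_in_new_tab(driver, 'https://blog.csdn.net/qq_43965708') and then switch windows as described next.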
Refreshing the browser
In some cases we need to refresh the page to get the latest data; refresh() reloads the current page.
# Refresh the page
driver.refresh()
Switching browser windows
Window switching is needed in many situations. For example, clicking a register button often opens a new tab, but the code does not switch to that page by itself, so trying to locate elements on the registration page fails. In that case you have to switch the driver to the newly opened window. First get the handles of all current windows. ChromeDriver usually returns them in the order the windows were opened, so the newest window is normally the last element of the list, and we can switch to it.
# Get the handles of all open windows
windows = driver.window_handles
# Switch to the most recently opened window
driver.switch_to.window(windows[-1])
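The WebDriver specification does not actually guarantee the order of window_handles, so a more robust pattern is to remember the handles before the click and switch to whichever handle is new afterwards. A minimal sketch: switch_to_new_window is an illustrative helper that uses only window_handles and switch_to.window(), and FakeDriver exists only so the example runs without a browser. On a real page you may also need to wait until the new handle appears, for example with WebDriverWait and expected_conditions.number_of_windows_to_be.

```python
def new_handles(before, after):
    """Handles present in `after` but not in `before`, keeping `after` order."""
    seen = set(before)
    return [h for h in after if h not in seen]


def switch_to_new_window(driver, handles_before):
    """Switch the driver to the single window opened since handles_before."""
    fresh = new_handles(handles_before, driver.window_handles)
    if len(fresh) != 1:
        raise RuntimeError(f"expected one new window, found {len(fresh)}")
    driver.switch_to.window(fresh[0])
    return fresh[0]


class _FakeSwitchTo:
    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        self._driver.current_window_handle = handle


class FakeDriver:
    """Minimal stand-in exposing window_handles and switch_to.window()."""

    def __init__(self, handles):
        self.window_handles = list(handles)
        self.current_window_handle = handles[0]
        self.switch_to = _FakeSwitchTo(self)


driver = FakeDriver(["main"])
before = driver.window_handles[:]            # remember handles before the click
driver.window_handles.insert(0, "signup")    # a click opens a tab; order is arbitrary
print(switch_to_new_window(driver, before))  # signup
```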
Common operations
Common element operations in webdriver:
Method | Description |
---|---|
send_keys() | Simulates typing the given input |
clear() | Clears the text content |
is_displayed() | Checks whether the element is visible |
get_attribute() | Gets the value of one of the tag's attributes |
size | Returns the element's size |
text | Returns the element's text |
Next, taking the CSDN home page as an example, we need its search box and search button. The following example shows how each operation is used.
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.csdn.net/')
sleep(2)
# Locate the search input box
text_label = driver.find_element(By.XPATH, '//*[@id="toolbar-search-input"]')
# Type "Dream , Killer" into the search box
text_label.send_keys('Dream , Killer')
sleep(2)
# Clear the search box
text_label.clear()
# Print whether the search box is visible
print(text_label.is_displayed())
# Print the value of its placeholder attribute
print(text_label.get_attribute('placeholder'))
# Locate the search button
button = driver.find_element(By.XPATH, '//*[@id="toolbar-search-button"]/span')
# Print the button's size
print(button.size)
# Print the text on the button
print(button.text)
'''
Output:
True
python interview 100 questions
{'height': 32, 'width': 28}
Search
'''
Mouse control
In webdriver, mouse operations are encapsulated in the ActionChains class. Common methods:
Method | Description |
---|---|
click() | Left click |
context_click() | Right click |
double_click() | Double click |
drag_and_drop() | Drag and drop |
move_to_element() | Mouse hover |
perform() | Performs all actions stored in the ActionChains |
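perform() matters because each ActionChains method only records an action and returns the chain itself, which is what lets calls be chained; nothing happens in the browser until perform() runs the stored actions in order. The toy class below imitates that queue-then-perform design in plain Python. It illustrates the pattern only and is not Selenium's implementation.

```python
class ToyActionChain:
    """Imitates the ActionChains pattern: record actions, run them on perform()."""

    def __init__(self, log):
        self._log = log        # where "performed" actions are written
        self._pending = []     # actions recorded but not yet performed

    def _record(self, name, target):
        self._pending.append((name, target))
        return self            # returning self is what makes chaining possible

    def move_to_element(self, target):
        return self._record("move_to", target)

    def click(self, target=None):
        return self._record("click", target)

    def context_click(self, target=None):
        return self._record("context_click", target)

    def perform(self):
        self._log.extend(self._pending)
        self._pending.clear()


log = []
chain = ToyActionChain(log).move_to_element("menu").click("item")
print(log)        # prints [] because nothing has run yet
chain.perform()
print(log)        # [('move_to', 'menu'), ('click', 'item')]
```

The real equivalent reads the same way, for example ActionChains(driver).move_to_element(menu).click(item).perform(), where menu and item are elements you have already located.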
Left click
Simulates clicking the left mouse button, which is typically used to open a sub page. A plain left click does not need ActionChains.
from selenium.webdriver.common.by import By

# Locate the search button
button = driver.find_element(By.XPATH, '//*[@id="toolbar-search-button"]/span')
# Perform the click
button.click()
Right click
Right-clicking works quite differently from left-clicking: it requires ActionChains.
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Locate the search button
button = driver.find_element(By.XPATH, '//*[@id="toolbar-search-button"]/span')
# Right-click the search button
ActionChains(driver).context_click(button).perform()
Double click
Simulates a double click.
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Locate the search button
button = driver.find_element(By.XPATH, '//*[@id="toolbar-search-button"]/span')
# Perform a double click
ActionChains(driver).double_click(button).perform()
Drag and drop
Simulates dragging with the mouse. This operation takes two required arguments:
- source: the element the mouse drags
- target: the element onto which the mouse drags and releases it
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Locate the element to drag ('xxx' is a placeholder XPath)
source = driver.find_element(By.XPATH, 'xxx')
# Locate the target element
target = driver.find_element(By.XPATH, 'xxx')
# Perform the drag
ActionChains(driver).drag_and_drop(source, target).perform()
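ActionChains also offers drag_and_drop_by_offset(source, xoffset, yoffset), which drags by a pixel offset measured from where the drag starts (the center of source) instead of onto an element. That is handy when the drop position is a spot, such as the middle of a slider track, rather than an element. A sketch of computing such an offset from rectangles in the {'x', 'y', 'width', 'height'} form that WebElement.rect returns; center and center_offset are illustrative helpers and the numbers are made up.

```python
def center(rect):
    """Center point of a rect dict with keys x, y, width, height."""
    return rect["x"] + rect["width"] / 2, rect["y"] + rect["height"] / 2


def center_offset(source_rect, target_rect):
    """(dx, dy), rounded to whole pixels, from the source's center to the target's."""
    sx, sy = center(source_rect)
    tx, ty = center(target_rect)
    return round(tx - sx), round(ty - sy)


source_rect = {"x": 10, "y": 20, "width": 40, "height": 20}    # illustrative
target_rect = {"x": 200, "y": 100, "width": 100, "height": 60}  # illustrative
print(center_offset(source_rect, target_rect))  # (220, 100)
```

With real elements this would be used as ActionChains(driver).drag_and_drop_by_offset(source, *center_offset(source.rect, track.rect)).perform(), where track is a hypothetical element whose center is the drop point.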
Mouse hover
Hovering is mainly used to reveal a hidden drop-down menu, such as the Favorites menu on the CSDN home page.
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Locate the Favorites entry
collect = driver.find_element(By.XPATH, '//*[@id="csdn-toolbar"]/div/div/div[3]/div/div[3]/a')
# Hover over the Favorites entry
ActionChains(driver).move_to_element(collect).perform()
Keyboard control
The Keys class in webdriver provides almost every key on the keyboard. Combining send_keys with Keys lets us send key combinations such as "Ctrl + C" and "Ctrl + V".
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Locate the input box and type text ('xxx' is a placeholder id)
driver.find_element(By.ID, 'xxx').send_keys('Dream , killer')
# Press Enter to submit (after typing the content)
driver.find_element(By.ID, 'xxx').send_keys(Keys.ENTER)
# Delete one character with Backspace
driver.find_element(By.ID, 'xxx').send_keys(Keys.BACK_SPACE)
# Ctrl + A selects everything in the input box
driver.find_element(By.ID, 'xxx').send_keys(Keys.CONTROL, 'a')
# Ctrl + C copies the contents of the input box
driver.find_element(By.ID, 'xxx').send_keys(Keys.CONTROL, 'c')
# Ctrl + V pastes into the input box
driver.find_element(By.ID, 'xxx').send_keys(Keys.CONTROL, 'v')
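On macOS the select-all, copy and paste shortcuts use the Command key rather than Control, so a script meant to run on both often picks the modifier at runtime. A small sketch; modifier_key_name is an illustrative helper, and COMMAND and CONTROL are the names of the Selenium Keys constants it chooses between. It checks the local platform, which is only the right test when the browser runs on the same machine as the script.

```python
import sys


def modifier_key_name(platform=None):
    """Name of the Keys constant to use for shortcuts such as select-all.

    platform defaults to sys.platform ('darwin' on macOS).
    """
    platform = sys.platform if platform is None else platform
    return "COMMAND" if platform == "darwin" else "CONTROL"


print(modifier_key_name())
```

With Selenium you would then write element.send_keys(getattr(Keys, modifier_key_name()), 'a') after importing Keys as above.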
Other common keyboard operations:
Operation | Description |
---|---|
Keys.F1 | F1 key |
Keys.SPACE | Space |
Keys.TAB | Tab key |
Keys.ESCAPE | Esc key |
Keys.ALT | Alt key |
Keys.SHIFT | Shift key |
Keys.ARROW_DOWN | Down arrow |
Keys.ARROW_LEFT | Left arrow |
Keys.ARROW_RIGHT | Right arrow |
Keys.ARROW_UP | Up arrow |
If you are a Python beginner or want to get started with Python, you can search WeChat for 【Python New horizons】 to exchange ideas and study together. We all started as beginners: sometimes a simple question keeps you stuck for a long time, but a pointer from someone else can make it click instantly. I sincerely hope we can all make progress together.
copyright notice
Author: [Dream, killer]. Please include the original link when reprinting. Thank you.
https://en.pythonmana.com/2022/01/202201300935192476.html