
[Python data collection] Selenium automated testing framework

2022-01-30 19:37:29 liedmirror


Preface

Pages rendered dynamically with JavaScript often do not return plaintext data directly. Instead, the data passes through some encryption algorithm, so requesting it ourselves does not give us the correct data. In these cases we can use the Selenium automated testing framework to simulate a browser and achieve true "if you can see it, you can scrape it".

Principle

Selenium works by imitating browser behavior: it opens a browser and then performs the actions we have defined in advance, which is how the data is obtained.

Versions

Selenium comes in two versions:

Selenium RC (Remote Control): the traditional Selenium framework.
Selenium WebDriver: the newer automation interface, which overcomes some of the limitations of Selenium 1.

Selenium WebDriver is the version in common use, and it is the one used in the rest of this article.

Workflow

  1. A driver is created and connected to the browser;
  2. The driver contains an HTTP server that receives HTTP requests;
  3. The HTTP server instructs the browser to carry out the steps in each request;
  4. The browser returns the result of each step to the HTTP server;
  5. The HTTP server returns the result to the Selenium script.
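The requests in steps 2 to 5 can be sketched without starting a browser. The helper below only builds the (method, URL, body) triples that a client would send to the driver's HTTP server, using the endpoint paths from the W3C WebDriver specification. The function name, the base URL (port 9515 is chromedriver's default) and the session id "abc123" are assumptions for illustration; a real session id comes from the driver's reply to an earlier POST /session request.

```python
import json


def build_webdriver_requests(base_url, session_id, page_url):
    """Return the (method, url, body) triples a client would send to the driver."""
    session = f"{base_url}/session/{session_id}"
    return [
        # Ask the driver's HTTP server to load a page in the browser.
        ("POST", f"{session}/url", json.dumps({"url": page_url})),
        # Ask for the source of the page as the browser currently sees it.
        ("GET", f"{session}/source", None),
        # End the session, which also closes the browser.
        ("DELETE", session, None),
    ]


if __name__ == "__main__":
    # Hypothetical values: a local chromedriver and a placeholder session id.
    requests_to_send = build_webdriver_requests(
        "http://localhost:9515", "abc123", "https://juejin.cn/"
    )
    for method, url, body in requests_to_send:
        print(method, url, body or "")
```

In normal use the selenium library sends these requests for us; calls such as driver.get(url) and driver.page_source are thin wrappers around them.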

Installation

  1. Install the selenium library

Run the following command:

pip install selenium
  2. Install the Chrome driver

(Windows only) Download the chromedriver.exe driver and copy it into the Scripts directory of your Python installation or virtual environment.
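A quick way to confirm that the driver was copied to a directory on PATH is to look it up from Python. This is a minimal sketch using only the standard library; the helper name find_driver is an assumption for illustration and is not part of Selenium.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    # On Windows, shutil.which also tries the extensions in PATHEXT, such as .exe.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    print(path if path else "chromedriver not found on PATH")
```

If the lookup prints a path inside the Scripts directory, the copy step above worked.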

Usage

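from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = "https://juejin.cn/"

# Run Chrome without a visible window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

# Newer Selenium releases expect options= rather than chrome_options=
driver = webdriver.Chrome(options=chrome_options)
try:
    driver.get(url)
    html = driver.page_source  # HTML after JavaScript has rendered
finally:
    driver.quit()  # always close the browser, even if loading fails

soup = BeautifulSoup(html, "lxml")
# From here on, use BeautifulSoup to extract the data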
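The extraction step is left open above. As a rough illustration of what it might look like, the sketch below collects link targets from an HTML string. Note that it uses the standard library's html.parser rather than BeautifulSoup, so that it runs without extra dependencies; the class name, function name and sample HTML are all made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in the given HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    # Stand-in for driver.page_source; the markup here is hypothetical.
    sample = '<html><body><a href="/post/1">First</a><a href="/post/2">Second</a></body></html>'
    print(extract_links(sample))
```

With BeautifulSoup the same idea would typically be written with soup.find_all("a") and reading each tag's href attribute.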

Summary

Selenium's advantage is that it can simply sleep while the page loads, so you can ignore the JavaScript logic entirely. It also has a fatal drawback: it is easy to detect, which severely limits where it can be used.
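A fixed sleep either wastes time or is too short on a slow page. A common refinement is to poll for a condition with a timeout instead; Selenium ships WebDriverWait (in selenium.webdriver.support.ui) for this purpose. The helper below is only a standard-library sketch of the same idea, and its name and default values are assumptions, not Selenium's API.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


if __name__ == "__main__":
    start = time.monotonic()
    # Stand-in condition: becomes true after roughly one second.
    wait_until(lambda: time.monotonic() - start > 1.0, timeout=5.0, poll=0.2)
    print("condition met")
```

With a live driver, the condition could be something like lambda: driver.find_elements(By.CSS_SELECTOR, ".entry"), where ".entry" is a hypothetical selector for the content being waited on.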

Copyright notice
Author: liedmirror. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201301937268573.html
