Python practice: scraping 58 City rental listings and storing them in a MySQL database
2022-01-30 13:20:38 【baiyuliang】
Working with a database from Python is a lot simpler than in many other languages!
Installing MySQL and creating the database and table are not covered here. I created a local database named py, with a table tb_py_test:
create table tb_py_test
(
    id      int auto_increment
        primary key,
    url     text         null,
    content varchar(255) null,
    price   double       null
);
Next, install pymysql, a MySQL connector for Python: pip install pymysql
Once it is installed, write the connection code:
pymysql.connect(host="localhost", user="root", password="root", database="py")
connect() returns a connection object. Get a cursor from it with cursor(), execute SQL statements through the cursor, and fetch the results:
import pymysql

# Connect to the local "py" database
db = pymysql.connect(host="localhost", user="root", password="root", database="py")
cursor = db.cursor()
try:
    sql = "select * from tb_py_test"
    cursor.execute(sql)
    results = cursor.fetchall()
    for result in results:
        # Columns in order: id, url, content
        print(result[0], result[1], result[2])
except Exception as e:
    print('fail:' + str(e))
    db.rollback()
db.close()
Let's manually insert a row into the table and then run the script.
The row is printed, which shows that the database connection and the query both work!
Now for the actual goal of this post: scrape rental listings from 58 City and collect the first 300 listings priced below 2000!
Open the 58 home page, zz.58.com/, and inspect it:
We need to find the "Rent" link and click it automatically (you can also skip this step and open the rental listings URL directly):
from selenium import webdriver

driver = webdriver.Firefox(executable_path=r'C:\geckodriver.exe')
driver.get("https://zz.58.com")
# Locate the "Rent" link by its tongji_tag attribute and click it
zf = driver.find_element_by_xpath("//a[@tongji_tag='pc_home_dh_zf']")
zf.click()
Note how find_element_by_xpath is used: "//a[@tongji_tag='pc_home_dh_zf']" selects the a element whose tongji_tag attribute has the value 'pc_home_dh_zf'. Clicking it opens the rental listings page. (The find_element_by_* helpers belong to the Selenium 3 API; Selenium 4 replaces them with driver.find_element(By.XPATH, ...).)
Continuing the analysis, we need three fields from each listing: the title, the link, and the price.
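The attribute predicate in the XPath expression above can be tried offline with Python's standard library, without starting a browser. The sketch below is only an illustration: the markup, the href values, and the tongji_tag value pc_home_dh_sale are made up, and ElementTree supports just a subset of XPath, so it needs a leading "." in front of "//a".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the home page navigation markup
MARKUP = """
<div>
  <a tongji_tag="pc_home_dh_sale" href="/ershoufang/">Second-hand houses</a>
  <a tongji_tag="pc_home_dh_zf" href="/chuzu/">Rent</a>
  <a href="/job/">Jobs</a>
</div>
"""


def find_rent_links(markup):
    """Return every <a> whose tongji_tag attribute equals 'pc_home_dh_zf'."""
    root = ET.fromstring(markup)
    # ElementTree paths are relative to root, hence ".//a" instead of "//a"
    return root.findall(".//a[@tongji_tag='pc_home_dh_zf']")


links = find_rent_links(MARKUP)
print([(a.text, a.get("href")) for a in links])
```

Only the element with the matching attribute value is returned; the other two links are ignored, which is the same selection Selenium performs on the live page.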
Note that the listings are a list (ul > li), so what we get is a collection. We first get the ul, then its li children, and finally loop over the li elements and extract the fields from each item:
import time
from selenium import webdriver

driver = webdriver.Firefox(executable_path=r'C:\geckodriver.exe')
driver.get("https://zz.58.com")
zf = driver.find_element_by_xpath("//a[@tongji_tag='pc_home_dh_zf']")
zf.click()
time.sleep(2)
# Switch to the newly opened window
driver.switch_to.window(driver.window_handles[len(driver.window_handles) - 1])
ul = driver.find_element_by_css_selector('ul.house-list')
lis = ul.find_elements_by_tag_name('li')
for i in range(len(lis) - 1):
    price = lis[i].find_element_by_class_name("money").find_element_by_tag_name('b').text  # price
    if int(price) < 2000:
        des = lis[i].find_element_by_class_name("des")
        a = des.find_element_by_tag_name('a')
        title = a.text  # title
        url = a.get_attribute('href')  # link
        print(title, "rent: " + price, "link: " + url)
Take a closer look at the li elements, though:
The last one is the pager, not a listing, so it has to be filtered out. Subtracting 1 from the loop bound does exactly that!
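The same filtering can be sketched on plain data, independent of Selenium. This is a minimal illustration under assumptions: the helper names parse_price and cheap_listings and the sample rows are hypothetical, and the price parser returns None for text that is not a plain integer instead of letting int() raise.

```python
def parse_price(text):
    """Return the price as an int, or None if the text is not a plain number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def cheap_listings(rows, limit=2000):
    """Keep rows priced below limit, ignoring the last row (the pager)."""
    kept = []
    for title, price_text in rows[:-1]:  # same items as range(len(rows) - 1)
        price = parse_price(price_text)
        if price is not None and price < limit:
            kept.append((title, price))
    return kept


# Hypothetical sample data: (title, price text) pairs; the last entry is the pager
rows = [("Room A", "1500"), ("Room B", "2600"), ("Room C", " 800 "), ("pager", "")]
print(cheap_listings(rows))
```

Slicing with rows[:-1] visits the same items as the range(len(lis) - 1) loop, and it also behaves sensibly when the list is empty.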
driver.switch_to.window(driver.window_handles[len(driver.window_handles) - 1])
This line gets the handle of the newly opened window and switches to it; otherwise driver would keep looking for elements in the old window!
Running the script prints the title, rent, and link of each matching listing.
Is that all? Of course not. The requirement is 300 listings, and the code above only handles the first page. Once the first page has been processed, we need to move on to the next page automatically and repeat until 300 listings have been collected!
Method one: analyze the URL of each page.
When you switch to the next page, the page parameter in the URL follows the page number: pn2, pn3, ... So once the data on the current page has been extracted, you can modify the URL directly to go to the next page.
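A minimal sketch of method one follows. The base URL and the exact position of the pn segment are assumptions made for illustration (the post only states that the parameter becomes pn2, pn3, ...), and page_url is a hypothetical helper, so check the pattern against the real address bar before relying on it.

```python
def page_url(base, page):
    """Build the URL of a listing page; page 1 is the base URL itself."""
    base = base.rstrip("/") + "/"
    if page <= 1:
        return base
    return f"{base}pn{page}/"


# Assumed base URL of the rental listings, not taken from the original post
BASE = "https://zz.58.com/chuzu/"
urls = [page_url(BASE, n) for n in range(1, 4)]
print(urls)
```

Each URL could then be passed to driver.get(), which avoids having to locate and click the pager at all.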
Method two: simulate a click on the "next page" button.
This is the method used in this post:
nextBtn = driver.find_element_by_css_selector('div.pager').find_element_by_css_selector("a.next")
nextBtn.click()
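The overall shape of a click-driven crawl can be sketched with two callables standing in for the Selenium steps. Everything here is hypothetical scaffolding: collect, scrape_page, and go_next are illustrative names, and the in-memory pages replace the browser. The point of the sketch is the stop condition: the loop ends when the target is reached or when there is no further page, so it cannot spin forever on the last page.

```python
def collect(scrape_page, go_next, target=300):
    """Collect items page by page until target is reached or paging ends.

    scrape_page() returns the items found on the current page;
    go_next() moves to the next page and returns False when there is none.
    """
    items = []
    while len(items) < target:
        items.extend(scrape_page())
        if len(items) >= target or not go_next():
            break
    return items[:target]


# Hypothetical stand-ins for the Selenium calls: three pages of four items each
pages = [[f"p{p}-item{i}" for i in range(4)] for p in range(3)]
state = {"page": 0}


def scrape_page():
    return pages[state["page"]]


def go_next():
    if state["page"] + 1 >= len(pages):
        return False
    state["page"] += 1
    return True


result = collect(scrape_page, go_next, target=10)
print(len(result))
```

In a Selenium version, go_next would click a.next and return False when the button cannot be found.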
While looping over the listings, insert each one into the database (the empty strings below stand in for the actual values):
sql = "insert into tb_py_test (url, content, price) VALUE ('','','')"
cursor.execute(sql)
db.commit()
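When the real values are filled in, one option is to pass them as query parameters instead of concatenating them into the SQL string, so that a title containing a quote character cannot break the statement. The sketch below assumes pymysql's %s placeholder style; save_listing and RecordingCursor are hypothetical names, and the recording cursor only stands in for a real cursor so the example runs without a database.

```python
INSERT_SQL = "insert into tb_py_test (url, content, price) values (%s, %s, %s)"


def save_listing(cursor, url, title, price):
    """Insert one listing, letting the driver escape the values."""
    cursor.execute(INSERT_SQL, (url, title, float(price)))


class RecordingCursor:
    """Hypothetical stand-in for a pymysql cursor; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, args=None):
        self.calls.append((sql, args))


cur = RecordingCursor()
save_listing(cur, "https://example.com/1", "Sunny room, 'near' metro", "1500")
print(cur.calls[0][1])
```

With a real connection, pass db.cursor() as the cursor and call db.commit() after the insert.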
Complete code:
import time

import pymysql
from selenium import webdriver


class House:
    def __init__(self):
        self.title = ""
        self.url = ""
        self.price = 0.0


fp = webdriver.FirefoxProfile()
# Block CSS loading
fp.set_preference("permissions.default.stylesheet", 2)
# Block image loading
fp.set_preference("permissions.default.image", 2)
# Disable JavaScript
fp.set_preference("javascript.enabled", False)
driver = webdriver.Firefox(firefox_profile=fp, executable_path=r'C:\geckodriver.exe')
driver.get("https://zz.58.com")
zf = driver.find_element_by_xpath("//a[@tongji_tag='pc_home_dh_zf']")
zf.click()
time.sleep(2)
# Switch to the newly opened rental listings window
driver.switch_to.window(driver.window_handles[len(driver.window_handles) - 1])
houses = []


def start():
    # Keep paging until 300 listings are collected or paging fails
    while len(houses) < 300:
        if not getHouses():
            return


def getHouses():
    # Scrape the current page, then click "next page"; returns False on failure
    try:
        ul = driver.find_element_by_css_selector('ul.house-list')
        lis = ul.find_elements_by_tag_name('li')
        # The last li is the pager, so skip it
        for i in range(len(lis) - 1):
            house = House()
            try:
                price = lis[i].find_element_by_class_name("money").find_element_by_tag_name('b').text  # price
                if int(price) < 2000:
                    des = lis[i].find_element_by_class_name("des")
                    a = des.find_element_by_tag_name('a')
                    title = a.text  # title
                    url = a.get_attribute('href')  # link
                    # print(title, "rent: " + price, "link: " + url)
                    house.title = title
                    house.url = url
                    house.price = float(price)
                    addHouse(house)
                    if len(houses) >= 300:
                        return True
            except Exception as e:
                print(str(e))
        nextBtn = driver.find_element_by_css_selector('div.pager').find_element_by_css_selector("a.next")
        nextBtn.click()
        time.sleep(3)
        return True
    except Exception as e:
        print(str(e))
        return False


def addHouse(house):
    houses.append(house)
    print(len(houses), house.price)
    try:
        # Skip listings whose URL is already stored in the database
        cursor.execute("select url from tb_py_test where url=%s", (house.url,))
        if cursor.fetchone():
            return
        # Insert the listing; the placeholders let pymysql escape the values
        cursor.execute(
            "insert into tb_py_test (url, content, price) values (%s, %s, %s)",
            (house.url, house.title, house.price),
        )
        db.commit()
        print('insert success')
    except Exception as e:
        print('insert fail:' + str(e))
        db.rollback()


db = pymysql.connect(host="localhost", user="root", password="root", database="py")
cursor = db.cursor()
start()
db.close()
driver.quit()
copyright notice
author[baiyuliang],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/01/202201301320348351.html