[Python data collection] stock information collection
2022-01-30 09:49:55 【liedmirror】
Little knowledge, great challenge! This article is participating in the "A Programmer Must Have a Little Knowledge" creative activity.
Preface
This article shows how to use packet capture to crawl data from websites that render their content dynamically with JavaScript, using stock information as a hands-on example.
Stock information collection
Requirement: use requests and an information-extraction method of your choice to crawl stock-related information and store it in a database.
Approach:
1. Packet capture
Because this page is dynamically rendered (its data is loaded through Ajax requests), the data has to be located by capturing packets.
Open the developer tools with F12 and refresh the page to see the JS requests sent by the browser.
The analysis goes as follows:
1. Copy the name of any stock on the page;
2. Open the search box in the Network panel and paste the copied stock name;
3. The search results show a request (or several, since the page keeps sending data-refresh requests over time): this is the interface that returns the information for all the stock codes on the page;
4. The full data can be previewed in the Response tab on the right.
2. Parameter analysis
The interface URL obtained in the previous step is:
http://67.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112407972169804676412_1634974778877&pn=1&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1634974778878
As you can see, the URL is followed by many GET parameters, and these parameters can be modified to improve the crawler.
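To inspect the parameters one at a time, the query string can be split programmatically. The sketch below uses only the standard library; `query_params` is a hypothetical helper name. It splits on `&` by hand rather than calling `urllib.parse.parse_qs`, because `parse_qs` decodes `+` as a space and would turn `m:0+t:6` into `m:0 t:6`.

```python
from urllib.parse import urlsplit

# The interface URL captured in the browser (copied from the article)
CAPTURED_URL = (
    "http://67.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112407972169804676412_1634974778877"
    "&pn=1&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&fid=f3"
    "&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23"
    "&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152"
    "&_=1634974778878"
)


def query_params(url: str) -> dict:
    """Return the GET parameters of url as {name: raw value}, without decoding '+'."""
    params = {}
    for pair in urlsplit(url).query.split("&"):
        name, _, value = pair.partition("=")
        params[name] = value
    return params


params = query_params(CAPTURED_URL)
print(params["pn"], params["pz"], params["fid"])  # page number, page size, fid
```

Printing the whole dict makes it easy to delete or change one parameter at a time and re-send the request.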
So I made some attempts (each request was verified and matches the actual page):
First, look at the pn and pz parameters.
When writing back-end interfaces, page_num and page_size are commonly used for the page number and the page size; pn and pz are abbreviations of these two variables.
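Because `pn` and `pz` behave like page number and page size, crawling several pages reduces to a loop. A sketch, assuming a hypothetical `fetch_page(pn, pz)` helper that requests one page and returns its records, and assuming that a page shorter than `pz` marks the end of the list (not verified against the real interface).

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], List[dict]],
               page_size: int = 20, max_pages: int = 1000) -> Iterator[dict]:
    """Yield records page by page using pn (page number) and pz (page size).

    fetch_page(pn, pz) is a hypothetical helper returning one page of records.
    Paging stops at the first short or empty page, or after max_pages as a
    safety limit.
    """
    for page_num in range(1, max_pages + 1):  # pn starts at 1, as in the captured URL
        records = fetch_page(page_num, page_size)
        yield from records
        if len(records) < page_size:
            break
```

Injecting `fetch_page` keeps the paging logic separate from the HTTP details, so it can be tried against a fake data source first.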
Next comes the cb=jQuery....... parameter.
jQuery is a JavaScript library. Although I don't know much about the front end, front-end/back-end separated projects generally follow some request convention, and the interface captured here looks like a RESTful API with an extra layer of jQuery-based function wrapping (JSONP) around it.
A bold guess: this interface has been adapted specifically for jQuery, and the parameter most likely to control that adaptation is cb (its value contains "jQuery"; cb presumably stands for callback).
After deleting it, the interface returns its data as plain JSON (the back-end interface is not fully standardized).
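If you keep the `cb` parameter instead, the response is the JSON wrapped in a call to the named callback (JSONP), e.g. `jQuery..._...({...});`. A minimal sketch of unwrapping it with the standard library; `strip_jsonp` is a hypothetical helper, and the regex assumes the whole response body is a single callback call.

```python
import json
import re

# callback_name( ...JSON... ) with an optional trailing ';'
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def strip_jsonp(text: str):
    """Parse a JSONP body such as 'jQuery123_456({...});', or plain JSON as-is."""
    match = _JSONP_RE.match(text)
    body = match.group(1) if match else text  # no wrapper: treat the text as JSON
    return json.loads(body)
```

Dropping `cb`, as the article does, is simpler; this is only useful when an endpoint insists on the callback form.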
It is easy to see that diff inside data is a list, and the f* fields it contains are the data we need. Next, the range of returned data can be limited with the fields parameter.
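Pulling the records out of `data.diff` can be sketched as below. The handling of a null `data` (for example past the last page) and of a dict-shaped `diff` is defensive guesswork, not behavior confirmed by the article; `extract_records` is a hypothetical name.

```python
from typing import Any, Dict, List


def extract_records(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the records under payload["data"]["diff"], or [] when there are none."""
    data = payload.get("data") or {}  # assumption: data may be null when there is nothing to return
    diff = data.get("diff") or []
    if isinstance(diff, dict):  # defensive: accept an index-keyed object as well as a list
        diff = list(diff.values())
    return list(diff)
```

The payloads in the checks are made-up samples, not real market data.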
After manual comparison, the mapping between each field and its parameter name is as follows:
Field | Parameter |
---|---|
Stock code | f12 |
Name | f14 |
Latest price | f2 |
Change (%) | f3 |
Change amount | f4 |
Volume | f5 |
Turnover | f6 |
Amplitude | f7 |
High | f15 |
Low | f16 |
Open | f17 |
Previous close | f18 |
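The table can be kept in code as a lookup dict. In this sketch the right-hand names are the column names from the CREATE TABLE statement in step 3; pairing them with the fields by order and by their pinyin abbreviations is my reading, and `FIELD_TO_COLUMN` / `rename_fields` are hypothetical names.

```python
# API field -> database column (pinyin abbreviations, see step 3)
FIELD_TO_COLUMN = {
    "f12": "code",  # stock code
    "f14": "name",  # name
    "f2": "zxbj",   # latest price
    "f3": "zdf",    # change (%)
    "f4": "zde",    # change amount
    "f5": "cjl",    # volume
    "f6": "cje",    # turnover
    "f7": "zf",     # amplitude
    "f15": "high",  # high
    "f16": "low",   # low
    "f17": "jk",    # open
    "f18": "zs",    # previous close
}


def rename_fields(record: dict) -> dict:
    """Translate one raw diff record into {column: value}.

    Fields not in the table are dropped; missing fields become None.
    """
    return {col: record.get(field) for field, col in FIELD_TO_COLUMN.items()}
```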
Finally, there are some other parameters of unknown purpose, possibly used by the back end for tracking (event logging). After filtering them one by one and removing the unnecessary ones, the request is built with the following parameters:
# Required fields; the other fields do not need to be requested
needParams = ['f12', 'f14', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f15', 'f16', 'f17', 'f18']
fields = ",".join(needParams)  # merged into the fields parameter
# Paging parameters limit the number and range of records crawled (example values)
pageNum = 1    # pn: page number
pageSize = 20  # pz: page size
fid = 'f3'     # fid: per the author, the stock category parameter (f3 is used here)
# Compared with the captured interface, cb=jQuery.... is removed, so the response is
# plain JSON; fields restricts which fields are returned
url = f'http://60.push2.eastmoney.com/api/qt/clist/get' \
      f'?pn={pageNum}' \
      f'&pz={pageSize}' \
      f'&po=1&np=1&fltt=2&invt=2' \
      f'&fid={fid}&fs=m:1+s:2' \
      f'&fields={fields}'
(The effect of po=1&np=1&fltt=2&invt=2 is unknown, but the request does not work correctly when they are removed, so they are kept.)
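The same URL can also be built from a dict with `urllib.parse.urlencode`, which is easier to modify than a chain of f-strings. This is a sketch: `build_url`, `BASE_URL` and `NEED_PARAMS` are hypothetical names, and passing `quote_via=quote` with `safe=":+,"` leaves `fs=m:1+s:2` and the comma-separated fields unescaped, so the result matches the hand-written URL.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://60.push2.eastmoney.com/api/qt/clist/get"
NEED_PARAMS = ['f12', 'f14', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f15', 'f16', 'f17', 'f18']


def build_url(page_num: int, page_size: int, fid: str = "f3", fs: str = "m:1+s:2") -> str:
    """Build the request URL from a parameter dict (same parameters as the f-string version)."""
    params = {
        "pn": page_num,
        "pz": page_size,
        "po": 1, "np": 1, "fltt": 2, "invt": 2,  # kept as in the article; meaning undocumented
        "fid": fid,
        "fs": fs,
        "fields": ",".join(NEED_PARAMS),
    }
    # quote (not quote_plus) with ':+,' marked safe keeps these characters literal
    return BASE_URL + "?" + urlencode(params, safe=":+,", quote_via=quote)
```

Fetching one page would then look like `requests.get(build_url(1, 20), timeout=10).json()`, assuming the endpoint still responds as described above.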
3. Database setup
# DB is the author's own wrapper class (its definition is not shown in the article);
# db.driver is assumed to be a DB-API cursor, e.g. one created with pymysql
db = DB()
db.driver.execute('use spider_test')
db.driver.execute('drop table if exists money')
sql_create_table = """ CREATE TABLE `money` ( `id` int(11) NOT NULL AUTO_INCREMENT,
  `code` varchar(64) DEFAULT NULL, `name` varchar(64) DEFAULT NULL, `zxbj` varchar(64) DEFAULT NULL,
  `zdf` varchar(64) DEFAULT NULL, `zde` varchar(64) DEFAULT NULL, `cjl` varchar(64) DEFAULT NULL,
  `cje` varchar(64) DEFAULT NULL, `zf` varchar(64) DEFAULT NULL, `high` varchar(64) DEFAULT NULL,
  `low` varchar(64) DEFAULT NULL, `jk` varchar(64) DEFAULT NULL, `zs` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; """
db.driver.execute(sql_create_table)
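In principle, database field names should not contain Chinese characters, so the columns are named with pinyin initials of the Chinese field names (for example, zdf for zhang die fu, the change percentage), which makes the naming look a little random.
The insert operation is similar to the one in assignment 1, so it is not shown here.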
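Since the article does not show the insert code, here is a minimal sketch under stated assumptions, not the author's implementation: the cursor is a DB-API cursor using the `%s` placeholder style (as pymysql does), the records are the raw `diff` dicts, and `to_rows` / `insert_records` are hypothetical names. A parameterized `executemany` avoids building SQL strings out of crawled values.

```python
from typing import Iterable, List, Tuple

# Column order of the `money` table created above (id is AUTO_INCREMENT, so it is omitted)
COLUMNS = ("code", "name", "zxbj", "zdf", "zde", "cjl", "cje", "zf", "high", "low", "jk", "zs")
FIELDS = ("f12", "f14", "f2", "f3", "f4", "f5", "f6", "f7", "f15", "f16", "f17", "f18")

INSERT_SQL = "INSERT INTO `money` ({}) VALUES ({})".format(
    ", ".join(f"`{c}`" for c in COLUMNS), ", ".join(["%s"] * len(COLUMNS))
)


def to_rows(records: Iterable[dict]) -> List[Tuple]:
    """Turn raw diff records into tuples in COLUMNS order (missing fields become None)."""
    return [tuple(r.get(f) for f in FIELDS) for r in records]


def insert_records(cursor, records: Iterable[dict]) -> int:
    """Insert all records with one parameterized executemany call; return the row count.

    The caller is responsible for committing the connection.
    """
    rows = to_rows(records)
    if rows:
        cursor.executemany(INSERT_SQL, rows)
    return len(rows)
```

With pymysql this would be called as `insert_records(conn.cursor(), records)` followed by `conn.commit()`.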
4. Result display
At present only one table is created for the experiment. For real use, the data should be split into tables by time; data from all time periods should not be mixed in one table (otherwise it loses its meaning).
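One possible way to split by time, assuming one table per day and MySQL's `CREATE TABLE ... LIKE` to copy the structure of the `money` table. The `money_YYYYMMDD` naming scheme and the helper names are hypothetical. Table names cannot be passed as query parameters, so the generated name is validated before it is interpolated into SQL.

```python
import datetime
import re

# Hypothetical naming scheme: one table per day, e.g. money_20220130
_TABLE_RE = re.compile(r"^money_\d{8}$")


def table_for_day(day: datetime.date) -> str:
    """Return the name of the per-day table for `day`."""
    name = f"money_{day:%Y%m%d}"
    if not _TABLE_RE.match(name):  # guard before using the name in SQL text
        raise ValueError(f"unexpected table name: {name!r}")
    return name


def create_day_table_sql(day: datetime.date, template: str = "money") -> str:
    """DDL copying the structure of the template table (MySQL CREATE TABLE ... LIKE)."""
    return f"CREATE TABLE IF NOT EXISTS `{table_for_day(day)}` LIKE `{template}`"
```

An alternative with a single table is a crawl-time column plus an index on it; which one fits depends on how the data will be queried.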
In addition, store only the raw source data, which makes later analysis or visualization easier; formatting (adding %, or units such as 万 (ten thousand) and 亿 (hundred million)) should not be done in the database.
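Following that advice, unit formatting belongs in the display layer. A small sketch, assuming the stored values are plain numbers (and that f3 already holds the percentage as a number such as 1.23); 万 is 10^4 and 亿 is 10^8. The function names are hypothetical.

```python
def format_amount(value: float) -> str:
    """Format a raw number for display with Chinese units: 亿 = 1e8, 万 = 1e4."""
    if abs(value) >= 1e8:
        return f"{value / 1e8:.2f}亿"
    if abs(value) >= 1e4:
        return f"{value / 1e4:.2f}万"
    return f"{value:.2f}"


def format_percent(value: float) -> str:
    """Format a raw percentage value such as 1.2 as '1.20%'."""
    return f"{value:.2f}%"
```

The database keeps `123456789`; only the page or chart shows `1.23亿`.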
copyright notice
Author: liedmirror. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201300949521244.html