Python crawler - get fund change information
2022-01-31 21:35:38 【first quarter of the moon】
This is day 3 of my participation in the November More-Text Challenge. Check out the event details: 2021 Last More-Text Challenge.
Lose your humanity and you lose much; lose your bestial nature and you lose everything.
1 Preface
In previous posts we covered how to obtain the fund list and how to obtain a fund's basic information. Today we continue from there and fetch the fund's change information: how to fetch and parse the page, and how to call the API interface.
2 Capture change information
Looking at a fund's basic information page, we can see that the fund change information splits into four parts (shown as screenshots on the original page).
Next, the plan for capturing the data: the first figure already gives us the fund's basic information, change information and stage changes, but the stage changes are also shown in the second figure, so from this page we only need the real-time change and the fund's previous-day unit net value.
2.1 Acquisition of fund change information
# Fund change information. We start with a simple link; other funds are
# fetched the same way -- just swap a different fund code into the URL.
http://fund.eastmoney.com/005585.html
There are two parts to fetching the change data. The first is the fund's real-time change information: you will notice that the net value estimate changes over time, and by monitoring the browser's network request log I caught the following API call. I was instantly overjoyed.
// http://fundgz.1234567.com.cn/js/005585.js
jsonpgz({
    "fundcode": "005585",
    "name": "银河文体娱乐混合",
    "jzrq": "2021-11-16",
    "dwjz": "1.6718",
    "gsz": "1.6732",
    "gszzl": "0.08",
    "gztime": "2021-11-17 15:00"
});
The fund code and fund name are obvious from the returned JSON, but what do jzrq, dwjz, gsz, gszzl and gztime mean? I puzzled over it for a long time; combining what the page displays with East Money's (dfcf) habit of abbreviating fields to the first letters of the Chinese pinyin, I worked out that they roughly mean: net value date, unit net value, estimated value, estimated growth rate, and estimate time. I felt a little smug, as if I had cracked a secret code.
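To keep those guesses in one place, here they are as a small glossary dict (my own decoding of the pinyin abbreviations, not official API documentation):
# My decoding of the pinyin-abbreviated fields -- an educated guess, not official docs
FIELD_MEANINGS = {
    "jzrq": "net value date (净值日期)",
    "dwjz": "unit net value (单位净值)",
    "gsz": "estimated net value (估算值)",
    "gszzl": "estimated growth rate, in percent (估算增长率)",
    "gztime": "time of the estimate (估值时间)",
}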
The second part is the fund's unit net value. Analysis shows this data sits in a <dl class="dataItem02"> HTML element, so we fetch the page, parse it with bs4, and pull the element out of the DOM tree. To sum up: the fund's real-time change information comes from an API call, and the unit net value comes from parsing the returned HTML's DOM tree. The code for this section follows.
import json

import requests
from bs4 import BeautifulSoup

code = "005585"

# Capture real-time fund change information
resp = requests.get("http://fundgz.1234567.com.cn/js/{}.js".format(code))
# Strip the jsonpgz(...) JSONP wrapper so the payload parses as JSON
data = resp.text.replace("jsonpgz(", "").replace(");", "")
body = json.loads(data)
# Output the result
print("{} {} estimate {} estimated change {} estimate time {}".format(
    body["fundcode"], body["name"], body["gsz"], body["gszzl"], body["gztime"]))

# Request the fund's page
response = requests.get("http://fund.eastmoney.com/{}.html".format(code))
# Print the encoding detected from the raw response
# print(response.apparent_encoding)
# Set the response encoding explicitly to avoid garbled console output
response.encoding = "UTF-8"
resp_body = response.text
# Parse the page
soup = BeautifulSoup(resp_body, 'lxml')
# The page has exactly one such element, so find() suffices: the <dl> tag
# with class="dataItem02"
dl_con = soup.find("dl", class_="dataItem02")
# Get the update date of the fund's net value
value_date = dl_con.find("p").get_text()
# Keep only the date; "单位净值" is the page's Chinese "unit net value" label
value_date = value_date.replace("单位净值", "").replace("(", "").replace(")", "")
# The net value and the percentage change sit in two <span> tags under the
# <dd class="dataNums"> element
value_con = dl_con.find("dd", class_="dataNums")
data_list = value_con.find_all("span")
val_data = data_list[0].get_text()
per_data = data_list[1].get_text()
print("Fund net value date {} net value {} change percent {}".format(value_date, val_data, per_data))
Finally, with the operations above, you can get the fund's change information.
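For reference, with the sample JSON shown earlier the first print statement produces output along these lines (the net value line depends on the live page, so it is not reproduced here):
005585 银河文体娱乐混合 estimate 1.6732 estimated change 0.08 estimate time 2021-11-17 15:00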
2.2 Capture of fund stage information
Fund stage information is also captured by parsing the page data with bs4. Three tables are involved (shown as three figures on the original page): the first holds the stage change information, while the second and third hold the quarterly and annual changes. Because we ultimately want structured storage, the first table fits a row-oriented layout, one record per day showing that day's changes; the second and third are better stored column-wise and queried as a kind of statistical data. The two cases are parsed differently: for the first table the header fields simply become database columns, so we can ignore the header, but for the other two we must also capture the table header, since the periods it names are part of the data we store. One more thing: besides the fund's own figures we also grab the CSI 300 (Shanghai-Shenzhen 300) row, so it can later serve as a strength benchmark when screening funds. None of this is difficult; the interesting parts are analyzing the captured data and planning the storage, as sketched below.
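To make the row-versus-column idea concrete, here is a rough sketch of the two record shapes I have in mind; the field names and values are illustrative placeholders, not the final schema:
# Row mode (stage table): one record per fund per day, one column per stage
stage_record = {
    "code": "005585", "date": "2021-11-17",        # placeholder values
    "stage_week": "-1.92", "stage_month": "1.83",  # ...one column per stage field
}
# Column mode (quarterly/annual tables): the table header becomes data,
# one record per fund per period
quarter_record = {"code": "005585", "period": "2021-3", "change": "5.12"}  # placeholders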
My approach is to grab all the table elements on the page at once, loop over them and print the results, and then note which subscript holds the data we need. Here is the code:
from prettytable import PrettyTable

# Print a table
def print_table(head, body):
    tb = PrettyTable()  # create the table object
    tb.field_names = head  # define the header
    tb.add_row(body)
    print(tb)

# Query the quarterly / annual data
def query_year_quarter(data_list, num):
    stage_list = data_list.find_all("tr")[0].find_all("th")
    head_list = []
    for nd in stage_list:
        val = nd.get_text().strip()
        # Strip the Chinese "quarter" (季度) / "year" (年度) labels and turn 年
        # into a dash, e.g. a header like 2021年3季度 becomes 2021-3
        val = val.replace("季度", "").replace("年度", "").replace("年", "-")
        if val:
            head_list.append(val)
    body_list = []
    stage_list = data_list.find_all("tr")[num].find_all("td")
    for nd in stage_list:
        val = nd.get_text()
        # Skip the row-label cells: 阶段涨幅 (stage change) and 沪深300 (CSI 300)
        if "阶段涨幅" in val or "沪深300" in val:
            continue
        body_list.append(val.replace("%", ""))
    # Print the table
    print_table(head_list, body_list)

# Fetch the fund's basic information. Only part of the code is pasted here;
# it must be combined with the net-value code above (which builds `soup` and
# `body_list`) in order to run
def query_fund_basic(code="005585", hsFlag=False):
    # Header for the stage-change table
    stage_head_list = ["stage_week", "stage_month", "stage_month3", "stage_month6",
                       "stage_year", "stage_year1", "stage_year2", "stage_year3"]
    stage_list = body_list[11].find_all("tr")
    # Row 2 holds the fund's own data; row 4 holds the CSI 300 data
    num = 1
    if hsFlag:
        num = 3
    tmp_list = []
    for nd in stage_list[num].find_all("td"):
        val = nd.get_text()
        if "阶段涨幅" in val or "沪深300" in val:
            continue
        tmp_list.append(val.replace("%", ""))
    # Print the stage-change table
    print("\t------ Stage change ------")
    print_table(stage_head_list, tmp_list)
    print("\t------ Quarterly change ------")
    query_year_quarter(body_list[12], num)
    print("\t------ Annual change ------")
    query_year_quarter(body_list[13], num)
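For completeness, here is a minimal sketch of how the pieces fit together, assuming, as described above, that body_list holds every table element on the page; the commented-out loop is the one-off discovery step used to find subscripts 11 through 13:
# Assumes `soup` was built by the net-value code in section 2.1
body_list = soup.find_all("table")  # every <table> on the page

# One-off discovery loop: print each table with its subscript to locate the data
# for i, tb in enumerate(body_list):
#     print(i, tb.get_text()[:60])

query_fund_basic(code="005585")               # the fund's own rows
query_fund_basic(code="005585", hsFlag=True)  # the CSI 300 benchmark rows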
3 Final results
Due to limited space, the full code is not shown in the text; going forward I will maintain and provide it on GitHub.
Copyright notice
Author: first quarter of the moon. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201312135333881.html