
Python crawler - ETF fund acquisition

2022-02-01 03:36:44 first quarter of the moon

This is day 5 of my participation in the November writing challenge. For event details, see: 2021 Last Writing Challenge

Light is always brief; darkness is eternal. You pompous light, how could you understand the depth of darkness?

1 Preface

An earlier post covered how to obtain fund change information, but those funds were OTC (off-exchange) funds. Today we look at an investment product with an entry threshold: the ETF. Only investors who have opened a securities account can take part. An ETF is an exchange-traded fund that can be traded intraday, which makes it somewhat more tradable than OTC funds. Enough talk; let's get down to business.

2 ETF List and abbreviations

Fund changes and basic information for ETFs are obtained the same way as for OTC funds. So how do we obtain a more comprehensive ETF fund list?

# Page that lists the fund information
http://fund.eastmoney.com/data/fbsfundranking.html

Here is the information displayed in the ETF list.

When traded on the exchange, an ETF usually has an abbreviation. Getting the abbreviation is a bit more trouble: we need to visit a page and then parse its elements with bs4.

# Analysis shows that the fund code prefix indicates its market: 5 = Shanghai market, 1 = Shenzhen market. Taking the real estate ETF and the photovoltaic ETF as examples:
http://quote.eastmoney.com/sz159707.html
http://quote.eastmoney.com/sh515790.html
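The prefix rule above can be sketched as a small helper. This is a minimal illustration, not the author's code: the function name `quote_url` is hypothetical, and only the two prefixes mentioned in the article (5 and 1) are handled.

```python
def quote_url(fund_code: str) -> str:
    """Build the quote page URL for an exchange-traded fund code."""
    # Rule stated in the article: 5 = Shanghai market, 1 = Shenzhen market.
    prefix_to_market = {"5": "sh", "1": "sz"}
    market = prefix_to_market.get(fund_code[:1])
    if market is None:
        # Other prefixes are not covered by the article, so fail loudly.
        raise ValueError(f"unrecognized fund code prefix: {fund_code!r}")
    return f"http://quote.eastmoney.com/{market}{fund_code}.html"


print(quote_url("159707"))  # http://quote.eastmoney.com/sz159707.html
print(quote_url("515790"))  # http://quote.eastmoney.com/sh515790.html
```

Raising an error for unknown prefixes is a deliberate choice here: guessing a market would silently produce a URL for the wrong page.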

3 ETF information acquisition

3.1 ETF list information acquisition

Looking at how the ETF list page loads, we find that the list data comes from a request to a backend API, which returns a response to the front end. The request that fetches the fund list information is:

http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=fb&ft=ct&rs=&gs=0&sc=zzf&st=desc&pi=1&pn=50

I was delighted to see this, because it means there is no need to parse an HTML file. But when I fetched the data with a GET request, the response said I had no access permission. I first thought the cause was a missing cookie, but I had not logged in, so perhaps the request headers needed to carry some page information. After some trial and error, I determined that the headers to send are:

headers = {
    'Host': 'fund.eastmoney.com',
    'Referer': 'http://fund.eastmoney.com/data/fbsfundranking.html'
}

Finally, the code for obtaining the fund list should be written like this:
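As a minimal sketch of such a request (not the author's code), the snippet below sends the list URL with the two headers identified above. It uses the standard library's `urllib` as a dependency-free stand-in for the `requests` call the article describes. The names `build_list_request` and `fetch_list_text` are hypothetical, reading `pi` as the page index and `pn` as the page size is an assumption, and so is decoding the body as UTF-8.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

LIST_ENDPOINT = "http://fund.eastmoney.com/data/rankhandler.aspx"

# Headers the article found necessary to avoid the "no access" response.
HEADERS = {
    "Host": "fund.eastmoney.com",
    "Referer": "http://fund.eastmoney.com/data/fbsfundranking.html",
}


def build_list_request(page: int = 1, page_size: int = 50) -> Request:
    """Build (but do not send) the GET request for one page of the list."""
    params = {
        "op": "ph", "dt": "fb", "ft": "ct", "rs": "", "gs": 0,
        "sc": "zzf", "st": "desc",
        "pi": page,       # assumed meaning: page index
        "pn": page_size,  # assumed meaning: rows per page
    }
    return Request(f"{LIST_ENDPOINT}?{urlencode(params)}", headers=HEADERS)


def fetch_list_text(page: int = 1, page_size: int = 50,
                    timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    with urlopen(build_list_request(page, page_size), timeout=timeout) as resp:
        # UTF-8 is an assumption; undecodable bytes are replaced, not raised.
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_list_text()` performs a live network request. Keeping request construction separate from sending makes it possible to check the URL and headers without touching the network.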

The results obtained after debugging are shown in the figure below:

3.2 Get the abbreviation of the fund

Getting the fund's abbreviation is relatively simple. Analysis shows that the abbreviation sits in <span class="quote_title_0 wryh"> Photovoltaic ETF</span>, so visiting the page and reading that element gives the abbreviation. The specific code is shown in the figure below:
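The extraction step can be sketched as follows. This is an illustration rather than the author's code: the article parses the page with bs4, while this sketch uses the standard library's `html.parser` so that it runs without third-party packages. The names `AbbreviationParser` and `extract_abbreviation` are hypothetical, and the sample HTML is a stand-in built from the element quoted above, not a real page download.

```python
from html.parser import HTMLParser


class AbbreviationParser(HTMLParser):
    """Collect the text of the first <span> whose class has quote_title_0."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0      # > 0 while inside the target span
        self._parts = []     # text fragments found inside the target span
        self.abbreviation = None

    def handle_starttag(self, tag, attrs):
        if tag != "span" or self.abbreviation is not None:
            return
        if self._depth:
            self._depth += 1  # nested span inside the target
        elif "quote_title_0" in (dict(attrs).get("class") or "").split():
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.abbreviation = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_abbreviation(html: str):
    """Return the abbreviation text, or None if the span is not present."""
    parser = AbbreviationParser()
    parser.feed(html)
    parser.close()
    return parser.abbreviation


sample = '<div><span class="quote_title_0 wryh"> Photovoltaic ETF</span></div>'
print(extract_abbreviation(sample))  # Photovoltaic ETF
```

With bs4, the equivalent lookup would typically be a single `find` call on the same class name. The parser above only spells out what that lookup does.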

4 Final results

After obtaining the fund list and the fund abbreviations, we get the final result shown in the figure below, which achieves the goal of obtaining the information:

 Final results of the fund

In a future post, we will merge the fund information with the ETF information and store it in a database to facilitate subsequent data analysis.

copyright notice
author[first quarter of the moon],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/02/202202010336415016.html
