Exciting challenge: crawling Bilibili (B站) video covers with a Python crawler

2022-06-24 05:11:54 Distant stars

Motivation

For an article, the title is its distilled essence; for a video, the cover is probably its most eye-catching frame. Bilibili (B站), a video platform that has been very popular lately, has all kinds of dances in its dance section, especially otaku dance, which is much loved by "homebody" viewers. (Don't talk to me about black stockings or JK uniforms, I really don't like them.)


So I decided to try fetching Bilibili covers with a crawler.

Web access

Bilibili has anti-crawling measures. I started by analyzing the web pages, to no avail.

Then I thought: Bilibili is so popular that I can't be the only one who wants to crawl it. So I started searching for related articles and videos.

In a snap, I found an article that fetches the cover image by crawling with the video's AV number. I tried it and, hey, it really works 🤩 (secretly ecstatic)

# Get the cover by aid
https://api.bilibili.com/x/web-interface/view?aid=(aid)

But wait: since last year Bilibili has been using BV numbers, so where would I get an AV number, and where did the AV number in that article come from? Ah, I checked the article's date again: 2019. Fair enough, when it was written Bilibili hadn't switched yet.

There are always more solutions than problems. At least now I know how to use the AV number, so why not just use the BV number to find the AV number? I'm so smart.

After some searching, I found an expert who had shared the APIs for BV numbers.

I took a look and, oh, it's even a Bilibili expert. How unsporting, teaching others to crawl Bilibili (but I like it 🤪)

# Get the cid by BV number
https://api.bilibili.com/x/player/pagelist?bvid=(bvid, including the leading "BV"!)
# Get the video playback URL by BV number and cid
https://api.bilibili.com/x/player/playurl?cid=(cid)&qn=(qn)&bvid=(bvid, including the leading "BV"!)
# Get the aid by BV number and cid
https://api.bilibili.com/x/web-interface/view?cid=(cid)&bvid=(bvid, including the leading "BV"!)
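
The query strings above can also be assembled in code instead of by hand. Below is a minimal sketch using the standard library's urlencode. The helper name build_url and the cid value 12345 are made up for illustration; only the endpoint paths and parameter names come from the list above.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.bilibili.com"


def build_url(endpoint, **params):
    """Join an endpoint path and its query parameters into a full URL."""
    return "{}{}?{}".format(API_ROOT, endpoint, urlencode(params))


bvid = "BV1C5411P7qM"  # the BV number tested later in this article
pagelist_url = build_url("/x/player/pagelist", bvid=bvid)
# 12345 is a placeholder cid, not a real one
view_url = build_url("/x/web-interface/view", cid=12345, bvid=bvid)
print(pagelist_url)
print(view_url)
```

Using urlencode also takes care of escaping if a parameter ever contains characters that are not URL-safe.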

Summing up the APIs above, the idea is clear: all you need is a pair of hands. Just follow the expert and you're set!

First get the cid from the BV number, then get the aid from the BV number and the cid, and finally get the cover from the aid.
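
The three-step chain can be sketched offline before any real request is made. In this sketch, find_cover_url and fake_fetch are hypothetical names, and all values returned by fake_fetch are invented; the URLs and JSON keys are the ones this article uses in its own code.

```python
def find_cover_url(bvid, fetch_json):
    """Follow the chain bvid -> cid -> aid -> cover URL.

    fetch_json is any callable that takes a URL and returns parsed JSON.
    """
    base = "https://api.bilibili.com/x"
    pages = fetch_json("{}/player/pagelist?bvid={}".format(base, bvid))
    cid = pages["data"][0]["cid"]
    view = fetch_json("{}/web-interface/view?cid={}&bvid={}".format(base, cid, bvid))
    aid = view["data"]["aid"]
    detail = fetch_json("{}/web-interface/view?aid={}".format(base, aid))
    return detail["data"]["pic"]


def fake_fetch(url):
    """Offline stand-in with invented values, so the sketch runs without network."""
    if "pagelist" in url:
        return {"data": [{"cid": 111}]}
    if "cid=111" in url:
        return {"data": {"aid": 222}}
    if "aid=222" in url:
        return {"data": {"pic": "https://example.com/cover.jpg"}}
    raise ValueError("unexpected URL: " + url)


cover_url = find_cover_url("BV1C5411P7qM", fake_fetch)
print(cover_url)
```

Passing the fetch function in as an argument keeps the chain testable; for real requests, a function that downloads and parses the JSON would be passed in instead.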

The data returned during crawling is basically all JSON. In it:

The cid is at ['data'][0]['cid'] in the JSON

The aid is at ['data']['aid'] in the JSON

The cover image URL is at ['data']['pic'] in the JSON

The more detailed process is written in the comments of the code.
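
These lookups can be wrapped in one small helper that also fails loudly when the API reports an error. This is only a sketch: the helper name extract is hypothetical, the sample bodies are invented, and the top-level "code" and "message" fields are an assumption about the response envelope that this article does not describe.

```python
import json


def extract(payload, *keys):
    """Walk nested keys or indexes inside payload['data']."""
    # Assumed envelope: a non-zero "code" means the request failed.
    if payload.get("code", 0) != 0:
        raise RuntimeError("API error: {}".format(payload.get("message")))
    value = payload["data"]
    for key in keys:
        value = value[key]
    return value


# Invented sample bodies that only mimic the three paths listed above.
pagelist = json.loads('{"code": 0, "data": [{"cid": 111}]}')
view = json.loads('{"code": 0, "data": {"aid": 222, "pic": "https://example.com/cover.jpg"}}')

cid = extract(pagelist, 0, "cid")
aid = extract(view, "aid")
pic = extract(view, "pic")
print(cid, aid, pic)
```

Without a check like this, a bad BV number would likely surface as a confusing KeyError or TypeError on the 'data' lookup rather than as a clear message.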

Complete code

# -*- coding: UTF-8 -*-
# @Time: 2021/8/17 20:12
# @Author:  Stars in the distance 
# @CSDN: https://blog.csdn.net/qq_44921056

import os
import json
import requests
import chardet
from fake_useragent import UserAgent

# Random User-Agent generator (arguments as used by the author;
# path points to a local cache file on the author's machine)
ua = UserAgent(verify_ssl=False, path='D:/Pycharm/fake_useragent.json')


# Build request headers with a random User-Agent
def random_ua():
    headers = {
        "accept-encoding": "gzip",  # gzip compression, which can speed up file transfer
        "user-agent": ua.random
    }
    return headers


# Create the output folder if it does not exist yet
def path_creat():
    _path = "D:/bilibili_covers/"
    os.makedirs(_path, exist_ok=True)
    return _path


# Fetch a URL and parse the response body as JSON
def get_text(url):
    res = requests.get(url=url, headers=random_ua(), timeout=10)
    res.encoding = chardet.detect(res.content)['encoding']  # use the detected character encoding
    res = res.text
    data = json.loads(res)  # parse the JSON text
    return data


# Get the AV number (aid) from the BV number
def get_aid(bv):
    url_1 = 'https://api.bilibili.com/x/player/pagelist?bvid={}'.format(bv)

    response = get_text(url_1)
    cid = response['data'][0]['cid']  # get the cid

    url_2 = 'https://api.bilibili.com/x/web-interface/view?cid={}&bvid={}'.format(cid, bv)
    response_2 = get_text(url_2)

    aid = response_2['data']['aid']  # get the aid
    return aid


# Get the cover image from the AV number (aid)
def get_image(aid):
    url_3 = 'https://api.bilibili.com/x/web-interface/view?aid={}'.format(aid)
    response_3 = get_text(url_3)
    image_url = response_3['data']['pic']  # get the image download link
    image = requests.get(url=image_url, headers=random_ua()).content  # download the image bytes
    return image


# Save the cover to disk
def download(image, file_name):
    with open(file_name, 'wb') as f:  # the with block closes the file automatically
        f.write(image)


def main():
    k = 'Y'
    while k == 'Y':  # keep looping for as long as the user wants
        path = path_creat()  # create the folder for saved covers
        bv = input("Please enter the video's BV number: ")
        image_name = input("Please give the cover a name you like: ")
        aid = get_aid(bv)
        image = get_image(aid)
        file_name = path + '{}.jpg'.format(image_name)
        download(image, file_name)
        print("Cover extraction completed ^_^")
        k = input("Press Y to extract another cover, or Q to quit: ").strip().upper()


if __name__ == '__main__':
    main()

The code can be copied and run directly. If it helps you, remember to give it a like, which is also the greatest encouragement for the author. Any shortcomings can be pointed out and discussed in the comments section.
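
One detail worth noting: the script always saves the cover with a .jpg extension, whatever the cover link actually points to. A possible refinement, sketched below, is to reuse the extension found in the image URL. The helper name cover_file_name, the folder name and the example URLs are all hypothetical.

```python
import os
from urllib.parse import urlparse


def cover_file_name(folder, name, image_url, default_ext=".jpg"):
    """Build the save path, reusing the extension found in the cover URL."""
    # urlparse(...).path drops any query string before the extension is read
    ext = os.path.splitext(urlparse(image_url).path)[1].lower()
    return os.path.join(folder, name + (ext or default_ext))


print(cover_file_name("covers", "dance", "https://example.com/a/b/cover.png"))
print(cover_file_name("covers", "dance", "https://example.com/a/b/cover"))
```

When the URL carries no extension, the helper falls back to .jpg, which matches the behavior of the original script.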

Running results: a pretty girl, here you go

  • Tested with the video whose BV number is BV1C5411P7qM.

Lossless image enlargement

Online tool: https://bigjpg.com/zh

This tool works online: you can upscale your picture and apply noise reduction. If you're interested, try it yourself; I think the result is decent.

Reference articles

Reference article 1: Crawling Bilibili covers with Python

Reference article 2: bilibili's new BV number API

Author: Stars in the distance  CSDN: https://blog.csdn.net/qq_44921056

This article is for learning and exchange only. Reproduction without the author's permission is prohibited, let alone use for other purposes; violators will be held accountable.

copyright notice
author[Distant stars],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/175/20210821105037106b.html
