Python crawler lesson 2-9: a Chinese monster database. While running it, we found a "lust" (censored) monster category

2022-01-31 09:44:00 Dream eraser

I recently got a good book, "Chinese Monster Stories (Complete Collection)", and it struck me that building a website that catalogues Chinese monsters would be a lot of fun. Hence this article.

Analysis before writing the crawler

When writing crawlers, you usually start from a target site, analyze it, and find some way to get the data you want. Sometimes it goes the other way, as it did today: you come across an idea you like, scrape some basic data for it, and then build a website around that data in a language such as PHP or Java. It might even bring in decent traffic.

The data to scrape today is Chinese monsters. Short of compiling it all by hand, the important thing is to find a website that can serve as the data source, so I went straight to Baidu. Sure enough, someone had already built something that Eraser (dream.blog.csdn.net/), with his level of intelligence, would have struggled to think of …

There are not many sites about monsters, but there is one called Know the Demon. It has done exactly this interesting job of cataloguing monsters, so here is a thumbs-up for the people who had the idea before I did.

Now that we have found the target site, the rest is relatively simple. Let's start the analysis.

Let's first check how complete the data is. A line I often write in my blog is "as long as the human eye can see the data, a crawler can catch it". This site is maintained by an individual, so the data is fairly comprehensive, though the volume is not large: about 130 pages in total. Whenever I see a pager that shows a last page like this, I know the site is a promising target.

Finding the pagination URL pattern

Click through one or two pages and you can work out the basic pagination pattern.

https://www.cbaigui.com/page/4
https://www.cbaigui.com/page/3
https://www.cbaigui.com/page/130

As the addresses above show, the page number is simply a plain integer at the end of the URL.
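The pattern can be turned into a list of page addresses with a few lines of Python. This is only a minimal sketch: the helper name page_urls is made up for illustration, the total of 130 is the approximate figure mentioned earlier, and starting the numbering at page 1 is an assumption.

```python
# URL template taken from the addresses listed above
BASE_URL = "https://www.cbaigui.com/page/{page}"


def page_urls(last_page):
    """Return the list-page URLs from page 1 up to and including last_page."""
    # range() excludes its end value, hence last_page + 1
    return [BASE_URL.format(page=n) for n in range(1, last_page + 1)]


urls = page_urls(130)  # 130 is the approximate page count, not an exact value
print(urls[0])
print(urls[-1])
print(len(urls))
```

Building the list up front makes it easy to check the first and last address by eye before sending any requests.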

Write regular expressions

The goal this time is to get the monster data. Some redundant data is acceptable while crawling, so go straight to the page elements and see which data is valuable.

The area marked with a red box in the first screenshot holds the core data of the list pages. There are really two values to grab: the title, and the link behind the title. The link is needed to reach the detail page, where the tag area (marked with a red box in the second screenshot) lives. Why grab the tags? That depends on the person; I mainly want to collect the tags and then classify the monsters by them.

If you want more complete data, you can also grab some other information from the header of the detail page, which includes things such as the dynasty and the source of the monster.

Analysis complete. As Eraser sees it, the hardest work is done; what remains is writing the code and scraping.

Writing the crawler

One thing to note: the site appears to be run by an individual developer, so limit your crawl speed. Crawling too fast is bad for the site.
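One way to enforce such a limit is a small throttle that guarantees a minimum gap between requests. The sketch below is an illustration only: the Throttle class and its min_interval parameter are hypothetical names, and the 0.2 second interval is chosen just to keep the demo short. For a real crawl of a personal site, an interval of a second or more seems a safer choice.

```python
import time


class Throttle:
    """Ensure at least min_interval seconds pass between calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(min_interval=0.2)
start = time.monotonic()
for _ in range(3):
    # In a crawler, call wait() right before each HTTP request
    throttle.wait()
elapsed = time.monotonic() - start
print(f"3 throttled calls took {elapsed:.2f}s")
```

The first call returns immediately and each later call sleeps only for the time still missing, so slow responses do not add extra delay on top of the interval.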

Now the coding begins.

Start by writing and testing the patterns in a regular expression tool. Once the patterns are done, much of the code is effectively written.

This page uses two regular expressions. The first matches the title together with its link. The rules are as follows:

The first regular expression :

<h2 class="post-title">[.\s]*<a href="(.*?)" rel="bookmark">(.*?)</a>

The second regular expression :

<a href=".*?" rel="tag">(.*?)</a>

This series does not demand rigorous regular expressions. Good enough and easy to use is all we need.
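To see what the two patterns capture, here is a small sketch that runs them against a made-up HTML fragment. Only the two patterns come from the article. The fragment, the post title and the tag URLs are assumptions for illustration and are not copied from the real site.

```python
import re

# Hypothetical HTML, shaped the way the two patterns expect it
sample_html = '''
<h2 class="post-title">
    <a href="https://www.cbaigui.com/post-18153.html" rel="bookmark">Example monster</a>
</h2>
<a href="https://www.cbaigui.com/tag/tag-a" rel="tag">tag-a</a>
<a href="https://www.cbaigui.com/tag/tag-b" rel="tag">tag-b</a>
'''

# First pattern: captures (link, title) for every post heading
title_pattern = re.compile(
    r'<h2 class="post-title">[.\s]*<a href="(.*?)" rel="bookmark">(.*?)</a>')
# Second pattern: captures the text of every tag link
tag_pattern = re.compile(r'<a href=".*?" rel="tag">(.*?)</a>')

posts = title_pattern.findall(sample_html)
tags = tag_pattern.findall(sample_html)
print(posts)
print(tags)
```

With two capture groups, findall returns a list of (link, title) tuples; with one group it returns a plain list of strings. Note that inside square brackets the dot is a literal dot, so [.\s]* in practice just skips the whitespace between the two tags.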

Here is part of the code. The core is finished; the rest is up to you!

import re
import time

import requests

# Browser-like User-Agent so the site serves the normal HTML page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'}


def get_tags(url):
    """Fetch a detail page and print the tags found on it."""
    res = requests.get(url, headers=headers)
    pattern = re.compile(
        r'<a href=".*?" rel="tag">(.*?)</a>')
    tags = pattern.findall(res.text)
    print(tags)


def get_list(page):
    """Fetch one list page and visit every post linked from it."""
    url_format = "https://www.cbaigui.com/page/{page}"
    url = url_format.format(page=page)
    res = requests.get(url, headers=headers)
    pattern = re.compile(
        r'<h2 class="post-title">[.\s]*<a href="(.*?)" rel="bookmark">(.*?)</a>')
    posts = pattern.findall(res.text)
    for item in posts:
        # item is a (link, title) tuple; fetch the detail page for its tags
        get_tags(item[0])
        # Pause between requests to go easy on a personally run site
        time.sleep(1)


if __name__ == "__main__":
    total = int(input("Please enter the maximum page number: "))
    # range() excludes its end value, so add 1 to include the last page
    for i in range(1, total + 1):
        get_list(i)
    # get_tags("https://www.cbaigui.com/post-18153.html")


After running it, one of the tags in the output was "Lust". What on earth?

Curiosity got the better of me: I found the link, clicked it, and read through the related entries carefully. Very rewarding.

Crawler after-class notes

You can finish the full code yourself. What remains is data storage; for example, you can write the results to a csv file. Once a crawler has fetched the data, you will find a lot of fun in it. This case, for example, unexpectedly taught me quite a lot.
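For the storage step, a minimal sketch with the standard csv module might look like this. The function name write_monsters, the column names, the "|" separator for tags and the sample row are all assumptions for illustration; the article does not specify a file layout.

```python
import csv
import io


def write_monsters(fileobj, rows):
    """Write (title, link, tags) rows as CSV to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["title", "link", "tags"])
    for title, link, tags in rows:
        # Join the tag list into one cell; "|" is an arbitrary separator
        writer.writerow([title, link, "|".join(tags)])


# Hypothetical sample row, only to show the resulting layout
sample = [("Example monster",
           "https://www.cbaigui.com/post-18153.html",
           ["tag-a", "tag-b"])]

buffer = io.StringIO()  # in-memory stand-in for a real file
write_monsters(buffer, sample)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open("monsters.csv", "w", newline="", encoding="utf-8-sig"). The newline="" argument is what the csv module expects, and the utf-8-sig encoding tends to help spreadsheet programs display Chinese text correctly.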

copyright notice
author[Dream eraser],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/01/202201310943583144.html