2022-02-01 16:46:21 Cui Qingcai

Hello everyone! I'm Cui Qingcai.

Today I have some good news for you: 《Python3 Web Crawler Development in Practice (Second Edition)》 is now on sale!

That's right, this is the book:

In May 2018, the first edition of my book 《Python3 Web Crawler Development in Practice》 was published. In the three-plus years since its release it has sold about 100,000 copies. Thank you very much for your support. Later, because of changes in the technology, I started planning a second edition.

Now, in November 2021, after repeated rounds of revision and review, the book is finally on sale today!

In recent months I have received a great many messages from readers asking when the second edition would come out. I'm really sorry to have kept you waiting.

That's right, today is the day. Here it comes!

What's new in the second edition

Your first question may be: what has been updated in the second edition compared with the first?

Technology keeps developing, and crawler technology is no exception: it evolves through the constant contest between crawling and anti-crawling. More and more web pages now use protective measures such as minification and obfuscation of front-end code, encryption of API parameters, and WebDriver detection, so to crawl data efficiently we need to know some JavaScript reverse engineering. The same is true of apps: packet-capture protection, packing, moving logic into native code, and risk-control detection make more and more app data hard to crawl, so we also have to understand reverse engineering tools such as Xposed, Frida, and IDA Pro. In addition, deep learning and artificial intelligence have developed rapidly in recent years, and crawlers can be combined with them, for example in CAPTCHA recognition based on deep learning and in intelligent parsing and extraction of web page content. The technologies for managing and operating large-scale crawlers are also developing: cloud native technologies such as Kubernetes, Docker, and Prometheus are now very popular, and so are crawler management and operations solutions built on them. However, the first edition of the book barely mentioned any of these newer technologies.

Besides, the first edition used many real sites and services as examples when explaining data crawling, such as the Maoyan movie site, Taobao, and proxy service websites. As the years passed, some of these sites and services were redesigned or stopped being maintained, so many cases in the first edition no longer run properly. This is a big problem: code that does not run greatly reduces a learner's enthusiasm and sense of achievement, and it wastes a lot of time. Even if the crawler code for a case is updated promptly, we cannot know when the site or service will change again, because that is outside our control. So, to solve the problem once and for all, I spent nearly half a year building a crawler case platform. It contains dozens of crawler cases, including server-side rendered (SSR) websites, single-page application (SPA) websites, websites with various anti-crawling measures, CAPTCHA websites, simulated login websites, and various types of apps, and it covers most of today's crawling and anti-crawling techniques. I maintain the whole platform myself, and almost all cases in the book come from it, which solves the problem of pages being redesigned.

So, compared with the first edition, the second edition has the following updates:

  • Most cases have been migrated to the self-built case platform, so there is no need to worry about cases expiring or being redesigned in the future.
  • Replaced the original Chapter 1 on environment installation: the environment configuration instructions are consolidated and moved to the case platform and referenced from the book as external links, so that the configuration and installation instructions can be updated promptly.
  • Added introductions to newer request libraries, parsing libraries, and storage libraries, such as httpx, parsel, and Elasticsearch.
  • Added an introduction to asynchronous crawlers, such as the basic principles of coroutines and the use of aiohttp, with a hands-on crawling example.
  • Added introductions to newer automation tools, such as Pyppeteer and Playwright.
  • Added content related to deep learning, such as recognition schemes for image CAPTCHAs and slider CAPTCHAs.
  • Expanded the simulated login chapter, for example with an introduction to JWT-based simulated login and a hands-on example, and optimizations for a large-scale account pool.
  • Added a JavaScript reverse engineering chapter, covering website encryption and obfuscation techniques, JavaScript reverse engineering and debugging techniques, various ways of simulating JavaScript execution, restoring obfuscated code with an AST, WebAssembly, and related technologies.
  • Expanded the chapter on automated app crawling, for example with an introduction to the newer framework Airtest and to phone group control and cloud phone technology.
  • Added an Android reverse engineering chapter, covering decompilation, disassembly, hooking, unpacking, and the analysis and simulated execution of so files.
  • Added a chapter on intelligent web page parsing, including content extraction algorithms and classification algorithms for list pages and detail pages.
  • Expanded the Scrapy chapters, for example with integration with Pyppeteer, RabbitMQ, and Prometheus.
  • Added crawler management and operations solutions based on cloud native technologies such as Kubernetes, Docker, Prometheus, and Grafana.

These are the main updates in the second edition.

Chapter overview

To give you a more direct view of what the book contains, here is the table of contents:

That's right! The whole book runs to more than 900 pages. I measured it at 4.3 cm thick, and the price is 139.8 yuan.

Can I read the second edition directly?

Some readers will naturally wonder: do I need to study the first edition before I can move on to the second?

The answer is: you can go straight to the second edition. Its coverage of crawler knowledge is complete on its own, some outdated technologies from the first edition have been removed, and it is a fresh upgrade of the whole body of crawler knowledge.

Can I learn without any background?

Some readers may also ask: can I learn this with no background in crawlers or Python?

The answer is: yes. This book is written for readers with no crawler background at all. It starts from the most basic environment configuration and fundamentals and introduces each crawler topic step by step, so there is no need to worry about being unable to follow without crawler basics. If you have no Python background, that is fine too (although having some would of course be better). The book also covers configuring the Python environment and points to some introductory Python learning materials (links), and it explains things through Python code snippets. Many of the cases are easy to follow, and you will gradually pick up Python as you learn crawling.

Recommendations from experts

The book was also recommended by the father of Python (that's right, Python's creator, Guido van Rossum). In addition, I was honored to receive recommendations from Zeng Wenfeng, vice president of the Microsoft Asia Internet Engineering Institute; Liang Bin (penny), a well-known crawler expert; and Song Ruihua, associate professor at the Gaoling School of Artificial Intelligence, Renmin University of China.

Here are the recommendations:


In addition, the editor made several color pages for the book that serve as a promotional introduction to the whole book. Take a look:

Is there an electronic version ?

Having read this far, you may also ask: is there an electronic version? Some readers are used to studying from e-books, and some readers overseas may find it inconvenient to buy the print book, so they would like an electronic version.

But I'm sorry to say: there is no electronic version.

As you know, if there were an electronic version, pirated copies would soon appear and be spread maliciously across the Internet.

So, to protect copyright, this book has no electronic version.

