
Python Crawler 120 Cases, Case 58: Mobile App Crawling, Preparing the "Arsenal", and Testing the Pipi Shrimp App

2022-02-02 10:03:50 InfoQ

Starting with this post, we move into collecting data from mobile apps. In this part we mainly analyze each app's core interface and write code that gets past it.

I hope that, after the previous 57 crawler posts, you can already collect data using a variety of methods.

Packet capture tool: Fiddler

The biggest difference between collecting data from mobile apps and crawling websites is that you need to capture the app's interface addresses. Since we don't have Chrome's developer tools to help us, we use Fiddler to capture packets.

With any piece of software, getting it to work properly is 90% of the job.

Fiddler is paid software (official website:  ). Buying it is recommended. If you don't want to pay, there is another route: the official release comes with a 30-day trial, and that is the version we will use for learning.

When downloading, choose the   version.
Download address:

Installation is simple: just keep clicking Next. When it finishes, the web page below appears. Don't close it, because we will use several of its configuration documents later.
The main interface looks like the figure below. I generally don't recommend switching to a localized (Chinese) build: there aren't many complex operations, and you get used to the English interface with practice.

One detail to note: when you open Fiddler, it changes the system HTTP proxy settings, so while Fiddler is running you may be unable to access websites normally, or access may slow down.
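You can see this proxy change from Python. A minimal sketch: `urllib.request.getproxies()` reads the proxy environment variables and, on Windows, falls back to the system (registry) settings that Fiddler rewrites while capturing. While Fiddler is capturing you would typically see `127.0.0.1:8888` in the output, assuming Fiddler's default listen port 8888 has not been changed.

```python
import urllib.request


def describe_system_proxy() -> dict:
    """Return the scheme -> proxy URL mapping Python currently sees."""
    # Environment variables are checked first; on Windows the registry
    # proxy settings (the ones Fiddler modifies) are used as a fallback.
    return urllib.request.getproxies()


if __name__ == "__main__":
    proxies = describe_system_proxy()
    if not proxies:
        print("No system proxy detected (Fiddler is probably not capturing).")
    for scheme, address in proxies.items():
        print(f"{scheme}: {address}")
```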

Fiddler captures packets by default.
If you don't want to capture, toggle File->Capture Traffic or the button at the bottom left; the shortcut key is F12. By default it captures requests from   sites; how to configure   is explained later. Once capture is on, browse the Internet and you will get the page below, with the relevant fields marked in the figure.

Make sure you remember what is in the figure above; it makes later study easier. Next, double-click any of the requests above and look at the right-hand panel; the figure below shows what it contains.

The feature used most when writing crawlers is  , which shows the data of the request and the response.
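Once a request and its response are on screen, a common next step is to copy the raw request text and turn it into something a crawler can reuse. The sketch below uses a hypothetical helper, `parse_raw_request` (not part of Fiddler or any library), that splits a raw HTTP/1.1 request into method, URL, a headers dict and body. The sample request is made up for illustration.

```python
def parse_raw_request(raw: str):
    """Split a raw HTTP request into (method, url, headers, body)."""
    # Normalise line endings so text copied on Windows or Unix both work.
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    request_line, *header_lines = head.strip().split("\n")
    method, url, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:  # skip anything that is not a "Name: value" pair
            headers[name.strip()] = value.strip()
    return method, url, headers, body


# Made-up sample of a proxied request as it might be copied from the raw view.
SAMPLE = """GET https://api.example.com/comment/list?item_id=123 HTTP/1.1
Host: api.example.com
User-Agent: okhttp/3.12.1
Accept-Encoding: gzip

"""

method, url, headers, body = parse_raw_request(SAMPLE)
print(method, url)
print(headers["User-Agent"])
```

The resulting `headers` dict can be passed straight to whichever HTTP library you use, which saves retyping the app's headers by hand.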

Other features, briefly:

  • Statistics: view performance and data analysis for HTTP requests;
  • AutoResponder: intercept requests that match specified rules, using either strings or regular expressions; a matched request is answered with a local version instead;
  • Composer: build a custom request and send it to the server; you can write it from scratch or drag an existing request over;
  • Filters: request filtering rules;
  • Timeline: request and response timing.
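To make the AutoResponder idea concrete, here is a toy re-implementation of its matching step in Python. It only illustrates "match by string or by regular expression, then answer with a local file"; it is not Fiddler's actual code. The `regex:` prefix is loosely modelled on Fiddler's rule convention, and the rules and file names are hypothetical.

```python
import re

# Hypothetical rules: (pattern, local file to serve instead of the real response).
RULES = [
    (r"regex:^https://api\.example\.com/comment/list\?.*offset=0", "comments_page0.json"),
    ("example.com/config", "local_config.json"),
]


def match_rule(url: str, rules=RULES):
    """Return the local file for the first rule that matches url, else None."""
    for pattern, local_file in rules:
        if pattern.startswith("regex:"):
            if re.search(pattern[len("regex:"):], url):
                return local_file
        elif pattern.lower() in url.lower():  # plain strings: case-insensitive substring
            return local_file
    return None


print(match_rule("https://api.example.com/comment/list?item_id=1&offset=0"))
print(match_rule("https://www.example.com/other"))
```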

We will put these features to use gradually.

Now the most important step: configuring Fiddler so that it can capture HTTPS requests.

From the menu, select  , then check Decrypt HTTPS Traffic as shown in the figure below to install a certificate.
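With decryption on, Fiddler re-signs HTTPS responses with its own root certificate, so any client you send through it, including a Python script, has to trust that root. A minimal standard-library sketch, assuming Fiddler listens on its default 127.0.0.1:8888 and that you have exported its root certificate to a PEM file; the file name `fiddler_root.pem` is hypothetical.

```python
import ssl
import urllib.request

FIDDLER_PROXY = "http://127.0.0.1:8888"  # assumption: Fiddler's default listen port


def build_opener(ca_file=None, proxy=FIDDLER_PROXY):
    """Build a urllib opener that sends HTTP and HTTPS traffic through Fiddler.

    ca_file: path to Fiddler's exported root certificate in PEM format.
    If omitted, the system trust store is used, and HTTPS requests will
    usually fail verification while Fiddler is decrypting them.
    """
    context = ssl.create_default_context(cafile=ca_file)  # verification stays on
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}),
        urllib.request.HTTPSHandler(context=context),
    )


# Usage (needs Fiddler running and the hypothetical exported certificate):
# opener = build_opener("fiddler_root.pem")
# print(opener.open("https://www.example.com", timeout=10).status)
```

Loading the certificate explicitly keeps verification switched on, which is safer than disabling certificate checks to make the capture work.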

If you still can't capture   requests after this step, the following 2 blog posts cover how to solve the corresponding problems.


Capturing mobile app traffic with Fiddler over a shared hotspot

This step requires the computer to have Wi-Fi turned on and its network shared. A desktop may not have a wireless network card and so cannot share its network (you can install an external one); laptops don't have this problem. Then connect the phone to the shared Wi-Fi, and configure the following in  .

Once that is confirmed, find the local   shown in the figure below in  ; in the example below, the   address is  . This address is very important: together with the port   configured above, it is what you will visit later to download a certificate to the phone.
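If you'd rather not hunt for the address, a short script can list the candidates. A minimal sketch, assuming the port above is Fiddler's default 8888: it asks the resolver for this machine's IPv4 addresses and prints the URL to open on the phone for each one. With several network adapters you will get several addresses; pick the one belonging to the shared-hotspot adapter, and note that on some systems the resolver may not report every adapter.

```python
import socket

FIDDLER_PORT = 8888  # assumption: Fiddler's default port; use the one you configured


def candidate_lan_ips():
    """List this machine's IPv4 addresses as reported by the resolver (loopback excluded)."""
    try:
        _, _, addresses = socket.gethostbyname_ex(socket.gethostname())
    except OSError:  # hostname could not be resolved
        addresses = []
    return sorted({ip for ip in addresses if not ip.startswith("127.")})


def cert_page_url(ip, port=FIDDLER_PORT):
    """URL to open in the phone's browser to reach Fiddler's certificate page."""
    return f"http://{ip}:{port}"


for ip in candidate_lan_ips():
    print(cert_page_url(ip))
```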

Connect the phone to this Wi-Fi and set the following proxy.

That's not all: you also need to download the certificate on the phone. Open   in the phone's default browser (make sure   is in capture mode). If the page won't open on the phone, restart  .

Once it opens, click the link underlined in red in the figure below to download the certificate.

Tapping the certificate downloaded in the browser won't install it. Instead, go to the phone's settings and find the entry below. The path differs from phone to phone, but a typical route is
Settings -> General settings -> Security and privacy -> More -> Encryption and credentials.
Then tap Install from SD card; you should find the certificate you just downloaded in the root directory. Tap it to install.

When installing the certificate, configure it as shown below.

At this point most of the work is done. If your Android version is below 7.0, you're finished; if it is 7.0 or higher, you need some additional configuration, because starting with Android 7.0 apps no longer trust user-installed certificates by default. If you're not sure whether capture works, open any app on your phone and check in   whether its   requests are decrypted.
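To check which side of the 7.0 line a USB-connected phone is on, a sketch like the following reads the version over adb. It assumes `adb` is installed and on PATH and that USB debugging is enabled; the helper names are hypothetical.

```python
import subprocess


def android_release(serial=None):
    """Read the Android version string (for example '10') from a device via adb."""
    cmd = ["adb"] + (["-s", serial] if serial else [])
    cmd += ["shell", "getprop", "ro.build.version.release"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()


def needs_extra_setup(release: str) -> bool:
    """True for Android 7.0 and later, where apps ignore user-installed CAs by default."""
    major = int(release.split(".")[0])
    return major >= 7


# Usage (needs a connected device):
# version = android_release()
# print(version, "extra setup needed" if needs_extra_setup(version) else "certificate alone should do")
```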

While setting up the environment, restart Fiddler now and then to make sure the configuration takes effect.

Using VirtualXposed + JustTrustMe to bypass SSL verification for packet capture

Download both tools from GitHub.

  • VirtualXposed
  • JustTrustMe

If you can't download them, you can get them directly from  . When I installed them, I found that the latest version of   no longer supports 32-bit apps; if you need to capture a 32-bit app, install the   package.

After transferring the files to the phone, install them all, then open  . Tap the button below and choose Add Application, enable   in module management, and then choose restart.

Next, open the app you just added inside  ; in this case, the Pipi Shrimp app. Turn on  , and the request below is captured successfully; this is the interface we are after.

Copy the interface address and open it in a local browser to get the Pipi Shrimp video comment data. …… (the rest is hidden)

Once you have the interface, the remaining logic is fairly simple. The steps are:

  • Analyze the interface parameters
  • Write the collection code
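Parameter analysis is mostly trial and error: remove one query parameter at a time and see whether the interface still answers. The standard library handles the bookkeeping. A minimal sketch; the URL and parameter names are hypothetical stand-ins, since the real interface is not reproduced here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_params(url):
    """Return (url without query, list of (name, value) query parameters)."""
    parts = urlsplit(url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base, parse_qsl(parts.query, keep_blank_values=True)


def without_param(url, name):
    """Rebuild url with every occurrence of one query parameter removed."""
    base, params = split_params(url)
    kept = [(k, v) for k, v in params if k != name]
    return f"{base}?{urlencode(kept)}" if kept else base


# Hypothetical captured URL.
captured = "https://api.example.com/comment/list?item_id=123&offset=0&device_id=abc&ts=1643767430"
base, params = split_params(captured)
for name, _ in params:
    # Each candidate URL is one experiment: request it and compare the response.
    print(without_param(captured, name))
```

Parameters whose removal leaves the response unchanged can be dropped; what remains is the simplified interface.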

After simplifying the parameters, the following interface format is obtained :

Crawler coding time

For app crawling, the hardest part is getting the interface. Once you have it and have analyzed it, and as long as there are no encrypted parameters, any of the libraries or frameworks covered in earlier posts will get the crawler code written.
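As one example of that final step, here is a standard-library sketch of a paginated collector. Everything about the interface is assumed for illustration: the `offset`/`count` paging parameters, the `data` -> `comments` -> `text` response layout and the `has_more` flag are hypothetical and must be replaced with whatever the captured interface actually uses.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode


def page_url(base, params, offset, count=20):
    """Build the URL for one page (hypothetical offset/count paging)."""
    return f"{base}?{urlencode({**params, 'offset': offset, 'count': count})}"


def extract_comments(payload):
    """Pull comment texts and the has_more flag out of one decoded JSON page."""
    data = payload.get("data", {})
    texts = [c.get("text", "") for c in data.get("comments", [])]
    return texts, bool(data.get("has_more"))


def crawl(base, params, max_pages=5, count=20, delay=1.0):
    """Fetch pages until the interface reports no more data or max_pages is hit."""
    results = []
    for page in range(max_pages):
        req = urllib.request.Request(
            page_url(base, params, offset=page * count, count=count),
            headers={"User-Agent": "Mozilla/5.0"},  # placeholder: copy the app's real headers
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            texts, has_more = extract_comments(json.load(resp))
        results.extend(texts)
        if not has_more:
            break
        time.sleep(delay)  # pause between pages to go easy on the server
    return results
```

Keeping URL building and response parsing in separate functions lets you test them against a saved response before pointing `crawl` at the live interface.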

Wrap-up time

== Starting with this post, we officially enter the mobile app crawler section, which will probably run to about 10 posts ==

Today is day 262 / 200 of my continuous writing streak. Feel free to follow me, like, comment on, and bookmark my posts.

Copyright notice
Author: InfoQ. Please include a link to the original when reprinting. Thank you.
