
Python Crawler 120 Cases, Case 58: Mobile App Crawlers, Preparing the "Arsenal" and a Test on the Pipi Shrimp App

2022-02-02 10:03:50 InfoQ

Starting with this post, we move into mobile app data collection. This part of the series focuses on analyzing an app's core interfaces and writing code that gets data out of them.

By now, having worked through the previous 57 posts in this crawler series, you should be able to collect data with a variety of methods.

Packet capture tool: Fiddler

The biggest difference between crawling a mobile app and crawling a website is that you have to capture the addresses of the app's interfaces yourself. There are no Chrome developer tools to lean on, so we use Fiddler to capture the packets.

With any piece of software, getting it to run properly is 90% of the job.

Fiddler is paid software. Official website:
https://www.telerik.com/fiddler
Buying it is recommended; if you don't want to pay, you can look for another route. The official version comes with a 30-day trial, and that is the version we use for learning.

When downloading, the Classic version is all you need.

Installation is simple: basically keep clicking Next. When it finishes, a web page opens. Don't close it, since several of the configuration documents on it are used later.

As for the user interface, installing a Chinese localization is generally not recommended: there are few complex operations, and the English interface becomes familiar with use.

One detail to note: when Fiddler starts, it changes the system HTTP proxy settings. While Fiddler is open, you may therefore be unable to reach websites normally, or access may slow down.
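To see this proxy change from Python, the standard library can report the proxy settings it picks up from the system. The sketch below is only an illustration: the helper name `points_at_local_proxy` is made up here, and it assumes Fiddler is listening on its default port 8888 on the local machine.

```python
from urllib.parse import urlsplit
from urllib.request import getproxies

FIDDLER_PORT = 8888  # assumption: Fiddler's default listening port is unchanged


def points_at_local_proxy(proxies, port=FIDDLER_PORT):
    """Return True if any proxy entry targets this machine on the given port."""
    for scheme, address in proxies.items():
        if scheme == "no":  # the no_proxy exclusion list, not a proxy address
            continue
        # Entries may or may not carry a scheme, e.g. "127.0.0.1:8888"
        # or "http://127.0.0.1:8888"; normalise before parsing.
        if "://" not in address:
            address = "http://" + address
        try:
            parts = urlsplit(address)
            host, proxy_port = parts.hostname, parts.port
        except ValueError:  # malformed entry, ignore it
            continue
        if host in ("127.0.0.1", "localhost") and proxy_port == port:
            return True
    return False


if __name__ == "__main__":
    system_proxies = getproxies()  # reads the OS / environment proxy settings
    print(system_proxies)
    print("Routed through a local proxy:", points_at_local_proxy(system_proxies))
```

Running it once with Fiddler open and once with it closed should show the difference, provided your system reports its proxy settings this way.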

Fiddler captures packets by default. If you don't want it to capture, toggle it with File -> Capture Traffic or the button in the lower-left corner; the shortcut key is F12.

![Python Crawler 120 Cases, Case 58: mobile app crawlers, preparing the "arsenal" and a test on the Pipi Shrimp app](https://img-blog.csdnimg.cn/7730972bcb0548f98f08d0b643cc91d7.png =200x)

By default only HTTP requests are captured; configuring HTTPS is covered later. With capture turned on, browse the web and the requests appear in the session list.

Get familiar with the columns of that list, because later steps rely on them. Next, double-click any request to view its details in the right-hand pane.

The tab used most when writing crawlers is Inspectors, which shows the content of the request and the response.

The other tabs, briefly:

  • Statistics: performance figures and data analysis for HTTP requests;
  • AutoResponder: intercepts requests that match specified rules, using string or regular-expression matching, and answers the hijacked request with a local file;
  • Composer: builds custom requests and sends them to the server; you can write one from scratch or drag an existing request into it;
  • Filters: request filtering rules;
  • Timeline: request and response timing.

These will all come into use gradually.
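The string-or-regex interception idea behind AutoResponder can be sketched in a few lines of Python. This is not Fiddler's implementation: the `regex:` prefix convention, the function names, and the local file names below are simplified assumptions for illustration only.

```python
import re


def rule_matches(rule, url):
    """Decide whether a URL matches an interception rule.

    Rule syntax here is a simplified assumption:
      "regex:<pattern>" -> regular-expression search
      anything else     -> case-insensitive substring match
    """
    prefix = "regex:"
    if rule.lower().startswith(prefix):
        return re.search(rule[len(prefix):], url) is not None
    return rule.lower() in url.lower()


def pick_local_response(rules, url):
    """Return the local file mapped to the first matching rule, else None."""
    for rule, local_file in rules:
        if rule_matches(rule, url):
            return local_file
    return None


# Hypothetical rules: the file names are placeholders.
rules = [
    (r"regex:/cell_comment/\?offset=\d+", "comments.json"),
    ("snssdk.com", "default.json"),
]
print(pick_local_response(rules, "https://is-hl.snssdk.com/bds/cell/cell_comment/?offset=10"))
```

Rules are checked in order and the first match wins, which is why the more specific regex rule is listed before the broad substring rule.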

Now for the most important step: configuring fiddler so that it can capture HTTPS requests.

From the menu bar, choose Tools -> Options -> HTTPS, then tick Decrypt HTTPS traffic; you will be prompted to install a certificate.

If HTTPS requests still cannot be captured after this step, the following 2 blog posts cover the usual problems.

- https://www.cnblogs.com/wsy0202/p/12404715.html
- https://blog.csdn.net/baidu_28647571/article/details/107554126

Capturing mobile app packets with fiddler over a shared hotspot

This step requires the computer to turn on Wifi and share its network. A desktop may have no wireless network card and therefore cannot share a network (unless you install an external card); laptops do not have this problem. Connect the phone to the shared Wifi, then adjust the connection settings in fiddler so that the phone can connect through it.
After confirming, look up the local IP in fiddler. In this example the IP address is 172.24.203.1. This address matters: together with port 8888, it is the address the phone visits later to download a certificate.

Connect the phone to this Wifi and set a manual proxy that points to that IP address and port.

That is not all: the certificate also has to be downloaded on the phone. Open http://172.24.203.1:8888 in the phone's default browser (make sure fiddler is in capturing state). If the page does not open on the phone, restart fiddler.
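When the page refuses to open, it helps to know whether the proxy port is reachable at all. The following is a minimal sketch that can be run on the computer itself or on another machine on the shared Wifi; `proxy_reachable` is a hypothetical helper name, and the address and port are the ones from this example, so yours will differ.

```python
import socket


def proxy_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, using the address and port from this post:
# proxy_reachable("172.24.203.1", 8888)
```

A `False` result only says that the port could not be reached from where the script ran; the cause may be fiddler not running, the firewall, or the network sharing itself.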

Once the page opens, tap the certificate download link on it.

The certificate downloaded by the browser cannot be installed by tapping it directly. Go to the phone's settings instead. The path differs from phone to phone; a typical route is Settings -> General settings -> Security and privacy -> More -> Encryption and credentials. Then tap Install from SD card, find the certificate you just downloaded in the root directory, and tap Install.

During installation, fill in the settings that the certificate dialog asks for.
At this point most of the work is done. If your Android system is below 7.0, you are finished. If it is higher than 7.0, some further configuration is needed. If you're not sure, open any app on the phone and then check in fiddler whether its HTTPS requests can be decrypted.
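The version rule above can be written down as a small check. This is a sketch with hypothetical function names; treating 7.0 itself as needing the extra steps is an assumption made here, based on Android 7.0 being the release where apps stopped trusting user-installed certificates by default.

```python
def parse_version(text, width=3):
    """Turn a version string such as "7.1.2" into a zero-padded tuple of ints."""
    parts = [int(piece) for piece in text.strip().split(".")]
    parts += [0] * (width - len(parts))  # "7" becomes (7, 0, 0)
    return tuple(parts)


def needs_ssl_bypass(android_version, threshold="7.0"):
    """Return True when the device is at or above the threshold version."""
    # Assumption: 7.0 itself is treated like the higher versions.
    return parse_version(android_version) >= parse_version(threshold)


for version in ("6.0.1", "7.0", "10"):
    print(version, needs_ssl_bypass(version))
```

The check is only a rule of thumb; the practical test described above, opening an app and looking for decrypted HTTPS requests in fiddler, remains the deciding one.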

While setting up the environment, restart fiddler from time to time to make sure the configuration takes effect.

Using VirtualXposed + JustTrustMe to bypass SSL verification and capture packets

Download the following two tools from GitHub.

  • VirtualXposed
    https://github.com/android-hacker/VirtualXposed/
  • JustTrustMe
    https://github.com/Fuzion24/JustTrustMe/

If the GitHub download fails, they can also be downloaded directly from the alternative address. While installing, I found that the latest version of VirtualXposed no longer supports 32-bit apps; to capture a 32-bit app, install the VirtualXposed_0.18.2.apk package instead.

Transfer the files to the phone and install both. Then open VirtualXposed, tap the button at the bottom and choose Add App, enable JustTrustMe under module management, and then choose restart.

After that, open the newly added app from inside VirtualXposed, in this case the Pipi Shrimp app, with fiddler running. The app's requests are now captured successfully, including the interface we are after.

Copy the interface address and open it in a local browser to get the Pipi Shrimp video comment data.

https://is-hl.snssdk.com/bds/cell/cell_comment/?offset=10&cell_type=1&api_version=1&cell_id=7023269838151751943…… (the rest is hidden)

With the interface in hand, the remaining logic is simpler. The steps are:

  • Analyze interface parameters
  • Write code to collect

After simplifying the parameters, the interface takes the following form:

https://is-hl.snssdk.com/bds/cell/cell_comment/?offset=10&cell_id=7023269838151751943&aid=1319&app_name=super
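Analyzing the parameters is easier once the address is split into its parts. The sketch below does that with the standard library for the simplified address above; `split_interface` is a hypothetical helper name. What each parameter means is not stated in this post, so reading `offset` as a paging position and `cell_id` as the video identifier is only a guess.

```python
from urllib.parse import parse_qsl, urlsplit

API_URL = (
    "https://is-hl.snssdk.com/bds/cell/cell_comment/"
    "?offset=10&cell_id=7023269838151751943&aid=1319&app_name=super"
)


def split_interface(url):
    """Split an interface URL into its base address and a dict of parameters."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    params = dict(parse_qsl(parts.query))  # values stay as strings
    return base, params


base, params = split_interface(API_URL)
print(base)
for name, value in params.items():
    print(f"{name} = {value}")
```

Removing one parameter at a time and re-requesting the address is a simple way to find out which ones the server actually requires.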

Crawler coding time

For an app crawler, the hardest part is getting hold of the interface. Once you have it and have analyzed it, and provided there are no encrypted parameters, any library or framework covered in the earlier posts can be used to write the crawler code.
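As one possible starting point, here is a minimal sketch using only the standard library. The function names are hypothetical, the User-Agent value is a placeholder, and stepping `offset` by 10 per page is an assumption; the structure of the returned JSON is not described in this post, so the sketch only decodes it and leaves the parsing open. The interface may also have changed or may reject requests without further headers.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://is-hl.snssdk.com/bds/cell/cell_comment/"


def build_comment_url(cell_id, offset=0):
    """Build the simplified comment interface URL for one page."""
    query = urlencode(
        {"offset": offset, "cell_id": cell_id, "aid": 1319, "app_name": "super"}
    )
    return f"{BASE_URL}?{query}"


def page_urls(cell_id, pages=3, page_size=10):
    """Return URLs for consecutive pages, assuming offset advances by page_size."""
    return [build_comment_url(cell_id, page * page_size) for page in range(pages)]


def fetch_json(url, timeout=10):
    """Request a URL and decode the response body as JSON."""
    # The User-Agent below is a placeholder assumption, not taken from the app.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


for url in page_urls("7023269838151751943", pages=2):
    print(url)  # pass each URL to fetch_json() to download that page
```

Before writing any parsing code, print the keys of one decoded response to see how the comments are nested.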

Closing time

Starting with this post, we officially enter the mobile app crawler part of the series, which will run to about 10 posts.

Today is day 262 / 200 of continuous writing. Feel free to follow me, like, comment, and bookmark.




copyright notice
author[InfoQ],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/02/202202021003482542.html
