
Advanced practical case: Python crawling against JavaScript obfuscation (anti-crawling)

2021-08-22 23:23:00 Python Programming

Preface

A long time ago I saw the [Yuanrenxue] web crawler attack-and-defense competition on the 52pojie forum. By the time I visited the official website, the competition was already over. The challenges looked interesting, but for various reasons I never got around to trying them. I have had some free time recently, so I opened the Yuanrenxue website and set out to finish the first challenge: JS obfuscation [the source code is garbled].

 

 

The challenge

 



Scrape the ticket prices on all 5 pages, compute the average of all ticket prices, and submit the answer.


The challenge looks simple: scrape the ticket prices and compute the average. But as soon as we open the browser's developer tools, we hit the first pitfall.

Disabling the breakpoint

When we open the developer tools, the page immediately pauses:

 


This is a debugger breakpoint that blocks everything we try to do next. No matter, it is easy to turn off.

 

Finding the data source

Next, click [Network] and press Ctrl+F5 to refresh the page. Execution is blocked again; clicking the same arrow icon as before resolves it.

 

 

 

Next, go back to [Network] and click [XHR]. One new request shows up.
Why click [XHR] rather than something else? The reason is simple: if you inspect the page a little, you will find that the ticket prices are fetched through an XHR request.

Some readers may ask what an XHR request is. In short, it is a request the page sends in the background through XMLHttpRequest, so data can be loaded without reloading the whole page.


Back to the point: let's look at that extra request.

 


Look at the Headers of this request: its URL is interesting, especially the part underlined in blue. We will set that aside for now and look at Preview.

That is even more interesting: as expected, the ticket prices are in this response.

 

 

Analyzing the URL

We have found both the link and the data. Let's visit that link directly (after a little time has passed). Sure enough, it returns an error:

{"error": "token failed"}


Let's take a look at this link

http://match.yuanrenxue.com/api/match/1?m=f289e3140053a9320c137b67e8723ba3%E4%B8%A81608971657

Anyone who writes crawlers regularly will recognize the trailing number [1608971657] as a timestamp.
In other words, to access this link and get the data, the request must carry a valid timestamp.
What counts as a valid timestamp must be tied to the string in front of it (m=f289e3140053a9320c137b67e8723ba3%E4%B8%A8).
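A quick way to check the timestamp guess is to convert the number to a date. This is a minimal sketch, assuming the value counts seconds since the Unix epoch (it has ten digits; a millisecond value would have thirteen). The variable names are only illustrative.

```python
from datetime import datetime, timezone

# Trailing digits taken from the m= parameter of the captured URL.
raw = "1608971657"

# Interpret the value as seconds since the Unix epoch, in UTC.
moment = datetime.fromtimestamp(int(raw), tz=timezone.utc)
print(moment.isoformat())
```

A date close to the time the request was captured supports the guess.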

If we keep refreshing the page, the string inside [m=...] changes over time; only [%E4%B8%A8] stays the same.

Anyone who has used Baidu search will recognize [%E4%B8%A8] as URL-encoded (percent-encoded) text. We only need to decode it to see what [%E4%B8%A8] really is.
Here we decode it with an online webmaster tool.

 


That was easy. We get its value ------> 丨 (yes, it is a Chinese character)
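The same decoding can be done without an online tool. The sketch below uses Python's standard urllib.parse.unquote on the m value from the captured link and then splits it on the decoded character; the variable names are illustrative, not part of the site's code.

```python
from urllib.parse import unquote

# The m= value exactly as it appears in the captured URL.
encoded_m = "f289e3140053a9320c137b67e8723ba3%E4%B8%A81608971657"

# unquote() percent-decodes the text, interpreting the bytes as UTF-8 by default.
separator = unquote("%E4%B8%A8")
decoded_m = unquote(encoded_m)

# Everything before the separator is a 32-character hex string,
# everything after it is the timestamp.
digest, timestamp = decoded_m.split(separator)
print(repr(separator), digest, timestamp)
```

Splitting this way makes the structure of m explicit: a 32-character hex digest, the 丨 separator, then the timestamp in seconds.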

Searching for the symbol

Following the hint in the challenge title [JS obfuscation, the source code is garbled], a clear idea comes to mind: search the page source for 丨.
Back on the page, right-click and choose [View page source].
Press Ctrl+F and search for the symbol; exactly one piece of code matches.
Looking closer, it is written inside a script tag.
Copy the whole script tag and open it in Notepad++.

<script>window.url='/api/match/1';request=function(){var timestamp=Date.parse(new Date()) + 100000000;var m=oo0O0(timestamp.toString())+window.f;var list={"page":window.page,"m":m+' 丨 '+timestamp/1000};$.ajax({url:window.url,dataType:"json",async:false,data:list,type:"GET",beforeSend:function(request){},success:function(data){data=data.data;let html='';let us_sign=`<div class="b-airfly"><div class="e-airfly"data-reactid=".1.3.3.2.0.$KN5911.0"><div class="col-trip"data-reactid=".1.3.3.2.0.$KN5911.0.0"><div class="s-trip"data-reactid=".1.3.3.2.0.$KN5911.0.0.0"><div class="col-airline"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.0"><div class="d-air"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.0.0:$0"><div class="air"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.0.0:$0.0"><span data-reactid=".1.3.3.2.0.$KN5911.0.0.0.0.0:$0.0.1"> United Airlines of China </span></div><div class="num"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.0.0:$0.1"><span class="n"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.0.0:$0.1.0">KN5911</span><span class="n"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.0.0:$0.1.1"> Boeing 737( in )</span><noscript data-reactid=".1.3.3.2.0.$KN5911.0.0.0.0.0:$0.1.2"></noscript></div></div><noscript data-reactid=".1.3.3.2.0.$KN5911.0.0.0.0.1"></noscript></div><div class="col-time"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1"><div class="sep-lf"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.0"><h2 data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.0.0">13:50</h2><p class="airport"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.0.1"><span data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.0.1.0"> Daxing International Airport </span><span data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.0.1.1"></span></p></div><div class="sep-ct"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.1"><div class="range"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.1.0">3 Hours 40 minute </div><div class="line"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.1.1"></div></div><div class="sep-rt"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.2"><noscript 
data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.2.0"></noscript><h2 data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.2.1">17:30</h2><p class="airport"data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.2.2"><span data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.2.2.0"> Bao'an Airport </span></p></div><noscript data-reactid=".1.3.3.2.0.$KN5911.0.0.0.1.3"></noscript></div></div></div><div class="col-price"data-reactid=".1.3.3.2.0.$KN5911.0.1"><p class="prc"data-reactid=".1.3.3.2.0.$KN5911.0.1.0"><span data-reactid=".1.3.3.2.0.$KN5911.0.1.0.0"><i class="rmb"data-reactid=".1.3.3.2.0.$KN5911.0.1.0.0.0">¥</i><span class="fix_price"data-reactid=".1.3.3.2.0.$KN5911.0.1.0.0.1"><span class="prc_wp"style="width:48px">price_sole</span></span></span></p><div class="vim"data-reactid=".1.3.3.2.0.$KN5911.0.1.1"><span class="v dis"data-reactid=".1.3.3.2.0.$KN5911.0.1.1.$0"></span></div></div><div class="col-fold"data-reactid=".1.3.3.2.0.$KN5911.0.2"><p class="fd"data-reactid=".1.3.3.2.0.$KN5911.0.2.0"> Retract </p></div></div><noscript data-reactid=".1.3.3.2.0.$KN5911.1"></noscript></div>`;let choice=[' China Southern Airlines ',' Lucky air ',' Okay air ',' Nine yuan air ',' Long Dragon Airlines ',' Eastern airlines ',' Air China ',' Shenzhen Airlines ',' Hainan Airlines ',' Spring airlines ',' Shanghai Airlines ',' Western Airlines ',' Chongqing Airlines ',' Tibet Airlines ',' United Airlines of China ',' Yunnan Xiangpeng airlines ',' Yunnan ying'an airlines ',' Xiamen Airlines ',' Tianjin Airlines ',' Shandong Airlines ',' Sichuan Airlines ',' China airlines ',' Great Wall Airlines ',' Chengdu Airlines has ',' Beijing Capital Airlines ',' Air China ',' Italian National Airlines ',' India Baijie Airlines ',' Vietnam Air ',' Far East Airlines ',' Air India ',' Jet Air India Ltd ',' Air Israel ',' Air Italia ',' Iran airlines ',' Eagle Air Indonesia ',' British Airways ',' Western Sky Airlines ',' Seagate ',' Spanish European Airlines ',' Spanish Airlines ',' China Southern Airlines ',' Lucky air ',' Okay air ',' 
Nine yuan air ',' Long Dragon Airlines ',' Eastern airlines ',' Air China ',' Shenzhen Airlines ',' Hainan Airlines ',' Spring airlines ',' Shanghai Airlines ',' Western Airlines ',' Chongqing Airlines ',' Tibet Airlines ',' United Airlines of China ',' Yunnan Xiangpeng airlines ',' Yunnan ying'an airlines ',' Xiamen Airlines ',' Tianjin Airlines ',' Shandong Airlines ',' Sichuan Airlines ',' China airlines ',' Great Wall Airlines ',' Chengdu Airlines has ',' Beijing Capital Airlines ',' Air China ',' Italian National Airlines ',' India Baijie Airlines ',' Vietnam Air ',' Far East Airlines ',' Air India ',' Jet Air India Ltd ',' Air Israel ',' Air Italia ',' Iran airlines ',' Eagle Air Indonesia ',' British Airways ',' Western Sky Airlines ',' Seagate ',' Spanish European Airlines ',' Spanish Airlines '];let op=1; let jic=[' Beijing Capital International Airport ',' Shanghai Hongqiao International Airport ',' Shanghai Pudong International Airport ',' Tianjin Binhai International Airport ',' Taiyuan Wusu Airport ',' Hohhot Baita Airport ',' Shenyang Taoxian International Airport ',' Dalian Zhoushuizi International Airport ',' Changchun Dafangshen Airport ',' Harbin Yanjiagang International Airport ',' Qiqihar Sanjiazi Airport ',' Jiamusi east suburb Airport ',' Xiamen Gaoqi International Airport ',' Fuzhou Changle International Airport ',' Hangzhou Xiaoshan International Airport ',' Hefei Luogang Airport ',' Ningbo Lishe airport ',' Nanjing Lukou International Airport ',' Guangzhou Baiyun International Airport ',' Shenzhen Bao'an International Airport ',' Changsha Huanghua Airport ',' Haikou Meiya Airport ',' Wuhan Tianhe Airport ',' Jinan Yaoqiang airport ',' Qingdao Liuting Airport ',' Nanning Wuxu Airport ',' Sanya Phoenix International Airport ',' Chongqing Jiangbei International Airport ',' Chengdu Shuangliu International Airport ',' Kunming Wujiaba International Airport ',' Kunming Changshui International Airport ',' Guilin Liangjiang International Airport ',' 
Xian Xianyang International Airport ',' Lanzhou Zhongchuan airport ',' Guiyang Longdongbao airport ',' Lhasa Gongga Airport ',' Urumqi diwobao airport ',' Nanchang Xiangtang Airport ',' Zhengzhou Xinzheng airport ',' Beijing Capital International Airport ',' Shanghai Hongqiao International Airport ',' Shanghai Pudong International Airport ',' Tianjin Binhai International Airport ',' Taiyuan Wusu Airport ',' Hohhot Baita Airport ',' Shenyang Taoxian International Airport ',' Dalian Zhoushuizi International Airport ',' Changchun Dafangshen Airport ',' Harbin Yanjiagang International Airport ',' Qiqihar Sanjiazi Airport ',' Jiamusi east suburb Airport ',' Xiamen Gaoqi International Airport ',' Fuzhou Changle International Airport ',' Hangzhou Xiaoshan International Airport ',' Hefei Luogang Airport ',' Ningbo Lishe airport ',' Nanjing Lukou International Airport ',' Guangzhou Baiyun International Airport ',' Shenzhen Bao'an International Airport ',' Changsha Huanghua Airport ',' Haikou Meiya Airport ',' Wuhan Tianhe Airport ',' Jinan Yaoqiang airport ',' Qingdao Liuting Airport ',' Nanning Wuxu Airport ',' Sanya Phoenix International Airport ',' Chongqing Jiangbei International Airport ',' Chengdu Shuangliu International Airport ',' Kunming Wujiaba International Airport ',' Kunming Changshui International Airport ',' Guilin Liangjiang International Airport ',' Xian Xianyang International Airport ',' Lanzhou Zhongchuan airport ',' Guiyang Longdongbao airport ',' Lhasa Gongga Airport ',' Urumqi diwobao airport ',' Nanchang Xiangtang Airport ',' Zhengzhou Xinzheng airport '];if(window.page){}else{window.page=1}$.each(data,function(index,val){html+=us_sign.replace('price_sole',val.value).replace(' United Airlines of China ',choice[op*window.page]).replace(' Daxing International ',jic[parseInt(op*window.page/2)+1]).replace(' Bao'an Airport 
',jic[jic.length-parseInt(op*window.page/2)-1]);op+=1});$('.m-airfly-lst').text('').append(html)},complete:function(){},error:function(){alert(' Data pull failed . It may have triggered the risk control system , If you are visiting normally , Please use Google browser traceless mode , And calibrate the system time of the computer and try again ');alert(' Born as a worm , I apologize , Please refresh the page , See if the problem exists ');$('.page-message').eq(0).addClass('active');$('.page-message').removeClass('active')}})};request()</script>


After pasting it into Notepad++, you will find that all the code is squeezed onto a single line.
Don't panic: the JSFormat command of the Notepad++ plugin JSTool formats JS code. The result is as follows.
(If you do not have the JSTool plugin, you can install it from within Notepad++.)

 

The contents of the ajax call can be ignored. Removing the extra code gives a simplified version:

<script>
request = function () {
    var timestamp = Date.parse(new Date()) + 100000000;
    var m = oo0O0(timestamp.toString()) + window.f;
    var list = {
        "page": window.page,
        "m": m + '丨' + timestamp / 1000
    };
};
request()
</script>


The contents of var list can be set aside: it mainly holds the page number and the m value. So we can simplify once more:

var timestamp = Date.parse(new Date()) + 100000000;
var m = oo0O0(timestamp.toString()) + window.f;


Does the variable m look familiar? Let's look at the earlier link again:

http://match.yuanrenxue.com/api/match/1?m=f289e3140053a9320c137b67e8723ba3%E4%B8%A81608971657

A bold guess: the value after m= must be produced by this code.
So let's look at what follows var m:

  • oo0O0() must be a function.
  • timestamp.toString(): anyone with basic programming experience can guess that this converts the numeric timestamp into a string.
  • window.f is the property f on the window object. What f holds, we do not know yet.

Clearly we have a new lead: the oo0O0() function.
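For readers more at home in Python, the timestamp line of the script can be mirrored as follows. This is a rough sketch, not the site's code: Date.parse(new Date()) goes through the date's string form, so it yields milliseconds with second precision (the last three digits are zero), and the script then adds 100000000 ms. The function name js_style_timestamp is hypothetical.

```python
import time


def js_style_timestamp(offset_ms: int = 100_000_000) -> int:
    """Mimic Date.parse(new Date()) + offset_ms from the page script."""
    # Whole seconds scaled to milliseconds: the last three digits stay zero,
    # just like the value Date.parse() produces from a Date's string form.
    now_ms = int(time.time()) * 1000
    return now_ms + offset_ms


timestamp = js_style_timestamp()
# str(timestamp) is what oo0O0() receives; timestamp // 1000 is what ends up in the URL.
print(str(timestamp), timestamp // 1000)
```

Because the last three digits are always zero, dividing by 1000 gives a whole number, which matches the integer timestamp seen at the end of the link.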

Analyzing the oo0O0() function

Go back to the page source and search for [oo0O0].
There are two matches: one must be the definition of the function, the other the call to it.
Copy the code that defines the function.

oo0O0() function

 

Don't rush to study the code inside. First look at the return value of the function: it is an empty string.


That means [ var m = window.f ]:

var m = oo0O0(timestamp.toString()) + window.f;
var m = window.f


So m depends only on window.f. But what is window.f, and where do we find it?

Finding window.f

How do we find [window.f]? The first idea is to search the page source, but the result is disappointing: there is only one match, the line we just looked at.

It is not assigned anywhere in the source, yet [window.f] is used, so it must be assigned indirectly. Where?

Let's take another look at this line of code

var m = oo0O0(timestamp.toString()) + window.f;


oo0O0() is executed but returns an empty value, yet it contains a lot of code. The developers would not write all that for nothing.
We cannot rule out that the code is only there to mislead us, but there must be something in it.

Anyone who does reverse engineering regularly will be familiar with eval(), and it appears inside oo0O0().
In JS, eval() evaluates a string and executes it as JavaScript code.
Let's look at the eval() part of oo0O0():

eval(atob(window['b'])[J('0x0', ']dQW')](J('0x1', 'GTu!'), '\x27' + mw + '\x27'));


eval() in turn calls atob().
atob() decodes a base-64 encoded string.
Copy [atob(window['b'])], run it in the developer tools Console, and you get the following code:

 


Copy this code and format it in Notepad++, and you get an MD5 hash implementation. Here is part of the code:

 

 

You can clearly see that [window.f] is there, assigned by calling hex_md5(). But a new question arises: what is [mwqqppz]?
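As a side note, the atob() step has a direct counterpart in Python's base64 module. The sketch below is only an illustration: the real window['b'] is a long base-64 string containing a whole MD5 implementation, so a one-line stand-in (an assumption modeled on the assignment described above) is encoded and decoded instead.

```python
import base64

# Hypothetical stand-in for the decoded script; the real payload is far longer.
stand_in_source = "window.f = hex_md5(mwqqppz)"

# Pretend this is the value stored in window['b'].
window_b = base64.b64encode(stand_in_source.encode("utf-8")).decode("ascii")

# Python counterpart of atob(window['b']): base-64 decode back to text.
decoded = base64.b64decode(window_b).decode("utf-8")
print(window_b)
print(decoded)
```

With the real value copied from the page, the same b64decode call would be one way to inspect the hidden script outside the browser.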

Finding mwqqppz

Let's look again at the eval snippet mentioned above:

eval(atob(window['b'])[J('0x0', ']dQW')](J('0x1', 'GTu!'), '\x27' + mw + '\x27'));


Besides atob(window['b']), the eval call also contains:

  • J('0x0', ']dQW')
  • J('0x1', 'GTu!')
  • '\x27' + mw + '\x27'

Let's run the expressions listed above in the developer tools Console:

 


The tool reports an error: [J is not defined].
That is because some of the code inside oo0O0() has not been executed yet.
Copy all the code in oo0O0() that sits above the eval call and run it in the Console.

The code above the eval call


After running it, some output may appear, which we can ignore. Then run J('0x0', ']dQW') again and you get the result:

 


Next, run J('0x1', 'GTu!') and '\x27' + mw + '\x27' in turn, and you get the following results:

 

The familiar mwqqppz shows up.

However, the tool reports another error: [mw is not defined].
That is because [mw] is a formal parameter of oo0O0().
Looking back at the earlier code makes this clear:

var timestamp = Date.parse(new Date()) + 100000000;
var m = oo0O0(timestamp.toString()) + window.f;
function oo0O0(mw) { ... }


So [mw] is the timestamp string that is passed in.

What about ['\x27']? Just run it in the Console:
'\x27' = ' (yes, a single quotation mark)
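Python happens to use the same \x hex escape, so the result can be double-checked there as well. A small sketch, with mw set to the timestamp string used elsewhere in this article:

```python
# "\x27" is the character with hex code 0x27, i.e. the single quotation mark.
quote = "\x27"
print(quote == "'", hex(ord(quote)))

# The expression '\x27' + mw + '\x27' therefore wraps mw in single quotes.
mw = "1608971657000"
wrapped = "\x27" + mw + "\x27"
print(wrapped)
```

So the third argument of the eval snippet is simply the timestamp written as a quoted string literal.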

Translating the eval() call

Now that we know what the odd symbols in the eval() call mean, we can translate the snippet:

eval(atob(window['b'])[J('0x0', ']dQW')](J('0x1', 'GTu!'), '\x27' + mw + '\x27'));
eval(atob(window['b'])["replace"]("mwqqppz", "'" + mw + "'"));


Maybe some readers still do not understand this line. Written with ordinary dot notation, which reads much like a Python method call, it becomes:

eval(atob(window['b']).replace("mwqqppz", "'" + mw + "'"));


It is actually easy to understand: inside the code returned by [atob(window['b'])], [mwqqppz] is replaced with the value of [mw] wrapped in single quotes.
In other words, [mwqqppz] is replaced with the timestamp string.
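The replacement step can be sketched in Python as well. This is an illustration under assumptions: the one-line script is a stand-in for the much longer decoded code, and patch_script is a hypothetical helper name. JavaScript's replace() with a string pattern only touches the first match, so the Python call passes a count of 1 to behave the same way.

```python
def patch_script(decoded_script: str, mw: str) -> str:
    """Replace the first 'mwqqppz' placeholder with mw as a quoted literal."""
    # count=1 mirrors JavaScript's String.replace() with a string pattern,
    # which replaces only the first occurrence.
    return decoded_script.replace("mwqqppz", "'" + mw + "'", 1)


# Stand-in for the decoded script and the timestamp string passed to oo0O0().
patched = patch_script("window.f = hex_md5(mwqqppz)", "1608971657000")
print(patched)
```

The patched text is what eval() finally executes, which is how window.f ends up holding the hash of the timestamp string.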

Where the value of window.f comes from

If you have read this far, you should have a general idea of where window.f gets its value. Step by step:

var timestamp = Date.parse(new Date()) + 100000000;
var m = oo0O0(timestamp.toString()) + window.f;
var m = window.f;
var m = hex_md5(mwqqppz);
var m = hex_md5(timestamp);


With the code above, the logic is clear (if you are still confused, go back and read from the beginning).
Now let's use the GuiGui JS debugging tool to test whether our analysis is right.
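Before testing, the whole chain can be summarized as a small Python sketch. One caveat: this article runs the site's own hex_md5 code rather than a library, and whether that code matches standard MD5 is not established here, so the hash function is passed in as a parameter. The names build_m and standard_md5 are hypothetical, and hashlib.md5 is only a stand-in default.

```python
import hashlib
from typing import Callable

SEPARATOR = "\u4e28"  # the 丨 character found in the URL


def standard_md5(text: str) -> str:
    """Standard MD5 hex digest; an assumed stand-in for the site's hex_md5."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_m(timestamp_ms: int, hex_md5: Callable[[str], str] = standard_md5) -> str:
    """Assemble m as hex_md5(timestamp string) + separator + timestamp in seconds."""
    return hex_md5(str(timestamp_ms)) + SEPARATOR + str(timestamp_ms // 1000)


m = build_m(1608971657000)
print(m)
```

If the printed digest differs from the one in the captured link, the site's hex_md5 is not plain MD5, and a callable that runs the site's JS should be passed as hex_md5 instead.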

Verifying the answer

First, take the code we got by running [atob(window['b'])] in the Console and copy it into the GuiGui JS debugging tool.
Then write a function to verify the answer:

function get_cipher() {
    timestamp = '1608971657000';
    f = hex_md5(timestamp);
    return f;
}


The timestamp here comes from the link at the beginning of the article:

http://match.yuanrenxue.com/api/match/1?m=f289e3140053a9320c137b67e8723ba3%E4%B8%A81608971657

Run the code and you will see that the hashed timestamp is exactly the same as the one in the link.


Now we can happily write Python code to crawl the page data!

 

The final crawler and the answer

With the steps above, we can write a JS file that generates the ciphertext appended to the link.
In Python, the third-party library execjs can execute this JS file and return the ciphertext.

The code in the JS file


Without further ado, here is the crawler code. It is simple and has a few comments, so I will not go into detail.

# @BY     :Java_S
# @Time   :2020/12/25 9:10
# @Slogan : Firm enough and hard enough, someone will knock on the door; don't be afraid that no one will appreciate it, just like 30-year-old Van Gogh

import requests
import execjs
import time


def get_md5_value():
    # Read the JS file that generates the ciphertext
    with open(r'JS/jsConfuse.js', encoding='utf-8', mode='r') as f:
        JsData = f.read()
    # Compile the JS and call get_cipher() to get its return value
    psd = execjs.compile(JsData).call('get_cipher')
    # URL-encode the separator character
    psd = psd.replace('丨', '%E4%B8%A8')
    return psd


def get_data(page_num, md5):
    url = f'http://match.yuanrenxue.com/api/match/1?page={page_num}&m={md5}'
    headers = {
        'Host': 'match.yuanrenxue.com',
        'Referer': 'http://match.yuanrenxue.com/match/1',
        'User-Agent': 'yuanrenxue.project',
    }
    response = requests.get(url, headers=headers)
    return response.json()


if __name__ == '__main__':
    sum_num = 0
    index_num = 0
    for page_num in range(1, 6):
        # Generate a fresh m value for every page request
        info = get_data(page_num, get_md5_value())
        price_list = [i['value'] for i in info['data']]
        print(f'Price list on page {page_num}: {price_list}')
        sum_num += sum(price_list)
        index_num += len(price_list)
        time.sleep(1)

    average_price = sum_num / index_num
    print(f'Average ticket price: {average_price}')
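The averaging logic can be tried offline, without hitting the site. The sketch below assumes each page's response has the shape the crawler relies on, a "data" list whose items carry a "value" field; the sample numbers are made up for illustration and average_price is a hypothetical helper.

```python
def average_price(pages):
    """Average every 'value' found in the 'data' list of each page response."""
    prices = [item["value"] for page in pages for item in page["data"]]
    return sum(prices) / len(prices)


# Made-up responses in the same shape as the API's JSON.
sample_pages = [
    {"data": [{"value": 100}, {"value": 200}]},
    {"data": [{"value": 300}]},
]
print(average_price(sample_pages))
```

Note that the average is taken over all tickets together, not as an average of per-page averages, which is also what the crawler above does.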


copyright notice
author[Python Programming],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2021/08/20210822232251009b.html
