
[Python crawler tricks] Crawling Sirius cinema movies without paying

2022-01-29 19:23:08 White and white I


Possible problems in multithreaded development

Suppose two threads, t1 and t2, both add 1 to a global variable g_num (initially 0). If t1 and t2 each add 1 to g_num 10 times, the final value of g_num should be 20.

But because the threads run at the same time, the following can happen:

When g_num=0, t1 reads g_num=0. The system then switches t1 to the "sleeping" state and t2 to the "running" state, and t2 also reads g_num=0.
t2 then adds 1 to the value it read and assigns the result to g_num, so g_num=1.
The system then switches t2 to "sleeping" and t1 to "running". Thread t1 adds 1 to the 0 it read earlier and assigns the result to g_num.
As a result, although t1 and t2 have each added 1 to g_num, the result is still g_num=1.
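
The interleaving described above can be replayed step by step without real threads. This is only an illustrative sketch: the names t1_seen and t2_seen are hypothetical and stand for the value each thread read before it was switched out.

```python
g_num = 0

# Each "thread" is modelled as two separate steps: read the value, then write it back.
t1_seen = g_num        # t1 reads 0, then is put to sleep
t2_seen = g_num        # t2 runs and also reads 0
g_num = t2_seen + 1    # t2 writes back 1
g_num = t1_seen + 1    # t1 wakes up and writes 0 + 1, overwriting t2's update

print(g_num)  # 1, although two increments were performed
```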

Test 1

import threading
import time

g_num = 0  # global variable shared by both threads


def work1(num):
    global g_num
    for i in range(num):
        g_num += 1  # read, add, write back: not a single atomic step
    print("----in work1, g_num is %d---"%g_num)


def work2(num):
    global g_num
    for i in range(num):
        g_num += 1
    print("----in work2, g_num is %d---"%g_num)


print("--- Before thread creation g_num is %d---"%g_num)

t1 = threading.Thread(target=work1, args=(100,))
t1.start()

t2 = threading.Thread(target=work2, args=(100,))
t2.start()

# Wait until only the main thread is left
while len(threading.enumerate()) != 1:
    time.sleep(1)

print("2 The end result of two threads operating on the same global variable is :%s" % g_num)



Running results:

 
--- Before thread creation g_num is 0---
----in work1, g_num is 100---
----in work2, g_num is 200---
2 The end result of two threads operating on the same global variable is :200
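
With only 100 additions per thread, the interleaving described earlier is unlikely to occur, so this run still prints 200. One common way to rule the problem out, whatever the loop count, is to guard the increment with a threading.Lock. The following is a minimal sketch under that assumption; g_lock and work are hypothetical names, not part of the original test.

```python
import threading

g_num = 0
g_lock = threading.Lock()  # hypothetical name: protects every update of g_num


def work(num):
    global g_num
    for _ in range(num):
        # Only one thread at a time can run the read-add-write sequence
        with g_lock:
            g_num += 1


threads = [threading.Thread(target=work, args=(100000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both workers instead of polling with sleep

print(g_num)
```

Because every increment happens while the lock is held, no update can be overwritten, and the total is always the sum of both loop counts.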



Now for the main part

Target site: tlvod.com/v-57381.htm… (Fast and Furious 9)

Note: this article has an accompanying video tutorial; follow me and send a private message to get it

Tools used

Development tool: PyCharm
Development environment: Python 3.7, Windows 10
Third-party libraries: requests, tqdm


Start playback, capture the network traffic, and look at the data. Looking closely, I found that the video is made up of ts segment files.

Seeing the ts files, it suddenly clicked: this is the legendary m3u8 video format.

Next, refresh the page and look for the m3u8 file.


Some readers may ask: how can we be sure this is the right one? Simple: copy one of the ts addresses and visit it.


Visiting it starts a new download task, but the file has no extension. Remember to add .ts when saving it.

Play the file after downloading. It plays without problems, but it is only a short clip.

Let's extract the addresses with regular expressions. (Note: for readers who are new to regular expressions, I have kept the patterns simple, even if a little ugly, so they are easy to understand. If they are still unclear, study the regular expression documentation first.)

import os
import re  # regular expressions

import requests  # third-party HTTP library used by the crawler
from tqdm import tqdm  # progress bar


def Tools(url):
    # Send a browser-like User-Agent to get past the site's anti-crawling check
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36 Edg/93.0.961.38'
    }
    response = requests.get(url, headers=headers)
    return response


def save(url, name):
    '''
    Save one ts segment file
    :param url: ts address
    :param name: ts name
    :return:
    '''
    response = Tools(url).content  # raw bytes of the segment
    os.makedirs('./video', exist_ok=True)  # make sure the output folder exists
    # 'a' appends to the file, 'b' reads and writes it as binary
    with open('./video/{}.ts'.format(name), 'ab') as f:
        f.write(response)


url = 'https://c1.monidai.com/20210907/SOPKxzCy/index.m3u8'
response = Tools(url).text
# Strip the playlist tags so that only the ts addresses remain
response = re.sub(r'#EXTM3U', '', response)
response = re.sub(r'#EXT-X-VERSION:\d*', '', response)  # \d matches a digit
response = re.sub(r'#EXT-X-TARGETDURATION:\d*', '', response)
response = re.sub(r'#EXT-X-MEDIA-SEQUENCE:\d*', '', response)
response = re.sub(r'#EXTINF:[\d.]+,', '', response)  # durations such as 4, 10 or 4.004
response = re.sub(r'#EXT-X-ENDLIST', '', response)
ts_url = response.split()
for link in tqdm(ts_url):
    name = link.split('/')[-1]  # get the ts file name
    save(link, name)  # store the ts segment file
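
As an alternative to removing each tag with its own pattern, the playlist can also be read line by line. In the m3u8 format, lines that start with # are tags or comments and the remaining non-blank lines are segment addresses. The sketch below relies on that rule; segment_urls is a hypothetical helper, and the sample playlist and example.com addresses are made up for illustration.

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return the absolute segment addresses listed in an m3u8 playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Skip blank lines and every tag or comment line
        if not line or line.startswith('#'):
            continue
        # Relative addresses are resolved against the playlist's own address
        urls.append(urljoin(playlist_url, line))
    return urls


# Made-up playlist used only to show the behaviour
sample = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXTINF:4.004,
seg0.ts
#EXTINF:12.5,
https://example.com/other/seg1.ts
#EXT-X-ENDLIST
"""

print(segment_urls(sample, 'https://example.com/video/index.m3u8'))
```

This version does not depend on which tags a particular playlist happens to contain, and it also handles playlists that list relative addresses.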


After the download finishes, these ts files need to be merged into one video.
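
One possible way to merge them in Python is to append the segment files to a single output file in playlist order. This is a sketch that assumes the segments are unencrypted and were saved as name.ts in one folder, as the save function above does; merge_ts and the output path are hypothetical.

```python
import os
import shutil


def merge_ts(names, src_dir, out_path):
    """Append the segments name.ts from src_dir to out_path, in the given order."""
    with open(out_path, 'wb') as out:
        for name in names:
            seg_path = os.path.join(src_dir, '{}.ts'.format(name))
            with open(seg_path, 'rb') as seg:
                shutil.copyfileobj(seg, out)  # copy the raw bytes unchanged
```

With the names used in the download loop, a call could look like merge_ts([link.split('/')[-1] for link in ts_url], './video', './movie.ts'). The order of names matters: it must follow the playlist, not the alphabetical order of the files.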

Done at last. This code can still be improved, though: you can try multithreading.
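
A multithreaded download could be sketched with a thread pool from the standard library. The helper download_all and its workers parameter are hypothetical; save_one stands for any function that takes (link, name), such as the save function above.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(links, save_one, workers=8):
    """Call save_one(link, name) for every link, using a pool of worker threads."""
    def job(link):
        name = link.split('/')[-1]  # same naming rule as the sequential loop
        save_one(link, name)
        return name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map returns the results in the order of the input links
        return list(pool.map(job, links))
```

With the names from this article the call would be download_all(ts_url, save). Each segment is written to its own file, so the threads do not update any shared variable and the g_num problem described at the beginning does not arise here.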

I am White and white i, a programmer who likes to share knowledge.
If you are interested, follow my official account: White and white Python [Thank you very much for your likes, bookmarks, follows, and comments; one-click triple support is appreciated]

copyright notice
author[White and white I],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/01/202201291923050377.html
