[Python Crawler Tricks] Crawling Sirius Cinema Movies Without Paying
2022-01-29 19:23:08 【White and white I】
Possible problems in multithreaded development
Suppose two threads, t1 and t2, each add 1 to a global variable g_num (initially 0), and each does so 10 times. The final value of g_num should be 20.
But because the threads run concurrently, the following can happen:
While g_num=0, t1 reads g_num=0. The system then switches t1 to the "sleeping" state and t2 to the "running" state, and t2 also reads g_num=0.
t2 then adds 1 to the value it read and assigns the result to g_num, so g_num=1.
Next the system switches t2 to "sleeping" and t1 back to "running". t1 adds 1 to the 0 it read earlier and assigns the result to g_num.
As a result, although t1 and t2 have each added 1 to g_num, the result is still g_num=1.
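The interleaving described above can be replayed by hand, without real threads, to see exactly where an increment is lost. This is only an illustrative sketch: `t1_read` and `t2_read` are made-up names for the value each thread read before it was suspended.

```python
# Replay of the interleaving step by step (no real threads involved).
g_num = 0

t1_read = g_num        # t1 reads 0, then is put to sleep
t2_read = g_num        # t2 runs and also reads 0
g_num = t2_read + 1    # t2 writes back 0 + 1, so g_num is 1
g_num = t1_read + 1    # t1 wakes up and writes back its stale 0 + 1

print(g_num)           # two increments were performed, but only one survived
```

The key point is that `g_num += 1` is a read followed by a write, and another thread can run between the two.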
Test 1
import threading
import time

g_num = 0


def work1(num):
    global g_num
    for i in range(num):
        g_num += 1
    print("----in work1, g_num is %d---" % g_num)


def work2(num):
    global g_num
    for i in range(num):
        g_num += 1
    print("----in work2, g_num is %d---" % g_num)


print("---before thread creation, g_num is %d---" % g_num)

t1 = threading.Thread(target=work1, args=(100,))
t1.start()

t2 = threading.Thread(target=work2, args=(100,))
t2.start()

# Wait until only the main thread is left
while len(threading.enumerate()) != 1:
    time.sleep(1)

print("Final result of 2 threads operating on the same global variable: %s" % g_num)

Running results:
---before thread creation, g_num is 0---
----in work1, g_num is 100---
----in work2, g_num is 200---
Final result of 2 threads operating on the same global variable: 200
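With only 100 iterations per thread, each thread usually finishes before the other gets a turn, which is why the run above prints 200. With much larger counts an update can be lost, depending on the interpreter version and scheduling. One common remedy is to guard the increment with `threading.Lock`. The following is a minimal sketch, not the original author's code: `count_with_lock` and the `state` dictionary are illustrative names.

```python
import threading


def count_with_lock(times_per_thread, n_threads=2):
    """Increment a shared counter from several threads, guarded by a Lock."""
    state = {"g_num": 0}
    lock = threading.Lock()

    def work():
        for _ in range(times_per_thread):
            with lock:  # only one thread at a time may read, add and write
                state["g_num"] += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for each thread instead of polling threading.enumerate()
    return state["g_num"]


if __name__ == "__main__":
    print(count_with_lock(100_000))
```

Because every read-add-write happens while the lock is held, the total always equals `times_per_thread * n_threads`, at the cost of some speed.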
Now for the main topic
Target site: tlvod.com/v-57381.htm… (Fast & Furious 9)
Note: this article has a companion video tutorial; follow me and send a private message to get it.
Tools used
Development tool: PyCharm
Development environment: Python 3.7, Windows 10
Third-party libraries: requests, tqdm
Start playback, capture the network traffic, and look at the data. On close inspection, the video turns out to be made up of .ts files, that is, segment files.
Seeing .ts, it clicks at once: this is the well-known m3u8 video format.
Next, refresh the page and look for the request whose file name ends in m3u8.
Some readers will ask how to be sure a captured address really is a video segment. Simple: copy it and visit it in the browser.
Visiting it starts a new download, but the file has no extension. Remember to add .ts when saving it.
The downloaded file plays without problems, but it is only a short clip.
Let's use regular expressions to extract the segment addresses. (Note: for readers with no regular-expression background, the patterns here are deliberately simple, not pretty but easy to follow. If they still don't make sense, read up on regular expressions first.)
import os
import re  # regular expressions

import requests  # third-party HTTP library used by the crawler
from tqdm import tqdm  # progress bar


def Tools(url):
    # Send a browser-like User-Agent so the site is less likely to block the request
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36 Edg/93.0.961.38'
    }
    response = requests.get(url, headers=headers)
    return response


def save(url, name):
    '''
    Save a ts segment file
    :param url: ts address
    :param name: ts name
    :return:
    '''
    response = Tools(url).content  # raw bytes
    os.makedirs('./video', exist_ok=True)  # make sure the output directory exists
    # 'a' appends to the file, 'b' opens it in binary mode
    with open('./video/{}.ts'.format(name), 'ab') as f:
        f.write(response)


url = 'https://c1.monidai.com/20210907/SOPKxzCy/index.m3u8'
response = Tools(url).text
# Strip the m3u8 tags so that only the ts links remain
response = re.sub(r'#EXTM3U', '', response)
response = re.sub(r'#EXT-X-VERSION:\d*', '', response)  # \d matches a digit
response = re.sub(r'#EXT-X-TARGETDURATION:\d*', '', response)
response = re.sub(r'#EXT-X-MEDIA-SEQUENCE:\d*', '', response)
response = re.sub(r'#EXTINF:\d+(?:\.\d+)?,', '', response)  # durations such as 4, 10 or 4.17
response = re.sub(r'#EXT-X-ENDLIST', '', response)
ts_url = response.split()
for link in tqdm(ts_url):
    name = link.split('/')[-1]  # get the ts file name
    save(link, name)  # save the ts segment
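As an alternative to stripping each tag with its own regular expression: in an m3u8 playlist every line that starts with # is a tag or a comment, so the segment addresses are the remaining non-empty lines. The sketch below relies on that layout. `extract_segment_urls`, the sample playlist, and the example.com addresses are hypothetical and only serve the illustration; `urljoin` is there in case a playlist lists relative segment paths.

```python
from urllib.parse import urljoin


def extract_segment_urls(playlist_text, base_url):
    """Return the segment URLs of an m3u8 playlist, in playlist order."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, tags and comments
        urls.append(urljoin(base_url, line))  # resolve relative paths
    return urls


# Hypothetical playlist used only to exercise the function offline
sample = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:4.170833,
https://example.com/hls/000000.ts
#EXTINF:2.5,
000001.ts
#EXT-X-ENDLIST
"""

segments = extract_segment_urls(sample, "https://example.com/hls/index.m3u8")
for segment in segments:
    print(segment)
```

This avoids having to anticipate every tag a playlist might contain, since any line beginning with # is skipped.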
Once the download has finished, the ts files still need to be merged into a single video.
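One simple way to merge is to append the segment files to a single output file in playlist order. This is a sketch under assumptions, not a tested recipe for this site: it assumes the segments are unencrypted, and `merge_ts` plus the fake segment contents are illustrative only.

```python
import os
import shutil
import tempfile


def merge_ts(segment_paths, output_path):
    """Append each segment file, in the given order, to output_path."""
    with open(output_path, "wb") as merged:
        for path in segment_paths:
            with open(path, "rb") as segment:
                shutil.copyfileobj(segment, merged)
    return output_path


# Offline demonstration with three fake "segments"
workdir = tempfile.mkdtemp()
paths = []
for i, payload in enumerate([b"AAA", b"BBB", b"CCC"]):
    path = os.path.join(workdir, "%06d.ts" % i)
    with open(path, "wb") as f:
        f.write(payload)
    paths.append(path)

merged_path = merge_ts(paths, os.path.join(workdir, "merged.ts"))
with open(merged_path, "rb") as f:
    merged_bytes = f.read()
print(merged_bytes)
```

When merging real downloads, build the path list from the same sequence as the playlist rather than from a directory listing, because file-name order is not guaranteed to match playback order.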
That completes the job. The code can still be improved, though: try making the download multithreaded.
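A possible starting point for the multithreaded version is the standard library's `ThreadPoolExecutor`. In this sketch `download_all` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for a real download function (for example one that returns `Tools(url).content`) so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL on a thread pool.

    Results are returned in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Placeholder for a real HTTP download; returns bytes like response.content
    return url.encode()


urls = ["https://example.com/hls/%06d.ts" % i for i in range(5)]
chunks = download_all(urls, fake_fetch)
print(len(chunks))
```

Each worker returns its own bytes and no shared variable is modified, so the lost-update problem from the first section does not arise here. Since `pool.map` keeps the input order, the chunks can be written out in playlist order afterwards.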
I am White and white I, a programmer who likes to share knowledge.
If you are interested, follow my official account: White and white Python. Thank you very much for your likes, favorites, follows, and comments.
Copyright notice
Author: White and white I. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201291923050377.html