A First Try at Python Web Crawling (I)
2022-01-30 00:48:49 【It man who is not making two mistakes】
On GitHub I came across a lightweight crawler framework, requests-html.
One. Website
- CSS selectors (jQuery style, thanks to PyQuery).
- XPath selectors, for the faint of heart.
- Mocked user-agent (like a real web browser).
- Automatic following of redirects.
- Connection pooling and cookie persistence.
- The familiar Requests experience, with magical page parsing.
It looks powerful, so let's give it a try. The 315 Gala reported on fake medical advertising related to 360, which gave me the idea of crawling some medical data.
First, lock onto a target: www.fudanmed.com/institute/n… As practice, let's crawl the ranking of the top 100 hospitals in the country and write it to Excel.
Two. Analyze the page elements with F12
The whole page is wrapped in a table, and each hospital name is wrapped in an <a> tag.
The simplest and crudest idea is to crawl every <a> tag, then loop over them and extract the text from each href. Straight to the code:
    # coding=UTF-8
    from requests_html import HTMLSession
    import xlwt

    # Target page
    session = HTMLSession()
    r = session.get('http://www.fudanmed.com/institute/news2019-2.aspx')

    # Initialize the Excel workbook and its header row
    xl = xlwt.Workbook(encoding='utf-8')
    sheet = xl.add_sheet('National Hospital Ranking')
    sheet.write(0, 0, 'Ranking')
    sheet.write(0, 1, 'Hospital name')

    # Running rank counter
    i = 0


    # Crawl the data
    def findHospitalName():
        trs = r.html.find("a")
        for item in trs:
            # Read the href attribute of the a tag
            text = item.find('a', first=True).attrs['href']
            filterData(text)


    # Clean the data
    def filterData(text):
        # Only links that carry a '#' parameter are of interest
        if "#" in text:
            # Keep the text after the first '#' (assumed to be the part we want)
            name = text.split("#", 1)[1]
            # Filter out empty results
            if name:
                global i
                i += 1
                writeData(i, name)


    # Write one row: rank in column 0, name in column 1
    def writeData(sort, data):
        print(sort)
        print(data)
        sheet.write(sort, 0, sort)
        sheet.write(sort, 1, data)


    # Start, then save the workbook once all rows are written
    findHospitalName()
    xl.save('/Users/lsr/Documents/GJProject/py/' + "National Hospital Ranking" + ".xls")
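The cleaning step can be tried on its own, without the network or Excel. The sketch below is a hypothetical standalone variant of filterData: the helper name `extract_fragments` and the sample href values are made up for illustration and are not taken from the target page. It assumes that the useful text is whatever follows the first `#` in the href.

```python
def extract_fragments(hrefs):
    """Return (rank, fragment) pairs for hrefs that carry text after '#'."""
    ranked = []
    for href in hrefs:
        if "#" not in href:
            continue  # plain links carry no '#' parameter
        fragment = href.split("#", 1)[1]
        if fragment:  # skip bare '#' anchors
            ranked.append((len(ranked) + 1, fragment))
    return ranked


# Made-up sample values, not real data from the page.
sample = ["index.aspx", "#", "detail.aspx#Hospital A", "#Hospital B"]
for rank, name in extract_fragments(sample):
    print(rank, name)
```

Returning the pairs instead of bumping a global counter keeps the ranking logic easy to check before any row is written to the sheet.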
The full script looks long, but there are really only two core lines:
    trs = r.html.find("a")  # get every a tag; returns a list of Element objects
    text = item.find('a', first=True).attrs['href']  # read the href attribute of the a tag
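To see what those two lines do without installing anything, here is a rough standard-library equivalent. It deliberately swaps requests-html for `html.parser.HTMLParser` so that it runs offline, and the HTML snippet is invented for illustration; the real page's markup may differ.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is not None:  # anchors without an href are skipped
            self.hrefs.append(href)


# Invented markup for illustration only.
SAMPLE_HTML = """
<table>
  <tr><td><a href="#Hospital A">1</a></td></tr>
  <tr><td><a href="list.aspx">more</a></td></tr>
  <tr><td><a name="top">no href here</a></td></tr>
</table>
"""

collector = AnchorCollector()
collector.feed(SAMPLE_HTML)
print(collector.hrefs)
```

The third anchor shows why a lookup with a default is worth considering: in requests-html, `attrs` is a plain dict, so `attrs['href']` raises a KeyError on an anchor that has no href, while `attrs.get('href')` returns None instead.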
Author: [It man who is not making two mistakes]. Please include the original link when reprinting, thank you.