A First Try at Python Web Crawling (I)
2022-01-30 00:48:49 【It man who is not making two mistakes】
I found a lightweight crawler framework on GitHub: requests-html.
One. Official website
- Full JavaScript support!
- CSS Selectors (jQuery style, thanks to PyQuery).
- XPath Selectors, for the faint of heart.
- Mocked user-agent (like a real web browser).
- Automatic following of redirects.
- Connection pooling and cookie persistence.
- The Requests experience you know and love, with magical parsing abilities.
Hmm, it looks powerful, so let's give it a try. The 315 Gala reported fake medical advertising involving 360, which made me want to crawl some medical data.
First, lock in the target: www.fudanmed.com/institute/n… For practice, let's crawl the data for the top 100 hospitals in the country into Excel.
Two. Use F12 to inspect the page elements
The whole page is wrapped in a table, and each hospital name is wrapped in an a tag.
The simplest and crudest idea is to grab every a tag, then loop over them and extract the text in each href. Straight to the code:
# coding=UTF-8
from requests_html import HTMLSession
import xlwt

# Fetch the ranking page
session = HTMLSession()
r = session.get('http://www.fudanmed.com/institute/news2019-2.aspx')

# Initialize an Excel workbook
xl = xlwt.Workbook(encoding='utf-8')
sheet = xl.add_sheet('National Hospital Ranking')
sheet.write(0, 0, 'Ranking')
sheet.write(0, 1, 'Hospital name')

# Initialize the ranking counter
i = 0


# Crawl data
def findHospitalName():
    trs = r.html.find("a")
    for item in trs:
        # Get the text of the a tag's href attribute
        text = item.find('a', first=True).attrs['href']
        filterData(text)


# Data cleaning
def filterData(text):
    global i
    # Keep only links that carry a "#" part
    if "#" in text:
        array = text.split("#", 1)
        # Filter out empty values
        if len(array[1]):
            i += 1
            writeData(i, array[1])


# Write data
def writeData(sort, data):
    print(sort)
    print(data)
    sheet.write(sort, 0, sort)
    sheet.write(sort, 1, data)


# Start
findHospitalName()
# Save the workbook once, after all rows have been written
xl.save('/Users/lsr/Documents/GJProject/py/' + "National Hospital Ranking" + ".xls")
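The cleaning step can also be written without the global counter. What follows is only a minimal sketch, not the author's code: `rank_names` is a hypothetical helper, and the sample href values are made up to show the shape of the input. It assumes, as the code above does, that the hospital name is whatever follows the first `#` in each href.

```python
def rank_names(hrefs):
    """Return (rank, name) pairs for hrefs that carry a non-empty '#' part."""
    names = []
    for href in hrefs:
        # partition never raises: the third item is '' when there is no '#'
        _, _, fragment = href.partition("#")
        if fragment:
            names.append(fragment)
    # Count from 1 so each rank can double as the Excel row index
    return list(enumerate(names, start=1))


# Made-up sample values, for illustration only
sample = ["news.aspx#Hospital A", "index.aspx", "news.aspx#", "news.aspx#Hospital B"]
for rank, name in rank_names(sample):
    print(rank, name)
```

Each pair could then be passed to writeData(rank, name), which keeps the ranking logic testable on its own without touching the network or the workbook.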
The code looks long, but there are really only two core lines:
trs = r.html.find("a")  # Get all a tags; returns a list of Element objects
text = item.find('a', first=True).attrs['href']  # Get the a tag's href attribute
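One caveat about the second line: `attrs` is a plain dict, so `.attrs['href']` raises a KeyError for any a tag that has no href. The sketch below shows the same "collect every href" idea with a guard for that case. It swaps in the standard library's `html.parser` instead of requests-html so that it runs offline; `HrefCollector` and the sample markup are hypothetical and are not taken from the target page.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag, skipping tags without one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; dict() makes lookup easy
        href = dict(attrs).get("href")
        if href is not None:
            self.hrefs.append(href)


# Made-up markup, for illustration only
page = (
    '<table><tr><td><a href="news.aspx#Hospital A">1</a></td></tr>'
    '<tr><td><a name="top">no href here</a></td></tr>'
    '<tr><td><a href="news.aspx#Hospital B">2</a></td></tr></table>'
)
collector = HrefCollector()
collector.feed(page)
print(collector.hrefs)
```

With requests-html itself, the equivalent guard would be to read the attribute with the dict's `get` method rather than indexing it directly.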
Copyright notice
Author: [It man who is not making two mistakes]. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201300048473834.html