If you know Excel, you can easily get started with the Python data analysis package pandas (XI): segment matching
2021-08-22 22:36:13 [Excel catalyst]
> I kept hearing how powerful Python is in the data field, but after studying it for a long time, even basic data processing was painfully tedious. Only later did I realize that it is not Python itself that makes data processing powerful, but its data analysis workhorse: pandas
Preface
An earlier article in this series showed how to reproduce Excel's VLOOKUP function in pandas. VLOOKUP also has an "approximate match" mode (often called fuzzy matching), which is mainly used for segment matching. Today we look at how to get the same effect in pandas.
Case study 1
Today you receive an urgent, ad hoc request. The data table looks like this:
  2 million rows of records
  The rating must be calculated from the score column according to the rules
The rule table is as follows:
  This rule table is designed specifically for VLOOKUP
> If the rule table had been designed for pandas instead, the VLOOKUP solution would be more awkward.
What now? With this much data, Excel cannot even open the file, so neither VLOOKUP nor XLOOKUP is of any use.
The versatile VLOOKUP
Because there is too much data to open this file in Excel, I will use a small sample to briefly demonstrate how VLOOKUP solves this problem:
  All it takes is passing 1 (TRUE) as the last argument
  Important: the value column of the rule table must be sorted in ascending order. Otherwise the results will be wrong and you will get no warning
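To make the approximate-match rule concrete, here is a minimal Python sketch of what VLOOKUP does when its last argument is TRUE: it returns the result paired with the largest threshold that is less than or equal to the lookup value. The function name `approx_lookup` and the rule table values are made up for illustration; the real rule table in the post is only shown as a screenshot.

```python
from bisect import bisect_right


def approx_lookup(x, values, results):
    """Return the result paired with the largest threshold <= x.

    `values` must be sorted in ascending order, just like the
    rule table that VLOOKUP's approximate match expects.
    """
    i = bisect_right(values, x) - 1
    # No threshold is <= x: Excel would show #N/A here.
    return results[i] if i >= 0 else None


# Hypothetical rule table (not the one from the original post).
values = [0, 60, 80, 100, 120]
ratings = ["D", "C", "B", "A", "A+"]

print(approx_lookup(59, values, ratings))   # D
print(approx_lookup(120, values, ratings))  # A+
```

The binary search only works on sorted thresholds, which is why an unsorted rule table silently produces wrong ratings.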
Segment matching in pandas
In data processing this requirement is usually called "binning", and pandas implements it with the cut method:
  We read the data from a csv file and the rule table from Excel
  Note that this is a top-level pandas function, so it is called as pd.cut()
  The 1st argument is the data column to be evaluated
  The 2nd argument takes the values from the rule table, but cut must be given the boundaries of every interval. The rule table in this example has no upper boundary for scores of 120 and above, so we append a large value as the final boundary
  The parameter right is set to False to match the behavior of VLOOKUP. It means "the right edge of each interval is open"; for example, a score of 120 falls into the A+ rating
  The parameter labels supplies the values to return
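The steps above can be sketched as follows. This is only an illustration under stated assumptions: the rule table values, the column names `score`, `value` and `rating`, and the sample scores are made up, and the data is built in memory instead of being read with `pd.read_csv` and `pd.read_excel` as in the post. Using `np.inf` as the "large value" for the last boundary is also a choice made here.

```python
import numpy as np
import pandas as pd

# Hypothetical rule table: each value is the lower bound of its rating.
rules = pd.DataFrame({
    "value": [0, 60, 80, 100, 120],
    "rating": ["D", "C", "B", "A", "A+"],
})

# Hypothetical data table with a score column.
df = pd.DataFrame({"score": [0, 59, 60, 99.5, 120, 150]})

# cut needs every boundary, so append a large value as the final edge.
bins = rules["value"].tolist() + [np.inf]

df["rating"] = pd.cut(
    df["score"],
    bins=bins,
    right=False,                      # intervals are [left, right), like VLOOKUP
    labels=rules["rating"].tolist(),  # one label per interval
)

print(df)
```

With `right=False`, a score of exactly 120 lands in the last interval and is rated A+, and a score of exactly 60 is rated C rather than D.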
> As you can see, pandas can easily read data from all kinds of data sources, and the same approach works even when your data lives in various databases
> Note that when bins is not sorted in ascending order, cut raises an error. This is a very good design
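A short sketch of that safeguard, using made-up scores and a deliberately unsorted list of boundaries: unlike VLOOKUP, `pd.cut` refuses to run and raises a `ValueError` instead of returning wrong ratings.

```python
import pandas as pd

scores = pd.Series([59, 60, 120])

# Hypothetical boundaries with 80 and 60 swapped by mistake.
unsorted_bins = [0, 80, 60, 100, 120, 1000]

try:
    pd.cut(scores, bins=unsorted_bins, right=False)
    error_message = None
except ValueError as exc:
    # pandas rejects boundaries that are not increasing.
    error_message = str(exc)

print(error_message)
```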
If the description is hard to follow, this diagram should make it much clearer:
Case study 2: automatic partition
In real analysis, you may not know at first how to define the breakpoints of the rule table.
For example, you want to divide the data into 3 segments but do not know where the boundaries between them should be. In that case you can call cut like this:
  The 2nd argument of cut is set to 3, meaning the data is divided into 3 segments
  If the parameter labels is not specified, the result shows the intervals that were produced.
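A minimal sketch with made-up scores: passing an integer as `bins` lets pandas work out the boundaries, and leaving out `labels` returns the intervals themselves.

```python
import pandas as pd

# Hypothetical scores, not the data from the original post.
scores = pd.Series([0, 30, 60, 90, 120])

# An integer means "split the range of the data into this many intervals".
segments = pd.cut(scores, bins=3)

print(segments)
print(segments.cat.categories)  # the 3 intervals that pandas defined
```

pandas widens the lowest boundary slightly so that the minimum value is still included in the first interval.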
You can also specify labels:
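The same call with labels, again as a sketch on made-up scores; the label names `low`, `mid` and `high` are chosen only for illustration. The number of labels must equal the number of segments.

```python
import pandas as pd

# Hypothetical scores, not the data from the original post.
scores = pd.Series([0, 30, 60, 90, 120])

# Three segments, so exactly three labels.
levels = pd.cut(scores, bins=3, labels=["low", "mid", "high"])

print(levels)
```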
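Note that with an integer bins, cut splits the range of the values into equal-width intervals, so the number of rows in each interval is not necessarily balanced (pd.qcut is the function that balances the counts). Look at the number of rows in each interval: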
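Counting the rows per interval can be sketched with `value_counts`. The scores below are made up and deliberately skewed, to show that an integer `bins` produces intervals of equal width, so the counts can differ a lot from one interval to the next.

```python
import pandas as pd

# Hypothetical, deliberately skewed scores.
scores = pd.Series([1, 2, 3, 4, 100])

levels = pd.cut(scores, bins=3, labels=["low", "mid", "high"])

# Number of rows that fell into each interval.
counts = levels.value_counts()

print(counts)
```

Here four of the five scores fall into the lowest interval and none into the middle one.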
Summary
  pd.cut() bins the data
  The parameter bins accepts either your own rule table or simply the number of segments
  When the number of segments is specified, the intervals are defined automatically
  When the rule table passed as bins is not sorted in ascending order, an error is raised
This article is from the WeChat official account Excel catalyst (ExcelCuiHuaJi).
Original publication time: 2019-09-06
Copyright notice
Author: [Excel catalyst]. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2021/08/20210822223609215v.html