If you know Excel, you can easily get started with the Python data analysis package pandas (V): handling duplicate values
2021-08-23 04:59:48 【Excel catalyst】
> I often heard people say how powerful Python is in the data field, but after studying it for a long time I found that even basic data processing was painfully tedious. Only later did I realize that it is not Python itself that is powerful at data processing, but that it has a data analysis powerhouse: pandas
Preface
Duplicate values sometimes appear in data and can skew the final statistics, so finding and removing them is a common data-processing task. Today we'll see how to do this in pandas.
Handling duplicate values in Excel
Excel provides a built-in feature for removing duplicates, so it takes only a few steps:
- - On the "Data" tab, in the "Data Tools" group, there is a "Remove Duplicates" button
- - You can then choose which columns are used to decide whether rows are duplicates
> Besides this, conditional formatting, advanced filtering, or formulas can achieve similar results in Excel
Marking duplicate values in pandas
pandas also provides a simple way to mark duplicate values, with more flexible options than Excel. Let's take a look:
- - DataFrame.duplicated() generates a boolean flag indicating whether each row is a duplicate record. By default, all columns of a row are used for the comparison
- - In the result, the last row repeats an earlier row, so its value in the flag column is True
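To make the default behavior concrete, here is a minimal sketch. The sample table and the column names (`name`, `city`, `is_dup`) are made up for illustration and are not the article's original data; only `DataFrame.duplicated()` comes from the text above.

```python
import pandas as pd

# Hypothetical sample data: the last row repeats the first row exactly.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Ann"],
    "city": ["Paris", "Rome", "Oslo", "Paris"],
})

# duplicated() compares whole rows by default and returns a boolean Series.
# The flags are computed before the new column is attached to the frame.
df["is_dup"] = df.duplicated()

# Only the last row is flagged: False, False, False, True
print(df)
```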
When duplicates exist, we can specify which occurrence to keep:
- - By default, the keep parameter of duplicated() is "first", meaning "keep the first occurrence"
- - Setting keep to "last" keeps the last occurrence instead, so the first of the duplicated rows is now marked True
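The difference between the two settings can be sketched as follows. The data and the variable names (`first_kept`, `last_kept`) are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 3 are identical.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Ann"],
    "city": ["Paris", "Rome", "Oslo", "Paris"],
})

# keep="first" (the default): the first occurrence is kept,
# so the later copy (row 3) is flagged.
first_kept = df.duplicated(keep="first")

# keep="last": the last occurrence is kept,
# so the earlier copy (row 0) is flagged instead.
last_kept = df.duplicated(keep="last")

print(first_kept.tolist())  # [False, False, False, True]
print(last_kept.tolist())   # [True, False, False, False]
```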
We can also set the keep parameter to False, meaning "keep none":
- - Now every row that has a duplicate is marked True
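A short sketch of `keep=False`, again on made-up data. This setting is handy when you want to inspect every row involved in a duplication rather than decide which copy survives.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 3 are identical.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Ann"],
    "city": ["Paris", "Rome", "Oslo", "Paris"],
})

# keep=False keeps no occurrence: every member of a duplicate group is flagged.
all_flagged = df.duplicated(keep=False)
print(all_flagged.tolist())  # [True, False, False, True]

# Boolean indexing then shows all rows that have at least one twin.
involved = df[all_flagged]
print(involved)
```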
The subset parameter specifies which columns are used as the basis for comparison.
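As an illustration, assume a table in which two rows share the same `name` and `city` but differ in `score`. The table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 match on name and city, not on score.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann"],
    "city": ["Paris", "Rome", "Paris"],
    "score": [90, 85, 70],
})

# Comparing whole rows finds no duplicates, because the scores differ.
whole_row = df.duplicated()

# Restricting the comparison to name and city flags the second "Ann, Paris" row.
by_columns = df.duplicated(subset=["name", "city"])

print(whole_row.tolist())   # [False, False, False]
print(by_columns.tolist())  # [False, False, True]
```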
Removing duplicates, as in Excel
After marking duplicate values, a simple filter is enough to get the non-duplicate records. However, pandas also has a method that removes duplicates directly:
- - Calling DataFrame.drop_duplicates() removes the duplicate rows
- - Its subset and keep parameters follow the same rules as those of duplicated(). In effect, it just removes the rows that duplicated() marks as True
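The relationship between the two methods can be checked with a small sketch. The data and variable names below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 3 are identical.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Ann"],
    "city": ["Paris", "Rome", "Oslo", "Paris"],
})

# Manual approach: keep only the rows that duplicated() does not flag.
filtered = df[~df.duplicated()]

# Direct approach: drop_duplicates() gives the same rows.
deduped = df.drop_duplicates()
print(deduped.equals(filtered))  # True

# keep and subset behave as they do in duplicated().
last_only = df.drop_duplicates(keep="last")
print(last_only.index.tolist())  # [1, 2, 3]
```

By default, drop_duplicates() returns a new DataFrame and leaves the original untouched, and the surviving rows keep their original index labels.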
Summary
- - DataFrame.duplicated() marks duplicates. Use subset to specify the columns used for comparison, and keep={'first','last',False} to specify which occurrences are marked as duplicates
- - DataFrame.drop_duplicates() removes duplicates
In the next installment, we'll look at how to implement sorting. Stay tuned.
**If you want to learn pandas from scratch, take a look at my pandas column.**
This article is from the WeChat official account Excel catalyst (ExcelCuiHuaJi)
Original publication time: 2019-08-21