Handling duplicate values in Pandas
2022-01-30 19:03:51 【Dream, killer】
Example data:
import pandas as pd
df = pd.DataFrame({'a':['Python', 'Python', 'Java', 'Java', 'C'], 'b': [2, 2, 6, 8, 10]})
df
Check whether a single column contains duplicate values
- Use value_counts() to count how many times each value occurs in the column. The results are sorted in descending order by default, so you only need to look at the count in the first row: if it is 1, the column has no duplicate values; if it is greater than 1, it does.
df['a'].value_counts()
- Use drop_duplicates() to delete duplicate values, keeping only the first occurrence, then check whether the result is equal to the original df. If the comparison returns False, the column contains duplicate values.
df.equals(df.drop_duplicates(subset=['a'], keep='first'))
False
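Both single-column checks can be reduced to a boolean. The sketch below is only an illustration that rebuilds the example data; the variable names `has_dup_counts` and `has_dup_drop` are hypothetical and do not come from the original article.

```python
import pandas as pd

df = pd.DataFrame({'a': ['Python', 'Python', 'Java', 'Java', 'C'],
                   'b': [2, 2, 6, 8, 10]})

# value_counts() sorts counts in descending order, so the first entry is the
# largest count; anything above 1 means some value in column 'a' repeats.
counts = df['a'].value_counts()
has_dup_counts = bool(counts.iloc[0] > 1)

# drop_duplicates(subset=['a']) keeps the first row for each value of 'a';
# if rows were dropped, the result no longer equals the original frame.
deduped = df.drop_duplicates(subset=['a'], keep='first')
has_dup_drop = not df.equals(deduped)

print(has_dup_counts, has_dup_drop)
```

Both flags come out as True for this data, since 'Python' and 'Java' each appear twice in column a.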
Check for duplicate rows across all columns
Again use drop_duplicates() to delete duplicate rows, keeping only the first occurrence. This time do not set the subset parameter, so that all columns are compared (the default). Then check whether the result is equal to the original df. If the comparison returns False, there are duplicate rows.
df.equals(df.drop_duplicates(keep='first'))
False
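The whole-row check can be wrapped in a small helper. This is a sketch only: the function name `has_duplicate_rows` and the second frame `unique_df` are made up for illustration and are not part of the original article.

```python
import pandas as pd


def has_duplicate_rows(frame: pd.DataFrame) -> bool:
    """Return True if any row repeats across all columns."""
    # With no subset argument, drop_duplicates() compares entire rows.
    return not frame.equals(frame.drop_duplicates(keep='first'))


df = pd.DataFrame({'a': ['Python', 'Python', 'Java', 'Java', 'C'],
                   'b': [2, 2, 6, 8, 10]})
# Hypothetical frame with no repeated rows, for contrast.
unique_df = pd.DataFrame({'a': ['Python', 'Java'], 'b': [2, 6]})

print(has_duplicate_rows(df))         # rows 0 and 1 are identical
print(has_duplicate_rows(unique_df))  # no row repeats
```

The helper returns True for the example data and False for the contrast frame, which makes the meaning of the comparison explicit instead of relying on reading a bare False.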
Count the number of duplicate rows
len(df) - len(df.drop_duplicates(keep="first"))
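For the example data this expression evaluates to 1, because only the second ('Python', 2) row repeats an earlier one. As a cross-check, the sketch below compares it with `duplicated()`, a pandas method the original article does not use; it flags every row that repeats an earlier row. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['Python', 'Python', 'Java', 'Java', 'C'],
                   'b': [2, 2, 6, 8, 10]})

# Approach from the article: number of rows removed by de-duplication.
n_dropped = len(df) - len(df.drop_duplicates(keep="first"))

# Cross-check (not in the original): duplicated() returns a boolean Series
# that is True for each row repeating an earlier one, so its sum is the count.
n_flagged = int(df.duplicated(keep="first").sum())

print(n_dropped, n_flagged)
```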
Show the duplicated rows
First delete the duplicate rows, keeping only the first occurrence, to get a set of unique rows. Then use drop_duplicates() with keep=False to delete every row of df that has a duplicate, this time without keeping the first occurrence, which leaves only the rows that were never repeated. Combine the two results and apply drop_duplicates() with keep=False once more: the never-repeated rows now appear twice and are removed, so what remains is one copy of each duplicated row.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([df.drop_duplicates(keep="first"), df.drop_duplicates(keep=False)]).drop_duplicates(keep=False)
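On the example data this procedure leaves a single row, ('Python', 2): one representative of each duplicated row. The sketch below walks through the steps with `pd.concat` (`DataFrame.append` is no longer available in pandas 2.0 and later) and, as an alternative that is not part of the original article, uses `duplicated(keep=False)` to list every copy of the duplicated rows. The intermediate variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': ['Python', 'Python', 'Java', 'Java', 'C'],
                   'b': [2, 2, 6, 8, 10]})

# Rows with duplicates collapsed to their first occurrence.
first_kept = df.drop_duplicates(keep="first")
# Rows that never repeat at all.
never_repeated = df.drop_duplicates(keep=False)

# Non-repeating rows now appear twice in the combined frame, so dropping
# every duplicated row leaves one copy of each row that had duplicates.
one_per_duplicate = pd.concat([first_kept, never_repeated]).drop_duplicates(keep=False)
print(one_per_duplicate)

# Alternative (not in the original): keep=False marks all copies as duplicates.
all_copies = df[df.duplicated(keep=False)]
print(all_copies)
```

Which form is preferable depends on the goal: the first shows each duplicated row once, while the second shows all of its occurrences together with their original index labels.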
Copyright notice
Author: Dream, killer. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201301903484431.html