
These 20 pandas functions can improve your 'data cleaning' ability by 100 times

2022-02-02 07:40:58 jiejie

Hello everyone, I'm Jiejie.

Today I'm sharing a hands-on, practical article. Pandas is a tool built on NumPy that was created for data analysis tasks. It provides a large number of functions and methods that let us process data quickly and conveniently. This article introduces 20 functions, organized into 15 groups, that are real workhorses for data processing. You'll love them once you start using them.


Constructing the data set

Let's start by constructing a data set that will be used to demonstrate these 20 functions.

import pandas as pd

# Column names include surrounding spaces; the examples below reference them exactly as written.
# The first and third names carry stray whitespace on purpose; it is removed in section 13.
data = {' full name ':[' Huang Tongxue','Huang Zhizun','Huang Laoxie ','Chen Damei','Sun Shangxiang'],
        ' English name ':['Huang tong_xue','huang zhi_zun','Huang Lao_xie','Chen Da_mei','sun shang_xiang'],
        ' Gender ':['male','women','men','female','male'],
        ' Id card ':['463895200003128433','429475199912122345','420934199110102311','431085200005230122','420953199509082345'],
        ' height ':['mid:175_good','low:165_bad','low:159_bad','high:180_verygood','low:172_bad'],
        ' Home address ':['Guangshui, Hubei Province','Xinyang, Henan','Guangxi Guilin','Xiaogan, Hubei','Guangdong Guangzhou'],
        ' Phone number ':['13434813546','19748672895','16728613064','14561586431','19384683910'],
        ' income ':['1.1 ten thousand','8.5 thousand','0.9 ten thousand','6.5 thousand','2.0 ten thousand']}
df = pd.DataFrame(data)
df

1. cat function

This function is mainly used for string concatenation.

df[" full name "].str.cat(df[" Home address "],sep='-'*3)
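As a minimal sketch of how cat treats missing values, the example below uses two small hypothetical Series (the names `names` and `cities` are made up for illustration and are not part of the article's data set). It also shows that calling cat without a second Series joins all the elements into one string.

```python
import pandas as pd

# Hypothetical sample data, not the article's data set
names = pd.Series(["Huang Zhizun", "Chen Damei", None])
cities = pd.Series(["Xinyang", "Xiaogan", "Guilin"])

# Element-wise concatenation; a missing value on either side yields a missing result
joined = names.str.cat(cities, sep="---")

# na_rep substitutes a placeholder for missing values instead
joined_filled = names.str.cat(cities, sep="---", na_rep="?")

# Without a second Series, cat collapses the whole Series into a single string
all_cities = cities.str.cat(sep=",")

print(joined_filled.tolist())
print(all_cities)
```

If missing values are possible in either column, passing na_rep is usually the safer choice, because otherwise the whole concatenated value is lost for that row.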

2. contains function

This function is mainly used to check whether each string contains a given substring.

df[" Home address "].str.contains("Guang")
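The result of contains is a boolean Series, so it is often used as a filter. The sketch below, built on a hypothetical `addresses` Series, shows three parameters worth knowing: `na` (what to return for missing values), `case` (case sensitivity), and `regex` (whether the pattern is a regular expression or a literal string).

```python
import pandas as pd

# Hypothetical sample data, not the article's data set
addresses = pd.Series(["Guangshui, Hubei", "Xinyang, Henan", None, "guangdong guangzhou"])

# na=False turns missing values into False so the mask can be used for filtering
mask = addresses.str.contains("Guang", na=False)
filtered = addresses[mask]

# case=False makes the match case-insensitive
mask_ci = addresses.str.contains("guang", case=False, na=False)

# The pattern is a regular expression by default, so "." would match any character;
# regex=False treats it as a literal dot
dotted = pd.Series(["a.b", "axb"]).str.contains(".", regex=False)

print(filtered.tolist())
print(mask_ci.tolist())
print(dotted.tolist())
```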

3. startswith and endswith functions

These functions are mainly used to check whether each string starts or ends with a given substring.

# A name with leading whitespace, such as the one in the first row, does not match
df[" full name "].str.startswith("Huang")
df[" English name "].str.endswith("e")
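Stray whitespace is the most common reason startswith returns an unexpected False. Here is a small sketch on a hypothetical `names` Series that compares the raw check with a check performed after strip.

```python
import pandas as pd

# Hypothetical sample data; the first name has a leading space
names = pd.Series([" Huang Tongxue", "Huang Zhizun", "Chen Damei"])

# The leading space makes the first row fail the check
raw = names.str.startswith("Huang")

# Stripping whitespace first gives the expected answer
clean = names.str.strip().str.startswith("Huang")

print(raw.tolist())
print(clean.tolist())
```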

4. count function

This function is mainly used to count how many times a given pattern occurs in each string.

df[" Phone number "].str.count("3")

5. get function

This function is mainly used to get the element at a specified position, either a character of a string or an item of a list.

df[" full name "].str.get(-1)
df[" height "].str.split(":")
df[" height "].str.split(":").str.get(0)
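One detail that is easy to miss: when the requested position does not exist, get returns a missing value instead of raising an error. The sketch below illustrates this with a hypothetical `heights` Series whose last entry has no ":" separator.

```python
import pandas as pd

# Hypothetical sample data; the last entry has no ":" separator
heights = pd.Series(["mid:175_good", "low:165_bad", "high"])

parts = heights.str.split(":")

# Position 0 exists in every list
label = parts.str.get(0)

# Position 1 is missing in the last list, so the result there is NaN rather than an error
detail = parts.str.get(1)

# On plain strings, get returns the character at that position
last_char = heights.str.get(-1)

print(label.tolist())
print(detail.tolist())
print(last_char.tolist())
```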

6. len function

This function is mainly used to calculate the length of each string.

df[" Gender "].str.len()

7. upper and lower functions

These functions are mainly used to convert English letters to upper or lower case.

df[" English name "].str.upper()
df[" English name "].str.lower()

8. pad (with the side parameter) and center functions

These functions are mainly used to add a given character to the left, the right, or both sides of each string until it reaches a given width.

df[" Home address "].str.pad(30,fillchar="*")      # side="left" is the default; equivalent to rjust()
df[" Home address "].str.pad(30,side="right",fillchar="*")    # equivalent to ljust()
df[" Home address "].str.center(30,fillchar="*")
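Because the side names are easy to mix up, here is a short sketch on a hypothetical Series `s` that checks which side gets the fill characters. Padding on the left pushes the text to the right, which is why side="left" matches rjust.

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series(["abc", "de"])

# side="left" (the default) adds fill characters on the left, like rjust
left_padded = s.str.pad(5, side="left", fillchar="*")

# side="right" adds fill characters on the right, like ljust
right_padded = s.str.pad(5, side="right", fillchar="*")

# side="both" pads on both sides, like center
both = s.str.pad(5, side="both", fillchar="*")

print(left_padded.tolist())
print(right_padded.tolist())
print(both.tolist())
```

Strings that are already at least as long as the requested width are returned unchanged.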

9. repeat function

This function is mainly used to repeat each string a given number of times.

df[" Gender "].str.repeat(3)

10. slice_replace function

This function is mainly used to replace the characters in a given position range with a given string.

df[" Phone number "].str.slice_replace(4,8,"*"*4)
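The start and stop positions work like a Python slice: start is included and stop is excluded. A minimal sketch with a hypothetical `phones` Series makes the boundaries visible.

```python
import pandas as pd

# Hypothetical sample data
phones = pd.Series(["13434813546", "19748672895"])

# Characters at positions 4, 5, 6 and 7 are replaced; position 8 onward is kept
masked = phones.str.slice_replace(4, 8, "****")

# Shifting the range by one masks positions 3 to 6 instead
masked_alt = phones.str.slice_replace(3, 7, "****")

print(masked.tolist())
print(masked_alt.tolist())
```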

11. replace function

This function is mainly used to replace every occurrence of a substring with a given string.

df[" height "].str.replace(":","-")

This function also accepts regular expressions: every match of the pattern is replaced with the given string. Since pandas 2.0 the pattern is treated as a literal string by default, so regex=True must be passed explicitly.

df[" income "].str.replace(r"\d+\.\d+","regex",regex=True)
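The difference between literal and regular-expression replacement is easiest to see side by side. The sketch below uses a hypothetical Series `s` and passes the regex argument explicitly in both calls, so it does not depend on the default of the installed pandas version.

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series(["1.1k", "8.5k", "n/a"])

# regex=False: "." is a literal dot
literal = s.str.replace(".", "_", regex=False)

# regex=True: the pattern matches a decimal number such as 1.1
pattern = s.str.replace(r"\d+\.\d+", "NUM", regex=True)

print(literal.tolist())
print(pattern.tolist())
```

Writing the pattern as a raw string (r"...") avoids warnings about invalid escape sequences such as "\d" in recent Python versions.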

12. split method with the expand parameter

This function is mainly used to split each string on a separator; with expand=True, the pieces are expanded into separate columns.

# Common usage: each element becomes a list
df[" height "].str.split(":")
# split with the expand parameter: each piece becomes its own column
df[[" Height description ","final height "]] = df[" height "].str.split(":",expand=True)
df
# split combined with join
df[" height "].str.split(":").str.join("?"*5)
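As a small sketch of what expand=True returns, the example below splits a hypothetical `heights` Series into a DataFrame and names the columns afterwards (the column names `level` and `detail` are made up for illustration). It also shows the `n` parameter, which limits the number of splits.

```python
import pandas as pd

# Hypothetical sample data
heights = pd.Series(["mid:175_good", "low:165_bad"])

# expand=True returns a DataFrame with one column per piece (labelled 0, 1, ...)
parts = heights.str.split(":", expand=True)
parts.columns = ["level", "detail"]  # hypothetical column names

# n=1 splits only at the first separator, so the rest of the string stays together
first_split = pd.Series(["a:b:c"]).str.split(":", n=1, expand=True)

print(parts)
print(first_split)
```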

13. strip, rstrip, and lstrip functions

These functions are mainly used to remove leading and trailing whitespace, including newlines.

df[" full name "].str.len()
df[" full name "] = df[" full name "].str.strip()
df[" full name "].str.len()
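The three functions differ only in which side they clean, and all of them accept an optional set of characters to remove instead of whitespace. Here is a minimal sketch on a hypothetical `names` Series.

```python
import pandas as pd

# Hypothetical sample data with spaces, a newline, and padding characters
names = pd.Series(["  Huang\n", "Chen ", "xxSunxx"])

# strip removes whitespace (including newlines) from both ends
stripped = names.str.strip()

# lstrip only touches the left side, so the trailing newline and space remain
left_only = names.str.lstrip()

# Passing characters removes those instead of whitespace
custom = names.str.strip("x")

print(stripped.tolist())
print(left_only.tolist())
print(custom.tolist())
```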

14. findall function

This function is mainly used to find all matches of a regular expression in each string and return them as a list.

df[" height "]
df[" height "].str.findall(r"[a-zA-Z]+")
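Each element of the result is a list, and a string without any match produces an empty list rather than a missing value. The sketch below shows this with a hypothetical `heights` Series.

```python
import pandas as pd

# Hypothetical sample data; the last entry contains no digits
heights = pd.Series(["mid:175_good", "low:165_bad", "unknown"])

# Every run of letters in each string
words = heights.str.findall(r"[a-zA-Z]+")

# Every run of digits; the last row has none, so its list is empty
digits = heights.str.findall(r"\d+")

print(words.tolist())
print(digits.tolist())
```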

15. extract and extractall functions

These functions are mainly used to extract the parts of each string that match a regular expression. The pattern must contain at least one capture group, written in parentheses.

# extract returns the first match of each capture group
df[" height "].str.extract(r"([a-zA-Z]+)")
# extractall returns every match, indexed by a MultiIndex
df[" height "].str.extractall(r"([a-zA-Z]+)")
# extract with the expand parameter: one column per capture group
df[" height "].str.extract(r"([a-zA-Z]+).*?([a-zA-Z]+)",expand=True)
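Named capture groups make extract especially handy for cleaning, because the group names become column names. The following is only a sketch under the assumption that the values follow a "level:number_rating" layout; the group names `level`, `cm`, and `rating` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the last entry does not follow the layout
heights = pd.Series(["mid:175_good", "low:165_bad", "unknown"])

# One column per named group; rows that do not match are filled with NaN
parsed = heights.str.extract(r"(?P<level>[a-z]+):(?P<cm>\d+)_(?P<rating>[a-z]+)")

# The extracted digits are still strings, so convert them to numbers
parsed["cm"] = pd.to_numeric(parsed["cm"])

# extractall returns one row per match, indexed by (original row, match number)
all_words = heights.str.extractall(r"([a-zA-Z]+)")

print(parsed)
print(all_words)
```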

If you found this article useful, don't forget to like, bookmark, and share it. That support is the strongest motivation for me to keep writing more quality articles!

copyright notice
author[jiejie], please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/02/202202020740556199.html
