
Introduction to pandas operation

2022-02-02 02:23:32 Xiao Wang is not serious


Indexes

Create and add

Method 1: set the index column while reading the file

import pandas as pd
df=pd.read_excel('text.xlsx',index_col='name')
print(df)

Method 2: set the index after reading the file

import pandas as pd
df=pd.read_excel('text.xlsx')
df=df.set_index('name')
print(df)

Multi-level index

import pandas as pd
df=pd.read_excel('text.xlsx')
df=df.set_index(['name','team'])
print(df)

Keep the column that is used as the index

import pandas as pd
df=pd.read_excel('text.xlsx')
df=df.set_index('name',drop=False)
print(df)

Append to the index (keep the original index)

import pandas as pd
df=pd.read_excel('text.xlsx')
df=df.set_index('name',append=True)
print(df)

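The set_index variants above can be compared side by side without the Excel file. This is a minimal sketch: the small DataFrame is a made-up stand-in for text.xlsx, and only the column names name, team and Q1 are taken from the article.

```python
import pandas as pd

# Hypothetical stand-in for text.xlsx; the values are invented for illustration.
df = pd.DataFrame({
    "name": ["Liver", "Arry", "Ack"],
    "team": ["E", "C", "A"],
    "Q1": [89, 36, 57],
})

single = df.set_index("name")                 # 'name' leaves the columns
multi = df.set_index(["name", "team"])        # two-level index
kept = df.set_index("name", drop=False)       # 'name' is both index and column
appended = df.set_index("name", append=True)  # original integer index stays as level 0

print(list(single.index.names))    # ['name']
print(list(multi.index.names))     # ['name', 'team']
print(list(kept.columns))          # ['name', 'team', 'Q1']
print(list(appended.index.names))  # [None, 'name']
```

set_index returns a new DataFrame each time, so df itself is unchanged and all four results can be built from it.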

Remove (reset) the index

By default, reset_index() removes every index level and turns each one back into a column.

df=pd.read_excel('text.xlsx')
df=df.set_index('name')
print(df)
df=df.reset_index()
# Or reset only one index level, by position or by name:
# df=df.reset_index(level=0)
# df=df.reset_index(level='name')
print(df)


After the reset, name moves from the index back to an ordinary column.

Note:

If the index was set with drop=False, the index column still exists as an ordinary column, so reset_index() raises an error saying that the column already exists.

import pandas as pd
df=pd.read_excel('text.xlsx')
df=df.set_index('name',drop=False)
print(df)
df=df.reset_index()
print(df)

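Two possible ways around this error are sketched below, using a made-up two-row frame instead of text.xlsx. The name name_idx is a hypothetical label chosen only for the example.

```python
import pandas as pd

# Hypothetical data standing in for text.xlsx.
df = pd.DataFrame({"name": ["Liver", "Arry"], "Q1": [89, 36]})
kept = df.set_index("name", drop=False)

try:
    kept.reset_index()          # 'name' already exists as a column
    raised = False
except ValueError as exc:
    raised = True
    print("reset_index failed:", exc)

# Option 1: discard the index instead of turning it back into a column.
restored = kept.reset_index(drop=True)

# Option 2: rename the index level first so the two names no longer clash.
renamed = kept.rename_axis("name_idx").reset_index()

print(list(restored.columns))   # ['name', 'Q1']
print(list(renamed.columns))    # ['name_idx', 'name', 'Q1']
```

Option 1 is usually enough here, because the column kept by drop=False already holds the same values as the index.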

Attributes

import pandas as pd
df=pd.read_excel('text.xlsx')
# Index name
print(df.index.name)
# Values as a pandas array
print(df.index.array)
# Data type
print(df.index.dtype)
# Number of elements
print(df.index.size)
# Values as a NumPy array
print(df.index.values)

Common operations

df.index.astype('int64') # Convert the index dtype
df.index.isin(['Liver', 'Arry']) # Check whether each label is in the given list (example labels)
df.index.rename('number') # Rename the index
df.index.rename(['name', 'team']) # Rename the levels of a multi-level index
df.index.nunique() # Number of distinct values
df.index.sort_values(ascending=False) # Sort in descending order
df.index.map(lambda x: x + '_') # Apply a function to every label
df.index.str.replace('_', '') # String replacement
df.index.str.split('_') # Split each string
df.index.to_list() # Convert to a list
df.index.to_frame(index=False, name='a') # Convert to a DataFrame
df.index.to_series() # Convert to a Series
df.index.to_numpy() # Convert to a NumPy array
df.index.unique() # Remove duplicates
df.index.value_counts() # Count occurrences of each distinct value
df.index.where(df.index == 'adf') # Keep matching labels, replace the rest with NaN
df.index.max() # Maximum value
df.index.argmax() # Integer position of the maximum value
df.index.min() # Minimum value
df.index.argmin() # Integer position of the minimum value
df.index.T # Transpose (for an index this is the index itself)
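
A few of these operations are easier to follow with concrete values. The sketch below builds two small indexes by hand (made-up labels, not read from text.xlsx) and shows what the calls return.

```python
import pandas as pd

# Hypothetical labels, with one duplicate on purpose.
idx = pd.Index(["Liver", "Arry", "Ack", "Arry"], name="name")

mask = idx.isin(["Arry", "Oah"])        # boolean array, one flag per label
renamed = idx.rename("student")         # returns a new Index; idx is unchanged
suffixed = idx.map(lambda x: x + "_")   # apply a function to every label
counts = idx.value_counts()             # label -> number of occurrences

print(mask.tolist())        # [False, True, False, True]
print(idx.nunique())        # 3
print(suffixed.to_list())   # ['Liver_', 'Arry_', 'Ack_', 'Arry_']
print(counts["Arry"])       # 2

# max() returns the value, argmax() returns its integer position.
nums = pd.Index([3, 9, 4])
print(nums.max(), nums.argmax())   # 9 1
```

None of these calls modify the index in place; each returns a new object, so the result has to be assigned (for example df.index = df.index.rename('number')) for the change to stick.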

Rename the index

import pandas as pd
df=pd.read_excel('text.xlsx')
df=df.set_index('name')
print(df)
df=df.rename_axis('index')
print(df)

Rename the levels of a multi-level index

import pandas as pd
df=pd.read_excel('text.xlsx')
df=df.set_index(['name','team'])
print(df)
df=df.rename_axis(['index1','index2'])
print(df)

Modify labels

Change column names

import pandas as pd
df=pd.read_excel('text.xlsx')
df=df.set_index('name')
print(df)
# Rename individual columns with a mapping
df=df.rename(columns={'team':'0'})
print(df)
# Replace all column names at once (the list length must match the number of columns)
df=df.set_axis(['0','1','2','3','4'],axis=1)
print(df)

Change index labels

import pandas as pd
df=pd.read_excel('text.xlsx')
df=df.set_index('name')
print(df)
# Rename individual index labels with a mapping
df=df.rename(index={'Liver':'1'})
print(df)
# Replace all index labels at once (the list length must match the number of rows, 100 here)
df=df.set_axis(list(range(0,100)),axis='index')
print(df)

Data

Preview data

import pandas as pd
df=pd.read_excel('text.xlsx')
# First rows (5 by default)
print(df.head())
# Last rows (5 by default)
print(df.tail())
# Random rows (1 by default)
print(df.sample())

Specify the number of rows

import pandas as pd
df=pd.read_excel('text.xlsx')
print(df.head(2))
print(df.tail(2))
print(df.sample(2))

Attributes

import pandas as pd
df=pd.read_excel('text.xlsx')
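# Dimensions as (rows, columns)
print(df.shape)
# Summary of index, columns, non-null counts and dtypes (info() is a method and prints by itself)
df.info()
# Data type of each column
print(df.dtypes)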

Statistics and processing

Summary statistics

describe() reports the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%) and maximum of each numeric column.

import pandas as pd
df=pd.read_excel('text.xlsx')
print(df.describe())

Functions

df.mean() # Mean of each column
df.corr() # Pairwise correlation between columns
df.count() # Number of non-null values in each column
df.max() # Maximum of each column
df.min() # Minimum of each column
df.abs() # Absolute value of every element
df.median() # Median of each column
df.std() # Sample standard deviation of each column (Bessel's correction)
df.var() # Unbiased sample variance of each column
df.sem() # Standard error of the mean
df.mode() # Mode (most frequent value) of each column
df.prod() # Product of the values in each column
df.mad() # Mean absolute deviation (deprecated in pandas 1.5, removed in 2.0)
df.cumprod() # Cumulative product
df.cumsum(axis=0) # Cumulative sum down each column
df.nunique() # Number of distinct values in each column
df.idxmax() # Index label of the maximum in each column
df.idxmin() # Index label of the minimum in each column
df.cummax() # Cumulative maximum
df.cummin() # Cumulative minimum
df.skew() # Sample skewness (third moment)
df.kurt() # Sample kurtosis (fourth moment)
df.quantile() # Quantile of each column (the median by default)
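
One entry in this list needs care on current pandas: df.mad() was deprecated in pandas 1.5 and removed in 2.0. A minimal sketch of an equivalent calculation, on a made-up numeric frame rather than text.xlsx, could look like this:

```python
import pandas as pd

# Hypothetical scores; only the column names follow the article.
df = pd.DataFrame({"Q1": [89, 36, 57, 93], "Q2": [21, 37, 60, 96]})

# Mean absolute deviation per column:
# the average distance of each value from its column mean.
mad = (df - df.mean()).abs().mean()

print(mad["Q1"])   # 22.25
print(mad["Q2"])   # 24.5
```

On a frame that also contains text columns such as name and team, reductions like df.mean() will likely need numeric_only=True on pandas 2.0 and later, or the numeric columns can be selected first.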

Specify a single column

import pandas as pd
df=pd.read_excel('text.xlsx')
print(df)
print(df['Q1'].mean())

Specify a single row

The first two values in each row (name and team) are strings, so slice them off before taking the mean.

import pandas as pd
df=pd.read_excel('text.xlsx')
print(df.loc[0])
print(df.loc[0][2:].mean())

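Slicing by position works only while the column order stays the same. As an alternative sketch, the numeric columns can be picked by label instead; the frame below is made up and assumes the score columns are named Q1 to Q4.

```python
import pandas as pd

# Hypothetical stand-in for text.xlsx.
df = pd.DataFrame({
    "name": ["Liver", "Arry"],
    "team": ["E", "C"],
    "Q1": [89, 36], "Q2": [21, 37], "Q3": [24, 37], "Q4": [64, 57],
})

row_mean = df.loc[0, "Q1":"Q4"].mean()         # mean of one row, columns chosen by label
all_rows = df.loc[:, "Q1":"Q4"].mean(axis=1)   # mean of every row at once

print(row_mean)            # 49.5
print(all_rows.tolist())   # [49.5, 41.75]
```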

df.round(2) # Round every numeric value to 2 decimal places
df.nunique() # Number of distinct values in each column
s.nunique() # Number of distinct values in a Series s (for example, a single column)

Differences

import pandas as pd
s=pd.Series([2,12,6,5,10])
# Difference between each value and the previous one
print(s.diff())
# Difference between each value and the value two positions after it
print(s.diff(-2))

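Worked through by hand for the same five numbers, the two calls give the following values. The variable names prev and ahead are only for this sketch.

```python
import pandas as pd

s = pd.Series([2, 12, 6, 5, 10])

prev = s.diff()      # s[i] - s[i-1]; the first element has no predecessor
ahead = s.diff(-2)   # s[i] - s[i+2]; the last two elements have no partner

print(prev.tolist())    # [nan, 10.0, -6.0, -1.0, 5.0]
print(ahead.tolist())   # [-4.0, 7.0, -4.0, nan, nan]
```

The results are floats because the missing positions are filled with NaN.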

DataFrame

import pandas as pd
df=pd.read_excel('text.xlsx')
print(df.loc[:5,'Q1':'Q4'])
print(df.loc[:5,'Q1':'Q4'].diff())
print(df.loc[:5,'Q1':'Q4'].diff(1,axis=1))

Shifting

import pandas as pd
df=pd.read_excel('text.xlsx')
# Shift down by 2 rows
print(df.shift(2))
# Shift up by 2 rows
print(df.shift(-2))
# Shift right by 2 columns
print(df.shift(2,axis=1))
# Shift left by 2 columns
print(df.shift(-2,axis=1))

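The direction of a shift is easy to mix up, especially with axis=1. The sketch below uses a tiny made-up numeric frame so the movement is visible: a positive count moves values down (axis=0) or to the right (axis=1), and a negative count moves them the other way.

```python
import pandas as pd

# Hypothetical 3x3 frame; every vacated cell becomes NaN.
df = pd.DataFrame({"Q1": [1, 2, 3], "Q2": [4, 5, 6], "Q3": [7, 8, 9]})

down = df.shift(1)            # row 0 becomes NaN, old row 0 moves to row 1
right = df.shift(1, axis=1)   # Q1 becomes NaN, old Q1 values appear under Q2
left = df.shift(-1, axis=1)   # Q3 becomes NaN, old Q2 values appear under Q1

print(down["Q1"].tolist())    # [nan, 1.0, 2.0]
print(right["Q2"].tolist())   # [1.0, 2.0, 3.0]
print(left["Q1"].tolist())    # [4.0, 5.0, 6.0]
```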

Ranking with rank()

import pandas as pd
df=pd.read_excel('text.xlsx')
print(df.head())
print(df.head().rank())
print(df.head().rank(axis=1))

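How rank() treats ties and ordering is not obvious from the output on real data. A small sketch on made-up values, using the method and ascending parameters of rank():

```python
import pandas as pd

s = pd.Series([10, 20, 20, 30])

avg = s.rank()                   # default: tied values share the average rank
low = s.rank(method="min")       # tied values all take the lowest rank
desc = s.rank(ascending=False)   # the largest value gets rank 1

print(avg.tolist())    # [1.0, 2.5, 2.5, 4.0]
print(low.tolist())    # [1.0, 2.0, 2.0, 4.0]
print(desc.tolist())   # [4.0, 2.5, 2.5, 1.0]
```

On a DataFrame the same rules apply column by column, or row by row with axis=1.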

Data selection

Operation                              Syntax
Select a column                        df[x]
Select rows by index label             df.loc[x]
Select rows by integer position        df.iloc[x]
Select rows with a slice               df[0:x]
Filter rows with a boolean expression  df[df[x] >= 0]
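
The five forms in this summary can be tried on one small frame. This is a sketch with made-up data indexed by name; the threshold 50 is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in for text.xlsx, indexed by name.
df = pd.DataFrame(
    {"team": ["E", "C", "A"], "Q1": [89, 36, 57]},
    index=["Liver", "Arry", "Ack"],
)

col = df["Q1"]                # one column -> Series
by_label = df.loc["Arry"]     # one row by index label -> Series
by_pos = df.iloc[0]           # one row by integer position -> Series
first_two = df[0:2]           # rows by positional slice -> DataFrame
passed = df[df["Q1"] >= 50]   # rows where the condition holds -> DataFrame

print(by_label["Q1"])          # 36
print(by_pos["team"])          # E
print(list(first_two.index))   # ['Liver', 'Arry']
print(list(passed.index))      # ['Liver', 'Ack']
```

The filter works because df["Q1"] >= 50 is itself a boolean Series, and indexing with it keeps only the rows marked True.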

Select a column

import pandas as pd
df=pd.read_excel('text.xlsx')
print(df['name'])

Select rows by slicing

import pandas as pd
df=pd.read_excel('text.xlsx')
print(df[:4])
print(df[5:10])
print(df[0::2])

Select rows by label with loc

import pandas as pd
df=pd.read_excel('text.xlsx',index_col='name')
print(df.loc['Arry'])

Select several rows with a label slice or a list of labels

import pandas as pd
df=pd.read_excel('text.xlsx',index_col='name')
print(df.loc['Arry':'Oah'])
print(df.loc[['Arry','Oah']])

Restrict the columns returned

import pandas as pd
df=pd.read_excel('text.xlsx',index_col='name')
print(df.loc['Arry':'Oah',['Q1','Q2']])

Select rows by integer position with iloc

import pandas as pd
df=pd.read_excel('text.xlsx',index_col='name')
print(df.iloc[0:3])
print(df.iloc[0:10:2])
# Also restrict the columns, by integer position
print(df.iloc[0:5,[0,1]])

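One difference between loc and iloc slices is worth seeing directly: a label slice with loc includes its end label, while a positional slice with iloc stops before its end position. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical stand-in for text.xlsx, indexed by name.
df = pd.DataFrame(
    {"Q1": [89, 36, 57, 93, 65]},
    index=["Liver", "Arry", "Ack", "Eorge", "Oah"],
)

by_label = df.loc["Arry":"Eorge"]   # end label is included
by_pos = df.iloc[1:3]               # end position 3 is excluded

print(list(by_label.index))   # ['Arry', 'Ack', 'Eorge']
print(list(by_pos.index))     # ['Arry', 'Ack']
```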

Get a single value

at[] takes an index label and a column name. iat[] takes an integer row position and an integer column position.

import pandas as pd
df=pd.read_excel('text.xlsx',index_col='name')
print(df)
print(df.at['Arry','Q1'])
print(df.iat[1,1])

Get a column with get

import pandas as pd
df=pd.read_excel('text.xlsx',index_col='name')
print(df.get('team',0))

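The second argument of get only matters when the column does not exist. The sketch below uses a made-up frame and a hypothetical missing column Q9 to show the difference from df['Q9'], which would raise a KeyError.

```python
import pandas as pd

# Hypothetical stand-in for text.xlsx, indexed by name.
df = pd.DataFrame(
    {"team": ["E", "C"], "Q1": [89, 36]},
    index=["Liver", "Arry"],
)

team = df.get("team")        # column exists -> the column as a Series
fallback = df.get("Q9", 0)   # column missing -> the default, 0
nothing = df.get("Q9")       # column missing, no default -> None

print(team.tolist())   # ['E', 'C']
print(fallback)        # 0
print(nothing)         # None
```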

Truncate data

before and after are index labels, and truncate requires a sorted index. The example below uses the default integer index.

import pandas as pd
df=pd.read_excel('text.xlsx')
print(df.truncate(before=2,after=6))

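A short sketch of how truncate behaves on three made-up frames: a default integer index, a sorted string index, and an unsorted index. Both bounds are inclusive.

```python
import pandas as pd

# Default integer index 0..4 (hypothetical values).
df = pd.DataFrame({"Q1": [10, 20, 30, 40, 50]})
middle = df.truncate(before=1, after=3)   # keeps labels 1, 2 and 3

# A sorted string index can be truncated by label as well.
by_name = pd.DataFrame({"Q1": [1, 2, 3]}, index=["a", "b", "c"])
tail = by_name.truncate(before="b")

# An unsorted index is rejected.
unsorted = pd.DataFrame({"Q1": [1, 2, 3]}, index=[2, 0, 1])
try:
    unsorted.truncate(before=1)
    failed = False
except ValueError as exc:
    failed = True
    print("truncate failed:", exc)

print(list(middle.index))   # [1, 2, 3]
print(list(tail.index))     # ['b', 'c']
```

If the index is not sorted, calling sort_index() first is one way to make truncate usable.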

Copyright notice
Author: Xiao Wang is not serious. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/02/202202020223314244.html
