
V. pandas based on Python

2022-01-29 18:10:59 Favor 316

1, A brief introduction to pandas

pandas is a tool built on NumPy, created for data analysis tasks. It brings together a number of libraries and some standard data models, and provides the tools needed to work with large datasets efficiently. It also offers a large set of functions and methods for processing data quickly and conveniently.

2, The Series object

  • pandas is built on two data types: Series and DataFrame.
  • Series is the most basic object in pandas and resembles a one-dimensional array. In fact, a Series is built on NumPy's array object. Unlike a NumPy array, a Series lets you attach custom labels to the data, that is, an index, and then access the data through those labels.
  • DataFrame is a two-dimensional table structure. A pandas DataFrame can hold many different data types, and each axis has its own labels. You can think of it as a dictionary of Series.
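
The points above can be sketched in a few lines. This is a minimal illustration, not code from the original article: the values and the variable names (`values`, `table`) are made up for the example.

```python
import pandas as pd

# A Series with custom labels (the index) instead of the default 0..n-1.
values = pd.Series([1, 2, 3], index=["London", "HongKong", "Lagos"])

# Access the data by label, or by integer position with iloc.
by_label = values["HongKong"]
by_position = values.iloc[0]

# A DataFrame behaves like a dictionary of Series that share one index.
table = pd.DataFrame({
    "value": values,
    "country": pd.Series(["UK", "China", "Nigeria"], index=values.index),
})
column = table["value"]  # selecting one column gives back a Series

print(by_label, by_position)
print(type(column).__name__)
```

Note that each column of `table` may have its own data type: here one column holds integers and the other holds strings.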

3, Arithmetic operations on Series

"""
All arithmetic operations on Series are aligned on the index.
The operators +, -, * and / can be applied to two Series;
pandas matches the values by index label and computes each pair.
If a label appears in only one of the two Series, the result at that label is NaN.
Because NaN is a floating-point value, such a result is stored as floats.
"""
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], ['London', 'HongKong', 'Mumbai', 'lagos'])
series2 = pd.Series([1, 3, 6, 4], ['London', 'Accra', 'lagos', 'Delhi'])
print(series1 - series2)
print('*'*30)
print(series1 + series2)
print('*'*30)
print(series1 * series2)


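As a small sketch of the alignment behaviour described in this section, the example below uses made-up values and the names `a`, `b`, `total`, `filled` and `same`, none of which come from the original article. It also shows `Series.add` with `fill_value`, one way to avoid NaN when a label is missing on one side.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["London", "HongKong", "Lagos"])
b = pd.Series([10, 20, 30], index=["London", "Lagos", "Delhi"])

# Labels found in only one Series give NaN, which forces a float result.
total = a + b

# Series.add with fill_value treats a missing label as 0 instead of NaN.
filled = a.add(b, fill_value=0)

# When both indexes are identical nothing is missing, so integers stay integers.
same = a + a

print(total)
print(filled)
print(same.dtype)
```

The result index is the union of both indexes, so `total` has four labels even though each input has three.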

4, Creating a DataFrame

A DataFrame (data table) is a 2-dimensional data structure that stores data in tabular form, divided into rows and columns. With a DataFrame you can process data easily: common operations include selecting or replacing row or column data, reshaping the table, changing the index, and filtering on multiple conditions. A DataFrame can be understood as a collection of Series that share the same index. Calling DataFrame() converts data in various formats into a DataFrame object; its three parameters data, index and columns are the data, the row index and the column index, respectively.
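
A minimal sketch of calling `DataFrame()` on three common input formats follows. The sample data and the variable names (`from_dict`, `from_records`, `from_array`) are illustrative assumptions, not taken from the original article.

```python
import numpy as np
import pandas as pd

# From a dict of lists: the keys become the column labels.
from_dict = pd.DataFrame(
    data={"name": ["James", "Curry"], "age": [18, 20]},
    index=["a", "b"],
)

# From a list of dicts: each dict becomes one row.
from_records = pd.DataFrame([
    {"name": "James", "age": 18},
    {"name": "Curry", "age": 20},
])

# From a 2-D NumPy array: pass both the row index and the column labels.
from_array = pd.DataFrame(
    data=np.arange(6).reshape(2, 3),
    index=["r1", "r2"],
    columns=["x", "y", "z"],
)

print(from_dict)
print(from_records)
print(from_array)
```

When `index` is omitted, as in `from_records`, pandas numbers the rows from 0.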

5, Common DataFrame attributes and operations

import pandas as pd
from pandas import Series,DataFrame
import numpy as np

# Common DataFrame attributes
df_dict = {
	'name':['James','Curry','Iverson'],
	'age':['18','20','19'],
	'national':['us','China','us']
}
df = pd.DataFrame(data=df_dict,index=['0','1','2'])
print(df)
# Get the shape as (number of rows, number of columns)
print(df.shape)

# Get the row index labels
print(df.index.tolist())

# Get the column labels
print(df.columns.tolist())

# Get the data type of each column
print(df.dtypes)

# Get the number of dimensions
print(df.ndim)

# The values attribute returns the DataFrame's data as a two-dimensional ndarray
print(df.values)

# Show a summary of df (info() prints directly and returns None)
df.info()

# Show the first rows (5 by default)
print(df.head(2))

# Show the last rows (5 by default)
print(df.tail(1))

# Get one column of the DataFrame
print(df['name'])
# Selecting a single column returns a Series
print(type(df['name']))

# Selecting several columns returns a DataFrame
print(df[['name','age']])
print(type(df[['name','age']]))

# Get one row
print(df[0:1])

# Get several rows
print(df[1:3])

# Get some columns of several rows by chaining two selections
print(df[1:3][['name','age']])
# Note: a single df[] selects either rows (by slice) or columns (by name), not both at once.

'''
df.loc   selects rows and columns by label
df.iloc  selects rows and columns by integer position
'''
# Get the value at one row and one column
print(df.loc['0','name'])

# All columns of one row
print(df.loc['0',:])

# Several columns of one row
print(df.loc['0',['name','age']])

# Select non-adjacent rows and columns
print(df.loc[['0','2'],['name','national']])
# Select a contiguous range of rows (a label slice includes both endpoints) and non-adjacent columns
print(df.loc['0':'2',['name','national']])

# Get one row
print(df.iloc[1])

# Get consecutive rows
print(df.iloc[0:2])

# Get non-adjacent rows
print(df.iloc[[0,2],:])

# Get one column
print(df.iloc[:,1])

# Get a single value
print(df.iloc[1,0])

# Modify a value
df.iloc[0,0]='panda'
print(df)
# Sorting a DataFrame
df = df.sort_values(by='age',ascending=False)
# ascending=False sorts in descending order; the default is ascending.
# The ages here are strings, so they are compared as text.
print(df)

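Because the ages in the example are stored as strings, sorting compares them as text. The sketch below shows how that can differ from a numeric sort; it assumes a hypothetical age of "9" (not in the original data) to make the difference visible. It also shows `loc` selecting rows by a condition and columns by name in one step.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["James", "Curry", "Iverson"], "age": ["18", "20", "9"]},
    index=["0", "1", "2"],
)

# Strings are compared character by character, so "9" ranks above "20".
as_text = df.sort_values(by="age", ascending=False)

# Converting the column to integers gives a sort by numeric value.
df["age"] = df["age"].astype(int)
as_number = df.sort_values(by="age", ascending=False)

# loc can filter rows with a boolean condition and pick columns at the same time.
adults = df.loc[df["age"] >= 18, ["name", "age"]]

print(as_text)
print(as_number)
print(adults)
```

Converting the column once with `astype(int)` also makes numeric comparisons such as `df["age"] >= 18` possible.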

6, Modifying a DataFrame's index and columns

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), index = ['bj', 'sh', 'gz'], columns=['a', 'b', 'c'])
print(df1)
# Modify the index of df1
print(df1.index) # The index can be read, and it can also be assigned to
df1.index = ['beijing', 'shanghai', 'guangzhou']
print(df1)

# Custom mapping function (x is an original row or column label)
def test_map(x):
    return x+'_ABC'
# inplace: boolean, False by default. If False, rename returns a new DataFrame; if True, df1 is modified in place and the return value is None.
print(df1.rename(index=test_map, columns=test_map, inplace=True))

# rename also accepts a dictionary, to change individual labels
# (the keys must match the current labels, which now end in '_ABC')
df3 = df1.rename(index={'beijing_ABC':'beijing'}, columns = {'a_ABC':'aa'})
print(df3)

# Use a column as the index
df1=pd.DataFrame({'X':range(5),'Y':range(5),'S':list("abcde"),'Z':[1,1,2,2,2]})
print(df1)
# Set one column as the index (drop=False keeps that column in the data as well)
result = df1.set_index('S',drop=False)
result.index.name=None
print(result)

# Use a row's values as the column labels
# (the inplace argument of set_axis was removed in pandas 2.0; set_axis returns a new object)
result = df1.set_axis(df1.iloc[0],axis=1)
result.columns.name=None
print(result)
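
A short sketch of the renaming and re-indexing behaviour in this section, using a fresh frame so that it runs on its own. The names `renamed`, `ignored`, `indexed` and `restored` are chosen for the example and do not come from the original article.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["bj", "sh", "gz"],
    columns=["a", "b", "c"],
)

# Without inplace=True, rename returns a new DataFrame and leaves df unchanged.
renamed = df.rename(index={"bj": "beijing"}, columns={"a": "aa"})

# Keys that match no existing label are ignored by default.
ignored = df.rename(index={"no_such_label": "x"})

# set_index moves a column into the index; reset_index moves it back.
indexed = df.set_index("a")
restored = indexed.reset_index()

print(renamed)
print(indexed)
print(restored)
```

Working on returned copies, rather than using `inplace=True`, keeps the original frame available for comparison.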

copyright notice
author[Favor 316],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/01/202201291810568430.html
