# V. pandas in Python

2022-01-29 18:10:59


# 1. Introduction to pandas

pandas is a tool built on NumPy, created to solve data analysis tasks. It incorporates a large number of libraries and some standard data models, and it provides the tools needed to work with large datasets efficiently. pandas offers many functions and methods that let us process data quickly and conveniently.

# 2. The Series object

• pandas is built around two data types: Series and DataFrame.
• Series is the most basic object in pandas and resembles a one-dimensional array. In fact, a Series is essentially built on a NumPy array. Unlike a NumPy array, however, a Series can attach custom labels to its data, called the index, and the data can then be accessed through those labels.
• A DataFrame is a two-dimensional table structure. A pandas DataFrame can store many different data types, and each axis has its own labels. You can think of it as a dictionary of Series objects.
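A minimal sketch of the two structures described above: a Series with custom labels, and a DataFrame built as a dict of Series that share one index. The labels and values are made up purely for illustration.

```python
import pandas as pd

# A Series: one-dimensional values with a custom label for each element
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])  # access an element by its label

# A DataFrame behaves like a dict of Series that share the same row index
df = pd.DataFrame({
    'x': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'y': pd.Series([4.0, 5.0, 6.0], index=['a', 'b', 'c']),
})
print(df['y'])    # each column is itself a Series
print(df.dtypes)  # different columns may hold different dtypes
```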

# 3. Arithmetic operations on Series

```python
import pandas as pd

# All arithmetic on Series is aligned by index label.
# Operators such as +, -, * and / combine two Series element by element,
# pairing values that share the same index label.
# If a label exists in only one of the two Series, the result at that label
# is NaN; because NaN is a float, the result is stored as float64.
series1 = pd.Series([1, 2, 3, 4], index=['London', 'HongKong', 'Mumbai', 'lagos'])
series2 = pd.Series([1, 3, 6, 4], index=['London', 'Accra', 'lagos', 'Delhi'])
print(series1 - series2)
print('*' * 30)
print(series1 + series2)
print('*' * 30)
print(series1 * series2)
```

# 4. Creating a DataFrame

A DataFrame (data table) is a two-dimensional data structure that stores data in tabular form, divided into rows and columns. With a DataFrame you can process data easily: common operations include selecting and replacing row or column data, reorganizing tables, modifying the index, and filtering on multiple conditions. A DataFrame can be understood as a collection of Series that share the same index. Calling DataFrame() converts data in various formats into a DataFrame object; its three parameters data, index and columns are the data, the row index and the column index respectively.
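A short sketch of the `data`, `index` and `columns` parameters, building the same kind of table from a dict, from a list of rows, and from a NumPy array. The names and values below are illustrative only.

```python
import numpy as np
import pandas as pd

# From a dict: keys become column labels, index supplies the row labels
df_from_dict = pd.DataFrame(
    data={'name': ['James', 'Curry'], 'age': [18, 20]},
    index=['r1', 'r2'],
)

# From a list of rows: the column labels must be given explicitly
df_from_rows = pd.DataFrame(
    data=[['James', 18], ['Curry', 20]],
    index=['r1', 'r2'],
    columns=['name', 'age'],
)

# From a 2-D NumPy array
df_from_array = pd.DataFrame(
    data=np.arange(6).reshape(2, 3),
    index=['r1', 'r2'],
    columns=['a', 'b', 'c'],
)

print(df_from_dict.equals(df_from_rows))  # same labels, values and dtypes
print(df_from_array)
```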

# 5. Common DataFrame attributes

```python
import pandas as pd

# Common DataFrame attributes
df_dict = {
    'name': ['James', 'Curry', 'Iversion'],
    'age': [18, 20, 19],
    'national': ['us', 'China', 'us'],
}
df = pd.DataFrame(data=df_dict, index=['0', '1', '2'])
print(df)

# Number of rows and columns, as a (rows, columns) tuple
print(df.shape)

# Row index labels
print(df.index.tolist())

# Column labels
print(df.columns.tolist())

# dtype of each column
print(df.dtypes)

# Number of dimensions (always 2 for a DataFrame)
print(df.ndim)

# values returns the data as a two-dimensional NumPy ndarray
print(df.values)

# Overview of df; info() prints the summary itself and returns None
df.info()

# Show the first n rows (default n=5)
print(df.head())

# Show the last n rows (default n=5)
print(df.tail(1))

# Select one column: the result is a Series
print(df['name'])
print(type(df['name']))

# Select several columns with a list: the result is a DataFrame
print(df[['name', 'age']])
print(type(df[['name', 'age']]))

# Slice a single row by position
print(df[0:1])

# Slice several consecutive rows
print(df[1:3])

# Chain two lookups: first slice rows, then pick columns from the result
print(df[1:3][['name', 'age']])
# Note: a single df[] lookup selects either rows (slice) or columns (label/list),
# never both at once; use loc or iloc for that.

# df.loc selects by label; df.iloc selects by integer position

# A single value by row label and column label
print(df.loc['0', 'name'])

# All columns of one row
print(df.loc['0', :])

# Several columns of one row
print(df.loc['0', ['name', 'age']])

# Non-consecutive rows and columns
print(df.loc[['0', '2'], ['name', 'national']])
# Consecutive rows (label slices include the end label) and selected columns
print(df.loc['0':'2', ['name', 'national']])

# One row by position
print(df.iloc[0])

# Consecutive rows (position slices exclude the end position)
print(df.iloc[0:2])

# Non-consecutive rows
print(df.iloc[[0, 2], :])

# One column by position
print(df.iloc[:, 1])

# A single value by row and column position
print(df.iloc[1, 0])

# Modify a value
df.iloc[0, 0] = 'panda'
print(df)

# Sort rows by a column; ascending=False sorts in descending order
# (the default is ascending=True)
df = df.sort_values(by='age', ascending=False)
print(df)
```
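The two selection styles above differ in how slices treat the end point: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position, as a Python list slice does. A small sketch on a frame shaped like the one above (values are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['James', 'Curry', 'Iversion'], 'age': [18, 20, 19]},
    index=['0', '1', '2'],
)

# Label-based: '0':'1' includes the end label '1', so 2 rows
by_label = df.loc['0':'1']
# Position-based: 0:1 excludes position 1, so 1 row
by_position = df.iloc[0:1]
print(len(by_label), len(by_position))

# The same cell, addressed by labels and by positions
print(df.loc['2', 'name'], df.iloc[2, 0])
```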

# 6. Modifying a DataFrame's index and columns

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), index=['bj', 'sh', 'gz'], columns=['a', 'b', 'c'])
print(df1)
# Modify the index of df1
print(df1.index)  # the index can be read, and it can also be assigned to
df1.index = ['beijing', 'shanghai', 'guangzhou']
print(df1)

# Custom mapping function (x is the original row or column label)
def test_map(x):
    return x + '_ABC'

# inplace: bool, default False. If False, rename returns a new DataFrame;
# if True, df1 is modified in place and the return value is None.
df1.rename(index=test_map, columns=test_map, inplace=True)
print(df1)

# rename also accepts a dict, which renames only the labels it names.
# df1's labels were changed above, so start from a fresh frame.
df2 = pd.DataFrame(np.arange(9).reshape(3, 3), index=['bj', 'sh', 'gz'], columns=['a', 'b', 'c'])
df3 = df2.rename(index={'bj': 'beijing'}, columns={'a': 'aa'})
print(df3)

# Convert a column to the index
df1 = pd.DataFrame({'X': range(5), 'Y': range(5), 'S': list("abcde"), 'Z': [1, 1, 2, 2, 2]})
print(df1)
# Use column S as the index (drop=False keeps S as a column as well)
result = df1.set_index('S', drop=False)
result.index.name = None
print(result)

# Use a row as the column labels: here, the values of the first row
result = df1.set_axis(df1.iloc[0], axis=1)
result.columns.name = None
print(result)
```
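As a sketch of the two forms of `rename` used above, a function applied to every label versus a dict that touches only the labels it lists, here on a small fresh frame (labels are illustrative). It also shows why a dict whose keys no longer exist silently does nothing: by default `rename` ignores unknown keys.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(4).reshape(2, 2), index=['bj', 'sh'], columns=['a', 'b'])

# A function is applied to every label on the chosen axis
upper = df.rename(columns=str.upper)

# A dict renames only the keys it contains; unknown keys ('xx') are ignored
partial = df.rename(index={'bj': 'beijing', 'xx': 'unused'})

# rename returns a new DataFrame unless inplace=True, so df keeps its labels
print(list(upper.columns), list(partial.index), list(df.index))
```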