
V. pandas based on Python

2022-01-29 18:10:59 Favor 316

This article is part of the 「Digging force Star Program」, competing for a creative gift bag and the creation incentive fund.
Little knowledge, great challenge! This article is participating in the “A programmer must have a little knowledge” creative activity.

1. A brief introduction to pandas

pandas is a tool built on NumPy, created to handle data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to manipulate large datasets efficiently. pandas offers many functions and methods that let us process data quickly and conveniently.

2. Series objects

  • pandas is built around two data types: Series and DataFrame.
  • Series is the most basic object in pandas and is similar to a one-dimensional array. In fact, a Series is essentially built on NumPy's array object. Unlike a NumPy array, a Series can attach custom labels to its data, called the index, and the data can then be accessed through that index.
  • A DataFrame is a two-dimensional table structure. A pandas DataFrame can store many different data types, and each axis has its own labels. You can think of it as a dictionary of Series.
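The two objects can be sketched as follows; the labels and values are made up for illustration. The code builds a Series with a custom index and reads an element by its label, then builds a DataFrame from a dictionary of Series that share one index.

```python
import pandas as pd

# A Series is a 1-D array of values plus an index of labels (values are illustrative)
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])      # access by label
print(s.values)    # the underlying NumPy array

# A DataFrame can be viewed as a dict of Series that share the same index
df = pd.DataFrame({
    'x': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'y': pd.Series([4.0, 5.0, 6.0], index=['a', 'b', 'c']),
})
print(df.dtypes)   # each column keeps its own dtype
print(df['y'])     # selecting one column returns a Series
```

Each column keeping its own dtype is what lets a DataFrame hold several data types at once.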

3. Arithmetic operations on Series

All arithmetic operations on Series are aligned on the index. We can apply operators such as add, subtract, multiply and divide (+ - * /) to two Series; pandas matches the data by index label and computes the corresponding results. If pandas cannot find the same index label in both Series, the corresponding position in the result is the null value NaN. Because NaN is a floating-point value, such a result is stored with a floating-point dtype.

import pandas as pd

# The two indexes only share the labels 'London' and 'lagos'
series1 = pd.Series([1, 2, 3, 4], index=['London', 'HongKong', 'Humbai', 'lagos'])
series2 = pd.Series([1, 3, 6, 4], index=['London', 'Accra', 'lagos', 'Delhi'])
print(series1 - series2)
print(series1 + series2)
print(series1 * series2)

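To see the alignment concretely, the sketch below uses the same city labels (written without extra spaces) and checks which positions become NaN. It also shows `Series.sub` with `fill_value`, a standard pandas option that treats a label missing on one side as the given value instead of producing NaN.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['London', 'HongKong', 'Humbai', 'lagos'])
series2 = pd.Series([1, 3, 6, 4], index=['London', 'Accra', 'lagos', 'Delhi'])

diff = series1 - series2
# Only 'London' and 'lagos' appear in both indexes; every other label yields NaN,
# so the result is stored as float64
print(diff)
print(diff.dtype)

# fill_value=0 substitutes 0 for a label missing on one side before subtracting
filled = series1.sub(series2, fill_value=0)
print(filled)
```

The result index is the union of both indexes, which is why unmatched cities still appear in `diff`, just with NaN.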


4. Creating a DataFrame

A DataFrame (data table) is a two-dimensional data structure that stores data in tabular form, divided into rows and columns. With a DataFrame you can process data easily: common operations include selecting and replacing row or column data, as well as reorganizing the table, modifying the index, filtering on multiple conditions, and so on. Essentially, a DataFrame can be understood as a collection of Series that share the same index. Calling DataFrame() converts data in various formats into a DataFrame object; its three parameters data, index and columns are the data, the row index and the column index respectively.
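A few common ways to call `DataFrame()` with the `data`, `index` and `columns` parameters, sketched with made-up sample values: from a dictionary, from a list of rows, and from a 2-D NumPy array.

```python
import numpy as np
import pandas as pd

# 1) From a dict: keys become column labels, and a default 0..n-1 row index is used
df_a = pd.DataFrame({'name': ['Ann', 'Ben'], 'score': [88, 92]})

# 2) From a list of rows, with an explicit row index and column labels
df_b = pd.DataFrame([[1, 2], [3, 4]], index=['r1', 'r2'], columns=['c1', 'c2'])

# 3) From a 2-D NumPy array, naming only the columns
df_c = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['x', 'y', 'z'])

for frame in (df_a, df_b, df_c):
    print(frame, end='\n\n')
```

When `index` or `columns` is omitted, pandas falls back to integer labels starting at 0.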

5. Common DataFrame attributes

import pandas as pd
from pandas import Series, DataFrame
import numpy as np

# Common DataFrame attributes
# Sample data (hypothetical values; the columns name/age/sex are assumed for illustration)
df_dict = {
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [23, 31, 27],
    'sex': ['F', 'M', 'F'],
}
df = pd.DataFrame(data=df_dict, index=['0', '1', '2'])

# Get the number of rows and columns
print(df.shape)    # (3, 3)
# Get the row index
print(df.index)
# Get the column index
print(df.columns)
# Get the data type of each column
print(df.dtypes)
# Get the number of dimensions of the data
print(df.ndim)     # 2
# The values attribute returns the DataFrame's data as a two-dimensional ndarray
print(df.values)
# Show an overview of df
df.info()
# Show the first few rows (5 by default)
print(df.head(2))
# Show the last few rows (5 by default)
print(df.tail(2))

# Get one column of the DataFrame.
# Because only one column is selected, the result is a Series
print(df['name'])
# Selecting several columns returns a DataFrame
print(df[['name', 'age']])
# Get one row (a slice inside df[] selects rows)
print(df[0:1])
# Get several rows
print(df[0:2])
# Take one column from several rows
print(df[0:2]['name'])
# Note: df[] selects either rows or columns; it cannot select a block of rows and columns at once.

# df.loc selects data by label
# df.iloc selects data by integer position
# Get the value at one row and one column
print(df.loc['0', 'name'])
# All columns of one row
print(df.loc['0', :])
# One row, several columns
print(df.loc['0', ['name', 'age']])
# Non-consecutive rows and columns
print(df.loc[['0', '2'], ['name', 'sex']])
# Consecutive rows and non-consecutive columns
print(df.loc['0':'2', ['name', 'sex']])

# Take one row
print(df.iloc[0])
# Take consecutive rows
print(df.iloc[0:2])
# Take non-consecutive rows
print(df.iloc[[0, 2]])
# Take one column
print(df.iloc[:, 1])
# Get a single value
print(df.iloc[1, 1])

# Modify a value
df.loc['1', 'age'] = 32

# Sorting in a DataFrame
df = df.sort_values(by='age', ascending=False)
# ascending=False sorts in descending order; the default is ascending
print(df)
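One detail of `loc` and `iloc` is easy to trip over: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position, as Python slices do. A small sketch with a hypothetical one-column frame that uses the string row labels '0', '1', '2' like the df created earlier:

```python
import pandas as pd

# Hypothetical frame with string row labels
df = pd.DataFrame({'age': [23, 31, 27]}, index=['0', '1', '2'])

by_label = df.loc['0':'1']   # label slice: end label '1' is included -> 2 rows
by_pos = df.iloc[0:1]        # position slice: end position 1 is excluded -> 1 row
print(by_label)
print(by_pos)

# Setting a single cell by label and by position
df.loc['2', 'age'] = 28
df.iloc[0, 0] = 24
print(df['age'].tolist())    # [24, 31, 28]
```

Writing through `loc`/`iloc` on the frame itself, rather than through a chained expression such as `df[0:2]['age'] = ...`, is what reliably modifies the original data.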

6. Modifying a DataFrame's index and columns

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), index=['bj', 'sh', 'gz'], columns=['a', 'b', 'c'])
# Modify the index of df1
print(df1.index)  # the index can be read (e.g. printed) and can also be assigned
df2 = df1.copy()  # assign on a copy so df1 keeps its original labels for the examples below
df2.index = ['beijing', 'shanghai', 'guangzhou']

# Custom mapping function (x is an original row or column label)
def test_map(x):
    return x + '_ABC'

# inplace: bool, default False. If False, rename returns a new DataFrame;
# if True, df1 is modified in place and the return value is None.
print(df1.rename(index=test_map, columns=test_map))

# rename also accepts a dictionary, to rename individual index or column labels
df3 = df1.rename(index={'bj': 'beijing'}, columns={'a': 'aa'})

# Convert a column to the index
# Use column 'a' as the index (drop=False also keeps 'a' as a regular column)
result = df1.set_index('a', drop=False)

# Use a row as the column index (here the first row)
result = df1.set_axis(df1.iloc[0], axis=1)
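A short sketch contrasting two behaviours discussed in this section: by default `rename` leaves the original untouched and returns a new frame, and the `drop` flag of `set_index` controls whether the chosen column is also kept as data. It rebuilds `df1` from the example; renaming with `str.upper` and indexing on column `'a'` are choices made here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   index=['bj', 'sh', 'gz'], columns=['a', 'b', 'c'])

# rename with a function returns a new DataFrame; df1 itself is unchanged
upper = df1.rename(columns=str.upper)
print(list(upper.columns), list(df1.columns))   # ['A', 'B', 'C'] ['a', 'b', 'c']

# drop=True (the default) moves column 'a' into the index
moved = df1.set_index('a')
# drop=False copies column 'a' into the index and keeps it as a column
kept = df1.set_index('a', drop=False)
print(moved.columns.tolist(), kept.columns.tolist())   # ['b', 'c'] ['a', 'b', 'c']
```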

Copyright notice
Author: Favor 316. Please include the original link when reprinting, thank you.
