[Pandas learning notes 01] A powerful toolset for analyzing structured data

2022-02-01 10:36:29 Hang Seng light cloud community

Author: Illusory good

Source: Hang Seng LIGHT Cloud community

Background

Quantitative analysis always involves working with large data sets, mining the relationships between the data, and finally finding the data we need. Doing this analysis with plain Python alone is quite complex. Is there a simpler tool that helps us analyze data quickly and efficiently?

Today I will introduce Pandas, a powerful toolset for analyzing structured data.

This article is mainly aimed at readers who already know basic Python syntax. Those who still need to learn Python can find tutorials in the community to catch up (developer.hs.net/course/?nav…).

Basic concepts

Pandas is a free, open-source, third-party Python library and one of the indispensable tools for data analysis in Python. It provides high-performance, easy-to-use data structures for data analysis, namely Series and DataFrame.

Pandas is built on NumPy (which provides high-performance matrix operations). It is used for data mining and data analysis, and it also provides data cleaning functions.

Because the Pandas library is developed on top of NumPy, it works well together with Python's other scientific computing libraries.
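As a small illustration of the NumPy connection, the sketch below applies a NumPy function to a Series and converts the Series back to a NumPy array. The price values and the name "close" are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, used only for illustration.
prices = pd.Series([10.0, 10.5, 9.8, 11.2], name="close")

# NumPy universal functions work element-wise on a Series
# and return a Series that keeps the same index.
log_prices = np.log(prices)

# to_numpy() exposes the values as a plain NumPy array.
values = prices.to_numpy()

print(type(log_prices).__name__)
print(type(values).__name__)
print(values.mean())
```

The result of np.log is still a Series, so the row labels are not lost when a NumPy function is applied.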

Since its birth, Pandas has been applied in many fields, such as finance, statistics, social sciences, and construction engineering.

With the introduction above, you should now have a basic understanding of what Pandas is. Pandas is roughly the Excel of Python: it works with tables (that is, DataFrame) and can apply all kinds of transformations to the data, and it has many other features as well.

Data structures

DataFrame

A DataFrame is a tabular data structure with an ordered set of columns, and each column can hold a different value type (numbers, strings, Booleans). A DataFrame has both a row index and a column index, and it can be seen as a dictionary of Series that share one index.
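The "dictionary of Series" view can be sketched as follows. The column names, labels, and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Two hypothetical Series that share the same row labels.
price = pd.Series([10.0, 20.5, 7.3], index=["A", "B", "C"])
listed = pd.Series([True, False, True], index=["A", "B", "C"])

# Building a DataFrame from a dict of Series: the keys become column names.
df = pd.DataFrame({"price": price, "listed": listed})

print(df.dtypes)     # each column keeps its own value type
print(df["price"])   # selecting one column gives back a Series
print(df.index)      # the row index shared by all columns
```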

The DataFrame constructor is as follows:

pandas.DataFrame(data, index, columns, dtype, copy)

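Parameter description:

  • data: the input data (ndarray, Series, map, list, dict, and other types).
  • index: the index values, also called row labels. Defaults to RangeIndex (0, 1, 2, …, n) if not given.
  • columns: the column labels. Defaults to RangeIndex (0, 1, 2, …, n) if not given.
  • dtype: the data type.
  • copy: whether to copy the input data. Recent pandas versions document the default as None, which behaves like False for DataFrame or 2D ndarray input and like True for dict input.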
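A minimal sketch that passes every parameter from the list above explicitly. The array contents and the labels "row1", "row2", "a", "b", "c" are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 array of integers.
raw = np.array([[1, 2, 3], [4, 5, 6]])

df = pd.DataFrame(
    data=raw,
    index=["row1", "row2"],    # row labels
    columns=["a", "b", "c"],   # column labels
    dtype="float64",           # one dtype applied to all columns
    copy=True,                 # work on a copy of the input array
)

# Changing the DataFrame leaves the source array untouched.
df.loc["row1", "a"] = 100.0
print(df)
print(raw[0, 0])
```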

Series

A Series is similar to a single column of a table, or to a one-dimensional array, and it can hold any data type.

A Series consists of an index and a column of values. Its constructor is as follows:

pandas.Series(data, index, dtype, name, copy)

Parameter description:

  • data: the input data (for example an ndarray; lists and dicts are also accepted).
  • index: the index labels for the data. If not specified, they default to integers starting from 0.
  • dtype: the data type. If not specified, it is inferred from the data.
  • name: the name of the Series.
  • copy: whether to copy the input data. The default is False.
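The parameters above can be tried out with a short sketch like the following. The labels "x", "y", "z" and the name "score" are hypothetical.

```python
import pandas as pd

s = pd.Series(
    data=[3, 1, 2],
    index=["x", "y", "z"],   # custom labels instead of 0, 1, 2
    dtype="float64",         # store the integers as floats
    name="score",
)

print(s["y"])    # label-based lookup
print(s.name)
print(s.dtype)

# Without an index argument, the labels default to 0, 1, 2, ...
default_index = pd.Series([3, 1, 2])
print(list(default_index.index))
```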

Quick start

Import components

Import Pandas into the code:

import pandas as pd

If the import fails, there is a problem with the environment configuration or Pandas has not been installed. Install it as follows:

pip install pandas

Series object operations

Create a Series object with the Series() constructor. Through this object you can call the corresponding methods and properties:

import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
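To give an idea of the methods and properties mentioned above, here is a short sketch that rebuilds the same four-element Series and inspects it. The selection of attributes is only a sample, not a complete list.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array(['a', 'b', 'c', 'd']))

print(s.index)         # the default integer index
print(s.size)          # number of elements
print(s[1])            # value stored under label 1
print(s.str.upper())   # vectorised string method applied to every element
print(s.tolist())      # back to a plain Python list
```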

DataFrame object operations

Create a DataFrame object with DataFrame() as follows:

import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
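A DataFrame built from a flat list has a single column, which is labelled 0 unless column labels are supplied. The sketch below shows both cases; the column name "value" is an assumption made for this example.

```python
import pandas as pd

data = [1, 2, 3, 4, 5]

# Without column labels, the single column is named 0.
df_default = pd.DataFrame(data)

# Hypothetical column name "value", chosen for illustration.
df = pd.DataFrame(data, columns=["value"])

print(df.shape)             # (rows, columns)
print(df.head(2))           # first two rows
print(df["value"].sum())    # column-wise aggregation
```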

Read file data

A local .csv file can be read with the read_csv() function:

data = pd.read_csv('file.csv')
data = pd.read_csv('file.csv', nrows=1000, skiprows=[1,5], encoding='gbk')

Parameter meaning:

  • 'file.csv': the name of the file to read. A full path can be given to read from another location.
  • nrows: the number of rows to read from the start of the file.
  • skiprows: the line numbers (0-indexed) to skip when reading the file. An integer skips that many lines from the start.
  • encoding: the encoding of the file being read.
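The effect of nrows, skiprows, and encoding can be checked with a small self-contained sketch. It writes a throwaway GBK-encoded file into a temporary directory first; the file contents and column names are made up for this example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file contents: one header line and five data lines.
rows = "code,price\nA,1.0\nB,2.0\nC,3.0\nD,4.0\nE,5.0\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.csv")
    with open(path, "w", encoding="gbk") as f:
        f.write(rows)

    # Line 0 is the header; skiprows=[1] drops the first data line ("A").
    # nrows=3 then keeps only the next three data rows.
    df = pd.read_csv(path, nrows=3, skiprows=[1], encoding="gbk")

print(df)
```

Note that skiprows counts lines of the file including the header line, so index 1 refers to the first data line here.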

Similar to read_csv, the read_excel function reads data from Excel files.

Write file data

The to_csv() function provided by Pandas converts a DataFrame to CSV data. To write the CSV data to a file, pass a file path or a file object to the function. Otherwise, the CSV data is returned as a string.

data.to_csv('my_new_file.csv', index=False)

Parameter meaning:

  • index: whether to write the row index. The index is written by default.
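The following sketch calls to_csv() without a file path, so the CSV text comes back as a string and the effect of the index parameter is easy to compare. The DataFrame contents are hypothetical.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"code": ["A", "B"], "price": [1.0, 2.5]})

# No path given: the CSV text is returned as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)      # first column holds the row index 0, 1
print(without_index)   # only the "code" and "price" columns
```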

Similar to to_csv, the to_excel method writes data to Excel files.

Summary

This article introduced the basics of the Pandas toolset. Learning Pandas helps us process and analyze data quickly. Practical operations will be covered in later updates, coming soon.

Copyright notice
Author: [Hang Seng light cloud community]. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/02/202202011036282549.html