
Data Analysis from Scratch: Reading and Writing CSV Data with pandas

2022-02-01 10:28:48 cousin

This is day 16 of my participation in the November writing challenge. For event details, see: 2021 Last Writing Challenge.

One. Preface

This series of study notes follows the book "Data Analysis Practice" by Tomasz Drabas. I will share my notes from working through the book with you, published as a series called "Data Analysis from Scratch".

Two. Summary of key points

1. Create a Python virtual environment dedicated to this series;

2. Install pandas, a common module for data analysis;

3. Read and write CSV files with the pandas module.

Three. Getting started

1. Creating a virtual environment

I usually prefer PyCharm, so this series uses PyCharm. You can download it directly from the official website; the Community Edition is enough.

[Note] I used to prefer PyCharm; now I prefer VS Code. You can also use Jupyter Notebook.

# Installing and using a Python environment + Jupyter Notebook on Windows/Mac

(1) Let's start. Open PyCharm and click File->New Project. See the figure below for the basic configuration.

Note in particular: the Python project path should not contain Chinese characters, and neither should the project name. The name should summarize what the project is about.

[Figure: operation steps]

(2) Once the project is created, you will find the project files and the virtual environment files in the corresponding directory.
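PyCharm builds the virtual environment for you. For readers who want to see the same step without the IDE, here is a minimal sketch that uses Python's standard venv module. The directory name Data_analysis_demo and the helper create_env are made up for illustration, and the environment is created in a temporary directory so nothing is left behind.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=False):
    """Create a virtual environment at env_dir and return the path of its config file."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"


# Demo in a throwaway directory; with_pip=False keeps it fast and offline.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "Data_analysis_demo")
    created = cfg.exists()

print(created)
```

For a real project you would pass a permanent directory and with_pip=True so that pip is available inside the environment.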

2. Installing pandas, a common data analysis module

(1) This is a course for complete beginners, so first, here is how to enter the virtual environment: go to the directory I:\pyCoding\Frame\Data_analysis\Scripts (my virtual environment directory), hold Shift and right-click, and open PowerShell or cmd (in PowerShell, just type cmd). Then type activate to enter the virtual environment. The name of the virtual environment now appears in parentheses in front of the path, which indicates that you are inside it. See below for details:

PS I:\pyCoding\Frame\Data_analysis\Scripts> cmd
Microsoft Windows [Version 10.0.17134.112]
(c) 2018 Microsoft Corporation. All rights reserved.

(Data_analysis) I:\pyCoding\Frame\Data_analysis\Scripts>
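Besides looking at the prompt, you can also ask the interpreter whether it is running inside a virtual environment. The following is a small sketch; the helper name in_virtualenv is my own, and the check assumes an environment created by venv or a recent virtualenv, where sys.prefix and sys.base_prefix differ.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix points at the base Python installation.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


print(in_virtualenv())
print(sys.prefix)
```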

I don't know whether you find this tedious, but I certainly do: every time you enter the virtual environment you first have to go to a specific path and then type the command. That is not how programmers like to work!

[Note] I used to manage virtual environments with virtualenvwrapper; now I prefer pipenv.

(2) Installing the pandas module

After entering the virtual environment with the shortcut command (workon, shown below), install directly with pip:

# Run directly in cmd

Pass a name to activate one of the following virtualenvs:

C:\Users\82055>workon Data_analysis
(Data_analysis) C:\Users\82055>pip install pandas

Installation result:

[Figure: installation process]

The installation takes about one minute. When it finishes, it shows:

Installing collected packages: pytz, numpy, six, python-dateutil, pandas
Successfully installed numpy-1.15.4 pandas-0.23.4 python-dateutil-2.7.5 pytz-2018.7 six-1.11.0

As you can see, this process installed not only the pandas package but also numpy, pytz, six, and python-dateutil as additional packages, which we will use later.
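To confirm what ended up in the environment, you can query the installed versions from Python. This is a minimal sketch using the standard importlib.metadata module (Python 3.8+); the helper installed_versions is hypothetical, and the exact set of dependencies varies between pandas versions, so a package reported as missing is not necessarily a problem.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


versions = installed_versions(["pandas", "numpy", "pytz", "python-dateutil", "six"])
for name, ver in versions.items():
    print(name, ver or "not installed")
```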

3. Reading and writing CSV files with pandas

(1) Downloading the data file

Click here to download. All the data in this series comes from this code repository, which also holds the source code from the book "Data Analysis Practice". I will set up my own code repository later to record the learning process; for now, you can download the data files from there.

(2) A brief introduction to pandas

pandas is an easy-to-use data structure and data analysis tool for the Python programming language, built on NumPy. It provides high-performance, high-level data structures (such as DataFrame) and the tools needed to work efficiently with large data sets, along with a large number of functions and methods for processing data quickly and conveniently.
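To make the DataFrame idea concrete, here is a small sketch. The column names station and code are invented for this example and are not the columns of the book's data file.

```python
import numpy as np
import pandas as pd

# A DataFrame is a table with labeled columns.
df = pd.DataFrame({
    "station": ["Beijing North", "Beijing East", "Beijing"],
    "code": ["VAP", "BOP", "BJP"],
})

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column labels

# Column data can be handed to NumPy as an array.
codes = df["code"].to_numpy()
print(type(codes))

# Boolean filtering: keep the rows whose code starts with "B".
starts_with_b = df[df["code"].str.startswith("B")]
print(starts_with_b)
```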

(3) Reading a CSV file with pandas

Code for reading:

# Import the data processing modules
import pandas as pd
import os

# Get the current working directory
father_path = os.getcwd()

# Path to the raw data file
rpath_csv = os.path.join(father_path, 'data01', 'city_station.csv')
# Read the data
csv_read = pd.read_csv(rpath_csv)
# Display the first 10 rows
print(csv_read.head(10))

Running results:
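If you want to try the reading step before downloading the data file, the sketch below writes a tiny stand-in file first and then reads it back. The file contents and the column names station and code are assumptions made for the demo; the real city_station.csv may look different.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as base_dir:
    # Build the path piece by piece so it works on Windows, macOS, and Linux.
    data_dir = os.path.join(base_dir, "data01")
    os.makedirs(data_dir)
    csv_path = os.path.join(data_dir, "city_station.csv")

    # Stand-in data, written only so that there is something to read.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        f.write("station,code\nBeijing North,VAP\nBeijing East,BOP\n")

    sample = pd.read_csv(csv_path, encoding="utf-8")

# head(10) returns at most the first 10 rows; here there are only 2.
print(sample.head(10))
```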

Function overview:

read_csv(filepath_or_buffer,sep,header,names,skiprows,na_values,encoding,nrows) reads a CSV file in the specified format.

Common parameters:

  • filepath_or_buffer: string, the file path (a file-like object is also accepted);
  • sep: string, the delimiter; the default is ',';
  • header: integer, the row to use as the column names (comment lines are ignored). If names is not passed, the behavior is the same as header=0; if names is passed, it is the same as header=None;
  • names: list, the column names to use. If the file has no header row, pass header=None explicitly;
  • skiprows: list or integer, the row numbers to skip (starting from 0), or the number of rows to skip at the start of the file; skipped rows are not read;
  • na_values: list, additional values to treat as NaN. pandas uses NaN to represent missing values, so this helps with missing or invalid entries;
  • encoding: string, the text encoding to use when reading, for example "utf-8" or "gbk";
  • nrows: integer, the number of rows to read.
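The parameters above are easier to remember once you see them together. The following sketch reads from an in-memory string instead of a file; the data, the semicolon delimiter, and the column names are all invented for illustration.

```python
import io

import pandas as pd

raw = (
    "# exported from a hypothetical system\n"
    "station;code;passengers\n"
    "Beijing North;VAP;120\n"
    "Beijing East;BOP;unknown\n"
    "Beijing;BJP;300\n"
    "Beijing South;VNP;450\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                # fields are separated by semicolons
    skiprows=[0],           # skip the first line (the comment)
    header=0,               # the first remaining line holds the column names
    na_values=["unknown"],  # treat "unknown" as a missing value
    nrows=3,                # read only the first 3 data rows
)
print(df)

# A file without a header row: supply the names and pass header=None.
named = pd.read_csv(
    io.StringIO("VAP,Beijing North\nBOP,Beijing East\n"),
    header=None,
    names=["code", "station"],
)
print(named)
```

Note that skiprows is applied first, so header=0 refers to the first line that is left after skipping.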

(4) Writing a CSV file with pandas

Code for writing:

import pandas as pd
import os

# Get the current working directory
father_path = os.getcwd()

# Path of the file to save
path_csv = os.path.join(father_path, 'data01', 'temp_city.csv')
# Data to write (column name + column values)
data = {"Site name": ["Beijing North", "Beijing East", "Beijing", "Beijing South", "Beijing West"],
        "Code name": ["VAP", "BOP", "BJP", "VNP", "BXP"]}
# Initialize the data as a DataFrame object
df = pd.DataFrame(data)
# Write the data
df.to_csv(path_csv)

Running results :

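Function overview:

to_csv(path_or_buf,sep,na_rep,columns,header,index) writes the DataFrame to a file in CSV format.

Common parameters:
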
  • path_or_buf: string, a file name, an absolute or relative path, a file stream, etc.;
  • sep: string, the field delimiter;
  • na_rep: string, the value written in place of NaN;
  • columns: list, the subset of columns to write;
  • header: set header=False to omit the column names when writing; the default is True;
  • index: set index=False to skip writing the index; the default is True.

Four. Conclusion

Persistence and effort: you will reap what you sow.

The ideas are complex,

the implementation is fun,

as long as you don't give up,

the day of fame will come.

From "Cousin's Doggerel"

See you next time. I'm Cousin, a lover of cats and technology. If this article helped your studies, feel free to like, comment, and follow me!

Copyright notice
Author: cousin. Please include the original link when reprinting. Thank you.
