Data Analysis from Scratch: Reading and Writing CSV Data with pandas
2022-02-01 10:28:48 【cousin】
I. Preface
This series of study notes follows the book "Data Analysis Practice" by Tomasz Drabas. I will share my notes from working through the book with you, collected as the series "Data Analysis from Scratch".
II. Key points
1. Create a Python virtual environment dedicated to this series;
2. Install pandas, a commonly used data analysis module;
3. Read and write CSV files with pandas.
III. Getting started
1. Creating a virtual environment
I usually prefer PyCharm, so this series uses it. PyCharm can be downloaded directly from the official website; the Community Edition is enough.
【Note】 I used to like PyCharm, but now I prefer VS Code. You can also use Jupyter Notebook.
# Windows/Mac: installing and using a Python environment + Jupyter Notebook
(1) Let's start. Open PyCharm and click File->New Project. See the figure below for the basic configuration.
In particular: the Python project path should not contain Chinese characters, and neither should the project name. The name should summarize what the project is about.
(2) Once the project is created, you will find the project files and virtual environment files in the corresponding directory.
2. Installing pandas, a common data analysis module
(1) This is a beginner course, so first, how to enter the virtual environment: go to the directory I:\pyCoding\Frame\Data_analysis\Scripts (my virtual environment directory), hold Shift and right-click, and open PowerShell or cmd (in PowerShell, just type cmd first). Then type activate. A name in parentheses appears in front of the path; that is your virtual environment name, and it shows that you are now inside the virtual environment. See below for details:
PS I:\pyCoding\Frame\Data_analysis\Scripts> cmd
Microsoft Windows [Version 10.0.17134.112]
(c) 2018 Microsoft Corporation. All rights reserved.
I:\pyCoding\Frame\Data_analysis\Scripts>activate
(Data_analysis) I:\pyCoding\Frame\Data_analysis\Scripts>
I don't know whether you find this tedious; I certainly do. Every time you enter the virtual environment you first have to go to a specific directory and then type the command. That is not the programmer's way!
【Note】 I used to manage virtual environments with virtualenvwrapper; now I prefer pipenv.
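To confirm from inside Python that a virtual environment is really active, you can compare the interpreter prefixes. This is only a minimal sketch, assuming a standard venv or virtualenv layout; the helper name in_virtualenv is made up for illustration and is not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.prefix differ from sys.base_prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

Run it once before and once after activate: the printed prefix should change to the virtual environment directory.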
(2) Installing the pandas module
After entering the virtual environment with the shortcut (the workon command shown below), install directly with the pip command:
# Run directly in cmd
C:\Users\82055>workon
Pass a name to activate one of the following virtualenvs:
==============================================================================
Data_analysis
spiderenv
C:\Users\82055>workon Data_analysis
(Data_analysis) C:\Users\82055>pip install pandas
Installation result:
The installation takes about one minute. When it finishes, it shows:
Installing collected packages: pytz, numpy, six, python-dateutil, pandas
Successfully installed numpy-1.15.4 pandas-0.23.4 python-dateutil-2.7.5 pytz-2018.7 six-1.11.0
Clearly, this process installs not only the pandas package but also numpy, pytz, six, and python-dateutil as dependencies. We will use them later.
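If you want to double-check the result from Python rather than from the pip output, something like the following may help. It is a sketch that assumes Python 3.8 or newer (for importlib.metadata); the function name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package names are the ones listed in the pip output above.
report = installed_versions(["pandas", "numpy", "pytz", "six", "python-dateutil"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

The versions you see will depend on when you install; newer pandas releases may pull in a different set of dependencies than the one shown in the log.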
3. Reading and writing CSV files with pandas
(1) Downloading the data files
Click here to download. All the data in this series comes from this repository, which also holds the source code from the book "Data Analysis Practice". Later I will set up my own code repository to record my learning process; for now, you can download the data files from here.
(2) A basic introduction to pandas
pandas provides Python with high-performance, easy-to-use data structures and data analysis tools built on NumPy. It offers advanced data structures (such as DataFrame) and the tools needed to work efficiently with large data sets, along with a large number of functions and methods for processing data quickly and conveniently.
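To make the DataFrame idea concrete, here is a small sketch. The station names and codes are the ones used later in this article; the platforms column is made up purely for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
stations = pd.DataFrame(
    {
        "station": ["Beijing North", "Beijing East", "Beijing"],
        "code": ["VAP", "BOP", "BJP"],
        "platforms": [6, 4, 12],  # made-up numbers, not real data
    }
)

print(stations.shape)   # (number of rows, number of columns)
print(stations.dtypes)  # data type of each column

# Boolean indexing: keep only the rows where the condition holds.
busy = stations[stations["platforms"] > 5]
print(busy["station"].tolist())
```

A DataFrame is essentially a table with labeled columns and an index for the rows, which is why reading a CSV file into one is such a natural fit.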
(3) Reading a CSV file with pandas
Reading code:
# Import the data processing modules
import os
import pandas as pd
# Get the current working directory
father_path = os.getcwd()
# Path to the raw data file (os.path.join works on Windows, macOS, and Linux)
rpath_csv = os.path.join(father_path, 'data01', 'city_station.csv')
# Read the data
csv_read = pd.read_csv(rpath_csv)
# Show the first 10 rows
print(csv_read.head(10))
Function reference:
read_csv(filepath_or_buffer, sep, header, names, skiprows, na_values, encoding, nrows) reads a CSV file in the specified format.
Common parameters:
- filepath_or_buffer: string, the file path (a file-like object also works);
- sep: string, the delimiter; the default is ',';
- header: integer, the row to use as column names (comment lines are ignored). If names is not given, the default behaves like header=0; if names is given, it behaves like header=None;
- names: list, the column names to use. If the file has no header row, pass header=None explicitly;
- skiprows: list or integer, the line numbers to skip (starting from 0), or the number of lines to skip from the top of the file; skipped lines are not read;
- na_values: list, additional values to treat as NaN. pandas represents missing values as NaN, so this is useful for handling placeholder or invalid values;
- encoding: string, the text encoding to use when reading the file, for example "utf-8" or "gbk";
- nrows: integer, the number of rows to read.
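The parameters above are easier to remember once you see them together. The sketch below reads from an in-memory string instead of a file so it runs anywhere; the sample text is invented for illustration and is not the book's city_station.csv.

```python
import io

import pandas as pd

# Invented sample: a comment line, a header row, then four data rows.
raw = (
    "# exported sample\n"
    "station;code;passengers\n"
    "Beijing North;VAP;120\n"
    "Beijing East;BOP;unknown\n"
    "Beijing;BJP;300\n"
    "Beijing South;VNP;250\n"
)

# Use the header row from the file, skip the comment line, read only 3 rows.
df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                # fields are separated by semicolons
    skiprows=[0],           # skip line 0 (the comment line)
    header=0,               # first remaining line holds the column names
    na_values=["unknown"],  # treat the text "unknown" as NaN
    nrows=3,                # stop after three data rows
)
print(df)

# Ignore the file's header row and supply our own column names instead.
named = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    skiprows=2,             # skip the comment line and the original header
    header=None,            # the remaining lines contain data only
    names=["name", "code", "count"],
    na_values=["unknown"],
)
print(named)
```

When reading a real file, pass the path instead of io.StringIO(raw), and add encoding="gbk" or encoding="utf-8" if the text does not decode correctly.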
(4) Writing a CSV file with pandas
Writing code:
import os
import pandas as pd
# Get the current working directory
father_path = os.getcwd()
# Path of the output file (the data01 directory must already exist)
path_csv = os.path.join(father_path, 'data01', 'temp_city.csv')
# Data to write (column name + column values)
data = {"Station name": ["Beijing North", "Beijing East", "Beijing", "Beijing South", "Beijing West"],
        "Code": ["VAP", "BOP", "BJP", "VNP", "BXP"]}
# Initialize the data as a DataFrame object
df = pd.DataFrame(data)
# Write the data
df.to_csv(path_csv)
Function reference:
to_csv(path_or_buf, sep, na_rep, columns, header, index) writes a DataFrame to a CSV file.
- path_or_buf: string, a file name, an absolute or relative path, a file stream, and so on;
- sep: string, the field delimiter; the default is ',';
- na_rep: string, the text written in place of NaN;
- columns: list, the subset of columns to write;
- header: boolean or list of strings, default True; header=False omits the column names when writing;
- index: boolean, default True; index=False leaves out the row index.
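Here is a short sketch of how these to_csv parameters change the output. It writes to in-memory buffers rather than files so that nothing is left on disk; the platforms column is made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {
        "station": ["Beijing North", "Beijing East", "Beijing"],
        "code": ["VAP", "BOP", "BJP"],
        "platforms": [6, None, 12],  # made-up values; None is stored as NaN
    }
)

# Semicolon-separated, two selected columns, NaN written as "N/A", no index.
buf = io.StringIO()
df.to_csv(buf, sep=";", na_rep="N/A", columns=["station", "platforms"], index=False)
lines = buf.getvalue().splitlines()
print(lines)

# header=False drops the column-name row, leaving only the data rows.
no_header = io.StringIO()
df.to_csv(no_header, header=False, index=False)
no_header_lines = no_header.getvalue().splitlines()
print(no_header_lines)

# With the default index=True, reading the output back yields an extra
# unnamed column that holds the old row index.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
round_trip = pd.read_csv(with_index)
print(round_trip.columns.tolist())
```

The last part is the reason many people pass index=False when writing: otherwise the row index comes back as an extra column the next time the file is read.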
IV. Conclusion
Persistence and hard work: you will reap what you sow.
The ideas are complex,
the implementation is fun,
as long as you don't give up,
the day of fame will come.
From 《Cousin's Limericks》
See you next time. I'm Cousin, who loves cats and technology. If this article helped your studies, you are welcome to like, comment, and follow me!
copyright notice
author[cousin],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/02/202202011028467175.html