[Pandas] A primer on Pandas processing csv file datasets (neural network/machine learning algorithm data preprocessing)
2022-08-06 06:33:04【little girl】
Motivation
The data I collected together with a more experienced colleague is in csv format, and I had never dealt with csv data before. I ran into a lot of pitfalls when using it to write neural network training code, so I am recording them here. Hopefully this also makes things easier for people who come later.
Processing csv files with Pandas
There are probably quite a few packages for processing csv files; this is a tutorial on pandas only (I have not used the others, hhhh). Here I take one of my own datasets as an example to demonstrate some common processing methods.
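Reading the file
- Statement:
import pandas as pd
origin_data = pd.read_csv("origin_data.csv", na_values=" NaN")
- What are the null values (NaN) in a csv file? This is a big pitfall. I recommend using the na_values parameter shown above when reading a csv, so that the missing-value markers in the file are all read uniformly as NaN. That way, if you need to filter out missing values manually later, you can index to their positions. I tried it before: without this parameter, the missing values were not any of False, 0, or "NaN".
- Result: origin_data holds the contents of the file as a dataframe, with missing values shown as NaN.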
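A minimal sketch of the reading step, assuming a small in-memory csv in place of origin_data.csv (the column names and values below are made up for illustration). It shows that the extra marker passed through na_values is read as a missing value, and that isna() is a reliable way to locate missing cells, since NaN does not compare equal to anything, not even itself.

```python
import io

import pandas as pd

# Hypothetical csv content standing in for origin_data.csv.
# Row 1 has a " NaN" cell (with a leading space), row 2 has an empty cell.
csv_text = "Height,Weight,Sex\n170,60,M\n NaN,55,F\n165,,F\n"

demo = pd.read_csv(io.StringIO(csv_text), na_values=" NaN")

# Boolean mask of missing cells, regardless of which marker the file used.
missing_mask = demo.isna()

# Index labels of the rows that contain at least one missing value.
rows_with_missing = demo.index[missing_mask.any(axis=1)].tolist()
print(rows_with_missing)
```

Comparing a cell with == against "NaN", 0, or False will not find these cells, which matches the pitfall described above.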
Indexing a column of a dataframe
The csv data read in by pandas is wrapped in a format called a dataframe, which can be converted to a numpy array. Let's first look at how to work with a dataframe.
- Statement: use data.name to index a column by its label.
origin_data.Height
- Result: the Height column is returned.
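As a small illustration, assuming a made-up dataframe with Height and "Weight change" columns: attribute access (data.name) and bracket access (data["name"]) return the same column, but only the bracket form works when the label contains a space. to_numpy() converts a column to a numpy array.

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({
    "Height": [170.0, 165.0, 180.0],
    "Weight change": [1.5, -0.5, 0.0],
})

height_attr = demo.Height        # attribute-style access
height_item = demo["Height"]     # bracket-style access, same column
change = demo["Weight change"]   # label with a space: brackets only

# Convert the column to a numpy array, e.g. to feed a training loop.
height_array = demo["Height"].to_numpy()
print(type(height_array).__name__, height_array.shape)
```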
Deleting a column
- Statement: the del keyword removes the column with the given label.
del origin_data["Weight change"]
- Result: you can see that the "Weight change" column has been deleted.
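A sketch of the same deletion on a made-up dataframe. del modifies the dataframe in place; DataFrame.drop(columns=...) is shown as an alternative that returns a new dataframe and leaves the original untouched. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({
    "Height": [170, 165],
    "Weight": [60, 55],
    "Weight change": [1.5, -0.5],
})

# drop() returns a copy without the column; demo itself is unchanged here.
without_weight = demo.drop(columns=["Weight"])

# del removes the column from demo in place.
del demo["Weight change"]

print(list(demo.columns))
print(list(without_weight.columns))
```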
Deleting rows/columns that contain missing values
For missing values, you can generally either fill them in by interpolation or discard the data directly. Here, deleting the rows that contain NaN values is used as the example.
- Statement: the .dropna() method deletes the rows that contain NaN values by default. You can set .dropna(axis=1) to delete the columns that contain NaN values instead. Other usages can be looked up as needed; this one is the most common.
origin_data = origin_data.dropna()
- Result: you can see that there are fewer rows and no NaN values remain.
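The two dropna() variants, plus the interpolation option mentioned above, can be sketched on a made-up dataframe as follows. The values are hypothetical, and interpolate() is used with its default linear method, which may or may not suit real data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Height value.
demo = pd.DataFrame({
    "Height": [170.0, np.nan, 165.0, 180.0],
    "Weight": [60.0, 55.0, 58.0, 75.0],
})

rows_dropped = demo.dropna()        # default axis=0: drop rows containing NaN
cols_dropped = demo.dropna(axis=1)  # drop columns containing NaN
filled = demo.interpolate()         # alternative: fill NaN by linear interpolation

print(rows_dropped.shape, list(cols_dropped.columns))
print(filled["Height"].tolist())
```

Note that rows_dropped keeps the original index labels (0, 2, 3), so the index is no longer continuous.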
Resetting the index
After some processing, the index of the data is likely to be messed up. For example, here we deleted some rows, so the index is no longer continuous. If we then traverse the data by index, a label lookup on a deleted row (for example with .loc) raises a KeyError. So it is generally necessary to reset the index after the data has been processed.
- Statement: the key point here is the drop parameter. drop=True means the old index is discarded and the rows are renumbered in order. drop=False means the index is reset and the old index is kept as a column.
origin_data = origin_data.reset_index(drop=True)
- Result: the index is continuous again, starting from 0.
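A minimal sketch of the problem and the fix, assuming a made-up single-column dataframe. After dropna() the label 1 no longer exists, so a .loc lookup on it fails; reset_index() renumbers the rows, and with drop=False the old labels are kept in a column named "index".

```python
import numpy as np
import pandas as pd

# Hypothetical data; dropna() removes the row with label 1.
demo = pd.DataFrame({"Height": [170.0, np.nan, 165.0, 180.0]}).dropna()

# Looking up a label that was deleted raises KeyError.
try:
    demo.loc[1, "Height"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

reset_drop = demo.reset_index(drop=True)   # old index discarded
reset_keep = demo.reset_index(drop=False)  # old index kept as column "index"

print(lookup_failed)
print(reset_drop.index.tolist())
print(list(reset_keep.columns))
```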
Modifying values conditionally
During data preprocessing we need to convert some non-numeric values, such as gender, province, or city, into numbers. Taking gender as the example, I want to convert M/F to 0/1 for the neural network to process.
- Statement: use .loc[row, flag] (row label, column label) to get the cell to be indexed, then modify its value with a conditional expression.
for i in range(len(origin_data)):
    origin_data.loc[i, 'Sex'] = 1 if origin_data.loc[i, 'Sex'] == "F" else 0
- Result: I changed the data of two columns in this way.
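As a possible alternative to the row-by-row loop, the same M/F to 0/1 conversion can be sketched with Series.map, which converts the whole column in one call and does not depend on the index being continuous. The dataframe below is made up for illustration. One thing to keep in mind: values that are not keys of the dictionary become NaN.

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({"Sex": ["M", "F", "F", "M"], "Height": [175, 162, 168, 180]})

# Map each label to its numeric code; M -> 0, F -> 1 as in the loop above.
sex_codes = demo["Sex"].map({"M": 0, "F": 1})

# Write the converted column back into the dataframe.
demo["Sex"] = sex_codes
print(demo["Sex"].tolist())
```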
Copyright notice
Author: [little girl]. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/218/202208060519291274.html