current position:Home>Pandas. read_ Detailed explanation of csv() function and all parameter usage + example code

Pandas. read_ Detailed explanation of csv() function and all parameter usage + example code

2022-06-24 08:06:23fanstuck

Catalog

Preface

One 、 Basic grammar and function

Two 、 Parameter description and code demonstration

1.filepath_or_buffer

2.sep

3.delimiter

 4.header

 5.names

 6.index_col

7.usecols

 8.squeeze

9.prefix

 10.mangle_dupe_cols

11.dtype

 12.engine

13.converters

 14.true_values

15.false_values

16.skipinitialspace

17. skiprows

​ edit  18.skipfooter

 19.nrows

 20.na_values

 21.keep_default_na

22. na_filter

23.verbose

 24.skip_blank_lines

 25.parse_dates

 26.infer_datetime_format

 27.keep_date_col

 28.date_parser

29. dayfirst

 30.cache_dates

31.iterator

32.chunksize

33.compression

 34.thousands

35.decimal

36.lineterminator

 37.quotechar

38.quoting

39.doublequote

40.escapechar

41.comment

 42.encoding

43.encoding_errors

44.dialect

45. error_bad_lines

46.warn_bad_lines

47.on_bad_lines

 48.delim_whitespace

49.low_memory

50.memory_map

51.float_precision

52.storage_options

Focus , Prevent losing , If there is a mistake , Please leave a message , Thank you very much



Preface

Pandas It is often used as a data analysis tool library and its own DataFrame Data types do some flexible data conversion 、 Calculation 、 Operation and other complex operations , But they are all built after we get the data from the data source . Therefore, as an interface function for reading data source information, it must have its powerful and convenient ability , When reading different class sources or data of different classes, there are corresponding read Function can be processed one step first , This will reduce a considerable part of our data processing operations . every last read() function , As a data analyst, I personally think everyone should master and be familiar with its corresponding parameters , Corresponding read() Function bloggers have read three articles in detail read_json、read_excel and read_sql:

Pandas obtain SQL database read_sql() Functions and parameters in detail + The sample code

Pandas Handle JSON file read_json() A detailed explanation + Code display

Pandas in read_excel Detailed explanation of the use of function parameters + The sample code

Looking at the entire data source path , The most commonly used data storage objects :SQL、JSON、EXCEL And what to explain in detail this time CSV All over the country .  If you can understand the use of the function parameters, you can reduce a lot of subsequent processing DataFrame Data structure code , Only a few... Need to be set read_csv Parameters can be implemented , Therefore, the original intention of this article is to introduce and use this function in detail to achieve the purpose of thorough mastery . I hope readers can ask questions or opinions after reading , The blogger will maintain the blog for a long time and update it in time , Hope you enjoy it . 


Here's the desensitized one user_info.csv Document as a presentation :

One 、 Basic grammar and function

read_csv The basic syntax format is :

pandas.read_csv(filepath_or_buffer, 
                sep=NoDefault.no_default, 
                delimiter=None, 
                header='infer', 
                names=NoDefault.no_default, 
                index_col=None, 
                usecols=None, 
                squeeze=None, 
                prefix=NoDefault.no_default, 
                mangle_dupe_cols=True, 
                dtype=None, 
                engine=None, 
                converters=None, 
                true_values=None, 
                false_values=None, 
                skipinitialspace=False, 
                skiprows=None, 
                skipfooter=0, 
                nrows=None, 
                na_values=None, 
                keep_default_na=True, 
                na_filter=True, 
                verbose=False, 
                skip_blank_lines=True, 
                parse_dates=None, 
                infer_datetime_format=False, 
                keep_date_col=False, 
                date_parser=None, 
                dayfirst=False, 
                cache_dates=True, 
                iterator=False, 
                chunksize=None, 
                compression='infer', 
                thousands=None, 
                decimal='.', 
                lineterminator=None, 
                quotechar='"', 
                quoting=0, 
                doublequote=True, 
                escapechar=None, 
                comment=None, 
                encoding=None, 
                encoding_errors='strict', 
                dialect=None, 
                error_bad_lines=None, 
                warn_bad_lines=None, 
                on_bad_lines=None, 
                delim_whitespace=False, 
                low_memory=True,
                memory_map=False, 
                float_precision=None, 
                storage_options=None)

You can see that there are quite a few parameters , Compared with read_excel、read_json and read_sql Add up to more . Explains the use of csv Files store data several times as often as other record data files , So about csv There are so many processing parameters for files . It's a good thing , When we will csv Convert file to DataFrame Data reprocessing , You need to write a lot of code to deal with , Providing so many parameters can greatly speed up our efficiency in processing files . 

read_csv The basic function of is to csv File to DataFrame Or is it TextParser, It also supports the option of iterating or decomposing files into blocks .

import numpy as np
import pandas as pd
df_csv=pd.read_csv('user_info.csv')

Two 、 Parameter description and code demonstration

The following is the official document , There are too many words. It is recommended to directly click the directory to see :

pandas.read_csv

First, we will understand the function and function of each parameter one by one , Use the example after understanding the meaning of the parameter .

1.filepath_or_buffer

The type of acceptance :{str, path object or file-like object} character string ( It is usually a file name and can only be read under the directory of the currently running program )、 File path ( Generally, absolute path is recommended )、 Or other data types that can be referenced . Such as url,http This type of .

This parameter specifies the path to the read in file . Any valid string path can be accepted . The string can be URL. Effective URL Include http、ftp、s3、gs And documents . For the file URL, Local files can be :file://localhost/path/to/table.csv.
If you want to pass in a path object ,pandas Accept any path .
For class file objects , We use read() Method reference object , For example, file handles ( For example, through the built-in open function ) or StringIO.

df_csv=pd.read_csv('user_info.csv')
df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv')
df_csv=pd.read_csv('C:\\Users\\10799\\test-python\\user_info.csv')

2.sep

The type of acceptance {str, default ‘,’}

csv Generally, the file defaults to ',' To separate each string , and sep Is to specify what type of separator symbol is .

If sep by None, be C The engine cannot automatically detect delimiters , but Python The parsing engine can , It means Python Built in sniffer tool csv The latter will be used and the separator will be automatically detected . Besides , The length exceeds 1 Characters and with “\s+” Different delimiters will be interpreted as regular expressions , And will force the use of Python Parsing engine . Please note that regex Delimiters tend to ignore referenced data .regex Examples of regular expressions :“\r\t”.

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',sep=',')
df_csv

  about sep Characters can refer to escape characters :

Escape character describe
\( When at the tail ) Line continuation operator
\\ Backslash notation
\‘ Single quotation marks
\” Double quotes
\a Ring the bell
\b Backspace (Backspace)
\e escape
\000 empty
\n Line break
\v Vertical tabs
\t Horizontal tabs
\r enter
\f Change the page
\oyy Octal number yy Representative character , for example :\o12 On behalf of the line
\xyy Decimal number yy Representative character , for example :\x0a On behalf of the line
\other The other characters are output in normal format
df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',sep='/t')

  In this way, the experiment will not be repeated , If you are interested, you can try it by yourself .

3.delimiter

The type of acceptance :{str, default None

sep Another name for , Can be set to the desired name , It doesn't make much sense .

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',delimiter=',')
df_csv

 4.header

The type of acceptance :{int, list of int, None, default ‘infer’} Integers 、 List of integers 、None. The default is infer

Specify the row number to use as the column name , And the beginning of the data . The default behavior is to infer column names : If no name is passed , Behavior and header=0 identical , And infer column names from the first line of the file , If you explicitly pass the column name , Behavior and header=None identical . Deliver... Explicitly header=0, So that you can replace the existing name . The title can be a list of integers , Specify the row position of multiple indexes on the column , for example [0,1,3]. Unspecified middle lines will be skipped ( for example , Skip... In this example 2). Please note that , If skip_blank_lines=True, This parameter ignores comment lines and empty lines , therefore header=0 Represents the first row of data , Instead of the first line of the file .

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',header=1)

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',header=[0,1,3])

In this case . The second case will disappear :

 

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',header=None)

 5.names

The type of acceptance :{array-like, optional}

Specify the list of column names to use . If the file contains a header line , Should be passed explicitly header=0 To override the column name . Duplicate... Is not allowed in this list .

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',
header=0,names=['id','time','name1','name2','name3','name4','name5','name6'])

 6.index_col

The type of acceptance :{int, str, sequence of int / str, or False, optional} The default is None.

Specify to use as DataFrame Columns of row labels , Given as a string name or column index . If a given int/str Sequence , Use multiple indexes .
Be careful :index_col=False Can be used to force pandas Do not use the first column as an index , for example , When you have a malformed file , When there is a separator at the end of each line .

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',index_col='user_id')

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',index_col=0)

 

 

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',index_col=[1,'user_id'])

  When index_col by False when , Index columns will not be specified .

7.usecols

The type of acceptance :{list-like or callable, optional}

Returns a subset of the column . If it is in list form , Then all elements must be location elements ( That is, the integer index in the document column ) Or a string corresponding to the column name provided by the user in the name or inferred from the document title row . If given names Parameters , The document title line is not considered . for example , image usecols A valid list like parameter should be [0、1、2] or ['foo',bar',baz'].

The specified list order is not taken into account , And the original csv The columns of the file are in the same original order , therefore usecols=[0,1] And [1,0] identical . If callable , The callable function... Will be calculated based on the column name , The return callable function evaluates to True The name of . An example of a valid callable parameter is ['AAA','BBB','DDD' Medium lambda x:x.upper(). Using this parameter can speed up parsing time and reduce memory usage .

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',usecols=['age_month'])

 

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',usecols=[0,2])

 

 

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',
header=0,
names=['id','time','name1','name2','name3','name4','name5','name6'],
usecols=['id','name1'])

 

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',
usecols=['age_month','first_order_time'])

 

  The given list is visible but the resulting DataFrame The order of is not changed according to the order of the given list .

 8.squeeze

The type of acceptance :{bool, default False}

If the parsed data contains only one column , It returns a sequence .

df_csv=pd.read_csv('user_info.csv',usecols=['age_month'],squeeze=True)

 

9.prefix

The type of acceptance :{str, optional} 

Prefix added to the column number when there is no title , for example “X” Express X0,X1.

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',header=None,prefix='name')

 10.mangle_dupe_cols

The type of acceptance :{bool, default True}

The repeating column will be specified as “X”、“X.1”、“X.N”, instead of “X”…“X”. If there is a duplicate name in the column , Pass in False Will cause the data to be covered .

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',header=0,names=['id','time','name1','name2','name3','name4','name5','name6'],mangle_dupe_cols=False)
ValueError: Setting mangle_dupe_cols=False is not supported yet

Set to False Time is not yet supported .

11.dtype

The type of acceptance :{Type name or dict of column -> type, optional}

Specify the data type of the data or column . for example :{‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’} .

take str or object With the right na_values Settings are used together , To preserve rather than interpret data types . If a converter is specified , Will be applied INSTEAD Instead of data type conversion .

df_csv=pd.read_csv('user_info.csv')
df_csv.dtypes

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',dtype={'user_id':'str'})
df_csv.dtypes

 

 12.engine

The type of acceptance :{‘c’, ‘python’, ‘pyarrow’}

Specify the parser engine to use .C and pyarrow The engine is faster , and python The current functions of the engine are more complete . Multithreading currently consists only of pyarrow Engine support .

time_start=time.time()
df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',engine='python')
time_end=time.time()
print('time cost',time_end-time_start,'s')
time cost 1.0299623012542725 s
time_start=time.time()
df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',engine='c')
time_end=time.time()
print('time cost',time_end-time_start,'s')
time cost 0.20297026634216309 s

13.converters

The type of acceptance :{dict, optional}

Function for converting values in some columns . Keys can be integers or column labels .

df_csv=pd.read_csv(r'C:\Users\10799\test-python\user_info.csv',converters={'city_num':lambda x:x})

 14.true_values

The type of acceptance :{list, optional}

Specifies that the list should be treated as true , When read csv When there are too many, this parameter will have BUG, I can't read it :

df_csv=pd.read_csv('user_info.csv',true_values=[' Guangzhou '])

  Guangzhou has not changed , I also tried a lot of values, but it didn't work , The logic of estimating internal parameters is not suitable for large quantities of csv data .

  Just run a simple one :

from io import StringIO
data = ('a,b,c\n1,Yes,2\n3,No,4')
pd.read_csv(StringIO(data),
            true_values=['Yes'], false_values=['No'])

 

  It is not recommended that you use this parameter to adjust TRUE, Write your own lambda Function or replace It's the same , This will cause problems in the overall situation . Only when all the data in a column appears in true_values + false_values Inside , Will be replaced .

15.false_values

The type of acceptance :{list, optional}

Specify that the list be treated as false . Same as the previous parameter , It belongs to useless parameter , Pit man 、 skip . Only when all the data in a column appears in true_values + false_values Inside , Will be replaced .

16.skipinitialspace

The type of acceptance :{bool, default False}

Skip spaces after delimiters .

df_csv=pd.read_csv('user_info.csv',skipinitialspace=True)

  Do not have what difference :

17. skiprows

The type of acceptance :{list-like,int or callable, Optional }

Specify the line number to skip at the beginning of the file (0 Initial index ) Or the number of lines to skip (int).
If callable , Callable functions will be calculated based on the row index , If this line should be skipped , Then return to True, Otherwise return to False. An example of a valid callable parameter is [0,2] Medium lambda x:x.

df_csv=pd.read_csv('user_info.csv',skiprows=[0,1,2,3])

 

  For example, select rows with even rows :

df_csv=pd.read_csv('user_info.csv',skiprows=lambda x :x%2==0)

 18.skipfooter

The type of acceptance :{int, default 0}

Appoint Number of lines at the bottom of the file to skip (engine='c' I won't support it ).

df_csv=pd.read_csv('user_info.csv',skipfooter=1)

Skip the specified number of rows at the bottom :

 19.nrows

The type of acceptance :{int, optional}

Specify the number of file lines to read . For reading large files .

df_csv=pd.read_csv('user_info.csv',nrows=50)

 20.na_values

The type of acceptance :{scalar, str, list-like, or dict, optional}

To identify as NA/NaN Other strings . If dict adopt , Specify each column NA value . By default , The following values are interpreted as  NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

df_csv=pd.read_csv('user_info.csv',na_values='0')

df_csv=pd.read_csv('user_info.csv',na_values=['0','32'])

 21.keep_default_na

The type of acceptance :{bool, default True}

Whether to include the default when parsing data NaN value . Depending on whether or not it's coming in na_values, The behavior is as follows :

  • If keep_default_na by True, And specify that na_value, Will na_value Attached to the default for parsing NaN value .
  • If keep_default_na by True, And it doesn't specify na_value, Only the default NaN Value to parse .
  • If keep_default_na by False, And specify that na_value, Then only the specified NaN value na_value To analyze .
  • If keep_default_na by False, And it doesn't specify na_value, No string will be parsed to NaN.

Be careful , If na_filter As False Pass in ,keep_default_na and na_values Parameters are ignored .

df_csv=pd.read_csv('user_info.csv',keep_default_na=False)

22. na_filter

The type of acceptance :{bool, default True}

Detect missing value tags ( Empty string and na_values Value ). In the absence of any NAs Data in , Pass on na_filter=False It can improve the performance of reading large files .

This parameter is used for tuning , Many stored data have null values , So the default is True appropriate .

23.verbose

The type of acceptance :{bool, default False}

Print the output information of various parsers , Can indicate that... Is placed in a non numeric column NA The amount of value .

df_csv=pd.read_csv('user_info.csv',verbose=True)

 

 24.skip_blank_lines

The type of acceptance :{bool, default True}

If True, Skip the blank line , It is not interpreted as NaN value . Otherwise it would be NaN

df_csv=pd.read_csv('user_info.csv',skip_blank_lines=False)

 25.parse_dates

The type of acceptance :{bool or list of int or names or list of lists or dict, default False}

Parameter selection functions are as follows :

  • bool: If True Then analyze the index .
  • ist of int or names: for example : If [1、2、3] Then try to column 1、2、3 Resolve to separate date columns .
  • list of lists. for example : If [[1,3]] Then combine the 1 Column and the first 3 Column , And resolve to a single date column .
  • dict, for example {'foo':[1,3]}, Column 1,3 Resolve to date and call the result “foo”.

If a column or index cannot be represented as datetimes Array , For example, due to non analyzable values or mixing of time zones , Then the column or index will be returned intact as the object data type . For nonstandard datetime analysis ,pd.read_csv Subsequent processing uses to_datetime. To resolve the index or column of the mixed time zone , Please put date_parser Specified as partially applied pandas.to_datetime() Use utc=True. For more information , See using mixed time zone resolution CSV:https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-csv-mixed-timezones.
Be careful : There is iso8601 Quick path to format date .

df_csv=pd.read_csv('user_info.csv',parse_dates=['first_order_time'])

 

 

df_csv.dtypes

 

 

df_csv=pd.read_csv('user_info.csv',parse_dates=[1])
df_csv.dtypes

 

 26.infer_datetime_format

The type of acceptance :{bool, default False}

If enabled True and parse_dates,pandas Will try to infer... In the column datetime The format of the string , If we can infer , Switch to a faster parsing method . In some cases , This can speed up parsing 5-10 times .

time_start=time.time()
df_csv=pd.read_csv('user_info.csv',parse_dates=[1])
time_end=time.time()
print('time cost',time_end-time_start,'s')
df_csv.dtypes

 

time_start=time.time()
df_csv=pd.read_csv('user_info.csv',parse_dates=[1],infer_datetime_format='%Y/%m/%d %H:%M')
time_end=time.time()
print('time cost',time_end-time_start,'s')
df_csv

 

 27.keep_date_col

The type of acceptance :{bool, default False}

If True and parse_dates Specify to combine multiple columns , Then keep the original column .

  Because if multiple columns specify different time units , After merging, there will be no original columns , Appoint keep_date_col by True when , They will remain .

from io import StringIO
data = ('year,month,day\n2022,6,21\n2022,6,22\n2022,6,23')
pd.read_csv(StringIO(data),parse_dates=[[0,1,2]]
            )

 

from io import StringIO
data = ('year,month,day\n2022,6,21\n2022,6,22\n2022,6,23')
pd.read_csv(StringIO(data),parse_dates=[[0,1,2]],keep_date_col=True
            )

 

 28.date_parser

The type of acceptance :{function, optional}

Specify the function , Used to convert a character string column sequence to datetime Instance array . By default dateutil.parser.parser  To switch .Pandas You will try to call... In three different ways date_parser, If something unusual happens , Will move on to the next way :1) Pass one or more arrays ( from parse_dates Definition ) As a parameter ;2) take parse_dates String values in defined columns are concatenated ( Press the line ) Into an array , And pass the array ; and 3) Use one or more strings ( Corresponding to parse_dates Defined columns ) As a parameter , Call once for each row parse_dates.

Generally speaking, it will be used in read_csv Convert to DataFrame after , Handle datetime Then write the function , But with this parameter, you can directly process the value of the parameter with time after writing the user-defined function in the early stage .

from io import StringIO
from datetime import datetime
def dele_date(dateframe):
    for x in dateframe:
        x=pd.to_datetime(x,format='%Y/%m/%d %H:%M')
        x.strftime('%m/%d/%Y')
    return x
df_csv=pd.read_csv('user_info.csv',parse_dates=['first_order_time'],date_parser=dele_date)
df_csv

29. dayfirst

The type of acceptance :{bool, default False}

Japan / Month format date , International and European formats .

df_csv=pd.read_csv('user_info.csv',parse_dates=['first_order_time'],dayfirst=True)
df_csv

It doesn't work , What form should it be or what form , Unless it is DD/MM Format is useful , Not very useful .

 30.cache_dates

The type of acceptance :{bool, default True}

If True, Use a unique converted date cache to apply datetime conversion . When parsing duplicate date strings , Especially for date strings with time zone offsets , There may be a significant acceleration .

Optimization parameters , To speed up .

31.iterator

The type of acceptance :{bool, default False}

return TextFileReader Object to iterate or get blocks get_chunk().

df_csv=pd.read_csv('user_info.csv',iterator=True)
print(df_csv)
<pandas.io.parsers.readers.TextFileReader object at 0x000002624BBA5848>

32.chunksize

The type of acceptance :{int, optional}

Returns the for iteration TextFileReader object . of iterator  and chunksize For more information , see also IO Tools documentation .

Functional functions , Specify to convert to TextFileReader The number of pieces .

33.compression

The type of acceptance :{str or dict, default ‘infer’}

Used to decompress disk data in real time . If “infer” and “%s” Similar to path , Detect compression from the following extensions :'.gz','.bz2’,”.zip“,”.xz’, or ’.zst’( Otherwise there is no compression ). If you use “zip”,zip The file must contain only one data file to be read in . Set to “None” Indicates no decompression . It can also be a dict, Middle key “method” Set to {'zip',gzip',bz2',zstd} One of , Other key value pairs are forwarded to zipfile.ZipFile,gzip.gzip file ,bz2.BZ2 File or zstandard.ZstdDecompressor. for example , You can use a custom compression dictionary to Zstandard Decompress and pass the following :compression={'method':'zstd','dict_data':my_compression_dict}.

df_csv=pd.read_csv('user_info.csv',compression=None)
df_csv

 34.thousands

The type of acceptance :{str, optional}

Thousand separator .

35.decimal

The type of acceptance :{str, default ‘.’}

Characters to be recognized as decimal points ( for example , For European data use “,”.)

It's usually float All data are decimal points , I think this parameter may be used for encryption .

36.lineterminator

The type of acceptance :{str (length 1), optional}

Characters split the file into lines . Only on C The parser is valid . Set to engine by C:

df_csv=pd.read_csv('user_info.csv',engine='c',lineterminator='2')
df_csv

 37.quotechar

The type of acceptance :{str (length 1), optional}

A character used to indicate the beginning and end of a reference item . Quoted items can contain delimiters , It will be ignored .

38.quoting

The type of acceptance :{int or csv.QUOTE_* instance, default 0}

df_csv=pd.read_csv('user_info.csv',quotechar = '"')
df_csv

  

Every csv Control field reference behavior .QUOTE_* Constant . Use QUOTE\u MINIMAL(0)、QUOTE\u ALL(1)、QUOTE_NONNERIAL(2) or QUOTE_NONE(3) One of .

I feel that the following parameters are added temporarily , It's hard to meet ordinary business needs .

39.doublequote

The type of acceptance :{bool, default True

If you specify quotechar And Quoteching No QUOTE_NONE, Please indicate whether two consecutive in the field quotechar The element is interpreted as a single quotechar Elements .

df_csv=pd.read_csv('user_info.csv',quotechar='"', doublequote=True)
df_csv

 

40.escapechar

The type of acceptance :{str (length 1), optional}

A string used to escape other characters . When quoting by QUOTE_NONE when , Specifies an undelimited value for a character to cause .

41.comment

The type of acceptance :{str, optional}

Indicates that the rest of the line should not be parsed . If you find... At the beginning of a line , This line is completely ignored . This parameter must be a single character . Same as blank line ( as long as skip_blank_lines=True), Fully annotated lines are ignored by the parameter header , Rather than being skiprows Ignore . for example , If the note =“#”, The analysis is titled 0 Of #empty\na、b、c\n1、2、3 Will lead to “a、b、c” Be treated as a title .

df_csv=pd.read_csv('user_info.csv',sep=',', comment='#', skiprows=1)
df_csv

 42.encoding

The type of acceptance :{str, optional}

read / Write for UTF The coding ( for example “UTF-8”).

43.encoding_errors

The type of acceptance :{str, optional, default “strict”}

Handling encoding errors .

44.dialect

The type of acceptance :{str or csv.Dialect, optional}

Provided , This parameter will override the values of the following parameters ( Default or non default ):delimiter、doublequote、escapechar、skipinitialspace、quotechar and quoting. If you need to override the value , Will send out ParserWarning. Please see the csv. If no specific language is specified , If sep Larger than one character is ignored . Dialect documentation for more details .

45. error_bad_lines

The type of acceptance :{bool, optional, default None

By default , Rows with too many fields ( for example , Too many commas csv That's ok ) Exception will be thrown , And will not return DataFrame. If False, These... Will be deleted from the returned data frame “ Wrong line ”. This is a good way to handle erroneous data :

46.warn_bad_lines

The type of acceptance :{bool, optional, default None

If error_bad_lines by False,warn_bad_lines by True, Each... Will be output “bad line” Warning of .

47.on_bad_lines

The type of acceptance :{{‘error’, ‘warn’, ‘skip’} or callable, default ‘error’}

Specify the error line encountered ( Rows with too many fields ) Is the operation to be performed . The allowed values are :

  • “error”, An exception is thrown when an error line is encountered .
  • “ Warning ”, Warn when an error line is encountered and skip that line .
  • “ skip ”, Skip error lines when they are encountered without raising or warning .
df_csv=pd.read_csv('http://localhost:8889/edit/test-python/user_info.csv',sep=',',on_bad_lines='skip')
df_csv

 48.delim_whitespace

The type of acceptance :{bool, default False}

Specifies whether spaces will be ( for example “.” or “”) Used as a sep. It's equivalent to setting sep=“\s+”. If this option is set to True, Should not be delimiter Parameter passes in anything .

df_csv=pd.read_csv('user_info.csv',delim_whitespace=True)
df_csv

 

Here is a space in the time that causes all to separate . Set up sep=“\s+” It's the same .

49.low_memory

The type of acceptance :{bool, default True}

Process files internally in blocks , This reduces memory usage during parsing , But it may be a mixed type inference . Make sure there are no mixed types , Please set up False, Or use dtype Parameter specified type . Please note that , The entire file is read into a single data frame , No matter how , Please use chunksize or iterator Parameters return data in blocks .( Only on C The parser is valid ).

50.memory_map

The type of acceptance :{bool, default False}

If filepath_or_buffer Provides filepath, Please map the file object directly to memory , And access data directly from memory . Use this option to improve performance , Because there is no longer any I/O expenses .

51.float_precision

The type of acceptance :{str, optional}

Appoint C Which converter should the engine use for floating point values . The options for a normal converter are none or “ high ”, The options for the original low precision panda converter are “ Tradition ”, The options for the round trip converter are “ Back and forth ”.

52.storage_options

Additional options that make sense for specific storage connections , For example, host 、 port 、 user name 、 Password etc. . about HTTP(S)URL, Key value pairs are forwarded to as header options urllib. For others URL( for example , With “s3://” and “gcs://”) start ), Forward key value pairs to fsspec. For more details , see also fsspec and urllib.

Focus , Prevent losing , If there is a mistake , Please leave a message , Thank you very much

That's what this issue is all about . I am a fanstuck , If you have any questions, please leave a message for discussion , See you next time .


copyright notice
author[fanstuck],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/175/202206240433473864.html

Random recommended