
Using the pandas timestamp index

2022-02-01 14:28:11 Salted fish Science


Pandas timestamp index: DatetimeIndex

pd.DatetimeIndex() and TimeSeries (time series)

pd.DatetimeIndex() creates a timestamp index directly from str or datetime.datetime values. A single timestamp has type Timestamp; a collection of timestamps has type DatetimeIndex. For example:

import numpy as np
import pandas as pd

rng = pd.DatetimeIndex(['12/1/2017','12/2/2017','12/3/2017','12/4/2017','12/5/2017'])
print(rng,type(rng))
print(rng[0],type(rng[0]))
>>>
DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04',
               '2017-12-05'],
              dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
2017-12-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
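The paragraph above says datetime.datetime values work as well as strings. A minimal sketch, assuming the five hypothetical dates below, builds the same index both ways and compares the elements one by one:

```python
import datetime

import pandas as pd

# Hypothetical input: the same five days as datetime.datetime objects
dates = [datetime.datetime(2017, 12, day) for day in range(1, 6)]
rng_dt = pd.DatetimeIndex(dates)

# The string-based index from the example above
rng_str = pd.DatetimeIndex(['12/1/2017', '12/2/2017', '12/3/2017', '12/4/2017', '12/5/2017'])

# Compare element by element: both inputs describe the same timestamps
same = all(a == b for a, b in zip(rng_dt, rng_str))
print(same)                      # True
print(type(rng_dt[0]).__name__)  # Timestamp
```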
What is a TimeSeries?

A Series whose index is a DatetimeIndex is called a TimeSeries (time series). For example:

st = pd.Series(np.random.rand(len(rng)), index = rng)
print(st,type(st))
print(st.index)
>>>
2017-12-01    0.081920
2017-12-02    0.921781
2017-12-03    0.489779
2017-12-04    0.257632
2017-12-05    0.805373
dtype: float64 <class 'pandas.core.series.Series'>
DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04', '2017-12-05'], dtype='datetime64[ns]', freq=None)
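As the printed type shows, a "TimeSeries" is an ordinary pandas Series; as far as I know, current pandas has no separate TimeSeries class. The sketch below uses deterministic values (an assumption, in place of np.random.rand) so the checks are reproducible:

```python
import numpy as np
import pandas as pd

rng = pd.DatetimeIndex(['12/1/2017', '12/2/2017', '12/3/2017', '12/4/2017', '12/5/2017'])
# Deterministic values instead of random numbers
st = pd.Series(np.arange(len(rng), dtype=float), index=rng)

# The object is a plain Series whose index happens to be a DatetimeIndex
print(type(st) is pd.Series)                  # True
print(isinstance(st.index, pd.DatetimeIndex))  # True
# The DatetimeIndex exposes date components, e.g. the day of the month
print(st.index.day.tolist())                  # [1, 2, 3, 4, 5]
```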

pd.date_range(): generating a date range

pd.date_range() can generate a date range in two ways (the default frequency is one calendar day):

  • Start time (start) + end time (end)
  • Start time (start) or end time (end) + number of periods (periods)

For example:

date1 = pd.date_range('2017/1/1','2017/10/1',normalize=True)
print(date1)
date2 = pd.date_range(start = '1/1/2017', periods = 10)
print(date2)
date3 = pd.date_range(end = '1/30/2017 15:00:00', periods = 10,normalize=True)  # end includes hours, minutes and seconds; normalize=True resets it to midnight
print(date3)
>>>
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10',
               ...
               '2017-09-22', '2017-09-23', '2017-09-24', '2017-09-25',
               '2017-09-26', '2017-09-27', '2017-09-28', '2017-09-29',
               '2017-09-30', '2017-10-01'],
              dtype='datetime64[ns]', length=274, freq='D')
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10'],
              dtype='datetime64[ns]', freq='D')
DatetimeIndex(['2017-01-21', '2017-01-22', '2017-01-23', '2017-01-24',
               '2017-01-25', '2017-01-26', '2017-01-27', '2017-01-28',
               '2017-01-29', '2017-01-30'],
              dtype='datetime64[ns]', freq='D')
The full signature (in the pandas version used for this article) is:

pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, **kwargs)
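The two ways of specifying a range can be checked directly. This minimal sketch reuses the article's dates, written in ISO format, and confirms the lengths and endpoints shown in the output above:

```python
import pandas as pd

# Way 1: start + end (both ends included by default)
by_end = pd.date_range(start='2017-01-01', end='2017-10-01')
# Way 2a: start + periods
by_start = pd.date_range(start='2017-01-01', periods=10)
# Way 2b: end + periods (counts backwards from the end date)
by_back = pd.date_range(end='2017-01-30', periods=10)

print(len(by_end))          # 274: Jan 1 through Oct 1 inclusive
print(by_start[-1].date())  # 2017-01-10
print(by_back[0].date())    # 2017-01-21
```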

The common parameters are:

  • start: start time
  • end: end time
  • periods: number of periods (timestamps) to generate
  • freq: frequency; pd.date_range() defaults to calendar days ('D'), while pd.bdate_range() defaults to business days ('B')
  • tz: time zone
  • normalize: normalize the start and end times to midnight before generating the range
  • closed: with the default None, both ends are included; 'left' gives a left-closed, right-open range; 'right' gives a left-open, right-closed range (pandas 1.4 replaced closed with inclusive, which accepts 'both', 'neither', 'left' and 'right')
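To see what a left-closed range means in practice, here is a sketch. Because the keyword name depends on the pandas version, the hypothetical helper `left_closed_range` tries `inclusive=` first and falls back to `closed=` on older versions; that fallback is an assumption about your environment, not something the article states:

```python
import pandas as pd


def left_closed_range(start, end):
    """Hypothetical helper: daily range that includes start but excludes end."""
    try:
        # pandas >= 1.4 understands `inclusive`
        return pd.date_range(start, end, inclusive='left')
    except TypeError:
        # Older pandas only knows the `closed` keyword
        return pd.date_range(start, end, closed='left')


both = pd.date_range('2017-01-01', '2017-01-04')      # default: both ends included
left = left_closed_range('2017-01-01', '2017-01-04')  # right end dropped

print(len(both), len(left))  # 4 3
print(left[-1].date())       # 2017-01-03
```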

A practical example of the normalize parameter (name sets the name of the index):

rng4 = pd.date_range(start = '1/1/2017 15:30', periods = 10, name = 'hello world!', normalize = True)
print(rng4)
>>>
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10'],
              dtype='datetime64[ns]', name='hello world!', freq='D')
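To isolate the effect of normalize, this sketch generates the same three-day range with and without it. The time of day is kept without normalize and reset to midnight with it:

```python
import pandas as pd

raw = pd.date_range(start='2017-01-01 15:30', periods=3)
norm = pd.date_range(start='2017-01-01 15:30', periods=3, normalize=True)

print(raw[0])   # 2017-01-01 15:30:00
print(norm[0])  # 2017-01-01 00:00:00
```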
freq usage (1): generating fixed-frequency time series

Basic usage:

print(pd.date_range('2017/1/1','2017/1/4'))  # default freq = 'D': every calendar day
print(pd.date_range('2017/1/1','2017/1/4', freq = 'B'))  # B: every business day
print(pd.date_range('2017/1/1','2017/1/2', freq = 'H'))  # H: every hour
print(pd.date_range('2017/1/1 12:00','2017/1/1 12:10', freq = 'T'))  # T or min: every minute
print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10', freq = 'S'))  # S: every second
print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10', freq = 'L'))  # L: every millisecond (1/1,000 s)
print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10', freq = 'U'))  # U: every microsecond (1/1,000,000 s); 10,000,001 timestamps here
# pandas 2.2+ prefers the lowercase aliases 'h', 'min', 's', 'ms', 'us'
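Counting the generated timestamps is a quick way to check what each alias does. This sketch uses the lowercase spellings ('h', 'min', 's', 'ms'), assumed here because they are the preferred aliases in recent pandas, and skips the microsecond case because it produces ten million rows:

```python
import pandas as pd

counts = {
    'D': len(pd.date_range('2017-01-01', '2017-01-04')),
    'B': len(pd.date_range('2017-01-01', '2017-01-04', freq='B')),  # Jan 1, 2017 is a Sunday
    'h': len(pd.date_range('2017-01-01', '2017-01-02', freq='h')),
    'min': len(pd.date_range('2017-01-01 12:00', '2017-01-01 12:10', freq='min')),
    's': len(pd.date_range('2017-01-01 12:00:00', '2017-01-01 12:00:10', freq='s')),
    'ms': len(pd.date_range('2017-01-01 12:00:00', '2017-01-01 12:00:10', freq='ms')),
}
print(counts)
# {'D': 4, 'B': 3, 'h': 25, 'min': 11, 's': 11, 'ms': 10001}
```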

Advanced usage:

print(pd.date_range('2017/1/1','2017/2/1', freq = 'W-MON'))
# W-MON: weekly, on the given day of the week (here Monday)
# Day abbreviations: MON/TUE/WED/THU/FRI/SAT/SUN

print(pd.date_range('2017/1/1','2017/5/1', freq = 'WOM-2MON'))
# WOM-2MON: week of month; here, the second Monday of every month
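The anchored weekly frequencies can be verified by listing the dates they produce. In this sketch, every W-MON date is a Monday, and WOM-2MON picks exactly one date per month, the second Monday:

```python
import pandas as pd

mondays = pd.date_range('2017-01-01', '2017-02-01', freq='W-MON')
second_mondays = pd.date_range('2017-01-01', '2017-05-01', freq='WOM-2MON')

print([d.strftime('%m-%d') for d in mondays])         # ['01-02', '01-09', '01-16', '01-23', '01-30']
print([d.strftime('%m-%d') for d in second_mondays])  # ['01-09', '02-13', '03-13', '04-10']
print(all(d.dayofweek == 0 for d in second_mondays))  # True: Monday is day 0
```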
freq usage (2): generating more varied time series

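Calendar-day frequencies anchored to the end of a period (in pandas 2.2 and later, 'M', 'Q-DEC' and 'A-DEC' are deprecated in favor of 'ME', 'QE-DEC' and 'YE-DEC'):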

print(pd.date_range('2017','2018', freq = 'M'))  
print(pd.date_range('2017','2020', freq = 'Q-DEC'))  
print(pd.date_range('2017','2020', freq = 'A-DEC'))
print('------')
# M: the last calendar day of every month
# Q-<month>: quarters ending in the given month; the last calendar day of each quarter-end month
# A-<month>: the last calendar day of the given month every year
# Month abbreviations: JAN/FEB/MAR/APR/MAY/JUN/JUL/AUG/SEP/OCT/NOV/DEC
# So Q-<month> has only three distinct patterns: 1-4-7-10, 2-5-8-11, 3-6-9-12
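The same frequencies can be passed as DateOffset objects, which avoids depending on how a given pandas version spells the string aliases. This sketch assumes the offsets `MonthEnd`, `QuarterEnd(startingMonth=...)` and `YearEnd(month=...)` as equivalents of 'M', 'Q-<month>' and 'A-<month>'. The last line illustrates one of the three possible quarter patterns:

```python
import pandas as pd

month_end = pd.date_range('2017', '2018', freq=pd.offsets.MonthEnd())
quarter_end = pd.date_range('2017', '2020', freq=pd.offsets.QuarterEnd(startingMonth=12))
year_end = pd.date_range('2017', '2020', freq=pd.offsets.YearEnd(month=12))

print(len(month_end), month_end[1].date())       # 12 2017-02-28
print(sorted(set(quarter_end.month.tolist())))   # [3, 6, 9, 12]
print([d.date().isoformat() for d in year_end])  # ['2017-12-31', '2018-12-31', '2019-12-31']

# Quarters anchored on November end in Feb/May/Aug/Nov
nov = pd.date_range('2017', '2018', freq=pd.offsets.QuarterEnd(startingMonth=11))
print(sorted(set(nov.month.tolist())))           # [2, 5, 8, 11]
```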

Business-day frequencies anchored to the end of a period:

print(pd.date_range('2017','2018', freq = 'BM'))  
print(pd.date_range('2017','2020', freq = 'BQ-DEC'))  
print(pd.date_range('2017','2020', freq = 'BA-DEC'))
print('------')
# BM: the last business day of every month
# BQ-<month>: quarters ending in the given month; the last business day of each quarter-end month
# BA-<month>: the last business day of the given month every year
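The difference between the calendar and business versions only shows up when a period ends on a weekend. This sketch compares `MonthEnd` with `BusinessMonthEnd` (offset equivalents of 'M' and 'BM') for 2017 and keeps only the months where they differ:

```python
import pandas as pd

cal_end = pd.date_range('2017', '2018', freq=pd.offsets.MonthEnd())
biz_end = pd.date_range('2017', '2018', freq=pd.offsets.BusinessMonthEnd())

# Months whose last calendar day fell on a weekend in 2017
moved = [(c.date().isoformat(), b.date().isoformat())
         for c, b in zip(cal_end, biz_end) if c != b]
print(moved)
# [('2017-04-30', '2017-04-28'), ('2017-09-30', '2017-09-29'), ('2017-12-31', '2017-12-29')]
```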

Frequencies anchored to the start of a period:

print(pd.date_range('2017','2018', freq = 'MS'))  
print(pd.date_range('2017','2020', freq = 'QS-DEC'))  
print(pd.date_range('2017','2020', freq = 'AS-DEC'))
print('------')
# MS: the first calendar day of every month
# QS-<month>: the first calendar day of each quarter, with quarters anchored on the given month (QS-DEC gives Mar/Jun/Sep/Dec)
# AS-<month>: the first calendar day of the given month every year

print(pd.date_range('2017','2018', freq = 'BMS'))  
print(pd.date_range('2017','2020', freq = 'BQS-DEC'))  
print(pd.date_range('2017','2020', freq = 'BAS-DEC'))
print('------')
# BMS: the first business day of every month
# BQS-<month>: the first business day of each quarter, anchored as with QS-<month>
# BAS-<month>: the first business day of the given month every year
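The start-anchored frequencies behave the same way. This sketch uses the offset objects `MonthBegin`, `QuarterBegin(startingMonth=12)`, `YearBegin(month=12)` and `BusinessMonthBegin`, assumed as equivalents of 'MS', 'QS-DEC', 'AS-DEC' and 'BMS', and checks a few of the dates they produce:

```python
import pandas as pd

month_start = pd.date_range('2017', '2018', freq=pd.offsets.MonthBegin())
quarter_start = pd.date_range('2017', '2020', freq=pd.offsets.QuarterBegin(startingMonth=12))
year_start = pd.date_range('2017', '2020', freq=pd.offsets.YearBegin(month=12))
biz_start = pd.date_range('2017', '2018', freq=pd.offsets.BusinessMonthBegin())

print(len(month_start))                           # 13: 2018-01-01 is included
print(sorted(set(quarter_start.month.tolist())))  # [3, 6, 9, 12]
print(year_start[0].date())                       # 2017-12-01
print(biz_start[0].date())                        # 2017-01-02, since 2017-01-01 is a Sunday
```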
freq usage (3): composite frequencies

Generate time series with a compound frequency:

print(pd.date_range('2017/1/1','2017/2/1', freq = '7D'))  # every 7 days
print(pd.date_range('2017/1/1','2017/1/2', freq = '2h30min'))  # every 2 hours 30 minutes
print(pd.date_range('2017','2018', freq = '2M'))  # every 2 months, on the last calendar day of the month
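A compound frequency is simply a multiple of a base frequency. This sketch checks the three cases above, writing '2M' as `MonthEnd(2)` (an assumed equivalent), which confirms that it yields month ends, not month starts:

```python
import pandas as pd

weekly = pd.date_range('2017-01-01', '2017-02-01', freq='7D')
step = pd.date_range('2017-01-01', '2017-01-02', freq='2h30min')
bimonthly = pd.date_range('2017', '2018', freq=pd.offsets.MonthEnd(2))

print(len(weekly))      # 5
print(len(step), step[1])  # 10 2017-01-01 02:30:00
print([d.date().isoformat() for d in bimonthly])
# ['2017-01-31', '2017-03-31', '2017-05-31', '2017-07-31', '2017-09-30', '2017-11-30']
```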
asfreq: changing the frequency

How do you convert a daily time series into one with a finer interval?

ts = pd.Series(np.random.rand(4),
              index = pd.date_range('20170101','20170104'))
print(ts)
print(ts.asfreq('4H',method = 'ffill'))
# Change the frequency, here from D (daily) to 4H (every 4 hours)
# method: fill method; None (default) leaves the new points as NaN, 'ffill' fills with the previous value, 'bfill' fills with the next value
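To see the three fill methods side by side, this sketch replaces the random values with 1 to 4 (an assumption for readability) and inspects the first new 4-hour slot:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=pd.date_range('2017-01-01', '2017-01-04'))

no_fill = ts.asfreq('4h')                  # new 4-hour slots are NaN
ffilled = ts.asfreq('4h', method='ffill')  # new slots take the previous value
bfilled = ts.asfreq('4h', method='bfill')  # new slots take the next value

slot = pd.Timestamp('2017-01-01 04:00')
print(len(no_fill), int(no_fill.isna().sum()))  # 19 15
print(ffilled.loc[slot], bfilled.loc[slot])     # 1.0 2.0
```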
How do you lead or lag the data?

The example below moves the values forward or backward (lag/lead):

ts = pd.Series(np.random.rand(4),
              index = pd.date_range('20170101','20170104'))
print(ts)
print(ts.shift(2))
print(ts.shift(-2))
print('------')
# Positive n: values move later (lag); negative n: values move earlier (lead)
>>>
2017-01-01    0.575076
2017-01-02    0.514981
2017-01-03    0.221506
2017-01-04    0.410396
Freq: D, dtype: float64
2017-01-01         NaN
2017-01-02         NaN
2017-01-03    0.575076
2017-01-04    0.514981
Freq: D, dtype: float64
2017-01-01    0.221506
2017-01-02    0.410396
2017-01-03         NaN
2017-01-04         NaN
Freq: D, dtype: float64
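One common reason to lag a series is to line each value up with the previous one, which turns period-over-period change into an element-wise operation. A minimal sketch with hypothetical values that double each day:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 4.0, 8.0], index=pd.date_range('2017-01-01', '2017-01-04'))

# shift(1) places yesterday's value next to today's value
change = ts / ts.shift(1) - 1
print(change.tolist())  # [nan, 1.0, 1.0, 1.0]
```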

Passing the freq parameter shifts the index timestamps instead of the values:

print(ts.shift(2, freq = 'D'))
print(ts.shift(2, freq = 'T'))
# With freq: the timestamps are shifted, and the values stay attached to them
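The contrast with the plain shift is easiest to see by checking the index and the values separately. In this sketch (deterministic values assumed), the index moves by two days or two minutes while the values stay in order with no NaN introduced:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=pd.date_range('2017-01-01', '2017-01-04'))

by_days = ts.shift(2, freq='D')       # index moves forward two days
by_minutes = ts.shift(2, freq='min')  # index moves forward two minutes

print(by_days.index[0])                 # 2017-01-03 00:00:00
print(by_minutes.index[0])              # 2017-01-01 00:02:00
print(by_days.tolist() == ts.tolist())  # True: the values are not moved
```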

Pandas periods: Period

pd.Period(): creating a period

Create a period for 2017-01 with monthly frequency:

p = pd.Period('2017', freq = 'M')
print(p, type(p))
>>>
2017-01 <class 'pandas._period.Period'>

Adding or subtracting an integer moves the period by that many units of its frequency:

p = pd.Period('2017', freq = 'M')
print(p, type(p))
print(p + 1)
print(p - 2)
>>>
2017-01 <class 'pandas._period.Period'>
2017-02
2016-11
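A Period represents a whole span of time, not a single instant. The sketch below uses its `start_time` and `end_time` attributes to show the span covered by the monthly period above, and confirms the arithmetic:

```python
import pandas as pd

p = pd.Period('2017-01', freq='M')

print(p.start_time)              # 2017-01-01 00:00:00
print(p.end_time.normalize())    # 2017-01-31 00:00:00 (end_time is the last instant of the month)
print(p + 1 == pd.Period('2017-02', freq='M'))  # True
```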
pd.period_range(): creating a period range

Create a range of periods:

prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')
print(prng,type(prng))
>>>
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='int64', freq='M') <class 'pandas.tseries.period.PeriodIndex'>
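Like date_range, period_range also accepts start + periods instead of start + end. This sketch shows that both forms produce the same 13 monthly periods, with both endpoints included:

```python
import pandas as pd

by_end = pd.period_range('2011-01', '2012-01', freq='M')
by_count = pd.period_range(start='2011-01', periods=13, freq='M')

print(len(by_end))              # 13: both endpoints are included
print(by_end.equals(by_count))  # True
print(by_end[-1])               # 2012-01
```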

Use this period range as the index of a time series:

ts = pd.Series(np.random.rand(len(prng)), index = prng)
print(ts,type(ts))
print(ts.index)
>>>
2011-01    0.342571
2011-02    0.826151
2011-03    0.370505
2011-04    0.137151
2011-05    0.679976
2011-06    0.265928
2011-07    0.416502
2011-08    0.874078
2011-09    0.112801
2011-10    0.112504
2011-11    0.448408
2011-12    0.851046
2012-01    0.370605
Freq: M, dtype: float64 <class 'pandas.core.series.Series'>
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06', '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12', '2012-01'], dtype='int64', freq='M')
Period.asfreq: frequency conversion

The .asfreq(freq, how='E') method converts a period to another frequency; how selects whether the result lies at the start ('start' or 's') or the end ('end' or 'e') of the original period, and defaults to the end:

p = pd.Period('2017','A-DEC')
print(p)
print(p.asfreq('M', how = 'start'))  # how = 's' also works
print(p.asfreq('D', how = 'end'))  # how = 'e' also works
>>>
2017
2017-01
2017-12-31
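Because how defaults to the end, leaving it out gives the last sub-period. This sketch converts a monthly period to days with and without how:

```python
import pandas as pd

p = pd.Period('2017-05', freq='M')

print(p.asfreq('D'))               # 2017-05-31: how defaults to the end
print(p.asfreq('D', how='start'))  # 2017-05-01
```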

asfreq can also convert the PeriodIndex of a time series:

prng = pd.period_range('2017','2018',freq = 'M')
ts1 = pd.Series(np.random.rand(len(prng)), index = prng)
ts2 = pd.Series(np.random.rand(len(prng)), index = prng.asfreq('D', how = 'start'))
print(ts1.head(),len(ts1))
print(ts2.head(),len(ts2))
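Note that converting a PeriodIndex maps each period to one period of the new frequency; it does not expand months into every day. This sketch confirms that ts2 above still has 13 entries, one per month:

```python
import pandas as pd

prng = pd.period_range('2017', '2018', freq='M')
starts = prng.asfreq('D', how='start')
ends = prng.asfreq('D', how='end')

print(len(prng), len(starts))  # 13 13: one daily period per month
print(starts[0], ends[-1])     # 2017-01-01 2018-01-31
```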

Converting between timestamps and periods

The Series methods to_period() and to_timestamp() convert the index between timestamps and periods.

rng = pd.date_range('2017/1/1', periods = 10, freq = 'M')
prng = pd.period_range('2017','2018', freq = 'M')

ts1 = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts1.head())
print(ts1.to_period().head())
# Month-end timestamps become monthly periods

ts2 = pd.Series(np.random.rand(len(prng)), index = prng)
print(ts2.head())
print(ts2.to_timestamp().head())
# Monthly periods become timestamps on the first day of each month
>>>
2017-01-31    0.125288
2017-02-28    0.497174
2017-03-31    0.573114
2017-04-30    0.665665
2017-05-31    0.263561
Freq: M, dtype: float64
2017-01    0.125288
2017-02    0.497174
2017-03    0.573114
2017-04    0.665665
2017-05    0.263561
Freq: M, dtype: float64
2017-01    0.748661
2017-02    0.095891
2017-03    0.280341
2017-04    0.569813
2017-05    0.067677
Freq: M, dtype: float64
2017-01-01    0.748661
2017-02-01    0.095891
2017-03-01    0.280341
2017-04-01    0.569813
2017-05-01    0.067677
Freq: MS, dtype: float64
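A round trip makes the asymmetry visible: month-end timestamps become periods, and converting back yields month starts. This sketch passes freq='M' to to_period explicitly and uses `MonthEnd()` for the range, both assumptions made to avoid version-dependent aliases:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', periods=3, freq=pd.offsets.MonthEnd())
ts = pd.Series(np.arange(3.0), index=rng)

as_periods = ts.to_period(freq='M')  # month-end timestamps -> monthly periods
back = as_periods.to_timestamp()     # monthly periods -> first day of each month

print(as_periods.index[0])  # 2017-01
print(back.index[0])        # 2017-01-01 00:00:00
```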

Indexing and slicing time series

Indexing

The indexing methods below also apply to a DataFrame. Because an index created with date_range is already sorted chronologically, there is no need to worry about ordering.

Basic positional indexing works like indexing a list:


from datetime import datetime

rng = pd.date_range('2017/1','2017/3')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts.head())

print(ts.iloc[0])   # .iloc selects by position; plain ts[0] is deprecated for this in pandas 2.x
print(ts.iloc[:2])
>>>
2017-01-01    0.107736
2017-01-02    0.887981
2017-01-03    0.712862
2017-01-04    0.920021
2017-01-05    0.317863
Freq: D, dtype: float64
0.107735945027
2017-01-01    0.107736
2017-01-02    0.887981
Freq: D, dtype: float64

Besides positional indexing, a time series also supports label-based indexing with dates:

from datetime import datetime

rng = pd.date_range('2017/1','2017/3')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts['2017/1/2'])
print(ts['20170103'])
print(ts['1/10/2017'])
print(ts[datetime(2017,1,20)])
>>>
0.887980757812
0.712861778966
0.788336674948
0.93070380011
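With deterministic values (an assumption: each value is the number of days since Jan 1), it is easy to confirm that .iloc works by position and that differently formatted labels all resolve to the same day through .loc:

```python
from datetime import datetime

import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', '2017-03-01')
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

# Position-based access
print(ts.iloc[0], ts.iloc[-1])  # 0.0 59.0
# Label-based access: these keys all name January 2
keys = ['2017/1/2', '20170102', '1/2/2017', datetime(2017, 1, 2)]
print([float(ts.loc[k]) for k in keys])  # [1.0, 1.0, 1.0, 1.0]
```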
Slicing

Slicing by date labels works like label-based slicing of any Series: unlike positional slicing, the end point is included.

rng = pd.date_range('2017/1','2017/3',freq = '12H')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts['2017/1/5':'2017/1/10'])
>>>
2017-01-05 00:00:00    0.462085
2017-01-05 12:00:00    0.778637
2017-01-06 00:00:00    0.356306
2017-01-06 12:00:00    0.667964
2017-01-07 00:00:00    0.246857
2017-01-07 12:00:00    0.386956
2017-01-08 00:00:00    0.328203
2017-01-08 12:00:00    0.260853
2017-01-09 00:00:00    0.224920
2017-01-09 12:00:00    0.397457
2017-01-10 00:00:00    0.158729
2017-01-10 12:00:00    0.501266
Freq: 12H, dtype: float64


# Passing only a month selects every entry in that month
print(ts['2017/2'].head())
>>>
2017-02-01 00:00:00    0.243932
2017-02-01 12:00:00    0.220830
2017-02-02 00:00:00    0.896107
2017-02-02 12:00:00    0.476584
2017-02-03 00:00:00    0.515817
Freq: 12H, dtype: float64
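The inclusive end point and the whole-month selection can be checked by counting rows. This sketch uses 12-hour data with deterministic values (assumed), so each day contributes two rows:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', '2017-03-01', freq='12h')
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

window = ts.loc['2017-01-05':'2017-01-10']  # label slice: the whole end day is included
february = ts.loc['2017-02']                # partial string: the whole month

print(len(window), window.index[-1])  # 12 2017-01-10 12:00:00
print(len(february))                  # 56 = 28 days x 2 rows per day
```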
Time series with duplicate indexes
dates = pd.DatetimeIndex(['1/1/2015','1/2/2015','1/3/2015','1/4/2015','1/1/2015','1/2/2015'])
ts = pd.Series(np.random.rand(6), index = dates)
print(ts)
# is_unique checks whether the values (ts.is_unique) or the index (ts.index.is_unique) contain duplicates
print(ts.is_unique,ts.index.is_unique)
>>>
2015-01-01    0.300286
2015-01-02    0.603865
2015-01-03    0.017949
2015-01-04    0.026621
2015-01-01    0.791441
2015-01-02    0.526622
dtype: float64
True False

The result shows that the index contains duplicates (ts.index.is_unique is False) while the values do not (ts.is_unique is True).

To resolve the duplicate index, group by the index and average the values that share a label:

print(ts.groupby(level = 0).mean())
# groupby(level=0) groups by index label; values with a repeated label are averaged
>>>
2015-01-01    0.545863
2015-01-02    0.565244
2015-01-03    0.017949
2015-01-04    0.026621
dtype: float64
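Averaging is one choice; another is to keep only the first occurrence of each label with `Index.duplicated`. This sketch compares the two on deterministic values (assumed in place of the random ones above):

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
                          '2015-01-01', '2015-01-02'])
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

averaged = ts.groupby(level=0).mean()                # the approach used above
first_only = ts[~ts.index.duplicated(keep='first')]  # keep the first occurrence instead

print(averaged.tolist())    # [3.0, 4.0, 3.0, 4.0]
print(first_only.tolist())  # [1.0, 2.0, 3.0, 4.0]
print(averaged.index.is_unique, first_only.index.is_unique)  # True True
```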

Copyright notice
Author: Salted fish Science. Please include a link to the original when reprinting. Thank you.
https://en.pythonmana.com/2022/02/202202011428073130.html
