Python Data Analysis Basics: pandas Statistical Analysis, Part 4

pandas time-related classes

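In most cases, analyzing time data starts with converting the raw time strings into a standard time type. pandas builds on the time functionality of the NumPy and datetime libraries and provides six time-related classes:

- Timestamp: the most basic time class, representing a single point in time
- Period: a single time span, such as a particular day or hour
- Timedelta: a duration in some unit, such as 1 day, 1.5 hours or 3 minutes, rather than a specific time
- DatetimeIndex: an Index made up of Timestamps
- PeriodIndex: an Index made up of Periods
- TimedeltaIndex: an Index made up of Timedeltas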
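As a quick orientation, the sketch below creates one object of each of the six classes from hand-written sample values (not from meal_order_info.csv) and prints its type name. The values are arbitrary and chosen only for illustration.

```python
import pandas as pd

ts = pd.Timestamp('2016-08-01 11:11:46')               # a single point in time
per = pd.Period('2016-08', freq='M')                   # a single time span: one month
td = pd.Timedelta(hours=1, minutes=30)                 # a duration, not a specific time
dti = pd.DatetimeIndex(['2016-08-01', '2016-08-02'])   # an Index of Timestamps
pi = pd.PeriodIndex(['2016-08', '2016-09'], freq='M')  # an Index of Periods
tdi = pd.TimedeltaIndex(['1 days', '2 hours'])         # an Index of Timedeltas

# Print each object's class name next to its value
for obj in (ts, per, td, dti, pi, tdi):
    print(type(obj).__name__, obj)
```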

import pandas as pd  # Introducing pandas module

data = pd.read_csv('../data/meal_order_info.csv', encoding='gbk')  # Read csv data
print('Get column names', data.columns)
# Get column names Index(['info_id', 'emp_id', 'number_consumers', 'mode', 'dining_table_id',
#        'dining_table_name', 'expenditure', 'dishes_count', 'accounts_payable',
#        'use_start_time', 'check_closed', 'lock_time', 'cashier_id', 'pc_id',
#        'order_number', 'org_id', 'print_doc_bill_num', 'lock_table_info',
#        'order_status', 'phone', 'name'],
#       dtype='object')

Class Timestamp
Timestamp is the most basic and most commonly used time class; in most cases, time strings are converted to Timestamp. pandas provides the to_datetime function for this conversion. Note that Timestamp values have a limited range: at nanosecond precision they can only represent times between 1677-09-21 and 2262-04-11.

import pandas as pd  # Introducing pandas module

data = pd.read_csv('../data/meal_order_info.csv', encoding='gbk')  # Read csv data
data['lock_time'] = pd.to_datetime(data['lock_time'])  # Time related string converted to Timestamp
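A self-contained sketch of the same conversion, using a few hypothetical strings in place of the lock_time column so it runs without the CSV file. It also shows the format parameter of to_datetime for non-standard layouts (the day/month/year sample string is an assumption for illustration) and prints the range limits mentioned above.

```python
import pandas as pd

# Hypothetical sample standing in for the lock_time column of meal_order_info.csv
lock_time = pd.Series(['2016-08-01 11:11:46', '2016-08-01 11:31:55', '2016-08-31 21:56:12'])

converted = pd.to_datetime(lock_time)     # strings -> datetime64 Series
print(converted)
print(type(converted.iloc[0]).__name__)   # Timestamp

# Non-standard layouts can be parsed with an explicit strftime-style format
custom = pd.to_datetime(pd.Series(['01/08/2016 11:11']), format='%d/%m/%Y %H:%M')
print(custom.iloc[0])                     # 2016-08-01 11:11:00

# The range limit of nanosecond-precision Timestamps
print(pd.Timestamp.min)                   # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)                   # 2262-04-11 23:47:16.854775807
```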

DatetimeIndex and PeriodIndex classes
Besides converting a column of the original DataFrame to Timestamp directly, you can also take the data out on its own and convert it to a DatetimeIndex or a PeriodIndex.
When converting to a PeriodIndex, you must specify the time interval with the freq parameter. Commonly used intervals are Y for year, M for month, D for day, H for hour, T for minute and S for second (pandas 2.2 and later deprecate H, T and S in favor of h, min and s). Both constructors can convert existing data or create new time series data, and their parameters are very similar: each takes the values to convert as data and the interval as freq.
In everyday use the two differ little: a DatetimeIndex represents a sequence of points in time, while a PeriodIndex represents a sequence of time spans.
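The point-versus-span distinction can be made concrete with a small sketch. It uses three hypothetical values in the style of use_start_time and a daily freq; the start_time attribute and the to_period method are standard pandas, but the sample data is an assumption.

```python
import pandas as pd

# A few hypothetical values in the style of the use_start_time column
values = ['2016-08-01 11:05:36', '2016-08-01 11:15:57', '2016-08-31 21:41:56']

points = pd.DatetimeIndex(values)        # three distinct points in time
days = pd.PeriodIndex(values, freq='D')  # the day-long span each value falls into

print(points)
print(days)
print(days.start_time)  # DatetimeIndex holding the start of each span

# The first two values are different points but fall in the same day-long period
print(points[0] == points[1], days[0] == days[1])  # False True

# A DatetimeIndex can also be converted to periods after the fact
print(points.to_period('M'))
```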

import pandas as pd  # Introducing pandas module

data = pd.read_csv('../data/meal_order_info.csv', encoding='gbk')  # Read csv data
data['lock_time'] = pd.to_datetime(data['lock_time'])  # Time related string converted to Timestamp
print(pd.DatetimeIndex(data['use_start_time'].values))
print(pd.PeriodIndex(data['use_start_time'].values, freq='S'))  # In seconds
# DatetimeIndex(['2016-08-01 11:05:36', '2016-08-01 11:15:57',
#                '2016-08-01 12:42:52', '2016-08-01 12:51:38',
#                '2016-08-01 12:58:44', '2016-08-01 13:15:42',
#                '2016-08-01 13:17:37', '2016-08-01 13:38:27',
#                '2016-08-01 17:06:20', '2016-08-01 17:32:27',
#                ...
#                '2016-08-31 18:05:51', '2016-08-31 18:28:06',
#                '2016-08-31 18:40:56', '2016-08-31 19:14:12',
#                '2016-08-31 20:25:16', '2016-08-31 21:23:48',
#                '2016-08-31 21:24:12', '2016-08-31 21:25:18',
#                '2016-08-31 21:37:39', '2016-08-31 21:41:56'],
#               dtype='datetime64[ns]', length=945, freq=None)
# PeriodIndex(['2016-08-01 11:05:36', '2016-08-01 11:15:57',
#              '2016-08-01 12:42:52', '2016-08-01 12:51:38',
#              '2016-08-01 12:58:44', '2016-08-01 13:15:42',
#              '2016-08-01 13:17:37', '2016-08-01 13:38:27',
#              '2016-08-01 17:06:20', '2016-08-01 17:32:27',
#              ...
#              '2016-08-31 18:05:51', '2016-08-31 18:28:06',
#              '2016-08-31 18:40:56', '2016-08-31 19:14:12',
#              '2016-08-31 20:25:16', '2016-08-31 21:23:48',
#              '2016-08-31 21:24:12', '2016-08-31 21:25:18',
#              '2016-08-31 21:37:39', '2016-08-31 21:41:56'],
#             dtype='period[S]', length=945, freq='S')
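The freq value decides how coarse each period is. This sketch interprets one hypothetical use_start_time value at year, month and day granularity using pd.Period; it sticks to Y, M and D because those aliases are spelled the same across pandas versions.

```python
import pandas as pd

moment = '2016-08-01 11:05:36'  # hypothetical single value from use_start_time

# The same moment interpreted at different period granularities
periods = {freq: pd.Period(moment, freq=freq) for freq in ['Y', 'M', 'D']}

for freq, p in periods.items():
    print(freq, p, '| starts', p.start_time, '| ends', p.end_time)
```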

Extracting time information with Timestamp attributes
In most time-related data processing and statistical analysis, you need to extract parts of a time, such as the year or the month. This can be done with the corresponding attributes of the Timestamp class.
Combined with a Python list comprehension, these attributes can extract the same information for a whole column of a DataFrame.
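A minimal sketch of that column-wise extraction, using a hypothetical three-value stand-in for lock_time. It also shows the vectorized .dt accessor, which pandas provides on datetime Series as an alternative to the list comprehension.

```python
import pandas as pd

# Hypothetical sample standing in for the lock_time column
lock_time = pd.to_datetime(pd.Series(['2016-08-01 11:11:46',
                                      '2016-08-15 19:30:05',
                                      '2016-08-31 21:56:12']))

# List comprehensions over the individual Timestamps
years = [t.year for t in lock_time]
days_of_year = [t.dayofyear for t in lock_time]
weekdays = [t.day_name() for t in lock_time]

print(years)         # [2016, 2016, 2016]
print(days_of_year)  # [214, 228, 244]
print(weekdays)      # ['Monday', 'Monday', 'Wednesday']

# The .dt accessor returns the same information as a Series, without a Python loop
print(lock_time.dt.year.tolist())
print(lock_time.dt.day_name().tolist())
```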

import pandas as pd  # Introducing pandas module

data = pd.read_csv('../data/meal_order_info.csv', encoding='gbk')  # Read csv data
data['lock_time'] = pd.to_datetime(data['lock_time'])  # Time related string converted to Timestamp
a = data.loc[0, 'lock_time']
print('The lock_time value in row 0 is:', a)
print('The year of this value is ' + str(a.year))
print('This is day ' + str(a.dayofyear) + ' of the year')
print('The day of the week is', a.day_name())  # weekday_name was removed in pandas 1.0; use day_name()
# The lock_time value in row 0 is: 2016-08-01 11:11:46
# The year of this value is 2016
# This is day 214 of the year
# The day of the week is Monday

Class Timedelta
Timedelta is the odd one out among the time-related classes: it represents a duration rather than a point in time, and the duration can be positive or negative, such as 1 second, 2 minutes or 3 hours. Combined with the regular time classes, Timedelta makes time arithmetic easy. Timedelta has no year or month units, because years and months vary in length. The supported units, which are also the keyword arguments of pd.Timedelta, are:

- weeks
- days
- hours
- minutes
- seconds
- milliseconds
- microseconds
- nanoseconds
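A short sketch of building durations from those keyword units, including a negative one, and of the missing month unit. The specific values are arbitrary; catching both ValueError and TypeError is a defensive assumption about which error a given pandas version raises.

```python
import pandas as pd

# Build a duration from several of the supported keyword units
td = pd.Timedelta(weeks=1, days=2, hours=3, minutes=4, seconds=5)
print(td)                   # 9 days 03:04:05

# Durations can be negative
neg = pd.Timedelta(minutes=-2)
print(neg)                  # -1 days +23:58:00
print(neg.total_seconds())  # -120.0

# Months are not an accepted unit because their length varies
try:
    pd.Timedelta(months=1)
    months_supported = True
except (ValueError, TypeError):
    months_supported = False
print('months supported:', months_supported)  # months supported: False
```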

With Timedelta, you can easily shift a time forward or backward by a given amount.
Besides shifting times, you can also subtract one time series from another directly; the result consists of Timedelta values.

import pandas as pd  # Introducing pandas module

data = pd.read_csv('../data/meal_order_info.csv', encoding='gbk')  # Read csv data
data['lock_time'] = pd.to_datetime(data['lock_time'])  # Time related string converted to Timestamp
print(pd.Timedelta.min)  # Most negative Timedelta that can be represented
b = data['lock_time'] + pd.Timedelta(days=1, seconds=1)  # Shift every lock_time forward by 1 day and 1 second
print(b)
c = data['lock_time'] - pd.to_datetime('2019-5-20 17:55:00')  # Subtract a fixed time (May 20, 2019 17:55:00); gives Timedeltas
print(c)
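A self-contained version of the same operations, using two hypothetical lock_time values and matching hypothetical use_start_time values. Subtracting the two columns gives the time between seating and locking the order, which illustrates subtracting one time series from another; that interpretation of the columns is an assumption.

```python
import pandas as pd

# Hypothetical samples standing in for the lock_time and use_start_time columns
lock_time = pd.to_datetime(pd.Series(['2016-08-01 11:11:46', '2016-08-31 21:56:12']))
use_start_time = pd.to_datetime(pd.Series(['2016-08-01 11:05:36', '2016-08-31 21:41:56']))

# Shift every time forward by 1 day and 1 second
shifted = lock_time + pd.Timedelta(days=1, seconds=1)

# Subtracting a fixed time yields a Series of Timedelta values
elapsed = lock_time - pd.to_datetime('2016-08-01 00:00:00')

# Subtracting one time series from another, element by element
duration = lock_time - use_start_time

print(shifted)
print(elapsed)
print(elapsed.dt.days)             # whole-day component of each Timedelta
print(duration)
print(duration.dt.total_seconds())  # durations in seconds
```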

END


Tags: Python encoding

Posted on Mon, 09 Mar 2020 07:08:32 -0400 by mulysa