Pure time-series forecasting of high-speed rail passenger numbers with Prophet: a TOP10 scheme (unranked submission)

Competition page: AI Research Society, a community for studying new developments in AI across industry, academia, and research, and for helping AI academic developers grow.

Competition overview: Given the passenger flow data of a high-speed rail line from 08-25 to 06-25, forecast the number of passengers from 06-25 to 24-09-25 for either the morning or the afternoon. When the morning is to be predicted, the afternoon data for that day is given; when the afternoon is to be predicted, the morning data is given.

Scoring metric: 100 - MAE
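
The metric can be sketched as follows, assuming MAE is the usual mean absolute error taken over all forecast points. The helper name `competition_score` and the sample numbers are illustrative assumptions, not part of the competition material.

```python
import numpy as np

def competition_score(y_true, y_pred):
    """Hypothetical helper: 100 - MAE, assuming the standard MAE definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))  # mean absolute error
    return 100.0 - mae

# Made-up passenger counts, only to show the arithmetic.
example_score = competition_score([120, 80, 95], [110, 85, 100])
```

Under this definition a lower MAE gives a higher score, and a perfect forecast scores 100.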

Understanding the data: As the bolded part of the competition overview shows, the data leaks information from the future (the supplied data overlaps the forecast period), so the task has little practical significance.

Online score: 42.6881

Key tips:

1. The historical data trends upward overall, and the early values differ greatly from the recent ones. Training on data that is too old reduces the model's predictive power, so this scheme trains only on data from June 10, 2014 onward.

2. Because morning or afternoon data is missing from the training data, it is better to model "Morning" and "Afternoon" separately. The quotation marks are there because, in practice, the hours from 8:00 through 20:00 were treated as "Morning" and the remaining hours as "Afternoon". The reason is that 8:00 to 22:00 forms a relatively complete cycle that rises monotonically to a peak and then falls (in the words of the 4th graders of Dalai Primary, it "obeys a Poisson distribution"). Plot the data yourself to see this, or to find a better way to split it.

3. Steps 1 and 2 plus parameter tuning bring the score to 39+. On top of that, applying a Box-Cox transformation to the label adds more than 3 points, for a score of 42+.
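
The Box-Cox step in tip 3 can be sketched in isolation as below. The array `y` holds made-up values, not competition data; the zero replacement mirrors what the scheme does, because Box-Cox requires strictly positive input.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Made-up label values, including a zero that Box-Cox cannot handle.
y = np.array([12.0, 30.0, 0.0, 45.0, 80.0, 150.0, 60.0])

# Replace zeros with the mean so every value is strictly positive.
y_pos = y.copy()
y_pos[y_pos == 0] = y_pos.mean()

# With no lambda given, boxcox returns the transformed data and the fitted lambda.
transformed, fitted_lambda = stats.boxcox(y_pos)

# The same lambda is needed later to map predictions back to the original scale.
restored = inv_boxcox(transformed, fitted_lambda)
```

The model is trained on `transformed`, and its predictions are passed through `inv_boxcox` with the stored lambda before submission.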

The code walkthrough follows:

Import related packages

import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
try:
    from prophet import Prophet      # package name since release 1.0
except ImportError:
    from fbprophet import Prophet    # older releases
import matplotlib.pyplot as plt
from scipy import stats
from scipy.special import inv_boxcox

Data Loading and Processing

#Reading training data
rese_df = pd.read_csv('../data/train.csv', names =['ds', 'y'], header=0)
rese_df['ds'] = rese_df['ds'].astype('datetime64[ns]')
rese_df['y'] = rese_df['y'].astype(int)

#Read test data
test = pd.read_csv('../data/test.csv', names =['id', 'ds'], header=0)
test['ds'] = test['ds'].astype('datetime64[ns]')

#Constructing prediction time points
test_df = test[['ds']].copy()

#Build the frame for the online submission
subs = test[['id']].copy()

Data Split

#Hours 8 through 20 inclusive count as "Morning"; all other hours as "Afternoon"
hours = np.arange(8, 21)
#Split the training data
rese_df_mor = rese_df[rese_df['ds'].dt.hour.isin(hours)]
rese_df_aft = rese_df[~rese_df['ds'].dt.hour.isin(hours)]
#Split the forecast time points (copies, so that a 'y' column can be added later)
test_df_mor = test_df[test_df['ds'].dt.hour.isin(hours)].copy()
test_df_aft = test_df[~test_df['ds'].dt.hour.isin(hours)].copy()
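
A small check of how this split behaves, on a made-up day of hourly timestamps (the date and the `demo` names are assumptions for illustration): `np.arange(8, 21)` covers hours 8 through 20, so 13 hourly slots fall into "Morning" and the other 11 into "Afternoon".

```python
import numpy as np
import pandas as pd

hours = np.arange(8, 21)  # 8, 9, ..., 20 (the end value 21 is excluded)

# One synthetic day with a row for every hour 0..23.
demo = pd.DataFrame({
    'ds': pd.date_range('2015-01-01', periods=24, freq=pd.Timedelta(hours=1))
})

in_morning = demo['ds'].dt.hour.isin(hours)
demo_mor = demo[in_morning].copy()    # hours 8..20
demo_aft = demo[~in_morning].copy()   # hours 0..7 and 21..23
```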

Separate Modeling

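#Model the morning and the afternoon separately
#Select recent data (June 2014 onward) as training data
#Note: with freq='1M' (month end), date_range rolls the start date forward,
#so cut_off holds 2014-06-30 and the filters below keep rows after that date
cut_off = pd.date_range(start='06/10/2014', freq='1M', periods=1)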

#Morning Part Modeling Prediction
tail_df = rese_df_mor[ rese_df_mor['ds']>cut_off[len(cut_off)-1] ].copy()
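#Box-Cox transform y; zeros are replaced by the mean first because Box-Cox needs strictly positive values
tail_df['y'] = tail_df['y'].astype(float)  #avoid writing a float mean into an int column
tail_df.loc[tail_df['y']==0, 'y'] = tail_df['y'].mean()
xt, fitted_lambda_mor = stats.boxcox(tail_df['y'])
tail_df['y'] = xt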

m = Prophet(yearly_seasonality=False
            , daily_seasonality=True
            , weekly_seasonality=True
            , seasonality_mode='multiplicative'
            , interval_width=0.95
            , changepoint_range=0.95
            , changepoint_prior_scale=0.1
           )
m.fit(tail_df)
preds_mor = m.predict(test_df_mor)['yhat']

#Afternoon Part Modeling Prediction
tail_df = rese_df_aft[ rese_df_aft['ds']>cut_off[len(cut_off)-1] ].copy()
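#Box-Cox transform y; zeros are replaced by the mean first because Box-Cox needs strictly positive values
tail_df['y'] = tail_df['y'].astype(float)  #avoid writing a float mean into an int column
tail_df.loc[tail_df['y']==0, 'y'] = tail_df['y'].mean()
xt, fitted_lambda_aft = stats.boxcox(tail_df['y'])
tail_df['y'] = xt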

m = Prophet(yearly_seasonality=False
            , daily_seasonality=True
            , weekly_seasonality=True
            , seasonality_mode='multiplicative'
            , interval_width=0.95
            , changepoint_range=0.95
            , changepoint_prior_scale=0.1
           )
m.fit(tail_df)
preds_aft = m.predict(test_df_aft)['yhat']
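
One detail of the cutoff used above is worth checking on toy data. With a month-end frequency, `pd.date_range` rolls a mid-month start forward to the end of that month, so the filter keeps rows after June 30 rather than from June 10. The sketch below uses `pd.offsets.MonthEnd()`, which is the offset behind the month-end alias, and a synthetic `demo` frame; whether the scored submission intended June 10 or the month end is not stated in the post.

```python
import pandas as pd

# Month-end frequency: the start date 2014-06-10 is rolled forward to the month end.
cut_off = pd.date_range(start='2014-06-10', freq=pd.offsets.MonthEnd(), periods=1)
month_end_cut = cut_off[-1]

# Synthetic daily timestamps around the cutoff (not competition data).
demo = pd.DataFrame({'ds': pd.date_range('2014-06-01', '2014-07-05', freq='D')})

# Filter as in the modeling code: strictly after the month-end cutoff.
after_month_end = demo[demo['ds'] > month_end_cut]

# Alternative, if training really should start on June 10: an explicit timestamp.
from_june_10 = demo[demo['ds'] >= pd.Timestamp('2014-06-10')]
```

Comparing the first row of each result shows the difference: the first filter starts on July 1, the second on June 10.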

Generate online submission files

#Inverse boxcox transformation of predicted values
test_df_mor['y'] = inv_boxcox( np.array( preds_mor ), fitted_lambda_mor)
test_df_aft['y'] = inv_boxcox( np.array( preds_aft ), fitted_lambda_aft)
#Merge results
result = pd.concat([test_df_mor, test_df_aft]).sort_values(by=['ds'], ascending=True)

#Generate online submission files
subs['y'] = result['y']
subs.to_csv('../subs/prophet39_boxcox.csv', index=False, header=False)
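
The merge step relies on pandas aligning `result['y']` to `subs` by index label rather than by row position, which is why sorting `result` by `ds` does not scramble the ids. A minimal sketch with four made-up rows (the ids, timestamps, and prediction values are all invented for illustration):

```python
import pandas as pd

# Toy test set whose rows alternate between the two time-of-day groups.
test = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'ds': pd.to_datetime(['2015-01-01 09:00', '2015-01-01 22:00',
                          '2015-01-01 10:00', '2015-01-01 23:00']),
})
hours = list(range(8, 21))
is_mor = test['ds'].dt.hour.isin(hours)

mor = test.loc[is_mor, ['ds']].copy()    # index labels 0 and 2
aft = test.loc[~is_mor, ['ds']].copy()   # index labels 1 and 3
mor['y'] = [1.0, 2.0]                    # stand-ins for morning predictions
aft['y'] = [3.0, 4.0]                    # stand-ins for afternoon predictions

# Sorting changes the row order but keeps the original index labels.
result = pd.concat([mor, aft]).sort_values(by=['ds'], ascending=True)

subs = test[['id']].copy()
subs['y'] = result['y']                  # matched on index labels, not on position
```

Each id ends up with the prediction computed for its own timestamp, even though `result` is in a different row order from `subs`.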

Published on 2020-11-16

Tags: Machine Learning AI Deep Learning

Posted on Mon, 20 Sep 2021 17:44:06 -0400 by sir nitr0z