over 2 years ago

時間序列 Arima 模型

此模型引用網路的教學資料,分析產品價格指數(product price index, ppi)並預測後續可能走勢。是一個非季節相關的Arima model (non-seasonal Arima model)

%matplotlib inline
from __future__ import division
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

讀取原始資料

取出ppi 與 年份,對此原始資料作圖

dta = pd.read_csv('timeseries_ppi.csv')
dta.tail(5)
yearqrt m3 ppi cpi gdp m1nsa ddnsa t lnppi dppi dlnppi lppi trend
164 2001q1 7400.2002 110.43 115.30 9334.5000 1098.13 309.92001 2001q2 4.704382 1.790001 0.016342 108.64 165
165 2001q2 7597.5000 109.20 116.50 9341.7002 1118.85 309.72000 2001q3 4.693181 -1.230003 -0.011201 110.43 166
166 2001q3 7764.6001 106.90 116.66 9310.4004 1156.50 330.98001 2001q4 4.671894 -2.299995 -0.021287 109.20 167
167 2001q4 8067.8999 103.69 116.33 9348.5996 1174.46 334.92999 2002q1 4.641406 -3.209999 -0.030488 106.90 168
168 2002q1 8148.6001 103.40 116.75 9482.0996 1182.14 323.06000 2002q2 4.638605 -0.290001 -0.002800 103.69 169
ydata = dta['ppi']
dates = sm.tsa.datetools.dates_from_range('1960q1','2002q1')
ydata.index = dates
ydata.plot()

自相關函數(ACF) 與偏相關函數(PCF)

Arima 模型中必須確定:

  1. 決定原函數是否為平穩函數?
  2. 若否應選擇第幾階差分(difference)能得到?

原函數

fig = plt.figure(figsize=(8,6))
ax1 = plt.subplot(211)
fig = sm.graphics.tsa.plot_acf(ydata,lags=40,ax=ax1)
ax2 = plt.subplot(212)
fig = sm.graphics.tsa.plot_pacf(ydata,lags=40,ax=ax2)

Dickey-Fuller test

檢查資料y(t)是否為平穩狀態
若p值小於0.05則可確認為

sm.tsa.stattools.adfuller(ydata)[:2]
(-0.77456177567695983, 0.82648040144240387)

一階差分

由於原函數並非穩定函數,所以先作一階差分

y_diff1 = ydata.diff(1)[1:]
y_diff1.plot()
plt.title('first order difference function')

fig = plt.figure(figsize=(8,6))
ax1 = plt.subplot(211)
fig = sm.graphics.tsa.plot_acf(y_diff1,lags=40,ax=ax1)
ax2 = plt.subplot(212)
fig = sm.graphics.tsa.plot_pacf(y_diff1,lags=40,ax=ax2)

Dickey-Fuller test

檢查資料y'(t)y''(t)是否為平穩狀態 -->觀察P value是否小於0.05

print sm.tsa.stattools.adfuller(y_diff1)[:2];
y_diff2 = y_diff1.diff(1)[1:]
print sm.tsa.stattools.adfuller(y_diff2)[:2]
(-4.0477426790655979, 0.0011801482847234568)
(-6.0671303301315698, 1.1744622125962198e-07)

從ACF,PACF可猜測此為對一階差分的AR(1)模型。所以ARIMA模型選擇p=1,d=1,q=0

使用Arima model

y_diff2.index
DatetimeIndex(['1960-09-30', '1960-12-31', '1961-03-31', '1961-06-30',
               '1961-09-30', '1961-12-31', '1962-03-31', '1962-06-30',
               '1962-09-30', '1962-12-31',
               ...
               '1999-12-31', '2000-03-31', '2000-06-30', '2000-09-30',
               '2000-12-31', '2001-03-31', '2001-06-30', '2001-09-30',
               '2001-12-31', '2002-03-31'],
              dtype='datetime64[ns]', length=167, freq=None)

比較ydata在

  • arima(p=1,d=0,q=0) --> Ar(1)模型
    • ar.L1.ppi 約1 -->表示為非穩態
  • arima(p=1,d=1,q=0) --> 對於y'(t)的Ar(1)模型
    • ar.Li.ppi 穩態
    • AIC=392(arima(1,1,0) > AIC=502(arima(1,0,0))
from statsmodels.tsa.arima_model import ARIMA
arima100 = ARIMA(ydata,(1,0,0),freq='Q').fit()
arima110= ARIMA(ydata,[1,1,0],freq='Q').fit()
print arima100.summary()
print "\n"
print arima110.summary()
                              ARMA Model Results                              
==============================================================================
Dep. Variable:                    ppi   No. Observations:                  169
Model:                     ARMA(1, 0)   Log Likelihood                -248.202
Method:                       css-mle   S.D. of innovations              1.029
Date:                Sun, 24 Jan 2016   AIC                            502.404
Time:                        23:02:36   BIC                            511.794
Sample:                    03-31-1960   HQIC                           506.215
                         - 03-31-2002                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         64.4080     37.989      1.695      0.092       -10.049   138.865
ar.L1.ppi      0.9996      0.000   2015.140      0.000         0.999     1.001
                                    Roots                                    
=============================================================================
                 Real           Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.0004           +0.0000j            1.0004            0.0000
-----------------------------------------------------------------------------


                             ARIMA Model Results                              
==============================================================================
Dep. Variable:                  D.ppi   No. Observations:                  168
Model:                 ARIMA(1, 1, 0)   Log Likelihood                -193.395
Method:                       css-mle   S.D. of innovations              0.764
Date:                Sun, 24 Jan 2016   AIC                            392.790
Time:                        23:02:36   BIC                            402.161
Sample:                    06-30-1960   HQIC                           396.593
                         - 03-31-2002                                         
===============================================================================
                  coef    std err          z      P>|z|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
const           0.4558      0.131      3.486      0.001         0.200     0.712
ar.L1.D.ppi     0.5522      0.064      8.622      0.000         0.427     0.678
                                    Roots                                    
=============================================================================
                 Real           Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.8109           +0.0000j            1.8109            0.0000
-----------------------------------------------------------------------------

預測資料

# dates[-5:]

predict_diff1 = arima110.predict('2002q1','2003q4') # predict 1st difference result

# ydata.plot()


y_diff1.plot()
predict_diff1.plot(color = 'green')

predict_ppi = arima110.predict('2002q1','2003q4',typ='levels')
ydata.plot()
predict_ppi.plot(color='red')
<matplotlib.axes._subplots.AxesSubplot at 0x7fbbcf358f90>

判斷估計是否合理

  • 可透過QQplot來判斷預估的值是否滿足高斯分佈
  • 透過殘差的ACF, PCF判斷是否已經是白噪音
from statsmodels.graphics.api import qqplot
resid = arima110.resid
qqplot(resid,line='q',fit=True)

fig = plt.figure(figsize=(8,6))
ax1 = plt.subplot(211)
fig = sm.graphics.tsa.plot_acf(resid,lags=40,ax=ax1)
ax2 = plt.subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid,lags=40,ax=ax2)


由以上可以推論,ARIMA(1,1,0)已經是足夠好的預測模型。

← ARIMA時間模型(非季節性) 字體辨識- notMNIST(SVM) →
 
comments powered by Disqus