AI For Trading: Exercise: ARMA and ARIMA (24)

Autoregressive moving average model (ARMA model); autoregressive integrated moving average model (ARIMA model)

Autoregressive moving average

Install packages

import sys
!{sys.executable} -m pip install -r requirements.txt
Successfully installed numpy-1.14.5 pandas-0.21.1 plotly-2.2.3 scipy-1.0.0 seaborn-0.9.0 statsmodels-0.9.0
import pandas as pd
import numpy as np
import os
from statsmodels.tsa.arima_model import ARMA
import matplotlib.pyplot as plt
import seaborn as sns
import quiz_tests
sns.set()
#note that for the figure size to show, this cell should be run
#separately from the import of pyplot
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)

Simulate return series with autoregressive properties

from statsmodels.tsa.arima_process import ArmaProcess
np.random.seed(200)

ar_params = np.array([1, -0.5])
ma_params = np.array([1, -0.3])
ret = ArmaProcess(ar_params, ma_params).generate_sample(nsample=5*252)

ret = pd.Series(ret)
drift = 100
price = pd.Series(np.cumsum(ret)) + drift
ret.plot(figsize=(15,6), color=sns.xkcd_rgb["pale purple"], title="simulated return series")
plt.show()

[Figure: simulated return series]

price.plot(figsize=(15,6), color=sns.xkcd_rgb["baby blue"], title="simulated price series")
plt.show()

[Figure: simulated price series]
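The `ar_params` and `ma_params` arrays passed to `ArmaProcess` above are lag-polynomial coefficients: each starts with the zero-lag coefficient 1, and the AR coefficient is entered with its sign flipped. So `[1, -0.5]` and `[1, -0.3]` describe the recursion r_t = 0.5 * r_{t-1} + e_t - 0.3 * e_{t-1}. A rough sketch of that recursion in plain NumPy follows. The helper name `simulate_arma11` is made up for illustration, and the output is not guaranteed to match the numbers produced by `generate_sample`.

```python
import numpy as np

def simulate_arma11(phi, theta, nsample, seed=0):
    """Simulate r_t = phi * r_{t-1} + e_t + theta * e_{t-1} with standard normal noise."""
    rng = np.random.RandomState(seed)
    eps = rng.standard_normal(nsample)
    r = np.zeros(nsample)
    r[0] = eps[0]  # no history yet, so the first value is just the first shock
    for t in range(1, nsample):
        r[t] = phi * r[t - 1] + eps[t] + theta * eps[t - 1]
    return r

# phi = 0.5 corresponds to ar_params = [1, -0.5]; theta = -0.3 to ma_params = [1, -0.3]
ret_sketch = simulate_arma11(phi=0.5, theta=-0.3, nsample=5 * 252, seed=200)

# Lag-1 sample autocorrelation of the simulated series
centered = ret_sketch - ret_sketch.mean()
rho1 = np.dot(centered[1:], centered[:-1]) / np.dot(centered, centered)
```

With these parameters the lag-1 autocorrelation should come out positive but modest, because the negative MA term partly offsets the positive AR term.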

log returns

lret = np.log(price) - np.log(price.shift(1))
lret = lret[1:]
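As a small illustration of why log returns are convenient, the sketch below uses a made-up four-point price array (not the simulated series) to show that log returns add up across periods, and that each one equals log(1 + simple return).

```python
import numpy as np

# Hypothetical prices, chosen only for illustration
prices = np.array([100.0, 102.0, 101.0, 105.0])

# Log return: difference of consecutive log prices
log_returns = np.diff(np.log(prices))

# Simple (arithmetic) return for comparison
simple_returns = prices[1:] / prices[:-1] - 1

# Log returns are additive: their sum is the log of the total growth factor
total_log_return = log_returns.sum()
```

Here `total_log_return` equals `np.log(105 / 100)`, which is the property that makes multi-period log returns easy to aggregate.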

autocorrelation

Use autocorrelation to get a sense of what lag to use for the autoregressive model.

from statsmodels.graphics.tsaplots import plot_acf
_ = plot_acf(lret,lags=10, title='log return autocorrelation')

[Figure: log return autocorrelation]

Since the sample series was simulated to have autoregressive properties, the plot shows autocorrelation between the current period and its lags.

Note that with actual stock data, there won't be much autocorrelation of returns from one day to the next.
Stock prices are often described as a "random walk": each new period's return (log or simple) is more or less random.
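To make the quantity behind the autocorrelation plot concrete, here is a minimal NumPy sketch of the sample autocorrelation. The function name `sample_acf` and the test series are assumptions for illustration, and `plot_acf` may differ in details such as confidence bands. It compares an AR(1) series with coefficient 0.5 against pure noise.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    acf = [1.0]  # a series is perfectly correlated with itself at lag 0
    for k in range(1, nlags + 1):
        acf.append(np.dot(xc[k:], xc[:-k]) / denom)
    return np.array(acf)

rng = np.random.RandomState(0)
noise = rng.standard_normal(5000)

# AR(1) series: x_t = 0.5 * x_{t-1} + e_t
ar1 = np.zeros(5000)
for t in range(1, 5000):
    ar1[t] = 0.5 * ar1[t - 1] + noise[t]

acf_ar1 = sample_acf(ar1, nlags=5)      # decays roughly like 0.5 ** k
acf_noise = sample_acf(noise, nlags=5)  # close to zero beyond lag 0
```

The noise series is the closer analogue of real daily stock returns, where the bars beyond lag 0 tend to sit near zero.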

plot partial autocorrelation

from statsmodels.graphics.tsaplots import plot_pacf

Notice how the partial autocorrelation of the log returns shows that most of the correlation is found in the previous period. Partial autocorrelation differs from autocorrelation in that it shows the influence of each lag that is not attributable to the lags in between. In other words, the two-day lag has a fairly strong correlation with the current value mainly because it is strongly correlated with the one-day lag. The two-day lag's partial correlation with the current period, the part not attributable to the one-day lag, is relatively small.

_ = plot_pacf(lret, lags=10, title='log return Partial Autocorrelation', color=sns.xkcd_rgb["crimson"])

[Figure: log return partial autocorrelation]
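One way to see what "not attributable to the other lags" means: the partial autocorrelation at lag k can be read off as the last coefficient when the series is regressed on its first k lags. The sketch below does this with ordinary least squares in NumPy. The name `pacf_ols` is an assumption for illustration, and `plot_pacf` may use a different estimator by default.

```python
import numpy as np

def pacf_ols(x, nlags):
    """Partial autocorrelation at lags 0..nlags via successive OLS regressions."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    pacf = [1.0]
    for k in range(1, nlags + 1):
        # Columns are x_{t-1}, ..., x_{t-k}; the target is x_t
        lagged = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        target = x[k:]
        coef = np.linalg.lstsq(lagged, target, rcond=None)[0]
        pacf.append(coef[-1])  # effect of lag k after controlling for lags 1..k-1
    return np.array(pacf)

rng = np.random.RandomState(0)
noise = rng.standard_normal(5000)

# AR(1) series: x_t = 0.5 * x_{t-1} + e_t
ar1 = np.zeros(5000)
for t in range(1, 5000):
    ar1[t] = 0.5 * ar1[t - 1] + noise[t]

pacf_ar1 = pacf_ols(ar1, nlags=3)
```

For this AR(1) series the lag-1 value is near 0.5 while lags 2 and 3 are near zero, even though the plain autocorrelation at lag 2 is about 0.25.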

Discussion

Notice that there isn't much correlation between previous periods and the current period. In general, using past stock returns to predict future stock returns is rather difficult. Volatility tends to be more strongly correlated with its own past values. We'll cover volatility in a later lesson within this module.

Ljung-Box Test

The Ljung-Box test helps us check whether the autocorrelations up to the lag we chose are, taken together, significantly different from zero. The null hypothesis is that the previous lags as a whole are not correlated with the current period. If the p-value is small enough (say, below 0.05), we can reject the null and conclude that the past lags have some correlation with the current period.


returns:
lbvalue (float or array) – test statistic
pvalue (float or array) – p-value based on chi-square distribution
... (we'll ignore the other outputs, which are for another similar hypothesis test)
from statsmodels.stats.diagnostic import acorr_ljungbox
lb_test_stat, lb_p_value = acorr_ljungbox(lret,lags=20)
lb_p_value
array([2.01640711e-14, 1.24123312e-21, 5.10501473e-22, 1.86446247e-22,
       6.13688232e-22, 2.96811370e-21, 1.18392407e-20, 4.64232373e-20,
       1.78935377e-19, 2.36770725e-19, 5.54712773e-19, 1.32980392e-18,
       3.72359442e-18, 5.86709112e-18, 1.72205886e-17, 4.22143078e-17,
       1.15704571e-16, 9.89290170e-17, 2.59299780e-16, 1.50593115e-16])

Discussion

Since this series was simulated to have autoregressive properties, the Ljung-Box test shows p-values less than 0.05 for the 20 lag periods that we tested.
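For intuition about what the test computes, the Ljung-Box statistic for h lags is Q = n(n+2) * sum over k=1..h of rho_k^2 / (n - k), where rho_k is the sample autocorrelation at lag k. Under the null, Q is approximately chi-square with h degrees of freedom. The sketch below computes Q by hand on a separately simulated AR(1) series and compares it with the 5% chi-square critical value for 20 degrees of freedom (31.410). The name `ljung_box_q` is an assumption for illustration, not part of statsmodels.

```python
import numpy as np

def ljung_box_q(x, nlags):
    """Ljung-Box Q statistics for lags 1..nlags (cumulative over lags)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    lags = np.arange(1, nlags + 1)
    rho = np.array([np.dot(xc[k:], xc[:-k]) / denom for k in lags])
    return n * (n + 2) * np.cumsum(rho ** 2 / (n - lags))

rng = np.random.RandomState(0)
n = 5 * 252
noise = rng.standard_normal(n)

# AR(1) series: x_t = 0.5 * x_{t-1} + e_t
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + noise[t]

q = ljung_box_q(ar1, nlags=20)

# 95th percentile of the chi-square distribution with 20 degrees of freedom
CHI2_CRIT_95_DF20 = 31.410
reject_null = q[-1] > CHI2_CRIT_95_DF20
```

A Q value far above the critical value corresponds to the tiny p-values shown above: the lags, taken together, are clearly correlated with the current period.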

Fit an ARMA model

For the purpose of familiarizing ourselves with the ARMA model, we'll fit the model to our simulated return series.

We'll just use one lag for the autoregression and one lag for the moving average.
Check out the statsmodels ARMA documentation.

from statsmodels.tsa.arima_model import ARMA
AR_lag_p = 1
MA_lag_q = 1
order = (AR_lag_p, MA_lag_q)
arma_model = ARMA(lret.values, order=order)
arma_result = arma_model.fit()
arma_pred = pd.Series(arma_result.fittedvalues)
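To give a feel for what fitting an ARMA(1,1) involves, here is a deliberately crude sketch: it rebuilds the one-step-ahead residuals e_t = r_t - phi * r_{t-1} - theta * e_{t-1} for each candidate (phi, theta) on a grid and keeps the pair with the smallest sum of squared residuals. This is a stand-in for illustration only, not how statsmodels estimates the model. The function names, the grid, and the simulated test series (phi = 0.7, theta = 0.4) are all assumptions.

```python
import numpy as np

def css_sse(r, phi, theta):
    """Sum of squared one-step-ahead residuals for an ARMA(1,1), conditioning on r[0]."""
    prev_e = 0.0
    sse = 0.0
    for t in range(1, len(r)):
        e = r[t] - phi * r[t - 1] - theta * prev_e
        sse += e * e
        prev_e = e
    return sse

def fit_arma11_grid(r, grid):
    """Return the (phi, theta) grid point that minimizes the conditional sum of squares."""
    r = [float(v) for v in r]  # plain floats keep the Python loop fast
    best_sse, best_phi, best_theta = np.inf, None, None
    for phi in grid:
        for theta in grid:
            sse = css_sse(r, phi, theta)
            if sse < best_sse:
                best_sse, best_phi, best_theta = sse, phi, theta
    return best_phi, best_theta

# Simulate r_t = 0.7 * r_{t-1} + e_t + 0.4 * e_{t-1}
rng = np.random.RandomState(1)
eps = rng.standard_normal(2000)
series = np.zeros(2000)
series[0] = eps[0]
for t in range(1, 2000):
    series[t] = 0.7 * series[t - 1] + eps[t] + 0.4 * eps[t - 1]

grid = np.linspace(-0.9, 0.9, 19)  # step of 0.1, staying inside the stable region
phi_hat, theta_hat = fit_arma11_grid(series, grid)
```

The fitted value at time t is simply r_t minus the residual e_t, which is the kind of one-step-ahead prediction that `fittedvalues` holds.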

View fitted predictions against actual values

plt.plot(lret.values, color=sns.xkcd_rgb["pale purple"])  # .values so both series share the same 0-based x-axis
plt.plot(arma_pred, color=sns.xkcd_rgb["dark sky blue"])
plt.title('Log returns and predictions using an ARMA(p=1,q=1) model');
print(f"Fitted AR parameter {arma_result.arparams[0]:.2f}, MA parameter {arma_result.maparams[0]:.2f}")
Fitted AR parameter 0.65, MA parameter -0.45

[Figure: log returns and predictions using an ARMA(p=1,q=1) model]

Discussion

In general, autoregressive moving average models forecast stock returns poorly, because returns are very noisy and their statistical properties (volatility in particular) are not stable over time.

There are other techniques that build upon the concepts of ARMA models, so the goal here was really to help you get familiar with these concepts, as they are the basis for other models that you'll see later in this module.

Quiz: ARIMA

Fit an autoregressive integrated moving average model. Choose an order of integration of 1, an autoregression lag of 1, and a moving average lag of 1.

Check out the statsmodels ARIMA documentation to help you.
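The "order of integration" is the number of times the series is differenced before the ARMA part is fitted, so an ARIMA(1,1,1) can be thought of as an ARMA(1,1) applied to the first differences. The sketch below uses a made-up five-point array to show first differencing and how a cumulative sum undoes it.

```python
import numpy as np

# Hypothetical level series, chosen only for illustration
levels = np.array([3.0, 5.0, 4.0, 8.0, 7.0])

# d = 1: replace each value with its change from the previous value
diffed = np.diff(levels, n=1)

# "Integrating" reverses the differencing: start from the first level and accumulate changes
restored = levels[0] + np.concatenate(([0.0], np.cumsum(diffed)))
```

Note that `lret` is already a difference of log prices, so choosing d=1 in this quiz differences the log returns once more.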

from statsmodels.tsa.arima_model import ARIMA
def fit_arima(lret):

    #TODO: choose autoregression lag of 1
    AR_lag_p = 1

    #TODO: choose moving average lag of 1
    MA_lag_q = 1

    #TODO: choose order of integration 1
    order_of_integration_d = 1

    #TODO: Create a tuple of p,d,q
    order = (AR_lag_p, order_of_integration_d, MA_lag_q)

    #TODO: create an ARIMA model object, passing in the values of the lret pandas series,
    # and the tuple containing the (p,d,q) order arguments
    arima_model = ARIMA(lret.values, order=order)

    arima_result = arima_model.fit()

    #TODO: from the result of calling ARIMA.fit(),
    # save and return the fitted values, autoregression parameters, and moving average parameters
    fittedvalues = arima_result.fittedvalues
    arparams = arima_result.arparams
    maparams = arima_result.maparams

    return fittedvalues,arparams,maparams

quiz_tests.test_fit_arima(fit_arima)
Tests Passed
fittedvalues,arparams,maparams = fit_arima(lret)
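arima_pred = pd.Series(fittedvalues)
# Note: with d=1 these fitted values are on the differenced scale (changes in lret), not lret itself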
plt.plot(lret, color=sns.xkcd_rgb["pale purple"])
plt.plot(arima_pred, color=sns.xkcd_rgb["jade green"])
plt.title('Log Returns and predictions using an ARIMA(p=1,d=1,q=1) model');
print(f"fitted AR parameter {arparams[0]:.2f}, MA parameter {maparams[0]:.2f}")
fitted AR parameter 0.21, MA parameter -0.98

[Figure: log returns and predictions using an ARIMA(p=1,d=1,q=1) model]

Those who act often succeed; those who keep walking often arrive.