AI For Trading: Risk Factor Models (51)

In this lesson, we will focus on risk factors and the fundamentals of the risk factor model. Risk factor models are used to describe the volatility or risk of assets such as stocks based on the movements of the risk factors.
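To make the idea concrete, here is a minimal sketch of one common way a factor model expresses asset risk: the asset covariance matrix is assembled from factor exposures, the covariance of the factor returns, and asset-specific variances. The names `B`, `F`, `S`, and `x`, and all the numbers, are assumptions made up for illustration; this is not necessarily the exact formulation used later in the lesson.

```python
import numpy as np

# Hypothetical inputs for 3 assets and 2 risk factors (illustrative numbers only).
B = np.array([[1.0, 0.2],
              [0.8, -0.1],
              [1.2, 0.5]])          # factor exposures (betas), shape (assets, factors)
F = np.array([[0.04, 0.00],
              [0.00, 0.01]])        # covariance matrix of the factor returns
S = np.diag([0.02, 0.03, 0.01])     # asset-specific (idiosyncratic) variances

# Asset covariance implied by the factor model: B F B^T + S
asset_cov = B @ F @ B.T + S

# Portfolio variance for weights x: x^T (B F B^T + S) x
x = np.array([0.5, 0.3, 0.2])
portfolio_variance = x @ asset_cov @ x
portfolio_volatility = np.sqrt(portfolio_variance)

print(asset_cov.shape)
print(portfolio_volatility)
```

With only a few factors, far fewer quantities need to be estimated than in a full asset-by-asset covariance matrix, which is the main appeal of this kind of model.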

install libraries

Install Python libraries for all code exercises in this lesson

Hi friends! The code exercises use zipline, and installing the packages takes 8 to 9 minutes the first time you reach a Jupyter notebook. To limit your wait during the first code exercise, please start the installation now in a separate tab: open this page in a new browser tab and run the cell in the notebook below:

import sys
!{sys.executable} -m pip install -r requirements.txt

Continue on with the next video while it's installing. By the time you get to the first coding exercise, if the installation has finished, you'll be able to work on the exercise without waiting for the installation to complete. You should only have to wait once for the installation for all the coding exercises in this lesson. After that, your workspace should have all the libraries installed as you get to the later code exercises.

If you notice that you're waiting 8 to 9 minutes on the pip install code cell for every code exercise, please add a ticket to the waffleboard https://waffle.io/udacity/aitnd-issues so that I can follow up. Thanks, and happy studying!

Troubleshooting
Note: if you see a "404" message, click the orange Jupyter icon to open the directory of all notebooks, then go to the bottom and choose "menu" -> "reset data".

requirements.txt

numpy==1.14.5
pandas==0.18.1
plotly==2.2.3
scikit-learn==0.19.1
six==1.11.0
zipline===1.2.0

quiz_helper.py

import numpy as np
import pandas as pd
import time
from zipline.assets._assets import Equity  # Required for USEquityPricing
from zipline.pipeline.data import USEquityPricing
from zipline.pipeline.classifiers import Classifier
from zipline.pipeline.engine import SimplePipelineEngine
from zipline.pipeline.loaders import USEquityPricingLoader
from zipline.utils.numpy_utils import int64_dtype

EOD_BUNDLE_NAME = 'm4-quiz-eod-quotemedia'

class PricingLoader(object):
    def __init__(self, bundle_data):
        self.loader = USEquityPricingLoader(
            bundle_data.equity_daily_bar_reader,
            bundle_data.adjustment_reader)

    def get_loader(self, column):
        if column not in USEquityPricing.columns:
            raise Exception('Column not in USEquityPricing')
        return self.loader

class Sector(Classifier):
    dtype = int64_dtype
    window_length = 0
    inputs = ()
    missing_value = -1

    def __init__(self):
        self.data = np.load('../../data/project_4_sector/data.npy')

    def _compute(self, arrays, dates, assets, mask):
        return np.where(
            mask,
            self.data[assets],
            self.missing_value,
        )

def build_pipeline_engine(bundle_data, trading_calendar):
    pricing_loader = PricingLoader(bundle_data)

    engine = SimplePipelineEngine(
        get_loader=pricing_loader.get_loader,
        calendar=trading_calendar.all_sessions,
        asset_finder=bundle_data.asset_finder)

    return engine

def get_factor_exposures(factor_betas, weights):
    return factor_betas.loc[weights.index].T.dot(weights)

def get_pricing(data_portal, trading_calendar, assets, start_date, end_date, field='close'):
    end_dt = pd.Timestamp(end_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')
    start_dt = pd.Timestamp(start_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')

    end_loc = trading_calendar.closes.index.get_loc(end_dt)
    start_loc = trading_calendar.closes.index.get_loc(start_dt)

    return data_portal.get_history_window(
        assets=assets,
        end_dt=end_dt,
        bar_count=end_loc - start_loc,
        frequency='1d',
        field=field,
        data_frequency='daily')
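
As a quick illustration of the `get_factor_exposures` helper in quiz_helper.py above, the sketch below reuses the same one-line body on a tiny made-up example. The tickers `AAA`, `BBB`, `CCC`, the factor names, and all numbers are hypothetical. It shows that the helper selects the beta rows for the assets in `weights` (in the order of `weights.index`) and returns one weighted exposure per factor.

```python
import pandas as pd

def get_factor_exposures(factor_betas, weights):
    # Same body as the helper in quiz_helper.py: B^T x for the held assets
    return factor_betas.loc[weights.index].T.dot(weights)

# Hypothetical betas: rows are assets, columns are factors
factor_betas = pd.DataFrame(
    {'factor_0': [1.0, 0.5, -0.2],
     'factor_1': [0.1, 0.3, 0.8]},
    index=['AAA', 'BBB', 'CCC'])

# Hypothetical portfolio weights: a subset of the assets, in a different order
weights = pd.Series([0.25, 0.75], index=['CCC', 'AAA'])

exposures = get_factor_exposures(factor_betas, weights)
print(exposures)
# factor_0: -0.2 * 0.25 + 1.0 * 0.75 = 0.700
# factor_1:  0.8 * 0.25 + 0.1 * 0.75 = 0.275
```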

Historical Variance

Let's see how we would calculate a covariance matrix of assets without the help of a factor model.

import sys
!{sys.executable} -m pip install -r requirements.txt
Requirement already satisfied: numpy==1.14.5 in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied: pandas==0.18.1 in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 2))

Requirement already satisfied: decorator>=4.0.6 in /opt/conda/lib/python3.6/site-packages (from plotly==2.2.3->-r requirements.txt (line 3))
Requirement already satisfied: requests in /opt/conda/lib/python3.6/site-packages (from plotly==2.2.3->-r requirements.txt (line 3))
Requirement already satisfied: intervaltree>=2.1.0 in /opt/conda/lib/python3.6/site-packages (from zipline===1.2.0->-r requirements.txt (line 6))
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /opt/conda/lib/python3.6/site-packages (from nbformat>=4.2->plotly==2.2.3->-r requirements.txt (line 3))
Requirement already satisfied: ipython-genutils in /opt/conda/lib/python3.6/site-packages (from nbformat>=4.2->plotly==2.2.3->-r requirements.txt (line 3))

Requirement already satisfied: python-editor>=0.3 in /opt/conda/lib/python3.6/site-packages (from alembic>=0.7.7->zipline===1.2.0->-r requirements.txt (line 6))
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
import numpy as np
import pandas as pd
import time
import os
import quiz_helper
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)

data bundle

import os
import quiz_helper
from zipline.data import bundles
os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')
ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)
bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)
print('Data Registered')
Data Registered

Build pipeline engine

from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume
from zipline.utils.calendars import get_calendar

universe = AverageDollarVolume(window_length=120).top(500) 
trading_calendar = get_calendar('NYSE') 
bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)
engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)

View Data

With the pipeline engine built, let's get the stocks that are in our universe at the end of the period. We'll use these tickers to generate the returns data for our risk model.

universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')

universe_tickers = engine\
    .run_pipeline(
        Pipeline(screen=universe),
        universe_end_date,
        universe_end_date)\
    .index.get_level_values(1)\
    .values.tolist()

universe_tickers
[Equity(0 [A]),
 Equity(1 [AAL]),
 Equity(2 [AAP]),
 Equity(3 [AAPL]),
 Equity(4 [ABBV]),
 Equity(5 [ABC]),
 Equity(6 [ABT]),
 Equity(7 [ACN]),
 Equity(8 [ADBE]),
 Equity(9 [ADI]),
 Equity(10 [ADM]),
 Equity(11 [ADP]),
 Equity(12 [ADS]),
 Equity(13 [ADSK]),
 Equity(14 [AEE]),
 Equity(15 [AEP]),
 Equity(16 [AES]),
 Equity(17 [AET]),
 Equity(18 [AFL]),
 Equity(19 [AGN]),
 Equity(20 [AIG]),
 Equity(21 [AIV]),
 Equity(22 [AIZ]),
 Equity(23 [AJG]),
 Equity(24 [AKAM]),
 Equity(25 [ALB]),
 Equity(26 [ALGN]),
 Equity(27 [ALK]),
 Equity(28 [ALL]),
 Equity(29 [ALLE]),
 Equity(30 [ALXN]),
 Equity(31 [AMAT]),
 Equity(32 [AMD]),
 Equity(33 [AME]),
 Equity(34 [AMG]),
 Equity(35 [AMGN]),
 Equity(36 [AMP]),
 Equity(37 [AMT]),
 Equity(38 [AMZN]),
 Equity(39 [ANDV]),
 Equity(40 [ANSS]),
 Equity(41 [ANTM]),
 Equity(42 [AON]),
 Equity(43 [AOS]),
 Equity(44 [APA]),
 Equity(45 [APC]),
 Equity(46 [APD]),
 Equity(47 [APH]),
 Equity(48 [ARE]),
 Equity(49 [ARNC]),
 Equity(50 [ATVI]),
 Equity(51 [AVB]),
 Equity(52 [AVGO]),
 Equity(53 [AVY]),
 Equity(54 [AWK]),
 Equity(55 [AXP]),
 Equity(56 [AYI]),
 Equity(57 [AZO]),
 Equity(58 [BA]),
 Equity(59 [BAC]),
 Equity(60 [BAX]),
 Equity(61 [BBT]),
 Equity(62 [BBY]),
 Equity(63 [BCR]),
 Equity(64 [BDX]),
 Equity(65 [BEN]),
 Equity(66 [BIIB]),
 Equity(67 [BK]),
 Equity(68 [BLK]),
 Equity(69 [BLL]),
 Equity(70 [BMY]),
 Equity(71 [BSX]),
 Equity(72 [BWA]),
 Equity(73 [BXP]),
 Equity(74 [C]),
 Equity(75 [CA]),
 Equity(76 [CAG]),
 Equity(77 [CAH]),
 Equity(78 [CAT]),
 Equity(79 [CB]),
 Equity(80 [CBG]),
 Equity(81 [CBOE]),
 Equity(82 [CBS]),
 Equity(83 [CCI]),
 Equity(84 [CCL]),
 Equity(85 [CELG]),
 Equity(86 [CERN]),
 Equity(87 [CF]),
 Equity(88 [CFG]),
 Equity(89 [CHD]),
 Equity(90 [CHK]),
 Equity(91 [CHRW]),
 Equity(92 [CHTR]),
 Equity(93 [CI]),
 Equity(94 [CINF]),
 Equity(95 [CL]),
 Equity(96 [CLX]),
 Equity(97 [CMA]),
 Equity(98 [CMCSA]),
 Equity(99 [CME]),
 Equity(100 [CMG]),
 Equity(101 [CMI]),
 Equity(102 [CMS]),
 Equity(103 [CNC]),
 Equity(104 [CNP]),
 Equity(105 [COF]),
 Equity(106 [COG]),
 Equity(107 [COL]),
 Equity(108 [COO]),
 Equity(109 [COP]),
 Equity(110 [COST]),
 Equity(111 [COTY]),
len(universe_tickers)
490
from zipline.data.data_portal import DataPortal

data_portal = DataPortal(
    bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,
    equity_minute_reader=None,
    equity_daily_reader=bundle_data.equity_daily_bar_reader,
    adjustment_reader=bundle_data.adjustment_reader)

Get pricing data helper function

from quiz_helper import get_pricing

get pricing data into a dataframe

returns_df = \
    get_pricing(
        data_portal,
        trading_calendar,
        universe_tickers,
        universe_end_date - pd.DateOffset(years=5),
        universe_end_date)\
    .pct_change()[1:].fillna(0) #convert prices into returns

returns_df
Equity(0 [A]) Equity(1 [AAL]) Equity(2 [AAP]) Equity(3 [AAPL]) Equity(4 [ABBV]) Equity(5 [ABC]) Equity(6 [ABT]) Equity(7 [ACN]) Equity(8 [ADBE]) Equity(9 [ADI]) ... Equity(481 [XL]) Equity(482 [XLNX]) Equity(483 [XOM]) Equity(484 [XRAY]) Equity(485 [XRX]) Equity(486 [XYL]) Equity(487 [YUM]) Equity(488 [ZBH]) Equity(489 [ZION]) Equity(490 [ZTS])
2011-01-07 00:00:00+00:00 0.008437 0.014230 0.026702 0.007146 0.000000 0.001994 0.004165 0.001648 -0.007127 -0.005818 ... -0.001838 -0.005619 0.005461 -0.004044 -0.013953 0.000000 0.012457 -0.000181 -0.010458 0.000000
2011-01-10 00:00:00+00:00 -0.004174 0.006195 0.007435 0.018852 0.000000 -0.005714 -0.008896 -0.008854 0.028714 0.002926 ... 0.000947 0.007814 -0.006081 0.010466 0.009733 0.000000 0.001440 0.007784 -0.017945 0.000000

1256 rows × 490 columns
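
The price-to-returns conversion used above (`.pct_change()[1:].fillna(0)`) can be seen step by step on a tiny example. This is a minimal sketch with made-up prices and hypothetical tickers `AAA` and `BBB`; it uses `.iloc[1:]`, which selects the same rows as the `[1:]` slice in the notebook.

```python
import pandas as pd

# Made-up closing prices for two hypothetical tickers
prices = pd.DataFrame(
    {'AAA': [100.0, 110.0, 99.0],
     'BBB': [50.0, 50.0, 55.0]},
    index=pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06']))

# pct_change computes p_t / p_(t-1) - 1; the first row has no previous price, so it is NaN
raw_returns = prices.pct_change()

# Drop the first (all-NaN) row, then replace any remaining NaN with 0
returns = raw_returns.iloc[1:].fillna(0)

print(returns)
# AAA: 0.10 then -0.10; BBB: 0.00 then 0.10
```

Filling missing returns with 0 is presumably why a column such as ABBV shows 0.000000 on the earliest dates in the output above.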

Quiz 1

Check out the numpy.cov documentation. Then think about what's wrong with the following use of numpy.cov.

# What's wrong with this?
annualization_factor = 252
covariance_assets_not_correct = annualization_factor*np.cov(returns_df)
## TODO: Check the shape of the covariance matrix

covariance_assets_not_correct.shape
(1256, 1256)

Answer 1

Notice that the dimensions are 1256 by 1256, which is the number of observations (daily returns) for each stock. We expect a 490 by 490 matrix, since that's how many stocks are stored in the dataframe.

Quiz 2

How can you adjust the input so that we get the desired covariance matrix of assets?

# TODO: calculate the covariance matrix of assets
annualization_factor = 252
covariance_assets = annualization_factor*np.cov(returns_df.T)
covariance_assets.shape
(490, 490)

Answer 2

Since numpy.cov by default expects each row to represent a variable (here, one stock), we can transpose the dataframe before passing it in. The result is a 490 by 490 covariance matrix.
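
The orientation issue from Quiz 1 and Quiz 2 can be reproduced on a small synthetic example. The sketch below uses random data and hypothetical tickers rather than the lesson's returns, and it also shows two alternatives to transposing: `rowvar=False` in numpy.cov and the `DataFrame.cov` method in pandas, both of which treat columns as the variables.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)

# Synthetic daily returns: 250 observations (rows) of 4 hypothetical assets (columns)
returns = pd.DataFrame(rng.normal(0, 0.01, size=(250, 4)),
                       columns=['AAA', 'BBB', 'CCC', 'DDD'])

wrong = np.cov(returns)                     # treats each row (one day) as a variable
by_transpose = np.cov(returns.T)            # each row is now one asset
by_rowvar = np.cov(returns, rowvar=False)   # tells numpy that columns are the variables
by_pandas = returns.cov()                   # pandas computes covariances between columns

print(wrong.shape)         # (250, 250)
print(by_transpose.shape)  # (4, 4)

# Scale daily covariances to annual ones, assuming 252 trading days per year
annualized = 252 * by_rowvar
```

All three correct variants give the same matrix; the transpose used in the notebook is simply the most direct change to the original call.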

Visualize the covariance matrix

import seaborn as sns
# View a heatmap of the covariance matrix
sns.heatmap(covariance_assets, cmap='Paired');
# If the colors aren't distinctive, try one of these color schemes instead:
# cmap = 'tab10'
# cmap = 'Accent'

[Figure: heatmap of the annualized covariance matrix of assets]

Quiz 3

Looking at the colormap, are covariances more likely to be positive or negative? Are they more likely to be above 0.10 or below 0.10?

Answer 3

The colormap range is mostly positive, from 0 to 0.30+, so covariances are more likely to be positive than negative. In other words, stocks tend to move together with the broader market. Also, most covariances are below 0.10 rather than above it.
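
A visual reading like this can be cross-checked numerically by counting the off-diagonal entries of the covariance matrix. The sketch below is a hedged illustration on synthetic returns (a shared "market" component plus asset-specific noise, all numbers made up), not on the lesson's data; the same two lines at the end could be applied to `covariance_assets`.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic daily returns: a shared "market" move plus asset-specific noise
n_days, n_assets = 500, 20
market = rng.normal(0, 0.01, size=(n_days, 1))
specific = rng.normal(0, 0.01, size=(n_days, n_assets))
returns = market + specific          # broadcasts the market move to every asset

# Annualized covariance matrix, with columns as the variables
covariance = 252 * np.cov(returns, rowvar=False)

# Keep only covariances between distinct assets; the upper triangle
# is enough because the matrix is symmetric
rows, cols = np.triu_indices(n_assets, k=1)
pairwise = covariance[rows, cols]

share_positive = (pairwise > 0).mean()
share_below_010 = (pairwise < 0.10).mean()

print(share_positive, share_below_010)
```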

Fun Quiz!

Do you know what the seaborn visualization package was named after?

Fun Answer!

The seaborn package is named after "Samuel Norman Seaborn", a fictional character from the [TV series "The West Wing"](https://en.wikipedia.org/wiki/Sam_Seaborn).

Those who keep doing will succeed; those who keep going will arrive.