AI For Trading: Sector Neutral Exercise (62)

Sector Neutral

Install packages

import sys
!{sys.executable} -m pip install -r requirements.txt
Collecting alphalens==0.3.2 (from -r requirements.txt (line 1))
  Downloading (18.9MB)
    100% |████████████████████████████████| 18.9MB 172kB/s eta 0:00:01
Collecting colour==0.1.5 (from -r requirements.txt (line 2))
Collecting tqdm==4.19.5 (from -r requirements.txt (line 15))
You should consider upgrading via the 'pip install --upgrade pip' command.
import cvxpy as cvx
import numpy as np
import pandas as pd
import time
import os
import quiz_helper
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)

following zipline bundle documentation

data bundle

import os
import quiz_helper
from zipline.data import bundles
os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')
ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)
bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)
print('Data Registered')
Data Registered

Build pipeline engine

from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume
from zipline.utils.calendars import get_calendar

universe = AverageDollarVolume(window_length=120).top(500) 
trading_calendar = get_calendar('NYSE') 
bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)
engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)

FileNotFoundError                         Traceback (most recent call last)

~/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/bundles/ in most_recent_data(bundle_name, timestamp, environ)
    479             candidates = os.listdir(
--> 480                 pth.data_path([bundle_name], environ=environ),
    481             )

FileNotFoundError: [Errno 2] No such file or directory: '/Users/kaiyiwang/Code/AI/UDA/AITND/Alpha_Factor/../../data/module_4_quizzes_eod/data/m4-quiz-eod-quotemedia'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)

<ipython-input-7-5e2d7c96a570> in <module>()
      5 universe = AverageDollarVolume(window_length=120).top(500)
      6 trading_calendar = get_calendar('NYSE')
----> 7 bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)
      8 engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)

~/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/bundles/ in load(name, environ, timestamp)
    519         if timestamp is None:
    520             timestamp = pd.Timestamp.utcnow()
--> 521         timestr = most_recent_data(name, timestamp, environ=environ)
    522         return BundleData(
    523             asset_finder=AssetFinder(

~/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/bundles/ in most_recent_data(bundle_name, timestamp, environ)
    495                 'maybe you need to run: $ zipline ingest -b {bundle}'.format(
    496                     bundle=bundle_name,
--> 497                     timestamp=timestamp,
    498                 ),
    499             )

ValueError: no data for bundle 'm4-quiz-eod-quotemedia' on or before 2019-05-03 03:00:25.193775+00:00
maybe you need to run: $ zipline ingest -b m4-quiz-eod-quotemedia

View Data

With the pipeline engine built, let's get the stocks that are in our universe at the end of the period. We'll use these tickers to generate the returns data for our risk model.

universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')

universe_tickers = engine\


Get Returns data

from zipline.data.data_portal import DataPortal

data_portal = DataPortal(

Get pricing data helper function

def get_pricing(data_portal, trading_calendar, assets, start_date, end_date, field='close'):
    end_dt = pd.Timestamp(end_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')
    start_dt = pd.Timestamp(start_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')

    end_loc = trading_calendar.closes.index.get_loc(end_dt)
    start_loc = trading_calendar.closes.index.get_loc(start_dt)

    return data_portal.get_history_window(
        bar_count=end_loc - start_loc,

get pricing data into a dataframe

returns_df = \
        universe_end_date - pd.DateOffset(years=5),
    .pct_change()[1:].fillna(0) #convert prices into returns
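
As a rough illustration of the price-to-return conversion, the sketch below applies the same `pct_change()[1:].fillna(0)` chain to a small made-up price table. The tickers, dates, and prices are hypothetical and are not taken from the quiz data.

```python
import pandas as pd

# Hypothetical closing prices for two made-up tickers (not the quiz data).
# BBB has no price on the first day, e.g. because it was not yet listed.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [float("nan"), 50.0, 55.0]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"]),
)

# pct_change() computes (p_t - p_{t-1}) / p_{t-1} for every row.
# The first row has no previous price, so [1:] drops it.
# fillna(0) then replaces any return that is still undefined with 0.
returns = prices.pct_change()[1:].fillna(0)
print(returns)
```

Two price rows remain as returns: the first row is lost because a return needs a previous price, and BBB's first return is filled with 0 because its previous price is missing.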


Sector data helper function

We'll create an object for you that defines a sector for each stock. The sectors are represented by integers. The object inherits from zipline's Classifier class; see the documentation and the source code for Classifier.

from zipline.pipeline.classifiers import Classifier
from zipline.utils.numpy_utils import int64_dtype
class Sector(Classifier):
    dtype = int64_dtype
    window_length = 0
    inputs = ()
    missing_value = -1

    def __init__(self): = np.load('../../data/project_4_sector/data.npy')

    def _compute(self, arrays, dates, assets, mask):
        return np.where(
sector = Sector()

Quiz 1

How many unique sectors are in the sector variable?

Answer 1

There are 11 sector categories, numbered 0 to 10.
The value -1 represents missing values, so it does not count as a sector.

print(f"set of unique categories: {set(}")
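
Since -1 marks a missing sector rather than a real one, it has to be left out when counting sectors. A minimal sketch of that count, using a short hypothetical list of sector codes instead of the quiz's sector data:

```python
# Hypothetical sector codes for a handful of stocks (not the quiz data).
# -1 is the marker for "sector unknown".
MISSING = -1
sector_codes = [3, 7, -1, 0, 3, 10, -1, 7]

# A set keeps one copy of each distinct code, including the missing marker.
unique_codes = set(sector_codes)

# Remove the marker to get only the real sector categories.
real_sectors = sorted(unique_codes - {MISSING})

print(f"all unique codes: {sorted(unique_codes)}")
print(f"sectors excluding the missing marker: {real_sectors}")
print(f"number of real sectors: {len(real_sectors)}")
```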

Create an alpha factor based on momentum

We want to calculate the one-year return.
In other words, get the close price of today, minus the close price of 252 trading days ago, and divide by that price from 252 days ago.

$1YearReturn_t = \frac{price_{t} - price_{t-252}}{price_{t-252}}$
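
To make the arithmetic concrete, here is a small plain-Python sketch of the formula as written above. The function name `trailing_return` and the price series are made up for illustration; this follows the formula in the text and is not meant to reproduce the internals of zipline's Returns factor.

```python
def trailing_return(prices, window=252):
    """Return (p_t - p_{t-window}) / p_{t-window} for the latest price."""
    if len(prices) <= window:
        raise ValueError("need more than `window` prices")
    past, latest = prices[-1 - window], prices[-1]
    return (latest - past) / past


# Hypothetical series: 253 daily closes rising linearly from 100 to 125.
closes = [100 + 25 * i / 252 for i in range(253)]

# Latest close is 125 and the close 252 days earlier is 100,
# so the one-year return is (125 - 100) / 100.
one_year_return = trailing_return(closes)
print(one_year_return)
```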

from zipline.pipeline.factors import Returns

We'll use 2 years of data to calculate the factor

Note: Going back exactly 2 years from the universe end date lands on 2014-01-05, a Sunday, when the market is closed. The Pipeline package doesn't handle start or end dates that fall on days when the market is closed. To fix this, we go back 2 extra days so that the start date lands on a day when the market is open.
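
The date arithmetic behind this note can be checked with the standard library. This is only a sketch: it uses `datetime` rather than pandas, and it checks the day of the week only, which is an assumption-level proxy because a weekday can still be a market holiday.

```python
from datetime import date, timedelta

universe_end = date(2016, 1, 5)

# Two calendar years back lands on a Sunday (weekday() == 6).
two_years_back = universe_end.replace(year=universe_end.year - 2)
print(two_years_back, two_years_back.weekday())

# Two further days back lands on the preceding Friday (weekday() == 4).
factor_start = two_years_back - timedelta(days=2)
print(factor_start, factor_start.weekday())
```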

factor_start_date = universe_end_date - pd.DateOffset(years=2, days=2)
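## 1 year returns can be the basis for an alpha factor
p1 = Pipeline(screen=universe)
rets1 = Returns(window_length=252, mask=universe)
# add the factor to the pipeline so that it appears as a column in the output
p1.add(rets1, '1YearReturns')
df1 = engine.run_pipeline(p1, factor_start_date, universe_end_date)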
#graphviz lets us visualize the pipeline
import graphviz

View the data of the factor


Explore the demean function

The Returns class inherits from the Factor class defined in zipline.pipeline.factors.factor.
The documentation for demean is pasted below:

demean(mask=sentinel('NotSpecified'), groupby=sentinel('NotSpecified'))
Construct a Factor that computes self and subtracts the mean from row of the result.

If mask is supplied, ignore values where mask returns False when computing row means, and output NaN anywhere the mask is False.

If groupby is supplied, compute by partitioning each row based on the values produced by groupby, de-meaning the partitioned arrays, and stitching the sub-results back together.

mask (zipline.pipeline.Filter, optional) – A Filter defining values to ignore when computing means.
groupby (zipline.pipeline.Classifier, optional) – A classifier defining partitions over which to compute means.
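
The mask behaviour described above can be sketched in plain Python for a single row. The helper `demean_row` and the numbers are hypothetical and only mimic the documented behaviour (masked-out values are ignored in the mean and come back as NaN); this is not zipline's implementation.

```python
import math


def demean_row(values, mask):
    """Subtract the mean of the masked-in values; masked-out slots become NaN."""
    kept = [v for v, keep in zip(values, mask) if keep]
    mean = sum(kept) / len(kept)
    return [v - mean if keep else math.nan for v, keep in zip(values, mask)]


row = [0.10, 0.30, 0.50, 9.99]      # hypothetical factor values for one date
mask = [True, True, True, False]    # the last asset is outside the universe

# The mean is taken over the first three values only, so the outlier 9.99
# does not distort it, and its own output slot is NaN.
result = demean_row(row, mask)
print(result)
```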

Quiz 2

By looking at the documentation, and then the source code for demean, what are the two parameters of this function? Which one or ones would you use if you wanted to demean by sector and include all values in the chosen universe?

The source code has useful comments to help you answer this question.

Answer 2

We would use the groupby parameter, and we don't need to use the mask parameter, since we are not going to exclude any of the stocks in the universe from the demean calculation.

Quiz 3

Turn 1 year returns into an alpha factor

We can do some processing to convert our signal (1 year return) into an alpha factor. One step is to demean by sector.

  • demean
    For each stock, we want to take the average return of stocks that are in the same sector, and then remove this from the return of each individual stock.
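
The demeaning step described above can be illustrated outside of zipline with a small pandas sketch. The tickers, returns, and sector codes are made up, and a pandas groupby stands in for the pipeline's grouping; treat it as an illustration of the arithmetic only.

```python
import pandas as pd

# Hypothetical one-year returns and integer sector codes for five made-up stocks.
returns = pd.Series(
    [0.10, 0.30, 0.20, -0.05, 0.15],
    index=["AAA", "BBB", "CCC", "DDD", "EEE"],
)
sectors = pd.Series([0, 0, 0, 1, 1], index=returns.index)

# Average return of each stock's sector, aligned back to the stock.
sector_mean = returns.groupby(sectors).transform("mean")

# Sector-demeaned return: the stock's return minus its sector's average.
demeaned = returns - sector_mean
print(demeaned)
```

Within each sector the demeaned values sum to zero, which is what makes the resulting factor sector neutral.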

Answer 3

# create a pipeline called p2
p2 = Pipeline(screen=universe)
# create a factor of one year returns, demean by sector
factor_demean_by_sector = (
    Returns(window_length=252, mask=universe).
    demean(groupby=Sector()) #we use the custom Sector class that we reviewed earlier
)
# add the factor to the p2 pipeline
p2.add(factor_demean_by_sector, 'Momentum_1YR_demean_by_sector')

visualize the second pipeline


Quiz 4

How does this pipeline compare with the first pipeline that we created earlier?

Answer 4

The second pipeline now adds sector information in the GroupedRowTransform('demean') step.

run pipeline and view the factor data

df2 = engine.run_pipeline(p2, factor_start_date, universe_end_date)