AI For Trading:Project3:Smart Beta and Portfolio Optimization (46)

Smart Beta and Portfolio Optimization

In this project, you will build a smart beta portfolio and compare it to a benchmark index. To find out how well the smart beta portfolio did, you’ll calculate the tracking error against the index. You’ll then build a portfolio by using quadratic programming to optimize the weights. Your code will rebalance this portfolio and calculate turnover to evaluate the performance. You’ll use this metric to find the optimal rebalancing frequency. For the dataset, we'll be using end-of-day data from Quotemedia.

Project 3: Smart Beta Portfolio and Portfolio Optimization


Smart beta has a broad meaning, but in practice, when we take the universe of stocks from an index and apply a weighting scheme other than market cap weighting, the result can be considered a type of smart beta fund. A smart beta portfolio generally gives investors exposure, or "beta", to one or more market characteristics (or factors) that are believed to predict prices, while still giving investors diversified, broad exposure to a particular market. Smart beta portfolios generally target momentum, earnings quality, low volatility, dividends, or some combination of these. They are generally rebalanced infrequently and follow relatively simple, passively managed rules or algorithms. Model changes to these types of funds are also rare, since they require prospectus filings with the U.S. Securities and Exchange Commission in the case of US-focused mutual funds or ETFs. Smart beta portfolios are generally long-only; they do not short stocks.

In contrast, a purely alpha-focused quantitative fund may use multiple models or algorithms to create a portfolio. The portfolio manager retains discretion in upgrading or changing the types of models and how often to rebalance the portfolio in an attempt to maximize performance in comparison to a stock benchmark. Managers may have discretion to short stocks in portfolios.

Imagine you're a portfolio manager, and wish to try out some different portfolio weighting methods.

One way to design a portfolio is to look at certain accounting measures (fundamentals) that, based on past trends, indicate stocks that produce better results.

For instance, you may start with a hypothesis that dividend-issuing stocks tend to perform better than stocks that do not. This may not always be true of all companies; for instance, Apple paid no dividends for many years (it resumed them in 2012), but has had good historical performance. The hypothesis about dividend-paying stocks may go something like this:

Companies that regularly issue dividends may also be more prudent in allocating their available cash, which may indicate that they are more conscious of prioritizing shareholder interests. For example, a CEO may decide to reinvest cash into pet projects that produce low returns. Or, the CEO may do some analysis, identify that reinvesting within the company produces lower returns compared to a diversified portfolio, and so decide that shareholders would be better served if they were given the cash (in the form of dividends). So according to this hypothesis, dividends may be both a proxy for how the company is doing (in terms of earnings and cash flow) and a signal that the company acts in the best interest of its shareholders. Of course, it's important to test whether this works in practice.

You may also have another hypothesis, with which you wish to design a portfolio that can then be made into an ETF. You may find that investors may wish to invest in passive beta funds, but wish to have less risk exposure (less volatility) in their investments. The goal of having a low volatility fund that still produces returns similar to an index may be appealing to investors who have a shorter investment time horizon, and so are more risk averse.

So the objective is to design a portfolio that closely tracks an index while also minimizing the portfolio variance. If this portfolio can match the returns of the index with less volatility, then it has a higher risk-adjusted return (same return, lower volatility).

Smart Beta ETFs can be designed with either of these two general methods (among others): alternative weighting and minimum volatility.


Each problem consists of a function to implement and instructions on how to implement the function. The parts of the function that need to be implemented are marked with a # TODO comment. After implementing the function, run the cell to test it against the unit tests we've provided. For each problem, we provide one or more unit tests from our project_tests package. These unit tests won't tell you if your answer is correct, but will warn you of any major errors. Your code will be checked for the correct solution when you submit it to Udacity.


When you implement the functions, you'll only need to use the packages you've used in the classroom, like Pandas and Numpy. These packages will be imported for you. We recommend you don't add any import statements, otherwise the grader might not be able to run your code.

The other packages that we're importing are helper, project_helper, and project_tests. These are custom packages built to help you solve the problems. The helper and project_helper modules contain utility functions and graph functions. The project_tests module contains the unit tests for all the problems.

Install Packages

import sys
!{sys.executable} -m pip install -r requirements.txt
Requirement already satisfied: colour==0.1.5 in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 1))
Collecting cvxpy==1.0.3 (from -r requirements.txt (line 2))
  Downloading (880kB)
Requirement already satisfied: cycler==0.10.0 in /opt/conda/lib/python3.6/site-packages/cycler-0.10.0-py3.6.egg (from -r requirements.txt (line 3))
Collecting numpy==1.14.5 (from -r requirements.txt (line 4))
  Downloading (12.2MB)

Load Packages

import pandas as pd
import numpy as np
import helper
import project_helper
import project_tests

Market Data

Load Data

For this universe of stocks, we'll be selecting large dollar volume stocks. We're using this universe, since it is highly liquid.

df = pd.read_csv('../../data/project_3/eod-quotemedia.csv')

# Fields: adj_close (adjusted close price), adj_volume (adjusted volume), dividends
percent_top_dollar = 0.2
high_volume_symbols = project_helper.large_dollar_volume_stocks(df, 'adj_close', 'adj_volume', percent_top_dollar)

# Print

# Keep only the rows for the tickers in high_volume_symbols
df = df[df['ticker'].isin(high_volume_symbols)]

# Reset the index
# @doc:reset_index:
# @doc: pivot method:

# Pivot the closing prices by date: one column per ticker, so each row holds every stock's close for that date.
# The other fields are pivoted the same way.
close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')
volume = df.reset_index().pivot(index='date', columns='ticker', values='adj_volume')
dividends = df.reset_index().pivot(index='date', columns='ticker', values='dividends')

         date ticker   adj_close       adj_volume  dividends
0  2013-07-01      A 29.99418563 4283600.00000000 0.00000000
1  2013-07-02      A 29.65013670 2986500.00000000 0.00000000
2  2013-07-03      A 29.70518453 1940600.00000000 0.00000000
3  2013-07-05      A 30.43456826 2097800.00000000 0.00000000
4  2013-07-08      A 30.52402098 2570100.00000000 0.00000000

['ABT', 'WYNN', 'EBAY', 'UPS', 'FDX', 'USB', 'UAL', 'LMT', 'HON', 'AMAT', 'COST', 'CBS', 'TXN', 'LLY', 'OXY', 'PXD', 'MO', 'MON', 'CHTR', 'VLO', 'MDLZ', 'ESRX', 'CRM', 'AVGO', 'REGN', 'TGT', 'LOW', 'APC', 'EOG', 'AXP', 'KMI', 'WBA', 'TWX', 'FOXA', 'PM', 'FCX', 'MS', 'UNP', 'UTX', 'CMG', 'MDT', 'COP', 'MA', 'PEP', 'F', 'NKE', 'AIG', 'AAL', 'SBUX', 'UNH', 'DAL', 'BMY', 'CVS', 'CAT', 'ABBV', 'GM', 'AMGN', 'HAL', 'BIIB', 'BA', 'CELG', 'MCD', 'KO', 'NVDA', 'WMT', 'MRK', 'HD', 'SLB', 'GS', 'ORCL', 'MU', 'QCOM', 'IBM', 'V', 'CMCSA', 'CSCO', 'CVX', 'DIS', 'VZ', 'PG', 'AGN', 'JNJ', 'T', 'PFE', 'INTC', 'WFC', 'GILD', 'GE', 'JPM', 'XOM', 'C', 'GOOG', 'NFLX', 'MSFT', 'BAC', 'GOOGL', 'AMZN', 'FB', 'AAPL']

            date ticker   adj_close        adj_volume  dividends
1009  2013-07-01    AAL 16.17609308 12511796.00000000 0.00000000
1010  2013-07-02    AAL 15.81983388 10748794.00000000 0.00000000
1011  2013-07-03    AAL 16.12794994  7039678.00000000 0.00000000
1012  2013-07-05    AAL 16.21460758  6426810.00000000 0.00000000
1013  2013-07-08    AAL 16.31089385  7161394.00000000 0.00000000

ticker             AAL        AAPL        ABBV         ABT          AGN  \
2013-07-01 16.17609308 53.10917319 34.92447839 31.42538772 122.62751990   
2013-07-02 15.81983388 54.31224742 35.42807578 31.27288084 121.05361758   
2013-07-03 16.12794994 54.61204262 35.44486235 30.72565028 121.21003024   
2013-07-05 16.21460758 54.17338125 35.85613355 31.32670680 123.53666845   
2013-07-08 16.31089385 53.86579916 36.66188936 31.76628544 123.65397794   

ticker             AIG        AMAT        AMGN         AMZN         APC  \
2013-07-01 41.55339742 13.63297558 86.80333359 282.10000000 82.90947737   
2013-07-02 41.36908428 13.63757665 85.53008744 283.73000000 82.89037160   
2013-07-03 40.75163526 13.80321523 85.30749196 284.03000000 83.04321774   
2013-07-05 41.64555399 14.00566239 86.93689088 285.88000000 84.42838587   
2013-07-08 41.94967067 13.93204524 87.41769712 290.59000000 84.35196280   

ticker         ...             USB         UTX           V         VLO  \
date           ...                                                       
2013-07-01     ...     32.32004572 84.38110191 44.66007989 29.28427559   
2013-07-02     ...     32.31114458 83.54388177 44.66490227 28.67721470   
2013-07-03     ...     32.35565029 84.37219531 45.09891665 28.80546700   
2013-07-05     ...     32.72059710 85.97538281 46.00311328 28.94226945   
2013-07-08     ...     32.97873021 86.37617969 45.35450290 29.52367989   

ticker              VZ         WBA         WFC         WMT         WYNN  \
2013-07-01 40.07272093 41.02191970 35.91510532 65.48939807 112.18880046   
2013-07-02 40.28756674 41.13293978 35.79353582 65.59475707 110.19759336   
2013-07-03 40.58994231 40.81838289 35.79353582 65.63865665 110.16265990   
2013-07-05 40.82070262 40.93865464 36.53163639 66.03375290 110.98359616   
2013-07-08 41.12705682 41.93783537 37.19158513 67.35074039 110.89626252   

ticker             XOM  
2013-07-01 76.32080247  
2013-07-02 76.60816761  
2013-07-03 76.65042719  
2013-07-05 77.39419581  
2013-07-08 77.96892611  

[5 rows x 99 columns]

View Data

To see what one of these 2-d matrices looks like, let's take a look at the closing prices matrix.


Part 1: Smart Beta Portfolio

In Part 1 of this project, you'll build a portfolio using dividend yield to choose the portfolio weights. A portfolio such as this could be incorporated into a smart beta ETF. You'll compare this portfolio to a market cap weighted index to see how well it performs.

Note that in practice, you'll probably get the index weights from a data vendor (such as companies that create indices, like MSCI, FTSE, Standard and Poor's), but for this exercise we will simulate a market cap weighted index.

Index Weights

The index we'll be using is based on large dollar volume stocks. Implement generate_dollar_volume_weights to generate the weights for this index. For each date, generate the weights based on dollar volume traded for that date. For example, assume the following are the close prices and the volume data:

               A         B         ...
2013-07-08     2         2         ...
2013-07-09     5         6         ...
2013-07-10     1         2         ...
2013-07-11     6         5         ...
...            ...       ...       ...

               A         B         ...
2013-07-08     100       340       ...
2013-07-09     240       220       ...
2013-07-10     120       500       ...
2013-07-11     10        100       ...
...            ...       ...       ...

The weights created from the function generate_dollar_volume_weights should be the following:

               A         B         ...
2013-07-08     0.126..   0.194..   ...
2013-07-09     0.759..   0.377..   ...
2013-07-10     0.075..   0.285..   ...
2013-07-11     0.037..   0.142..   ...
...            ...       ...       ...

def generate_dollar_volume_weights(close, volume):
    """
    Generate dollar volume weights.

    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date
    volume : DataFrame
        Volume for each ticker and date

    Returns
    -------
    dollar_volume_weights : DataFrame
        The dollar volume weights for each ticker and date
    """
    # Sanity checks: both frames must cover the same dates and tickers
    assert close.index.equals(volume.index)
    assert close.columns.equals(volume.columns)

    # Question: how do we turn close prices and volumes into index weights?

    #TODO: Implement function
    dollar_volume = close * volume

    # print(dollar_volume)
    # print("SUM:")
    # print(dollar_volume.sum(axis=1))  # total dollar volume traded on each day
    # Divide each stock's dollar volume by that day's total
    dollar_volume_weights = dollar_volume.divide(dollar_volume.sum(axis=1), axis=0)

    return dollar_volume_weights

                  PQJ        CSK        DUH
2005-03-16 0.27719777 0.48394253 0.23885970
2005-03-17 0.41632975 0.34293308 0.24073717
2005-03-18 0.41848548 0.33536102 0.24615350
2005-03-19 0.05917255 0.85239760 0.08842984
Tests Passed
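As a quick sanity check on the idea, here is a minimal sketch that applies the same calculation to a two-ticker toy dataset. The tickers and numbers are made up for illustration, and because only two tickers are used, the weights differ from the truncated example table above, which has more columns.

```python
import pandas as pd

# Toy data (hypothetical): two tickers, two dates
dates = pd.to_datetime(['2013-07-08', '2013-07-09'])
close = pd.DataFrame({'A': [2.0, 5.0], 'B': [2.0, 6.0]}, index=dates)
volume = pd.DataFrame({'A': [100.0, 240.0], 'B': [340.0, 220.0]}, index=dates)

# Dollar volume = price * shares traded, per ticker and date
dollar_volume = close * volume

# Divide each row by its total so the weights on every date sum to 1
toy_weights = dollar_volume.divide(dollar_volume.sum(axis=1), axis=0)
print(toy_weights)
```

Passing axis=0 to divide matters here: it lines up the per-date totals with the rows rather than with the ticker columns.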

View Data

Let's generate the index weights using generate_dollar_volume_weights and view them using a heatmap.

index_weights = generate_dollar_volume_weights(close, volume)
project_helper.plot_weights(index_weights, 'Index Weights')
ticker            AAL       AAPL       ABBV        ABT        AGN        AIG  \
2013-07-01 0.00458693 0.11767248 0.00363163 0.00452034 0.00390809 0.01193725   
2013-07-02 0.00363550 0.13639979 0.00330692 0.00387337 0.00266703 0.01524297   
2013-07-03 0.00466860 0.13526072 0.00352350 0.01058904 0.00348543 0.01245591   
2013-07-05 0.00319288 0.11370932 0.00301426 0.00447694 0.00289257 0.01236056   
2013-07-08 0.00267775 0.09203767 0.00334354 0.00549484 0.00250670 0.00909724   
2013-07-09 0.00381208 0.10082010 0.00288745 0.00669273 0.00230203 0.01324442   
2013-07-10 0.00447257 0.09036346 0.00267033 0.00667761 0.00329537 0.01009668   
2013-07-11 0.00279390 0.08937189 0.00297546 0.00567649 0.00274211 0.01334972   
2013-07-12 0.00324340 0.07013009 0.00393849 0.00358572 0.00272487 0.01016838   
2013-07-15 0.00343331 0.07690605 0.00282922 0.00475185 0.00337597 0.00968997   
2013-07-16 0.00400739 0.06808705 0.00310505 0.00615101 0.00298901 0.00804328   
2013-07-17 0.00657753 0.06098209 0.00297935 0.00565051 0.00340205 0.00972553   
2013-07-18 0.00760543 0.05814757 0.00216680 0.00420515 0.00290471 0.00941840   
2013-07-19 0.00313688 0.05990564 0.00263260 0.00372828 0.00199034 0.00472091   
2013-07-22 0.00500106 0.06520309 0.00329260 0.00349814 0.00541434 0.00891040   
2013-07-23 0.00508247 0.11193160 0.00295727 0.00670411 0.00216171 0.00962065   
2013-07-24 0.00985709 0.16179725 0.00218296 0.00380502 0.00245569 0.00741853   
2013-08-12 0.00615471 0.15074963 0.00395585 0.00436969 0.00244195 0.00802787   
...               ...        ...        ...        ...        ...        ...   
2017-05-19 0.00428806 0.05768555 0.00606241 0.00353434 0.00905647 0.00820911   
2017-05-22 0.00356866 0.06124647 0.00616950 0.00437554 0.00672621 0.00972794   
2017-05-23 0.00268572 0.05820720 0.00574691 0.00386194 0.00698638 0.00930265   
2017-06-30 0.00505383 0.04996914 0.00507628 0.00344529 0.00631443 0.00723481   

ticker           AMAT       AMGN       AMZN        APC    ...            USB  \
date                                                      ...                  
2013-07-01 0.00277884 0.00963710 0.01846543 0.00522426    ...     0.00630446   
2013-07-02 0.00285419 0.00732560 0.01964317 0.00427094    ...     0.00717138   
2017-06-30 0.00628914 0.00690689 0.05022972 0.00367206    ...     0.00531861   

ticker            UTX          V        VLO         VZ        WBA        WFC  \
2013-07-01 0.00743820 0.01175723 0.00512460 0.00885143 0.00582218 0.01316761   
2013-07-02 0.00660160 0.00835483 0.00533076 0.00832122 0.00632561 0.01190640   
2013-07-03 0.00602634 0.00840670 0.00965935 0.00995714 0.00737145 0.01188419   
2017-06-28 0.00333344 0.00823113 0.00336078 0.01234098 0.00772702 0.01712018   
2017-06-29 0.00361304 0.01072070 0.00274084 0.00975787 0.01473140 0.02275530   
2017-06-30 0.00440285 0.01288414 0.00296413 0.00869974 0.00831818 0.01381868   

ticker            WMT       WYNN        XOM  
2013-07-01 0.00994090 0.00275262 0.02280131  
2013-07-02 0.00943310 0.00394488 0.02017963  
2013-07-03 0.00625213 0.00271568 0.01788375  
2017-06-29 0.00674801 0.00300113 0.02052075  
2017-06-30 0.00787574 0.00294507 0.01663095  

[1009 rows x 99 columns]

The graph for Index Weights is too large. You can view it here.

Portfolio Weights

Now that we have the index weights, let's choose the portfolio weights based on dividends. You would normally calculate the weights based on trailing dividend yield, but we'll simplify this by just calculating the total dividend yield over time.

Implement calculate_dividend_weights to return the weights for each stock based on its total dividend yield over time. This is similar to generating the weight for the index, but it's using dividend data instead.
For example, assume the following is dividends data:

               A         B
2013-07-08     0         0
2013-07-09     0         1
2013-07-10     0.5       0
2013-07-11     0         0
2013-07-12     2         0
...            ...       ...

The weights created from the function calculate_dividend_weights should be the following:

               A         B
2013-07-08     NaN       NaN
2013-07-09     0         1
2013-07-10     0.333..   0.666..
2013-07-11     0.333..   0.666..
2013-07-12     0.714..   0.285..
...            ...       ...

def calculate_dividend_weights(dividends):
    """
    Calculate dividend weights.

    Parameters
    ----------
    dividends : DataFrame
        Dividend for each stock and date

    Returns
    -------
    dividend_weights : DataFrame
        Weights for each stock and date
    """
    #TODO: Implement function
    # Accumulate each stock's dividends over time with cumsum()
    # print(dividends)
    div_cumsum = dividends.cumsum()
    # print(div_cumsum)
    # Divide by the row total so the weights on each date sum to 1
    dividend_weights = div_cumsum.divide(div_cumsum.sum(axis=1), axis=0)

    return dividend_weights

Tests Passed
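The example table above can be reproduced with a few lines of pandas. This is a small sketch using only the five rows shown for tickers A and B; the variable names are chosen for illustration.

```python
import pandas as pd

# The dividends from the example table (first five dates, tickers A and B)
dividends = pd.DataFrame(
    {'A': [0.0, 0.0, 0.5, 0.0, 2.0], 'B': [0.0, 1.0, 0.0, 0.0, 0.0]},
    index=pd.to_datetime(['2013-07-08', '2013-07-09', '2013-07-10',
                          '2013-07-11', '2013-07-12']))

# Running total of dividends paid by each stock
cumulative = dividends.cumsum()

# Each stock's share of all dividends paid so far
toy_dividend_weights = cumulative.divide(cumulative.sum(axis=1), axis=0)
print(toy_dividend_weights)
```

The first row is NaN because no dividends have been paid yet, so the calculation is 0 divided by 0.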

View Data

Just like the index weights, let's generate the ETF weights and view them using a heatmap.

etf_weights = calculate_dividend_weights(dividends)
project_helper.plot_weights(etf_weights, 'ETF Weights')

The graph for ETF Weights is too large. You can view it here.


Implement generate_returns to generate returns data for all the stocks and dates from price data. You might notice we're implementing returns and not log returns. Since we're not dealing with volatility, we don't have to use log returns.

def generate_returns(prices):
    """
    Generate returns for ticker and date.

    Parameters
    ----------
    prices : DataFrame
        Price for each ticker and date

    Returns
    -------
    returns : DataFrame
        The returns for each ticker and date
    """
    #TODO: Implement function
    # print(prices)
    # prices.shift(1) is the previous day's price:
    # return = (today's price - previous day's price) / previous day's price
    returns = (prices - prices.shift(1)) / prices.shift(1)

    return returns

Tests Passed
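A tiny worked example may help here. The sketch below uses three made-up prices for a single hypothetical ticker and compares the shift-based formula with pandas' built-in pct_change.

```python
import pandas as pd

# Hypothetical prices for one ticker over three days
prices = pd.DataFrame({'A': [100.0, 110.0, 99.0]})

# (today - yesterday) / yesterday; the first row has no previous day, so it is NaN
manual_returns = (prices - prices.shift(1)) / prices.shift(1)

# pandas offers the same calculation as a built-in
builtin_returns = prices.pct_change()

print(manual_returns)
```

A 10% rise followed by a 10% fall leaves the price at 99, not 100, which is why returns are compounded rather than added when computing cumulative performance.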

View Data

Let's generate the closing returns using generate_returns and view them using a heatmap.

returns = generate_returns(close)
project_helper.plot_returns(returns, 'Close Returns')

The graph for Close Returns is too large. You can view it here.

Weighted Returns

With the returns of each stock computed, we can use it to compute the returns for an index or ETF. Implement generate_weighted_returns to create weighted returns using the returns and weights.

def generate_weighted_returns(returns, weights):
    """
    Generate weighted returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    weights : DataFrame
        Weights for each ticker and date

    Returns
    -------
    weighted_returns : DataFrame
        Weighted returns for each ticker and date
    """
    assert returns.index.equals(weights.index)
    assert returns.columns.equals(weights.columns)

    #TODO: Implement function (element-wise return * weight)
    weighted_returns = returns * weights

    return weighted_returns

Tests Passed
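To see what the weighted returns represent, here is a minimal sketch with one date and two hypothetical tickers. Summing the weighted returns across tickers gives the portfolio's return for that date, which is how they are used in the rest of the project.

```python
import pandas as pd

# Hypothetical one-day returns and weights for two tickers
returns = pd.DataFrame({'A': [0.01], 'B': [-0.02]})
weights = pd.DataFrame({'A': [0.6], 'B': [0.4]})

# Element-wise product: each stock's contribution to the portfolio return
contributions = returns * weights

# Summing the contributions across tickers gives the portfolio return for the day
portfolio_return = contributions.sum(axis=1)
print(portfolio_return)
```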

View Data

Let's generate the ETF and index returns using generate_weighted_returns and view them using a heatmap.

index_weighted_returns = generate_weighted_returns(returns, index_weights)
etf_weighted_returns = generate_weighted_returns(returns, etf_weights)
project_helper.plot_returns(index_weighted_returns, 'Index Returns')
project_helper.plot_returns(etf_weighted_returns, 'ETF Returns')

The graph for Index Returns is too large. You can view it here.

The graph for ETF Returns is too large. You can view it here.

Cumulative Returns

To compare performance between the ETF and Index, we're going to calculate the tracking error. Before we do that, we first need to calculate the index and ETF cumulative returns. Implement calculate_cumulative_returns to calculate the cumulative returns over time given the returns.

def calculate_cumulative_returns(returns):
    """
    Calculate cumulative returns.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date

    Returns
    -------
    cumulative_returns : Pandas Series
        Cumulative returns for each date
    """
    #TODO: Implement function
    # Sum the weighted returns across tickers to get the portfolio return for each date,
    # then compound: cumulative product of (1 + daily return)
    cumulative_returns = (returns.sum(axis=1) + 1).cumprod()

    return cumulative_returns

Tests Passed
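The following sketch walks through the same two steps on made-up weighted returns for two hypothetical tickers. Note that the result is a growth factor (the value of 1 unit invested), so a value of 0.99 means a cumulative return of -1%.

```python
import numpy as np
import pandas as pd

# Hypothetical weighted returns; the first row is NaN because returns need a previous price
weighted = pd.DataFrame({'A': [np.nan, 0.06, -0.04],
                         'B': [np.nan, 0.04, -0.06]})

# sum(axis=1) skips NaN by default, so the all-NaN first row becomes a 0 return
daily = weighted.sum(axis=1)

# Growth of 1 unit invested: 1.0, then 1.0 * 1.10, then 1.10 * 0.90
growth = (daily + 1).cumprod()
print(growth)
```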

View Data

Let's generate the ETF and index cumulative returns using calculate_cumulative_returns and compare the two.

index_weighted_cumulative_returns = calculate_cumulative_returns(index_weighted_returns)
etf_weighted_cumulative_returns = calculate_cumulative_returns(etf_weighted_returns)
project_helper.plot_benchmark_returns(index_weighted_cumulative_returns, etf_weighted_cumulative_returns, 'Smart Beta ETF vs Index')

Tracking Error

In order to check the performance of the smart beta portfolio, we can calculate the annualized tracking error against the index. Implement tracking_error to return the tracking error between the ETF and benchmark.

For reference, we'll be using the following annualized tracking error function:
$$ TE = \sqrt{252} * SampleStdev(r_p - r_b) $$

Where $ r_p $ is the portfolio/ETF returns and $ r_b $ is the benchmark returns.

Note: When calculating the sample standard deviation, the delta degrees of freedom is 1, which is also the default value for the pandas std method. NumPy's np.std defaults to 0, so pass ddof=1 explicitly when using it.

def tracking_error(benchmark_returns_by_date, etf_returns_by_date):
    """
    Calculate the tracking error.

    Parameters
    ----------
    benchmark_returns_by_date : Pandas Series
        The benchmark returns for each date
    etf_returns_by_date : Pandas Series
        The ETF returns for each date

    Returns
    -------
    tracking_error : float
        The tracking error
    """
    assert benchmark_returns_by_date.index.equals(etf_returns_by_date.index)

    #TODO: Implement function
    # Annualized sample standard deviation of (r_p - r_b),
    # where r_p is the portfolio/ETF returns and r_b is the benchmark returns.
    tracking_error = np.sqrt(252) * np.std(etf_returns_by_date - benchmark_returns_by_date, ddof=1)

    return tracking_error

Tests Passed
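Because the ddof setting is easy to get wrong, here is a short sketch on made-up return series that computes the annualized tracking error three ways. The numbers are hypothetical and only serve to show that the NumPy call with ddof=1 and the pandas default agree, while NumPy's default does not.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for an ETF and its benchmark
etf = pd.Series([0.010, -0.005, 0.007, 0.002])
benchmark = pd.Series([0.008, -0.004, 0.009, 0.001])
active = etf - benchmark

# Sample standard deviation (ddof=1), annualized with sqrt(252)
te_numpy = np.sqrt(252) * np.std(active.values, ddof=1)
te_pandas = np.sqrt(252) * active.std()  # Series.std defaults to ddof=1

# np.std on a plain array defaults to ddof=0, which gives a smaller number
te_population = np.sqrt(252) * np.std(active.values)

print(te_numpy, te_pandas, te_population)
```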

View Data

Let's generate the tracking error using tracking_error.

smart_beta_tracking_error = tracking_error(np.sum(index_weighted_returns, 1), np.sum(etf_weighted_returns, 1))
print('Smart Beta Tracking Error: {}'.format(smart_beta_tracking_error))
Smart Beta Tracking Error: 0.1020761483200753

Part 2: Portfolio Optimization

Now, let's create a second portfolio. We'll still reuse the market cap weighted index, but this will be independent of the dividend-weighted portfolio that we created in part 1.

We want to both minimize the portfolio variance and also want to closely track a market cap weighted index. In other words, we're trying to minimize the distance between the weights of our portfolio and the weights of the index.

$Minimize \left [ \sigma^2_p + \lambda \sqrt{\sum_{i=1}^{m}(weight_i - indexWeight_i)^2} \right ]$ where $m$ is the number of stocks in the portfolio, and $\lambda$ is a scaling factor that you can choose.

Why are we doing this? One way that investors evaluate a fund is by how well it tracks its index. The fund is still expected to deviate from the index within a certain range in order to improve fund performance. A way for a fund to track the performance of its benchmark is by keeping its asset weights similar to the weights of the index. We’d expect that if the fund has the same stocks as the benchmark, and also the same weights for each stock as the benchmark, the fund would yield about the same returns as the benchmark. By minimizing a linear combination of both the portfolio risk and distance between portfolio and benchmark weights, we attempt to balance the desire to minimize portfolio variance with the goal of tracking the index.


Implement get_covariance_returns to calculate the covariance of the returns. We'll use this to calculate the portfolio variance.

If we have $m$ stock series, the covariance matrix is an $m \times m$ matrix containing the covariance between each pair of stocks. We can use NumPy's cov function (np.cov) to get the covariance. We give it a 2D array in which each row is a stock series, and each column is an observation at the same period of time. For any NaN values, you can replace them with zeros using the DataFrame.fillna function.

The covariance matrix $\mathbf{P} =
\begin{bmatrix}
\sigma^2_{1,1} & ... & \sigma_{1,m} \\
... & ... & ...\\
\sigma_{m,1} & ... & \sigma^2_{m,m} \\
\end{bmatrix}$

def get_covariance_returns(returns):
    """
    Calculate covariance matrices.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date

    Returns
    -------
    returns_covariance  : 2 dimensional Ndarray
        The covariance of the returns
    """
    #TODO: Implement function
    # 1. Replace NaN values with zeros using the DataFrame.fillna function.
    returns = returns.fillna(0)
    # print(returns)

    # 2. Transpose so that each row is one stock's return series, as np.cov expects
    returns_t = returns.T
    # print(returns_t)

    return np.cov(returns_t)

Tests Passed
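The orientation that np.cov expects is a common source of mistakes, so here is a minimal sketch on made-up returns for two hypothetical tickers. It compares the transposed np.cov call with pandas' DataFrame.cov, which works column by column.

```python
import numpy as np
import pandas as pd

# Hypothetical returns: rows are dates, columns are tickers
returns = pd.DataFrame({'A': [0.01, -0.02, 0.03, 0.00],
                        'B': [0.02, -0.01, 0.01, -0.01]})

# np.cov treats each ROW as one variable, so transpose first
cov_numpy = np.cov(returns.T.values)

# DataFrame.cov treats each COLUMN as one variable, so no transpose is needed
cov_pandas = returns.cov().values

print(cov_numpy)
```

The two agree here because the toy data has no missing values. With NaN values present they can differ: DataFrame.cov drops missing observations pair by pair, whereas filling with zeros treats them as zero returns.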

View Data

Let's look at the covariance generated from get_covariance_returns.

covariance_returns = get_covariance_returns(returns)
covariance_returns = pd.DataFrame(covariance_returns, returns.columns, returns.columns)

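# D^-1, where D is the diagonal matrix of standard deviations
covariance_returns_correlation = np.linalg.inv(np.diag(np.sqrt(np.diag(covariance_returns))))
# Correlation matrix = D^-1 * covariance * D^-1
covariance_returns_correlation = pd.DataFrame(
    covariance_returns_correlation.dot(covariance_returns).dot(covariance_returns_correlation),
    covariance_returns.index,
    covariance_returns.columns)

# The correlation matrix is then plotted with a project_helper plotting function
# (the exact call is not shown here), titled 'Covariance Returns Correlation Matrix'.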

The graph for Covariance Returns Correlation Matrix is too large. You can view it here.

portfolio variance

We can write the portfolio variance $\sigma^2_p = \mathbf{x^T} \mathbf{P} \mathbf{x}$

Recall that the $\mathbf{x^T} \mathbf{P} \mathbf{x}$ is called the quadratic form.
We can use the cvxpy function quad_form(x,P) to get the quadratic form.
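To make the quadratic form concrete, the sketch below evaluates it with plain NumPy for a made-up two-asset covariance matrix and a candidate weight vector, and checks it against the expanded two-asset variance formula. The numbers are hypothetical.

```python
import numpy as np

# Hypothetical covariance matrix for two assets and a candidate weight vector
P = np.array([[0.04, 0.01],
              [0.01, 0.09]])
x = np.array([0.6, 0.4])

# Quadratic form x^T P x
variance_quadratic = x @ P @ x

# The same number written out term by term:
# x1^2 * var1 + x2^2 * var2 + 2 * x1 * x2 * cov12
variance_expanded = (x[0] ** 2 * P[0, 0]
                     + x[1] ** 2 * P[1, 1]
                     + 2 * x[0] * x[1] * P[0, 1])

print(variance_quadratic, variance_expanded)
```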

Distance from index weights

We want portfolio weights that track the index closely. So we want to minimize the distance between them.
Recall from the Pythagorean theorem that you can get the distance between two points in an x,y plane by adding the squares of the x and y distances and taking the square root. Extending this to any number of dimensions gives the L2 norm. So $\sqrt{\sum_{i=1}^{n}(weight_i - indexWeight_i)^2}$ can also be written as $\left \| \mathbf{x} - \mathbf{index} \right \|_2$. There's a cvxpy function for this, norm(x, p=2, axis=None). The default is already set to find an L2 norm, so you would pass in one argument, which is the difference between your portfolio weights and the index weights.

objective function

We want to minimize both the portfolio variance and the distance of the portfolio weights from the index weights.
We also want to choose a scale constant, which is $\lambda$ in the expression.

$\mathbf{x^T} \mathbf{P} \mathbf{x} + \lambda \left \| \mathbf{x} - \mathbf{index} \right \|_2$

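This lets us choose how much priority we give to minimizing the difference from the index, relative to minimizing the variance of the portfolio. If you choose a higher value for scale ($\lambda$), the optimizer gives more priority to staying close to the index weights and less to reducing the variance.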
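To make the role of $\lambda$ concrete, the sketch below evaluates the objective with plain NumPy for two candidate portfolios: one that copies the index and one tilted toward the lower-variance asset. The covariance matrix, weights, and the helper name objective_value are all hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical covariance matrix and index weights for two assets
P = np.array([[0.04, 0.01],
              [0.01, 0.09]])
index = np.array([0.5, 0.5])
tilted = np.array([0.75, 0.25])  # leans toward the lower-variance asset

def objective_value(x, scale):
    """Portfolio variance plus scale times the L2 distance from the index."""
    return x @ P @ x + scale * np.linalg.norm(x - index)

for scale in (0.01, 2.0):
    print(scale, objective_value(index, scale), objective_value(tilted, scale))
```

With the small scale the tilted portfolio has the lower objective value, and with the large scale the index-copying portfolio does.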

We can find the objective function using cvxpy objective = cvx.Minimize(). Can you guess what to pass into this function?


We can also define our constraints in a list. For example, you'd want the weights to sum to one, so $\sum_{i=1}^{n}x_i = 1$. You may also need to go long only, which means no shorting, so no negative weights: $x_i \geq 0$ for all $i$. You could save these in a variable as [x >= 0, sum(x) == 1], where x was created using cvx.Variable().


So now that we have our objective function and constraints, we can solve for the values of $\mathbf{x}$.
cvxpy has the constructor Problem(objective, constraints), which returns a Problem object.

The Problem object has a function solve(), which returns the minimum value of the objective function. In this case, that is the minimized combination of the portfolio variance and the scaled distance from the index weights.

It also updates the vector $\mathbf{x}$.

We can check the weights that gave this minimum by using x.value

import cvxpy as cvx

def get_optimal_weights(covariance_returns, index_weights, scale=2.0):
    """
    Find the optimal weights.

    Parameters
    ----------
    covariance_returns : 2 dimensional Ndarray
        The covariance of the returns
    index_weights : Pandas Series
        Index weights for all tickers at a period in time
    scale : float
        The penalty factor for weights that deviate from the index

    Returns
    -------
    x : 1 dimensional Ndarray
        The solution for x
    """
    assert len(covariance_returns.shape) == 2
    assert len(index_weights.shape) == 1
    assert covariance_returns.shape[0] == covariance_returns.shape[1]  == index_weights.shape[0]

    #TODO: Implement function
    # One weight per ticker; long-only, and the weights must sum to one
    x = cvx.Variable(len(index_weights))
    constraints = [x >= 0, sum(x) == 1]

    # Portfolio variance x^T P x
    portfolio_variance = cvx.quad_form(x, covariance_returns)
    # L2 distance between the portfolio weights and the index weights
    index_dist = cvx.norm(x - index_weights, p=2)
    objective = cvx.Minimize(portfolio_variance + scale * index_dist)

    problem = cvx.Problem(objective, constraints)
    # Solve the problem; without this call x.value stays None
    problem.solve()

    return x.value

Tests Passed

Optimized Portfolio

Using the get_optimal_weights function, let's generate the optimal ETF weights without rebalancing. We can do this by feeding in the covariance of the entire history of data. We also need to feed in a set of index weights. We'll go with the index weights on the last date in the data.

raw_optimal_single_rebalance_etf_weights = get_optimal_weights(covariance_returns.values, index_weights.iloc[-1])
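# Repeat the single set of optimal weights for every date so it lines up with the returns
optimal_single_rebalance_etf_weights = pd.DataFrame(
    np.tile(raw_optimal_single_rebalance_etf_weights, (len(returns.index), 1)),
    returns.index,
    returns.columns)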

With our ETF weights built, let's compare it to the index. Run the next cell to calculate the ETF returns and compare it to the index returns.

optim_etf_returns = generate_weighted_returns(returns, optimal_single_rebalance_etf_weights)
optim_etf_cumulative_returns = calculate_cumulative_returns(optim_etf_returns)
project_helper.plot_benchmark_returns(index_weighted_cumulative_returns, optim_etf_cumulative_returns, 'Optimized ETF vs Index')

optim_etf_tracking_error = tracking_error(np.sum(index_weighted_returns, 1), np.sum(optim_etf_returns, 1))
print('Optimized ETF Tracking Error: {}'.format(optim_etf_tracking_error))
Optimized ETF Tracking Error: 0.05795012630412267

Rebalance Portfolio Over Time

The single optimized ETF portfolio used the same weights for the entire history. This might not be the optimal weights for the entire period. Let's rebalance the portfolio over the same period instead of using the same weights. Implement rebalance_portfolio to rebalance a portfolio.

Rebalance the portfolio every n days, where n is given as shift_size. When rebalancing, you should look back a certain number of days of data in the past, denoted as chunk_size. Using this data, compute the optimal weights using get_optimal_weights and get_covariance_returns.

def rebalance_portfolio(returns, index_weights, shift_size, chunk_size):
    """
    Get weights for each rebalancing of the portfolio.

    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    index_weights : DataFrame
        Index weight for each ticker and date
    shift_size : int
        The number of days between each rebalance
    chunk_size : int
        The number of days to look in the past for rebalancing

    Returns
    -------
    all_rebalance_weights  : list of Ndarrays
        The ETF weights for each point they are rebalanced
    """
    assert returns.index.equals(index_weights.index)
    assert returns.columns.equals(index_weights.columns)
    assert shift_size > 0
    assert chunk_size >= 0

    #TODO: Implement function
    all_rebalance_weights = []

    # Step through the data every shift_size days, starting once chunk_size days are available
    for i in range(chunk_size, len(returns), shift_size):
        # Look back over the previous chunk_size days
        chunks = returns.iloc[i - chunk_size:i]
        cov_returns = get_covariance_returns(chunks)
        # Use the index weights from the last day of the look-back window
        opt_weights = get_optimal_weights(cov_returns, index_weights.iloc[i-1])
        # Store the weights for this rebalance
        all_rebalance_weights.append(opt_weights)

    return all_rebalance_weights

Tests Passed

Run the following cell to rebalance the portfolio using rebalance_portfolio.

chunk_size = 250
shift_size = 5
all_rebalance_weights = rebalance_portfolio(returns, index_weights, shift_size, chunk_size)
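To see which rows each rebalance looks at, the sketch below lists the look-back windows for a small hypothetical example, then counts the rebalances for the settings used in this project (1009 rows of data, chunk_size of 250, shift_size of 5). It assumes the loop steps through range(chunk_size, len(returns), shift_size), as in the implementation above.

```python
# Hypothetical small example: 12 rows of returns, look back 4 days, rebalance every 3 days
n_rows, chunk_size, shift_size = 12, 4, 3

# Each tuple is the (start, end) of an iloc slice, with the end excluded
windows = [(i - chunk_size, i) for i in range(chunk_size, n_rows, shift_size)]
print(windows)

# With 1009 rows, chunk_size=250 and shift_size=5 (the values used in this project)
n_rebalances = len(range(250, 1009, 5))
print(n_rebalances)
```

Consecutive windows overlap whenever chunk_size is larger than shift_size, so most of the data behind one rebalance is reused by the next.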

Portfolio Turnover

With the portfolio rebalanced, we need to use a metric to measure the cost of rebalancing the portfolio. Implement get_portfolio_turnover to calculate the annual portfolio turnover. We'll be using the formulas used in the classroom:

$ AnnualizedTurnover =\frac{SumTotalTurnover}{NumberOfRebalanceEvents} * NumberOfRebalanceEventsPerYear $

$ SumTotalTurnover =\sum_{t,n}{\left | x_{t,n} - x_{t+1,n} \right |} $ where $ x_{t,n} $ are the weights at time $ t $ for equity $ n $.

$ SumTotalTurnover $ is just a different way of writing $ \sum \left | x_{t_1,n} - x_{t_2,n} \right | $

def get_portfolio_turnover(all_rebalance_weights, shift_size, rebalance_count, n_trading_days_in_year=252):
    """
    Calculate portfolio turnover.

    Parameters
    ----------
    all_rebalance_weights : list of Ndarrays
        The ETF weights for each point they are rebalanced
    shift_size : int
        The number of days between each rebalance
    rebalance_count : int
        Number of times the portfolio was rebalanced
    n_trading_days_in_year: int
        Number of trading days in a year

    Returns
    -------
    portfolio_turnover  : float
        The portfolio turnover
    """
    assert shift_size > 0
    assert rebalance_count > 0

    #TODO: Implement function
    # Change in each stock's weight between consecutive rebalances
    weight_changes = np.diff(all_rebalance_weights, axis=0)
    # Sum total turnover: absolute weight changes over all rebalances and stocks
    total_turnover = np.abs(weight_changes).sum()
    # Number of rebalance events per year
    num_rebalance_events = n_trading_days_in_year // shift_size
    turnover = (total_turnover / rebalance_count) * num_rebalance_events

    return turnover

Tests Passed
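The turnover formula is easiest to follow with numbers. Here is a small sketch on three made-up sets of weights for a two-stock portfolio, rebalanced every 5 days; all values are hypothetical.

```python
import numpy as np

# Hypothetical weights from three consecutive rebalances of a two-stock portfolio
all_rebalance_weights = [np.array([0.5, 0.5]),
                         np.array([0.6, 0.4]),
                         np.array([0.4, 0.6])]
shift_size = 5
rebalance_count = len(all_rebalance_weights) - 1  # two changes between three sets of weights

# |0.6-0.5| + |0.4-0.5| + |0.4-0.6| + |0.6-0.4| = 0.6
sum_total_turnover = np.abs(np.diff(all_rebalance_weights, axis=0)).sum()

# 252 // 5 = 50 rebalance events per year
events_per_year = 252 // shift_size
annualized_turnover = sum_total_turnover / rebalance_count * events_per_year
print(annualized_turnover)
```

The average turnover per rebalance is 0.3, and 50 rebalances per year scales that to an annualized turnover of 15.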

Run the following cell to get the portfolio turnover from get_portfolio_turnover.

print(get_portfolio_turnover(all_rebalance_weights, shift_size, len(all_rebalance_weights) - 1))

That's it! You've built a smart beta portfolio in part 1 and performed portfolio optimization in part 2. You can now submit your project.