Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutSign UpSign In
| Download
Views: 2873
Image: ubuntu2004
Kernel: Python 3 (system-wide)

Mean variance frontier with market data

** Camilo A. Garcia Trillos 2020**

In this notebook

We revisit the mean-variance frontier construction, this time using market data. We use a sub-sample of data, belonging to a selection of stocks in the market.

%matplotlib inline import numpy as np import pandas as pd import matplotlib.pyplot as plt import scipy.stats as stats from math import ceil

Data cleaning

We start by reading in data

data = pd.read_csv('~/Data/basket_20171201_10y.csv')
--------------------------------------------------------------------------- FileNotFoundError Traceback (most recent call last) <ipython-input-2-14adc80f4979> in <module> ----> 1 data = pd.read_csv('~/Data/basket_20171201_10y.csv') /usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision) 686 ) 687 --> 688 return _read(filepath_or_buffer, kwds) 689 690 /usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds) 452 453 # Create the parser. --> 454 parser = TextFileReader(fp_or_buf, **kwds) 455 456 if chunksize or iterator: /usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds) 946 self.options["has_index_names"] = kwds["has_index_names"] 947 --> 948 self._make_engine(self.engine) 949 950 def close(self): /usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py in _make_engine(self, engine) 1178 def _make_engine(self, engine="c"): 1179 if engine == "c": -> 1180 self._engine = CParserWrapper(self.f, **self.options) 1181 else: 1182 if engine == "python": /usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds) 2008 kwds["usecols"] = self.usecols 2009 -> 2010 self._reader = parsers.TextReader(src, **kwds) 2011 self.unnamed_cols = self._reader.unnamed_cols 2012 pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__() pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source() FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Data/basket_20171201_10y.csv'

This corresponds to 10 years of daily information for a selection of assets.

First of all, we need to summarise, understand, visualize and clean the data. Let us look first at the different entries and their types:

data.dtypes
ticker object date object open float64 high float64 low float64 close float64 volume float64 ex-dividend float64 split_ratio float64 adj_open float64 adj_high float64 adj_low float64 adj_close float64 adj_volume float64 dtype: object

Note that once again the 'date' field is not recognised as a data but as an object. Let us have a look at how it looks before we decide to transform it

data.head()
ticker date open high low close volume ex-dividend split_ratio adj_open adj_high adj_low adj_close adj_volume
0 BAC 2007-01-03 53.40 54.18 52.99 53.33 16028200.0 0.0 1.0 44.077816 44.721649 43.739391 44.020036 16028200.0
1 BAC 2007-01-04 53.33 53.89 53.05 53.67 13175000.0 0.0 1.0 44.020036 44.482276 43.788917 44.300682 13175000.0
2 BAC 2007-01-05 53.59 53.59 53.03 53.24 10205000.0 0.0 1.0 44.234647 44.234647 43.772408 43.945748 10205000.0
3 BAC 2007-01-08 53.46 53.64 52.80 53.45 9685900.0 0.0 1.0 44.127342 44.275919 43.582560 44.119088 9685900.0
4 BAC 2007-01-09 53.60 53.71 52.97 53.50 12546500.0 0.0 1.0 44.242902 44.333699 43.722882 44.160359 12546500.0

It does look like a date with the format 'yyy-mm-dd'. We can then transform this field and update it.

data['date']=pd.to_datetime(data['date'], yearfirst=True)

Let us check that it looks the same, but that the field is now a date

data.head()
ticker date open high low close volume ex-dividend split_ratio adj_open adj_high adj_low adj_close adj_volume
0 BAC 2007-01-03 53.40 54.18 52.99 53.33 16028200.0 0.0 1.0 44.077816 44.721649 43.739391 44.020036 16028200.0
1 BAC 2007-01-04 53.33 53.89 53.05 53.67 13175000.0 0.0 1.0 44.020036 44.482276 43.788917 44.300682 13175000.0
2 BAC 2007-01-05 53.59 53.59 53.03 53.24 10205000.0 0.0 1.0 44.234647 44.234647 43.772408 43.945748 10205000.0
3 BAC 2007-01-08 53.46 53.64 52.80 53.45 9685900.0 0.0 1.0 44.127342 44.275919 43.582560 44.119088 9685900.0
4 BAC 2007-01-09 53.60 53.71 52.97 53.50 12546500.0 0.0 1.0 44.242902 44.333699 43.722882 44.160359 12546500.0
data.dtypes
ticker object date datetime64[ns] open float64 high float64 low float64 close float64 volume float64 ex-dividend float64 split_ratio float64 adj_open float64 adj_high float64 adj_low float64 adj_close float64 adj_volume float64 dtype: object

Let su check if elements are unique per date.

print( len(np.unique(data['date'])), len(data['date']))
2749 24741

Not equal... because we have several indices. So this time the index should be a combination of ticker and date. However, since we are going to work fron now on only with adjusted_close prices, an alternative is to "pivot" the table, so that the 'ticker', which identifies the asset, becomes now another entry.

P = data.pivot(index='date', columns='ticker', values='adj_close') P.head()
ticker BAC C GE JNJ MSFT PG T WMT XOM
date
2007-01-03 44.020036 484.789018 25.827146 47.613096 22.969861 46.463307 19.568303 36.764375 55.121310
2007-01-04 44.300682 483.121870 25.677502 48.208260 22.931399 46.110549 19.316351 36.942205 54.087460
2007-01-05 43.945748 480.577276 25.548264 47.770850 22.800626 45.714596 19.014008 36.640668 54.474224
2007-01-08 44.119088 483.034125 25.541462 47.691973 23.023709 45.815384 19.128786 36.339130 54.035396
2007-01-09 44.160359 478.822383 25.541462 47.512707 23.046786 45.700197 19.202336 36.640668 53.618881

The dataframe 'P' has only information of the 'adjusted close price' for each date for each one of the companies that are in the database, namely BAC (Bank of America), C (Citybank), GE (General Electric), JNJ (Johnson and Johnson), MSFT (Microsoft), PG (Proctor & Gamble), T (AT&T), WMT (Wallmart), and XOM (Exxon Mobile). All of these are large companies from different economic sectors including finance, technology, energy, retail and pharmaceutical.

Let us check that we did not loose any data when creating our pivot table.

P.isnull().any()
ticker BAC False C False GE False JNJ False MSFT False PG False T False WMT False XOM False dtype: bool

No null data, so all fields are complete. Now, let us check that the data makes sense visually.

P.plot()
<AxesSubplot:xlabel='date'>
Image in a Jupyter notebook

That 'C' line look suspicious... until we realise it belongs to Citygroup: a huge drop in between 2008-2010 is not so starnge if we think about the financial crisis of the time.

Efficient frontier calculation

We are now going to build the efficient frontier in this reduced portfolio.

  • Using one week as your period, find the mean_variance frontier if we assume that these are all the stocks on the market, and only using information before 01/01/2012.

  • We find the minimal variance portfolio (note there is no risk free asset).

Building the log-returns

Let us now resample using only weekly data...

P=P.resample('W-FRI').last() P.head()
ticker BAC C GE JNJ MSFT PG T WMT XOM
date
2007-01-05 43.945748 480.577276 25.548264 47.770850 22.800626 45.714596 19.014008 36.640668 54.474224
2007-01-12 44.061308 477.155236 25.772730 47.785192 24.008351 46.794468 19.649297 37.096840 54.042834
2007-01-19 44.234647 478.208171 25.133343 48.588304 23.931426 47.804842 19.841660 37.351987 54.689920
2007-01-26 42.955235 479.699830 24.534768 47.376465 23.539108 46.915146 20.594138 36.857156 54.749422
2007-02-02 43.533034 484.344234 24.670808 47.742168 23.223714 47.262344 21.567268 37.174157 56.184911
type(P)
pandas.core.frame.DataFrame

Replot to check we did not do anything wrong

P.plot()
<AxesSubplot:xlabel='date'>
Image in a Jupyter notebook

Good. Now, let us calculate both the returns and the log-returns.

logR=np.log(P).diff() logR.drop(logR.index[0],inplace=True) # We drop the first line which produces NAN logR.head()
ticker BAC C GE JNJ MSFT PG T WMT XOM
date
2007-01-12 0.002626 -0.007146 0.008748 0.000300 0.051614 0.023347 0.032866 0.012373 -0.007951
2007-01-19 0.003926 0.002204 -0.025122 0.016667 -0.003209 0.021362 0.009742 0.006854 0.011902
2007-01-26 -0.029350 0.003114 -0.024104 -0.025257 -0.016529 -0.018786 0.037223 -0.013336 0.001087
2007-02-02 0.013362 0.009635 0.005529 0.007689 -0.013489 0.007373 0.046170 0.008564 0.025881
2007-02-09 0.004729 -0.023321 -0.020614 -0.014829 -0.040905 -0.014025 -0.032528 -0.002290 0.000024
logR.head()
ticker BAC C GE JNJ MSFT PG T WMT XOM
date
2007-01-12 0.002626 -0.007146 0.008748 0.000300 0.051614 0.023347 0.032866 0.012373 -0.007951
2007-01-19 0.003926 0.002204 -0.025122 0.016667 -0.003209 0.021362 0.009742 0.006854 0.011902
2007-01-26 -0.029350 0.003114 -0.024104 -0.025257 -0.016529 -0.018786 0.037223 -0.013336 0.001087
2007-02-02 0.013362 0.009635 0.005529 0.007689 -0.013489 0.007373 0.046170 0.008564 0.025881
2007-02-09 0.004729 -0.023321 -0.020614 -0.014829 -0.040905 -0.014025 -0.032528 -0.002290 0.000024
logR.plot()
<AxesSubplot:xlabel='date'>
Image in a Jupyter notebook

Remember that negative results of log-returns means returns lower then 1 (i.e. effective losses)

Stationarity check

How stationary are the data? Let us look first at the log-returns. I will simply plot the mean and variance with a rolling window of 10 observations.

logR.rolling(20).mean().plot(alpha=0.5) plt.title('Log returns')
Text(0.5, 1.0, 'Log returns')
Image in a Jupyter notebook

These values seem to be significantly bigger initially. What about standard deviation?

logR.rolling(20).std().plot(alpha=0.5)
<AxesSubplot:xlabel='date'>
Image in a Jupyter notebook

It looks that there is a significant ``change of regime'' around 2010. It is of course associated with the results of the financial crisis. We are going to consider first the mean variance using the entire history from 2007 to 2010 and then compare with the result from 2010 on.

Note: In class we worked with mean-varaince analysis of returns. It is often convenient to work with log-retunrs instead: on the one hand, log-returns are usually closer to sattionary. On the other hand, they tend to look closer to Gaussian

Mean-variance frontier construction

We start by calculating, the mean and variance for the selected history on the first case

logR_sub = logR[logR.index< '2012-01-01'] mu = logR_sub.mean() Sigma = logR_sub.cov() Rho = logR_sub.corr() print('Mean:', mu) print('Variances', np.diag(Sigma)) print('Var-cov:', Sigma) print('Correlation', Rho)
Mean: ticker BAC -0.008158 C -0.011287 GE -0.002101 JNJ 0.000542 MSFT -0.000119 PG 0.000720 T 0.000589 WMT 0.001307 XOM 0.000983 dtype: float64 Variances [0.01065754 0.01643106 0.00302104 0.00060041 0.00164302 0.00067786 0.00114917 0.0008024 0.00119484] Var-cov: ticker BAC C GE JNJ MSFT PG T \ ticker BAC 0.010658 0.010821 0.003996 0.001206 0.001436 0.001176 0.001680 C 0.010821 0.016431 0.004706 0.001277 0.001674 0.001060 0.002169 GE 0.003996 0.004706 0.003021 0.000551 0.000855 0.000614 0.000794 JNJ 0.001206 0.001277 0.000551 0.000600 0.000493 0.000410 0.000463 MSFT 0.001436 0.001674 0.000855 0.000493 0.001643 0.000460 0.000587 PG 0.001176 0.001060 0.000614 0.000410 0.000460 0.000678 0.000492 T 0.001680 0.002169 0.000794 0.000463 0.000587 0.000492 0.001149 WMT 0.001000 0.000953 0.000495 0.000375 0.000495 0.000437 0.000505 XOM 0.001490 0.001562 0.000762 0.000490 0.000745 0.000483 0.000675 ticker WMT XOM ticker BAC 0.001000 0.001490 C 0.000953 0.001562 GE 0.000495 0.000762 JNJ 0.000375 0.000490 MSFT 0.000495 0.000745 PG 0.000437 0.000483 T 0.000505 0.000675 WMT 0.000802 0.000479 XOM 0.000479 0.001195 Correlation ticker BAC C GE JNJ MSFT PG T \ ticker BAC 1.000000 0.817741 0.704303 0.476717 0.343086 0.437369 0.480161 C 0.817741 1.000000 0.668015 0.406486 0.322244 0.317525 0.499051 GE 0.704303 0.668015 1.000000 0.409441 0.383542 0.428870 0.426370 JNJ 0.476717 0.406486 0.409441 1.000000 0.496616 0.642216 0.557587 MSFT 0.343086 0.322244 0.383542 0.496616 1.000000 0.435775 0.427469 PG 0.437369 0.317525 0.428870 0.642216 0.435775 1.000000 0.557387 T 0.480161 0.499051 0.426370 0.557587 0.427469 0.557387 1.000000 WMT 0.341866 0.262506 0.318131 0.540748 0.431313 0.592972 0.526187 XOM 0.417678 0.352628 0.401172 0.579001 0.531398 0.536973 0.575743 ticker WMT XOM ticker BAC 0.341866 0.417678 C 0.262506 0.352628 GE 0.318131 0.401172 JNJ 0.540748 0.579001 MSFT 0.431313 0.531398 PG 0.592972 0.536973 T 0.526187 0.575743 WMT 1.000000 0.489380 XOM 0.489380 1.000000

Correlation are positive as expected... means seem reasonable.

Before calculating the optimal portfolio, we transform the variable Sigma in a numpy matrix and calculate its condition number to establish the stability of its inverse.

Sigma=Sigma.values np.linalg.cond(Sigma)
128.7153892613439

It is a large number, but still OK. We do not need to stabilise the inverse.

Now, we are going to re-define the Markowitz weights function to find the mean-variance frontier.

def Markowitz_weights(Sigma, mean_returns, mu_p): S_inv = np.linalg.inv(Sigma) pi_mu = S_inv @ (mean_returns)/sum(S_inv @ mean_returns) pi_1 = np.sum(S_inv, axis = 1) / sum(np.sum(S_inv, axis = 1) ) lambda_demoninator = (mean_returns @ pi_mu) - (mean_returns@pi_1) ll = np.array((mu_p - (mean_returns @ pi_1))/lambda_demoninator) # previous line: to convert into array in case that mu_p is a single number ll.shape=(ll.size,1) return pi_mu * ll + pi_1 * (1-ll)
# Using the function: note that Sigma and mean are Series, so we turn them into arrays by using the property 'values' print(mu) print('======') #mup= np.linspace(0.999,1.003) mup= np.linspace(-0.002,0.004) mvf_portfolios = Markowitz_weights(Sigma, mu,mup ) print(mvf_portfolios[0:3])
ticker BAC -0.008158 C -0.011287 GE -0.002101 JNJ 0.000542 MSFT -0.000119 PG 0.000720 T 0.000589 WMT 0.001307 XOM 0.000983 dtype: float64 ====== [[ 0.04274296 0.15931801 0.02639219 0.59758119 0.20169459 0.41534182 -0.14901258 -0.06063188 -0.23342629] [ 0.03794259 0.15316594 0.02884724 0.59361264 0.19470381 0.40944844 -0.14098252 -0.05195897 -0.22477917] [ 0.03314222 0.14701386 0.03130228 0.5896441 0.18771302 0.40355506 -0.13295245 -0.04328606 -0.21613205]]

Let us calculate the means and variances of those portfolios.

mu_mvf = mvf_portfolios @ mu sd_mvf = np.diag((( mvf_portfolios @ Sigma) @ mvf_portfolios.T ))**0.5

We can also calculate the portfolio with the (quasi-) minimal variance portfolio

sigma_inv = np.linalg.inv(Sigma) ones_vec = np.ones(Sigma.shape[0]) min_var_p = sigma_inv@ones_vec min_var_p/=min_var_p.sum() plt.bar(R.columns, min_var_p) print(R.columns) print(min_var_p)
Index(['BAC', 'C', 'GE', 'JNJ', 'MSFT', 'PG', 'T', 'WMT', 'XOM'], dtype='object', name='ticker') [-0.09416917 -0.01614631 0.0964129 0.48439373 0.00230924 0.24725583 0.08001433 0.18672968 0.01319977]
Image in a Jupyter notebook

So that the minimal variance portfolio invests mostly on health and consumption (Johnson and Johnson, Proctor and Gamble, and Wallmart). Note that this is a risky strategy because it is too centered on one type of asset.

Let us check that everything is correct. We plot our surface and the minimum variance portfolio. To do so, we generate a bunch of random portfolios to visually verify that our calculated values are right.

#Generating random portfolios random_portfolios = 2.*np.random.rand(1000,Sigma.shape[0]) -1 random_portfolios/= random_portfolios.sum(axis=1)[..., np.newaxis] rand_mean = np.dot(random_portfolios, mu) rand_sd = np.diag((random_portfolios@Sigma)@random_portfolios.T)**0.5 rand_sd #Calculating mean and variance of minimal variance portfolio min_var_mean = np.dot(min_var_p, mu) min_var_sd = min_var_p.dot(Sigma).dot(min_var_p.T)**0.5 print(min_var_sd, min_var_mean) max_sd_plot=0.06 #Making the plot plt.plot(rand_sd[rand_sd <max_sd_plot], rand_mean[rand_sd < max_sd_plot], 'b.') plt.plot(sd_mvf, mu_mvf, 'go') plt.plot(min_var_sd, min_var_mean, 'rx')
0.02063705581048089 0.001492388376839055
[<matplotlib.lines.Line2D at 0x7f2c334cfd00>]
Image in a Jupyter notebook

Stability in time

What happens if we consider only recent history? If we had perfect stationarity, the optimal should still be very close to the one we found in the previous part. Let us make some calculations

logR_sub2 = logR[(logR.index< '2012-01-01') & (logR.index> '2010-01-01')] mu2 = logR_sub2.mean().values Sigma2 = logR_sub2.cov().values Rho2 = logR_sub2.corr().values print('Mean:', mu2) print('Variances', np.diag(Sigma2)) print('Var-cov:', Sigma2) print('Correlation', Rho2) mvf_portfolios2 = Markowitz_weights(Sigma2, mu2, mup) mu2_mvf = mvf_portfolios2 @ mu2 sd2_mvf = np.diag((( mvf_portfolios2 @ Sigma2) @ mvf_portfolios2.T ))**0.5 print(np.linalg.cond(Sigma2)) sigma2_inv = np.linalg.inv(Sigma2) ones_vec = np.ones(Sigma2.shape[0]) min_var_p2 = sigma2_inv@ones_vec min_var_p2/=min_var_p2.sum() plt.bar(R.columns, min_var_p2) print(R.columns) print(min_var_p2)
Mean: [-0.00950856 -0.00219951 0.00220996 0.00085082 -0.00109434 0.00151515 0.00188769 0.00155423 0.00257817] Variances [0.00311498 0.00377738 0.00168709 0.00036972 0.00095914 0.00035433 0.00046654 0.00036697 0.00091758] Var-cov: [[0.00311498 0.00284224 0.00164264 0.00047208 0.00086103 0.00042768 0.00067589 0.00043497 0.00107392] [0.00284224 0.00377738 0.00180427 0.00052624 0.00108874 0.00050054 0.00083692 0.00047916 0.001286 ] [0.00164264 0.00180427 0.00168709 0.00041486 0.00086875 0.00044694 0.00058337 0.00046338 0.00093544] [0.00047208 0.00052624 0.00041486 0.00036972 0.00028924 0.00019963 0.00021918 0.00016211 0.00036518] [0.00086103 0.00108874 0.00086875 0.00028924 0.00095914 0.00029394 0.00037959 0.00031198 0.00062288] [0.00042768 0.00050054 0.00044694 0.00019963 0.00029394 0.00035433 0.00023336 0.00021372 0.00033215] [0.00067589 0.00083692 0.00058337 0.00021918 0.00037959 0.00023336 0.00046654 0.00022388 0.00044079] [0.00043497 0.00047916 0.00046338 0.00016211 0.00031198 0.00021372 0.00022388 0.00036697 0.00031368] [0.00107392 0.001286 0.00093544 0.00036518 0.00062288 0.00033215 0.00044079 0.00031368 0.00091758]] Correlation [[1. 0.82858805 0.71655018 0.43989882 0.49813507 0.40708948 0.56066372 0.40683055 0.63521888] [0.82858805 1. 0.71472261 0.44530459 0.57199022 0.43265766 0.63044506 0.40697403 0.6907519 ] [0.71655018 0.71472261 1. 0.52529064 0.68294018 0.57806134 0.65755684 0.58891775 0.75184149] [0.43989882 0.44530459 0.52529064 1. 0.48572049 0.55154699 0.52774308 0.44010584 0.62698097] [0.49813507 0.57199022 0.68294018 0.48572049 1. 0.50421683 0.56745031 0.52585777 0.66395411] [0.40708948 0.43265766 0.57806134 0.55154699 0.50421683 1. 0.57396207 0.59268311 0.58251898] [0.56066372 0.63044506 0.65755684 0.52774308 0.56745031 0.57396207 1. 0.54107775 0.6736912 ] [0.40683055 0.40697403 0.58891775 0.44010584 0.52585777 0.59268311 0.54107775 1. 0.54057532] [0.63521888 0.6907519 0.75184149 0.62698097 0.66395411 0.58251898 0.6736912 0.54057532 1. ]] 66.18958930410454 Index(['BAC', 'C', 'GE', 'JNJ', 'MSFT', 'PG', 'T', 'WMT', 'XOM'], dtype='object', name='ticker') [ 0.02480495 -0.06254287 -0.17280838 0.35353751 0.04500131 0.26564107 0.27430629 0.34214459 -0.07008448]
Image in a Jupyter notebook

The first big change is that in the minimal variance portfolio, there is no longer room for a positive position in GE or Exxon Mobile. On the other hand Bank of America makes its comeback: Note also the reduction in exposition to JNJ.

#Generating random portfolios random_portfolios2 = 2.*np.random.rand(1000,Sigma2.shape[0]) -1 random_portfolios2/= random_portfolios2.sum(axis=1)[..., np.newaxis] rand_mean2 = np.dot(random_portfolios2, mu2) rand_sd2 = np.diag((random_portfolios2@Sigma2)@random_portfolios2.T)**0.5 #Calculating mean and variance of minimal varaince portfolio min_var_mean2 = np.dot(min_var_p2, mu2) min_var_sd2 = min_var_p.dot(Sigma2).dot(min_var_p2.T)**0.5 print(min_var_sd2, min_var_mean2) max_sd_plot=0.06 #Making the plot plt.plot(rand_sd2[rand_sd2 <max_sd_plot], rand_mean2[rand_sd2 < max_sd_plot], 'b.') plt.plot(sd2_mvf, mu2_mvf, 'go') plt.plot(min_var_sd2, min_var_mean2, 'rx')
0.013923096082564245 0.0010427265331609799
[<matplotlib.lines.Line2D at 0x7f2c3342a2e0>]
Image in a Jupyter notebook

Note that with this data, the mean variance frontier achieves the same returns with much smaller variance.

How does the efficient portfolio calculated with the data of the first part do on this second part?

#Making the plot mu1in2_mvf = mvf_portfolios @ mu2 sd1in2_mvf = np.diag((( mvf_portfolios @ Sigma2) @ mvf_portfolios.T ))**0.5 plt.plot(rand_sd2[rand_sd2 <max_sd_plot], rand_mean2[rand_sd2 < max_sd_plot], 'b.') plt.plot(sd2_mvf, mu2_mvf, 'go') plt.plot(sd1in2_mvf, mu1in2_mvf, 'y*') plt.plot(min_var_sd2, min_var_mean2, 'rx')
[<matplotlib.lines.Line2D at 0x7f2c33382c40>]
Image in a Jupyter notebook

We can see that the efficient part of the frontier is not doing that bad.

Let us move forward in time a couple of years and evaluate how these two portfolios would do in the period 2015-.

R_sub3 = R[(R.index> '2015-01-01')] mu3 = R_sub3.mean().values Sigma3 = R_sub3.cov().values Rho3 = R_sub3.corr().values print('Mean:', mu3) print('Variances', np.diag(Sigma3)) print('Var-cov:', Sigma3) print('Correlation', Rho3) mvf_portfolios3 = Markowitz_weights(Sigma3, mu3, mup) mu3_mvf = mvf_portfolios3 @ mu3 sd3_mvf = np.diag((( mvf_portfolios3 @ Sigma3) @ mvf_portfolios3.T ))**0.5 print(np.linalg.cond(Sigma3)) sigma3_inv = np.linalg.inv(Sigma3) ones_vec = np.ones(Sigma3.shape[0]) min_var_p3 = sigma3_inv@ones_vec min_var_p3/=min_var_p3.sum() plt.bar(R.columns, min_var_p3) print(R.columns) print(min_var_p3)
Mean: [ 3.17089980e-03 2.31414528e-03 -1.79822463e-03 2.41587099e-03 4.18994786e-03 4.04470847e-04 1.44399790e-03 1.23922279e-03 -2.76520721e-05] Variances [0.00148144 0.00116645 0.00096945 0.00032701 0.00095902 0.00035725 0.00046562 0.00068835 0.00048457] Var-cov: [[1.48144347e-03 1.16573124e-03 4.35206897e-04 1.81572602e-04 4.49598304e-04 7.74107987e-05 1.37466132e-04 1.28871207e-04 3.04613149e-04] [1.16573124e-03 1.16644598e-03 4.71421081e-04 1.88652743e-04 4.15244861e-04 1.25625235e-04 1.65589075e-04 7.98007726e-05 2.87273708e-04] [4.35206897e-04 4.71421081e-04 9.69450039e-04 2.12555959e-04 2.65003065e-04 1.66592903e-04 2.40478405e-04 8.25903973e-06 2.70597336e-04] [1.81572602e-04 1.88652743e-04 2.12555959e-04 3.27014030e-04 2.42208339e-04 1.52696242e-04 1.02605747e-04 1.18529988e-04 1.36443676e-04] [4.49598304e-04 4.15244861e-04 2.65003065e-04 2.42208339e-04 9.59021380e-04 2.65334278e-04 1.44848437e-04 2.23905822e-04 2.17360718e-04] [7.74107987e-05 1.25625235e-04 1.66592903e-04 1.52696242e-04 2.65334278e-04 3.57253412e-04 1.35743174e-04 1.44619226e-04 1.20544663e-04] [1.37466132e-04 1.65589075e-04 2.40478405e-04 1.02605747e-04 1.44848437e-04 1.35743174e-04 4.65622780e-04 6.59445138e-05 1.44725823e-04] [1.28871207e-04 7.98007726e-05 8.25903973e-06 1.18529988e-04 2.23905822e-04 1.44619226e-04 6.59445138e-05 6.88354591e-04 7.13161133e-05] [3.04613149e-04 2.87273708e-04 2.70597336e-04 1.36443676e-04 2.17360718e-04 1.20544663e-04 1.44725823e-04 7.13161133e-05 4.84568573e-04]] Correlation [[1. 0.8867957 0.36315373 0.26087027 0.37719678 0.10640715 0.16551451 0.12761662 0.35952449] [0.8867957 1. 0.44331657 0.30545526 0.39260679 0.19460607 0.22468923 0.08905713 0.38210783] [0.36315373 0.44331657 1. 0.37750906 0.27483626 0.28307765 0.35792848 0.01011022 0.39480549] [0.26087027 0.30545526 0.37750906 1. 0.43250562 0.44674219 0.26294892 0.24982675 0.34276194] [0.37719678 0.39260679 0.27483626 0.43250562 1. 0.45330554 0.21676171 0.27557838 0.31885211] [0.10640715 0.19460607 0.28307765 0.44674219 0.45330554 1. 0.33282251 0.29162969 0.28972246] [0.16551451 0.22468923 0.35792848 0.26294892 0.21676171 0.33282251 1. 0.1164811 0.30468516] [0.12761662 0.08905713 0.01011022 0.24982675 0.27557838 0.29162969 0.1164811 1. 0.12348208] [0.35952449 0.38210783 0.39480549 0.34276194 0.31885211 0.28972246 0.30468516 0.12348208 1. ]] 24.900558004055362 Index(['BAC', 'C', 'GE', 'JNJ', 'MSFT', 'PG', 'T', 'WMT', 'XOM'], dtype='object', name='ticker') [ 0.03548436 -0.00513189 -0.0128339 0.29949406 -0.06414959 0.23864434 0.20060666 0.14062327 0.16726271]
Image in a Jupyter notebook
#Generating random portfolios random_portfolios3 = 2.*np.random.rand(1000,Sigma3.shape[0]) -1 random_portfolios3/= random_portfolios3.sum(axis=1)[..., np.newaxis] rand_mean3 = np.dot(random_portfolios3, mu3) rand_sd3 = np.diag((random_portfolios3@Sigma2)@random_portfolios3.T)**0.5 #Calculating mean and variance of minimal varaince portfolio min_var_mean3 = np.dot(min_var_p3, mu3) min_var_sd3 = min_var_p.dot(Sigma3).dot(min_var_p3.T)**0.5 print(min_var_sd3, min_var_mean3) max_sd_plot=0.06 #Calculating results of previous optimals mu2in3_mvf = mvf_portfolios2 @ mu3 sd2in3_mvf = np.diag((( mvf_portfolios2 @ Sigma3) @ mvf_portfolios2.T ))**0.5 mu1in3_mvf = mvf_portfolios @ mu3 sd1in3_mvf = np.diag((( mvf_portfolios @ Sigma3) @ mvf_portfolios.T ))**0.5 #Making the plot plt.plot(rand_sd3[rand_sd3 <max_sd_plot], rand_mean3[rand_sd3 < max_sd_plot], 'b.') plt.plot(sd3_mvf, mu3_mvf, 'go') plt.plot(min_var_sd3, min_var_mean3, 'rx') plt.plot(sd1in3_mvf, mu1in3_mvf, '*', color='orange') plt.plot(sd2in3_mvf, mu2in3_mvf, 'y*')
0.013478190320770416 0.0011343138763670057
[<matplotlib.lines.Line2D at 0x7f2c332e3e50>]
Image in a Jupyter notebook

Note that the efficient portfolios calculated with the first data-base (all before 2010) are subperforming (even if they are not that bad). This shows that periodic recalculation is needed. If trading costs are important, this might reduce the possibilities to keep true efficient portfolios in time.