1

In [1]:

using Distributions, DelimitedFiles, LinearAlgebra, Optim, Dates include("./printmat.jl") using Plots # gr(size=(600,400)) #pyplot(size=(600,400)) gr(size=(480,320)) default(fmt = :svg)

2

The data loading section is done for you, so you can focus on the other tasks.

3

In [2]:

File = "./Data/6_Portfolios_2x3.csv" x = readdlm(File,',',skipstart=1) println("The first few rows of x") printmat(x[1:4,:]) R = x[:,2:end] # Net returns of 6 assets. File = "./Data/FFmFactorsFs.csv" x = readdlm(File,',',skipstart=1) println("The first few rows of x") printmat(x[1:4,:]) (Rme,RSMB,RHML,RMOM,Rf) = (x[:,2],x[:,3],x[:,4],x[:,8],x[:,7]) # Market excess return, SMB, HML, MOM, Rf. Re = R .- Rf # Excess returns of the 6 assets. (x,R) = (nothing,nothing) # Cleaning up, help avoiding mistakes below. println("size of Re: ", size(Re))

4

The first few rows of x
197901.000 8.560 7.900 9.589 3.240 5.077 6.785
197902.000 -3.444 -2.230 -1.914 -3.712 -2.369 -2.823
197903.000 11.660 8.339 8.486 5.710 5.687 7.484
197904.000 1.651 2.998 3.310 0.193 0.599 0.635
The first few rows of x
197901.000 4.230 3.840 2.290 -2.500 1.580 0.770 -1.230 -0.750 5.550
197902.000 -3.560 0.530 1.210 -1.150 1.040 0.730 -1.010 0.960 1.320
197903.000 5.680 3.180 -0.700 0.700 0.280 0.810 2.870 -0.360 1.370
197904.000 -0.060 2.400 1.050 1.180 0.210 0.800 0.830 -0.390 -0.340
size of Re: (473, 6)

We run OLS of each of the returns on `(1, Rme, RSMB, RHML, RMOM)`

and report the regression coeffs and their t-stats. The returns are 6 portfolios formed on Size and Book-to-Market that can be retrieved from Kenneth R. French's website.

5

In [3]:

T,n = size(Re) x = [ones(T) Rme RSMB RHML RMOM] #regressors K = size(x,2) b = fill(NaN,(K,n)) stdb = copy(b) VarRes = fill(NaN,n) for i = 1:n y = Re[:,i] #standard OLS notation b_i = x\y u = y - x*b_i b[:,i] = b_i covb = inv(x'x)*var(u) #cov(b), see any textbook or Wikipedia stdb[:,i] = sqrt.(diag(covb)) #std(b) end println("\nOLS coefficients, regressing Re on [constant; Rme; RSMB; RHML; MOM] ") printlnPs(" Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6") printlnPs("Constant: ", b[1,:]) # all the means are printed by using printlnPS printlnPs("Rme: ", b[2,:]) printlnPs("RSMB: ", b[3,:]) printlnPs("RHML: ", b[4,:]) printlnPs("RMOM: ", b[5,:]) tstat = b./stdb println("tstats of OLS coefficients") printlnPs(" Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6") printlnPs("Constant: ", tstat[1,:]) # all the means are printed by using printlnPS printlnPs("Rme: ", tstat[2,:]) printlnPs("RSMB: ", tstat[3,:]) printlnPs("RHML: ", tstat[4,:]) printlnPs("RMOM: ", tstat[5,:])

6

OLS coefficients, regressing Re on [constant; Rme; RSMB; RHML; MOM]
Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6
Constant: -0.174 0.119 0.085 0.129 -0.051 -0.130
Rme: 1.079 0.952 0.982 0.979 0.988 1.076
RSMB: 1.035 0.817 0.865 -0.163 -0.147 0.007
RHML: -0.421 0.258 0.578 -0.260 0.333 0.742
RMOM: -0.044 -0.002 0.003 0.001 -0.010 -0.046
tstats of OLS coefficients
Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6
Constant: -4.209 3.514 3.021 4.024 -0.854 -2.543
Rme: 110.165 118.809 148.591 128.943 70.569 89.096
RSMB: 73.088 70.579 90.552 -14.805 -7.265 0.407
RHML: -28.526 21.396 58.085 -22.721 15.783 40.772
RMOM: -4.698 -0.324 0.529 0.140 -0.723 -4.012

Estimate the coefficients using Maximum Likelhood Estimation (MLE) of the returns on `(1, Rme, RSMB, RHML, RMOM)`

. Report the regression coeffs. The goal is to show that under normal error assumption, as is typically assumed in linear regression, MLE and the OLS lead to identical estimates.

Hint 1:

```
using Distributions, Optim
function loglike(rho)
#Calculate residuals
#Calculate $\sigma^2$ to initialize the appropriate normal distribution
#Returning the negative of the sum of the individual contributions to the log-likelihood
return -loglikelihood
end
output = fill(NaN,(size(),size())
for i = 1:???
#Find the MLE, \hat\rho.
params0 = []; # some random starting parameters
optimum = optimize(???,???,ConjugateGradient());
end
```

Hint 2: `optimum.minizer`

contains the estimates.

Hint 3: When experiencing troubles try different optimization methods:

- NelderMead()
- SimulatedAnnealing()
- GradientDescent()
- see http://julianlsolvers.github.io/Optim.jl/latest/#algo/nelder_mead/ for further methods

7

In [11]:

using Distributions, Optim x = [ones(T) Rme RSMB RHML RMOM] function loglike(rho) beta = rho[1:5] sigma2 = exp(rho[6]) residual = y-x*beta dist = Normal(0, sqrt(sigma2)) contributions = logpdf.(dist,residual) loglikelihood = sum(contributions) return -loglikelihood end MLE = fill(NaN,(6,size(Re,2))) for i = 1:n y = Re[:,i] global y params0 = [.1,.2,.3,.4,.5,.6] optimum = optimize(loglike,params0,NelderMead()) MLE[:,i] = optimum.minimizer MLE[6,i] = MLE[6,i]^2 end println("\nCoefficients from MLE, regressing Re on [constant; Rme; RSMB; RHML; MOM]") printlnPs(" Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6") printlnPs("Constant: ", MLE[1,:]) # all the means are printed by using printlnPS printlnPs("Rme: ", MLE[2,:]) printlnPs("RSMB: ", MLE[3,:]) printlnPs("RHML: ", MLE[4,:]) printlnPs("RMOM: ", MLE[5,:]) if round.(b,digits =3) == round.(MLE[1:5,:],digits =3) println("\nMLE and OLS results are the same.") else println("\nMLE and OLS results are different.") end

8

Coefficients from MLE, regressing Re on [constant; Rme; RSMB; RHML; MOM]
Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6
Constant: -0.174 0.119 0.085 0.129 -0.051 -0.130
Rme: 1.079 0.952 0.982 0.979 0.988 1.076
RSMB: 1.035 0.817 0.865 -0.163 -0.147 0.007
RHML: -0.421 0.258 0.578 -0.260 0.333 0.742
RMOM: -0.044 -0.002 0.003 0.001 -0.010 -0.046
MLE and OLS results are the same.

This excercise is for ambitious students who would like to receive a high mark.

Suppose that we wish to test if the the parameter estimates of $\rho$ are statistically different from zero. Suppose further that we do not know how to compute analytically the standard errors of the MLE parameter estimates.

We decide to (non-parametrically) bootstrap by resampling cases in order to estimate the standard errors. This means that we treat the sample of N individuals as if it were a population from which we randomly draw B samples, each of size N. This produces a sample of MLEs of size B, that is, it provides an empirical approximation to the distribution of the MLE. From the empirical approximation, we can compare the full-sample point MLE to the MLE distribution under the null hypothesis.

Bootstrap samples can be easily generated using the built-in function `sample()`

. Each bootstrap sample should be drawn with replacment from the original sample and should have the same number of observations.

Proceed in three steps:

1.) Make sure that the log likelihood function has both independent and dependent variables as an input. 2.)et up a bootstrap function for the standard errors (see Hint 1).

Hint 1:

```
B = 1000 # bootstrap repetitions
bootstrapSE = fill(NaN,(size(x,2),size(Re,2))) # pre-locate output
for i = 1:size(??,2) # match with the number of assets
Y = Re[:,i]; # select assets
samples = zeros(B,size(??,2)+1) # pre-locate samples
for b=1:B
#Draw an index with replacement using the sample function
params0 = [??]; # set some random starting parameters
#Collect optimal values for each bootstrap.
end
#Generate bootstrap standard errors by taking the empirical computing the empirical standard deviation of the estimates of each coefficient.
end
```

Hint 2: When running into errors, try to reduce the number of boostrap samples to something smaller than 1,000 and then gradually increase it.

9

In [12]:

function loglike(rho, y, x) beta = rho[1:5] residual = y - x*beta sigma2 = exp(rho[6]) dist = Normal(0, sqrt(sigma2)) contributions = logpdf.(dist,residual) loglikelihood = sum(contributions) return -loglikelihood end B = 1000 # bootstrap repetitions bootstrapSE = fill(NaN,(size(x,2)+1,size(Re,2))) # pre-locate output X = [ones(T) Rme RSMB RHML RMOM] for i = 1:size(Re,2) # match with the number of assets Y = Re[:,i]; # select assets samples = zeros(B,6) # pre-locate samples for b=1:B anIndex = sample(1:T,T, replace=true, ordered=false) y = Y[anIndex,:] x = X[anIndex,:] function wrapLoglike(rho) return loglike(rho,y,x) end params0 = [.1,.2,.3,.4,.5,.6] # set some random starting parameters samples[b,:] = optimize(wrapLoglike,params0,NelderMead()).minimizer #Collect optimal values for each bootstrap. end bootstrapSE[i,:] = std(samples, dims =1) end println("\nStandard Errors from bootstrapping procedure:") printlnPs(" Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6") printlnPs("Constant: ", bootstrapSE[:,1]') printlnPs("Rme: ", bootstrapSE[:,2]') printlnPs("RSMB: ", bootstrapSE[:,3]') printlnPs("RHML: ", bootstrapSE[:,4]') printlnPs("RMOM: ", bootstrapSE[:,5]')

10

Standard Errors from bootstrapping procedure:
Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6
Constant: 0.042 0.035 0.029 0.032 0.057 0.050
Rme: 0.012 0.010 0.007 0.009 0.017 0.014
RSMB: 0.019 0.014 0.012 0.016 0.032 0.022
RHML: 0.016 0.022 0.011 0.016 0.038 0.026
RMOM: 0.012 0.012 0.009 0.009 0.022 0.016

This excercise is for ambitious students who would like to receive a high mark. Please note that you do not need to solve **Task 3** in order to give an elaborate answer here.

Give a brief inpretation of your result i.e. why bootstrapped confidence bands provide better finite sample performance. Be brief (no more than 100 words).

11

**Comment**

Since bootstrapping creates multiple resamples with replacement from a single finite set of observations, even if the data is not exactly normally distributed, the large number of indepedent resamples will approach normal distribution due to Central Limit Theorem with a sufficiently large sample size.

This removes the need for the assumption of normal distribution to be made in the linear regression.

Hence, in the case where the original sample set is not normally distributed, the bootstrapped confidence bands which are constructed based on the 25th ranked value and 95th ranked value (assuming 1000 resamples) are likely to provide better performance than simply taking the 5th and 95th percentile values in the sample dataset for hypothesis testing.

12

13

Replicating OLS by MLE in **Task 1** might look boring and one is tempted to ask why going through the effort of deriving a log-likelihood function when OLS just works fine? Therefore, we now turn to a more hands-on application of MLE.

Imagine, that after your degree, you start working in the risk managament division of a large hedge fund on the Cayman Islands. On the first day your manager gives you the following task:

- You get some data (
`mledata.csv`

) containing the maximum drawdown every month of one of the funds (numbers are in percentages) - load the data and plot a histogram and make an educated guess about the underlying distribution (1-2 sentences are enough) - Estimate the parameters $k$ and $\theta$ of a gamma($k$,$\theta$) distribution using MLE
- Based on the estimated parameters, what is the expected maximum drawdown for next month?

Hin 1: use `SpecialFunctions`

to calculate $\Gamma(k)$.

`using Pkg Pkg.add("SpecialFunctions")`

Hint 2: the gamma distribution PDF takes the form:

$\frac{1}{\Gamma(k)\theta^k}x^{k-1}e^{-\frac{x}{\theta}}$.

Hint 3: The maximum drawdown in this dataset is the maximum pecentage loss in each month relative to the highest value.

14