SharedProblemSets_BLANK / PS06 / PS06_MLE_v2.ipynbOpen in CoCalc
Author: Chi Hern Ang
Views : 18

In [1]:
using Distributions, DelimitedFiles, LinearAlgebra, Optim, Dates

include("./printmat.jl")

using Plots
# gr(size=(600,400))
#pyplot(size=(600,400))
gr(size=(480,320))
default(fmt = :svg)


In [2]:
File = "./Data/6_Portfolios_2x3.csv"

println("The first few rows of x")
printmat(x[1:4,:])
R    = x[:,2:end]             # Net returns of 6 assets.

File = "./Data/FFmFactorsFs.csv"

println("The first few rows of x")
printmat(x[1:4,:])
(Rme,RSMB,RHML,RMOM,Rf) = (x[:,2],x[:,3],x[:,4],x[:,8],x[:,7])  # Market excess return, SMB, HML, MOM, Rf.

Re = R .- Rf                  # Excess returns of the 6 assets.
(x,R) = (nothing,nothing)     # Cleaning up, help avoiding mistakes below.

println("size of Re: ", size(Re))

The first few rows of x 197901.000 8.560 7.900 9.589 3.240 5.077 6.785 197902.000 -3.444 -2.230 -1.914 -3.712 -2.369 -2.823 197903.000 11.660 8.339 8.486 5.710 5.687 7.484 197904.000 1.651 2.998 3.310 0.193 0.599 0.635 The first few rows of x 197901.000 4.230 3.840 2.290 -2.500 1.580 0.770 -1.230 -0.750 5.550 197902.000 -3.560 0.530 1.210 -1.150 1.040 0.730 -1.010 0.960 1.320 197903.000 5.680 3.180 -0.700 0.700 0.280 0.810 2.870 -0.360 1.370 197904.000 -0.060 2.400 1.050 1.180 0.210 0.800 0.830 -0.390 -0.340 size of Re: (473, 6)

## Linear regressions

We run OLS of each of the returns on (1, Rme, RSMB, RHML, RMOM) and report the regression coeffs and their t-stats. The returns are 6 portfolios formed on Size and Book-to-Market that can be retrieved from Kenneth R. French's website.

In [3]:
T,n = size(Re)

x      = [ones(T) Rme RSMB RHML RMOM]     #regressors
K      = size(x,2)

b      = fill(NaN,(K,n))
stdb   = copy(b)
VarRes = fill(NaN,n)
for i = 1:n
y         = Re[:,i]                   #standard OLS notation
b_i       = x\y
u         = y - x*b_i
b[:,i]    = b_i
covb      = inv(x'x)*var(u)           #cov(b), see any textbook or Wikipedia
stdb[:,i] = sqrt.(diag(covb))         #std(b)
end

println("\nOLS coefficients, regressing Re on [constant; Rme; RSMB; RHML; MOM] ")
printlnPs("             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",   b[1,:]) # all the means are printed by using printlnPS
printlnPs("Rme:      ",   b[2,:])
printlnPs("RSMB:     ",   b[3,:])
printlnPs("RHML:     ",   b[4,:])
printlnPs("RMOM:     ",   b[5,:])

tstat = b./stdb
println("tstats of OLS coefficients")
printlnPs("             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",   tstat[1,:]) # all the means are printed by using printlnPS
printlnPs("Rme:      ",   tstat[2,:])
printlnPs("RSMB:     ",   tstat[3,:])
printlnPs("RHML:     ",   tstat[4,:])
printlnPs("RMOM:     ",   tstat[5,:])

OLS coefficients, regressing Re on [constant; Rme; RSMB; RHML; MOM] Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6 Constant: -0.174 0.119 0.085 0.129 -0.051 -0.130 Rme: 1.079 0.952 0.982 0.979 0.988 1.076 RSMB: 1.035 0.817 0.865 -0.163 -0.147 0.007 RHML: -0.421 0.258 0.578 -0.260 0.333 0.742 RMOM: -0.044 -0.002 0.003 0.001 -0.010 -0.046 tstats of OLS coefficients Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6 Constant: -4.209 3.514 3.021 4.024 -0.854 -2.543 Rme: 110.165 118.809 148.591 128.943 70.569 89.096 RSMB: 73.088 70.579 90.552 -14.805 -7.265 0.407 RHML: -28.526 21.396 58.085 -22.721 15.783 40.772 RMOM: -4.698 -0.324 0.529 0.140 -0.723 -4.012

Estimate the coefficients using Maximum Likelhood Estimation (MLE) of the returns on (1, Rme, RSMB, RHML, RMOM). Report the regression coeffs. The goal is to show that under normal error assumption, as is typically assumed in linear regression, MLE and the OLS lead to identical estimates.

Hint 1:

using Distributions, Optim

function loglike(rho)
#Calculate residuals
#Calculate  $\sigma^2$ to initialize the appropriate normal distribution
#Returning the negative of the sum of the individual contributions to the log-likelihood
return -loglikelihood
end

output = fill(NaN,(size(),size())
for i = 1:???

#Find the MLE, \hat\rho.
params0 = []; # some random starting parameters
end


Hint 2: optimum.minizer contains the estimates.

Hint 3: When experiencing troubles try different optimization methods:

In [11]:
using Distributions, Optim

x = [ones(T) Rme RSMB RHML RMOM]

function loglike(rho)
beta = rho[1:5]
sigma2 = exp(rho[6])
residual = y-x*beta
dist = Normal(0, sqrt(sigma2))
contributions = logpdf.(dist,residual)
loglikelihood = sum(contributions)
return -loglikelihood
end

MLE = fill(NaN,(6,size(Re,2)))
for i = 1:n
y = Re[:,i]
global y
params0 = [.1,.2,.3,.4,.5,.6]
MLE[:,i] = optimum.minimizer
MLE[6,i] = MLE[6,i]^2
end

println("\nCoefficients from MLE, regressing Re on [constant; Rme; RSMB; RHML; MOM]")
printlnPs("             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",   MLE[1,:]) # all the means are printed by using printlnPS
printlnPs("Rme:      ",   MLE[2,:])
printlnPs("RSMB:     ",   MLE[3,:])
printlnPs("RHML:     ",   MLE[4,:])
printlnPs("RMOM:     ",   MLE[5,:])

if round.(b,digits =3) == round.(MLE[1:5,:],digits =3)
println("\nMLE and OLS results are the same.")
else
println("\nMLE and OLS results are different.")
end

Coefficients from MLE, regressing Re on [constant; Rme; RSMB; RHML; MOM] Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6 Constant: -0.174 0.119 0.085 0.129 -0.051 -0.130 Rme: 1.079 0.952 0.982 0.979 0.988 1.076 RSMB: 1.035 0.817 0.865 -0.163 -0.147 0.007 RHML: -0.421 0.258 0.578 -0.260 0.333 0.742 RMOM: -0.044 -0.002 0.003 0.001 -0.010 -0.046 MLE and OLS results are the same.

This excercise is for ambitious students who would like to receive a high mark.

Suppose that we wish to test if the the parameter estimates of $\rho$ are statistically different from zero. Suppose further that we do not know how to compute analytically the standard errors of the MLE parameter estimates.

We decide to (non-parametrically) bootstrap by resampling cases in order to estimate the standard errors. This means that we treat the sample of N individuals as if it were a population from which we randomly draw B samples, each of size N. This produces a sample of MLEs of size B, that is, it provides an empirical approximation to the distribution of the MLE. From the empirical approximation, we can compare the full-sample point MLE to the MLE distribution under the null hypothesis.

Bootstrap samples can be easily generated using the built-in function sample(). Each bootstrap sample should be drawn with replacment from the original sample and should have the same number of observations.

Proceed in three steps:

1.) Make sure that the log likelihood function has both independent and dependent variables as an input. 2.)et up a bootstrap function for the standard errors (see Hint 1).

Hint 1:

B = 1000 # bootstrap repetitions
bootstrapSE = fill(NaN,(size(x,2),size(Re,2))) # pre-locate output
for i = 1:size(??,2) # match with the number of assets
Y = Re[:,i];     # select assets
samples = zeros(B,size(??,2)+1) # pre-locate samples
for b=1:B
#Draw an index with replacement using the sample function

params0 = [??]; # set some random starting parameters

#Collect optimal values for each bootstrap.
end
#Generate bootstrap standard errors by taking the empirical computing the empirical standard deviation of the estimates of each coefficient.
end


Hint 2: When running into errors, try to reduce the number of boostrap samples to something smaller than 1,000 and then gradually increase it.

In [12]:
function loglike(rho, y, x)
beta = rho[1:5]
residual = y - x*beta
sigma2 = exp(rho[6])
dist = Normal(0, sqrt(sigma2))
contributions = logpdf.(dist,residual)
loglikelihood = sum(contributions)
return -loglikelihood
end

B = 1000 # bootstrap repetitions
bootstrapSE = fill(NaN,(size(x,2)+1,size(Re,2))) # pre-locate output

X      = [ones(T) Rme RSMB RHML RMOM]

for i = 1:size(Re,2) # match with the number of assets
Y = Re[:,i];     # select assets

samples = zeros(B,6) # pre-locate samples
for b=1:B

anIndex = sample(1:T,T, replace=true, ordered=false)
y = Y[anIndex,:]
x = X[anIndex,:]

function wrapLoglike(rho)
return loglike(rho,y,x)
end

params0 = [.1,.2,.3,.4,.5,.6]  # set some random starting parameters

#Collect optimal values for each bootstrap.
end

bootstrapSE[i,:] = std(samples, dims =1)
end

println("\nStandard Errors from bootstrapping procedure:")
printlnPs("             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",   bootstrapSE[:,1]')
printlnPs("Rme:      ",   bootstrapSE[:,2]')
printlnPs("RSMB:     ",   bootstrapSE[:,3]')
printlnPs("RHML:     ",   bootstrapSE[:,4]')
printlnPs("RMOM:     ",   bootstrapSE[:,5]')

Standard Errors from bootstrapping procedure: Asset 1 Asset 2 Asset 3 Asset 4 Asset 5 Asset 6 Constant: 0.042 0.035 0.029 0.032 0.057 0.050 Rme: 0.012 0.010 0.007 0.009 0.017 0.014 RSMB: 0.019 0.014 0.012 0.016 0.032 0.022 RHML: 0.016 0.022 0.011 0.016 0.038 0.026 RMOM: 0.012 0.012 0.009 0.009 0.022 0.016

This excercise is for ambitious students who would like to receive a high mark. Please note that you do not need to solve Task 3 in order to give an elaborate answer here.

Give a brief inpretation of your result i.e. why bootstrapped confidence bands provide better finite sample performance. Be brief (no more than 100 words).

Comment
Since bootstrapping creates multiple resamples with replacement from a single finite set of observations, even if the data is not exactly normally distributed, the large number of indepedent resamples will approach normal distribution due to Central Limit Theorem with a sufficiently large sample size.
This removes the need for the assumption of normal distribution to be made in the linear regression.
Hence, in the case where the original sample set is not normally distributed, the bootstrapped confidence bands which are constructed based on the 25th ranked value and 95th ranked value (assuming 1000 resamples) are likely to provide better performance than simply taking the 5th and 95th percentile values in the sample dataset for hypothesis testing.

Replicating OLS by MLE in Task 1 might look boring and one is tempted to ask why going through the effort of deriving a log-likelihood function when OLS just works fine? Therefore, we now turn to a more hands-on application of MLE.

Imagine, that after your degree, you start working in the risk managament division of a large hedge fund on the Cayman Islands. On the first day your manager gives you the following task:

1. You get some data (mledata.csv) containing the maximum drawdown every month of one of the funds (numbers are in percentages) - load the data and plot a histogram and make an educated guess about the underlying distribution (1-2 sentences are enough)
2. Estimate the parameters $k$ and $\theta$ of a gamma($k$,$\theta$) distribution using MLE
3. Based on the estimated parameters, what is the expected maximum drawdown for next month?

Hin 1: use SpecialFunctions to calculate $\Gamma(k)$.

using Pkg Pkg.add("SpecialFunctions")

Hint 2: the gamma distribution PDF takes the form:

$\frac{1}{\Gamma(k)\theta^k}x^{k-1}e^{-\frac{x}{\theta}}$.

Hint 3: The maximum drawdown in this dataset is the maximum pecentage loss in each month relative to the highest value.