Load Packages

In [1]:

using Distributions, DelimitedFiles, LinearAlgebra, Optim, Dates

include("./printmat.jl")

using Plots
# gr(size=(600,400))
#pyplot(size=(600,400))    
gr(size=(480,320))
default(fmt = :svg)

Load Data

The data loading section is done for you, so you can focus on the other tasks.

In [2]:

File = "./Data/6_Portfolios_2x3.csv"
x    = readdlm(File,',',skipstart=1)

println("The first few rows of x")
printmat(x[1:4,:])
R    = x[:,2:end]             # Net returns of 6 assets.

File = "./Data/FFmFactorsFs.csv"
x    = readdlm(File,',',skipstart=1)

println("The first few rows of x")
printmat(x[1:4,:])
(Rme,RSMB,RHML,RMOM,Rf) = (x[:,2],x[:,3],x[:,4],x[:,8],x[:,7])  # Market excess return, SMB, HML, MOM, Rf.

Re = R .- Rf                  # Excess returns of the 6 assets.
(x,R) = (nothing,nothing)     # Cleaning up, help avoiding mistakes below.

println("size of Re: ", size(Re))

The first few rows of x
197901.000     8.560     7.900     9.589     3.240     5.077     6.785
197902.000    -3.444    -2.230    -1.914    -3.712    -2.369    -2.823
197903.000    11.660     8.339     8.486     5.710     5.687     7.484
197904.000     1.651     2.998     3.310     0.193     0.599     0.635

The first few rows of x
197901.000     4.230     3.840     2.290    -2.500     1.580     0.770    -1.230    -0.750     5.550
197902.000    -3.560     0.530     1.210    -1.150     1.040     0.730    -1.010     0.960     1.320
197903.000     5.680     3.180    -0.700     0.700     0.280     0.810     2.870    -0.360     1.370
197904.000    -0.060     2.400     1.050     1.180     0.210     0.800     0.830    -0.390    -0.340

size of Re: (473, 6)

Linear regressions

We run OLS of each of the returns on (1, Rme, RSMB, RHML, RMOM) and report the regression coeffs and their t-stats. The returns are 6 portfolios formed on Size and Book-to-Market that can be retrieved from Kenneth R. French's website.

In [3]:

T,n = size(Re)

x      = [ones(T) Rme RSMB RHML RMOM]     #regressors
K      = size(x,2)

b      = fill(NaN,(K,n))
stdb   = copy(b)
VarRes = fill(NaN,n)
for i = 1:n
    y         = Re[:,i]                   #standard OLS notation 
    b_i       = x\y                      
    u         = y - x*b_i
    b[:,i]    = b_i
    covb      = inv(x'x)*var(u)           #cov(b), see any textbook or Wikipedia
    stdb[:,i] = sqrt.(diag(covb))         #std(b)
end  


println("\nOLS coefficients, regressing Re on [constant; Rme; RSMB; RHML; MOM] ")
printlnPs("             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",   b[1,:]) # all the means are printed by using printlnPS
printlnPs("Rme:      ",   b[2,:])
printlnPs("RSMB:     ",   b[3,:])
printlnPs("RHML:     ",   b[4,:])
printlnPs("RMOM:     ",   b[5,:])

tstat = b./stdb
println("tstats of OLS coefficients")
printlnPs("             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",   tstat[1,:]) # all the means are printed by using printlnPS
printlnPs("Rme:      ",   tstat[2,:])
printlnPs("RSMB:     ",   tstat[3,:])
printlnPs("RHML:     ",   tstat[4,:])
printlnPs("RMOM:     ",   tstat[5,:])

OLS coefficients, regressing Re on [constant; Rme; RSMB; RHML; MOM] 
             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6
Constant:     -0.174     0.119     0.085     0.129    -0.051    -0.130
Rme:           1.079     0.952     0.982     0.979     0.988     1.076
RSMB:          1.035     0.817     0.865    -0.163    -0.147     0.007
RHML:         -0.421     0.258     0.578    -0.260     0.333     0.742
RMOM:         -0.044    -0.002     0.003     0.001    -0.010    -0.046
tstats of OLS coefficients
             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6
Constant:     -4.209     3.514     3.021     4.024    -0.854    -2.543
Rme:         110.165   118.809   148.591   128.943    70.569    89.096
RSMB:         73.088    70.579    90.552   -14.805    -7.265     0.407
RHML:        -28.526    21.396    58.085   -22.721    15.783    40.772
RMOM:         -4.698    -0.324     0.529     0.140    -0.723    -4.012

Task 1 (35 points)

Estimate the coefficients using Maximum Likelhood Estimation (MLE) of the returns on (1, Rme, RSMB, RHML, RMOM). Report the regression coeffs. The goal is to show that under normal error assumption, as is typically assumed in linear regression, MLE and the OLS lead to identical estimates.

Hint 1:

using Distributions, Optim


function loglike(rho)
   #Calculate residuals
   #Calculate  $\sigma^2$ to initialize the appropriate normal distribution
   #Returning the negative of the sum of the individual contributions to the log-likelihood 
   return -loglikelihood
end


output = fill(NaN,(size(),size())
for i = 1:???

    #Find the MLE, \hat\rho.
    params0 = []; # some random starting parameters
    optimum = optimize(???,???,ConjugateGradient());
end

Hint 2: optimum.minizer contains the estimates.

Hint 3: When experiencing troubles try different optimization methods:

NelderMead()
SimulatedAnnealing()
GradientDescent()
see http://julianlsolvers.github.io/Optim.jl/latest/#algo/nelder_mead/ for further methods

In [11]:

using Distributions, Optim

x = [ones(T) Rme RSMB RHML RMOM] 

function loglike(rho)
   beta = rho[1:5]
   sigma2 = exp(rho[6])
   residual = y-x*beta
   dist = Normal(0, sqrt(sigma2))
   contributions = logpdf.(dist,residual)
   loglikelihood = sum(contributions)
      return -loglikelihood
end


MLE = fill(NaN,(6,size(Re,2)))
for i = 1:n
    y = Re[:,i]
    global y
    params0 = [.1,.2,.3,.4,.5,.6]
    optimum = optimize(loglike,params0,NelderMead())
    MLE[:,i] = optimum.minimizer
    MLE[6,i] = MLE[6,i]^2
end

println("\nCoefficients from MLE, regressing Re on [constant; Rme; RSMB; RHML; MOM]")
printlnPs("             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",   MLE[1,:]) # all the means are printed by using printlnPS
printlnPs("Rme:      ",   MLE[2,:])
printlnPs("RSMB:     ",   MLE[3,:])
printlnPs("RHML:     ",   MLE[4,:])
printlnPs("RMOM:     ",   MLE[5,:])


if round.(b,digits =3) == round.(MLE[1:5,:],digits =3)
    println("\nMLE and OLS results are the same.")
else
    println("\nMLE and OLS results are different.")
end

Coefficients from MLE, regressing Re on [constant; Rme; RSMB; RHML; MOM]
             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6
Constant:     -0.174     0.119     0.085     0.129    -0.051    -0.130
Rme:           1.079     0.952     0.982     0.979     0.988     1.076
RSMB:          1.035     0.817     0.865    -0.163    -0.147     0.007
RHML:         -0.421     0.258     0.578    -0.260     0.333     0.742
RMOM:         -0.044    -0.002     0.003     0.001    -0.010    -0.046

MLE and OLS results are the same.

Task 2 (20 points)

This excercise is for ambitious students who would like to receive a high mark.

Suppose that we wish to test if the the parameter estimates of $\rho$ are statistically different from zero. Suppose further that we do not know how to compute analytically the standard errors of the MLE parameter estimates.

We decide to (non-parametrically) bootstrap by resampling cases in order to estimate the standard errors. This means that we treat the sample of N individuals as if it were a population from which we randomly draw B samples, each of size N. This produces a sample of MLEs of size B, that is, it provides an empirical approximation to the distribution of the MLE. From the empirical approximation, we can compare the full-sample point MLE to the MLE distribution under the null hypothesis.

Bootstrap samples can be easily generated using the built-in function sample(). Each bootstrap sample should be drawn with replacment from the original sample and should have the same number of observations.

Proceed in three steps:

1.) Make sure that the log likelihood function has both independent and dependent variables as an input. 2.)et up a bootstrap function for the standard errors (see Hint 1).

Hint 1:

B = 1000 # bootstrap repetitions
bootstrapSE = fill(NaN,(size(x,2),size(Re,2))) # pre-locate output
for i = 1:size(??,2) # match with the number of assets
    Y = Re[:,i];     # select assets
    samples = zeros(B,size(??,2)+1) # pre-locate samples
    for b=1:B
       #Draw an index with replacement using the sample function

       params0 = [??]; # set some random starting parameters
       
       #Collect optimal values for each bootstrap. 
    end
#Generate bootstrap standard errors by taking the empirical computing the empirical standard deviation of the estimates of each coefficient.
end

Hint 2: When running into errors, try to reduce the number of boostrap samples to something smaller than 1,000 and then gradually increase it.

In [12]:

function loglike(rho, y, x)
   beta = rho[1:5]
   residual = y - x*beta
   sigma2 = exp(rho[6])
   dist = Normal(0, sqrt(sigma2))
   contributions = logpdf.(dist,residual)
   loglikelihood = sum(contributions)
   return -loglikelihood
end



B = 1000 # bootstrap repetitions
bootstrapSE = fill(NaN,(size(x,2)+1,size(Re,2))) # pre-locate output

X      = [ones(T) Rme RSMB RHML RMOM] 

for i = 1:size(Re,2) # match with the number of assets
    Y = Re[:,i];     # select assets

    samples = zeros(B,6) # pre-locate samples
    for b=1:B
        
        anIndex = sample(1:T,T, replace=true, ordered=false)
        y = Y[anIndex,:]
        x = X[anIndex,:]
        
        function wrapLoglike(rho)
            return loglike(rho,y,x)
        end
        
        params0 = [.1,.2,.3,.4,.5,.6]  # set some random starting parameters
        samples[b,:] = optimize(wrapLoglike,params0,NelderMead()).minimizer
        
       #Collect optimal values for each bootstrap. 
    end
    
    bootstrapSE[i,:] = std(samples, dims =1)
end

println("\nStandard Errors from bootstrapping procedure:")
printlnPs("             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",   bootstrapSE[:,1]')
printlnPs("Rme:      ",   bootstrapSE[:,2]')
printlnPs("RSMB:     ",   bootstrapSE[:,3]')
printlnPs("RHML:     ",   bootstrapSE[:,4]')
printlnPs("RMOM:     ",   bootstrapSE[:,5]')

Standard Errors from bootstrapping procedure:
             Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6
Constant:      0.042     0.035     0.029     0.032     0.057     0.050
Rme:           0.012     0.010     0.007     0.009     0.017     0.014
RSMB:          0.019     0.014     0.012     0.016     0.032     0.022
RHML:          0.016     0.022     0.011     0.016     0.038     0.026
RMOM:          0.012     0.012     0.009     0.009     0.022     0.016

Task 4 (15 points)

This excercise is for ambitious students who would like to receive a high mark. Please note that you do not need to solve Task 3 in order to give an elaborate answer here.

Give a brief inpretation of your result i.e. why bootstrapped confidence bands provide better finite sample performance. Be brief (no more than 100 words).

Comment
Since bootstrapping creates multiple resamples with replacement from a single finite set of observations, even if the data is not exactly normally distributed, the large number of indepedent resamples will approach normal distribution due to Central Limit Theorem with a sufficiently large sample size.
This removes the need for the assumption of normal distribution to be made in the linear regression.
Hence, in the case where the original sample set is not normally distributed, the bootstrapped confidence bands which are constructed based on the 25th ranked value and 95th ranked value (assuming 1000 resamples) are likely to provide better performance than simply taking the 5th and 95th percentile values in the sample dataset for hypothesis testing.

Task 5 (30 points)

Replicating OLS by MLE in Task 1 might look boring and one is tempted to ask why going through the effort of deriving a log-likelihood function when OLS just works fine? Therefore, we now turn to a more hands-on application of MLE.

Imagine, that after your degree, you start working in the risk managament division of a large hedge fund on the Cayman Islands. On the first day your manager gives you the following task:

You get some data (mledata.csv) containing the maximum drawdown every month of one of the funds (numbers are in percentages) - load the data and plot a histogram and make an educated guess about the underlying distribution (1-2 sentences are enough)
Estimate the parameters $k$ and $\theta$ of a gamma( $k$ , $\theta$ ) distribution using MLE
Based on the estimated parameters, what is the expected maximum drawdown for next month?

Hin 1: use SpecialFunctions to calculate $\Gamma(k)$ .

using Pkg Pkg.add("SpecialFunctions")

Hint 2: the gamma distribution PDF takes the form:

$\frac{1}{\Gamma(k)\theta^k}x^{k-1}e^{-\frac{x}{\theta}}$ .

Hint 3: The maximum drawdown in this dataset is the maximum pecentage loss in each month relative to the highest value.