The data loading section is done for you, so you can focus on the other tasks.
File = "./Data/6_Portfolios_2x3.csv"
x    = readdlm(File,',',skipstart=1)
println("The first few rows of x")
printmat(x[1:4,:])
R = x[:,2:end]                     # net returns of 6 assets

File = "./Data/FFmFactorsFs.csv"
x    = readdlm(File,',',skipstart=1)
println("The first few rows of x")
printmat(x[1:4,:])
(Rme,RSMB,RHML,RMOM,Rf) = (x[:,2],x[:,3],x[:,4],x[:,8],x[:,7])   # market excess return, SMB, HML, MOM, Rf

Re = R .- Rf                       # excess returns of the 6 assets
(x,R) = (nothing,nothing)          # clean up, helps avoid mistakes below
println("size of Re: ",size(Re))
The first few rows of x
197901.000 8.560 7.900 9.589 3.240 5.077 6.785
197902.000 -3.444 -2.230 -1.914 -3.712 -2.369 -2.823
197903.000 11.660 8.339 8.486 5.710 5.687 7.484
197904.000 1.651 2.998 3.310 0.193 0.599 0.635
The first few rows of x
197901.000 4.230 3.840 2.290 -2.500 1.580 0.770 -1.230 -0.750 5.550
197902.000 -3.560 0.530 1.210 -1.150 1.040 0.730 -1.010 0.960 1.320
197903.000 5.680 3.180 -0.700 0.700 0.280 0.810 2.870 -0.360 1.370
197904.000 -0.060 2.400 1.050 1.180 0.210 0.800 0.830 -0.390 -0.340
size of Re: (473, 6)
We run OLS of each of the returns on (1, Rme, RSMB, RHML, RMOM) and report the regression coeffs and their t-stats. The returns are 6 portfolios formed on Size and Book-to-Market that can be retrieved from Kenneth R. French's website.
T,n = size(Re)
x   = [ones(T) Rme RSMB RHML RMOM]        # regressors
K   = size(x,2)

b      = fill(NaN,(K,n))
stdb   = copy(b)
VarRes = fill(NaN,n)
for i = 1:n
    y         = Re[:,i]                   # standard OLS notation
    b_i       = x\y
    u         = y - x*b_i
    b[:,i]    = b_i
    covb      = inv(x'x)*var(u)           # cov(b), see any textbook or Wikipedia
    stdb[:,i] = sqrt.(diag(covb))         # std(b)
end

println("\nOLS coefficients, regressing Re on [constant; Rme; RSMB; RHML; RMOM]")
printlnPs("            Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",b[1,:])
printlnPs("Rme:      ",b[2,:])
printlnPs("RSMB:     ",b[3,:])
printlnPs("RHML:     ",b[4,:])
printlnPs("RMOM:     ",b[5,:])

tstat = b./stdb
println("\ntstats of OLS coefficients")
printlnPs("            Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",tstat[1,:])
printlnPs("Rme:      ",tstat[2,:])
printlnPs("RSMB:     ",tstat[3,:])
printlnPs("RHML:     ",tstat[4,:])
printlnPs("RMOM:     ",tstat[5,:])
Estimate the coefficients using Maximum Likelihood Estimation (MLE), regressing the returns on (1, Rme, RSMB, RHML, RMOM), and report the regression coefficients. The goal is to show that under the normal-error assumption, as is typically made in linear regression, MLE and OLS lead to identical estimates.
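Why the two coincide can be seen directly from the normal log-likelihood. For one asset with i.i.d. errors $u_t \sim N(0,\sigma^2)$,

$$\log L(\beta,\sigma^2) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\left(y_t - x_t'\beta\right)^2,$$

so for any fixed $\sigma^2$, maximizing over $\beta$ is the same as minimizing $\sum_{t}(y_t - x_t'\beta)^2$, which is exactly the OLS objective.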
using Distributions, Optim
#Calculate $\sigma^2$ to initialize the appropriate normal distribution
#Returning the negative of the sum of the individual contributions to the log-likelihood
output = fill(NaN,(size(???),size(???)))
for i = 1:???
#Find the MLE, \hat\rho.
params0 = ; # some random starting parameters
optimum = optimize(???,???,ConjugateGradient());
Hint 2: optimum.minimizer contains the estimates.
Hint 3: When experiencing trouble, try different optimization methods:
using Distributions, Optim

x = [ones(T) Rme RSMB RHML RMOM]

function loglike(rho)
    beta     = rho[1:5]
    sigma2   = exp(rho[6])          # exp() keeps the variance positive
    residual = y - x*beta
    dist = Normal(0,sqrt(sigma2))
    contributions = logpdf.(dist,residual)
    loglikelihood = sum(contributions)
    return -loglikelihood
end

MLE = fill(NaN,(6,size(Re,2)))
for i = 1:n
    global y = Re[:,i]
    params0  = [.1,.2,.3,.4,.5,.6]
    optimum  = optimize(loglike,params0,NelderMead())
    MLE[:,i] = optimum.minimizer
    MLE[6,i] = exp(MLE[6,i])        # transform the log-variance back to sigma^2
end

println("\nCoefficients from MLE, regressing Re on [constant; Rme; RSMB; RHML; RMOM]")
printlnPs("            Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",MLE[1,:])
printlnPs("Rme:      ",MLE[2,:])
printlnPs("RSMB:     ",MLE[3,:])
printlnPs("RHML:     ",MLE[4,:])
printlnPs("RMOM:     ",MLE[5,:])

if round.(b,digits=3) == round.(MLE[1:5,:],digits=3)
    println("\nMLE and OLS results are the same.")
else
    println("\nMLE and OLS results are different.")
end
This exercise is for ambitious students who would like to receive a high mark.
Suppose that we wish to test whether the parameter estimates of ρ are statistically different from zero. Suppose further that we do not know how to compute the standard errors of the MLE parameter estimates analytically.
We decide to (non-parametrically) bootstrap by resampling cases in order to estimate the standard errors. This means that we treat the sample of N individuals as if it were a population from which we randomly draw B samples, each of size N. This produces a sample of MLEs of size B, that is, it provides an empirical approximation to the distribution of the MLE. From the empirical approximation, we can compare the full-sample point MLE to the MLE distribution under the null hypothesis.
Bootstrap samples can be easily generated using the built-in function sample(). Each bootstrap sample should be drawn with replacement from the original sample and should have the same number of observations.
Proceed in three steps:
1.) Make sure that the log-likelihood function takes both the independent and the dependent variables as inputs.
2.) Set up a bootstrap function for the standard errors (see Hint 1).
B = 1000 # bootstrap repetitions
bootstrapSE = fill(NaN,(size(x,2),size(Re,2))) # pre-locate output
for i = 1:size(??,2) # match with the number of assets
Y = Re[:,i]; # select assets
samples = zeros(B,size(??,2)+1) # pre-locate samples
#Draw an index with replacement using the sample function
params0 = [??]; # set some random starting parameters
#Collect optimal values for each bootstrap.
#Generate bootstrap standard errors by computing the empirical standard deviation of the estimates of each coefficient.
Hint 2: When running into errors, try to reduce the number of bootstrap samples to something smaller than 1,000 and then gradually increase it.
using StatsBase                      # provides sample()

function loglike(rho,y,x)
    beta     = rho[1:5]
    sigma2   = exp(rho[6])
    residual = y - x*beta
    dist = Normal(0,sqrt(sigma2))
    contributions = logpdf.(dist,residual)
    return -sum(contributions)
end

B = 1000                                            # bootstrap repetitions
bootstrapSE = fill(NaN,(size(Re,2),size(x,2)+1))    # pre-locate output: rows = assets, columns = coefficients (incl. log-variance)
X = [ones(T) Rme RSMB RHML RMOM]
for i = 1:size(Re,2)                 # match with the number of assets
    Y = Re[:,i]                      # select asset i
    samples = zeros(B,6)             # pre-locate samples
    for b = 1:B
        anIndex = sample(1:T,T,replace=true,ordered=false)   # draw an index with replacement
        y = Y[anIndex]
        x = X[anIndex,:]
        wrapLoglike(rho) = loglike(rho,y,x)
        params0 = [.1,.2,.3,.4,.5,.6]                        # some starting parameters
        samples[b,:] = optimize(wrapLoglike,params0,NelderMead()).minimizer   # collect optimal values for each bootstrap
    end
    bootstrapSE[i,:] = std(samples,dims=1)
end

println("\nStandard Errors from bootstrapping procedure:")
printlnPs("            Asset 1   Asset 2   Asset 3   Asset 4   Asset 5   Asset 6")
printlnPs("Constant: ",bootstrapSE[:,1]')
printlnPs("Rme:      ",bootstrapSE[:,2]')
printlnPs("RSMB:     ",bootstrapSE[:,3]')
printlnPs("RHML:     ",bootstrapSE[:,4]')
printlnPs("RMOM:     ",bootstrapSE[:,5]')
This exercise is for ambitious students who would like to receive a high mark. Please note that you do not need to solve Task 3 in order to give an elaborate answer here.
Give a brief interpretation of your result, i.e., why bootstrapped confidence bands provide better finite-sample performance. Be brief (no more than 100 words).
Since bootstrapping creates many resamples with replacement from a single finite set of observations, the distribution of the estimator across these independent resamples approaches normality by the Central Limit Theorem, even if the data themselves are not exactly normally distributed. This removes the need to assume normal errors in the linear regression.
Hence, when the original sample is not normally distributed, bootstrapped confidence bands, constructed from the 25th and 975th ranked values of the bootstrap estimates (assuming 1,000 resamples, for a 95% band), are likely to provide better performance for hypothesis testing than percentiles taken directly from the sample dataset.
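As a concrete sketch of the ranked-value construction: with B = 1,000 bootstrap estimates of a single coefficient (here just simulated noise as a stand-in for one column of `samples` from Task 3), a 95% percentile band could be formed as:

```julia
using Statistics

# Stand-in for one column of `samples` from Task 3: 1000 bootstrap
# estimates of a single coefficient (simulated here for illustration).
draws = randn(1000)

sorted = sort(draws)
lower  = sorted[25]     # ~2.5% quantile: 25th ranked value out of 1000
upper  = sorted[975]    # ~97.5% quantile: 975th ranked value
# Built-in alternative: quantile(draws, [0.025, 0.975])
println("95% percentile band: [$lower, $upper]")
```

A full-sample estimate falling outside this band would lead to rejection of the null at the 5% level.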
Task 5 (30 points)
Replicating OLS by MLE in Task 1 might look boring, and one is tempted to ask: why go through the effort of deriving a log-likelihood function when OLS works just fine? Therefore, we now turn to a more hands-on application of MLE.
Imagine that after your degree, you start working in the risk management division of a large hedge fund on the Cayman Islands. On the first day your manager gives you the following task:
You get some data (mledata.csv) containing the maximum drawdown in each month of one of the funds (numbers are in percentages). Load the data, plot a histogram, and make an educated guess about the underlying distribution (1-2 sentences are enough).
Estimate the parameters k and θ of a gamma(k,θ) distribution using MLE
Based on the estimated parameters, what is the expected maximum drawdown for next month?
Hint 1: use SpecialFunctions to calculate Γ(k).
using Pkg
Pkg.add("SpecialFunctions")
Hint 2: the gamma distribution PDF takes the form: $f(x;k,\theta) = \frac{x^{k-1}e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}$ for $x > 0$.
Hint 3: The maximum drawdown in this dataset is the maximum percentage loss in each month relative to the highest value.
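A minimal sketch of the estimation step: since mledata.csv is not reproduced here, the code below runs on simulated gamma draws (the true k = 2 and θ = 3 are made up); with the real data you would replace `drawdowns` by the loaded column. Working with log-parameters keeps k and θ positive during the search.

```julia
using Distributions, Optim, Random

Random.seed!(1)
# Stand-in for the mledata.csv column: simulated monthly maximum
# drawdowns from a gamma(k=2, θ=3) distribution (both values made up).
drawdowns = rand(Gamma(2.0,3.0), 5_000)

# Negative log-likelihood of gamma(k,θ), parameterized in logs
function negloglike(rho)
    k, θ = exp(rho[1]), exp(rho[2])
    return -sum(logpdf.(Gamma(k,θ), drawdowns))
end

opt = optimize(negloglike, [0.0,0.0], NelderMead())
khat, θhat = exp.(opt.minimizer)
println("k = $khat, θ = $θhat")
println("expected maximum drawdown next month: $(khat*θhat) percent")  # E[X] = kθ for gamma(k,θ)
```

Distributions.jl's `fit_mle(Gamma, drawdowns)` offers an independent check of the same estimates, and the mean of the fitted distribution, kθ, answers the expected-drawdown question.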