Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download
Views: 198
Kernel: Julia 1.2

Load Packages

using Pkg # Pkg.add("RCall") using RCall include("./printmat.jl") using Plots # gr(size=(600,400)) #pyplot(size=(600,400)) gr(size=(480,320)) default(fmt = :svg)

Calling R from Julia

R is the leading programming language when it comes to statistics and econometrics. Its power emerges from thousands of well documented packages oftentimes written by professors and leading researchers themselves. When time is scarce to code lines of GARCH estimation code or systems of VAR equations, a good option is to transfer data to R and let it do the computations. The (registered) package needed is RCall. There are several ways of interacting with R in Julia, all are more or less equally straightforward. We will stick to treating R as a black box: put in the data, press a button of required label and extract the result. Full and easy-to-understand documentation is available here.

Task 1 (10 points)

Using R from Julia, simulate two sequences (100 values each) following Cauchy distribution with location = 0 and scale = 1. Transport them to Julia and assign them to variables y1 and y2. The documentation on Cauchy distribution in R is here.

R"y1 <- rcauchy(100, location = 0, scale = 1)"; R"y2 <- rcauchy(100, location = 0, scale = 1)";
@rget y1 y2; println("y1") print(y1) println("\ny2") print(y2)
y1 [-0.40127448184851017, -0.8827806894415366, -2.630885626879231, -0.22787564323298118, 1.8696619799104872, 0.5611128834749841, 1.0064766997114047, 0.4415505195783929, 2.1396018741559297, 2.4502431598217917, 1.4237987715301852, 8.0061239768102, 0.4482553262705586, -1.0900655790920193, 9.242760595231688, 0.8396905468314516, 1.5309364832539532, -0.6840932631995894, 2.053594603809899, -1.3030266382445599, 0.16970050003133336, 3.908729400408724, -1.2284860956646613, 4.955058126819033, -0.37966360784696945, -1.1109886050068953, 0.8295033310925066, -0.4601945278565793, 1.9839964505551624, 0.9705125478483894, 5.327440922103983, -4.4219383542285655, 4.733822765732337, -0.41104021951223224, 0.777517259364673, 0.006233009965834879, -1.8898014435748318, -1.8434869990608191, -0.3426983756084228, 3.15281122383142, 1.2163816270934296, 0.8124791941057143, 1.5009734077040522, -0.35571476246077366, -3.304445958881701, 0.09118775312742781, 3.629689265301339, 0.0837028556332276, -4.3923416236843345, 0.21269028892365704, -8.773989127995385, 1.8630250092725957, 0.40577690813580497, 0.8594285113217158, 0.007435287218386363, -1.0961442669444508, 1.9735353032999567, -6.765891983128649, 0.9299340704967338, 112.31396269760704, 0.30976011166216877, 2.232124339650933, 0.5902102073737298, 0.9547688033341911, -11.692525071159958, -0.4971060611206508, 8.59718785250857, -1.8988412447854124, 1.5767729377571855, -69.6299914208395, 0.3595933162744196, -4.237728569116978, -0.4745511570431522, 1.2699606580939127, -0.04385275382838605, 0.6271257405974169, -0.37867027653446356, -1.065384971984369, -0.027794463006654613, 2.3580720152073815, -0.7212617034000289, -0.08066046578357954, -0.619510035597776, 1.2817110666393354, -0.13874899956406325, -1.2474287599685443, 0.1729188426649283, 0.19833143228015365, 2.717188248989831, -0.552814234178762, 1.4367602420405503, -1.4237272927091496, -6.635570675632419, 3.471097608407259, -0.011330515450909757, 1.7636067589865712, -0.37778040124019713, 1.1243783657675066, 3.1135738828907136, 0.5639370077525172] y2 [-4.599200460053147, 0.9465131349280903, -2.092506104308654, 1.636180911374073, -0.5781354280907853, 0.1120121933216801, -0.22563097377294622, -17.023432780613202, 3.8001411344065166, -0.02625066496062029, -2.773710750030968, 4.03453359663476, 17.453006854396023, 7.449995127584029, -1.584114505431959, -6.79294935716999, 0.6031829290043467, -0.8449551244190561, -1.3097017440519856, 1.3808781794527387, 0.19892802330304393, -0.16094555132692856, -0.037970656308170314, -1.6343354526293625, -0.7918280572437556, 1.5489470821310192, -0.02221817199197333, 0.45453575423066794, 0.18747068990997684, 0.8816645855330318, 0.9895347827305399, -0.6408498636056578, 1.380421542417587, -0.14638733343761173, -0.5224054230502674, 0.12476735993607693, -1.3687206002144903, -2.3086316325213563, -0.39708487825428884, 3.6214491547992105, 1.4580989179352981, -2.8555485187540155, 8.19741758393516, 8.099343216243405, 1.052499803897709, -0.7104045413099429, -1.1511984189901017, 0.733249216329992, 2.049677238822567, 0.15025754732798302, 0.044062393449126006, 0.159677522419271, 3.5190288771994704, -0.24259018970748644, -0.21756785760898253, 0.5544959861542452, 0.6373061081134824, -0.9387512297650141, -0.5067049756661499, -1.7506159561053183, -2.04726744472221, -1.1290896419590006, -30.475189600896215, 0.7139615367704037, -12.706087030408192, 0.7447074216871294, -1.4389311836995526, 14.688872021326242, 1.7650568978368044, -0.16921108373880944, 0.9305785969298231, 0.5859333134085813, -0.40764424246283176, 1.5828305402891847, 2.091783014652448, -1.268666005491125, -1.8940153166413332, -0.6507316363040037, -2.4433359794646092, -0.6953928830322529, 0.32811026260119924, -0.9882660216036473, 2.0270568978150347, 0.6497774741431493, -0.41292210658345313, -10.144199990693938, 0.13272318730165786, -0.5772461973361056, 5.905746513675395, -0.17232408516484604, -0.3536187012675405, 1.2631555478949645, -0.8799040779978813, 2.3671446445239273, 0.933239372376048, -4.8939731069718855, -0.5216441689995831, -1.6143818266020193, 0.5441468573728053, 1.983162624581192]

Task 2 (10 points)

Compute the summary statistics of the two sequences with the summary function. Print the output.

R"summary(y1)"
RObject{RealSxp} Min. 1st Qu. Median Mean 3rd Qu. Max. -69.6300 -0.5695 0.2612 0.7370 1.5085 112.3140
R"summary(y2)"
RObject{RealSxp} Min. 1st Qu. Median Mean 3rd Qu. Max. -30.47519 -0.95113 -0.03211 -0.17442 0.95727 17.45301

Task 3 (10 points)

Create a Q-Q plot of y1 vs. the normal distribution using function qqnorm. In general, a Q-Q plot compares the quantiles of two distribution against each other. If the distribution are equal, all quantiles have to lie on the 45 degree line.

R"qqnorm(y1, ylim=c(-2,2),xlim=c(-2,2))" R"abline(a = 0, b = 1)"
Image in a Jupyter notebook
RObject{NilSxp} NULL

Task 4 (30 points)

Write a simple function that returns the fitted residuals from a linear regression y ~ x. In R, having estimated a model with

mod = lm(y ~ x),

residuals can be extracted as

res = resid(mod)

Follow the sketch below.

""" Fetch the residuals from a linear regression y ~ x. Parameters ---------- x : Array regressor y : Array response addconstant : Bool true if constant should be added to `x` Returns ------- res : Array of fitted residuals """ function ols_resid(x::Array, y::Array; addconstant::Bool=false) return end
""" Fetch the residuals from a linear regression y ~ x. Parameters ---------- x : Array regressor y : Array response addconstant : Bool true if constant should be added to `x` Returns ------- res : Array of fitted residuals """ function ols_resid(x::Array, y::Array; addconstant::Bool=false) @rput x y formula = string("mod <- lm(y ~ x", addconstant ? "" : "-1", ")") reval(formula) reval("res <- resid(mod)") @rget res return convert(Array{Float64, 1}, res) end
ols_resid

Task 5 (15 points)

Create a third variable y3 as follows:

y3 = 2.5 - 3*y1 + y2

and use your ols_resid function to fetch the residuals from regressing y3 on y1 (with a constant). Compute their summary statistics using the summary function.

y3 = 2.5 .- 3 .*y1 .+ y2; @rput y3;
ols_resid(y1,y3;addconstant=true); R"summary(res)"
RObject{RealSxp} Min. 1st Qu. Median Mean 3rd Qu. Max. -30.3009 -0.7820 0.1422 0.0000 1.1316 17.6272

Task 6 (25 points)

This excercise is for ambitious students who would like to receive a high mark.

K-means clustering is one of the most commonly used unsupervised machine learning algorithm for partitioning a given dataset into a set of k groups (i.e. k clusters), where k represents the number of groups pre-specified by the analyst. It classifies objects in multiple groups (i.e., clusters), such that objects within the same cluster are as similar as possible (i.e., high intra-class similarity), whereas objects from different clusters are as dissimilar as possible (i.e., low inter-class similarity). In k-means clustering, each cluster is represented by its center (i.e, centroid) which corresponds to the mean of points assigned to the cluster.

Run k-means clustering analysis on the built in R dataset USArrests. This data set contains statistics, in arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas. Set the number of clusters equal to 3 (centers=3) and use 25 (nstart=25) as a starting value for k-means (feel free to experiment with other starting values). Scale the input data to have zero mean and standard deviaiton of unity. Use the fviz_cluster function from the Rpackage factoextra to visualise the result i.e. plot the clusters. The following link is useful for how to use RCall.

Hint: Part of this exercise is to install and load the factoextra package in R.

R"options(repos='http://cran.rstudio.com/')" "install R package" "load R package"
# some essential setting for installation to be possible R"options(repos='http://cran.rstudio.com/')" R"install.packages('factoextra',lib='../../myRpackages/')"
* installing *source* package ‘factoextra’ ... ** package ‘factoextra’ successfully unpacked and MD5 sums checked ** using staged installation ** R ** data *** moving datasets to lazyload DB ** byte-compile and prepare package for lazy loading ** help *** installing help indices ** building package indices ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (factoextra) ┌ Warning: RCall.jl: trying URL 'http://cran.rstudio.com/src/contrib/factoextra_1.0.5.tar.gz' │ Content type 'application/x-gzip' length 1390733 bytes (1.3 MB) │ ================================================== │ downloaded 1.3 MB │ │ │ The downloaded source packages are in │ ‘/tmp/RtmpAs56hH/downloaded_packages’ └ @ RCall /home/user/.julia/packages/RCall/g7dhB/src/io.jl:113
RObject{NilSxp} NULL
R"library(datasets)" R"scdata <- scale(USArrests)" R"head(scdata)"
RObject{RealSxp} Murder Assault UrbanPop Rape Alabama 1.24256408 0.7828393 -0.5209066 -0.003416473 Alaska 0.50786248 1.1068225 -1.2117642 2.484202941 Arizona 0.07163341 1.4788032 0.9989801 1.042878388 Arkansas 0.23234938 0.2308680 -1.0735927 -0.184916602 California 0.27826823 1.2628144 1.7589234 2.067820292 Colorado 0.02571456 0.3988593 0.8608085 1.864967207
R"colMeans(scdata,dims=1)"
RObject{RealSxp} Murder Assault UrbanPop Rape -7.667478e-17 1.111611e-16 -4.332645e-16 8.938163e-17
R"library(factoextra)" R"km.res <- kmeans(scdata, 3, iter.max = 10, nstart = 25)" R"fviz_cluster(km.res, USArrests)"
Image in a Jupyter notebook
RObject{VecSxp}