SharedProblemSets_BLANK / PS07 / PS07_RCall_Blank.ipynbOpen in CoCalc
Author: Chi Hern Ang
Views : 29

Load Packages

In [2]:
using Pkg Pkg.add("RCall") using RCall include("./printmat.jl") using Plots # gr(size=(600,400)) #pyplot(size=(600,400)) gr(size=(480,320)) default(fmt = :svg)
Resolving package versions... Updating `~/.julia/environment/v1.2/Project.toml` [no changes] Updating `~/.julia/environment/v1.2/Manifest.toml` [no changes]

Calling R from Julia

R is the leading programming language when it comes to statistics and econometrics. Its power emerges from thousands of well documented packages oftentimes written by professors and leading researchers themselves. When time is scarce to code lines of GARCH estimation code or systems of VAR equations, a good option is to transfer data to R and let it do the computations. The (registered) package needed is RCall. There are several ways of interacting with R in Julia, all are more or less equally straightforward. We will stick to treating R as a black box: put in the data, press a button of required label and extract the result. Full and easy-to-understand documentation is available here.

Task 1 (10 points)

Using R from Julia, simulate two sequences (100 values each) following Cauchy distribution with location = 0 and scale = 1. Transport them to Julia and assign them to variables y1 and y2. The documentation on Cauchy distribution in R is here.

In [3]:
y1 = 1 ./((pi).*(1 .+(randn(100)).^2)); y2 = 1 ./((pi).*(1 .+(randn(100)).^2));
In [5]:
@rput y1 y2;
In [6]:
@rget y1 y2 println("y1") print(y1) println("\ny2") print(y2)
y1 [0.2716595082133937, 0.31375455759623183, 0.23615524986738443, 0.31104644887891136, 0.10853557799331377, 0.0837400337886975, 0.190846458655687, 0.17841200323759018, 0.3021881223310938, 0.08423413208947693, 0.2532986252517025, 0.12621080988242728, 0.06555948701094444, 0.2244799744996258, 0.10127837676892831, 0.2743414055171367, 0.23247712625174405, 0.31640417110612534, 0.1266211139647908, 0.0820446785999432, 0.21023221126070843, 0.31181953047044825, 0.3183061965185359, 0.318023137448885, 0.29449786442169834, 0.1698416709106755, 0.211749675299896, 0.11906675329897455, 0.2528091763207466, 0.2069701701114787, 0.12273104937060149, 0.10989184508075897, 0.2965387313242635, 0.23328152980816036, 0.31721631174697196, 0.12842245909575908, 0.08705974568477304, 0.08348703216867753, 0.3131736158264967, 0.18789263829138964, 0.1484393530784246, 0.3060601361256871, 0.30884785194854886, 0.3062505916799342, 0.2273549111479069, 0.2659951425351212, 0.08608511658131968, 0.31223776836676986, 0.31698727553018713, 0.2809414677533673, 0.25934029705385697, 0.31756677254947385, 0.2657983459420546, 0.31442579274683685, 0.3047988929154387, 0.2440598596289094, 0.14620166102677537, 0.3137380353500831, 0.11142937636493798, 0.2005060874044694, 0.08029779846692525, 0.08027108912838615, 0.0879699483451702, 0.3061670076991954, 0.05967024246819389, 0.30672390317469445, 0.2623310603919375, 0.28763768817010743, 0.16881483685936055, 0.2777918173728127, 0.31741389799393027, 0.31536869384788796, 0.1754267713501977, 0.14097697273826473, 0.3039632885691105, 0.10576328267557633, 0.15933264201137848, 0.3145432536069102, 0.1717120893967333, 0.16436484515111238, 0.2075098278535904, 0.2893074199181562, 0.31718003476209244, 0.22024891993202877, 0.11057999678449394, 0.29437910015900937, 0.18949651998502676, 0.028018107429657893, 0.12392847772956571, 0.31639698860079263, 0.28038312543306604, 0.31828311103919205, 0.16707895410262477, 0.25564561880281483, 0.20766631561934762, 0.30397818108386965, 0.31085855699234727, 0.29471812401928416, 0.30624215711575514, 0.22308407322944138] y2 [0.1226601088104884, 0.30067991209913425, 0.23460478204001126, 0.30979836440637126, 0.28522597122969034, 0.17561677095528921, 0.033006313693259175, 0.1631711488161548, 0.061943483858876026, 0.2894111829142465, 0.18529329133598718, 0.2868404360858639, 0.2787341792760113, 0.29103203881397016, 0.2955008334589555, 0.26459101674031965, 0.2077539669351992, 0.256501318891937, 0.07791816424790188, 0.284701526608942, 0.2592928533687757, 0.22248008738728337, 0.2020801855558459, 0.3105912614541402, 0.25610494757670044, 0.2375159991532291, 0.0811946747859257, 0.17527272464168014, 0.13915197198406518, 0.14485890755524525, 0.23781148259406606, 0.11954513472059826, 0.12481819189915326, 0.06770034792877519, 0.3167678804287447, 0.299332422006145, 0.2960683026151408, 0.08812624284750153, 0.043441098705406384, 0.3086176503364804, 0.1636893914921343, 0.2572586672889589, 0.24417835952459668, 0.3158814364286289, 0.1763784131134832, 0.31450813352983037, 0.3174295063421658, 0.313677414224167, 0.08241551543786106, 0.30031347803678293, 0.19341230272494792, 0.06409911258347302, 0.2532478517708398, 0.15774261148001395, 0.31349383347184584, 0.2222438514050432, 0.11156179417031332, 0.1763259967464483, 0.1399735401634156, 0.2607280695101059, 0.2811842358042634, 0.04396255075547221, 0.2940059199732809, 0.29967258756263987, 0.1501368631142591, 0.15960485996995402, 0.15525912735426417, 0.31773040345401965, 0.3081318917197308, 0.029607958686616656, 0.10791273982039083, 0.09512433543186355, 0.09031725501173367, 0.2624012063749462, 0.12917808912802742, 0.10418493672005955, 0.1809053899835294, 0.08567795211014077, 0.2226083133773214, 0.2042903843784891, 0.06519241751926436, 0.29492605387732107, 0.15932970507808847, 0.14583997745835992, 0.07413769955442559, 0.2780389755666177, 0.20651804900618295, 0.2811972853109285, 0.3081028311005687, 0.17727552734699442, 0.14727132193862658, 0.07399031900413183, 0.21677594353653992, 0.0667554396269113, 0.13344483125225567, 0.303480151984531, 0.23158564010187088, 0.30573023473204225, 0.26874853440800717, 0.2434853540191754]

Task 2 (10 points)

Compute the summary statistics of the two sequences with the summary function. Print the output.

In [7]:
R"summary(y1)"
RObject{RealSxp} Min. 1st Qu. Median Mean 3rd Qu. Max. 0.02802 0.14490 0.23472 0.22065 0.30609 0.31831
In [8]:
R"summary(y2)"
RObject{RealSxp} Min. 1st Qu. Median Mean 3rd Qu. Max. 0.02961 0.13238 0.21227 0.20218 0.28563 0.31773

Task 3 (10 points)

Create a Q-Q plot of y1 vs. the normal distribution using function qqnorm. In general, a Q-Q plot compares the quantiles of two distribution against each other. If the distribution are equal, all quantiles have to lie on the 45 degree line.

In [7]:
R"qqnorm(y1)"
RObject{VecSxp} $x [1] 0.16365849 0.85961736 -0.26631061 -0.03760829 -0.51007346 0.65883769 [7] -0.37185609 -0.13830421 -0.89647336 1.15034938 -0.42614801 -0.48172685 [13] -0.08784484 1.10306256 -0.78919165 1.51410189 1.69539771 -0.34512553 [19] 0.89647336 -0.16365849 0.48172685 0.26631061 1.37220381 2.57582930 [25] 0.51007346 1.31057911 -1.43953147 -0.59776013 0.31863936 0.97411388 [31] -0.01253347 0.42614801 1.05812162 -0.82389363 1.20035886 -0.72247905 [37] 1.25356544 0.18911843 0.39885507 -0.93458929 0.62800601 -0.06270678 [43] -0.21470157 -0.85961736 -0.75541503 0.01253347 0.82389363 0.21470157 [49] 0.08784484 -0.24042603 -1.10306256 0.24042603 -1.01522203 -1.31057911 [55] -1.59819314 0.06270678 0.69030882 -0.18911843 0.03760829 -1.15034938 [61] -0.56805150 0.45376219 0.37185609 -0.39885507 0.78919165 0.34512553 [67] -1.95996398 1.43953147 -0.62800601 1.81191067 -0.69030882 0.75541503 [73] 0.59776013 -0.65883769 -2.57582930 -0.31863936 -1.37220381 -0.11303854 [79] -1.25356544 1.01522203 0.72247905 -0.53883603 -1.05812162 -1.69539771 [85] -0.97411388 -0.45376219 -1.81191067 1.95996398 0.29237490 0.56805150 [91] 1.59819314 2.17009038 0.13830421 0.93458929 -0.29237490 -1.20035886 [97] 0.53883603 -2.17009038 -1.51410189 0.11303854 $y [1] 0.26072088 0.30011584 0.20045009 0.23476988 0.17088179 0.29756555 [7] 0.18757878 0.22419561 0.13729855 0.30836763 0.18409411 0.17562692 [13] 0.23318117 0.30728727 0.14527425 0.31624182 0.31674070 0.19444450 [19] 0.30017229 0.21037114 0.29006895 0.26860365 0.31539209 0.31825181 [25] 0.29014736 0.31264833 0.09768602 0.16541810 0.27134467 0.30618296 [31] 0.23831199 0.28196937 0.30679154 0.13928519 0.30972926 0.14971439 [37] 0.31065585 0.26335855 0.28111018 0.13633835 0.29591982 0.23338715 [43] 0.20271077 0.13852569 0.14750974 0.24535127 0.29975208 0.26705616 [49] 0.25634897 0.20111152 0.12301034 0.26770900 0.12559058 0.10685265 [55] 0.07842834 0.24752295 0.29797340 0.20866270 0.24571157 0.11882695 [61] 0.16939664 0.28338244 0.28031295 0.18611322 0.29970406 0.27171181 [67] 0.06540225 0.31556094 0.16216942 0.31706407 0.15278928 0.29826220 [73] 0.29465057 0.15670240 0.04827158 0.19857911 0.10299512 0.22764642 [79] 0.10820823 0.30669096 0.29817061 0.17040587 0.12334781 0.07748769 [85] 0.12678024 0.17942959 0.07180425 0.31709489 0.26887997 0.29294441 [91] 0.31673852 0.31798343 0.26011437 0.30498913 0.19964728 0.11019036 [97] 0.29119903 0.06341892 0.08121729 0.25968722
In [8]:
R"qqnorm(y2)"
RObject{VecSxp} $x [1] 2.57582930 -1.15034938 -0.51007346 -1.10306256 -0.45376219 -0.75541503 [7] 0.93458929 -1.01522203 -0.29237490 -0.08784484 -0.59776013 -0.03760829 [13] -0.18911843 -0.85961736 -2.17009038 -1.69539771 0.24042603 -0.89647336 [19] -0.93458929 -1.20035886 -0.65883769 0.42614801 0.34512553 0.11303854 [25] 1.25356544 0.03760829 -1.31057911 0.21470157 1.15034938 -0.42614801 [31] -1.37220381 -0.48172685 -1.59819314 -0.13830421 0.78919165 0.18911843 [37] 1.59819314 0.59776013 0.75541503 1.69539771 1.05812162 -1.81191067 [43] 1.51410189 1.81191067 -1.51410189 -1.25356544 0.08784484 0.51007346 [49] 0.16365849 0.01253347 1.10306256 0.85961736 -0.53883603 0.53883603 [55] 0.72247905 -0.21470157 1.20035886 -0.24042603 0.26631061 -2.57582930 [61] -0.01253347 0.39885507 -0.34512553 0.31863936 -1.05812162 0.89647336 [67] 1.95996398 -0.06270678 0.56805150 -0.11303854 -0.72247905 -0.39885507 [73] -0.69030882 -0.82389363 0.37185609 2.17009038 0.29237490 0.62800601 [79] 0.48172685 0.97411388 1.31057911 -0.97411388 -0.62800601 0.69030882 [85] 0.45376219 -1.43953147 -0.78919165 -0.26631061 0.82389363 0.13830421 [91] -0.16365849 -1.95996398 -0.37185609 1.01522203 0.06270678 -0.56805150 [97] 0.65883769 -0.31863936 1.43953147 1.37220381 $y [1] 0.31830724 0.08180947 0.13966936 0.08927641 0.14380490 0.12655127 [7] 0.30139843 0.09581758 0.16969960 0.20546995 0.13493446 0.20729662 [13] 0.18427824 0.12043900 0.04317419 0.05699245 0.24290842 0.10810259 [19] 0.10292986 0.07950540 0.12954004 0.25226271 0.25169151 0.23315769 [25] 0.31209922 0.22176315 0.06961997 0.24180741 0.30900126 0.15564259 [31] 0.06936458 0.14119291 0.05761067 0.19299716 0.29200934 0.24113358 [37] 0.31664238 0.28188134 0.29110993 0.31705774 0.30254058 0.05051039 [43] 0.31605892 0.31789329 0.06639315 0.07343716 0.23215782 0.26890274 [49] 0.24007155 0.21385309 0.30789874 0.29336656 0.13825805 0.26970559 [55] 0.29037794 0.18319106 0.31204491 0.17853173 0.24780728 0.03909708 [61] 0.21146019 0.25189614 0.16147835 0.24977079 0.09322635 0.29737474 [67] 0.31826407 0.20658830 0.27019210 0.19824709 0.12774041 0.15702306 [73] 0.12819398 0.12182486 0.25183954 0.31827873 0.24789283 0.28612982 [79] 0.26693416 0.30209946 0.31220821 0.09723732 0.13278482 0.28730551 [85] 0.26562945 0.06848332 0.12535125 0.17180640 0.29268078 0.23841548 [91] 0.19183825 0.04767098 0.15727348 0.30251165 0.22434912 0.13577782 [97] 0.28692755 0.16949009 0.31600370 0.31540403

Task 4 (30 points)

Write a simple function that returns the fitted residuals from a linear regression y ~ x. In R, having estimated a model with

mod = lm(y ~ x),

residuals can be extracted as

res = resid(mod)

Follow the sketch below.

"""
Fetch the residuals from a linear regression y ~ x.

Parameters
----------
x : Array
    regressor
y : Array
    response
addconstant : Bool
    true if constant should be added to `x`

Returns
-------
res : Array
    of fitted residuals
"""
function ols_resid(x::Array, y::Array; addconstant::Bool=false)

    return
    
end
In [9]:
""" Fetch the residuals from a linear regression y ~ x. Parameters ---------- x : Array regressor y : Array response addconstant : Bool true if constant should be added to `x` Returns ------- res : Array of fitted residuals """ function ols_resid(x::Array, y::Array; addconstant::Bool=false) @rput x y formula = string("mod <- lm(y ~ x", addconstant ? "" : "-1", ")") reval(formula) reval("res <- resid(mod)") @rget res return convert(Array{Float64, 1}, res) end
ols_resid

Task 5 (15 points)

Create a third variable y3 as follows:

y3 = 2.5 - 3*y1 + y2

and use your ols_resid function to fetch the residuals from regressing y3 on y1 (with a constant). Compute their summary statistics using the summary function.

In [10]:
y3 = 2.5 .- 3 .*y1 .+ y2; @rput y3;
In [11]:
ols_resid(y1,y3;addconstant=false); R"summary(res)"
RObject{RealSxp} Min. 1st Qu. Median Mean 3rd Qu. Max. -0.88421 -0.49211 0.02217 0.30212 0.97460 2.23086

Task 6 (25 points)

This excercise is for ambitious students who would like to receive a high mark.

K-means clustering is one of the most commonly used unsupervised machine learning algorithm for partitioning a given dataset into a set of k groups (i.e. k clusters), where k represents the number of groups pre-specified by the analyst. It classifies objects in multiple groups (i.e., clusters), such that objects within the same cluster are as similar as possible (i.e., high intra-class similarity), whereas objects from different clusters are as dissimilar as possible (i.e., low inter-class similarity). In k-means clustering, each cluster is represented by its center (i.e, centroid) which corresponds to the mean of points assigned to the cluster.

Run k-means clustering analysis on the built in R dataset USArrests. This data set contains statistics, in arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas. Set the number of clusters equal to 3 (centers=3) and use 25 (nstart=25) as a starting value for k-means (feel free to experiment with other starting values). Scale the input data to have zero mean and standard deviaiton of unity. Use the fviz_cluster function from the Rpackage factoextra to visualise the result i.e. plot the clusters. The following link is useful for how to use RCall.

Hint: Part of this exercise is to install and load the factoextra package in R.

R"options(repos='http://cran.rstudio.com/')"
"install R package"
"load R package"
In [11]:
R"options(repos='http://cran.rstudio.com/')" R"install.packages('factoextra',lib='/myRpackages/')" R"load.packages('factoextra',lib='/myRpackages/')"
REvalError: Warning in install.packages("factoextra", lib = "/myRpackages/") : 'lib = "/myRpackages/"' is not writable Error in install.packages("factoextra", lib = "/myRpackages/") : unable to install packages Stacktrace: [1] #handle_eval_stderr#41(::Bool, ::typeof(RCall.handle_eval_stderr)) at /home/user/.julia/packages/RCall/g7dhB/src/io.jl:115 [2] #handle_eval_stderr at ./none:0 [inlined] [3] reval_p(::Ptr{LangSxp}, ::Ptr{EnvSxp}) at /home/user/.julia/packages/RCall/g7dhB/src/eval.jl:99 [4] reval_p(::Ptr{RCall.ExprSxp}, ::Ptr{EnvSxp}) at /home/user/.julia/packages/RCall/g7dhB/src/eval.jl:115 [5] reval(::String, ::RObject{EnvSxp}) at /home/user/.julia/packages/RCall/g7dhB/src/eval.jl:132 [6] top-level scope at /home/user/.julia/packages/RCall/g7dhB/src/macros.jl:71 [7] top-level scope at In[11]:2
In [ ]: