
r sagews demos


RPy2

RPy2 is a Python module for interacting with R from Python. It exposes R functions, packages, and other objects in Python so that they can be referenced there. Dots (.) in R names, as in dev.off, are automatically converted to underscores (dev_off). Additionally, data conversions for various types can be enabled, first and foremost for NumPy arrays.

%auto
%default_mode python
# pure python mode
import numpy as np
import rpy2
import rpy2.robjects as robjects
from rpy2.robjects import r, pandas2ri
pandas2ri.activate()
rpy2.__version__
'2.8.5'

Referencing R functions

RPy2's robjects module (sometimes imported as ro) exposes the R instance as .r. Functions can be looked up by name and then called from Python:

c = robjects.r['c']
summary = robjects.r['summary']
v1 = c(5,4.4,1,-1.8)
sumv1 = summary(v1)
print sumv1.__repr__()
R object with classes: ('summaryDefault', 'table') mapped to:
<FloatVector - Python:0x7f506223f4d0 / R:0x634af40>
[-1.800000, 0.300000, 2.700000, 2.150000, 4.550000, 5.000000]
print sumv1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  -1.80    0.30    2.70    2.15    4.55    5.00
sumv1[3]
2.15
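
The six numbers in the summary can be checked by hand. The following is a minimal pure-Python sketch, not part of RPy2, which assumes that R's default quantile rule (linear interpolation between order statistics, "type 7") was used; the helper name quantile_type7 is made up for illustration. It also shows why sumv1[3] is the mean: Python indexing is 0-based, so index 3 is the fourth entry.

```python
def quantile_type7(sorted_values, p):
    """Quantile by linear interpolation between order statistics.

    Assumption: this mirrors R's default quantile rule (type 7).
    """
    position = (len(sorted_values) - 1) * p
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

values = sorted([5, 4.4, 1, -1.8])
mean = sum(values) / float(len(values))

# Same order as R's summary(): Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary_by_hand = [
    values[0],
    quantile_type7(values, 0.25),
    quantile_type7(values, 0.5),
    mean,
    quantile_type7(values, 0.75),
    values[-1],
]
print([round(v, 2) for v in summary_by_hand])
```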

Evaluating R code directly

robjects.reval("""\
zz <- 1:10
print(paste("sd(zz) = ", sd(zz)))
""")
[1] "sd(zz) = 3.02765035409749"
<rpy2.rinterface.StrSexpVector - Python:0x7f506e5eac00 / R:0x5a8ceb8>
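
As a cross-check of the value printed by R, here is a small pure-Python sketch of the same computation. It relies on two facts about R: 1:10 includes both end points, and sd() is the sample standard deviation (division by n - 1).

```python
import math

zz = list(range(1, 11))            # R's 1:10 includes both end points
mean = sum(zz) / float(len(zz))

# Sample standard deviation: divide by n - 1, as R's sd() does
variance = sum((z - mean) ** 2 for z in zz) / (len(zz) - 1)
sd_zz = math.sqrt(variance)
print(sd_zz)
```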
myfunc = robjects.r("""\
function(x) {
    a <- x^2 + rnorm(1)
    k <- 2 * a + 1
    return(k)
}""")
myfunc(2.5)
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7f506223fb48 / R:0x56db568>
[12.440484]

Vectorization

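Automatic conversion from NumPy arrays to R arrays has to be enabled first; in this worksheet that happened through the pandas2ri.activate() call in the first cell. After that, even the custom function works out of the box.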

xx = np.array([5,4,2.2,-1,-5.5])
print "Data Type: ", type(xx)
print "Element Type:", xx.dtype
print "Array Shape: ", xx.shape
Data Type: <type 'numpy.ndarray'>
Element Type: float64
Array Shape: (5,)
summary(xx)
R object with classes: ('summaryDefault', 'table') mapped to:
<FloatVector - Python:0x7f506223fd40 / R:0x5bdd258>
[-5.500000, -1.000000, 2.200000, 0.940000, 4.000000, 5.000000]
myfunc(xx)
R object with classes: ('array',) mapped to:
<Array - Python:0x7f506223fc20 / R:0x5b0df58>
[52.274453, 34.274453, 11.954453, 4.274453, 62.774453]
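
One detail of the vectorized call is worth spelling out: rnorm(1) draws a single random number, so every element of the input is shifted by the same noise. The sketch below is a pure-Python analogue (myfunc_py is a made-up name, and the noise is passed in explicitly, which the R function does not do) that makes this visible.

```python
import random

def myfunc_py(xs, noise):
    """Pure-Python analogue of the R function; one noise value for all xs."""
    return [2 * (x ** 2 + noise) + 1 for x in xs]

xx = [5, 4, 2.2, -1, -5.5]
noise = random.gauss(0, 1)          # stands in for R's rnorm(1)
result = myfunc_py(xx, noise)

# Subtracting the deterministic part leaves the same offset, 2 * noise, everywhere
offsets = [k - (2 * x ** 2 + 1) for k, x in zip(result, xx)]
print(offsets)
```

The R output above is consistent with this: each value exceeds 2 * x^2 + 1 by the same amount, about 1.274453.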

Types of Vectors

R's [ ] and [[ ]] operators are available in Python as .rx and .rx2.

# Python style: (10 exclusive)
# v1 = robjects.IntVector(range(1,10))
# R style: (10 inclusive)
v1 = robjects.r.seq(1,10)
print v1
[1] 1 2 3 4 5 6 7 8 9 10
# Python style, 0-based indexing of vectors
print v1[0]
v1[0] = -99
print v1
1
[1] -99 2 3 4 5 6 7 8 9 10
# R style, 1-based indexing
v1.rx[3] = 99
print v1.rx(3)
[1] 99
print v1
[1] -99 2 99 4 5 6 7 8 9 10
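
The difference between the two indexing styles can be modelled with a plain Python list. The class below is a hypothetical toy, not RPy2's implementation: it only maps a 1-based index to the underlying 0-based one, and unlike .rx it returns a plain element rather than an R vector.

```python
class OneBasedView(object):
    """Hypothetical helper giving R-style, 1-based access to a Python list."""

    def __init__(self, data):
        self._data = data

    def __call__(self, i):
        # Read access, analogous in spirit to v.rx(i)
        return self._data[i - 1]

    def __setitem__(self, i, value):
        # Write access, analogous in spirit to v.rx[i] = value
        self._data[i - 1] = value

values = list(range(1, 11))   # 1..10, like seq(1, 10)
rx = OneBasedView(values)

values[0] = -99               # Python style: the first element has index 0
rx[3] = 99                    # R style: the third element has index 3
print(values)
print(rx(3))
```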
l1 = robjects.r("list(aa = c(1,2,3), bb = c(-5,5), cc = 'help')")
print l1
$aa
[1] 1 2 3

$bb
[1] -5 5

$cc
[1] "help"

# R's [[1]]
print l1.rx2(1)
[1] 1 2 3
# indexing into the element [[1]]
print l1.rx2(1).rx(2)
[1] 2
# versus
print l1.rx2(1)[1]
2.0
# Constructing the same from Python is harder, since we need an ordered dictionary
import rpy2.rlike.container as rlc
l2 = robjects.ListVector(
    rlc.OrdDict((
        ('aa', robjects.IntVector([1,2,3])),
        ('bb', robjects.IntVector([-5,5])),
        ('cc', "help"))))
print l2
$aa
[1] 1 2 3

$bb
[1] -5 5

$cc
[1] "help"
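
Why an ordered dictionary is needed can be illustrated without R. A named R list has a definite element order, so positional access such as [[1]] only makes sense if the names keep their order. The sketch below uses collections.OrderedDict from the standard library as a stand-in for rlc.OrdDict; treating the two as analogous for this purpose is an assumption.

```python
from collections import OrderedDict

# Build the mapping from a sequence of (name, value) pairs so that
# the order of the names is explicit.
items = OrderedDict((
    ('aa', [1, 2, 3]),
    ('bb', [-5, 5]),
    ('cc', "help"),
))

names = list(items.keys())
print(names)

# Positional access, like R's l[[1]], depends on that order being stable
first_name = names[0]
print(items[first_name])
```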
# assigning a new string vector to "bb"
l1.rx2["bb"] = robjects.StrVector("this is a short sentence".split())
print(l1[l1.names.index("bb")])
[1] "this" "is" "a" "short" "sentence"
# Matrix
m = robjects.r.matrix(range(10), nrow=5)
print(m)
     [,1] [,2]
[1,]    0    5
[2,]    1    6
[3,]    2    7
[4,]    3    8
[5,]    4    9
type(m)
<class 'rpy2.robjects.vectors.Matrix'>
m.rx2(4,2)
R object with classes: ('integer',) mapped to:
<IntVector - Python:0x7f506224acf8 / R:0x60528c8>
[ 8]
# R-operators work, too
print(m.ro > 5)
      [,1]  [,2]
[1,] FALSE FALSE
[2,] FALSE  TRUE
[3,] FALSE  TRUE
[4,] FALSE  TRUE
[5,] FALSE  TRUE
print m.rx((m.ro > 3).ro & (m.ro <= 6))
[[1]]
[1] 4

[[2]]
[1] 5

[[3]]
[1] 6
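
The matrix outputs above follow from R filling a matrix column by column. The following pure-Python sketch (column_major_matrix is a made-up helper working on nested lists) rebuilds the same layout, the element at row 4, column 2, and the selection of entries greater than 3 and at most 6.

```python
def column_major_matrix(values, nrow):
    """Arrange values column by column, as R's matrix() does by default."""
    values = list(values)
    ncol = len(values) // nrow
    return [[values[col * nrow + row] for col in range(ncol)]
            for row in range(nrow)]

m = column_major_matrix(range(10), nrow=5)
for row in m:
    print(row)

# R's m[4, 2] is 1-based; with 0-based Python lists that is m[3][1]
element = m[3][1]
print(element)

# Element-wise comparison, then selection of the matching entries
mask = [[(v > 3) and (v <= 6) for v in row] for row in m]
# Walk the entries column by column, the order in which R stores them
selected = [m[r][c] for c in range(2) for r in range(5) if mask[r][c]]
print(selected)
```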
sv = robjects.StrVector('xyyyxyzyzyxx')
fac = robjects.FactorVector(sv)
print(fac)
 [1] x y y y x y z y z y x x
Levels: x y z
print(summary(fac))
x y z
4 6 2
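
The factor summary is a count per level. As a rough pure-Python counterpart, assuming that the levels are the sorted distinct values (which matches the output above), collections.Counter gives the same table:

```python
from collections import Counter

letters = list('xyyyxyzyzyxx')     # a string iterates character by character
levels = sorted(set(letters))      # distinct values, sorted like the levels above
counts = Counter(letters)

print(levels)
print([counts[level] for level in levels])
```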

Packages

The idea is to get hold of a reference to a package. The reference behaves like a module namespace that is populated with all of the package's members.

from rpy2.robjects.packages import importr
r_base = importr("base")
# a bit of the namespace
print(dir(r_base)[-50:-40])
['upper_tri', 'url', 'utf8ToInt', 'vapply', 'vector', 'version', 'warning', 'warnings', 'weekdays', 'weekdays_Date']
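
Names such as upper_tri and weekdays_Date in this listing are R's upper.tri and weekdays.Date with the dots replaced. The sketch below shows the idea of that translation in pure Python; both helper names are hypothetical, and RPy2's actual logic is more involved than this.

```python
def r_name_to_python(r_name):
    """Hypothetical helper: replace the dots of an R name by underscores."""
    return r_name.replace('.', '_')

def translate_names(r_names):
    """Map translated names back to R names and refuse ambiguous translations."""
    translated = {}
    for r_name in r_names:
        py_name = r_name_to_python(r_name)
        if py_name in translated:
            raise ValueError("%s and %s both map to %s"
                             % (translated[py_name], r_name, py_name))
        translated[py_name] = r_name
    return translated

mapping = translate_names(['upper.tri', 'weekdays.Date', 'vapply', 'utf8ToInt'])
for py_name in sorted(mapping):
    print("%-15s <- %s" % (py_name, mapping[py_name]))
```

The conflict check matters because two distinct R names, for example a.b and a_b, would otherwise collapse to the same Python name.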
print(r_base.version)
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status         Revised
major          3
minor          2.4
year           2016
month          03
day            16
svn rev        70336
language       R
version.string R version 3.2.4 Revised (2016-03-16 r70336)
nickname       Very Secure Dishes
# use Python's `getattr` to access identifiers with non-standard names,
# e.g. matrix multiplication
A = np.array([[1, 1], [1, 7]])
B = np.array([[4, 5], [6, 7]])
matrix_mult = getattr(r_base, "%*%")
print(matrix_mult(A, B))
     [,1] [,2]
[1,]   10   12
[2,]   46   54
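
The reason for getattr is that %*% is not a valid Python identifier, so it cannot follow a dot in attribute syntax. A minimal sketch with a plain Python object (the Namespace class and matrix_product function are stand-ins invented here, not RPy2 objects) shows the mechanism and reproduces the product:

```python
class Namespace(object):
    """Empty container standing in for an imported R package."""

def matrix_product(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

ns = Namespace()
# "ns.%*% = ..." would be a syntax error, so the name is passed as a string
setattr(ns, "%*%", matrix_product)
matrix_mult = getattr(ns, "%*%")

A = [[1, 1], [1, 7]]
B = [[4, 5], [6, 7]]
product = matrix_mult(A, B)
print(product)
```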
r_base.rep(r_base.c("x", "y", "z"), 10)
R object with classes: ('character',) mapped to:
<StrVector - Python:0x7f5061ddff80 / R:0x65295e0>
[str, str, str, ..., str, str, str]
from rpy2.robjects.packages import importr
# datasets
datasets = importr('datasets')
# Note: the __rdata__ should be a plain "data", but doesn't work in this development version.
faithful = datasets.__rdata__.fetch("faithful")["faithful"]
print type(faithful)
<class 'rpy2.robjects.vectors.DataFrame'>
# number of columns!
len(faithful)
2
# S3 datatypes in R for each column
[column.rclass[0] for column in faithful]
['numeric', 'numeric']
# extract some rows
print(faithful.rx(robjects.IntVector([2,3,4,10]), True))
   eruptions waiting
2      1.800      54
3      3.333      74
4      2.283      62
10     4.350      85
# extract part of a column
print(faithful.rx2("eruptions")[:10])
[1] 3.600 1.800 3.333 2.283 4.533 2.883 4.700 3.600 1.950 4.350

Example: lm

data = robjects.DataFrame({
    'y' : np.array([4, 5, 5.5, 7, 7.6, 8, 11, 12, 13]),
    'x' : np.array([1, 2, 3, 4, 4.4, 5, 7, 8, 8.5])
})
lmod = robjects.r.lm("y ~ x", data = data)
print lmod.names
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"
coeffs = lmod.rx2("coefficients")
print "R's representation via 'print'"
print(coeffs)
print
print "Same coefficients in Python's floats:"
print ([x for x in coeffs])
R's representation via 'print'
(Intercept)           x
   2.328485    1.215469

Same coefficients in Python's floats:
[2.3284853249475894, 1.2154692791485244]
# max is from Python, iterates naturally over the entries in all residuals
print max(lmod.rx2("residuals"))
0.456045395904
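
For a model with a single predictor, the coefficients reported by lm can be reproduced with the closed-form least-squares formulas. The following pure-Python sketch uses the same data; it is a cross-check, not how R computes the fit internally (the names list above suggests R works via a QR decomposition).

```python
x = [1, 2, 3, 4, 4.4, 5, 7, 8, 8.5]
y = [4, 5, 5.5, 7, 7.6, 8, 11, 12, 13]
n = float(len(x))

mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares estimates for y = intercept + slope * x
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("intercept = %.6f, slope = %.6f" % (intercept, slope))
print("largest residual = %.6f" % max(residuals))
```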

Plot

grdevices = importr('grDevices')
# just calling "plot" on the dataframe
_ = grdevices.png(file="rpy2_plot.png", width=640, height=320)
_ = robjects.r.plot(data)
grdevices.dev_off()
R object with classes: ('integer',) mapped to:
<IntVector - Python:0x7f5061dfb830 / R:0x75524b8>
[ 1]
salvus.file("rpy2_plot.png")
# Plot of the linear model lmod
_ = grdevices.png(file="rpy2_plot_2.png", width=640, height=520)
_ = robjects.reval("par(mfrow=c(2,2))")
_ = robjects.r.plot(lmod)
grdevices.dev_off()
R object with classes: ('integer',) mapped to:
<IntVector - Python:0x7f5061ec8878 / R:0x78dad28>
[ 1]
salvus.file("rpy2_plot_2.png")
# get R's "print" via globalenv, otherwise it's a syntax error in Python!
rprint = robjects.globalenv.get("print")
volcano = datasets.__rdata__.fetch("volcano")["volcano"]
lattice = importr("lattice")
_ = grdevices.png(file="rpy2_plot_wireframe.png", width=480, height=480)
p = lattice.wireframe(volcano, shade = True,
                      zlab = "",
                      aspect = robjects.FloatVector((61.0/87, 0.5)),
                      light_source = robjects.IntVector((10,0,10)))
_ = rprint(p)
grdevices.dev_off()
R object with classes: ('integer',) mapped to:
<IntVector - Python:0x7f5061ebcd40 / R:0x83f4d78>
[ 1]
salvus.file("rpy2_plot_wireframe.png")

Advanced: PCA

USArrests = datasets.__rdata__.fetch("USArrests")["USArrests"]
r_stats = importr("stats")
pca_usarrest = r_stats.princomp(USArrests, cor=True)
print(summary(pca_usarrest))
Importance of components:
                          Comp.1    Comp.2    Comp.3     Comp.4
Standard deviation     1.5748783 0.9948694 0.5971291 0.41644938
Proportion of Variance 0.6200604 0.2474413 0.0891408 0.04335752
Cumulative Proportion  0.6200604 0.8675017 0.9566425 1.00000000
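
The second and third rows of this summary follow from the first: each component's share is its variance (the squared standard deviation) divided by the total variance. A short pure-Python sketch, using the standard deviations copied from the output above, reproduces them up to rounding:

```python
# Standard deviations of the four components, copied from the summary output
sdev = [1.5748783, 0.9948694, 0.5971291, 0.41644938]

variances = [s ** 2 for s in sdev]
total = sum(variances)

proportion = [v / total for v in variances]
cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print("Comp.%d  proportion %.7f  cumulative %.7f" % (i, p, c))
```

Because the PCA was run with cor=True on four variables, the variances sum to approximately 4.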
grdevices = importr('grDevices')
_ = grdevices.png(file="rpy2_plot_pca.png", width=480, height=480)
_ = robjects.r.biplot(pca_usarrest)
_ = grdevices.dev_off()
salvus.file("rpy2_plot_pca.png")
# low level
print(robjects.r.help("sum"))
R Help on ‘sum’

sum                    package:base                    R Documentation

Sum of Vector Elements

Description:

     ‘sum’ returns the sum of all the values present in its arguments.

Usage:

     sum(..., na.rm = FALSE)

Arguments:

     ...: numeric or complex or logical vectors.

   na.rm: logical. Should missing values (including ‘NaN’) be removed?

Details:

     This is a generic function: methods can be defined for it directly
     or via the ‘Summary’ group generic. For this to work properly, the
     arguments ‘...’ should be unnamed, and dispatch is on the first
     argument.

     If ‘na.rm’ is ‘FALSE’ an ‘NA’ or ‘NaN’ value in any of the
     arguments will cause a value of ‘NA’ or ‘NaN’ to be returned,
     otherwise ‘NA’ and ‘NaN’ values are ignored.

     Logical true values are regarded as one, false values as zero. For
     historical reasons, ‘NULL’ is accepted and treated as if it were
     ‘integer(0)’.

     Loss of accuracy can occur when summing values of different signs:
     this can even occur for sufficiently long integer inputs if the
     partial sums would cause integer overflow. Where possible
     extended-precision accumulators are used, but this is
     platform-dependent.

Value:

     The sum. If all of ‘...’ are of type integer or logical, then the
     sum is integer, and in that case the result will be ‘NA’ (with a
     warning) if integer overflow occurs. Otherwise it is a length-one
     numeric or complex vector.

     *NB:* the sum of an empty set is zero, by definition.

S4 methods:

     This is part of the S4 ‘Summary’ group generic. Methods for it
     must use the signature ‘x, ..., na.rm’.

     ‘plotmath’ for the use of ‘sum’ in plot annotation.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole.

See Also:

     ‘colSums’ for row and column sums.

Examples:

     ## Pass a vector to sum, and it will add the elements together.
     sum(1:5)

     ## Pass several numbers to sum, and it also adds the elements.
     sum(1, 2, 3, 4, 5)

     ## In fact, you can pass vectors into several arguments, and everything gets added.
     sum(1:2, 3:5)

     ## If there are missing values, the sum is unknown, i.e., also missing, ....
     sum(1:5, NA)
     ## ... unless we exclude missing values explicitly:
     sum(1:5, NA, na.rm = TRUE)
# via RPy2 wrappers
help_sum = robjects.help.Package("base").fetch("sum")
print(help_sum.title())
Sum of Vector Elements
print(help_sum.description())
\code{sum} returns the sum of all the values present in its arguments.
print(help_sum.usage())
sum(\dots, na.rm = FALSE)
for arg, descr in help_sum.arguments():
    print("%-10s: %s" % (arg, descr))
...       : numeric or complex or logical vectors.
na.rm     : logical. Should missing values (including \code{NaN}) be removed?
print(help_sum.seealso())
\code{\link{colSums}} for row and column sums.
print(help_sum.value())
The sum. If all of \code{\dots} are of type integer or logical, then the sum is integer, and in that case the result will be \code{NA} (with a warning) if integer overflow occurs. Otherwise it is a length-one numeric or complex vector. \strong{NB:} the sum of an empty set is zero, by definition.
help_sum.sections.keys()
('title', 'name', 'alias', 'keyword', 'description', 'usage', 'arguments', 'details', 'value', 'section', 'references', 'seealso', 'examples')
print(''.join(help_sum.to_docstring(("title", "usage", 'details', "references", "section"))))
title
-----

Sum of Vector Elements

usage
-----

sum( , na.rm = FALSE)

details
-------

This is a generic function: methods can be defined for it directly or via the Summary group generic. For this to work properly, the arguments should be unnamed, and dispatch is on the first argument.

If na.rm is FALSE an NA or NaN value in any of the arguments will cause a value of NA or NaN to be returned, otherwise NA and NaN values are ignored.

Logical true values are regarded as one, false values as zero. For historical reasons, NULL is accepted and treated as if it were integer(0) .

Loss of accuracy can occur when summing values of different signs: this can even occur for sufficiently long integer inputs if the partial sums would cause integer overflow. Where possible extended-precision accumulators are used, but this is platform-dependent.

references
----------

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language . Wadsworth & Brooks/Cole.

section
-------

S4 methods

This is part of the S4 Summary group generic. Methods for it must use the signature x, , na.rm .

plotmath for the use of sum in plot annotation.