This repository contains the course materials from Math 157: Intro to Mathematical Software.
Creative Commons BY-SA 4.0 license.
Math 157: Intro to Mathematical Software
UC San Diego, winter 2018
February 21, 2018: Introduction to R and statistics (part 1 of 2)
Administrivia:
HW 4 has been returned.
The solution set for HW 5 has been distributed.
The final project will be assigned shortly (target: end of this week). Look for a folder called assignments/2018-03-16 for both parts.
The R Project
The R Project (or just "R" for short) is an open-source project which provides a full programming environment for scientific computation. Although in principle it provides comparable functionality to Mathematica, MATLAB, and Sage, it has become entrenched primarily as a tool for statistics. (It started out as an open-source clone of a system called "S".)
According to one popularity index based on search-engine results, as of this writing R is among the 15 most widely used programming languages worldwide. It ranks behind Java, C, C++, and Python, but somewhat ahead of MATLAB.
There is a massive R ecosystem; the Comprehensive R Archive Network currently lists more than 12000 packages! The CoCalc installation of R includes some of these packages without any extra download required.
Warning: I am neither a statistician nor a frequent R user. Please bear with me!
Accessing R from Jupyter
There are several ways to access R from Jupyter. If you are using R by itself, then your best bet is to set the kernel to "R (R Project)". Try this now.
R ships with the datasets package, a collection of sample data sets (for example, sunspots, the monthly sunspot numbers used below). This is an incredible resource for practicing basic concepts of statistics!
If you are using Python, you can use the rpy2 package. This allows you to perform R operations "within Python", so that you can migrate results in and out of Python. (Note: rpy2 is not included in a standard Python install, but CoCalc provides it.)
This isn't a perfect solution, though.
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-7-e54ef377d9fd> in <module>()
----> 1 robjects.r.plot(sunspots, main = "sunspots data", xlab = "Year", ylab = "Monthly sunspot numbers") # This won't work.
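NameError: name 'sunspots' is not defined
Here Python looks for a Python variable named sunspots. The R object has to be fetched explicitly, as in robjects.r['sunspots'], or the entire plot command has to be passed to R as a string.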
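For context, sunspots is an R time series (a ts object) of monthly values running from January 1749 through 1983, 2820 observations in all, which is why the plot labels its horizontal axis "Year". The sketch below mimics in plain Python how a ts with a given start and frequency assigns a time value to its i-th observation. The helper names ts_time and year_month are hypothetical, and their defaults simply encode the sunspots start date and monthly frequency as assumptions.

```python
def ts_time(i, start=1749.0, frequency=12):
    """Time value of the i-th observation (1-based) of a regular time series.

    Mirrors what R's time(x) reports for a ts with this start and frequency;
    the defaults assume sunspots' start (January 1749) and monthly frequency.
    """
    return start + (i - 1) / frequency


def year_month(i, start_year=1749, frequency=12):
    """Calendar (year, month) of the i-th monthly observation (1-based)."""
    offset = i - 1
    return start_year + offset // frequency, offset % frequency + 1


print(ts_time(1), year_month(1))        # first observation
print(ts_time(2820), year_month(2820))  # last observation
```

With 235 years of 12 months each, the last index 2820 lands in December 1983.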
One can easily transfer information from an R array into Python, using Python syntax.
It's not quite as obvious how to go the other way...
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-11-5b150eebabbf> in <module>()
----> 1 robjects.r([Integer(1),Integer(2),Integer(3)])
/ext/sage/sage-8.1/local/lib/python2.7/site-packages/rpy2/robjects/__init__.pyc in __call__(self, string)
356
357 def __call__(self, string):
--> 358 p = _rparse(text=StrSexpVector((string,)))
359 res = self.eval(p)
360 return conversion.ri2py(res)
RRuntimeError: Error in (function (file = "", n = NULL, text = NULL, prompt = "?", keep.source = getOption("keep.source"), :
<text>:1:1: unexpected '['
1: [
^
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-13-c091c3840841> in <module>()
----> 1 robjects.r(numpy.array([Integer(2),Integer(3),Integer(4)]))
/ext/sage/sage-8.1/local/lib/python2.7/site-packages/rpy2/robjects/__init__.pyc in __call__(self, string)
356
357 def __call__(self, string):
--> 358 p = _rparse(text=StrSexpVector((string,)))
359 res = self.eval(p)
360 return conversion.ri2py(res)
RRuntimeError: Error in (function (file = "", n = NULL, text = NULL, prompt = "?", keep.source = getOption("keep.source"), :
<text>:1:1: unexpected '['
1: [
^
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-5e60702c68fd> in <module>()
----> 1 robjects.r(x) # This won't work.
/ext/sage/sage-8.1/local/lib/python2.7/site-packages/rpy2/robjects/__init__.pyc in __call__(self, string)
356
357 def __call__(self, string):
--> 358 p = _rparse(text=StrSexpVector((string,)))
359 res = self.eval(p)
360 return conversion.ri2py(res)
ValueError: Error raised when calling str() for element 0.
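... because R needs a bit more coaching than Python. robjects.r(...) parses its argument as R source code (note the _rparse(text=...) call in each traceback), so a Python list or NumPy array is handed to R as text beginning with "[", which is not valid R syntax.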
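One way to give R that coaching is to render the Python data as R source text before handing it over. The sketch below is a hypothetical helper (to_r_vector_source is not part of rpy2) that turns a flat Python sequence into an R vector literal such as c(2, 3, 4), which R can parse where "[2, 3, 4]" fails. It only handles Python bool, int, float, and str; under the Sage kernel, literals become Sage Integers, so convert them with int() first.

```python
import math


def to_r_vector_source(values):
    """Render a flat Python sequence as R source text for an atomic vector.

    Hypothetical helper: R rejects the Python text "[1, 2, 3]" but parses "c(1, 2, 3)".
    """
    parts = []
    for v in values:
        if isinstance(v, bool):  # check bool before int: bool is a subclass of int
            parts.append("TRUE" if v else "FALSE")
        elif isinstance(v, int):
            parts.append(str(v))  # R reads a bare 2 as a double; 2L would force an integer
        elif isinstance(v, float):
            if math.isnan(v):
                parts.append("NaN")
            elif math.isinf(v):
                parts.append("Inf" if v > 0 else "-Inf")
            else:
                parts.append(repr(v))  # e.g. 0.1 or 1e+300, both valid R numerals
        elif isinstance(v, str):
            escaped = v.replace("\\", "\\\\").replace('"', '\\"')
            parts.append('"' + escaped + '"')
        else:
            raise TypeError("unsupported element type: " + type(v).__name__)
    return "c(" + ", ".join(parts) + ")"


print(to_r_vector_source([2, 3, 4]))  # c(2, 3, 4)
```

With rpy2 loaded, a string built this way, e.g. robjects.r("sum(" + to_r_vector_source([2, 3, 4]) + ")"), should then evaluate in R. For numeric data, rpy2's own vector classes such as robjects.IntVector avoid the round trip through text altogether.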
This should all work in SageMath also. Try switching the kernel to "SageMath 8.1" and see for yourself!
rpy2 also provides an IPython extension (load it with %load_ext rpy2.ipython) whose %%R cell magic lets you switch back and forth between Python and R at the level of individual cells.
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-3-009520053b00> in <module>()
----> 1 y
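NameError: name 'y' is not defined
Variables created in an %%R cell live in the R session, not in Python; to copy one back into Python, name it with the -o option (for example, %%R -o y).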
There is also a %R line magic, which switches into R for a single line of code.
Within Sage, there is yet another option: Sage's own interface to R. If your kernel is not currently "SageMath 8.1", switch it now.
If you assign r as a variable name, you will of course lose the predefined value. But there is another way to get a hold of it...
Did that actually fail? Check your file directory.
| | Unnamed: 0 | x |
|---|---|---|
| count | 2820.000000 | 2820.000000 |
| mean | 1410.500000 | 51.265957 |
| std | 814.208204 | 43.448971 |
| min | 1.000000 | 0.000000 |
| 25% | 705.750000 | 15.700000 |
| 50% | 1410.500000 | 42.000000 |
| 75% | 2115.250000 | 74.925000 |
| max | 2820.000000 | 253.800000 |
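The summary table above is what pandas' describe() reports for the sunspots CSV; the Unnamed: 0 column presumably holds R's row numbers 1 to 2820, which R's write.csv writes as an unlabeled first column by default. As a rough cross-check, the sketch below recomputes the same rows with the standard library only, assuming pandas' defaults: the sample standard deviation (n - 1 denominator) and quartiles by linear interpolation between order statistics. The function names are hypothetical.

```python
import math
import statistics


def linear_quantile(sorted_values, q):
    """Quantile q (0 <= q <= 1) by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def describe(values):
    """Summary statistics laid out like the rows of pandas' describe()."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    return {
        "count": len(xs),
        "mean": statistics.mean(xs),
        "std": statistics.stdev(xs),  # sample standard deviation (n - 1 denominator)
        "min": xs[0],
        "25%": linear_quantile(xs, 0.25),
        "50%": linear_quantile(xs, 0.50),
        "75%": linear_quantile(xs, 0.75),
        "max": xs[-1],
    }


print(describe(range(1, 2821)))  # the row numbers 1..2820
```

Applied to the row numbers 1 to 2820, it reproduces the Unnamed: 0 column of the table, for example 705.75, 1410.5, and 2115.25 for the quartiles.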
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-32-650a4c59e077> in <module>()
----> 1 sunspots.iloc(Integer(1))
/ext/sage/sage-8.1/local/lib/python2.7/site-packages/pandas/core/indexing.pyc in __call__(self, axis)
101
102 if axis is not None:
--> 103 axis = self.obj._get_axis_number(axis)
104 new_self.axis = axis
105 return new_self
/ext/sage/sage-8.1/local/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_axis_number(self, axis)
355 pass
356 raise ValueError('No axis named {0} for object type {1}'
--> 357 .format(axis, type(self)))
358
359 def _get_axis_name(self, axis):
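ValueError: No axis named 1 for object type <class 'pandas.core.frame.DataFrame'>
The fix is to index with square brackets: sunspots.iloc[1] returns the second row. Calling iloc with parentheses instead interprets the argument as an axis, as the traceback shows.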