\n", "References: \n", "\n", "* Section 2.1 (pp. 5 - 17), _An Introduction to Statistics with Python_ (2016) by Thomas Haslwanter.\n", "* \"Python for Data Analysis\" (October 2016) by Andrew Tedstone __[http://chris35wills.github.io/courses/pydata_stack/](http://chris35wills.github.io/courses/pydata_stack/)__\n", "* _Python Package Index_ by Python Software Foundation __[https://pypi.org](https://pypi.org)__\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "source": [ "### Python Packages for Statistics\n", "\n", "

\n", "* __[SciPy](https://pypi.org/project/scipy/)__ (_scipy_ 1.1.0): builds closely on _NumPy_, providing more advanced numerical methods, integration, and ordinary differential equation (ODE) solvers. For statistical data analysis, `scipy.stats` contains the algorithms for basic statistics.\n", "

\n", "* __[Matplotlib](https://pypi.org/project/matplotlib/)__ (_matplotlib_ 2.2.3): _Python’s_ main graphing/plotting library. The documentation on the __[*Matplotlib* website](https://matplotlib.org)__ is good, especially the gallery. \n", "

\n", "* __[JuPyter](https://pypi.org/project/jupyter/)__ (_jupyter_ 1.0.0): rather than using the interactive _IPython_ command line, during class we will use _Python_ in a ‘notebook’ style from inside the web browser. This keeps the commands and their outputs together in a single document that you can reference later on. " ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "source": [ "### Analyzing and Manipulating Data\n", "\n", "* __[Pandas](https://pypi.org/project/pandas/)__ (_pandas_ 0.23.4): provides fast, flexible, and expressive data structures (called _DataFrames_) designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in _Python_.\n", "\n", "* __[patsy](https://pypi.org/project/patsy/)__ (_patsy_ 0.5.0): for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. Brings the convenience of *R* \"formulas\" to _Python_.\n", "\n", "* __[xlrd](https://pypi.org/project/xlrd/)__ (_xlrd_ 1.1.0): extracts data from _MS Excel_ spreadsheets on any platform.\n", "\n", "* __[PyMC](https://pypi.org/project/pymc/)__ (_pymc_ 2.3.6): for Bayesian statistics, including Markov chain Monte Carlo simulations.\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "* __[scikit-learn](https://pypi.org/project/scikit-learn/)__ (_scikit-learn_ 0.19.2): machine learning tools for _Python_. Increasingly popular; contains many of the main algorithms used in this field, such as K-means clustering. 
\n", "\n", "* __[scikits.bootstrap](https://pypi.org/project/scikits.bootstrap/)__ (_scikits.bootstrap_ 1.0.0): provides bootstrap confidence interval algorithms for _SciPy_.\n", "\n", "* __[scikit-image](https://pypi.org/project/scikit-image/)__ (_scikit-image_ 0.14.0): a collection of tools for image analysis, including analysis of satellite images.\n", "\n", "* __[lifelines](https://pypi.org/project/lifelines/)__ (_lifelines_ 0.14.6): survival analysis in _Python_.\n", "\n", "* __[xarray](https://pypi.org/project/xarray/)__ (_xarray_ 0.10.8): brings the labeled data power of _Pandas_ to the physical sciences, by providing N-dimensional variants of the core _Pandas_ data structures. Provides a _pandas_-like and _pandas_-compatible toolkit for analytics on multi-dimensional arrays." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "source": [ "### Advanced Statistics\n", "\n", "* __[Statsmodels](https://pypi.org/project/statsmodels/)__ (_statsmodels_ 0.9.0): provides implementations of many of the major statistical algorithms. Works best with _Pandas DataFrames_. Has the option of using *R*-like syntax, which you’ll probably like if you’re familiar with *R*.\n", "

\n", "* __[seaborn](https://pypi.org/project/seaborn/)__ (_seaborn_ 0.9.0): a set of statistical plotting tools which extends the plotting abilities of _matplotlib_. The plots look very elegant. Well worth looking at if you do a lot of statistical work. Takes _Pandas DataFrames_ as standard." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "source": [ "### Other Application-Dependent Packages\n", "\n", "* __[SPy](https://pypi.org/project/spectral/)__ (_spectral_ 0.19): for processing hyperspectral image data (imaging spectroscopy data). It has functions for reading, displaying, manipulating, and classifying hyperspectral imagery.\n", "* __[AstroPy](https://pypi.org/project/astropy/)__ (_astropy_ 3.0.4): contains core functionality and some common tools needed for performing astronomy and astrophysics research with _Python_.\n", "* __[PyTables](https://pypi.org/project/tables/)__ (_tables_ 3.4.4): for managing hierarchical datasets; designed to cope efficiently with extremely large amounts of data (note that _Pandas_ does this pretty well for the most part).\n", "* __[Bokeh](https://pypi.org/project/bokeh/)__ (_bokeh_ 0.13.0): for interactive plotting.\n", "* __[CartoPy](https://pypi.org/project/Cartopy/)__ (_cartopy_ 0.16.0): for geographic plotting. Requires _Proj4_ 4.9.3 to be installed.\n", "* __[Matplotlib basemap](https://pypi.org/project/basemap/)__ (_basemap_ 1.0.7): an add-on toolkit for _matplotlib_ that lets you plot data on map projections with coastlines, lakes, rivers and political boundaries. See http://matplotlib.github.com/basemap/users/examples.html for examples of what it can do.\n", "* __[GDAL](https://pypi.org/project/GDAL/)__ (_gdal_ 2.3.1) and OGR: geographic transformations and warping. Fantastic and the gold standard if you can get it to work; expect a bit of a fight, but it is well worth it.\n", "* __[PySAL](https://pypi.org/project/PySAL/)__ (_PySAL_ 1.14.4.post2): Spatial Analysis Library. 
Particularly good at spatial econometrics, location modelling..." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "source": [ "### Python Tips\n", "\n", "Packages should be imported with their commonly used names:\n", ">`import numpy as np`

\n", ">`import matplotlib.pyplot as plt`

\n", ">`import scipy as sp`

\n", ">`import pandas as pd`

\n", ">`import seaborn as sns`

" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "source": [ "### Lesson Summary\n", "\n", "* _Python_ and _IPython_ provide the foundation for conducting statistical analysis in _Python_.\n", "\n", "\n", "* The basic building blocks consist of the _JuPyter_, _NumPy_, _SciPy_, and _Matplotlib_ packages.\n", "\n", "\n", "* _Pandas_ enables the use of _Series_ (1-dimensional) and _DataFrames_ (2-dimensional) to conduct statistical analysis.\n", "\n", "\n", "* _Statsmodels_ and _Seaborn_ provide advanced statistical modeling, analysis, and visualization capabilities in _Python_.\n", "\n", "\n", "* _Python_ packages for statistics can be managed using the package manager in _Anaconda_ (preferred) or through the use of the _Python Package Index_ (_PyPI_)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] } ], "metadata": { "anaconda-cloud": { }, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "SageMath (stable)", "language": "sagemath", "name": "sagemath" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": 
"2.7.15" } }, "nbformat": 4, "nbformat_minor": 0 }