Project: math480-2016

Math 480: Open Source Mathematical Software

2016-05-09

William Stein

Lecture 19: Pandas (part 1 of 3)

Notes:

  • Homework is assigned; it and the grading are due this Friday at 6pm

  • Screencast...

  • We will talk about Pandas this week, then statsmodels and numpy/scipy starting next week (rather than waiting until the end).

  • Pandas - overview

  • Pandas foundations ("in 10 minutes")

  • Start on your homework

Pandas Overview

"pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language."

  • Problem pandas solves: data analysis and modeling. pandas enables you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R.

  • Pandas does not implement significant modeling functionality outside of linear and panel regression. Instead, one uses statsmodels ("estimate statistical models, and perform statistical tests") and scikit-learn ("Machine Learning in Python"), which we will look at next week.

  • Look at the overview of functionality at the bottom here: http://pandas.pydata.org/#library-highlights
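
To make the "entire data analysis workflow in Python" idea concrete, here is a minimal sketch of a build, clean, summarize pass. The table and its column names (student, section, score) are made up for illustration and are not from the course data; only standard pandas calls are used.

```python
import pandas as pd

# Hypothetical data, invented for this sketch (not course data).
df = pd.DataFrame({
    "student": ["ann", "bob", "cat", "dan"],
    "section": ["A", "A", "B", "B"],
    "score": [90.0, 80.0, None, 70.0],  # None becomes NaN (missing)
})

# Clean: replace the missing score with the mean of the observed scores.
df["score"] = df["score"].fillna(df["score"].mean())

# Summarize: mean score per section, returned as a Series indexed by section.
section_means = df.groupby("section")["score"].mean()
print(section_means)
```

Filling a missing value with the column mean is just one simple choice here; dropping the row with dropna() would be an equally reasonable alternative, depending on the data.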

Next, let's see some very basic foundations, before you try it out...

  • Look at the very beginning of ten-minutes-to-pandas.sagews in the same directory.

  • Look at the beginning of plotting.sagews in the same directory.

  • Wednesday: pandas101 data example
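
As a rough preview of the foundations, the sketch below builds the two core pandas objects, a Series and a DataFrame, and does a few basic selections. It is not taken from the worksheets above; the values and the column names x and y are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array; NaN marks a missing value.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# A DataFrame is a table of labeled columns; here the rows are indexed by dates.
dates = pd.date_range("2016-05-09", periods=4)
df = pd.DataFrame({"x": [1, 2, 3, 4],
                   "y": [10.0, 20.0, 30.0, 40.0]}, index=dates)

first_rows = df.head(2)    # first two rows
col_x = df["x"]            # select one column (a Series)
big = df[df["y"] > 15]     # keep only rows where y exceeds 15
summary = df.describe()    # count, mean, std, min, quartiles, max per column

print(summary)
```

In a worksheet or notebook, evaluating df or summary on its own line displays it as a formatted table, which is usually the quickest way to explore.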