Learn to Code for Data Analysis | Jez Phipps


The Learn to Code for Data Analysis course is a free MOOC (Massive Open Online Course) created by the Open University and hosted by FutureLearn, a UK-based social learning platform.

Over 4 weeks, this hands-on course introduced the use of:

  • Python 3 and the pandas data analysis module
  • the Jupyter Notebook application
  • SageMathCloud (an online resource providing access to open source maths tools) and/or the Anaconda distribution (for offline work)

Having completed the course, I decided to create this online record of my learning.

My FutureLearn profile can be found here: Jez Phipps


Week 1

In the first week, students learned how to:

  • set up the Anaconda and/or SageMathCloud environment(s)
  • use Jupyter notebooks to write and execute simple programs with Python/pandas
  • load data from Excel and compute simple statistics from selected data
  • create new columns, derived from calculations on data from other columns
  • sort table data
  • write a simple data analysis report and share it online
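
As a rough illustration of the Week 1 techniques, here is a minimal sketch that computes simple statistics, derives a new column and sorts a table. It is not the course's own code: the table is built in memory rather than loaded from Excel, and every column name and value is invented for the example.

```python
import pandas as pd

# Toy table standing in for a spreadsheet. In practice the data would be
# loaded with pd.read_excel(<path to workbook>). All names and values
# below are invented for illustration.
data = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Population (1000s)": [1000, 500, 2000],
    "Deaths": [20, 25, 20],
})

# Simple statistics on a selected column
total_deaths = data["Deaths"].sum()
mean_deaths = data["Deaths"].mean()

# New column derived from two existing columns:
# deaths per 100,000 inhabitants
data["Deaths per 100k"] = data["Deaths"] * 100 / data["Population (1000s)"]

# Sort the table by the derived column, highest rate first
by_rate = data.sort_values("Deaths per 100k", ascending=False)

print(total_deaths, mean_deaths)
print(by_rate)
```

Sorting on the derived rate rather than the raw count is the point of the new column: it allows countries of very different sizes to be compared like for like.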

Project 1

In Project 1, we applied the techniques and principles learned in Week 1 to tuberculosis (TB) population and death rate data sourced from the World Health Organization. My completed project can be found here (in simple HTML format) and here (opens in the SMC web app, where you can also access the project notebook).


Week 2

In Week 2, we learned how to load a dataset into a dataframe from a CSV file, how to clean up the data and how to use the data to obtain answers to key questions. Week 2 activities included learning how to:

  • use expressions to display rows and create new dataframes consisting of just some of the columns of another dataframe
  • write comparison expressions and combine them with bitwise operators, allowing more complex data queries to be applied
  • remove unwanted spaces and characters and process missing values
  • change the datatype of values appropriately to allow correct processing and display
  • alter dataframe indexing and visualise the data as a graph
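
The Week 2 steps can be sketched roughly as follows. This is an assumption-laden example rather than the project code: the CSV text, the column names and the stray `<br />` suffix are all made up to mimic the kind of clean-up involved.

```python
import io
import pandas as pd

# Hypothetical CSV content; a real file would be passed to read_csv by name.
csv_text = (
    "Date, Max TemperatureC, Events,WindDirDegrees\n"
    "2014-01-01,11,Rain,186<br />\n"
    "2014-01-02,7,,214<br />\n"
    "2014-01-03,14,Rain-Snow,90<br />\n"
    "2014-01-04,3,Snow,45<br />\n"
)
weather = pd.read_csv(io.StringIO(csv_text))

# Remove unwanted spaces from the column names
weather.columns = weather.columns.str.strip()

# Process missing values: an empty Events cell means "no event"
weather["Events"] = weather["Events"].fillna("")

# Remove unwanted characters, then change the datatype to integer
weather["WindDirDegrees"] = (
    weather["WindDirDegrees"].str.replace("<br />", "", regex=False).astype("int64")
)

# Change the Date column from text to proper dates
weather["Date"] = pd.to_datetime(weather["Date"])

# New dataframe made of just some of the columns
temperatures = weather[["Date", "Max TemperatureC"]]

# Combine two comparison expressions with the bitwise operator &
mild_and_wet = weather[
    (weather["Max TemperatureC"] >= 10) & (weather["Events"].str.contains("Rain"))
]

# Index by date, then select a range of days by label
weather = weather.set_index("Date")
second_and_third = weather.loc["2014-01-02":"2014-01-03"]

print(mild_and_wet)
print(second_and_third)
```

Each comparison is wrapped in parentheses because `&` binds more tightly than `>=` in Python. With matplotlib installed, calling `.plot()` on the date-indexed temperature column would draw it as a line graph.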

Project 2

In Project 2, we applied what we had learned so far to historic weather data sourced from Weather Underground. My completed project can be found here (in simple HTML format) and here (opens in the SMC web app, where you can also access the project notebook).


Week 3

In Week 3, we found out how to transform and combine data, including how to:

  • create small dataframes from scratch, to try out analyses and test code
  • define your own functions and test them, especially with borderline and unlikely cases
  • execute code selectively, based on conditions
  • apply a function to a column, to generate a new column with the transformed values
  • join two tables on a common column in four different ways
  • use constants to make code easier to read and change
  • find unique indicators, being aware that some make for better like-for-like comparisons than others
  • download indicator data directly from the World Bank into a dataframe
  • reset the index of a dataframe
  • compute a correlation coefficient between two series of values and check whether the correlation is statistically significant
  • generate scatterplots in order to identify relationships
  • make appropriate use of the logarithmic scale depending on the range of values
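
Several of the Week 3 ideas (constants, small test dataframes, user-defined functions, conditional code, applying a function to a column and the four kinds of join) fit into one short sketch. The exchange rate, the threshold, the function names and all data values are hypothetical, chosen only to keep the arithmetic easy to check.

```python
import pandas as pd

# Constants make the code easier to read and change
COUNTRY = "Country"
GDP_USD = "GDP (US$)"
GDP_GBP_M = "GDP (GBP m)"
LIFE = "Life expectancy (years)"
USD_PER_GBP = 1.5        # hypothetical exchange rate
HIGH_LIFE_YEARS = 75     # hypothetical threshold


def usd_to_million_gbp(usd):
    """Convert a value in US dollars to millions of pounds."""
    return usd / USD_PER_GBP / 1_000_000


def life_band(years):
    """Label a life expectancy, executing code selectively on a condition."""
    if years >= HIGH_LIFE_YEARS:
        return "higher"
    return "lower"


# Small dataframes created from scratch to try out the analysis
gdp = pd.DataFrame({COUNTRY: ["UK", "USA", "China"],
                    GDP_USD: [3e12, 1.5e13, 9e12]})
life = pd.DataFrame({COUNTRY: ["UK", "China", "India"],
                     LIFE: [80, 75, 66]})

# Apply a function to a column to generate a new column
gdp[GDP_GBP_M] = gdp[GDP_USD].apply(usd_to_million_gbp)
life["Band"] = life[LIFE].apply(life_band)

# Join the two tables on the common column in four different ways
joins = {how: pd.merge(gdp, life, on=COUNTRY, how=how)
         for how in ("inner", "left", "right", "outer")}

for how, table in joins.items():
    print(how, len(table))
```

The inner join keeps only countries present in both tables, the left and right joins keep every row of one table, and the outer join keeps every country from either table, leaving missing values where there is no match. Note that `life_band(75)` is a borderline case worth testing explicitly.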

Project 3

In Project 3, we applied what we had learned so far to GDP and Life Expectancy datasets sourced from the World Bank development data site. My completed project can be found here (in simple HTML format) and here (opens in the SMC web app, where you can also access the project notebook).
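
The correlation check from Week 3 might look something like the sketch below. It assumes SciPy is available and uses Spearman's rank correlation via `scipy.stats.spearmanr`; the two series are toy numbers, not World Bank figures, and the 0.05 significance level is a conventional choice rather than something fixed by the data.

```python
import pandas as pd
from scipy.stats import spearmanr

# Toy values only; real indicator data would come from the World Bank.
GDP = "GDP (toy units)"
LIFE = "Life expectancy (toy units)"
table = pd.DataFrame({
    GDP: [1, 2, 3, 4, 5, 6],
    LIFE: [2, 1, 4, 3, 6, 5],
})

# Spearman's rank correlation coefficient and its p-value
correlation, p_value = spearmanr(table[GDP], table[LIFE])

# Conventional 5% threshold for statistical significance
significant = p_value < 0.05

print("Correlation:", correlation)
print("Statistically significant:", significant)
```

A rank correlation only measures whether one series tends to rise as the other rises, so it is usually paired with a scatterplot. With matplotlib installed, `table.plot(x=GDP, y=LIFE, kind="scatter", logx=True)` would draw one with a logarithmic x axis, which helps when the values span several orders of magnitude.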


Week 4

In Week 4, we learned how to take a dataset that contains multiple possible groupings or subsets of data, and work with those groups to perform a variety of more advanced transformations. In particular, learners discovered how to:

  • extract data directly from a data repository using an API
  • split the data contained in a dataframe into multiple groups based on the unique ‘key’ values in a single column, or unique combinations of values that appear across two or more columns
  • apply an aggregate (summary) function to generate a single summary result for a group
  • combine results to generate a summary report
  • apply a filter function that would use the rows contained in each group as the basis for a conditional filtering operation
  • use a pivot table to generate a variety of summary reports
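
The split, aggregate, filter and pivot steps can be illustrated with a small in-memory table. This is a sketch under stated assumptions: the column names loosely imitate trade data, the values are invented, and the 200 threshold is arbitrary. The API download step is left out because it depends on a live service.

```python
import pandas as pd

# Invented rows standing in for downloaded trade data
FLOW = "Trade Flow"
PARTNER = "Partner"
VALUE = "Trade Value (US$)"
trade = pd.DataFrame({
    FLOW: ["Imports", "Imports", "Exports", "Exports", "Imports"],
    PARTNER: ["Ireland", "France", "Ireland", "France", "Ireland"],
    VALUE: [100, 50, 30, 200, 20],
})

# Split on one key column and apply an aggregate function to each group
total_by_partner = trade.groupby(PARTNER)[VALUE].sum()

# Split on unique combinations of values across two columns
total_by_flow_and_partner = trade.groupby([FLOW, PARTNER])[VALUE].sum()

# Filter: keep only the rows of groups whose total exceeds a threshold
large_partners = trade.groupby(PARTNER).filter(lambda group: group[VALUE].sum() > 200)

# Pivot table: partners as rows, trade flows as columns, summed values
report = pd.pivot_table(trade, index=PARTNER, columns=FLOW,
                        values=VALUE, aggfunc="sum")

print(total_by_partner)
print(large_partners)
print(report)
```

The filter returns the original rows of the qualifying groups rather than one summary row per group, which is what distinguishes it from the aggregation step.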

Project 4

In Project 4, we consolidated what we had learned throughout the course by analysing trade data sourced from the United Nations Comtrade database. My completed project can be found here (in simple HTML format) and here (opens in the SMC web app, where you can also access the project notebook).


Course result

My overall (averaged) assessment score was 100%, which confirmed my eligibility for a Certificate of Achievement.

Note: This record does not imply the conferment of a University qualification; nor does it verify the identity of the learner. For more information about the effort required to become eligible, visit FutureLearn's Certificates and Statements FAQ.

©  Phipps E&OE