Learn to Code for Data Analysis | Jez Phipps

The Learn to Code for Data Analysis course is a free MOOC (Massive Open Online Course) created by the Open University and hosted by FutureLearn, a UK-based social learning platform.

Over 4 weeks, this hands-on course introduced the use of:

Python 3 and the pandas data analysis module
the Jupyter Notebook application
SageMathCloud (an online resource providing access to open source maths tools) and/or the Anaconda distribution (for offline work)

Having completed the course, I decided to create this online record of my learning.

My FutureLearn profile can be found here: Jez Phipps

Week 1

In the first week, students learned how to:

set up the Anaconda and/or SageMathCloud environment(s)
use Jupyter notebooks to write and execute simple programs with Python/pandas
load data from Excel and compute simple statistics from selected data
create new columns, derived from calculations on data from other columns
sort table data
write a simple data analysis report and share it online
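The Week 1 techniques can be sketched roughly as follows. This is a minimal illustration, not the course notebook: the small table and its column names are hypothetical stand-ins for the WHO spreadsheet, which the course loaded with `pd.read_excel`. The table is built in memory here so that the sketch runs without the data file.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet data (values are made up).
# With the real file, this step would be a pd.read_excel(...) call.
data = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Population (1000s)': [1000, 2000, 500],
    'TB deaths': [50, 40, 30],
})

# Simple statistics computed from one selected column
total_deaths = data['TB deaths'].sum()
mean_deaths = data['TB deaths'].mean()

# A new column derived from a calculation on two other columns:
# deaths per 100,000 inhabitants (population is given in thousands)
data['Deaths per 100,000'] = data['TB deaths'] * 100 / data['Population (1000s)']

# Sort the table so the highest death rate comes first
ranked = data.sort_values('Deaths per 100,000', ascending=False)
print(ranked)
```

Sorting on the derived rate rather than the raw death count is what makes countries of different sizes comparable.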

Project 1

In Project 1, we applied the techniques and principles learned in Week 1 to TB population and death rate data sourced from the World Health Organization. My completed project can be found here (in simple HTML format) and here (opens in the SMC web app, where you can also access the project notebook).

Week 2

In Week 2, we learned how to load a dataset into a dataframe from a CSV file, how to clean up the data and how to use the data to obtain answers to key questions. Week 2 activities included learning how to:

use expressions to display rows and create new dataframes consisting of just some of the columns of another dataframe
use comparison and bitwise operators to combine comparison expressions, allowing more complex data queries to be applied
remove unwanted spaces and characters and process missing values
change the datatype of values appropriately to allow correct processing and display
alter dataframe indexing and visualise the data as a graph
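A rough sketch of the Week 2 cleaning and querying steps is shown below. The rows are invented and the column names are assumptions chosen to resemble a weather CSV export. With a real file the dataframe would come from `pd.read_csv`.

```python
import pandas as pd

# Hypothetical raw data with typical defects: a stray space in a column
# name, trailing markup in a numeric column, and missing values.
raw = pd.DataFrame({
    'Date': ['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04'],
    ' Mean TemperatureC': [5, 12, 15, 8],
    'Events': ['Rain', None, 'Rain', None],
    'WindDirDegrees': ['190<br />', '250<br />', '90<br />', '10<br />'],
})

# Remove unwanted spaces from the column names
weather = raw.rename(columns=lambda name: name.strip())

# Strip the unwanted trailing characters, then change the datatype to integer
weather['WindDirDegrees'] = (
    weather['WindDirDegrees'].str.rstrip('<br />').astype('int64')
)

# Process missing values and convert the dates to a proper date type
weather['Events'] = weather['Events'].fillna('')
weather['Date'] = pd.to_datetime(weather['Date'])

# A new dataframe consisting of just some of the columns
temps = weather[['Date', 'Mean TemperatureC']]

# Two comparison expressions combined with the bitwise & operator
mild_and_dry = weather[
    (weather['Mean TemperatureC'] >= 10) & (weather['Events'] == '')
]

# Use the date as the index, which suits time-based selection and plotting
weather = weather.set_index('Date')
```

With matplotlib installed, `weather['Mean TemperatureC'].plot()` would then draw the temperatures against the date index.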

Project 2

In Project 2, we applied what we had learned so far to historic weather data sourced from Weather Underground. My completed project can be found here (in simple HTML format) and here (opens in the SMC web app, where you can also access the project notebook).

Week 3

In Week 3, we found out how to transform and combine data, including how to:

create small dataframes from scratch, to try out analyses and test code
define your own functions and test them, especially with borderline and unlikely cases
execute code selectively, based on conditions
apply a function to a column, to generate a new column with the transformed values
join two tables on a common column in four different ways
use constants to make code easier to read and change
find unique indicators, being aware that some make for better like-for-like comparisons than others
download indicator data directly from the World Bank into a dataframe
reset the index of a dataframe
compute a correlation coefficient between two series of values and check whether the correlation is statistically significant
generate scatterplots in order to identify relationships
make appropriate use of the logarithmic scale depending on the range of values
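Several of the Week 3 ideas fit into one short sketch: constants, a tested function, conditional code, applying a function to a column, the four kinds of join, resetting the index, and a correlation coefficient. The country names, figures, and function names below are hypothetical. The sketch uses the pandas default (Pearson) coefficient to stay dependency-free, so it does not include the significance check, which would need a separate statistics library such as SciPy.

```python
import pandas as pd

# Constants make the column names easier to read and to change
COUNTRY = 'Country'
GDP = 'GDP (US$)'
GDP_M = 'GDP (US$m)'
LIFE = 'Life expectancy (years)'

# Small dataframes created from scratch (made-up values) to test code
gdp = pd.DataFrame({COUNTRY: ['A', 'B', 'C', 'D'],
                    GDP: [2.0e9, 5.0e10, 8.0e11, 3.0e9]})
life = pd.DataFrame({COUNTRY: ['A', 'B', 'C', 'E'],
                     LIFE: [60.4, 71.6, 80.2, 66.0]})

def round_to_millions(value):
    """Return value expressed in millions, rounded to the nearest integer."""
    return round(value / 1_000_000)

# Test the function, including borderline cases
assert round_to_millions(0) == 0
assert round_to_millions(1_499_999) == 1

def gdp_band(millions):
    """Classify a GDP figure; which branch runs depends on the conditions."""
    if millions < 10_000:
        return 'low'
    elif millions < 100_000:
        return 'medium'
    return 'high'

# Apply functions to a column to generate new columns
gdp[GDP_M] = gdp[GDP].apply(round_to_millions)
gdp['Band'] = gdp[GDP_M].apply(gdp_band)

# Join the two tables on the common column in four different ways
inner = pd.merge(gdp, life, on=COUNTRY, how='inner')   # countries in both
left = pd.merge(gdp, life, on=COUNTRY, how='left')     # all GDP rows
right = pd.merge(gdp, life, on=COUNTRY, how='right')   # all life rows
outer = pd.merge(gdp, life, on=COUNTRY, how='outer')   # all countries

# Reset the index after sorting so that rows are numbered 0, 1, 2 again
by_life = inner.sort_values(LIFE, ascending=False).reset_index(drop=True)

# Pearson correlation between the two series (pandas default method)
correlation = inner[GDP_M].corr(inner[LIFE])
```

Because the GDP values span several orders of magnitude, a scatterplot of this data would be a natural candidate for a logarithmic x-axis.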

Project 3

In Project 3, we applied what we had learned so far to GDP and Life Expectancy datasets sourced from the World Bank development data site. My completed project can be found here (in simple HTML format) and here (opens in the SMC web app, where you can also access the project notebook).

Week 4

In Week 4, we learned how to take a dataset that contains multiple possible groupings or subsets of data, and work with those groups to perform a variety of more advanced transformations. In particular, learners discovered how to:

extract data directly from a data repository using an API
split the data contained in a dataframe into multiple groups based on the unique ‘key’ values in a single column, or unique combinations of values that appear across two or more columns
apply an aggregate (summary) function to generate a single summary result for a group
combine results to generate a summary report
apply a filter function that would use the rows contained in each group as the basis for a conditional filtering operation
use a pivot table to generate a variety of summary reports
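The split, aggregate, filter, and pivot steps from Week 4 might look something like this. The trade records are invented, and the column names are assumptions loosely modelled on a trade dataset. Fetching the data through an API is left out so that the sketch stays self-contained.

```python
import pandas as pd

PARTNER = 'Partner'
FLOW = 'Trade Flow'
VALUE = 'Trade Value (US$)'

# Hypothetical trade records (made-up values)
trade = pd.DataFrame({
    PARTNER: ['France', 'France', 'Ireland', 'Ireland', 'Germany'],
    FLOW: ['Exports', 'Imports', 'Exports', 'Imports', 'Imports'],
    'Commodity': ['Milk', 'Milk', 'Milk', 'Cream', 'Cream'],
    VALUE: [100, 250, 400, 50, 300],
})

# Split on the unique values of one column, then aggregate each group
totals = trade.groupby(PARTNER)[VALUE].sum()

# Split on unique combinations of values across two columns
by_flow = trade.groupby([PARTNER, FLOW])[VALUE].sum()

# Combine the results into a simple summary report, largest partner first
report = totals.sort_values(ascending=False)

def trades_over_300(group):
    """Keep a partner's rows only if its total trade value exceeds 300."""
    return group[VALUE].sum() > 300

# Filter whole groups based on a condition computed from each group's rows
big_partners = trade.groupby(PARTNER).filter(trades_over_300)

# A pivot table: partners as rows, trade flows as columns, summed values
pivot = pd.pivot_table(trade, index=PARTNER, columns=FLOW,
                       values=VALUE, aggfunc='sum', fill_value=0)
print(pivot)
```

Note that the filter keeps or drops entire groups: every row for a qualifying partner survives, not just the rows above the threshold.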

Project 4

In Project 4, we consolidated what we had learned throughout the course by analysing trade data sourced from the United Nations Comtrade database. My completed project can be found here (in simple HTML format) and here (opens in the SMC web app, where you can also access the project notebook).

Course result

My overall (averaged) assessment score was 100%, which confirmed my eligibility for a Certificate of Achievement.

Note: This record does not imply the conferment of a University qualification; nor does it verify the identity of the learner. For more information about the effort required to become eligible, visit FutureLearn's Certificates and Statements FAQ.
