Plotting data - one numerical variable
Plotting data
Matplotlib library
The Python programming language itself does not have functions for plotting graphs. We have to use an additional library to do this. Matplotlib is a popular Python library for plotting graphs. The official matplotlib website is http://matplotlib.org, which has a gallery of possible graph types.
To use matplotlib we must include the following code once in each notebook.
The line %matplotlib inline allows us to display matplotlib-generated graphs within Jupyter notebooks.
The line import matplotlib.pyplot as plt loads matplotlib's plotting functions so we can use them. It also gives the library the shorter name plt for convenience (otherwise we would have to write matplotlib.pyplot every time we wanted to change something in the graph).
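As a rough sketch, the setup cell described above might look like the following. The magic command is shown as a comment because it is IPython/Jupyter syntax and would not run in a plain Python script; inside a notebook you would uncomment it. The variable same_function is only there for illustration.

```python
# In a Jupyter notebook, uncomment the next line so that graphs
# appear inside the notebook (it is IPython syntax, not plain Python).
# %matplotlib inline

# Load matplotlib's plotting functions under the short name plt.
import matplotlib.pyplot as plt

# The alias only saves typing: plt.hist and matplotlib.pyplot.hist
# are one and the same function.
import matplotlib.pyplot
same_function = plt.hist is matplotlib.pyplot.hist
print(same_function)
```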
One numerical variable: histograms
Histograms are the main method of displaying a numerical (quantitative) variable.
If you haven't done so already, watch the first half of this video on histograms (you can ignore the second half on stemplots as they are rarely used nowadays).
To plot a histogram of Alaskan salmon masses we use the hist() function like so:
plt.hist(salmon_masses['mass'])
Run the following code cell to see how this is done.
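A minimal sketch of such a code cell is shown here. The course's salmon data file is not reproduced in this text, so the DataFrame below is built from a handful of invented placeholder masses (not real measurements), chosen only so that two peaks are visible. The names salmon_masses and 'mass' follow the text; everything else is an assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values in kg, invented for illustration only (not real data).
salmon_masses = pd.DataFrame({
    'mass': [1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.75, 1.8, 1.8, 1.9,
             2.0, 2.1, 2.7, 2.9, 3.0, 3.0, 3.1, 3.3]
})

# plt.hist() sorts the masses into bins and draws one bar per bin.
# It returns the bar heights, the bin edges and the bar objects.
counts, bin_edges, bars = plt.hist(salmon_masses['mass'])
```

By default plt.hist() uses 10 equal-width bins; passing a different value through its bins argument changes how finely the masses are grouped.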
Notice that this histogram shows two distinct peaks. Such a distribution is described as bimodal, as in having two modes (peaks). In contrast, a distribution that has just one peak is called unimodal.
Can you think of a reason why these salmon have a bimodal distribution in mass, and why the peak at 3 kg is lower than the peak at 1.75 kg?
The piece of code salmon_masses['mass'] looks like we're accessing the value of the key 'mass' in a Python dictionary called salmon_masses. We're not. salmon_masses is a DataFrame, not a dictionary. The syntax is the same but the effect is different: salmon_masses['mass'] selects the whole 'mass' column, which contains all of the masses. We can see this if we print it.
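The difference can be seen in a short sketch like the one below. The DataFrame here is a tiny stand-in with invented values, since the real salmon data is not part of this text. Selecting a column from a DataFrame gives back a pandas Series holding every value in that column, rather than a single dictionary value.

```python
import pandas as pd

# Tiny stand-in DataFrame with invented masses in kg (not real data).
salmon_masses = pd.DataFrame({'mass': [1.6, 1.75, 1.8, 2.9, 3.0]})

# Square brackets with a column name select the whole column.
masses = salmon_masses['mass']

print(type(masses))  # the column is a pandas Series
print(masses)        # every mass, each with its row index
```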
Note
Placing a semicolon at the end of the last plotting command in a code cell, like so:
plt.hist(salmon_masses['mass']);
suppresses the printing of irrelevant information before the graph, making the output cleaner. Try it in the above code.
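The sketch below illustrates the idea, again using a small invented DataFrame in place of the real salmon data. The "irrelevant information" is the value returned by plt.hist() (arrays of counts and bin edges), which Jupyter echoes when it is the last expression in a cell. The semicolon is ordinary Python syntax, so the line also runs unchanged outside a notebook, where it has no visible effect.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in DataFrame with invented masses in kg (not real data).
salmon_masses = pd.DataFrame({'mass': [1.5, 1.6, 1.75, 1.8, 2.9, 3.0, 3.1]})

# The trailing semicolon stops Jupyter from echoing the value
# returned by plt.hist() above the graph.
plt.hist(salmon_masses['mass']);
```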
Label your graphs
As with all graphs, the one we plotted above needs to be labelled fully and clearly so that someone else can look at it and know immediately what it is presenting. We need the following:
Labels on the x and y axes
A title
We add x- and y-axis labels with the functions plt.xlabel() and plt.ylabel(), and a title with the function plt.title().
It's worth pointing out that the unit of mass is included in the x-axis label. This means a reader immediately knows what units the masses are in. If the units were missing, the reader would have to guess whether the masses are in grams, kilograms or even pounds or ounces. Try to make life as easy as possible for other people to understand what you are presenting by including relevant information in your graphs and tables.
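Putting the labelling advice together, a fully labelled histogram might be sketched as follows. The data are invented placeholder values and the label and title wording is only a suggestion; plt.xlabel(), plt.ylabel() and plt.title() are standard matplotlib functions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values in kg, invented for illustration only (not real data).
salmon_masses = pd.DataFrame({
    'mass': [1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.75, 1.8, 1.8, 1.9,
             2.0, 2.1, 2.7, 2.9, 3.0, 3.0, 3.1, 3.3]
})

plt.hist(salmon_masses['mass'])

# Label both axes, including the unit of mass on the x-axis.
plt.xlabel('Mass (kg)')
plt.ylabel('Frequency')

# Add a title; the semicolon keeps the notebook output clean.
plt.title('Masses of Alaskan salmon');
```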