Lab 0
Getting Started
Welcome to the zeroth lab for CITS2402 Intro to Data Science!
This course is about doing, so let's get started.
The aim of this lab is twofold:
to get you up and running and familiar with the course software
to provide a refresher working with Python
Communicating Online
The official communication platform for notices and questions is help2402.
Getting started with cshelp (help2402) and csmarks
You can access the help server by navigating directly to the address https://secure.csse.uwa.edu.au/run/help2402. It may be useful to bookmark this page.
Log in using your UWA details.
On the help2402 page go to the 'Options' drop down menu and select 'Set your PREFERENCES'.
At the bottom of this page you should set your email preferences to either send an email for each posting, or send a daily digest.
It is important that you set your email preference, so that you receive announcements for the unit.
This is even more important during the pandemic period, where there are no face-to-face lectures.
You can also set other things here such as your preferred photo or avatar.
Please note that this unit does not support anonymous posting.
Finally, use the 'Jump to' menu to jump to csmarks. This is where marks will be posted during the unit.
Getting Started with discord
The help server is great for asynchronous communication. For the online scheduled labs, in response to the pandemic, it is useful to be able to message and speak in 'real-time'. To assist with this we are trialing the use of a discord server, in addition to Zoom (https://www.uwa.edu.au/library/learning-online).
The discord server is called 'CITS2402'. You should have received an email inviting you to join, with a code. If the code has expired, please email for a new code. (This is to avoid trolls.) You are asked to use your real name on the server.
You can use the server through a browser, or download an app for your device(s).
Once you enter the server, in the list on the left you will find some general channels, then under the category 'LABS', a channel called lab-00 for questions or comments related to this lab.
This channel will be monitored during the scheduled lab times. If you need help, the teaching staff will be able to call you. You and the teaching staff can also both look at your notebook on CoCalc (see below) without having to share screens.
Getting Started in CoCalc
As discussed in the lectures we will be using a cloud computing environment provided by CoCalc. This gives us access to Jupyter notebooks for python programming, the underlying Unix (linux) shell, and some extra facilities like in-built version control, shared workspaces, and assignment distribution and submission.
A key advantage of using this cloud solution is that you don't need specific software on your own machine or device, or to worry about configuration, to ensure we are all using a common environment - all you need is a web browser.
It's better to do than to see, so let's get straight into it!
Signing up
If you are reading this lab sheet in CoCalc then you have already signed up!
If not, you should have received an email to your UWA student email address from the address [email protected] inviting you to sign up, with the following instructions.
Open CoCalc
Sign up/in using exactly your email address [your address]
Open the project 'CITS2402 Introduction to Data Science'.
If you can't find the email it may be because:
you have recently enrolled in the unit - in this case you can still sign up by going directly to https://cocalc.com/app, but you may not be able to access the CITS2402 project - let us know if this is the case
perhaps your UWA email is being diverted to junk or spam?
Follow the link and sign up using your UWA email address. If you are not automatically signed in, sign in to your account now.
Projects and Directories
When you sign in you should find yourself on the Projects page. Projects are the main organisational unit in CoCalc.
Click on the CITS2402 project to open it.
Note: If you see a banner at the top that says "Trial Project - buy a subscription and apply upgrades..." you can ignore this. The School will be taking care of this.
You should find you are in the 'Files' area, with a directory listing of your home directory.
Click on the 'Labs' directory (folder). This is the directory where the lab sheets for the course will be published.
Each lab will be in its own subdirectory, containing the python jupyter notebook for the lab, along with any supporting files. In some cases you will also upload your own files to this subdirectory.
You should see the first lab called GettingStarted. Click on this directory.
You should now see a file called GettingStarted.ipynb. The extension ipynb designates it as an IPYthon NoteBook.
Click on the notebook. You should now see this exercise sheet!
If you have been following a printed version of this lab sheet, you should now switch to the on-line version.
Before proceeding, read the CoCalc Student Guide which provides some extra information on using CoCalc.
Settings and keyboard shortcuts
You can access the preferences through Account | Preferences. Here you can customise a number of things related to your working environment, put in your picture and the like.
I would recommend setting Editor keyboard bindings to Emacs. This will allow you to use GNU emacs/bash commands for quick editing. In particular, the following are handy for quickly editing code without leaving the central keypad:
Keys | Action |
---|---|
Ctrl-f | forward (right) |
Ctrl-b | back (left) |
Ctrl-d | delete character (forward) |
Ctrl-p | previous line (up) |
Ctrl-n | next line (down) |
Ctrl-k | "kill" (delete line from cursor) |
Ctrl-y | "yank" (insert last deleted line here) |
Many other commands can be found in the many Emacs quick references available online.
One advantage of learning a few emacs commands is that they will also work in the (linux) shell.
Accessing the shell
Speaking of the shell, let's take a quick look.
In Unix-based operating systems, the shell provides access to the operating system. In the files area, on the top right, you will find a box labelled Terminal command. This allows you to send a command to the shell. Type in ls to list the contents of the directory. You will see the output in a drop down box.
For more serious use of the shell, we can open a terminal window. Under New choose Terminal and then press Create.
This will open a terminal window. If you type ls you will see the directory listing again. If you type echo $SHELL you should see a path ending in bash, which tells you that you are using the GNU bash shell.
Try the pwd (print working directory) command. This will tell you the path to the directory you are currently in.
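The same three pieces of information can also be read from inside a notebook code cell. The sketch below uses only Python's standard os module; the variable names are illustrative, and the values printed will depend on where the kernel is running.

```python
import os

# Equivalent of `pwd`: the current working directory of the running Python process.
current_dir = os.getcwd()
print(current_dir)

# Equivalent of `ls`: the names of the entries in that directory, sorted for readability.
entries = sorted(os.listdir(current_dir))
print(entries)

# Equivalent of `echo $SHELL`. The variable may be unset in some environments,
# so a placeholder string is used as the default.
shell = os.environ.get("SHELL", "(SHELL is not set)")
print(shell)
```

This is not a replacement for learning the shell commands themselves, just a way to see that both routes report the same directory.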
Notice that you can now use Ctrl-p and Ctrl-n to go back and forth between previous commands without re-typing them. You can also edit the commands using the other shortcuts above.
The shell is not an examinable part of this course, but knowing some basic shell commands can be extremely useful. If you haven't used a Unix (or Linux) shell before, it's worth spending a little time getting to know it.
Jupyter Notebooks
Jupyter notebooks are a fabulous way to do data science, and have become something of a de facto standard for moderate-sized projects.
Jupyter notebooks are available as part of a number of local and cloud contexts that combine them with other facilities. In the UWA labs, for example, we also have Jupyter Lab as part of the Anaconda distribution. This is the recommended software for this unit should you need to run python offline for any reason, for example if you have trouble connecting to the internet. The Anaconda Individual Edition can be downloaded and installed on your home computer for free.
The primary environment for this unit, however, is the CoCalc cloud environment, and this is the environment where the labs and assignments will be distributed and marked. It contains a number of additional facilities alongside notebooks.
Kernels
The kernel is the program (language) that sits behind the notebook and runs or interprets the code you write.
The notebooks can run a range of interpreted languages. We will only use the 'Python 3 (system wide)' kernel to ensure we are all working in the same environment.
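One quick way to confirm which Python the selected kernel is running is to ask the interpreter itself. This is a minimal sketch using the standard sys module; the exact version string will vary between environments, and the name python_major is only illustrative.

```python
import sys

# Full version string of the interpreter behind the current kernel.
print(sys.version)

# The major version number; this should be 3 for a Python 3 kernel.
python_major = sys.version_info.major
print(python_major)
```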
Cells
A major reason for the popularity of Jupyter Notebooks is that they mix text and programs (code) in a dynamic way that lets you record your findings in situ. It's a bit like a lab notebook (hence the name) with dynamic code in it rather than just text.
Notebooks achieve this by using two kinds of cells:
text cells (like this one) that contain text written using markdown (as well as html and latex in CoCalc)
code cells, that can be executed in-situ
Both kinds of cells are executed/rendered by hitting the 'run' button (right-facing triangle) or pressing 'Shift-Return'. Try it in the next cell...
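If no code cell is to hand, a minimal one such as the following will do. It is only an illustrative stand-in (the greeting text and the variable name are not part of the lab); running it should print the message beneath the cell.

```python
# A minimal code cell to try with the 'run' button or Shift-Return.
greeting = "Hello, CITS2402!"
print(greeting)
```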
Equally important to know is the 'stop' button (the square) which you will need if your code gets stuck in a loop and you need to halt it. You can also interrupt and restart the kernel from the 'Kernel' drop-down menu.
You can create a new cell by hitting the '+' button, and choose its type by selecting 'Code' or 'Markdown' in the drop down menu.
The python interpreter will print the value of the last evaluation in the cell:
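As a small illustration (the numbers and variable names are made up for this sketch), the cell below ends with a bare expression. In a notebook the value of that final line is displayed under the cell; a plain Python script would need print to show it.

```python
# Two illustrative quantities.
hours_per_day = 24
days_per_week = 7

# The last line is an expression, so a notebook displays its value (168).
hours_per_day * days_per_week
```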
Aside from using it as a calculator...
... this is useful for checking the values of variables when coding and debugging...
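A sketch of that second use, with hypothetical data: naming a variable on the last line of a cell shows its current value, which is a quick check while debugging.

```python
# Hypothetical marks, used only for illustration.
scores = [70, 80, 90]
average = sum(scores) / len(scores)

# Naming the variable on the last line displays its value (80.0) in a notebook.
average
```

Only the last expression is displayed automatically, so use print for any intermediate values you want to see.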
Two types of editing
It's important to understand that there are two types of editing within a notebook: editing within a cell, and editing of the notebook (outside the cells).
When you have clicked in a cell, and the cell box turns green, you are editing within the cell. The Emacs shortcuts discussed earlier are useful for quickly editing text and code in cells.
When you click the space to the left of the cell, and the cell box turns blue, you are editing the notebook. Try this now by using the arrow buttons in the toolbar to move this cell up and then back to where it started.
The drop down menus and buttons in the toolbar immediately above the workspace work on the notebook rather than within cells. (A few at the very top, like 'undo', 'cut' and 'paste' work within cells.)
It can also be useful to learn some of the shortcuts for working on the notebook. You can find these by clicking on the button that looks like a keyboard.
Some I find particularly useful are:
Keys | Action |
---|---|
Shift-Enter | run cell |
d-d (d twice) | delete cell |
x | cut cell |
c | copy cell |
v | paste cell (below) |
Shift-up (or down) | select multiple cells |
For more about cells and the toolbars, read CoCalc's Jupyter Notebooks page (at least as far as Jupyter Notebook Enhancements).
Backing Up Your Work
CoCalc regularly saves your work. You can access previous versions through the Time Travel facility. Read the Time Travel documentation to see how this works. The 'Save' button colour is diluted when the work has been saved. You can save it manually at any time.
Time Travel works at the server (cloud) end, of course, so will not save your work if there is an internet failure. If a red badge comes up over the green Save button, then your work is not being autosaved. This is likely because there is a problem with your connection, and you may need to refresh the browser page. If this happens, you should stop typing as any unsaved work may be lost. (It may be worth copying the last thing you did before refreshing.)
I recommend that you download a copy of your work at the end of each session as a backup. You can do this through the Files directory.
To download an individual notebook or other file, just click the cloud with the arrow on the right-hand side. Try this now.
To download a directory, select the directory and choose 'Compress' to create a zip file. You can then download the zip file.
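The 'Compress' button is the route described above. For comparison, a similar zip file can be produced from a code cell with Python's standard shutil module. The sketch below is an assumption-laden illustration: it builds a throwaway directory (named GettingStarted here purely as an example) inside a temporary folder so that it can run anywhere without touching your real files.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    # Create a stand-in lab directory containing one small file.
    lab_dir = Path(scratch) / "GettingStarted"
    lab_dir.mkdir()
    (lab_dir / "notes.txt").write_text("backup test\n")

    # Zip the directory; make_archive appends ".zip" and returns the archive path.
    archive_path = shutil.make_archive(
        base_name=str(Path(scratch) / "GettingStarted-backup"),
        format="zip",
        root_dir=scratch,
        base_dir="GettingStarted",
    )

    # List what ended up inside the archive, as a check on the backup.
    with zipfile.ZipFile(archive_path) as archive:
        archived_names = archive.namelist()

print(archived_names)
```

To back up a real directory, root_dir and base_dir would point at its parent and its name respectively, and the archive should be written somewhere that is not deleted afterwards.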
A Python Refresher
In the same directory as this file, you will find a notebook called Python-Programming-Refresher. This will give you a chance to refresh some of the prerequisite python you should have seen before, and follow up on any gaps, as well as practice using a notebook.
Proceed to the Refresher now...