
Lab 0

Getting Started

Welcome to the zeroth lab for CITS2402 Intro to Data Science!

This course is about doing, so let's get started.

The aim of this lab is twofold:

  1. to get you up and running and familiar with the course software

  2. to provide a refresher working with Python

Communicating Online

The official communication platform for notices and questions is help2402.

Getting started with cshelp (help2402) and csmarks

You can access the help server by navigating directly to the address https://secure.csse.uwa.edu.au/run/help2402. It may be useful to bookmark this page.

Log in using your UWA details.

On the help2402 page go to the 'Options' drop down menu and select 'Set your PREFERENCES'.

At the bottom of this page you should set your email preferences to either send an email for each posting, or send a daily digest.

It is important that you set your email preference, so that you receive announcements for the unit.

This is even more important during the pandemic period, where there are no face-to-face lectures.

You can also set other things here such as your preferred photo or avatar.

Please note that this unit does not support anonymous posting.

Finally, use the 'Jump to' menu to jump to csmarks. This is where marks will be posted during the unit.

Getting Started with discord

The help server is great for asynchronous communication. For the online scheduled labs, in response to the pandemic, it is useful to be able to message and speak in 'real-time'. To assist with this we are trialing the use of a discord server, in addition to Zoom (https://www.uwa.edu.au/library/learning-online).

The discord server is called 'CITS2402'. You should have received an email inviting you to join, with a code. If the code has expired, please email for a new code. (This is to avoid trolls.) You are asked to use your real name on the server.

You can use the server through a browser, or download an app for your device(s).

Once you enter the server, in the list on the left you will find some general channels, then under the category 'LABS', a channel called lab-00 for questions or comments related to this lab.

This channel will be monitored during the scheduled lab times. If you need help, the teaching staff will be able to call you. You and the teaching staff can also both look at your notebook on CoCalc (see below) without having to share screens.

Getting Started in CoCalc


As discussed in the lectures, we will be using a cloud computing environment provided by CoCalc. This gives us access to Jupyter notebooks for Python programming, the underlying Unix (Linux) shell, and some extra facilities like in-built version control, shared workspaces, and assignment distribution and submission.

A key advantage of using this cloud solution is that you don't need specific software on your own machine or device, or to worry about configuration, to ensure we are all using a common environment - all you need is a web browser.

It's better to do than to see, so let's get straight into it!

Signing up

If you are reading this lab sheet in CoCalc then you have already signed up!

If not, you should have received an email to your UWA student email address from the address [email protected] inviting you to sign up, with the following instructions.

  1. Open CoCalc

  2. Sign up/in using exactly your email address [your address]

  3. Open the project 'CITS2402 Introduction to Data Science'.

If you can't find the email it may be because:

  • you have recently enrolled in the unit - in this case you can still sign up by going directly to https://cocalc.com/app, but you may not be able to access the CITS2402 project - let us know if this is the case

  • perhaps your UWA email is being diverted to junk or spam?

Follow the link and sign up using your UWA email address. If you are not automatically signed in, sign in to your account now.

Projects and Directories

When you sign in you should find yourself on the Projects page. Projects are the main organisational unit in CoCalc.

Click on the CITS2402 project to open it.

Note: If you see a banner at the top that says "Trial Project - buy a subscription and apply upgrades..." you can ignore this. The School will be taking care of this.


You should find you are in the 'Files' area, with a directory listing of your home directory.

Click on the 'Labs' directory (folder). This is the directory where the lab sheets for the course will be published.

Each lab will be in its own subdirectory, containing the Jupyter notebook for the lab, along with any supporting files. In some cases you will also upload your own files to this subdirectory.

You should see the first lab called GettingStarted. Click on this directory.

You should now see a file called GettingStarted.ipynb. The extension ipynb designates it as an IPYthon NoteBook.

Click on the notebook. You should now see this exercise sheet!

If you have been following a printed version of this lab sheet, you should now switch to the on-line version.

  • Before proceeding, read the CoCalc Student Guide which provides some extra information on using CoCalc.

Settings and keyboard shortcuts

You can access the preferences through Account | Preferences. Here you can customise a number of things related to your working environment, put in your picture and the like.

I would recommend setting Editor keyboard bindings to Emacs. This will allow you to use GNU emacs/bash commands for quick editing. In particular, the following are handy for quickly editing code without leaving the central keypad:

Keys      Action
Ctrl-f    forward (right)
Ctrl-b    back (left)
Ctrl-d    delete character (forward)
Ctrl-p    previous line (up)
Ctrl-n    next line (down)
Ctrl-k    "kill" (delete line from cursor)
Ctrl-y    "yank" (insert last deleted line here)

Many other commands can be found in any of the many Emacs quick references available online.

One advantage of learning a few Emacs commands is that they will also work in the (Linux) shell.

Accessing the shell

Speaking of the shell, let's take a quick look.

In Unix-based operating systems, the shell provides access to the operating system. In the files area, on the top right, you will find a box labelled Terminal command. This allows you to send a command to the shell. Type in ls to list the contents of the directory. You will see the output in a drop down box.

For more serious use of the shell, we can open a terminal window. Under New choose Terminal and then press Create.

This will open a terminal window. If you type ls you will see the directory listing again. If you type echo $SHELL you should see:

~$ echo $SHELL
/bin/bash
~$

This tells you that you are using the GNU bash shell.

Try the pwd (print working directory) command. This will tell you the path to the directory you are currently in.

Notice that you can now use Ctrl-p and Ctrl-n to go back and forth between previous commands without re-typing them. You can also edit the commands using the other shortcuts above.

The shell is not an examinable part of this course, but knowing some basic shell commands can be extremely useful. If you haven't used a Unix (or Linux) shell before, it's worth spending a little time getting to know it when you can.
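The three shell commands tried above (ls, pwd and echo $SHELL) have rough counterparts in Python's standard library. The following is a minimal sketch, not part of the original lab, that you could run in a code cell; the variable names are illustrative only, and the values printed depend on where the notebook is running.

```python
import os
from pathlib import Path

# Rough equivalent of `pwd`: the directory the notebook (or script) is running in.
current_dir = Path.cwd()
print(current_dir)

# Rough equivalent of `ls`: the names of the entries in that directory, sorted.
entries = sorted(os.listdir(current_dir))
print(entries)

# Rough equivalent of `echo $SHELL`: None if the variable is not set.
shell = os.environ.get("SHELL")
print(shell)
```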

Jupyter Notebooks

Jupyter notebooks are a fabulous way to do data science, and have become something of a de facto standard for moderate-sized projects.

Jupyter notebooks are available as part of a number of local and cloud contexts that combine them with other facilities. In the UWA labs, for example, we also have Jupyter Lab as part of the Anaconda distribution. This is the recommended software for this unit should you need to run Python offline for any reason, for example if you have trouble connecting to the internet. The Anaconda Individual Edition can be downloaded and installed on your home computer for free.

The primary environment for this unit, however, is the CoCalc cloud environment, and this is the environment where the labs and assignments will be distributed and marked. It contains a number of additional facilities alongside notebooks.

Kernels

The kernel is the program (language) that sits behind the notebook and runs or interprets the code you write.

The notebooks can run a range of interpreted languages. We will only use the 'Python 3 (system-wide)' kernel to ensure we are all working in the same environment.
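If you are ever unsure which interpreter a notebook is using, you can ask Python itself. This is a small sketch using only the standard library; the exact path and version number printed will depend on the environment.

```python
import platform
import sys

# Path of the Python interpreter that the kernel is running.
print(sys.executable)

# Full version string, such as "3.8.5" (the exact value varies by environment).
python_version = platform.python_version()
print(python_version)

# Major version number; this should be 3 for the kernel used in this unit.
major = sys.version_info.major
print("Running Python", major)
```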

Cells

A major reason for the popularity of Jupyter Notebooks is that they mix text and programs (code) in a dynamic way that lets you record your findings in situ. It's a bit like a lab notebook (hence the name) with dynamic code in it rather than just text.

Notebooks achieve this by using two kinds of cells:

  • text cells (like this one) that contain text written using markdown (as well as html and latex in CoCalc)

  • code cells, that can be executed in-situ
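Behind the scenes, an .ipynb file is a JSON document that stores these cells in a list, each tagged with its type. The sketch below builds a tiny hand-written notebook in that layout (the contents are made up for illustration, not taken from this lab) and counts the cells of each kind.

```python
import json
from collections import Counter

# A tiny hand-written notebook in the version 4 .ipynb layout (illustrative only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A heading\n"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello World')\n"],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["Some notes.\n"]},
    ],
}

# Round-trip through JSON text, as if the notebook had been read from a file.
text = json.dumps(notebook)
loaded = json.loads(text)

# Count how many cells there are of each type.
cell_counts = Counter(cell["cell_type"] for cell in loaded["cells"])
print(cell_counts)
```

Text cells appear with the type "markdown" and code cells with the type "code".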

Both kinds of cells are executed/rendered by hitting the 'run' button (right-facing triangle) or pressing 'Shift-Return'. Try it in the next cell...

print("Hello World")
print("This is a code cell. You execute the code by hitting the run (play) button in the toolbar, or hitting Shift-Return. Try this now.")

from datetime import date
print("\nToday is", date.today().strftime("%B %d, %Y"))

Hello World
This is a code cell. You execute the code by hitting the run (play) button in the toolbar, or hitting Shift-Return. Try this now.

Today is July 28, 2020

Equally important to know is the 'stop' button (the square) which you will need if your code gets stuck in a loop and you need to halt it. You can also interrupt and restart the kernel from the 'Kernel' drop-down menu.
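While experimenting, one defensive habit is to cap the number of iterations so that a mistaken stopping condition raises an error instead of running forever. This is only a sketch of that idea; the function name and the max_steps parameter are hypothetical and not part of the lab.

```python
def count_halvings(value, max_steps=1000):
    """Halve value until it drops below 1, giving up after max_steps iterations."""
    steps = 0
    while value >= 1:
        if steps >= max_steps:
            # Stop with an error rather than looping indefinitely.
            raise RuntimeError("loop did not finish; check the stopping condition")
        value = value / 2
        steps += 1
    return steps


print(count_halvings(8))
```

The stop button remains the tool to reach for when a cell is already stuck.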

You can create a new cell by hitting the '+' button, and choose its type by selecting 'Code' or 'Markdown' in the drop down menu.

The notebook will display the value of the last expression evaluated in the cell:

print("This is a code cell. You execute the code by hitting the run (play) button in the toolbar, or hitting Shift-Return. Try this now.")
from datetime import date
date.today().strftime("%B %d, %Y")

This is a code cell. You execute the code by hitting the run (play) button in the toolbar, or hitting Shift-Return. Try this now.
'July 28, 2020'

Aside from using it as a calculator...

7 * 6
42

... this is useful for checking the values of variables when coding and debugging...

x = 7 * 6
x
42
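Notice that the date string displayed earlier appeared in quotes, while printed text does not. A short sketch of the difference, using made-up variable names: print() writes the text itself, whereas the value displayed for the final expression of a cell is its repr(). An assignment on its own displays nothing, and neither does a final expression that evaluates to None.

```python
x = 7 * 6  # an assignment is a statement, so nothing is displayed for it

message = "forty-two"

# print() writes the text itself, without quotes.
print(message)

# The value displayed for a final expression is its repr(), which keeps the quotes.
shown = repr(message)
print(shown)

# print() returns None, and a final expression that is None displays nothing.
result = print("printing returns None")
print(result is None)
```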

Two types of editing

It's important to understand that there are two types of editing within a notebook: editing within a cell, and editing the notebook itself (outside the cells).

When you have clicked in a cell, and the cell box turns green, you are editing within the cell. The Emacs shortcuts discussed earlier are useful for quickly editing text and code in cells.

When you click the space to the left of the cell, and the cell box turns blue, you are editing the notebook. Try this now by using the arrow buttons in the toolbar to move this cell up and then back to where it started.

The drop down menus and buttons in the toolbar immediately above the workspace work on the notebook rather than within cells. (A few at the very top, like 'undo', 'cut' and 'paste' work within cells.)

It can also be useful to learn some of the shortcuts for working on the notebook. You can find these by clicking on the button that looks like a keyboard.

Some I find particularly useful are:

Keys                  Action
Shift-Enter           run cell
d-d (d twice)         delete cell
x                     cut cell
c                     copy cell
v                     paste cell (below)
Shift-up (or down)    select multiple cells

  • For more about cells and the toolbars, read CoCalc's Jupyter Notebooks page (at least as far as Jupyter Notebook Enhancements).

Backing Up Your Work

CoCalc regularly saves your work. You can access previous versions through the TimeTravel facility. Read the Time Travel documentation to see how this works. The 'Save' button colour fades once the work has been saved. You can also save manually at any time.

Time Travel works at the server (cloud) end, of course, so will not save your work if there is an internet failure. If a red badge comes up over the green Save button, then your work is not being autosaved. This is likely because there is a problem with your connection, and you may need to refresh the browser page. If this happens, you should stop typing as any unsaved work may be lost. (It may be worth copying the last thing you did before refreshing.)

I recommend that you download a copy of your work at the end of each session as a backup. You can do this through the Files directory.

To download an individual notebook or other file, just click the cloud with the arrow on the right-hand side. Try this now.

To download a directory, select the directory and choose 'Compress' to create a zip file. You can then download the zip file.
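The same kind of zip archive can also be made from Python with the standard library, which can be handy if you want to script your backups. This is a minimal sketch under stated assumptions: it works on a throwaway temporary directory, and the file and archive names are hypothetical stand-ins for your own lab folder.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Build a throwaway directory standing in for a lab folder (hypothetical names).
workspace = Path(tempfile.mkdtemp())
lab_dir = workspace / "GettingStarted"
lab_dir.mkdir()
(lab_dir / "notes.txt").write_text("my lab notes\n")

# make_archive appends ".zip" to the base name and returns the archive path.
archive_path = shutil.make_archive(
    base_name=str(workspace / "GettingStarted-backup"),
    format="zip",
    root_dir=str(workspace),
    base_dir="GettingStarted",
)
print(archive_path)

# List what ended up inside the archive.
with zipfile.ZipFile(archive_path) as archive:
    names = archive.namelist()
print(names)
```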

A Python Refresher

In the same directory as this file, you will find a notebook called Python-Programming-Refresher. This will give you a chance to refresh some of the prerequisite Python you should have seen before, and follow up on any gaps, as well as practice using a notebook.

Proceed to the Refresher now...