Notes
This notebook will contain brief notes on topics covered in the course.
Data Types, Operators, Expressions, and Comparisons
Data Types
There are many different classes of objects in Python. We will not cover all of them here. Below is a short list of some of the basic objects (data types) we will commonly use:
int: Integers like ..., -2, -1, 0, 1, 2, .... In Python 3 these can be arbitrarily large (limited only by memory), but you must be careful about what happens when you use division: / always returns a float, while // performs floor division.
float: Floating point numbers have a decimal part, like 2.1 and 3.000023. Because the computer uses binary to represent numbers, not every number can be perfectly represented by a float.
bool: Boolean variables are either True or False and typically arise when comparing other values.
list: A list is an ordered collection of objects. These are typically used if you need to repeat a procedure. For example, [1,2,3,5] is a list. Note: lists use square brackets and can contain different data types.
string: A string is essentially a list of characters. For example, 'python' is a string. Note: the int 1 is different from the char '1'.
You can figure out an object's type by using the type() function.
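As a quick illustration of type(), the sketch below checks a handful of sample values. The list `values` and the variable names are made up for this example.

```python
# Sample values chosen for illustration only
values = [7, 2.1, True, [1, 2, 3, 5], 'python', '1']

# type(v) returns the class of v; __name__ gives its short name
type_names = [type(v).__name__ for v in values]

for v, name in zip(values, type_names):
    print(repr(v), 'has type', name)

# Floats are stored in binary, so some decimal sums are only approximate
sum_is_exact = (0.1 + 0.2 == 0.3)
print(sum_is_exact)
```

Note that 1 and '1' report different types even though they look alike when printed.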
Operators and Expressions
Operators allow you to take data types and manipulate their values. The same symbol may do different things depending on the data types used as inputs. Expressions are just how we tell the computer what we would like it to do.
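A small sketch of how the same symbol can behave differently depending on the data types involved. The variable names are hypothetical.

```python
# + adds numbers but concatenates strings and lists
sum_of_ints = 2 + 3
joined_strings = '2' + '3'
joined_lists = [1, 2] + [3]

# * multiplies numbers but repeats strings
repeated = 'ab' * 3

# / is true division, // is floor division, % is the remainder
true_division = 7 / 2
floor_division = 7 // 2
remainder = 7 % 2

# ** is exponentiation
power = 2 ** 10

print(sum_of_ints, joined_strings, joined_lists, repeated)
print(true_division, floor_division, remainder, power)
```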
Comparisons
Comparisons are expressions that usually evaluate to True or False. We use these when writing code that reacts to a given input.
Variables
Variables act as storage or placeholders for different data types, which allows us to write abstract code that can handle unknown input values. Variable names are case sensitive and cannot begin with a number. There are a few other rules regarding variable names. As the scope of your project grows, make sure to use long, descriptive variable names to help keep the code readable. Note that = is different from ==. In particular, = assigns the value on the right to the variable on the left, while == returns a boolean value when comparing the objects on the right and left.
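The difference between assignment and comparison can be seen in a short sketch. The variable names and values are hypothetical.

```python
# = stores the value 250 in the variable account_balance
account_balance = 250

# == compares two objects and produces a bool
is_exactly_250 = (account_balance == 250)

# Other comparisons also evaluate to True or False
is_overdrawn = account_balance < 0

# The int 1 and the string '1' are different objects
one_equals_char = (1 == '1')

print(is_exactly_250, is_overdrawn, one_equals_char)
```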
Functions
Functions are helpful in that they allow you to write code that can be easily reused. Instead of writing the same block of code multiple times in a program, use a function. Note that in Python, whitespace is critical. Indented lines are considered to be part of the function, while lines that line up with the function definition line are not.
The basic structure is:
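A minimal sketch of that structure. The function name `rectangle_area` and its parameters are made up for illustration.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle."""
    # These indented lines belong to the function
    area = width * height
    return area

# These lines line up with "def", so they are outside the function
small = rectangle_area(2, 3)
large = rectangle_area(10, 4.5)
print(small, large)
```

The same function is reused with different inputs instead of repeating the multiplication each time.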
Control Flow: If/Elif/Else and Loops
If/Elif/Else
If/Elif/Else statements are useful whenever we need a program that reacts to some input. Note the importance of whitespace here as well. For example, the absolute value function is defined as $$|x|=\begin{cases}\phantom{-}x & \text{ if $x\geq 0$}\\ -x & \text{ otherwise.}\end{cases}$$
The basic structure is:
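A minimal sketch of this structure, using the absolute value function as the example. The function names are hypothetical, and the comments mark which branch plays the role of code1 and code2.

```python
def absolute_value(x):
    if x >= 0:
        return x       # code1: runs when the comparison is True
    else:
        return -x      # code2: runs otherwise

def sign(x):
    # elif adds a further comparison when two branches are not enough
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0

print(absolute_value(-3), absolute_value(4), sign(-2), sign(0))
```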
If the comparison is true, code1 executes. Otherwise, code2 executes. If more than two options are needed, use elif with additional comparisons.
More on Lists and Strings
Lists and strings are very similar and share many behaviors and operators. When assigning a list or string to a variable, we can extract portions of the list or string using indexing. Remember that Python is zero indexed. Negative numbers may be used to select items counting from right to left rather than the usual left to right.
Lists and strings allow for slicing, which produces a copy of the list or string containing the subset of elements listed.
The usual notation is sequence[start:stop:step], where the element at index stop is not included.
Blanks may be used for the start and stop portion to indicate that you want to start from the beginning or go to the very end. A negative step size reverses the direction of the list or string.
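The sketch below collects a few indexing and slicing cases on a sample string and list. The names `word` and `primes` are chosen only for this example.

```python
word = 'python'
primes = [2, 3, 5, 7, 11, 13]

# Indexing: zero-based from the left, negative from the right
first_letter = word[0]
last_letter = word[-1]

# Slicing: start is included, stop is excluded
middle = primes[1:4]

# Blank start and stop mean "from the beginning" and "to the end"
every_other = primes[::2]
from_third = word[2:]

# A negative step walks through the sequence backwards
reversed_word = word[::-1]

print(first_letter, last_letter, middle, every_other, from_third, reversed_word)
```

Each slice is a new object, so modifying `middle` would leave `primes` unchanged.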
While Loops
While loops are useful when you need to repeat some procedure but are not sure how many iterations will be needed. In many situations, while and for loops are interchangeable.
The basic structure is:
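A minimal sketch of a while loop. The task (counting how many times a number can be halved) and the variable names are hypothetical.

```python
n = 1000
halvings = 0

# The comparison n > 1 is checked before every pass through the loop
while n > 1:
    n = n // 2        # update the variable used in the comparison
    halvings += 1

print(halvings)
```

Without the line that updates `n`, the comparison would never become false and the loop would run forever.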
The code inside the while loop will continue to be executed until the comparison is false. That means it is up to you to update the variables associated with the comparison unless you want a loop that either never runs or runs forever.
For Loops
A for loop is useful when you would like to compute something repeatedly based on each element in a list, string, or other iterable.
The basic structure is:
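A minimal sketch of a for loop over a list and over a string. The data and variable names are made up for illustration.

```python
# Loop over a list of numbers, accumulating a running total
total = 0
for number in [1, 2, 3, 5]:
    total += number

# Loop over the characters of a string, counting vowels
vowel_count = 0
for letter in 'python notes':
    if letter in 'aeiou':
        vowel_count += 1

print(total, vowel_count)
```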
If your iterable is a list of numbers or a string, then we can perform a common operation based on each individual element in the list or string.
The following example combines the length function with string indexing. It accomplishes the same thing as the previous loop.
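As a sketch of this idea, the two loops below build the same list, first by iterating over the characters directly and then by combining len() with indexing. The string and the uppercase operation are hypothetical stand-ins.

```python
word = 'python'

# Version 1: iterate over the characters directly
letters_direct = []
for letter in word:
    letters_direct.append(letter.upper())

# Version 2: iterate over the positions 0, 1, ..., len(word) - 1
letters_by_index = []
for i in range(len(word)):
    letters_by_index.append(word[i].upper())

print(letters_direct == letters_by_index)
```

The index version is mainly useful when the position itself is needed inside the loop.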
Compound Interest
The basic formula for compound interest:
principal
annual interest rate
number of compoundings per year
contribution for principal per compounding period
new principal after one compounding period
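The sketch below shows one way the quantities listed above could fit together. It assumes that interest for one period is credited at the rate (annual rate)/(compoundings per year) and that the contribution is added after the interest. The update rule and the names `next_principal`, `annual_rate`, `periods_per_year`, and `contribution` are assumptions made for this example, so adjust them to match the formula used in the course.

```python
def next_principal(principal, annual_rate, periods_per_year, contribution):
    """New principal after one compounding period (assumed update rule)."""
    # Assumption: interest is applied first, then the contribution is added
    return principal * (1 + annual_rate / periods_per_year) + contribution

# Hypothetical scenario: 1000 at 6% compounded monthly, adding 50 each month
balance = 1000.0
for month in range(12):
    balance = next_principal(balance, 0.06, 12, 50.0)

print(round(balance, 2))
```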
Recursion
Recursion involves objects whose definitions refer to themselves.
Recursion is useful when solving a problem whose solution involves first solving a “smaller” version of the same problem.
Recursive definitions involve two key ingredients:
A base case – this is an initial case or cases used to compute an answer.
An inductive case – this case tells you how to reduce the current case to previous cases.
Below we give a recursive definition of the function that computes the minimum number of moves needed to solve the Tower of Hanoi problem with $n$ discs.
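A minimal sketch of such a recursive definition. It assumes the standard argument: move the top n-1 discs aside, move the largest disc, then move the n-1 discs back on top. The function name `hanoi_moves` is hypothetical.

```python
def hanoi_moves(n):
    """Minimum number of moves for n discs (n >= 1)."""
    # Base case: a single disc takes one move
    if n == 1:
        return 1
    # Inductive case: solve n-1 discs, move the largest, solve n-1 again
    return 2 * hanoi_moves(n - 1) + 1

first_values = [hanoi_moves(n) for n in range(1, 6)]
print(first_values)
```

These values agree with the closed form $2^n - 1$.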
Basic Plotting
We will use the Matplotlib library for basic plotting. Matplotlib is not available by default in Python and must be imported. The general idea is that you issue various commands that modify a plot until you are satisfied with the results. There are a wide variety of options. The recommendation is that you use the Matplotlib website for help. Do not attempt to memorize all of Matplotlib.
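A small sketch of the "issue commands until satisfied" workflow, assuming Matplotlib is installed. The data, labels, and title are made up for this example.

```python
import matplotlib.pyplot as plt

# Sample data: points on the parabola y = x^2
x_values = [x / 10 for x in range(-30, 31)]
y_values = [x ** 2 for x in x_values]

# Create a figure, then modify it one command at a time
fig, ax = plt.subplots()
ax.plot(x_values, y_values, color='tab:blue', label='y = x^2')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('A parabola')
ax.legend()
ax.grid(True)
```

In a notebook the figure is usually displayed automatically at the end of the cell. In a script, call plt.show() to display it.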
List Comprehensions
A list comprehension is a shortcut that allows you to create lists of elements using notation similar to set-builder notation.
The basic structure is [expression for item in iterable if condition], where the if part is optional.
This is equivalent to:
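A minimal sketch of the equivalence: a comprehension of the form [expression for item in iterable if condition] next to the for loop that builds the same list. The example (squares of even numbers) and the variable names are hypothetical.

```python
# List comprehension
squares_of_evens = [n ** 2 for n in range(10) if n % 2 == 0]

# Equivalent for loop
squares_loop = []
for n in range(10):
    if n % 2 == 0:
        squares_loop.append(n ** 2)

print(squares_of_evens == squares_loop)
```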
The list comprehension below contains the set of odd cubes from 0 to 1000.
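One way to write such a comprehension is sketched here. It assumes "odd cubes" means cubes of odd integers, and the name `odd_cubes` is hypothetical.

```python
# Cubes n**3 with n odd; range(11) keeps every cube at or below 1000
odd_cubes = [n ** 3 for n in range(11) if n % 2 == 1]
print(odd_cubes)
```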
Bisection Method
We use the Bisection Method to estimate the roots of a continuous function $f$. This is done in the following way:
1. Select a tolerance level for the error in our approximation.
2. Find real numbers $a$ and $b$ so that $f(a)f(b)<0$.
3. Compute $c=\frac{a+b}{2}$.
4. If $\frac{|b-a|}{2}$ is smaller than the tolerance level for the error, stop and return $c$ as an approximation of a root of $f$.
5. If $f(c)=0$, we have found a root and we stop.
6. If $f(a)f(c)<0$, set $b=c$. Otherwise set $a=c$ and return to step 3.
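The steps above can be sketched as follows. The sketch assumes the interval endpoints are called `a` and `b`, the midpoint is `c`, and the stopping test is that half the interval width falls below the tolerance. The function name `bisection` and the `max_iterations` safeguard are additions for this example.

```python
def bisection(f, a, b, tolerance=1e-8, max_iterations=200):
    """Approximate a root of a continuous f on an interval with a sign change."""
    if f(a) * f(b) >= 0:
        raise ValueError('f(a) and f(b) must have opposite signs')
    for _ in range(max_iterations):
        c = (a + b) / 2                      # midpoint of the current interval
        if abs(b - a) / 2 < tolerance or f(c) == 0:
            return c
        if f(a) * f(c) < 0:
            b = c                            # sign change is in [a, c]
        else:
            a = c                            # sign change is in [c, b]
    return (a + b) / 2

# Hypothetical test function: x^2 - 2 has a root at sqrt(2) in [1, 2]
root = bisection(lambda x: x ** 2 - 2, 1, 2)
print(root)
```

Each pass halves the interval, so the error bound shrinks by a factor of 2 per iteration.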
Newton's Method
Newton's Method is an alternative method for finding roots of differentiable functions. While it may converge faster than the Bisection Method, it is not guaranteed to converge. Newton's Method uses the following recurrence relation to approximate roots of $f$: $$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}$$
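A minimal sketch of Newton's Method, assuming the standard update (subtract f(x)/f'(x) from the current guess) and a user-supplied derivative. The function name `newton`, the stopping rule, and the iteration cap are choices made for this example.

```python
def newton(f, f_prime, x0, tolerance=1e-10, max_iterations=50):
    """Approximate a root of f starting from the initial guess x0."""
    x = x0
    for _ in range(max_iterations):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError('derivative is zero; cannot continue')
        x_next = x - f(x) / slope            # Newton update
        if abs(x_next - x) < tolerance:
            return x_next
        x = x_next
    # Convergence is not guaranteed, so give up after max_iterations
    raise RuntimeError('Newton iteration did not converge')

# Hypothetical test function: x^2 - 2 with derivative 2x
approx_root = newton(lambda x: x ** 2 - 2, lambda x: 2 * x, x0=1.0)
print(approx_root)
```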
Random Number Generation
To generate random numbers, use np.random. The documentation covers most basic needs. Be sure to import the numpy module.
We give a few examples below.
Generating 10 random coin flips.
Selecting (with replacement) from a desired list.
Shuffling a list.
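The three tasks above can be sketched with functions from np.random. The lists `colors` and `deck` and the sample sizes are made up for this example, and the outputs change from run to run.

```python
import numpy as np

# 10 random coin flips: 0 for tails, 1 for heads (upper bound 2 is excluded)
coin_flips = np.random.randint(0, 2, size=10)

# Select 5 items, with replacement, from a list
colors = ['red', 'green', 'blue']
picks = np.random.choice(colors, size=5, replace=True)

# Shuffle a list in place
deck = [1, 2, 3, 4, 5, 6]
np.random.shuffle(deck)

print(coin_flips, picks, deck)
```

Note that shuffle modifies `deck` directly and returns None.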
Image Manipulation
Images are just matrices with several entries encoding the color/transparency of each individual pixel. Below we illustrate some examples of how to import, manipulate, and display images.
Each entry represents the intensity of the colors Red, Green, and Blue (RGB) in that order with a value from 0 to 255. Beware that other formats and Python modules might give different results.
Images here are numpy arrays and can be modified as such. The example below sets the R values to zero, removing the color red from the image.
We can also change the orientation or obtain a mirror image of the image by reordering the rows or columns of the original matrix.
We can also change individual values of the matrix to edit the image. Below we add a vertical white line to columns 400 to 499.
Numpy arrays allow for the creation of a mask. A mask is an array of True/False values. When modifying a matrix, the changes will only occur to the entries where the mask is True. In the example below, we create a mask that is True outside a circle of radius 400 centered at the center of the image. So the mask only applies to matrix entries at least 400 pixels from the center of the image.
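The manipulations described in this section can be sketched on a small synthetic image, so that no image file is needed. The array size, colors, stripe position, and radius below are hypothetical and much smaller than the values mentioned in the text.

```python
import numpy as np

# Synthetic RGB image: rows x columns x 3 channels, values from 0 to 255
height, width = 200, 300
image = np.full((height, width, 3), 128, dtype=np.uint8)
image[:, :width // 2, 0] = 200          # extra red in the left half

# Remove red by setting channel 0 to zero
no_red = image.copy()
no_red[:, :, 0] = 0

# Mirror image: reverse the order of the columns
mirrored = image[:, ::-1, :]

# Vertical white band in columns 100 to 119
striped = image.copy()
striped[:, 100:120, :] = 255

# Mask that is True outside a circle centered at the middle of the image
rows, cols = np.ogrid[:height, :width]
radius = 80
outside_circle = (rows - height // 2) ** 2 + (cols - width // 2) ** 2 >= radius ** 2

# Only entries where the mask is True are changed (set to black here)
masked = image.copy()
masked[outside_circle] = 0
```

Any of these arrays can then be displayed, for example with plt.imshow from Matplotlib.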
Pandas Basics
Pandas is a module that allows us to read Excel and csv files. Pandas allows us to manipulate dataframes (you can think of dataframes as augmented matrices with a lot of available functions and methods) using Python.
The file format determines the read command needed. Use pd.read_csv for csv files.
The shape attribute lets you know the dataframe size. You can obtain the indices (rows) using .index and the column labels using .columns.
The head command gives you the first 5 rows and is usually a good way to check the format and that the information has been read correctly.
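A minimal sketch of reading a csv and inspecting the result. Because the course data file is not reproduced here, the sketch reads the same five rows from an in-memory string through io.StringIO. The names `csv_text` and `sales` are hypothetical, and with a real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Stand-in for a csv file on disk
csv_text = """OrderDate,Region,Rep,Item,Units,Unit Cost,Total
2016-01-06,East,Jones,Pencil,95,1.99,189.05
2016-01-23,Central,Kivell,Binder,50,19.99,999.50
2016-02-09,Central,Jardine,Pencil,36,4.99,179.64
2016-02-26,Central,Gill,Pen,27,19.99,539.73
2016-03-15,West,Sorvino,Pencil,56,2.99,167.44
"""

sales = pd.read_csv(io.StringIO(csv_text), parse_dates=['OrderDate'])

n_rows, n_cols = sales.shape            # (number of rows, number of columns)
row_labels = list(sales.index)          # row indices
column_labels = list(sales.columns)     # column names
first_rows = sales.head()               # first 5 rows

print(n_rows, n_cols)
print(first_rows)
```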
| | OrderDate | Region | Rep | Item | Units | Unit Cost | Total |
---|---|---|---|---|---|---|---|
0 | 2016-01-06 | East | Jones | Pencil | 95 | 1.99 | 189.05 |
1 | 2016-01-23 | Central | Kivell | Binder | 50 | 19.99 | 999.50 |
2 | 2016-02-09 | Central | Jardine | Pencil | 36 | 4.99 | 179.64 |
3 | 2016-02-26 | Central | Gill | Pen | 27 | 19.99 | 539.73 |
4 | 2016-03-15 | West | Sorvino | Pencil | 56 | 2.99 | 167.44 |
Much like the mask example we saw earlier, we can obtain the rows that satisfy some condition using comparisons inside of square brackets. For multiple comparisons requiring the use of 'and' or 'or', use & and | rather than the keywords 'and' and 'or', and wrap each comparison in parentheses.
| | OrderDate | Region | Rep | Item | Units | Unit Cost | Total |
---|---|---|---|---|---|---|---|
0 | 2016-01-06 | East | Jones | Pencil | 95 | 1.99 | 189.05 |
6 | 2016-04-18 | Central | Andrews | Pencil | 75 | 1.99 | 149.25 |
7 | 2016-05-05 | Central | Jardine | Pencil | 90 | 4.99 | 449.10 |
10 | 2016-06-25 | Central | Morgan | Pencil | 90 | 4.99 | 449.10 |
12 | 2016-07-29 | East | Parent | Binder | 81 | 19.99 | 1619.19 |
17 | 2016-10-22 | East | Jones | Pen | 64 | 8.99 | 575.36 |
19 | 2016-11-25 | Central | Kivell | Pen Set | 96 | 4.99 | 479.04 |
20 | 2016-12-12 | Central | Smith | Pencil | 67 | 1.29 | 86.43 |
21 | 2016-12-29 | East | Parent | Pen Set | 74 | 15.99 | 1183.26 |
23 | 2017-02-01 | Central | Smith | Binder | 87 | 15.00 | 1305.00 |
27 | 2017-04-10 | Central | Andrews | Pencil | 66 | 1.99 | 131.34 |
28 | 2017-04-27 | East | Howard | Pen | 96 | 4.99 | 479.04 |
30 | 2017-05-31 | Central | Gill | Binder | 80 | 8.99 | 719.20 |
32 | 2017-07-04 | East | Jones | Pen Set | 62 | 4.99 | 309.38 |
37 | 2017-09-27 | West | Sorvino | Pen | 76 | 1.99 | 151.24 |
41 | 2017-12-04 | Central | Jardine | Binder | 94 | 19.99 | 1879.06 |
| | OrderDate | Region | Rep | Item | Units | Unit Cost | Total |
---|---|---|---|---|---|---|---|
6 | 2016-04-18 | Central | Andrews | Pencil | 75 | 1.99 | 149.25 |
7 | 2016-05-05 | Central | Jardine | Pencil | 90 | 4.99 | 449.10 |
10 | 2016-06-25 | Central | Morgan | Pencil | 90 | 4.99 | 449.10 |
19 | 2016-11-25 | Central | Kivell | Pen Set | 96 | 4.99 | 479.04 |
20 | 2016-12-12 | Central | Smith | Pencil | 67 | 1.29 | 86.43 |
23 | 2017-02-01 | Central | Smith | Binder | 87 | 15.00 | 1305.00 |
27 | 2017-04-10 | Central | Andrews | Pencil | 66 | 1.99 | 131.34 |
30 | 2017-05-31 | Central | Gill | Binder | 80 | 8.99 | 719.20 |
41 | 2017-12-04 | Central | Jardine | Binder | 94 | 19.99 | 1879.06 |
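Row filtering with comparisons can be sketched on a small hand-built dataframe. The thresholds and conditions below are hypothetical and are not the ones that produced the tables above.

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['East', 'Central', 'Central', 'Central', 'West'],
    'Item': ['Pencil', 'Binder', 'Pencil', 'Pen', 'Pencil'],
    'Units': [95, 50, 36, 27, 56],
})

# A single comparison produces a True/False mask over the rows
large_orders = sales[sales['Units'] > 40]

# Combine masks with & (and) and | (or); each comparison needs parentheses
central_pencils = sales[(sales['Region'] == 'Central') & (sales['Item'] == 'Pencil')]
east_or_west = sales[(sales['Region'] == 'East') | (sales['Region'] == 'West')]

print(large_orders)
print(central_pencils)
print(east_or_west)
```

The filtered dataframes keep the original row labels, which is why the index values in a filtered table can skip numbers.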
There are several useful methods for dataframes; see the documentation for more details. Here, value_counts() lists the unique entries in a column together with the number of instances of each.
The .loc indexer allows us to obtain slices of the dataframe by row and column labels. This is similar to slicing numpy arrays.
| | Item | Units |
---|---|---|
38 | Binder | 57 |
39 | Pencil | 14 |
40 | Binder | 11 |
41 | Binder | 94 |
42 | Binder | 28 |
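A short sketch of value_counts() and .loc on a small hand-built dataframe. The data and the chosen row range are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['East', 'Central', 'Central', 'Central', 'West'],
    'Item': ['Pencil', 'Binder', 'Pencil', 'Pen', 'Pencil'],
    'Units': [95, 50, 36, 27, 56],
})

# Unique entries of a column with the number of times each appears
item_counts = sales['Item'].value_counts()

# Rows labeled 1 through 3 and two chosen columns
subset = sales.loc[1:3, ['Item', 'Units']]

print(item_counts)
print(subset)
```

Unlike ordinary Python slices, a .loc slice includes the final label, so `1:3` returns three rows here.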
We can create new columns by simply defining them in terms of some formula.
After adding a column, we do a quick check to see if things have worked.
| | OrderDate | Region | Rep | Item | Units | Unit Cost | Total | Tax |
---|---|---|---|---|---|---|---|---|
0 | 2016-01-06 | East | Jones | Pencil | 95 | 1.99 | 189.05 | 15.1240 |
1 | 2016-01-23 | Central | Kivell | Binder | 50 | 19.99 | 999.50 | 79.9600 |
2 | 2016-02-09 | Central | Jardine | Pencil | 36 | 4.99 | 179.64 | 14.3712 |
3 | 2016-02-26 | Central | Gill | Pen | 27 | 19.99 | 539.73 | 43.1784 |
4 | 2016-03-15 | West | Sorvino | Pencil | 56 | 2.99 | 167.44 | 13.3952 |
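Adding a column can be sketched as below. The 8% rate is inferred from the table above (for example, 15.1240 / 189.05 = 0.08) and is otherwise an assumption, and the small dataframe is a hand-built stand-in for the course data.

```python
import pandas as pd

sales = pd.DataFrame({
    'Item': ['Pencil', 'Binder', 'Pencil', 'Pen', 'Pencil'],
    'Units': [95, 50, 36, 27, 56],
    'Unit Cost': [1.99, 19.99, 4.99, 19.99, 2.99],
    'Total': [189.05, 999.50, 179.64, 539.73, 167.44],
})

# Define the new column by a formula applied to an existing column
tax_rate = 0.08                      # assumed rate, inferred from the table
sales['Tax'] = sales['Total'] * tax_rate

# Quick check that the column was added as expected
print(sales.head())
```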