Basic Introduction to Python
This is a very basic introduction to Python. It is not exhaustive, but is meant to give you a starting point.
This notebook was written for PHY 403 by Segev BenZvi, University of Rochester, (Spring 2016), and updated by Aran Garcia-Bellido (Spring 2017).
It is based on a similar (longer) Python guide written by Kyle Jero (UW-Madison) for the IceCube Programming Bootcamp in June 2015, and includes elements from older guides by Jakob van Santen and Nathan Whitehorn.
What is Python?
Python is an imperative, interpreted programming language with strong dynamic typing.
Imperative: programs are built around one or more subroutines known as "functions" and "classes"
Interpreted: program instructions are executed on the fly rather than being pre-compiled into machine code
Dynamic Typing: data types of variables (int, float, string, etc.) are determined on the fly as the program runs
Strong Typing: converting a variable from one type to another (e.g., int to string) is not always done automatically
Python offers fast and flexible development and can be used to glue together many different analysis packages which have "Python bindings."
As a rule, Python programs are slower than compiled programs written in Fortran, C, and C++. But it's a much more forgiving programming language.
Why Use Python?
Python is one of the most popular scripting languages in the world, with a huge community of users and support on all major platforms (Windows, OS X, Linux).
Pretty much every time I've run into a problem programming in Python, I've found a solution after a couple of minutes of searching on google or stackoverflow.com!
Key Third-Party Packages
Must-Haves
NumPy: random number generation, transcendental functions, vectorized math, linear algebra.
SciPy: statistical tests, special functions, numerical integration, curve fitting and minimization.
Matplotlib: plotting: xy plots, error bars, contour plots, histograms, etc.
IPython: an interactive python shell, which can be used to run Mathematica-style analysis notebooks.
Worth Using
SciKits: data analysis add-ons to SciPy, including machine learning algorithms.
Pandas: functions and classes for specialized data analysis.
AstroPy: statistical methods useful for time series analysis and data reduction in astronomy.
Emcee: great implementation of Markov Chain Monte Carlo; nice to combine with the package Corner.
Specialized Bindings
Many C and C++ packages used in high energy physics come with bindings to Python. For example, the ROOT package distributed by CERN can be run completely from Python. If you are building ROOT on your computer from scratch, make sure you enable the Python bindings when you build it.
Online Tools
If you don't want to install all these packages on your own computer, you can create a free account at cloud.sagemath.com. Sagemath gives you access to ipython notebooks running on remote servers. Recent versions of SciPy, NumPy, and Matplotlib are provided.
Similarly, try.jupyter.org lets you test simple code without opening an account (but it does not let you save files; for that you need to install Anaconda/Jupyter on your computer). Jupyter notebooks can also be viewed on github.com, which is very useful as a code repository.
Programming Basics
We will go through the following topics, and then do some simple exercises.
Arithmetic Operators
Variables and Lists
Conditional Statements
Loops (for and while)
Functions
Importing Modules
Arithmetic Operators
Addition
Subtraction
Multiplication
Division
Note: in Python 2, division of two integers is always floor division. In Python 3, 1/2 automatically evaluates to the floating point number 0.5. To use floor division in Python 3, you'll have to run 1 // 2.
Modulo/Remainder
Exponentiation
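The operators listed in this section can be tried out with a short sketch like the one below. The numbers are arbitrary illustrations, and Python 3 division semantics are assumed.

```python
# Arbitrary example values; Python 3 semantics assumed.
total = 2 + 3          # addition
difference = 7 - 10    # subtraction
product = 4 * 2.5      # multiplication (int times float gives a float)
quotient = 1 / 2       # true division in Python 3
floored = 7 // 2       # floor division
remainder = 7 % 3      # modulo/remainder
power = 2 ** 10        # exponentiation

print(total, difference, product, quotient, floored, remainder, power)
```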
Variables
Variables are extremely useful for storing values and using them later. One can assign a variable the value of any other variable, expression, function call, etc. However, variable names must follow certain rules:
Variable names must start with a letter (upper or lower case) or underscore
Variable names may contain only letters, numbers, and underscores _
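The following names are reserved keywords in Python and cannot be used as variable names. (This is the Python 2 list. In Python 3, print and exec are no longer keywords, while True, False, None, nonlocal, async, and await have been added.)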
and del from not while
as elif global or with
assert else if pass yield
break except import print
class exec in raise
continue finally is return
def for lambda try
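Since the keyword list differs between Python versions, it can help to ask the running interpreter directly. The sketch below uses the standard-library keyword module; the name my_value is just a hypothetical variable name used for illustration.

```python
import keyword

# Keywords reserved by the interpreter that is running this code.
print(keyword.kwlist)

# Check individual names before using them as variable names.
is_reserved = keyword.iskeyword("lambda")       # a reserved keyword
is_free = not keyword.iskeyword("my_value")     # hypothetical variable name
print(is_reserved, is_free)
```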
This time nothing printed out because the output of the expression was stored in the variable x. To see the value we have to call the print function:
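A minimal sketch of this behavior follows. The expression assigned to x is a made-up example, since the original assignment is not shown in the text.

```python
# Hypothetical assignment: nothing is echoed because the result is stored in x.
x = 5 + 3

# Calling print displays the stored value.
print(x)
```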
Alternatively, just type x on the last line of a cell and the notebook will evaluate it and dump the value to the output:
Recall that we don't have to explicitly declare the type of a variable in Python, unlike in many other languages. We simply name the variable and specify what we want it to store. However, it is still useful to know the type of an object, and to learn which types Python makes available.
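One way to inspect types is the built-in type function. The sketch below uses arbitrary example values and also illustrates strong typing: adding an int to a string raises a TypeError unless the conversion is made explicit.

```python
# Arbitrary example values of four common built-in types.
a = 3
b = 3.0
c = "three"
d = True

type_names = [type(v).__name__ for v in (a, b, c, d)]
print(type_names)

# Strong typing: int + str is not converted automatically.
try:
    a + c
except TypeError as err:
    print("TypeError:", err)

# An explicit conversion works.
label = str(a) + c
print(label)
```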
C-style formatted printing is also allowed:
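As a small illustration, the % operator applies C-style format codes to a tuple of values. The variable names and values here are made up for the example.

```python
# Made-up values for illustration.
name = "pi"
digits = 3
value = 3.14159265

# %s formats a string, %d an integer, %.3f a float with 3 decimals.
line = "%s to %d decimals is %.3f" % (name, digits, value)
print(line)
```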
Lists
Imagine that we are storing the heights of people or the results of a random process. We could create a new variable for each piece of information, but this becomes convoluted very quickly. In instances like this it is best to store the collection of information together in one place. In Python this collection is called a list, and it is defined by enclosing comma-separated data in square brackets. An empty list can also be specified by square brackets with nothing between them and filled later in the program.
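A short sketch of list creation, using hypothetical data:

```python
# Hypothetical heights in meters.
heights = [1.62, 1.75, 1.80, 1.68]

# An empty list that could be filled later.
empty = []

# A list may mix types, and may even contain other lists.
mixed = [1, 2.5, "three", [4, 5]]

print(type(heights))
print(type(mixed))
```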
Notice that the type of our list is list and no mention is made of the data type it contains. This is because Python does not fuss about what type of thing is in a list, or even about mixing types within one list. If you have worked with a statically typed language such as C, C++, or Java, this is different than what you are used to, since there the elements of an array must all have the same type.
You can check the current length of a list by calling the len function with the list as the argument:
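For instance, with a hypothetical list of four heights:

```python
heights = [1.62, 1.75, 1.80, 1.68]   # hypothetical data
n_heights = len(heights)
print(n_heights)
```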
In addition, you can add objects to the list or remove them from the list in several ways:
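The sketch below walks through several standard list methods for adding and removing elements. The comments track the list contents after each step.

```python
values = [1, 2, 3]

values.append(4)         # add to the end:        [1, 2, 3, 4]
values.insert(0, 0)      # insert at index 0:     [0, 1, 2, 3, 4]
values.extend([5, 6])    # append several items:  [0, 1, 2, 3, 4, 5, 6]

values.remove(3)         # remove first value 3:  [0, 1, 2, 4, 5, 6]
last = values.pop()      # remove and return end: [0, 1, 2, 4, 5]
del values[0]            # delete by index:       [1, 2, 4, 5]

print(values, last)
```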
List Element Access
Individual elements (or ranges of elements) in the list can be accessed using the square bracket operator. For example:
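A minimal sketch with a made-up list of letters, ending with a slice that steps through the list two elements at a time:

```python
letters = ["a", "b", "c", "d", "e", "f"]   # made-up example data

first = letters[0]          # indexing starts at 0
last = letters[-1]          # negative indices count from the end
middle = letters[1:4]       # elements 1, 2, 3 (the stop index is excluded)
every_other = letters[::2]  # full list, taking every second element

print(first, last, middle, every_other)
```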
This is an example of a slice, where we grab a subset of the list and also decide to step through the list by skipping every other element. The syntax is
listname[start:stop:stride]
Note that if start and stop are left blank, the full list is used in the slice by default.
A simple built-in function that is used a lot is the range function. In Python 2 it returns a list; in Python 3 it returns a lazy range object that can be looped over or converted to a list with list(), so we will discuss it here briefly. The syntax of the function is range(starting number, ending number, step size). All three function arguments are required to be integers, and the ending number is not included in the result. The step size does not have to be specified; if it is omitted, the value is assumed to be 1.
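The following sketch assumes Python 3, where the result of range is wrapped in list() to see its contents:

```python
default_step = list(range(0, 5))     # step size defaults to 1
stepped = list(range(2, 11, 3))      # start at 2, stop before 11, step 3
countdown = list(range(5, 0, -1))    # a negative step counts down

print(default_step)
print(stepped)
print(countdown)
```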
Conditional Statements
Conditionals are useful for altering the flow of control in your programs. For example, you can execute blocks of code (or skip them entirely) if certain conditions are met.
Conditions are created using if/elif/else blocks.
For those of you familiar with C, C++, Java, and similar languages, you are probably used to code blocks being marked off with curly braces: { }
In Python braces are not used. Code blocks are indented, and the Python interpreter decides what's in a block depending on the indentation. Good practice (for readability) is to use 4 spaces per indentation. The IPython notebook will automatically handle the indentation for you.
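A small sketch of an if/elif/else block with 4-space indentation. The variable temperature and its thresholds are hypothetical.

```python
temperature = 22   # hypothetical value in degrees Celsius

if temperature < 0:
    state = "freezing"
elif temperature < 25:
    state = "mild"
else:
    state = "hot"

# This line is not indented, so it runs regardless of the branch taken.
print(state)
```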
Comparison Operators
There are several predefined operators used to make boolean comparisons in Python. They are similar to operators used in C, C++, and Java:
== ... test for equality
!= ... test for not equal
> ... greater than
>= ... greater than or equal to
< ... less than
<= ... less than or equal to
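Each of these operators evaluates to a boolean value, as the sketch below shows for two arbitrary integers:

```python
a, b = 3, 5   # arbitrary example values

results = {
    "==": a == b,
    "!=": a != b,
    ">": a > b,
    ">=": a >= b,
    "<": a < b,
    "<=": a <= b,
}

for op, outcome in results.items():
    print("%d %s %d is %s" % (a, op, b, outcome))
```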
Combining Boolean Values
Following the usual rules of boolean algebra, boolean values can be negated or combined in several ways:
Logical AND
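You can combine two boolean variables using the operator & or the keyword and. Note that & is a bitwise operator that binds more tightly than comparisons, and that the C-style && is not valid Python: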
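A minimal sketch of logical AND with arbitrary values:

```python
a, b = True, False

both_keyword = a and b    # keyword form
both_operator = a & b     # bitwise operator applied to booleans

# With comparisons, prefer "and" (or parenthesize when using &).
x = 5
in_range = (x > 0) and (x < 10)

print(both_keyword, both_operator, in_range)
```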
Logical OR
You can also combine two boolean variables using the operator | or the keyword or. As with &, the operator | is bitwise, and the C-style || is not valid Python:
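A minimal sketch of logical OR with arbitrary values:

```python
a, b = True, False

either_keyword = a or b    # keyword form
either_operator = a | b    # bitwise operator applied to booleans

x = 15
out_of_range = (x < 0) or (x > 10)

print(either_keyword, either_operator, out_of_range)
```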
Logical NOT
It's possible to negate a boolean expression using the keyword not:
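For example:

```python
flag = True
negated = not flag

# not applies to the whole parenthesized expression.
not_equal = not (3 == 4)

print(negated, not_equal)
```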
A more complex truth table demonstrating the duality:
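The formula for the duality did not survive in the text. The sketch below assumes it refers to De Morgan's law, not (A and B) == (not A) or (not B), and prints the corresponding truth table.

```python
from itertools import product

# Assumption: the duality meant here is De Morgan's law.
rows = []
print("%-6s %-6s %-15s %s" % ("A", "B", "not (A and B)", "(not A) or (not B)"))
for a, b in product([True, False], repeat=2):
    left = not (a and b)
    right = (not a) or (not b)
    rows.append((a, b, left, right))
    print("%-6s %-6s %-15s %s" % (a, b, left, right))
```

The last two columns agree in every row, which is what the law states.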
Loops
Loops are useful for executing blocks of code as long as a logical condition is satisfied.
Once the loop condition is no longer satisfied, the flow of control is returned to the main body of the program. Note that infinite loops, a serious runtime bug where the loop condition never evaluates to False, are allowed, so you have to be careful.
While Loop
The while loop repeats its body until its condition becomes false. Note that loops can be nested inside each other, and can also contain nested conditional statements.
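A small sketch of a while loop containing a nested conditional. It sums the even numbers below 5; the variable names are chosen for this example only.

```python
count = 0
total = 0

while count < 5:
    if count % 2 == 0:     # nested conditional: only add even numbers
        total += count
    count += 1             # without this update the loop would never end

print(count, total)
```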
For Loop
The for loop provides the same basic functionality as the while loop, but allows for a simpler syntax in certain cases.
For example, if we wanted to access all the elements inside a list one by one, we could write a while loop with an index variable i and access the list elements as listname[i], incrementing i until it equals the length of the list.
However, the for loop lets us avoid the need to declare an index variable. For example:
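The sketch below contrasts the two approaches on a hypothetical list of names. Both loops visit the same elements in the same order.

```python
names = ["Ada", "Grace", "Emmy"]   # hypothetical data

# while loop with an explicit index variable
visited_while = []
i = 0
while i < len(names):
    visited_while.append(names[i])
    i += 1

# for loop: no index variable needed
visited_for = []
for name in names:
    visited_for.append(name)

print(visited_while == visited_for)
```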
If we are interested in building lists, we can start from a blank list and append things to it in a for loop, or use a list comprehension, which combines the for loop and the list creation into one line. The syntax is a set of square brackets that contains an expression followed by a for loop.
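Both approaches are sketched below for the squares of the integers 0 through 5. The optional if clause at the end of a comprehension filters the elements.

```python
# Start from a blank list and append inside a for loop.
squares_loop = []
for n in range(6):
    squares_loop.append(n ** 2)

# The same list built with a list comprehension.
squares = [n ** 2 for n in range(6)]

# A comprehension can also filter with a trailing if clause.
even_squares = [n ** 2 for n in range(6) if n % 2 == 0]

print(squares, even_squares)
```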
You can also loop through two lists simultaneously using the zip function:
To illustrate the zip function, this is what it does:
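A minimal sketch with two hypothetical lists. In Python 3, zip returns an iterator, so it is wrapped in list() to display the pairs it produces.

```python
names = ["Ada", "Grace", "Emmy"]   # hypothetical data
ages = [36, 45, 53]                # hypothetical data

# zip pairs up the elements at matching positions.
pairs = list(zip(names, ages))
print(pairs)

# Looping over both lists at once.
for name, age in zip(names, ages):
    print("%s is %d" % (name, age))
```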
Functions
Functions are subroutines that accept some input and produce zero or more outputs. They are typically used to define common tasks in a program.
Rule of thumb: if you find that you are copying a piece of code over and over inside your script, it should probably go into a function.
Example: Rounding
The following function will round integers to the nearest 10:
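The original function is not shown in the text, so the version below is only one possible sketch. The name round_to_ten is made up here, and ties (numbers ending in 5) are assumed to round up.

```python
def round_to_ten(n):
    """Round the integer n to the nearest multiple of 10 (ties round up)."""
    return ((n + 5) // 10) * 10

print(round_to_ten(14), round_to_ten(15), round_to_ten(26))
```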
In-Class Exercise
With the small amount we've gone through, you can already write reasonably sophisticated programs. For example, we can write a loop that generates the Fibonacci sequence.
Just to remind you, the Fibonacci sequence is the list of numbers
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...
It is defined by the linear homogeneous recurrence relation $F_n = F_{n-1} + F_{n-2}$, where $F_1 = F_2 = 1$.
The exercise is:
Write a Python function that generates $F_n$ given $n$.
Use your function to generate the first 100 numbers in the Fibonacci sequence.
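One possible solution is a direct recursive translation of the recurrence relation. This is a sketch, not necessarily the solution originally given; it assumes the sequence starts at n = 1 with the values 1, 1.

```python
def fib(n):
    """Return the n-th Fibonacci number (n >= 1) by direct recursion."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

# Only the first few terms are computed here; larger n becomes very slow.
first_ten = [fib(n) for n in range(1, 11)]
print(first_ten)
```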
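This function will work just fine for small n. Unfortunately, the recursive calls to fib multiply rapidly with n: a naive recursive implementation makes an exponentially growing number of calls, and the depth of the call stack grows with n as well. When n gets sufficiently large, the run time becomes impractical and you may hit the Python recursion limit. At that point your program will stop with an error.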
Here is a more efficient approach that does not require recursion:
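A minimal iterative sketch follows. The name fib_iterative is chosen here for illustration, and the sequence is again assumed to start at n = 1 with the values 1, 1.

```python
def fib_iterative(n):
    """Return the n-th Fibonacci number (n >= 1) using a simple loop."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b    # slide the pair (F_k, F_k+1) forward by one
    return a

first_100 = [fib_iterative(n) for n in range(1, 101)]
print(first_100[:10])
print(first_100[-1])
```

Because Python integers have arbitrary precision, the 100th term is computed exactly.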
Accessing Functions Beyond the Built-In Functions
If we want to use libraries and modules not defined within the built-in functionality of python we have to import them. There are a number of ways to do this.
The statements import numpy and import scipy import the module numpy and the module scipy, and create references to those modules in the current namespace. After you’ve run them, you can use numpy.name and scipy.name to refer to constants, functions, and classes defined in the modules numpy and scipy.
The statement from numpy import * imports the module numpy, and creates references in the current namespace to all public objects defined by that module (that is, everything that doesn’t have a name starting with “_”).
Or in other words, after you’ve run this statement, you can simply use a plain name to refer to things defined in module numpy. Here, numpy itself is not defined, so numpy.name doesn’t work. If name was already defined, it is replaced by the new version. Also, if name in numpy is changed to point to some other object, your module won’t notice.
This imports the module scipy, and creates references in the current namespace to functions in the submodule special. We then make 3 function calls to the error function erf.
The statement import numpy as np imports numpy but binds the module to the name np, so that you can type np rather than numpy when you want to access variables and functions defined inside the module.
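The import forms described in this section can be tried without installing anything by letting the standard-library math module stand in for numpy and scipy. That substitution is an assumption made here so the sketch runs anywhere; the syntax is the same for any module.

```python
# Form 1: import the module and use qualified names.
import math
print(math.pi)

# Form 2: import selected names directly into the current namespace.
# ("from math import *" would bring in every public name instead.)
from math import erf
erf_values = [erf(0.0), erf(1.0), erf(2.0)]
print(erf_values)

# Form 3: import the module under a shorter alias.
import math as m
root = m.sqrt(16.0)
print(root)
```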
NumPy Tips and Tricks
NumPy is optimized for numerical work. The array type inside of the module behaves a lot like a list, but it is vectorized so that you can apply arithmetic operations and other functions to the array without having to loop through it.
For example, when we wanted to square every element inside a python list we used a list comprehension:
This isn't that hard, but the syntax is a little ugly and we do have to explicitly loop through the list. In contrast, to square all the elements in the NumPy array you just apply the operator to the array variable itself:
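The two approaches can be compared side by side. The sketch assumes NumPy is installed and uses a small made-up list.

```python
import numpy as np

numbers = [1, 2, 3, 4]   # made-up example data

# Plain Python: explicit loop inside a list comprehension.
squared_list = [n ** 2 for n in numbers]

# NumPy: the operator is applied to every element at once.
arr = np.array(numbers)
squared_arr = arr ** 2

print(squared_list)
print(squared_arr)
```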
Evenly Spaced Numbers
NumPy provides two functions to give evenly spaced numbers on linear or logarithmic scales.
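The two functions are np.linspace and np.logspace. A short sketch with arbitrary endpoints, assuming NumPy is installed:

```python
import numpy as np

# 5 points evenly spaced between 0 and 1, endpoints included.
linear = np.linspace(0.0, 1.0, 5)

# 4 points evenly spaced in log10 between 10**0 and 10**3.
logarithmic = np.logspace(0, 3, 4)

print(linear)
print(logarithmic)
```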
Slicing Arrays with Boolean Masks
An extremely useful feature in NumPy is the ability to create a "mask" array which can select values satisfying a logical condition:
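A minimal sketch with made-up data, assuming NumPy is installed. Note that conditions on arrays are combined with & and parentheses rather than the keyword and.

```python
import numpy as np

data = np.array([3, -1, 7, 0, -5, 2])   # made-up example data

# The comparison produces an array of booleans, one per element.
mask = data > 0
positives = data[mask]

# Combine element-wise conditions with & (and parentheses).
in_window = data[(data > -2) & (data < 5)]

print(mask)
print(positives)
print(in_window)
```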
This is the type of selection used all the time in data analysis.
File Input/Output
Standard Python has functions to read basic text and binary files from disk.
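A small sketch using the built-in open function. The file name example.txt is hypothetical, and a temporary directory is used so that nothing is left behind in your working directory.

```python
import os
import tempfile

# Hypothetical file placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Write a text file; the with block closes the file automatically.
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# Read it back line by line as text.
with open(path) as f:
    lines = [line.rstrip("\n") for line in f]

# Read the same file as raw bytes.
with open(path, "rb") as f:
    raw = f.read()

print(lines)
print(type(raw))
```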
However, for numerical analysis your files will usually be nicely formatted into numerical columns separated by spaces, commas, etc. For reading such files, NumPy has a nice function called genfromtxt:
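The sketch below assumes NumPy is installed and reads comma-separated columns from an in-memory string, which stands in for a file on disk. The column contents are made up; a file name could be passed to genfromtxt in place of the StringIO object.

```python
import io
import numpy as np

# In-memory stand-in for a data file; lines starting with # are comments.
text = io.StringIO("# t, voltage\n0.0, 1.5\n0.1, 1.7\n0.2, 2.1\n")

table = np.genfromtxt(text, delimiter=",")

# Each column can be pulled out with a slice.
t = table[:, 0]
voltage = table[:, 1]

print(table.shape)
print(t, voltage)
```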
Plotting with Matplotlib
Matplotlib is used to plot data and can be used to produce the usual xy scatter plots, contour plots, histograms, etc. that you're used to making for all basic data analyses.
I strongly recommend that you go to the Matplotlib website and check out the huge plot gallery. This is the easiest way to learn how to make a particular kind of plot.
Note: when you want to plot something in an IPython notebook, put the magic line
%matplotlib inline
before you import the matplotlib module. This will ensure that your plots appear inside the notebook. Otherwise the plots will pop open in another window, which can be annoying.
Here is an example of how to change the default formatting of the text in your plot. Also note how LaTeX is supported!
Using NumPy and Matplotlib Together
Here we create some fake data with NumPy and plot it, including a legend.
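The following is a sketch of such a plot, assuming NumPy and Matplotlib are installed. The noisy sine data, the font settings, and the labels are all made up for illustration. The Agg backend line lets the code run without a display; leave it out in a notebook that uses %matplotlib inline.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Change the default text formatting for all subsequent plots.
matplotlib.rc("font", family="serif", size=14)

# Fake data: a sine curve with Gaussian noise.
np.random.seed(1)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + np.random.normal(0, 0.1, x.size)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label=r"$\sin(x)$")             # LaTeX in the label
ax.errorbar(x, y, yerr=0.1, fmt="o", label="fake data")
ax.set_xlabel(r"$x$ [rad]")
ax.set_ylabel(r"$y$")
ax.legend()
```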
Help Manual and Inspection
When running interactive sessions, you can use the built-in help function to view module and function documentation.
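For example, here is how to view the internal documentation for the function that calculates the greatest common divisor of two numbers. It lives in a standard-library module rather than among the built-ins: math.gcd in Python 3.5 and later, fractions.gcd in Python 2.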
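A minimal sketch, assuming Python 3.5 or later so that math.gcd is available:

```python
import math

# Print the documentation for the function.
help(math.gcd)

# The function itself, applied to two arbitrary integers.
divisor = math.gcd(12, 18)
print(divisor)
```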
The inspect module is nice if you actually want to look at the source code of a function. Just import inspect and call the getsource function for the code you want to see:
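As an illustration, the sketch below prints the source of textwrap.dedent, a standard-library function written in Python. The choice of function is arbitrary; getsource does not work for functions implemented in C, such as math.gcd.

```python
import inspect
import textwrap

# Retrieve the Python source of a pure-Python library function.
source = inspect.getsource(textwrap.dedent)
print(source)
```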