Project: Ying Jia - MATH0094-2020/2021

Path: Notebooks / Week 2 / 2. More Python basics.ipynb

Kernel: Python 3 (system-wide)

More Python basics

Camilo A. Garcia Trillos - 2020

In this notebook

we look at how to define and test functions, and how to think in terms of error management.
we discuss Python packages and start looking at some notable Python packages or libraries.

Function definition

Python code is better structured by defining functions.

The basic syntax is def name_function(arguments): --- return ---

Let us look at a first example: we will create a function that establishes whether two integers are relatively prime (i.e., their greatest common divisor is 1). We will create several iterations of the function.

Let us start with a first simple implementation:

In [1]:

def rel_primes(x,y):
    # We can include a description of the function using a string immediately below the function
    ''' Receives two numbers and returns True if both are relative 
        primes or False otherwise
    
    '''
    
    z = min(x,y)                       # Assign to z the minimum of x and y
    for i in range(2,z) :               # run over all numbers between 2 and z (!)
        if (x%i)==0 and (y%i==0):      # if a number divides both x and y (i.e. the remainder in both cases is zero) 
                                       # they are not relative primes ...
            return False               # ... in this case, return False. This takes the flow out of the function
    return True                        # finally, if the program gets to this point, x and y are relative primes

# Note that the following line of code is outside the scope of the function
rel_primes(18,123) # This should be false, as 3 divides both

False

Check with other cases that the function works as it should.

In Python, when the code within a function is executed, a new 'environment' is created. Every object or function that is defined within a function only lives within that function. So, for example, the variables x, y, z above are not accessible in the following line:

In [0]:

print(x,y,z) # this generates an error

However, functions can access variables and functions defined outside themselves. This is useful (as will be seen further below), but is sometimes a source of confusion (particularly regarding variables).
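A minimal sketch of both behaviours follows. The names rate and apply_rate are hypothetical and not part of the notebook: the function reads a variable defined outside it, while the variable it creates internally is not visible afterwards.

```python
rate = 0.05                      # defined outside any function (hypothetical name)

def apply_rate(amount):
    interest = amount * rate     # 'rate' is read from the enclosing (global) scope
    return amount + interest     # 'interest' only exists while the function runs

total = apply_rate(100)
print(total)

# 'interest' was local to apply_rate, so it is not defined out here
try:
    print(interest)
except NameError as err:
    print('NameError:', err)

# The function looks 'rate' up when it is called, not when it is defined,
# so changing 'rate' changes later results. This is a common source of confusion.
rate = 0.10
print(apply_rate(100))
```

Passing such values as explicit arguments, rather than relying on outside variables, usually makes the behaviour easier to follow.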

In the above definition of rel_primes, apart from the functional part, we included a help description. At any point this information can be retrieved with ? after the name of a function.

In [0]:

rel_primes?
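The ? syntax is specific to IPython and Jupyter. As a sketch of the same idea in plain Python, the description string is stored in the attribute __doc__ of the function (and the built-in help shows the same text). The function double below is a hypothetical example.

```python
def double(x):
    '''Return twice the value of x.'''
    return 2 * x

doc = double.__doc__     # the string placed right below the def line
print(doc)
```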

What happens when we test the function with values that are not integers?

In [0]:

rel_primes('a',1) # this gives an error

In [2]:

rel_primes(1.2,1.4)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-149e970ce731> in <module>
----> 1 rel_primes(1.2,1.4)

<ipython-input-1-143c359c2d08> in rel_primes(x, y)
      7 
      8     z = min(x,y)                       # Assign to z the minimum from a and b
----> 9     for i in range(2,z) :               # run over all numbers between 2 and z (!)
     10         if (x%i)==0 and (y%i==0):      # if a number divides both a, b (i.e. the residual in both cases is zero)
     11                                        # they are not relative primes ...
TypeError: 'float' object cannot be interpreted as an integer

In [3]:

rel_primes(3,4.2)

True

Error management

Note that sometimes we get an error and sometimes we do not. Moreover, the error is not very informative of what happened. We can create our own error messages. In what follows we create a function that 'wraps' the previous one, while providing some error management.

In [4]:

def rel_primes2(a,b):
    
    # First we check if the inputs have the right type. Recall the type function we looked at in the first notebook.
    
    if type(a)!=int:
        raise TypeError("Both numbers must be integers")

    if type(b)!=int:
        raise TypeError("Both numbers must be integers")
    
    # If no error is raised up to here, we call the original function.
        
    return rel_primes(a,b)  #Note that we can call functions we have defined before

print('This one works: 18 and 123 are both integers (and not relative primes)')
rel_primes2(18,123)

This one works: 18 and 123 are both integers (and not relative primes)

False

In [5]:

rel_primes2('a',123) # Here we raise an error as we defined it

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-58e533b5cb3e> in <module>
----> 1 rel_primes2('a',123) # Here we raise an error as we defined it

<ipython-input-4-21a95bd1e322> in rel_primes2(a, b)
      4 
      5     if type(a)!=int:
----> 6         raise TypeError("Both numbers must be integers")
      7 
      8     if type(b)!=int:
TypeError: Both numbers must be integers

You can check that errors are raised in other cases (for example if you provide complex numbers or floats).

The other very common type of error to be raised is ValueError (i.e. raise ValueError(...)). This means that a given input has a value outside the accepted domain.
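To illustrate the difference between the two error types, here is a small sketch with a hypothetical function checked_sqrt (not part of the notebook): a TypeError signals an input of the wrong kind, while a ValueError signals an input of the right kind but outside the accepted domain.

```python
from math import sqrt

def checked_sqrt(x):
    # Wrong type: the input is not a real number at all
    if type(x) not in (int, float):
        raise TypeError("x must be an int or a float")
    # Right type, but value outside the accepted domain [0, infinity)
    if x < 0:
        raise ValueError("x must be non-negative")
    return sqrt(x)

print(checked_sqrt(9))       # a valid input

try:
    checked_sqrt(-1)
except ValueError as err:
    print('ValueError:', err)
```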

It is also possible to check whether a call to a function produces an error using the keyword 'try', so that any error can be managed by the programmer. This can be useful when one wants to allow the program to continue after an error, for example by assigning a fallback value.

In [1]:

a=1.2
try:
    p=rel_primes2(a,1)
except TypeError:     # catch only the error we expect; other errors are still raised
    print('There would be an error because ',a,' is not integer.')
    print('But with try the error is caught here. We can then assign the value we want to p, for example -1')
    p=-1
    
p

There would be an error because  1.2  is not integer.
But with try the error is caught here. We can then assign the value we want to p, for example -1

-1
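It is usually safer to name the exceptions one expects, so that unrelated errors are not silently hidden. The sketch below uses a hypothetical function safe_inverse; it also shows how to read the error message and how the optional else branch runs only when no error occurred.

```python
def safe_inverse(x, fallback=-1):
    try:
        result = 1 / x
    except ZeroDivisionError as err:     # x == 0
        print('Caught:', err)
        return fallback
    except TypeError as err:             # x is not a number, e.g. a string
        print('Caught:', err)
        return fallback
    else:
        return result                    # runs only if no exception was raised

print(safe_inverse(4))
print(safe_inverse(0))
print(safe_inverse('a'))
```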

Testing functions

An important part of coding is testing. It entails designing a sequence of checks to evaluate the behaviour of a function.

The statement assert is very useful for this purpose. It raises an AssertionError if the condition it checks is False, and does nothing otherwise.
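As a small sketch of how assert behaves, the failing check below is wrapped in try so that the message can be inspected instead of stopping the program.

```python
# A passing assert does nothing
assert 2 + 2 == 4, 'Arithmetic is broken'

# A failing assert raises AssertionError carrying the message
try:
    assert 2 + 2 == 5, 'This message is shown when the check fails'
except AssertionError as err:
    message = str(err)
    print('AssertionError:', message)
```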

In [7]:

#Some basic testing

assert rel_primes2(15,28),  'Failed with two large relative primes'
assert rel_primes2(2,3),  'Failed with two small relative primes'
assert not rel_primes2(15,25),  'Failed with two small numbers that are not relative primes'
assert not rel_primes2(2,4),  'Failed with two small numbers that are not relative primes'

Note that we performed 4 tests. The last one fails and illustrates that the function defined above is not working properly. When the test fails, the associated fail message is displayed.

We proceed to fix the error (which is located in the rel_primes function). Run the code below and then run the tests again.

In [6]:

def rel_primes(a,b):
    # This is a corrected version
    ''' Receives two numbers and returns True if both are relative 
        primes or False otherwise
    
    '''
    
    for i in range(2,min(a,b)+1) :       # The error was here
        if (a%i)==0 and (b%i==0):                                             
            return False               
    return True

What happened? We have fixed the error in the rel_primes function (we were not including the last element in the loop). Since the function rel_primes2 calls rel_primes, it gets fixed as well, and it now passes all the tests.

Note that we have learnt something very important in addition: in Jupyter, code depends on the order of execution, not the order in which the code is written.

Remark: In more professional settings, the preferred form of testing is unit testing. If you want to learn more about it, read the Python documentation on unittest.
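As a rough sketch of what this looks like, the same four checks can be written with unittest. The corrected rel_primes is repeated so that the example is self-contained; the class and method names are illustrative choices.

```python
import unittest

def rel_primes(a, b):
    '''Corrected version from this notebook, repeated so the sketch is self-contained.'''
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            return False
    return True

class TestRelPrimes(unittest.TestCase):
    def test_relative_primes(self):
        self.assertTrue(rel_primes(15, 28))
        self.assertTrue(rel_primes(2, 3))

    def test_not_relative_primes(self):
        self.assertFalse(rel_primes(15, 25))
        self.assertFalse(rel_primes(2, 4))

# Build and run the suite explicitly; unittest.main() would also work in a script,
# but it tries to parse command-line arguments, which is awkward inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRelPrimes)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print('All tests passed:', result.wasSuccessful())
```

Unlike a sequence of assert statements, the runner executes every test method and reports all failures, rather than stopping at the first one.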

Lambda functions

We have seen that to define a function in Python, we use the keyword def followed by the name of the function, its arguments and a colon. There is an alternative in the form of lambda functions, which are useful for defining inline functions.

The syntax is name_function = lambda vars: operations

Here is an example where we implement the same function twice.

In [8]:

def square(x):
    return x*x

square2 = lambda y: y*y

In [9]:

assert square(100)==square2(100)

Lambda functions are very convenient for short tasks. In particular, they are an easy way to encapsulate one-line instructions in a function. Note, though, that it is very hard
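A typical short task is passing a small function as an argument to another function. The sketch below uses made-up data and the built-in functions sorted and map.

```python
# Hypothetical data: (ticker, price) pairs
prices = [('AAA', 31.5), ('BBB', 12.0), ('CCC', 20.25)]

# Sort by the second element of each pair; the lambda is defined inline
by_price = sorted(prices, key=lambda pair: pair[1])
print(by_price)

# Apply a one-line transformation to every element
squares = list(map(lambda y: y * y, [1, 2, 3]))
print(squares)
```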

Some final observations:

Functions are not forced to return a value (sometimes these are called procedures)
More testing tools are provided in the unittest package
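Regarding the first observation, a function without a return statement still hands something back to the caller, namely the special value None. A minimal sketch with a hypothetical function report:

```python
def report(value):
    # No return statement: the function only has a side effect (printing)
    print('The value is', value)

outcome = report(3)
print(outcome)    # a function without return gives back None
```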

Packages

Python comes with many functions already defined. Some of them come as part of the standard language (we have encountered some of them). However, the real power of Python comes from sets of functions put together in packages.

Here are some of the ones we will use in this course (and are very useful in finance):

math : some mathematical functions
numpy : vector and matrix capabilities and operations
scipy : numerical scientific computing (integration, fixed points, solving ODEs, optimisation, …)
matplotlib : plots
pandas : database access and manipulation, and more plotting routines
statutils: Some statistical tools including hypothesis tests

Some packages we will not use but that are very useful in finance include:

keras: Keras is a high-level neural networks library
sklearn: A library with tools for data mining and data analysis

Here and in the next notebook, we look at numpy and math. We will learn about the other packages through applications in finance.

Packages must be imported into the kernel we are executing. By convention, all imports should be done at the start of the program and, in the Jupyter case, at the start of the notebook.

In [30]:

# By convention this should be placed at the top of the file. But it can be used anywhere 
import math           # import the math package
import numpy as np    # import the numpy package and create an alias for it 'np'
from math import sin, exp  # import only the functions sin and exp from the math package

After running the above code, we can use all functions in the math package. Here are some examples:

In [12]:

x = sin(2*math.pi) # We can use sin directly, but the constant pi must be accessed through the math package as it was not explicitly imported
y = math.log(exp(-5))
print(x,'\n',y)

-2.4492935982947064e-16 
 -5.0

Some examples with numpy

Let us now look at numpy. Numpy is a scientific library that has been optimised to perform vector and matrix operations.

We start by looking at how to create numpy objects. We can either transform another structure (for example a list) using the function array, or we can use one of the functions that directly produce an array. Here are some examples:

In [13]:

a = np.array([3,4,5])  # an array with the numbers 3,4,5  
b = np.arange(3,6)     # an array with the numbers 3,4,5
c = np.linspace(3,5,3) # an array with the numbers 3.,4.,5.

print('a:',a,' b:',b, 'c:',c)
print(type(a))
a

a: [3 4 5]  b: [3 4 5] c: [3. 4. 5.]
<class 'numpy.ndarray'>

array([3, 4, 5])

Note that the above objects are of class ndarray, which is defined in the package numpy. Observe also how the result is displayed when the print function is not invoked.

Let us look at some simple operations between arrays:

In [14]:

#Most operations are done elementwise:
print('1. a==b:', a==b)
print('2. a+b:', a+b)
print('3. a*b:', a*b)
print('4. a/b:', a/b)
print('5. a^b:', a**b)
print('6. a==c:', a==c)

1. a==b: [ True  True  True]
2. a+b: [ 6  8 10]
3. a*b: [ 9 16 25]
4. a/b: [1. 1. 1.]
5. a^b: [  27  256 3125]
6. a==c: [ True  True  True]

Note the difference with respect to lists: the operator + means vector addition. Note also that most operations like '*' and '/' are defined pointwise.
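A short side-by-side sketch of this difference, using the same numbers as above:

```python
import numpy as np

list_a, list_b = [3, 4, 5], [3, 4, 5]
arr_a, arr_b = np.array(list_a), np.array(list_b)

print(list_a + list_b)   # lists: concatenation, 6 elements
print(arr_a + arr_b)     # arrays: elementwise (vector) addition, 3 elements
print(list_a * 2)        # lists: repetition
print(arr_a * 2)         # arrays: every entry multiplied by 2
```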

Since operations are mostly pointwise, the shapes of the arrays need to be compatible (in the simplest case, equal) or an error will be raised.

In [15]:

a2 = np.arange(10)
print(a2,b)
a2*b # This raises an error

[0 1 2 3 4 5 6 7 8 9] [3 4 5]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-7b6584d97316> in <module>
      1 a2 = np.arange(10)
      2 print(a2,b)
----> 3 a2*b # This raises an error

ValueError: operands could not be broadcast together with shapes (10,) (3,) 
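The error message refers to broadcasting, the set of NumPy rules that decides when arrays of different shapes can still be combined. As a brief sketch (not a full account of the rules): a scalar combines with any array, and a dimension of length 1 is stretched to match the other array.

```python
import numpy as np

b = np.arange(3, 6)                 # shape (3,)
print(b * 10)                       # scalar with array: works

col = np.arange(10).reshape(10, 1)  # shape (10, 1)
table = col * b                     # (10, 1) with (3,) broadcasts to (10, 3)
print(table.shape)

try:
    np.arange(10) * b               # (10,) with (3,): incompatible
except ValueError as err:
    print('ValueError:', err)
```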

We also have some vector and matrix operations. We can, for example, find the dot product of two vectors of the same size with the operator @.

In [16]:

# Dot product of a and b
a_dot_b = a@b
# An alternative way of calculating it
a_dot_b2 = (a*b).sum()     # Sum is a method available for arrays
print(a_dot_b, a_dot_b2)

50 50

We will frequently make use of the possibility to generate (pseudo-)random numbers following some given distributions. Numpy allows for this through its sub-module random. We can generate (pseudo-)random arrays and matrices of a given size. Here are some examples using the currently preferred way of calling the generator function.

In [17]:

#Random numbers

#Initialise generator
rng = np.random.default_rng()

#Uniform random numbers
print('Uniform  2x2')
c = rng.random((2,2)) # This creates a matrix of 2 x 2 of independent U[0,1] random numbers
print(c)
print (c.shape)
print(c.size)

print("====")

print('Gaussian 5x3')
c = rng.standard_normal((5,3)) # This creates a matrix of 5 x 3 of independent standard Gaussian random numbers
print(c)
print (c.shape)
print(c.size)

Uniform  2x2
[[0.87315032 0.66478065]
 [0.20504409 0.30908408]]
(2, 2)
4
====
Gaussian 5x3
[[-1.24004184  1.51037019  0.58394874]
 [ 0.00217341 -0.94616658 -0.29493058]
 [-0.82462691 -0.57902036  0.28046327]
 [ 0.22246655 -0.61206991 -1.19467205]
 [-1.52000388  0.00257337 -0.77046294]]
(5, 3)
15
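Each run of the cell above gives different numbers. When reproducible results are needed, default_rng accepts a seed. The sketch below uses the arbitrary seed 1234 and checks that two generators created with the same seed produce the same numbers.

```python
import numpy as np

rng1 = np.random.default_rng(1234)   # 1234 is an arbitrary seed
rng2 = np.random.default_rng(1234)   # same seed, separate generator object

x1 = rng1.random((2, 2))
x2 = rng2.random((2, 2))

print(np.array_equal(x1, x2))        # same seed gives the same stream of numbers
```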

This produces arrays in dimension 2 or matrices. We can also see the second use of operator '@': matrix (and matrix-vector) multiplication:

In [19]:

c@(a.T) # This is the result of multiplying the matrix c (Gaussian matrix) and the vector a

array([ 5.24109897, -5.25279897, -3.38764582, -7.75424022, -8.40203282])

The random generator can produce samples from different distributions: look at the help for rng.normal, rng.lognormal, rng.exponential....

In [22]:

rng.normal(5,25,[2,5])

array([[32.95070156,  4.18834104, 37.53819265, 42.65346581,  8.19730704],
       [38.69587919, 16.06854485, 30.7728049 ,  2.88131377, -6.0297293 ]])
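In rng.normal(5,25,[2,5]) the arguments are, in order, the mean (loc), the standard deviation (scale, not the variance) and the shape of the output (size). A quick check on a large sample, with an arbitrary seed and a generator name chosen for this sketch:

```python
import numpy as np

rng_check = np.random.default_rng(0)         # arbitrary seed for reproducibility
sample = rng_check.normal(loc=5, scale=25, size=100_000)

print(sample.mean())   # should be close to 5
print(sample.std())    # should be close to 25 (the scale), not 625
```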

In [0]:

rng.exponential?

Remark: Once imported, we can make use of numpy functions within our function definitions. We can also import a module within a function definition, however in that case, the imported modules are available only within that function.
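A small sketch of the second point, using the standard library module fractions. It assumes Fraction has not already been imported elsewhere in the session; the function name half is hypothetical.

```python
def half():
    from fractions import Fraction   # Fraction is a local name inside half()
    return Fraction(1, 2)

print(half())

try:
    Fraction(1, 3)                   # not defined at the top level
except NameError as err:
    print('NameError:', err)
```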

Exercises

Create a function that receives a positive integer $n$ and a probability $p \in (0,1)$, and returns the mean and standard deviation of a binomial distribution with these parameters. Your function must raise errors whenever the probability is outside the given range or the number $n$ is not an integer greater than 0. Use the statement assert to test your function on some known cases.

In [36]:

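from math import sqrt
def binomial(n,p):
    ''' Returns the mean and standard deviation of a binomial distribution with parameters n and p '''
    if p<=0 or p>=1:      # p must lie in the open interval (0,1)
        raise ValueError("p is a probability and must be within (0,1)")    
    if type(n)!= int:
        raise TypeError("n must be integer")
    if n<=0:
        raise ValueError("n must be positive")
        
    return n*p,sqrt(n*p*(1-p))

In [37]:

assert binomial(2,0.5)==(1,sqrt(0.5)), 'Failed with n=2, p=0.5'
binomial(2,0.5)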

(1.0, 0.7071067811865476)

Using np.random, create a function that receives a positive integer $n$ and a probability $p \in (0,1)$, and returns an array with $n$ Bernoulli trials with parameter $p$. Your function must raise errors whenever the probability is outside the given range or the number $n$ is not an integer greater than 0.

In [50]:

def bernoulli_trials(n,p):
    if p<=0 or p>=1:      # p must lie in the open interval (0,1)
        raise ValueError("p is a probability and must be within (0,1)")    
    if type(n)!= int:
        raise TypeError("n must be integer")
    if n<=0:
        raise ValueError("n must be positive")
        
    mygen = np.random.default_rng()    
    return 1.*(mygen.random(n)<p)   # each entry is 1. with probability p and 0. otherwise
    
bernoulli_trials(100,0.5)

array([1., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 1.,
       0., 1., 1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1., 1.,
       0., 0., 1., 0., 1., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 1.,
       0., 0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1.,
       1., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 0., 1., 1., 1.,
       1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0., 1., 1.])

Using the previous function, estimate the empirical mean and variance for n=[10,100,1000,10000]. Comment.

In [68]:

# This is a crude Monte Carlo. As the number of samples grows, we should approximate the population mean and variance.

#Set p
p=0.25


print('Population values')
print(' \t\t Mean \t\t Var')
print(' \t\t', p, '\t\t', p*(1-p))

print('Empirical values')
print('n \t\t Mean \t\t Var')

for i in range(5):
    n = 10**i
    aux = bernoulli_trials(n,p)
    emp_mean = aux.sum()/n
    emp_var = emp_mean*(1 - emp_mean)   # for 0/1 data this equals the (biased) sample variance
    print(n,'\t\t',emp_mean,'\t\t',emp_var)

Population values
 		 Mean 		 Var
 		 0.25 		 0.1875
Empirical values
n 		 Mean 		 Var
1 		 0.0 		 0.0
10 		 0.7 		 0.21000000000000002
100 		 0.37 		 0.2331
1000 		 0.265 		 0.194775
10000 		 0.2531 		 0.18904039
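One possible way to quantify the comment asked for in the exercise: for independent trials, the standard deviation of the empirical mean is $\sqrt{p(1-p)/n}$, so the error should shrink roughly like $1/\sqrt{n}$. The sketch below draws the trials directly from a seeded generator, rather than calling bernoulli_trials, so that it is self-contained; the seed and the name rng_mc are arbitrary choices.

```python
import numpy as np

p = 0.25
rng_mc = np.random.default_rng(2020)        # arbitrary seed

print('n \t Mean \t |Mean - p| \t Std. error')
for n in [10, 100, 1000, 10000]:
    trials = 1. * (rng_mc.random(n) < p)    # n Bernoulli(p) trials
    emp_mean = trials.mean()
    std_error = np.sqrt(p * (1 - p) / n)    # theoretical std. dev. of emp_mean
    print(n, '\t', emp_mean, '\t', abs(emp_mean - p), '\t', std_error)
```

The observed error is typically of the same order as the standard error, which means that gaining one extra digit of accuracy takes about 100 times more samples.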
