Project: Ying Jia - MATH0094-2020/2021

Path: Notebooks / Week 4 / 5. Risk measures.ipynb

Views: ²⁸⁶⁸
Image: ubuntu2004

Kernel: Python 3 (system-wide)

Risk measures

Camilo A. Garcia Trillos - 2020

In this notebook

we learn some probability distributions available in scipy.stats.
we use the associated functions to calculate V@R and ES
we introduce a Monte Carlo estimator for both V@R and ES

Some distributions do not have easy formulas for their value at risk or expeted shortfall. We introduce here some functions and procedures in Python to approximate their value.

We start by importing the modules we use in this notebook. The package scipy.stats contains some statistical distributions and tests. We will use some of the functions on it.

In [1]:

import numpy as np
from numpy.random import default_rng
import matplotlib.pyplot as plt
import scipy.stats as st

PDF, CDF and Quantile functions

It is possible to calculate directly the PDF, CDF and quantile function for several distributions in Python, thanks to the scipy.stats library. This is achieved with the following general structure:

PDF: st.[name distribution].pdf(probability, parameters)
CDF: st.[name distribution].cdf(probability, parameters)
Quantile function: st.[name distribution].ppf(probability, parameters)

Let us illustrate with some examples.

Standard Gaussian

In [3]:

# Plotting the pdf

x = np.linspace(-3,3,50) #Take 50 equally spaced points between -3 and 3
y = st.norm.pdf(x) # calculate the pdf for each one 

plt.plot(x, y) # plot
plt.title('PDF standard Gaussian')
plt.xlabel('x')
plt.ylabel('pdf')

Text(0, 0.5, 'pdf')

We recover the familiar bell-shaped pdf.

In [4]:

# Plotting the cdf

x = np.linspace(-3,3,50) #Take 50 equally spaced points between -3 and 3
y = st.norm.cdf(x) # calculate the pdf for each one 

plt.plot(x, y) # plot
plt.title('CDF standard Gaussian')
plt.xlabel('x')
plt.ylabel('cdf')

Text(0, 0.5, 'cdf')

The plot illustrates the properties of the cdf: non-decreasing, with limits and $-\infty$ and $\infty$ equal to 0 and 1 respectively.

In [5]:

# Plotting the quantile function

x = np.linspace(0,1,100) #Take 100 equally spaced points between 0 and 1
y = st.norm.ppf(x) # calculate the pdf for each one 

plt.plot(x, y) # plot
plt.title('Quantile function -  standard Gaussian')
plt.xlabel('probability')
plt.ylabel('x')

Text(0, 0.5, 'x')

Note how this is the reflection of the CDF plot around the x=y line.

Some numbers that appear commonly are associated to the quantile at levels 0.95 and 0.99 for the Gaussian distribution:

In [5]:

# Standard Gaussian, quantile 0.99

print('Quantile at level 0.95: ', st.norm.ppf(0.95))
print('Quantile at level 0.99: ', st.norm.ppf(0.99))

Quantile at level 0.95:  1.6448536269514722
Quantile at level 0.99:  2.3263478740408408

Other examples of continuous distributions

In [7]:

#Gaussian with mean 1 and variance 4, quantile 0.99
print(st.norm.ppf(0.99, 1, 2))

#Due to the specific structure of Gaussian variables, this coincides with a rescaling of the standard one:
print(st.norm.ppf(0.99) * 2 + 1)

5.6526957480816815
5.6526957480816815

In [8]:

# The pdf of a standard - t with 4 degrees of freedom.


x = np.linspace(-5,5,50) #Take 50 equally spaced points between -3 and 3
y = st.t.pdf(x,4) # calculate the pdf for each one 

plt.plot(x, y) # plot
plt.title('PDF - t with 4 d. of f.')
plt.xlabel('x')
plt.ylabel('pdf')

Text(0, 0.5, 'pdf')

In [9]:

# A t-distribution with 4 degrees of freedom and standard form, quantile 0.99

st.t.ppf(0.99,4)

3.7469473879811366

In [10]:

# A chi-square distribution with 4 degrees of freedom

st.chi.ppf(0.99,4)

3.6437211935036444

There are many more distributions than the ones above. Have a look at the help to know more.

Examples of discrete distributions

In the case of discrete distributions, there is no pdf function. Instead, we have a probability function ( $p(x):=\P(X=x)$ ) which is calculated in the scipy.stats package using the following structure:

st.[name distribution].pmf(probability, parameters)

The CDF and quantile are calculated as for the continuous case.

Let us see one example with the Poisson distribution. Recall that if $X$ is distributed Poisson with parameter $\mu$ , $P[X=n] = e^{-\mu} \frac{\mu^n}{n!}$

In [7]:

# Plotting probability

x = np.arange(15)
y = st.poisson.pmf(x,5)

plt.plot(x, y,'bo') # plot
plt.title('Probability Poisson mu=5')
plt.vlines(x, 0, y, colors='b', lw=1, alpha=0.2)
plt.xlabel('x')
plt.ylabel('probability')

Text(0, 0.5, 'probability')

In [14]:

# Plotting cdf

x = np.arange(15)
y = st.poisson.cdf(x,5)

plt.step(x, y, where='post') # the post label assumes a right continuous function
plt.plot(x, y, 'bo') # plot
plt.title('CDF Poisson mu=5')
#plt.vlines(x, 0, y, colors='b', lw=1, alpha=0.2)
plt.xlabel('x')
plt.ylabel('probability');

In [17]:

# Plotting quantile

x = st.poisson.cdf(np.arange(15),5) #Take the values we plotted before
y = st.poisson.ppf(x,5)

plt.step(x, y, where='pre') # the pre label assumes a left continuous function
plt.plot(x, y, 'bo') # plot
plt.title('Quantile function Poisson mu=5')
#plt.vlines(x, 0, y, colors='b', lw=1, alpha=0.2)
plt.xlabel('probability')
plt.ylabel('quantile');

Calculating V@R, ES using quantiles

Let us implement two functions that receive a (static) instance of a probability distribution from scipy.stats, and return V@R and ES.

Value at risk

Taking $X$ to be a random variable denoting, for example, profit and losses, we can easily calculate Value at Risk whenever $X$ is distributed following one of the distributions in scipy.stats, by using the quantile function.

We can show that (see lecture notes that) $\mathrm{V@R}^\alpha(X) = -q_{X}(1-\alpha+ \epsilon \mathbb 1_{ F_X(q_X(1-\alpha)) = 1-\alpha }) ,$ for some $0<\epsilon< P[X=q_X(1-\alpha)]$ . Note that in the case of a continuous distribution we get $\mathrm{V@R}^\alpha(X) = -q_{X}(1-\alpha).$

In [9]:

def varisk(dist, alpha):
    
    if alpha<=0 or alpha>=1:
        raise ValueError('Alpha is outside of valid interval')
    
    x_aux = dist.ppf(1-alpha)
    
    try:  #check if calling the function pmf raises an error
        dist.pmf(0)
    except:  # if an error, the distribution is continuous because there is no prob. mass function
        return -x_aux   
    #Otherwise, the distribution is discrete
    if dist.cdf(x_aux)==1-alpha:
        return -1.*dist.ppf( 1-alpha+dist.pmf(x_aux)*alpha )
    return -x_aux

Let us see some examples:

In [10]:


# Value at risk at level 0.99 for a Gaussian with mean 1 and variance 4
print(varisk(st.norm(1,2), 0.99))

#The same from an exact formula (see lecture notes):
print( 2*st.norm.ppf(0.99,0,1)-1  )

6526957480816815
6526957480816815
6526957480816815

We got the value we were expecting. Let us now check the case of a bernoulli random variable with $P[X=1]=0.75$ .

In [20]:

print('V@R level 0.99',varisk(st.bernoulli(0.75), 0.99))
print('V@R level 0.75',varisk(st.bernoulli(0.75), 0.75))
print('V@R level 0.60',varisk(st.bernoulli(0.75), 0.60))

V@R level 0.99 -0.0
V@R level 0.75 -1.0
V@R level 0.60 -1.0

Expected shortfall

Recall (see the lecture notes) that

\begin{split} \mathrm{ES}^\alpha & = \frac{1}{1-\alpha} \int_{\alpha}^1 \mathrm{V@R}^u(X)\mathrm d u \\ & = \frac 1 {1-\alpha} \left\{ - \mathbb{E}[X \mathbb{1}_{X <-\mathrm{V@R}^\alpha(X)}] + (\tilde \alpha-\alpha) \mathrm{V@R}^\alpha(X) \right\} \end{split}

where $\tilde \alpha:= F_{-X}(\mathrm{V@R}^\alpha(X)) = 1-F_{X}(-\mathrm{V@R}^\alpha(X)) + P[X = -\mathrm{V@R}^\alpha(X)]$ .

We are going to use the method 'expect' associated to a given distribution in scipy.stats. Look at the help of this function to understand more, but here are two examples of use:

In [11]:

# Example for expect

print('Expected value of a standard Gaussian:', st.norm(0,1).expect(lambda x: x)) 
print('Second moment of a standard Gaussian: ', st.norm(0,1).expect(lambda x: x**2)) 
print('Expected value of X in the interval [0,\infty):',st.norm(0,1).expect(lambda x: x, lb =0))

Expected value of a standard Gaussian: 0.0
Second moment of a standard Gaussian:  1.000000000000001
Expected value of X in the interval [0,\infty): 0.39894228040143215

We can use this available fucntion to define our own function to calculate expected shortfall.

In [12]:

def es(dist,alpha):
        
    var_alpha = varisk(dist,alpha)
    x_aux = (-1*dist.expect(func = lambda x: x, ub = -var_alpha))/(1-alpha)
    
    
    try:  #check if calling the function pmf raises an error
        dist.pmf(0)
    except:  # if an error, the distribution is continuous because there is no prob. mass function
        return x_aux
    
    p_varalpha = dist.pmf(-1*var_alpha)
    
    alpha_tilde = 1-dist.cdf(-1*var_alpha) + p_varalpha
    return x_aux + (alpha_tilde-alpha-p_varalpha)*var_alpha/(1-alpha)

In [13]:

# Expected shortfall at level 0.99 for a Gaussian with mean 1 and variance 4
alpha = 0.99
print(es(st.norm(1,2), alpha))

#The same from an exact formula (see lecture notes):
print( -1 + 2*( st.norm(0,1).pdf( st.norm(0,1).ppf(alpha))/(1-alpha)  ) )

4.330428440691598
4.330428440691612

As can be seen, there is a small difference due to the approximation error in the function "expect", but the approximation is very good.

Let us compare both value at risk and expected shortfall for different values

In [24]:

alpha_vec = np.arange(0.01,1,0.03)
test_dist = st.norm(0,1)
plt.plot(alpha_vec, [varisk(test_dist,i) for i in alpha_vec], label='V@R')
plt.plot(alpha_vec, [es(test_dist,i) for i in alpha_vec], label='ES')
plt.legend()
plt.title('V@R and ES for standard Gaussian')
plt.xlabel('Probability')
plt.ylabel('Value');

We can verify visually several properties:

Expected shortfall is always larger or equal than value at risk
Expected shortfall tends to $-E[X]$ when $\alpha\downarrow 0$ .

Let us now check the case of a Bernoulli random variable with $P[X=1]=0.75$ .

In [25]:

print('ES level 0.99',es(st.bernoulli(0.75), 0.99))
print('ES level 0.60',es(st.bernoulli(0.75), 0.6))

ES level 0.99 0.0
ES level 0.60 -0.3750000000000002

In [26]:

alpha_vec = np.arange(0.01,1,0.03)
test_dist = st.bernoulli(0.75)
plt.plot(alpha_vec, [varisk(test_dist,i) for i in alpha_vec], label='V@R')
plt.plot(alpha_vec, [es(test_dist,i) for i in alpha_vec], label='ES')
plt.legend()
plt.title('V@R and ES for Bernoulli(0.75)')
plt.xlabel('Probability')
plt.ylabel('Value');

This is an example that shows that expected shortfall is more regular with respect to the level. Here it is continuous in alpha while value at risk is not.

Monte Carlo approximations

A Monte Carlo approximation might be useful either to replace the calculation of integrals for expected shortfall as above, or to calculate both value at risk and expected shortfall in cases where we do not have an explicit expression for the pdf.

The idea is simply replace the calculation of expected shortfall or value at risk on the distribution bu that of the empirical measure coming from a large sample of the distribution.

Suppose that $X_1, \ldots, X_n$ are all i.i.d. samples of the same distribution in $\mathbb R$ . Let us also denote by $X_{(1)}, \ldots, X_{(n)}$ this sample after ordering (i.e. $X_{(i)} \leq X_{(j)}$ if and only if $i\leq j$ ). Then we have that $\begin{split} \mathrm{V@R}^\alpha(X) & \approx -X_{( i_\alpha )} \\ \mathrm{ES}^\alpha(X) &\approx -\frac{1}{i_\alpha }\sum_{k=1}^{i_\alpha} X_{( k )} \end{split}$

where $i_\alpha:= \lfloor n(1-\alpha)\rfloor$ .

Remark: Naturally, this is not the only estimator, There are other choices of estimator that will be presented later in the lecture notes.

Let us define a function that calculate the approximation of value at risk and expected shortfall for any sample

In [14]:

def var_es_sample(sample,alpha):    
    ss=np.sort(sample)
    ialpha = int(sample.size * (1-alpha))
    return -ss[ialpha], -ss[:ialpha].mean()

We can test by comparing in cases as the ones we had before. Remember the use of the random number generator.

We start with the standard normal case.

In [15]:

NMC = 1000000 # Number of samples
rng = default_rng(0)
sample_normal = rng.normal(1,2,NMC)
var_es_sample(sample_normal, 0.99)

(3.6554153431984657, 4.344458055149295)

Comparing with the values before (3.6526957480816815, 4.330428440691612), this is not a bad approximation. Note, though that we required a large sample (here, 1000000). A larger sample might be required for higher levels in VAR and ES.

Also, remember that this estimator is random (this is why we had to fix the seed of the generator). You can check this by removing the seed.

Let us see also the case of the Bernoulli(0.75).

In [29]:

rng = default_rng(123)
sample_bernoulli=(rng.random(size = NMC)<=0.75) +0
var_es_sample(sample_bernoulli, 0.99)

(0, -0.0)

In [30]:

var_es_sample(sample_bernoulli, 0.6)

(-1, -0.3761125)

Again, we get close enough values to the ones defined before.

Exercises

Plot the values of value at risk for value at risk for a t-distribution as a function of the number of degrees of freedom. Repeat the exercise with expected shortfall. What do you observe?

In [8]:

n_vec = np.arange(1,11,1)
alpha = 0.95
dist = st.t(n_vec)
VaR = varisk(dist,alpha)
plt.plot(n_vec,VaR,'g')
plt.xlabel("degree of freedom of t distribution")
plt.ylabel("VaR at level 0.95")
plt.show()

ES = np.arange(10)
for n in n_vec:
    ES[n-1] = es(st.t(n),alpha)
plt.plot(n_vec,ES,'b')
plt.xlabel("degree of freedom of t distribution")
plt.ylabel("ES at level 0.95")
plt.show()

/usr/local/lib/python3.8/dist-packages/scipy/stats/_distn_infrastructure.py:2561: IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
  If increasing the limit yields no improvement it is advised to analyze 
  the integrand in order to determine the difficulties.  If the position of a 
  local difficulty can be determined (singularity, discontinuity) one will 
  probably gain from splitting up the interval and calling the integrator 
  on the subranges.  Perhaps a special-purpose integrator should be used.
  vals = integrate.quad(fun, lb, ub, **kwds)[0] / invfac

Assume $X$ follows a standard Gaussian. Write a function $\phi(\alpha)$ such that $\mathrm{V@R}^{\phi(\alpha)}(X) = \mathrm{ES}^\alpha(X)$ . Test the function. What is the value associated to 0.975?

In [16]:

def phi(alpha):
    dist = st.norm(0,1)
    x_aux = es(dist,alpha)
    vec = np.arange(0.0001,1,0.0001)
    epsilon = 0.001
    for y in vec:
        if np.abs(varisk(dist,y)-x_aux)<epsilon:
            return y
    return None

print(phi(0.975))

0.9903000000000001

Assume that you have invested £100, equally, in two investments with P&L given respectively by a) A Gaussian random variable with mean 60 and sd 100; and b) An exponential random variable with mean 75. Assume further that both are independent.

Calculate the value at risk and expected shortfall of your total P&L.

In [12]:

import numpy as np
from numpy.random import default_rng

def var_es_sample(sample,alpha):
    ss = np.sort(sample)
    ialpha = int(sample.size * (1-alpha))
    return -ss[ialpha], -ss[:ialpha].mean()

NMC = 1000000 # Number of samples
rng = default_rng(0)
sample = rng.normal(60,100,size=NMC) + rng.exponential(75,size=NMC)
print("The VaR at level 0.95 of total P&L is",var_es_sample(sample,0.95)[0])
print("The ES at level 0.95 of total P&L is",var_es_sample(sample,0.95)[1])

The VaR at level 0.95 of total P&L is 56.90157098600528
The ES at level 0.95 of total P&L is 101.64082287263861

Table of Contents