CoCalc -- 1509-Simulating Random Variables Using a Uniform Distribution.sagews

All published worksheets from http://sagenb.org

Project: sagenb.org published worksheets

Path: pub / 1501-1601 / 1509-Simulating Random Variables Using a Uniform Distribution.sagews

Views: ¹⁶⁸⁷⁴⁰
Image: ubuntu2004

Simulating Random Variables Using a Uniform Distribution

As discussed briefly in the book (and in more detail in notes), we can use the built-in random function, available in most programming languages, to simulate other random variables as well. We will need to know the density (or some piece of information that gives us the density) of the random variable to be simulated.

First, let's take a look at the random function.

random()

0.37452349136492957

[random() for i in range(8)]

[0.76729971279762299, 0.84261613607162034, 0.34083321846329728, 0.47237033755796765, 0.22387522561551454, 0.18616466221686401, 0.86465531657256556, 0.027024983991109308]

The random function returns a pseudorandom sample from a Uniform(0,1) distribution. Let's use it to simulate some other random variables. Recall that the idea is that $Y=F_Y^{-1}(U)$ where $Y$ is the new random variable, $U$ is the Uniform(0,1) random variable, and $F_Y$ is the distribution function for $Y$ .

Example 1:

Suppose $Y$ is the Uniform(-2,2) random variable. Then $F_Y(y)=\left\{\begin{array}{cc}0&y<-2\\ \frac{y+2}{4}&-2\leq y\leq 2\\ 1&y>2\end{array}\right.$ We can invert this function algebraically: $F_Y^{-1}(u)=\left\{\begin{array}{cc}-2&u<0\\ 4u-2&0\leq u\leq 1\\ 2&u>1\end{array}\right.$ Thus our code will be pretty simple.

def r():
    return 4*random()-2

Let's take a look at a sample of our random variable.

[r() for i in range(20)]

[-0.2348097236576332, 0.92668312999571434, -0.54716904118411902, 0.5021973051009021, -0.05856081673581981, -1.2683765526382111, 0.2780102469217729, -0.10584052339384176, -1.2265790149283315, 1.5571019849233942, 0.3671489553757854, 1.8478766924522292, 0.97559443201467744, -1.6312252404353784, -0.46962864632193346, 0.72800414743978781, 0.92995856437223212, 0.4894900756890519, 0.70774578093319018, 0.40200920324234746]

This seems to be working pretty well! Let's move on to our more complicated examples.

Example 2:

Suppose that $Y$ is the Normal(0,1) random variable. The density function is $f_Y(y)=\frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$ and can be implemented as:

def f(y):
    return float(1/sqrt(2.*pi)*exp(-y^2/2))

We then tabulate the density and distribution functions:

h=0.01
y=[-6.0+h*i for i in range(12/h+1)]
fY=[f(Y) for Y in y]
FY=[0]
for z in fY[:-1]:
    FY.append(FY[-1]+z*h)

The first few values of FY are...

FY[:10]

[0, 6.07588284982329e-11, 1.25271547406350e-10, 1.93763235536208e-10, 2.66472033655209e-10, 3.43649875780914e-10, 4.25563259815706e-10, 5.12494059544508e-10, 6.04740380117335e-10, 7.02617459241747e-10]

Now we construct the code to draw a sample value from the normal distribution.

def n():
    u=random()
    k=0
    while k+1<len(FY) and FY[k+1]<u: k+=1
    return y[k]

We can draw a few values and look at them.

[n() for i in range(20)]

[-0.110000000000000, 0.910000000000000, 2.16000000000000, 1.36000000000000, 0.360000000000000, 1.01000000000000, 0.960000000000000, 1.78000000000000, 1.49000000000000, 0.0300000000000002, 1.12000000000000, 0.220000000000000, -1.11000000000000, 1.15000000000000, 0.0899999999999999, -0.530000000000000, -0.190000000000000, 0.470000000000000, -2.03000000000000, 0.140000000000001]

It is hard to say from this small sample whether the data is approximately normal. So, let's draw a much larger sample - N=5000 - and construct a frequency plot for (rounded) values.

data=[floor(10*n()) for i in range(5000)]
x=range(min(data),max(data)+1)
freq=[data.count(t) for t in x]
bar_chart(freq)

This seems to be producing a sample with a distribution tolerably close to a normal distribution. If we needed to use this code in applications, we would want to (a) design it to be much faster, and (b) test it more rigorously.

Example 3:

Now let's consider a discrete random variable. We can define the density function through a simple list of probabilities.

outcomes=['a','b','c','d','e']
p=[1/10,2/10,1/10,5/10,1/10]

(Note that this is truly a density function, since the probabilities sum to 1.)

sum(p)

1

Construct the distribution:

F=[0]
for y in p:
    F.append(F[-1]+y)

This distribution function is then

[0, 1/10, 3/10, 2/5, 9/10, 1]

As for the normal random variable, we now define our discrete random variable with a simple loop.

def d():
    u=random()
    k=0
    while k<len(F) and F[k]<u: k+=1
    return outcomes[k-1]

We can look at a small sample...

[d() for i in range(20)]

['d', 'b', 'd', 'b', 'd', 'd', 'e', 'c', 'b', 'b', 'b', 'e', 'd', 'd', 'c', 'd', 'd', 'b', 'c', 'b']

...and we can look at a large sample to see whether we have (approximately) the correct distribution.

N=10000
data=[d() for i in range(N)]
freq=[data.count(t)/float(N) for t in outcomes]
bar_chart(freq)

Compare the sampled results,

freq

[0.1018, 0.20130000000000001, 0.097799999999999998, 0.50090000000000001, 0.098199999999999996]

to the given probability density:

[1/10, 1/5, 1/10, 1/2, 1/10]

We are not far off at all!

It is worth noting that SAGE (and the Python language) incorporate some nice tools for manipulating discrete random variables. In this case, the choice function works particularly well.

new_outcomes=['a']+2*['b']+['c']+5*['d']+['e']
new_outcomes

['a', 'b', 'b', 'c', 'd', 'd', 'd', 'd', 'd', 'e']

def new_d():
    return choice(new_outcomes)

N=10000
new_data=[new_d() for i in range(N)]
new_freq=[new_data.count(t)/float(N) for t in outcomes]
bar_chart(new_freq)

new_freq

[0.1023, 0.20380000000000001, 0.097699999999999995, 0.49640000000000001, 0.0998]

Thus we can avoid computing our distribution and coding the loop that steps through and looks at values.

Section 6.4: The Method of Transformations

Suppose that $Y$ is a given random variable with known density function $f_Y$ , and define $U=h(Y)$ , where $h$ is a function. Then we would like to find the density $f_U$ of $U$ . On page 316 of our text, the authors outline the method of transformations. While this method can be useful, it suffers from one flaw: we must be able to algebraically solve $u=h(y)$ for $y$ in order to find a formula for the inverse function $h^{-1}$ .

For example, suppose that $Y$ is the uniform random variable Uniform(1,2), and suppose that $h$ is the function defined by $h(y)=y\cdot\ln(y)\ \ \ y>0$ Then, plotting this function on the support of $Y$ indicates that it satisfies the monotonicity requirements of the method of transformations:

y=var('y')
plot1=plot(y*ln(y),(y,1,2))
show(plot1)

In order to apply the method of transformations to find $f_U$ , we need to solve $u=y\cdot\ln(y)$ for $y$ . This is, algebraically speaking, an ugly problem.

Instead of approaching this symbolically, we can try to approach it numerically. The method we demonstrate here is fairly elementary, and based on evenly spaced samples of the function $h$ .

So, let's begin by tabulating $h$ on the support of $Y$ .

dy=0.01
support=[1+i*dy for i in range(100)]
u=[float(t*ln(t)) for t in support]
inverse_rate=[abs((support[i+1]-support[i])/(u[i+1]-u[i])) for i in range(len(support)-1)]

h_inverse=line([(u[i],support[i]) for i in range(len(u))])
h_inverse_rate=line([(u[i],inverse_rate[i]) for i in range(len(inverse_rate))],color='red')
show(h_inverse+h_inverse_rate)

We can now use these tables to construct the density function for $U$ .

def f_U(x):
    nn=[abs(x-t) for t in u]
    m=min(nn)
    k=nn.index(m)
    return inverse_rate[k]

test=line([(u[i],f_U(u[i])) for i in range(len(u)-1)])

test.show()

This is a fairly elementary example (in particular, the fact that the density of $Y$ is constant on its support makes things quite a bit simpler). However, it indicates how more complex examples can be approached numerically, using the straightforward approach of function tabulation together with a difference quotient estimate of the derivative.