Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download

All published worksheets from http://sagenb.org

Views: 168740
Image: ubuntu2004

Simulating Random Variables Using a Uniform Distribution

As discussed briefly in the book (and in more detail in notes), we can use the built-in random function, available in most programming languages, to simulate other random variables as well.  We will need to know the density (or some piece of information that gives us the density) of the random variable to be simulated.

First, let's take a look at the random function.

random()
0.37452349136492957
[random() for i in range(8)]
[0.76729971279762299, 0.84261613607162034, 0.34083321846329728, 0.47237033755796765, 0.22387522561551454, 0.18616466221686401, 0.86465531657256556, 0.027024983991109308]

The random function returns a pseudorandom sample from a Uniform(0,1) distribution.  Let's use it to simulate some other random variables.  Recall that the idea is thatY=FY1(U)Y=F_Y^{-1}(U)where YY is the new random variable, UU is the Uniform(0,1) random variable, and FYF_Y is the distribution function for YY.

Example 1:

Suppose YY is the Uniform(-2,2) random variable.  ThenFY(y)={0y<2y+242y21y>2F_Y(y)=\left\{\begin{array}{cc}0&y<-2\\ \frac{y+2}{4}&-2\leq y\leq 2\\ 1&y>2\end{array}\right.We can invert this function algebraically:FY1(u)={2u<04u20u12u>1F_Y^{-1}(u)=\left\{\begin{array}{cc}-2&u<0\\ 4u-2&0\leq u\leq 1\\ 2&u>1\end{array}\right.Thus our code will be pretty simple.

def r(): return 4*random()-2

Let's take a look at a sample of our random variable.

[r() for i in range(20)]
[-0.2348097236576332, 0.92668312999571434, -0.54716904118411902, 0.5021973051009021, -0.05856081673581981, -1.2683765526382111, 0.2780102469217729, -0.10584052339384176, -1.2265790149283315, 1.5571019849233942, 0.3671489553757854, 1.8478766924522292, 0.97559443201467744, -1.6312252404353784, -0.46962864632193346, 0.72800414743978781, 0.92995856437223212, 0.4894900756890519, 0.70774578093319018, 0.40200920324234746]

This seems to be working pretty well!  Let's move on to our more complicated examples.

Example 2:

Suppose that YY is the Normal(0,1) random variable.  The density function isfY(y)=12πey2/2f_Y(y)=\frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}and can be implemented as:

def f(y): return float(1/sqrt(2.*pi)*exp(-y^2/2))

We then tabulate the density and distribution functions:

h=0.01 y=[-6.0+h*i for i in range(12/h+1)] fY=[f(Y) for Y in y] FY=[0] for z in fY[:-1]: FY.append(FY[-1]+z*h)

The first few values of FY are...

FY[:10]
[0, 6.07588284982329e-11, 1.25271547406350e-10, 1.93763235536208e-10, 2.66472033655209e-10, 3.43649875780914e-10, 4.25563259815706e-10, 5.12494059544508e-10, 6.04740380117335e-10, 7.02617459241747e-10]

Now we construct the code to draw a sample value from the normal distribution.

def n(): u=random() k=0 while k+1<len(FY) and FY[k+1]<u: k+=1 return y[k]

We can draw a few values and look at them.

[n() for i in range(20)]
[-0.110000000000000, 0.910000000000000, 2.16000000000000, 1.36000000000000, 0.360000000000000, 1.01000000000000, 0.960000000000000, 1.78000000000000, 1.49000000000000, 0.0300000000000002, 1.12000000000000, 0.220000000000000, -1.11000000000000, 1.15000000000000, 0.0899999999999999, -0.530000000000000, -0.190000000000000, 0.470000000000000, -2.03000000000000, 0.140000000000001]

It is hard to say from this small sample whether the data is approximately normal.  So, let's draw a much larger sample - N=5000 - and construct a frequency plot for (rounded) values.

data=[floor(10*n()) for i in range(5000)] x=range(min(data),max(data)+1) freq=[data.count(t) for t in x] bar_chart(freq)

This seems to be producing a sample with a distribution tolerably close to a normal distribution.  If we needed to use this code in applications, we would want to (a) design it to be much faster, and (b) test it more rigorously.

Example 3:

Now let's consider a discrete random variable.  We can define the density function through a simple list of probabilities.

outcomes=['a','b','c','d','e'] p=[1/10,2/10,1/10,5/10,1/10]

(Note that this is truly a density function, since the probabilities sum to 1.)

sum(p)
1

Construct the distribution:

F=[0] for y in p: F.append(F[-1]+y)

This distribution function is then

F
[0, 1/10, 3/10, 2/5, 9/10, 1]

As for the normal random variable, we now define our discrete random variable with a simple loop.

def d(): u=random() k=0 while k<len(F) and F[k]<u: k+=1 return outcomes[k-1]

We can look at a small sample...

[d() for i in range(20)]
['d', 'b', 'd', 'b', 'd', 'd', 'e', 'c', 'b', 'b', 'b', 'e', 'd', 'd', 'c', 'd', 'd', 'b', 'c', 'b']

...and we can look at a large sample to see whether we have (approximately) the correct distribution.

N=10000 data=[d() for i in range(N)] freq=[data.count(t)/float(N) for t in outcomes] bar_chart(freq)

Compare the sampled results,

freq
[0.1018, 0.20130000000000001, 0.097799999999999998, 0.50090000000000001, 0.098199999999999996]

to the given probability density:

p
[1/10, 1/5, 1/10, 1/2, 1/10]

We are not far off at all!

It is worth noting that SAGE (and the Python language) incorporate some nice tools for manipulating discrete random variables.  In this case, the choice function works particularly well.

new_outcomes=['a']+2*['b']+['c']+5*['d']+['e'] new_outcomes
['a', 'b', 'b', 'c', 'd', 'd', 'd', 'd', 'd', 'e']
def new_d(): return choice(new_outcomes)
N=10000 new_data=[new_d() for i in range(N)] new_freq=[new_data.count(t)/float(N) for t in outcomes] bar_chart(new_freq)
new_freq
[0.1023, 0.20380000000000001, 0.097699999999999995, 0.49640000000000001, 0.0998]

Thus we can avoid computing our distribution and coding the loop that steps through and looks at values.

Section 6.4: The Method of Transformations

Suppose that YY is a given random variable with known density function fYf_Y, and define U=h(Y)U=h(Y), where hh is a function.  Then we would like to find the density fUf_U of UU.  On page 316 of our text, the authors outline the method of transformations.  While this method can be useful, it suffers from one flaw: we must be able to algebraically solve u=h(y)u=h(y) for yy in order to find a formula for the inverse function h1h^{-1}.

For example, suppose that YY is the uniform random variable Uniform(1,2), and suppose that hh is the function defined byh(y)=yln(y)   y>0h(y)=y\cdot\ln(y)\ \ \ y>0Then, plotting this function on the support of YY indicates that it satisfies the monotonicity requirements of the method of transformations:

y=var('y') plot1=plot(y*ln(y),(y,1,2)) show(plot1)

In order to apply the method of transformations to find fUf_U, we need to solveu=yln(y)u=y\cdot\ln(y)for yy.  This is, algebraically speaking, an ugly problem.

Instead of approaching this symbolically, we can try to approach it numerically.  The method we demonstrate here is fairly elementary, and based on evenly spaced samples of the function hh.

So, let's begin by tabulating hh on the support of YY.

dy=0.01 support=[1+i*dy for i in range(100)] u=[float(t*ln(t)) for t in support] inverse_rate=[abs((support[i+1]-support[i])/(u[i+1]-u[i])) for i in range(len(support)-1)]
h_inverse=line([(u[i],support[i]) for i in range(len(u))]) h_inverse_rate=line([(u[i],inverse_rate[i]) for i in range(len(inverse_rate))],color='red') show(h_inverse+h_inverse_rate)

We can now use these tables to construct the density function for UU.

def f_U(x): nn=[abs(x-t) for t in u] m=min(nn) k=nn.index(m) return inverse_rate[k]
test=line([(u[i],f_U(u[i])) for i in range(len(u)-1)])
test.show()

This is a fairly elementary example (in particular, the fact that the density of YY is constant on its support makes things quite a bit simpler).  However, it indicates how more complex examples can be approached numerically, using the straightforward approach of function tabulation together with a difference quotient estimate of the derivative.