CoCalc -- DistMeans.ipynb

Distribution of means

Kernel: Python 3 (Anaconda 5)

In [2]:

import random as rand
import matplotlib.pyplot as plt
import statistics as stat
import scipy.stats as stats
import numpy as np

Make whatever distribution you like.

In [3]:

#dist = [rand.randint(0,9) for i in range(1000)]
dist=[11, 15, 9, 15, 12, 26, -6, 27, 32, 20, -10, 12, 34, 20, 24, 21, 18, -19, 11, 33, 16, 19, 20, 18, 29, 7, 22, 30, 17, 17, 30, 19, 10, 33, 31, 13, 29, 24, 16, 3, 23, 30, 16, 15, 18, 18, 24, 32, 15, 24, 28, 19, 27, 21, 26, 17, 14, 29, 28, 18, 24, 8, 16, 17, 39, 19, 24, 31, 21, 21, 22, 26, 23, 17, 14, 12, 17, 16, 9, 24, 21, 19, 28, 27, 25, 20, 20, 22, 38, 17, 22, 20, 21, 28, 22, 22, 14, 15, 22, 22]
MEAN = np.mean(dist)  # mean of the actual distribution
a = plt.hist(dist, density=True)  # normalized histogram of the distribution

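Now draw many samples of size N (100,000 in the cell below) and compare the resulting distribution of sample means with a normal distribution whose mean and standard deviation are estimated from a single sample: the sample mean and the standard error s/sqrt(N). The sampling distribution and the normal pdf are plotted together, with the mean of the actual distribution in red and the single-sample 95% confidence interval in green. Actually measuring the distance between the two distributions requires some thought. (see here)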

In [9]:

N = 100  # sample size
# Sampling distribution: means of 100,000 samples of size N, drawn with replacement
dist1 = [sum(rand.choices(dist, k=N)) / N for i in range(100000)]
sd1 = stat.stdev(dist1)
mean1 = np.mean(dist1)
samp = rand.choices(dist, k=N)  # This is our single sample
mean = sum(samp) / N  # Use this mean for our normal
sd = stat.stdev(samp) / N**.5  # Standard error of the mean: use this SD for our normal
fig, ax = plt.subplots(1, 1)
# Plot the sampling distribution of the means
a = ax.hist(dist1, bins=50, density=True, alpha=.5, color="blue")
# Generate the Gaussian determined by the single sample
x = np.linspace(min(a[1]), max(a[1]), 100)
y = stats.norm.pdf(x, mean, sd)
# Height of the vertical marker lines: a quarter of the tallest histogram bar
marker_height = max(a[0]) / 4
# Add the mean of the actual distribution
ax.plot([MEAN, MEAN], [0, marker_height], color='red')
# Add the 95% confidence interval computed from the single sample
ax.plot([mean - 1.96*sd, mean - 1.96*sd], [0, marker_height], color='green')
ax.plot([mean + 1.96*sd, mean + 1.96*sd], [0, marker_height], color='green')
# Overlay the normal pdf
ax.plot(x, y)
plt.show()
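The remark above that measuring the distance between the two distributions takes some thought can be made concrete with one possible choice of distance. The sketch below uses the Kolmogorov-Smirnov statistic, the largest gap between the empirical CDF of the sample means and the CDF of the single-sample normal. This is an assumption about what "distance" might mean here, not necessarily the measure the linked reference has in mind. The names `population`, `n_means`, `mu_hat`, `se_hat` and `ks_distance` are hypothetical, and the population is a stand-in rather than the `dist` list above.

```python
import random

import numpy as np
from scipy import stats

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in population; any list of numbers would do.
population = [random.randint(0, 9) for _ in range(1000)]
n = 100          # sample size
n_means = 5000   # number of sample means to simulate (kept small for speed)

# Simulated sampling distribution of the mean (sampling with replacement).
sample_means = [np.mean(random.choices(population, k=n)) for _ in range(n_means)]

# Normal approximation built from one sample: sample mean and standard error.
single = random.choices(population, k=n)
mu_hat = np.mean(single)
se_hat = np.std(single, ddof=1) / np.sqrt(n)

# KS statistic: largest vertical gap between the empirical CDF of the
# sample means and the CDF of Normal(mu_hat, se_hat).
result = stats.kstest(sample_means, "norm", args=(mu_hat, se_hat))
ks_distance = result.statistic
print(f"KS distance: {ks_distance:.3f}")
```

A value near 0 means the two CDFs nearly coincide. Because `mu_hat` comes from a single sample, it is usually off-centre by some fraction of a standard error, so the distance changes noticeably from one run to the next even though the shapes agree.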

In [78]:

sd, sd1

(0.2945635360455818, 0.2952581748060327)

In [79]:

mean, mean1

(4.7, 4.62626)
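The green lines in the plot come from one sample, so a single picture cannot show how often such an interval contains the true mean. A minimal sketch of one way to check, assuming a stand-in population rather than the `dist` list above: repeat the single-sample procedure many times and count how often mean ± 1.96·SD brackets the population mean. The names `population`, `trials`, `hits` and `coverage` are hypothetical.

```python
import random

import numpy as np

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in population; any list of numbers would do.
population = [random.randint(0, 9) for _ in range(1000)]
true_mean = np.mean(population)
n = 100        # sample size
trials = 2000  # number of independent single samples

hits = 0
for _ in range(trials):
    sample = random.choices(population, k=n)       # one sample, with replacement
    m = np.mean(sample)                            # sample mean
    se = np.std(sample, ddof=1) / np.sqrt(n)       # standard error of the mean
    if m - 1.96 * se <= true_mean <= m + 1.96 * se:
        hits += 1                                  # interval contains the true mean

coverage = hits / trials
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

With a sample size of 100, the fraction should land close to 0.95. Smaller samples or heavily skewed populations would be expected to pull it lower.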
