DistMeans.ipynb
Author: Richard Ketchersid
Description: Distribution of means
In [2]:
import random as rand
import pylab as plt
import statistics as stat
import scipy.stats as stats
import numpy as np

Make whatever distribution you like.

In [3]:
#dist = [rand.randint(0,9) for i in range(1000)]
dist = [11, 15, 9, 15, 12, 26, -6, 27, 32, 20, -10, 12, 34, 20, 24, 21, 18, -19, 11, 33, 16, 19, 20, 18, 29, 7, 22, 30, 17, 17, 30, 19, 10, 33, 31, 13, 29, 24, 16, 3, 23, 30, 16, 15, 18, 18, 24, 32, 15, 24, 28, 19, 27, 21, 26, 17, 14, 29, 28, 18, 24, 8, 16, 17, 39, 19, 24, 31, 21, 21, 22, 26, 23, 17, 14, 12, 17, 16, 9, 24, 21, 19, 28, 27, 25, 20, 20, 22, 38, 17, 22, 20, 21, 28, 22, 22, 14, 15, 22, 22]
MEAN = np.mean(dist)
a = plt.hist(dist, density=True)

Now draw 100000 samples of size N (with replacement) and compare the distribution of their means with a normal distribution whose mean and standard deviation are estimated from a single sample. The sampling distribution and the normal pdf are shown together. Actually measuring the distance between two distributions requires some thought. (see here)

In [9]:
N = 100
dist1 = [sum(rand.choices(dist, k=N))/N for i in range(100000)]
sd1 = stat.stdev(dist1)
mean1 = np.mean(dist1)

samp = rand.choices(dist, k=N)  # This is our single sample
mean = sum(samp)/N              # Use this mean for our normal
sd = stat.stdev(samp)/N**.5     # Use this SD for our normal

fig, ax = plt.subplots(1, 1)
# Plot the sampling distribution of the means
a = ax.hist(dist1, bins=50, density=True, alpha=.5, color="blue")
# Generate the gaussian determined by the single sample
x = np.linspace(min(a[1]), max(a[1]), 100)
y = stats.norm.pdf(x, mean, sd)
# Add the mean of the actual distribution
ax.plot([MEAN, MEAN], [0, max(a[0]/4)], color='red')
# Add the 95% CI interval computed by the sample.
ax.plot([mean-1.96*sd, mean-1.96*sd], [0, max(a[0]/4)], color='green')
ax.plot([mean+1.96*sd, mean+1.96*sd], [0, max(a[0]/4)], color='green')
ax.plot()
ax.plot(x, y)
plt.show()
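One possible way to put a number on the distance between the sampling distribution and a normal curve is the Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical CDF and the normal CDF. The sketch below is only an illustration, not the notebook's method. It uses the standard library alone, and the helper `ks_distance`, the seed, the stand-in `population`, and the 5000 repetitions are all assumptions chosen to keep it small. The reference normal here uses the population mean and sigma/sqrt(N), not a single sample.

```python
import random
import statistics
from statistics import NormalDist


def ks_distance(values, reference):
    """Largest gap between the empirical CDF of `values` and `reference.cdf`.

    `ks_distance` is a hypothetical helper written for this sketch.
    """
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        cdf = reference.cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x; check both sides.
        gap = max(gap, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return gap


rng = random.Random(0)                                 # fixed seed (assumption)
population = [rng.randint(0, 9) for _ in range(1000)]  # stand-in for `dist`
N = 100

# Means of repeated samples of size N, drawn with replacement
means = [statistics.fmean(rng.choices(population, k=N)) for _ in range(5000)]

# Normal with the population mean and the theoretical standard error
target = NormalDist(statistics.fmean(population),
                    statistics.pstdev(population) / N ** 0.5)

d = ks_distance(means, target)
print(d)
```

A value near 0 means the two CDFs nearly coincide; a value near 1 means they barely overlap. Other distances (for example total variation) would be equally reasonable choices.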
In [78]:
sd, sd1
(0.2945635360455818, 0.2952581748060327)
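The two numbers agree because `sd` is the single-sample estimate of the standard error, s/sqrt(N), while `sd1` is the observed standard deviation of many sample means, which should be close to sigma/sqrt(N). A minimal sketch of that check, assuming a stand-in `population`, a fixed seed, and 10000 repetitions (none of which come from the notebook):

```python
import random
import statistics

rng = random.Random(1)                                 # fixed seed (assumption)
population = [rng.randint(0, 9) for _ in range(1000)]  # stand-in for `dist`
N = 100

# Theoretical standard error of the mean: sigma / sqrt(N)
sigma = statistics.pstdev(population)
predicted = sigma / N ** 0.5

# Observed spread of the means of many samples of size N
means = [statistics.fmean(rng.choices(population, k=N)) for _ in range(10000)]
observed = statistics.stdev(means)

print(predicted, observed)
```

The two printed values should differ only by sampling noise, which shrinks as the number of repetitions grows.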
In [79]:
mean, mean1
(4.7, 4.62626)
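The green lines in the plot mark mean ± 1.96·sd, the approximate 95% confidence interval from one sample. A rough way to see what "95%" means is to repeat the single-sample step many times and count how often the interval contains the true mean. This is a sketch under assumptions: the stand-in `population`, the seed, and the 2000 trials are illustrative and not taken from the notebook.

```python
import random
import statistics

rng = random.Random(2)                                 # fixed seed (assumption)
population = [rng.randint(0, 9) for _ in range(1000)]  # stand-in for `dist`
true_mean = statistics.fmean(population)
N = 100
trials = 2000

hits = 0
for _ in range(trials):
    samp = rng.choices(population, k=N)       # one sample, with replacement
    m = statistics.fmean(samp)                # sample mean
    se = statistics.stdev(samp) / N ** 0.5    # estimated standard error
    if m - 1.96 * se <= true_mean <= m + 1.96 * se:
        hits += 1

coverage = hits / trials
print(coverage)
```

The printed fraction should land close to 0.95, typically slightly below it because the standard error is itself estimated from the sample.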