Kernel: Python 3

Skew test

Allen Downey

MIT License

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import seaborn as sns
sns.set(style='white')

from thinkstats2 import Pmf, Cdf
import thinkstats2
import thinkplot

decorate = thinkplot.config

Suppose you buy a loaf of bread every day for a year, take it home, and weigh it. You suspect that the distribution of weights is more skewed than a normal distribution with the same mean and standard deviation.

To test your suspicion, write a definition for a class named SkewTest that extends the HypothesisTest class defined below and provides two methods:

  • TestStatistic should compute the skew of a given sample.

  • RunModel should simulate the null hypothesis and return simulated data.

class HypothesisTest(object):
    """Represents a hypothesis test."""

    def __init__(self, data):
        """Initializes.

        data: data in whatever form is relevant
        """
        self.data = data
        self.MakeModel()
        self.actual = self.TestStatistic(data)
        self.test_stats = None

    def PValue(self, iters=1000):
        """Computes the distribution of the test statistic and p-value.

        iters: number of iterations

        returns: float p-value
        """
        self.test_stats = np.array([self.TestStatistic(self.RunModel())
                                    for _ in range(iters)])

        # One-sided: fraction of simulated statistics at least as large as the observed one
        count = sum(self.test_stats >= self.actual)
        return count / iters

    def MaxTestStat(self):
        """Returns the largest test statistic seen during simulations.
        """
        return np.max(self.test_stats)

    def PlotHist(self, label=None):
        """Draws a histogram of the simulated test statistics with a
        vertical line at the observed test stat.
        """
        plt.hist(self.test_stats, color='C4', alpha=0.5)
        plt.axvline(self.actual, linewidth=3, color='0.8')
        plt.xlabel('Test statistic')
        plt.ylabel('Count')
        plt.title('Distribution of the test statistic under the null hypothesis')

    def TestStatistic(self, data):
        """Computes the test statistic.

        data: data in whatever form is relevant
        """
        # Built-in exception; UnimplementedMethodException is not defined in this notebook
        raise NotImplementedError()

    def MakeModel(self):
        """Build a model of the null hypothesis.
        """
        pass

    def RunModel(self):
        """Run the model of the null hypothesis.

        returns: simulated data
        """
        raise NotImplementedError()

# Solution goes here
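
One possible solution, offered as a sketch rather than the author's own answer. It assumes the biased moment estimator of skewness, g1 = m3 / m2**1.5, and a null model that is Gaussian with the sample's mean and standard deviation. The attribute names mu, sigma, and n are choices made here. The try/except only defines a stand-in base class when HypothesisTest from the cell above is not available, so the sketch can also run on its own.

```python
import numpy as np

try:
    HypothesisTest  # reuse the class from the cell above if it is defined
except NameError:
    # Stand-in with the same interface, only so this sketch runs on its own.
    class HypothesisTest:
        def __init__(self, data):
            self.data = data
            self.MakeModel()
            self.actual = self.TestStatistic(data)
            self.test_stats = None

        def PValue(self, iters=1000):
            self.test_stats = np.array(
                [self.TestStatistic(self.RunModel()) for _ in range(iters)])
            return np.sum(self.test_stats >= self.actual) / iters

        def MakeModel(self):
            pass


class SkewTest(HypothesisTest):
    """One-sided test: is the sample more right-skewed than a Gaussian?"""

    def TestStatistic(self, data):
        """Sample skewness g1 = m3 / m2**1.5 (biased moment estimator)."""
        deviations = np.asarray(data) - np.mean(data)
        m2 = np.mean(deviations**2)  # second central moment
        m3 = np.mean(deviations**3)  # third central moment
        return m3 / m2**1.5

    def MakeModel(self):
        """Null model: Gaussian with the sample's mean and standard deviation."""
        self.mu = np.mean(self.data)
        self.sigma = np.std(self.data)
        self.n = len(self.data)

    def RunModel(self):
        """Draws one simulated sample of the same size from the null model."""
        return np.random.normal(self.mu, self.sigma, size=self.n)
```

Because PValue counts simulated statistics that are greater than or equal to the observed one, this version only detects positive (right) skew. Testing for skew in either direction would need the absolute value of g1 as the test statistic.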

To test this class, I'll generate a sample from an actual Gaussian distribution, so the null hypothesis is true.

mu = 1000
sigma = 35
data = np.random.normal(mu, sigma, size=365)

Now we can make a SkewTest and compute the observed skewness.

test = SkewTest(data)
test.actual

Here's the p-value.

test = SkewTest(data)
test.PValue()

And the distribution of the test statistic under the null hypothesis.

test.PlotHist()

Most of the time the p-value exceeds 5%, so we would conclude that the observed skewness could plausibly be due to random sampling.

But let's see how often we get a false positive.

iters = 100
count = 0

for i in range(iters):
    data = np.random.normal(mu, sigma, size=365)
    test = SkewTest(data)
    p_value = test.PValue()
    if p_value < 0.05:
        count += 1

print(count/iters)

In the long run, the false positive rate is the threshold we used, 5%.
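
With only 100 simulated experiments, the printed rate is itself a noisy estimate, so a single run can land some distance from 5%. As a rough sketch, treating each experiment as an independent Bernoulli trial with success probability 0.05 (an idealization, since each p-value is also estimated by simulation), the standard error of the estimated rate is about 0.02:

```python
import math

threshold = 0.05   # nominal false positive rate
iters = 100        # number of simulated experiments, as in the loop above

# The count of false positives is approximately binomial(iters, threshold),
# so the estimated rate has standard error sqrt(p * (1 - p) / n).
std_err = math.sqrt(threshold * (1 - threshold) / iters)

print(f"expected rate: {threshold:.2f} +/- {std_err:.3f}")
```

Observed rates anywhere from roughly 1% to 9% are therefore consistent with a true rate of 5%. Increasing iters narrows that range.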