Often, we have to compare data from three or more groups to see if one or more of them is different from the others. To do this, scientists use a statistic called the F-statistic, defined as $F = \frac{\mathrm{Var}_{\text{between}}}{\mathrm{Var}_{\text{within}}}$. The variability between groups is the sum of the squared differences between the group means and the grand mean. The variability within groups is the sum of the group variances. In LS 40, we will use a variation of the F-statistic that does not require squaring, which is called the F-like statistic.
In this lab, we will examine the results of a pharmaceutical company's study comparing the effectiveness of different pain relief medications on migraine headaches. For the experiment, 27 volunteers were selected and 9 were randomly assigned to each of three drug formulations. The subjects were instructed to take the drug during their next migraine headache episode and to report their pain on a scale of 1 = no pain to 10 = extreme pain 30 minutes after taking the drug.
Using the pandas read_csv function, import the file migraines.csv and show the data.
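A minimal sketch of this step. In the lab itself the call would be pd.read_csv("migraines.csv"); because that file is not reproduced here, the example reads a small in-memory stand-in whose pain scores are invented. The column names Drug A, Drug B, and Drug C follow the names used later in this lab.

```python
import io

import pandas as pd

# Stand-in for migraines.csv: these pain scores are invented placeholders,
# not the study's data. In the lab: migraine = pd.read_csv("migraines.csv")
csv_text = """Drug A,Drug B,Drug C
4,6,2
5,7,3
6,8,4
"""

migraine = pd.read_csv(io.StringIO(csv_text))
print(migraine)  # one column per drug, one row per subject
```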
Find the grand median (the median of the whole sample). HINT: Use np.median.
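One way this could look, using an invented 3-subjects-per-drug data frame in place of the real file (the lab's data has 9 subjects per drug). Converting to a NumPy array first makes it explicit that np.median is taken over every value at once.

```python
import numpy as np
import pandas as pd

# Invented placeholder scores (not the study's data), 3 subjects per drug.
migraine = pd.DataFrame({
    "Drug A": [4, 5, 6],
    "Drug B": [6, 7, 8],
    "Drug C": [2, 3, 4],
})

# With no axis argument, np.median flattens the 2-D array,
# so this is the median of all 9 values pooled together.
grand_median = np.median(migraine.to_numpy())
print(grand_median)  # 5.0
```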
Find the numerator of the F-like statistic (variation among groups). HINT: When working with data frames and NumPy arrays, you can do computations like addition and multiplication directly, without for loops (unlike in regular Python). Also, NumPy has abs and sum functions.
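A sketch of the vectorized computation the hint describes. It assumes the numerator is defined as the sum over groups of n times the absolute difference between the group median and the grand median, where n is the group size; check this against the definition given in class. The data values are invented placeholders.

```python
import numpy as np
import pandas as pd

# Invented placeholder scores (not the study's data), 3 subjects per drug.
migraine = pd.DataFrame({
    "Drug A": [4, 5, 6],
    "Drug B": [6, 7, 8],
    "Drug C": [2, 3, 4],
})

n = len(migraine)                        # subjects per group
group_medians = migraine.median()        # one median per column (drug)
grand_median = np.median(migraine.to_numpy())

# Assumed definition: sum over groups of n * |group median - grand median|.
# The subtraction and abs apply to all three medians at once, no loop needed.
numerator = np.sum(n * np.abs(group_medians - grand_median))
print(numerator)  # 12.0
```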
We now want to find a p-value for our data by simulating the null hypothesis. This, of course, means computing the F-like statistic each time, which takes a lot of code and would make a mess in the bootstrap loop. Instead, we will package our code into a function and call this function wherever necessary.
Write a function that will compute the F-like statistic for this dataset or one of the same size.
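A possible shape for such a function. The name f_like is hypothetical, and the formula is an assumption: the numerator is the sum of n times |group median - grand median|, and the denominator is the sum of every value's absolute difference from its own group median. The function expects one column per group with equal group sizes.

```python
import numpy as np
import pandas as pd


def f_like(data):
    """F-like statistic (assumed definition) for equal-sized groups in columns."""
    values = np.asarray(data, dtype=float)     # rows = subjects, columns = groups
    n = values.shape[0]                        # subjects per group
    group_medians = np.median(values, axis=0)  # one median per column
    grand_median = np.median(values)           # median of all values pooled
    # Variation among groups: n * |group median - grand median|, summed.
    numerator = np.sum(n * np.abs(group_medians - grand_median))
    # Variation within groups: |value - its group median|, summed.
    denominator = np.sum(np.abs(values - group_medians))
    return numerator / denominator


# Invented placeholder scores (not the study's data), 3 subjects per drug.
migraine = pd.DataFrame({
    "Drug A": [4, 5, 6],
    "Drug B": [6, 7, 8],
    "Drug C": [2, 3, 4],
})
print(f_like(migraine))  # 2.0
```

Because the function converts its input with np.asarray, it accepts either a data frame or a 2-D NumPy array, which keeps the simulation loop short.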
We now want to simulate the null hypothesis that there is no difference between the groups. To do this, we have to make all the data into one dataset, sample pseudo-groups from it, and compute the F-like statistic for the resampled data.
Use the code alldata = np.concatenate([migraine["Drug A"], migraine["Drug B"], migraine["Drug C"]]) (this assumes your data frame is called "migraine") to put all the data into one 1-D array.
Make the three samples (the pseudo-groups drawn from alldata) into a data frame. To do this, use the NumPy function column_stack to put the 1-D arrays side by side and then use the pandas function DataFrame to convert the result into a data frame.
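The pooling, resampling, and reshaping steps might be sketched as follows. Two things here are assumptions: the data values are invented placeholders, and the pseudo-groups are drawn with np.random.choice, which samples with replacement by default. The names sample_a, sample_b, and sample_c are hypothetical.

```python
import numpy as np
import pandas as pd

# Invented placeholder scores (not the study's data), 3 subjects per drug.
migraine = pd.DataFrame({
    "Drug A": [4, 5, 6],
    "Drug B": [6, 7, 8],
    "Drug C": [2, 3, 4],
})

# Pool every observation into one 1-D array.
alldata = np.concatenate([migraine["Drug A"], migraine["Drug B"], migraine["Drug C"]])

# Draw three pseudo-groups, each the same size as an original group.
n = len(migraine)
sample_a = np.random.choice(alldata, size=n)
sample_b = np.random.choice(alldata, size=n)
sample_c = np.random.choice(alldata, size=n)

# Put the 1-D arrays side by side, then convert to a data frame.
stacked = np.column_stack([sample_a, sample_b, sample_c])
resampled = pd.DataFrame(stacked, columns=migraine.columns)
print(resampled)
```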
Compute the F-like statistic for your resampled data.
Do the above steps 10,000 times to simulate the null hypothesis, storing the results.
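Putting the pieces together, the simulation could look like the sketch below. It rests on several assumptions: the data are invented placeholders (9 subjects per drug, matching the study's layout but not its values), f_like is a hypothetical helper using the assumed F-like definition (among-group variation over within-group variation, both based on medians and absolute differences), and the p-value is taken as the fraction of simulated statistics at least as large as the observed one.

```python
import numpy as np
import pandas as pd


def f_like(data):
    """F-like statistic (assumed definition) for equal-sized groups in columns."""
    values = np.asarray(data, dtype=float)
    n = values.shape[0]
    group_medians = np.median(values, axis=0)
    grand_median = np.median(values)
    numerator = np.sum(n * np.abs(group_medians - grand_median))
    denominator = np.sum(np.abs(values - group_medians))
    return numerator / denominator


# Invented placeholder scores (not the study's data), 9 subjects per drug.
migraine = pd.DataFrame({
    "Drug A": [3, 4, 2, 5, 3, 4, 2, 3, 4],
    "Drug B": [5, 6, 7, 5, 8, 6, 7, 6, 5],
    "Drug C": [4, 5, 6, 4, 5, 7, 5, 6, 4],
})

observed = f_like(migraine)
alldata = np.concatenate([migraine["Drug A"], migraine["Drug B"], migraine["Drug C"]])
n = len(migraine)

n_sims = 10000
results = np.zeros(n_sims)  # one simulated F-like statistic per iteration
for i in range(n_sims):
    # Under the null hypothesis the group labels do not matter,
    # so every pseudo-group is drawn from the pooled data.
    pseudo_groups = [np.random.choice(alldata, size=n) for _ in range(3)]
    resampled = pd.DataFrame(np.column_stack(pseudo_groups))
    results[i] = f_like(resampled)

# Fraction of simulated statistics at least as extreme as the observed one.
p_value = np.mean(results >= observed)
print(observed, p_value)
```

Preallocating results with np.zeros and filling it by index keeps the loop body to the three steps described above: sample, stack, compute.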