SharedWeek 9 / hw9-turnin.ipynbOpen in CoCalc
Author: Dean Neutel
Views : 5

Homework 9

Name: Dean Neutel

I collaborated with: Vennis

In [1]:
# Dataset from Homework 4: Plant species richness in ephemeral and permanent wetlands in Wisconsin, 2013 plant_eph = [34,51,20,47,22,19,24,18,33,45,36,42,30,25,44,33,47,35,51,15,24,13,25,32,49,28,37,17,13,32,24,18,22] plant_perm = [31,40,43,37,23,45,53,46,44,25,47,34,41,31,37,34,42,37,14,29,18,15,14,28]

#1 In attached file

#2 Similarities: Used 1000 simulated pvalues to estimate power, calculated power finding the number of pvalues less than the alpha value, changed the sample sizes to see the effect of sample size on power Differences: They used a t-test instead of relative risk, they only looked at the effect of sample size on power instead of varying sample size and effect size, they varied sample sizes of both groups while we only varied the Bendectin group, they looked at the distribution of pvalues while we computed the proportions of significant pvalues using two different alpha values.

#3 Figure 6 illustrates that for any effect size and sample size pairing that the power is greater when the alpha value is less than or equal to 0.05 rather than 0.01. Another thing that it shows is that for any alpha value and sample size pairing, the power is greater when the effect size is greater. The last thing we can see from the figure is that for any alpha value and effect size pairing, power is greater when the sample size is greater.

In [3]:
#4a import numpy as np import seaborn as sns plant_population = np.concatenate([plant_eph, plant_perm]) total= 1000 pvalues = np.zeros(total) for i in range(total): random_eph = np.random.choice(plant_eph, len(plant_eph)) random_perm = np.random.choice(plant_perm, len(plant_perm)) phantom_difference= np.median(random_eph)-np.median(random_perm) differences = np.zeros(total) for a in range(total): new_random_eph = np.random.choice(plant_population, len(plant_eph)) new_random_perm = np.random.choice(plant_population, len(plant_eph)) differences[a] = np.median(new_random_eph)- np.median(new_random_perm) if phantom_difference > 0: positive = phantom_difference negative= -phantom_difference else: positive = -phantom_difference negative= phantom_difference pvalues[i] = (np.sum(differences>=positive)) + (np.sum(differences<=negative))/total power = np.sum(pvalues<=0.01)/total display("Power", power); sns.distplot(pvalues, kde=False);
'Power'
0.039

#4b The power for this study is 3.9%. This means that there is a 3.8% chance of observing a significant difference, if there was actually a significant difference between the medians of the actual observed data. The power observed is not high enough to see an null difference.

#4c To increase the power of this experiment, we can either increase the sample size of the study or use a paired analysis of the data to decrease variation. The second option would probably not increase the power as much.

In [ ]:
#5 def MADAM (data): return(np.median(np.abs(data-np.median(data))))
In [ ]:
display("Emphemeral MADAM", MADAM(plant_eph),"Permanent MADAM", MADAM(plant_perm))

From the MADAMS we can see that the macroinvertebrate data has a smaller within group variability than the plant datasets since we know that the MADAMs of the macroinvertebrate data is 2 and 3 while the MADAMs of the plant data is 7.5 and 8. Since the macroinvertebrate group has a smaller within group variability we are more likely to detect a statistically significant difference.

In [ ]:
#6 difference_observed = np.median(plant_eph) - np.median(plant_perm) factor=0 power=0 total=1000 while power<0.80: factor = factor + 1 pvalues=np.zeros(total) for i in range (total): random_eph=np.random.choice(plant_eph, len(plant_eph)*factor) random_perm=np.random.choice(plant_perm, len(plant_perm)*factor) phantom_difference=np.median(random_eph) - np.median(random_perm) plant_population = np.concatenate([plant_eph, plant_perm]) differences=np.zeros(total) for a in range(total): new_random_eph = np.random.choice(plant_population, len(plant_eph)*factor) new_random_perm = np.random.choice(plant_population, len(plant_eph)*factor) differences[a] = np.median(new_random_eph)- np.median(new_random_perm) if phantom_difference > 0: positive = phantom_difference negative= -phantom_difference else: positive = -phantom_difference negative= phantom_difference pvalues[i] = (np.sum(differences>=positive)) + (np.sum(differences<=negative))/total power = np.sum(pvalues<=.01)/total display("Power",power,"Factor",factor,"Approximate Ephemeral Sample Size", factor*len(plant_eph), "Approximate Permanent Sample Size", factor*len(plant_perm))
In [ ]:
factor=9 total=1000 pvalues=np.zeros(total) for i in range (total): random_eph=np.random.choice(plant_eph, len(plant_eph)*factor) random_perm=np.random.choice(plant_perm, len(plant_perm)*factor) phantom_difference=np.median(random_eph) - np.median(random_perm) plant_population = np.concatenate([plant_eph, plant_perm]) differences=np.zeros(total) for a in range(total): new_random_eph = np.random.choice(plant_population, len(plant_eph)*factor) new_random_perm = np.random.choice(plant_population, len(plant_eph)*factor) differences[a] = np.median(new_random_eph)- np.median(new_random_perm) if phantom_difference > 0: positive = phantom_difference negative= -phantom_difference else: positive = -phantom_difference negative= phantom_difference pvalues[i] = (np.sum(differences>=positive)) + (np.sum(differences<=negative))/total power = np.sum(pvalues<=.01)/total display("Power",power,"Factor",factor,"Approximate Sample Size", factor*len(plant_eph)+factor*len(plant_perm))
In [ ]: