
Homework 9

Name: Dean Neutel

I collaborated with: Vennis

# Dataset from Homework 4: Plant species richness in ephemeral and permanent wetlands in Wisconsin, 2013

plant_eph = [34,51,20,47,22,19,24,18,33,45,36,42,30,25,44,33,47,35,51,15,24,13,25,32,49,28,37,17,13,32,24,18,22]
plant_perm = [31,40,43,37,23,45,53,46,44,25,47,34,41,31,37,34,42,37,14,29,18,15,14,28]

#1 In attached file

#2 Similarities: they used 1000 simulated p-values to estimate power; they calculated power as the proportion of p-values below the alpha value; and they changed the sample sizes to see the effect of sample size on power. Differences: they used a t-test instead of relative risk; they only looked at the effect of sample size on power, rather than varying both sample size and effect size; they varied the sample sizes of both groups, while we only varied the Bendectin group; and they looked at the distribution of p-values, while we computed the proportion of significant p-values at two different alpha values.
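The t-test-based power simulation described above can be sketched as follows. This is a minimal illustration with made-up normally distributed data, not the classmates' actual code; the group means, standard deviation, per-group sample size, and effect size are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
total = 1000   # number of simulated experiments
alpha = 0.05
n = 25         # per-group sample size (arbitrary)
effect = 5     # assumed true difference in means (arbitrary)

pvalues = np.zeros(total)
for i in range(total):
    group1 = rng.normal(30, 10, n)           # control group
    group2 = rng.normal(30 + effect, 10, n)  # shifted group
    pvalues[i] = stats.ttest_ind(group1, group2).pvalue

# power = proportion of simulated experiments reaching significance
power = np.mean(pvalues <= alpha)
print("Estimated power:", power)
```

As in our approach, power is estimated as the fraction of simulated p-values at or below alpha; only the test statistic differs.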

#3 Figure 6 illustrates that, for any pairing of effect size and sample size, power is greater when the alpha value is 0.05 than when it is 0.01. It also shows that, for any pairing of alpha value and sample size, power is greater when the effect size is larger. Finally, for any pairing of alpha value and effect size, power is greater when the sample size is larger.
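All three trends can be checked against a standard normal-approximation power formula for a two-sided, two-sample test. This is a sketch only: the effect sizes, sample sizes, and sigma = 10 below are arbitrary illustrative values, not numbers taken from Figure 6:

```python
import numpy as np
from scipy import stats

def ztest_power(n, effect, sigma, alpha):
    # normal-approximation power for a two-sided two-sample test
    # with n observations per group and known sigma
    z_crit = stats.norm.ppf(1 - alpha / 2)
    delta = effect / (sigma * np.sqrt(2 / n))  # standardized shift
    return stats.norm.sf(z_crit - delta) + stats.norm.cdf(-z_crit - delta)

for alpha in (0.01, 0.05):
    for effect in (3, 8):
        for n in (20, 80):
            print(f"alpha={alpha} effect={effect} n={n} "
                  f"power={ztest_power(n, effect, 10, alpha):.2f}")
```

Scanning the printed grid shows power rising with alpha, with effect size, and with sample size, matching the three patterns read off the figure.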

#4a
import numpy as np
import seaborn as sns
plant_population = np.concatenate([plant_eph, plant_perm])
total= 1000
pvalues = np.zeros(total)
for i in range(total):
    # bootstrap samples from each observed group
    random_eph = np.random.choice(plant_eph, len(plant_eph))
    random_perm = np.random.choice(plant_perm, len(plant_perm))
    phantom_difference = np.median(random_eph) - np.median(random_perm)
    # null distribution: resample both groups from the pooled population
    differences = np.zeros(total)
    for a in range(total):
        new_random_eph = np.random.choice(plant_population, len(plant_eph))
        new_random_perm = np.random.choice(plant_population, len(plant_perm))
        differences[a] = np.median(new_random_eph) - np.median(new_random_perm)
    # two-tailed p-value for this bootstrap replicate
    if phantom_difference > 0:
        positive = phantom_difference
        negative = -phantom_difference
    else:
        positive = -phantom_difference
        negative = phantom_difference
    pvalues[i] = (np.sum(differences >= positive) + np.sum(differences <= negative)) / total
power = np.sum(pvalues<=0.01)/total
display("Power", power);
sns.histplot(pvalues);  # distplot is deprecated in current seaborn
'Power'
0.039

#4b The power for this study is 3.9%. This means that there is only a 3.9% chance of observing a statistically significant difference at alpha = 0.01, assuming the true difference between the group medians equals the one we observed. This power is far too low to reliably detect a true difference.

#4c To increase the power of this experiment, we could either increase the sample size of the study or use a paired design to reduce within-group variation. The second option would probably not increase the power as much.
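A paired analysis cannot be run on these wetlands as collected (the two groups are unpaired and have different sizes), but with hypothetical paired measurements, say, richness at the same sites in two different years, it could look like this sign-flipping permutation sketch (all data below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical paired data: richness at the same 15 sites in two years
year1 = rng.normal(30, 8, 15)
year2 = year1 + rng.normal(3, 2, 15)  # shared site effect keeps pairs correlated

diffs = year2 - year1
observed = np.median(diffs)

# null hypothesis: each within-site difference is equally likely to be +/-
total = 1000
null_medians = np.zeros(total)
for i in range(total):
    signs = rng.choice([-1, 1], len(diffs))
    null_medians[i] = np.median(diffs * signs)

pvalue = np.mean(np.abs(null_medians) >= abs(observed))
print("Paired permutation p-value:", pvalue)
```

Because the site-to-site variation cancels inside each pair, the within-pair differences are much less variable than the raw richness values, which is how pairing can raise power.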

#5
def MADAM(data):
    # median absolute deviation about the median
    data = np.asarray(data)
    return np.median(np.abs(data - np.median(data)))
display("Ephemeral MADAM", MADAM(plant_eph), "Permanent MADAM", MADAM(plant_perm))

From the MADAMs we can see that the macroinvertebrate data have smaller within-group variability than the plant data: the MADAMs of the macroinvertebrate groups are 2 and 3, while the MADAMs of the plant groups are 7.5 and 8. Since the macroinvertebrate groups have smaller within-group variability, we are more likely to detect a statistically significant difference there.

#6
difference_observed = np.median(plant_eph) - np.median(plant_perm)
plant_population = np.concatenate([plant_eph, plant_perm])
factor = 0
power = 0
total = 1000
while power < 0.80:
    factor = factor + 1
    pvalues = np.zeros(total)
    for i in range(total):
        # bootstrap samples scaled up by the current factor
        random_eph = np.random.choice(plant_eph, len(plant_eph)*factor)
        random_perm = np.random.choice(plant_perm, len(plant_perm)*factor)
        phantom_difference = np.median(random_eph) - np.median(random_perm)
        # null distribution from the pooled population at the same sample sizes
        differences = np.zeros(total)
        for a in range(total):
            new_random_eph = np.random.choice(plant_population, len(plant_eph)*factor)
            new_random_perm = np.random.choice(plant_population, len(plant_perm)*factor)
            differences[a] = np.median(new_random_eph) - np.median(new_random_perm)
        # two-tailed p-value for this bootstrap replicate
        if phantom_difference > 0:
            positive = phantom_difference
            negative = -phantom_difference
        else:
            positive = -phantom_difference
            negative = phantom_difference
        pvalues[i] = (np.sum(differences >= positive) + np.sum(differences <= negative)) / total
    power = np.sum(pvalues <= 0.01) / total
display("Power", power, "Factor", factor, "Approximate Ephemeral Sample Size", factor*len(plant_eph), "Approximate Permanent Sample Size", factor*len(plant_perm))
# re-run at the factor found above to confirm the power estimate
factor = 9
total = 1000
plant_population = np.concatenate([plant_eph, plant_perm])
pvalues = np.zeros(total)
for i in range(total):
    random_eph = np.random.choice(plant_eph, len(plant_eph)*factor)
    random_perm = np.random.choice(plant_perm, len(plant_perm)*factor)
    phantom_difference = np.median(random_eph) - np.median(random_perm)
    differences = np.zeros(total)
    for a in range(total):
        new_random_eph = np.random.choice(plant_population, len(plant_eph)*factor)
        new_random_perm = np.random.choice(plant_population, len(plant_perm)*factor)
        differences[a] = np.median(new_random_eph) - np.median(new_random_perm)
    if phantom_difference > 0:
        positive = phantom_difference
        negative = -phantom_difference
    else:
        positive = -phantom_difference
        negative = phantom_difference
    pvalues[i] = (np.sum(differences >= positive) + np.sum(differences <= negative)) / total
power = np.sum(pvalues <= 0.01) / total
display("Power",power,"Factor",factor,"Approximate Sample Size", factor*len(plant_eph)+factor*len(plant_perm))