# Homework 3

Name: Nayha Alvarez

I collaborated with: Ana Dillen

In [28]:
#1 Why might researchers want to use a paired study design rather than just independent samples?

#Researchers might want to use a paired study design rather than independent samples when the data contain repeated or paired measures. Examples include pre-test and post-test samples measured before and after an intervention or a change in some factor, cross-over trials in which individuals are randomized to one treatment and then crossed over to the alternative treatment, matched samples in which the members of each pair are matched on a factor, and duplicate measurements on the same biological samples. For independent samples, the null hypothesis is that the two groups were drawn at random from the same population, whereas for a paired design the null hypothesis is that there is no difference between the two measurements within a pair (for example, before and after). Keeping the pair structure intact in the analysis lets researchers filter out the variability across pairs, so each measurement is compared only against its own partner and a real effect is easier to detect.
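#A minimal sketch of why pairing filters out variability across pairs. The data are simulated and entirely hypothetical (20 subjects, an assumed effect of -3, and the noise levels are all made-up values chosen for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical subjects whose baselines differ a lot from one another.
baseline = rng.normal(loc=100, scale=15, size=20)

# Each subject is measured before and after; the assumed true effect is -3.
before = baseline + rng.normal(0, 2, size=20)
after = baseline - 3 + rng.normal(0, 2, size=20)

# Paired view: one difference per subject, so each baseline cancels out.
diffs = after - before
paired_spread = diffs.std(ddof=1)

# Independent view: the spread of a difference between two unrelated draws
# still contains the subject-to-subject variability from both groups.
independent_spread = np.sqrt(before.var(ddof=1) + after.var(ddof=1))

print("mean difference:", diffs.mean())
print("spread of paired differences:", paired_spread)
print("spread if treated as independent:", independent_spread)
```

#The spread of the paired differences is much smaller because the subject-to-subject variability is removed, which is what makes a small effect visible.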

In [29]:
#2
from IPython.display import Image # load the required Python library
Image(filename='LS40 Homework-3.jpg') # display the image file containing the answer

In [30]:
#3 We always use Two-Box sampling without re-centering for computing confidence intervals of effect sizes, such as the difference between medians. Why can’t we use Big Box sampling for confidence intervals as we sometimes do for NHST?

#We cannot use Big Box sampling for confidence intervals because Big Box sampling pools both groups into a single box, which builds in the assumption that the null hypothesis is true (no difference between the groups). A confidence interval does not assume the null hypothesis. It estimates the range of plausible values for the true effect size, so each group has to be resampled from its own box to preserve the difference observed in the data.
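#A minimal sketch of Two-Box sampling for a 95% confidence interval of the difference between medians, using the cold-duration data from question 6. The number of simulations, the seed, and the simple percentile method for the interval are assumptions, not necessarily the method required in class.

```python
import numpy as np

rng = np.random.default_rng(0)

remedy = np.array([3.5, 2.3, 4.7, 1.5, 3.7])
placebo = np.array([5.3, 3.6, 4.3, 5.7, 6.7])

# Observed effect size: difference between the group medians.
observed = np.median(placebo) - np.median(remedy)

n_sims = 10000  # assumed number of resamples
sims = np.empty(n_sims)
for i in range(n_sims):
    # Two boxes: each group is resampled with replacement from its own data,
    # with no pooling and no re-centering.
    r = rng.choice(remedy, size=len(remedy), replace=True)
    p = rng.choice(placebo, size=len(placebo), replace=True)
    sims[i] = np.median(p) - np.median(r)

# Percentile interval: the middle 95% of the resampled differences.
lower, upper = np.percentile(sims, [2.5, 97.5])

print("observed difference in medians:", observed)
print("95% CI:", lower, "to", upper)
```

#Pooling the two samples into one Big Box would center the simulated differences near zero, so the resulting interval would describe the null hypothesis rather than the observed effect.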

In [31]:
#4 Which is more conservative: the Benjamini-Hochberg correction or the Bonferroni correction? Describe the advantages and disadvantages of using each.

#The Bonferroni correction is more conservative than the Benjamini-Hochberg correction. Bonferroni divides ⍺ by the number of tests, which controls the chance of getting even one false positive across all of the tests. It produces fewer false positives but, consequently, it also misses more true positives, because it raises the threshold of proof needed to claim a significant result and makes it harder to reject the null hypothesis. The Benjamini-Hochberg correction is more lenient: it controls the expected proportion of false positives among the results called significant, so it lets more false positives through than the Bonferroni correction. Its p-value threshold changes with the rank of each p-value, so it has more power to detect true effects.
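#A minimal sketch comparing the two corrections on the same set of p-values. The p-values are hypothetical, and the function names `bonferroni` and `benjamini_hochberg` are made up for this illustration.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / (number of tests)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= (k / m) * alpha,
    then reject the k smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest p up
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


pvals = [0.001, 0.008, 0.012, 0.041, 0.20]  # hypothetical p-values from 5 tests

bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print("Bonferroni rejects:", sum(bonf))
print("Benjamini-Hochberg rejects:", sum(bh))
```

#With these values Bonferroni rejects 2 of the 5 null hypotheses and Benjamini-Hochberg rejects 3, which shows the extra power (and the extra risk of a false positive) of the more lenient correction.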

In [32]:
#5 You conduct an ANOVA of the number of mosquito bites volunteers receive with three different types of insect repellents. There are 30 volunteers in each of the three groups (total of 90). You calculate a p-value that is greater than the critical ⍺ you pre-selected for this study. How do you interpret this? What further steps should you take to further investigate this issue?

#Since the p-value is greater than the critical ⍺ I pre-selected, I fail to reject the null hypothesis: the data do not provide enough evidence that the three repellents differ in the number of bites. This does not prove that the repellents are equally effective. To investigate further, I would repeat the experiment, possibly with more volunteers, and test any revised hypothesis on new data rather than on the same data. I would also plot the data distributions to see whether the formulaic ANOVA is even appropriate for the data.

In [33]:
#6a. Visualize data by group
import pandas as pd
import numpy as np
import seaborn as sns
coldremedy=[3.5, 2.3, 4.7, 1.5, 3.7]
coldplacebo=[5.3, 3.6, 4.3, 5.7, 6.7]
p=sns.swarmplot(data=coldremedy)
p.set(xlabel="Cold Durations When Taking the Remedy",ylabel="Cold Duration") # the y-axis shows the duration values, not counts

[Text(0.5, 0, 'Cold Durations When Taking the Remedy'), Text(0, 0.5, 'Cold Duration')]
In [34]:
p=sns.swarmplot(data=coldplacebo)
p.set(xlabel="Cold Durations When Taking the Placebo",ylabel="Cold Duration") # the y-axis shows the duration values, not counts

[Text(0.5, 0, 'Cold Durations When Taking the Placebo'), Text(0, 0.5, 'Cold Duration')]