#1#a) There is an A effect present among longhand students, but no A effect at all for laptop users, whose A2 value is lower than their A1. The difference between longhand and laptop users stems from the type of question being asked, factual or conceptual: there is little to no difference between longhand and laptop users who are asked factual questions, but the differences appear when a conceptual question is asked. #b) There is an effect of question type as well; as mentioned earlier, longhand students have a higher z-score than laptop users in the conceptual column. A B effect is present between the longhand and laptop users, as the gap between the lines is large. #c) There is no interaction effect, as the lines do not intersect and remain parallel to each other. #d) Students who take notes in longhand show a greater understanding on conceptual questions than those who take notes on laptops. Despite this, there is no known difference in understanding between longhand and laptop users when asked to write down factual information.
#2# Correlation is significant because it can lead to several different conclusions about where the relationship lies. For instance, A could cause B, B could cause A, A and B could depend on each other, or an entirely different external factor (C) could cause both A and B, making them related to each other. Another reason correlation is significant is that it also measures the strength of the relationship between two datasets and whether they are positively or negatively correlated.
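As a minimal illustration of strength and direction (using made-up numbers, not data from this assignment), Pearson's coefficient lands near +1 for a rising linear trend and near -1 for a falling one:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical toy data: y_pos rises with x, y_neg falls with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = np.array([2.1, 3.9, 6.2, 8.0, 9.8])  # roughly y = 2x
y_neg = np.array([9.7, 8.1, 5.9, 4.2, 2.0])  # roughly y = 10 - 2x

r_pos, _ = pearsonr(x, y_pos)  # strong positive correlation, near +1
r_neg, _ = pearsonr(x, y_neg)  # strong negative correlation, near -1
print(r_pos, r_neg)
```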
#3# One possible explanation for the correlation is the inflation that could have occurred over the 25-year gap, which is why both the professor's salary and alcohol prices appear to rise incrementally with each other. This is often the case, as inflation causes wages and salaries to increase but at the cost of consumer prices increasing as well, making this judgement very plausible. #Another possible explanation is that the correlation arises entirely from chance; there could be an entirely different variable besides inflation that affects both the salary amount and alcohol prices. This may not seem plausible, as measured correlations of this strength rarely occur at random. #A final possible explanation is that the statisticians performed a faulty statistical test. The data is not visualized; we are only told that there is a correlation and a significant p-value. This leads us to suspect that the true relationship between salaries and alcohol prices could be non-linear, and that the statisticians used a linear correlation test, which could still produce a high correlation value between the two variables. This is plausible because we have no visual representation of the data; we can only know for sure once we graph it.
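The "faulty test" explanation can be sketched with synthetic numbers (assumed purely for illustration, not taken from the study): on a monotone but non-linear relationship, a rank-based test and a linear test disagree, and the linear one understates the true association.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic monotone but strongly non-linear relationship
x = np.arange(1, 21, dtype=float)
y = np.exp(0.4 * x)

r_linear = pearsonr(x, y)[0]  # linear (Pearson) correlation
r_rank = spearmanr(x, y)[0]   # rank-based (Spearman) correlation
# Spearman is exactly 1 for any strictly increasing relationship,
# while Pearson is pulled below 1 by the curvature.
print(r_linear, r_rank)
```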
#5b# It would be appropriate to use Spearman's correlation coefficient test as opposed to Pearson's because the jointplot displays a non-linear relationship. The positive incline between CO2 concentration and CO2 uptake rate demonstrates that there is a positive correlation between the two variables, but the relationship is non-linear.
#5d# Null (H0): There is no correlation between CO2 concentration and CO2 uptake rate. #The sample size of the study is 42, as there are 42 recorded results for both groups.
#5e
co2_copy = co2.copy()  # a copy is needed to preserve the original, since the NHST will destroy the pairing between the columns
co2_concentration = list(co2_copy["CO2 Concentration"])  # values under the CO2 Concentration column
co2_uptake_rate = list(co2_copy["CO2 Uptake Rate"])      # values under the CO2 Uptake Rate column
np.random.shuffle(co2_concentration)  # shuffles the values in co2_concentration
np.random.shuffle(co2_uptake_rate)    # shuffles the values in co2_uptake_rate
# Calculating p-value
sims = 10000
zeros = np.zeros(sims)
for i in range(sims):
    np.random.shuffle(co2_concentration)  # re-shuffle each iteration to break the pairing
    s_correlation_copy = spearmanr(co2_concentration, co2_uptake_rate)[0]  # Spearman correlation of the shuffled columns (index [0] drops the p-value)
    zeros[i] = s_correlation_copy  # store the null correlation value
p_value = np.sum(zeros >= spearman_coefficient) / sims
p_value_inverse = np.sum(zeros <= -spearman_coefficient) / sims
p = sns.distplot(zeros)
p.axvline(spearman_coefficient, color="green")
p.axvline(-spearman_coefficient, color="red")
p.set(title="Null Distribution", xlabel="Null Values", ylabel="Count")
print(p_value, p_value_inverse)
#5f
# Confidence Interval
zeros1 = np.zeros(10000)
for i in range(10000):
    co2_resample = co2.sample(len(co2), replace=True)  # the original is resampled (not the copy) so that the pairing is preserved
    s_CI = spearmanr(co2_resample["CO2 Concentration"], co2_resample["CO2 Uptake Rate"])[0]  # Spearman correlation of the resample
    zeros1[i] = s_CI  # store the bootstrap correlation value
# Looking for upper and lower bounds (99% CI)
zeros1.sort()
M_lower = zeros1[49]    # 0.5th percentile of the bootstrap distribution
M_upper = zeros1[9949]  # 99.5th percentile of the bootstrap distribution
lower_bound = 2 * spearman_coefficient - M_upper  # pivotal (basic) bootstrap bounds
upper_bound = 2 * spearman_coefficient - M_lower
q = sns.distplot(zeros1)
q.axvline(lower_bound, color="red")
q.axvline(upper_bound, color="blue")
q.axvline(spearman_coefficient, color="green")
q.set(title="Confidence Interval", xlabel="Correlation Values", ylabel="Count")
#5i# With the observed correlation value inside the 99% confidence interval, it is safe to conclude that there is a true correlation between CO2 concentration and CO2 uptake rate. With a p-value that is zero or close to zero, it is also safe to conclude that the correlation between these two variables is significant.
sns.jointplot("Ice Breakup Day of Year","Years Since 1900",data=nenana)
#6b# It would be appropriate to use Pearson's correlation coefficient test as opposed to Spearman's because, although the jointplot is not distinctly linear, it demonstrates a negative linear relationship.
pearson_coefficient, pearson_p_value = pearsonr(nenana["Ice Breakup Day of Year"], nenana["Years Since 1900"])  # pearsonr returns (coefficient, p-value)
print(pearson_coefficient, pearson_p_value)
#6d# Null (H0): There is no correlation between the day of the year on which the ice breaks up and the number of years that have passed since 1900. #The sample size is 103, as there are 103 rows for each of the two variables being compared.
#6e
nenana_copy = nenana.copy()  # a copy preserves the original while the shuffle destroys the pairing
nenana_days = list(nenana_copy["Ice Breakup Day of Year"])  # values under the Ice Breakup Day of Year column
nenana_years = list(nenana_copy["Years Since 1900"])        # values under the Years Since 1900 column
np.random.shuffle(nenana_days)   # shuffles the breakup days
np.random.shuffle(nenana_years)  # shuffles the years
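The permutation loop for 6 is not shown in this section; mirroring the 5e test, it could be sketched as below. This is only a sketch with synthetic stand-in data, since the `nenana` table is not reproduced here; `rng`, `observed`, and `null_vals` are illustrative names, and with the real data the observed correlation is negative, so the one-sided p-value is taken on the left tail.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
nenana_days = list(rng.normal(125, 6, size=103))  # stand-in for "Ice Breakup Day of Year"
nenana_years = list(range(103))                   # stand-in for "Years Since 1900"
observed = pearsonr(nenana_days, nenana_years)[0]  # observed correlation before shuffling

sims = 10000
null_vals = np.zeros(sims)
for i in range(sims):
    np.random.shuffle(nenana_days)  # destroy the pairing each iteration
    null_vals[i] = pearsonr(nenana_days, nenana_years)[0]
# One-sided p-value for a negative observed correlation
p_value = np.sum(null_vals <= observed) / sims
print(p_value)
```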
#6f
# Confidence Interval Testing
zeros3 = np.zeros(10000)
for i in range(10000):
    nenana_resample = nenana.sample(len(nenana), replace=True)  # resample the original to preserve the pairing
    p_CI = pearsonr(nenana_resample["Ice Breakup Day of Year"], nenana_resample["Years Since 1900"])[0]  # Pearson correlation of the resample
    zeros3[i] = p_CI
# Looking for upper and lower bounds (99% CI)
zeros3.sort()
M_lower = zeros3[49]    # 0.5th percentile of the bootstrap distribution
M_upper = zeros3[9949]  # 99.5th percentile of the bootstrap distribution
lower_bound = 2 * pearson_coefficient - M_upper  # pivotal (basic) bootstrap bounds
upper_bound = 2 * pearson_coefficient - M_lower
r = sns.distplot(zeros3)
r.axvline(lower_bound, color="red")
r.axvline(upper_bound, color="blue")
r.axvline(pearson_coefficient, color="green")
#6i #With the observed correlation being negative, as shown by the jointplot in 6a, it is appropriate to presume that the two variables are negatively correlated. With the 99% confidence interval conducted, the true correlation value will fall between the lower and upper bounds 99% of the time. Additionally, with a p-value that is zero or close to zero, it is also safe to conclude that the correlation value is significant. Finally, based on the results of this NHST/CI, the relationship is determined to be linear.
# Confidence Interval
zeros5 = np.zeros(10000)
for i in range(10000):
    acid_phosphatase_resample = acid_phosphatase.sample(len(acid_phosphatase), replace=True)  # resample with replacement
    s_CI1 = spearmanr(acid_phosphatase_resample["Temperature"], acid_phosphatase_resample["Initial Reaction Rate"])[0]  # Spearman correlation of the resample
    zeros5[i] = s_CI1
# Looking for upper and lower bounds (99% CI)
zeros5.sort()
M_lower = zeros5[49]    # 0.5th percentile of the bootstrap distribution
M_upper = zeros5[9949]  # 99.5th percentile of the bootstrap distribution
lower_bound = 2 * spearman_correlation1 - M_upper  # pivotal (basic) bootstrap bounds
upper_bound = 2 * spearman_correlation1 - M_lower
q = sns.distplot(zeros5)
q.axvline(lower_bound, color="red")
q.axvline(upper_bound, color="blue")
q.axvline(spearman_correlation1, color="green")
#7i# With both the NHST and the 99% confidence interval conducted, it is safe to presume that there is a correlation between temperature and the initial reaction rate. The confidence interval asserts that the correlation value between the two variables will fall between the lower and upper bounds 99% of the time. Additionally, the NHST shows that the correlation value is significant, as the p-value is less than the critical alpha value of 0.01.