Kernel: Python 2

ATMS 391: Geophysical Data Analysis

Homework 9


Problem 1. Using the August Chicago data, test the hypothesis that the means of hourly temperature for the first half of the month are equal to the second half of the month. Use 00Z on Aug 15 as the start of the second half of the month. Compute p values, assuming the Gaussian distribution is an adequate approximation to the null distribution of the test statistic.

import pandas as pd
import scipy.stats as st

df = pd.read_csv('chicago_hourly_aug_2015.csv', header=6)
firstHalf = df['DryBulbCelsius'][df['Date'] < 20150815]
secondHalf = df['DryBulbCelsius'][df['Date'] >= 20150815]

# use two-sample t-test: http://stattrek.com/hypothesis-test/difference-in-means.aspx
stat, p = st.ttest_ind(firstHalf, secondHalf)
print("p-value is %e" % p)
p-value is 6.833892e-25

The p-value (about 6.8e-25) is far below any conventional significance level, so we can reject the null hypothesis of equal means.
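The problem statement asks for a Gaussian null distribution, while the solution uses a two-sample t-test; for samples of several hundred hourly values the two should give nearly the same answer. As a minimal sketch of the Gaussian version, the helper below (`two_sample_z_test` is a hypothetical name, not part of SciPy) computes a two-sided z-test. It runs on synthetic stand-in data, not the Chicago file, and assumes the samples are independent.

```python
import numpy as np
from scipy import stats


def two_sample_z_test(a, b):
    """Two-sided z-test for equal means, using a Gaussian null distribution."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Drop missing values so they do not propagate into the statistic.
    a = a[~np.isnan(a)]
    b = b[~np.isnan(b)]
    # Standard error of the difference in means (unequal variances allowed).
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    z = (a.mean() - b.mean()) / se
    # Two-sided p-value from the standard normal survival function.
    p = 2.0 * stats.norm.sf(abs(z))
    return z, p


# Synthetic stand-ins for the two halves of the month (assumed values).
rng = np.random.RandomState(0)
first = rng.normal(24.0, 4.0, size=350)
second = rng.normal(21.0, 4.0, size=400)

z, p = two_sample_z_test(first, second)
print("z = %.3f, p-value = %e" % (z, p))
```

With the homework data, `firstHalf` and `secondHalf` could be passed in place of `first` and `second`.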

Problem 2. (a) Using the same dataset, calculate the correlation coefficient between hourly temperature and dewpoint for August 2015.

df1 = pd.DataFrame({'Temperature': df['DryBulbCelsius'],
                    'Dewpoint': df['DewPointCelsius']}).dropna()
df1.corr()
             Dewpoint  Temperature
Dewpoint     1.000000     0.401586
Temperature  0.401586     1.000000

(b) Is this correlation statistically significant at the 99% level?

# split each variable at its median and build a 2x2 contingency table
df1['data1'] = df1['Temperature'] > df1['Temperature'].median()
df1['data2'] = df1['Dewpoint'] > df1['Dewpoint'].median()
out1 = pd.crosstab(df1['data1'], df1['data2'])
print(out1)

# chi-square test of independence on the contingency table
p_val = st.chi2_contingency(out1)[1]
print("p-value is %f" % p_val)
data2  False  True
data1
False    281   185
True     183   220
p-value is 0.000016

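Based on a chi-square test of the median-split contingency table, the p-value is less than 0.01, so the association between temperature and dewpoint is statistically significant at the 99% level.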
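An alternative to the median-split chi-square test is to test the correlation coefficient itself. The sketch below uses `scipy.stats.pearsonr`, which returns both r and a two-sided p-value, and cross-checks it against the t-statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. It is not the method used in this solution, it runs on synthetic stand-in data, and it assumes independent, roughly Gaussian samples.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for temperature and dewpoint (assumed values,
# not the Chicago data).
rng = np.random.RandomState(1)
temperature = rng.normal(23.0, 4.0, size=800)
dewpoint = 0.4 * temperature + rng.normal(8.0, 3.0, size=800)

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p = stats.pearsonr(temperature, dewpoint)

# The same test by hand: t-statistic for r with n - 2 degrees of freedom.
n = temperature.size
t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
p_manual = 2.0 * stats.t.sf(abs(t), df=n - 2)

print("r = %.4f" % r)
print("p-value (pearsonr) = %e" % p)
print("p-value (manual)   = %e" % p_manual)
print("significant at the 99%% level: %s" % (p < 0.01))
```

With the homework data, `df1['Temperature']` and `df1['Dewpoint']` could be passed to `pearsonr` directly, since `df1` has already had missing values dropped.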

(c) Repeat (a) and (b) for daytime temperatures only (6 AM-6PM local time). Does your conclusion change?

# keep observations between 06:00 and 18:00 (Time is stored as HHMM)
data_set = df[(df['Time'] >= 600) & (df['Time'] <= 1800)]
#print(data_set)

# Part a
df_new = pd.DataFrame({'Temperature': data_set['DryBulbCelsius'],
                       'Dewpoint': data_set['DewPointCelsius']}).dropna()
print(df_new.corr())

# Part b
df_new['data1'] = df_new['Temperature'] > df_new['Temperature'].median()
df_new['data2'] = df_new['Dewpoint'] > df_new['Dewpoint'].median()
out2 = pd.crosstab(df_new['data1'], df_new['data2'])
p_val = st.chi2_contingency(out2)[1]
print("p-value is %f" % p_val)
             Dewpoint  Temperature
Dewpoint     1.000000     0.359294
Temperature  0.359294     1.000000
p-value is 0.142380

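The p-value is larger than 0.01, so for daytime hours the association between temperature and dewpoint is not statistically significant at the 99% level. The conclusion therefore changes: the correlation is significant for the full dataset but not for the daytime subset.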