Project: Jingyi Xie - Autumn2016/BMS353

Path: Autumn2016 / Week2-peer-grade / 29a8820a-894c-4f01-ac91-09874be50e0f / MDA14ST_Week2.ipynb

Kernel: R

Data Frames

-------Exercise 1-------

In [44]:

names<- c('Bob','Claire','Luisa','Matt','Marta','Mike')
score<- c(34,82,59,72,50,100)

game_cards<- data.frame(names,score,stringsAsFactors=FALSE)
game_cards

names   score
Bob        34
Claire     82
Luisa      59
Matt       72
Marta      50
Mike      100

In [45]:

score2<- c(45,12,74,33,40,79)

game_cards2<- data.frame(game_cards$names,score2,stringsAsFactors=FALSE)
game_cards2

game_cards.names score2
Bob                  45
Claire               12
Luisa                74
Matt                 33
Marta                40
Mike                 79

The question asks for another field to be created for the second match scores, so the scores for match 1 and 2 should be in the same table.
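
A minimal sketch of the combined table this comment describes, assuming the second match scores are simply added as a new column of game_cards (the vectors are re-created here so the cell runs on its own):

```r
# Re-create the Exercise 1 vectors so the sketch is self-contained.
names  <- c("Bob", "Claire", "Luisa", "Matt", "Marta", "Mike")
score  <- c(34, 82, 59, 72, 50, 100)
score2 <- c(45, 12, 74, 33, 40, 79)

game_cards <- data.frame(names, score, stringsAsFactors = FALSE)

# Add the second match as a new field (column) of the existing data frame,
# so both matches live in the same table.
game_cards$score2 <- score2
game_cards
```

An equivalent route would be cbind(game_cards, score2); either way the result has one row per player and one column per match.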

Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.15)
Interpretation of the results (score = 0.25)
Total Score = 0.9

INSTRUCTOR FEEDBACK: 0.75, since the answer did not cover the question fully.

--------Exercise 2---------

In [1]:

colnames(game_cards)<-c("match1","score1")
colnames(game_cards2)<-c("match2","score2")

Error in colnames(game_cards) <- c("match1", "score1"): object 'game_cards' not found
Traceback:

Dimensions of the data frame (game_cards)

In [47]:

dim(game_cards)

[1] 6 2

Minimum score of match 1

In [48]:

min(game_cards$score1)

[1] 34

Minimum score of match 2

In [49]:

min(game_cards2$score2)

[1] 12

Maximum score of match 1

In [50]:

max(game_cards$score1)

[1] 100

Maximum score of match 2

In [51]:

max(game_cards2$score2)

[1] 79

Overall minimum score

In [52]:

min(game_cards$score1,game_cards2$score2)

[1] 12

Overall maximum score

In [53]:

max(game_cards$score1,game_cards2$score2)

[1] 100

Order the scores for each match

In [54]:

game_cards[order(game_cards$score1),]

match1 score1
Bob        34
Marta      50
Luisa      59
Matt       72
Claire     82
Mike      100

In [55]:

game_cards2[order(game_cards2$score2),]

match2 score2
Claire     12
Matt       33
Marta      40
Bob        45
Luisa      74
Mike       79

Why do we need to use the function order() in this way? What is its output?

In order to display the table game_cards/game_cards2 and to get its values from the 'score' column of each table.

Renaming the columns raised an error. Also, the scores for match 1 and match 2 should be in one combined table, including when ordering.

Overall clarity (score = 0.25)
Correctness of the code (score = 0.15)
Exhaustive cover of required analysis (score= 0.20)
Interpretation of the results (score = 0.25)
Total Score = 0.85

INSTRUCTOR FEEDBACK: The code is correct. The error occurred because the object game_cards needed to be re-executed to be active in the workspace. The interpretation of the question, though, is wrong. The order() function returns indices, and this is why we need to use it this way. Score 0.75
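
To illustrate the point about indices: order() returns the positions that would sort a vector, not the sorted values themselves, and those positions are then used as row indices. A small sketch, where the data frame name cards and the column name player are only illustrative:

```r
# Illustrative data frame (hypothetical names, same scores as match 1).
cards <- data.frame(
  player = c("Bob", "Claire", "Luisa", "Matt", "Marta", "Mike"),
  score1 = c(34, 82, 59, 72, 50, 100),
  stringsAsFactors = FALSE
)

# order() gives the row positions from lowest to highest score.
idx <- order(cards$score1)
idx                       # 1 5 3 4 2 6

# Using those positions as row indices reorders the whole table,
# keeping each player next to the right score.
cards[idx, ]

# The same idea with the highest score first.
cards[order(cards$score1, decreasing = TRUE), ]
```

By contrast, sort(cards$score1) would return only the sorted scores and lose the link to the players.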

Visualise the data

------Exercise 3-------

Describe the plot() characteristics and how we can change/add features.

Various characteristics of the plot can be edited, using commands within the plot() command, separated by commas.

The answer is very brief and general. For correct answers, see the solutions.

Overall clarity (score = 0.10)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.10)
Interpretation of the results (score = 0.25)
Total Score = 0.7

INSTRUCTOR FEEDBACK: There is a lack of clarity, and you could have given more examples of how to use it. Score 0.5
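
As one possible example of the kind the feedback asks for, the sketch below sets a few common plot() arguments (type, col, pch, ylim, main, xlab, ylab) and then adds features to the existing plot with lines() and legend(). The data are the two score vectors from Exercise 1; the colours and symbols are arbitrary choices.

```r
score1 <- c(34, 82, 59, 72, 50, 100)
score2 <- c(45, 12, 74, 33, 40, 79)

# Arguments inside plot() change how the first series is drawn.
plot(score1,
     type = "b",            # points joined by lines
     col  = "blue",         # colour of points and lines
     pch  = 19,             # filled circle as plotting symbol
     ylim = c(0, 100),      # fix the y range so both series fit
     main = "Scores per player",
     xlab = "Player index",
     ylab = "Score")

# Further calls add features to the plot that is already open.
lines(score2, type = "b", col = "red", pch = 17, lty = 2)
legend("topleft",
       legend = c("Match 1", "Match 2"),
       col = c("blue", "red"), pch = c(19, 17), lty = c(1, 2))
```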

------Exercise 4-------

In [56]:

par(mfrow=c(1,2))
barplot(game_cards$score1, names.arg = game_cards$match1, cex.names=0.6)

barplot(game_cards2$score2, names.arg = game_cards2$match2, cex.names=0.6)

Write a brief summary of settings that you think might be useful when presenting data.

It can be useful when you want to compare two sets of data/two different graphs.

Question not answered, see solutions.

Overall clarity (score = 0.15)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.1)
Interpretation of the results (score = 0.25)
Total score = 0.75

INSTRUCTOR FEEDBACK: I agree with your peer feedback. You could have combined them and, more importantly, used some of the graphical parameters.
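
A hedged sketch of what combining the two bar plots could look like, assuming a grouped bar chart is acceptable for the exercise. The matrix name scores and the colour choices are illustrative; the graphical parameters used (beside, names.arg, col, las, cex.names, ylim, legend.text, args.legend) are all standard barplot() arguments.

```r
players <- c("Bob", "Claire", "Luisa", "Matt", "Marta", "Mike")
score1  <- c(34, 82, 59, 72, 50, 100)
score2  <- c(45, 12, 74, 33, 40, 79)

# One row per match, one column per player.
scores <- rbind(match1 = score1, match2 = score2)

barplot(scores,
        beside      = TRUE,                 # bars side by side, not stacked
        names.arg   = players,              # player names under each group
        col         = c("grey30", "grey80"),
        las         = 2,                    # rotate axis labels
        cex.names   = 0.8,                  # shrink the name labels slightly
        ylim        = c(0, 110),
        ylab        = "Score",
        main        = "Scores by player and match",
        legend.text = c("Match 1", "Match 2"),
        args.legend = list(x = "topleft", bty = "n"))
```

Putting both matches on one set of axes makes the per-player comparison direct, which two separate panels with different y ranges do not.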

------Exercise 5-------

What is the scatter plot useful for? What do you get and why?

Scatter plots show how much one variable is affected by another. They show correlation.

In [57]:

score<- c(34,82,59,72,50,100)
plot(score,score)
abline(0,1)

The data can also be plotted on a bar graph. See solutions.

Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total score = 1

INSTRUCTOR FEEDBACK: You answered the question, but you could have gone further by adding some examples showing cases where the two variables are not identical. Score 1
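
One way to follow this suggestion, sketched with the two match score vectors from Exercise 1 as the non-identical variables: when the variables differ, the points no longer sit on the y = x line, and cor() gives a number for how strongly they move together.

```r
score1 <- c(34, 82, 59, 72, 50, 100)
score2 <- c(45, 12, 74, 33, 40, 79)

plot(score1, score2,
     pch  = 19,
     xlab = "Match 1 score",
     ylab = "Match 2 score",
     main = "Match 1 vs match 2")

# Dashed y = x reference line: points on it scored the same in both matches.
abline(0, 1, lty = 2)

# Pearson correlation between the two matches.
r <- cor(score1, score2)
r
```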

------Exercise 6-------

In [58]:

iris

    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
 5.1          3.5         1.4          0.2         setosa   
 4.9          3.0         1.4          0.2         setosa   
 4.7          3.2         1.3          0.2         setosa   
 4.6          3.1         1.5          0.2         setosa   
 5.0          3.6         1.4          0.2         setosa   
 5.4          3.9         1.7          0.4         setosa   
 4.6          3.4         1.4          0.3         setosa   
 5.0          3.4         1.5          0.2         setosa   
 4.4          2.9         1.4          0.2         setosa   
4.9          3.1         1.5          0.1         setosa   
5.4          3.7         1.5          0.2         setosa   
4.8          3.4         1.6          0.2         setosa   
4.8          3.0         1.4          0.1         setosa   
4.3          3.0         1.1          0.1         setosa   
5.8          4.0         1.2          0.2         setosa   
5.7          4.4         1.5          0.4         setosa   
5.4          3.9         1.3          0.4         setosa   
5.1          3.5         1.4          0.3         setosa   
5.7          3.8         1.7          0.3         setosa   
5.1          3.8         1.5          0.3         setosa   
5.4          3.4         1.7          0.2         setosa   
5.1          3.7         1.5          0.4         setosa   
4.6          3.6         1.0          0.2         setosa   
5.1          3.3         1.7          0.5         setosa   
4.8          3.4         1.9          0.2         setosa   
5.0          3.0         1.6          0.2         setosa   
5.0          3.4         1.6          0.4         setosa   
5.2          3.5         1.5          0.2         setosa   
5.2          3.4         1.4          0.2         setosa   
4.7          3.2         1.6          0.2         setosa   
⋮   ⋮            ⋮           ⋮            ⋮           ⋮        
6.9          3.2         5.7          2.3         virginica
5.6          2.8         4.9          2.0         virginica
7.7          2.8         6.7          2.0         virginica
6.3          2.7         4.9          1.8         virginica
6.7          3.3         5.7          2.1         virginica
7.2          3.2         6.0          1.8         virginica
6.2          2.8         4.8          1.8         virginica
6.1          3.0         4.9          1.8         virginica
6.4          2.8         5.6          2.1         virginica
7.2          3.0         5.8          1.6         virginica
7.4          2.8         6.1          1.9         virginica
7.9          3.8         6.4          2.0         virginica
6.4          2.8         5.6          2.2         virginica
6.3          2.8         5.1          1.5         virginica
6.1          2.6         5.6          1.4         virginica
7.7          3.0         6.1          2.3         virginica
6.3          3.4         5.6          2.4         virginica
6.4          3.1         5.5          1.8         virginica
6.0          3.0         4.8          1.8         virginica
6.9          3.1         5.4          2.1         virginica
6.7          3.1         5.6          2.4         virginica
6.9          3.1         5.1          2.3         virginica
5.8          2.7         5.1          1.9         virginica
6.8          3.2         5.9          2.3         virginica
6.7          3.3         5.7          2.5         virginica
6.7          3.0         5.2          2.3         virginica
6.3          2.5         5.0          1.9         virginica
6.5          3.0         5.2          2.0         virginica
6.2          3.4         5.4          2.3         virginica
5.9          3.0         5.1          1.8         virginica

In [59]:

par(mfrow=c(1,2))
plot(iris$Sepal.Length,iris$Petal.Length,col="red",main="Sepal Length vs Petal Length",col.main="red",xlab="Sepal Length", ylab="Petal Length")
abline(lm(iris$Petal.Length~iris$Sepal.Length), col="red")

plot(iris$Sepal.Width,iris$Petal.Width,col="blue",main="Sepal Width vs Petal Width",col.main="Blue",xlab="Sepal Width", ylab="Petal Width")
abline(lm(iris$Petal.Width~iris$Sepal.Width), col="blue")

Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1

INSTRUCTOR FEEDBACK: The code is correct but you need to add more interpretation of the data and explain the methods. Clarity is needed and interpretation is missing. Score 0.5

------Exercise 7-------

Linear regression: Explain your findings.

As sepal length increases, petal length increases as well. However, as sepal width increases, petal width decreases.

Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total score = 1

INSTRUCTOR FEEDBACK: The interpretation is correct, but you need to add a lot more detail and explain more clearly how you arrived at those conclusions. Score 0.5
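
As a sketch of how those conclusions could be backed up with numbers, the fitted models can be stored and inspected: the sign of the slope gives the direction of the trend, and R-squared indicates how much of the variation the line explains. The object names fit_length and fit_width are illustrative.

```r
# Same two regressions as in Exercise 6, stored so they can be inspected.
fit_length <- lm(Petal.Length ~ Sepal.Length, data = iris)
fit_width  <- lm(Petal.Width  ~ Sepal.Width,  data = iris)

# Intercept and slope: a positive slope means the variables rise together,
# a negative slope means one falls as the other rises.
coef(fit_length)
coef(fit_width)

# Proportion of variance explained by each straight-line fit.
summary(fit_length)$r.squared
summary(fit_width)$r.squared
```

Because iris pools three species, it may also be worth colouring the points by iris$Species to check whether the overall trend holds within each group.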

------Exercise 8-------

In [60]:

morley

    Expt Run Speed
1     1   850 
1     2   740 
1     3   900 
1     4  1070 
1     5   930 
1     6   850 
1     7   950 
1     8   980 
1     9   980 
1    10   880 
1    11  1000 
1    12   980 
1    13   930 
1    14   650 
1    15   760 
1    16   810 
1    17  1000 
1    18  1000 
1    19   960 
1    20   960 
2     1   960 
2     2   940 
2     3   960 
2     4   940 
2     5   880 
2     6   800 
2     7   850 
2     8   880 
2     9   900 
2    10   840 
⋮   ⋮    ⋮   ⋮    
4    11  910  
4    12  920  
4    13  890  
4    14  860  
4    15  880  
4    16  720  
4    17  840  
4    18  850  
4    19  850  
4    20  780  
5     1  890  
5     2  840  
5     3  780  
5     4  810  
5     5  760  
5     6  810  
5     7  790  
5     8  810  
5     9  820  
5    10  850  
5    11  870  
5    12  870  
5    13  810  
5    14  740  
5    15  810  
5    16  940  
5    17  950  
5    18  800  
5    19  810  
5    20  870  

In [61]:

morley_table <- table(morley$Expt)
lbls <- paste(names(morley_table), "\n", morley_table, sep="")
pie(morley_table, labels = lbls, 
  	main="Pie Chart of Experiments\n")

Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score =1

INSTRUCTOR FEEDBACK: It is all correct, but it is important that you explain your methods and add more to the interpretation. Score 0.75
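
As a brief illustration of the method behind the chart: table() counts how many rows fall in each experiment, and pie() draws one slice per count. In the built-in morley data every experiment has 20 runs, so the slices come out equal. The object name runs_per_expt is illustrative.

```r
# Number of runs recorded for each experiment.
runs_per_expt <- table(morley$Expt)
runs_per_expt

# The same counts as proportions, which is what the slice sizes represent.
prop.table(runs_per_expt)
```

Note that such a chart describes only how many runs there were, not the measured speeds.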

------Exercise 9-------

In [62]:

boxplot(morley$Speed ~ morley$Expt,
  col='light grey', xlab='Experiment #',
  ylab="speed (km/s - 299,000)",
  main="Michelson–Morley experiment")
mtext("speed of light data")

sol=299792.458-299000 # deviation of the real speed of light from the estimated 299,000 km/s
abline(h=sol, col='red')

In [63]:

Quantiles1<-quantile(morley$Speed[morley$Expt==1])
IQR1<-IQR(morley$Speed[morley$Expt==1])
Mean1<-mean(morley$Speed[morley$Expt==1])
Sd1<-sd(morley$Speed[morley$Expt==1])
"Quantiles of Experiment 1:"
Quantiles1
"Interquartile Range of Experiment 1 data"
IQR1
"Mean of Experiment 1 data"
Mean1
"Standard Deviation of Experiment 1 data"
Sd1

[1] "Quantiles of Experiment 1:"

  0%  25%  50%  75% 100% 
 650  850  940  980 1070 

[1] "Interquartile Range of Experiment 1 data"

[1] 130

[1] "Mean of Experiment 1 data"

[1] 909

[1] "Standard Deviation of Experiment 1 data"

[1] 104.926

In [64]:

Quantiles2<-quantile(morley$Speed[morley$Expt==2])
IQR2<-IQR(morley$Speed[morley$Expt==2])
Mean2<-mean(morley$Speed[morley$Expt==2])
Sd2<-sd(morley$Speed[morley$Expt==2])
"Quantiles of Experiment 2:"
Quantiles2
"Interquartile Range of Experiment 2 data"
IQR2
"Mean of Experiment 2 data"
Mean2
"Standard Deviation of Experiment 2 data"
Sd2

[1] "Quantiles of Experiment 2:"

  0%  25%  50%  75% 100% 
 760  800  845  885  960 

[1] "Interquartile Range of Experiment 2 data"

[1] 85

[1] "Mean of Experiment 2 data"

[1] 856

[1] "Standard Deviation of Experiment 2 data"

[1] 61.16414

In [65]:

Quantiles3<-quantile(morley$Speed[morley$Expt==3])
IQR3<-IQR(morley$Speed[morley$Expt==3])
Mean3<-mean(morley$Speed[morley$Expt==3])
Sd3<-sd(morley$Speed[morley$Expt==3])
"Quantiles of Experiment 3:"
Quantiles3
"Interquartile Range of Experiment 3 data"
IQR3
"Mean of Experiment 3 data"
Mean3
"Standard Deviation of Experiment 3 data"
Sd3

[1] "Quantiles of Experiment 3:"

  0%  25%  50%  75% 100% 
 620  840  855  880  970 

[1] "Interquartile Range of Experiment 3 data"

[1] 40

[1] "Mean of Experiment 3 data"

[1] 845

[1] "Standard Deviation of Experiment 3 data"

[1] 79.10686

In [66]:

Quantiles4<-quantile(morley$Speed[morley$Expt==4])
IQR4<-IQR(morley$Speed[morley$Expt==4])
Mean4<-mean(morley$Speed[morley$Expt==4])
Sd4<-sd(morley$Speed[morley$Expt==4])
"Quantiles of Experiment 4:"
Quantiles4
"Interquartile Range of Experiment 4 data"
IQR4
"Mean of Experiment 4 data"
Mean4
"Standard Deviation of Experiment 4 data"
Sd4

[1] "Quantiles of Experiment 4:"

   0%   25%   50%   75%  100% 
720.0 767.5 815.0 865.0 920.0 

[1] "Interquartile Range of Experiment 4 data"

[1] 97.5

[1] "Mean of Experiment 4 data"

[1] 820.5

[1] "Standard Deviation of Experiment 4 data"

[1] 60.04165

In [67]:

Quantiles5<-quantile(morley$Speed[morley$Expt==5])
IQR5<-IQR(morley$Speed[morley$Expt==5])
Mean5<-mean(morley$Speed[morley$Expt==5])
Sd5<-sd(morley$Speed[morley$Expt==5])
"Quantiles of Experiment 5:"
Quantiles5
"Interquartile Range of Experiment 5 data"
IQR5
"Mean of Experiment 5 data"
Mean5
"Standard Deviation of Experiment 5 data"
Sd5

[1] "Quantiles of Experiment 5:"

   0%   25%   50%   75%  100% 
740.0 807.5 810.0 870.0 950.0 

[1] "Interquartile Range of Experiment 5 data"

[1] 62.5

[1] "Mean of Experiment 5 data"

[1] 831.5

[1] "Standard Deviation of Experiment 5 data"

[1] 54.21934

Discuss findings. What can you conclude?

The medians of experiments 2 to 5 are similar, whereas the median speed of the first experiment is considerably higher. Experiments 1 and 3 have a few outliers. Generally, experiment 3 seems to be the most reliable whereas experiment 1 seems to be the least reliable. This is because experiment 3 has quartiles very close to the median, whereas experiment 1 has quartiles far away from the median, meaning that its values are more scattered.

Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1

INSTRUCTOR FEEDBACK: This is a better executed exercise and also clearer than the previous ones. Score 1
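
The five near-identical cells above could also be written once. A minimal sketch, assuming a single summary table is acceptable: split() groups the speeds by experiment and sapply() applies the same statistics to each group. The object name stats is illustrative.

```r
# One column per experiment, one row per statistic.
stats <- sapply(split(morley$Speed, morley$Expt), function(x) {
  c(median = median(x),
    IQR    = IQR(x),
    mean   = mean(x),
    sd     = sd(x))
})

round(stats, 2)
```

Having the values side by side makes the comparison in the discussion easier to check, for example the mean of 909 for experiment 1 against the IQR of 40 for experiment 3.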

------Exercise 10------

In [68]:

hist(morley$Speed[morley$Expt==1], prob=F,
     col=rgb(0.9,0.9,0.9),
     main='Michelson-Morley Experiment 1',
     ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==1]))
abline(v=mean(morley$Speed[morley$Expt==1]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==1]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==1])+sd(morley$Speed[morley$Expt==1]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==1])-sd(morley$Speed[morley$Expt==1]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==1])

In [69]:

hist(morley$Speed[morley$Expt==2], prob=F,
     col=rgb(0.9,0.9,0.9),
     main='Michelson-Morley Experiment 2',
     ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==2]))
abline(v=mean(morley$Speed[morley$Expt==2]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==2]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==2])+sd(morley$Speed[morley$Expt==2]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==2])-sd(morley$Speed[morley$Expt==2]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==2])

In [70]:

hist(morley$Speed[morley$Expt==3], prob=F,
     col=rgb(0.9,0.9,0.9),
     main='Michelson-Morley Experiment 3',
     ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==3]))
abline(v=mean(morley$Speed[morley$Expt==3]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==3]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==3])+sd(morley$Speed[morley$Expt==3]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==3])-sd(morley$Speed[morley$Expt==3]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==3])

In [71]:

hist(morley$Speed[morley$Expt==4], prob=F,
     col=rgb(0.9,0.9,0.9),
     main='Michelson-Morley Experiment 4',
     ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==4]))
abline(v=mean(morley$Speed[morley$Expt==4]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==4]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==4])+sd(morley$Speed[morley$Expt==4]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==4])-sd(morley$Speed[morley$Expt==4]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==4])

In [72]:

hist(morley$Speed[morley$Expt==5], prob=F,
     col=rgb(0.9,0.9,0.9),
     main='Michelson-Morley Experiment 5',
     ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==5]))
abline(v=mean(morley$Speed[morley$Expt==5]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==5]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==5])+sd(morley$Speed[morley$Expt==5]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==5])-sd(morley$Speed[morley$Expt==5]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==5])

What do you conclude? Discuss.

Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total score = 1

INSTRUCTOR FEEDBACK: The code is correct and so is the use of the data. The discussion is missing. You can use par(mfrow=..) to plot figures in the same print area. Score 0.75
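
A sketch of the par(mfrow=..) suggestion, assuming a 2 x 3 grid and a loop over the experiments. The sketch uses freq = FALSE so that the density() curve and the bars share the same y scale, and a common xlim so the panels can be compared directly; both are choices made for this illustration.

```r
# Remember the old layout so it can be restored afterwards.
old_par <- par(mfrow = c(2, 3))

for (e in sort(unique(morley$Expt))) {
  speed <- morley$Speed[morley$Expt == e]

  hist(speed,
       freq = FALSE,                         # density scale, matches density()
       col  = "grey90",
       xlim = range(morley$Speed),           # same x range in every panel
       main = paste("Experiment", e),
       xlab = "Speed (km/s - 299,000)")

  lines(density(speed))                      # smoothed distribution
  abline(v = mean(speed),   col = "grey50")  # mean: solid line
  abline(v = median(speed), col = "grey50", lty = 3)  # median: dotted line
  rug(speed)                                 # individual measurements
}

# Restore the single-panel layout.
par(old_par)
```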

------Exercise 11------

In [73]:

?rnorm()

In [74]:

?runif()

In [75]:

?rbinom()

Overall clarity (score = 0)
Correctness of the code (score = 0)
Exhaustive cover of required analysis (score= 0.0)
Interpretation of the results (score = 0)
Total score = 0
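
The help pages above were opened but the functions were not used. A minimal sketch of what each one generates, with arbitrary parameter values and a fixed seed chosen for this illustration:

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

# 1000 draws from a normal distribution with mean 0 and sd 1.
x_norm  <- rnorm(1000, mean = 0, sd = 1)

# 1000 draws from a uniform distribution on [0, 1].
x_unif  <- runif(1000, min = 0, max = 1)

# 1000 draws, each the number of successes in 10 trials with p = 0.5.
x_binom <- rbinom(1000, size = 10, prob = 0.5)

# Side-by-side histograms show the three different shapes.
old_par <- par(mfrow = c(1, 3))
hist(x_norm,  main = "rnorm",  xlab = "value")
hist(x_unif,  main = "runif",  xlab = "value")
hist(x_binom, main = "rbinom", xlab = "successes")
par(old_par)
```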

------Exercise 12------

In [76]:

?t.test()

In [77]:

t.test(morley$Speed[morley$Expt==1], morley$Speed[morley$Expt==2])

	Welch Two Sample t-test

data:  morley$Speed[morley$Expt == 1] and morley$Speed[morley$Expt == 2]
t = 1.9516, df = 30.576, p-value = 0.0602
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  -2.419111 108.419111
sample estimates:
mean of x mean of y 
      909       856 

In [78]:

t.test(morley$Speed[morley$Expt==2], morley$Speed[morley$Expt==3])

	Welch Two Sample t-test

data:  morley$Speed[morley$Expt == 2] and morley$Speed[morley$Expt == 3]
t = 0.49196, df = 35.736, p-value = 0.6258
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -34.35882  56.35882
sample estimates:
mean of x mean of y 
      856       845 

Explain the results and discuss your conclusions.

For both t-tests conducted, the p-value is greater than 0.05. This means that the null hypothesis of equal means cannot be rejected at the 5% level: the difference in speed seen between those experiments may be due to random variation.

Need to test all experiments against each other.

Overall clarity (score = 0.15)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.10)
Interpretation of the results (score = 0.15)
Total Score = 0.65

INSTRUCTOR FEEDBACK: I agree with the peer comments and also with the fact that you did not interpret the question well. The interpretation of the data was correct. Score 0.75
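
One possible way to test all experiments against each other, sketched with pairwise.t.test() from the stats package. Setting pool.sd = FALSE runs a separate Welch test for each pair, as in the cells above; the Holm adjustment is an assumption added here because ten comparisons are made at once. The object names pw_raw and pw_holm are illustrative.

```r
# All 10 pairwise comparisons, unadjusted, one Welch t-test per pair.
pw_raw <- pairwise.t.test(morley$Speed, morley$Expt,
                          pool.sd = FALSE, p.adjust.method = "none")
pw_raw$p.value

# The same comparisons with Holm's correction for multiple testing.
pw_holm <- pairwise.t.test(morley$Speed, morley$Expt,
                           pool.sd = FALSE, p.adjust.method = "holm")
pw_holm$p.value
```

The result is a lower-triangular matrix of p-values, with rows for experiments 2 to 5 and columns for experiments 1 to 4; the unadjusted entry for experiments 1 and 2 corresponds to the first t.test() call above.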

Overall, it was very great work. Final score = 9.95/12

INSTRUCTOR FEEDBACK: Score:8.75/12
