
R

Kernel: R

Data Frames

-------Exercise 1-------

names <- c('Bob','Claire','Luisa','Matt','Marta','Mike')
score <- c(34,82,59,72,50,100)
game_cards <- data.frame(names, score, stringsAsFactors=FALSE)
game_cards

   names score
1    Bob    34
2 Claire    82
3  Luisa    59
4   Matt    72
5  Marta    50
6   Mike   100

score2 <- c(45,12,74,33,40,79)
game_cards2 <- data.frame(game_cards$names, score2, stringsAsFactors=FALSE)
game_cards2

  game_cards.names score2
1              Bob     45
2           Claire     12
3            Luisa     74
4             Matt     33
5            Marta     40
6             Mike     79

The question asks for another field to be created for the second match scores, so the scores for match 1 and 2 should be in the same table.

  • Overall clarity (score = 0.25)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.15)

  • Interpretation of the results (score = 0.25)

  • Total Score = 0.9

INSTRUCTOR FEEDBACK: 0.75 since the answer did not cover the question fully.
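
One way to address the peer comment is to hold both match scores in a single data frame. The following is a minimal sketch, not the official solution; the column names score1 and score2 are assumptions chosen for illustration.

```r
# Player names and the two sets of scores from the exercise.
names  <- c("Bob", "Claire", "Luisa", "Matt", "Marta", "Mike")
score1 <- c(34, 82, 59, 72, 50, 100)   # match 1
score2 <- c(45, 12, 74, 33, 40, 79)    # match 2

# One table with a column per match (column names are illustrative).
game_cards <- data.frame(names, score1, score2, stringsAsFactors = FALSE)
game_cards

# A new field can also be added to an existing data frame, e.g.:
# game_cards$score2 <- score2
```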

--------Exercise 2---------

colnames(game_cards) <- c("match1","score1")
colnames(game_cards2) <- c("match2","score2")
Error in colnames(game_cards) <- c("match1", "score1"): object 'game_cards' not found Traceback:

Dimensions of the frame (Game_cards)

dim(game_cards)
[1] 6 2

Minimum score of match 1

min(game_cards$score1)
[1] 34

Minimum score of match 2

min(game_cards2$score2)
[1] 12

Maximum score of match 1

max(game_cards$score1)
[1] 100

Maximum score of match 2

max(game_cards2$score2)
[1] 79

Overall minimum score

min(game_cards$score1,game_cards2$score2)
[1] 12

Overall maximum score

max(game_cards$score1,game_cards2$score2)
[1] 100

Order the scores for each match

game_cards[order(game_cards$score1),]

  match1 score1
1    Bob     34
5  Marta     50
3  Luisa     59
4   Matt     72
2 Claire     82
6   Mike    100

game_cards2[order(game_cards2$score2),]

  match2 score2
2 Claire     12
4   Matt     33
5  Marta     40
1    Bob     45
3  Luisa     74
6   Mike     79

Why do we need to use the function order() in this way? What is its output?

In order to display the table game_cards/game_cards2 and to get its values from the 'score' column of each table.

Renaming of the columns executed an error, also the columns for the scores of match 1 and 2 should be in a combined table, even in ordering.

  • Overall clarity (score = 0.25)

  • Correctness of the code (score = 0.15)

  • Exhaustive cover of required analysis (score= 0.20)

  • Interpretation of the results (score = 0.25)

  • Total Score = 0.85

INSTRUCTOR FEEDBACK: The code is correct. The error occurred because the object game_cards needed to be re-executed to be active in the workspace. The interpretation of the question, though, is wrong. The order() function returns indices, which is why we need to use it this way. Score 0.75
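
The instructor's point about indices can be illustrated with a short sketch using the match 1 scores from Exercise 1. The variable names idx and score1 are chosen here for illustration only.

```r
score1 <- c(34, 82, 59, 72, 50, 100)

# order() does not return the sorted values; it returns the positions
# of the elements in ascending order.
idx <- order(score1)
idx

# Using those positions as an index rearranges the vector
# (or the rows of a data frame, as in game_cards[order(...), ]).
score1[idx]

# sort() gives the sorted values directly but cannot reorder other columns.
sort(score1)
```

Because the result is a vector of row positions, it goes in the row slot of the square brackets, which keeps each name attached to its score.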

Visualise the data

------Exercise 3-------

Describe the plot() characteristics and how we can change/add features.

Various characteristics of the plot can be edited, using commands within the plot() command, separated by commas.

The answer is very brief and general. For correct answers, see the solutions.

  • Overall clarity (score = 0.10)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.10)

  • Interpretation of the results (score = 0.25)

  • Total Score = 0.7

INSTRUCTOR FEEDBACK: There is a lack of clarity, and you could have given more examples of how to use it. Score 0.5
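
As an example of the kind of detail the feedback asks for, the sketch below passes several common arguments to plot() and then layers extra features on top. It is an illustration under assumed data (the match 1 scores), not the official solution.

```r
# Illustrative data: the match 1 scores from Exercise 1.
score1 <- c(34, 82, 59, 72, 50, 100)

plot(score1,
     type = "b",              # points joined by lines
     pch  = 19,               # solid circles as plotting symbol
     col  = "blue",           # colour of points and lines
     ylim = c(0, 100),        # y-axis limits
     main = "Match 1 scores", # plot title
     xlab = "Player index",   # axis labels
     ylab = "Score")

# Features can also be added to an existing plot with separate calls.
abline(h = mean(score1), lty = 2, col = "grey40")
legend("topleft", legend = "mean score", lty = 2, col = "grey40", bty = "n")
```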

------Exercise 4-------

par(mfrow=c(1,2))
barplot(game_cards$score, names = game_cards$match1, cex.names=0.6)
barplot(game_cards2$score2, names = game_cards2$match2, cex.names=0.6)
Image in a Jupyter notebook

Write a brief summary of settings that you think might be useful when presenting data.

It can be useful when you want to compare two sets of data/two different graphs.

Question not answered, see solutions.

  • Overall clarity (score = 0.15)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.1)

  • Interpretation of the results (score = 0.25)

  • Total score = 0.75

INSTRUCTOR FEEDBACK: I agree with your peer feedback. You could have combined them and, more importantly, used some of the graphical parameters.
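
A possible way to combine the two bar plots and use a few graphical parameters is sketched below. The matrix name scores and the colour choices are assumptions made for illustration.

```r
players <- c("Bob", "Claire", "Luisa", "Matt", "Marta", "Mike")

# One row per match, one column per player.
scores <- rbind(match1 = c(34, 82, 59, 72, 50, 100),
                match2 = c(45, 12, 74, 33, 40, 79))
colnames(scores) <- players

barplot(scores,
        beside      = TRUE,                 # side-by-side bars per player
        col         = c("grey30", "grey70"),
        legend.text = TRUE,                 # legend taken from the row names
        args.legend = list(x = "topleft", bty = "n"),
        cex.names   = 0.8,                  # size of the player labels
        las         = 2,                    # labels perpendicular to the axis
        ylim        = c(0, 110),
        main        = "Scores per player",
        ylab        = "Score")
```

Grouping the bars by player makes the two matches directly comparable on a shared y-axis, which two separate panels with different scales do not.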

------Exercise 5-------

What is the scatter plot useful for? What do you get and why?

Scatter plots show how much one variable is affected by another. They show correlation.

score <- c(34,82,59,72,50,100)
plot(score,score)
abline(0,1)
Image in a Jupyter notebook

The data can also be plotted on a bar graph. See solutions.

  • Overall clarity (score = 0.25)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.25)

  • Interpretation of the results (score = 0.25)

  • Total score = 1

INSTRUCTOR FEEDBACK: You answered the question, but you could have gone further by adding some examples showing cases where the two variables are not identical. Score 1
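
Following the instructor's suggestion, a case where the two variables differ can be sketched by plotting the match 1 scores against the match 2 scores from Exercise 1. This is an illustrative example only; the name r for the correlation coefficient is an assumption.

```r
score1 <- c(34, 82, 59, 72, 50, 100)   # match 1
score2 <- c(45, 12, 74, 33, 40, 79)    # match 2

plot(score1, score2,
     pch = 19,
     xlab = "Score in match 1",
     ylab = "Score in match 2",
     main = "Match 1 vs match 2")

# Reference line y = x: points on it have identical scores in both matches.
abline(0, 1, lty = 2)

# Pearson correlation as a numerical summary of the linear association.
r <- cor(score1, score2)
r
```

Plotting a variable against itself always places every point on the line y = x, so only a plot of two different variables says anything about their relationship.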

------Exercise 6-------

iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1            5.1         3.5          1.4         0.2    setosa
2            4.9         3.0          1.4         0.2    setosa
3            4.7         3.2          1.3         0.2    setosa
4            4.6         3.1          1.5         0.2    setosa
5            5.0         3.6          1.4         0.2    setosa
6            5.4         3.9          1.7         0.4    setosa
7            4.6         3.4          1.4         0.3    setosa
8            5.0         3.4          1.5         0.2    setosa
9            4.4         2.9          1.4         0.2    setosa
10           4.9         3.1          1.5         0.1    setosa
11           5.4         3.7          1.5         0.2    setosa
12           4.8         3.4          1.6         0.2    setosa
13           4.8         3.0          1.4         0.1    setosa
14           4.3         3.0          1.1         0.1    setosa
15           5.8         4.0          1.2         0.2    setosa
16           5.7         4.4          1.5         0.4    setosa
17           5.4         3.9          1.3         0.4    setosa
18           5.1         3.5          1.4         0.3    setosa
19           5.7         3.8          1.7         0.3    setosa
20           5.1         3.8          1.5         0.3    setosa
21           5.4         3.4          1.7         0.2    setosa
22           5.1         3.7          1.5         0.4    setosa
23           4.6         3.6          1.0         0.2    setosa
24           5.1         3.3          1.7         0.5    setosa
25           4.8         3.4          1.9         0.2    setosa
26           5.0         3.0          1.6         0.2    setosa
27           5.0         3.4          1.6         0.4    setosa
28           5.2         3.5          1.5         0.2    setosa
29           5.2         3.4          1.4         0.2    setosa
30           4.7         3.2          1.6         0.2    setosa
⋮              ⋮           ⋮            ⋮           ⋮         ⋮
121          6.9         3.2          5.7         2.3 virginica
122          5.6         2.8          4.9         2.0 virginica
123          7.7         2.8          6.7         2.0 virginica
124          6.3         2.7          4.9         1.8 virginica
125          6.7         3.3          5.7         2.1 virginica
126          7.2         3.2          6.0         1.8 virginica
127          6.2         2.8          4.8         1.8 virginica
128          6.1         3.0          4.9         1.8 virginica
129          6.4         2.8          5.6         2.1 virginica
130          7.2         3.0          5.8         1.6 virginica
131          7.4         2.8          6.1         1.9 virginica
132          7.9         3.8          6.4         2.0 virginica
133          6.4         2.8          5.6         2.2 virginica
134          6.3         2.8          5.1         1.5 virginica
135          6.1         2.6          5.6         1.4 virginica
136          7.7         3.0          6.1         2.3 virginica
137          6.3         3.4          5.6         2.4 virginica
138          6.4         3.1          5.5         1.8 virginica
139          6.0         3.0          4.8         1.8 virginica
140          6.9         3.1          5.4         2.1 virginica
141          6.7         3.1          5.6         2.4 virginica
142          6.9         3.1          5.1         2.3 virginica
143          5.8         2.7          5.1         1.9 virginica
144          6.8         3.2          5.9         2.3 virginica
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica

par(mfrow=c(1,2))
plot(iris$Sepal.Length, iris$Petal.Length, col="red", main="Sepal Length vs Petal Length", col.main="red", xlab="Sepal Length", ylab="Petal Length")
abline(lm(iris$Petal.Length~iris$Sepal.Length), col="red")
plot(iris$Sepal.Width, iris$Petal.Width, col="blue", main="Sepal Width vs Petal Width", col.main="Blue", xlab="Sepal Width", ylab="Petal Width")
abline(lm(iris$Petal.Width~iris$Sepal.Width), col="blue")
Image in a Jupyter notebook
  • Overall clarity (score = 0.25)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.25)

  • Interpretation of the results (score = 0.25)

  • Total Score = 1

INSTRUCTOR FEEDBACK: The code is correct but you need to add more interpretation of the data and explain the methods. Clarity is needed and interpretation is missing. Score 0.5

------Exercise 7-------

Linear regression: Explain your findings.

As sepal length increases, petal length increases as well. However, as sepal width increases, petal width decreases.

  • Overall clarity (score = 0.25)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.25)

  • Interpretation of the results (score = 0.25)

  • Total score = 1

INSTRUCTOR FEEDBACK: The interpretation is correct, but you need to add a lot more detail and be clearer in explaining why you arrived at those conclusions. Score 0.5
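
The conclusions about the two regression lines could be backed up with numbers, for example the fitted slopes and the correlation coefficients. The sketch below is one way to obtain them; the object names fit_length and fit_width are assumptions for illustration.

```r
# Fit the same two linear models that were drawn with abline() in Exercise 6.
fit_length <- lm(Petal.Length ~ Sepal.Length, data = iris)
fit_width  <- lm(Petal.Width  ~ Sepal.Width,  data = iris)

# Intercept and slope: the sign of the slope gives the direction of the trend.
coef(fit_length)
coef(fit_width)

# R-squared: the share of variance explained by each straight-line fit.
summary(fit_length)$r.squared
summary(fit_width)$r.squared

# Pearson correlation as a compact measure of strength and direction.
cor(iris$Sepal.Length, iris$Petal.Length)
cor(iris$Sepal.Width,  iris$Petal.Width)
```

Since the data pool three species, it may also be worth colouring the points by iris$Species to check whether the overall trend holds within each species.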

------Exercise 8-------

morley
    Expt Run Speed
001    1   1   850
002    1   2   740
003    1   3   900
004    1   4  1070
005    1   5   930
006    1   6   850
007    1   7   950
008    1   8   980
009    1   9   980
010    1  10   880
011    1  11  1000
012    1  12   980
013    1  13   930
014    1  14   650
015    1  15   760
016    1  16   810
017    1  17  1000
018    1  18  1000
019    1  19   960
020    1  20   960
021    2   1   960
022    2   2   940
023    2   3   960
024    2   4   940
025    2   5   880
026    2   6   800
027    2   7   850
028    2   8   880
029    2   9   900
030    2  10   840
⋮      ⋮   ⋮     ⋮
71     4  11   910
72     4  12   920
73     4  13   890
74     4  14   860
75     4  15   880
76     4  16   720
77     4  17   840
78     4  18   850
79     4  19   850
80     4  20   780
81     5   1   890
82     5   2   840
83     5   3   780
84     5   4   810
85     5   5   760
86     5   6   810
87     5   7   790
88     5   8   810
89     5   9   820
90     5  10   850
91     5  11   870
92     5  12   870
93     5  13   810
94     5  14   740
95     5  15   810
96     5  16   940
97     5  17   950
98     5  18   800
99     5  19   810
100    5  20   870

morley_table <- table(morley$Expt)
lbls <- paste(names(morley_table), "\n", morley_table, sep="")
pie(morley_table, labels = lbls, main="Pie Chart of Experiments\n")
Image in a Jupyter notebook
  • Overall clarity (score = 0.25)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.25)

  • Interpretation of the results (score = 0.25)

  • Total Score =1

INSTRUCTOR FEEDBACK: It is all correct, but it is important that you explain your methods and add more to the interpretation. Score 0.75

------Exercise 9-------

boxplot(morley$Speed ~ morley$Expt, col='light grey', xlab='Experiment #', ylab="speed (km/s - 299,000)", main="Michelson–Morley experiment")
mtext("speed of light data")
sol=299792.458-299000 # deviation of real speed of light from the estimated 299,000 km/s
abline(h=sol, col='red')
Image in a Jupyter notebook
Quantiles1 <- quantile(morley$Speed[morley$Expt==1])
IQR1 <- IQR(morley$Speed[morley$Expt==1])
Mean1 <- mean(morley$Speed[morley$Expt==1])
Sd1 <- sd(morley$Speed[morley$Expt==1])
"Quantiles of Experiment 1:"
Quantiles1
"Interquartile Range of Experiment 1 data"
IQR1
"Mean of Experiment 1 data"
Mean1
"Standard Deviation of Experiment 1 data"
Sd1

[1] "Quantiles of Experiment 1:"
  0%  25%  50%  75% 100%
 650  850  940  980 1070
[1] "Interquartile Range of Experiment 1 data"
[1] 130
[1] "Mean of Experiment 1 data"
[1] 909
[1] "Standard Deviation of Experiment 1 data"
[1] 104.926

Quantiles2 <- quantile(morley$Speed[morley$Expt==2])
IQR2 <- IQR(morley$Speed[morley$Expt==2])
Mean2 <- mean(morley$Speed[morley$Expt==2])
Sd2 <- sd(morley$Speed[morley$Expt==2])
"Quantiles of Experiment 2:"
Quantiles2
"Interquartile Range of Experiment 2 data"
IQR2
"Mean of Experiment 2 data"
Mean2
"Standard Deviation of Experiment 2 data"
Sd2

[1] "Quantiles of Experiment 2:"
  0%  25%  50%  75% 100%
 760  800  845  885  960
[1] "Interquartile Range of Experiment 2 data"
[1] 85
[1] "Mean of Experiment 2 data"
[1] 856
[1] "Standard Deviation of Experiment 2 data"
[1] 61.16414

Quantiles3 <- quantile(morley$Speed[morley$Expt==3])
IQR3 <- IQR(morley$Speed[morley$Expt==3])
Mean3 <- mean(morley$Speed[morley$Expt==3])
Sd3 <- sd(morley$Speed[morley$Expt==3])
"Quantiles of Experiment 3:"
Quantiles3
"Interquartile Range of Experiment 3 data"
IQR3
"Mean of Experiment 3 data"
Mean3
"Standard Deviation of Experiment 3 data"
Sd3

[1] "Quantiles of Experiment 3:"
  0%  25%  50%  75% 100%
 620  840  855  880  970
[1] "Interquartile Range of Experiment 3 data"
[1] 40
[1] "Mean of Experiment 3 data"
[1] 845
[1] "Standard Deviation of Experiment 3 data"
[1] 79.10686

Quantiles4 <- quantile(morley$Speed[morley$Expt==4])
IQR4 <- IQR(morley$Speed[morley$Expt==4])
Mean4 <- mean(morley$Speed[morley$Expt==4])
Sd4 <- sd(morley$Speed[morley$Expt==4])
"Quantiles of Experiment 4:"
Quantiles4
"Interquartile Range of Experiment 4 data"
IQR4
"Mean of Experiment 4 data"
Mean4
"Standard Deviation of Experiment 4 data"
Sd4

[1] "Quantiles of Experiment 4:"
   0%   25%   50%   75%  100%
720.0 767.5 815.0 865.0 920.0
[1] "Interquartile Range of Experiment 4 data"
[1] 97.5
[1] "Mean of Experiment 4 data"
[1] 820.5
[1] "Standard Deviation of Experiment 4 data"
[1] 60.04165

Quantiles5 <- quantile(morley$Speed[morley$Expt==5])
IQR5 <- IQR(morley$Speed[morley$Expt==5])
Mean5 <- mean(morley$Speed[morley$Expt==5])
Sd5 <- sd(morley$Speed[morley$Expt==5])
"Quantiles of Experiment 5:"
Quantiles5
"Interquartile Range of Experiment 5 data"
IQR5
"Mean of Experiment 5 data"
Mean5
"Standard Deviation of Experiment 5 data"
Sd5

[1] "Quantiles of Experiment 5:"
   0%   25%   50%   75%  100%
740.0 807.5 810.0 870.0 950.0
[1] "Interquartile Range of Experiment 5 data"
[1] 62.5
[1] "Mean of Experiment 5 data"
[1] 831.5
[1] "Standard Deviation of Experiment 5 data"
[1] 54.21934

Discuss findings. What can you conclude?

The medians of the experiments are similar, except for the median speed of the first experiment, which is considerably higher. Experiments 1 and 3 have a few outliers. Generally, experiment 3 seems to be the most reliable whereas experiment 1 seems to be the least reliable. This is because experiment 3 has quartiles very close to the median (IQR = 40), whereas experiment 1 has quartiles far away from the median (IQR = 130), meaning that its values are more scattered.

  • Overall clarity (score = 0.25)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.25)

  • Interpretation of the results (score = 0.25)

  • Total Score = 1

INSTRUCTOR FEEDBACK: This is a better executed exercise and also clearer than the previous ones. Score 1
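
The five near-identical code cells above could be condensed by computing the statistics per experiment in one step. The following is a sketch of that idea using split() and sapply(); the helper name summarise_speed and the table name stats_by_expt are assumptions for illustration.

```r
# Hypothetical helper: the summary statistics used in this exercise.
summarise_speed <- function(x) {
  c(median = median(x), IQR = IQR(x), mean = mean(x), sd = sd(x))
}

# Split the speeds by experiment, summarise each group, and transpose so
# that each row is one experiment and each column one statistic.
stats_by_expt <- t(sapply(split(morley$Speed, morley$Expt), summarise_speed))
round(stats_by_expt, 2)
```

Having all experiments in one table makes comparisons such as the IQR of experiment 3 against that of experiment 1 easier to read off than scrolling through separate outputs.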

------Exercise 10------

hist(morley$Speed[morley$Expt==1], prob=F, col=rgb(0.9,0.9,0.9), main='Michelson-Morley Experiment 1', ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==1]))
abline(v=mean(morley$Speed[morley$Expt==1]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==1]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==1])+sd(morley$Speed[morley$Expt==1]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==1])-sd(morley$Speed[morley$Expt==1]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==1])
Image in a Jupyter notebook

hist(morley$Speed[morley$Expt==2], prob=F, col=rgb(0.9,0.9,0.9), main='Michelson-Morley Experiment 2', ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==2]))
abline(v=mean(morley$Speed[morley$Expt==2]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==2]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==2])+sd(morley$Speed[morley$Expt==2]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==2])-sd(morley$Speed[morley$Expt==2]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==2])
Image in a Jupyter notebook

hist(morley$Speed[morley$Expt==3], prob=F, col=rgb(0.9,0.9,0.9), main='Michelson-Morley Experiment 3', ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==3]))
abline(v=mean(morley$Speed[morley$Expt==3]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==3]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==3])+sd(morley$Speed[morley$Expt==3]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==3])-sd(morley$Speed[morley$Expt==3]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==3])
Image in a Jupyter notebook

hist(morley$Speed[morley$Expt==4], prob=F, col=rgb(0.9,0.9,0.9), main='Michelson-Morley Experiment 4', ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==4]))
abline(v=mean(morley$Speed[morley$Expt==4]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==4]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==4])+sd(morley$Speed[morley$Expt==4]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==4])-sd(morley$Speed[morley$Expt==4]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==4])
Image in a Jupyter notebook

hist(morley$Speed[morley$Expt==5], prob=F, col=rgb(0.9,0.9,0.9), main='Michelson-Morley Experiment 5', ylab="Frequency", xlab='Difference from Speed of Light')
par(fg='black')
lines(density(morley$Speed[morley$Expt==5]))
abline(v=mean(morley$Speed[morley$Expt==5]), col=rgb(0.5,0.5,0.5))
abline(v=median(morley$Speed[morley$Expt==5]), lty=3, col=rgb(0.5,0.5,0.5))
abline(v=mean(morley$Speed[morley$Expt==5])+sd(morley$Speed[morley$Expt==5]), lty=2, col=rgb(0.7,0.7,0.7))
abline(v=mean(morley$Speed[morley$Expt==5])-sd(morley$Speed[morley$Expt==5]), lty=2, col=rgb(0.7,0.7,0.7))
rug(morley$Speed[morley$Expt==5])
Image in a Jupyter notebook

What do you conclude? Discuss.

  • Overall clarity (score = 0.25)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.25)

  • Interpretation of the results (score = 0.25)

  • Total score = 1

INSTRUCTOR FEEDBACK: The code is correct and so is the use of the data. Discussion is missing. You can use par(mfrow=..) to plot figures in the same print area. Score 0.75
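
The instructor's par(mfrow=..) suggestion might look like the sketch below, which draws all five histograms in one plotting area with a loop. The 2 x 3 layout, the shared x-axis range, and the variable names are assumptions for illustration.

```r
# Arrange plots in a 2 x 3 grid and remember the previous settings.
old_par <- par(mfrow = c(2, 3))

for (e in sort(unique(morley$Expt))) {
  speed <- morley$Speed[morley$Expt == e]

  hist(speed,
       col  = "grey90",
       xlim = range(morley$Speed),      # same x-axis for every panel
       main = paste("Experiment", e),
       xlab = "Speed (km/s - 299,000)",
       ylab = "Frequency")

  abline(v = mean(speed),   col = "grey30")            # mean: solid line
  abline(v = median(speed), col = "grey30", lty = 3)   # median: dotted line
  rug(speed)                                           # individual runs
}

# Restore the previous plotting layout.
par(old_par)
```

Using the same xlim in every panel lets the position and spread of the experiments be compared by eye.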

------Exercise 11------

?rnorm()
?runif()
?rbinom()
  • Overall clarity (score = 0)

  • Correctness of the code (score = 0)

  • Exhaustive cover of required analysis (score= 0.0)

  • Interpretation of the results (score = 0)

  • Total score = 0
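
No answer was submitted for this exercise beyond opening the help pages. As a hedged illustration of what the three functions do, the sketch below draws random samples from each distribution; the sample size, seed, and parameter values are arbitrary choices and not taken from the exercise.

```r
set.seed(1)   # arbitrary seed so the draws are reproducible
n <- 1000     # arbitrary sample size

x_norm  <- rnorm(n, mean = 0, sd = 1)        # normal distribution
x_unif  <- runif(n, min = 0, max = 1)        # uniform distribution on [0, 1]
x_binom <- rbinom(n, size = 10, prob = 0.5)  # successes in 10 trials, p = 0.5

# Compare the shapes of the three samples side by side.
old_par <- par(mfrow = c(1, 3))
hist(x_norm,  main = "rnorm(n, 0, 1)",     xlab = "value")
hist(x_unif,  main = "runif(n, 0, 1)",     xlab = "value")
hist(x_binom, main = "rbinom(n, 10, 0.5)", xlab = "number of successes")
par(old_par)
```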

------Exercise 12------

?t.test()
t.test(morley$Speed[morley$Expt==1], morley$Speed[morley$Expt==2])

	Welch Two Sample t-test

data:  morley$Speed[morley$Expt == 1] and morley$Speed[morley$Expt == 2]
t = 1.9516, df = 30.576, p-value = 0.0602
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  -2.419111 108.419111
sample estimates:
mean of x mean of y
      909       856

t.test(morley$Speed[morley$Expt==2], morley$Speed[morley$Expt==3])

	Welch Two Sample t-test

data:  morley$Speed[morley$Expt == 2] and morley$Speed[morley$Expt == 3]
t = 0.49196, df = 35.736, p-value = 0.6258
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -34.35882  56.35882
sample estimates:
mean of x mean of y
      856       845

Explain the results and discuss your conclusions.

For both t-tests conducted, the p-value is greater than 0.05. This means that the null hypothesis can be accepted, stating that the difference in speed seen between those experiments may be due to error.

Need to test all experiments against each other.

  • Overall clarity (score = 0.15)

  • Correctness of the code (score = 0.25)

  • Exhaustive cover of required analysis (score= 0.10)

  • Interpretation of the results (score = 0.15)

  • Total Score = 0.65

INSTRUCTOR FEEDBACK: I agree with the peer comments and also with the fact that you did not interpret the question well. The interpretation of the data was correct. Score 0.75
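
Testing every experiment against every other, as the peer comment asks, could be sketched with pairwise.t.test() from the stats package. Setting pool.sd = FALSE runs a separate Welch test for each pair, as t.test() does by default, and p.adjust.method = "none" keeps the raw p-values so they match the individual tests above. Both settings are choices made for this illustration, and the object name pw is an assumption.

```r
# All pairwise Welch t-tests between the five experiments.
pw <- pairwise.t.test(morley$Speed, morley$Expt,
                      pool.sd = FALSE,           # separate variances per pair
                      p.adjust.method = "none")  # raw, unadjusted p-values

# Lower-triangular matrix of p-values: rows and columns are experiment numbers.
round(pw$p.value, 4)

# With ten comparisons, a multiple-testing correction is usually advisable,
# for example Holm's method (the default of pairwise.t.test).
pw_holm <- pairwise.t.test(morley$Speed, morley$Expt, pool.sd = FALSE)
round(pw_holm$p.value, 4)
```

A p-value above 0.05 means the test fails to reject the null hypothesis of equal means, which is weaker than showing that the means are equal.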

Overall, it was very great work. Final score = 9.95/12

INSTRUCTOR FEEDBACK: Score:8.75/12