This is an individual assignment. You are free to use your notes, homeworks, books, online materials, etc. You may not discuss the questions with anyone else. The midterm is due on canvas Friday at midnight before Saturday. Submit the html file this Rmd produces like the homework. You are free to schedule one 15 minute zoom session with me this upcoming week to ask a question or two. I reserve the right to not answer the question.
I have loaded the libraries and data for you. Don’t touch the code block below. You will have to create your own code block(s) to answer the questions if needed.
Read the questions carefully and slowly. Take your time. Write your answers in this document where indicated.
knitr::opts_chunk$set(warning=FALSE,
message=FALSE,
fig.width=6,
fig.align="center") # No warnings
library(dplyr) # For pipe and other data commands
library(janitor) # For tabyl
library(ggplot2) # For plotting using ggplot() function
library(knitr) # For making tables using kable()
load("~/Data/output/ACS_clean.RData")
ls()
## [1] "mydata_clean"
Good luck.
Here is a histogram for the variable HINCP split by the variable new_FS, which indicates whether a family was on food stamps or not.
mydata_clean %>%
distinct(SERIALNO, new_FS, HINCP) %>%
ggplot(aes(x = HINCP)) +
geom_histogram(binwidth = 25000,
color = "white"
) +
facet_wrap( ~ new_FS) +
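scale_x_continuous(labels = scales::comma) +
coord_cartesian(xlim = c(0, 1000000)) # zoom without replacing the comma-labeled scale or dropping data, as xlim() would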
Taking the graph into account, write your answers below.
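HINCP? Explain how you know.
HINCP is a numerical variable, because it records each household's income in dollars. Its values are numbers (such as 38,400 or 124,000) that can be ordered and averaged. The histogram only groups these values into bins 25,000 wide for display. The axis a variable appears on does not determine its type: here the y-axis shows the count of households in each bin.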
new_FS? Explain how you know.
new_FS is a categorical variable, because each household falls into one of two groups, "Food stamps" or "No food stamps". The histogram uses it to split the data into two panels. The counts on the y-axis are produced by the histogram and do not come from new_FS itself.
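In R, one quick way to check a variable's type is to test whether it stores numbers or labels. Below is a minimal sketch on a small hypothetical data frame. The name toy and all of its values are made up for illustration, and mydata_clean is not used.

```r
# Hypothetical toy data mimicking the two variables (values are made up)
toy <- data.frame(
  HINCP  = c(38400, -4600, 124000, 958700),      # household income in dollars
  new_FS = c("Food stamps", "Food stamps",
             "No food stamps", "No food stamps") # group labels
)

# A numerical variable stores numbers that can be ordered and averaged
is.numeric(toy$HINCP)   # TRUE
# A categorical variable stores labels; counting them is the natural summary
is.numeric(toy$new_FS)  # FALSE
table(toy$new_FS)
```

Because the labels are stored as text, the natural summary for new_FS is a count per group rather than a mean.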
The shape of the histogram is skewed right for people who are on food stamps.
Here is a summary table for the variable HINCP split by new_FS.
mydata_clean %>%
distinct(SERIALNO, new_FS, HINCP) %>%
group_by(new_FS) %>%
summarize(n = n(),
min = min(HINCP, na.rm=TRUE),
median = median(HINCP, na.rm=TRUE),
mean = mean(HINCP, na.rm=TRUE),
max = max(HINCP, na.rm=TRUE)) %>%
kable()
new_FS | n | min | median | mean | max |
---|---|---|---|---|---|
Food stamps | 641 | -4600 | 38400 | 64901.88 | 958700 |
No food stamps | 13471 | -4800 | 124000 | 164488.83 | 2580000 |
Taking the table and histogram into account, answer the questions.
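HINCP for people on food stamps?
The appropriate measures for the distribution of HINCP for people on food stamps are the median (center) and the IQR (spread). The distribution is right-skewed, so a few very high incomes pull the mean (64,901.88) well above the median (38,400). The median and IQR are resistant to those extreme values.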
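To see why the median and IQR are preferred for a right-skewed variable, the sketch below makes the richest household in a small group far richer and checks which summaries move. The incomes are hypothetical values made up for illustration. They are not taken from the ACS data.

```r
# Hypothetical incomes in dollars for nine households (made up, sorted)
incomes <- c(12000, 18000, 25000, 31000, 38400, 45000, 60000, 90000, 150000)
# Same households, but the richest one now reports a much larger income
incomes_tail <- replace(incomes, length(incomes), 958700)

summary_stats <- function(x) {
  c(mean = mean(x), median = median(x), IQR = IQR(x), SD = sd(x))
}

before <- summary_stats(incomes)
after  <- summary_stats(incomes_tail)

# Mean and SD jump; median and IQR stay exactly the same
round(rbind(before, after))
```

Only the extreme value changed, so the resistant summaries (median and IQR) do not react. The mean and standard deviation do.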
HINCP for people on food stamps means.
The median of HINCP for people on food stamps is 38,400. If the incomes of the 641 households on food stamps are listed from lowest to highest, the middle value is 38,400 dollars. So about half of these households have an income at or below 38,400 dollars, and about half have an income at or above it.
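The "sort, then take the middle" idea can be checked directly against R's median(). This is a minimal sketch on a handful of hypothetical incomes that are made up for illustration.

```r
# Hypothetical household incomes in dollars (made up, unsorted)
x <- c(60000, 12000, 38400, 150000, 25000)

sorted <- sort(x)                     # lowest to highest income
middle <- sorted[(length(x) + 1) / 2] # middle position for an odd count

middle    # 38400
median(x) # 38400, the same value

# With an even count, median() averages the two middle values
median(c(x, 45000)) # (38400 + 45000) / 2 = 41700
```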
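HINCP for those on food stamps and not on food stamps different and similar?
The distributions of HINCP for those on food stamps and those not on food stamps are similar in shape. Both are right-skewed, with the mean above the median in each group (64,901.88 vs. 38,400 and 164,488.83 vs. 124,000), and both have minimums slightly below zero (-4,600 and -4,800). They differ in center and range. The median income of households not on food stamps (124,000) is more than three times that of households on food stamps (38,400), and their maximum (2,580,000) is far higher than 958,700. There are also many more households not on food stamps (13,471 vs. 641).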
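The comparison above can be reproduced from the summary table itself. The sketch below copies the table's n, median and mean into a small data frame. A mean above the median is only a rough signal of right skew, not proof; the histogram is the better evidence for shape.

```r
# Summary values copied from the HINCP-by-new_FS table
fs <- data.frame(
  group  = c("Food stamps", "No food stamps"),
  n      = c(641, 13471),
  median = c(38400, 124000),
  mean   = c(64901.88, 164488.83)
)

# A positive mean - median gap in both groups is consistent with right skew
fs$mean_minus_median <- fs$mean - fs$median

# How many times larger the median income is without food stamps
ratio <- fs$median[2] / fs$median[1]

fs
round(ratio, 2)
```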
I find it very surprising that over 1,000 households with low incomes do not use food stamps, compared with the chart of households that do. There are many more low-income households in the group not on food stamps, although part of this reflects how much larger that group is (13,471 households vs. 641).
mydata_clean %>%
distinct(SERIALNO, new_FS, HINCP) %>%
ggplot(aes(x = new_FS, y = HINCP)) +
geom_boxplot(outlier.shape = NA) +
coord_flip(ylim = c(0, 400000)) # zoom only: ylim() would drop households above 400,000 before the quartiles are computed
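HINCP is the same or different for people who are on or not on food stamps? Cite as much evidence as possible from the graphs and tables above.
I think the distribution of HINCP has about the same shape in both groups. Each is right-skewed with the mean above the median, so the median is the appropriate measure of center for both. The centers are different, however. In the boxplot, the box for households not on food stamps sits much farther to the right. The summary table also shows a median of 124,000 for households not on food stamps versus 38,400 for households on food stamps.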
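Each box in geom_boxplot() spans the first to third quartiles, with the median inside. Restricting the axis with a scale limit such as ylim() removes observations outside the range before those quartiles are computed. Zooming with a coord function does not remove them. The sketch below shows the effect on hypothetical, made-up log-normal incomes. The distribution and its parameters are assumptions chosen only to be right-skewed.

```r
set.seed(1)
# Hypothetical right-skewed incomes in dollars (made up for illustration)
income <- round(rlnorm(5000, meanlog = log(120000), sdlog = 0.8))

# Quartiles and median of all the data (what a zoomed boxplot would show)
full <- quantile(income, c(0.25, 0.5, 0.75))

# Same summaries after dropping incomes above 400,000 (what a scale limit does)
capped <- quantile(income[income <= 400000], c(0.25, 0.5, 0.75))

# Every summary in the capped row is at or below the full row
round(rbind(full, capped))
```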
Here is a table for the variable JWTR_new, which indicates how someone got to work.
mydata_clean %>%
distinct(SERIALNO, JWTR_new) %>%
tabyl(JWTR_new) %>%
adorn_pct_formatting(digits=0) %>%
kable()
JWTR_new | n | percent | valid_percent |
---|---|---|---|
Car, truck, or van | 8524 | 38% | 71% |
Bus or trolley bus | 558 | 3% | 5% |
Streetcar | 24 | 0% | 0% |
Subway | 649 | 3% | 5% |
Railroad | 224 | 1% | 2% |
Ferryboat | 38 | 0% | 0% |
Taxicab | 60 | 0% | 0% |
Motorcycle | 60 | 0% | 0% |
Bicycle | 271 | 1% | 2% |
Walked | 475 | 2% | 4% |
Worked at home | 987 | 4% | 8% |
Other method | 149 | 1% | 1% |
NA | 10122 | 46% | - |
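JWTR_new? Explain how you know.
JWTR_new is a categorical variable, because its values are categories (the means of transportation someone used to get to work, such as "Subway" or "Walked") rather than numbers. The table summarizes it with counts and percentages.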
NA mean?
NA marks people with no recorded means of transportation to work, that is, people to whom the question was not applicable. They make up 10,122 of the rows, or 46%.
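percent and valid_percent columns?
The percent column divides each count by all 22,141 rows, including the 10,122 NA rows. The valid_percent column divides each count by only the 12,019 rows with a recorded method, excluding the people to whom the question was not applicable. For example, Car, truck, or van is 8,524 / 22,141 ≈ 38% of all rows but 8,524 / 12,019 ≈ 71% of valid responses.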
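To see where the two columns come from, the sketch below recomputes them in base R from the counts in the JWTR_new table. It assumes that percent divides by every row and valid_percent divides by the non-NA rows only; the rounded results match the table under that assumption. The variable names are hypothetical, and janitor itself is not used here.

```r
# Counts copied from the JWTR_new table; na_count is the NA row
counts <- c(
  "Car, truck, or van" = 8524, "Bus or trolley bus" = 558, "Streetcar" = 24,
  "Subway" = 649, "Railroad" = 224, "Ferryboat" = 38, "Taxicab" = 60,
  "Motorcycle" = 60, "Bicycle" = 271, "Walked" = 475,
  "Worked at home" = 987, "Other method" = 149
)
na_count <- 10122

total_all   <- sum(counts) + na_count # denominator for percent
total_valid <- sum(counts)            # denominator for valid_percent

percent       <- round(100 * counts / total_all)
valid_percent <- round(100 * counts / total_valid)

c(total_all = total_all, total_valid = total_valid)
rbind(percent, valid_percent)[, c("Car, truck, or van", "Bus or trolley bus")]
```

The Bus or trolley bus column of the last line gives 3 and 5, the same values as its row in the table.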
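Bus or trolley bus mean?
The 5% in the valid_percent column of the Bus or trolley bus row means that about 5% of people with a recorded means of transportation (558 of 12,019, about 4.6%) got to work by bus or trolley bus. Because it excludes the NA row, it is larger than the 3% in the percent column.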
Read the document Cell-Phone-Student.md in the Affective-Domain\Cell-Phones directory. Answer the questions below.
I do think my cell phone habits are hindering my attention and focus. I feel like my cell phone and I are connected by a string and I can never go without it. I use my phone for just about everything, because with technology improving day by day, my phone makes everything much easier. I can look up any question that comes to mind, contact my friends via text, call, FaceTime, email, etc., play games when I’m bored, and check my social media.
However, though my cell phone is very useful and can kill my boredom, it can be very distracting when I’m trying to focus. I can be doing my homework and a text message notification will immediately draw my attention away. I will not only reply to the text, but also check my email, check my social media, and so forth. I check my phone so often during the day that I don’t even need a notification to pull my attention away from anything. I think our cell phones provide us with so much that we will easily drop whatever we are doing to go on them.
To be honest, I would like to change my phone usage, but realistically, I don’t think I would be able to. I work late at night and have a really hard time falling asleep, so being on my phone for who knows how long helps my eyes get tired. Though it is a bad habit I have acquired, I don’t think I’d be able to break that routine.
However, I have recently made a slight change in my extreme phone usage. I used to check my social media every second of the day, but for the past few weeks I haven’t opened my social media until the end of the day, when I’m home from school and work. I think this slight change is positive progress from my previous excessive usage.
If I, or someone else, were trying to change their habits, I would advise putting the phone in a different room when you’re with your family or friends at home, so that your complete, undivided attention is on them. I would also suggest putting your phone on silent instead of sound, so you are interrupted less often to check notifications. Finally, I would suggest possibly deleting your social media apps. I have seen other people “take a break from social media” by deleting the apps for about a month, and I think it can really help someone see how unimportant social media is and reduce their screen time.
What is “sampling bias”? Explain using proper terminology and craft your example to explain how it can affect the outcome of a statistical study. This should be several paragraphs long.
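“Sampling bias” is a bias in which a sample is collected in such a way that some members of the intended population are less likely to be included in the sample than others (they have a lower sampling probability). As a result, the sample is not representative of the population, and statistics computed from it can systematically overestimate or underestimate the corresponding population parameters.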
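The effect of unequal sampling probabilities can be shown with a small simulation. The sketch below assumes a hypothetical population of 10,000 households with made-up log-normal incomes. It compares a simple random sample, where every household has the same chance of selection, with a biased design in which a household's chance of selection is proportional to its income. That biased design is an illustrative assumption, not a real survey procedure.

```r
set.seed(42)

# Hypothetical population of 10,000 households with right-skewed incomes (made up)
population <- round(rlnorm(10000, meanlog = log(60000), sdlog = 0.9))

# Unbiased design: every household has the same chance of selection
srs <- sample(population, 500)

# Biased design (hypothetical): selection probability proportional to income,
# so lower-income households are under-represented
weights <- population / sum(population)
biased  <- sample(population, 500, prob = weights)

# The random sample mean lands near the population mean; the biased one overshoots
round(c(population    = mean(population),
        random_sample = mean(srs),
        biased_sample = mean(biased)))
```

Taking a larger biased sample would not fix the problem, because the error comes from how households are selected, not from how many are selected.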