Author: Ian Ramirez

title: Midterm 1
subtitle: MATH-159 Spring 2020
author: Ian Ramirez
date: 3/15-3/19

Instructions and Setup

This is an individual assignment. You are free to use your notes, homework, books, online materials, etc. You may not discuss the questions with anyone else. The midterm is due on Canvas at midnight on Friday night, before Saturday begins. Submit the HTML file this Rmd produces, as you do for the homework. You may schedule one 15-minute Zoom session with me this upcoming week to ask a question or two. I reserve the right not to answer the question.

I have loaded the libraries and data for you. Don't touch the code block below. You will have to create your own code block(s) to answer the questions if needed.

Read the questions carefully and slowly. Take your time. Write your answers in this document where indicated.

knitr::opts_chunk$set(warning=FALSE,
                      message=FALSE,
                      fig.width=6,
                      fig.align="center") # No warnings
library(dplyr)   # For pipe and other data commands
library(janitor) # For tabyl
library(ggplot2) # For plotting using ggplot() function
library(knitr)   # For making tables using kable()

load("~/Data/output/ACS_clean.RData")
ls()

Good luck.

Questions

Question 1

Here is a histogram for the variable HINCP split by the variable new_FS which indicates whether a family was on food stamps or not.

mydata_clean %>%
  distinct(SERIALNO, new_FS, HINCP) %>%
  ggplot(aes(x = HINCP)) +
    geom_histogram(binwidth = 25000,
                   color = "white"
                  ) +
    facet_wrap( ~ new_FS) + 
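    # Set the limits inside the scale: a separate xlim() call would
    # replace this scale and drop the comma labels.
    scale_x_continuous(labels = scales::comma,
                       limits = c(0, 1000000))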

Taking the graph into account, write your answers below.

  1. What sort of variable is HINCP? Explain how you know.
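  • 'HINCP' is a numerical variable because its values are dollar amounts of household income, so arithmetic such as averaging makes sense.
  1. What sort of variable is new_FS? Explain how you know.
  • 'new_FS' is a categorical variable because it sorts each family into one of two groups: on food stamps or not on food stamps.
  1. How would you describe the shape of the histogram for people who are on food stamps?
  • The histogram for people who are on food stamps is skewed right: its tail extends toward the higher incomes.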
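
The distinction between numerical and categorical variables can also be checked in R. The sketch below uses a small made-up data frame named `toy` (hypothetical, not the ACS data); the columns in `mydata_clean` may be stored differently, so treat this as an illustration rather than a description of the real file.

```r
# Toy data frame; the values are invented for illustration only.
toy <- data.frame(
  HINCP  = c(12000, 54000, 250000, 31000),
  new_FS = factor(c("Yes", "No", "No", "Yes"))
)

# A numerical variable is stored as numbers, so arithmetic is meaningful.
is.numeric(toy$HINCP)
mean(toy$HINCP)

# A categorical variable is stored as labels, so counting each group is meaningful.
is.factor(toy$new_FS)
table(toy$new_FS)
```

Averaging `new_FS` would not make sense, which is one practical way to tell the two kinds of variables apart.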

Here is a summary table for the variable HINCP split by new_FS.

mydata_clean %>%
  distinct(SERIALNO, new_FS, HINCP) %>%
  group_by(new_FS) %>%
  summarize(n = n(),
            min = min(HINCP, na.rm=TRUE),
            median = median(HINCP, na.rm=TRUE),
            mean = mean(HINCP, na.rm=TRUE),
            max = max(HINCP, na.rm=TRUE)) %>%
  kable()

Taking the table and histogram into account, answer the questions.

  1. What is the appropriate measure of spread for distribution of HINCP for people on food stamps?
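  • Because the distribution is skewed right, the interquartile range (IQR) is the appropriate measure of spread, since it is not affected by extreme values. The table does not report the quartiles; it shows only that incomes for families on food stamps range from -4,600 to 958,700.
  1. Please explain in context what the measure of center of HINCP for people on food stamps means.
  • The measure of center describes the typical household income for families on food stamps. Because the distribution is skewed right, the median is the appropriate measure: half of the families on food stamps have a household income below it and half have one above it.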
  1. How are the distributions of HINCP for those on food stamps and not on food stamps different and similar?

  2. What do you find surprising for either distribution?

  • What I found surprising was the high number of families who are not on food stamps.
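
The questions above about center and spread can be explored with a small simulation. This is a minimal sketch on simulated right-skewed values, not the ACS data; the variable `income`, the seed, and the exponential distribution are all assumptions chosen only to produce a long right tail.

```r
set.seed(159)  # Arbitrary seed so the simulated numbers are reproducible.

# Simulated right-skewed "incomes"; these are NOT the ACS values.
income <- rexp(1000, rate = 1 / 30000)

# In a right-skewed distribution the long tail pulls the mean above the median.
mean(income)
median(income)

# The IQR is the width of the middle 50% of the data, so it ignores the tail.
IQR(income)
quantile(income, c(0.25, 0.75))

# The range depends only on the two most extreme values.
range(income)
```

Comparing `IQR(income)` with `diff(range(income))` shows how much a few extreme values can inflate the range.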
mydata_clean %>%
  distinct(SERIALNO, new_FS, HINCP) %>%
  ggplot(aes(x = new_FS, y = HINCP)) +
    geom_boxplot(outlier.shape = NA) +
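    # Zoom with the coordinate system: ylim() would drop rows outside the
    # limits before the quartiles are computed.
    coord_flip(ylim = c(0, 400000))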
  1. Do you think the distribution of HINCP is the same or different for people who are on or not on food stamps? Cite as much evidence as possible from the graphs and tables above.

Question 2

Here is a table for the variable JWTR_new which indicates how someone got to work.

mydata_clean %>%
  distinct(SERIALNO, JWTR_new) %>%
  tabyl(JWTR_new) %>%
  adorn_pct_formatting(digits=0) %>%
  kable()
  1. What sort of variable is JWTR_new? Explain how you know.
  • 'JWTR_new' is a categorical variable because the answers are in words.
  1. What does the 46% of row NA mean?
  • 'NA' means the value is missing, so 46% of the rows in the table have no recorded answer for how the person got to work.
  1. What is the difference between the percent and valid_percent columns?
  • Valid percent is the percent when missing data are excluded from the calculations.
  1. What does the 5% of row Bus or trolley bus mean?
  • 5% of people take a bus or trolley bus to get to work.
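
The difference between the two percentage columns comes down to the denominator. The sketch below reproduces the arithmetic in base R on a made-up vector named `commute` (hypothetical answers, not the ACS data); it mirrors what the percent and valid_percent columns of `tabyl()` report when a variable contains NA.

```r
# Hypothetical commute answers; NA marks a missing answer.
commute <- c("Car", "Car", "Bus", NA, "Car", NA, "Bus", "Walk", NA, "Car")

# Count every category, including the missing answers.
counts <- table(commute, useNA = "ifany")

# percent: every row, including NA, is in the denominator.
percent <- counts / length(commute)

# valid_percent: only the non-missing rows are in the denominator.
valid <- table(commute)
valid_percent <- valid / sum(!is.na(commute))

round(percent, 2)
round(valid_percent, 2)
```

Here "Car" is 4 of all 10 rows but 4 of the 7 non-missing rows, so its valid percent is larger than its percent.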

Question 3

Read the document Cell-Phone-Student.md in the Affective-Domain\Cell-Phones directory. Answer the questions below.

  1. Do you think your cell phone habits are hindering your attention and focus? Explain. This should be a paragraph or two long.
  • Yes, I definitely think my cell phone habits are hindering my attention and focus. Whenever I'm working on a homework assignment, I sometimes drift off and start browsing social media on my phone, and before I notice, 30 minutes have already gone by. It's a terrible habit, but I am working on it.
  1. Do you intend to make any changes to how you use your cellphone? If so, what changes? This should be a paragraph or two long.
  • Yes, I do intend to make changes. When working on a homework assignment or writing a paper, I set my phone face down on the other side of the room, or I simply turn it off. This way I don't have to worry about seeing notifications pop up, and I stay focused on my task. The music I listen to can also be a little distracting because it's too lyrical and I end up singing along. To switch it up, lo-fi beats or instrumentals would really help.

Question 4

What is "sampling bias"? Explain using proper terminology and craft your example to explain how it can affect the outcome of a statistical study. This should be several paragraphs long.

  • Sampling bias means that the sample collected to study a population is selected in a way that does not represent that population truthfully. A sample is biased when its members are SELECTIVELY chosen instead of RANDOMLY chosen. In an unbiased sample, who ends up in the sample should be determined by chance alone.

  • For example: if we wanted to predict the outcome of an election, we could poll 1000 voters and ask them who they plan to vote for. To get an accurate representation, the sample needs to reflect the views of the whole population, including elderly people, young voters, middle-aged voters, ethnic minorities, rich people, etc. If the poll reached only one of these groups, for instance only young voters, its result could differ systematically from the actual vote. True representation means that no group is left out of the sampling.

  • The most effective method to avoid sampling bias is to take a random sample. This gives every member of the population the same odds of being chosen as a participant in the study.
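
The election example can be made concrete with a small simulation. This is only a sketch under invented assumptions: the population size, the share of young voters, and the support rates for a hypothetical "candidate A" are all made up to show how a selectively chosen sample can miss the true value while a random sample lands close to it.

```r
set.seed(2020)  # Arbitrary seed for reproducibility.

# Hypothetical population of 100,000 voters: 30,000 young and 70,000 older.
# Assumed support for candidate A: 80% of young voters, 40% of older voters.
population <- data.frame(
  group = rep(c("young", "older"), times = c(30000, 70000)),
  supports_A = c(rep(c(1, 0), times = c(24000, 6000)),
                 rep(c(1, 0), times = c(28000, 42000)))
)

# True proportion supporting A: (24000 + 28000) / 100000 = 0.52
true_support <- mean(population$supports_A)

# Biased sample: 1000 people are polled, but only young voters are reached.
young_rows <- which(population$group == "young")
biased_sample <- population[sample(young_rows, 1000), ]

# Random sample: every voter has the same chance of being polled.
random_sample <- population[sample(nrow(population), 1000), ]

mean(biased_sample$supports_A)
mean(random_sample$supports_A)
true_support
```

Both samples have the same size, so the gap between the two estimates comes from how the voters were selected, not from how many were polled.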