Random numbers and random sampling
This worksheet is an interactive, guided module for learning the basics of generating several types of random numbers and of random sampling. It assumes you know how to read datafiles and produce a dataframe containing your data.
Let us begin by learning how to generate random numbers. Two of the most useful types of random numbers are those that are: (1) uniformly distributed in some specified interval; and (2) normally distributed, with specified mean and SD.
Example: Shown below are several variants of commands to generate both types of random numbers. The basic command for uniformly distributed numbers is runif(). Likewise, the command for normally distributed numbers is rnorm(). Note that everything following a "#" sign is a comment that explains what is going on; R ignores anything that follows a "#" sign.
- 2.26215060451068
- 0.216705296188593
- 0.640162804629654
- 2.86498950328678
- 2.8263787983451
- 0.851491509238258
- 0.871844316134229
- 0.36544156447053
- 0.701308553805575
- 11.0674621077608
- 13.007689296916
- 12.9938812848226
- 11.9787983835207
- 14.1956856764494
- 16.7709320458723
- 0.300196825364775
- -0.655895508767473
- -0.399298374172015
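The commands that produced the numbers above are not shown in this text. A minimal sketch of typical runif() and rnorm() calls follows. The sample sizes, interval limits, mean, and SD used here are illustrative assumptions, not the original values, and set.seed() is added only so that the run can be repeated.

```r
set.seed(1)   # assumption: fix the seed so the same numbers come out each run

# Uniformly distributed random numbers
u_default <- runif(3)                      # 3 numbers in the default interval [0, 1]
u_range   <- runif(6, min = 10, max = 17)  # 6 numbers between 10 and 17 (illustrative limits)

# Normally distributed random numbers
z_default <- rnorm(3)                      # 3 numbers with the default mean = 0 and sd = 1
z_custom  <- rnorm(5, mean = 50, sd = 4)   # 5 numbers with mean 50 and SD 4 (illustrative values)

print(u_default)
print(u_range)
print(z_default)
print(z_custom)
```

The first argument of both functions is always the number of values wanted; the remaining arguments describe the distribution.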
How about discrete random numbers?
Notice that all the above examples produce numbers that are continuously distributed across their range. Such numbers usually contain decimals and rarely turn out to be nice, round numbers. What if we wanted random integers, say, 12 of them, lying between a chosen lower limit and 23? The command for doing that is sample(), as shown in the following examples.
- 19
- -3
- 2
- 1
- 20
- -5
- 10
- 21
- 18
- 14
- 1
- 7
- 13
- 20
- -4
- 16
- 0
- -3
- 14
- 17
- 19
- 18
- 10
- 7
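A sketch of how sample() can draw random integers is given here. The lower limit of -5 is an assumption (the output listed above goes as low as -5), and the seed is set only for repeatability.

```r
set.seed(2)   # assumption: seed used only so the run can be repeated

# 12 distinct integers between -5 and 23
# (sampling without replacement is the default, so no value repeats)
ints_unique <- sample(-5:23, 12)

# 12 integers from the same range, where repeated values are allowed
ints_repeat <- sample(-5:23, 12, replace = TRUE)

print(ints_unique)
print(ints_repeat)
```

Whether to set replace = TRUE depends on the question being asked: drawing lottery numbers needs distinct values, while rolling a die repeatedly does not.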
The sample command can also be used to randomly
pick from categorical variables. Some examples follow.
- 'H'
- 'H'
- 'T'
[1] "T" "H" "T"
[1] "H" "H" "H"
[1] "H" "T" "T"
[1] "H" "H" "H"
[1] "T" "T" "T"
[1] "T" "H" "H"
[1] "T" "H" "T"
[1] "T" "H" "H"
[1] "T" "H" "H"
[1] "T" "H" "T"
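As a sketch of picking from categories, the following simulates coin tosses with sample(). The vector name coin, the number of tosses, and the probabilities of the biased coin are illustrative choices.

```r
set.seed(3)   # assumption: seed used only so the run can be repeated

coin <- c("H", "T")   # the two categories to pick from

# Toss a fair coin 3 times; replace = TRUE is required because we draw
# more values than there are categories
tosses <- sample(coin, 3, replace = TRUE)
print(tosses)

# Repeat the 3-toss experiment several times with a for loop
for (i in 1:4) {
  print(sample(coin, 3, replace = TRUE))
}

# A biased coin: prob gives the chance of each category, in the order of coin
biased <- sample(coin, 10, replace = TRUE, prob = c(0.8, 0.2))
print(table(biased))
```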
How to pick random samples from datafiles
Another very important use of the sample command is to pick random samples from data sets. R provides relatively straightforward ways to pick simple random samples and stratified random samples from dataframes.
Example: The file "test_csv_file.csv" contains data on the employment status, work hours, age, etc., of a group of university students. The following examples show how to get a simple random sample (SRS) and a stratified random sample from this dataset.
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
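Simple random sample of 7 rows (the first column is the row number in the original dataframe):
(row) | Response_id | Age | Gender | Employment.Status | Work.Hours |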
---|---|---|---|---|---|
5 | 164457 | 34 | Female | Unemployed | 0.0 |
16 | 165440 | 21 | Female | Part Time | 15.0 |
15 | 165417 | 21 | Male | Part Time | 20.0 |
27 | 166391 | 18 | Female | Part Time | 25.0 |
28 | 166397 | 33 | Female | Full Time | 37.5 |
23 | 166105 | 41 | Male | Full Time | 50.0 |
22 | 165932 | 21 | Female | Unemployed | 0.0 |
Stratified random sample, 3 rows from each employment status:
Response_id | Age | Gender | Employment.Status | Work.Hours |
---|---|---|---|---|
165793 | 56 | Male | Full Time | 40 |
164573 | 23 | Female | Full Time | 36 |
166415 | 37 | Male | Full Time | 40 |
164417 | 21 | Male | Part Time | 10 |
166389 | 38 | Female | Part Time | 20 |
165417 | 21 | Male | Part Time | 20 |
165345 | 21 | Female | Unemployed | 0 |
165638 | 32 | Female | Unemployed | 0 |
165056 | 30 | Female | Unemployed | 0 |
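A sketch of both kinds of sampling follows. Because "test_csv_file.csv" is not reproduced here, the code builds a small made-up dataframe named students with two of the same column names; with the real file you would start from read.csv("test_csv_file.csv") instead. The package message above shows that dplyr was loaded, so a dplyr variant is included as well, and it runs only if dplyr is installed. The exact commands used originally are not known.

```r
set.seed(4)   # assumption: seed used only so the run can be repeated

# Hypothetical stand-in for the real data file
students <- data.frame(
  Response_id       = 1001:1030,
  Employment.Status = rep(c("Full Time", "Part Time", "Unemployed"), each = 10),
  stringsAsFactors  = FALSE
)

# Simple random sample: choose 7 row numbers at random, then keep those rows
srs <- students[sample(nrow(students), 7), ]
print(srs)

# Stratified random sample in base R: split by stratum, sample 3 rows
# within each piece, then stack the pieces back together
pick3 <- function(g) g[sample(nrow(g), 3), ]
strat <- do.call(rbind, lapply(split(students, students$Employment.Status), pick3))
print(strat)

# The same idea with dplyr, if it is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  strat_dplyr <- dplyr::sample_n(dplyr::group_by(students, Employment.Status), 3)
  print(strat_dplyr)
}
```

Sampling row numbers and then indexing keeps the original row numbers in the result, which is why the simple random sample shows an extra leading column.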
Exercise 1:
- Generate 9 uniformly distributed random numbers in the range [2, 5].
- Show that the "runif" function does, in fact, produce a uniform distribution of random numbers by plotting a histogram of 500 numbers in the range [2, 5].
- Generate 50 normally distributed random numbers with mean=4.56 and SD=2. Plot a histogram showing your results.
- Toss a fair coin 50 times (using R) and count the number of heads.
Exercise 2:
The file "grades.csv" contains data on midterm scores
and class year for a group of college students. Read the
file and create a dataframe.
- Pick a simple random sample of size 12 from this dataset.
- Next, pick a stratified random sample containing 3 students from each class year.