Introduction to Data Analysis with Computing

Lecture 2: Introduction to R; Types of Data

Today's Goals and Topics:

Basic computing and arithmetic in R
Lists and Data Frames
Types of Data

Part 1: Basic computing in R

1.1. Arithmetic in R

We can do basic computations in R, such as adding, subtracting, multiplying, and dividing numbers:

12 + 3

12 * 3

12 * 3 - 20 / 4

Order of Operations

(12 * 3 - 20) / 4

3^2
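R follows the usual mathematical order of operations: exponentiation first, then multiplication and division, then addition and subtraction, with parentheses overriding the default. The short sketch below works through the expressions above step by step. The names without_parens and with_parens are made up for this illustration.

```r
# Multiplication and division are done before subtraction:
# 12 * 3 = 36 and 20 / 4 = 5, so the result is 36 - 5
without_parens <- 12 * 3 - 20 / 4
print(without_parens)   # [1] 31

# Parentheses are evaluated first: (36 - 20) / 4 = 16 / 4
with_parens <- (12 * 3 - 20) / 4
print(with_parens)      # [1] 4

# Exponentiation is written with ^ (R also accepts **) and is
# done before multiplication: 3^2 = 9, then 2 * 9
print(2 * 3^2)          # [1] 18
```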

1.2. Names

Sometimes, we would like to give names to the quantities that we are working with:

price_of_pie <- 20
number_of_people <- 4
cost_per_person <- price_of_pie / number_of_people

To display the value stored in a name, we simply type the name:

cost_per_person

That is, names are "labels" or "placeholders" or "storage units". We can store not just numbers but also text. Make sure to surround the text to be stored with quotation marks (the examples here use single quotes):

student1 <- 'Alex Smith'
student2 <- 'Bob Singh'
student3 <- 'Chen Zhang'

student1

1.3. Functions

R allows us to do a lot of things using "functions". We can think of functions in R as "verbs" which we can use to tell R to do a particular task. Just as some verbs in English must be followed by a noun ("transitive verbs") and some don't, some functions in R must take a particular object or input (often called an "argument").

Let's start with a simple function: the print() function. Its use is to print the content of a name. For example:

print(cost_per_person)
print(price_of_pie)
print(number_of_people)

[1] 5
[1] 20
[1] 4

Contrast the output above with the output of the cell below, where print() was not used:

cost_per_person
price_of_pie
number_of_people

Notice that the "noun" (the object that the function acts upon) is placed inside the pair of parentheses that comes directly after the function name, with no space between the function name and the opening parenthesis.

New Functions. Here are a couple of other R functions that help us do arithmetic:

sqrt(): takes the square root of a number
abs(): takes the absolute value of a number
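A brief sketch of the two functions in use. The names side_length and distance_from_zero are chosen only for this illustration.

```r
# sqrt() takes the square root of a non-negative number
side_length <- sqrt(49)
print(side_length)          # [1] 7

# abs() drops the sign of a number
distance_from_zero <- abs(-12.5)
print(distance_from_zero)   # [1] 12.5

# Functions can be nested: the inner call is evaluated first
print(sqrt(abs(-16)))       # [1] 4
```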

1.4. Comments

As your R code becomes more and more involved, it is important to make sure that you and others understand exactly what the code does. To do this, we can add explanations (in English) that R ignores when it runs the code. These explanations are called "comments". For example:

price_of_pie <- 20
number_of_people <- 4
# To compute cost per person, divide the price of pie by the number of people:
cost_per_person <- price_of_pie / number_of_people

In the above cell, any text to the right of the # sign is ignored by R. Any text that is preceded by # is a comment.

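Part 2: Grouping values together: Lists and Data Frames

2.1. Lists of Values

Sometimes, we need to work not just with one number but with a collection of numbers. In these notes, we call such a collection a "list". (Strictly speaking, R calls a collection created with c() a "vector" and reserves the word "list" for a more general kind of object; the informal term is used here for simplicity.)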

height_Alex <- 72
height_Bob <- 65
height_Chen <- 59

A New Function. We use the function c() to concatenate (i.e., to chain together) several different values into one object. See the example below, where we store the heights of the three students into one list, which we name height:

height <- c(height_Alex, height_Bob, height_Chen)
print(height)

[1] 72 65 59

Exercise. Create a list of all integers from 1 to 10 and name this list integers10. Then, print the contents of this list.

integers10 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
print(integers10)

 [1]  1  2  3  4  5  6  7  8  9 10

A New R Command. Here is a second way to create a list containing consecutive integers: firstInteger:lastInteger.

For example, instead of using c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) to create a list of all integers from 1 to 10, we could have created the same list using the following command: 1:10, which is more concise. Try it below and name this list integers10_version2:

integers10_version2 <- ...

This is particularly useful if you want to create a very long list. For example, if we want to create a list of all integers from -100 to 100:

my_list <- -100:100
print(my_list)

  [1] -100  -99  -98  -97  -96  -95  -94  -93  -92  -91  -90  -89  -88  -87  -86
 [16]  -85  -84  -83  -82  -81  -80  -79  -78  -77  -76  -75  -74  -73  -72  -71
 [31]  -70  -69  -68  -67  -66  -65  -64  -63  -62  -61  -60  -59  -58  -57  -56
 [46]  -55  -54  -53  -52  -51  -50  -49  -48  -47  -46  -45  -44  -43  -42  -41
 [61]  -40  -39  -38  -37  -36  -35  -34  -33  -32  -31  -30  -29  -28  -27  -26
 [76]  -25  -24  -23  -22  -21  -20  -19  -18  -17  -16  -15  -14  -13  -12  -11
 [91]  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1    2    3    4
[106]    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
[121]   20   21   22   23   24   25   26   27   28   29   30   31   32   33   34
[136]   35   36   37   38   39   40   41   42   43   44   45   46   47   48   49
[151]   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64
[166]   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79
[181]   80   81   82   83   84   85   86   87   88   89   90   91   92   93   94
[196]   95   96   97   98   99  100
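One way to check that the colon operator produced what we expect is to count the values. The sketch below uses length(), a base R function not otherwise covered in these notes, and also shows that the operator counts downward when the first number is larger than the second. The name countdown is made up for this illustration.

```r
# The colon operator counts by 1 from the first value to the last
my_list <- -100:100

# length() reports how many values are stored:
# 100 negative integers, zero, and 100 positive integers
print(length(my_list))   # [1] 201

# If the first value is larger than the last, the sequence counts down
countdown <- 5:1
print(countdown)         # [1] 5 4 3 2 1
```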

my_list2 <- c(my_list, height)
print(my_list2)

  [1] -100  -99  -98  -97  -96  -95  -94  -93  -92  -91  -90  -89  -88  -87  -86
 [16]  -85  -84  -83  -82  -81  -80  -79  -78  -77  -76  -75  -74  -73  -72  -71
 [31]  -70  -69  -68  -67  -66  -65  -64  -63  -62  -61  -60  -59  -58  -57  -56
 [46]  -55  -54  -53  -52  -51  -50  -49  -48  -47  -46  -45  -44  -43  -42  -41
 [61]  -40  -39  -38  -37  -36  -35  -34  -33  -32  -31  -30  -29  -28  -27  -26
 [76]  -25  -24  -23  -22  -21  -20  -19  -18  -17  -16  -15  -14  -13  -12  -11
 [91]  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1    2    3    4
[106]    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
[121]   20   21   22   23   24   25   26   27   28   29   30   31   32   33   34
[136]   35   36   37   38   39   40   41   42   43   44   45   46   47   48   49
[151]   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64
[166]   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79
[181]   80   81   82   83   84   85   86   87   88   89   90   91   92   93   94
[196]   95   96   97   98   99  100   72   65   59

We can also store text data in a list.

student_names <- c(student1, student2, student3)
print(student_names)

[1] "Alex Smith" "Bob Singh"  "Chen Zhang"

2.2. Data Frames

Data frames are basically tables, or spreadsheets, of data. Each column of a data frame corresponds to a "variable"; each row corresponds to one observation (one individual).

weight_Alex <- 150
weight_Bob <- 180
weight_Chen <- 110
weight <- c(weight_Alex, weight_Bob, weight_Chen)
print(weight)

[1] 150 180 110

studentdata <- as.data.frame(cbind(weight, height))
print(studentdata)

  weight height
1    150     72
2    180     65
3    110     59

Note that the names of the two lists (weight and height) are now the names of the two columns in the data frame.
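As a side note, cbind() first builds a matrix, and every entry of a matrix must have the same type, so binding a list of numbers with a list of text would turn the numbers into text. Base R's data.frame() function builds the table directly and lets each column keep its own type. A minimal sketch reusing the values above (the names studentdata_alt and mixed are made up for this illustration):

```r
weight <- c(150, 180, 110)
height <- c(72, 65, 59)
student_names <- c('Alex Smith', 'Bob Singh', 'Chen Zhang')

# data.frame() takes the columns directly; as with cbind(),
# the names of the inputs become the column names
studentdata_alt <- data.frame(weight, height)
print(names(studentdata_alt))   # [1] "weight" "height"

# Mixing numbers and text in cbind() converts everything to text
mixed <- cbind(weight, student_names)
print(is.character(mixed))      # [1] TRUE
```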

names(studentdata) # the function names() displays the names of the columns of a data frame

row.names(studentdata) # the function row.names() displays the names of the rows of a data frame

row.names(studentdata) <- student_names
studentdata

Accessing a column of a data frame

Each column of a data frame is simply a list! To obtain a list containing just one column of a data frame, we use the $ symbol followed by the name of the column.

studentdata$height
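Once a column has been pulled out with $, it behaves like any other collection of values. The sketch below rebuilds a small version of the data frame so that it runs on its own; max() and mean() are base R functions not otherwise covered in these notes, and the name heights is made up for this illustration.

```r
studentdata <- data.frame(weight = c(150, 180, 110), height = c(72, 65, 59))

# $ pulls out one column as a plain collection of values
heights <- studentdata$height
print(heights)         # [1] 72 65 59

# The extracted column can be passed to functions like any other collection
print(max(heights))    # [1] 72
print(mean(heights))   # average height of the three students
```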

2.2.1. Adding New Columns to a data frame

Last lecture, we had an example of a student data set that contains weight, height, major, and whether students have taken "Text and Ideas". Let's add the majors and "have taken text and ideas" columns into this data frame.

To create a new column, simply type the data frame name, followed by the $ symbol and the new column name; then, store the values of the new column there.

studentdata$majors <- c('Music', 'Psychology', 'Linguistics')

studentdata

studentdata$haveTakenTextAndIdeas <- c('Yes', 'Yes', 'No')
studentdata

Note that the first two columns of the studentdata data frame contain numerical data, whereas the last two columns contain text data.

We will talk about different data types in more detail in a bit. However, this is a good chance to introduce a new function:

A New Function. The class() function tells us the type of data that a particular name represents.

For example, using class(), we will find that

studentdata is a data frame
studentdata$weight is a list containing numbers, so it is numerical data
studentdata$majors is a list containing text. In R, text data is called "character" (because text consists of characters)

class(studentdata)

class(studentdata$weight)

class(studentdata$majors)
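For reference, here is a self-contained version of the three calls together with the values R reports. The data frame is rebuilt with data.frame() so that the block runs on its own.

```r
studentdata <- data.frame(weight = c(150, 180, 110), height = c(72, 65, 59))
studentdata$majors <- c('Music', 'Psychology', 'Linguistics')

print(class(studentdata))          # [1] "data.frame"
print(class(studentdata$weight))   # [1] "numeric"
print(class(studentdata$majors))   # [1] "character"
```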

2.2.2. Built-In Datasets

R comes with some datasets that are ready for us to explore. One such built-in dataset is the women dataset.

women

head(women, 5)

dim(women)
row.names(women)

We saw that we can put together lists of the same length into a data frame. We can also (1) extract a column of a data frame to get a list, and (2) extract just one entry of that column to get a single number:

print(women$height)

 [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72

women$height
women$height[3]
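Square brackets pick out entries by position. The sketch below shows a few variations on the built-in women dataset; the name third_height is made up for this illustration.

```r
# women is a built-in dataset, so no setup is needed
third_height <- women$height[3]
print(third_height)          # [1] 60

# The same entry, addressed by row number and column name
print(women[3, 'height'])    # [1] 60

# Several entries at once: positions 1 through 3
print(women$height[1:3])     # [1] 58 59 60
```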

New Functions. Here is a summary of new functions that are useful for examining and working with data frames:

as.data.frame(cbind()): to "bind" lists together to form the columns of a new data frame
names(): to find out the names of the columns of a data frame. It returns a list containing the names of the columns
row.names(): to find out the names of the rows of a data frame
dim(): to find the number of rows and columns of a data frame (that is, to find the "dimension" of the data frame)
head(): to display the first few rows of a data frame. It takes two arguments: the name of the data frame and the number of rows to be displayed
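The functions in this summary can be tried out on the built-in women dataset, as in the following short sketch:

```r
print(dim(women))               # [1] 15  2   (15 rows, 2 columns)
print(names(women))             # [1] "height" "weight"
print(row.names(women)[1:3])    # [1] "1" "2" "3"

# head() with a second argument of 3 shows only the first three rows
first_rows <- head(women, 3)
print(first_rows)
```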

Part 3: Types of Data

3.1. Numerical Data

We saw an example of numerical data in the women dataset, which contains the weight and height of 15 women in the US. Weight and height are both numbers.

Some numerical data are integers, some are decimals or fractions.

head(women, 5)

class(women)

class(women$height)

3.2. Character/Text Data

This is data that consists of text. For example, in the studentdata data set above, the students' majors are text data:

studentdata
class(studentdata$majors)

3.3. Categorical Data

Some data are "categorical". For example, in the studentdata dataset above, majors contains text data. However, we could think of it as a category as well: each student falls into one of a number of possible categories. Sometimes, it is a good idea to tell R explicitly that a given set of text data actually represents categories rather than arbitrary strings of letters.

A New Function. We can tell R explicitly that a column's text data is actually categorical using factor(), as follows:

factor(studentdata$majors)
class(factor(studentdata$majors))

Note that while studentdata$majors is text data, factor(studentdata$majors) treats the different texts/words as categories.

Suppose that it is useful to think of majors as categories as opposed to arbitrary strings of letters. We can replace studentdata$majors with factor(studentdata$majors):

# Replace the text data stored in the `majors` column with its categorical (factor) version
studentdata$majors <- factor(studentdata$majors)
class(studentdata$majors)

Another example: In the chickwts dataset below, we record the weight as well as the type of feed given to each chicken. The weight column contains numbers, but the feed column contains the type (i.e., the category) of feed. In this particular dataset, one category of feed is horsebean.

head(chickwts, 5)

We might wonder, how many different categories of feeds are there in this data set? That is, can we quickly find out what are the other possible types of feed given to the chickens in this data set?

We could do this using the function levels(dataframe$columnname), as follows:

levels(chickwts$feed)

As you can see above, there are six categories of feed.

While it might not be obvious yet why we care about the distinction between text data and categorical data, keep in mind that this distinction is important. The reasons will become clearer as we work with more examples and datasets.
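One concrete place where the distinction shows up is in summaries: R can count observations per category when it knows the data is categorical. The sketch below uses nlevels(), table(), and summary(), which are base R functions not otherwise covered in these notes; the name feed_counts is made up for this illustration.

```r
# nlevels() counts the categories; levels() lists them
print(nlevels(chickwts$feed))   # [1] 6

# table() counts how many chickens received each type of feed
feed_counts <- table(chickwts$feed)
print(feed_counts)

# summary() of categorical data gives a count per category, whereas
# summary() of plain text only reports its length and type
print(summary(chickwts$feed))
print(summary(as.character(chickwts$feed)))
```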

3.4. Logical Data

Logical data are data whose values are either TRUE or FALSE.

For example, in our studentdata data set, the column on whether each student has taken "Text and Ideas" contains true/false information (recorded as "Yes" or "No").

studentdata$haveTakenTextAndIdeas
class(studentdata$haveTakenTextAndIdeas)

Currently, the 'Yes' and 'No' values are treated as plain text data. To tell R to treat them as logical data, let's replace each 'Yes' with TRUE and each 'No' with FALSE:

studentdata$haveTakenTextAndIdeas <- c(TRUE, TRUE, FALSE)
studentdata

class(studentdata$haveTakenTextAndIdeas)

(Again, although it might not be obvious why it is useful to replace the "Yes" and "No" values with TRUE and FALSE, the distinction between text and logical data is important. When these values are stored as logical data, we can do more with them than if they were simply text.)
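As a small illustration of what logical data makes possible, the sketch below converts the text values with a comparison instead of retyping them, and then counts the TRUE values. The names taken_text and taken_logical are made up for this illustration.

```r
taken_text <- c('Yes', 'Yes', 'No')

# Comparing each entry with 'Yes' produces TRUE/FALSE values,
# so the conversion does not have to be typed by hand
taken_logical <- taken_text == 'Yes'
print(taken_logical)         # [1]  TRUE  TRUE FALSE
print(class(taken_logical))  # [1] "logical"

# TRUE counts as 1 and FALSE as 0 in arithmetic
print(sum(taken_logical))    # [1] 2   (students who have taken the course)
print(mean(taken_logical))   # proportion of students who have taken the course
```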

Examples

Now that we are more acquainted with how R works and how various types of data can be stored in R, let's look at an example of a real (and large) data set.

The `nycflights13` package and dataset

This dataset contains data on ALL flights that departed from one of the three NYC-area airports (JFK, LaGuardia, and Newark) in the year 2013. We will do a bit of exploration of this dataset using the tools that we learned today.

We first need to install the package and load it so that R can access it and work with it.

install.packages('nycflights13')

Installing package into ‘/home/user/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)

library('nycflights13')

The package nycflights13 contains a data frame called flights:

flights

Hmm, this data frame looks huge. Let's see how many rows and columns it has using dim().

dim(flights)

It looks like, among the nineteen columns, some contain numerical data and some contain text data. Let's check the data types of the various columns using class():

class(flights$arr_time)
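Checking columns one at a time gets tedious for a wide data frame. One possible shortcut is sapply(), a base R function not otherwise covered in these notes, which applies a function to every column. The sketch below uses the built-in chickwts dataset so that it runs without installing anything; the same pattern should carry over to flights, although a column that has more than one class would change the shape of the result. The name column_types is made up for this illustration.

```r
# sapply() applies class() to every column of the data frame
column_types <- sapply(chickwts, class)

# The result is labelled by column name:
# weight is "numeric" and feed is "factor"
print(column_types)
```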

tail(flights, 5)

airlines

df <- merge(flights, airlines, by="carrier")

head(df)

names(df)[20] <- 'airline'

head(df)
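The merge() call above attaches the full airline name to every flight by matching on the shared carrier column. The sketch below shows the same idea on two tiny hand-typed tables so that the matching is easy to follow; flights_small, airlines_small, and merged are made-up names for this illustration. It also renames the new column by its name rather than by its position, which is one way to avoid depending on the column being the twentieth.

```r
# Two small tables that share a 'carrier' column
flights_small <- data.frame(
  carrier = c('UA', 'AA', 'UA'),
  flight  = c(1545, 1141, 1714)
)
airlines_small <- data.frame(
  carrier = c('AA', 'UA'),
  name    = c('American Airlines Inc.', 'United Air Lines Inc.')
)

# merge() matches rows whose 'carrier' values agree and
# attaches the airline name to each flight
merged <- merge(flights_small, airlines_small, by = 'carrier')
print(merged)

# Rename the 'name' column by looking it up by name
names(merged)[names(merged) == 'name'] <- 'airline'
print(names(merged))   # [1] "carrier" "flight"  "airline"
```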

df[order(df$arr_delay), ]
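The last command sorts the rows of df by arrival delay. order() returns the row positions that would put a column in ascending order, and placing that result in the row slot of the square brackets rearranges the whole data frame. A minimal sketch on a small hand-typed table (delays, sorted_delays, and latest_first are made-up names for this illustration):

```r
delays <- data.frame(
  flight    = c(1545, 725, 5274, 1714),
  arr_delay = c(11, -18, NA, 20)
)

# order() gives the row positions in ascending order of arr_delay;
# by default, missing values (NA) are placed last
print(order(delays$arr_delay))   # [1] 2 1 4 3

sorted_delays <- delays[order(delays$arr_delay), ]
print(sorted_delays)

# decreasing = TRUE puts the largest delay first
latest_first <- delays[order(delays$arr_delay, decreasing = TRUE), ]
print(latest_first$flight[1])    # [1] 1714
```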

year	month	day	dep_time	sched_dep_time	dep_delay	arr_time	sched_arr_time	arr_delay	carrier	flight	tailnum	origin	dest	air_time	distance	hour	minute	time_hour
2013	1	1	517	515	2	830	819	11	UA	1545	N14228	EWR	IAH	227	1400	5	15	2013-01-01 05:00:00
2013	1	1	533	529	4	850	830	20	UA	1714	N24211	LGA	IAH	227	1416	5	29	2013-01-01 05:00:00
2013	1	1	542	540	2	923	850	33	AA	1141	N619AA	JFK	MIA	160	1089	5	40	2013-01-01 05:00:00
2013	1	1	544	545	-1	1004	1022	-18	B6	725	N804JB	JFK	BQN	183	1576	5	45	2013-01-01 05:00:00
2013	1	1	554	600	-6	812	837	-25	DL	461	N668DN	LGA	ATL	116	762	6	0	2013-01-01 06:00:00
2013	1	1	554	558	-4	740	728	12	UA	1696	N39463	EWR	ORD	150	719	5	58	2013-01-01 05:00:00
2013	1	1	555	600	-5	913	854	19	B6	507	N516JB	EWR	FLL	158	1065	6	0	2013-01-01 06:00:00
2013	1	1	557	600	-3	709	723	-14	EV	5708	N829AS	LGA	IAD	53	229	6	0	2013-01-01 06:00:00
2013	1	1	557	600	-3	838	846	-8	B6	79	N593JB	JFK	MCO	140	944	6	0	2013-01-01 06:00:00
2013	1	1	558	600	-2	753	745	8	AA	301	N3ALAA	LGA	ORD	138	733	6	0	2013-01-01 06:00:00
2013	1	1	558	600	-2	849	851	-2	B6	49	N793JB	JFK	PBI	149	1028	6	0	2013-01-01 06:00:00
2013	1	1	558	600	-2	853	856	-3	B6	71	N657JB	JFK	TPA	158	1005	6	0	2013-01-01 06:00:00
2013	1	1	558	600	-2	924	917	7	UA	194	N29129	JFK	LAX	345	2475	6	0	2013-01-01 06:00:00
2013	1	1	558	600	-2	923	937	-14	UA	1124	N53441	EWR	SFO	361	2565	6	0	2013-01-01 06:00:00
2013	1	1	559	600	-1	941	910	31	AA	707	N3DUAA	LGA	DFW	257	1389	6	0	2013-01-01 06:00:00
2013	1	1	559	559	0	702	706	-4	B6	1806	N708JB	JFK	BOS	44	187	5	59	2013-01-01 05:00:00
2013	1	1	559	600	-1	854	902	-8	UA	1187	N76515	EWR	LAS	337	2227	6	0	2013-01-01 06:00:00
2013	1	1	600	600	0	851	858	-7	B6	371	N595JB	LGA	FLL	152	1076	6	0	2013-01-01 06:00:00
2013	1	1	600	600	0	837	825	12	MQ	4650	N542MQ	LGA	ATL	134	762	6	0	2013-01-01 06:00:00
2013	1	1	601	600	1	844	850	-6	B6	343	N644JB	EWR	PBI	147	1023	6	0	2013-01-01 06:00:00
2013	1	1	602	610	-8	812	820	-8	DL	1919	N971DL	LGA	MSP	170	1020	6	10	2013-01-01 06:00:00
2013	1	1	602	605	-3	821	805	16	MQ	4401	N730MQ	LGA	DTW	105	502	6	5	2013-01-01 06:00:00
2013	1	1	606	610	-4	858	910	-12	AA	1895	N633AA	EWR	MIA	152	1085	6	10	2013-01-01 06:00:00
2013	1	1	606	610	-4	837	845	-8	DL	1743	N3739P	JFK	ATL	128	760	6	10	2013-01-01 06:00:00
2013	1	1	607	607	0	858	915	-17	UA	1077	N53442	EWR	MIA	157	1085	6	7	2013-01-01 06:00:00
2013	1	1	608	600	8	807	735	32	MQ	3768	N9EAMQ	EWR	ORD	139	719	6	0	2013-01-01 06:00:00
2013	1	1	611	600	11	945	931	14	UA	303	N532UA	JFK	SFO	366	2586	6	0	2013-01-01 06:00:00
2013	1	1	613	610	3	925	921	4	B6	135	N635JB	JFK	RSW	175	1074	6	10	2013-01-01 06:00:00
2013	1	1	615	615	0	1039	1100	-21	B6	709	N794JB	JFK	SJU	182	1598	6	15	2013-01-01 06:00:00
2013	1	1	615	615	0	833	842	-9	DL	575	N326NB	EWR	ATL	120	746	6	15	2013-01-01 06:00:00
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮
2013	9	30	2123	2125	-2	2223	2247	-24	EV	5489	N712EV	LGA	CHO	45	305	21	25	2013-09-30 21:00:00
2013	9	30	2127	2129	-2	2314	2323	-9	EV	3833	N16546	EWR	CLT	72	529	21	29	2013-09-30 21:00:00
2013	9	30	2128	2130	-2	2328	2359	-31	B6	97	N807JB	JFK	DEN	213	1626	21	30	2013-09-30 21:00:00
2013	9	30	2129	2059	30	2230	2232	-2	EV	5048	N751EV	LGA	RIC	45	292	20	59	2013-09-30 20:00:00
2013	9	30	2131	2140	-9	2225	2255	-30	MQ	3621	N807MQ	JFK	DCA	36	213	21	40	2013-09-30 21:00:00
2013	9	30	2140	2140	0	10	40	-30	AA	185	N335AA	JFK	LAX	298	2475	21	40	2013-09-30 21:00:00
2013	9	30	2142	2129	13	2250	2239	11	EV	4509	N12957	EWR	PWM	47	284	21	29	2013-09-30 21:00:00
2013	9	30	2145	2145	0	115	140	-25	B6	1103	N633JB	JFK	SJU	192	1598	21	45	2013-09-30 21:00:00
2013	9	30	2147	2137	10	30	27	3	B6	1371	N627JB	LGA	FLL	139	1076	21	37	2013-09-30 21:00:00
2013	9	30	2149	2156	-7	2245	2308	-23	UA	523	N813UA	EWR	BOS	37	200	21	56	2013-09-30 21:00:00
2013	9	30	2150	2159	-9	2250	2306	-16	EV	3842	N10575	EWR	MHT	39	209	21	59	2013-09-30 21:00:00
2013	9	30	2159	1845	194	2344	2030	194	9E	3320	N906XJ	JFK	BUF	50	301	18	45	2013-09-30 18:00:00
2013	9	30	2203	2205	-2	2339	2331	8	EV	5311	N722EV	LGA	BGR	61	378	22	5	2013-09-30 22:00:00
2013	9	30	2207	2140	27	2257	2250	7	MQ	3660	N532MQ	LGA	BNA	97	764	21	40	2013-09-30 21:00:00
2013	9	30	2211	2059	72	2339	2242	57	EV	4672	N12145	EWR	STL	120	872	20	59	2013-09-30 20:00:00
2013	9	30	2231	2245	-14	2335	2356	-21	B6	108	N193JB	JFK	PWM	48	273	22	45	2013-09-30 22:00:00
2013	9	30	2233	2113	80	112	30	42	UA	471	N578UA	EWR	SFO	318	2565	21	13	2013-09-30 21:00:00
2013	9	30	2235	2001	154	59	2249	130	B6	1083	N804JB	JFK	MCO	123	944	20	1	2013-09-30 20:00:00
2013	9	30	2237	2245	-8	2345	2353	-8	B6	234	N318JB	JFK	BTV	43	266	22	45	2013-09-30 22:00:00
2013	9	30	2240	2245	-5	2334	2351	-17	B6	1816	N354JB	JFK	SYR	41	209	22	45	2013-09-30 22:00:00
2013	9	30	2240	2250	-10	2347	7	-20	B6	2002	N281JB	JFK	BUF	52	301	22	50	2013-09-30 22:00:00
2013	9	30	2241	2246	-5	2345	1	-16	B6	486	N346JB	JFK	ROC	47	264	22	46	2013-09-30 22:00:00
2013	9	30	2307	2255	12	2359	2358	1	B6	718	N565JB	JFK	BOS	33	187	22	55	2013-09-30 22:00:00
2013	9	30	2349	2359	-10	325	350	-25	B6	745	N516JB	JFK	PSE	196	1617	23	59	2013-09-30 23:00:00
2013	9	30	NA	1842	NA	NA	2019	NA	EV	5274	N740EV	LGA	BNA	NA	764	18	42	2013-09-30 18:00:00
2013	9	30	NA	1455	NA	NA	1634	NA	9E	3393	NA	JFK	DCA	NA	213	14	55	2013-09-30 14:00:00
2013	9	30	NA	2200	NA	NA	2312	NA	9E	3525	NA	LGA	SYR	NA	198	22	0	2013-09-30 22:00:00
2013	9	30	NA	1210	NA	NA	1330	NA	MQ	3461	N535MQ	LGA	BNA	NA	764	12	10	2013-09-30 12:00:00
2013	9	30	NA	1159	NA	NA	1344	NA	MQ	3572	N511MQ	LGA	CLE	NA	419	11	59	2013-09-30 11:00:00
2013	9	30	NA	840	NA	NA	1020	NA	MQ	3531	N839MQ	LGA	RDU	NA	431	8	40	2013-09-30 08:00:00

year	month	day	dep_time	sched_dep_time	dep_delay	arr_time	sched_arr_time	arr_delay	carrier	flight	tailnum	origin	dest	air_time	distance	hour	minute	time_hour
2013	9	30	NA	1455	NA	NA	1634	NA	9E	3393	NA	JFK	DCA	NA	213	14	55	2013-09-30 14:00:00
2013	9	30	NA	2200	NA	NA	2312	NA	9E	3525	NA	LGA	SYR	NA	198	22	0	2013-09-30 22:00:00
2013	9	30	NA	1210	NA	NA	1330	NA	MQ	3461	N535MQ	LGA	BNA	NA	764	12	10	2013-09-30 12:00:00
2013	9	30	NA	1159	NA	NA	1344	NA	MQ	3572	N511MQ	LGA	CLE	NA	419	11	59	2013-09-30 11:00:00
2013	9	30	NA	840	NA	NA	1020	NA	MQ	3531	N839MQ	LGA	RDU	NA	431	8	40	2013-09-30 08:00:00

carrier	name
9E	Endeavor Air Inc.
AA	American Airlines Inc.
AS	Alaska Airlines Inc.
B6	JetBlue Airways
DL	Delta Air Lines Inc.
EV	ExpressJet Airlines Inc.
F9	Frontier Airlines Inc.
FL	AirTran Airways Corporation
HA	Hawaiian Airlines Inc.
MQ	Envoy Air
OO	SkyWest Airlines Inc.
UA	United Air Lines Inc.
US	US Airways Inc.
VX	Virgin America
WN	Southwest Airlines Co.
YV	Mesa Airlines Inc.

carrier	year	month	day	dep_time	sched_dep_time	dep_delay	arr_time	sched_arr_time	arr_delay	flight	tailnum	origin	dest	air_time	distance	hour	minute	time_hour	name
9E	2013	2	5	827	830	-3	1032	1023	9	4220	N8698A	JFK	RDU	78	427	8	30	2013-02-05 08:00:00	Endeavor Air Inc.
9E	2013	8	23	1901	1905	-4	2051	2103	-12	3360	N926XJ	JFK	PIT	61	340	19	5	2013-08-23 19:00:00	Endeavor Air Inc.
9E	2013	6	2	805	810	-5	949	1027	-38	3538	N925XJ	JFK	MSP	145	1029	8	10	2013-06-02 08:00:00	Endeavor Air Inc.
9E	2013	10	26	2139	1935	124	2358	2145	133	3470	N928XJ	JFK	CVG	102	589	19	35	2013-10-26 19:00:00	Endeavor Air Inc.
9E	2013	7	7	NA	2030	NA	NA	2156	NA	4218	NA	JFK	PHL	NA	94	20	30	2013-07-07 20:00:00	Endeavor Air Inc.
9E	2013	2	18	1459	1505	-6	1621	1637	-16	3393	N910XJ	JFK	DCA	46	213	15	5	2013-02-18 15:00:00	Endeavor Air Inc.

carrier	year	month	day	dep_time	sched_dep_time	dep_delay	arr_time	sched_arr_time	arr_delay	flight	tailnum	origin	dest	air_time	distance	hour	minute	time_hour	airline
9E	2013	2	5	827	830	-3	1032	1023	9	4220	N8698A	JFK	RDU	78	427	8	30	2013-02-05 08:00:00	Endeavor Air Inc.
9E	2013	8	23	1901	1905	-4	2051	2103	-12	3360	N926XJ	JFK	PIT	61	340	19	5	2013-08-23 19:00:00	Endeavor Air Inc.
9E	2013	6	2	805	810	-5	949	1027	-38	3538	N925XJ	JFK	MSP	145	1029	8	10	2013-06-02 08:00:00	Endeavor Air Inc.
9E	2013	10	26	2139	1935	124	2358	2145	133	3470	N928XJ	JFK	CVG	102	589	19	35	2013-10-26 19:00:00	Endeavor Air Inc.
9E	2013	7	7	NA	2030	NA	NA	2156	NA	4218	NA	JFK	PHL	NA	94	20	30	2013-07-07 20:00:00	Endeavor Air Inc.
9E	2013	2	18	1459	1505	-6	1621	1637	-16	3393	N910XJ	JFK	DCA	46	213	15	5	2013-02-18 15:00:00	Endeavor Air Inc.

	weight	height	majors
Alex Smith	150	72	Music
Bob Singh	180	65	Psychology
Chen Zhang	110	59	Linguistics

height	weight
58	115
59	117
60	120
61	123
62	126
63	129
64	132
65	135
66	139
67	142
68	146
69	150
70	154
71	159
72	164

weight	feed
179	horsebean
160	horsebean
136	horsebean
227	horsebean
217	horsebean

	weight	height	majors	haveTakenTextAndIdeas
Alex Smith	150	72	Music	TRUE
Bob Singh	180	65	Psychology	TRUE
Chen Zhang	110	59	Linguistics	FALSE

	carrier	year	month	day	dep_time	sched_dep_time	dep_delay	arr_time	sched_arr_time	arr_delay	flight	tailnum	origin	dest	air_time	distance	hour	minute	time_hour	airline
322787	VX	2013	5	7	1715	1729	-14	1944	2110	-86	193	N843VA	EWR	SFO	315	2565	17	29	2013-05-07 17:00:00	Virgin America
320759	VX	2013	5	20	719	735	-16	951	1110	-79	11	N840VA	JFK	SFO	316	2586	7	35	2013-05-20 07:00:00	Virgin America
40461	AA	2013	5	6	1826	1830	-4	2045	2200	-75	269	N3KCAA	JFK	SEA	289	2422	18	30	2013-05-06 18:00:00	American Airlines Inc.
257628	UA	2013	5	2	1947	1949	-2	2209	2324	-75	612	N851UA	EWR	LAX	300	2454	19	49	2013-05-02 19:00:00	United Air Lines Inc.
51525	AS	2013	5	4	1816	1820	-4	2017	2131	-74	7	N551AS	EWR	SEA	281	2402	18	20	2013-05-04 18:00:00	Alaska Airlines Inc.
287137	UA	2013	5	2	1926	1929	-3	2157	2310	-73	1628	N24212	EWR	SFO	314	2565	19	29	2013-05-02 19:00:00	United Air Lines Inc.
63392	B6	2013	5	13	657	700	-3	908	1019	-71	671	N805JB	JFK	LAX	290	2475	7	0	2013-05-13 07:00:00	JetBlue Airways
132997	DL	2013	5	6	1753	1755	-2	2004	2115	-71	1394	N3760C	JFK	PDX	283	2454	17	55	2013-05-06 17:00:00	Delta Air Lines Inc.
291801	UA	2013	5	7	2054	2055	-1	2317	28	-71	622	N806UA	EWR	SFO	309	2565	20	55	2013-05-07 20:00:00	United Air Lines Inc.
73875	B6	2013	5	13	1801	1805	-4	2018	2128	-70	217	N663JB	JFK	LGB	295	2465	18	5	2013-05-13 18:00:00	JetBlue Airways
213026	HA	2013	2	11	857	900	-3	1430	1540	-70	51	N389HA	JFK	HNL	601	4983	9	0	2013-02-11 09:00:00	Hawaiian Airlines Inc.
265352	UA	2013	2	26	1721	1725	-4	1936	2046	-70	385	N855UA	EWR	PDX	294	2434	17	25	2013-02-26 17:00:00	United Air Lines Inc.
283072	UA	2013	2	28	702	705	-3	924	1034	-70	963	N831UA	EWR	SNA	306	2434	7	5	2013-02-28 07:00:00	United Air Lines Inc.
284804	UA	2013	2	26	1335	1335	0	1819	1929	-70	15	N76065	EWR	HNL	566	4963	13	35	2013-02-26 13:00:00	United Air Lines Inc.
286098	UA	2013	5	13	1624	1629	-5	1831	1941	-70	789	N855UA	EWR	LAX	290	2454	16	29	2013-05-13 16:00:00	United Air Lines Inc.
313668	US	2013	5	3	616	630	-14	803	913	-70	195	N507AY	JFK	PHX	266	2153	6	30	2013-05-03 06:00:00	US Airways Inc.
322941	VX	2013	1	4	1026	1030	-4	1305	1415	-70	23	N855VA	JFK	SFO	324	2586	10	30	2013-01-04 10:00:00	Virgin America
43629	AA	2013	2	26	1827	1830	-3	2056	2205	-69	269	N3EAAA	JFK	SEA	308	2422	18	30	2013-02-26 18:00:00	American Airlines Inc.
46050	AA	2013	5	13	855	900	-5	1116	1225	-69	1	N328AA	JFK	LAX	299	2475	9	0	2013-05-13 09:00:00	American Airlines Inc.
108686	DL	2013	2	27	1858	1900	-2	2152	2301	-69	1967	N704X	JFK	SFO	329	2586	19	0	2013-02-27 19:00:00	Delta Air Lines Inc.
116263	DL	2013	2	28	1855	1900	-5	2152	2301	-69	1967	N705TW	JFK	SFO	331	2586	19	0	2013-02-28 19:00:00	Delta Air Lines Inc.
285618	UA	2013	5	4	1914	1915	-1	2107	2216	-69	1557	N36447	EWR	LAS	276	2227	19	15	2013-05-04 19:00:00	United Air Lines Inc.
322427	VX	2013	2	26	1022	1030	-8	1306	1415	-69	23	N846VA	JFK	SFO	327	2586	10	30	2013-02-26 10:00:00	Virgin America
323553	VX	2013	5	12	721	730	-9	956	1105	-69	183	N852VA	EWR	SFO	318	2565	7	30	2013-05-12 07:00:00	Virgin America
2781	9E	2013	5	6	1846	1859	-13	2026	2134	-68	3403	N922XJ	JFK	MCI	138	1113	18	59	2013-05-06 18:00:00	Endeavor Air Inc.
11513	9E	2013	8	20	1555	1559	-4	1720	1828	-68	3540	N905XJ	JFK	MSP	133	1029	15	59	2013-08-20 15:00:00	Endeavor Air Inc.
47876	AA	2013	9	7	1550	1600	-10	1757	1905	-68	1156	N3EHAA	LGA	DFW	171	1389	16	0	2013-09-07 16:00:00	American Airlines Inc.
111907	DL	2013	4	30	1440	1445	-5	1711	1819	-68	963	N713TW	JFK	LAX	308	2475	14	45	2013-04-30 14:00:00	Delta Air Lines Inc.
139164	DL	2013	3	1	2014	2020	-6	2220	2328	-68	1729	N694DL	JFK	LAS	283	2248	20	20	2013-03-01 20:00:00	Delta Air Lines Inc.
140618	DL	2013	2	26	1918	1925	-7	2155	2303	-68	6	N3768	JFK	SLC	246	1990	19	25	2013-02-26 19:00:00	Delta Air Lines Inc.
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮
336511	YV	2013	5	23	NA	1735	NA	NA	1937	NA	2751	N912FJ	LGA	CLT	NA	544	17	35	2013-05-23 17:00:00	Mesa Airlines Inc.
336514	YV	2013	6	25	NA	1735	NA	NA	1937	NA	2751	N935LR	LGA	CLT	NA	544	17	35	2013-06-25 17:00:00	Mesa Airlines Inc.
336523	YV	2013	12	17	NA	1637	NA	NA	1800	NA	3771	N503MJ	LGA	IAD	NA	229	16	37	2013-12-17 16:00:00	Mesa Airlines Inc.
336530	YV	2013	2	8	NA	1435	NA	NA	1559	NA	3750	N516LR	LGA	IAD	NA	229	14	35	2013-02-08 14:00:00	Mesa Airlines Inc.
336531	YV	2013	2	8	NA	1602	NA	NA	1722	NA	3771	N519LR	LGA	IAD	NA	229	16	2	2013-02-08 16:00:00	Mesa Airlines Inc.
336536	YV	2013	6	28	NA	1735	NA	NA	1937	NA	2751	N924FJ	LGA	CLT	NA	544	17	35	2013-06-28 17:00:00	Mesa Airlines Inc.
336537	YV	2013	6	28	NA	1617	NA	NA	1744	NA	3771	N509MJ	LGA	IAD	NA	229	16	17	2013-06-28 16:00:00	Mesa Airlines Inc.
336550	YV	2013	12	10	NA	1637	NA	NA	1800	NA	3771	N514MJ	LGA	IAD	NA	229	16	37	2013-12-10 16:00:00	Mesa Airlines Inc.
336575	YV	2013	12	9	1749	1637	72	NA	1800	NA	3771	N510MJ	LGA	IAD	NA	229	16	37	2013-12-09 16:00:00	Mesa Airlines Inc.
336582	YV	2013	10	21	NA	1735	NA	NA	1946	NA	2751	N918FJ	LGA	CLT	NA	544	17	35	2013-10-21 17:00:00	Mesa Airlines Inc.
336597	YV	2013	6	24	NA	1735	NA	NA	1937	NA	2751	N935LR	LGA	CLT	NA	544	17	35	2013-06-24 17:00:00	Mesa Airlines Inc.
336618	YV	2013	12	5	NA	1150	NA	NA	1406	NA	2885	N942LR	LGA	CLT	NA	544	11	50	2013-12-05 11:00:00	Mesa Airlines Inc.
336621	YV	2013	12	5	NA	1637	NA	NA	1800	NA	3771	N519LR	LGA	IAD	NA	229	16	37	2013-12-05 16:00:00	Mesa Airlines Inc.
336623	YV	2013	8	1	NA	1735	NA	NA	1937	NA	2751	N920FJ	LGA	CLT	NA	544	17	35	2013-08-01 17:00:00	Mesa Airlines Inc.
336624	YV	2013	8	1	NA	1605	NA	NA	1732	NA	3771	N507MJ	LGA	IAD	NA	229	16	5	2013-08-01 16:00:00	Mesa Airlines Inc.
336626	YV	2013	7	22	NA	1735	NA	NA	1937	NA	2751	N909FJ	LGA	CLT	NA	544	17	35	2013-07-22 17:00:00	Mesa Airlines Inc.
336627	YV	2013	7	22	NA	1605	NA	NA	1732	NA	3771	N508MJ	LGA	IAD	NA	229	16	5	2013-07-22 16:00:00	Mesa Airlines Inc.
336639	YV	2013	1	30	NA	1602	NA	NA	1722	NA	3771	N503MJ	LGA	IAD	NA	229	16	2	2013-01-30 16:00:00	Mesa Airlines Inc.
336644	YV	2013	12	8	NA	1637	NA	NA	1800	NA	3771	N508MJ	LGA	IAD	NA	229	16	37	2013-12-08 16:00:00	Mesa Airlines Inc.
336663	YV	2013	7	23	NA	1136	NA	NA	1338	NA	2651	N916FJ	LGA	CLT	NA	544	11	36	2013-07-23 11:00:00	Mesa Airlines Inc.
336664	YV	2013	7	23	NA	1605	NA	NA	1732	NA	3771	N513MJ	LGA	IAD	NA	229	16	5	2013-07-23 16:00:00	Mesa Airlines Inc.
336672	YV	2013	12	10	NA	1150	NA	NA	1406	NA	2885	N930LR	LGA	CLT	NA	544	11	50	2013-12-10 11:00:00	Mesa Airlines Inc.
336680	YV	2013	4	19	NA	1603	NA	NA	1730	NA	3790	N519LR	LGA	IAD	NA	229	16	3	2013-04-19 16:00:00	Mesa Airlines Inc.
336708	YV	2013	7	1	NA	1735	NA	NA	1937	NA	2751	N922FJ	LGA	CLT	NA	544	17	35	2013-07-01 17:00:00	Mesa Airlines Inc.
336742	YV	2013	1	11	NA	1435	NA	NA	1559	NA	3750	N518LR	LGA	IAD	NA	229	14	35	2013-01-11 14:00:00	Mesa Airlines Inc.
336748	YV	2013	10	7	NA	1735	NA	NA	1946	NA	2751	N926LR	LGA	CLT	NA	544	17	35	2013-10-07 17:00:00	Mesa Airlines Inc.
336749	YV	2013	10	7	NA	1629	NA	NA	1750	NA	3771	N510MJ	LGA	IAD	NA	229	16	29	2013-10-07 16:00:00	Mesa Airlines Inc.
336758	YV	2013	1	13	NA	1605	NA	NA	1729	NA	3771	N502MJ	LGA	IAD	NA	229	16	5	2013-01-13 16:00:00	Mesa Airlines Inc.
336765	YV	2013	6	13	NA	1617	NA	NA	1744	NA	3771	N509MJ	LGA	IAD	NA	229	16	17	2013-06-13 16:00:00	Mesa Airlines Inc.
336775	YV	2013	8	8	NA	1605	NA	NA	1732	NA	3771	N503MJ	LGA	IAD	NA	229	16	5	2013-08-08 16:00:00	Mesa Airlines Inc.
