Question 4 Compare air_time with arr_time - dep_time. What do you expect to see? What do you see? What do you need to do to fix it?
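I expected air_time to roughly equal arr_time - dep_time, but it does not. R raises no error here (it subtracts integers and doubles without complaint); the result is simply wrong. air_time is a duration in minutes, whereas arr_time and dep_time are integers in HHMM clock format (for example, 517 means 5:17), so arr_time - dep_time is not a number of minutes.
To fix this, convert arr_time and dep_time to a common unit, such as minutes since midnight, before subtracting. Even then the difference may not match air_time exactly, because the two times are local to different airports, some flights arrive after midnight, and air_time probably excludes time spent taxiing.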
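The conversion can be sketched in base R as follows. This is only an illustration: to_minutes is a hypothetical helper name, and the toy data frame holds made-up values that mimic the dep_time, arr_time and air_time columns rather than real flights.

```r
# Convert an HHMM clock time (e.g. 517 for 5:17) to minutes since midnight.
# to_minutes is a hypothetical helper, not part of any package.
to_minutes <- function(hhmm) {
  (hhmm %/% 100 * 60 + hhmm %% 100) %% 1440  # 2400 (midnight) maps to 0
}

# Toy rows mimicking the flights columns (values invented for illustration).
toy <- data.frame(
  dep_time = c(517L, 2359L, 1430L),
  arr_time = c(830L, 115L, 1605L),
  air_time = c(180, 60, 85)
)

dep_min <- to_minutes(toy$dep_time)
arr_min <- to_minutes(toy$arr_time)

# Elapsed minutes; the %% 1440 wraps arrivals that fall after midnight.
toy$elapsed <- (arr_min - dep_min) %% 1440

# Remaining gap between elapsed clock time and reported air_time.
toy$gap <- toy$elapsed - toy$air_time
toy
```

The gap column is not expected to be zero. The sketch only removes the HHMM unit problem and does not attempt to correct for time zones.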
Question 5a Consider the number of cancelled flights. Determine the definition of a flight cancellation.
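As seen above, there are no flights that arrived but did not depart, so a missing departure is enough to define a cancellation: a flight is cancelled when is.na(dep_delay) is true, and !is.na(dep_delay) selects the flights that were not cancelled.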
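A minimal sketch of this definition, using a made-up data frame with the same dep_delay and arr_delay column names (the values are invented, and the cancelled column name is an assumption for illustration):

```r
# Toy data: NA in dep_delay means the flight never departed.
toy <- data.frame(
  dep_delay = c(5, NA, -2, NA),
  arr_delay = c(10, NA, NA, NA)
)

# Flag cancellations: a flight with no departure delay recorded.
toy$cancelled <- is.na(toy$dep_delay)

# Sanity check behind the definition: flights that arrived but did not depart.
arrived_not_departed <- sum(is.na(toy$dep_delay) & !is.na(toy$arr_delay))

n_cancelled <- sum(toy$cancelled)
n_cancelled
arrived_not_departed
```

If arrived_not_departed is zero on the real data, checking dep_delay alone gives the same answer as checking both columns.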
Question 5b Find the pattern of cancelled flights in relation to average delay.
Comparing canx with avg_delay shows a strong positive relationship between cancellations and delay: when one is high, the other is likely to be high as well.
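One way to compute the two quantities is sketched below in base R. It assumes that canx is the proportion of cancelled flights per day and that avg_delay is the mean departure delay of the flights that did depart. Both are guesses at the definitions behind the names, and the data are invented.

```r
# Toy data: three days, four flights each; NA dep_delay marks a cancellation.
toy <- data.frame(
  day = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  dep_delay = c(0, 5, -3, 2, 30, NA, 45, 60, NA, NA, 90, 120)
)

# Per-day summary: share of cancelled flights and mean delay of the rest.
daily <- data.frame(
  day = sort(unique(toy$day)),
  canx = as.numeric(tapply(is.na(toy$dep_delay), toy$day, mean)),
  avg_delay = as.numeric(tapply(toy$dep_delay, toy$day, mean, na.rm = TRUE))
)

# Correlation between daily cancellation share and daily average delay.
r <- cor(daily$canx, daily$avg_delay)
daily
r
```

A scatter plot of daily$canx against daily$avg_delay, for example with plot(), would show the same pattern visually.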