
Modeling Flight Delays

The Question

What is the best way to increase the number of flights without delays? We will model airplane traffic between several airports and test two different modeling strategies to avoid flight delays and maintain flight turnaround efficiency.

# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

# import functions from the modsim library
from modsim import *

# set the random number generator
np.random.seed(7)

import random
import pandas as pd
import datetime
from dateutil.parser import parse
import math
import numpy as np

Below is data collected in 2008 which details flights and delays. This data was narrowed to include only Delta (DL) and United (UA) flights between airports LAX, JFK, ATL, IAD, SEA. By using only flights between specific airports, we reduce the likelihood that the data is influenced primarily by the airport or the airline.

trips = pd.read_csv('2008.csv')
Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier ActualElapsedTime AirTime ArrDelay ... Origin Dest Distance TaxiIn TaxiOut CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
0 2008 1 1 2 613.0 1407.0 UA 294.0 278.0 -24.0 ... LAX JFK 2475 3.0 13.0 NaN NaN NaN NaN NaN
1 2008 1 2 3 615.0 1435.0 UA 320.0 298.0 4.0 ... LAX JFK 2475 3.0 19.0 NaN NaN NaN NaN NaN
2 2008 1 3 4 607.0 1454.0 UA 347.0 299.0 23.0 ... LAX JFK 2475 8.0 40.0 0.0 0.0 23.0 0.0 0.0
3 2008 1 4 5 618.0 1523.0 UA 365.0 284.0 52.0 ... LAX JFK 2475 3.0 78.0 0.0 0.0 52.0 0.0 0.0
4 2008 1 5 6 615.0 1416.0 UA 301.0 282.0 -15.0 ... LAX JFK 2475 4.0 15.0 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10152 2008 2 29 5 2128.0 2311.0 DL 103.0 77.0 -2.0 ... ATL IAD 533 8.0 18.0 NaN NaN NaN NaN NaN
10153 2008 2 29 5 1858.0 2041.0 DL 103.0 79.0 0.0 ... ATL IAD 533 7.0 17.0 NaN NaN NaN NaN NaN
10154 2008 2 29 5 1455.0 1646.0 DL 111.0 78.0 5.0 ... ATL IAD 533 5.0 28.0 NaN NaN NaN NaN NaN
10155 2008 2 29 5 824.0 1002.0 DL 98.0 78.0 -5.0 ... ATL IAD 533 4.0 16.0 NaN NaN NaN NaN NaN
10156 2008 2 29 5 957.0 1147.0 DL 110.0 82.0 -2.0 ... ATL IAD 533 7.0 21.0 NaN NaN NaN NaN NaN

10157 rows × 21 columns
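
The narrowing described above can be sketched with pandas boolean masks. This is a minimal sketch on a tiny made-up frame, not the authors' preprocessing: the sample rows and the `narrow_flights` helper are assumptions for illustration, and only the column names `UniqueCarrier`, `Origin`, and `Dest` come from the table.

```python
import pandas as pd

CARRIERS = ["DL", "UA"]
AIRPORTS = ["LAX", "JFK", "ATL", "IAD", "SEA"]


def narrow_flights(flights):
    """Keep rows flown by the chosen carriers between the chosen airports."""
    mask = (
        flights["UniqueCarrier"].isin(CARRIERS)
        & flights["Origin"].isin(AIRPORTS)
        & flights["Dest"].isin(AIRPORTS)
    )
    return flights[mask].reset_index(drop=True)


# Made-up sample rows, for illustration only
sample = pd.DataFrame({
    "UniqueCarrier": ["UA", "DL", "WN", "UA"],
    "Origin": ["LAX", "ATL", "LAX", "ORD"],
    "Dest": ["JFK", "IAD", "SEA", "JFK"],
})
narrowed = narrow_flights(sample)
print(narrowed)
```

Only the first two sample rows survive: the third uses a carrier outside the list, and the fourth departs from an airport outside the list.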

The Model

To model flights and delays, we will use a state object which keeps a list of planes and also keeps track of ticks with the time variable. These variables are global but change throughout, so putting them in the state object makes sense. To simulate the planes themselves, a Plane class is created, which contains any variables for the planes and several functions to update them.

Our model is, of course, simpler than a real-life airport system. We have limited our traffic to only a few airports and a small number of planes. We have also decided to focus on airport delays, effectively ignoring in-flight delays due to weather, diversions, or other spontaneous circumstances.

planes = []
time = 0
state = State(planes=planes, time=time)
values
planes []
time 0
class Plane:
    def __init__(self, airline, inFlight, distance, target):
        ## Initializes an instance of the Plane class
        self.airline = airline
        self.inFlight = inFlight
        self.distance = distance
        self.target = target
        self.wait = 0
        self.data = []

    def move(self):
        ## The plane's movement tracker, which moves the plane towards its target by one unit every tick
        if self.distance > 0:
            self.data.append(str(self.distance))
            self.distance -= 1
            return True
        else:
            return False

    def delay(self):
        ## The plane's delay timer at airports, which counts down by one tick while the plane waits at an airport
        if self.wait > 0:
            self.data.append(0)
            self.wait -= 1
            return True
        else:
            return False

    def go_to(self, target):
        ## Sets a new target airport for the plane, while also calculating the distance from its current location to the target
        temp = self.target
        self.target = target
        self.distance = flight_time(temp, target)

    ##--------Getters---------##
    def getAirline(self):
        return self.airline

    def getInFlight(self):
        return self.inFlight

    def getDistance(self):
        return self.distance

    def getTarget(self):
        return self.target

    def getData(self):
        return self.data

    def getWait(self):
        return self.wait

    ##--------Setters---------##
    def setAirline(self, airline):
        self.airline = airline

    def setInFlight(self, inFlight):
        self.inFlight = inFlight

    def setDistance(self, distance):
        self.distance = distance

    def setTarget(self, target):
        self.target = target

    def setWait(self, wait):
        self.wait = wait


def flight_time(x, y):
    # Outside the Plane class, flight_time calculates the time/distance in ticks
    # between any of the 5 airports in the simulation
    if (x == "ATL" and y == "LAX") or (y == "ATL" and x == "LAX"):
        return 51
    elif (x == "ATL" and y == "IAD") or (y == "ATL" and x == "IAD"):
        return 21
    elif (x == "ATL" and y == "JFK") or (y == "ATL" and x == "JFK"):
        return 28
    elif (x == "ATL" and y == "SEA") or (y == "ATL" and x == "SEA"):
        return 57
    elif (x == "LAX" and y == "IAD") or (y == "LAX" and x == "IAD"):
        return 59
    elif (x == "LAX" and y == "SEA") or (y == "LAX" and x == "SEA"):
        return 35
    elif (x == "LAX" and y == "JFK") or (y == "LAX" and x == "JFK"):
        return 66
    elif (x == "IAD" and y == "JFK") or (y == "IAD" and x == "JFK"):
        return 17
    elif (x == "IAD" and y == "SEA") or (y == "IAD" and x == "SEA"):
        return 70
    elif (x == "JFK" and y == "SEA") or (y == "JFK" and x == "SEA"):
        return 76
    else:
        return False


def delay_factor(baseNum, margin):
    ## Adds an element of randomness to the delay, which can be adjusted with
    ## the base number and the amount it can deviate
    rnd = random.randint(1, margin * 2)
    return int((baseNum - (margin)) + rnd)
plane1 = Plane("UA", False, 0, "LAX")
plane2 = Plane("DL", False, 0, "ATL")
plane3 = Plane("UA", False, 0, "LAX")
plane4 = Plane("DL", False, 0, "ATL")
plane5 = Plane("UA", False, 0, "LAX")
plane6 = Plane("DL", False, 0, "ATL")
plane7 = Plane("UA", False, 0, "LAX")
plane8 = Plane("DL", False, 0, "LAX")
plane9 = Plane("UA", False, 0, "ATL")
plane10 = Plane("DL", False, 0, "LAX")
plane11 = Plane("UA", False, 0, "ATL")

state.planes.append(plane1)
state.planes.append(plane2)
state.planes.append(plane3)
state.planes.append(plane4)
state.planes.append(plane5)
state.planes.append(plane6)
state.planes.append(plane7)
state.planes.append(plane8)
state.planes.append(plane9)
state.planes.append(plane10)
state.planes.append(plane11)
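
The pairwise chain in flight_time can also be written as a table lookup keyed on an unordered pair of airports. The sketch below is an alternative formulation, not the notebook's code: `FLIGHT_TICKS` and `flight_time_lookup` are illustrative names, and the tick values are copied from flight_time.

```python
# Flight times in ticks, copied from the flight_time chain in the notebook.
# frozenset keys make the lookup independent of direction.
FLIGHT_TICKS = {
    frozenset(("ATL", "LAX")): 51,
    frozenset(("ATL", "IAD")): 21,
    frozenset(("ATL", "JFK")): 28,
    frozenset(("ATL", "SEA")): 57,
    frozenset(("LAX", "IAD")): 59,
    frozenset(("LAX", "SEA")): 35,
    frozenset(("LAX", "JFK")): 66,
    frozenset(("IAD", "JFK")): 17,
    frozenset(("IAD", "SEA")): 70,
    frozenset(("JFK", "SEA")): 76,
}


def flight_time_lookup(x, y):
    """Return the flight time in ticks between x and y, or False if unknown."""
    return FLIGHT_TICKS.get(frozenset((x, y)), False)


print(flight_time_lookup("LAX", "ATL"))  # 51
print(flight_time_lookup("ATL", "ATL"))  # False
```

Returning False for an unknown or identical pair mirrors the fallback branch of flight_time.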

For comparison, we use two different models for the airlines. In each test, both airlines fly back and forth between the same 2 airports, and the fleet size ranges from 2 to 6 planes.

Delta Airlines (DL) will use a model in which 1 plane is kept in reserve. Any time Delta experiences a significant delay (longer than the variable maxDelay), the reserve plane is called in to replace the original, instantly resetting the delay to 0.

United Airlines (UA) will be using a model where all planes are always in service, flying opposite directions between the 2 airports. Since there is no reserve plane, United makes turnarounds longer to maintain planes and reduce the impact of delays. However, if one of their planes exceeds a significant delay (variable maxDelay), the flight is cancelled, and the plane must wait until the next scheduled flight. Since the planes fly between two airports, this means two previously scheduled flights are cancelled.

The results are reported as a ratio of Delta's successful flights to United's. The variables for maximum delays and turnarounds are designed to be as close to real life as possible based on research. Running the simulation usually takes upwards of 2 minutes because each of the 15 tests runs for 100,000 ticks. We experimented with smaller time scales and numbers, but the outputs varied widely from run to run.
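
To make the delay settings concrete: the simulation draws each turnaround delay with delay_factor(15, 9) and uses a maxDelay of 22. The sketch below enumerates the values that call can return and the share that exceed 22. The helper `possible_delays` is an illustrative name, not part of the notebook.

```python
from fractions import Fraction


def possible_delays(baseNum, margin):
    """All values delay_factor(baseNum, margin) can return.

    delay_factor adds random.randint(1, margin * 2), which is inclusive on
    both ends, to (baseNum - margin).
    """
    return [int((baseNum - margin) + rnd) for rnd in range(1, margin * 2 + 1)]


delays = possible_delays(15, 9)
max_delay = 22
over = [d for d in delays if d > max_delay]
share_over = Fraction(len(over), len(delays))

print(min(delays), max(delays))  # 7 24
print(share_over)                # 1/9
```

With these settings, about one drawn delay in nine exceeds maxDelay.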

def run_simulation(numPlanes, air1, air2):
    # The run_simulation function runs the simulation
    state.time = 0
    DL = 0
    UA = 0
    planes = state.planes[0:((numPlanes * 2) - 1)]
    for x in range(100000):
        state.time += 1
        DL += sim1(planes, air1, air2, 22)
        UA += sim2(planes, air1, air2, 22, 5)
    return [DL, UA, DL / UA]


def sim1(planes, air1, air2, maxDelay):
    # Sim1 implements Delta's reserve plane model
    success = 0
    for plane in planes:
        if plane.getAirline() == "DL":
            if not (plane.delay()):
                # Note: delay() returns False only once wait has counted down
                # to 0, so as written this swap check cannot fire
                if plane.getWait() > maxDelay:
                    plane.setWait(0)
                if not (plane.move()):
                    success += 1
                    if (plane.getTarget() == air1):
                        plane.go_to(air2)
                    else:
                        plane.go_to(air1)
                    plane.setWait(delay_factor(15, 9))
    return success


def sim2(planes, air1, air2, maxDelay, addTurn):
    # Sim2 implements United's model that operates without reserve planes
    success = 0
    for plane in planes:
        if plane.getAirline() == "UA":
            if not (plane.delay()):
                if not (plane.move()):
                    success += 1
                    if (plane.getTarget() == air1):
                        plane.go_to(air2)
                    else:
                        plane.go_to(air1)
                    delay = delay_factor(15, 9)
                    plane.setWait(delay + addTurn)
                    if delay > maxDelay:
                        success -= 2
    return success


# This section collects data from run_simulation
# and creates lists to store all the different datasets
test1 = run_simulation(2, "IAD", "JFK")
test2 = run_simulation(2, "ATL", "LAX")
test3 = run_simulation(2, "JFK", "SEA")
test4 = run_simulation(3, "IAD", "JFK")
test5 = run_simulation(3, "ATL", "LAX")
test6 = run_simulation(3, "JFK", "SEA")
test7 = run_simulation(4, "IAD", "JFK")
test8 = run_simulation(4, "ATL", "LAX")
test9 = run_simulation(4, "JFK", "SEA")
test10 = run_simulation(5, "IAD", "JFK")
test11 = run_simulation(5, "ATL", "LAX")
test12 = run_simulation(5, "JFK", "SEA")
test13 = run_simulation(6, "IAD", "JFK")
test14 = run_simulation(6, "ATL", "LAX")
test15 = run_simulation(6, "JFK", "SEA")

tests = [
    test1, test2, test3, test4, test5, test6, test7, test8, test9, test10,
    test11, test12, test13, test14, test15
]

DL_Flights = []
for test in tests:
    DL_Flights.append(test[0])

UA_Flights = []
for test in tests:
    UA_Flights.append(test[1])

ratio = []
for test in tests:
    ratio.append(test[2])

num_planes = [2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]
flight_length = [1, 3, 5, 1, 3, 5, 1, 3, 5, 1, 3, 5, 1, 3, 5]

The Results

The ratios of Delta's successful flights to United's successful flights are shown below. For each flight path, there are five ratios, each representing a test with a different number of planes (2 through 6). For reference, within each fleet size the flight paths are listed from shortest flight time to longest: IAD-JFK, ATL-LAX, JFK-SEA.
print(ratio)
# Creates a bubble plot to plot the ratio of successful flights
data = {'a': num_planes, 'c': "blue", 'd': (flight_length)}
data['b'] = ratio
data['d'] = np.abs(data['d']) * 15
plt.scatter('a', 'b', c='c', s='d', data=data)
plot([2, 6], [1, 1], color='purple', linestyle='-', linewidth=2)
plt.xlabel('Number of Planes')
plt.ylabel('Ratio of Successful Flights')
plt.title('Plot 1: Ratios of Successful Flights v Number of Planes')
plt.show()
[Figure: Plot 1: Ratios of Successful Flights v Number of Planes]

This graph shows the ratio of Delta's successful flights to United's. The higher the ratio, the more successful Delta is relative to United. The line at y = 1 marks the level above which Delta has more successful flights than United. The size of each bubble represents the length of the flight.

# Creates a bubble plot to plot the number of successful flights for each airline
data = {'a': num_planes, 'c': "maroon", 'd': flight_length}
data['b'] = DL_Flights
data['d'] = np.abs(data['d']) * 20
plt.scatter('a', 'b', c='c', s='d', data=data)

data = {'a': num_planes, 'c': "navy", 'd': flight_length}
data['b'] = UA_Flights
data['d'] = np.abs(data['d']) * 20
plt.scatter('a', 'b', c='c', s='d', data=data)

plt.xlabel('Number of Planes')
plt.ylabel('Number of Successful Flights')
plt.title('Plot 2: Number of Successful Flights v Number of Planes')
plt.show()
[Figure: Plot 2: Number of Successful Flights v Number of Planes]

This graph shows the number of successful flights for United (navy) and Delta (maroon) separately. The size of each bubble represents the length of the flights.

The Interpretation

The data in these two plots show most clearly that, in a given time period, more short flights can be completed than long ones. This is no surprise; more interesting is the ratio of Delta flights to United flights. With fewer planes, United's model of using all of its planes works well, even with a slightly longer turnaround. However, as the airlines' fleets grow, Delta gains the advantage. This is best explained by the fact that Delta always has exactly 1 reserve plane, so the ratio of reserve planes to in-service planes decreases as the fleet grows, while the advantage of having a reserve plane does not.

Plot 1 shows the ratio increasing throughout (shifting towards Delta's side) but levelling off near the end. This could imply that Delta's strategy works better with more planes, but only up to a point, after which it is still beneficial, but stops improving relative to United.

Plot 2 shows a large change in the number of successful short flights and a smaller change for the longer flights. This can be attributed to the fact that more short flights are completed in the same period, so any imbalance compounds much faster. However, Delta's method also appears more effective with shorter flights. This can be seen in both plots: in Plot 1, the shorter flights (smaller bubbles) have a much larger ratio, and in Plot 2, Delta's short flights exceed those of United at a smaller fleet size than the long flights do.
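
As a rough cross-check of these trends, the long-run success rate per plane can be estimated as expected successes per cycle divided by expected cycle length. This is a back-of-envelope sketch, not output from the notebook. It assumes that one cycle is the arrival tick plus the turnaround wait plus the flight time, that the delay is uniform on 7 to 24 ticks (from delay_factor(15, 9)), that United adds 5 ticks and loses 2 successes whenever the delay exceeds 22, and that Delta flies one fewer plane than United with no other adjustment. The function names are illustrative.

```python
from fractions import Fraction

MEAN_DELAY = Fraction(7 + 24, 2)   # delay_factor(15, 9) is uniform on 7..24
P_CANCEL = Fraction(2, 18)         # share of delays above maxDelay = 22
ADD_TURN = 5                       # United's extra turnaround ticks


def delta_rate(flight_ticks):
    """Estimated Delta successes per plane per tick."""
    return 1 / (1 + MEAN_DELAY + flight_ticks)


def united_rate(flight_ticks):
    """Estimated United net successes per plane per tick."""
    net_per_cycle = 1 - 2 * P_CANCEL
    return net_per_cycle / (1 + MEAN_DELAY + ADD_TURN + flight_ticks)


def estimated_ratio(num_planes, flight_ticks):
    """Estimated DL / UA ratio: Delta flies num_planes - 1, United num_planes."""
    dl = (num_planes - 1) * delta_rate(flight_ticks)
    ua = num_planes * united_rate(flight_ticks)
    return float(dl / ua)


for flight_ticks in (17, 51, 76):   # IAD-JFK, ATL-LAX, JFK-SEA
    row = [round(estimated_ratio(n, flight_ticks), 2) for n in range(2, 7)]
    print(flight_ticks, row)
```

Under these assumptions the estimate rises with fleet size, flattens as the fleet grows, and is higher for the shorter route, which is the same direction as the trends described above.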

In conclusion, United's model tends to be better suited to longer flights and fewer planes, while the opposite is true for Delta.