Modeling Flight Delays

The Question

What causes flight delays and how can they be prevented? Here we will be looking at airplane traffic between several airports and modeling strategies to avoid flight delays and maintain flight turnaround efficiency.

In [130]:
# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

# import functions from the modsim library
from modsim import *

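# standard-library and third-party imports
import datetime
import math
import random

import numpy as np
import pandas as pd
from dateutil.parser import parse

# seed the random number generators; the simulation draws its delays from
# the standard-library random module, so that one is seeded as well
np.random.seed(7)
random.seed(7)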

Below is data collected in 2008 which details flights and delays. This data was narrowed to include only Delta (DL) and United (UA) flights between the airports LAX, JFK, ATL, IAD, and SEA. By using only flights between specific airports, we reduce the likelihood that the data is influenced primarily by the airport or the airline.

In [131]:
trips = pd.read_csv('2008.csv')
Out[131]:
Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier ActualElapsedTime AirTime ArrDelay ... Origin Dest Distance TaxiIn TaxiOut CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
0 2008 1 1 2 613.0 1407.0 UA 294.0 278.0 -24.0 ... LAX JFK 2475 3.0 13.0 NaN NaN NaN NaN NaN
1 2008 1 2 3 615.0 1435.0 UA 320.0 298.0 4.0 ... LAX JFK 2475 3.0 19.0 NaN NaN NaN NaN NaN
2 2008 1 3 4 607.0 1454.0 UA 347.0 299.0 23.0 ... LAX JFK 2475 8.0 40.0 0.0 0.0 23.0 0.0 0.0
3 2008 1 4 5 618.0 1523.0 UA 365.0 284.0 52.0 ... LAX JFK 2475 3.0 78.0 0.0 0.0 52.0 0.0 0.0
4 2008 1 5 6 615.0 1416.0 UA 301.0 282.0 -15.0 ... LAX JFK 2475 4.0 15.0 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10152 2008 2 29 5 2128.0 2311.0 DL 103.0 77.0 -2.0 ... ATL IAD 533 8.0 18.0 NaN NaN NaN NaN NaN
10153 2008 2 29 5 1858.0 2041.0 DL 103.0 79.0 0.0 ... ATL IAD 533 7.0 17.0 NaN NaN NaN NaN NaN
10154 2008 2 29 5 1455.0 1646.0 DL 111.0 78.0 5.0 ... ATL IAD 533 5.0 28.0 NaN NaN NaN NaN NaN
10155 2008 2 29 5 824.0 1002.0 DL 98.0 78.0 -5.0 ... ATL IAD 533 4.0 16.0 NaN NaN NaN NaN NaN
10156 2008 2 29 5 957.0 1147.0 DL 110.0 82.0 -2.0 ... ATL IAD 533 7.0 21.0 NaN NaN NaN NaN NaN

10157 rows × 21 columns
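
The narrowing described above could be done with a filter along the following lines. This is only a sketch: the helper name `narrow_flights` and the four-row sample frame are made up for illustration and are not taken from 2008.csv; only the column names `UniqueCarrier`, `Origin`, and `Dest` come from the table above.

```python
import pandas as pd

# Carriers and airports kept in this study (from the text above).
CARRIERS = ["DL", "UA"]
AIRPORTS = ["LAX", "JFK", "ATL", "IAD", "SEA"]

def narrow_flights(flights):
    """Hypothetical helper: keep DL/UA flights whose origin and
    destination are both among the five study airports."""
    keep = (
        flights["UniqueCarrier"].isin(CARRIERS)
        & flights["Origin"].isin(AIRPORTS)
        & flights["Dest"].isin(AIRPORTS)
    )
    return flights[keep].reset_index(drop=True)

# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "UniqueCarrier": ["UA", "DL", "AA", "DL"],
    "Origin": ["LAX", "ATL", "LAX", "ATL"],
    "Dest": ["JFK", "IAD", "JFK", "ORD"],
})
narrowed = narrow_flights(sample)
print(narrowed)
```

In the sample, the AA row is dropped because of its carrier and the ATL to ORD row because its destination is outside the five airports.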

The Model

To model flights and delays, we will use a state object which keeps a list of planes and also keeps track of ticks with the time variable. These variables are global but change throughout, so putting them in the state object makes sense. To simulate the planes themselves, a Plane class is created, which contains any variables for the planes and several functions to update them.

In [132]:
planes = []
time = 0
state = State(planes = planes,time = time)
Out[132]:
values
planes []
time 0
In [133]:
class Plane:
    
    def __init__(self, airline, inFlight, distance, target):         ## Initializes an instance of the Plane class
        self.airline = airline
        self.inFlight = inFlight
        self.distance = distance
        self.target = target
        self.wait = 0
        self.data = []
        
    def move(self):           ## the plane's movement tracker, which moves the plane towards its target by one unit every tick; returns False once it has arrived
        if self.distance > 0:
            self.data.append(self.distance)   # logged as an int, like the 0 that delay() logs
            self.distance -= 1
            return True
        else:
            return False
    
    def delay(self):            ## the plane's turnaround timer, which counts down by one every tick while the plane waits at an airport
        if self.wait > 0:
            self.data.append(0)
            self.wait -= 1
            return True
        else:
            return False
    
    def go_to(self, target):     ##sets a new target airport for the plane, while also calculating the distance from its current location to the target
        temp = self.target
        self.target = target
        self.distance = flight_time(temp,target)
    
    ##--------Getters---------##
    def getAirline(self):
        return self.airline
    def getInFlight(self):
        return self.inFlight
    def getDistance(self):
        return self.distance
    def getTarget(self):
        return self.target
    def getData(self):
        return self.data
    def getWait(self):
        return self.wait
    
    ##--------Setters---------##
    def setAirline(self,airline):
        self.airline = airline
    def setInFlight(self,inFlight):
        self.inFlight = inFlight
    def setDistance(self,distance):
        self.distance = distance
    def setTarget(self,target):
        self.target = target
    def setWait(self, wait):
        self.wait = wait
        
# Flight time/distance in ticks between each pair of the 5 airports in the simulation
FLIGHT_TICKS = {
    frozenset(("ATL", "LAX")): 51,
    frozenset(("ATL", "IAD")): 21,
    frozenset(("ATL", "JFK")): 28,
    frozenset(("ATL", "SEA")): 57,
    frozenset(("LAX", "IAD")): 59,
    frozenset(("LAX", "SEA")): 35,
    frozenset(("LAX", "JFK")): 66,
    frozenset(("IAD", "JFK")): 17,
    frozenset(("IAD", "SEA")): 70,
    frozenset(("JFK", "SEA")): 76,
}

def flight_time(x, y):     # Outside the Plane class: looks up the ticks between two airports, in either direction
    # An unknown pair (or x == y) gives False, which acts as a distance of 0
    return FLIGHT_TICKS.get(frozenset((x, y)), False)
    
def delay_factor(baseNum, margin):     ## Adds randomness to the delay: returns an integer from baseNum - margin + 1 to baseNum + margin
    rnd = random.randint(1,margin*2)
    return int((baseNum - (margin)) + rnd)
In [134]:
plane1 = Plane("DL",False,0,"LAX")
plane2 = Plane("UA",False,0,"ATL")
plane3 = Plane("UA",False,0,"LAX")
state.planes.append(plane1)
state.planes.append(plane2)
state.planes.append(plane3)

For comparison we are using two different models for airlines, assuming each has only 2 planes, going between 2 airports.

Delta Airlines (DL) will be using a model where 1 plane is flying and 1 is kept in reserve. Any time Delta experiences a significant delay (one that exceeds the variable maxDelay), the reserve plane is called in to replace the original, instantly resetting the delay to 0.

United Airlines (UA) will be using a model where both planes are always in service, flying opposite directions between the 2 airports. Since there is no reserve plane, United makes turnarounds longer to maintain planes and reduce the impact of delays. However, if one of their planes exceeds a significant delay (variable maxDelay), the flight is cancelled, and the plane must wait until the next scheduled flight. Since the planes fly between two airports, this means two previously scheduled flights are cancelled.

The output is the number of successful flights each airline achieves. The variables for maximum delays and turnarounds are designed to be as close to real life as possible based on research. Running the simulation usually takes upwards of 2 minutes because each run steps through 500,000 ticks. We experimented with shorter runs and smaller numbers, but the outputs then varied widely from run to run.
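
How often the maxDelay threshold comes into play depends on the spread of the random turnaround delay. The sketch below copies the arithmetic of delay_factor and samples it with the arguments the simulation passes (a base of 15 and a margin of 6) against a maxDelay of 19. The seed and the sample size are arbitrary choices made for this illustration.

```python
import random
from collections import Counter

def delay_factor(baseNum, margin):
    # Same arithmetic as the notebook's delay_factor.
    rnd = random.randint(1, margin * 2)
    return int((baseNum - margin) + rnd)

random.seed(0)  # arbitrary seed, only so the sampling is repeatable
samples = [delay_factor(15, 6) for _ in range(100_000)]
counts = Counter(samples)

lowest, highest = min(counts), max(counts)

# Share of sampled delays that exceed maxDelay = 19.
share_over = sum(n for d, n in counts.items() if d > 19) / len(samples)

# Exact share: randint(1, 12) is uniform, so count the qualifying draws.
exact_over = sum(1 for r in range(1, 13) if (15 - 6) + r > 19) / 12

print(lowest, highest, round(share_over, 3), round(exact_over, 3))
```

With these arguments the delay ranges from 10 to 21 ticks, and only the two largest values (20 and 21) exceed 19, so about one turnaround in six crosses the threshold.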

In [135]:
def run_simulation(air1, air2):
    state.time = 0
    DL = 0
    UA = 0
    for x in range(500000):
        state.time += 1
        DL += sim1(air1, air2, 19)
        UA += sim2(air1, air2, 19, 6)
    print(f"For a set of flights between {air1} and {air2}, DL had {DL} successful flights "
          f"and UA had {UA} resulting in a ratio of {DL/UA}")

def sim1(air1, air2, maxDelay):
    success = 0
    for plane in state.planes:
        if plane.getAirline() == "DL":
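            if not(plane.delay()):       # delay() returns False only once the turnaround countdown has reached 0
                if plane.getWait() > maxDelay:   # wait is always 0 at this point, so this reserve-plane reset never fires
                    plane.setWait(0)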
                if not(plane.move()):
                    success += 1
                    if(plane.getTarget() == air1):
                        plane.go_to(air2)
                    else:
                        plane.go_to(air1)
                    plane.setWait(delay_factor(15,6))
    return success
                    
def sim2(air1, air2, maxDelay, addTurn):
    success = 0
    for plane in state.planes:
        if plane.getAirline() == "UA":
            if not(plane.delay()):
                if not(plane.move()):
                    success += 1
                    if(plane.getTarget() == air1):
                        plane.go_to(air2)
                    else:
                        plane.go_to(air1)
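                    delay = delay_factor(15, 6)
                    plane.setWait(delay + addTurn)   # United pads every turnaround by addTurn ticks
                    if delay > maxDelay:             # delay too long: two scheduled flights are cancelled
                        success -= 2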

    return success
            
def plot_data():    #Implementation in progress
    data = []
    #for plane in state.planes:
        #data.append(state.planes[0].getData())
    #df=pd.DataFrame({'x': np.array(range(1, state.time)), 'y1': state.planes[0].getData(), 'y2': state.planes[1].getData()})
    #print(df)
    #plot(state.planes[0].getData(), '--', label = "Full Send")
    #plot(state.planes[1].getData(), '--', label = "Full Send")
    #plot('y1', data=df)
    #plot('y2', data=df)


    
run_simulation("IAD", "JFK")
run_simulation("ATL", "LAX")
run_simulation("JFK", "SEA")
#plot_data()
For a set of flights between IAD and JFK, DL had 14899 successful flights and UA had 16680 resulting in a ratio of 0.8932254196642686

The Results

The results of the experiments are shown above. The flight paths are listed in order of shortest flight time to longest. United has more successful flights than Delta on every route, and the gap widens on longer flights: the DL/UA ratio falls from about 0.88 on the shortest route to about 0.80 on the longest. For reference, the data acquired when we ran the simulation was:

For a set of flights between IAD and JFK, DL had 14941 successful flights and UA had 16977 resulting in a ratio of 0.8800730399952877

For a set of flights between ATL and LAX, DL had 7410 successful flights and UA had 9050 resulting in a ratio of 0.8187845303867404

For a set of flights between JFK and SEA, DL had 5405 successful flights and UA had 6763 resulting in a ratio of 0.7992015377790921
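
The trend is easier to see with the three runs side by side. The snippet below recomputes the ratios from the counts listed above; the flight times in ticks are the values flight_time returns for each pair. The layout of the `results` list is just a convenience for this illustration.

```python
# (airport 1, airport 2, flight time in ticks, DL flights, UA flights),
# copied from the three result lines above.
results = [
    ("IAD", "JFK", 17, 14941, 16977),
    ("ATL", "LAX", 51, 7410, 9050),
    ("JFK", "SEA", 76, 5405, 6763),
]

ratios = []
for air1, air2, ticks, dl, ua in results:
    ratio = dl / ua
    ratios.append(ratio)
    print(f"{air1}-{air2} ({ticks} ticks): DL/UA = {ratio:.3f}, UA lead = {ua - dl} flights")
```

The ratio drops with each increase in flight time, from 0.880 to 0.819 to 0.799.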

The Interpretation

The clear interpretation is that United's model is more effective. The difference is much smaller on shorter flights, though, where Delta's model comes close. Although this seems to point to a clear solution, there are many more variables that could be investigated to determine in which scenarios Delta's model may be superior to United's, or whether United's is always better.
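
One way to see why the gap grows with distance is a back-of-the-envelope estimate. It is a sketch built on assumptions read off the simulation code, not part of the original analysis: each plane repeats a cycle of turnaround wait, flight, and one arrival tick; the turnaround delay is uniform on 10 to 21 ticks (mean 15.5); United adds 6 ticks to every turnaround and loses two flights whenever the delay exceeds 19 (2 of 12 draws); Delta counts flights for one plane and United for two. Start-up effects are ignored, and the name `expected_ratio` is hypothetical.

```python
MEAN_DELAY = 15.5   # mean of delay_factor(15, 6), uniform on 10..21
P_CANCEL = 2 / 12   # share of draws above maxDelay = 19
ADD_TURN = 6        # United's extra turnaround ticks (addTurn)

def expected_ratio(flight_ticks):
    """Rough expected DL/UA ratio of successful flights per tick."""
    # Ticks per cycle: wait, then flight, then one tick to register arrival.
    dl_cycle = MEAN_DELAY + flight_ticks + 1
    ua_cycle = MEAN_DELAY + ADD_TURN + flight_ticks + 1

    # Delta: one plane, every arrival counts as a success.
    dl_rate = 1 / dl_cycle
    # United: two planes; each arrival adds 1 and, with probability
    # P_CANCEL, subtracts 2.
    ua_rate = 2 * (1 - 2 * P_CANCEL) / ua_cycle
    return dl_rate / ua_rate

estimates = {ticks: expected_ratio(ticks) for ticks in (17, 51, 76)}
for ticks, value in estimates.items():
    print(f"{ticks} ticks: estimated DL/UA = {value:.3f}")
```

Under these assumptions the estimate comes out at roughly 0.88, 0.82, and 0.80 for the three routes, in line with the simulated ratios, and it tends toward 0.75 as flights get longer because United's fixed 6-tick padding matters less and less relative to the flight itself.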
