Radiation Exposure of Flight Crews on United Airlines
Steve Matsumoto
**This is a sample project, in the form of a computational essay. You should not try to replicate this in one pass - there were at least three iterations of the model I went through in this process, and you should start with the simplest possible model. There is also a lot of programming here t
Question
Air travel carries many risks. Most members of the general public are aware of risks such as flight delays, lost or damaged luggage, or extra security screenings. However, many people are unaware or underinformed when it comes to radiation. At the cruising altitude of commercial flights (usually around 30,000 feet), there is less atmosphere between passengers/crewmembers and cosmic ionizing radiation, so anyone on these flights receives an increased dose of radiation. There is quite a bit of ongoing research into the health risks of cosmic ionizing radiation, but it is generally considered good advice to minimize radiation exposure.
Because of the frequency with which they fly, airline crewmembers have a particularly high radiation exposure relative to the rest of the population, and some airlines have measures in place to limit this exposure by restricting how often or how far their crewmembers can fly in a given amount of time. In this project, I attempt to answer the following question:
How many flight crews should United have on staff to maintain a reasonably safe level of radiation exposure for the average crew?
Even though this is a design question, its answer can also help us understand more about the world of air travel. It may help us predict the harm that increasing air travel will do to both the environment and the workers servicing these flights. It may also help us explain why flight delays sometimes occur due to crew changes: radiation exposure does play a part in those decisions.
Methodology
To answer this question, I model domestic inter-hub flights on United Airlines using historical flight data and use this to simulate the average amount of radiation that an average crewmember is exposed to. Below, I describe the three major facets of my model: the state it keeps track of, the parameters and actions that determine how the state changes over time, and the metrics I will record to help in answering the central question.
State
My model deals with the movement of flight crews among United's hub airports and their accumulating radiation levels as they travel. Therefore, the state of the model will track a number of flight crews and store for each crew their current hub airport and their total radiation exposure thus far. To simplify the representation of the state, I assume the following:
The number of crews at United Airlines stays constant, so the model is always tracking the same number of crews through a simulation.
Each crew has a constant set of members and does not split or merge, so a crew is always at a single airport and has a single level of radiation exposure.
We only need to track future radiation exposure, so everyone initially starts with no radiation exposure.
These assumptions allow us to represent this part of the model state as a two-dimensional Pandas DataFrame: each row represents a crew, and the columns represent the airport and total radiation exposure of the crew.
To implement this representation, we can start by defining United's seven hub airports:
O'Hare International Airport (ORD) in Chicago, IL
Denver International Airport (DEN)
George Bush Intercontinental Airport (IAH) in Houston, TX
Los Angeles International Airport (LAX)
Newark Liberty International Airport (EWR)
San Francisco International Airport (SFO)
Dulles International Airport (IAD) in Washington, DC
We will represent this set simply as a list:
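A minimal version of that list, assuming the hubs are stored by their IATA codes in the order given above:

```python
# United's seven hub airports, stored as IATA codes (order as listed above).
HUBS = ['ORD', 'DEN', 'IAH', 'LAX', 'EWR', 'SFO', 'IAD']
```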
The variable name HUBS is in all caps to indicate that we should treat it as a constant throughout the simulation.
For measuring radiation exposure, I use the sievert (Sv), a standard unit for radiation exposure in humans; one sievert represents the biological effect of 1 joule of radiation energy absorbed by a kilogram of human tissue. For this model, I specifically measure radiation exposure in microsieverts (μSv), which is the right scale for the exposure on a typical domestic flight within the United States.
Because the central question of this project is about the total number of crews that United should employ, we should also make sure that there are enough crews to handle the volume of flights out of each airport. Therefore, we can also track the number of times that a flight attempts to leave an airport without a crew. This can simply be an integer included with the state.
Because we represent the location and radiation of the crews as a data frame, we first need to import Pandas:
Now we can define the function that takes a set of initial conditions (the number of crews initially at each hub airport) and returns a data frame representing those conditions.
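The original code cell is not reproduced here, so the following is only a sketch under stated assumptions: the function name make_crew_frame and the column names 'airport' and 'radiation' are hypothetical, and the crew counts are passed as one argument per hub in the order of HUBS. It builds one row per crew, and every crew starts with zero exposure.

```python
import pandas as pd

HUBS = ['ORD', 'DEN', 'IAH', 'LAX', 'EWR', 'SFO', 'IAD']

def make_crew_frame(*crews_per_hub):
    """Return one row per crew: its current hub and its exposure so far (μSv).

    crews_per_hub: number of crews starting at each hub, in HUBS order
    (hypothetical signature).
    """
    airports = []
    for hub, count in zip(HUBS, crews_per_hub):
        for _ in range(count):
            airports.append(hub)
    # Every crew starts with no accumulated radiation exposure.
    return pd.DataFrame({'airport': airports,
                         'radiation': [0.0] * len(airports)})

crews = make_crew_frame(2, 1, 0, 0, 0, 0, 1)
```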
In the for loops, the underscore (_) indicates that we do not need to use the value of the loop counter.
As an example, you can uncomment and run the cell below to see what an initialized data frame looks like. Feel free to change the parameters.
We can now define a function to make a state with a set of parameters. First we need to import the ModSim library.
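ModSim provides its own State object; the sketch below swaps in Python's types.SimpleNamespace instead so that it runs without ModSim. It assumes a state consists of the crew table plus an integer counter of flights that tried to leave without a crew. The field names crews and canceled, and the column names, are hypothetical.

```python
from types import SimpleNamespace

import pandas as pd

HUBS = ['ORD', 'DEN', 'IAH', 'LAX', 'EWR', 'SFO', 'IAD']

def make_state(*crews_per_hub):
    """Bundle the crew table with a counter of flights that lacked a crew."""
    airports = [hub for hub, count in zip(HUBS, crews_per_hub)
                for _ in range(count)]
    # The scalar 0.0 is broadcast so every crew starts at zero exposure.
    crews = pd.DataFrame({'airport': airports, 'radiation': 0.0})
    return SimpleNamespace(crews=crews, canceled=0)

state = make_state(1, 0, 0, 0, 0, 0, 2)
```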
Actions and Parameters
The state changes when a crew flies to another airport. At a given time step, each crew either stays at its current airport (leaving its radiation exposure level unchanged) or flies to a new airport and is exposed to some radiation. We can thus think of the model as having two types of parameters: first, some measure of how likely it is that a crew will fly from Airport X to Airport Y (or stay at Airport X), and second, the amount of radiation in microsieverts that a crewmember is exposed to when flying from Airport X to Airport Y. We can use the term flight segment to refer to a flight between airports; in a flight segment from ORD to DEN, we call ORD the origin and DEN the destination.
For the first type of parameter, rather than simply providing the probability that a crew will fly on some flight segment, I instead provide the average number of flights per day on a given flight segment. This parameter will allow us to select a set of crews for each flight segment on each day, and works better given the data we have to work with.
Since we have seven hub airports, there are 42 possible flight segments: each of the 7 hubs can be an origin, and each of the other 6 hubs can be its destination (7 × 6 = 42). This also means we have a lot of parameters: for each flight segment, there is the average number of flights per day and the amount of radiation exposure. Of course, the number of overall flight crews at United and the number of days for which we run the simulation are also parameters.
So to summarize, the parameters of the system are:
The average number of flights per day on each flight segment (42 total)
The radiation exposure of each flight segment (42 total)
The total number of crews at hub airports
The number of time steps for which we run the simulation
Data-Driven Parameters
While in the simulation I assume the values of the system parameters above, it is helpful to explain and justify these assumptions with data.
Much of the data I used to determine the system parameters came from the Bureau of Transportation Statistics (BTS), which operates under the US Department of Transportation. Specifically, I used the T-100 Domestic Segment data from the BTS's Air Carrier Statistics Database. This data set has quite a lot of information, far more than what I need for the model, and is also not formatted well for the purposes of this model. In the Appendix section, I describe how I cleaned the data for easier processing, so we can assume that the data we are working with looks like this:
This data is for February 2019 and as of the writing of this notebook (September 2019) is the latest available data from the BTS. In the data set, "Passengers" denotes the number of passengers carried on the segment for the whole month, and "Airtime" denotes the number of minutes spent in the air for all flights that month. Unfortunately, the number of flights made in a month was not available in the data set, so we have to use the number of passengers and minutes in the air as proxy values to determine the system parameters.
To determine the average number of flights per day on each flight segment, I assumed that each flight transports 180 passengers, based on the capacities of United's aircraft fleet.
Since this data is from February 2019, we expect that the data is for 28 days of flying.
In estimating the radiation exposure level of a flight segment, I assumed that radiation exposure is purely determined by time in the air. (In reality, the amount of radiation exposure is determined by longitude, latitude, and altitude over time.) In particular, we can use an estimate of 3.9 microsieverts (μSv) per hour of airtime, based on the highest exposure rate for a domestic US flight as mentioned in a short information page by the Health Physics Society.
We define the file we will read our data from:
We can read this file into a Pandas DataFrame:
We can define a function that makes it easier to get the passenger and airtime values out of the dataframe.
With this function, we can define a function that converts a number of passengers per month to average flights per day.
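Under the two assumptions above (180 passengers per flight and 28 days of flying), the conversion amounts to dividing by both constants. A sketch, with a hypothetical function name:

```python
SEATS_PER_FLIGHT = 180  # assumed passengers per flight (from the text)
DAYS_IN_MONTH = 28      # February 2019

def flights_per_day(monthly_passengers, seats=SEATS_PER_FLIGHT,
                    days=DAYS_IN_MONTH):
    """Estimate the average number of daily flights on a segment."""
    monthly_flights = monthly_passengers / seats
    return monthly_flights / days
```

For example, 10,080 monthly passengers correspond to 56 flights per month, or 2 flights per day.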
We can also define a function that converts monthly airtime in minutes to microsieverts per flight, using the number of monthly passengers to estimate the average flight time on a segment.
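A sketch of this conversion, assuming (as above) 180 passengers per flight and 3.9 μSv per hour of airtime; the function name is hypothetical. The passenger count gives the number of flights in the month, which divides the total airtime to give the minutes per flight:

```python
SEATS_PER_FLIGHT = 180  # assumed passengers per flight
USV_PER_HOUR = 3.9      # assumed exposure rate (μSv per hour of airtime)

def microsieverts_per_flight(monthly_airtime_min, monthly_passengers,
                             seats=SEATS_PER_FLIGHT, rate=USV_PER_HOUR):
    """Estimate the radiation dose (μSv) of a single flight on a segment."""
    monthly_flights = monthly_passengers / seats
    minutes_per_flight = monthly_airtime_min / monthly_flights
    return minutes_per_flight / 60 * rate
```

For example, 18,000 passengers and 12,000 minutes of airtime imply 100 flights of 2 hours each, or about 7.8 μSv per flight. A segment with zero passengers would cause a division by zero here, which is one reason to filter such segments out first.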
We can then define a function that uses the above two functions to populate a matrix of parameters for flights per day and radiation per segment. The sets of these parameters are represented as a data frame where both rows and columns represent hub airports. The flights per day and radiation from an airport to itself can be ignored, but for a nice square matrix, we set those values to zero.
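A sketch of how the two matrices might be assembled with pandas. It assumes the cleaned data has one row per segment with 'Passengers' and 'Airtime' columns plus origin and destination columns, hypothetically named 'Origin' and 'Dest' here, and that in each matrix the rows are origins and the columns are destinations. Segments involving a non-hub airport are skipped.

```python
import pandas as pd

HUBS = ['ORD', 'DEN', 'IAH', 'LAX', 'EWR', 'SFO', 'IAD']
SEATS_PER_FLIGHT = 180
DAYS_IN_MONTH = 28
USV_PER_HOUR = 3.9

def make_segment_params(segments):
    """Return (flights per day, μSv per flight) as hub-by-hub DataFrames.

    Rows are origins, columns are destinations; the diagonal stays zero.
    """
    flights = pd.DataFrame(0.0, index=HUBS, columns=HUBS)
    radiation = pd.DataFrame(0.0, index=HUBS, columns=HUBS)
    for row in segments.itertuples(index=False):
        # Keep only inter-hub segments that actually carried passengers.
        if row.Origin not in HUBS or row.Dest not in HUBS:
            continue
        if row.Origin == row.Dest or row.Passengers <= 0:
            continue
        monthly_flights = row.Passengers / SEATS_PER_FLIGHT
        flights.loc[row.Origin, row.Dest] = monthly_flights / DAYS_IN_MONTH
        minutes_per_flight = row.Airtime / monthly_flights
        radiation.loc[row.Origin, row.Dest] = minutes_per_flight / 60 * USV_PER_HOUR
    return flights, radiation
```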
With these functions, we can generate the set of parameters for each segment.
To see the actual set of parameters generated, uncomment the lines below and run the cells.
As expected, longer flights (between more distant airports) result in higher levels of radiation exposure.
Determining the initial number of crews to start with can be a bit tricky. As a baseline, I assume that each airport starts with enough crews to make all of the outgoing flights on a given day. We can see approximately how many crews this is by summing the columns of the daily flight parameter matrix:
A single parameter, which I call alpha, can be used to scale these numbers up or down; rounding the scaled values to the nearest integer then determines how many crews will initially be at each airport. The following function tells us how to allocate the initial crews among airports.
We can use this function to find out how many total crews there are.
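With origins as rows, summing across each row (axis=1 in pandas) gives each hub's outgoing daily flights. A sketch of the allocation and the resulting total, assuming that matrix layout; both function names are hypothetical:

```python
import pandas as pd

def allocate_crews(flights, alpha):
    """Initial crews per hub: alpha times the hub's outgoing daily flights, rounded."""
    outgoing = flights.sum(axis=1)  # rows are origins
    return (alpha * outgoing).round().astype(int)

def total_crews(flights, alpha):
    """Total number of crews implied by a given alpha."""
    return int(allocate_crews(flights, alpha).sum())
```

One design detail worth knowing: pandas rounds exact halves to the nearest even integer (2.5 becomes 2), so the total can differ slightly from alpha times the total number of daily flights.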
To make a system object, we need to import the relevant function from ModSim:
We can then define a function to create a system object representing the parameters of a simulation, given the parameters from our data file and a value of alpha.
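A sketch of such a constructor. ModSim's System object is replaced by types.SimpleNamespace so the example runs on its own, and the function and field names are hypothetical; the 31-day default for t_end matches the simulation length chosen for the parameter sweep.

```python
from types import SimpleNamespace

import pandas as pd

def make_system(flights, radiation, alpha, t_end=31):
    """Collect the simulation parameters for one value of alpha."""
    outgoing = flights.sum(axis=1)  # rows are origins
    starting_crews = (alpha * outgoing).round().astype(int)
    return SimpleNamespace(flights=flights,
                           radiation=radiation,
                           alpha=alpha,
                           starting_crews=starting_crews,
                           num_crews=int(starting_crews.sum()),
                           t_end=t_end)
```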
If you want, you can make a sample system object below and check the system variables to make sure that they are what you expect.
Actions
For simplicity, I assume that a flight crew takes no more than one flight a day, so the daily action consists of selecting flights from each airport to move, and recording the radiation exposure of each crew that moves.
We start the description by making a sample state object to use as an example:
The asterisk before starting_crews unpacks the list, so its elements are passed to the function as separate arguments, in order. Since make_state takes the number of crews at each airport and starting_crews is the number of crews to start at each airport, this works out nicely.
The flight parameters for our model describe the average daily number of flights on each segment. A Poisson distribution generates nonnegative whole numbers with a given average, with values farther from the average generally less likely, which makes it a natural fit for a number of flights per day. In deciding how many crews to move along a segment on a given day, we will use this distribution.
In Python, we can do this using NumPy. Let's import the module:
We can then generate from a Poisson distribution using the entire flight parameter matrix:
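A sketch of this step using NumPy's Generator API (numpy.random.default_rng) rather than the legacy numpy.random.poisson function; both accept an array of means and return an array of counts with the same shape. Wrapping the result back in a DataFrame keeps the airport labels. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

def draw_daily_flights(flights, rng):
    """Draw one day's flight count per segment from Poisson(mean = flights per day)."""
    counts = rng.poisson(lam=flights.to_numpy())
    return pd.DataFrame(counts, index=flights.index, columns=flights.columns)

rng = np.random.default_rng(2019)
```

Because the diagonal of the flight matrix is zero, the draw for a hub to itself is always zero.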
Now we need to use this Poisson information to mark crews to move between airports. The choice function in NumPy allows us to select a random subset of a series, or a row/column of a Pandas DataFrame. Let's import it:
We can now define a function that finds all crews at an origin airport and choose a random subset of them to move to a new airport.
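A sketch of mark_flight_crews under stated assumptions: crews is the crew table with an 'airport' column, and the function returns both the chosen row labels and the number of flights that could not be staffed. The original function's exact signature is not shown, so treat this one as hypothetical. It uses Generator.choice with replace=False so that no crew is picked twice.

```python
import numpy as np
import pandas as pd

def mark_flight_crews(crews, origin, num_flights, rng):
    """Pick up to num_flights random crews at origin.

    Returns (chosen row labels, number of flights left without a crew).
    """
    at_origin = crews.index[(crews['airport'] == origin).to_numpy()].to_numpy()
    num_crewed = min(int(num_flights), len(at_origin))
    if num_crewed == 0:
        chosen = at_origin[:0]  # empty selection, same dtype
    else:
        chosen = rng.choice(at_origin, size=num_crewed, replace=False)
    return chosen, int(num_flights) - num_crewed
```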
We can use mark_flight_crews in a function that selects a set of crews to move from a given origin airport to any other airport, and returns an updated list of crew locations given those flights.
We can then use select_departures to implement a function that updates the state in each step of a simulation.
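A sketch of one full daily update, assuming the state holds a crew table ('airport' and 'radiation' columns) and a counter of canceled flights, with types.SimpleNamespace standing in for ModSim's State. Rather than calling separate helpers, it folds the roles of mark_flight_crews and select_departures into one loop: each hub's crews are shuffled once and then handed out to destinations in turn, so no crew takes more than one flight per day. Crews are looked up by where they were at the start of the day, so a crew that just arrived cannot fly again until the next day. This is an illustration with hypothetical names, not the notebook's code.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

def update(state, flights, radiation, rng):
    """Advance the state by one day; each crew flies at most once.

    flights: mean daily flights; radiation: μSv per flight. Both are
    DataFrames with origins as rows and destinations as columns.
    rng: a numpy.random.Generator.
    """
    crews = state.crews
    departures = rng.poisson(lam=flights.to_numpy())  # today's flight counts
    new_airport = crews['airport'].copy()
    new_radiation = crews['radiation'].copy()
    canceled = state.canceled

    for i, origin in enumerate(flights.index):
        # Crews at this hub at the start of the day, in random order.
        at_origin = crews.index[(crews['airport'] == origin).to_numpy()].to_numpy()
        queue = rng.permutation(at_origin)
        used = 0
        for j, dest in enumerate(flights.columns):
            wanted = int(departures[i, j])
            if wanted == 0 or origin == dest:
                continue
            chosen = queue[used:used + wanted]  # may be shorter than wanted
            used += len(chosen)
            canceled += wanted - len(chosen)    # flights with no crew left
            if len(chosen) > 0:
                new_airport.loc[chosen] = dest
                new_radiation.loc[chosen] += radiation.loc[origin, dest]

    new_crews = pd.DataFrame({'airport': new_airport, 'radiation': new_radiation})
    return SimpleNamespace(crews=new_crews, canceled=canceled)
```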
Metrics
With our model set up, we can now implement a function that runs a simulation. For flexibility, we can have the function return the entire state at the end of the simulation, rather than a specific metric.
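A generic sketch of the simulation loop: it applies an update function once per day for t_end days and returns the final state. Passing the update function (and any extra arguments it needs) in as parameters is a hypothetical design choice that keeps the loop independent of how a single day is modeled.

```python
def run_simulation(state, t_end, update, *args):
    """Apply update once per day for t_end days and return the final state."""
    for _ in range(t_end):
        state = update(state, *args)
    return state
```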
Next, we can measure the radiation exposure of the crews. Because there can be many crews, it makes sense to think about the amount of radiation the average crew would be exposed to in a month. It is also helpful to consider the most radiation any single crew is exposed to in a month. The latter is less reliable as a simulation result, since one crew may simply have a run of bad luck in a given month, but it is still worth examining. We can implement functions to get those metrics from a state.
Intuitively, too few crews will also be bad for United, not only because of the higher radiation exposure per crew, but because there will simply not be enough crews to serve all of the flights. So it is helpful to take a look at how many flights are canceled due to insufficient crews. While this number is somewhat independent of the radiation exposure, it will help us tie the model to reality, since United will likely make their decision of how many crews to employ based on how many flights they have per day. We can define a function to get this value as well.
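Sketches of the three metric functions, assuming the state stores the crew table as state.crews (with a 'radiation' column in μSv) and the counter as state.canceled; these names are hypothetical. Each one takes a state and returns a single value.

```python
from types import SimpleNamespace

import pandas as pd

def mean_radiation(state):
    """Average accumulated exposure (μSv) across all crews."""
    return state.crews['radiation'].mean()

def max_radiation(state):
    """Highest accumulated exposure (μSv) of any single crew."""
    return state.crews['radiation'].max()

def canceled_flights(state):
    """Number of flights that had no crew available."""
    return state.canceled

example = SimpleNamespace(
    crews=pd.DataFrame({'airport': ['ORD', 'DEN', 'IAD'],
                        'radiation': [10.0, 20.0, 60.0]}),
    canceled=4)
```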
It may seem silly to define such a simple function, but providing a consistent way to get metrics from our model - that is, having a function take a state and return a value - makes it much easier to get other metrics from our model in the future if we want.
Because we are trying to answer the question of how many crews should be on staff, we can track these metrics against the number of flight crews. Remember that alpha, the ratio of total crews to total daily flights, is helpful for initially allocating crews to airports, so we can use values of alpha in a parameter sweep. However, rather than indexing our sweep series by alpha, we can compute the number of crews and use that number instead. Let's import the SweepSeries function from the ModSim library.
Our function will take a range of alpha values and create a number of sweep series. To keep things flexible, the function will combine those sweep series into a Pandas DataFrame, making it easy to get and plot the specific series we want, even if we add more metrics later.
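A sketch of the sweep using plain pandas instead of ModSim's SweepSeries. For each alpha, a caller-supplied simulate function (hypothetical) returns the number of crews and the final state, and each metric function is applied to that state. The results are collected into one DataFrame indexed by crew count, with one column per metric.

```python
import pandas as pd

def sweep_alpha(alphas, simulate, metrics):
    """Run simulate(alpha) for each alpha and tabulate metrics by crew count.

    simulate: alpha -> (num_crews, final_state)
    metrics: dict mapping a column name to a function of the final state
    """
    rows = {}
    for alpha in alphas:
        num_crews, final_state = simulate(alpha)
        rows[num_crews] = {name: metric(final_state)
                           for name, metric in metrics.items()}
    results = pd.DataFrame.from_dict(rows, orient='index').sort_index()
    results.index.name = 'crews'
    return results
```

Note that if two values of alpha round to the same number of crews, the later run overwrites the earlier one in this sketch.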
Since we used monthly data to estimate the parameters for our simulation, let's run our simulation for a month. To cover even the longest months, we can run the simulation for 31 days.
We can import linrange from the ModSim library to generate our parameters.
We can now run our simulations. This will take a while.
Results
We can now plot these results. To do this, let's import the necessary functions from the ModSim library.
We can then create a function to plot a desired series and label the plot.
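A sketch of the plotting helper using Matplotlib directly, assuming the sweep results are a DataFrame indexed by crew count with one column per metric; the function name and axis labels are hypothetical. The Agg backend line is only needed when running outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

def plot_metric(results, column, ylabel):
    """Plot one metric column against the total number of crews."""
    fig, ax = plt.subplots()
    ax.plot(results.index, results[column], marker='o')
    ax.set_xlabel('Total number of crews')
    ax.set_ylabel(ylabel)
    ax.set_title(f'{ylabel} vs. number of crews')
    return ax
```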
Now, we can plot the metrics we collected earlier. First, we plot the average radiation exposure level.
We can then plot the maximum radiation exposure levels.
Finally, we can plot the number of canceled flights.
Interpretation
These plots confirm our intuition: as the total number of crews increases, each one is exposed to less radiation, and fewer flights go unserviced.
According to information from the Health Physics Society, the recommended monthly dose limit for airline crewmembers is about 500 µSv. Even with only 0.1 crews per average daily flight (about 27 crews in total), the average radiation exposure is below this amount, which is a bit surprising: at this level, each crew is flying almost every day, since there are not enough of them to go around. In fact, the plot of maximum radiation exposure shows that this limit is almost never exceeded, regardless of how many or few crews are on United's payroll.
Interestingly, according to a CDC information page, the recommended yearly dose for the general public in the US, and for crews working in the EU, is about 1000 µSv, which is about 83 µSv per month. Even with around 800 crews on staff (about 3 times the number of average daily flights), the average radiation level is significantly higher than this amount. So it seems safe to say that crewmembers are exposed to significantly more radiation than is recommended for the general public.
So what is a reasonable level of radiation exposure? Since crewmembers likely expect to be exposed to a high amount of radiation, we could set a limit of twice that of the general public, or about 166 µSv per month. At this level, about 667 crews, or about 2.5 times the number of average daily flights, is recommended.
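The arithmetic behind these thresholds, using the rounded figures above; the figure of roughly 270 average daily flights is inferred from the statement that 0.1 crews per flight is about 27 crews:

```latex
\frac{1000\ \mu\mathrm{Sv}}{12\ \text{months}} \approx 83\ \mu\mathrm{Sv/month},
\qquad 2 \times 83\ \mu\mathrm{Sv/month} \approx 166\ \mu\mathrm{Sv/month}

\frac{27\ \text{crews}}{0.1\ \text{crews per flight}} \approx 270\ \text{daily flights},
\qquad \frac{667\ \text{crews}}{270\ \text{daily flights}} \approx 2.5
```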
If we look at the number of canceled flights, it seems to level off around 400 crews, or around 1.5 times the number of daily average flights. So with 667 crews, United should be reasonably confident in its ability to maintain a decent on-time departure percentage, as well as a reasonable level of radiation exposure for its crews.
Limitations
There are a number of limitations of this model. Most notably, there are far more airports than the seven hub airports I considered here. A few other limitations to consider:
Crews typically have limits on how often they can fly, or how far they can fly in a given period of time. This is due to labor laws as well as radiation risks.
Flights can have vastly differing capacities, particularly between short regional flights and longer flights. This can throw off our estimates for flight and radiation parameters, particularly considering segments such as Newark to Dulles (which are quite rare due to their proximity).
This data is from February, which does not take into account travel patterns at more popular times, such as the summer months or December/January.
Further Exploration
With more time, there are some further interesting questions I could explore. For example, if crews could set limits on how often they fly (e.g., once every 3 days), how would the average radiation exposure change? How would the number of canceled flights change?
We could also make a number of changes to the model, such as introducing the concept of a home airport (which virtually every crewmember has), so that crews fly to and from that airport more often than to other airports. If we consider multiple months, we can also change the number of flights among airports as we move through the months. In both of these cases, we could see whether these changes affect the number of crews that United needs to employ to limit its crews' radiation exposure and maintain an on-time departure schedule.
Appendix
This data set is rather complex, but it is possible to select only the needed fields when downloading. In this case, I chose to download information for February 2019 (the most recent month available). The data set is formatted as a CSV file, which can be read directly into a DataFrame. Each row represents a flight or set of flights, and the columns of the data set represent the following, in order.
The number of passengers transported by the flight(s)
The total air time of the flight(s)
The operating airline's IATA identification code (in United's case, UA)
The origin airport's IATA identification code (e.g., ORD for O'Hare)
The destination airport's IATA identification code
The month of flight (in the case of February, 2)
So the first few rows of the data looked like this:
In order to process this data into a more useful format, we can do the following:
Filter out all flights that carried no passengers or were not operated by United.
Add the total number of passengers and air time of all records for a given segment (origin-destination pair).
Assume that each flight transports 180 passengers, based on the capacities of United's aircraft fleet.
Use this to calculate the number of flights per month on each segment.
Assume that radiation exposure is purely determined by time in the air. (In reality, the amount of radiation exposure is determined by longitude, latitude, and altitude over time.) I used an estimate of 4 microsieverts (μSv) per hour of flying, based on the highest exposure rate for a domestic US flight as mentioned in a short information page by the Health Physics Society.
Use this to calculate the radiation exposure of a flight on each segment.
First, we define a variable with the path to the raw data file.
Then we import the modules we need to process the data.
We can then process the raw data into a more workable format:
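The processing code is not reproduced here, so the following is a sketch of the filtering and aggregation steps listed above. It assumes the raw CSV uses the BTS T-100 field names UNIQUE_CARRIER, ORIGIN, DEST, PASSENGERS, and AIR_TIME; check these against your own download, since the fields you select determine the headers. The output columns are renamed to 'Origin', 'Dest', 'Passengers', and 'Airtime' (the origin and destination names are hypothetical).

```python
import pandas as pd

def clean_t100(raw):
    """Keep United records that carried passengers and total them per segment."""
    # Assumed BTS T-100 column names; adjust to match the downloaded file.
    united = raw[(raw['UNIQUE_CARRIER'] == 'UA') & (raw['PASSENGERS'] > 0)]
    segments = (united
                .groupby(['ORIGIN', 'DEST'], as_index=False)[['PASSENGERS', 'AIR_TIME']]
                .sum())
    return segments.rename(columns={'ORIGIN': 'Origin', 'DEST': 'Dest',
                                    'PASSENGERS': 'Passengers',
                                    'AIR_TIME': 'Airtime'})
```

It would be applied to the DataFrame returned by pd.read_csv on the raw data file.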