An In-Depth Analysis of Possible Asteroid Impacts with Earth.
By Sebastian Goslin CDS 101-001
The following data analysis answers specific questions using data collected by NASA's Near Earth Object Program at the Jet Propulsion Laboratory, California Institute of Technology. The data is automatically organized and categorized by the automated collision monitoring system Sentry.
This analysis is designed to highlight some of the automated processes Sentry goes through in prioritizing objects that could pose a hazard to Earth. More specifically, the analysis will answer the following questions and visualize the data set with plots:
Which of the asteroids are the largest, which of those have the highest impact probability, and what is the average size of the asteroids? This question is broken into three parts: a, b, and c.
Which objects have periods that end on or near the year 2017?
Which objects are the fastest, and which are both the largest and the fastest?
Which asteroids score the highest on the Palermo scale, and what is the average Palermo score?
Which object is closest to us?
1. Data Import and Cleaning.
The dataset is imported using the tidyverse library and read as a tibble with the read_csv() function.
The "untidy'd" data set, with more columns that will be needed for the data analysis, as columns that contain strings and integers, which need to be read as integers. i.e. Maximum Torino Scale
is read as a string but the data is in integers, or has NA values. while The Torino scale won't be used for this analysis, an example of cleaning as follows:
First, the dataset needs to have the NA values omitted this is done in the line below. Which can be seen in the new variable
impacts
which is stripped of NA values using theis.na()
function.The column values that are strings or "chr" must read as integers. Since this we are not using the Torino scale it would look like so:
impacts$`Maximum Torino Scale` <- as.numeric(impacts$`Maximum Torino Scale`)
I created a variable that isolated the Torino column and converted its values to integers, which would have allowed me to order them. The reason the Torino scale is not utilized in the analysis is explained in the answer to question 4.
Loading tidyverse, then reading the imported dataset from a .csv file into a tidyverse tibble by creating a variable impacts to hold the dataset.
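The loading step above can be sketched as follows; the file name "impacts.csv" is a placeholder assumption for the CSV downloaded from Kaggle:

```r
# Load the tidyverse (includes readr, dplyr, and ggplot2)
library(tidyverse)

# Read the downloaded CSV into a tibble; "impacts.csv" is a placeholder name
impacts <- read_csv("impacts.csv")
```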
Using the select() function, the asteroid names and the Torino scale are separated and shown as two columns side by side; we can see that the Torino scale data is read as strings even though it is collected as integers.
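A minimal sketch of that comparison, assuming the column names as they appear in the Kaggle CSV:

```r
# Show asteroid names next to the Torino scale column;
# the Torino column prints with type <chr> rather than <int>
impacts %>% select(`Object Name`, `Maximum Torino Scale`)
```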
Creating a variable impacts.2
of the new data set read from the original data csv file, to make data manipulation more efficient. As well as eliminating any empty values, as the dataset comes from NASA its been been organized fairly well previously.
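A sketch of that step, again using the placeholder file name:

```r
# Re-read the original CSV and drop any rows containing empty (NA) values
impacts.2 <- read_csv("impacts.csv") %>% na.omit()
```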
2. Dataset Analysis, Exploration and Visualization.
To begin the analysis, the data set is reduced to the columns that are pertinent to the data analysis using the select()
function.
Object Name
Period End
Possible Impacts
Cumulative Impact Probability (CIP)
Asteroid Diameter (km)
Cumulative Palermo Scale (CPS)
Asteroid Magnitude
A new variable is created, called Impacts_pert, that selects all the columns pertinent to the data analysis. The new variable contains 8 columns and 683 rows.
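The selection described above can be sketched like this; the back-ticked column names are assumptions based on the Kaggle CSV headers:

```r
# Keep only the columns needed for the analysis
Impacts_pert <- impacts.2 %>%
  select(`Object Name`, `Period End`, `Possible Impacts`,
         `Cumulative Impact Probability`, `Asteroid Diameter (km)`,
         `Cumulative Palermo Scale`, `Asteroid Magnitude`)
```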
Question 1:
a: Which of the asteroids are the largest?
b: Of those which have the highest impact probability?
c: What is the average size of the objects?
A new variable, Asteroid_Size, compares the columns Asteroid Diameter (km) and Cumulative Impact Probability, ordered from largest to smallest diameter. The diameter can be estimated from the magnitude of the asteroid.
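A minimal sketch of Asteroid_Size, assuming the column names above:

```r
# Order asteroids by diameter, largest first
Asteroid_Size <- Impacts_pert %>%
  select(`Object Name`, `Asteroid Diameter (km)`,
         `Cumulative Impact Probability`) %>%
  arrange(desc(`Asteroid Diameter (km)`))
```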
1a:
Looking at the tibble, we can see the largest is asteroid 2011 SR52, but in the second line of code we see that the asteroid with the highest chance of hitting us is 2010 RF12, with the lowest chance belonging to 2014 HN197.
1b:
This can be answered with a plot of the two columns: "Asteroid Diameter (km)" on the x axis and "Cumulative Impact Probability" on the y axis.
As illustrated, the plot is very stretched out due to the high probability of 2010 RF12 and the few asteroids with diameters greater than 1 km. Using the zoom function coord_cartesian(), the plot can be visualized better: we can zoom in on the same plot to see all asteroids smaller than 50 meters, the dataset's mean size.
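The zoomed plot described above could be produced along these lines; the axis limit is an illustrative assumption:

```r
# Diameter vs. impact probability, zoomed to objects under 50 m (0.05 km)
ggplot(Impacts_pert,
       aes(x = `Asteroid Diameter (km)`,
           y = `Cumulative Impact Probability`)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 0.05))
```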
1c:
We find that the mean asteroid size is 0.0494 km, or 49.4 m. To put this size into perspective, the Chelyabinsk meteorite that hit Russia in 2013 was ~20 m in diameter; it injured over 1,000 people and released a force of 230 kT (kilotons), roughly 20 times the power of the nuclear weapon that leveled Hiroshima. If one of these objects were to strike us, a meteor of 50 m would have an energy of 5.2 MT (megatons).
Question 2: Which objects have periods that end on or near the year 2017?
A new variable Asteroid_Period is created to show which asteroid has the soonest period end; in this case it is 2006 WP1. If its CIP were high, this object could hit us this year!
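A sketch of Asteroid_Period, assuming the column names from the Kaggle CSV:

```r
# Sort by period end so the soonest-ending object appears first
Asteroid_Period <- Impacts_pert %>%
  select(`Object Name`, `Period End`) %>%
  arrange(`Period End`)
```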
Question 3: Which objects are the fastest, which ones are the largest and what’s the average speed?
Following the same approach as the previous questions, we isolate the asteroids and their individual velocities and order them from slowest to fastest. With this we can also find the average speed of the asteroids and check for any trend: using the summarise() function, we collapse the values into a new tibble, applying the mean(), min(), and max() functions inside summarise().
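A sketch of that summary; the column name `Asteroid Velocity` is an assumption based on the Kaggle CSV:

```r
# Collapse the velocity column into mean, min, and max
impacts.2 %>%
  summarise(mean = mean(`Asteroid Velocity`),
            min  = min(`Asteroid Velocity`),
            max  = max(`Asteroid Velocity`))
```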
We see that the min, max, and mean velocities are:
mean = 11.46258
min = 0.34
max = 39.47
After using the filter() function and plotting, the data shows a trend: the smaller asteroids have higher velocities, while the larger ones have lower velocities. The values obtained with the summarise() function are plotted below.
Question 4: Which asteroids score the highest on the Palermo scale, and what's the average Palermo score?
The Palermo Scale is a more descriptive version of the other scale used in the dataset, the Torino Scale, which is intended for non-scientific audiences. The Palermo scale measures the probability of the asteroid impact and the energy of the object in megatons.
The scale operates on the premise that a "safe" object has a negative score; as soon as an object is upgraded to zero or higher, its threat ceiling is raised.
Solving question four involves not just making sure that every value is negative (thankfully they are), but also computing the average of the scores. A strongly negative mean, such as the calculated -6.51, indicates that the objects all have such low scores that the probability of any of them hitting us is very slim. A less negative mean of, say, -2 would indicate that the objects overall have a higher probability of hitting us.
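The mean score can be computed with a one-line summary, assuming the Palermo column name from the Kaggle CSV:

```r
# Average Palermo score across all objects
impacts.2 %>%
  summarise(mean_palermo = mean(`Cumulative Palermo Scale`))
```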
Question 5: Which object is closest to us?
In astronomy, an object's magnitude is an indicator of how far it is from us. The magnitude is its brightness, essentially a reflectivity index, and it is measured in two forms: Apparent Magnitude and Absolute Magnitude. The more accurate of the two is the absolute magnitude (denoted H), as it is obtained to a high degree of precision from astronomical observatories on Earth.
By using the H factor of an asteroid, its distance can be mathematically obtained using the formula:
D = 10^((Apparent Magnitude − H + 5)/5)
Using an assumed apparent magnitude of -2.50 (the average magnitude of objects in the asteroid belt), we can estimate the distance to each asteroid using the mutate()
function.
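A sketch of that mutate() step, applying the distance formula above with the assumed apparent magnitude of -2.50:

```r
# D = 10^((m - H + 5)/5) gives distance in parsecs; 1 parsec = 206265 AU
Impacts_dist <- Impacts_pert %>%
  mutate(Distance_pc = 10^((-2.50 - `Asteroid Magnitude` + 5) / 5),
         Distance_AU = Distance_pc * 206265)
```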
This equation outputs the distance in parsecs; therefore it must be converted to a more manageable unit of measurement, in this case the AU, or Astronomical Unit, which is ~150,000,000 kilometers.
1 parsec is 206,265 AU.
Thus the farthest asteroid is ~495 AU away, out in the heliosphere. Conveniently, this is also the largest asteroid, 2011 SR52, which explains why we can see it from so far away.
The closest asteroid is 2016 QY84 at 47 million kilometers from us!
We can see in the two plots that there is a linear progression in size vs. distance: the farther out we look, the fewer asteroids there are, but the larger they become.
3. Dataset Summary
This dataset has proved extremely useful; given that it is only two months old, there is plenty of room for more in-depth analysis, and it could no doubt provide deeper insight into the asteroids that may pose a threat to us one day. It has also provided a platform not only for learning R, but for seeing the language's capabilities in interpreting data.
This data was downloaded from NASA's webpage on kaggle: https://www.kaggle.com/nasa/asteroid-impacts
The asteroid orbit and impact risk data was collected by NASA's Near Earth Object Program at the Jet Propulsion Laboratory (California Institute of Technology).
All mathematical formulas referenced from my astronomy textbook: Foundations of Modern Cosmology by Hawley and Holcomb.