The following data analysis answers specific questions using data collected by NASA's Near Earth Object Program at the Jet Propulsion Laboratory, California Institute of Technology. The data is automatically organized and categorized by Sentry, the automated collision monitoring program.
This analysis is designed to highlight some of the automated processes that Sentry goes through in prioritizing objects that could be a hazard to Earth. More specifically, the analysis answers the following questions and visualizes the data set with plots:
The dataset is imported with the tidyverse library and read as a tibble using the read_csv() function.
The "untidy" data set has more columns than the analysis needs, as well as columns whose values are integers but are read as strings. For example, Maximum Torino Scale is read as a string even though its values are integers or NA. While the Torino scale won't be used for this analysis, cleaning it would work as follows: the column is taken from impacts, stripped of NA values using the is.na() function, and its string ("chr") values are converted to numbers. Since we are not using the Torino scale, this step is only illustrative; it would look like so:
torino <- as.numeric(impacts$`Maximum Torino Scale`)
This creates a variable that isolates the Torino column and converts its values to numbers, which would have allowed me to order them. The reason the Torino scale is not used in the analysis is explained in the answer to question 4.
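As a rough illustration of the cleaning step described above, the sketch below converts a string column to numbers and drops the NA rows in base R. The toy data frame and its values are made up for the example and are not taken from the dataset; only the column names mirror the real ones.

```r
# Toy stand-in for the real tibble (values are hypothetical)
toy <- data.frame(
  `Object Name` = c("A", "B", "C", "D"),
  `Maximum Torino Scale` = c("0", "0", NA, "1"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Convert the "chr" column to numeric; NA stays NA
toy$`Maximum Torino Scale` <- as.numeric(toy$`Maximum Torino Scale`)

# Keep only the rows where the Torino value is present
torino_clean <- toy[!is.na(toy$`Maximum Torino Scale`), ]

# With numeric values the rows can now be ordered, highest score first
torino_sorted <- torino_clean[order(-torino_clean$`Maximum Torino Scale`), ]
print(torino_sorted)
```

Once the column is numeric, ordering behaves as expected; with strings, a value such as "10" would sort before "2".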
Loading tidyverse
.libPaths(new = "~/Rlibs")
library(tidyverse)
Reading the imported dataset from a .csv file into a tidyverse tibble, stored in a variable impacts that handles the dataset.
impacts <- read_csv("impacts.csv")
print(impacts)
Using the select() function, the asteroids and the Torino scale are separated and shown as two columns side by side. We can see that the Torino scale data is read as strings even though it is collected as integers.
print(select(impacts, `Object Name`, `Maximum Torino Scale`))
Creating a variable impacts.2 that holds the data read from the original csv file with any rows containing empty values removed, to make data manipulation more efficient. As the dataset comes from NASA, it has already been organized fairly well.
impacts.2 <- na.omit(impacts)
print(impacts.2)
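One thing worth keeping in mind about na.omit() is that it drops a row if any of its columns is NA, not only the columns used later. The following is a small sketch of that behaviour on made-up values; the toy data frame and its column names are assumptions for illustration only.

```r
# Toy data: row B is missing a diameter, row C is missing a velocity
toy <- data.frame(
  name = c("A", "B", "C"),
  diameter_km = c(0.05, NA, 0.2),
  velocity = c(12.1, 8.4, NA)
)

# na.omit() removes every row that has an NA in any column
complete_rows <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(complete_rows)

# Alternative: drop rows only when one specific column is missing
diameter_only <- toy[!is.na(toy$diameter_km), ]

print(complete_rows)
print(rows_dropped)
print(diameter_only)
```

Comparing the row counts before and after, as done here with rows_dropped, is a quick way to see how much data the cleaning step removed.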
To begin the analysis, the data set is reduced to the columns that are pertinent to the data analysis using the select() function.
Impacts_pert <- select(impacts.2, `Object Name`, `Period End`, `Possible Impacts`, `Cumulative Impact Probability`,
`Asteroid Diameter (km)`, `Cumulative Palermo Scale`, `Asteroid Magnitude`, `Asteroid Velocity`)
A new variable called Impacts_pert is created that selects all the columns pertinent to the data analysis. The new variable contains 8 columns and 683 rows.
print(Impacts_pert)
A new variable, Asteroid_Size, holds the three columns Object Name, Asteroid Diameter (km), and Cumulative Impact Probability, arranged from largest to smallest diameter. The diameter can be estimated from the magnitude of the asteroid.
Looking at the tibble, we can see that the largest asteroid is 2011 SR52. The remaining lines of code show that the asteroid with the highest chance of hitting us is 2010 RF12, with the lowest chance belonging to 2014 HN197.
Asteroid_Size <- select(Impacts_pert, `Object Name`,
`Asteroid Diameter (km)`, `Cumulative Impact Probability`)
print(arrange(Asteroid_Size, desc(`Asteroid Diameter (km)`),
`Cumulative Impact Probability`))
Asteroid_critchance <- select(Impacts_pert, `Object Name`, `Cumulative Impact Probability`, `Asteroid Diameter (km)`)
print(arrange(Asteroid_critchance, desc(`Cumulative Impact Probability`), `Asteroid Diameter (km)`))
print(arrange(Asteroid_critchance, `Cumulative Impact Probability`, `Asteroid Diameter (km)`))
This can be answered with a plot of the two columns, with Asteroid Diameter (km) on the x axis and Cumulative Impact Probability on the y axis.
The plot is very stretched out due to the high probability of 2010 RF12 and the few asteroids that have diameters greater than 1 km. Using the zoom function coord_cartesian(), the plot becomes easier to read. We can zoom in on the same plot to see the asteroids below 0.4 km in diameter, a range that includes the dataset's mean size of about 50 meters.
Asteroid.Size.plot <- select(Asteroid_Size, `Asteroid Diameter (km)`,
`Cumulative Impact Probability`) # The variable to plot for 1b
size_plot <- ggplot(Asteroid.Size.plot) +
geom_point(mapping = aes(x = `Asteroid Diameter (km)`,
y = `Cumulative Impact Probability`))
zoom1 <- ggplot(Asteroid.Size.plot) +
geom_point(mapping = aes(x = `Asteroid Diameter (km)`,
y = `Cumulative Impact Probability`))
size_plot + theme_classic()
zoom1 + coord_cartesian(xlim = c(0,0.40),
ylim = c(1.1e-10, 1e-04)) + theme_classic()# The zoom variable is using the function coord_cartesian().
We find that the mean asteroid size is 0.0494 km, or 49.4 m. To put this size into perspective, the Chelyabinsk meteor that hit Russia in 2013 was ~20 m in diameter. It injured over 1000 people and had a force of 230 kt (kilotons); that's 20 times the power of the nuclear weapon that leveled Hiroshima. If one of these objects were to strike us, a meteor of 50 m would have an energy of 5.2 Mt (megatons).
Average.Size <- select(Impacts_pert, `Asteroid Diameter (km)`)
mean.notrim <- colMeans(Average.Size, na.rm = FALSE, dims = 1)
trimmed.mean <- round(mean.notrim, digits = 4)
print(trimmed.mean)
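To show where an energy figure of that order can come from, here is a back-of-the-envelope sketch of the kinetic energy of a spherical impactor, E = 1/2 m v^2 with the mass taken from the volume of a sphere. The density (3000 kg/m^3) and velocity (15 km/s) are my own assumptions and do not come from the dataset or the source of the 5.2 Mt figure; with them, a 50 m object lands in the same range of roughly 5 Mt.

```r
# Energy released by one megaton of TNT, in joules
JOULES_PER_MEGATON <- 4.184e15

# Kinetic energy of a spherical impactor, in megatons of TNT.
# density (kg/m^3) and velocity (m/s) are assumed values, not measured ones.
impact_energy_mt <- function(diameter_m, density = 3000, velocity = 15000) {
  volume_m3 <- (pi / 6) * diameter_m^3   # volume of a sphere from its diameter
  mass_kg <- density * volume_m3
  energy_j <- 0.5 * mass_kg * velocity^2
  energy_j / JOULES_PER_MEGATON
}

energy_50m <- impact_energy_mt(50)
energy_100m <- impact_energy_mt(100)
print(energy_50m)
print(energy_100m)
```

Because the mass grows with the cube of the diameter, doubling the size of the object multiplies the energy by eight at the same speed.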
A new variable, Asteroid_Period, is created to show which asteroid has the soonest period end; in this case the asteroid is 2006 WP1. If its cumulative impact probability (CIP) were high, this object could be hitting us this year!
Asteroid_Period <- select(Impacts_pert, `Object Name`, `Period End`,
`Asteroid Diameter (km)`, `Asteroid Magnitude`)
# Sort by Period End (ascending) so the soonest period end is listed first
print(arrange(Asteroid_Period, `Period End`))
Following the same approach as the previous questions, we isolate the asteroids and their individual velocities and order them from fastest to slowest. With this we can also find the average speed of the asteroids and check whether it shows any kind of trend. The summarise() function collapses all the values into a new tibble, and inside it the mean(), min(), and max() functions give the average, lowest, and highest velocities.
We see that the min, max, and mean velocities are:
mean = 11.46258
min = 0.34
max = 39.47
After using the filter() function and plotting the results, the data shows a trend: the smaller asteroids have higher velocities, while the larger ones have lower velocities. The values obtained by the summarise() function are plotted below.
Asteroid_Size_vs_Velocity1 <- select(impacts.2, `Object Name`, `Asteroid Velocity`, `Asteroid Diameter (km)`)
print(arrange(Asteroid_Size_vs_Velocity1, desc(`Asteroid Velocity`), `Asteroid Diameter (km)`))
summarise(Asteroid_Size_vs_Velocity1, mean(`Asteroid Velocity`),
min(`Asteroid Velocity`),
max(`Asteroid Velocity`))
Asteroid_Size_vs_Velocity2 <- select(impacts.2, `Object Name`, `Asteroid Velocity`, `Asteroid Diameter (km)`)
# Split at a diameter of 0.1 km (100 m); the boundary value goes to the smaller group only
small_asteroids <- filter(Asteroid_Size_vs_Velocity2, `Asteroid Diameter (km)` <= 0.1)
large_asteroids <- filter(Asteroid_Size_vs_Velocity2, `Asteroid Diameter (km)` > 0.1)
# color is set outside aes() so the line is drawn in red instead of being mapped to a legend entry
LT <- ggplot(small_asteroids, aes(x = `Asteroid Velocity`, y = `Asteroid Diameter (km)`)) +
geom_point(size = 0.5) + labs(title = "All asteroids 0.1 km or less in diameter") +
geom_vline(aes(xintercept = min(`Asteroid Velocity`)), color = "red") +
geom_vline(aes(xintercept = mean(`Asteroid Velocity`))) + theme_classic()
GT <- ggplot(large_asteroids, aes(x = `Asteroid Velocity`, y = `Asteroid Diameter (km)`)) +
geom_point(size = 2) + labs(title = "All asteroids more than 0.1 km in diameter") +
geom_vline(aes(xintercept = max(`Asteroid Velocity`)), color = "red") +
geom_vline(aes(xintercept = mean(`Asteroid Velocity`))) + theme_classic()
GT
LT
vplot <- select(Asteroid_Size_vs_Velocity2, `Asteroid Velocity`, `Asteroid Diameter (km)`)
big_plot <-ggplot(vplot, aes(x = `Asteroid Velocity`, y = `Asteroid Diameter (km)`)) +
geom_point(size = 0.5) + labs(title = "All Asteroids") +
geom_vline(aes(xintercept = min(`Asteroid Velocity`)), color = "red") +
geom_vline(aes(xintercept = mean(`Asteroid Velocity`))) +
geom_vline(aes(xintercept = max(`Asteroid Velocity`)), color = "red") + theme_classic()
big_plot
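The trend read off the plots (smaller asteroids moving faster) can also be checked with a number. The following is a minimal sketch that is not part of the original analysis: it uses base R's cor() with Spearman's rank correlation on made-up values. The toy data frame and its column names are assumptions for illustration; with the real data the same call could be applied to the Asteroid Velocity and Asteroid Diameter (km) columns of impacts.2.

```r
# Hypothetical diameters (km) and velocities (km/s), not taken from the dataset
toy <- data.frame(
  diameter_km = c(0.01, 0.02, 0.05, 0.1, 0.5, 1.0),
  velocity = c(30, 22, 25, 15, 10, 5)
)

# Spearman's rho works on ranks, so it only asks whether velocity
# tends to fall as diameter rises, not whether the relation is linear
rho <- cor(toy$diameter_km, toy$velocity, method = "spearman")
print(rho)
```

A value near -1 would support the claim that larger asteroids tend to be slower, a value near 0 would suggest no monotonic trend, and a value near +1 would suggest the opposite.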
The Palermo Scale is a more descriptive counterpart to the other scale in the dataset, the Torino Scale, which is aimed at non-scientific audiences. The Palermo Scale combines the probability of the asteroid impact with the energy of the object in megatons.
The scale operates on the premise that a "safe" object has a negative score; as soon as an object is upgraded to zero or higher, its threat ceiling is raised.
Solving question four involves not just making sure that every value is negative (thankfully they are), but also looking at the average of the scores, since a negative mean does not by itself mean all is well. A strongly negative mean indicates that the objects score so far below zero that the probability of any of them hitting us is very slim; a good example is the calculated mean of -6.51.
A mean closer to zero, say -2, would indicate that the objects overall have a higher probability of hitting us.
P_Scale <- select(Impacts_pert, `Object Name`, `Cumulative Palermo Scale`)
print(arrange(P_Scale, `Object Name`, desc(`Cumulative Palermo Scale`)))
P_Score <- select(P_Scale, `Cumulative Palermo Scale`)
Mean_Score <- mutate(P_Scale, "CPS Mean" = colMeans(P_Score, na.rm = FALSE, dims = 1))
print(Mean_Score)
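For context on how a Palermo score combines probability and energy, here is a hedged sketch of the commonly cited definition: the base-10 logarithm of the impact probability divided by the background risk over the time remaining, where the annual background frequency is taken as 0.03 * E^(-0.8) for an energy E in megatons. The function name and the input values are hypothetical; the dataset itself only supplies the finished Cumulative Palermo Scale column.

```r
# Palermo score for a single potential impact (sketch of the usual definition).
# impact_probability: chance of impact; energy_mt: impact energy in megatons;
# years_until_impact: time remaining until the potential impact, in years.
palermo_scale <- function(impact_probability, energy_mt, years_until_impact) {
  # Assumed annual frequency of impacts with at least this energy
  background_frequency <- 0.03 * energy_mt^(-0.8)
  # Ratio of the object's risk to the background risk over the same time span
  relative_risk <- impact_probability / (background_frequency * years_until_impact)
  log10(relative_risk)
}

# Hypothetical object: 1-in-a-million chance, 10 Mt, 50 years away
ps_example <- palermo_scale(1e-6, 10, 50)

# An object whose risk exactly equals the background risk scores 0
background_risk <- 0.03 * 10^(-0.8) * 50
ps_at_background <- palermo_scale(background_risk, 10, 50)

print(ps_example)
print(ps_at_background)
```

Read this way, a score of -2 means the object is 100 times less likely than the background hazard, which is why scores closer to zero deserve more attention.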
In astronomy, an object's magnitude is an indicator of how far it is from us. The magnitude is its brightness; it's essentially a reflectivity index, and it is measured in two forms: apparent magnitude and absolute magnitude. The more accurate of the two is the absolute magnitude (written as H mathematically), as it is obtained to a high degree of precision from astronomical observatories on Earth.
By using the H factor of an asteroid, its distance can be mathematically obtained using the formula:
$$D = 10^{(\text{Apparent Magnitude} - H + 5)/5}$$
Using an assumed apparent magnitude of -2.50 (the average magnitude of objects in the asteroid belt), we can estimate the distance to each asteroid using the mutate() function.
This equation outputs the distance in parsecs, so it must be converted to a more manageable unit of measurement. In this case we will use the AU, or Astronomical Unit, which is ~150,000,000 kilometers.
1 parsec is 206265 AU.
Thus the farthest asteroid is ~495 AU away, out in the heliosphere. Conveniently, this is also the largest asteroid, 2011 SR52, which explains why we can see it from so far away.
The closest asteroid is 2016 QY84, at 47 million kilometers from us!
We can see in the two plots that there is a linear progression in size vs distance: the further out we look, the fewer asteroids there are, but also the larger they become.
Amag <- select(impacts.2, `Object Name`, `Asteroid Magnitude`, `Asteroid Diameter (km)`)
Distance <- mutate(Amag, `Asteroid Distance` = (10^((-2.5-`Asteroid Magnitude`+5)/5))*206265)
print(arrange(Distance, desc(`Asteroid Distance`)))
Distance_Mean <- select(Distance, `Asteroid Distance`)
Mean_Distance <- colMeans(Distance_Mean, na.rm = FALSE, dims = 1)
print(Mean_Distance)
Asteroid_Size_vs_Distance <- select(Distance, `Object Name`, `Asteroid Distance`, `Asteroid Diameter (km)`)
# Distances are in AU; split at 10 AU, with the boundary value going to the nearer group only
within_10au <- filter(Asteroid_Size_vs_Distance, `Asteroid Distance` <= 10)
beyond_10au <- filter(Asteroid_Size_vs_Distance, `Asteroid Distance` > 10)
DLT <- ggplot(within_10au, aes(x = `Asteroid Diameter (km)`, y = `Asteroid Distance`)) +
geom_point(size = 0.5) + labs(x = "Asteroid Size in km",
y = "Asteroid Distance in AU",
title = "All asteroids 10 AU or less from Earth (est.)") + theme_classic()
DGT <- ggplot(beyond_10au, aes(x = `Asteroid Diameter (km)`, y = `Asteroid Distance`)) +
geom_point(size = 1) + labs(x = "Asteroid Size in km",
y = "Asteroid Distance in AU",
title = "All asteroids more than 10 AU from Earth (est.)") + theme_classic()
DGT
DLT
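To make the arithmetic behind the distance estimate easier to follow, here is a small worked sketch of the same formula as a standalone function, using the assumed apparent magnitude of -2.5 and the conversion of 206265 AU per parsec from the text. The function name and the example H values are hypothetical and are not rows from the dataset.

```r
# Conversion factor used in the analysis: AU per parsec
PARSEC_IN_AU <- 206265

# Distance estimate from absolute magnitude H, following the formula in the text.
# apparent_magnitude defaults to the assumed value of -2.5.
distance_au <- function(absolute_magnitude, apparent_magnitude = -2.5) {
  distance_pc <- 10^((apparent_magnitude - absolute_magnitude + 5) / 5)
  distance_pc * PARSEC_IN_AU   # convert parsecs to AU
}

# Hypothetical absolute magnitudes
example_h <- c(15, 20, 25)
example_distances <- distance_au(example_h)
print(example_distances)
```

Under this formula every 5 magnitudes of H changes the estimated distance by a factor of 10, so brighter objects (lower H) come out as farther away.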
This dataset has proved to be extremely useful. Given that it is only two months old, there is plenty of room for more in-depth analysis, which could no doubt provide deeper insight into the asteroids that may pose a threat to us one day. It has also been useful as a platform not only for learning R, but for seeing the capabilities of the language in interpreting data.
This data was downloaded from NASA's webpage on Kaggle: https://www.kaggle.com/nasa/asteroid-impacts
The asteroid orbit and impact risk data was collected by NASA's Near Earth Object Program at the Jet Propulsion Laboratory (California Institute of Technology).
All mathematical formulas are referenced from my astronomy textbook: Foundations of Modern Cosmology by Hawley and Holcomb.