Police departments are striving to integrate more automated and predictive data systems into their everyday processes to reduce crime and deploy scarce resources more efficiently. Alerting officers to abnormal patterns in the data as they occur would open the door to more proactive policing. The Boston Police Department has released a public dataset of incident reports received by its 911 call center.
Assess the potential of the provided data set for predicting where police patrols should be dispatched in order to serve, protect, and optimize (people, money, resources, time).
(As provided online)
incident_num (varchar; required) - Internal BPD report number
offense_code (varchar) - Numerical code of offense description
Offense_Code_Group_Description (varchar) - Internal categorization of [offense_description]
Offense_Description (varchar) - Primary descriptor of incident
district (varchar) - What district the crime was reported in
reporting_area (varchar) - RA number associated with the where the crime was reported from.
shooting (char) - Indicated a shooting took place.
occurred_on (datetime) - Earliest date and time the incident could have taken place
UCR_Part (varchar) - Universal Crime Reporting Part number (1, 2, 3)
street (varchar) - Street name the incident took place

We load the data in the cells below. Uncomment and run the one corresponding to the language of your choice!
import pandas as pd
reports_df = pd.read_csv('boston_crime_incident_reports_2015aug-2018apr.csv', encoding='latin-1')
weather_df = pd.read_csv('boston_weather_data_cleaned_2018oct05.csv')
# reports_df <- read.csv('boston_crime_incident_reports_2015aug-2018apr.csv', header=TRUE)
# weather_df <- read.csv('boston_weather_data_cleaned_2018oct05.csv', header=TRUE)
reports_df.head(3)
| | INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_CODE_GROUP | OFFENSE_DESCRIPTION | DISTRICT | REPORTING_AREA | SHOOTING | OCCURRED_ON_DATE | YEAR | MONTH | DAY_OF_WEEK | HOUR | UCR_PART | STREET | Lat | Long | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | I182024895 | 2629 | Harassment | HARASSMENT | B3 | 442 | NaN | 2018-04-03 20:00:00 | 2018 | 4 | Tuesday | 20 | Part Two | WESTCOTT ST | 42.293218 | -71.078865 | (42.29321805, -71.07886455) |
| 1 | I182024895 | 619 | Larceny | LARCENY ALL OTHERS | B3 | 442 | NaN | 2018-04-03 20:00:00 | 2018 | 4 | Tuesday | 20 | Part One | WESTCOTT ST | 42.293218 | -71.078865 | (42.29321805, -71.07886455) |
| 2 | I182024887 | 1402 | Vandalism | VANDALISM | B3 | 469 | NaN | 2018-03-28 20:30:00 | 2018 | 3 | Wednesday | 20 | Part Two | ALMONT ST | 42.275277 | -71.095542 | (42.27527670, -71.09554245) |
Most of the interesting features are not numbers, so the above is not very useful.
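Since most columns are text, a per-column summary of the non-numeric features may be more informative than numeric statistics. The helper below is a minimal sketch; `summarize_categoricals` is a hypothetical name introduced here, not part of the original notebook.

```python
import pandas as pd


def summarize_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize every non-numeric column of `df`.

    Hypothetical helper (not from the original notebook): reports the number
    of non-null entries, the number of distinct values, and the most frequent
    value for each non-numeric column.
    """
    text_cols = df.select_dtypes(exclude='number')
    summary = pd.DataFrame({
        'non_null': text_cols.count(),
        'n_unique': text_cols.nunique(),
    })
    # mode() ignores missing values; columns that are entirely missing get None.
    summary['most_common'] = [
        col.mode().iloc[0] if col.notna().any() else None
        for _, col in text_cols.items()
    ]
    return summary
```

Calling `summarize_categoricals(reports_df)` would return one row per text column, such as DISTRICT, STREET, or SHOOTING.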
Other than the incident ID, we can split the data features into two groups: incident description and space-time coordinates.
For the incident description, a lot is redundant. OFFENSE_CODE is just a numeric encoding of the offense description (see the field list above), so we will not bother with it (let's stay human readable here).
OFFENSE_CODE_GROUP and OFFENSE_DESCRIPTION are pretty similar. The description is more granular, too granular for our purposes, so we will work with OFFENSE_CODE_GROUP instead.
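One way to check the redundancy claimed above, rather than take it on faith, is to count how many distinct labels appear for each code. This is a sketch only; `max_labels_per_code` is a hypothetical helper, and no result on the real data is asserted here.

```python
import pandas as pd


def max_labels_per_code(df: pd.DataFrame,
                        code_col: str = 'OFFENSE_CODE',
                        label_col: str = 'OFFENSE_DESCRIPTION') -> int:
    """Largest number of distinct labels observed for any single code.

    Hypothetical helper: a result of 1 means each code determines its label,
    so the code column carries no information beyond the label column.
    """
    labels_per_code = df.groupby(code_col)[label_col].nunique()
    return int(labels_per_code.max())
```

Running it once with `label_col='OFFENSE_DESCRIPTION'` and once with `label_col='OFFENSE_CODE_GROUP'` on `reports_df` would show whether the code can safely be dropped in favor of either text column.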
# This gives us a first idea of crimes by hour of day and district.
# For each hour of the day, assign police resources in proportion to the crimes committed in each district.
# Note that some crimes are more important than others.
reports_df.groupby(['HOUR', 'DISTRICT']).count().MONTH
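The proportional allocation described in the comments can be sketched by normalizing the hour-by-district counts so that each hour sums to 1. The function name `district_share_by_hour` and the optional `weights` mapping are assumptions made for illustration; the weights are one possible way to reflect that some crimes matter more than others, not a scheme taken from the data.

```python
import pandas as pd


def district_share_by_hour(df: pd.DataFrame, weights: dict = None) -> pd.DataFrame:
    """Fraction of (optionally weighted) incidents per district for each hour.

    Hypothetical helper. `weights` maps OFFENSE_CODE_GROUP values to an
    importance weight; groups missing from the mapping get a weight of 1.
    Rows with a missing DISTRICT are ignored by groupby.
    """
    if weights is None:
        w = pd.Series(1.0, index=df.index)
    else:
        w = df['OFFENSE_CODE_GROUP'].map(weights).fillna(1.0)
    totals = (
        df.assign(weight=w)
          .groupby(['HOUR', 'DISTRICT'])['weight']
          .sum()
          .unstack(fill_value=0.0)  # rows: hours, columns: districts
    )
    # Divide each row by its total so the shares for an hour sum to 1.
    return totals.div(totals.sum(axis=1), axis=0)
```

Multiplying a row of the result by the number of patrols available at that hour gives a first-pass allocation per district.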
Looking at the different types of crimes, we may eventually want to group them into further subcategories.
reports_df.groupby('OFFENSE_CODE_GROUP').count().sort_values('MONTH', ascending=False).head(10).MONTH
# If we want to visualize the crime locations.
reports_df.plot.scatter(x='Long', y='Lat')
# Uhhh? Turns out some records have bad values, e.g. Lat and Long of (0, 0). Let's get rid of these.
# This should work better; s=1 makes the individual dots easier to see.
reports_df[(reports_df['Lat'] > 10) & (reports_df['Long'] < -50)].plot.scatter(x='Long', y='Lat', s=1)
Cool, a map of Boston! What are the holes? Why are there disconnected regions? (There is a park in the middle of town, cars can't get in there. Also there are bodies of water, and a prominent bridge is traced out in the upper left part of the map.)
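If the cleaned coordinates are needed again later, the filter used for the scatter plot could be wrapped in a small reusable function. This is a sketch: `drop_bad_coordinates` is a hypothetical name, and its default thresholds simply reuse the rough cut-offs from the plot, not a precise bounding box for Boston.

```python
import pandas as pd


def drop_bad_coordinates(df: pd.DataFrame,
                         lat_min: float = 10.0,
                         long_max: float = -50.0) -> pd.DataFrame:
    """Keep only rows whose Lat/Long pass the rough sanity thresholds.

    Hypothetical helper. Rows with missing coordinates are dropped as well,
    because comparisons against NaN evaluate to False.
    """
    valid = (df['Lat'] > lat_min) & (df['Long'] < long_max)
    return df[valid].copy()
```

For example, `drop_bad_coordinates(reports_df).plot.scatter(x='Long', y='Lat', s=1)` would be expected to reproduce the map above.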