Police departments are striving to integrate more automated and predictive data systems into their everyday processes to reduce crime and deploy scarce resources more efficiently. This opens the door to more proactive policing, provided that resources can be alerted to abnormal patterns in the data as they occur. The Boston Police Department released a public dataset of incident reports made to its 911 call center.
Assess the potential of the provided data set for predicting where police patrols should be dispatched in order to serve, protect, and optimize (people, money, resources, time).
Description of columns
(As provided online)
incident_num (varchar; required) - Internal BPD report number
offense_code (varchar) - Numerical code of offense description
Offense_Code_Group_Description (varchar) - Internal categorization of [offense_description]
Offense_Description (varchar) - Primary descriptor of incident
district (varchar) - What district the crime was reported in
reporting_area (varchar) - RA number associated with the where the crime was reported from.
shooting (char) - Indicated a shooting took place.
occurred_on (datetime) - Earliest date and time the incident could have taken place
UCR_Part (varchar) - Universal Crime Reporting Part number (1, 2, 3)
street (varchar) - Street name the incident took place
We load the data in the cells below. Uncomment and run the one corresponding to the language of your choice!
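For the Python route, a minimal loading sketch might look like the one below. The helper name `load_reports`, the file name `crime.csv`, and the rows of the in-memory sample are all assumptions made for illustration; the real path and the full column list are not given here.

```python
import io

import pandas as pd


def load_reports(path_or_buffer):
    """Read the incident reports CSV into a DataFrame.

    Hypothetical helper: accepts a file path or any file-like object.
    """
    return pd.read_csv(path_or_buffer)


# Tiny in-memory stand-in for the real file (made-up rows and IDs).
sample_csv = io.StringIO(
    "INCIDENT_NUMBER,OFFENSE_CODE_GROUP,DISTRICT,HOUR\n"
    "X000001,Group A,D1,13\n"
    "X000002,Group B,D2,2\n"
)
sample_df = load_reports(sample_csv)
print(sample_df.shape)

# With the real data (file name is an assumption):
# reports_df = load_reports("crime.csv")
```

Reading through a small helper keeps the path in one place, so the later cells only depend on `reports_df`.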
Most of the interesting features are not numbers, so the above is not very useful.
Other than the incident ID, we can split the features into two groups: incident description and space-time coordinates.
For the Incident description, a lot is redundant. OFFENSE_CODE is an integer representation of OFFENSE_CODE_GROUP, so we will not bother with it (let's stay human readable here).
OFFENSE_CODE_GROUP and OFFENSE_DESCRIPTION are pretty similar. The description is more granular, too granular for our purposes, so we won't use it.
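One way to check these redundancy claims rather than take them on faith is sketched below. The helper names `groups_per_code` and `granularity` are hypothetical, and the toy frame uses synthetic codes and labels, not real BPD values.

```python
import pandas as pd


def groups_per_code(df):
    """Number of distinct OFFENSE_CODE_GROUP values seen for each OFFENSE_CODE."""
    return df.groupby("OFFENSE_CODE")["OFFENSE_CODE_GROUP"].nunique()


def granularity(df, columns):
    """Count the distinct values in each of the given columns."""
    return df[columns].nunique()


# Synthetic stand-in for reports_df (made-up codes and labels).
toy = pd.DataFrame({
    "OFFENSE_CODE": [100, 100, 200, 300],
    "OFFENSE_CODE_GROUP": ["Group A", "Group A", "Group B", "Group B"],
    "OFFENSE_DESCRIPTION": ["desc a1", "desc a1", "desc b1", "desc b2"],
})

# If the maximum is 1, every code maps to exactly one group,
# so the code adds no information beyond the group label.
print(groups_per_code(toy).max())

# Comparing distinct counts shows how much finer the description is.
print(granularity(toy, ["OFFENSE_CODE_GROUP", "OFFENSE_DESCRIPTION"]))
```

Running the same two calls on `reports_df` would show whether a code ever maps to more than one group and how many more distinct descriptions than groups there are.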
# This gives a first idea of crimes by hour of day and district. For each hour of the day, assign police resources in proportion to the crimes committed in each district.
# Note that some crimes are more important than others.
reports_df.groupby(['HOUR','DISTRICT']).count().MONTH
Motor Vehicle Accident Response 26459
Medical Assistance 18911
Investigate Person 15646
Drug Violation 12698
Simple Assault 12604
Verbal Disputes 11009
Name: MONTH, dtype: int64
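The proportional-allocation idea can be made concrete by turning the hour and district counts into per-hour shares. This is a minimal sketch under the assumption that every incident counts equally; the helper name `patrol_shares` and the toy data are made up for illustration.

```python
import pandas as pd


def patrol_shares(df):
    """Share of incidents per district within each hour of the day.

    Rows are hours, columns are districts, and each row sums to 1.
    Rows with a missing HOUR or DISTRICT are ignored by groupby.
    """
    counts = df.groupby(["HOUR", "DISTRICT"]).size().unstack(fill_value=0)
    # Divide each row by its total so the shares within an hour add up to 1.
    return counts.div(counts.sum(axis=1), axis=0)


# Synthetic stand-in for reports_df (made-up hours and districts).
toy = pd.DataFrame({
    "HOUR": [0, 0, 0, 0, 1, 1],
    "DISTRICT": ["A", "A", "A", "B", "A", "B"],
})
shares = patrol_shares(toy)
print(shares)
```

Since some crimes matter more than others, a natural variation would be to sum a per-incident severity weight instead of counting rows; the choice of weights is a policy decision that the data alone does not settle.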
# If we want to visualize the crime locations.
reports_df.plot.scatter(x='Long',y='Lat')
[Figure: scatter plot of incident locations, Long (x) vs. Lat (y)]
# Uhhh? Turns out there are some rows with bad values, e.g. Lat and Long with a value of (0,0). Let's get rid of these.
# This should work better; we need s=1 to make the individual dots easier to see.
reports_df[(reports_df['Lat']>10)&(reports_df['Long']<-50)].plot.scatter(x='Long',y='Lat',s=1)
[Figure: scatter plot of incident locations, Long (x) vs. Lat (y), with bad coordinates filtered out]
Cool, a map of Boston! What are the holes? Why are there disconnected regions? (There is a park in the middle of town, cars can't get in there. Also there are bodies of water, and a prominent bridge is traced out in the upper left part of the map.)
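Since the same coordinate filter will be needed for any later spatial analysis, it may be worth wrapping it in a small function. The sketch below reuses the thresholds from the plotting cell (Lat > 10, Long < -50); the helper name `valid_locations` and the toy coordinates are assumptions for illustration.

```python
import pandas as pd


def valid_locations(df, min_lat=10, max_long=-50):
    """Keep rows whose coordinates pass the plotting cell's thresholds.

    Rows with missing Lat or Long are dropped as well, because
    comparisons against NaN evaluate to False.
    """
    mask = (df["Lat"] > min_lat) & (df["Long"] < max_long)
    return df[mask]


# Synthetic stand-in for reports_df: two plausible points,
# one (0, 0) placeholder, and one row with missing coordinates.
toy = pd.DataFrame({
    "Lat": [42.35, 0.0, 42.30, None],
    "Long": [-71.06, 0.0, -71.10, None],
})
clean = valid_locations(toy)
print(len(toy) - len(clean), "rows dropped")
```

Printing the number of dropped rows is a cheap sanity check: if a large fraction of `reports_df` disappears, the location columns may be too incomplete to drive dispatch decisions on their own.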