Cleaning, Analysis & Documentation

The BPD Field Interrogation and Observation dataset, collected by the Boston Police Department, tracks field interrogation stops across multiple Boston neighborhoods throughout 2025. The dataset was created to promote transparency between the police and the public. However, it contains a number of inconsistencies, and it lacks crucial information that would help contextualize the stops. Even so, looking at the data at face value raises a question: why do communities of color have drastically more stops than white communities across Boston?

When exploring the data, I first copied the columns I needed (officer name, circumstance, city, and zip code) into a separate sheet. As soon as I filtered to see the number of stops per city, a clear inconsistency appeared. First, a "city" label is misleading and incorrect for a dataset that covers only Boston. Filtering makes this clearer: the column lists “Boston” alongside a number of individual neighborhoods to show where stops occurred. Some zip codes appear under multiple neighborhoods, and all of them also appear under “Boston.” I therefore filtered out the “Boston” and “City of Boston” labels and renamed the column “Neighborhoods,” since that more accurately describes its contents.
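For anyone repeating this cleaning step outside Excel, here is a minimal Python sketch of the same filter-and-rename. It assumes the sheet has been exported as rows (dictionaries) with hypothetical column names `city` and `zip`; the real export may use different headers, and the sample rows are made up for illustration, not real BPD records.

```python
from typing import Dict, List

# Labels that name the whole city rather than a neighborhood.
# Compared case-insensitively after trimming whitespace.
CITYWIDE_LABELS = {"boston", "city of boston"}


def to_neighborhoods(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop city-wide labels and rename the (hypothetical) "city" column to "neighborhood"."""
    cleaned = []
    for row in rows:
        label = row.get("city", "").strip()
        if label.lower() in CITYWIDE_LABELS:
            continue  # same effect as unchecking "Boston" / "City of Boston" in an Excel filter
        new_row = {key: value for key, value in row.items() if key != "city"}
        new_row["neighborhood"] = label
        cleaned.append(new_row)
    return cleaned


# Hypothetical rows for illustration only, not real BPD records.
sample = [
    {"city": "Boston", "zip": "02108"},
    {"city": "Dorchester", "zip": "02124"},
    {"city": "City of Boston", "zip": "02111"},
]
print(to_neighborhoods(sample))
```

Comparing `len(sample)` with the length of the result is a quick way to keep track of how many stops (for example, downtown stops recorded only as “Boston”) disappear along with the city-wide labels.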

Given this major inconsistency, I decided the best way to view the data was to first break it down by neighborhood and number of stops, which showed which neighborhoods had elevated numbers of stops. Similarly, by building a pivot table with officers, neighborhoods, and stop counts, I could see how many stops each officer made in each Boston neighborhood. This is not the only way to approach the data, and it means I have to account for the areas I am excluding. While this approach isolates the other neighborhoods, places like downtown Boston fall only under “Boston,” so removing that label also removes stops in the downtown area.
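The officer-by-neighborhood pivot table can be sketched in plain Python as a nested count. This assumes hypothetical `officer` and `neighborhood` fields after cleaning; the officers and stops below are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable


def pivot_counts(
    rows: Iterable[Dict[str, str]], row_field: str, column_field: str
) -> Dict[str, Dict[str, int]]:
    """Count rows for each (row_field, column_field) pair, like a Count pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for record in rows:
        table[record[row_field]][record[column_field]] += 1
    # Convert the nested defaultdicts back to plain dicts for display.
    return {key: dict(counts) for key, counts in table.items()}


# Hypothetical officers and stops, for illustration only.
stops = [
    {"officer": "Officer A", "neighborhood": "Roxbury"},
    {"officer": "Officer A", "neighborhood": "Roxbury"},
    {"officer": "Officer B", "neighborhood": "Mattapan"},
]
print(pivot_counts(stops, "officer", "neighborhood"))
```

Swapping the arguments (`pivot_counts(stops, "neighborhood", "officer")`) gives the neighborhood-first view, matching the two ways the pivot table was arranged above.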

Once I had cleaned the dataset, the first thing I noticed was that some officers made far more stops than others in the same neighborhood over the course of the year. While some officers have one or two stops or recorded observations within a neighborhood, others have 23. The patterns become even more interesting when you also look at the dates and times the stops were made. Because a single column holds both the date and the time, it is hard to filter cleanly without tediously separating the two by hand.
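Separating the combined column does not have to be done by hand. The sketch below assumes each cell is text in a hypothetical `YYYY-MM-DD HH:MM:SS` layout; the format string would need to be adjusted to match the actual export.

```python
from datetime import date, datetime, time
from typing import Tuple

# Assumed layout of the combined column, e.g. "2025-05-14 21:30:00".
# Adjust this format string to match the actual export.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_timestamp(value: str, fmt: str = TIMESTAMP_FORMAT) -> Tuple[date, time]:
    """Split one combined date/time cell into separate date and time values."""
    parsed = datetime.strptime(value.strip(), fmt)
    return parsed.date(), parsed.time()


stop_date, stop_time = split_timestamp("2025-05-14 21:30:00")
print(stop_date.isoformat(), stop_time.isoformat())
```

Inside Excel itself, if the column holds real date-time values rather than text, `=INT(A2)` and `=MOD(A2,1)` (formatted as a date and a time, respectively) perform the same split; `A2` here is just a placeholder cell.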

To work around this, I used Copilot in Excel to analyze that portion of the data. It found clear spikes in stops at certain times of day and at certain times of year. For instance, stops across Boston spike in May, Dorchester spikes specifically in August, and South Boston spikes at the beginning of January.
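Copilot's spike findings can be double-checked by counting stops per month or per hour directly. This is a minimal sketch, again assuming the hypothetical `YYYY-MM-DD HH:MM:SS` text format; the timestamps below are invented. Running it on one neighborhood's rows at a time would show neighborhood-specific spikes such as the ones described for Dorchester and South Boston.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of the combined column


def count_by(timestamps: Iterable[str], unit: str) -> Dict[int, int]:
    """Count stops per calendar month (unit="month") or hour of day (unit="hour")."""
    if unit not in ("month", "hour"):
        raise ValueError("unit must be 'month' or 'hour'")
    counts: Counter = Counter()
    for stamp in timestamps:
        parsed = datetime.strptime(stamp, TIMESTAMP_FORMAT)
        counts[getattr(parsed, unit)] += 1  # parsed.month or parsed.hour
    return dict(counts)


def peak(counts: Dict[int, int]) -> int:
    """Return the month or hour with the most stops (earliest-seen key wins a tie)."""
    return max(counts, key=counts.__getitem__)


# Hypothetical timestamps, for illustration only.
sample = ["2025-05-01 10:00:00", "2025-05-20 22:15:00", "2025-08-03 09:00:00"]
monthly = count_by(sample, "month")
print(monthly, "peak month:", peak(monthly))
```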

Filtering by street also shows that downtown areas see significantly more stops in general; Tremont St, Mass Ave, and the Downtown Crossing area all see a lot of activity. I also used Copilot to identify trends in the circumstances of these stops and which neighborhoods have which kinds of stops. In doing so, I found the following: ~46% of Mattapan's stops and ~44% of Dorchester's stops involve firearms. Roxbury leads in drug-related stops at ~44%, followed by Mattapan at ~33%. Finally, and interestingly, homelessness-related stops are concentrated under the “Boston” label itself, where they make up ~36% of stops, heavily in the downtown area.
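Percentages like these are each neighborhood's stops of one type divided by that neighborhood's total stops. The sketch below recomputes them from raw rows as a way to verify Copilot's figures; the field names `neighborhood` and `circumstance` and the circumstance labels are assumptions, and the sample records are invented, so they do not reproduce the real percentages.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def circumstance_shares(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, float]]:
    """Percent of each neighborhood's stops that fall under each circumstance."""
    totals: Counter = Counter()
    by_type: Dict[str, Counter] = defaultdict(Counter)
    for record in rows:
        place = record["neighborhood"]
        totals[place] += 1
        by_type[place][record["circumstance"]] += 1
    # Denominator is the neighborhood's own total, as in "of their stops".
    return {
        place: {kind: round(100 * n / totals[place], 1) for kind, n in by_type[place].items()}
        for place in totals
    }


# Hypothetical records, for illustration only (not real percentages).
sample = [
    {"neighborhood": "Mattapan", "circumstance": "Firearm"},
    {"neighborhood": "Mattapan", "circumstance": "Firearm"},
    {"neighborhood": "Mattapan", "circumstance": "Drugs"},
    {"neighborhood": "Mattapan", "circumstance": "Homelessness"},
    {"neighborhood": "Roxbury", "circumstance": "Drugs"},
]
print(circumstance_shares(sample))
```

In Excel, the equivalent is a pivot table with neighborhoods as rows, circumstance as columns, and the count shown as "% of Row Total" under Show Values As.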

The initial angle of my story pitch for this dataset centered on the observation that communities with higher concentrations of people of color seem to have the most stops, exploring that pattern while also focusing on the information missing from the sheet. The sheet lists no outcomes of the stops, and it has no demographic information on the people stopped; the only context is the circumstance, which falls under drugs, firearms, or homelessness. However, looking more closely at the concentration of stops by month, and by type per neighborhood, helps me dig deeper into my initial pitch. Why do communities of color such as Dorchester, Roxbury, and Mattapan have so many more firearm- and drug-related stops? To answer that, I can explore more deeply the effects of firearms and gun violence, which statistically affect lower-income communities of color at a disproportionate rate compared with white communities. That would require outside research, but this data shows the exact numbers behind where these issues are found.

Throughout my analysis of the dataset, I used pivot tables and filtering in Excel, and I also used Copilot in Excel to filter and sort information more clearly and concisely. It helped me get specific percentages and clean the data more easily, which in turn helped me form a narrative. At first I was skeptical, since I had never used Copilot before, but I found it very helpful and straightforward, and it gave me what I needed to see.
