Calling the Cops

An EdSource investigation into policing in California schools

About


Credits

Reporting: Thomas Peele, Daniel J. Willis

Local reporting: Emma Gallegos (Kern County), Lasherica Thornton (Fresno), Mallika Seshadri (Los Angeles, San Bernardino County) and Monica Velez (Oakland)

Investigations and Projects Editor: Rose Ciotta

Database design, data gathering, scraping and cleaning: Daniel J. Willis, Thomas Peele and Justin Allen

Website design and development: Justin Allen

Graphics and website design: Yuxuan (Sunny) Xie

Social media, photo editor: Andrew Reed

Copy editing: Chuck Carroll

Data Collection

EdSource’s investigation of police activity in schools is based on the collection and analysis of nearly 46,000 incidents reported by 164 law enforcement agencies for 852 school sites in every county in California except Alpine County, which had no records responsive to our request.

EdSource sought all police dispatch logs for calls originating from, or directing officers to, specified school addresses. The database is a standardized collection of the agencies’ responses. All marked redactions were made by the law enforcement agencies. The data is otherwise unedited, except in a few labeled cases where EdSource removed unredacted information to protect student privacy.

The data is a random sample of schools across the state, designed so that every county is represented and schools of all types and sizes are included. The requests for police records covered calls from or about high schools, along with a substantial number of middle schools and some elementary schools.

Each incident is classified as either “Serious” or “Not Serious” and as either “Violent” or “Not Violent.” Each incident is also assigned a category for the type of call and, when available, a subcategory for the specific type of incident. EdSource added these fields for ease of use and analysis. A full list of categories and subcategories can be found here.

Methodology

To obtain these records, EdSource sent requests under the California Public Records Act to 164 city police departments, county sheriffs and district-run police departments, resulting in data for 852 school sites. The schools were chosen at random using a stratified sampling process, with counties as the strata, weighted by population. Additionally, EdSource acquired records from 10 of the state’s 19 district-run police departments for all the schools they serve. Starting on May 9, 2023, reporters sent more than 2,500 emails and messages through agencies’ automated Public Records Act systems to obtain the data.
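The sampling step can be sketched in Python. This is a minimal illustration, not EdSource’s actual code. It assumes that a fixed number of schools is allocated across counties in proportion to population, that every county is guaranteed at least one school (reflecting the goal that each county be represented), and that fractional quotas are rounded with the largest-remainder method. The function names and inputs are hypothetical.

```python
import random


def allocate_by_population(county_pop: dict[str, int], total_schools: int) -> dict[str, int]:
    """Split total_schools across counties in proportion to population.

    Assumption (hypothetical): every county gets at least one school, and
    leftover slots go to the counties with the largest fractional quotas.
    """
    n = len(county_pop)
    if total_schools < n:
        raise ValueError("need at least one school per county")
    # Reserve one slot per county so every stratum is represented.
    alloc = {county: 1 for county in county_pop}
    remaining = total_schools - n
    total_pop = sum(county_pop.values())
    # Proportional (fractional) share of the remaining slots.
    quotas = {c: remaining * p / total_pop for c, p in county_pop.items()}
    for county, quota in quotas.items():
        alloc[county] += int(quota)
    leftover = total_schools - sum(alloc.values())
    # Largest-remainder rounding: hand out what is left by fractional part.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - int(quotas[c]), reverse=True)
    for county in by_remainder[:leftover]:
        alloc[county] += 1
    return alloc


def stratified_sample(schools_by_county, county_pop, total_schools, seed=0):
    """Draw a simple random sample of schools within each county stratum."""
    rng = random.Random(seed)
    alloc = allocate_by_population(county_pop, total_schools)
    sample = []
    for county, k in alloc.items():
        pool = schools_by_county.get(county, [])
        # Never ask for more schools than the county has.
        sample.extend(rng.sample(pool, min(k, len(pool))))
    return sample
```

For example, `allocate_by_population({"A": 700, "B": 200, "C": 100}, 10)` gives the most populous county the most schools while still giving the smallest county at least one. The one-school floor slightly over-samples small counties compared with pure proportional allocation; that is the price of statewide coverage.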

The police agencies were selected to maintain a geographic balance.

The records were then processed into a consistent format to build a centralized, standardized database.

Using a combination of optical character recognition software, AI document scrapers, and manual data entry, we extracted the applicable data for 164 agencies, including the district-run police departments. Some records could not be processed to our standard of accuracy because of poor PDF image quality, complex or inconsistent templates, or heavy use of proprietary terminology in the reports.

Once the data was standardized, it was labeled to allow consistent analysis. While most police agencies identify incident types with a straightforward description or penal code number, others use internal codes or abbreviations, so every record had to be given consistent tags. The internal codes were deciphered by obtaining departments’ data dictionaries or consulting current and former police officials.
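A lookup of this kind can be sketched as follows. The agency name, internal codes and tag tables are hypothetical examples, not EdSource’s data; real data dictionaries differ by department, and the actual tagging also involved manual review. The sketch expands an agency’s internal code through that agency’s dictionary, then matches the result against penal code numbers or plain-language descriptions, and falls back to “Unknown.”

```python
# All tables below are hypothetical, illustrative entries only.

# Penal code numbers mapped to (category, subcategory).
PENAL_CODE_TAGS = {
    "242": ("Physical Violence", "Battery"),
    "245": ("Weapons", "Assault with a deadly weapon"),
}

# Per-agency data dictionaries: internal code -> plain-language description.
AGENCY_DICTIONARIES = {
    "Agency X": {"FT": "fight", "TS": "traffic stop"},
}

# Plain-language descriptions mapped to (category, subcategory).
DESCRIPTION_TAGS = {
    "fight": ("Physical Violence", "Fight"),
    "traffic stop": ("Vehicle", "Traffic stop"),
    "parking violation": ("Vehicle", "Parking violation"),
}


def tag_incident(agency: str, raw_type: str) -> tuple[str, str]:
    """Return a standardized (category, subcategory) for one raw incident type."""
    code = raw_type.strip()
    # 1. Expand agency-internal codes using that agency's data dictionary.
    expanded = AGENCY_DICTIONARIES.get(agency, {}).get(code.upper(), code)
    # 2. Penal code numbers map directly to a tag.
    if expanded in PENAL_CODE_TAGS:
        return PENAL_CODE_TAGS[expanded]
    # 3. Otherwise match the plain-language description; fall back to Unknown.
    return DESCRIPTION_TAGS.get(expanded.lower(), ("Unknown", "Unknown"))
```

Keeping one dictionary per agency matters because the same abbreviation can mean different things at different departments.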

Each incident has a broad category for the type of call and a subcategory for the specific type of incident. For example, battery, assault, and fights fall under “Physical Violence,” while traffic stops and parking violations fall under “Vehicle.” When an incident could fit in multiple categories, we assigned the more serious one: a fight would be “Physical Violence,” but a fight involving a weapon would instead be classified as “Weapons.”
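The “more serious category wins” rule amounts to choosing the highest-ranked candidate from an ordered list. In this sketch, only the placement of “Weapons” above “Physical Violence” comes from the methodology; the rest of the ranking is an assumption for illustration.

```python
# Assumed severity ranking, most serious first. Only "Weapons" ranking above
# "Physical Violence" is stated in the methodology; the rest is hypothetical.
SEVERITY_ORDER = ["Weapons", "Physical Violence", "Vehicle", "Patrol", "Unknown"]
RANK = {category: i for i, category in enumerate(SEVERITY_ORDER)}


def pick_category(candidates: list[str]) -> str:
    """Return the most serious of the categories an incident could fit."""
    if not candidates:
        return "Unknown"
    # Categories missing from the ranking sort after every ranked one.
    return min(candidates, key=lambda c: RANK.get(c, len(SEVERITY_ORDER)))
```

A fight with a weapon, tagged with both candidates, resolves to `"Weapons"`.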

Categories and subcategories were primarily determined by the incident type listed in the report. If the additional information, call log or narrative contradicted that type or gave more detail, the category those sources supported was used instead. In some cases where the data understates the severity of an incident, the category was based on news reports or other research outside the scope of the dataset.
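The precedence among these sources can be expressed as a fallback chain. This is a hypothetical simplification: it assumes each source has already been reduced to a single category (or `None`), and it leaves out the judgment involved in deciding whether a narrative actually contradicts the listed incident type.

```python
from typing import Optional


def resolve_category(
    type_category: str,
    narrative_category: Optional[str] = None,
    outside_category: Optional[str] = None,
) -> str:
    """Pick the category for one incident from the available evidence.

    type_category: derived from the incident type listed in the report.
    narrative_category: derived from the additional information, call log or
        narrative; set only when it contradicts or refines the listed type.
    outside_category: derived from news reports or other outside research;
        set only when the record understates the incident's severity.
    """
    if outside_category is not None:
        return outside_category
    if narrative_category is not None:
        return narrative_category
    return type_category
```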

Each record was also given two separate labels: “Serious,” meaning an incident that reasonably requires a police presence, and “Violent,” which covers anything involving a violent act, including one directed at oneself. Still, 6,177 records, or 13.4% of the sample, could not be labeled and carry the category and subcategory “Unknown.”
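Because “Serious” and “Violent” are independent yes/no labels that sit alongside the category and subcategory, a record can be modeled as below. The field names are hypothetical and are not the published database’s column names. The helper computes the share of records tagged “Unknown,” the ratio that, for the full dataset, comes to 6,177 records, or 13.4%.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical field names, for illustration only.
    school: str
    category: str
    subcategory: str
    serious: bool  # reasonably requires a police presence
    violent: bool  # involves a violent act, including one directed at oneself


def unknown_share(incidents: list[Incident]) -> float:
    """Fraction of incidents whose category could not be determined."""
    if not incidents:
        return 0.0
    unknown = sum(1 for inc in incidents if inc.category == "Unknown")
    return unknown / len(incidents)
```

Keeping the two flags as separate booleans lets a record be serious but not violent (or the reverse) without inventing combined categories.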

In the interest of completeness and full transparency, the published database includes all of the notations that appeared in the police record.

The analysis began once all tagging was complete. Although “Patrol” is the most common incident category, not every department included patrols in its data, so the category is excluded when calculating rates per 100 students and in other analyses that compare schools or districts. Patrols are included when analyzing all calls. The analysis also included calculating a sampling error of 0.4884, below the adopted threshold of 0.5, which allowed the sample to be used to extrapolate statewide incident counts and trends.
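The per-100-student rate with patrols excluded can be sketched like this. The inputs are hypothetical: incidents as `(school, category)` pairs and an enrollment count per school.

```python
from collections import Counter


def rate_per_100_students(incidents, enrollment):
    """Incidents per 100 students for each school, excluding "Patrol" calls.

    incidents: iterable of (school, category) pairs.
    enrollment: dict mapping school -> number of students enrolled.
    """
    # Patrols are dropped because not every department reported them.
    counts = Counter(school for school, category in incidents if category != "Patrol")
    return {
        school: 100 * counts[school] / students
        for school, students in enrollment.items()
        if students > 0
    }
```

For example, a school of 400 students with one patrol, one vehicle call and one weapons call has a rate of 0.5, because only the two non-patrol calls count.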