Computing Applications News

It’s Not the Algorithm, It’s the Data

In risk assessment and predictive policing, biased data can yield biased results.
Predictive policing systems identify "hotspots" where crime risk is the highest.

Crime in the U.S. has fallen dramatically over the past three decades, with 2014 statistics from the Federal Bureau of Investigation (FBI) noting the number of violent crimes committed per 100,000 people in 2013 (368) was less than half the level seen in 1991 (758).

Nevertheless, the debate continues over how to maintain these lower crime rates while addressing issues of fairness in the way communities are policed, as well as how to effectively and fairly use risk-assessment tools that can be relied upon by sentencing courts or parole boards.

There are two primary issues at stake: risk-assessment algorithms, which weigh a variety of factors related to recidivism, or the likelihood an individual will commit another crime and wind up back behind bars; and predictive policing, which has been described as using data analytics and algorithms to better pinpoint where and when a crime might occur, so police resources can be more efficiently deployed. Both issues are fraught with challenges—moral, logistical, and political—and opinions on whether they can be fairly and ethically utilized largely depend on how one views the nature of policing and the criminal justice system.

There is no debate that both of these types of technologies are being used on a fairly widespread basis in the U.S. According to a 2013 article by Sonja B. Starr, a professor of law at the University of Michigan Law School, nearly every state has adopted some type of risk-assessment tool to aid in sentencing. The primary concern about these tools centers on the computerized algorithms that produce risk scores from questions that are either answered by defendants or filled in from criminal records, and on whether such tools may ultimately penalize racial minorities by overpredicting the likelihood of recidivism in these groups.

The most widely known of these tools is COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a software tool owned by Northpointe, Inc., which has been used by a number of jurisdictions, including Broward County, FL, and the states of New York, Wisconsin, and California, among others. Many jurisdictions consider the tool a success. New York State, for example, issued a 2012 report on the effectiveness of the recidivism scale, noting that “the Recidivism Scale worked effectively and achieved satisfactory predictive accuracy,” with an AUC (area under the receiver operating characteristic curve) of 0.71. An AUC of 0.5 means the scores rank people no better than chance; the optimal value of 1.0 means every person who went on to reoffend was scored as riskier than every person who did not, so some cutoff would yield no false positives and no false negatives.
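One way to read an AUC figure such as 0.71: it is the probability that a randomly chosen person who was re-arrested received a higher risk score than a randomly chosen person who was not, with ties counted as half. The Python sketch below computes AUC in that pairwise form. The scores and outcomes are made-up illustration data, and the function name `auc` is our own, not part of COMPAS or the New York study.

```python
from itertools import product


def auc(scores, outcomes):
    """Area under the ROC curve, computed as a pairwise ranking probability.

    scores:   risk scores, where a higher value means "riskier"
    outcomes: 1 if the person was re-arrested, 0 otherwise
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one re-arrested and one non-re-arrested case")
    # Count pairs in which the re-arrested person outranks the other; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical decile scores (1-10) and re-arrest outcomes, for illustration only.
scores = [2, 3, 3, 5, 6, 7, 8, 9, 4, 8]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
print(round(auc(scores, outcomes), 2))  # 0.82
```

A score that perfectly separated the two groups would return 1.0, while random scores would hover around 0.5.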

The report noted that actual and expected re-arrest rates were closely aligned across score levels, and that the tool was more effective with higher-risk cases (a 53.8% re-arrest rate for those the tool deemed high-risk, versus 16.9% for those it deemed low-risk).
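“Closely aligned” actual and expected rates is a calibration claim: within each risk level, the share of people actually re-arrested should be near the rate the tool predicts for that level. A minimal Python sketch of that check follows. The three risk levels, the expected rates, and the outcomes are hypothetical, and the report's own methodology may differ.

```python
from collections import defaultdict


def observed_rates(levels, outcomes):
    """Observed re-arrest rate for each risk level."""
    totals = defaultdict(int)
    rearrests = defaultdict(int)
    for level, rearrested in zip(levels, outcomes):
        totals[level] += 1
        rearrests[level] += rearrested
    return {level: rearrests[level] / totals[level] for level in totals}


def calibration_gaps(levels, outcomes, expected):
    """Observed minus expected rate per level; values near 0 indicate good calibration."""
    observed = observed_rates(levels, outcomes)
    return {level: observed[level] - expected[level] for level in observed}


# Hypothetical cohort of 30 people and hypothetical expected rates per level.
levels = ["low"] * 10 + ["medium"] * 10 + ["high"] * 10
outcomes = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0,   # low: 2 of 10 re-arrested
            1, 0, 1, 0, 0, 1, 0, 0, 1, 0,   # medium: 4 of 10
            1, 1, 0, 1, 0, 1, 1, 0, 1, 0]   # high: 6 of 10
expected = {"low": 0.20, "medium": 0.35, "high": 0.55}

for level, gap in calibration_gaps(levels, outcomes, expected).items():
    print(f"{level}: gap {gap:+.2f}")
```

Calibration and AUC are separate properties: a tool can rank people well while misstating their absolute risk, or the reverse.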

Nevertheless, in recent years there has been significant criticism from many in academia, as well as a scathing investigative analysis from ProPublica (whose website describes it as “an independent, non-profit newsroom that produces investigative journalism in the public interest”), which charged that the COMPAS algorithm and the questions used to inform it were biased because they relied on factors that could correlate with race. Critics say factors such as poverty, postal codes, and employment status can serve as proxies for race, since some are more highly correlated with minority status. Despite these criticisms, the COMPAS tool survived its first major legal challenge in July 2016, when the Wisconsin Supreme Court ruled that judges can consider such risk scores during sentencing, but that warnings must be attached to the scores to flag the tool’s “limitations and cautions.”

Moreover, the court specified that a computerized risk score cannot be the “determinative factor” in deciding whether someone is incarcerated or granted probation, and it raised concerns that many of the tool’s risk factors could be correlated with race.
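The proxy concern can be made concrete with a toy example. In the Python sketch below, the score never sees group membership; it assigns points only by postal code. Because the two hypothetical groups are unevenly distributed across two hypothetical postal codes, their average scores still differ. All names and numbers are illustrative assumptions, not COMPAS inputs.

```python
from statistics import mean

# Hypothetical population of (postal_code, group) pairs.
# The group label is used only to report results, never to compute scores.
people = (
    [("A", "g1")] * 80 + [("A", "g2")] * 20
    + [("B", "g1")] * 20 + [("B", "g2")] * 80
)

# Hypothetical risk points per postal code, e.g. derived from past arrest counts.
POSTAL_RISK = {"A": 2.0, "B": 6.0}


def score(postal_code):
    """Risk score that looks only at postal code, never at group."""
    return POSTAL_RISK[postal_code]


def mean_score_by_group(population):
    """Average score per group, to audit for proxy effects."""
    by_group = {}
    for code, group in population:
        by_group.setdefault(group, []).append(score(code))
    return {group: mean(vals) for group, vals in by_group.items()}


print(mean_score_by_group(people))  # {'g1': 2.8, 'g2': 5.2}
```

Dropping the sensitive attribute from the inputs does not, by itself, remove its influence when another input is correlated with it.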

For its part, Northpointe did not respond by press time to queries about either the impact of the Wisconsin decision or criticism from academic or watchdog groups.

The use of algorithms in law enforcement is not limited to sentencing and parole. Many police departments around the country (including those in Seattle, WA; Richmond, VA; and Baltimore County, MD) are taking a more proactive approach to policing using analytics and algorithms, although these tools are also being targeted by critics, who contend that the data they incorporate have been tainted by years of racially motivated or biased policing strategies.

One such tool being used by a number of police departments is PredPol, developed by mathematicians at the University of California, Los Angeles, and Santa Clara University in close collaboration with crime analysts and patrol officers at the Los Angeles and Santa Cruz police departments. The tool uses three data points to provide predictions on where crime is likely to occur: past type of crime, place of crime, and time of crime. It does not use any personal information about individuals or groups of individuals in its crime predictions.

“We’re using algorithms that go through historical crime reports,” explains George Mohler, chief scientist at Santa Cruz, CA-based PredPol, Inc. “We use that data to estimate risk. Whether it’s patrol cars or foot patrols or community policing, where they’re engaging the community in those areas, we’re providing those locations on a Google Map for the officers to allocate their resources.”
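The three-field input described above (type, place, and time, with no personal information) is easy to picture as a data structure. The Python sketch below pairs such a record with a deliberately simple way to estimate risk from historical reports: rank map-grid cells by a count in which each past report's weight halves every `half_life_days`. This is an assumption for illustration, not PredPol's published model; the grid, the half-life, and the sample reports are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CrimeReport:
    """Only the three fields described in the article; no personal data."""
    crime_type: str   # e.g. "burglary"
    cell: tuple       # (row, col) of a hypothetical map-grid cell
    day: float        # days since a hypothetical reference date


def rank_cells(reports, now, crime_type=None, half_life_days=30.0, top_k=3):
    """Rank grid cells by a time-decayed count of past reports.

    Each report contributes 0.5 ** (age / half_life_days), so recent incidents
    weigh more than old ones. Reports dated after `now` are ignored.
    """
    weight = defaultdict(float)
    for r in reports:
        age = now - r.day
        if age < 0 or (crime_type is not None and r.crime_type != crime_type):
            continue
        weight[r.cell] += 0.5 ** (age / half_life_days)
    return sorted(weight, key=weight.get, reverse=True)[:top_k]


# Hypothetical historical reports.
reports = [
    CrimeReport("burglary", (0, 0), 1.0),
    CrimeReport("burglary", (0, 0), 5.0),
    CrimeReport("vehicle theft", (2, 3), 58.0),
    CrimeReport("burglary", (1, 1), 59.0),
]
print(rank_cells(reports, now=60.0))                         # [(1, 1), (2, 3), (0, 0)]
print(rank_cells(reports, now=60.0, crime_type="burglary"))  # [(1, 1), (0, 0)]
```

The two old reports in cell (0, 0) are outranked by single recent reports elsewhere; the half-life parameter controls that trade-off between history and recency.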

The company cites success in a number of jurisdictions, such as Alhambra, CA (a 32% drop in burglaries and a 20% drop in vehicle theft since deploying PredPol in January 2013); Los Angeles (the city’s Foothill division saw a 20% drop in predicted crimes year over year from January 2013 to January 2014); and Norcross, GA (a 15%–30% reduction in burglaries and robberies in the four months after it deployed the technology in August 2013).

“We’ve made the decision to not use [demographic or personally identifying information], partially because when you do use them, there’s a diminishing return on accuracy you get,” Mohler says. “Secondly, I think as a company, and with the agencies that use these tools, there is concern about these algorithms being biased.”

Another predictive policing tool being deployed by police departments is Motorola Solutions’ CommandCentral Predictive. This tool takes historical crime data (including exact locations, types of crimes, and times of day at which they were committed) and compares that data with a more recent snapshot of a particular area, which allows changes or anomalies to be easily identified.
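Comparing a historical baseline with a recent snapshot can be sketched as follows. The Python function below flags areas whose most recent weekly count sits well above their historical weekly average, approximating the spread by the square root of the mean (a Poisson assumption). The article does not describe CommandCentral's internals, so this is an assumed stand-in; the threshold, area names, and counts are hypothetical.

```python
import math


def flag_anomalies(history, recent, threshold=2.0):
    """Flag areas whose recent weekly count is unusually high.

    history: dict area -> list of past weekly counts
    recent:  dict area -> count in the most recent week
    Assumes counts are roughly Poisson, so the baseline's standard
    deviation is approximated by the square root of its mean.
    Returns dict area -> z-score for areas above the threshold.
    """
    flagged = {}
    for area, count in recent.items():
        past = history.get(area, [])
        baseline = sum(past) / len(past) if past else 0.0
        spread = math.sqrt(baseline) if baseline > 0 else 1.0
        z = (count - baseline) / spread
        if z > threshold:
            flagged[area] = round(z, 2)
    return flagged


# Hypothetical weekly counts for three areas.
history = {"north": [4, 5, 3, 4], "south": [1, 0, 1, 2], "east": [6, 7, 5, 6]}
recent = {"north": 5, "south": 5, "east": 6}
print(flag_anomalies(history, recent))  # {'south': 4.0}
```

The busiest area ("east") is not flagged, because its recent count is normal for it; the change in "south" stands out only against that area's own history.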

Daniel (DJ) Seals, a former police officer and industry expert with Motorola Solutions, says CommandCentral incorporates a machine-learning algorithm that compares historical crime data with more recent data to create a more accurate crime model and forecast, as opposed to simply relying on older data that may not be reflective of more recent activity.


Seals says CommandCentral also introduces the concept of seasonality into the algorithm, accounting for crime patterns that shift as temperatures rise or fall and further improving the algorithm’s granularity. Nonetheless, Seals acknowledges that CommandCentral is a tool to help officers, not a replacement for the judgment of experienced officers.
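A simple way to build seasonality into a baseline is to forecast each calendar month from the same month in past years, rather than from a single year-round average. The Python sketch below does that, using calendar month as a stand-in for temperature. It is an illustrative assumption, not a description of CommandCentral, and the counts are hypothetical.

```python
from collections import defaultdict


def seasonal_baseline(monthly_counts):
    """Average incident count for each calendar month across past years.

    monthly_counts: list of (year, month, count) tuples for one area.
    """
    by_month = defaultdict(list)
    for _year, month, count in monthly_counts:
        by_month[month].append(count)
    return {month: sum(counts) / len(counts) for month, counts in by_month.items()}


# Hypothetical history for one area: January and July in two past years.
history = [(2014, 1, 10), (2015, 1, 12), (2014, 7, 20), (2015, 7, 24)]

flat = sum(c for _, _, c in history) / len(history)  # one year-round average
seasonal = seasonal_baseline(history)
print(f"flat forecast: {flat}, January: {seasonal[1]}, July: {seasonal[7]}")
```

Here the flat average (16.5) overpredicts January and underpredicts July, while the seasonal baseline (11.0 and 22.0) tracks both.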

“It takes a seasoned officer to look at the data, and say, ‘hey, I know what that is,’” Seals says. “It may be seemingly benign, but to that seasoned officer who knows the patterns, who knows the persons in that area, that sounds like ‘Bob.’ ‘Bob used to do that, and Bob just got out [of prison].’”

Critics, however, say tools such as PredPol and CommandCentral are inherently biased because they rely heavily on reported-crime data, which is often concentrated in heavily policed areas, skewing statistics to overrepresent poor or minority communities.
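Part of the critics' argument concerns a feedback loop: patrols go where crime has been recorded, and crime is recorded more often where patrols are. The toy Python simulation below shows how that loop can widen a small initial gap even when two areas have identical true crime. Every parameter is a hypothetical assumption, and the model does not represent PredPol, CommandCentral, or any real deployment.

```python
def simulate_patrols(days=30, true_daily=10, recorded_if_patrolled=6,
                     recorded_if_unpatrolled=2, initial_records=(11, 10)):
    """Toy model of patrol allocation driven by recorded crime.

    Both areas have `true_daily` incidents every day. Each day the single
    patrol goes to whichever area has more recorded crime so far, and more
    of that area's incidents get recorded. Returns (true_totals, records).
    """
    true_totals = [0, 0]
    records = list(initial_records)
    for _ in range(days):
        target = 0 if records[0] >= records[1] else 1
        for area in (0, 1):
            true_totals[area] += true_daily
            records[area] += (recorded_if_patrolled if area == target
                              else recorded_if_unpatrolled)
    return true_totals, records


true_totals, records = simulate_patrols()
print("true incidents:", true_totals)  # [300, 300]
print("recorded:", records)            # [191, 70]
```

A one-incident head start in the historical data is enough to send every patrol to area 0, and the recorded gap then grows by four incidents a day even though true crime is equal.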

“We know that we have a history of racially biased policing in the United States, and that has fed into all the data that we have on where arrests have occurred, which crimes are more likely to occur in specific communities, and at which particular times,” says Jennifer Lynch, senior staff attorney at the Electronic Frontier Foundation. “That’s the data that’s being fed into predictive policing algorithms.”

Still, it is difficult to discount the value of event-based predictive policing, which relies on actual data on crimes that have been committed; ignoring this data could result in losing opportunities to prevent additional criminal acts.

“There has been a lot of research on near-repeat effects in crime,” PredPol’s Mohler says. “If someone breaks into a car in a certain neighborhood and is successful, they’ll often return to that same neighborhood a few days later, and break into another car.”
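Near-repeat patterns are commonly examined by counting pairs of incidents that occur close together in both space and time, as in the Knox test. The Python sketch below counts such pairs. The 200-meter and seven-day windows and the sample incidents are hypothetical choices, and this is not presented as PredPol's method.

```python
import math
from itertools import combinations


def near_repeat_pairs(events, max_distance=200.0, max_days=7.0):
    """Count pairs of incidents that are close in both space and time.

    events: list of (x_meters, y_meters, day) tuples.
    """
    count = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2):
        close_in_space = math.hypot(x1 - x2, y1 - y2) <= max_distance
        close_in_time = abs(t1 - t2) <= max_days
        if close_in_space and close_in_time:
            count += 1
    return count


# Hypothetical car break-ins: (x, y) in meters on a local grid, day number.
events = [(0, 0, 1), (50, 30, 3), (5000, 5000, 2), (60, 40, 40)]
print(near_repeat_pairs(events))  # 1
```

On its own, a pair count is hard to interpret. A Knox-style analysis compares it with the counts obtained after randomly shuffling the incident dates, which breaks any real space-time link; a much higher observed count suggests a near-repeat effect.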

Systems such as PredPol and CommandCentral can likely spot such trends more quickly than analysts crunching historical crime statistics by hand, allowing law enforcement to target resources at specific incidents.

Motorola’s Seals agrees, noting that CommandCentral does not just rely on data from years ago. “As we get closer to the time we’re predicting, we actually crunch another shorter term [algorithm],” Seals says. What’s more, because it employs a learning algorithm, CommandCentral should become more accurate over time, provided the system is properly updated.
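One plausible reading of Seals's description is a forecast that leans on a long-window rate for distant horizons and shifts toward a short-window rate as the forecast date approaches. The Python sketch below implements that as a linear blend. The weighting scheme and the 28-day horizon are assumptions, since the article does not say how CommandCentral combines its models.

```python
def blended_rate(long_term_rate, short_term_rate, horizon_days, max_horizon_days=28.0):
    """Blend long- and short-window daily incident rates for one area.

    The short-term estimate's weight rises linearly from 0 (horizon at or
    beyond max_horizon_days) to 1 (horizon of zero days).
    """
    w = 1.0 - horizon_days / max_horizon_days
    w = min(max(w, 0.0), 1.0)  # clip the weight to [0, 1]
    return w * short_term_rate + (1.0 - w) * long_term_rate


# Hypothetical rates: 2 incidents/day over several years, 4/day over the past two weeks.
for horizon in (28, 7, 0):
    print(horizon, blended_rate(2.0, 4.0, horizon))
```

Keeping the model "properly updated" in this sketch simply means recomputing both rates as new reports arrive, so the short-window estimate reflects the latest activity.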


Ultimately, however, “The algorithm itself may not be biased, but the data used by predictive policing algorithms is colored by years of biased police practices,” the EFF’s Lynch says, citing government statistics that up to 15% of vehicle thefts and 65% of rapes or sexual assaults are not reported, and noting that these non-reported crimes may be occurring in areas that are not necessarily deemed “high crime.”

“An algorithm can only predict crime based on the data it already has,” Lynch says. “This means it will continue to predict crime that looks like the crime we already know about, and will miss crimes for which we don’t have data.”
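Lynch's point about unreported crime can be shown with simple arithmetic. In the Python sketch below, two hypothetical areas differ in how often their incidents are reported, and a model that sees only reported incidents ranks them in the wrong order. The 35% reporting rate for area "Y" mirrors the 65% non-reporting figure for rapes and sexual assaults cited above; every other number is made up.

```python
def recorded_counts(true_counts, reporting_percent):
    """What a data-driven model sees: only the incidents that get reported."""
    return {area: true_counts[area] * reporting_percent[area] // 100
            for area in true_counts}


# Hypothetical areas: "Y" has more actual crime, but most of it goes unreported.
true_counts = {"X": 100, "Y": 120}
reporting_percent = {"X": 85, "Y": 35}

seen = recorded_counts(true_counts, reporting_percent)
print(seen)                                   # {'X': 85, 'Y': 42}
print(max(true_counts, key=true_counts.get))  # Y
print(max(seen, key=seen.get))                # X
```

No change to the ranking logic can recover area Y's missing incidents; the distortion is in the input data.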

What’s more, defenders of predictive policing admit it must be accompanied by better community police outreach and transparency, to engender greater trust in these types of systems. Writing in The Wall Street Journal in April 2016, Jennifer Bachner, director of the master of science in government analytics program at Johns Hopkins University and author of a paper that supports greater use of predictive policing, cited a need for both greater technology utilization and solid community policing strategies to reduce crime.

“Departments that adopt predictive-policing programs must at the same time re-emphasize their commitment to community policing,” Bachner wrote. “Officers won’t achieve substantial reductions in crime by holding up in patrol cars, generating real-time hotspot maps. Effective policing still requires that officers build trust with the communities they serve.”

Most importantly, the tools put in place must actually be used. A RAND Corporation study examined a predictive-policing pilot program that the Chicago Police Department deployed in 2013 and 2014, called the Strategic Subject List, which analyzed data on people with arrest records and generated a list of several hundred individuals deemed at elevated risk of being shot or of committing a shooting.

While the analysis found that people on the list were nearly three times as likely to be arrested for a shooting as those the system did not flag, the program resulted in very few arrests. This was due to the presence of no fewer than 11 other violence-reduction programs in use at the time; officers simply ignored the data, and their superiors did not make using the system a priority.
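A figure such as "nearly three times as likely" is a relative risk: the event rate among flagged people divided by the rate among everyone else. The Python sketch below computes it from hypothetical counts (not RAND's data). As the Chicago experience suggests, a high relative risk measures how well a list targets people; it says nothing about whether anyone acts on the list.

```python
def relative_risk(flagged_events, flagged_total, other_events, other_total):
    """Event rate among flagged people divided by the rate among everyone else."""
    if flagged_total <= 0 or other_total <= 0 or other_events <= 0:
        raise ValueError("totals must be positive and the comparison group needs events")
    return (flagged_events / flagged_total) / (other_events / other_total)


# Hypothetical counts, for illustration only.
rr = relative_risk(flagged_events=11, flagged_total=400,
                   other_events=50, other_total=5000)
print(round(rr, 2))  # 2.75
```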

Further Reading

Starr, S. B.
Evidence-Based Sentencing and the Scientific Rationalization of Discrimination (September 1, 2013). Stanford Law Review, Forthcoming; U of Michigan Law & Econ Research Paper No. 13–014. Available at SSRN.

New York State COMPAS-Probation Risk and Need Assessment Study: Examining the Recidivism Scale’s Effectiveness and Predictive Accuracy

Wisconsin v. Loomis, July 2016 decision

Statistics on Non-Reported Crime:

Truman, J. and Langton, L.
Criminal Victimization. September 29, 2015. U.S. Department of Justice.

