Academics Confirm Major Predictive Policing Algorithm Is Fundamentally Flawed (vice.com)
An anonymous reader quotes a report from Motherboard: Last week, Motherboard published an investigation which revealed that law enforcement agencies around the country are using PredPol -- a predictive policing software that once cited the controversial, unproven "broken windows" policing theory as a part of its best practices. Our report showed that local police in Kansas, Washington, South Carolina, California, Georgia, Utah, and Michigan are using or have used the software. In a 2014 presentation to police departments obtained by Motherboard, the company says that the software is "based on nearly seven years of detailed academic research into the causes of crime pattern formation the mathematics looks complicated -- and it is complicated for normal mortal humans -- but the behaviors upon which the math is based are very understandable."
The company says those behaviors are "repeat victimization" of an address, "near-repeat victimization" (the proximity of other addresses to previously reported crimes), and "local search" (criminals are likely to commit crimes near their homes or near other crimes they've committed, PredPol says). But academics Motherboard spoke to say that the mathematical theory that is used to power PredPol is flawed, and that its algorithm -- at least as pitched to police -- is far too simplistic to actually predict crime. Kristian Lum, who co-wrote a 2016 paper that tested the algorithmic mechanisms of PredPol with real crime data, told Motherboard in a phone call that although PredPol is powered by complicated-looking mathematical formulas, its actual function can be summarized as a moving average -- or an average of subsets within a data set. "The academic foundation for PredPol's software takes a statistical modeling method used to predict earthquakes and applies it to crime," reports Motherboard. "Much like how earthquakes are likely to appear in similar places, the papers argue, crimes are also likely to occur in similar places. Suresh Venkatasubramanian, a professor of computing at the University of Utah and a member of the board of directors for ACLU Utah, told Motherboard that earthquake data and crime data are, naturally, collected in different ways."
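To make the "moving average" description concrete, here is a minimal sketch of a trailing moving average over incident counts for one map cell. It is only an illustration of the general technique Lum names, not PredPol's actual code; the function name, the window size, and the weekly counts are all hypothetical.

```python
from collections import deque


def moving_average(counts, window):
    """Trailing moving average: mean of the most recent `window` values.

    Early entries average over fewer values until the window fills up.
    """
    recent = deque(maxlen=window)  # oldest value drops out automatically
    averages = []
    for count in counts:
        recent.append(count)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical weekly counts of reported incidents in a single grid cell.
weekly_reports = [2, 4, 6, 8, 0, 0]
scores = moving_average(weekly_reports, window=3)
print(scores)
```

Note that the score stays elevated for a while after reports stop: the last two weeks have zero reports, yet the average is still well above zero because older reports remain in the window.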
"I would say in our mind, the key difference is that in earthquake models, you have seismographs everywhere -- wherever an earthquake happens, you'll find it," Venkatasubramanian said. "The crux of the issue really is that to what extent are you able to get data about what you're observing that is not also totally on the model itself." "If you build predictive policing, you are essentially sending police to certain neighborhoods based on what what they told you -- but that also means you're not sending police to other neighborhoods because the system didn't tell you to go there," Venkatasubramanian said. "If you assume that the data collection for your system is generated by police whom you sent to certain neighborhoods, then essentially your model is controlling the next round of data you get."
Today it is taboo to discuss that a few minority populations are responsible for 90%+ of all urban crime. Without these models, police would waste effort policing no-crime white suburban neighborhoods instead of trying to stop black gangs from killing each other and selling drugs to black children.
a predictive policing software that once cited the controversial, unproven "broken windows" policing theory as a part of its best practices.
The only thing "controversial" about the broken windows theory was that it worked.
The fact that it worked really bothered a certain class of people (people who wanted crime to be the fault of something other than criminals).
Re "self-reinforcing observation"
No city has enough officers to over-police low-crime areas and still cover the parts of the city filled with criminals.
Crime fills parts of US cities, and over time it's rather easy to map out and put up on a GUI map.
Every 911 call has to be responded to on time.
No city can just decide to keep many of its police on patrol in a very low-crime area that day and accept slower response times in a high-crime area.
Officers working areas of a city with a constant crime problem do not want to see their backup kept far away from them.
The number of police and police patrols is not large.
Why would any city waste its limited number of police in low-crime areas?
That's real people in an area of the city with real crime who have to wait longer. Why do that?
The system exists to map and predict crime given past crime.
Fill that area with police and try to contain all the crime and criminals.
Re "areas aren't going to get enough police".
That would show on the reported maps, and the system would report a need to place more police in that area as crime and criminals move in.
Insurance reports, reported crime, calls to 911, types of crime, arrests and resulting convictions would show a rising and changing crime rate.
It's not "questionably criminal" when lots of people go to court and then to prison from one area of a city a lot more than another.
Police could put on more patrols and try to get their response time to a crime down.
Domestic spying is now "Benign Information Gathering"
That really makes no sense for most crimes. Look at murder or burglary: it doesn't matter if police are in the neighborhood "noticing crimes" or not; in theory it is going to get reported either way. The only way that applies is for victimless crimes. Traffic violations aren't going to be reported unless a police officer notices them.
Yes we do. If there is a murder, burglary, or mugging, it is going to be "detected" and reported by the populace no matter where it occurs. The only crimes that won't get reported are minor violations (traffic, etc). Police rarely detect crimes - they respond to crimes after they happen.
So... while not an academic, this is pretty close to my field of research. Looking at their model, I am not surprised they sold this product, but I am deeply disappointed. This is the type of model that is REALLY easy to sell to people; both law enforcement and the military (our customer) are enamored with them for their near-magic ability to 'predict' things. Only they don't; they tend to fail in unpredictable ways. They are not bad in multi-model systems where you take a dozen or so different systems built by different teams, run them in parallel, then have subject matter experts ponder the conflicting results. But actual policing out of a single model? Madness... or hubris... or stupidity... or simply being enamored with a slick sales pitch from 'one of your own' offering to solve problems in the way you want them solved.
Oddly enough, we actually DID do a LEO model years back, which was actually pretty effective, but it encouraged things like community outreach and police/citizen interaction which worked really well for officers on the ground but pissed off lawmakers and 'police unions', so it was largely dropped.
Which gets back to this story and one of the fundamental flaws in such attempts. The decision makers are not interested in solutions that make things better for high-crime areas in the first place; the people in those areas are not part of their power bloc. They want solutions that 'sound right' to people who live elsewhere and confirm what they already believe. Which is exactly what models like this are good at producing. They are kinda like torture... useless for prediction or information gathering, but an excellent political tool for confirming the story your career depends on being 'true'.
Murder, yes, you're right, unless the area is dealing with a high number of murders. See the case of NYC in the 1970s: 'warm bodies' on the streets made a significant difference in the span of a few years. On burglary you're wrong: more police or more active patrols decrease the likelihood of those types of crimes, because the chance of being seen in plain sight makes the individual reconsider their actions. See the "rational choice" theory out of criminology, for example. It holds that most people know right from wrong and fall into three basic groups: those who won't ever commit a crime, those who will commit a crime if they know they won't be caught, and those who will commit a crime regardless of circumstances, even if someone is standing over their shoulder. Depending on the studies, 30-40% will never commit a crime, leaving the remaining 60% who might or will. The "will" group generally makes up 10-25% of that remainder, depending on various other factors ranging from generational crime to social influence.
CPTED (Crime Prevention Through Environmental Design) is the basis of reducing crime by deterring the actions of those who "might" and "will" commit a crime, whether through more patrols, building designs that don't leave dark areas, motion lights and cameras, and so on. It's also heavily used in internal theft prevention in every business around the world because it works, and works well. You can turn bad areas that are effectively ghettos into crime-free areas by increasing property values, bringing in businesses that employ, reducing petty crime and poor education, and so forth. Programs like Neighborhood Watch or COP (Citizens on Patrol) put more eyes on the lookout for crime problems. All of that falls into the CPTED models.
Om, nomnomnom...
Except the ISSUE IS NOT REPORTING or "noticing crimes" - IT'S PREDICTING CRIMES that's the issue.
The algorithm doesn't notice or report crimes. It predicts where to send the police.
Resulting in a "garbage in = garbage out" predictive result based on reinforcement of outdated data.
E.g. If there was an arrest of a guy selling pot in front of a local Starbucks last month, and another guy arrested selling meth in a parking lot of a mall - algorithm now dictates through "near-repeat victimization" that both the Starbucks and the mall AND EVERYTHING AROUND THEM are likely locations of future crimes.
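As a rough sketch of how that spreading works, assume the city is divided into a grid and every past incident flags its own cell plus all cells within a fixed radius. This is a guess at the general "near-repeat" idea, not PredPol's actual algorithm; the grid size, radius, and incident locations are hypothetical.

```python
def flagged_cells(incidents, rows, cols, radius=1):
    """Return the set of grid cells flagged by past incidents.

    Each incident flags its own cell and every cell within `radius`
    steps in any direction (clipped at the edge of the grid).
    """
    flagged = set()
    for r, c in incidents:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    flagged.add((nr, nc))
    return flagged


# Two hypothetical arrests on a 6x6 grid: the coffee shop and the mall.
incidents = [(1, 1), (4, 4)]
hot = flagged_cells(incidents, rows=6, cols=6)
print(f"{len(hot)} of {6 * 6} cells flagged by {len(incidents)} incidents")
```

With these made-up numbers, two arrests are enough to flag half the map.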
And should cops actually notice something in that area aroooouuund the location of a previous crime while being under pressure to fulfill their monthly quotas - it is seen as a validation of the predictive powers of the magical AI.
Rinse and repeat.
It's "Round up the usual suspects!" - only with locations and "supported" by math.
Pretty soon you have cops policing parking lots for broken tail lights and ID-checking everyone around a Starbucks, falling number of arrests for preventable crimes (such as selling drugs or opportunistic crimes) - with actual number of crimes on the increase city-wide.
Cause everyone is listening to the magical algorithm, designed to predict earthquake aftershocks.
Instead of having police patrol even where no crimes are being reported - e.g. cause the locals don't trust the police or are afraid of reprisals from the drug dealer next door.
Against stupidity the gods themselves contend in vain
No one realized that those parts of the airplanes weren't as important to protect because the planes still returned though they were hit there. Protecting the parts where the returning planes didn't get hit was much more important, as obviously, planes hit there never made it back.
This is called survivorship bias. And systems that try to predict crimes from past crime numbers suffer heavily from survivorship bias.
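The returning-aircraft effect is easy to reproduce in a small simulation. In this sketch every section of the plane is equally likely to be hit, but the chance of losing the plane depends on where it was hit. The section names and loss probabilities are invented for illustration only.

```python
import random


def count_hits(n_planes, loss_prob, seed=0):
    """Count hits per section for all planes and for returning planes only.

    `loss_prob` maps each section to the (hypothetical) probability that
    a hit there brings the plane down.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sections = list(loss_prob)
    all_hits = {s: 0 for s in sections}
    returned_hits = {s: 0 for s in sections}
    for _ in range(n_planes):
        hit = rng.choice(sections)          # every section equally likely
        all_hits[hit] += 1
        if rng.random() >= loss_prob[hit]:  # plane survives and is inspected
            returned_hits[hit] += 1
    return all_hits, returned_hits


loss_prob = {"engine": 0.9, "fuselage": 0.1, "wings": 0.1}
all_hits, returned_hits = count_hits(30000, loss_prob)
print("all planes:      ", all_hits)
print("returning planes:", returned_hits)
```

Looking only at the returning planes, engine hits appear rare even though they are as common as any other hit. A system that learns only from recorded crime has the same blind spot for places where crime goes unrecorded.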