Slashdot Mirror


Data Mining In Law Enforcement

jcatcw points out a blog entry by Scott McPherson, CIO for the Florida House of Representatives. McPherson condemns the state of data sharing and data mining in law enforcement, saying that the US causes itself a great deal of trouble by focusing more on "antiterror armor and nuke-sniffing devices" than a useful information distribution network. He discusses a few such projects, and how they could have directly affected the events of 9/11. Quoting: "One of those ingenious things that actually worked, Seisint founder Hank Asher's brilliant MATRIX system, remains mired in controversy and politics. Hank showed me MATRIX just a few short weeks after the 9/11 attacks. Using law enforcement data and commercial data, all of the commercial data available in the public domain, Asher's query produced [hijacker Mohamed] Atta's photo -- and about 80 others, many of them fellow 9/11 hijackers, many of them associates of the 9/11 hijackers. It was simple data mining and algorithms, and none of the information was obtained illegally."

10 of 148 comments

  1. Hold on a minute here by goldcd · · Score: 4, Insightful

    So he managed to write some software that analyzed the internet - and managed to produce photos of some of the people that erm had already erm been identified. Surely (and maybe I've misunderstood something here) a 'result' would be identifying people likely to commit terrorist attacks, allowing enforcement agencies to monitor them and prevent them from committing future attacks. (And no - this doesn't mean off-shoring every Muslim who downloaded the Jolly Roger Cookbook.)

    1. Re:Hold on a minute here by Alpha830RulZ · · Score: 4, Insightful

      If you assume for a minute that the author of TFA is smart enough to figure out if this was a Google search or not, this is probably pretty interesting. I'm going to, perhaps naively, assume that the data mining approach was done as a reasonable experiment of a mining approach on some set of data, and arrived at a set of names that should be interesting to check up on. I'll further assume that he properly restricted his training set of data to only data that was available before 9/11.

      If that is the case, this is a pretty impressive set of results. Being able to identify, say, 5 of the attackers, and to have a number of the other hits be known associates, when the training set likely consisted of at least tens of thousands of names, is pretty fair accuracy. The false positive rate is pretty fair, as well, especially when you contrast it to the No Fly list, which has numerous false positives, and no known successes in identifying anyone of interest.

      There is likely some sort of clustering algorithm behind this, and the math behind those is pretty solid. Before you dis this, or even get excited about privacy issues, I'd suggest you check out a reference such as this

      I'm not really concerned about data mining as a privacy issue, and I think it's a pretty legitimate approach for law enforcement. As a side note, I do data mining and predictive analytics for a living. It's objective, it's factual, and if the practitioner is knowledgeable about it, it shouldn't be stigmatizing. Indeed, it would reduce scrutiny on the majority of the folks that would otherwise be tarred by having an Arabic surname and swarthy skin.

      It would have the potential to be vastly more effective, and vastly less expensive than the path we are on now. One reason that we might not be using it could be that we -have- used it, and didn't find anything. That's the thing about objective data mining: if there is nothing there, it'll tell you that. I don't think, for our current administration, that it's a desirable outcome to find that there is nothing to worry about. If that happened, the populace would be less fearful, and less easy to control.

      Take this one step further, and apply this bit of thought. It has been shown time and again that the TSA is incompetent, and that any motivated terrorist could get a weapon on board a plane. It is further obvious that our ports are porous, and that soft targets abound. We have seen no triumphant pictures of the authorities frog marching attempted terrorists away, no success stories of how these measures have saved our lives again. We have also seen no further attacks.

      This strongly suggests to this practitioner that we have a near zero incidence rate of terrorists in the US; that when a terrorist attempts an attack, he succeeds, and that the lack of attacks suggests that the attack rate is close to zero.

      Data mining would be a useful tool to calibrate this theory.

      --
      I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
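
To put rough numbers on the accuracy argument in the comment above, here is a minimal Python sketch. Only the figures of about 80 hits and 19 hijackers come from the thread. The 5 true hits is the commenter's own "say, 5", the population of 50,000 names is an assumption standing in for "tens of thousands", and the function name `hit_list_metrics` is hypothetical.

```python
def hit_list_metrics(hits, true_hits, targets, population):
    """Return precision, recall, and false positive rate for a hit list."""
    false_hits = hits - true_hits          # names flagged that were not targets
    non_targets = population - targets     # everyone who should not be flagged
    return {
        "precision": true_hits / hits,                 # share of hits that were real
        "recall": true_hits / targets,                 # share of targets that were found
        "false_positive_rate": false_hits / non_targets,
    }


# Assumed inputs: 80 hits and 19 hijackers are from the thread,
# 5 true hits and 50,000 names are illustrative guesses only.
metrics = hit_list_metrics(hits=80, true_hits=5, targets=19, population=50_000)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Changing `true_hits` or `population` shows how sensitive the "pretty fair accuracy" claim is to those two unknowns.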
  2. Hindsight is 20/20 by garcia · · Score: 4, Insightful

    Wow, really? You were able to identify after the fact? Great! Real useful -- that and the fact that it's much easier to find that information when you are looking for a specific result. If this guy had come out and said, "hey, I was able to find those people before the fact," then I'd be impressed.

    1. Re:Hindsight is 20/20 by FredThompson · · Score: 4, Interesting

      Exactly. "Connecting the dots" is always easier when you know the connections. Discovering them is a lot harder.

      This guy also doesn't seem to have much knowledge of intel gathering. The idea that forward projection isn't happening is...uh...wrong, and that's all I'll say on the matter (disclaimer: I'm ex-NSA)

      He also doesn't seem to comprehend the concept of misdirection, as the term is used by performance magicians.

      I'd guess he can't even pronounce the name, "Sun Tzu", let alone have read the writings.

    2. Re:Hindsight is 20/20 by Chris+Burke · · Score: 4, Insightful

      Yeah, I've got a mother-fucking perfect Suicide Bomber detector. It never fails. 100% specificity, 100% sensitivity. Here's how it works (it's patented, so my lucrative business is not in danger by sharing my methods):

      I stand around a marketplace in Baghdad. When a guy runs up to a crowd, screams "Allah Akhbar", pulls a string on his coat, and fucking explodes all over the place, I point at the spot where he used to be, and say "That was a suicide bomber".

      And before you try to horn in on my business, know that I've already sold the DoD enhancements to my algorithm that covers cases where the bomber doesn't scream "Allah Akhbar", or where the bomber is a she not a he, or where the explosives are in a car not a coat. Or combinations thereof.

      But seriously, it says that "his query" produced Atta's photo (and 80 others, only some of which apparently had anything to do with 9/11). What exactly was this query? "9/11 hijackers"? "terrorists named Atta"? "Arabs who've been pulled over"? So Atta's driving citations mean it was theoretically possible for someone to pull his name up. The question is, why would they have done this? What would have motivated someone to perform that query, and how exactly does data mining driving citations lead to the important conclusion that Atta was a terrorist?

      The article makes good points that data sharing between law enforcement agencies is a good thing, and helps with such rather mundane things as finding fugitives who skip out on parole, or people who don't show up for court dates. But that MATRIX nonsense is yet another attempt to cash in on post-9/11 anti-terror funding bonanzas. Which, now that I've gotten my slice of the pie, I'm against. :)

      --

      The enemies of Democracy are
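
The sensitivity and specificity terms in the comment above can be tied to the before-the-fact problem with a small Bayes' rule calculation. This is a sketch under stated assumptions, not anything from the article: the 99% rates and the one-in-a-million prevalence are invented for illustration, and `positive_predictive_value` is a hypothetical helper name.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a flagged person is a real target (Bayes' rule)."""
    true_pos = sensitivity * prevalence                    # targets correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # innocents wrongly flagged
    return true_pos / (true_pos + false_pos)


# Illustrative assumptions only: a detector that is right 99% of the time
# in both directions, applied to a population where 1 in 1,000,000 is a target.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.99, prevalence=1e-6)
print(f"Share of flagged people who are real targets: {ppv:.6f}")
```

With a prevalence that low, almost everyone flagged is a false positive, which is why a detector that only works after the fact says little about one that has to work beforehand.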
  3. Maybe by oodaloop · · Score: 4, Interesting

    I have a lot of issues with the various things in this article, but I'll keep it to one for now. Maybe Atta could have been arrested because of better coordination between local law enforcement. But his arrest almost certainly would NOT have prevented 9/11. Moussaoui was supposed to be there that fateful day, and it still went down. One person arrested, even one of the many masterminds, would not have prevented it.

    Also, no local law enforcement officer would have been able to piece together this plot from looking through one car BEFORE the event. Piloting multiple planes simultaneously into various landmarks was just too implausible to be believed before it happened. Even if John McClane himself figured it out, he wouldn't be able to convince anyone to help him stop 19 other people from boarding planes in multiple airports.

    Sharing information sure beats what we're doing now, both in law enforcement and the intelligence community where I work, which is holding everything close so no one else can take credit. But let's not exaggerate the benefits here.

    --
    Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
  4. Worst. Clairvoyant. Ever. by Zigurd · · Score: 4, Funny

    Hank showed me MATRIX just a few short weeks after the 9/11 attacks. Using law enforcement data and commercial data, all of the commercial data available in the public domain, Asher's query produced [hijacker Mohamed] Atta's photo -- and about 80 others, many of them fellow 9/11 hijackers, many of them associates of the 9/11 hijackers.


    A few short weeks after the Kentucky Derby, I devised a database system that predicted the winner. Impressive, no?

  5. Islands of Automation by rlp · · Score: 4, Informative

    I've worked in the field of law enforcement data sharing. Fact is that most law enforcement agencies are either islands of automation or very loosely connected to other agencies. The stuff you see in TV and movies ("24") is a fantasy. Adjacent towns and cities rarely share information, and this lack of knowledge can put members of their police force in danger (for instance when making a traffic stop). A few years ago, the DOJ kicked off a sharing initiative with the Global Justice XML Data Model (GJXDM). This is an XML-based specification for exchanging law enforcement data that was developed at Georgia Tech. I was involved in an initiative in Ohio to share police record management system information at a state level. The system was deployed and is operational today. GJXDM has been superseded by the National Information Exchange Model (NIEM). It should be noted that the NIEM model is even more complex than its predecessor and tends to break many XML tools. The data exchanged tends to be fairly rudimentary and fairly sparse - arrests, bookings, warrants. Nevertheless, most agencies and most states have either not implemented data sharing or are in the earliest stages of doing so.

    --
    [Insert pithy quote here]
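
As a rough illustration of the kind of sparse exchange described in the comment above (arrests, bookings, warrants), here is a minimal Python sketch that merges records from two agencies by subject. The element names, agencies, and people are all invented for this example and are NOT the real GJXDM or NIEM schema, which is far more complex.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout for illustration, NOT real GJXDM or NIEM markup.
SAMPLE = """
<Records>
  <Record type="arrest"><Agency>Town A PD</Agency><Subject>DOE, JOHN</Subject></Record>
  <Record type="warrant"><Agency>Town B PD</Agency><Subject>DOE, JOHN</Subject></Record>
  <Record type="booking"><Agency>Town A PD</Agency><Subject>ROE, JANE</Subject></Record>
</Records>
"""


def records_by_subject(xml_text):
    """Group (agency, record type) pairs by subject name."""
    grouped = {}
    root = ET.fromstring(xml_text.strip())
    for record in root.iter("Record"):
        subject = record.findtext("Subject")
        entry = (record.findtext("Agency"), record.get("type"))
        grouped.setdefault(subject, []).append(entry)
    return grouped


shared = records_by_subject(SAMPLE)
for subject, entries in shared.items():
    print(subject, entries)
```

Even this toy version shows the payoff for the traffic-stop case: an officer in Town A would see the warrant issued in Town B.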
  6. pff by Kingrames · · Score: 4, Funny

    "Data Mining In Law Enforcement"

    I'll take "How do you round up the most possible innocent people and make false charges against them" for $500, Alex...

    --
    If you can read this, I forgot to post anonymously.
  7. Re:Or not by shmlco · · Score: 5, Insightful

    Well, as in many things it would seem that there's a loophole or two involved. While there are many restrictions placed on government in terms of data collection and data mining, there are few placed on individual businesses who do the same thing (think credit agencies). As such, there's little stopping the government from simply contracting out its needs to private companies.

    --
    Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.