pball · Slashdot Mirror


User: pball

pball's activity in the archive.

Stories: 0
Comments: 2
First seen: 2002-02-01
Last seen: 2002-02-27
Profile: (view on slashdot.org)

Comments · 2

how to let Congress know what's up on SSSCA Squirms Forward Again Thursday · 2002-02-27 08:29 · Score: 1

I just called Sen. Hollings' office and his staffer denied knowing anything about the bill. He sent me to a staffer at the Senate Commerce Committee who said that they canvassed a "wide variety" of positions for the hearing. Well, my quick review of the people who are going to testify makes that laughable. Anyway, I told them about Slashdot, and they promised to check the site and this discussion. Anyone care to call them and make sure they don't forget to surf the web for a bit more opinion?
You can reach the Committee at (202) 224-8418.

simple quantitative problem with the proposal on Feds Undertaking Massive Passenger Profiling Plan · 2002-02-01 05:08 · Score: 3, Informative

Both of the schemes proposed in the WP article are essentially statistical models that predict behavior. Stats are a fine thing (hey, I'm a statistician, I build models all the time), but they depend on having enough examples of the event you're trying to predict in order to isolate the variables that correlate with it.
Say I have a dependent variable called "did a crazy, evil thing." Now I have dozens of independent variables called "income," "purchase behavior," etc. How many positive cases do I have on the "did a crazy, evil thing" variable? Let's assume that the FBI won't just leak all their investigative data into this system (which would permanently blow those investigations). So that means we have what, like 100 million people with negative scores on the "did a crazy, evil thing" variable, and like 30 ppl with positive scores?
The statistics suck here, folks: you will NEVER isolate the variation under these conditions. You'll get millions of innocent people whose patterns among the independent variables match the incredibly thin patterns you get among the terrorists.
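To put rough numbers on that, here is a minimal sketch in Python. The 100 million people and 30 positive cases come from the estimate above; the 99% sensitivity and 99% specificity are purely hypothetical figures picked to be generous to the model, and the function name is made up for illustration.

```python
def screening_counts(population, positives, sensitivity, specificity):
    """Expected outcomes when a classifier screens an entire population.

    sensitivity: fraction of true positives the model flags (assumed).
    specificity: fraction of true negatives the model clears (assumed).
    """
    negatives = population - positives
    true_pos = positives * sensitivity
    false_pos = negatives * (1.0 - specificity)
    # Share of all flagged people who are actually positive.
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


# 100 million people and 30 positives are the comment's own rough figures;
# the 0.99 values are hypothetical and far better than such a model would get.
tp, fp, precision = screening_counts(100_000_000, 30, 0.99, 0.99)

print(f"positives flagged: {tp:.1f}")
print(f"innocent people flagged: {fp:,.0f}")
print(f"share of flags that are real: {precision:.6%}")
```

Even with those generous assumptions, the flagged group comes out at roughly a million innocent people against fewer than 30 real cases.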
This is TOTALLY different from credit analysis schemes where you have like 1/3 or 1/2 of the people in the dataset with occasional or severe credit problems. Modeling really works here b/c a) you have a quantitative measure of the dependent variable (you can smoothly and precisely quantify HOW bad someone's credit is), and b) the dependent variable gives a nice scale with tractable variation, probably one of those infamous bell distributions conveniently centered around some point (or if you stratify properly you'll find the bells, whatever).
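The contrast with credit scoring can be sketched with Bayes' rule, treating both problems as a yes/no flag for simplicity (credit scoring is really a continuous measure, as noted). The prevalences below are the rough figures from this comment (about 1/3 for credit problems, 30 in 100 million for the rare event); the 90% sensitivity and specificity are hypothetical and applied to both cases just to isolate the effect of prevalence.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a flagged case is a real positive (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Same hypothetical model quality in both cases; only the prevalence differs.
credit = positive_predictive_value(1 / 3, 0.9, 0.9)
rare = positive_predictive_value(30 / 100_000_000, 0.9, 0.9)

print(f"credit problems (1 in 3): {credit:.1%} of flags are real")
print(f"rare event (30 in 100M):  {rare:.6%} of flags are real")
```

With identical model quality, most flags are correct when a third of the dataset is positive, and almost none are when positives are this rare.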
And don't be fooled by the fancy-sounding "neural network" stuff; that's just another modeling technique which loosens a few assumptions. But it does NOT fundamentally change the need to have enough positive cases to balance the variation in the independent variables. And binary dependent variables? Sheesh. BAD DATA! DOWN BOY!
And let's talk for a second about the living arrangement correlation analysis. If someone X has lived with someone Y known to be positive on the "did a crazy, evil thing" variable, I sure as hell hope that someone X was questioned very, very thoroughly by the cops. So what good is this additional profiling??
BTW, I travel internationally with my laptop pretty often. EVERY SINGLE TIME I go through Schiphol in Amsterdam they pull me out of the line for ~20 mins of additional questioning. They don't tell me why, but I'm tripping something in their profile. It's not racial, but I think "has been to Bosnia" or something, plus that I have a laptop. They always pester me about whether the laptop is mine or my employer's, and since it's the latter, they are very, very concerned.
Profiling creates millions of false positives, and it is by no means clear that it prevents false negatives.