SETI's Anti-Cheating Strategy
mtDNA writes: "There's an article in the New York Times about the strategies SETI is using to avoid fraudulent reports. One trick they're using is multiple analyses of the same data. Another strategy is the use of 'ringer' data, where they send you fake data for which they already know the results." One of the researchers has several PostScript papers on his home page: Incentives for Sharing in Peer-to-Peer Networks, Uncheatable Distributed Computations, and Distributed Computing with Payout. In related news, ProcessTree apparently sent an email to participants announcing that it is closing up shop, so although SETI seems to be chugging along, distributed computing as a business model is perhaps a bit premature.
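For anyone wondering what the 'ringer' trick might look like on the server side, here's a minimal Python sketch. None of the details come from the article: a work unit is assumed to be a list of numbers, the "analysis" is a plain sum standing in for the real signal processing, and build_batch/audit are hypothetical names. The server shuffles units whose answers it already knows in with the real work, then checks only those.

```python
import random

# Hypothetical sketch: a work unit is (unit_id, payload) and the "analysis" is sum(payload).
# ringers maps unit_id -> (payload, known_answer); the server already knows these answers.

def build_batch(real_units, ringers, rng):
    """Mix ringer units in with real ones and shuffle, so the client can't tell them apart."""
    batch = list(real_units) + [(uid, payload) for uid, (payload, _) in ringers.items()]
    rng.shuffle(batch)
    return batch

def audit(results, ringers):
    """Return the sorted ringer ids the client got wrong; results maps unit_id -> answer."""
    return sorted(uid for uid, (_, answer) in ringers.items()
                  if results.get(uid) != answer)

# Usage: an honest client passes; one that returns zero for everything gets caught.
rng = random.Random(42)
real = [("r1", [1, 2, 3]), ("r2", [4, 5, 6])]
ringers = {"k1": ([7, 8, 9], 24), "k2": ([1, 1, 1], 3)}
batch = build_batch(real, ringers, rng)
honest = {uid: sum(payload) for uid, payload in batch}
lazy = {uid: 0 for uid, _ in batch}
print(audit(honest, ringers), audit(lazy, ringers))  # [] ['k1', 'k2']
```

The cheater only gets caught on the ringers, so in practice the ringer rate trades server cost against how quickly a faked client shows up.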
William Gibson's "Black Ice" should do nicely. Failing that, slice or dice the data in multiple directions and compare results.
(The "different slices" is important, to ensure that you aren't trying to validate one modified client against another.)
Let's say that you have a grid of data, N x M x B (where N and M are the dimensions of the data, and B is the number of bits per word for that data).
The probability that one modified client is doing the rounds, and will be encountered again by chance, is non-zero. It's not high, but it's high enough that nobody is releasing their client code in a hurry.
On the other hand, you have three simple slicings available (one along each axis), and any number of more complicated ones. That means a cheater would need a correctly-modified client for whichever slice you picked, for every slice along every axis, before the data could be marked "valid". Any client that fails to return a result agreeing with the other 16 clients whose slices overlap its own signals a bogus client.
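A rough Python sketch of that cross-slicing check. The assumptions are all mine: the N x M x B grid is stored as a dict of (i, j, k) cells, clients are modelled as plain functions, and slices are handed out round-robin; slice_cells and cross_check are hypothetical names. Each cell ends up computed three times, once per axis, and where two slicings agree and the third doesn't, the odd client out is flagged.

```python
import itertools
from collections import Counter

def slice_cells(shape, axis, index):
    """All (i, j, k) cells of the slice whose coordinate on `axis` equals `index`."""
    ranges = [range(n) for n in shape]
    ranges[axis] = [index]
    return list(itertools.product(*ranges))

def cross_check(grid, clients):
    """grid: dict (i, j, k) -> value. clients: list of functions {cell: value} -> {cell: result}.
    Every slice along every axis goes to the next client in turn (round-robin is an
    assumption), so each cell is computed three times. Returns the indices of clients
    that were outvoted 2-to-1 on at least one cell."""
    shape = tuple(max(cell[d] for cell in grid) + 1 for d in range(3))
    answers = {cell: [] for cell in grid}   # cell -> [(client_index, result), ...]
    turn = 0
    for axis in range(3):
        for index in range(shape[axis]):
            who = turn % len(clients)
            turn += 1
            cells = slice_cells(shape, axis, index)
            results = clients[who]({c: grid[c] for c in cells})
            for c in cells:
                answers[c].append((who, results.get(c)))
    suspects = set()
    for votes in answers.values():
        majority, count = Counter(r for _, r in votes).most_common(1)[0]
        if count >= 2:   # two slicings agree, so the third one is the odd one out
            suspects.update(idx for idx, r in votes if r != majority)
    return suspects

# Usage: a 2 x 2 x 2 grid gives six slices, one per client; client 5 fakes its slice.
def honest(cells):
    return {c: v * v for c, v in cells.items()}   # stand-in for the real analysis

def faker(cells):
    return {c: 0 for c in cells}

grid = {cell: sum(cell) + 1 for cell in itertools.product(range(2), repeat=3)}
print(cross_check(grid, [honest] * 5 + [faker]))  # {5}
```

With round-robin over few clients, one client could end up holding two slices through the same cell and outvote itself, so a real scheduler would want to avoid that.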
With that much redundancy, you could also simply have "client voting". The results that are returned identically by the most clients (in excess of some threshold), regardless of the direction of slice, could be regarded as "true", with a reasonable degree of certainty. (Sure, it's not 100%, but that's the price you pay for having a society that rewards the greedy and the ethically sick.)
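The voting rule itself only takes a few lines. A sketch in Python, assuming results are hashable values and that the threshold is a tunable the project would choose; vote() is a hypothetical name. Ties are treated as unconfirmed rather than broken arbitrarily.

```python
from collections import Counter

def vote(results, threshold):
    """results: answers returned by different clients for the same unit (any slicing).
    Return the answer the most clients gave identically, provided at least `threshold`
    clients gave it and no other answer ties with it; otherwise None (unconfirmed)."""
    if not results:
        return None
    ranked = Counter(results).most_common(2)
    answer, count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == count:
        return None   # a tie means no answer clearly dominates
    return answer if count >= threshold else None

print(vote(["a", "a", "b"], threshold=2))  # a
print(vote(["a", "b"], threshold=1))       # None
```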
Of course, if you want to go one stage further, there's nothing to stop you "dicing" the data. Instead of taking a single slice through the data, you take random, small chunks from all sections, and feed them in a random order to the client. Again, the server reconstitutes the "valid" results by merging together the results from multiple clients, taking the generally-accepted results as "correct".
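Dicing might look something like the following sketch. It assumes the data is a flat list, every chunk goes to `replicas` distinct clients in a shuffled order, and the server keeps a chunk's result only when a strict majority of those replicas agree. dice() and reassemble() are hypothetical names, not anything SETI@home is known to run.

```python
import random
from collections import Counter, defaultdict

def dice(data, chunk_size, n_clients, replicas, rng):
    """Cut `data` into fixed-size chunks and deal each chunk to `replicas` distinct
    clients, then shuffle each client's list so position reveals nothing.
    Returns {client_index: [(chunk_id, chunk), ...]}."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    work = defaultdict(list)
    for cid, chunk in enumerate(chunks):
        for client in rng.sample(range(n_clients), replicas):
            work[client].append((cid, chunk))
    for items in work.values():
        rng.shuffle(items)
    return dict(work)

def reassemble(reports):
    """reports: iterable of (chunk_id, result) from all clients. Keep the result a
    strict majority of the replicas agreed on; chunks without one map to None."""
    by_chunk = defaultdict(list)
    for cid, result in reports:
        by_chunk[cid].append(result)
    merged = {}
    for cid, results in by_chunk.items():
        answer, count = Counter(results).most_common(1)[0]
        merged[cid] = answer if 2 * count > len(results) else None
    return merged

# Usage: client 0 cheats by returning -1 for every chunk; everyone else sums honestly.
rng = random.Random(1)
data = list(range(20))
work = dice(data, chunk_size=4, n_clients=5, replicas=3, rng=rng)
reports = [(cid, -1 if client == 0 else sum(chunk))
           for client, items in work.items() for cid, chunk in items]
print(reassemble(reports) == {i: sum(data[4 * i:4 * i + 4]) for i in range(5)})  # True
```

Because each chunk's replicas land on distinct clients, a single cheater can spoil at most one vote per chunk, which three replicas can absorb.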
This would mean that, instead of needing 20+ clients, all with suitable code for cheating "correctly" along each slice, you now need (N x M x B / chunk size)! (that's a factorial) such clients. The values don't have to be large to make this a virtual impossibility.
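To put numbers on that, here's a short Python sketch that simply evaluates the factorial-of-the-chunk-count figure, taking the estimate at face value; orderings() and orderings_log10() are hypothetical helpers. A tiny 4 x 4 grid of 8-bit words cut into 8-bit chunks is only 16 chunks, and 16! is already about 2.1 x 10^13.

```python
import math

def _chunks(n, m, bits, chunk_size):
    """Number of chunks in an n x m grid of `bits`-bit words (must divide evenly)."""
    chunks, leftover = divmod(n * m * bits, chunk_size)
    if leftover:
        raise ValueError("chunk_size should divide n*m*bits evenly")
    return chunks

def orderings(n, m, bits, chunk_size):
    """(n*m*bits / chunk_size)!: the number of orders in which the chunks can be dealt."""
    return math.factorial(_chunks(n, m, bits, chunk_size))

def orderings_log10(n, m, bits, chunk_size):
    """The same quantity as a power of ten, via lgamma, for sizes too big to print."""
    return math.lgamma(_chunks(n, m, bits, chunk_size) + 1) / math.log(10)

print(orderings(4, 4, 8, 8))               # 20922789888000
print(orderings_log10(64, 64, 8, 32))      # 1024 chunks: a number thousands of digits long
```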
If you then credit only "confirmed" units (whether "slices" or "chunks"), cheating becomes impractical: short of a global Internet conspiracy that also included the researchers, nobody is going to bother modifying the clients in any way that produces inaccurate results.
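A sketch of "credit only confirmed units", assuming a unit counts as confirmed once `quorum` clients return identical results, and that credit goes only to the clients who agreed with the accepted answer. CreditLedger is a hypothetical class and the quorum value is an assumption.

```python
from collections import Counter, defaultdict

class CreditLedger:
    """Hypothetical ledger: a unit is confirmed once `quorum` clients return the same
    result, and only clients whose result matches the confirmed answer get credit."""

    def __init__(self, quorum=3):
        self.quorum = quorum
        self.pending = defaultdict(dict)    # unit_id -> {client_id: result}
        self.confirmed = {}                 # unit_id -> accepted result
        self.credited = defaultdict(set)    # unit_id -> clients already credited
        self.credits = Counter()            # client_id -> number of confirmed units

    def _credit(self, unit_id, client_id):
        if client_id not in self.credited[unit_id]:
            self.credited[unit_id].add(client_id)
            self.credits[client_id] += 1

    def submit(self, unit_id, client_id, result):
        if unit_id in self.confirmed:
            # Late result for an already-confirmed unit: credit it only if it agrees.
            if result == self.confirmed[unit_id]:
                self._credit(unit_id, client_id)
            return
        self.pending[unit_id][client_id] = result
        answer, count = Counter(self.pending[unit_id].values()).most_common(1)[0]
        if count >= self.quorum:
            self.confirmed[unit_id] = answer
            for cid, res in self.pending.pop(unit_id).items():
                if res == answer:
                    self._credit(unit_id, cid)

ledger = CreditLedger(quorum=2)
ledger.submit("u1", "alice", 42)    # unconfirmed: nobody is credited yet
ledger.submit("u1", "mallory", 99)
ledger.submit("u1", "bob", 42)      # second matching result confirms the unit
print(dict(ledger.credits))         # {'alice': 1, 'bob': 1}
```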
They =MIGHT= modify them to produce faster, accurate results. But, in that case, who bloody cares? I'm not going to object to someone handing round an honest, genuine client that can plow through 10 times as many blocks a second and still deliver the true results back to the central system. And if the scientists were being honest with themselves, I doubt they would either, PROVIDED the results could be guaranteed.
And that gets back to why independent result reviews, using slicing, dicing, or some other method of producing non-duplicate data sets, are so important.
May 2001
Dear ProcessTree Network suppliers,
It is with sadness that I have to announce that this will be the last newsletter you receive from Distributed Science, Inc.
etc etc etc...
We will diligently negotiate the sale of the supplier database, with emphasis on the privacy policy under which you signed up. As soon as we have reached a result, the new owners will inform you about any changes they might plan, including an opt-out for those concerned about their privacy under new management.
EEP!
Their argument against open-sourcing the client has always been that this would allow cheaters and that people would use modified clients that didn't crunch the numbers right. To which I have always responded that with any distributed computational task running on untrusted clients, you would have to do this sort of redundant analysis on each data block anyway. Even a closed-source client can be hacked fairly easily if you really wanted to, so not releasing the source doesn't magically guarantee the validity of any client-side processing. It's nice to see SETI@Home finally acknowledge what some of us have known all along.
So, when will we see the client source code available for download? I'm all ready to start working on an XScreenSaver module for it.
As far as I know this is nothing new; distributed.net has always done this on their projects (RC5, DES) to make sure people are actually checking the blocks.
God, I loved that old feature on Telegard/Renegade and the like. Most people did figure it out when no one responded to their flames, and then made a fake account or logged on as a guest to find out the truth. But this would work with SETI, where there is no 'feedback'. Hell, they've even disabled 'see my last 10 packets' lately, so as long as they kept incrementing the person's records as far as he could see, it wouldn't matter. As for the problem you present, a broken computer being innocent compared to malicious data: that really isn't a problem. Not to sound like an arrogant fuckwad, but the end result is the same to SETI. Data that's wrong because a computer went tits-up, or data that's wrong because a computer was messed with: it really doesn't matter. They're going to need to reject both.
Here are some warning signs that you may have a SETI hoax on your hands:
But the problem is not ordinary punks hacking the client to create false positives. No, the problem is those Beowulf clusters in underground NSA facilities making all the false negatives!
My question is: why would anyone want to cheat SETI? I can just see the guy now:
"LOOK! I'm high on the hours list with 31337 years of data done on my computer for SETI. I RULE! Oh god, I wish I were dead..."
Instead of locking out a cheater, a better solution is to continue to feed data to that cheater, but ignore any results they submit. This will help prevent the cheater from simply creating a new account, as they will be unaware that their false results have been detected.
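A sketch of that approach, assuming the server can already tell which accounts are cheating (how it decides is out of scope here). ShadowServer is a hypothetical name; the point is that a flagged client gets the same acknowledgement and the same climbing counter as everyone else, while its results never reach the science database.

```python
class ShadowServer:
    """Hypothetical server: flagged clients keep getting acknowledgements, and their
    own visible counters keep climbing, but their results are silently discarded."""

    def __init__(self):
        self.flagged = set()
        self.accepted = []      # (client_id, unit_id, result) that reach the science side
        self.displayed = {}     # client_id -> units shown on that client's own stats page

    def flag(self, client_id):
        self.flagged.add(client_id)

    def submit(self, client_id, unit_id, result):
        self.displayed[client_id] = self.displayed.get(client_id, 0) + 1
        if client_id not in self.flagged:
            self.accepted.append((client_id, unit_id, result))
        return "OK"   # same reply either way, so the cheater learns nothing

server = ShadowServer()
server.flag("cheater")
print(server.submit("cheater", 1, 0), server.submit("honest", 1, 5))  # OK OK
print(server.accepted)   # [('honest', 1, 5)]
```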
So somebody's trying to manipulate the system in order to artificially inflate a meaningless number in a database! How shocking!