SETI@Home Says Client 'Upgrades' Are a Bad Idea
bgp4 writes "New Scientist has an article on how 'upgrades' to SETI@Home clients are causing some trouble. Even though the upgrades speed the client up, SETI reps don't want people using them because they may induce bad data. If SETI@home just open-sourced [the SETI@home client], they'd have better PR and a better client." Amen! SETI@home, are you listening?
SETI is listening, and your arguments are rejected.
This may sound odd to people participating in distributed.net, but SETI is not about processing data as quickly as possible. It's science. In science, you want to hold as many variables constant as possible, so that you can be sure they didn't create a false result (false positive OR false negative). All else being equal, speed is nice, but it is not the goal.
Open source is not the answer for everything. Sure, if it were open source, some good patches might come out, but how many people would download the code, apply the patches that speed it up, and never have a clue that they just fatally broke the algorithm that tests the FFT results? Or, for that matter, the FFT algorithm itself? Or would simply use the source to learn how to send result blocks without processing them?
The fact of the matter is, even if you can improve the code, you cannot improve the code. (That's not a typo.) If you can improve the code, instead of helping SETI by processing work units faster, you bring yourself out of alignment with everybody else, create potential bugs in the experiment, and render all of your results suspect. SETI is science... distributed.net is engineering. There is a big difference, and science does things the way it does for a reason. SETI needs the results to be as solid as possible. (If one of the hacked clients detects a signal, rest assured that even if SETI doesn't subject it to extra scrutiny as a result, some other scientist will.)
SETI can't stop people from modifying the executable on their own systems, but I think the people calling for SETI to make it even easier for people to modify the system (not just your code, SETI is part of a system and subject to the interactions thereof) have a fundamental misunderstanding of what SETI is about.
The insistence with which some people clamor for open sourcing everything really annoys me (and a lot of other people). There are very good reasons not everything is open sourced, and sometimes they're not even due to stupid licensing restrictions imposed by third-party code.
For something like SETI@home (or distributed.net or whatever else you like), there's a very good reason to keep the clients binary-only. Namely, there is no oracle for verifying that a block of search space was actually searched by the client that claims to have searched it. Abuse of this has been seen before, in the DES challenge and distributed.net; open-sourcing SETI@home would lead to even worse abuses. Unethical people would modify the code to claim they had searched oodles of blocks, ruining the results of the search -- and only so they could show off how "studly" their computer system is.
Of course, maybe this concept is too hard for bgp4 to grasp. But for goodness's sake, it's in the SETI@home FAQ. Whining about their policies on Slashdot isn't likely to change their minds.
(Beyond the malicious introduction of false reports, it's very easy to "optimize" something like this and introduce numerical or algorithmic errors. Unless you are familiar with advanced theories of signal processing -- the sort of thing you'd find in graduate classes at a good university -- you would be well over your head in looking at how the algorithms work. And there are enough bright grad students working on the average project to know how to optimize for all sorts of cases without the help of a bunch of open source zealots who think that the GPL is some magic potion that can be applied to anything to make it better.)
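To make the point about numerical errors concrete, here is a minimal sketch in Python. It is not taken from the SETI@home client; the sample values, the function name `sum_in_order`, and the threshold are all made up for illustration. It only shows that reordering a floating-point sum, the kind of change that looks like a harmless speed-up, can change the answer.

```python
# Floating-point addition is not associative, so merely reordering a sum
# can change the result. All values here are hypothetical, chosen to
# expose the effect; none come from the real client.

def sum_in_order(values):
    """Add the values left to right, exactly in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two different orders.
original = sum_in_order([1e16, 1.0, -1e16])    # the 1.0 is lost to rounding
reordered = sum_in_order([1e16, -1e16, 1.0])   # big terms cancel first

THRESHOLD = 0.5  # hypothetical "is there a signal?" cut-off

print(original, reordered)                          # 0.0 1.0
print(original > THRESHOLD, reordered > THRESHOLD)  # False True
```

Both versions are "correct" on paper, yet they land on opposite sides of the cut-off. Real signal-processing code sums far more terms, so the differences are usually smaller and much harder to spot.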
Why are the goals of accuracy and speed mutually exclusive?
Well, here's a demonstration you can try at home. Find a recipe for, say, chocolate chip cookies. You want more speed? Double the heat setting on your oven, and cut the baking time by half. Watch what happens. Your output is no longer accurate, even though the input (yer ingredients, order in which you combined them, etc.) is the same as what was called for in the original "source".
Now open-sourcing recipes is a fine idea. Go ahead and experiment in the kitchen, and if you can come up with a faster way to make cookies that taste as good as the slow-cooking ones, more power to ya. But don't expect Betty Crocker to print your recipe in her next cookbook until she gets to test it out herself.
The folks at Seti@home might be better served if they open-sourced their code. It seems like a good way to improve it. But one programmer's improvement is another programmer's bug. And if someone's "improved" Seti@home code is fast but sloppy, and gives unacceptable results, the folks at Seti (and all of us who care about the project) lose out big-time.
It would be nice if the code were available to be tampered with, fine-tuned, and "improved". It would also be nice if only "real" improvements - not quick'n'dirty shortcuts - were used in crunching the data. But how to tell? We don't live in a perfect world. Open the source and big improvements - as well as tiny-but-devastating bugs - may follow.
There is supposed to be one accepted program for crunching Seti's data. Arrange it so several versions are running, and you introduce more variables into the experiment. Not good.
- People are conditioned to want their computers to run faster. The amount of time and effort some people spend to overclock and benchmark their computers is often far out of proportion to the actual benefit they get from their computer's speed. It's not surprising that people treat their SETI@home processing speed as a benchmark.
- The fact that SETI@home puts up statistics that have turned this experiment into a competition to complete the most work units reinforces that behavior.
- At least one company (perhaps SGI, but I can't remember for sure) has mentioned their SETI@home crunching speed in some marketing literature, again emphasizing speed over quality of results.
- As several in this discussion have pointed out, making the clients faster won't help the project because the bottleneck is that SETI@home can't prepare the units fast enough. However...
- If the client software were improved, clients could potentially do more sophisticated processing in roughly the same time, improving the science. However...
- This could make the clients seem even slower than they already are, which wouldn't sit well with the kiddies who are more interested in their rank or how fast it makes their box seem than the science involved.
So what lessons could be learned if this or a similar experiment were to be done again?
- Deemphasize the ranking of work units completed. Perhaps the concept of a fixed work unit could be dropped altogether (i.e. make the "size" of a work unit something arbitrary so that they couldn't be compared). This might prevent the client from being used more as a benchmark than for its true purpose.
- Plan for hacked clients and spoofed results by sending out enough test work units, and cross-checking enough results across multiple clients, to have statistically backed confidence in the results.
- With enough cross-checking, you might as well Open Source the client.
I would be interested to hear if there is a (theoretically) foolproof way to use distributed clients to produce results with confidence if you accept that some clients will be spoofed.
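The test-unit and cross-checking suggestions above can be sketched roughly as follows. This is not how SETI@home actually validates results; the function names `validate_unit` and `failed_test_units`, the quorum of 2, and the idea that results can be compared for exact equality (say, after rounding or hashing) are all assumptions made for the sake of the example.

```python
from collections import Counter

def validate_unit(reports, quorum=2):
    """Return the accepted result for one work unit, or None.

    reports: list of (client_id, result) pairs for the same work unit.
    A result is accepted only if at least `quorum` distinct clients
    reported it (hypothetical policy, not the project's real one).
    """
    votes = Counter()
    seen = set()
    for client_id, result in reports:
        if client_id in seen:      # one vote per client, ignore repeats
            continue
        seen.add(client_id)
        votes[result] += 1
    if not votes:
        return None
    result, count = votes.most_common(1)[0]
    return result if count >= quorum else None

def failed_test_units(client_answers, known_answers):
    """List the planted test units (answers already known to the server)
    that this client returned with the wrong answer."""
    return [unit for unit, expected in known_answers.items()
            if unit in client_answers and client_answers[unit] != expected]

# Two honest clients outvote one spoofed result.
accepted = validate_unit([("a", "no signal"), ("b", "no signal"), ("c", "signal!")])
# One client submitting twice does not reach the quorum.
repeated = validate_unit([("a", "signal!"), ("a", "signal!")])
# A client that skipped the work gets caught by a planted unit.
caught = failed_test_units({"t1": "no signal", "t2": "no signal"},
                           {"t1": "no signal", "t2": "signal!"})
print(accepted, repeated, caught)   # no signal None ['t2']
```

This is not foolproof: clients that collude on the same wrong answer can still reach the quorum. If units are handed out at random and a fraction p of clients cheat, the chance that all k copies of a unit land on cheaters is roughly p^k, so more redundancy buys more confidence at the cost of repeating work.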