Slashdot Mirror


SETI@Home Says Client 'Upgrades' Are a Bad Idea

bgp4 writes "New Scientist has an article on how 'upgrades' to SETI@Home clients are causing some trouble. Even though the upgrades speed the client up, SETI reps don't want people using them because they may induce bad data. If SETI@home just open-sourced [the SETI@home client], they'd have better PR and a better client." Amen! SETI@home, are you listening?

15 of 250 comments

  1. Think realistically... by Ageless · · Score: 3

    C'mon people, think about this before you flame. Someone submitting invalid data can ruin this entire project for everyone. Open source software doesn't get tested until it's too late, and by the time a calculation bug is found in the code, thousands and thousands of entries could be invalid. At least think about the project some before you randomly spew out "Free The Source!!"

    1. Re:Think realistically... by WNight · · Score: 3

      Ok, I'm done. Don't know what took you so long.

      Process the data packets, hash the results, with MD5 or SHA, then return the hash along with whatever other results are needed.

      SETI@Home picks some random number of packets and sends them out to a second user, checking to see if both users return the same hash value.

      And this doesn't just stop malicious forgery, but it also stops bugs which may cause incorrect packets to be returned some of the time.

      This does require keeping a list of the packets a user processes, but this shouldn't be that hard. Especially because they start fresh every week (month?) with the new data.
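The hash-and-recheck scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `honest_result`, the unit ids, and the recheck fraction are hypothetical, SHA-256 stands in for the "MD5 or SHA" the comment mentions, and nothing here reflects how the actual SETI@home server works.

```python
import hashlib
import random


def result_digest(result_bytes: bytes) -> str:
    """Hash a work unit's results (SHA-256 here; the comment suggests MD5 or SHA)."""
    return hashlib.sha256(result_bytes).hexdigest()


def pick_for_recheck(unit_ids, fraction, rng):
    """Choose a random subset of work units to send to a second user."""
    ids = sorted(unit_ids)
    count = max(1, round(len(ids) * fraction))
    return set(rng.sample(ids, count))


def find_mismatches(first_pass, second_pass):
    """Return the unit ids whose two independently reported digests disagree."""
    return sorted(u for u, digest in second_pass.items() if first_pass.get(u) != digest)


def honest_result(unit_id: int) -> bytes:
    """Hypothetical stand-in for the real signal analysis."""
    return f"spectrum-for-unit-{unit_id}".encode()


# The first user reports ten units, one of them forged.
first_pass = {u: result_digest(honest_result(u)) for u in range(10)}
first_pass[7] = result_digest(b"forged")

# For a deterministic demo every unit is rechecked; a real project would use a small fraction.
rechecked = pick_for_recheck(first_pass, fraction=1.0, rng=random.Random(0))
second_pass = {u: result_digest(honest_result(u)) for u in rechecked}

mismatches = find_mismatches(first_pass, second_pass)
print(mismatches)  # [7]
```

A mismatch only says that the two users disagree, not which one is wrong, so a third run or a trusted reference run would be needed to settle it.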


      Where this gets really hard is with Bovine, and the code breaking.

      The message is known to the users, so the cracker (bad guy) knows what result to watch for. And he knows what result to return (yes, or no). Easy to forge.

      Bovine is also more likely to get a user hiding hits, instead of faking misses. For misses, they simply use a hash of the first bytes of the 'plaintext' after each decryption and compare these. For a hit, they need to hope that the packet with the hit is one sent out for independent verification.

      What they need is a cryptographic way of hiding the true results from the user.

      As in, if you have C, a cyphertext, and P, a plaintext, and want to find K, the key that turns C into P, is there some transformation you can make to C and P that allows the same key to function, but masks the cypher and plaintext?

      This isn't as impossible as it sounds... Imagine a rotation cypher with a numeric key.

      Let's imagine the key is 3.

      Plaintext 'CAT' becomes 'FDW'

      Add the transformation -1 to both

      'CAT' becomes 'BZS', 'FDW' becomes 'ECV'

      Now, imagine the Bovine group wanting to test their users. They think people will hide a success, making the project continue.

      If the user knows that the random-looking cypher text decrypts into 'CAT' they simply watch for 'CAT' to be decrypted and they've found the key.

      So, Bovine sends T(C1) and T(P1), the transformed (T layer) cyphertext (C1) and plaintext (P1). The user knows that T(P1) is the desired result, 'BZS' in this case. But they don't know if that plaintext is the contest winner, or a loyalty check. So they decrypt T(C1) and, lo and behold, it matches T(P1) ('ECV' becomes 'BZS') and the software says 'We did it!'.

      If they report this set of keys as a failure, Bovine *could* have planted this key, and shut them down when it isn't reported. Or, it could be the real key. They'd never know.

      So they're compelled to be correct, because if their answers don't match the ones Bovine expects, Bovine doesn't trust their other answers and their stats are thrown out.

      So, we need to find the transformation (T) that allows T(C)+K to equal T(P) for all cyphertexts and all plaintexts.

      There may not be such a transformation that does this, but if there is, this is the ultimate answer for Bovine, and other projects like this.
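The rotation-cypher example in this comment can be checked mechanically. The sketch below uses the comment's own values (key 3, transformation -1, plaintext 'CAT'); the function and variable names are made up for illustration. It works only because rotations commute with each other. Whether a comparable transformation exists for the cyphers used in real contests is exactly the open question the comment raises.

```python
import string

ALPHABET = string.ascii_uppercase


def rotate(text: str, shift: int) -> str:
    """Shift each uppercase letter by `shift` positions, wrapping around the alphabet."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 26] for ch in text)


KEY = 3    # the secret key from the example
MASK = -1  # the transformation T applied to both texts

plaintext = "CAT"
ciphertext = rotate(plaintext, KEY)       # 'FDW'
masked_plain = rotate(plaintext, MASK)    # 'BZS'
masked_cipher = rotate(ciphertext, MASK)  # 'ECV'

# The same key still links the masked pair, in both directions.
key_still_works = (
    rotate(masked_plain, KEY) == masked_cipher
    and rotate(masked_cipher, -KEY) == masked_plain
)
print(ciphertext, masked_plain, masked_cipher, key_still_works)  # FDW BZS ECV True
```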

  2. It's *not* open source by rde · · Score: 3

    Yeah, I run seti@home. Yeah, I'd like it to be open source. But it's not.
    If you agree to help out, you do so on seti@home's terms. They say they want you to use this software, so you use this software.
    This is not about having the most units completed, or about being the one to find the signal, or about improving the software. It's about helping the project, and contributing to the body of scientific knowledge. If you want to help, use the sanctioned software. If you use anything else, then you're hindering. Go crack cyphers or calculate weather patterns or something.

  3. Yes, SETI is listening by Jerf · · Score: 5

    SETI is listening, and your arguments are rejected.

    This may sound odd to people participating in distributed.net, but SETI is not about processing data as quickly as possible. It's science. In science, you want to hold as many of the variables constant as possible, so that you can be sure they didn't create a false result (false positive OR false negative). All else being equal, speed is nice, but it is not the goal.

    Open source is not the answer for everything. Sure, if it was open source, some good patches might come out, but how many people would download the code, apply the patches that speed it up, and never have a clue that they just fatally broke the FFT result testing algorithm? Or for that matter, if they broke the FFT algorithm? Or would simply use it to easily learn how to send result blocks without processing them?

    The fact of the matter is, even if you can improve the code, you cannot improve the code. (That's not a typo.) If you can improve the code, instead of helping SETI by processing keys faster, you bring yourself out of alignment with everybody else, create potential bugs in the experiment, and render all of your results suspect. SETI is science... distributed.net is engineering. There is a big difference, and science does things the way it does it for a reason. SETI needs the results to be as solid as possible. (If one of the hacked clients detects a signal, rest assured that even if SETI doesn't subject it to extra scrutiny as a result, some other scientist will.)

    SETI can't stop people from modifying the executable on their own systems, but I think the people calling for SETI to make it even easier for people to modify the system (not just your code, SETI is part of a system and subject to the interactions thereof) have a fundamental misunderstanding of what SETI is about.

    1. Re:Yes, SETI is listening by bgarcia · · Score: 3
      "...but SETI is not about processing data as quickly as possible..."
      Absolutely, positively false.

      If that were the case, then SETI@home would simply do all the computations on their own machines, and not ask for help from thousands of systems distributed all over the world.

      "...but how many people would download the code, apply the patches that speed it up, and never have a clue that they just fatally broke the FFT result testing algorithm?"
      This isn't rocket science. You have a central repository that takes in good patches. Clueless people know to just download good code from the repository. You have regression tests to test these patches.
      "If you can improve the code, instead of helping SETI by processing keys faster, you bring yourself out of alignment with everybody else..."
      What in the world do you mean by "alignment"? And why is it a bad thing?

      Code that does the same thing, only faster, is known as better code.

      What SETI@home needs to do is add some security and checking to their system. Double-check results every now and then. If a particular client is found to have given a bad result, then remove all results obtained from that client.

      Telling people to not upgrade their clients just isn't good enough. It's quite easy for someone to maliciously hack a client to produce bad output. You need a system that can protect against this anyway. And once you have this system in place, then you'll also have protection against buggy clients. Then there should be no reason not to open-source the damn thing.
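As a rough idea of what the regression tests mentioned above might look like, the sketch below compares a candidate transform against a slow but simple reference, within a tolerance. Everything here is an assumption made for illustration: the function names, the test signal, and the tolerance are invented, and the real client's FFT code and result format are not public in this thread.

```python
import cmath


def reference_dft(samples):
    """Slow O(n^2) discrete Fourier transform, used as the trusted reference."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fast_fft(samples):
    """Recursive radix-2 FFT standing in for an optimized patch (length must be a power of two)."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fast_fft(samples[0::2])
    odd = fast_fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def broken_fft(samples):
    """A 'faster' patch that silently ignores the odd-indexed samples."""
    return fast_fft(samples[0::2]) * 2


def matches_reference(candidate, samples, tolerance=1e-9):
    """Accept a candidate only if every output bin agrees with the reference."""
    expected = reference_dft(samples)
    actual = candidate(samples)
    return len(actual) == len(expected) and all(
        abs(a - e) <= tolerance for a, e in zip(actual, expected)
    )


test_signal = [complex((i * 7) % 5, 0) for i in range(16)]
print(matches_reference(fast_fft, test_signal))    # True
print(matches_reference(broken_fft, test_signal))  # False
```

A single test signal proves little; a real suite would need many inputs, including ones with known embedded signals, before a patch could be trusted.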

      --
      I'm a leaf on the wind. Watch how I soar.
  4. Open sourced willy-waving tools: BAD idea by Entrope · · Score: 4

    The insistence with which some people clamor for open sourcing everything really annoys me (and a lot of other people). There are very good reasons not everything is open sourced, and sometimes they're not even due to stupid licensing restrictions imposed by third-party code.

    For something like SETI@home (or distributed.net or whatever else you like), there's a very good reason to keep the clients binary-only. Namely, there is no oracle for verifying that a block of search space was actually searched by the client that claims to have searched it. Abuse of this was seen by the DES challenge and distributed.net before; open-sourcing SETI@home would lead to even worse abuses. Unethical people would modify the code to claim they had searched oodles of key blocks, ruining the results of the search -- and only so they could show off how "studly" their computer system is.

    Of course, maybe this concept is too hard for bgp4 to grasp. But for goodness's sake, it's in the SETI@home FAQ. Whining about their policies on Slashdot isn't likely to change their minds.

    (Beyond the malicious introduction of false reports, it's very easy to "optimize" something like this and introduce numerical or algorithmic errors. Unless you are familiar with advanced theories of signal processing -- the sort of thing you'd find in graduate classes at a good university -- you would be well over your head in looking at how the algorithms work. And there are enough bright grad students working on the average project to know how to optimize for all sorts of cases without the help of a bunch of open source zealots who think that the GPL is some magic potion that can be applied to anything to make it better.)

    1. Re:Open sourced willy-waving tools: BAD idea by Otto · · Score: 3

      "For something like SETI@home (or distributed.net or whatever else you like), there's a very good reason to keep the clients binary-only. Namely, there is no oracle for verifying that a block of search space was actually searched by the client that claims to have searched it. Abuse of this was seen by the DES challenge and distributed.net before; open-sourcing SETI@home would lead to even worse abuses."

      You're right. Distributed.net had that exact problem. They found a way to fix it. The client is STILL open-source. See? Simple, huh?

      Check out http://www.distributed.net/source/...

      Not entirely open source, they left out the part you would need to report results back. But that's the part we really don't care about. Everyone wants the algorithms optimized. I just think the SETI@home people should get a clue.

      ---

      --
      - Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
  5. Commercial paranoia from a scientist by athmanb · · Score: 3

    What we see here is just one more guy who simply cannot believe that open-sourced software is more secure than proprietary software (just like the old argument of some people, mainly the less mathematically knowledgeable suits, who can't believe in the security of open-source encryption software; after all, the algorithm is readable by everyone and therefore insecure).

    In reality, the original patch from Olli was a good chance to apply the bazaar model not only to processing power but also to client development, since it was a simple wrapper which didn't do anything special itself, but enabled other programmers to write their own Fast Fourier Transform algorithms for special chips.

    If S@h had allowed patching the software (or had even made the client open source) we could now have dozens of different clients tuned for every imaginable Streaming Extension, DSP or multiprocessor environment. And the SETI@home project could have also been a starter for future similar projects which could have used all the power of the specialized hardware out there.

    Now for the defense of the project heads: There have been multiple malicious attempts in the past by individuals who tried to get their rating up by cracking the protocol and sending bogus data to the server. This perhaps explains why they're this paranoid.

  6. Bunk by Otto · · Score: 3

    "SETI is science... distributed.net is engineering. There is a big difference, and science does things the way it does it for a reason. SETI needs the results to be as solid as possible."

    The main thing with the SETI client _should_ be not to make sure it finds a signal, but to make sure it doesn't miss one. I agree with this part of it.

    However, this is data analysis. Pure and simple. Run an algorithm on a lot of signals. Easy. No one questions the algorithm. What is questioned is the implementation. If I can take an algorithm and optimize it to run faster on a particular processor, then I must still get the same results. Otherwise, the algorithm is no longer the same.

    And it's not a question of some hacked program finding a signal. ANY signal found will be subjected to the most intense scrutiny even before it's announced. Then it'll be scrutinized again afterward. And other signals from that sky area will be looked at, at various times going back years. No, a false signal will be eliminated pretty quickly. The thing to watch for is a false negative.

    I don't think open source is the cure-all answer here, but I think the people running seti@home have not given thought to the fact that people running the client are the kind of people who really, really know computers. They know algorithms. They know the internet. They are smarter than the average bear. And they don't like people telling them you cannot know this, or you cannot know that. The SETI people need to explain their position better, or count on a lot of people leaving the project.

    Just my $0.02

    ---

    --
    - Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
  7. speed vs. accuracy: an analogy. by drox · · Score: 5

    Why are the goals of accuracy and speed mutually exclusive?

    Well, here's a demonstration you can try at home. Find a recipe for, say, chocolate chip cookies. You want more speed? Double the heat setting on your oven, and cut the baking time by half. Watch what happens. Your output is no longer accurate, even though the input (yer ingredients, order in which you combined them, etc.) is the same as what was called for in the original "source".

    Now open-sourcing recipes is a fine idea. Go ahead and experiment in the kitchen, and if you can come up with a faster way to make cookies that taste as good as the slow-cooking ones, more power to ya. But don't expect Betty Crocker to print your recipe in her next cookbook until she gets to test it out herself.

    The folks at Seti@home might be better served if they open-sourced their code. It seems like a good way to improve it. But one programmer's improvement is another programmer's bug. And if someone's "improved" Seti@home code is fast but sloppy, and gives unacceptable results, the folks at Seti (and all of us who care about the project) lose out big-time.

    It would be nice if the code were available to be tampered with, fine-tuned, and "improved". It would also be nice if only "real" improvements - not quick'n'dirty shortcuts - were used in crunching the data. But how to tell? We don't live in a perfect world. Open the source and big improvements - as well as tiny-but-devastating bugs - may follow.

    There is supposed to be one accepted program for crunching Seti's data. Arrange it so several versions are running, and you introduce more variables into the experiment. Not good.

  8. Not a Sound Bite Answer by hanway · · Score: 5
    There are several intertwined issues here, so it's not easy to boil everything down to a simple "it should[n't] be Open Source" answer:

    • People are conditioned to want their computers to run faster. The amount of time and effort some people spend to overclock and benchmark their computers is often far out of proportion to the actual benefit they get from their computer's speed. It's not surprising that people treat their SETI@home processing speed as a benchmark.
    • The fact that SETI@home puts up statistics that have turned this experiment into a competition to complete the most work units reinforces that behavior.
    • At least one company (perhaps SGI, but I can't remember for sure) has mentioned their SETI@home crunching speed in some marketing literature, again emphasizing speed over quality of results.
    • As several in this discussion have pointed out, making the clients faster won't help the project because the bottleneck is that SETI@home can't prepare the units fast enough. However...
    • If the client software were improved, clients could potentially do more sophisticated processing in roughly the same time, improving the science. However...
    • This could make the clients seem even slower than they already are, which wouldn't sit well with the kiddies who are more interested in their rank or how fast it makes their box seem than the science involved.
    So what lessons could be learned if this or a similar experiment were to be done again?
    • Deemphasize the ranking of work units completed. Perhaps the concept of a fixed work unit could be dropped altogether (i.e. make the "size" of a work unit something arbitrary so that they couldn't be compared). This would possibly prevent the client from being used more as a benchmark than for its true purpose.
    • Plan for hacked clients and spoofed results by sending out enough test work units and by cross-checking results with multiple clients enough to have confidence in the results backed up by statistics.
    • With enough cross-checking, you might as well Open Source the client.
    I would be interested to hear if there is a (theoretically) foolproof way to use distributed clients to produce results with confidence if you accept that some clients will be spoofed.
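Random cross-checking is not foolproof, but its statistical strength is easy to estimate. The sketch below assumes each returned unit is independently re-checked with a fixed probability and that the re-check itself is trustworthy; both are simplifying assumptions, and the function names are hypothetical.

```python
def escape_probability(audit_rate: float, forged_units: int) -> float:
    """Chance that none of `forged_units` forged results is picked for re-checking,
    assuming each unit is audited independently with probability `audit_rate`."""
    return (1.0 - audit_rate) ** forged_units


def audit_rate_needed(forged_units: int, max_escape: float) -> float:
    """Smallest audit rate that keeps the escape probability at or below `max_escape`."""
    return 1.0 - max_escape ** (1.0 / forged_units)


# A client that forges 100 units while 5% of all units are re-checked:
print(escape_probability(0.05, 100))

# Audit rate needed so that such a client escapes at most 1% of the time:
print(audit_rate_needed(100, 0.01))
```

With a 5% audit rate, a client forging 100 units slips through undetected well under 1% of the time (0.95 to the 100th power is roughly 0.006). A client that forges only a handful of results is much harder to catch, which is why this gives statistical confidence rather than a guarantee.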
  9. Can we? Please? by theonetruekeebler · · Score: 3
    1. Seti can hardly be convinced to open-source the client until they have means of verifying the blocks they receive.
    But they can sure as anything verify the blocks they've sent, and if they get a report of a hit, they can verify it themselves using a reference implementation, or quietly submit it to someone else and see if they get the same results. I'm thinking that if they make a point of being the One True Source of Source code, and reviewing submitted patches and integrating them themselves, much potential trouble will be eliminated.

    Open-sourcing has the further advantage of making the client very, very portable, almost for free. I'm sure there are a few bored mainframers out there with a few underutilized MVS boxen that could contribute to the Cause.

    <troll>But all of us know the real reason they won't publish the source--SETI@Home isn't sending us space noise at all; they're sending us heavily-encrypted high-level diplomatic radio transmissions from all over the world, letting us do the CIA / NSA / NRO's dirty work for them. Until we see the source, they can't prove it's false; therefore we must assume it's true until we see the source.</troll>.

    Has anybody seen the latest DNRC office prank? Someone's co-worker was running SETI@Home on their office PC, so the DNRCie hacked him a screen saver that beeped and kept flashing the words SETI@Home: Possible extraterrestrial transmission found. Confidence 99.9893% over and over again. I can think of a half dozen people I'd like to do this to, only I'd make damned sure that on the next user input event, they'd get a dialogue saying "Please stand by: writing results to disk. DO NOT INTERRUPT THIS PROCESS" followed two seconds later by a dialogue box saying "Fatal exception error", because there's no point in being just a little bit cruel.

    --

    --
    This is not my sandwich.
  10. During the development, this was OSS by JohnnyCannuk · · Score: 3

    BTW, for all the "why not Open Source" whiners out there, 18 months ago, this project WAS open sourced. You could volunteer, talk directly to the project leaders and hack the code all you wanted. This was during the DEVELOPMENT phase. When the development was finished, the client was closed. Where were the whiners when you could get the source? I got 2 different versions of it to play with (though I don't have them now, since I've nuked my hd a few times since then). The client is NOT a commercial product. Open source is great for commercial products because it continuously improves the quality of the code and design. A commercial product needs this improvement to stay competitive. The SETI@HOME client does not need this. It is doing what it was designed to do for this experiment just fine. There is no "competing product" to stay caught up to or surpass. In 2 years, when the experiment is over, so will the SETI@HOME client be. Therefore, any of the benefits of OSS development have already been utilised by SETI@HOME. It doesn't need to be tweaked or sped up or become more efficient. It simply needs to be used by 'volunteers' the way the project leaders have designed it. If you can't abide by their request, don't volunteer. If you don't like the way it works, don't use it.

    If SETI@HOME didn't have their ranking scheme, would we be even having this discussion? I think that was their only mistake....

    They already used the Open Source model...

    --
    Never by hatred has hatred been appeased, only by kindness - the Buddha
  11. The bizarre bazaar by copito · · Score: 3
    Open source works best when the developers are "scratching their own itch" or at least eating their own dogfood. Linux supports lots of odd hardware because the users need to use lots of odd hardware. OpenBSD is fairly secure because the users need it to be secure.

    In the case of SETI, there is a mismatch between the potential developers and users that is unique to a distributed system. The client user/developer wants a client that is faster, potentially at the expense of accuracy. The official SETI developers are not users primarily, but instead want to have great confidence in the client, potentially at the expense of speed.

    This mismatch turns many of the benefits of open source on its head, from the perspective of the SETI developers, since their goals are different than the potential developer community. Note that the client user/developer would achieve his goals quite well if the client were open sourced, as per dogma.

    Perhaps we need to tweak the dogma a bit to account for this bizarre bazaar. Instead of "open source is always better", try this on for size:
    Open source always makes what I am running on my computer better for my needs, assuming I am a developer or there are developers whose goals coincide with mine.

    It does not necessarily make me happier with what you are running on your computer. If you are running a server I am happy that the server is open source because that means it has open interfaces. I'd be equally happy if you just opened your interfaces. If you are running my code for me in a distributed type of network, I want my goals to be preeminent in the code design. Open source only guarantees this in the rare case that our goals completely coincide.

    For open source to work reliably in a distributed network of users with varying goals, there has to be a central authority with approved clients for whom the cost of approving clients is less than the benefit of the potential innovation. In addition there must be a way to strongly authenticate that those clients have not been modified. Incidentally, an authentication method is what is really needed in a closed source case as well. I can't think of a foolproof method for doing this; the best bet is probably just random result comparison and some speed heuristics.

    So the answer is that SETI could open source their client, and the people running it would benefit. But it might not be a benefit to them since the benefit of added speed (the most likely outcome) is not currently needed, and the potential for good faith bogus clients increases greatly. Bad faith bogus clients would likely stay at the same frequency.

    --
    --
    "L'IT c'est moi!"
  12. Floating point results differ by heroine · · Score: 3

    Having worked with the gcc compiler for many years I can say that math fluctuations are incredibly easy to introduce through optimization. Floating point operations, though rarely used in CS education, are rampant in the Seti code. The problem isn't in producing bad data but in differences in the output that most floating point optimizations produce. Floating point operations are much more prone to fluctuations due to optimization. The optimized seti clients are guaranteed to produce different results than the original clients.
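A small illustration of the point about floating point: merely regrouping an addition, the kind of change an optimizer or a hand-tuned routine might make, alters the last bits of the result. The values below are arbitrary and are not taken from the Seti code. The example also shows why comparing exact hashes of floating-point output would flag a harmlessly reordered client, while a tolerance-based comparison would not.

```python
import hashlib
import math
import struct


def digest_of(value: float) -> str:
    """Hash the exact 8-byte IEEE 754 representation of a float."""
    return hashlib.sha256(struct.pack("<d", value)).hexdigest()


original_order = (0.1 + 0.2) + 0.3   # one evaluation order
optimized_order = 0.1 + (0.2 + 0.3)  # same math, regrouped

bitwise_equal = original_order == optimized_order
digests_match = digest_of(original_order) == digest_of(optimized_order)
within_tolerance = math.isclose(original_order, optimized_order, rel_tol=1e-12)

print(bitwise_equal, digests_match, within_tolerance)  # False False True
```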

    Since they haven't been able to collect enough data by orders of magnitude to feed their clients, there's little use to optimized Seti clients outside of learning how the Seti client works.