Slashdot Mirror


Distributed.net Suspends OGR project

st.n. writes "According to this statement, distributed.net is suspending its new OGR-24 project, which was started just a week ago, because of a missing ntohl() call in the buffering code. They were 24% done already and have to start over again now."

27 of 110 comments

  1. 24% done? by jawad · · Score: 2

    By "So far, we have completed approximately 24% of the total stubs, mostly smaller ones." doesn't that mean that they're not 24% done in terms of time? I think the statement "They were 24% done already and have to start over again now." is a little misleading...
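
    A rough way to see the gap, using made-up stub costs (the counts and hours below are invented for illustration and are not distributed.net figures): if the finished stubs are mostly the small ones, the share of stubs done can be far larger than the share of work done.

```python
# Hypothetical stub costs in CPU-hours; these numbers are invented for illustration.
stub_hours = [1] * 24 + [10] * 76   # 24 small stubs, 76 large ones

completed = stub_hours[:24]          # assume only the small stubs are finished

fraction_of_stubs = len(completed) / len(stub_hours)
fraction_of_work = sum(completed) / sum(stub_hours)

print(f"stubs done: {fraction_of_stubs:.0%}")
print(f"work done:  {fraction_of_work:.1%}")
```

    With these assumed numbers the stub count says 24% while the work actually done is closer to 3%.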

    1. Re:24% done? by Cramer · · Score: 2

      As I understand it from dB's note, it's not any one client that's broken. The problem is introduced by sharing buffers between machines of different endianness. Intels will read their own buffers correctly; Sparcs will read their own buffers correctly. HOWEVER, a Sparc reading an Intel buffer, and vice versa, will corrupt the counts (and possibly more, but no one has said anything about that).
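
      A minimal sketch of that failure mode, assuming the buffer simply holds a raw 32-bit counter. The real buffer format is not public, so the function names and the value 1000 are hypothetical:

```python
import struct

def write_count(count: int, byteorder: str) -> bytes:
    """Dump a 32-bit counter as a host of this endianness would, with no htonl()."""
    fmt = "<I" if byteorder == "little" else ">I"
    return struct.pack(fmt, count)

def read_count(raw: bytes, byteorder: str) -> int:
    """Read the counter back as a host of this endianness would, with no ntohl()."""
    fmt = "<I" if byteorder == "little" else ">I"
    return struct.unpack(fmt, raw)[0]

nodes = 1000  # hypothetical node count stored in a buffer file

intel_buffer = write_count(nodes, "little")

same_host = read_count(intel_buffer, "little")   # Intel reads Intel's buffer
cross_host = read_count(intel_buffer, "big")     # Sparc reads Intel's buffer

print(same_host)
print(hex(cross_host))
```

      The same four bytes give back the right count on the machine that wrote them and a byte-swapped one on the other architecture, which is why no single binary looks broken on its own.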

      There's no way they can point a finger at any specific binary. And in fact, detecting the corruption is not easy -- just how many nodes qualifies as "messed up"?

      This is a very bad omen for Distributed.Net. It's taken two years to deploy OGR only to see it broken out of the starting gate. This is a very stupid mistake; one that would not have happened if people had paid attention to their work and tested a supported function (buffer sharing).

    2. Re:24% done? by Cramer · · Score: 2

      First, _WE_ have jobs. That means we have more important things to do than write free software -- no matter how much we may like writing code for Cosm. Second, there are very few core people writing code for Cosm -- and we are actually writing code, not using someone else's stuff (most of DCTI's cores originated outside DCTI. DCTI eventually generated their own 'core' code or optimized the acquired code.)

      As for "releas[ing] something", the Cosm development has been open to the public for a year. There's an IRC channel on EFNet, several mailing lists, a web site full of information, and a public cvs tree of the code. I invite you to go look. (the irc channel is #cosm, btw)

      Yes, there are weeks that go by when "nothing" happens. We have lives to live and jobs that pay our bills to attend to. Work may be slow at times, but it's never at a full stand-still. There's a lot of stuff going on that isn't in the CVS tree (yet) pending some legal wording to prevent people from stealing our work as their own. (Plus there's no point in making it available yet. And before you ask, I'm referring to stats processing code -- extremely fast and efficient. (Blindingly fast compared to the existing DCTI stats.))

      My philosophy is that there is no such thing as "bug free". However, there is a distinction to be made where things are supposed to work. This is called "testing" and "verification" in the software industry. In this specific case, DCTI failed to verify proper functionality of sharing buffers between machines -- a published feature, or it used to be.

  2. Evil Wintel :-( by Telcontar · · Score: 2

    If Intel x86 chips, where the bytes are ordered the wrong way round, had not been so successful due to their most unholy alliance with software where much more is the wrong way round, we would never have had this problem.

    If you can figure out this sentence, then you are probably too smart to think of a reason why one would write a number backwards (in memory). ;-)

    1. Re:Evil Wintel :-( by Pascal+Q.+Porcupine · · Score: 4
      Well, the "justification" for little-endian machines was, back in the Good Old Days, it was very useful to be able to have a free downward typecast on pointers (with big-endian you have to either add some value to your index, or AND with a bitmask after the read, whereas with little-endian you just take the same pointer and use a smaller-sized read).
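
      To make the "free downward typecast" concrete, here is a small sketch that models memory as a byte string. The value 0x2A and the variable names are illustrative only:

```python
import struct

value = 0x2A  # a small number stored in a 32-bit slot

little = struct.pack("<I", value)  # bytes as laid out on a little-endian machine
big = struct.pack(">I", value)     # bytes as laid out on a big-endian machine

# Little-endian: a one-byte read at the SAME address (offset 0) yields the low byte.
little_low_byte = little[0]

# Big-endian: offset 0 holds the most significant byte, so the low byte has to be
# fetched from offset 3, or masked out after a full-width read.
big_at_same_address = big[0]
big_low_byte = big[3]
big_masked = struct.unpack(">I", big)[0] & 0xFF

print(little_low_byte, big_at_same_address, big_low_byte, big_masked)
```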

      I never said it was a good justification. :) After all, in situations like that, you usually aren't using pointers anyway...

      Unfortunately, because of x86's influence, a lot of other vendors have bastardized their architectures. For example, newer Alphas have both big- and little-endian modes, and apparently AlphaLinux runs in the little-endian mode simply for easy compatibility with x86. IMO, they should do it in big-endian so that fun bugs show up causing them to need to properly ntohl() and htonl() all their data. It'd make for much more consistency with the porting efforts to REAL platforms, such as PPC and Sparc (that isn't to say that Alpha isn't a real platform, of course, but it can hardly be treated with respect when it's got a little-endian mode simply to pander to x86 apologists).

      At least IA-64 is switchable endian, though (except in IA-32 mode, obviously), so at least there's some validation on that front. Hopefully the IA-64 Linux porting effort is doing the Right Thing and using the big-endian mode.
      ---
      "'Is not a quine' is not a quine" is a quine.

    2. Re:Evil Wintel :-( by Pascal+Q.+Porcupine · · Score: 2

      Yes, I KNOW they're NOPs on big-endian, but they actually DO stuff on little-endian... the problem is that a lot of code out there assumes the endianness of the machine, so rather than storing it in network byte order it stores in host byte order. The reason I said to properly-encapsulate all data in ntohl and htonl is so that the same code would work on both types of platform, which is, of course, the whole POINT.
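
      A sketch of what that encapsulation can look like, using Python's wrappers around the same calls. The helper names store_u32 and load_u32 are made up for this example:

```python
import socket
import struct

def store_u32(value: int) -> bytes:
    """Store a 32-bit value in network byte order, whatever the host is."""
    # htonl() is a no-op on big-endian hosts and a byte swap on little-endian ones.
    return struct.pack("=I", socket.htonl(value))

def load_u32(raw: bytes) -> int:
    """Load a 32-bit value that was stored in network byte order."""
    return socket.ntohl(struct.unpack("=I", raw)[0])

stored = store_u32(0x01020304)
print(stored.hex())            # same bytes on either kind of host
print(hex(load_u32(stored)))
```

      Because both the write and the read go through the conversion, the same code produces and consumes identical bytes on either kind of host.
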
      ---
      "'Is not a quine' is not a quine" is a quine.

  3. Wasted brain by marcus · · Score: 3

    Is this childlike reasoning ever going to stop?

    You left computers on last week. Were they going to be on anyway? If so, there was no waste.

    Is it cold where those computers are? Would the heater be running anyway? If so, there was no waste.

    If the only reason that the computers were left on was so that you could gain ground in the stats race, then guess what? YOU wasted resources. No one else did.

    So, pay your electric bill and live with yourself as you are and shut up about it, or learn from your mistake and don't do it again. Either way, we really don't need to hear whining about the resources that YOU wasted.

    --
    Good judgement comes from experience, and experience comes from bad judgement.
    - W. Wriston, former Citibank CEO
  4. I noticed this this morning by Dicky · · Score: 2
    I noticed this morning and switched my client back onto RC5. I fully understand how this kind of thing happens - there is no way the people at the center of a project like Distributed.net can test all possible combinations of hardware and software.

    This may be a silly question, but I'm going to ask it anyway: Did (some) of the OGR blocks take a huge amount of time for others, or was it just me? I'm running the client on a Celeron 300A (not a power machine, but a lot faster than the 386 I started running RC5 on) and some of the OGR blocks took over 14 hours. I didn't know we'd done anything like 25% of the 'keyspace', but it looked to me like this project was going to go on for ever, given the speed of my computer.

    --
    Paranoia isn't an infectious condition, it's a way of life
    1. Re:I noticed this this morning by Pike · · Score: 3

      Let me get this straight...you switched back to RC5 because you thought OGR was going to take forever?

      Isn't RC5 the contest that has been dragging on for more than 2 years?? And they still haven't finished even a quarter of the keyspace!

      With that kind of delay, D.net won't be proving anything about the vulnerability of RC5-64 when/if they find the solution. They may get a $10,000 check, but they won't score any usefulness or political points.

      Who cares if it takes 3 days for a single box to complete an OGR node if we still finish the project in only a month or two? I welcome useful, fast, record-breaking projects like this to break up the glacial RC5 stuff.

      JD

  5. Dead Horses, Beating of by evilpenguin · · Score: 2

    I still participate in distributed.net efforts using many boxes. I have them up all the time anyways. I consider it increased productivity to let someone do something marginally useful with my idle clocks.

    That said, if they would release the source for their clients they would find these problems sooner (I suspect) and there would be less wasted time and resources...

    GPL the client!

    1. Re:Dead Horses, Beating of by Nugget94M · · Score: 4
      Yes, you're correct. It's very enlightening that this error existed in one of the few pieces of code which is not present in the public source. It's unfortunate that we're required to obscure the buffer-handling and network protocol aspects of the client for project integrity, and we'd very much prefer it not be that way. It's unlikely that this sort of error would have survived the scrutiny of public source.

      For those who haven't read it, Jeff Lawson wrote a document which explains why there are still portions of the client which are necessarily closed-source. The link is easy to miss, so I'm assuming those who are raising the issue here on slashdot have simply missed it.

    2. Re:Dead Horses, Beating of by evilpenguin · · Score: 2

      Since you say "we're required to obscure," I presume you are part of distributed.net. Please understand that I respectfully disagree with your policy. In other words, it's not the choice I'd make, but I don't consider you to be a bunch of blinkered philistine code despots either!

      I simply do not think hiding the code prevents a thing, and opening it might prevent embarrassing incidents like this one.

      I *do* understand that opening the code makes it easier to generate "fake" data, and that it requires person-hours to undo such shenanigans. If you had more bogus data, it might overwhelm your ability to remove it and block the generators of it.

      You might find, however, some creative remedies out in the world if you let your peers review it.

      In any case, I did read the document you cite, I just disagree with it. That disagreement is tempered by respect for your point of view and your accomplishments. I certainly haven't built anything that matches the achievements of distributed.net.

      Good luck on the fix, and meanwhile, back to RC5-64!

  6. Down-moderation! YES! by DonkPunch · · Score: 2

    AAAALLLLLLRIIIIGHT! My first down-moderation! I LOVE IT!

    BRING IT ON! I've got 100+ Karma to burn and it STARTS TODAY!

    Let the word go out to both moderators and trolls alike, TODAY DONKPUNCH IS OFFICIALLY ON THE DARK SIDE! I have become a moderator's worst freakin' nightmare -- an over-caffeinated offtopic troll with a default 2!

    Why did this have to happen? Where did things go wrong? Was I forced into it? Did the down-moderation destroy my self-esteem? Am I just a burnout? Is my unique humor and insight unappreciated by my peers in my time? Will I be remembered as a misunderstood genius when I'm gone?

    I predict a new article: "Ask Slashdot: DonkPunch -- when good posters go bad. How can we keep this from happening again?"

    E! News and VH-1 will feature a special "Behind The Dot" episode: "The Rise And Fall of DonkPunch's Karma" They'll show scenes of me posting pro-Linux suckup posts to desperately get my Karma back up to 50 or so. All of my posts will be at least 200 lines long, requiring a "Read the Rest of This Comment" link.

    Ye Gods, Moderators, don't you see what you've done? You've created a monster! You've banished me to the land of the trolls AND I LIKE IT HERE! Seems to me the trolls have a heck of a lot more fun on slashduh anyway.

    Now you will pay the price for your lack of vision!

    --

    Save the whales. Feed the hungry. Free the mallocs.
  7. Re:Why Golomb rulers, anyway? by Phil+Gregory · · Score: 4

    You might be surprised at the varied applications of many "pure" mathematical problems.

    The only application I am certain of for OGRs is radio telescope arrangement. When surveying space, the bigger the telescope (although these tend to look more like satellite dishes), the better. However, you can have two smallish dishes a certain distance apart function in tandem just like a single dish with a diameter equal to the separation of the dishes.

    With an array of smaller dishes, an ideal arrangement will maximize the number of different distances between dishes (maximizing the frequencies which can be observed). Sound familiar? OGR solutions can be mapped onto radio telescope placements.
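
    The defining property is easy to check in code. A small sketch (the function names are my own): a set of marks is a Golomb ruler when every pair of marks is a different distance apart.

```python
from itertools import combinations

def pairwise_distances(marks):
    """Return every distance between two marks on the ruler."""
    return [b - a for a, b in combinations(sorted(marks), 2)]

def is_golomb_ruler(marks):
    """A ruler is Golomb if no two pairs of marks are the same distance apart."""
    distances = pairwise_distances(marks)
    return len(distances) == len(set(distances))

print(is_golomb_ruler([0, 1, 4, 6]))   # distances 1 through 6 each occur once
print(is_golomb_ruler([0, 1, 2, 4]))   # distance 1 occurs twice (0-1 and 1-2)
```

    The marks 0, 1, 4, 6 form the shortest possible 4-mark ruler with this property; OGR-24 is the search for the shortest one with 24 marks.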

    I'm sure that there are other applications where the number of differences between a certain number of points needs to be maximized, but I don't know of any off the top of my head.

    --Phil (I remember first being introduced to Golomb Rulers via a link from the (now defunct) Geek Site of the Day.)
    --
    355/113 -- Not the famous irrational number PI, but an incredible simulation!
  8. Re:Why Golomb rulers, anyway? by scheme · · Score: 4

    OGRs have applications in data communications, cryptography and lithography. I would say that they have a lot of use, since they may lead to faster/better encryption/data transfers as well as better/cheaper chip fabs and, indirectly, cheaper CPUs and microprocessors. A lot more useful than pretty pictures.

    --
    "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
  9. It is Open Source. by mbrooks · · Score: 2

    Look at http://www.distributed.net/source/. If you had visited the distributed.net site, you probably would have noticed the "Source" option in the lefthand menu bar. But, hey, it's easier to whine, yes?

    --Matt

  10. Thank you Sir! by DonkPunch · · Score: 2

    *smack* THANK YOU SIR! MAY I HAVE ANOTHER?

    C'mon wimps! Is that the best you can do? I'm laughing in your humorless, petrified, grits-covered, moderating faces.

    What is that!? A FreeBSD pin!!?? ON YOUR UNIFORM!!!???

    Just you wait.... I won't be the last. Even if you crush my karma with your dogma; even if you cancel my login, there will be others. Foogle has already started to turn. I am convinced that Signal 11 will someday turn. In fact, I believe that Signal 11 is already a troll who is just building up unstoppable karma for THE DAY OF RECKONING.

    As Mariah Carey sang so eloquently in "The Matrix", "My Heart Will Go On."

    Someday, perhaps even Bruce Perens will submit a down-moderated post? WHAT WILL YOU DO THEN? Will it be the end of everything you've believed in? Will it be the end of all you hold dear? Will you have to go back to actually WRITING CODE instead of sharing your feelings on what it means to be a geek?

    I know some of you long-time slashduh readers will be frightened by my tone. Fear not. I'm still the same warm, fuzzy, lovable DonkPunch. You can still order plush DonkPunch toys from the Copyleft website.

    But the humorless moderators have wronged me and today I must dwell in the land of the trolls. You know what? It's kind of nice here! These guys have cable and a VERY nice cappuccino machine. Best of all, they actually WRITE CODE instead of whining for big companies to do the work for them. If Trollmastah, GritsBoy, and NakedAndPetrifiedMan don't mind, I might stay awhile.

    --

    Save the whales. Feed the hungry. Free the mallocs.
  11. When will it resume? by Esperandi · · Score: 2

    I joined the Distributed.net effort specifically because I wanted to help out working with the Golomb rulers, a very interesting mathematical project. I really have no interest in cracking any form of encryption. Yes, it can be done brute force. We know that already. We don't know what the 24 mark optimal Golomb ruler is. So finding it would be cool. Anyhow, nowhere on the distributed.net page does it say when the new clients will be available... does anyone have any insider information on whether the client upgrades they are considering are going to take awhile?

    Adding the conversion call is easy, sure, but the 'improved progress reporting' and such... any idea how long it'll take?

    Esperandi

  12. Old clients/new clients by linuxci · · Score: 4

    So it appears that we'll all have to go and download new clients when they're released to get round this bug. AFAIK distributed.net will discard any blocks submitted by the old clients but the old clients will still attempt to fetch blocks off the keyserver. It'd be a good idea to change the project ID slightly (e.g. to OGR-a) so that the old clients will not try to fetch these blocks in the first place. Because if these old clients are just downloading OGR blocks that just get discarded it's a waste of the CPU time where they could be doing RC5 instead.

    Basically there has to be some system to stop the buggy clients downloading blocks and wasting their time.
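
    A toy sketch of the idea. I have no idea how the real keyserver is organised, so the queue layout, the "OGR-a" ID and fetch_block() are all hypothetical:

```python
# Hypothetical keyserver state: project ID -> queue of work units.
work_queues = {
    "RC5": ["rc5-block-1", "rc5-block-2"],
    "OGR-a": ["ogr-stub-1", "ogr-stub-2"],   # renamed project, for fixed clients only
}

def fetch_block(project_id):
    """Hand out one work unit, or None if the server has no such project."""
    queue = work_queues.get(project_id)
    if not queue:
        return None
    return queue.pop(0)

old_client_request = fetch_block("OGR")    # buggy client still asks for the old ID
new_client_request = fetch_block("OGR-a")  # fixed client asks for the new ID
fallback = fetch_block("RC5")              # old client can fall back to RC5

print(old_client_request, new_client_request, fallback)
```

    The old client gets nothing for "OGR", so it never burns CPU time on stubs that would be thrown away.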

    --
    Make use of your spare CPU time!

  13. Boom Boom Boom by Nugget94M · · Score: 3
    This is slashdot... After the horse is dead, we skin it and learn to play drums.

    The fear is not bogus data that we have to remove. Rather, the true damage to the integrity of a project comes from bogus data which is indistinguishable from legitimate data. Infinite man-hours of effort cannot correct the damage done by a false-negative in the case of a crypto contest.

    It's also a bit optimistic to assume that we'd be able to isolate a committed vandal to the degree required to successfully filter their bogus submissions. An attacker could simply instruct their malicious client to submit work using participant emails randomly taken from stats, easily blending their work in with legitimate work. We can't assume that every attacker will send in their work with a consistent IP or email address.

    I'm not making the argument that there's not room for improvement in the current scheme, but it's difficult for us to become too enamored with solutions that only offer a marginal improvement over the current model.

    We welcome suggestions and creative remedies from out in the world. If someone has a solution to this quandary, we'd love to implement it. This client trust issue is the holy grail of distributed computing projects, and we hope that it's solvable. I don't think that a lack of access to our buffer file formats is a stumbling block which would prevent a creative and insightful person from devising a solution, however. We don't need to open that source in order to allow someone to solve the issue.

    Thanks for your comments and support, and if you do have any proposal which would allow us to trust the work performed by an open source client, we'd love to put it to work.

    1. Re:Boom Boom Boom by evilpenguin · · Score: 2

      I've thought of dozens! Unfortunately, all of them are just as easily compromised as the original. It is a tough nut to crack. I've thought about an MD5 hash that includes the result and the client code memory image, but since a programmer can just write a routine that calculates an MD5 sum over his bogus data set and his real client image, it isn't much of a solution, is it?
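
      To show why that one falls over, here is a sketch with placeholder byte strings standing in for the client image and the results (none of this is the real client format):

```python
import hashlib

def attest(result: bytes, client_image: bytes) -> str:
    """Tag a result with an MD5 over the result plus the client's code image."""
    return hashlib.md5(result + client_image).hexdigest()

def server_accepts(result: bytes, tag: str, official_image: bytes) -> bool:
    """The server recomputes the tag from its own copy of the official client."""
    return tag == attest(result, official_image)

official_image = b"bytes of the official client binary"  # placeholder

# An honest client tags a real result.
honest = b"stub 42: searched, nothing found"
honest_tag = attest(honest, official_image)

# A cheater never ran the search but has the same binary on disk,
# so it can compute an equally valid tag for made-up data.
bogus = b"stub 43: searched, nothing found"
bogus_tag = attest(bogus, official_image)

print(server_accepts(honest, honest_tag, official_image))
print(server_accepts(bogus, bogus_tag, official_image))
```

      Both submissions pass, because the hash proves possession of the binary and says nothing about whether the work was done.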

      The same problem seems to exist with networked games. How do you prevent cheating?

      I'm not sure you can prevent cheating, but can't you at least use public key cryptography (specifically digital signatures) to definitively identify sources? You then double check a random packet that comes in with that signature. If it is good, you can be reasonably sure that everything that comes in signed with that key is good? (You don't want to validate a fixed packet, say, the first packet -- an attacker would send back a real first packet and then send fake ones). From then on you retest random packets from random users. If you retain the cryptographically verified identity of the origin of each result, you can quickly isolate all results from a source that shows up with false negatives in a random check.

      If everyone participating knows that they will definitely be checked at least once for validity, and may be checked additional times at any time, then I think the incentive to cheat will be brought down several notches.

      Sure, in this scheme someone can implement the public key crypto algorithm solely to legitimately send fake data, but since they have to send you the public key and must sign each result set with the private key, you WILL be able to identify and remove the bogus source when you detect it.
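
      A rough sketch of the bookkeeping, worked out as loosely as the idea itself. Big caveat: the standard library has no public key signatures, so an HMAC with a per-participant key stands in for the signature here, which means the server holds the key too; a real version would verify against a registered public key. The Server class, recompute() and the result strings are all hypothetical.

```python
import hashlib
import hmac
import random

def sign(key: bytes, result: bytes) -> bytes:
    """Stand-in for a digital signature (HMAC, NOT real public key crypto)."""
    return hmac.new(key, result, hashlib.sha256).digest()

def recompute(stub: int) -> bytes:
    """Hypothetical trusted re-run of a stub on the server side."""
    return f"stub {stub}: ok".encode()

class Server:
    def __init__(self, check_rate: float, rng: random.Random):
        self.check_rate = check_rate
        self.rng = rng
        self.keys = {}        # participant -> registered key
        self.accepted = {}    # participant -> list of (stub, result)
        self.banned = set()

    def register(self, who: str, key: bytes) -> None:
        self.keys[who] = key
        self.accepted[who] = []

    def submit(self, who: str, stub: int, result: bytes, sig: bytes) -> bool:
        if who in self.banned or who not in self.keys:
            return False
        if not hmac.compare_digest(sig, sign(self.keys[who], result)):
            return False                      # cannot be tied to this identity
        if self.rng.random() < self.check_rate and result != recompute(stub):
            self.banned.add(who)              # caught by a random spot check
            self.accepted[who].clear()        # drop everything from this source
            return False
        self.accepted[who].append((stub, result))
        return True

server = Server(check_rate=1.0, rng=random.Random(0))  # check everything, for the demo
server.register("honest", b"k1")
server.register("cheat", b"k2")

ok = server.submit("honest", 1, recompute(1), sign(b"k1", recompute(1)))
caught = server.submit("cheat", 2, b"stub 2: fake", sign(b"k2", b"stub 2: fake"))
print(ok, caught, sorted(server.banned))
```

      In practice check_rate would be well below 1.0; the point is only that every accepted result stays tied to an identity, so one failed spot check lets you throw out everything from that source.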

      I realize this is a lot more server side work! I also realize it may be impossible because of crypto export regulations (ding dang it!), but I still think a scheme along these lines could be implemented without too much difficulty.

      This idea may be full of holes (I worked it out as I typed, so I haven't exactly "bench audited" it!), but I think the premise is sound. It doesn't prevent anything, but it is likely to detect abuse and any abuse can easily be isolated and removed...

      Thoughts, criticisms, abusive epithets?

  14. Speaking of Silby.... by Vladinator · · Score: 2

    .... Why can't we find Silby's .plan pages anymore? Would Nugget or someone care to comment on that? I understand he was critical of what D.Net has become, but is that really reason to wipe his plan pages?

    Hey Rob, Thanks for that tarball!

    --

    "Going to war without France is like going deer hunting without your accordion." - Jed Babbin

    1. Re:Speaking of Silby.... by Vladinator · · Score: 2

      One of the things that I find to be almost as interesting was the fact that this was NOT carried by Slashdot. It WAS submitted. I even submitted it. I asked why it was rejected, and was told by CmdrTaco himself that they were looking into it. I guess they never finished looking into it. :-) It was low of them to remove it, and very unprofessional IM(NS)HO. I hope they restore it, and apologize to Silby for deleting it.

      Hey Rob, Thanks for that tarball!

      --

      "Going to war without France is like going deer hunting without your accordion." - Jed Babbin

    2. Re:Speaking of Silby.... by Vladinator · · Score: 2

      You miss the point. The point wasn't that Silby left. The point was that D.Net censored him. Badly. THAT was the story, and I think it still is one. YMMV.

      Hey Rob, Thanks for that tarball!

      --

      "Going to war without France is like going deer hunting without your accordion." - Jed Babbin

    3. Re:Speaking of Silby.... by Vladinator · · Score: 2

      Sigh. You're right. I think it's part of a larger puzzle too. That's why it's important to report. D.Net (which I DO support) should be above such petty behavior.

      Hey Rob, Thanks for that tarball!

      --

      "Going to war without France is like going deer hunting without your accordion." - Jed Babbin

  15. Shrink the Buffer Size by Wanker · · Score: 2

    You're not the only one-- I'm running this on a fairly beefy box and it still takes a LONG time (as in several days) to complete a single work unit. In order for the daily stats to be useful, it seems like one ought to be able to finish more than one work unit per day.

    It's my hope that this is what they mean by "we will have the opportunity to improve some other aspects of client operation. In particular, we plan to add more configurable checkpointing and a better display of progress" in their announcement.

    As to the speed of the whole search-- that would depend as much on the size of the search space as on the speed of the client. Clearly we are looking at a real small search space if it were 25% searched in only a few days.

    I know they never counted my seven days' work since it's all still sitting in my buff-out.ogr file. I'm using the dnetc v2.8007-458-CTR-00020606 for Linux (Linux 2.2.12-20) client-- perhaps it's client-specific?