Seti@Home Bandwidth Problems
reflexreaction writes: "With so many of the /. users actively using and supporting Seti@home, many of you have realized that in the last couple of weeks Seti has had some serious problems receiving completed data and getting new data to process from its 3 million members because of network bandwidth problems. All the gritty details are here. The article details some things that users can do to alleviate some of the problems, including connecting during off hours and downloading more than one unit at a time using programs like SetiQueue for PC and Seti Unit Manager for Mac. Donations are also accepted. There is also a plea for bandwidth donations. It will be truly unfortunate if this page becomes /.ted without benefit from /. users."
There goes whatever remaining bandwidth they had...
2/6/2002
The problem
When your SETI@home screensaver downloads a work unit, the data flows from a server in our laboratory, through the University of California at Berkeley campus network, and through a connection to the commercial Internet. This connection is shared by all UCB Internet users - departmental web and FTP sites, email, SETI@home, and so on. The University pays for bandwidth on this connection; it is currently buying 70 megabits per second (Mbps). The student residence halls have a separate 40 Mbps connection.
Until recently, SETI@home was given about 25 Mbps, and the remaining 45 Mbps was shared by the rest of campus. But starting last month (January 2002) the bandwidth used by the rest of campus increased in an unexpected and unexplained way. During peak periods the demand now exceeds 70 Mbps. If SETI@home continued to use 25 Mbps, the performance of all other outgoing traffic would suffer.
The UCB network administrators have worked hard to balance the bandwidth needs of SETI@home and the rest of campus. Currently, SETI@home traffic is given lower priority than other traffic. During peak periods (typically 10 AM - 10 PM PST) SETI@home averages 6 Mbps, and sometimes gets no bandwidth. During non-peak periods SETI@home gets as much as 50 Mbps.
When SETI@home is not getting enough bandwidth, our data server backs up - all of its processes are waiting to send data, and it can't accept new connections. During these periods, your screensaver will report that it "can't connect to server".
The impact on our overall computing rate is significant but not too serious - the rate has dropped about 25%. But many SETI@home users are unhappy that their computers are sitting idle for many hours, waiting for data. We share this unhappiness, and are working to solve the problem.
Short-term solutions
We're working on several short-term solutions:
Increase the bandwidth of UCB's network connection. We hope to "expand the pipe" by about 10 Mbps - enough to ease, but not eliminate, the crisis. The issue is money - bandwidth costs about $300 a month per megabit, and neither SETI@home nor the university has budgeted for this cost.
Send data more efficiently. Currently work units are encoded as text. By sending them in binary, we can shrink them by about 25%. (Note: data compression isn't effective for our data, which is primarily random noise). This change will require a new version of the client software.
Increase the amount of computation per work unit. Doubling the CPU time per work unit - by looking at more chirp rates, for example - will reduce bandwidth by 50%. There is scientific justification for doing this, although the law of diminishing returns applies. This will also require a new version of the client software.
Long-term solutions
The long-term solution is to allow work units to be sent from servers outside UC Berkeley. This could be done, for example, by sending work units to servers at organizations - companies and universities - that are willing to donate part of their outgoing network bandwidth to SETI@home. In addition to solving the current problem, this could greatly increase our overall data capacity, enabling us to search for ET signals in a wider frequency band.
This solution represents a significant change to our software; we will use this approach in our next-generation software. We are seeking funding to develop this software, and it won't be ready for at least 6 months.
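The figures quoted for the short-term options can be combined in a rough back-of-the-envelope calculation. The sketch below is only illustrative: the function names are made up here, and it assumes that the 25% saving from binary encoding and the halving from doubled CPU time per work unit simply multiply, which the article does not state.

```python
# Figures taken from the article; everything else is an assumption.
COST_PER_MBPS_MONTH = 300   # dollars per megabit per month
BINARY_SAVING = 0.25        # work units shrink ~25% when sent as binary
CPU_FACTOR = 2.0            # doubling CPU time per work unit

def monthly_cost(extra_mbps: float) -> float:
    """Monthly cost in dollars of buying extra_mbps of additional bandwidth."""
    return extra_mbps * COST_PER_MBPS_MONTH

def relative_bandwidth(binary_saving: float = BINARY_SAVING,
                       cpu_factor: float = CPU_FACTOR) -> float:
    """Bandwidth needed relative to today for the same computing rate,
    assuming the two savings are independent and multiply."""
    return (1.0 - binary_saving) / cpu_factor

print(f"10 extra Mbps costs ${monthly_cost(10):,.0f} per month")
print(f"Both client changes together: {relative_bandwidth():.1%} of current bandwidth")
```

Under these assumptions, a 10 Mbps expansion costs $3,000 a month, while the two client-side changes together would cut outgoing traffic to well under half of today's level.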
What you can do
There are a couple of things you can do to keep your computers busy processing SETI@home data:
If you connect manually (e.g., over a modem) try connecting during off hours (23:00 to 3:00 Pacific Standard Time, or 7:00 to 11:00 UT). You can check the Server status page to see if we're currently dropping connections.
Download more than one work unit when you connect. This can be done manually, or by automated workunit caching software. Example programs include SetiQueue for Windows, or Seti Unit Manager for Macintosh. For more information about other SETI@home add-ons see our links page.
To help us achieve a short-term solution, you can help in two ways:
Donate to SETI@home. This will enable us to buy network bandwidth.
Help us find "bandwidth sponsors". We hope that a major commercial ISP might donate bandwidth to UC Berkeley to help SETI@home. If you work for, or have contacts in, such a company, please contact us.
About the current bandwidth problems
She sat at the window watching the evening invade the avenue.
If their BW problems stem from the fact that the rest of the campus has experienced a "mysterious" increase in network traffic, a good start may be to block access on ports used by popular file sharing programs. I'll bet that this is where a lot of the BW demand is coming from since the increase happened at the beginning of a new semester.
---
I didn't want to leave this space blank.
"But starting last month (January 2002) the bandwidth used by the rest of campus increased in an unexpected and unexplained way."
Doh. I was looking for the gritty details. Massive DDOS bot invasion? SNMP exploit? Warez? Rogue Quake III servers? Son of Napster? Backhoe dug up a cable? There has to be at least an educated guess as to where the bandwidth is going.
I think the network admins at UC Berkeley are just cutting back on Seti, but don't want to admit it publicly. Bad press and all.
null sig
Go ask the little green men if they could perhaps borrow some bandwidth =)
Possibly of related interest, there is an article on Internet Scale Operating Systems in the newest Scientific American.
Sheesh, evil *and* a jerk. -- Jade
You have to give the SETI@home team their props for making a system that's scalable and able to handle the user load from the first 100,000 users to the now 3,000,000.
I've always believed the bottleneck in distributed computing was the data packets being sent/received, because the demand will grow exponentially the more users you acquire.
Most applications seem to remedy this problem by limiting the data packet sizes to 5 - 15k compressed packets. This has worked for projects like Distributed.net.
I can only foresee the future of this problem being the same one that plagues video card chipsets, which is that instead of re-engineering the device to make a more robust and lower overhead solution, they'll just throw a bigger pipe on the line (much like memory bandwidth demand).
But again, my respect goes out to the Seti@Home team and their sponsors for architecting a technological data mining marvel.
Despite the fact that nothing new has come out of distributed.net for a while now, it's still the best-run distributed computing network. They have the most clients, for the most platforms with the most features, and that's why I continue to install the client on several PCs a month.
I've used SETI@Home and United Devices before, but frankly, I didn't like them much.
SETI has more users than it needs, last time I checked, the same data was being tested over and over again, simply because they have more volunteers than they need. I'd much rather see that CPU time go to the projects that need it.
United Devices has an admirable goal, curing cancer, but a lack of SMP support in their clients, and the lack of a Linux or Mac client pretty much rules them out for me. I use Windows, Linux, and Mac OS X every day, I can't run United Devices on all those platforms...
So come on everybody that's running SETI, save them some bandwidth, come join distributed.net, and we can power through the rest of RC5-64!!!
Just don't get me started on the OGR projects, they've been open for too long, and no one seems to know how to close them. OGR-24 should have been done a long time ago, but isn't, due (apparently) to a lack of managerial oversight, or poor planning.
When in danger or in doubt, run in circles, scream and shout. --Robert A. Heinlein
I'm not sure whether or not this is a good thing or a bad thing. Lemme elaborate.
Disclaimer: I have never been part of SETI@home; I feel that statistically it's a colossal waste of time. I've been part of both the GIMPS project and the distributed.net RC5-64 projects for about four years now. I've got the Kevlar body armor halfway on.
The good, I guess, is that there's such a colossal interest in this. I mean, hell, if KzAplOcQQ and boB are sharing the Encyclopaedia Galactica (or the Hitchhiker's Guide, whatever) over radio waves, then we'll eventually find it, hopefully in something that resembles paEr Unicode.
However, I see a great many downsides to this.
First off, if the aforementioned theoretical KzAplocQQ and boB of the paEr race have to use radio waves, then there's a pretty good chance they haven't been able to go superphotonic, in which case we're going to have a long wait before we can even think of going to their New York and flipping them the left tentacle.
Secondly, how will we be able to decode a xenic dataset, much less their language? I mean, what if they transmit trits or quaytes while we're looking for bits or bytes? How do we know how a newline would appear? Hell, do we even know if it would even be necessary? And what about the characters? What if the Chinese language is easier to interpret than paEr?
Third, there are much better uses of free cycles, at least fiscally. GIMPS will provide a hundred kilobucks to the first person to successfully find a ten megadigit Mersenne prime. distributed.net provides a two kilobuck prize and a large donation to the FSF, EFF, or other worthy charities. Even the commercial distributed computing projects at least pay for the use of your rig.
(PS: paEr is a theoretical name for a xenic (alien) species, contrived from randomly entering characters on the number pad. KzAplocQQ is an unpronounceable name, unless you're lucky or high. boB just sounds funny.)
I used to be someone else. Now I'm someone better.
Real life is underrated.
In a way, this hurdle could prove a boon, by forcing the SETI@home developers to make their system more efficient.
Necessity is, after all, the mother of invention.
As their own statement points out, two of the short-term solutions include making the data sent out more efficient (binary instead of text) and letting each node do more computation.
SETI@home was originally developed to make up for the shortcomings of processing power of any single computer. To solve the problem, they took a bit of a free ride on networking bandwidth to distribute the problem.
Now their success is also forcing them to be more efficient when it comes to network bandwidth, as well as processor, utilization.
So this forced economy will hopefully make the system more efficient through improvement of the system.
Pie-in-the-sky and we have all the computing power and bandwidth we need, but then who would have an incentive to innovate?
Ultimately, SETI@home's legacy will probably have less to do with discoveries of extraterrestrial intelligence and more to do with the evolution of better computing techniques!
evanchik.net
I would have expected UC Berkeley to have a higher bandwidth connection to the Internet.
Internet2's goal is 1Tbps connections -- That's faster than 70Mbps by a factor of over 10^4. Pretty funny.
Cure Cancer with UD? Think again.
If you didn't see the story last week here it is (http://www.theinquirer.net/15020202.htm)
"THE INTEL/UD cancer project is about to close, but there is confusion as to whether this is due to a shortage of funds or because the work has been completed. According to Andy Prince, Director of Corporate Communications at UD, the cancer programme is about to be terminated because its goals have been met.
Said Prince: "Absolutely. We have actually exceeded our goals as far as the cancer project goes. According to the contract, we agreed to analyze 250M molecules against 8 proteins. We are close to finishing 3.5B molecules against 12 proteins and will be announcing the close of the project soon - not a premature close, but the actual end of the project. "
-zAmboni
Team Ars Technica Lamb Chop
...I don't run SETI@home. It's my understanding that the SETI@home project now provides more processing power than they really need, as they have not optimized the client and do not support multiple processors.
"It take 9 months to bear a child, no matter how many women you assign to the job."
IF there are aliens who fly around the universe with SUPERIOR technology - they'd have the means to contact us.... and when they DO - we'll know it.
1) The point isn't necessarily to find aliens with, as you described it, "SUPERIOR technology", but any sign of intelligent life. I.e. any race that has sufficient technology to emit a signal capable of reaching earth (and that limitation only because we currently can't do much better).
they'd have the means to contact us
2) What do you base this upon? (Aside from SciFi movies?) We simply don't know if it's possible at all or even how long it would take a civilization to reach that point. We've had radio for over 100 years, and we don't know how to contact other alien civilizations. How do we know it won't be another 10,000 years until we can?
Personally, I find it an excellent use of my spare cpu cycles. You're free to take yours where you wish.
-Bill
SlashSig Karma: Excellent (mostly affected by moderatio
"IF there are aliens who fly around the universe with SUPERIOR technology - they'd have the means to contact us"
What if they're at the same level as we are? Then they're hard to find, easy to lose in the background noise, and may not even realize we're looking for them.
"Would it be more practical/feasible to donate those spare cpu cycles elsewhere???"
Maybe, but it will be limited. The cancer research screen saver you mentioned won't work on anything truly meaningful - after all, there's money in cancer research and nothing sensitive will be allowed out like that. A cure for any type of cancer will be worth billions to the lab that puts it together. They won't risk a competitor installing a screen saver and starting to sift data...
Other applications for distributed computing that start to involve money end up with the same problem - people don't want to donate their electricity & time so someone else can get rich, and I haven't seen any for-profit distributed program that would let me break even on the electricity cost to run the client 24/7.
So non-commercial stuff like SETI or crack the latest encryption scheme will always be the ones most successful. Anyway, the SETI program is starting to spin off other pure science radio astronomy uses for the data, so it's not just little green men anymore.
null sig
"she says i'm lousy conversation. as if that's supposed to help."
Except that, as the article states, your student is on the residence halls network, which uses a separate bandwidth cap and doesn't affect the SETI bandwidth.
Seriously - I shut off the seti@home search on all my machines and my electric bill dropped $10, and I'm not kidding anyone in the slightest.
What is the point anyhow? I mean this is collectively costing them (probably) billions of dollars a month to do this - between everyone's increased power bill. And seriously - what are the chances that their algorithms are going to find something worthwhile?
Unfortunately, Berkeley has two pipes, one for the residence halls and one for the rest of campus. It seems odd that they can't figure out where all the data is coming from, but I don't think it's students in the dorms. It's possible that someone is running a public proxy or an ftp on their dept. network, but you'd think a renowned computer school like Berkeley could afford staff and software that could figure the simple stuff out.
I Browse at +4 Flamebait
Open Source Sysadmin
Reminds me of those mailers I get in the mail.
"Here, we've sent you a bunch of preprinted address labels with your name and address on them which you never asked for and can use while sending out snail mail. We ask that you donate $10 for some poor kids because we need to make up for the costs of sending out these mailers."
No, I'm not making this up!
Mmmm.. Donuts
The one thing that interested me about the blurb from the Seti@Home site that was linked from this article was the following quote:
> But starting last month (January 2002) the
> bandwidth used by the rest of campus increased in
> an unexpected and unexplained way.
I wonder if this isn't a byproduct of the intense bandwidth issues associated with peer to peer apps like Gnutella and Morpheus, popular music "sharing" applications that seem to get a bit of use on college grounds nationwide. I'd guess (if I had to; definitely talking out ye old arse here) the reason bandwidth usage wasn't noticed sooner is that many places (my place of work included -- I'm a gov't contractor) are placing a pretty high priority on "Homeland Security", including taking a fresh look at internet usage.
These things aren't exactly bandwidth friendly (see http://people.cs.uchicago.edu/~matei/PAPERS/gnutella-rc.pdf for a great discussion on the perils of the flaws in the first generation Gnutella protocol).
Anyhow, that's what came to mind when I read the blurb. I think their best short term solution might be to chase down unattended Gnutella and Morpheus/KaZaA applications and get back that bandwidth.
It's all 0s and 1s. Or it's not.
Why even bother their servers at all? SETI should wait until we have our own world's problems figured out. Please visit Folding@Home or Genome@Home for two ways you can help solve actual problems. If solving geeky problems is more your style, visit d.net.
Why did they jump straight to OGR24? I thought we didn't know the OGRs higher than 19 yet?
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Go, do it now, I swear you'll feel all warm and fuzzy.
distributedfolding seems to be having problems for some time too. I haven't been able to upload for some time. bandwidth?
photosMy Photostream
These bandwidth problems aren't technical, they're political. We're getting too close, so they're shutting us down.
* * Always question "the National Interest" - 9 times out of 10 it is a cover for evil
This isn't good, how's ET ever gonna phone home now???
Question everything that you've accepted without thinking.
At any rate Seti doesn't use any extra power if your computer is running anyways since a CPU is always at 100% anyways (cept instead of SETI data it is doing Idle Loop calcs).
Not necessarily. Some operating systems call a special instruction when they hit the idle loop. This instruction tells the processor to go to sleep until the timer or a device signals an interrupt to the CPU. I'm sure Windows 98 and 98se do that; my laptop fan runs less often when I run dnetc than when I run only the system idle process.
Will I retire or break 10K?
Call me crazy, but I'd guess that demand on seti's servers grows linearly with the number of users.
However, the number of users grows exponentially with respect to time. Grandparent specified only that "the demand will grow exponentially" and that it will increase as the number of users increases. A colloquial meaning of "grow exponentially" is to grow following the early exponential-like stages of a logistic model, a model designed to model the spread of information such as a web site URL or a Warhol worm.
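The distinction drawn here (demand linear in users, users roughly exponential in time early on) can be sketched numerically. This is a toy model, not anything measured from SETI@home: the capacity, growth rate and midpoint below are made-up parameters, and the function names are hypothetical.

```python
import math

def logistic_users(t: float, capacity: float = 3_000_000,
                   rate: float = 1.0, midpoint: float = 0.0) -> float:
    """Number of users at time t under a logistic growth model."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))

def server_demand(users: float, units_per_user: float = 1.0) -> float:
    """Work units requested per unit time: linear in the number of users."""
    return users * units_per_user

# Early on, users multiply by about e**rate per time step (exponential-like).
early_ratio = logistic_users(-9) / logistic_users(-10)
# Late on, growth flattens out as the model approaches its capacity.
late_ratio = logistic_users(10) / logistic_users(9)

print(f"early growth factor per step: {early_ratio:.4f}")
print(f"late growth factor per step:  {late_ratio:.6f}")
```

Demand stays proportional to users throughout; only the user count itself follows the S-shaped curve, which looks exponential only in its early stage.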
Will I retire or break 10K?
I'm currently trying to run Seti@Home and the UD Cancer Cure program but it's not going well... Seti won't give up any cycles to UD.... and in light of this I'll be shutting down Seti for a while.
But what I really wish someone would create is a single program for which all other tasks of this nature could be set up as plug-ins.... each plug-in getting all the unused cycles until it completes a unit and then the next plug-in gets its turn... maybe even be able to decide how you want to skew the processing:
5 Seti@ Home units, then 12 UD units, 4 Folding@Home, etc....
There are a lot of projects out there I'd like to help with.... if only they'd play nice...
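For what it's worth, the "skewed turns" idea above is basically a quota-based round robin. A minimal sketch, assuming a hypothetical host program that just asks "which project gets the next work unit?"; nothing here corresponds to any real client's plug-in API, and the quotas are the ones from the example above.

```python
from itertools import cycle
from typing import Dict, Iterator

def quota_schedule(quotas: Dict[str, int]) -> Iterator[str]:
    """Yield project names forever; each project gets `quota` units per round."""
    for name, quota in cycle(quotas.items()):
        for _ in range(quota):
            yield name

# Quotas from the example: 5 SETI@home units, then 12 UD units, then 4 Folding@Home.
quotas = {"SETI@home": 5, "UD": 12, "Folding@Home": 4}
schedule = quota_schedule(quotas)

# One full round is 5 + 12 + 4 = 21 work units.
first_round = [next(schedule) for _ in range(21)]
print(first_round)
```

Each call to `next(schedule)` names the project whose plug-in would process the next unit; after 21 units the cycle starts over with SETI@home.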
Wiwi
"I trust in my abilities,
but I want more then they offer"
So it sounds like all they need to do is ban students from running Windows XP ("Do you want to download a patch? How 'bout a passport account? You know you want one. All your friends are getting them. And I've got another security update for you...what'd you say? Come on, give it a try. The first one's free you know..." etc. etc. That's probably 80% of the bandwidth right there.)
-- MarkusQ
P.S. Note for the humour impaired...oh, what's the use.
If we make contact with ET, he will surely tell us how to cure all those diseases, No?
Uh, do you really think that ET is going to have some advice on curing human diseases like Alzheimer's, cancer, or Anthrax?
The only thing that extraterrestrials will be able to tell us about medicine is how to get rid of intergalactic genital warts.
It's == It Is
Its == possessive version of 'it'
The rules of the apostrophe for it/its/it's are a special case and do not follow "Bob's Quick Guide to the Apostrophe, You Idiots."
</troll>
Sounds like they could use some mirror sites for work units. Distribution could either be done late at night or by sneakernet.
Also, the big "work_unit.sah" file appears to have most of its content in a uuencoded-type of format, which makes it 33% larger than its binary equivalent. Also, I don't know what format the binary data is in, but could it be compressed more?
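Both points are easy to check with a quick experiment. The sketch below uses base64 as a stand-in for the "uuencoded-type" text format (the actual work unit encoding is not shown here, so that is an assumption), and random bytes as a stand-in for noise-like telescope data; the 300,000-byte size is also arbitrary.

```python
import base64
import os
import zlib

def text_overhead(raw: bytes) -> float:
    """Fractional size increase when raw bytes are base64-encoded as text."""
    return len(base64.b64encode(raw)) / len(raw) - 1.0

def compression_ratio(raw: bytes) -> float:
    """Compressed size divided by original size (near 1.0 means no gain)."""
    return len(zlib.compress(raw, 9)) / len(raw)

# Stand-in for a work unit: random noise of an arbitrary, assumed size.
noise = os.urandom(300_000)

print(f"text encoding overhead: {text_overhead(noise):.1%}")
print(f"zlib ratio on noise:    {compression_ratio(noise):.4f}")
```

A text encoding that packs 3 bytes into 4 characters is about 33% larger than the binary, which is the same thing as the binary being about 25% smaller than the text. Random noise does not compress, so switching to binary is where the saving is.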
Seti@home isn't looking for aliens that are flying around in spaceships at warp speed, it's looking for planetbound aliens who are at roughly our technology level.
This may sound funny if you can't raise money at $300 per megabit, but have you ever thought of using a provider like Cogent? You could be provisioned a 100Mbps Cat5 link for $3000 per month and use all you want. Just a thought.
UCB net admins and other interested parties have been discussing how to deal with the increased bandwidth demand on the ucb.net.discussion newsgroup: Google Groups thread: "latency from off-campus".
I live across the street from the Berkeley CS building where half the EECS servers are housed, and my connection to those machines can get pretty lagged. Having an inconsistent ISP certainly exacerbates the situation, but my experience with off-campus latencies has been quite bad for the past two years.
Sure it's sad that Seti@home users can't use their computer's idle cycles quite so effortlessly anymore, but the bigger picture is that everyone trying to connect off-campus is suffering, especially people who are trying to get work done.
The surprising thing for me is that detaching the dorm network (with all the student-run servers) leaves very few computers that could be sucking up all the bandwidth. We've suffered through DoS attacks from time to time, but the fact that Kazaa is still the number one bandwidth hog makes me wonder who runs these apps (professors? grad students? janitors?) and where are they running them from (lab computers aren't the best places to store all that warez, mp3s, and divx files, unless you don't care that they all get erased every day).
The residence halls have a separate 40Mbps pipe, so it is 110Mbps combined. Also UC Berkeley connects to CalREN-2 and Internet2, which run at much higher speeds, but the problem with those is that they connect to large universities only.
SETI has more users than it needs, last time I checked, the same data was being tested over and over again, simply because they have more volunteers than they need.
Wrong. Learn, before you speak.
From one of the FAQ pages:
If a signal is observed two or more times, and it's not RFI or a test signal, the SETI@home team will ask another group to take a look. This other group will be using different telescopes, receivers, computers, etc. This will hopefully rule out a bug in our equipment or our computer code
Need you still wonder why the same Work Unit is processed by 2 or 3 machines?
Didn't think so.
I'm not a prophet or a stone-age man,
I'm just a mortal with potential of a super man.
That is true. But that's not taking into account the XP upgrades that the IT folks probably did during X-mas break when the students weren't around to bother them...
/me lifts an eyebrow.
I'm not a prophet or a stone-age man,
I'm just a mortal with potential of a super man.
And I've got 3 machines trying to return units too... O-well, another day of no processing...
I'm not a prophet or a stone-age man,
I'm just a mortal with potential of a super man.
CalREN-2 consists of two giant loops - called CalREN North (serving UC Berkeley) and CalREN South (in the Los Angeles area). Each loop is a gigaPOP - providing the high-speed connection into the nationwide Internet. Each loop provides OC-48 (2,488 Mbps) connections to member campuses.
Now, since this equipment has been in place since the middle of last summer, why are they using their dual 45Mb/s connection? Just get some cable dogs out there to run some fiber. Hell, I'll get out there and run some fiber for them. Remember when some yahoos cut their fiber while stealing copper to recycle? They were down for like two weeks. Well, it took them two weeks to run fiber across the campus again. If they get started now, they could have as much bandwidth as they could possibly want by running fiber to their Internet2 POP.
I have seen the I2 POP at the Sonoma county office of education. It is running at OC-3 (155Mb/s). That means a bunch of elementary schools have twice the bandwidth of the most prestigious Computer Science program currently running in the world. Prestigious? Yes, they have effectively harnessed millions of desktops to create the fastest computer on the planet by a huge margin. They push 27 Tflop/s on 25 Mb/s compared to ASCI White that just passed 10 Tflop/s. My computers, like everybody else's, have wasted a lot of cycles waiting for data. Imagine if they had 2,488 Mb/s available to them and enough users to create the first 2+ petaflops computer (27 Tflop/s scaled by roughly 100x the bandwidth). Of course they would need 240 million users to achieve that.
Just to be a pessimist, that is probably exactly what all the distributed modules in Win2K/XP are for. Bill is going to have a really nice computer one of these days.
If voting were effective, it would be illegal by now.
Happily, the US government's recent public patent-busting hard-on for Cipro has taken the wind out of their sails in this regard. Look for a whole lot less stubbornness from Washington on the drug patent issue in the near future.
"Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
Last month there was a presentation by the Berkeley campus net. admin regarding the issues that are being discussed here. It shows the traffic flows, how they increased when the students came, how problems occurred when controlling traffic, and more!
In fact, you can look here to get the story on what various universities are doing to manage traffic.
One possible solution is to run SETI proxies at other universities that will route the traffic to Berkeley via Internet2, since that traffic is free and isn't being regulated/restricted. However, this may not work given that the problem is with transmitting the large data sets to clients, rather than receiving their relatively small responses.
that's pretty good. how long did it take you to come up with that?
did you come up with it while you were waiting for your seti@home client to finish its processing so that it could flush and you could see if SetiQueue really works? that would kinda make it not offtopic.
now if you would've done commander tom, and wrote it about both a troll/first poster and seti@home, then i'd give you mad props.
THERE IS NO DATA. THERE IS O
This wasn't very hard to see coming, but it's still unfortunate.
For those who are looking for a workunit-caching program for linux, I've written a Perl script which has done quite a good job at it. I've decided to release it tonight, to help everyone out, but it's a bit rough around the edges. It does the job, though. Read the README, download it here. Also, mirrors are welcome - my connection sucks far worse than theirs does =)
Paranoid
Bwaahahahahaa.
I know that at least 4 universities have a 10Gbit uplink connecting each other. Most others have 1Gbit. The Surfnet network, which interconnects all Dutch universities, is connected to several other research networks (one of them is the US Internet2) with at least gigabit speed. Read more about this at this website. Since the network is there, and it is clearly meant to be used for research purposes, I hope some Dutch university (or the Surfnet organisation itself) will raise its hand and help out.
I've also created an actual webpage for it.
You can find it here.
Paranoid
Bwaahahahahaa.
Wouldn't it make MORE sense to try and find out what's causing the sudden and obviously unexpected BW usage?
I mean, surely they have ruled out file-sharing services etc. They wouldn't overlook something so simple. (slight sarcasm intended.) Data isn't something that leaks out of Ethernet wire, it has to go SOMEWHERE. At worst, it's a bug that needs fixing.
Of course you're right, but...
Isn't it possible to use these services from computers not on the dorm network?