Seti@Home Bandwidth Problems
reflexreaction writes: "With so many of the /. users actively using and supporting Seti@home, many of you have realized that in the last couple of weeks Seti has had some serious problems receiving completed data and getting new data to process to its 3 million members because of network bandwidth problems. All the gritty details are here. The article details some things that users can do to alleviate some of the problems, including connecting during off hours and downloading more than one unit at once using programs like SetiQueue for PC and Seti Unit Manager for Mac. Donations are also accepted. There is also a plea for bandwidth donations. It will be truly unfortunate if this page becomes /.ed without benefit from /. users."
2/6/2002
The problem
When your SETI@home screensaver downloads a work unit, the data flows from a server in our laboratory, through the University of California at Berkeley campus network, and through a connection to the commercial Internet. This connection is shared by all UCB Internet users - departmental web and FTP sites, email, SETI@home, and so on. The University pays for bandwidth on this connection; it is currently buying 70 megabits per second (Mbps). The student residence halls have a separate 40 Mbps connection.
Until recently, SETI@home was given about 25 Mbps, and the remaining 45 Mbps was shared by the rest of campus. But starting last month (January 2002) the bandwidth used by the rest of campus increased in an unexpected and unexplained way. During peak periods the demand now exceeds 70 Mbps. If SETI@home continued to use 25 Mbps, the performance of all other outgoing traffic would suffer.
The UCB network administrators have worked hard to balance the bandwidth needs of SETI@home and the rest of campus. Currently, SETI@home traffic is given lower priority than other traffic. During peak periods (typically 10 AM - 10 PM PST) SETI@home averages 6 Mbps, and sometimes gets no bandwidth. During non-peak periods SETI@home gets as much as 50 Mbps.
When SETI@home is not getting enough bandwidth, our data server backs up - all of its processes are waiting to send data, and it can't accept new connections. During these periods, your screensaver will report that it "can't connect to server".
The impact on our overall computing rate is significant but not too serious - the rate has dropped about 25%. But many SETI@home users are unhappy that their computers are sitting idle for many hours, waiting for data. We share this unhappiness, and are working to solve the problem.
Short-term solutions
We're working on several short-term solutions:
Increase the bandwidth of UCB's network connection. We hope to "expand the pipe" by about 10 Mbps - enough to ease, but not eliminate, the crisis. The issue is money - bandwidth costs about $300 a month per megabit, and neither SETI@home nor the university has budgeted for this cost.
Send data more efficiently. Currently work units are encoded as text. By sending them in binary, we can shrink them by about 25%. (Note: data compression isn't effective for our data, which is primarily random noise). This change will require a new version of the client software.
Increase the amount of computation per work unit. Doubling the CPU time per work unit - by looking at more chirp rates, for example - will reduce bandwidth by 50%. There is scientific justification for doing this, although the law of diminishing returns applies. This will also require a new version of the client software.
Long-term solutions
The long-term solution is to allow work units to be sent from servers outside UC Berkeley. This could be done, for example, by sending work units to servers at organizations - companies and universities - that are willing to donate part of their outgoing network bandwidth to SETI@home. In addition to solving the current problem, this could greatly increase our overall data capacity, enabling us to search for ET signals in a wider frequency band.
This solution represents a significant change to our software; we will use this approach in our next-generation software. We are seeking funding to develop this software, and it won't be ready for at least 6 months.
What you can do
There are a couple of things you can do to keep your computers busy processing SETI@home data:
If you connect manually (e.g., over a modem) try connecting during off hours (23:00 to 3:00 Pacific Standard Time, or 7:00 to 11:00 UT). You can check the Server status page to see if we're currently dropping connections.
Download more than one work unit when you connect. This can be done manually, or by automated workunit caching software. Example programs include SetiQueue for Windows, or Seti Unit Manager for Macintosh. For more information about other SETI@home add-ons see our links page.
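For anyone scripting their connections, the off-hours window above can be checked with a few lines of Python. This is only a sketch: the helper name `is_off_peak` is made up for illustration and is not part of any SETI@home tool, and it assumes the window is exactly 07:00 to 11:00 UT as stated.

```python
from datetime import datetime, timezone

# Off-peak window quoted in the article: 23:00-03:00 PST, i.e. 07:00-11:00 UT.
OFF_PEAK_START_UT = 7   # inclusive hour
OFF_PEAK_END_UT = 11    # exclusive hour


def is_off_peak(when=None):
    """Return True if `when` (a timezone-aware datetime) is in the window.

    Hypothetical helper for illustration; defaults to the current time.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    hour_ut = when.astimezone(timezone.utc).hour
    return OFF_PEAK_START_UT <= hour_ut < OFF_PEAK_END_UT


if __name__ == "__main__":
    print("off-peak now:", is_off_peak())
```

A caching script could call this before attempting a transfer and sleep otherwise.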
To help us achieve a short-term solution, you can help in two ways:
Donate to SETI@home. This will enable us to buy network bandwidth.
Help us find "bandwidth sponsors". We hope that a major commercial ISP might donate bandwidth to UC Berkeley to help SETI@home. If you work for, or have contacts in, such a company, please contact us.
About the current bandwidth problems
She sat at the window watching the evening invade the avenue.
If their BW problems stem from the fact that the rest of the campus has experienced a "mysterious" increase in network traffic, a good start may be to block access on ports used by popular file sharing programs. I'll bet that this is where a lot of the BW demand is coming from since the increase happened at the beginning of a new semester.
---
I didn't want to leave this space blank.
Possibly of related interest, there is an article on Internet Scale Operating Systems in the newest Scientific American.
Sheesh, evil *and* a jerk. -- Jade
You have to give the Seti@Home Team their props for making a system that's scalable and able to handle the user load from the first 100,000 users to the now 3,000,000.
I've always believed the bottleneck in Distributed Computing was the Data Packets being sent/received because the demand will grow exponentially the more users you acquire.
Most applications seem to remedy this problem by limiting the data packet sizes to 5 - 15k compressed packets. This has worked for projects like Distributed.net.
I can only foresee the future of this problem being the same one that plagues Video Card Chipsets, which is that instead of re-engineering the device to make a more robust and lower overhead solution, they'll just throw a bigger pipe on the line (much like Memory Bandwidth demand).
But again, my respect goes out to the Seti@Home team and their sponsors for architecting a technological data mining marvel.
Despite the fact that nothing new has come out of distributed.net for a while now, it's still the best-run distributed computing network. They have the most clients, for the most platforms with the most features, and that's why I continue to install the client on several PCs a month.
I've used SETI@Home and United Devices before, but frankly, I didn't like them much.
SETI has more users than it needs; last time I checked, the same data was being tested over and over again, simply because they have more volunteers than they need. I'd much rather see that CPU time go to the projects that need it.
United Devices has an admirable goal, curing cancer, but a lack of SMP support in their clients, and the lack of a Linux or Mac client pretty much rules them out for me. I use Windows, Linux, and Mac OS X every day, I can't run United Devices on all those platforms...
So come on everybody that's running SETI, save them some bandwidth, come join distributed.net, and we can power through the rest of RC5-64!!!
Just don't get me started on the OGR projects, they've been open for too long, and no one seems to know how to close them. OGR-24 should have been done a long time ago, but isn't, due (apparently) to a lack of managerial oversight, or poor planning.
When in danger or in doubt, run in circles, scream and shout. --Robert A. Heinlein
I'm not sure whether or not this is a good thing or a bad thing. Lemme elaborate.
Disclaimer: I have never been part of SETI@home; I feel that statistically it's a colossal waste of time. I've been part of both the GIMPS project and the distributed.net RC5-64 projects for about four years now. I've got the Kevlar body armor halfway on.
The good, I guess, is that there's such a colossal interest in this. I mean, hell, if KzAplOcQQ and boB are sharing the Encyclopaedia Galactica (or the Hitchhiker's Guide, whatever) over radio waves, then we'll eventually find it, hopefully in something that resembles paEr Unicode.
However, I see a great many downsides to this.
First off, if the aforementioned theoretical KzAplocQQ and boB of the paEr race have to use radio waves, then there's a pretty good chance they haven't been able to go superphotonic, in which case we're going to have a long wait before we can even think of going to their New York and flipping them the left tentacle.
Secondly, how will we be able to decode a xenic dataset, much less their language? I mean, what if they transmit trits or quaytes while we're looking for bits or bytes? How do we know how a newline would appear? Hell, do we even know if it would be necessary? And what about the characters? What if the Chinese language is easier to interpret than paEr?
Third, there are much better uses of free cycles, at least fiscally. GIMPS will provide a hundred kilobucks to the first person to successfully find a ten megadigit Mersenne prime. distributed.net provides a two kilobuck prize and a large donation to the FSF, EFF, or other worthy charities. Even the commercial distributed computing projects at least pay for the use of your rig.
(PS: paEr is a theoretical name for a xenic (alien) species, contrived from randomly entering characters on the number pad. KzAplocQQ is an unpronounceable name, unless you're lucky or high. boB just sounds funny.)
I used to be someone else. Now I'm someone better.
Real life is underrated.
In a way, this hurdle could prove a boon, by forcing the SETI@home developers to make their system more efficient.
Necessity is, after all, the mother of invention.
As their own statement points out, two of the short-term solutions include making the data sent out more efficient (binary instead of text) and letting each node do more computation.
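To put rough numbers on those two changes, here is a small Python sketch. It assumes a base64-style text encoding, which is purely an illustration (the article doesn't say which text encoding the work units use), and it uses pseudo-random bytes as a stand-in for the noise-like data. It also assumes the two savings are independent, so they multiply.

```python
import base64
import random
import zlib

# Stand-in for a work unit payload: pseudo-random bytes, since the article
# says the data is primarily random noise. The size is arbitrary.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(30_000))

# ASSUMPTION: a base64-style text encoding (4 output chars per 3 input bytes).
# The article does not say which text encoding SETI@home actually uses.
as_text = base64.b64encode(payload)

binary_saving = 1 - len(payload) / len(as_text)
print(f"text: {len(as_text)} bytes, binary: {len(payload)} bytes, "
      f"saving: {binary_saving:.0%}")

# General-purpose compression gains essentially nothing on noise-like data.
compressed = zlib.compress(payload, 9)
print(f"zlib output: {len(compressed)} bytes")

# Doubling the computation per work unit halves the units sent per CPU-hour.
# ASSUMPTION: the two savings are independent, so they multiply.
combined_traffic = (1 - binary_saving) * 0.5
print(f"traffic after both changes: {combined_traffic:.1%} of current")
```

Under the base64 assumption the binary saving works out to exactly 25%, which happens to match the article's figure, and both changes together would cut traffic to 37.5% of what it is now.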
SETI@home was originally developed to make up for the shortcomings of processing power of any single computer. To solve the problem, they took a bit of a free ride on networking bandwidth to distribute the problem.
Now their success is also forcing them to be more efficient when it comes to network bandwidth, as well as processor, utilization.
So this forced economy will hopefully make the system more efficient through improvement of the system.
Pie-in-the-sky and we have all the computing power and bandwidth we need, but then who would have an incentive to innovate?
Ultimately, SETI@home's legacy will probably have less to do with discoveries of extraterrestrial intelligence and more to do with the evolution of better computing techniques!
evanchik.net
I would have expected UC Berkeley to have a higher bandwidth connection to the Internet.
Internet2's goal is 1Tbps connections -- That's faster than 70Mbps by over 10^4. Pretty funny.
Actually, the network admins have pointed the finger at Kazaa & gnutella. According to the UCB Director of Communications & Network Services, "kazaa and gnutella account for more than half the bits in aggregate". And it's not just SETI that's suffering - all network users have been affected. Unfortunately, a lower priority or outright ban on those services has been rejected due to policy and legal issues.
"IF there are aliens who fly around the universe with SUPERIOR technology - they'd have the means to contact us"
What if they're at the same level as we are? Then they're hard to find, easy to lose in the background noise, and may not even realize we're looking for them.
"Would it be more practical/feasible to donate those spare cpu cycles elsewhere???"
Maybe, but it will be limited. The cancer research screen saver you mentioned won't work on anything truly meaningful - after all, there's money in cancer research and nothing sensitive will be allowed out like that. A cure for any type of cancer will be worth billions to the lab that puts it together. They won't risk a competitor installing a screen saver and starting to sift data...
Other applications for distributed computing that start to involve money end up with the same problem - people don't want to donate their electricity & time so someone else can get rich, and I haven't seen any for-profit distributed program that would let me break even on the electricity cost to run the client 24/7.
So non-commercial stuff like SETI or crack the latest encryption scheme will always be the ones most successful. Anyway, the SETI program is starting to spin off other pure science radio astronomy uses for the data, so it's not just little green men anymore.
null sig
"she says i'm lousy conversation. as if that's supposed to help."
Unfortunately, Berkeley has two pipes, one for the Residence halls and one for the rest of campus. It seems odd that they can't figure out where all the data is coming from, but I don't think it's students in the dorms. It's possible that someone is running a public proxy or an ftp on their dept. network, but you'd think a renowned computer school like Berkeley could afford staff and software that could figure the simple stuff out.
I Browse at +4 Flamebait
Open Source Sysadmin
The one thing that interested me about the blurb from the Seti@Home site that was linked from this article was the following quote:
> But starting last month (January 2002) the
> bandwidth used by the rest of campus increased in
> an unexpected and unexplained way.
I wonder if this isn't a byproduct of the intense bandwidth issues associated with peer to peer apps like Gnutella and Morpheus, popular music "sharing" applications that seem to get a bit of use on college grounds nationwide. I'd guess (if I had to; definitely talking out ye old arse here) the reason bandwidth usage wasn't noticed sooner is that many places (my place of work included -- I'm a gov't contractor) are placing a pretty high priority on "Homeland Security", including taking a fresh look at internet usage.
These things aren't exactly bandwidth friendly (see http://people.cs.uchicago.edu/~matei/PAPERS/gnutella-rc.pdf for a great discussion on the perils of the flaws in the first generation Gnutella protocol).
Anyhow, that's what came to mind when I read the blurb. I think their best short term solution might be to chase down unattended Gnutella and Morpheus/KaZaA applications and get back that bandwidth.
It's all 0s and 1s. Or it's not.
Go, do it now, I swear you'll feel all warm and fuzzy.
These bandwidth problems aren't technical, they're political. We're getting too close, so they're shutting us down.
* * Always question "the National Interest" - 9 times out of 10 it is a cover for evil
I'm currently trying to run Seti@Home and the UD Cancer Cure program but it's not going well... Seti won't give up any cycles to UD.... and in light of this I'll be shutting down Seti for a while.
But what I really wish was created was a single program which all other tasks of this nature could be set up in as plug-ins.... each plug-in getting all the unused cycles until it completes a unit and then the next plug-in gets its turn... maybe even be able to decide how you want to skew the processing:
5 Seti@Home units, then 12 UD units, 4 Folding@Home, etc....
There are a lot of projects out there I'd like to help with.... if only they'd play nice...
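The turn-taking itself is the easy part. Here's a minimal Python sketch of that kind of weighted rotation; the function `build_schedule` and the whole plug-in idea are hypothetical, since none of these clients actually exposes such an interface.

```python
from itertools import cycle


def build_schedule(quotas):
    """Yield project names forever, `quotas[name]` work units per turn.

    `quotas` maps a project name to how many consecutive units it gets
    before the next project takes over. Dict insertion order sets the
    turn order. At least one quota must be positive.
    """
    for name in cycle(quotas):
        for _ in range(quotas[name]):
            yield name


# Quotas from the comment above; the plug-in interface itself is hypothetical.
quotas = {"SETI@home": 5, "UD": 12, "Folding@Home": 4}
schedule = build_schedule(quotas)

# One full rotation is 5 + 12 + 4 = 21 work units.
first_rotation = [next(schedule) for _ in range(sum(quotas.values()))]
print(first_rotation)
```

A real host program would ask the scheduled plug-in to process one unit each time it pulls a name from `schedule`; the hard part is getting the projects to agree on that interface.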
Wiwi
"I trust in my abilities,
but I want more than they offer"
Seti@home isn't looking for aliens that are flying around in spaceships at warp speed, it's looking for planetbound aliens who are at roughly our technology level.
This may sound funny if you can't raise money at $300 per megabit, but have you ever thought of using a provider like Cogent? You could be provisioned a 100Mbps cat5 link for $3000 per month and use all you want. Just a thought.
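For comparison, the arithmetic on the two prices quoted in this thread, as a quick Python sketch. It assumes the flat-rate link is fully usable at 100 Mbps, which a real contract may not guarantee.

```python
# Figures quoted in this thread (USD per month).
UCB_COST_PER_MBPS = 300        # from the SETI@home article
FLAT_LINK_COST = 3000          # flat-rate offer mentioned in the comment above
FLAT_LINK_MBPS = 100

# Cost of the 10 Mbps expansion the SETI@home team hopes for.
expansion_cost = 10 * UCB_COST_PER_MBPS

# Effective per-megabit price of the flat-rate link.
# ASSUMPTION: the full 100 Mbps is actually usable.
flat_cost_per_mbps = FLAT_LINK_COST / FLAT_LINK_MBPS

print(f"10 Mbps at UCB's rate: ${expansion_cost}/month")
print(f"flat-rate link: ${flat_cost_per_mbps:.0f}/Mbps/month")
print(f"price ratio: {UCB_COST_PER_MBPS / flat_cost_per_mbps:.0f}x")
```

So the 10 Mbps expansion at the university's rate would cost the same $3000 a month as the whole flat-rate link, a tenfold difference per megabit.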
UCB net admins and other interested parties have been discussing how to deal with the increased bandwidth demand on the ucb.net.discussion newsgroup: Google Groups thread: "latency from off-campus".
I live across the street from the Berkeley CS building where half the EECS servers are housed, and my connection to those machines can get pretty lagged. Having an inconsistent ISP certainly exacerbates the situation, but my experience with off-campus latencies has been quite bad for the past two years.
Sure it's sad that Seti@home users can't use their computer's idle cycles quite so effortlessly anymore, but the bigger picture is that everyone trying to connect off-campus is suffering, especially people who are trying to get work done.
The surprising thing for me is that detaching the dorm network (with all the student-run servers) leaves very few computers that could be sucking up all the bandwidth. We've suffered through DoS attacks from time to time, but the fact that Kazaa is still the number one bandwidth hog makes me wonder who runs these apps (professors? grad students? janitors?) and where are they running them from (lab computers aren't the best places to store all that warez, mp3s, and divx files, unless you don't care that they all get erased every day).
CalREN-2 consists of two giant loops - called CalREN North (serving UC Berkeley) and CalREN South (in the Los Angeles area). Each loop is a gigaPOP - providing the high-speed connection into the nationwide Internet. Each loop provides OC-48 (2,488 Mbps) connections to member campuses.
Now, since this equipment has been in place since the middle of last summer, why are they using their dual 45Mb/s connection? Just get some cable dogs out there to run some fiber. Hell, I'll get out there and run some fiber for them. Remember when some yahoos cut their fiber while stealing copper to recycle? They were down for like two weeks. Well, it took them two weeks to run fiber across the campus again. If they get started now, they could have as much bandwidth as they could possibly want by running fiber to their Internet 2 pop.
I have seen the I2 Pop at the Sonoma county office of education. It is running at OC-3 (155Mb/s). That means a bunch of elementary schools have twice the bandwidth of the most prestigious Computer Science program currently running in the world. Prestigious? Yes, they have effectively harnessed millions of desktops to create the fastest computer on the planet by a huge margin. They push 27 Tflop/s on 25 Mb/s compared to ASCI White that just passed 10 Tflop/s. My computers, like everybody else's, have wasted a lot of cycles waiting for data. Imagine if they had 2,488 Mbps available to them and enough users to create the first 2+ petaflop computer. Of course they would need 240 million users to achieve that.
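A back-of-the-envelope check of that scaling in Python. It assumes throughput and user count both grow linearly with bandwidth, which is a big simplification, and it takes the 3 million member count from the story summary.

```python
CURRENT_TFLOPS = 27          # throughput claimed in the comment above
CURRENT_MBPS = 25            # SETI@home's former bandwidth share
OC48_MBPS = 2488.32          # standard OC-48 line rate
CURRENT_USERS = 3_000_000    # member count from the story summary

# ASSUMPTION: throughput and user count both scale linearly with bandwidth.
scale = OC48_MBPS / CURRENT_MBPS
scaled_pflops = CURRENT_TFLOPS * scale / 1000
scaled_users = CURRENT_USERS * scale

print(f"bandwidth scale factor: {scale:.1f}x")
print(f"scaled throughput: {scaled_pflops:.2f} Pflop/s")
print(f"users needed: {scaled_users / 1e6:.0f} million")
```

With 3 million members this lands nearer 300 million users; the 240 million figure would follow from counting roughly 2.4 million active users instead.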
Just to be a pessimist, that is probably exactly what all the distributed modules in Win2K/XP are for. Bill is going to have a really nice computer one of these days.
If voting were effective, it would be illegal by now.
SETI should wait until we have our own world's problems figured out.
Humans are made of meat, and sure, cancer is a problem we'd like to solve. But humans are also uniquely explorers and thinkers, and Not Knowing(tm) IS genuinely one of our problems. Some believe that SETI is a step towards solving that problem. File it under "motivation" or "purpose" (by simply "knowing").
A future generation may answer the eternal question for us. And if they do, every generation that follows will be affected in their daily outlook, their goals, their attitudes, their comforts, their concerns, etc. That's at least as profound as a cure for cancer.
This wasn't very hard to see coming, but it's still unfortunate.
For those who are looking for a workunit-caching program for Linux, I've written a Perl script which has done quite a good job at it. I've decided to release it tonight, to help everyone out, but it's a bit rough around the edges. It does the job, though. Read the README, download it here. Also, mirrors are welcome - my connection sucks far worse than theirs does =)
Paranoid
Bwaahahahahaa.