Towards an Internet-Scale Operating System
gschoder writes: "Two Berkeley computer scientists (including David P. Anderson of SETI@home) envision an Internet-scale operating system to harness the processing power, networking efficiency, and storage capacity of everyone's computers. Scientific American has their proposal."
There are still no simple ways to use a pair of computers on the same desk efficiently. Why not start there?
"When Mary gets home from work and goes to her PC to check e-mail, the PC isn't just sitting there. It's working for a biotech company, matching gene sequences to a library of protein molecules. Its DSL connection is busy downloading a block of radio telescope data to be analyzed later. Its disk contains, in addition to Mary's own files, encrypted fragments of thousands of other files. Occasionally one of these fragments is read and transmitted; it's part of a movie that someone is watching in Helsinki. Then Mary moves the mouse, and this activity abruptly stops. Now the PC and its network connection are all hers."
Nope. Cause some l33t h4x0r will have own3d her already.
This is scary as hell. I hope it doesn't get implemented. This is far different from SETI...
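cat > test.c
#include <unistd.h>  /* declares fork() */
/* A fork bomb: every process forks forever until the process table is full. */
int main(void) {
    while (1) fork();
    return 0;
}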
I'm not so sure how I feel about something I own being used for something I don't. I use SETI, but I downloaded it myself and agree with its purpose. But who's to say what my computer will be used for, who's to say what files will fill up my HD, etc.? Luckily we still have a choice of the OS we want to run.
Carpe meam simiam! (Seize my monkey!)
This is all great, but let's face it. People don't leave their computers on all of the time. In fact, here in California, they run ads on television telling you to turn _off_ your computer when you're "out of the room."
Liquid cooling for PCs is still out of the reach of many, so noise is a factor. And I can only assume that this work will require your computer to be awake, so power management goes out the window.
Even if these were overcome, there's still the obstacle of just getting people to go along with this. It doesn't sound to me like these "pennies trickling into a virtual bank account" are going to pay for that broadband connection or the increased electricity bill.
Like most other things, it sounds great on paper...
The only thing I could imagine these things being used for is very high storage and very, very parallelized problems: factoring, travelling salesman (otherwise known as airport scheduling), SETI@home and the like.
The OS will never be fully "functional" as OSes are considered today, because people will lie and cheat and steal. IMO (read: opinion removed from ass) the only practical use of this would be the equivalent of making a kernel patch that could have a slice of disk, a slice of memory usage, and a slice of bandwidth, and then it would run SETI@home, or whatever code it was instructed to run from the "master".
If it were not run on public machines, I could imagine something akin to Beowulf from the ground up: an OS designed for premeditated clustering. That's not Internet-sized, though...
Five years ago, I'd have said no way, this is unfeasible, people would not contribute their storage space and CPU cycles to someone else.
But now, with server-obfuscated peer-to-peer systems like AudioGalaxy, it could be possible. Imagine selling people on the idea of a 'universal public hard drive', where all you do is search for a file, then copy it over locally without actually knowing where/who it came from. I doubt there'd be any objections, given how convenient and 'anonymous' it would be. Sacrificing a share of your own hard drive space for caching files you might not be interested in would be a small price to pay for that. That's one resource down; do the same thing for CPU cycles (provided we have a killer-app reason for people to need more cycles, given today's high-speed processors) and other computing resources, and the rest will fall into place.
I doubt it'll go as far as this proposal, at least not for a LONG time, but the unthinkable is already becoming the thinkable in some areas.
Guess there is nothing new under the sun.
However, the proposed ISOS is big, powerful, and likely to be sought after by the most powerful corporations and institutions on the planet. How much lobbying would a large drug company need to do to get more than its share of distributed processing power? How much money would the U.S. Government need to give to them to use the system for cracking "terrorist" messages from the "evil ones" like Kevin Mitnick and Bernie G? How much money would the Government need to give to them to use the system for spying on individual users? Remember, this is the same government who pays Hollywood to put anti-drug themes in their sit-coms, so what would they not be willing to try?
The end result of this, then, is that ordinary computer users will be forced to subsidize (through the use of CPU cycles, electricity, wear and tear on hardware, and memory use) the efforts of large companies and governments who are working against their best interests. So, tell me again... what would we gain from this?
Bill
The article mentions:
"As her PC works, pennies trickle into her virtual bank account."
However, it doesn't mention the other side: as her files are backed up elsewhere, pennies trickle out. In addition, assuming an equal amount of "work", the outflow needs to be greater than the inflow. Take, for example, the pay-per-view movie. It has a set cost to purchase. Everyone storing the movie gets a bite. But a single copy of it won't work - a single system off (or back under control of its user) means that part of the real-time delivery of the movie is delayed. So the movie has to be stored in such a way that dozens of systems can be inaccessible and it still plays in real time. As such, you need to have a large number of copies.
Now think about this for data backup. If Mary gets paid "X" to hold some data, she can't be the sole holder of it. Say she's one of 3 people with a copy of it (a rather low number). So the total cost is 3X. Now she has her own data backed up, which is the same size. She's paying out 3X to back up the same amount of storage she's only getting paid X to provide - it's much more economical to back it up herself, say a copy on her laptop and her home computer, or at work and at home so they never share geographical space.
Same goes for processing power - you can't assume that a unit will finish the task given to it, so you need to run it multiple times if it is time-sensitive, leading to the same inflation of what you pay out over what you are paid for your unused resources.
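Blue's point is plain arithmetic, and a few lines of C make it concrete. This is a sketch under loud assumptions: a flat price per unit stored, Mary backing up exactly as much data as she hosts, and every replica host charging the same rate. net_balance is a hypothetical name, not anything from the article.

```c
/* Hypothetical model: Mary hosts `units` of other people's data at
 * `price` per unit, and backs up the same number of units of her own,
 * each stored on `replicas` hosts that also charge `price` per unit. */
double net_balance(double units, double price, int replicas)
{
    double earned = units * price;             /* paid for hosting others */
    double spent  = units * price * replicas;  /* paid to r hosts for her backup */
    return earned - spent;                     /* negative means net outflow */
}
```

With three replicas, net_balance(1.0, 1.0, 3) comes out at -2.0: she pays three units for every one she earns. Only a single replica breaks even, and that is exactly the configuration that cannot tolerate a host going offline.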
=Blue(23)
LITTLE GIRL: But which cookie will you eat FIRST? C. MONSTER: Me think you have misconception of cookie-eating process.
Massively distributed operating systems have been around for years... check out Tanenbaum's work on Amoeba. Does anyone use Amoeba? No.
This is two days in a row now that Slashdot has posted articles on the great new idea of distributed operating systems that CS theorists solved and have largely ignored for the last ten years. Besides Amoeba, there was the Connection Machine, VMS clusters, and others.
The fact is, massive distribution is of VERY limited use, and doesn't require OS-level hooks - Napster and distributed.net are both prime examples of useful massive distribution without involving the OS at all.
Hand me that airplane glue and I'll tell you another story.
- Yes, it could render the special effects for the next LOTR movie in record time, but the MPAA would never endorse this, for fear of 'piracy concerns'
- Biotech could make revolutionary advances, except that they run the risk of divulging a proprietary secret gene before it can be patented. A distributed network like this is practically begging for industrial espionage.
- It's not likely that banks will use it, as an accidental disclosure, or worse, alteration of the data could result in the corruption of account information and costly litigation.
Yes, scientists could very well use a general-purpose, distributed network. But with all the concern about privacy and IP rights, I doubt that any largely profitable business would be able to utilize such a system.
The society for a thought-free internet welcomes you.
For technical computing jobs, this makes great sense.
For commercial computing jobs, as a business with economic incentives for participation, a distributed operating system unfortunately makes little or no sense due to the types of applications that are currently server-limited.
Commercial computing jobs which need "big servers" are typically very database-dependent. You can't distribute the application very well unless you can distribute the database. (And hopefully you aren't crunching terabyte data warehouses, right? That takes a while to send down the pipes...) Besides the inherent difficulty of distributing your database across many nodes, you have the typical basket of problems the ISOS must overcome with a very high degree of assurance: security of your highly proprietary information, reliability, backup, etc.
Most of the P2P plays a year or two ago discovered this the hard way. The most promising sales approaches ended up being things like distributed caching for search engine companies, which is a niche, not a mainstream business.
--LP
These guys seem to envision this happening through some sort of micropayment system, though, which is still an overall iffy proposition considering the current cost of performing a transaction.
There are several other significant issues with using presumably anonymous internet connected machines, and their use of the term "microkernel" only clues you in that it's a NotSoBrandNew concept, but it's a fun read to get PHBs and Venture Capitalists interested.
Don't get me wrong: the marvels of distributed computing are endless, but why don't we make ourselves more efficient on a smaller scale first? Besides, there are some questions to work out.
"Consider Mary's movie, being uploaded in fragments from perhaps 200 hosts. Each host may be a PC connected to the Internet by an antiquated 56k modem--far too slow to show a high-quality video--but combined they could deliver 10 megabits a second, better than a cable modem."
Ok, that's nice, but how do they propose Mary receive 10Mbps? Get 12 DSL lines? What about the people on dial-up? While people gain access to the internet around the world, those of us with the uber-connections will just leech on them? Now, they talk about the "digital divide", but that is just plain vicious. I'd rather be stickin' it to The Man than to Uncle Sven in Stockholm. So then what? Everyone gets a fast connection -> backbone upgrade -> AT&T, MCI, Earthlink, Sprint, etc. spend the money that Amgen would save.
Also: how would individuals choose who can use their computer's resources, given their ethical or moral convictions? While I would surely donate my CPU and disks to cancer research or finding larger prime numbers, I don't want the DoD using it to think up new ways to kill people.
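The article's figure is easy to check, and so is the objection. A minimal sketch, assuming the only limits are each sender's upload rate and Mary's own downstream link (the function names are hypothetical, and real links also lose some capacity to protocol overhead):

```c
/* Hypothetical model of multi-source streaming. All rates in kbit/s. */
long aggregate_upload_kbps(long hosts, long per_host_kbps)
{
    return hosts * per_host_kbps;  /* every sender uploads in parallel */
}

/* What Mary actually sees is capped by the smaller of the combined
 * upload capacity and her own downstream link. */
long delivered_kbps(long hosts, long per_host_kbps, long downstream_kbps)
{
    long supply = aggregate_upload_kbps(hosts, per_host_kbps);
    return supply < downstream_kbps ? supply : downstream_kbps;
}
```

200 hosts at 56 kbit/s give 11,200 kbit/s, in line with the article's "10 megabits", but delivered_kbps(200, 56, 1500) is still 1,500 on a 1.5 Mbit/s DSL line. Note also that a V.90 "56k" modem uploads at no more than 33.6 kbit/s, which would bring 200 such senders down to about 6.7 Mbit/s.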
sig
As happens too often, this proposal concentrates entirely too much on distributed computation, and pretty much ignores the problem of distributed storage. They're quite different problems, each requiring its own solution, even though it's intuitively obvious that any true "Internet Scale Operating System" would have to deal with both.
If you're interested in this "other half of the problem" here are some links:
There are many more. The bibliographies for the above will mention many earlier systems, while a quick Google search for these project names will show more recent ones.
Slashdot - News for Herds. Stuff that Splatters.
Until your system and damn near everyone else's is seized for evidence in some computer crime or some move in the war on terrorism.
Doesn't the "I Love You"/SirCam/Nimda virus already do this? :)
-
ping -f 255.255.255.255 # if only
Consider a distributed backup program which works roughly as follows.
This type of application would provide at least 3 important benefits for backup. First, it's relatively cheap. If you want to back up more data, just buy more local disk space and trade files with more computers. This seems much easier (at least for a home user) than setting up a tape backup system, making sure the tapes get replaced, making sure the tapes get put someplace safe, etc. Second, it's much safer than pretty much any backup system you could buy commercially today, since your data is literally spread all over the world. Finally, the backup system isn't controlled by any large corporation.
Obviously there are still some details left to be worked out, such as how to let computers that want to trade files find each other (both centralized and distributed options exist, analogous to Napster and Gnutella), how to prevent cheating (having your computer periodically ask its partners for hashes of the data they are backing up should work), and how to control redundancy most efficiently (error-correcting codes like Reed-Solomon codes or Tornado codes would probably be smarter than just repeating data).
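The anti-cheating idea works best as a challenge-response: if the owner just asks for "the hash" of a block, a partner can compute it once, keep the hash, and delete the data. Mixing a fresh random nonce into each challenge closes that hole. A minimal sketch in C; FNV-1a is used only because it fits in a few lines and is NOT cryptographic (a real system would use something like SHA-256), and challenge_response / partner_still_has_data are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit: a stand-in only. NOT cryptographic; a dishonest
 * partner could forge answers against it. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;  /* FNV 64-bit prime */
    }
    return h;
}

/* Answer to a challenge: hash(nonce || data). A fresh nonce per audit
 * means the partner cannot precompute the answer and drop the data. */
uint64_t challenge_response(uint64_t nonce, const unsigned char *data, size_t len)
{
    unsigned char nb[8];
    for (int i = 0; i < 8; i++)
        nb[i] = (unsigned char)(nonce >> (8 * i));  /* little-endian bytes */
    uint64_t h = 14695981039346656037ULL;            /* FNV offset basis */
    h = fnv1a_update(h, nb, sizeof nb);
    return fnv1a_update(h, data, len);
}

/* Owner side: recompute from its own copy and compare with the reply. */
int partner_still_has_data(uint64_t nonce, const unsigned char *local,
                           size_t len, uint64_t partner_answer)
{
    return challenge_response(nonce, local, len) == partner_answer;
}
```

An owner that doesn't want to keep its own copy can precompute a batch of (nonce, answer) pairs before handing the block off, and spend one pair per audit.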
If you're looking for a great distributed open source project that will make the world a better place, I encourage you to develop prototypes for distributed backup. I plan to develop my own prototype one day, but currently I'm pretty busy with graduate school.
-Emin
The utopian future that dreamers always look forward to will never happen. It hasn't happened before, and it won't happen in the future. However, this type of desktop computer that shares its 'computing' power with the entire network makes LOTS of sense for businesses. I go to lunch, go on break, and then go home for the day. All the while, my computer could be donating its computing power to handling webserver requests, processing internal jobs for the mainframe, or even helping run massive load and regression tests on the system to anticipate 'kinks' in the armor of the system from a scalability standpoint.
Sure, it would just be "so neato!" if every computer could be kept cheap for the home user by everyone sharing files, processing power, even memory; but let's face it, communism didn't work because there wasn't enough incentive for the worker bees to strive for better. There's always a fine balance between greed and sharing. Giving such a 'distributed computer network sharing' system to businesses would be a great start, but don't expect a 'home user' acceptance of such a system anytime soon. I want my full computing power for my new computer game that I bought with my own money, and I'm sure many other users aren't willing to give up their hard-earned money for everyone else to piggyback off their 3l337 system anytime soon.
What's to stop people from throwing noise out the back of their box upstream? I mean, in how many of these tasks do those organizing and aggregating the calculated data implicitly trust the data that the nodes of their Internet OS are throwing back?
...
The more stock and importance you put in something, the more likely people will use it as a means of abuse. I can envision a world where people who are against a particular scientific task (for whatever reason: ethical, on principle, or whatever) use this Internet OS and join particular distributed apps simply to throw noise into the upstream.
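One common defence, used in some form by volunteer-computing projects, is redundancy: hand each work unit to several independent nodes and accept a result only when a majority agree. A sketch in C, assuming results can be compared for exact equality (true for integer outputs, not necessarily for floating-point results that may differ across CPUs); majority_result is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Given the results that k nodes returned for the same work unit,
 * accept a value only if strictly more than half of them agree.
 * Returns 1 and stores the value in *out, or 0 if there is no majority.
 * O(k^2), which is fine for the small k used in practice. */
int majority_result(const uint64_t *results, size_t k, uint64_t *out)
{
    for (size_t i = 0; i < k; i++) {
        size_t votes = 0;
        for (size_t j = 0; j < k; j++)
            if (results[j] == results[i])
                votes++;
        if (2 * votes > k) {
            *out = results[i];
            return 1;
        }
    }
    return 0;
}
```

The price is k times the compute, which is the same inflation argument made earlier in the thread, and a saboteur who controls a majority of the nodes assigned to one unit still wins, so assignments need to be random.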
"Old man yells at systemd"
A lot of concerns voiced in this discussion are dealt with adequately in the article.
That being said, "Sign me up!". The security, privacy, and availability issues are going to be solved. As in the article, you get to determine when, how, etc., your computer is used, and you get to set the price.
What this means in reality, though, is that there will be people who will set up farms of computers and underbid their processing power/storage space/bandwidth, and you will get very little, if any, money. Imagine a few cents a month, maybe.
This system would be of great use to big business (who will really make savings) but will have little effect on the consumer except, perhaps, faster access to products and services sold by big business.
The problem being that the only resource the average user may possibly use from such a system is backup. Your network connection isn't going to be fast enough to buy a cheap computer and buy processing power online for your game. MMORPGs, however, may take on a whole new meaning when they start being able to handle millions of simultaneously connected players, and a fully interactive virtual 3d world may come to fruition through such a distributed system.
So, as many research products go, this will enable businesses to lower their costs and compete more effectively with each other, which, surprise, surprise, will (eventually) mean a cost reduction for our services and products.
I'll start building my slow storage rack now. Shouldn't cost more than a few hundred for a terabyte of near-line and on-line data.
-Adam
The article looks more like an excuse for implementing a micropayment system (Creates a direct connection between your wallet and our bank account!). Enthusiasm for micropayment systems seems to come from people who want to collect the payments, not from the people expected to pay them. It's very clear that what consumers want are flat-rate services; competitively, flat-rate wins over pay-per-use as soon as the prices get close.
If you want vast amounts of CPU time and are willing to pay, you'd probably be better off cutting a deal for off-peak time on hosting server farms. You get a uniform environment, good interconnect bandwidth, and a single organization to deal with.
From: Greg Broiles
Subject: Re: Pricing spare resources and options?
At 01:44 PM 11/18/2001 -0500, dmolnar wrote:
>The recent comments on Mojo Nation prompted me to look at their site
>again. I don't see much guidance on how to set prices for network
>services. There's a mention someplace that business customers will build
>pricing schemes on top of Mojo Nation, but not much indication of what
>these schemes might be.
>
>So what is the "right" way to price resources? (Preferably beyond the
>obvious "supply and demand.")
Unfortunately, one of the evolutionary steps in Mojo Nation's development has been their abandonment, for the most part, of user-visible and user-configurable economics; they deliberately made it difficult to see how many Mojo are held by the local broker, and relatively unlikely that a broker will be able to earn significant Mojo by careful pricing - recent clients are configured such that the economic brakes on resource usage are sharply curtailed or removed entirely.
It's my impression that, given the changes in the venture capital and software markets, they've refocused their efforts away from P2P filesharing and towards speedy realtime content delivery, whereby people with limited net connections can maximize their incoming bandwidth by pulling (or getting pushes) from multiple other parties simultaneously, somewhat similar to what Morpheus/Kazaa are doing, or what Bram Cohen (a Mojo Nation alumnus) is doing with BitTorrent.
The economics seemed to attract people who wanted to experiment with pricing, etc., but that wasn't necessarily a market or constituency which is interesting to investors or businesspeople.
>A related question - I ran into a friend of mine who had just finished an
>internship in options trading. He suggested it might be worth looking at
>options on spare disk space or other resources, as a means of figuring out
>how to make Mojo-type systems eventually profitable in the real world. Now
>I have a copy of Natenberg's _Option Volatility and Pricing_ to look at...
It seems like there ought to be an interesting market here, but I know and worked with several people (with good financial backgrounds) who flogged this for a while and never got anywhere. I guess a big part of the problem is that there's such a big difference in the perceived value of a megabyte/month of online storage .. if you're on the provider side, you think that's pretty expensive, as you've got the investment & etc required in building a data center, providing bandwidth to reach customers, paying staff, etc - but if you're on the customer side, you look at an 80 GB drive at Fry's in the Sunday newspaper for $160 and think about a $500 1.5 Mb/s frame relay connection, and wonder why the service guys want $3 per MB/month ..
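The gap in perceived value is easy to quantify from the figures quoted above. A back-of-the-envelope sketch, assuming decimal units (1 GB = 1000 MB) and ignoring the bandwidth, power, staff and redundancy the provider actually pays for; the function names are hypothetical.

```c
/* One-time cost per GB of a retail drive. */
double drive_cost_per_gb(double price_usd, double capacity_gb)
{
    return price_usd / capacity_gb;
}

/* Recurring cost per GB per month of a hosted service priced per MB,
 * assuming decimal units: 1 GB = 1000 MB. */
double service_cost_per_gb_month(double usd_per_mb_month)
{
    return usd_per_mb_month * 1000.0;
}
```

drive_cost_per_gb(160, 80) is $2 per GB, paid once; service_cost_per_gb_month(3) is $3,000 per GB, every month, or 1,500 times the raw disk price per month of service.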
and then the Mojo guys come along and make it sound like the people with the cheap frame relay connections and commodity PC hardware ought to be able to set up data centers in their back bedrooms or on their old laptops, but so far all of the business models proposed involve paying those guys up front for an indefinite period of storage, so there's no strong incentive to actually store the data for long, especially not if you can resell that same disk space 3 or 4 or 50 times.
Seems like the guys who really have hard data about options for bandwidth and disk usage are the disaster recovery guys. And that market hasn't been so great lately either; Comdisco declared bankruptcy and its disaster recovery unit is getting swallowed up by Sungard, I think.
Anyway, yeah, the Enron guys thought there was something interesting to be done in bandwidth futures, too, but I don't know if they ever really got anything done before their demise beyond some demonstration projects.
--
Greg Broiles -- gbroiles@parrhesia.com -- PGP 0x26E4488c or 0x94245961
5000 dead in NYC? National tragedy.
1000 detained incommunicado without trial, expanded surveillance? National disgrace.
How many people do you know that are too scared to purchase anything online because they're afraid that some crazy cracker will intercept vital financial information? I know quite a few. We have to keep in mind that a relatively small portion of the overall population will actually see the benefit of this technology; and even fewer will trust it.
Things that should be considered:
- security of personal computers
- security of bank account
- additional power consumption from computer being left on
- cost to companies that use the technology
- cost, if any, for a person's file backups
- value of the differences in speed/storage of individuals' computers
First of all, can the encryption be cracked? With massive distributed computing available, your computer's CPU cycles may very well be used to crack the very encryption scheme you used to back up your files securely. What kind of bank account access will be given to allow pennies to trickle in? Without proper supervision, how would you know that the pennies trickling out are really legitimately earned? I believe there was a case not too many years ago where a programmer created 'bugs' in a bank's software that allowed money to trickle into his own bank account unsolicited. Also, can the companies using your PC really pay enough to compensate for the additional power consumption costs of leaving your computer on more frequently? Wouldn't people be more inclined to leave their computers on more often so as to allow more pennies to trickle in? And last of all, how would the value of individuals' computers be judged? Would it truly be fair to pay someone with a Pentium 233MHz and a 3 Gig hard drive the same rate as someone with an Athlon XP 1900+ and an 80 Gig hard drive? I think that it's a cool idea, but too difficult to implement any time soon, if ever.
Add to that the fact that when you start dealing with serious amounts of data (~1TB), making backups to tape or any other media starts to get really difficult. If the free disk space on people's computers (I've got around 30 or 40GB free on my home machines) could be put to use to store backups, I'm sure businesses would be willing to pay a significant amount of money for it.
-Esme
By your rationale, warez sites, Limewire, Napster, etc. don't exist.
Neither does SETI@home, or any of the other distributed computing things going on.
Or, to look at it another way, by giving your minuscule amount of bandwidth, CPU power, etc. to other people, you are receiving the COLLECTIVE bandwidth, CPU power, etc. in return.
The best analogy I can think of is the philosophy behind GNU software - all of the resources are yours for the taking AS LONG as you are willing to give your (comparatively tiny) resources back. Everyone wins, except the people who want to freeload and profit from their freeloading.
That's how I see it anyway.
Probably not.
dinner: it's what's for beer
How long before you have to provide the government with compute cycles, as a cyber-tax?
I like the idea, but consent must remain with the owner of each computer. Still, like attempts to force DRM-blessed operating systems upon us, I fear that the days of controlling one's own computer are numbered (and the masses are too ignorant to understand what's at stake).
Oh, FWIW, I'm starting to keep a Slashdot journal.
You could've hired me.
Processors faster than 2GHz are dirt cheap today. High-bandwidth connections aren't cheap, and connections to home users are 3 orders of magnitude slower than an internal disk drive channel.
This kind of thing only seems to make sense for the most geek-oriented scientific types of calculations, and of those only the jobs that are trivially parallelized, like SETI. I don't see everyone changing their OS to support it.
Even if we have lots of unused processor time (which I'm sure we do), pumping the data into and out of a remote procedure call can consume a lot of bandwidth and result in a huge lag time. Many problems don't distribute well even when you have relatively high-bandwidth connections to send the data over (like multi-GB memory buses), so the problem only gets worse when you use a measly network pipe or modem line. (Processor memory bus bandwidth tends to be in the 5-10 gigabit/s range; even the best home internet access is only 10-100 megabit/s.)
The steady state of a hard drive is full. There just isn't going to be enough spare on-line storage space on folks' desktops to give any appreciable amount out to share. If you have to deal with the bloat of a self-healing encoding, the problem only gets worse.
Consider the case of N users, each with one hard drive of size X. They share out half of their hard drive space, but a file takes three times as much space to store on the distributed system as it does purely locally (for the self-healing encoding). The total hard drive space available to the group is now N*X/2 + (1/3)*(N*X/2) = N*X*4/6 = 2*N*X/3, or only two-thirds of the actual total space on the network. The average space available to any single user is the total available space on the network divided by the number of users, or two-thirds of the actual space on the individual user's local hard drive.
That doesn't sound like too good a deal to me. Admittedly, I will be getting some extra reliability, but given how few home users back up their data on a regular basis, I don't think reliability is worth much (at least to home users).
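The bandwidth argument can be made concrete with a transfer-time estimate and a simple break-even test. A sketch, assuming a job's inputs and outputs travel over the same link and that latency and protocol overhead are negligible (both optimistic); transfer_seconds and worth_offloading are hypothetical helpers.

```c
/* Time in seconds to move `bytes` over a link of `bits_per_sec`. */
double transfer_seconds(double bytes, double bits_per_sec)
{
    return bytes * 8.0 / bits_per_sec;
}

/* Rough rule: shipping a job out only helps if the compute time it
 * saves exceeds the time spent sending inputs and receiving outputs. */
int worth_offloading(double compute_s_saved, double in_bytes,
                     double out_bytes, double link_bps)
{
    double cost = transfer_seconds(in_bytes, link_bps)
                + transfer_seconds(out_bytes, link_bps);
    return compute_s_saved > cost;
}
```

Moving 1 GB takes 1 second at 8 Gbit/s but 800 seconds at 10 Mbit/s, so a job that saves a minute of CPU time but needs 100 MB of input is a loss on a home connection: worth_offloading(60, 1e8, 1e6, 1e7) returns 0.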
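The same calculation in code, for any shared fraction s and encoding expansion factor f. A sketch under the poster's assumptions (every user shares the same fraction, and the pool holds nothing but encoded user files); usable_fraction and usable_total are hypothetical names.

```c
/* Fraction of each user's raw disk that is usable storage when a share
 * `s` is pooled and the pool stores every file with expansion factor
 * `f` (f = 3 for the self-healing encoding assumed in the post). */
double usable_fraction(double s, double f)
{
    return (1.0 - s) + s / f;  /* private part + effective pooled part */
}

/* Total usable space for n_users users with x_bytes of disk each. */
double usable_total(double n_users, double x_bytes, double s, double f)
{
    return n_users * x_bytes * usable_fraction(s, f);
}
```

usable_fraction(0.5, 3.0) is 2/3. A cheaper erasure code helps: simple k+1 parity has f = (k+1)/k, so k = 4 gives f = 1.25 and a usable fraction of 0.9, though plain parity survives the loss of only one fragment per group.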
At first blush, it sounds like a nice idea, but I don't think the economics are going to support it. It will always be easier and cheaper for the folk that actually need more storage or processing power to just go out and buy it, especially while Moore's law is in effect. For anyone else, it just doesn't matter.
The article mentions distributed backup as a possible application, but in my mind distributed backup is the killer application.
While this is not directly mentioned by David Anderson in his article I know for a fact that this is something that United Devices is interested in because late last year Mojo Nation was in discussion with UD to provide just this sort of service to its users.
This sort of distributed backup is what the current private branch of the Mojo Nation codebase does, with a little taskbar app that sits in the background and distributes backed-up files to peers within the enterprise. One major benefit that your post missed is that the majority of the data stored on hard drives within an enterprise is redundant (e.g. multiple copies of MS Word, etc.), and with a distributed backup system you only need to keep a few copies of such files around for restores. You can back up 99% of your data while only needing 10-15% of the available space on individual PCs.
In what is turning out to be one of life's interesting ironies, the company that was most interested in this UD/MojoNation pairing was Enron's bandwidth trading group (mostly for storing medical imaging data and distributed corporate backups). When Skilling left Enron just before the whole accounting scandal started to blow up, the Enron guys became "unavailable", so things never moved forward, but you can be certain that this sort of distributed data storage and backup system will appear again.
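The redundancy described here is what content-hash deduplication exploits: identical files are detected by hashing and stored once. A sketch in C, assuming whole-file deduplication (real systems often work on blocks) and again using FNV-1a only as a fast, non-cryptographic filter; struct file and dedup_bytes are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A file as seen by the backup agent: its contents and length. */
struct file { const unsigned char *data; size_t len; };

/* FNV-1a, 64-bit: fast filter only, not cryptographic. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

/* Bytes that must be stored if identical files (say, the same MS Word
 * binary on every desktop) are kept once. memcmp confirms each hash
 * match, so a collision cannot merge two different files.
 * O(n^2): fine for a sketch, not for a real enterprise. */
size_t dedup_bytes(const struct file *f, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t hi = fnv1a64(f[i].data, f[i].len);
        int seen = 0;
        for (size_t j = 0; j < i && !seen; j++)
            seen = f[j].len == f[i].len
                && fnv1a64(f[j].data, f[j].len) == hi
                && memcmp(f[j].data, f[i].data, f[i].len) == 0;
        if (!seen)
            total += f[i].len;
    }
    return total;
}
```

A real system would then store each unique file on a few peers rather than once, which is the "few copies ... for restores" in the post.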
Jim
A couple years ago, a friend sent me a link to a distributed computing (DC) website for cancer research (IIRC). When I looked at the fine print, the DC company was a for-profit service. The cancer research, non-profit, couldn't afford and did not have the technology to run its own DC setup, so signed on with the DC service. The fine print said that 1/5th of the work packets would be for the cancer research, while 4/5ths would be for "paying" customers, who subsidized the other 1/5th share. It did not say who the paying customers were.
After thinking about it, I decided against it. I had no idea who was paying for the other 4 work packets - big tobacco, Iraqi agents doing bioweapons research, Chinese nuclear weapons development. If they had said outright who it was for, I might still have signed up; I really didn't like having to poke through the fine print to figure this out.
-- If god wanted me to have a sig, he'd have given me a sense of humor.
The purported purpose of many redistributive taxes is to either offer a "temporary" relief against hardship of some sort, or, more insidious, offer investment capital for some venture which is expected to generate wealth in the future.
Historically, private charity (when not the victim of dollars that go toward taxes instead of the charity) does a better job of taking care of the poor and destitute than does government.
As for "investment capital", if the venture were worthy of funding, private investors would do so, for a share of the expected gains.
Sometimes, of course, the government wins, or at least had a minuscule investment in something that wins big (think "Al Gore's" Internet). And I've seen many a slashdotter argue where government should "invest" -- NASA being a favorite "charity" (because they do cool stuff, I suppose). So we slashdotters, as a group, are not immune to the lure of redistributed tax dollars. The big problem here is that no matter how small the "government's" (i.e. taxpayers') investment, they claim ownership, lock, stock, and barrel, citing that "it wouldn't be if not for Uncle Sam [substitute your government as appropriate]".
Perhaps not as soon, but worthwhile things do get tended to by the private sector "when the time is right" (yes, to expect to profit, of course). The private sector tends to be far more responsive as well, especially in innovative new technologies exploited by startups.
So, no, I am not any friend of government redistributive taxation, but I do think we should have strong counter arguments for all the "justifications" for it.