Would that be akin to a log filesystem, but implemented in RAM?
That would actually rock, because if you could implement logging like that, you'd have a very powerful paradigm for many systems.
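To make the "log filesystem in RAM" idea concrete, here's a minimal sketch. It assumes the simplest possible design: an append-only list of records held in memory, with the current state rebuilt by replaying it. `MemoryLog`, `put`, `delete` and `replay` are names made up for this illustration, not anything from a real filesystem.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MemoryLog:
    """Append-only log held in RAM; current state is derived by replay."""

    records: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def put(self, key: str, value: str) -> int:
        # Writes never modify old records; they only append a new one.
        self.records.append((key, value))
        return len(self.records) - 1  # position of the new record

    def delete(self, key: str) -> int:
        # A delete is recorded as a tombstone (value None).
        self.records.append((key, None))
        return len(self.records) - 1

    def replay(self, upto: Optional[int] = None) -> Dict[str, str]:
        """Rebuild state from the first `upto` records (all of them if None)."""
        state: Dict[str, str] = {}
        for key, value in self.records[:upto]:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state


log = MemoryLog()
log.put("a", "1")
log.put("b", "2")
log.put("a", "3")
log.delete("b")

print(log.replay())        # {'a': '3'}
print(log.replay(upto=2))  # {'a': '1', 'b': '2'}
```

The part that makes it a paradigm rather than just a list is `replay(upto=...)`: because nothing is overwritten, any earlier state can be reconstructed from a prefix of the log.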
Right now I'm more concerned about trying to set up coding standards, so that any developer can jump into any part of a project and be able to figure out what's going on, without wasting a couple hours just to figure out the code.
Whenever I hear about this sort of thing... I'm reminded of the genuine crap that's been published on this topic. I don't have any sources to cite... just an anecdote.
I was once told on a project that my code should be written such that any manager, with no programming experience could just jump in and start changing the code.
I tend to think of stuff like this
Health Hazards
PCP is addictive and its use often leads to psychological dependence, craving, and compulsive PCP-seeking behavior. Users of PCP report memory loss, difficulties with speech and learning, depression, and weight loss. These symptoms can persist up to a year after cessation of PCP use. PCP has sedative effects, and interactions with other central nervous system depressants, such as alcohol and benzodiazepines, can lead to coma or accidental overdose. Use of PCP among adolescents may interfere with hormones related to normal growth and development.
Many PCP users are brought to emergency rooms because of PCP's unpleasant psychological effects or because of overdoses. In a hospital or detention setting, they often become violent or suicidal, and are very dangerous to themselves and to others. They should be kept in a calm setting and should not be left alone.
As more of the hazard of using the substance, rather than the off chance that some moron might burn their house down while on it. I might burn my house down making bacon.
Yeah, but not having a PhD, the governor didn't realize that he had to perform a scaling step.
Deaths via accidents and fires are common, and respiratory arrest can also lead to death.
What about the use of PCP causes fires?
I should clarify that by communication overheads, I mean a sort of funny communication overhead that occurs in the specific technology used. Throughput is fine in general.
Sorry about that. I generally don't get personal in matters. I didn't like the way that you argued "well, ask an expert." My gut reaction is, "ok, an expert taught me everything I know." I really really really shouldn't have overreacted. Still, I could point out that generally it's in poor taste to question a person's expertise, rather than allowing them to tip their own hand. I still should have pointed the matter out in a more tactful way.
In short, I apologize.
I worked for a few years and did well in industry, which is about the only reason that I can count on getting into a reasonable PhD program (undergrad wasn't a stellar experience for me). I just about had a heart attack when I posted with the word "jerk" in it. I guess that I separate Slashdot conversations from professional ones, but, given the visibility, perhaps I shouldn't. I certainly don't want any hurt feelings. You seem like a reasonable person, and I'm always glad to befriend an academic.
It's one of those things where I have a lot of conflict, because I have faith in the techniques used, because MS supported a lot of things that I've done in the past year, and because I'm also a supporter of Linux and the FSF.
Still, I really shouldn't have lost my cool, and I apologize.
The argument that I disliked was the "it's an OS vs OS thing." It really isn't. It's a technique vs technique thing. I thought about implementing the technology into the Linux kernel over the last summer, but I decided that thrusting myself into the center of a platform debate 6 months before PhD apps was a lousy idea.
As for specific features of the OS... clue free. I think that the IPC substrate employed offers superior semantics, however, and that it offers a simple interface to program distributed code over. There are other implementations that have similar semantics (jgroups, spread). The time and $ savings come in the form of the easier interface. I also think that it will, in general, perform better on clusters with lower communication overheads. It may have problems scaling to a Top500 machine... but that will really be tied to the task that is being performed and the amount of overhead. MS surely can find a low-communication-overhead distributed application that will allow it to hit the top mark if they like. That's really probably the key in winning that sort of thing, to be honest... though the program manager from BlueGene claimed earlier this year that he didn't see any limit on scale of that machinery (it kind of cheats by having separate processors just for communication, but what can you do, it's a nice approach).
I would have to say, yes, grid computing and the like are very interesting efforts. They're based, however, on a different base technology. Grid computing is mostly virtual machine monitors. Distributed computing in the context that Windows Cluster will offer it is based on multicast distribution and synchrony models that deal with dissemination of information between machines. They probably SHOULD be related, but generally, they aren't (efforts have tried to tie them more tightly, but generally produce sort of niche application results... LINDA... Mosix).
Interestingly, the PhD is in another area. Systems is just a sort of extremely avid hobby for me.
Anyway, I really do feel a bit silly for flipping out over the matter. Generally I'm better about this. Still, I sincerely apologize.
Jeez, I hate when I get all huffy. I'm trying not to say anything that I'll regret, being a person who believes in an overall principle of "being nice to others." Just don't take that tone with me. It's uncouth.
If you want to sound inflated about the matter and be a jerk... then fine.
I'm giving a lecture on parallel distributed computing on Thursday to a class taught by one of the most influential voices in distributed computing. I'm trying not to drop names, because I'd rather not draw attention to my postings on Slashdot while I apply to PhD programs. Additionally, we've been running Windows clusters here for over a year now at our theory center, which is one of the best in the world. I am running parallel distributed computing experiments on my Linux laptop right now. Several of my best friends do research in parallel distributed computing. That's not research done on a parallel distributed computing platform; they're actually researching the scalability and robustness of scalable distributed platforms themselves.
The research institutes here have produced both Linux and Windows software. I'm arguing that the techniques used to establish parallelism in Windows Cluster Edition are superior to shared memory techniques. Coming into this argument, you should realize that it is a rather heated issue in the systems community, about which you've established that you have knowledge only from the perspective of being a user.
If you want to keep up at this, then you can, but you're being a jerk, and you honestly are arguing a weak point. I have friends on both sides of this debate. I already told you that I'm a Linux user. I'm telling you that this has the possibility to become rather popular, and if you want to tell me that I don't know what I'm talking about, then that's fine. I don't need to impress you. I'm doing just fine impressing the people that I need to impress.
Also, I'm not talking about "features", I'm talking about the ground level technology that people will need to use in order to develop HPC applications.
As for HPC people... I'll ask my own HPC people how they feel about the matter.
I'm not talking about the scientific APIs. I'm talking about the clustering APIs.
Besides that, I'm not even really talking about the specific implementation. Other implementations of virtual synchrony exist that could be exploited. Having kernel-level implementations, however, will speed matters up.
As for their clustering API. The underlying technology is designed to be resilient in the face of crashed and/or flawed machinery, and should be more than adequate to provide correct results, even in the face of non-byzantine failures at a number of nodes, depending on the implementation built over it.
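For anyone who hasn't run into virtual synchrony, here's a toy sketch of the guarantee I mean: every member that survives sees the same sequence of messages and membership changes ("views"), and a crashed node just stops receiving. This is a single-process illustration only. `ToyGroup`, `multicast` and `crash` are hypothetical names, not the Windows clustering API or any real library, and a real implementation has to get the same result over an unreliable network.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int, object]  # (kind, view_id, payload)


class ToyGroup:
    """Single-process toy of a process group with view-synchronous delivery."""

    def __init__(self, members: List[str]) -> None:
        self.view_id = 0
        self.members = list(members)
        self.delivered: Dict[str, List[Event]] = {m: [] for m in members}
        self._install_view()

    def _install_view(self) -> None:
        # Every current member learns the new membership at the same
        # point in its event sequence.
        for m in self.members:
            self.delivered[m].append(("view", self.view_id, tuple(self.members)))

    def multicast(self, sender: str, payload: str) -> None:
        if sender not in self.members:
            raise ValueError(f"{sender} is not in the current view")
        # A message sent in a view is delivered to all members of that view.
        for m in self.members:
            self.delivered[m].append(("msg", self.view_id, (sender, payload)))

    def crash(self, member: str) -> None:
        # A non-byzantine (crash) failure: the node simply stops.
        # Survivors install a new view that excludes it.
        self.members.remove(member)
        self.view_id += 1
        self._install_view()


group = ToyGroup(["a", "b", "c"])
group.multicast("a", "x=1")
group.crash("c")
group.multicast("b", "x=2")

# Survivors agree on the full history; the crashed node saw a prefix of it.
print(group.delivered["a"] == group.delivered["b"])
print(group.delivered["c"])
```

Application code built over this can treat "who is in the group" as just another ordered event, which is why the failure handling ends up so much simpler to write.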
As for MS. I'm typing this to you from the Gentoo partition of my dual booted laptop.
MS is not trying to sell you a scientific API. All of that will have to be implemented on top, and some people will be willing to do so. The theory center here has been running Windows clusters for some time now.
You're uninformed. Click the "purchase" button on Napster. Then, click the "purchase" button on iTunes, and tell me what kinds of files you have.
I play my Napster files on a device that was not designed to support Napster. The DRM is only for the services where you do not purchase individual songs.
Applications that were not already written for clusters will require development time. Along those lines, the vast majority of applications that will run on clusters haven't even been written yet. If the book were closed on computer science, then I would drop out and become a carnival geek... or maybe an astronaut.
All tangents aside, seriously, software will be written in the future. If an API can help to bring them into existence faster, then that yields direct savings for the operator of the cluster, as well as allowing for their results to come more quickly. If that API also happens to offer a faster solution (which I believe this one does, in general), then there is no reason not to use it.
Your assertion hinges on the idea that most of this software has been written already. I can tell you firmly that unless the past century has produced more than half of the software that will ever run on clusters, this is certainly not true.
I don't think of it that way at all.
Similar design primitives are used for high performance supercomputers as need to be used to run giant dot coms, replicated databases, and any number of other applications that do, or supposedly will, have a significant market. I agree that OS licensing to these users will never match the desktop (by definition, there's no money to be made at a dot com if there are fewer users than dot coms), but introducing the technology here will give them the primitives to move into other markets.
You can't just run your application on a supercomputer and get it to run faster. You have to have an application that is written for the supercomputer. If you save 10 weeks in compute time on the supercomputer, but it took you a year to develop the code, you didn't get a solution faster.
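The arithmetic is worth spelling out. A minimal sketch, where the only numbers taken from above are "10 weeks saved" and "a year to develop". The 12-week baseline run and the 8-week development time with an easier API are hypothetical figures picked for illustration.

```python
def time_to_solution(dev_weeks: float, compute_weeks: float) -> float:
    """Total weeks from starting development to having a result."""
    return dev_weeks + compute_weeks


# Hypothetical baseline: the existing code runs for 12 weeks, no porting needed.
baseline = time_to_solution(dev_weeks=0, compute_weeks=12)

# Port to the supercomputer: saves 10 weeks of compute, costs a year of development.
hard_port = time_to_solution(dev_weeks=52, compute_weeks=2)

# Same compute saving, but assume an easier API cuts development to 8 weeks.
easy_port = time_to_solution(dev_weeks=8, compute_weeks=2)

print(baseline, hard_port, easy_port)  # 12 54 10
```

The port only pays off when development time is smaller than the compute time saved, which is why the programming interface matters as much as the raw speed.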
Actually... that was marketing. Macs were too expensive for home, and too expensive for business, but they managed to take hold in the educational and artistic sectors. They sell style with your iPod, which is why they are able to DRM the whole iTunes bit so it only works with iPods, and still sell the combo like hotcakes. Want proof? Look at Napster. I buy an mp3 from Napster that will work on any device that I want, and it costs me a buck. If I only want to listen at my computer, then I have a subscription service. If I don't want to pay $400 for a Napster brand NapsterPod, well, I don't have to. I can get a number of 3rd party players that work with their upgraded subscription service, and, as I said before, the mp3s I purchase have no DRM on them.
So, why is it that iTunes is making commercials on TV and paying the bills, and Napster is just kind of chugging along?
Let me be brief and blunt.
1) What they are doing is available for other systems (i.e. Linux), in some places, though not at as fundamental a level in the OS. This is something that will have a major impact on performance.
2) What they are doing is fast. By fast, I mean blazingly, outrageously fast. Application level benchmarks will be fast. They will be very very fast.
3) By ease of use, they're not talking just about pretty graphical widgets, they're talking about implementation of distributed computing platforms. There are very few experts in the world on this topic, and even the experts disagree with each other on the finer points. Doing something with the correct algorithm in a distributed system will give you orders of magnitude better performance, just the same as in one that isn't distributed. Of course, most people didn't study distributed computing in college, and even fewer people in industry have (assuming that not all programmers have degrees in computer science, let alone advanced degrees).
I can't express how important the ease of use factor is. What happens when a node goes down? You don't want someone tooling around with your cluster regularly. You want it to sit there and work. When you add stuff to it, you'd like for integration to be very hands off. Want a checkpointing algorithm? Want task migration?
If you're talking about speed per dollar, then you have to account for development cost. We're not talking about Office here. Nobody is developing your environmental simulation for free. If it's faster for your people to work, and they don't have to implement the lower level primitives required to support this functionality, you're going to save money... more than enough for a Windows license or 2, I promise.
Yes, the supercomputer software probably won't be included with the software for PCs... imagine that.
Ditto. It's weird how those numbers seem to go down when you get older.
As an undergrad, most of the people that I met had smoked marijuana; as a grad student, far fewer, but still a good percentage. I don't know that many adults who do, though. I would say that they probably regret it and have developed cognitive dissonance.
Not to encourage teenagers to smoke the stuff, but I think that this is based mostly on cultural norms, and not the actual harmfulness of the stuff. Tobacco and alcohol are much worse for you, but few people deny having ever tried them.
I guess that since about half of all Americans have smoked pot, it's now popular and legal?
Prostitution is 100% legal in Nevada.
It is, but they have plenty of stuff in the API that sounds good.
You get these provider classes that turn whatever file might contain text into a stream of text... sound like grep? Think again... pdfs, word docs... they add new stuff to it. Think "but I could do this with a shell script?" Sure, you can, but someone adds library calls, and they extend your stuff.
Eh, ok. Folks who interned at MS this summer and researchers from MS who've come here to speak say it IS a big deal. Perhaps they're a bit biased, but the lines they pass are pretty good for something that isn't backed up by anything.
Gang bangers are rapping my childhood.
Don't hate the player... hate the game.
Vista is missing a few features from its more ambitious release, but don't discount them yet. If you think that the new functionality is just candy coating, you haven't been watching closely enough.
This one will have Linux playing catch-up.