My wife bought me that one.
Some people take offence when they see it, but (about) half of those understand when I explain that my wife bought it.
(Also Christian and Proud)
"Let him who boasts, boast in the Lord" (1 Corinthians 1:30-31)
According to RFC 2068, 412 "Precondition Failed" looks appropriate (where the precondition is either "the server can take the load" or "your OS is secure" - take your pick).
Personally, I like 303 "See Other" (where "Other" could be "geek friend", "support admin", or "out of the window - there's a whole world out there beyond your computer screen").
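Both codes are still standard HTTP: RFC 2068 has since been superseded (most recently by RFC 9110), and Python's `http.HTTPStatus` enum carries the same reason phrases. A tiny lookup, purely for reference; the `describe` helper is hypothetical:

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return 'code reason-phrase' using the standard library's status registry."""
    return f"{code} {HTTPStatus(code).phrase}"


if __name__ == "__main__":
    for code in (412, 303):
        print(describe(code))
```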
Let's not forget Oracle RAC, either. It's a scalable database, but it requires gigabit interconnects, and 4 machines will saturate those with SGA traffic. It doesn't matter what the OS is: Oracle can't scale above 4 machines because the SGA will fill a gigabit interconnect.
128 Dell/Linux boxes? Great - but you can't do anything useful with them. You might as well settle for 4 Dell boxes, or spend more cash, get 4 E25Ks, and have a truly reliable cluster.
Let's not forget MTTF (mean time to failure) - the more hardware you add to a system, the more frequently you will experience failures. Adding cheap x86 boxes massively reduces your MTTF. Adding quality hardware reduces it too, but adds resilience within each box, so overall uptime increases.
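The first half of that argument can be put in numbers. A minimal sketch, assuming identical nodes whose failures are independent and exponentially distributed (constant failure rate): failure rates then add, so the mean time until *some* node fails is the per-node MTTF divided by the node count. The 50,000-hour figure is made up for illustration, and the model deliberately ignores in-box redundancy, which is the other half of the point above.

```python
def time_to_first_failure(node_mttf_hours: float, nodes: int) -> float:
    """Mean time until any one of `nodes` identical nodes fails.

    Assumes independent, exponentially distributed failures: n nodes each
    failing at rate 1/m fail, as a group, at rate n/m.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return node_mttf_hours / nodes


if __name__ == "__main__":
    # 50,000 hours is a made-up per-node MTTF, purely for illustration.
    for n in (4, 64, 128):
        print(n, "nodes:", time_to_first_failure(50_000, n), "hours")
```

Under those assumptions, 4 nodes see a node failure every 12,500 hours on average, while 128 nodes see one every 390.625 hours. Whether each failure costs uptime then depends on how well the cluster survives losing a node.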
I don't care if I can replace 4 Sun boxes with 64 Linux boxes for half the price - I want uptime, and Sun can provide that, where Dell/Linux cannot.
If you're lucky enough to have a massively-parallel, read-only application, then go for Linux clusters.
Read the Sun Blueprints (http://www.sun.com/blueprints/browsesubject.html#cluster) to see how a real cluster works - one that actually cares about data integrity. That is the crux of clustered systems: what happens if one node "goes mad" even though it's no longer a "valid" part of the cluster?
Look into how Sun deals with failure fencing; it's drastic (PANIC a node if it can't be sure it's a cluster member), but it works.
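The "panic if you can't prove membership" rule can be sketched as a simple majority vote. This is a simplified illustration, not Sun Cluster's actual algorithm; real products typically add quorum disks and storage reservations on top. The `Node` type and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    votes: int = 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    # A strict majority means two disjoint partitions can never both qualify.
    return 2 * reachable_votes > total_votes


def membership_action(me: Node, reachable: list[Node], cluster: list[Node]) -> str:
    """Decide whether this node may keep running after a membership change."""
    total = sum(n.votes for n in cluster)
    # Count our own vote plus every *other* node we can still talk to.
    seen = me.votes + sum(n.votes for n in reachable if n != me)
    if has_quorum(seen, total):
        return "continue"
    # We cannot prove we are still a member: stop before touching shared data.
    return "panic"
```

In a four-node cluster, a node that can still reach two peers (3 of 4 votes) carries on, while one that can reach only a single peer panics. In a 2-2 split both halves panic, which is why real clusters usually add a tie-breaking quorum device.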
By contrast, Linux clustering seems to be at the level of "let's share an IP address so we can balance the load". That's great for DNS (but - oh, DNS has that built in) or read-only Apache servers (assuming no session management and static pages only).
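The reason that style of load balancing only suits static content is easy to see in a sketch of plain round-robin assignment, the idea behind DNS round-robin. This is an illustration, not a model of any particular Linux clustering product; the backend addresses (from the 192.0.2.0/24 documentation range) and the `assign` helper are hypothetical.

```python
from itertools import cycle

# Hypothetical backend addresses (documentation range), for illustration only.
BACKENDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def assign(requests, backends=BACKENDS):
    """Pair each request with the next backend in strict rotation.

    No state is kept about who the client is, so consecutive requests from
    the same user can land on different servers.
    """
    rotation = cycle(backends)
    return [(req, next(rotation)) for req in requests]
```

For example, `assign(["alice:/login", "alice:/cart"])` sends Alice's two requests to different servers. That is harmless for static pages, but a session held in the first server's memory is invisible to the second.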
Digital had an excellent cluster package last decade; Sun seem to be getting to that level now. Linux, sorry to say, is years behind.
I saw a Sun E10K on Friday - the roof had leaked water into it... remarkably, it kept running.
That is a remarkably lucky exception, but in the more likely scenario that CPUs and I/O boards had died, DR (Dynamic Reconfiguration) would still have kept the domains alive and (to the users' eyes) "working", despite what most people would call a catastrophic failure.
I visited /. when it was down - for the first time in a few days.
I got a "503" message.
I'm sure OSDN can afford a simple web server to serve a static page saying "/. is down from xx:xx to yy:yy - please visit after yy:yy", or even describing the reason for the downtime.
Given the international audience, converting these times into other time zones (or at least offering a link to a converter) would also have been a good idea.
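A static page like that takes very little code. The sketch below uses only Python's standard library: it answers every GET with 503 Service Unavailable, a `Retry-After` header, and the expected return time in a few time zones. The outage time, port, zone list and page wording are all hypothetical placeholders, and the fixed UTC offsets ignore daylight saving.

```python
from datetime import datetime, timedelta, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "back online" time; fill in the real one during an outage.
BACK_AT = datetime(2004, 1, 1, 12, 0, tzinfo=timezone.utc)

# Fixed standard-time offsets keep the sketch dependency-free; real code
# should use zoneinfo so that daylight-saving changes are handled.
ZONES = {
    "UTC": timezone.utc,
    "US Pacific": timezone(timedelta(hours=-8)),
    "US Eastern": timezone(timedelta(hours=-5)),
    "Central Europe": timezone(timedelta(hours=1)),
    "Japan": timezone(timedelta(hours=9)),
}


def local_times(instant: datetime) -> dict:
    """Render one timezone-aware datetime as HH:MM in each listed zone."""
    return {name: instant.astimezone(tz).strftime("%H:%M") for name, tz in ZONES.items()}


def maintenance_response(now: datetime, back_at: datetime = BACK_AT):
    """Build (status, headers, body) for a static 'down for maintenance' page."""
    seconds_left = max(0, int((back_at - now).total_seconds()))
    times = "".join(f"<li>{name}: {hhmm}</li>" for name, hhmm in local_times(back_at).items())
    body = (
        "<html><body><h1>/. is down for maintenance</h1>"
        f"<p>Please visit again after:</p><ul>{times}</ul>"
        "</body></html>"
    ).encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
        # 503 plus Retry-After tells browsers and crawlers when to come back.
        "Retry-After": str(seconds_left),
    }
    return 503, headers, body


class MaintenanceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = maintenance_response(datetime.now(timezone.utc))
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for this sketch.
    HTTPServer(("", 8080), MaintenanceHandler).serve_forever()
```

Keeping the page-building logic in a plain function separates it from the server loop, so the page can be checked without opening a socket.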
If I got half the hits that Slashdot gets, I'd post at least this much detail during an outage. As it was, my impression was "/. has been cracked", "/. has been misconfigured" or "/. is run by a bunch of incompetent fools".
Looks like the third analysis was correct.
Given that you have the bandwidth, it seems fair to assume that even a single-CPU box with a gigabit card could serve enough static pages to give everyone who hits the site an informative message.
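A back-of-envelope check of that claim, assuming the network link is the only bottleneck. The 2,000-byte response size is a hypothetical figure for a short status page plus HTTP headers, and the calculation ignores TCP/IP framing and connection setup, so real throughput will be lower.

```python
def pages_per_second(link_bits_per_second: float, page_bytes: int) -> float:
    """Upper bound on static pages per second when the network link is the only limit.

    Ignores TCP/IP framing, connection setup and retransmissions, so the real
    figure will be lower.
    """
    bytes_per_second = link_bits_per_second / 8
    return bytes_per_second / page_bytes


if __name__ == "__main__":
    # 1 Gbit/s link and a hypothetical 2,000-byte response (headers + HTML).
    print(pages_per_second(1e9, 2_000))
```

With those assumptions the link allows about 62,500 pages per second; even if overheads cut that by an order of magnitude, it is still thousands of visitors a second.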
Maybe next time, things will be done with a hint of professionalism? That's what your readers strive for, after all.
Anything less gives the impression that "the Linux folks don't really care about uptime, professionalism, presentation, image, or all the other things that corporates care about".
Just my 2p,
From these photos, I'd say it'd give them nightmares!
We start to dream around 18 months old (apparently... I've no idea how anybody can claim to know this, but that's another story!)
FWIW, my 23-month-old daughter would destroy it - she's already worked out how to crash a Linux laptop while it's running a screensaver - if I could replicate it, I'd open a bug report!
Good for you, though it's probably not a great idea to name customers on a public forum like this. Advertising "JPL have exactly 3 open ports on their firewall" isn't the kind of thing they'd necessarily choose to air in public.
Just my 2p,
This could safely allow MUCH higher traffic density on freeways
No it wouldn't.
You're assuming human control - I agree 100% (99 computers + one human driver == 1 pileup). Given that assumption, the generally accepted "safe distance" for a human driving at 70 mph is 2 seconds; most drivers leave a much smaller gap to the car in front. (Leave a gap that big, and someone will fill it.)
Your idea would potentially allow for a safer road, but it would certainly result in much lower traffic density.
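The density point follows from simple arithmetic. A minimal sketch, assuming the whole 2-second headway is the spacing between successive vehicles (vehicle length ignored) and that every driver actually keeps it; the function names are hypothetical.

```python
MPH_TO_M_PER_S = 1609.344 / 3600  # exact: 1 mile = 1609.344 m


def gap_metres(speed_mph: float, headway_s: float) -> float:
    """Distance covered during the time gap to the vehicle in front."""
    return speed_mph * MPH_TO_M_PER_S * headway_s


def lane_capacity_per_hour(headway_s: float) -> float:
    """Maximum vehicles per lane per hour if every driver keeps this time gap."""
    return 3600 / headway_s
```

At 70 mph a 2-second gap is about 62.6 m, and a lane where everyone keeps it carries at most 1,800 vehicles an hour; halving the gap to 1 second would double that to 3,600. So enforcing the safe distance caps density well below what tailgating traffic achieves today.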
And that's an excuse?
If you can't turn around quickly, you can't safely manoeuvre out of the way of an emergency vehicle - and that's just one example.
Some health reasons are reasons why you shouldn't drive, not why you should be excused.
CD's eventually took over because they worked with every CD player no matter what, no exceptions.
Yeah, that was the problem vinyl had...
CDs took over because they stopped selling vinyl.
The "protected" WMA, whatever, is low quality, but if you're listening through PC speakers, not a HiFi, what does it matter?
What's the alternative? Download 64/128 kbps MP3s and listen to those on your PC?
Quality can't be an issue for people who download from P2P networks - most of what's there seems to be below CD standard (CD quality takes too long to download).
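The "takes too long" point is easy to quantify. Uncompressed CD audio runs at 44,100 samples per second, 16 bits, two channels; the 4-minute track and 512 kbit/s connection below are hypothetical, and the sketch ignores protocol overhead and lossless compression.

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbit/s


def download_minutes(track_minutes: float, bitrate_kbps: float, link_kbps: float) -> float:
    """Minutes needed to fetch a track at a given bitrate over a given link.

    Assumes the link runs flat out with no protocol overhead.
    """
    size_kbits = track_minutes * 60 * bitrate_kbps
    return size_kbits / link_kbps / 60


if __name__ == "__main__":
    # Hypothetical 4-minute track over a hypothetical 512 kbit/s connection.
    for label, rate in (("CD (uncompressed)", CD_BITRATE_KBPS), ("128 kbps MP3", 128)):
        print(f"{label}: {download_minutes(4, rate, 512):.1f} min")
```

Under those assumptions the CD-quality track takes about 11 minutes against 1 minute for the 128 kbps MP3, roughly an 11:1 difference.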
All that you mention should be tested and fixed in later milestones; they may be acceptable for an interim milestone.
That's where MS let themselves down: "Can it f*ck up the Registry?" is never asked - if it gets through the testing process without reproducibly doing so, it goes out of the door.
Unexpected by the developers - found by users and/or QA process. Such bugs can be opened.
"Final testing" - that's why milestones exist - so that issues are found before *final* testing, and fixed ASAP.
Open bugs? Or closed as invalid?
I'd lodge a bug against it as "Doesn't include any SCSI drivers" for a cheque.
The point is - that's not a *valid* bug for the state of TeX. Maybe if, in future development, TeX were to evolve into an entire OS, that would be a valid bug. But it's not a valid bug against *this milestone* of TeX.
That's the difference.
Bugs filed against Firefox 0.7 should not apply to 0.8; similarly, bugs against 0.8 should not apply to 0.9. That doesn't mean that 0.7 or 0.8 had no bugs, just that those bugs were expected behaviour for that milestone.
I suppose, to be fair to everyone, it means "free of bug reports"... expected bugs at this stage would either not be filed, or closed/postponed as "expected behaviour".
Unexpected bugs for this milestone should not be open.
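The triage rule described above can be sketched as a tiny model: a bug stays open only if it concerns something the current milestone is supposed to deliver. This is a hypothetical illustration, not the schema of any real bug tracker; `Milestone`, `Bug` and `triage` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class Milestone:
    name: str
    # Features this milestone promises to deliver (hypothetical model).
    in_scope: set = field(default_factory=set)


@dataclass
class Bug:
    summary: str
    feature: str


def triage(bug: Bug, milestone: Milestone) -> str:
    """Keep a bug open only if it concerns something this milestone promises."""
    if bug.feature in milestone.in_scope:
        return "open"
    return "postponed: expected behaviour for " + milestone.name
```

With `Milestone("TeX", {"typesetting", "hyphenation"})`, a "Doesn't include any SCSI drivers" bug filed against feature `"scsi"` is postponed, while a kerning bug against `"typesetting"` stays open.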
Team interdependency is also a powerful motivational force.
This depends massively on the team in question and the way the project is managed; "powerful" could be replaced with "negative".
Beware of a guy in a room:
A more healthy pattern is that of the true innovator who is truly designing something great, but who has no personal resources left over for anything but the work at hand. Every ounce of psychological, emotional and intellectual energy is being consumed in the work itself.
As an employee, I'd prefer to have some resources left over for, oh, let's say, trivia - my real life, for example. Staff with no life don't tend to last long.
When Slipping, Don't Fall
Make sure that each individual who has a role in the slip receives the needed guidance and support.
In a context of extreme openness about failures, this makes my Subject: line a lie
"Portability is for Canoes" says a lot about the Microsoft attitude.
In the "Ship mode" section, he comments:
Management must lead the team to ship mode by entering ship mode first. That is, superfluous management hoo-ha is eliminated, the manager's awareness of detail climbs, fire-drills and other de-prioritizing activities are eliminated entirely and tremendous focus is brought to bear.
Is this some strange usage of the phrase "fire-drill" I was not previously aware of, or is he seriously suggesting that shipping a piece of software on time is more important than staff safety?
In the same section:
Stabilization of the product is the principle goal
This seems to be the principal goal in the design phase of good software; making it the principal goal only in the final stage may explain the state of some Microsoft software.
Overall, these 21 points and the MSF itself look like very realistic documents, built on the lessons a company as large as MS has had the chance to learn. They are certainly not to be dismissed just because of the source.
After the 16 quarters, NT4's exploits increased dramatically for a short while, while Solaris 2.5.1's decreased. So we have no evidence about closed source here. Take-up of NT 3.51 and Solaris 2.5 was presumably similar, given that both were effectively "fix releases" of new products: NT 3.51 was pretty stable, and NT4 basically added a Win95 skin; Solaris 2.5 was Sun's move from a BSD kernel to a SysV kernel, and 2.5.1 contained the fixes needed after such a dramatic move.
In that sense, maybe a comparison of NT 3.51 vs Solaris 2.5.1 would be fairer, or NT 4 vs Solaris 2.6? But that's a digression.
After those 4 years, we have no data for the chosen F/OSS OSes, so there's no open/closed comparison available here.
What we do have is data about two 8-year-old OSes and two 4-year-old OSes. All they have in common is a little bit of BSD code; in most other respects they're very diverse.
What can we expect to learn from such a comparison?
I find figure 8 very troubling - NT4 and Solaris 2.5 were 1996 products (see page 9), whereas the FreeBSD and Linux variants were 2000 releases.
Taking the first 4 years (16 quarters) of each product, there's no significant difference. The first 2 look really bad; FreeBSD looks a bit better, and RH7.0 looks the worst.
Compare the first 16 quarters of each and they look almost identical (yes, WinNT is starting to climb towards a high peak, but cutting off at 16 quarters hides that).
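Comparing products released years apart means re-indexing each series by quarters since release rather than by calendar date. A minimal sketch of that alignment step; the `(year, quarter) -> count` layout is a hypothetical data format, and no figures from the paper are reproduced here.

```python
def first_quarters(counts: dict, release: tuple, n: int = 16) -> list:
    """Counts for the first n quarters after release, indexed from release quarter 1.

    counts maps (year, quarter) -> count. Quarters with no data come back as
    None rather than 0, so gaps are not mistaken for zero.
    """
    year, quarter = release
    aligned = []
    for _ in range(n):
        aligned.append(counts.get((year, quarter)))
        # Step to the next calendar quarter, rolling over at Q4.
        year, quarter = (year + 1, 1) if quarter == 4 else (year, quarter + 1)
    return aligned
```

Aligning a 1996 product and a 2000 product this way puts both on the same "quarter 1 to quarter 16" axis, which is exactly the window in which the curves look alike, and which also cuts off NT's later peak.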
It is not clear:
Why this article was written
What (if anything) it might have been "trying" to show (possibly it's a genuine research project - in which case, why compare two 8-year-old proprietary OSes with two 4-year-old F/OSS OSes?)
How security researchers can better spend their time
How great the world would be if nobody had ever checked code for security issues (until "The" internet worm of 1988, we lived in such a utopia)
You must imagine teams of developers waiting for a bug report, who all steam in as soon as a bug is found: identify, fix, create an RPM, test extensively with every combination of supported hardware and software, confirm with beta customers, and then post it on the FTP site. All done on day zero.
I wish it were always possible - often it is, with trivial patches, and sometimes it's even done.
In reality, the zero-day patch will normally need updating after further testing finds that the original patch wasn't quite right.
I'm not saying there's anything wrong with that - issuing a patch means admitting a problem with the original code. But issuing patches immediately necessarily carries the risk that they haven't been fully tested.
And not forgetting - the difference between "e.g." and "i.e.". I'd explain it, but - oh, ask Google.
I think you'd find that the EULA would come first ;-)
If JumpStart is your friend, JET (http://www.sun.com/bigadmin/content/jet/) is your best buddy!
Xenophobia?
Everything runs on something, it's just a question of how far down the stack you take it.
Exception to prove the rule
Get a grip.
Groklaw started out great - strong but unbiased criticism of events.
Can we have another of those, please?
I don't have the time / resources / technical and legal expertise for it, but maybe somebody in this big ol' world does?
Slamming corporations for being corporations is just dull. Slamming them for actually doing something wrong is a democratic responsibility.