The London Stock Exchange Goes Down For Whole Day
Colin Smith writes "TradElect, the Microsoft .Net based trading platform for the London Stock Exchange, was offline for about seven hours, meaning that their 5-nines SLAs are shot for approximately the next 100 years. The TradElect system was launched back in June of 2007 and was designed for increased speed and system capacity."
So what happens when this happens again?
Ignore this signature. By order.
Perhaps the bit you're missing is that Windows isn't quite as bad as the /. crowd likes to say it is. Especially if it's an older (translation: fixed & stable) variety like Win2k or even NT4.
A) Yes, in fact, it is quite that bad (just not as bad as when it was first released) and
B) There is no "fixed and stable" version of .NET yet. At least none I would hinge my mission critical business on.
"A person is smart. People are dumb, panicky dangerous animals and you know it." - K
The LSE going down is a big deal. The US exchanges have been trying very hard to displace LSE's stronghold in the EUROPEAN markets. With the mergers of NYSE/Euronext and NASDAQ/OMX, this cuts market share and faith in LSE as every day passes. Additionally, with continued tech issues, NASDAQ could reinvigorate their bid for LSE again! I work for a major data vendor, and I know from experience the NYSE and NASDAQ are much more reliable than their European counterparts. Also, LSE going down today is huge, considering the news on Fannie/Freddie, WAMU, Lehman, and the WRONG news on United Airlines. Many arbitrage opportunities were lost for LSE traders.
Even if MS is able to make Windows good at what it is and generally reliable, what it is is not a high-SLA platform intended for mission critical systems, so there's really no excuse. I don't think NSA/CIA/DoD would say, "The security model of Windows isn't quite as bad as the /. crowd likes to say it is. Sure, we haven't reviewed it, but the IT guy says it will help us leverage synergy to effect better ROI."
You've seen the first scene of "four weddings and a funeral", surely?
The Johannesburg Stock Exchange, which uses the LSE's trading platform TradElect, also suspended trading.
Hmm. Smells like a new version to me.
Get your own free personal location tracker
Windows does suck; building any mission-critical system on a fundamentally botched foundation is begging for trouble, and knowing that TradElect was built on quicksand is prima facie evidence of negligence. IOW, it probably failed because Windows sucks, not the other way around.
Let me explain computers to you
Let me explain stock exchanges to you: if they go down during a trading day, a lot of people lose a lot of money. In years past, this kind of work was typically done on Tandem, Stratus or IBM systems which were so reliable that any unscheduled reboot merited a visit from the factory.
BTW, I've worked on trading systems for Salomon Brothers, Phibro Energy, JP Morgan, and UBS/Warburg. If anyone had suggested running mission-critical back-office apps (like the system of record of a major stock exchange) on Windows, they would have been laughed out of the room. I'm astounded that the LSE could be so sloppy.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
What the hell are you smoking??!? I worked for one of the top switch/router manufacturers for 7 years and this is FAR from true in their shop and pretty much everyone else's. Talk to any technical call-center rep for the top 5 or so router manufacturers and I am sure they can tell you many horror stories...
Is that this is good news for most /. readers. A lot of big corporations are being reminded of how important it can be not to cut corners when hiring programmers and IT.
"The ability to delude yourself may be an important survival tool" - Jane Wagner -
Perhaps .NET is not directly at fault here.
Interestingly, the main reason .NET (specifically, C#) was being adopted in our company is that "it gives access to far cheaper programmers than your legacy C++ types."
I guess programmers are not only more expensive because they have a "legacy language" on their resume after all.
You can talk up System z all you want, but when it comes right down to it, most outage problems are caused by incompetence, not hardware failure. Because of this, I've actually seen a Win2K based system beat z/OS based systems a few years in a row. It frequently has little to do with the hardware, or even the OS.
Yep, I remembered and laughed so hard I had to put the images next to each other:
http://tipotheday.com/2008/09/08/microsofts-foot-in-mouth-london-stock-exchange/
Technology tips and tricks.
Mainframes, AS/400 - pricey but high reliability with 2-3 decades of baseline.
PC's / Windows - cheap, "good enough" for home use and light business use, new as hell and subject to unknown problems- and with a known history of issues too.
I use a PC daily... but there is no way I would put any 99% uptime application on it where huge amounts of money or lives were at stake. It's fine as a client.
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
but it does have a lot to do with the development. They chucked the old, stable (but "obsolete" and slow) systems for something shiny and new. In this case, Windows and .NET 1.1 written by a consultancy with Indian developers.
I doubt reliability really factored much into it, using the newest coolest stuff came first. Possibly for marketing reasons. Remember this was 4 years ago, .NET had pretty much just come out (we ignore v1.0 which was practically a preview release), you know it couldn't have been as good as the MS marketing man said it was.
Follow that logic rigorously and all mission-critical systems would be running on abacus.
You have hit the nail right on the head.
I'd like to suggest another factor: Stability-conscious developers -- those that know about race conditions, memory leaks, atomic transactions, and the like -- tend to gravitate towards operating systems that make it easy to put their ideas into practice.
That isn't to say Windows is inherently unstable, it just means that it is more difficult to write a stable and reliable application on that platform. And even if you think you got all your bases covered, you can still get blindsided by depending on poorly written code churned out by some .NET developer who was happy enough to ship something that appeared to work most of the time.
The good developers then shrug and say Windows is not suitable for critical computing, and go back to UNIX-ish platforms or whatever they are more comfortable with. Rushing into that void are legions of Windows developers who are also happy enough to ship something that appeared to work most of the time, and the cycle continues.
Interesting since they haven't been "running on Microsoft technologies" for "the past six years"...
Modding me -1 troll doesn't make me wrong.
Ok, so here's the tally I've seen so far:
- LSE today (7 hours downtime)
- Ho Chi Minh City stock exchange (3 days downtime)
- Brazil futures: BM&F, Aug 26, 2008, and Bovespa, Nov 30, 2007.
that I've heard of.
It's incredible! This looks systemic and widespread.
I guess it's a great marketing achievement for Microsoft.
When will people in the financial sector wake up and learn they've been duped?
Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts
In business, it generally means that the solution provider (software + hardware) bears direct responsibility for all unplanned downtime.
If the solution cannot provide such service availability, the solution provider has to be ready to cover all the damages. And it is often planned that way from day one: some downtime is covered by the "5 nines", some is covered monetarily by the solution provider.
That's why 5 nines solutions cost as much as they do: on one side, to allow providers to bring the quality of the solution to the desired level; on the other, in case of emergency, to let them cover some downtime with money.
But covering seven(!) hours(!!) can be lethal to the solution provider. Then again, it all depends on their support contract. Some (cheaper) 5 nines solutions are delivered without any guarantees: they are only theoretically 5 nines and provide only "best effort" service availability.
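For a rough sense of scale, the downtime budget behind "5 nines" can be worked out in a few lines. This is only a sketch: it assumes a 365-day year measured around the clock (a real SLA may count only trading hours or exclude planned maintenance), and the function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year, measured 24/7

def allowed_downtime_minutes(availability):
    """Yearly downtime budget, in minutes, for an availability fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60

def years_of_budget_consumed(outage_hours, availability):
    """How many years of downtime budget a single outage uses up."""
    return outage_hours * 60 / allowed_downtime_minutes(availability)

budget = allowed_downtime_minutes(0.99999)
years = years_of_budget_consumed(7, 0.99999)
print(f"5 nines allows about {budget:.2f} minutes of downtime per year")
print(f"a 7-hour outage uses about {years:.0f} years of that budget")
```

Under those assumptions the budget is a bit over five minutes a year, so a seven-hour outage eats on the order of 80 years of it, which is in the same ballpark as the summary's "approximately the next 100 years".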
All hope abandon ye who enter here.
But it says a lot that NYSE runs on Linux and not Microsoft. It seems SOMEONE did listen to the techies.
Not really. I work for an IT sales company. There are plenty of times when we try to sell Linux solutions to our customers and management is all for it but it gets shot down by the "techies" because they want a Windows solution because that is what they are used to.
Haha. Having worked for a Super-Platinum-Alpha MS partner before I'll tell you how it works:
You pay your enormous Partner fee every year. Occasionally MS will send someone out with a powerpoint presentation on a 6 month old MSDN article. If you are doing one of these projects, MS will never ever touch it, or help in any way, beyond paid support for a particular product (like anyone else can pay for). They will put an article in MSDN Magazine like "Microsoft and Fagware collaborate to make Some Awesome System". There will be a photo, with the MS rep shaking the clients hand, with your boss half cropped out. They will mention a whole bunch of MS tech that you (probably didnt) use.
It's not like they ever see a single design doc, let alone a line of code. They don't do a damn thing beyond telling everyone how they are collaborating with you to build Awesome-X Plus for Important Client.
Also you get to put "Gold Partner" on your website, and MS occasionally refers clients that need someone to implement stuff.
3laws: No freebies, no backsies, GTFO.
Typical MS hosted services aim for 99.5%. That means about 44 hours of downtime per year. Don't ask how I know. (Posting anonymously, of course.)
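The "about 44 hours" figure can be checked with a quick calculation. A minimal sketch, assuming a 365-day year and availability measured around the clock; the function name is hypothetical, and the other targets in the loop are there only for comparison.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year, measured 24/7

def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year permitted by an availability target."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR

# Compare the 99.5% target against the usual "nines" ladder.
for target in (99.5, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours/year")
```

For 99.5% this works out to 43.8 hours a year, which matches the number quoted above.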
WTF did a moderator mark this as flamebait? The poster was right, HA is a) hard and b) expensive.
I designed some of the HA stuff many years ago for Eurex. We used OpenVMS and had two clusters (over 40 km apart) for main and standby, with the standby system also being used for development. With the flick of a switch, the standby cluster could take over in production. We had no SANs in those days but used Digital's Hierarchical Storage Controllers. These days it runs with SANs, but the host systems still run VMS and there are now product-specific clusters.
The next level down there are access points containing communications servers providing connectivity to member systems and routing to the hosts which are scattered around the globe. A member normally has connectivity to two access points. The only single point of failure for a member is where both lines come together for the last few metres into their building and some idiot digs a hole in the road.
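The effect of two access points, and of that shared last stretch of cable, can be sketched numerically. This is an illustration only: it assumes the two lines fail independently, and the availability figures are invented for the example rather than taken from Eurex.

```python
def redundant_availability(single, copies):
    """Availability of N independent redundant paths (any one suffices)."""
    return 1.0 - (1.0 - single) ** copies

def with_shared_segment(single, copies, shared):
    """Redundant paths in series with one shared, non-redundant segment."""
    return shared * redundant_availability(single, copies)

link = 0.999     # hypothetical availability of one line to one access point
shared = 0.9999  # hypothetical availability of the shared last few metres

print(f"one line:            {link:.6f}")
print(f"two lines:           {redundant_availability(link, 2):.6f}")
print(f"two lines + shared:  {with_shared_segment(link, 2, shared):.6f}")
```

However good the redundant part gets, the total can never exceed the availability of the shared segment, which is why the hole in the road is the thing to worry about.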
See my journal, I write things there
I don't believe that this was a technical glitch.
I think people should investigate whether or not certain parties benefited, financially or otherwise, from such a prolonged suspension of trading. I think analyzing the trades that most rapidly transitioned to alternative trading exchanges, and had the highest volumes or dollars, would be very interesting and revealing...
This news story makes me think of a fictional "singularity" themed story I read yesterday:
http://www.ssec.wisc.edu/~billh/g/mcnrsts.html
Is it happening?
MOST of your systems, that says it all. If you design a system you must be a little bit more certain than that "most" of it will be up for the availability you are promising.
I've got a consumer HD that has so far lasted for 6 years, non-stop. By your logic, therefore, consumer HDs are fine for a server environment because they last for 6 years running 24/7.
If 100% of your systems run at 99.999% availability, THEN you can come back. The acceptable error rate is 0.001% FOR ALL YOUR SYSTEMS. Let's say that you have 10 2003 systems and MOST is 9, then you've only got 90% availability. You fail, back to the drawing board.
It is hard to get true HA. That MS can't do it doesn't mean you can't use it for little projects. I have seen many a successful project launch on PCs thrown together at a local computer store and shoved into a rack. Worked fine, but only an idiot would guarantee any availability on it. Prove me wrong: put in your contracts that you guarantee HA for your clients. See how your boss/lawyer reacts.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.