The London Stock Exchange Goes Down For Whole Day
Colin Smith writes "TradElect, the Microsoft .Net based trading platform for the London Stock Exchange, was offline for about seven hours, meaning that their 5-nines SLAs are shot for approximately the next 100 years. The TradElect system was launched back in June of 2007 and was designed for increased speed and system capacity."
...now if only my wife would do that! /rimshot!
Most of the American stock exchanges have been going down all year.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 is the magic number.
Assuming 8.5 hour trading day (0700-1530) and 250 trading days/year. Maybe a squirrel caused the problem ... ;-)
Hulk SMASH Celiac Disease
But Patch Tuesday is tomorrow?
Looks like someone needs to brush up on their buzzwords, specifically "mission critical" and "services no longer required".
"As God is my witness, I thought turkeys could fly." A. Carlson
Since when is 7 hours even close to "a whole day"? Maybe you meant "almost a whole business day"?
It's a whole trading day--and that's all that really matters when it comes to a major market.
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
I wish people would get into the habit of linking to the single page version of the FA.
The summary implies that TradElect was responsible for the shutdown, but according to the stock exchange itself, it wasn't the case. They say instead it was a network problem.
Oh, she does... just not with you.
nudge nudge, wink wink.
Dedicated Cthulhu Cultist since 4523 BC.
"and was designed for increased speed and system capacity"
and see - it went down far faster and more completely than the previous system would have been able to. So that's progress. It's all in how you present it.
So their 9.9999% uptime is screwed?
proud caffeine whore
Perhaps the bit you're missing is that Windows isn't quite as bad as the /. crowd likes to say it is. Especially if it's an older (translation: fixed & stable) variety like Win2k or even NT4.
I'm not sure if you're serious or not, but surely you aren't trying to compare NT4 uptime with the 5 9s of a solid System z platform?
Oh please. Persuasive marketers can get Windows installed just about anywhere including US war ships.
While it is commonly accepted by many techies (and strongly denied by others) that Microsoft Windows is not a suitable platform for that level of computing, sales people often bypass the techies who know better and sell to managers and executives who still believe "you can't get fired for using Microsoft."
With all this said, it will be quite some time (if ever) before we know for certain what the root cause of the failure was. You can be sure that Microsoft is all over this problem both technically and P.R.-wise. They won't let the facts get out if they are damaging. Recall the major power outage that many still believe was caused by a worm attacking Microsoft servers? As far as I can see, the true cause of that failure has yet to be revealed.
But if this was a planned event, or an unplanned disaster resulting from a planned event gone bad (updates, upgrade, other maintenance), you would think they would have provided for mishaps in some way or another.
But as this news story is all I have to go on, there is no indication of cause and so I will not presume this is a Microsoft problem. But it says a lot that NYSE runs on Linux and not Microsoft. It seems SOMEONE did listen to the techies.
After the malfunction, TradElect was immediately bought by the UK government for $200 billion and all its debts waived. In an unrelated story, Medicare tax was raised yet again because of an unexpected shortfall.
Does anyone else remember the "The London Stock Exchange chose Windows 2003 for reliability, they didn't choose Linux" ad banners that used to run all over the place, including Slashdot if I remember?
Funny how it's all come crashing down...
"The London Stock Exchange chose Windows, but after 7 hours of downtime wishes they had chosen Linux".
The same thing that happened this time?
He's getting rather old, but he's a good mouse.
When I worked in academia I used to collaborate on a research project with a data architect from one of the major electronic exchanges. His whole shop is MS and .NET. I asked him why he didn't run Linux / Unix. He said that with competent guys the MS boxes had great uptime. Wall Street can afford to pay the top salaries so they attract guys who really know their stuff. Not just semi-competent people who managed to sit through an MCSE exam. [his words not mine]
Also he said support was crucial for his company. If something went down, he wanted to be able to call someone immediately. He couldn't afford to just post a question on a message board and hope someone replies. He wanted contracts with 3rd party support that had experience with similar huge enterprise systems that he had.
When I said there were companies who could provide excellent Linux support, he said his ass was on the line if something broke so he wanted to be able to justify his software choice to the C-level guys. And those guys knew the name Microsoft. So he didn't see anything else as an option.
So what happens when this happens again?
Well, first "Have you tried turning it off and on again?"
Otherwise, "Are you sure it's plugged in?"
GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
Wait! Are you suggesting that downtime can be caused by application problems, network problems, hardware problems, dumbass systems administrators and a whole slew of other things completely unrelated to the platform on which it is running?
I am *shocked*! *Shocked* I tell you!
Actually this is "again".
The LSE used to run on HP-NonStop (w/ Cobol and C as far as I can find) but still managed to take itself down for 8 hours in 2000.
If they're going to go down for a day every 7-8 years it might as well be cheaper and faster. (Articles quote the CTO as citing 10x performance increases).
(All based on a quick Google search)
So before the hounds descend upon Microsoft, it would seem the LSE has a history of managing to bring down whatever system they run on.
Followed by the youngest member of the team becoming the scapegoat and being fired.
-Ours is the wisdom of Solomon, the magic of Merlyn, the fall of Icaris.
The LSE going down is a big deal. The US exchanges have been trying very hard to displace LSE's stronghold in the EUROPEAN markets. With the mergers of NYSE/Euronext and NASDAQ/OMX, this cuts market share and faith in LSE as every day passes. Additionally, with continued tech issues, NASDAQ could reinvigorate their bid for LSE! I work for a major data vendor, and I know from experience the NYSE and NASDAQ are much more reliable than their European counterparts. Also, LSE going down today is huge, considering the news on Fannie/Freddie, WAMU, Lehman, and the WRONG news on United Airlines. Many arbitrage opportunities were lost for LSE traders.
Which from the sounds of this article http://www.computerweekly.com/Articles/2008/06/12/231031/agile-trading-software-critical-to-london-stock-exchange.htm was the intent.
One very interesting note is at the end of the article:
Timeline for Tradelect upgrades
18 June 2007: Tradelect launched, reducing the time taken to process trades from 140 milliseconds to 10 milliseconds. Capacity increased from 593 to 2,500 orders a second.
November 2007: Version 2 upgrade. Trading time reduced from 10 milliseconds to about 6 milliseconds. Capacity increased by 70% from 2,500 to 4,200 orders a second. Introduced full suite of MiFID-compliant services.
September 2008: Planned migration of Italian trades to Tradelect platform.
September 2008: Tradelect Version 2 to launch. Plans to double trading capacity to 10,000 continuous messages per second. Aims to cut average time taken to complete a trade by half from 6 milliseconds to 3 milliseconds.
Coincidence that this month was when they intended to release a new version?
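For anyone who wants to sanity-check the figures in that timeline, here is a rough sketch in Python. The numbers are copied from the quoted article; the variable and function names are made up for illustration. Note that the planned 10,000 figure is quoted in "messages" while the earlier ones are in "orders", so the last ratio may not be an apples-to-apples comparison.

```python
# Figures copied from the quoted timeline; names are illustrative only.
milestones = [
    ("June 2007 launch", 593, 2500),         # orders per second, before -> after
    ("November 2007 upgrade", 2500, 4200),   # orders per second, before -> after
]


def percent_increase(before, after):
    """Percentage growth from `before` to `after`."""
    return (after - before) / before * 100.0


increases = {label: percent_increase(b, a) for label, b, a in milestones}

# Latency went from 140 ms to 10 ms at launch.
latency_speedup = 140 / 10

# Planned 10,000 messages/s versus the 4,200 orders/s already reached.
# Assumption: treating messages and orders as comparable, which the
# article does not actually say.
planned_ratio = 10_000 / 4_200

for label, pct in increases.items():
    print(f"{label}: {pct:.0f}% more capacity")
print(f"Launch latency speedup: {latency_speedup:.0f}x")
print(f"Planned capacity vs. current: {planned_ratio:.2f}x")
```

By this arithmetic the November 2007 jump is 68%, which the article rounds to 70%, and the planned target would be a bit more than double if the two units really are comparable.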
5 nines does not mean what you think it means.
No, you're right. By my calculation, the actual figure is more like 360 years.
(Remember, this is a system that only operates 7.5 hours per day, 250 days per year)
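Since the thread keeps arguing over the number of years, here is a minimal sketch of the arithmetic in Python. It assumes a 7 hour outage (the summary says "about seven hours") and a 99.999% availability target; the function name and the 24x365 comparison case are my own additions, not anything from the LSE's actual SLA.

```python
def years_to_absorb_outage(outage_hours, hours_per_day, days_per_year,
                           availability=0.99999):
    """Years of otherwise perfect operation needed before a single outage
    fits inside the downtime budget implied by `availability`."""
    scheduled_hours_per_year = hours_per_day * days_per_year
    allowed_downtime_per_year = scheduled_hours_per_year * (1.0 - availability)
    return outage_hours / allowed_downtime_per_year


# Trading hours only: 7.5 hours/day, 250 days/year (figures from the comment above).
trading_only = years_to_absorb_outage(7.0, 7.5, 250)

# Assumption for comparison: the SLA is measured around the clock instead.
around_the_clock = years_to_absorb_outage(7.0, 24.0, 365)

print(f"Trading hours only: about {trading_only:.0f} years")
print(f"24x365 measurement: about {around_the_clock:.0f} years")
```

Under these assumptions the trading-hours case comes out at roughly 373 years and the 24x365 case at roughly 80 years, so the "100 years" in the summary looks closer to an around-the-clock calculation, and a figure near 360 would follow from a slightly shorter outage.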
Well, that gives a new meaning to opening Windows to Dungeon Dimensions.
Ignore this signature. By order.
In other words, he used the "no one ever got fired for buying IBM" defense.
No, he'd waggle his arse.
A fanny would be a vagina in Britain.
Come on +5 informative!
Oh, ye of lesser cynicism. I also, long ago, used to believe that language features could improve software reliability. Nowadays the idea just makes me cackle -- in actuality the universe just invents better idiots.
- "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'
The article here blames it on some sort of botched upgrade.
I couldn't disagree more. Although automatic garbage collection is nice, this doesn't mean that you'll get "five nines uptime" systems by working with "less experienced" coders.
If you're building a system that must guarantee 99.999% uptime, you wait until your best professionals become available, because it doesn't only involve code. You DON'T give the job to the less experienced ones, no matter how great the programming language. Five nines uptime requires a very robust design and very solid code quality running on a very solid platform which is running on a very solid OS on a very solid infrastructure. You'll want everything to be tested by unit tests, integration tests, regression tests, and whatnot. That involves a whole lot more than 'just' coders, but whoever works on it, they better be good at it.
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
Nah.
They'll be back at "5 nines" by next week.
The trick is to either redefine what the term means (so they are actually referring to 9.9999% uptime), or the timeframe ("we've been at '5 nines' for the whole year", said Jan 1 2009), or both ("so, we use 1 day as a data point, then if we've been up for any part of that day, we're good... so we've always operated at '5 nines' reliability").
This space for rent. All reasonable inquiries will be entertained at proprietor's discretion.
Here: http://www.londonstockexchange.com/en-gb/products/membershiptrading/tradingservices/Incident/LIVE
Notice that there were several unsuccessful attempts to bring it back up.
What's really pitiful is that LSE has just a fraction of the data/trade volume of major US exchanges like Nasdaq or NYSE and still, their systems are regularly getting hosed, albeit not as badly as today's meltdown.
Hopefully in coming years LSE will lose market share to Nasdaq/Europe, BATS/Europe, Chi-X and other electronic markets - that should teach them well.
Well, the Reuters article does say that trading started normally, but some traders were unable to connect, so the whole exchange was brought down to avoid unfair advantage/disadvantage occurring, so actually both stories are consistent.
Interesting since they haven't been "running on Microsoft technologies" for "the past six years"...
Modding me -1 troll doesn't make me wrong.
That was their first mistake. What were they thinking? You need three highly available Unix clusters with three SANs. You need three to elect a quorum. If you don't know what a quorum is, you shouldn't be attempting to design a system that is supposed to deliver on a 5-nine SLA. Each geographic location should include 1 cluster and 1 SAN. All three locations networked with dark fiber. Fiber routing should be set up so that a cluster can fail over to a SAN in another location. As far as hardware is concerned, I would go with a cluster of IBM P6-570 and use an EMC Symmetrix DMX SAN at each site. .Net trading platform... I have to laugh! Microsoft .Net = 5.none SLA! .Net is only good for people who would like to create a light duty website. Under a load it breaks. The London Stock Exchange proves my point.
Who the heck designed this?
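For anyone wondering what the "you need three to elect a quorum" point means in practice, here is a minimal sketch of a majority-quorum check in Python. The site names and function names are hypothetical, and real cluster software does a great deal more (fencing, heartbeats, tie-breakers); this only shows why two sites cannot settle a split while three can.

```python
def has_quorum(reachable_sites, total_sites):
    """True when strictly more than half of all sites can see each other."""
    return reachable_sites > total_sites // 2


def surviving_partition(partitions, total_sites):
    """Return the one partition allowed to keep running, or None if no
    partition holds a strict majority."""
    for partition in partitions:
        if has_quorum(len(partition), total_sites):
            return partition
    return None


# Three sites (hypothetical names): one link failure isolates site C.
three_site_split = [{"A", "B"}, {"C"}]
print(surviving_partition(three_site_split, total_sites=3))

# Two sites: after a split, neither side has a majority, so nothing
# can safely keep running without risking split-brain.
two_site_split = [{"A"}, {"B"}]
print(surviving_partition(two_site_split, total_sites=2))
```

With three sites the pair that can still talk to each other keeps the service up and the isolated site stands down; with two sites each half sees exactly 50% and has to stop.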
What.. what's a wife?
WIFE: Specialized form of WIFI, indicating one of two stations engaged in a (semi-)permanent point-to-point link, the other station typically called HUSBAND. Unsecured transmission often leads to packet loss 9 months after initial association, resulting in long-term elevated QoS requirements. Roaming is usually forbidden by link protocol, although experiments with mesh networks have been reported. DOS attacks often lead to severed links, litigation and possibly material and financial damages.
The Hacker's Guide To The Kernel: Don't panic()!
I have a feeling that the 'normal' IT situation was to blame for this.
Preamble: Technical Expertise provided a wonderful architecture that was HA and robust, fast, and scalable.
Bean Counters looked at the cost and said "You Tech guys spend too much money."
IT architects: "How much is your data worth?"
Bean Counters: "Not this much. Look we don't really need all of these systems. My home system has been working for 4 years with no problems. And I've talked with Microsoft Execs and they will cut us a deal for their platform. Now go away, I've just decided how the architecture will be done. Why did we hire you anyways?"
There are no loopholes. It's either legal or it's not.