The London Stock Exchange Goes Down For Whole Day
Colin Smith writes "TradElect, the Microsoft .Net based trading platform for the London Stock Exchange, was offline for about seven hours, meaning that their 5-nines SLAs are shot for approximately the next 100 years. The TradElect system was launched back in June of 2007 and was designed for increased speed and system capacity."
Assuming an 8.5-hour trading day (0700-1530) and 250 trading days a year. Maybe a squirrel caused the problem ... ;-)
It was an ugly day of finger-pointing and near-fixes, but in the end it just left all the financial firms standing there staring at the Exchange. It was definitely a big deal, and a lot of volume seemed to spill over to US markets, creating volume-related issues here.
I wish people would get into the habit of linking to the single page version of the FA.
The summary implies that TradElect was responsible for the shutdown, but according to the stock exchange itself, that wasn't the case. The exchange says it was a network problem instead.
"5-nines SLA"
I had to look this up, so I imagine other people didn't know it either (I thought it was a stock exchange term). The first Google search result reveals the answer:
The Battle With "3 Nines" and The Goal of "5 Nines"
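To put numbers on the term, here is a minimal Python sketch that converts a count of nines into the downtime it allows per year. It assumes availability is measured over a round-the-clock, 365-day year; an SLA that only counts scheduled trading hours has a proportionally smaller budget. The function name is hypothetical.

```python
# Sketch: downtime budget implied by "N nines" of availability.
# Assumption: availability is measured over a 24x7, 365-day year.

HOURS_PER_YEAR = 24 * 365


def downtime_minutes_per_year(nines: int) -> float:
    """Minutes of outage per year allowed by an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return HOURS_PER_YEAR * 60 * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        availability = 100 * (1 - 10 ** -n)
        print(f"{availability:.3f}% -> {downtime_minutes_per_year(n):.2f} min/year")
```

On that assumption, three nines allows roughly 8.8 hours of downtime a year, while five nines allows only about 5.3 minutes.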
When I worked in academia, I collaborated on a research project with a data architect from one of the major electronic exchanges. His whole shop is MS and .NET. I asked him why he didn't run Linux/Unix. He said that with competent guys the MS boxes had great uptime. Wall Street can afford to pay top salaries, so it attracts guys who really know their stuff, not just semi-competent people who managed to sit through an MCSE exam. [his words, not mine]
Also, he said support was crucial for his company. If something went down, he wanted to be able to call someone immediately. He couldn't afford to just post a question on a message board and hope someone replied. He wanted contracts with third-party support firms that had experience with huge enterprise systems similar to his.
When I said there were companies that could provide excellent Linux support, he said his ass was on the line if something broke, so he wanted to be able to justify his software choice to the C-level guys. And those guys knew the name Microsoft. So he didn't see anything else as an option.
Which from the sounds of this article http://www.computerweekly.com/Articles/2008/06/12/231031/agile-trading-software-critical-to-london-stock-exchange.htm was the intent.
One very interesting note is at the end of the article:
Timeline for Tradelect upgrades
18 June 2007: Tradelect launched, reducing the time taken to process trades from 140 milliseconds to 10 milliseconds. Capacity increased from 593 to 2,500 orders a second.
November 2007: Version 2 upgrade. Trading time reduced from 10 milliseconds to about 6 milliseconds. Capacity increased by 70% from 2,500 to 4,200 orders a second. Introduced full suite of Mifid-compliant services.
September 2008: Planned migration of Italian trades to Tradelect platform.
September 2008: Tradelect Version 2 to launch. Plans to double trading capacity to 10,000 continuous messages per second. Aims to cut average time taken to complete a trade by half from 6 milliseconds to 3 milliseconds.
Coincidence that this month was when they intended to release a new version?
5 nines does not mean what you think it means.
No, you're right. By my calculation, the actual figure is more like 360 years.
(Remember, this is a system that only operates 7.5 hours per day, 250 days per year)
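The competing figures come down to which hours the SLA counts. A minimal Python sketch of the arithmetic, assuming the outage lasted exactly 7 hours (the summary only says "about seven") and that five nines is applied to scheduled hours only; the function name is hypothetical:

```python
# Sketch: how many years of five-nines downtime allowance one outage consumes.
# Assumptions: a 7-hour outage, and availability measured over scheduled
# hours only (hours_per_day * days_per_year).

FIVE_NINES = 0.99999


def years_to_absorb(outage_hours: float, hours_per_day: float,
                    days_per_year: int, availability: float = FIVE_NINES) -> float:
    """Years of downtime allowance needed to cover a single outage."""
    scheduled_hours = hours_per_day * days_per_year
    allowed_downtime_hours = scheduled_hours * (1 - availability)
    return outage_hours / allowed_downtime_hours


if __name__ == "__main__":
    scenarios = {
        "24x7, 365 days": (24, 365),
        "8.5 h/day, 250 days": (8.5, 250),
        "7.5 h/day, 250 days": (7.5, 250),
    }
    for label, (hours, days) in scenarios.items():
        print(f"{label}: {years_to_absorb(7, hours, days):.0f} years")
```

Measured against a 24x7 calendar, the outage eats about 80 years of allowance, close to the summary's "100 years". Counting only trading hours gives about 329 years at 8.5 hours a day and about 373 years at 7.5 hours a day, in the same ballpark as the 360-year figure.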
No, he'd waggle his arse .
A fanny would be a vagina in Britain.
Come on +5 informative!
"In the past six years, there have been no production outages at the London Stock Exchange, and the new systems running on Microsoft technologies are critical to maintaining this 100 per cent reliability record."
http://www.microsoft.com/casestudies/casestudy.aspx?casestudyid=200042
The article here blames it on some sort of botched upgrade.
Leaky abstractions (look it up, it's a good read). With kitchen-sink platforms like .Net and Java, you often get burned by bugs buried in the underlying platform. If too many of these systems are stacked, it becomes really difficult to achieve any stability.
Of course it is very unlikely that MS achieves five 9s on any installation, let alone as an average.
Engineering is the art of compromise.
Here: http://www.londonstockexchange.com/en-gb/products/membershiptrading/tradingservices/Incident/LIVE
Notice that there were several unsuccessful attempts to bring it back up.
What's really pitiful is that the LSE handles just a fraction of the data/trade volume of major US exchanges like Nasdaq or NYSE, and still its systems regularly get hosed, albeit not as badly as in today's meltdown.
Hopefully in the coming years the LSE will lose market share to Nasdaq/Europe, BATS/Europe, Chi-X and other electronic markets; that should teach them a lesson.
I'm not sure I understand the distinction you're trying to draw,
Latency versus throughput. If the new system processed those serially while the old one could handle 130 in parallel, the old system would have roughly 10x the throughput even though the new one had roughly a tenth of the latency.
but total transaction capacity of the system increased along the same lines.
Yes, after throwing massive amounts of hardware at the problem.
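The latency-versus-throughput point can be made precise with Little's law (this framing is an addition, not the commenter's): the average number of orders in flight equals throughput times latency. A minimal Python sketch, treating the published Tradelect figures as steady-state averages, which they almost certainly are not exactly; the function names are hypothetical:

```python
# Sketch: Little's law, L = lambda * W, relating concurrency (L),
# throughput (lambda, orders per second) and latency (W, seconds).
# Assumption: steady state; a real matching engine is not this simple.


def throughput(in_flight: float, latency_s: float) -> float:
    """Orders completed per second with `in_flight` orders in progress."""
    return in_flight / latency_s


def implied_in_flight(throughput_per_s: float, latency_s: float) -> float:
    """Average number of orders in progress at a given throughput and latency."""
    return throughput_per_s * latency_s


if __name__ == "__main__":
    # The hypothetical above: old system 130-way parallel at 140 ms,
    # new system strictly serial at 10 ms.
    old = throughput(130, 0.140)
    new = throughput(1, 0.010)
    print(f"old: {old:.0f}/s, new: {new:.0f}/s, ratio {old / new:.1f}x")

    # The published figures: 593 orders/s at 140 ms vs 2,500 orders/s at 10 ms.
    print(f"implied in-flight, old: {implied_in_flight(593, 0.140):.0f}")
    print(f"implied in-flight, new: {implied_in_flight(2500, 0.010):.0f}")
```

Taken at face value, the published numbers imply about 83 orders in progress at once on the old system and about 25 on the new one, so the capacity gain would have come from lower latency rather than from more parallelism.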
Dewey, what part of this looks like authorities should be involved?
No, but I can point to the New York Stock Exchange, which uses AIX and Linux.
Well, the Reuters article does say that trading started normally, but some traders were unable to connect, so the whole exchange was brought down to avoid any unfair advantage or disadvantage. So both stories are actually consistent.
Certain languages have features that eliminate large classes of errors. Whilst it's possible that programmers will find other ways to screw up, I'd have thought that reducing the set of errors that are actually possible would go some way to improving reliability.
With a general purpose programming language, the number of ways to screw up is effectively infinite. If you take another infinite set, say, the integers, and eliminate a large subset, say the even integers, you still have an infinite set left over. The GP is simply pointing out that there will always be programmers who screw up in ways that haven't been eliminated.
That was their first mistake. What were they thinking? You need three highly available Unix clusters with three SANs. You need three to elect a quorum. If you don't know what a quorum is, you shouldn't be attempting to design a system that is supposed to deliver on a 5-nines SLA. Each geographic location should include one cluster and one SAN, with all three locations networked over dark fiber. Fiber routing should be set up so that a cluster can fail over to a SAN in another location. As far as hardware is concerned, I would go with a cluster of IBM P6-570s and use an EMC Symmetrix DMX SAN at each site. A .Net trading platform... I have to laugh! Microsoft .net = 5.none SLA! .Net is only good for people who want to create a light-duty website. Under load it breaks. The London Stock Exchange proves my point.
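Why three sites rather than two comes down to majority voting. A minimal Python sketch, assuming one vote per site and no separate tie-breaker (witness) node; the function names are hypothetical:

```python
# Sketch: majority quorum across sites. A partition may keep running only if
# it can see a strict majority of the voting sites. With three sites, losing
# one still leaves a majority; with two, a dead peer and a cut link look the
# same, and neither side alone has a majority.
# Assumption: one vote per site, no witness/tie-breaker node.


def quorum_size(total_sites: int) -> int:
    """Smallest strict majority of `total_sites` votes."""
    return total_sites // 2 + 1


def has_quorum(visible_sites: int, total_sites: int) -> bool:
    """True if a partition seeing `visible_sites` sites may continue."""
    return visible_sites >= quorum_size(total_sites)


if __name__ == "__main__":
    for total in (2, 3):
        for visible in range(1, total + 1):
            print(f"{visible}/{total} sites visible -> quorum: {has_quorum(visible, total)}")
```

With two sites, any single failure leaves the survivor without quorum, so it must either stop or risk split-brain; with three, any two sites that can still see each other carry on.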
Who the heck designed this?
I've been the lead architect and/or senior programmer on a couple of futures and options exchange trading platforms (CBOT's Order Routing System, CBOE's CBOEdirect) and pay the bills currently by connecting firms to various electronic trading platforms. Hardly anyone uses Microsoft except for GUI/user-facing applications. The back-end stuff is almost always UNIX/Linux.
Off the top of my head, I know that all the LiffeConnect-based systems (London International Financial Futures and Options Exchange, Euronext, Amsterdam, CBOT Metals Complex, Tokyo Futures Exchange, probably a couple of others) run on Linux (a relatively recent change from Sun boxen). NYSE now owns that codebase, and I'm pretty sure that the NYSE uses Linux and AIX on its own platform.
The Chicago Mercantile Exchange's GLOBEX trading platform (running CME, CBOT non-Metals, NYMEX, plus a couple of smaller exchanges like Minneapolis and Kansas City) runs on Linux. They migrated from Solaris to Red Hat back in 2004.
The Intercontinental Exchange's WebICE platform is written in Java and I believe it's running on Linux, but there may be some Solaris still around.
The CBOEdirect system is Java but runs mostly on Sun Enterprise hardware. There is some Linux in the mix, and they certainly use it on some of their other trading systems.
In the (futures and options) trading world, running on Windows servers is considered to be a sure sign of being bush-league. Demand for UNIX/Linux is huge. And I'm not saying this as a Java/UNIX/Linux snob - most of the systems I've written were Microsoft-based (for a variety of reasons - most started out as technology demonstrations that grew way beyond their intended lifespan - "the client's always right").
I work in London as a freelancer in IT in Investment Banking. My professional experience was mostly with IT Products/Services companies.
Although I haven't worked at the LSE, the places I've worked at around here left me with the impression that most people in IT in this industry are amateurs (and that includes those in other geographical locations).
More advanced IT practices such as technical analysis, software/hardware architecture and iterative software development processes are pretty much either not done at all or done by people who don't have a clue what they're doing.
I'm hardly surprised by what happened at the LSE.