Message Storm Knocks NYSE Offline
ninjee writes "The New York Stock Exchange is re-examining its network after it was forced to close four minutes early at 3:56pm on Wednesday (1 June) because of a communications glitch. Trading opened on time (09:30 EDT) the following morning but the outage irked traders and raised questions about the reliability of a network described as 'ultra reliable' following improvements made in the wake of the September 11 terrorist attacks. The outage stemmed from a fault in a system designed to distribute market data and operate computer trading systems. NYSE Chief Executive John Thain said that both the main system and its backup were swamped with error messages, Reuters reports. He added that the exchange would carry out remedial work designed to prevent any repetition of the problem."
They will begin beating the squirrels at precisely 3:55 EST from now on.
"Rocky Rococo, at your cervix!"
Immediately claimed the message storm to be the work of linux hackers
Was there any Linux role here? If not someone should be fired.
Oh well, it's only money.
Now that we've discussed this to death for the last six days on NANOG and similar lists, Slashdot and its various clueless idiots can have at it.
Go!
I blame MS, Gates and Ballmer. If they were 100% open source this would never happen.
That the stock market rallied after GM announced 25,000 job cuts. Looks like high unemployment to drive down the cost of labour is the new in thing.
Oh and btw Slashdot, nobody can read your f**king anti-script image half the time. Good job discriminating against those with less than perfect eyesight...which is half your audience.
Man, this is messed. Sooner or later, somebody will probably blame Windows or whatever excuse they can think of ATM. Also, I don't think it'll be just squirrels that will be beaten at 3:55. XDXDXD
It is "ultra reliable" but you've got to remember the number of hits this site takes a day... it makes /. trolling look like a fairy godmother!
I still can't believe Parkay's not butter.
I as well as many others in my office got royally screwed here, getting stuck with quite sizeable unhedged positions overnight. It's bad enough that order routing went down, but they failed to open up for a final print (as originally proposed) later in the afternoon. Very bad.
How embarrassing. The NYSE should switch to OS X, a REAL OS for REAL computing.
Why not just say, "Unsinkable" instead?
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
1. If anything can go wrong, it will. (see Murphy's law)
2. Systems in general work poorly or not at all.
3. Complicated systems seldom exceed five percent efficiency.
4. In complex systems, malfunction and even total non-function may not be detectable for long periods (if ever).
5. A system can fail in an infinite number of ways.
6. Systems tend to grow, and as they grow, they encroach.
7. As systems grow in complexity, they tend to oppose their stated function.
8. As systems grow in size, they tend to lose basic functions.
9. The larger the system, the less the variety in the product.
10. The larger the system, the narrower and more specialized the interfaces between individual elements.
11. Control of a system is exercised by the element with the greatest variety of behavioral responses.
12. Loose systems last longer and work better.
13. Complex systems exhibit complex and unexpected behaviors.
14. Colossal systems foster colossal errors.
-KISS
It's called "leaving early from work".....everyone does it.
I haven't got the time or I would look myself - does anyone have any more informative sources on the specific information about the cause of the problem? And WTF is a "Message Storm"? God - another catchphrase - great!
1f u c4n r34d th1s u r34lly n33d t0 g37 l41d Capitalization really works: i helped my uncle jack off a horse
They had to hire outside *nix coders when the in-house MS crew couldn't integrate the existing WinLAN into the (unsupported, shortsighted) Linux rollout last month.
It sounds like a distributed systems failure, alright.
Here is something about the system that might have broken. I'm wondering if the thing that failed really is the thing mentioned here -- the stuff Birman did. His new book on distributed systems is out, by the way.
Someone will get flying ninja-kicked in the nuts for this, you can be sure.
http://www.thebricktestament.com/the_law/when_to_
inquiring minds want to know .. ..
~darkness_falls
Seriously, who cares? No system is ever perfect. It's better that it happens now and gets fixed than sometime down the road where it can cause serious problems.
...of a wikipedia text. (You didn't follow the terms of the GNU Free Documentation License.)
That doesn't mean it will never get dirty.
-Randy
I was performing page move vandalism on their website!
Do you play with your Willy?
JoeTrader: dood, chk out MSFT, 12m volume :O
XyxyZ: wtf i sold on margin
-- NASDUCK has entered the channel
JoeTrader: rofl!
NASDUCK: whatsup?
JoeTrader: sam sold msft on margin before the spike
NASDUCK: HAHAHA!
JoeTrader: werd
XyxyZ: screw you guys
JoeTrader: OMG roflrofldolololo!!!!!
NASDUCK: you are such a tool, sam
JoeTrader: brb, gotta tell the office
-SYSTEM- JoeTrader has left the channel (sam in a tool)
-SYSTEM-:NASDUCK has changed the subject to "XyxyZ sold MSFT before the spike today!!!:D:D:D"
XyxyZ: fu duck. i hope my boss isn't online
XyxyZ: ops
XyxyZ: +ops
-SYSTEM- Hot2Trade has joined the channel
NASDUCK: nice try, only way to erase that is to crash the server
Hot2Trade: Sam, I heard that you got the horns of the bull shoved up where the bear don't shine
XyxyZ: dude this sux hard
-SYSTEM- JOHN@MLYNCH has joined the channel
NASDUCK: nice one Hot2Trade. asl?
Hot2Trade: fu hippy, this is Jerry in at prudential
NASDUCK: fuc sorry, didn't recognize you
XyxyZ: So if I can down the server, I can erase the subject?
Hot2Trade: no worries I just changed my nic
NASDUCK: XyxyZ, you got pwned by the bull
JOHN@MLYNCH: SAM! HAHAHA I TOLDYOU NOT TO SELL!
JOHN@MLYNCH: YOU AER
JOHN@MLYNCH: SUCH A SP
XyxyZ: i got s cript
JOHN@MLYNCH: AZZZ!!!!!!!!!!!!!!!!!!!
XyxyZ: take this bitches
XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
XyxyZ: THE C THE R THE I THE M THE I THE N THE A THE L
- SYSTEM - NASDUCK (quit(connection reset by peer))
- SYSTEM - JOHN@MLYNCH (quit(connection reset by peer))
- SYSTEM - Hot2Trade (quit(connection reset by peer))
- SYSTEM - error(91) - rebooting
as a trading engine developer/support guy for a financial firm in ny, i can't stress enough what a pain in the ass this was. the day after the nyse crash, it took hours upon hours of verifying (by hand) trades that the nyse says we were filled on that we never saw (because all nyse trading lines were down).
this type of 'message flood' occurs from time to time, but not on the nyse in a while. it's generally the ecns trading otc stocks that have rogue programs blast orders in an infinite loop. when this happens to an ecn, they slow down but generally don't lose the ability to trade. the nyse, which touts the importance of their rapists^H^H^H^H^H^H^Hspecialists because they add 'stability' to the system, was dead in the water. this crash goes to show how useless the specialists really are - without the technology working, they can do nothing. if this is the case, why not just replace them altogether with electronic trade matching?
interestingly enough, the nyse announced mere months ago that they are 'merging' with archipelago - a large ecn. perhaps this merger will be the beginning of the end of the specialists.
Wave upon wave of demented avengers March cheerfully out of obscurity into the dream
"the main system and its backup were swamped with error messages, Reuters reports"
Which is kinda funny, since it was *probably* a Reuters feed that was spewing the errors in the first place....
At least this didn't happen at the beginning of the trading day, the very last thing we need is strain on the economy...
I wonder if NYSE uses TIBCO Rendezvous for their message transport "bus". My work uses this software, and our usage has stressed it to extremes; you can end up with message storm issues.
FYI this system is a multicast-based publish-subscribe system. The multicast thing tends to be a wash IMHO, especially since many people use it for queues, rather than true 1 to many messaging.
This was actually most likely the result of a multicast or "slow consumer" storm. In a multicast network environment, often desktops are overloaded by all of the filtering they must perform (multicast sends nearly everything to everybody). Sometimes some desktops will miss a packet and ask for a retransmission. Often, this involves retransmitting in multicast-form - that is to all of the consumers. If this happens too many times, you get a storm. No matter what the NYSE does (unless they buy our technology - email for info if you'd like), they will not be able to avoid this happening in the future. It is a condition of middleware systems that were not meant to handle the volumes of modern stock markets, rather than a simple glitch in the network infrastructure.
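To make the feedback loop described above concrete, here is a toy model in Python. It is only a sketch under loud assumptions, not a description of the NYSE's actual middleware: every name and number in it (simulate_storm, the capacities, the tick model) is hypothetical. It assumes each consumer can process a fixed number of packets per tick, drops whatever exceeds that, asks for every dropped packet again, and that each retransmission is multicast to all consumers with no de-duplication.

```python
def simulate_storm(num_consumers, fast_capacity, slow_capacity, base_rate, ticks):
    """Toy model of a multicast retransmission ("slow consumer") storm.

    Assumptions (all hypothetical, for illustration only):
    - one consumer is slow, the rest share the same higher capacity;
    - the publisher multicasts base_rate fresh packets every tick;
    - a consumer drops whatever exceeds its capacity and requests
      a retransmission of each dropped packet;
    - every retransmission is multicast to ALL consumers on the next tick.

    Returns the per-consumer load (packets offered) for each tick.
    """
    capacities = [slow_capacity] + [fast_capacity] * (num_consumers - 1)
    retransmits = 0  # packets re-sent to everybody on the current tick
    loads = []
    for _ in range(ticks):
        load = base_rate + retransmits  # every consumer sees the same multicast traffic
        loads.append(load)
        # Each overloaded consumer drops its excess and asks for it again.
        retransmits = sum(max(0, load - cap) for cap in capacities)
    return loads


if __name__ == "__main__":
    # One slow desktop among ten: load creeps up, then takes off once
    # the "fast" consumers start dropping packets as well.
    print(simulate_storm(10, 150, 90, 100, 8))
    # Nobody is overloaded: load stays flat.
    print(simulate_storm(10, 150, 100, 100, 5))
```

In this model a single slow consumer only adds a slow linear creep at first. The storm starts at the tick where the retransmissions push the healthy consumers past their own capacity, because from then on every consumer is generating retransmission requests that are sent to every other consumer.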
... that they use linux. =P
Source
NYSE is an IBM shop, using DB2 and WebSphere. Its competitor, NASDAQ, is using a Microsoft solution. Not a good week to be IBM.
Have you ever been to a turkish prison?
"The outage stemmed from a fault in a system designed to distribute market data and operate computer trading systems."
I think they are using TIBCO for their data messaging bus.
. . . the reliability of a network described as "ultra reliable" . .
The use of the word "network" doesn't seem to fit. They won't be calling Cisco.
The Dude
PHB, about to leave for vacation, configures an automatic out-of-office response to any incoming e-mail. Then, not sure that he's done it correctly, he decides to send himself an e-mail...
That message storm brought down our network.
Somewhere, in a secret underground lair wallpapered with 100 dollar bills, Dick Grasso is laughing maniacally.
I work in Technology for a Wall Street firm (you've heard of them). Stuff like this happens all the time -- systems go down and are usually back up pretty quickly, some route to some exchange will bounce for a few min. This time it was worse in that it affected NYSE and not one of the smaller exchanges at the end of the trading day. If you look at any graph showing trading volumes, the last few minutes of trading are always the heaviest.
99.9% of the time, things bounce back very quickly and with the exception of a few internal emails, nobody cares, things go on.
T-Shirt:
"I built NYSE IT backbone and all I got was this lousy t-shirt"
CV (Some poor teenager applying for underpaid software QA position):
Work Experience:
- Leading developer in development of mission critical heavily distributed and absolutely fault tolerant system that can handle 20m transactions per second (NYSE).
I told them to throw away all those hubs and upgrade to a nice set of Layer-3 switches, no one ever listens to me though...
Get your Unix fortune now!
NYSE Chief Executive John Thain said that both the main system and its backup were swamped with error messages, Reuters reports. He added that the exchange would carry out remedial work designed to prevent any repetition of the problem.
No, the remedial work is designed to cull out less adaptive problems, thus preparing the digital ecosystem for the emergence of tougher problems.
-kgj
you should only drink natural breast milk
Except, of course, that we're not talking about the public website here, but some of the trading systems themselves. A little bit of a difference there.
Yes those systems are not IBM's... they are SIAC.
NYSE/OS has searched your drive for under-used and overly-large files and generated the following report.
Under-used
----------
c:\tickerlogs\SUNW.log 942 bytes
Over-sized
----------
c:\logs\NYSEerrors.log 543 Gb
Well?
...is that I could actually see that being exactly what happened and even what was said...
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
I didn't realise "Ultra Reliable" meant "Totally Reliable". Mistakes will occur, people need to realise that error-proof systems do not and most likely will not exist.
I seem to remember the repetition of error messages as common in these high reliability systems.
I think the same sort of thing happened with the US power grid shutdown?
Time to put in something to cap the rate of error messages.
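One way to picture that kind of cap is a fixed-window rate limiter sitting in front of whatever emits the error messages. The Python below is a minimal sketch, assuming a hypothetical ErrorRateLimiter class and made-up limits. Nothing here is known about how the exchange's systems actually report errors.

```python
class ErrorRateLimiter:
    """Fixed-window limiter: at most max_per_window messages per window.

    Hypothetical helper for illustration. Timestamps are passed in
    explicitly (in seconds) so the behaviour is easy to reason about.
    """

    def __init__(self, max_per_window, window_seconds):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.current_window = None  # index of the window we are counting in
        self.count = 0              # messages let through in that window
        self.suppressed = 0         # total messages dropped so far

    def allow(self, now):
        """Return True if a message at time `now` may be sent."""
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # A new window has started: reset the per-window counter.
            self.current_window = window
            self.count = 0
        if self.count < self.max_per_window:
            self.count += 1
            return True
        self.suppressed += 1
        return False


if __name__ == "__main__":
    limiter = ErrorRateLimiter(max_per_window=3, window_seconds=1.0)
    for t in [0.0, 0.1, 0.2, 0.3, 0.4, 1.0]:
        print(t, limiter.allow(t))
    print("suppressed:", limiter.suppressed)
```

The suppressed counter matters as much as the cap: a system that silently throws errors away hides the fault, so a real deployment would presumably report "N similar messages suppressed" rather than nothing at all.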
A blog I run for the wealth
SELL SCOXE
I am the unwilling control for my Origin.
I'm not 100% current on my obscure references, but this one completely escapes me. Any help?