JPMorgan Rolls Out (Another) FPGA Supercomputer
An anonymous reader writes "JPMorgan is expanding its use of dataflow supercomputers to speed up more of its fixed income trading operations. Earlier this year, the bank revealed how it reduced the time it took to run an end-of-day risk calculation from eight hours down to just 238 seconds. The new dataflow supercomputer, in which the chips are tailored to perform specific, bespoke tasks (as explained in this Wall Street Journal article), will be equivalent to more than 12,000 conventional x86 cores, providing 128 teraflops of performance."
So they can project how much money to borrow from the Federal Government the next time they have lent beyond sane limits to property speculators or invested in schemes even Mandelbrot wouldn't be able to simulate.
A feeling of having made the same mistake before: Deja Foobar
238 seconds = 3 minutes 58 seconds, for all you people who don't really think in seconds when seconds is > 60.
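For anyone who wants to check the arithmetic, here is a minimal sketch using only the two figures from the summary (eight hours before, 238 seconds after). The function names are made up for illustration.

```python
def to_minutes_seconds(total_seconds):
    """Split a duration in seconds into (minutes, seconds)."""
    return divmod(total_seconds, 60)


def speedup(old_seconds, new_seconds):
    """Ratio of the old run time to the new run time."""
    return old_seconds / new_seconds


minutes, seconds = to_minutes_seconds(238)
print(f"238 s = {minutes} min {seconds} s")

# Eight hours expressed in seconds, compared with the new 238 s run.
print(f"speedup is roughly {speedup(8 * 3600, 238):.0f}x")
```

So the reported improvement works out to a bit over two orders of magnitude.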
How much is your data worth? Back it up now.
These banks aren't just siphoning money, they are also siphoning talent away from more important projects. The people working on these things could be brilliant physicists or engineers, if they weren't sucked into the dark side.
This is not really news.
Nope. OpenCL 1.2 was released recently. That's news.
;)
But it's certainly popular to blame banks with computers for everything. It's been happening for as long as I can remember.
As a boy being forced to attend church, I remember sermons on the evil computer somewhere in a room, nicknamed "The Beast," calculating everything everyone did. Boy were those people wrong. Turns out the computer was named google.
Using FPGAs instead of x86 would probably consume significantly less electricity. Using manpower is good in that many people get paid for their efforts, instead of a few CEOs just pocketing the money as extra bonuses.
This story got me thinking that many of the tasks routinely executed on personal computers (perhaps cryptography, video decoding, and such) may benefit from including an FPGA in PCs to serve as a programmable coprocessor. Much like graphics-intensive software can come with shader code to offload processing to the GPU, couldn't a video codec or an implementation of SSL or whatever come with code that would allow an FPGA to do part of the work?
I googled around and found that at least CERN has done something of the sort, but that was over seven years ago. There was a story on Slashdot about something of this sort, but it's even older than the CERN publication. Is anyone working on this sort of idea? If not, why? Is it simply a matter of cost, or is there some other issue that makes this impractical?
Maybe I just suck at googling...
I spent two great years working for J. P. Morgan Chase, starting in 1999, followed by a year with the merged JPMC, so I have some knowledge of how this new system will be used and how it fits into the business process of running a bank. I can't discuss details about that, but I just wanted to share my congratulations with the JPMC team for tackling that thorny issue.
You have to understand that investments can't be made until those risk analyses are done, so cutting 7-8 hours off the run time will earn the company millions over the course of a year. We're talking about the kind of investment loans where even a 4-5 hour overnight "float" of capital to help someone seal a bigger deal can be worth a significant amount of interest and profit.
Remember: the big investment banks are dealing with numbers that cause spreadsheets to overflow. You can't even visualize the data with standard desktop tools. You wouldn't believe the totals I saw come out of some reports, and I wish I could forget them. Such numbers are not meant for the grasp of mere humans living on a working wage.
I do not fail; I succeed at finding out what does not work.
Now JP Morgan can raid future MF Globals all that much faster, while hiding their shenanigans at the COMEX.
How much money are they spending on manpower, electricity and consumables to calculate risk? How about making a supercomputer to figure out how to solve the world's debt?
Everyone knows the answer to that question already. Learn to live with fewer resources for each person, or figure out how to have fewer people.
ALU * clock is a meaningless measure.
Nowhere did I mention clock speeds. I mentioned TFlops. Any TFlop numbers from the manufacturers are most likely best-case theoretical hardware performance numbers, based on ALU * clock.
As for whether they are meaningful or meaningless, that depends on your point of view.
If you are a kid playing video games that only cares about frame rates, then perhaps it is meaningless to you.
If you are a developer looking at hardware, who can't yet benchmark the actual application, then it is as meaningful a measure as can be estimated.
I don't know any competent developer or systems integrator who thinks they will get 100% of the manufacturer's advertised performance out of anything once they execute their own software. Please feel free to downgrade the manufacturer's stated performance to suit your personal situation.
It is an FPGA supercomputer; don't you need to reconfigure it first?
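To make the "ALU * clock" point concrete, here is a rough sketch of how a theoretical peak figure is usually derived and then derated. The only numbers taken from the story are 128 teraflops and 12,000 cores; the ALU count, clock rate, flops per cycle, and 50% efficiency below are hypothetical placeholders, not anything the manufacturer published.

```python
def peak_flops(alus, clock_hz, flops_per_cycle=1):
    """Best-case theoretical throughput: every ALU busy on every cycle."""
    return alus * clock_hz * flops_per_cycle


def derated_flops(peak, efficiency):
    """Downgrade a peak figure by an assumed achievable fraction (0..1)."""
    return peak * efficiency


# Hypothetical chip: 1,000 ALUs at 2 GHz, 2 flops per ALU per cycle.
peak = peak_flops(1_000, 2_000_000_000, flops_per_cycle=2)
print(f"theoretical peak: {peak / 1e12:.1f} TFLOPS")

# Assumed 50% efficiency once real software runs on it.
print(f"derated estimate: {derated_flops(peak, 0.5) / 1e12:.1f} TFLOPS")

# Figures from the story summary: 128 TFLOPS stated as equivalent to
# (more than) 12,000 x86 cores, so at most about this much per core.
per_core = 128e12 / 12_000
print(f"implied per-core figure: {per_core / 1e9:.2f} GFLOPS")
```

The derating factor is the part each developer has to pick for their own workload, which is the whole argument above.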
GENERATION 25: The first time you see this, copy it into your sig on any forum and add 1 to the generation.
More importantly, it makes it impossible for anyone to dispute their results. After all, no one else has exactly the same system so no one is better qualified to evaluate their conclusions. "No, we made the best possible choice at the time. You would know that if you had OUR analytic engine, but since you don't your speculations are baseless."
The reason we subjugate ourselves to law is to better procure justice. If law does not accomplish this purpose then it m
Let's face it, JP is doing some Bitcoin mining on the side with this hardware! lol. Of course, that would be less of a scam than what they really do with it!
*sigh*
The problem the risk analysis team faced even in the 2000 era was such a tough nut to crack that they had to limit the complexity of the algorithms they used just because there wasn't hardware powerful enough.
All the things you mentioned have only added to that complexity, making the calculations that much worse and that much more expensive.
So instead of making me change my mind, you just made me realize how much more impressive their achievement was than I first thought.
Spreadsheets back then did not have arbitrary precision decimal or integer values -- they used floating point. I have no idea whether newer spreadsheets have shifted to arbitrary precision values or not, but if they haven't, the spreadsheets blow up.
Remember: little companies like GE, GM, and Exxon are the kind of customers who have deposits with an investment bank. As a result, numbers like total deposits held blow floating point values out of the water by a significant margin. The amount of money floating around the world really does generate some stunning sequences of digits. They're almost magically long numbers, like the nth digit of Pi. They just don't register as "billions" or "trillions" automatically; you have to count the digits and think a moment about what that number is supposed to be called. :D
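A small sketch of what "blowing up" can look like, assuming the spreadsheet stores values as IEEE 754 double-precision floats (which is what Python's float is). Whether any particular spreadsheet of that era behaved exactly this way is an assumption; the amounts below are invented for illustration.

```python
from decimal import Decimal

# Above 2**53, a 64-bit float can no longer represent every integer.
FLOAT_EXACT_LIMIT = 2 ** 53


def float_sum(balance, delta):
    """Add two amounts using binary floating point."""
    return float(balance) + float(delta)


def decimal_sum(balance, delta):
    """Add two amounts given as decimal strings, using decimal arithmetic."""
    return Decimal(balance) + Decimal(delta)


big = float(FLOAT_EXACT_LIMIT)
# Adding one unit is silently lost: the result rounds back to the same float.
print(float_sum(big, 1) == big)

# A hypothetical 18-digit total plus one cent: the cent vanishes in float...
print(float_sum("100000000000000000.00", "0.01") == 1e17)

# ...but survives in decimal arithmetic.
print(decimal_sum("100000000000000000.00", "0.01"))
```

Doubles carry roughly 15 to 16 significant decimal digits, so the trouble starts once totals plus cents need more digits than that.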
What a waste. These banks can build the most impressive hardware in the world, perform calculations in the petaflops, and still have absolutely no clue what risk is involved in the business they're doing because their assumptions and data are all wrong. If by some freak accident they were to get the right answer, they would conceal it from their clients and investors anyway because their incentive is to take big risks - they get enormous rewards if they are right and lose little if they are wrong. They are incompetent and amoral, which is simply not a technical problem.
Typically, you write a testbench that can, in fact, printf() (sort of). Basically, you end up running a timing-level simulation of the FPGA, or sub-blocks thereof. You're really not developing a piece of software, you're developing a small ASIC. In any event, after you run timing simulations through your testbench where you put in known inputs and verify that you get the expected outputs, you're ready for anywhere from a few minutes to a few days (depending on the size of the FPGA) of compilation to get your code turned into a bitstream to program the device. Then you run the same inputs from the testbench and see if you get the same output. At this point, you hope that you remembered to build various debugging registers into your design so you can have a prayer of finding problems. You can also stop the clock and scan out the value of every register on the chip through something called JTAG. You can then import that back into your simulator to try to figure out what has gone wrong. Then restart at the "few days of compilation" stage.
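The known-inputs, expected-outputs loop described above can be sketched in plain Python. This is only a toy stand-in for an HDL simulator: the two-stage accumulator, its register names, and the cycle counter "debug register" are all hypothetical, chosen just to show the shape of a testbench that compares a design against a golden reference model.

```python
def reference_model(samples):
    """Golden model: running sum of the inputs, written the obvious way."""
    total, out = 0, []
    for s in samples:
        total += s
        out.append(total)
    return out


class PipelinedAccumulator:
    """Toy design under test: an accumulator followed by an output register."""

    def __init__(self):
        self.acc, self.acc_valid = 0, False   # stage 1 registers
        self.out, self.out_valid = 0, False   # stage 2 registers
        self.cycles = 0                       # debug register: clock edges seen

    def clock(self, data_in, valid_in):
        """Advance one clock edge; all registers update from the old state."""
        next_out, next_out_valid = self.acc, self.acc_valid
        next_acc = self.acc + data_in if valid_in else self.acc
        self.acc, self.acc_valid = next_acc, valid_in
        self.out, self.out_valid = next_out, next_out_valid
        self.cycles += 1
        return self.out, self.out_valid


def run_testbench(stimulus, flush_cycles=2):
    """Drive known inputs, collect valid outputs, compare with the model."""
    dut = PipelinedAccumulator()
    observed = []
    # Feed the stimulus, then idle cycles so the pipeline drains.
    for data, valid in [(s, True) for s in stimulus] + [(0, False)] * flush_cycles:
        out, out_valid = dut.clock(data, valid)
        if out_valid:
            observed.append(out)
    expected = reference_model(stimulus)
    return expected == observed, expected, observed, dut.cycles


passed, expected, observed, cycles = run_testbench([1, 2, 3, 4])
print("PASS" if passed else "FAIL", expected, observed, "cycles:", cycles)
```

On real hardware the same stimulus would be replayed against the programmed device, and a mismatch is where the debug registers and the JTAG scan earn their keep.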
This is why so few people do reconfigurable computing....
Go Badgers! -- #include "std/disclaimer.h"
and they do their damn best to keep it quiet. Look at how quickly stories about Congress and their staffs using insider information were quashed. It went from being front page to gone in days, as if it didn't exist. Similar to how Fast and Furious vanished.
We need an OKS, Occupy K-Street; Wall Street is fully enabled by Washington. They just pay their dues and Washington insiders reap the real rewards.
* Winners compare their achievements to their goals, losers compare theirs to that of others.
Sadly, both stories lack details on how the FPGAs are used in the computing architecture. Instead they go to great lengths listing telephone-number-like, meaningless speedup comparisons with conventional hardware. A typical drawback of FPGAs is that they cannot accommodate as many floating point units (FPUs) per chip as current GPUs and that FPGAs run at about 10x lower clock speeds. Their advantage, however, is that the internal chip architecture can be reconfigured to match the algorithm, so that all FPUs run at maximum efficiency. At the end of the day, it really depends on the algorithm whether it's run best on FPGAs, GPUs or standard CPUs. This is also the reason why one cannot say that an FPGA is X times faster than a GPU.
Maxeler, the manufacturer of the machine, had a booth at SC11. The basic component is the MAX3 card, a PCIe 2.0 x8 card with up to 96 GB of DRAM on board. The boards are optimized for data stream processing. This is not unlike how GPUs are architected.
Up to 4 of those boards are located in a MaxNode, which can then be networked via 10Gbit Ethernet or InfiniBand. Multiple MaxNodes can be put into a MaxRack, which can also be seen in the WSJ article. The MAX3 boards can be connected via a custom MaxRing network, which provides a bandwidth of 8 GB/s.
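For a feel of what those figures imply, here is a back-of-the-envelope sketch. It uses the numbers quoted above (96 GB per card, 4 cards per node, 8 GB/s for MaxRing) plus the standard PCIe 2.0 signalling rate of 5 GT/s per lane with 8b/10b encoding. It ignores protocol overhead, and whether the 8 GB/s MaxRing figure is per link or aggregate is not stated, so treat the results as rough upper bounds.

```python
GB = 10 ** 9  # decimal gigabytes, as in vendor bandwidth figures


def pcie2_bytes_per_second(lanes):
    """Raw PCIe 2.0 bandwidth per direction: 5 GT/s per lane, 8b/10b encoded."""
    bits_per_lane = 5_000_000_000 * 8 // 10   # 4 Gbit/s of payload per lane
    return lanes * bits_per_lane // 8          # convert bits to bytes


def transfer_seconds(num_bytes, bytes_per_second):
    """Time to stream num_bytes at a sustained rate, with no overhead."""
    return num_bytes / bytes_per_second


card_dram = 96 * GB
node_dram = 4 * card_dram                      # up to 4 MAX3 cards per MaxNode
pcie_rate = pcie2_bytes_per_second(8)          # the card's x8 host link
maxring_rate = 8 * GB                          # figure quoted for MaxRing

print(f"DRAM per fully populated node: {node_dram // GB} GB")
print(f"PCIe 2.0 x8: {pcie_rate // GB} GB/s per direction")
print(f"fill one card over PCIe: {transfer_seconds(card_dram, pcie_rate):.0f} s")
print(f"move one card's DRAM over MaxRing: {transfer_seconds(card_dram, maxring_rate):.0f} s")
```

On these assumptions the host link is the narrower pipe, which fits the stream-processing design: load the data onto the cards once and keep it there.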
Computer simulation made easy -- LibGeoDecomp
The problem the risk analysis team faced even in the 2000 era was such a tough nut to crack that they had to limit the complexity of the algorithms they used just because there wasn't hardware powerful enough.
Look, the problem here is the black swan. You can't model a black swan unless you can simulate the entire world economy down to the last neuron in some farmer's brain in a rural Chinese village. Right now we can't model a single human brain let alone all of them.
The world economy didn't melt down because some spreadsheet only calculated 12 decimal places when it should have calculated 325. It melted down because everybody decided to leverage themselves 100x on the bet that housing prices wouldn't ever go down, and they did. Now the world governments are starting to leverage themselves in small multiples on the bet that nobody would ever stop buying their bonds, mostly to bail out the bankers who bet on housing prices. I don't need arbitrary precision arithmetic to tell you where that is going to end up if it doesn't change FAST.
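The leverage point needs no precision at all, as a minimal sketch shows. It assumes the simplest possible model: a position worth `leverage` times your own equity, with no fees, interest, or margin calls. The function name is made up for illustration.

```python
from fractions import Fraction


def equity_after_move(leverage, price_change):
    """Remaining equity as a fraction of starting equity.

    leverage:     position size divided by own equity (e.g. 100 for 100x)
    price_change: fractional change in asset price (e.g. -1/100 for -1%)
    """
    return 1 + leverage * price_change


# At 100x, a 1% fall in prices wipes out all of the equity.
print(equity_after_move(100, Fraction(-1, 100)))

# At 100x, a 5% fall leaves a loss of four times the original stake.
print(equity_after_move(100, Fraction(-5, 100)))

# At 2x, the same 1% fall costs only 2% of the equity.
print(equity_after_move(2, Fraction(-1, 100)))
```

Two significant digits are plenty to see where that ends up.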