The Story Behind a Failed HPC Startup
jbrodkin writes "SiCortex had an idea that it thought would take the supercomputing world by storm — build the most energy-efficient HPC clusters on the planet. But the recession, and the difficulties of penetrating a market dominated by Intel-based machines, proved to be too much for the company to handle. SiCortex ended up folding earlier this year, and its story may be a cautionary tale for startups trying to bring innovation to the supercomputing industry."
Don't try anything new.
Lightfleet soon to follow. How is the company that was using Transmeta chips doing?
I've abandoned my search for truth; now I'm just looking for some useful delusions.
Why not use something based on the Atom chip, but massively parallel?
You can create something that is unique but you reduce the buying base of your systems....
I would like to see a supercomputer based on laptop-type low-power components.
Tsukasa: All I really want, is to be left alone...
If energy efficiency was their main pitch, perhaps starting it when they did was just bad luck or being too far ahead of the curve.
Wait 5, 10, or 20 years or so. As people use less energy and the companies raise their rates to compensate, perhaps these types of solutions will become more appetizing.
OTOH, I don't have any numbers to back this up, such as the cost of a new HPC system from a different vendor versus an Intel-based one, and all the energy, support, and training costs that go with each.
-
In a blog post after SiCortex shut down, Reilly says he believes there is still room for non-x86 machines in the HPC market. He is wrong. Much more money is being spent every year on improving x86 chips than on all the competitors combined. Basing a supercomputer on MIPS was short-sighted; even if it offers a price/performance or power/performance advantage now, in a couple of years it won't, because x86 is being improved at a much faster rate. Where is Sequent now? The only way to build a successful desktop HPC company is to be able to do system design turns as fast as new x86 generations come out and ship soon after the new CPUs become widely available, e.g. a complete new product every 6 months. That requires partnership with either Intel or AMD, not use of a MIPS chip that no one is spending R&D resources on anymore.
I've abandoned my search for truth; now I'm just looking for some useful delusions.
SAVING ENERGY HAS FAILED!
in ZA WARUDO!
MUDA DA!
... is almost always wrong. As one of the principals on a large-ish (not large by world standards, 1000 cores, mainly Nehalem so approximately 100 GFLOPS) cluster, I've been very pleased that we've done things as simply as possible. Sun Grid Engine and ROCKS running on commodity 1Us deliver an economical and effective solution (no, I don't work for Sun).
Most importantly, the environment does not unduly restrict what kind of compute jobs can be run. If it can be compiled on *nix, we can probably run it. We lose to specialized hardware (GPU-based, Cell-based, ... ) in raw throughput but we make up for it in both initial price and ease of deployment. We don't even have a dedicated admin for the cluster -- we had one to set it up and he did such a good job we haven't needed to hire a replacement!
Ultimately, I feel like it's not worth paying extra in hardware and software-dev costs to save a few dollars on cooling and power. Sure, you get the credibility of running a "green" cluster (never mind that you have to pay to feed and house those extra developers, which should legitimately come out of your carbon budget), but you end up with a far less useful product.
Long Live X86(_64)!
...market conditions, didn't have the right people, time just wasn't right... But people don't buy what you sell. They buy why you sell it.
These guys failed at that, and got unlucky.
And this is basically a perfect example of how central bank meddling makes us all worse off. Small firms responding to the market and engaging in actual innovation threaten large, established corporations. Stock indexes fall. The "economy" collapses. The FED goes into damage control mode and starts printing money to hand out to their friends: the large, established corporations. Small firms and start-ups don't receive any of this free money. Large firms use this taxpayer money and the inflationary power of the FED to catch up to their smaller competitors by making incremental changes to existing production lines. The small firms go belly-up. Oligopoly is maintained. The newly unemployed die from lack of healthcare or are sent to get shot at in some unnecessary foreign war funded by their taxes and the same banks that put them out of business. Everything goes smoothly until a new generation or flood of immigrants precipitates resource shortages which incentivize the rise of new, innovative start-ups and begins the "business cycle" all over again.
"I assumed blithely that there were no elves out there in the darkness"
I heard about this company in Mass High Tech, started checking them out as a potential employer, and then heard they went out of business. It's unfortunate, they had an interesting product. This also means I won't be applying for jobs at startups until the economy is much stronger.
Whenever I hear a story about some new type of "super" computer, I think of an old Road Runner cartoon. Wile E Coyote, Genius, is mixing chemical explosives in his little shack, which he doesn't know was moved onto the train tracks.
He says to himself, "Wile E. Coyote SUPER genius. I like the sound of that." He then gets hit by the train.
Some of these companies remind me a LOT of good, old Wile E. Coyote. The one in this article just found the train.
Learning HOW to think is more important than learning WHAT to think.
x86 is certainly entrenched in the desktop, but in supercomputing? In the top 10, it's maybe half x86. There's a strong showing from Power (BlueGene) and of course the #1 spot is held by an x86/Cell hybrid (which gets most of its FLOPS from the SPUs, not PPC or x86).
Hardly entrenched.
Looking further down, it is mainly x86, with a strong showing from Power (IBM) as well as SPARC, NEC's vector processor (kind of PPC), Itanium and a few randoms.
So, the top 100 is dominated by AMD, Intel and IBM in roughly equal parts, but there is still room for other vendors.
Still sad to see an innovative computer go to the wall :(
SJW n. One who posts facts.
FTFA:
Many years ago, I wrote a paper for my business class that used the DRAM industry as an example of a commodity industry. The ignint professor gave me a C for that cuz he insisted DRAM is not a commodity. That dude was a young one at the time, too.
Lesson? Don't waste your time and money at b-school - it may damage your brains.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
The easiest product to use should be the best product. OSX running on PPC chips was great, but not as good as x86? I suppose it's true, but why wasn't it true when all the fanboys were on PPCs?
These guys failed in a very typical geeky fashion. They understood the technology but not the business, and at the end of the day your customers need a business case to use your services. It's the tail attempting to wag the dog.
If you mod me down, I will become more powerful than you can imagine....
Points made above about non-x86 processors being doomed aside, SiCortex had an interesting interconnect design. Their Kautz-graph-based interconnect was fairly innovative (at least to me).
Personally, I'm sorry to see them go. We never had a chance to benchmark our software on their system, but I suspected it might have performed very well per dollar. Even if the underlying system disappears, their interconnect ideas may survive.
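For readers who have not met a Kautz graph before, here is a minimal sketch of the general construction, not of SiCortex's actual fabric. Nodes are strings over `degree + 1` symbols in which no two neighbouring symbols are equal, and each node links to the nodes obtained by dropping its first symbol and appending a new one. The function names `kautz_graph` and `diameter`, and the parameters used below, are illustrative assumptions and do not come from the thread.

```python
from collections import deque
from itertools import product


def kautz_graph(degree, length):
    """Build a directed Kautz graph as {node: [successor, ...]}.

    Nodes are tuples of `length` symbols drawn from degree + 1 symbols,
    with no two adjacent symbols equal. (Hypothetical helper for illustration.)
    """
    alphabet = range(degree + 1)
    nodes = [s for s in product(alphabet, repeat=length)
             if all(a != b for a, b in zip(s, s[1:]))]
    # Shift left by one symbol and append any symbol that differs from the last.
    return {s: [s[1:] + (x,) for x in alphabet if x != s[-1]] for s in nodes}


def diameter(edges):
    """Longest shortest-path hop count over all node pairs, via BFS from every node."""
    worst = 0
    for src in edges:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in edges[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    g = kautz_graph(degree=2, length=3)
    print(len(g), "nodes,", diameter(g), "hops worst case")
```

The appeal for an interconnect is that the node count grows as `(degree + 1) * degree ** (length - 1)` while the worst-case hop count stays equal to `length`, so a few links per node reach a lot of nodes in a few hops.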
I looked at the SiCortex machine (and its price) online about 2 years ago. They were charging ~$30/core, and their cores were simple ~500 MHz MIPS processors. Considering that Tilera and nVidia have actual customers, this could just be company specific.
What are you using to benchmark your cluster? I benchmarked a pair of Nehalem Xeons (the 2.0 GHz ones, i.e. the cheapest quad cores) at 90 GFLOPS with Linpack. Individually, ~57 GFLOPS.
My next cluster is going to be based around Teslas. GPUs are the future. It takes 100,000 x86 cores to get a petaflop. You can get there with 25,000 if you use Cells (5K x86, 20K Cells). You can do the same thing with 10K if you use GPUs (5K x86, 5K Teslas). Guess what the cheapest option is? They might not be the most energy efficient, but haven't we learned the problem with custom chips in the HPC market? That's why we went to clusters in the first place.
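As a rough check on what those core counts imply per device, here is a back-of-the-envelope sketch. It assumes (my assumption, not the poster's) that each x86 core in the hybrid systems contributes the same throughput as in the all-x86 case, and that the accelerators make up the remainder. The names `implied_accelerator_flops` and `PER_X86_CORE` are made up for illustration.

```python
PETAFLOP = 1e15  # target throughput in FLOPS

# From the comment: 100,000 x86 cores for one petaflop.
PER_X86_CORE = PETAFLOP / 100_000  # implied FLOPS per x86 core


def implied_accelerator_flops(host_cores, accelerators):
    """FLOPS each accelerator must supply after the x86 hosts do their share.

    Assumes hosts run at PER_X86_CORE each (an assumption, see above).
    """
    remaining = PETAFLOP - host_cores * PER_X86_CORE
    return remaining / accelerators


if __name__ == "__main__":
    cell = implied_accelerator_flops(host_cores=5_000, accelerators=20_000)
    tesla = implied_accelerator_flops(host_cores=5_000, accelerators=5_000)
    print(f"x86 core: {PER_X86_CORE / 1e9:.1f} GFLOPS")
    print(f"Cell:     {cell / 1e9:.1f} GFLOPS")
    print(f"Tesla:    {tesla / 1e9:.1f} GFLOPS")
```

These are only the figures implied by the counts quoted above, not measured or vendor-published numbers.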
FLOPS/watt is the future. No doubt. Tesla (NVIDIA) has the edge on the low end now. Low power per cycle will win in the long term. Multi-core x64 for now. But to the powers that be, 32/64/128 bits per watt per cycle will rule someday.
Somebody is going to crack the market--and it won't be one of the people who sit at home and cry in their beer about how Intel rules the world and that nobody has any hope of success!!
Thank goodness for the entrepreneurs who spit on lassitude and take their shot! Those wozniaks are the people who end up delivering really cool stuff for the rest of humanity, and leave the conventional wisdom people in the dust.
Basing a supercomputer on MIPS was short-sighted; even if it offers a price/performance or power/performance advantage now, in a couple of years it won't, because x86 is being improved at a much faster rate.
It wasn't even MIPS. From TFA:
But SiCortex went against conventional wisdom by building its own processors and this decision limited the company's market to early adopters, Conway says. In building its chips, SiCortex obtained intellectual property from several vendors, including MIPS Technologies, and tweaked the design to meet its own needs.
An HPC start-up going into the microprocessor design business now? That really is a fool's errand. Mind you, that's sort of how the ARM processor came to be, but that was a loooonnng time ago.
Breakfast served all day!
Can I have their inventory?
Exactly right. I've got >10K cores and >10M LOC. "Hardware fault" typically means a datacenter caught fire, or was flooded, or an undersea cable got cut.
If someone pitches a cheaper solution (e.g. power savings), I'm happy to listen for 10 minutes. Then I just want to know how fast I can see results: a dev costs $50K/month here, so I'll give it a week or two. If you don't have a test farm ready to go with full compilers, a data security plan, etc., I'm going to just reject. If you can get traction with universities, great, come back and pitch again in a year.
To be clear: this was not a failure due to the economics of competing against Intel/x86. This was a failure due to not being lucky. It takes sustained funding to make your way from start-up to profit in most technical businesses. HPC is more technical and thus more expensive than most.
I work as a sysadmin at a Boston-based university, and one of my jobs is managing an HPC cluster. We actually had SiCortex come give us a demo of one of their systems a little over a year ago and were rather impressed from a basic technology standpoint. However, the biggest drawback we saw, which was a significant one, was that their cluster wasn't x86 based. We run a number of well-known commercial apps on our cluster like Matlab, Mathematica, Fluent, Abaqus, and many others. Without those vendors all actively supporting MIPS, SiCortex was simply a non-starter for us when we were researching our next generation cluster. And by actively I mean rolling out MIPS versions of their products on a schedule comparable to their x86 product releases. Having to wait 6 months or more for MIPS versions simply isn't acceptable. If they could have gotten firm commitments from those commercial vendors then we might have pursued SiCortex, but that simply wasn't the case. Even the inability to run a standard commercial Linux distro was a huge drawback, since many commercial software vendors specifically require a commercial distro like Red Hat or SUSE if you're trying to get support from them.
That Sun pissed away one billion dollars on MySQL instead of buying out SiCortex. Smart move, Sun!
Classical Liberalism: All your base are belong to you.
That's what they designed: it's basically a bog-ordinary Linux-with-MPI cluster in a box. They had a custom internal fabric that was far more efficient than ordinary switches and even had on-die MPI accelerators. They also shipped with compilers for C, C++ and Fortran.
It was meant to be a drop-in replacement for room-sized clusters for a fraction of the space and heat. Basically what killed them was cashflow.
Classical Liberalism: All your base are belong to you.
By your logic, General Motors should be crushing Ferrari. After all, GM spends much more on their car development than Ferrari does.
Classical Liberalism: All your base are belong to you.
Yep, cashflow is a bitch: if I need to spend $25K to even look at the product, and they need $20M to run a demo datacenter, they need something like $100M in capital to avoid dying on the vine :(
Apologies, the above comment seems to have been assigned to the wrong OP.
Classical Liberalism: All your base are belong to you.
GPU
but, even with the GPU you have to:
1) make a serious effort to train programmers
2) recognize it will penetrate new markets faster than old markets and
3) offer a factor of 10 (or more) improvement
The SiCortex people really only competed in existing markets. Nvidia is *developing* new markets, like embedded and deskside HPC. Cheap 3D CAT scan anyone?
The SiCortex folks were not originally interested in *green* computing. They only hyped this aspect when they realized they screwed up the memory architecture.
I don't know how Intel could run screaming into the power wall -- and then throw away a whole generation of bad Nehalem designs, plus the Indian design team -- and yet people claim they are unbeatable.
Just out of curiosity how much of that $50K/month is salary? And how does the rest break down?
The tyrant will always find a pretext for his tyranny - Aesop
Having actually used a SiCortex machine, I can tell you that the problem wasn't the VC, or the compilers, or even really the hardware.
The problem was the market.
There are two types of x86-based small clusters (the market that SiCortex was aiming for): clusters with Gigabit Ethernet and clusters with expensive interconnects (Myrinet, InfiniBand, or 10G Ethernet).
Gigabit Ethernet clusters do a good job with problems that are embarrassingly parallel (or at least have minimal communication demands). $150k gets you 300 Nehalem cores and a lot of memory. SiCortex fails here because their competing product (the SC1458) is much more expensive and much slower. The fact that the SC1458 uses less power (around 5kW instead of 10kW) is impressive, but unless you're very power or cooling constrained, it's simply more cost effective to deal with the extra power and cooling cost.
SiCortex hardware was more cost effective against clusters with expensive interconnects. The problem is, the people who buy clusters with expensive interconnects do so because their problem is interconnect heavy. Unfortunately, despite all of the cool CS behind SiCortex's interconnect, the fact is that it just didn't do that well against InfiniBand. That's partly because the SiCortex system has more nodes, which means that more messages have to use the interconnect. It's partly because for very small clusters, it's possible to use a single IB switch that connects every node to every other node. And it's partly because SiCortex didn't have the kind of mature hardware/software stack that someone like Mellanox has.
So, there you have it. For the problems that ran well on SiCortex hardware, you could get the same performance at dramatically lower cost using Gigabit Ethernet. For the problems that require an expensive interconnect, the SiCortex approach of "more, smaller nodes" results in dramatically more overhead compared with the "fewer, faster nodes" strategy.
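To put a rough number on the 5kW versus 10kW comparison above, here is a small sketch of the yearly power-and-cooling bill. The electricity price ($0.10/kWh) and the cooling overhead (one extra watt of cooling per watt of compute) are assumptions picked for illustration, not figures from this thread, and `annual_power_cost` is a made-up helper name.

```python
HOURS_PER_YEAR = 24 * 365


def annual_power_cost(kw, usd_per_kwh, cooling_overhead):
    """Yearly cost in USD of running a constant load of `kw` kilowatts.

    cooling_overhead is the extra cooling power per unit of compute power
    (1.0 means cooling doubles the bill). All inputs are assumptions.
    """
    return kw * HOURS_PER_YEAR * usd_per_kwh * (1 + cooling_overhead)


if __name__ == "__main__":
    rate = 0.10      # assumed USD per kWh
    overhead = 1.0   # assumed cooling overhead
    x86 = annual_power_cost(10, rate, overhead)
    sicortex = annual_power_cost(5, rate, overhead)
    print(f"x86 cluster: ${x86:,.0f}/year")
    print(f"SC1458:      ${sicortex:,.0f}/year")
    print(f"Savings:     ${x86 - sicortex:,.0f}/year")
```

Under these assumptions the savings come to under $10k a year, which is small next to the $150k price of the Gigabit Ethernet cluster mentioned above. Different local rates or a tight power budget would change the picture.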
Well, that $50K includes the cost of the developer, plus associated costs of the auditors, lawyers, etc, that she will have to call on to get a project like this moving. So, knock off a chunk in chargebacks. Of the remaining money, figure a standard breakdown of 50% compensation to the dev and 50% overhead.
Doing the math, the dev is getting paid pretty well, but it isn't a code monkey job: it's analysis, implementation, presentation, making a business case, etc.
Actually, we don't call our developers "developers" because we don't want some stupid HR person to do a salary comparison and announce that we are overpaying them.