NVIDIA's $10K Tesla GPU-Based Personal Supercomputer

Ooooooo! Ahhhh! by itsybitsy · 2008-11-22 21:31 · Score: 1

Sweet. I got myself a tesla board... now what the heck to do with it... no kidding... I got one of these beasties... any suggestions?

Re:Ooooooo! Ahhhh! by Surreal+Puppet · 2008-11-22 21:38 · Score: 2, Interesting

Port John the Ripper/aircrack-ng? Buy a few terabyte drives and start generating hash tables?
Re:Ooooooo! Ahhhh! by karstux · 2008-11-23 03:57 · Score: 1

Real-time radiosity rendering?

--
Don't whistle while you're pissing.
Re:Ooooooo! Ahhhh! by neokushan · 2008-11-23 04:14 · Score: 1

Port Doom to it.

--
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
Re:Ooooooo! Ahhhh! by drik00 · 2008-11-23 08:15 · Score: 1

I'm so proud of the /. community... even almost ten years later, I still spotted the "beowulf" and "doom" references :) *tear*
J

--
Beer, now there's a temporary solution -- Homer Jay S.
Re:Ooooooo! Ahhhh! by neokushan · 2008-11-23 08:28 · Score: 1

Is it really a reference if it's stated outright?

--
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
Re:Ooooooo! Ahhhh! by narcberry · 2008-11-23 13:18 · Score: 1

I sure hope these hungry CPUs don't expect to feed off my slow RAM.

--
Modding me -1 troll doesn't make me wrong.

Penguins' Got One Liquid Cooled! by Koensayr · 2008-11-22 21:32 · Score: 1

Perhaps the coolest Personal Super Computer was the one shown by the good folks at Penguin Computing. It was rumored to be over-clocked and featured liquid cooling for silent operation!

http://www.penguincomputing.com/products/linux/workstations

Re:Penguins' Got One Liquid Cooled! by BOFHelsinki · 2008-11-23 01:02 · Score: 2, Informative

BTW, TFS makes a mistake calling this Tesla rig a supercomputer. Nvidia correctly just calls it a cluster replacement. A cluster is not a supercomputer; the interconnect makes all the difference, no matter how much FP crunching power there is. See the NEC SX-9 or Cray's SeaStar for a real supercomputer interconnect. Can't be arsed to check (this is Slashdot after all) but that Penguin Computing system likely has only InfiniBand or 10GbE for the switch network, making it "only" a cluster. :-)

Graphics by Anonymous Coward · 2008-11-22 21:34 · Score: 5, Funny

Wow, that's some serious computing power! I wonder if anyone has thought of using these for graphics or rendering? I imagine they could make some killer games, especially with advanced technology like Direct 3D.

Re:Graphics by GigaplexNZ · 2008-11-22 22:26 · Score: 2, Funny

I wonder if anyone has thought of using these for graphics or rendering?
These are effectively just NVIDIA GT280 chips with the ports removed. Their heritage is gaming.

I imagine they could make some killer games
If you can find some way to get the video out to a monitor... but then you effectively just have Quad SLI GT280.

especially with advanced technology like Direct 3D
Uh... what? Direct 3D has been commonly used for years, you make it sound like some new and exotic technology. It is also effectively Windows only, whereas this hardware is more likely to use something like Linux.
Re:Graphics by Gnavpot · 2008-11-22 23:05 · Score: 4, Funny

"I wonder if anyone has thought of using these for graphics or rendering?"
These are effectively just NVIDIA GT280 chips with the ports removed. Their heritage is gaming.

We need a "+1 Whoosh" moderation option.
No, I do not mean "-1 Whoosh". I want to see those embarrassingly stupid postings. But perhaps this moderation option should subtract karma.
Re:Graphics by Provocateur · 2008-11-22 23:22 · Score: 1

If you can find some way to get the video out to a monitor
Yup, time to break out those ol' CGA monitors out from the garage...knew they'd come in handy again one day, and with Linux' oh-so-retro CLI mode, I'm set!

--
WARNING: Smartphones have side effects--most of them undocumented.
Re:Graphics by GigaplexNZ · 2008-11-23 00:15 · Score: 4, Funny

I suppose I'm one of those guys now. Hook, line and sinker.
Re:Graphics by evilbessie · 2008-11-23 00:42 · Score: 2, Informative

In much the same way that the current Quadro FX cards are based on the same chips as the gaming GeForce cards. But still, the most expensive gaming card is ~£400, while you'll pay ~£1500 for the top of the line FX5700.
It's because workstation graphics cards are configured for accuracy above all else, whereas gaming cards are configured for speed. Having a few pixels wrong does not affect gaming at all; getting the numbers wrong in simulations is going to cause problems.
Mostly the people who use these cards care about OpenGL support, but some people do use them under Windows and DirectX.
This type of computing came in with the GeForce 8 range, when CUDA (Compute Unified Device Architecture) brought C programming to the massively parallel graphics chips. This has allowed nVidia to port the Ageia PhysX technology to the GeForce cards, so a separate add-in card is not necessary.
I believe that ATi are doing something similar with their FireGL cards, which again are based on the same chip as their Radeon cards. This is why they have both moved from Shader/Vertex to Unified Stream processors. This is a really interesting development if you happen to work in a research establishment, otherwise please move along nothing to see here.
Re:Graphics by xonar · 2008-11-23 03:28 · Score: 2, Interesting

So being naive to the ways of the world is bad karma now? I thought Buddhism stressed being free from the material things of the world.
Re:Graphics by ksd1337 · 2008-11-23 04:32 · Score: 1

Crysis and Vista will never melt my computer again!
Re:Graphics by aj50 · 2008-11-23 08:01 · Score: 1

I'd suggest -1 since that's the most likely preference.
It doesn't really matter which it is as you can add a modifier for each of the moderation types in your preferences (should you dislike reading funny posts or enjoy a good bit of flamebait.)

--
I wish to remain anomalous
Re:Graphics by Anonymous Coward · 2008-11-23 08:23 · Score: 0

Seriously though, why on earth take away the video port, it might be able to run Crysis.
Re:Graphics by beav007 · 2008-11-23 13:19 · Score: 1

Jokes and text on the internet are material now? Maybe the RIAA have a case after all...
Re:Graphics by badkarmadayaccount · 2008-11-25 05:19 · Score: 1

Does anybody else think some SPARC microblades (~3 GB D[Q?]*DR2 RAM, 6[8?] cores) with embedded and optimized in-motherboard RAM are gonna wipe the floor with these? Besides, the ISA isn't half as obscure, so it will do well for general purpose shit (as in "OMG, THAT'S your home server/NAS/main desktop", probably all at once). Where is that commodity SPARC hardware, Sun? My money ain't good enough?
* QDR - Quad Data Rate - Intel tech usually met in InfiniBand links. Two clock signals 90 degrees out of phase with each other.

--
I know tobacco is bad for you, so I smoke weed with crack.

Heartening... by blind+biker · 2008-11-22 21:37 · Score: 2, Interesting

...to see a company established in a certain market, to branch out so aggressively and boldly into something... well, completely new, really.

Does anyone know if Comsol Multiphysics can be ported to CUDA?

--
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.

Re:Heartening... by Anonymous Coward · 2008-11-22 22:52 · Score: 0

Can you imagine a Beowulf cluster of these?
Re:Heartening... by mangu · 2008-11-22 23:10 · Score: 4, Interesting

Can you imagine a Beowulf cluster of these?
Yes, I can. My first thought when I saw the article was to calculate how many of them one would need to simulate a human brain in real time. The answer is: with 2500 of these machines one could simulate a hundred billion neurons with a thousand synapses each, firing a hundred times per second, which is the approximate capacity of a human brain.
People have paid $20 million to visit the space station, now who will be the first millionaire hobbyist to pay $25 million to have his own simulated human brain?
Re:Heartening... by swamp_ig · 2008-11-22 23:45 · Score: 3, Interesting

Would the interconnects be fast enough? There's a lot of non-locality in the synaptic connections, so you're going to need some pretty heavy comms between the cores.
Also a selection of neurons are far more heavily connected than 1000s of synapses, and they're fairly essential ones. Might these be a critical path?
Sure would be cool to build such a beast, do some random connections, and see what happens...
Re:Heartening... by Anonymous Coward · 2008-11-23 00:02 · Score: 1, Insightful

It would take a hell of a lot more than 25 mil to program the brain simulator.
Re:Heartening... by Anonymous Coward · 2008-11-23 00:07 · Score: 1, Interesting

Would the interconnects be fast enough?
Short answer: no.

Long answer: there is no direct interconnect between the cards, so any data would have to go down the PCIe bus to the host and then back up to an interconnect card, across the network and back across the PCIe bus twice to get to the other. With a specially designed PCIe root complex you could probably eliminate some of the overhead and allow the card to send direct to the interconnect without having to share the bus with the host, but you couldn't do that currently. Even then, there isn't any interconnect currently available that even comes close to the bandwidth you'd need.
Re:Heartening... by smallfries · 2008-11-23 00:38 · Score: 4, Interesting

Your figures are off by several orders of magnitude. 2500 of these is roughly 10,000 TFLOPS. As a TFLOPS is 10^12 operations per second, and we have 10^11 neurons, that leaves 10^5 floating point operations per neuron per second. If each has 1000 synapses to process then we are down to 100 operations per connection, per second.
At this point it seems obvious that you've assumed a really simplistic model of a neuron that can compute a synaptic value in a single floating point operation. These simple neuron models don't behave like a real brain, and scaling up simulations of them doesn't produce anything interesting. Real neurons are capable of computing much more complex functions than these models. The throughput on the interconnect is going to be a major factor, and simulating each neuron will require from 10s to 1000000s of operations depending on the level of biological realism that is required. The Blue Brain project has a lot of interesting material on different models of the neuron and the tradeoff between performance and realism.
Their end goal is to dedicate a large IBM Blue Gene to simulating an entire column within the brain (roughly 1,000,000 neurons) using a biologically-realistic model.
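The budget argued over in this exchange can be checked in a few lines of Python. This is only a sketch of the arithmetic: the 4 TFLOPS per machine is the peak figure from the summary, the neuron, synapse, and firing-rate counts are the round numbers quoted in the thread, and the function name `flops_budget` is made up for illustration.

```python
# Round numbers taken from the thread; all of them are rough estimates.
MACHINES = 2500
PEAK_FLOPS_PER_MACHINE = 4 * 10**12   # 4 TFLOPS peak, single precision
NEURONS = 10**11
SYNAPSES_PER_NEURON = 1000
FIRING_RATE_HZ = 100


def flops_budget():
    """Return the peak FLOPS available at each level of the model."""
    total = MACHINES * PEAK_FLOPS_PER_MACHINE
    per_neuron = total / NEURONS                      # per neuron, per second
    per_synapse = per_neuron / SYNAPSES_PER_NEURON    # per synapse, per second
    per_event = per_synapse / FIRING_RATE_HZ          # per synaptic event
    return {
        "total_flops": total,
        "per_neuron": per_neuron,
        "per_synapse": per_synapse,
        "per_synaptic_event": per_event,
    }


if __name__ == "__main__":
    for name, value in flops_budget().items():
        print(f"{name}: {value:g}")
```

Under these assumptions the budget works out to about one floating point operation per synaptic event at peak throughput, which is why the one-flop neuron model is the real point of disagreement.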

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Heartening... by LeDopore · 2008-11-23 03:13 · Score: 5, Informative

You're right unless there's a computational way to take advantage of the fact that most neurons in cortex pretty much never fire (1), and that a small minority of synapses are responsible for nearly all of the excitation in a slab of cortical tissue (2). If not active == not important == not necessary to simulate with a 100% duty cycle (these are big "ifs"), then we could be literally about 3-5 orders of magnitude closer to being able to simulate whole brains than anyone realizes.
(1) How silent is the brain: is there a "dark matter" problem in neuroscience? Shy Shoham, Daniel H. O'Connor, Ronen Segev. J Comp Physiol A (2006)
(2) Highly Nonrandom Features of Synaptic Connectivity in Local Cortical Circuits. Sen Song, Per Jesper Sjöström, Markus Reigl, Sacha Nelson, Dmitri B. Chklovskii. PLoS Biology, March 2005

--
Expected time to finish is 1 hour and 60 minutes.
Re:Heartening... by Anonymous Coward · 2008-11-23 04:37 · Score: 0

What both you and the GP are also forgetting is that there's more to the brain than hardware. You all make it sound as if the only thing that needs to be done in order to simulate the human brain is to build a supercomputer with enough processing power - but you also need software to run on it.
Re:Heartening... by dkf · 2008-11-23 05:04 · Score: 1

At this point it seems obvious that you've assumed a really simplistic model of a neuron that can compute a synaptic value in a single floating point operation. These simple neuron models don't behave like a real brain, and scaling up simulations of them doesn't produce anything interesting. Real neurons are capable of computing much more complex functions than these models. The throughput on the interconnect is going to be a major factor, and simulating each neuron will require from 10s to 1000000s of operations depending on the level of biological realism that is required.
The real question from an AI perspective is whether all that detail is necessary. Do we need to simulate individual signaling molecules? Do we need to simulate individual synapses? Can we simulate things at a higher level than neurons and still get a functionally similar model?
Obviously, for some things we definitely can't abstract (such as the effect of certain kinds of small molecules on consciousness). But nobody really has any idea whether general AI needs that level of detail. My hunch is that it doesn't, just as you don't need to know how transistors really work to write a computer program or even to simulate a computer with a pretty high degree of accuracy. But that is just a hunch...

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Heartening... by Anonymous Coward · 2008-11-23 05:06 · Score: 0

Ok, so we'd have W's brain.
I KID, I KID!
Re:Heartening... by Anonymous Coward · 2008-11-23 05:24 · Score: 0

Connection established....
booting core....
connecting core to internet....
User input
>? Is there a god?
>?
Re:Heartening... by Anonymous Coward · 2008-11-23 05:31 · Score: 0

And the first thing the simulation will do, is spend another 25 million to buy another... and so on
Re:Heartening... by ThisNukes4u · 2008-11-23 05:51 · Score: 1

Also don't forget that the 10 Tflops figure quoted is PEAK Tflops. I would be very surprised if the hardware could sustain even half of that in any realistic simulation.

--
thisnukes4u.net
Re:Heartening... by Anonymous Coward · 2008-11-23 06:23 · Score: 0

So what you're saying is that I need to keep using my current brain until I can get 25 million dollars... And then I still may come up short. The real question seems to be whether or not it comes with a 3 year service agreement, and does it have Bluetooth?
see this video: http://www.huffingtonpost.com/2008/08/14/robot-with-rat-brain-robo_n_119057.html
Re:Heartening... by tryfan · 2008-11-23 06:27 · Score: 1

who will be the first millionaire hobbyist to pay $25 million to have his own simulated human brain?
Well, some I know of already seem to have one...
Re:Heartening... by deander2 · 2008-11-23 06:27 · Score: 1

the problem w/ simulating the human brain is more of the software than the hardware. if you have any unique insight into how to program the thing, i think it would make a dissertation topic that would bring you almost instant fame and fortune. =P

--
http://kered.org
Re:Heartening... by Jeanius · 2008-11-23 07:04 · Score: 1

Now, to write the software simulating a thinking brain...
Re:Heartening... by Laxori666 · 2008-11-23 08:02 · Score: 1

I'm not sure about your avg number of synapses. I think a significant number of neurons have far more than this. A google search on "average synapses per neuron" has a third link stating that a rat's visual cortex has an average of 12000 synapses/neuron. Another link states "A typical neuron has about 1,000 to 10,000 synapses" - making the average a lot higher. So I think you'd need a lot more of these.
Re:Heartening... by HiThere · 2008-11-23 09:06 · Score: 2, Interesting

I think your post was intended humorously, but I'm going to pretend otherwise. (Note, I'm not a specialist in computational mentalistics, or whatever the field would be called, but:)
I'm fairly certain the interconnects are fast enough. The brain is no speed demon on individual connections. It's basically chemical, with only a little electrical stuff on top that's still based on ions floating in liquid.
The problem is the software. And the sensoria. And the effectors.
Each of those problems is being addressed separately. What do you want to bet that when they all come to "good enough" solutions, interfacing them is going to be a MASSIVE kludge.
And even if you could, you can't just copy how people did it. A camera is basically different from a retina. It extracts different information. You can use complex processing to convert one into a simulation of the other, but there's no straightforward mapping. Each conversion involves loss of information...so you need to ensure that the correct information is lost.
Just as a silly example of the difference, a recent experimental hearing aid uses infra-red lasers to stimulate the nerves in the cochlea. You KNOW that people use electric signals, but artificially generated electrical signals spread too much in the interface, so you can't get decent tone resolution. With infra-red lasers, though, you can stimulate any particular neuron you choose.
Guaranteed: random connections will give you a crashed program. Secondary chance is an infinite loop.
Mind you, there are neural nets that are initialized with random initial values, but they have strict boundary conditions. Otherwise you never get better than garbage out of them.
Also: There are lots of groups of neurons that are more highly connected than average. These are "functional specialists". There often isn't anything special about the neurons, but only about the way that their connections have been reinforced. I'm not sure about the neurons that branch outside of the column, but I suspect the same of them.
My projection for a human mind equivalent computer remains at around 2020-2030. This announcement drops my estimate of the cost, but that was never an exact number of dollars, so I can't quantify it. Also note that I said equivalent. I'm not going to assert that it would enjoy watching Star Wars, or even 2001. Its emotions are unlikely to be similar in nature to those of a mammal...unless that's necessary in order to understand human language...and only to the extent necessary.
For that matter, we wouldn't WANT it to have the same emotional structure that we have. That would be very dangerous. If we did that then it might have "take over the world!" as an innate goal, rather than as a tactical move. Even as a tactical move it's rather dangerous, so we would probably want to so design its goals that such a tactical move would appear extremely distasteful, and best accomplished by manipulating willing proxies. (This would ensure that there was room for people where people would be comfortable.)
OTOH, I don't see a human mind equivalent AI as remaining merely human equivalent. Progress rarely stops. But if its motivational structure is so designed that there's plenty of comfortable room for people, I don't see this as a problem. Entities rarely want to alter their motivational structure unless it's giving them severe problems, and often not then. But don't expect it to be passive or a mere recipient of orders. It would, however, be reasonable to expect it to be a lot more considerate of human needs and desires than the current bureaucracy...in any country. (Note that individual office holders may well be sympathetic, but the system itself isn't.)

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:Heartening... by sznupi · 2008-11-23 16:55 · Score: 1

There IS no more to the brain than hardware.
Now, if you'd also been talking about "computer simulation of the brain" in your first sentence, then yes, there's more.

--
One that hath name thou can not otter
Re:Heartening... by Meski · 2008-11-23 18:12 · Score: 1

Ballmer's running on a 486 version of this as we 'speak'

Wonder what the idle % is.
Re:Heartening... by fastsynaptic · 2008-11-23 18:31 · Score: 1

also, neurons have on average closer to 10,000 synaptic connections
Re:Heartening... by Anonymous Coward · 2008-11-23 19:57 · Score: 0

I'm fairly certain the interconnects are fast enough. The brain is no speed demon on individual connections.
Between individual "neurons"? Sure. As a whole? No. The cross-connected nature of the brain doesn't map well onto a fat-tree, switched interconnect model. Even if you only have one hundred neurons communicating at once, that's a lot of small, bursty data bouncing all the way around your network, essentially at random. Current interconnects just aren't designed for that.
Re:Heartening... by Anonymous Coward · 2008-11-23 21:31 · Score: 0

So how are we going to simulate the fact that the neurons are not static but keep reconfiguring their connections, even in an adult mind?
Re:Heartening... by LeDopore · 2008-11-24 01:38 · Score: 1

If a neuron has become essentially inactive, we can still poll it once every (say) 1000 simulation cycles, and simulate it with 1000 times the real chance that it will fire. If it starts firing in such a way that would increase its excitability (see (3)), we can crank up the duty cycle of the simulation for that neuron. The net effect of this trade-off trick might be to introduce noise in the extremely rare spikes coming from mostly inactive neurons (which form the vast majority of all neurons, see (1) above). In my opinion, this extra noise is a small price to pay in exchange for a factor of ~1000 speedup.
PS this trade-off is the project I should be working on now rather than checking out Slashdot. There's a good chance it won't pan out, since nobody's published about it yet.
(3) Bidirectional Modification of Presynaptic Neuronal Excitability Accompanying Spike Timing-Dependent Synaptic Plasticity. Li, Lu, Wu, Duan and Poo. Neuron, Vol. 41, 257-268, January 22, 2004.
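The polling trade-off described in this post can be sketched in Python. This is a toy illustration only, assuming a neuron that fires independently with a fixed probability per simulation cycle; the function names and the Bernoulli firing model are assumptions made for the example, and the adaptive step (raising the duty cycle when a neuron becomes more excitable) is not modelled.

```python
import random


def simulate_full(p_fire, steps, rng):
    """Reference model: evaluate the neuron on every cycle.

    Returns (spike_count, evaluations_performed).
    """
    spikes = sum(1 for _ in range(steps) if rng.random() < p_fire)
    return spikes, steps


def simulate_polled(p_fire, steps, poll_interval, rng):
    """Evaluate the neuron once every poll_interval cycles.

    The firing chance is scaled up by poll_interval (capped at 1.0) so the
    expected number of spikes matches the reference model, at the price of
    coarser spike timing.
    """
    p_scaled = min(1.0, p_fire * poll_interval)
    spikes = 0
    evaluations = 0
    for _ in range(0, steps, poll_interval):
        evaluations += 1
        if rng.random() < p_scaled:
            spikes += 1
    return spikes, evaluations


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate_full(1e-5, 1_000_000, rng))
    print(simulate_polled(1e-5, 1_000_000, 1000, rng))
```

With a poll interval of 1000 the polled version does one thousandth of the evaluations while keeping the same expected spike count; what it gives up is timing resolution within each interval, which is the extra noise mentioned above.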

--
Expected time to finish is 1 hour and 60 minutes.
Re:Heartening... by Anonymous Coward · 2008-11-24 02:06 · Score: 0

If they are there, they are important.
I don't care what religion or science you follow we have yet to beat the efficiency of any natural system.
Re:Heartening... by LeDopore · 2008-11-24 06:56 · Score: 1

That's certainly the dominant position in neuroscience these days: that Nature never wastes neural hardware. I think that maybe Nature tries not to waste a combination of metabolic energy, learning time and neural hardware.
Suppose that to perform a task you need X well-trained neurons. Suppose, for a small cost to the animal, you could build 100 X neurons and have most of them silent in exchange for a lower total metabolic cost or in exchange for the ability to learn faster. (This supposition is something I'm working on: that neural hardware redundancy can buy you faster learning and metabolic efficiency, as well as robustness in the face of random cell death.) Are the 99 X neurons that almost never fire really "wasted" even though they're not used? I'd argue that they're the byproduct of an efficient process: one that's efficient at learning and at conserving energy as well as efficient enough with neural hardware.
In any case, it looks like brain regions that are not undergoing a whole lot of learning really do have only 1% of their neurons fire on a regular basis. This has been shown consistently in a bajillion systems using every technique available to modern experimenters; read (1, above) for an enormously exhaustive list.
There's no evidence to support the idea that most cortical neurons fire on a regular basis, which is your position and what most neuroscientists have been assuming all along. Although this seems to be an absolutely reasonable assumption, it's just not how things work in reality.

--
Expected time to finish is 1 hour and 60 minutes.
Re:Heartening... by HiThere · 2008-11-25 09:38 · Score: 1

You'll recall my projection for a human equivalent AI was 10-20 years off? There are many reasons why it's that distant. And don't expect the first one to be cheap. 5 years later it's likely to be cheap, and there's likely to be one hell of an economic depression. And the only way out that I see is to make holding jobs optional, not necessary for comfortable survival. We should be heading towards that now, because we don't have much lead time.
Massive automation is coming, and it's getting cheaper. Demanding that everyone have a job is becoming more and more foolish. But we need an alternative, as pure idleness isn't good for people. And not everyone's cut out to be an artist. But as a start, the hassles involved in being, say, a musician should be reduced. Payments need to be spread out, not so centralized. The Star system needs to be dis-empowered. Possibly eliminated, but that's not certain. Some musicians actually are a lot better than others. There just needs to be lots of room for lots of others. Every city and town should have its own orchestra AND its own band. (We used to have music spread that way, before the phonograph.) Copyrights need to be both weakened and shortened. More room needs to be made for parody and pastiche. Bands need to be able to create their own music without worrying about infringing on someone else's copyright...unless they are actually copying it.
This won't suffice, but it's the direction we need to be headed. And generalize it to apply to all the arts. (Most haven't been as brutally treated by copyright law as music, but it hasn't been kind to any of them.)
Copyright needs to be restricted to actual copying of material, and it needs to have its term cut to, say, 20 years. Or even 10. We want more artists, not fewer.
Consider the effects of automation so far. Actual work has largely been replaced by papershuffling. ALL papershuffling can be replaced by sub-human AIs. (More powerful than we have currently, possibly, but not that much so.) And remember that the costs keep dropping.
Sometime within the next 10-20 years we are going to pass through the bottom of the energy crisis. I don't know if it's going to be solar or wind or nuclear (but hopefully not coal), but somehow we're going to do it. If we're lucky it won't be horrendously vile. But by the time these human equivalent AIs show up we will be on the up swing. Energy will still be a problem, but it will be becoming less of a problem. It will likely be twice as expensive as it was in January, 2007 (rough guess). So it will need to be used more carefully, and that means more efficient use. Which means, e.g., tractors that reduce weight by driving themselves, and not carrying along an operator. Ditto for trucks (semis). For trains that will be less important, but they're already almost automated. Planes are a special case. Probably they will be used MUCH more sparingly, and any weight shipped will be a lot more expensive. But probably passenger flights will continue to have pilots, though likely co-pilots will be eliminated, and in case of emergency it will shift to full automation.
The real problems lie with inertia. Legal, organizational, and mental. There are currently extant ways of harvesting energy in the price range that I mentioned, but nobody is willing to build them, because they don't want to commit to energy being that expensive. And it takes a long time to build a large plant. Every year that they avoid committing to a particular design, becomes a year when the best design is just that much cheaper than the prior year. Unfortunately, it takes a long time to build once you've committed. And people keep trying to beat the cost of what we had last year, when the price of oil just keeps going up (irregularly rather than monotonically, yes, but the 24-months average price keeps increasing. [And if I were shown a counterexample I'd say the 5-year average and still be right. But I do know that it's quite irregular, and not at all monotonic in the sho

--

I think we've pushed this "anyone can grow up to be president" thing too far.

4 TFLOPS? by Anonymous Coward · 2008-11-22 21:38 · Score: 5, Insightful

A single Radeon 4870x2 is 2.4 TFLOPS. Some supercomputer, that.

Seriously, why is this even news? nVidia makes a product, which is OK, but nothing revolutionary. The devaluation of the "supercomputer" term is appalling.

Also, how much of that 4 TFLOPS can you get on actual applications? How's FFT? Or LINPACK?

Re:4 TFLOPS? by GigaplexNZ · 2008-11-22 22:30 · Score: 4, Informative

A single Radeon 4870x2 is 2.4 TFLOPS.
A single Radeon 4870x2 uses two chips. This Tesla thing uses 4 chips that are comparable to the Radeon ones. It should be obvious that they would be in a similar ballpark.

Seriously, why is this even news?
It isn't. Tesla was released a while ago, this is just a slashvertisement.
Re:4 TFLOPS? by Anonymous Coward · 2008-11-23 01:18 · Score: 0

I can think of one... weather modeling. Weather models are math intensive and if you can take a forecast run down from 3 hours to 15 minutes, that's a huge deal.
Re:4 TFLOPS? by hairyfeet · 2008-11-23 03:41 · Score: 3, Interesting

The problem is how do you actually define supercomputer. I mean, do only machines released in the past month count? Or do you still count the original bad boys like the Cray? After all, when first built most Crays were multi-million dollar number crunching beasts. Does the fact that you can get the same performance in a desktop now mean the Cray no longer counts? The power of computers is still growing at such a pace that the machine that cost millions a decade ago can probably be beaten by a cluster that would cost you less than 25K today, so how exactly would you suggest they define supercomputer?

--
ACs don't waste your time replying, your posts are never seen by me.
Re:4 TFLOPS? by X-acious · 2008-11-23 04:37 · Score: 2, Interesting

A single Radeon 4870x2 uses two chips

2.4 / 2 = 1.2

Each Tesla GPU has 240 cores and delivers about 1 TeraFLOPS single precision...
Each Radeon HD 4870 produces 1.2 TFLOPS, about 0.2 more than one Tesla GPU.

"NVIDIA announced...the Tesla Personal Supercomputer -- a 4 TeraFLOPS desktop...
Two 4870 X2s equal 4.8 TFLOPS, 0.8 more than four Tesla GPUs.
I think the parent's point was that even when an HD Radeon 4870 X2 is made up of two cards they're still connected and recognized as one. Thus, with "fewer" cards and fewer slots you could achieve more performance. Or you could use the other two vacant slots for yet another two 4870s: Four of them in crossfire would equal 9.6 TFLOPS, 5.6 more than four Tesla GPUs.
Furthermore, I would assume two GPUs that are closely interconnected as a "single" card (4870 X2) would be better than a pair of GPUs connected through a combination of the motherboard (x2 Tesla GPU) and custom interconnects.
I'm not implying that an HD 4870 is a viable alternative to a Tesla GPU but the "performance" is more than just comparable. As it's been mentioned before, the hardware concerned is meant for precision and not speed, otherwise known as performance. Then again, you could compensate for inaccuracy by using all that computing power to make multiple passes rather than making sure your initial calculations were accurate.
Note: Emphasis by me in all quotes provided.
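The peak-throughput comparison in this post is easy to reproduce. The sketch below uses only the per-GPU peak single precision figures quoted in the thread (about 1 TFLOPS per Tesla GPU, 1.2 TFLOPS per Radeon HD 4870 GPU); the helper name is hypothetical, and peak numbers say nothing about sustained performance or precision.

```python
# Peak single precision figures as quoted in the thread, in TFLOPS per GPU.
TESLA_GPU_TFLOPS = 1.0
RADEON_4870_GPU_TFLOPS = 1.2


def peak_tflops(per_gpu_tflops, gpu_count):
    """Aggregate peak throughput for gpu_count identical GPUs."""
    return per_gpu_tflops * gpu_count


if __name__ == "__main__":
    tesla = peak_tflops(TESLA_GPU_TFLOPS, 4)                  # four Tesla GPUs
    two_x2 = peak_tflops(RADEON_4870_GPU_TFLOPS, 2 * 2)       # two 4870 X2 cards
    four_x2 = peak_tflops(RADEON_4870_GPU_TFLOPS, 4 * 2)      # four 4870 X2 cards
    print(f"4x Tesla GPU: {tesla:.1f} TFLOPS")
    print(f"2x 4870 X2:   {two_x2:.1f} TFLOPS (+{two_x2 - tesla:.1f})")
    print(f"4x 4870 X2:   {four_x2:.1f} TFLOPS (+{four_x2 - tesla:.1f})")
```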
Re:4 TFLOPS? by MostAwesomeDude · 2008-11-23 08:59 · Score: 1

Depends on the kind of precision you want. Also the big limiting factor in these kinds of apps is actually feeding the GPUs. Y'know that little glxgears test app that everybody uses to test their FPS? The glxgears framerate is actually just the number of times per second that the driver can properly set up the card, prepare a display list, flush it to the card, and then swap the buffers. The card usually can go much faster than that.
(And, of course, the point is, glxgears is probably the fastest thing that you'll be able to ever run on that GPU!)

--
~ C.
Re:4 TFLOPS? by ThwartedEfforts · 2008-11-23 09:25 · Score: 1

Nah, old Apple ads like this one devalued the term "supercomputer".
http://www.youtube.com/watch?v=zXEG0RLzhDA
Re:4 TFLOPS? by jbridge21 · 2008-11-23 11:53 · Score: 1

I haven't timed GLX calls-to-the-card, but I have timed CUDA calls-to-the-card, and IIRC it was about a 10 usec overhead per call. If it was the same for glxgears, that would indicate a framerate of about 100,000 fps. However, when I run glxgears, I get more like 2000 - 20,000 fps, and it varies based on the size of the window -- it definitely seems to be slowing down based on the size of the window that must be cleared/redrawn. So there are definitely points on the glxgears performance spectrum where drawing operations (including copying 2D buffers around) overtake simple driver overhead.
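The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes exactly one fixed-cost driver call per frame (a real frame involves several), and the function names are made up for illustration.

```python
def max_calls_per_second(overhead_us):
    """Upper bound on calls per second if every call costs overhead_us microseconds."""
    return 1_000_000 / overhead_us


def overhead_share(frame_rate, overhead_us):
    """Fraction of one frame's time that the fixed per-call overhead accounts for."""
    frame_time_us = 1_000_000 / frame_rate
    return overhead_us / frame_time_us


# Ceiling implied by a 10 usec call overhead (the figure quoted in the comment).
print(max_calls_per_second(10))

# Share of the frame time that 10 usec represents at the observed frame rates.
for fps in (2_000, 20_000):
    print(fps, overhead_share(fps, 10))
```

Under these assumptions the 10 usec overhead is a small part of each frame at 2000 fps and still a minority at 20,000 fps, which fits the observation that drawing work, not call overhead, sets the frame rate at those window sizes.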
Re:4 TFLOPS? by Anonymous Coward · 2008-11-23 11:59 · Score: 0

This is a second generation Tesla product that does double precision, which makes it slightly more interesting.
Re:4 TFLOPS? by MostAwesomeDude · 2008-11-23 12:28 · Score: 1

Buffer clears always take time. CUDA doesn't need to write to any color buffers, so yes, it will be faster. I was talking about OpenGL calls, not CUDA calls.
Even so, the point is "Drivers can't actually feed the GPU at maximum speed in real applications," and I think it's still a valid point.

--
~ C.
Re:4 TFLOPS? by jbridge21 · 2008-11-23 13:46 · Score: 1

Right ok. I got a bit off topic... to get back on topic, with a 10 usec call overhead, it's not that hard to design kernel invocations that run for > 10X that time, thus minimizing the driver overhead, and getting pretty close to maximum GPU speed. BUT the one big caveat is that in many cases, this means you cannot use (for instance) CUBLAS but must write a custom kernel. You can do in one custom kernel invocation what could take three dozen CUBLAS calls, and the reduction in setup overhead can help. But an even bigger factor can be better use of the tiny cache on the GPU, where doing everything you need to do to a particular piece of data in a custom kernel means saving a ton of memory fetch overhead versus making 20 passes over the data by doing one thing at a time to it with CUBLAS. All in all, with a bit of work it is quite possible to get it to where feeding data from the CPU to the GPU is not the bottleneck.
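A rough CPU-side illustration of the fusion idea, in plain Python rather than CUDA. Nothing here is CUBLAS or a real kernel: `three_passes` stands in for a chain of one-operation library calls, `fused` for a single custom kernel, and `estimated_time_us` is a toy cost model. The 10 usec overhead comes from the comment above, the 102 GB/s default from a bandwidth figure quoted elsewhere in this thread, and the array size is hypothetical.

```python
def three_passes(xs, a, b):
    """One operation per pass, the way a chain of library calls would run."""
    scaled = [a * x for x in xs]         # pass 1: scale
    shifted = [s + b for s in scaled]    # pass 2: shift
    return [t * t for t in shifted]      # pass 3: square


def fused(xs, a, b):
    """Same arithmetic in a single pass: each element is loaded once."""
    out = []
    for x in xs:
        t = a * x + b
        out.append(t * t)
    return out


def estimated_time_us(n_calls, passes_over_data, data_bytes,
                      call_overhead_us=10.0, bandwidth_gb_s=102.0):
    """Toy model: fixed launch overhead plus time to stream the data."""
    launch_us = n_calls * call_overhead_us
    traffic_us = passes_over_data * data_bytes / (bandwidth_gb_s * 1e9) * 1e6
    return launch_us + traffic_us


data = [1.0, 2.0, 3.0]
print(three_passes(data, 2.0, 1.0) == fused(data, 2.0, 1.0))

# Hypothetical 4 MB array: 20 separate calls versus one fused call.
print(estimated_time_us(20, 20, 4_000_000))
print(estimated_time_us(1, 1, 4_000_000))
```

The model ignores caches and compute time entirely, so treat the numbers as a way to see which term dominates, not as a prediction.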
Re:4 TFLOPS? by escay · 2008-11-23 18:14 · Score: 1

when first built most Crays were multi million dollar number crunching beasts.

A number crunching beast that costs multi-million dollars today is how I'd define a supercomputer. The reason why Crays (et al) were denoted 'Supercomputers' was because of their ability to compute way beyond what the regular computer was able to do. If today's 'regular' (as in, a $10K very powerful) computer can do 4 TFlops, I'd call a machine (such as the 'human brain simulator' proposed above that apparently costs $25 million), a supercomputer.
For fields such as technology with rapidly changing standards, price is the relatively most reliable factor for scaling.

--
My sig has been answered.
Re:4 TFLOPS? by Anonymous Coward · 2008-11-23 20:42 · Score: 0

According to
http://www.top500.org/list/2008/11/500
the tail entry is 12.6 TFlops as of November 2008.
4.0 TFlops would have been pushed off the list in June 2007.
In double precision, the performance of the system is "only" 78 GFlops per GPU, i.e. 0.3 TFlops overall.
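For scale, here is the same arithmetic as a small Python sketch, using only the figures quoted in this comment. It compares theoretical peak against the list's tail entry; Top500 ranks measured Linpack results, which come in below peak, so the system count it prints is a lower bound at best. The function names are illustrative.

```python
import math

GFLOPS_DP_PER_GPU = 78       # double precision, per GPU, from the comment
GPUS_PER_SYSTEM = 4
TOP500_TAIL_TFLOPS = 12.6    # last entry on the November 2008 list, from the comment


def system_peak_tflops(gflops_per_gpu, gpus):
    """Peak of one box in TFLOPS, assuming perfect scaling across GPUs."""
    return gflops_per_gpu * gpus / 1000


def systems_needed(target_tflops, per_system_tflops):
    """Whole systems required to reach the target at theoretical peak."""
    return math.ceil(target_tflops / per_system_tflops)


per_box = system_peak_tflops(GFLOPS_DP_PER_GPU, GPUS_PER_SYSTEM)
print(per_box)
print(systems_needed(TOP500_TAIL_TFLOPS, per_box))
```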
Re:4 TFLOPS? by hairyfeet · 2008-11-24 01:57 · Score: 1

So by your definition anything that doesn't cost millions of dollars today doesn't count. Which kind of makes our supercomputing history meaningless. What I propose instead is something like this: let us take something that we can all agree was a supercomputer, like say the Cray, and we will call that a supercomputer. Then those released after it will be supercomputer Xx, with the second x denoting how many more times faster it is than the Cray. This will not only allow us to look back and still call them supercomputers instead of "really big PCs" but gives us a way to point out to laymen who don't understand the whole FLOPs thing how truly powerful these number crunching beasts are. Because we will be able to point at a picture of something like the Cray with all the bubbles running through it (sorry it is early and I can't remember the model) and say "You see that giant monster? This machine is more powerful than 300 of those put together" which will help to give a scope to the average Joe how powerful these things truly are.

--
ACs don't waste your time replying, your posts are never seen by me.

But.. by D_Blackthorne · 2008-11-22 21:43 · Score: 1

..will it run Vista? ;-)

Re:But.. by itsybitsy · 2008-11-22 21:46 · Score: 2, Funny

Not yet.... darn NVidia, no Vista Drivers yet...
Come on NVidia GET WITH IT!!!
Re:But.. by Anonymous Coward · 2008-11-22 22:01 · Score: 0

To heck with vista. I just need to know if it runs Linux.
Re:But.. by angelwolf71885 · 2008-11-22 22:20 · Score: 0

i think the real question is can it run crysis running on vista...and play a bluray movie at 1080P at the same time...

What, no coil? by dgun · 2008-11-22 21:48 · Score: 5, Funny

What a rip.

--
FAQs are evil.

Re:What, no coil? by geekmux · 2008-11-23 02:03 · Score: 3, Funny

What a rip.
Yeah, no shit. First bastard that tries to put a "Tesla Capable" sticker on the front, I'm gonna sue.
Re:What, no coil? by dgun · 2008-11-23 16:46 · Score: 1

Yeah, no shit. First bastard that tries to put a "Tesla Capable" sticker on the front, I'm gonna sue.
I don't hand out lolz just for lolz, but lolz.

--
FAQs are evil.

Louis Savain will be all over this one! by Anonymous Coward · 2008-11-22 21:48 · Score: 0

Can't wait for him to come and tell us how Nvidia, ATI, Intel and all are idiots, and this is completely and totally unusable and we're in a parallel crisis!

What a disappointment by dleigh · 2008-11-22 21:49 · Score: 2, Interesting

At first glance I thought these used actual Tesla coils in the processor, or the devices were at least powered or cooled by some apparatus that used Tesla coils.

Turns out "Tesla" is just the name of the product.

Drat. I demand a refund.

Re:What a disappointment by rhyder128k · 2008-11-22 23:20 · Score: 1

They should at least come up with a "mad scientist lab pack" that includes some Tesla coils. Perhaps they presume that mad scientists will have their own gear.
I just spent an entire morning trying out massive single throw switches.
"Now, we'll SEE who's mad! [thunk]"
"Now, we'll see who's MAD! [thunk]"
In all fairness, these things can be pretty personal.

--
Michael Reed, freelance tech writer.
Re:What a disappointment by David+Gerard · 2008-11-23 00:08 · Score: 1

I thought of the car first. I figured that's how much battery you'd need to run it in a laptop.

--
http://rocknerd.co.uk
Re:What a disappointment by ufoolme · 2008-11-23 05:27 · Score: 1

wtb computer w/Tesla coils as cpu cores.
must be internet capable

Binary-only toolchain by Anonymous Coward · 2008-11-22 21:50 · Score: 5, Informative

The toolchain is binary only and has an EULA that prohibits reverse engineering.

Re:Binary-only toolchain by FireFury03 · 2008-11-22 22:23 · Score: 5, Informative

has an EULA that prohibits reverse engineering.
Not really a big deal to those of us in the EU since we have a legally guaranteed right to reverse engineer stuff for interoperability purposes.

--
http://blog.nexusuk.org
Re:Binary-only toolchain by JamesP · 2008-11-22 23:17 · Score: 1

The toolchain is binary only and has an EULA that prohibits reverse engineering.
Show me a non-free EULA that doesn't.

--
how long until /. fixes commenting on Chrome?
Re:Binary-only toolchain by oneofthose · 2008-11-22 23:47 · Score: 1

I haven't heard of anyone who reverse engineered the toolchain but there's an awesome tool that helps you reverse engineer your own binaries: http://www.cs.rug.nl/~wladimir/decuda/ This is relevant because the compiler creates device specific binaries that you can't get the assembler code for. So if you want to know exactly what your kernel is doing disassemble it with decuda. Unfortunately the tool is a bit outdated but it still might be useful to some.
Re:Binary-only toolchain by janwedekind · 2008-11-23 02:45 · Score: 1

Thanks for raising awareness of that! Stream processing could be great for machine vision. However the situation seems to be almost as bad as with most FPGA boards where you need proprietary compilers and proprietary libraries to compile and run your programs (not to talk about firmware and hardware design). Not hacker-friendly at all :( If anyone has time and money to spend please join and support something like the Open Graphics Project instead.
Re:Binary-only toolchain by neumayr · 2008-11-23 03:33 · Score: 1, Flamebait

So not only are you bragging with your confederation's legislation - something you personally most likely had no influence on, at all, you're also claiming other confederation's legislations have no influence on you, or your confederation.
That kind of euro-centric view reeks of nationalism, and is no different than the US-centric view Americans constantly get accused of.

--
Truth arises more readily from error than from confusion. -Francis Bacon
Re:Binary-only toolchain by ScrewMaster · 2008-11-23 04:33 · Score: 1

has an EULA that prohibits reverse engineering.
Not really a big deal to those of us in the EU since we have a legally guaranteed right to reverse engineer stuff for interoperability purposes.
Don't get cocky. It's only presently guaranteed. Laws change, and there's a whole lot of pressure to make that change.

--
The higher the technology, the sharper that two-edged sword.
Re:Binary-only toolchain by devman · 2008-11-23 05:43 · Score: 1

We do in the US as well, it's listed in the exceptions part of the DMCA and has been part of the U.S. Code for awhile.
Re:Binary-only toolchain by FireFury03 · 2008-11-23 06:15 · Score: 1

So not only are you bragging with your confederation's legislation
Not really - I'm making people aware of a fairly sensible piece of legislation. There are two good reasons for this:
1. Some people in the EU may not be aware of this legislation, and it may be in their interests to know about it.
2. Some people not in the EU may not be aware of this legislation and may want to try and get similar legislation adopted in their locality.

something you personally most likely had no influence on, at all
And my influence on the legislation is relevant how exactly?

you're also claiming other confederation's legislations have no influence on you
No, please cite anything I said which even implies this, let alone expressly states it.

and is no different than the US-centric view Americans constantly get accused of.
And your point is...?

--
http://blog.nexusuk.org
Re:Binary-only toolchain by Jeremi · 2008-11-23 06:20 · Score: 1

So not only are you bragging with your confederation's legislation
Was he bragging, or merely stating a fact? Your assumption of the former suggests a certain defensiveness about our country's wise and glorious IP law... ;^)

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:Binary-only toolchain by neumayr · 2008-11-23 09:27 · Score: 1

I might have been a little too annoyed with your comment. If you weren't bragging, your influence on the legislation is of course not relevant.
Concerning your claim that, quoting myself, "other confederation's legislations have no influence on you", that is made clear by your wording: "Not really a big deal to those of us in the EU"
My point is that it's annoying to see people from either side of the Atlantic pointing out in which way they're better off, and how whatever sucks on the other side is treated as other people's problems.
And while Americans get to hear about their US-centric views all the time, at least on /., Euro-centric comments get modded up. Even though both exhibit almost coldwar era nationalistic tendencies.

--
Truth arises more readily from error than from confusion. -Francis Bacon
Re:Binary-only toolchain by FireFury03 · 2008-11-23 21:39 · Score: 1

Concerning your claim that, quoting myself, "other confederation's legislations have no influence on you", that is made clear by your wording: "Not really a big deal to those of us in the EU"

More that the _lack_ of legislation elsewhere in the world has little effect on me in this case.

And while Americans get to hear about their US-centric views all the time, at least on /., Euro-centric comments get modded up. Even though both exhibit almost coldwar era nationalistic tendencies.
I'm happy to hear about good legislation, no matter where it comes from, in the hope that some of it might eventually make it into my locality. However, it does seem rare that any good legislation is announced from the US - the lawmakers are far too happy to bow to lobbying from big corporates at the expense of the individuals and minority groups. Sadly the EU seems to be slowly heading in the same direction as the US, but we still do have a few good laws to protect people from the large corporates. Places like Canada, on the other hand, seem to be much better, although any government does seem to pass very unbalanced laws every so often.

--
http://blog.nexusuk.org
Re:Binary-only toolchain by wazoox · 2008-11-24 02:50 · Score: 1

Yuck. A pure proprietary development toolchain on a pure proprietary platform, looks like a bad remake of IBM in the sixties.

And the worst timing ever award goes to... by CryptoJones · 2008-11-22 21:55 · Score: 2, Insightful

While the inner nerd in me screams to take out a loan against my house to buy one, I can't imagine this being very popular outside academia. Most users don't use the power of their crappy computers, let alone this. And then there is the whole "ECONOMY" thing.

--
"Chance favors the prepared mind." ~Me

Re:And the worst timing ever award goes to... by Yetihehe · 2008-11-22 22:06 · Score: 2, Insightful

It IS marketed for academia. Normal users don't really need to fold proteins or simulate nuclear weapons at home.

--
Extreme Programming - Redundant Array of Inexpensive Developers
Re:And the worst timing ever award goes to... by palegray.net · 2008-11-22 22:15 · Score: 2, Informative

I'm perfectly normal, and I fold proteins all the time.

--
512 MB RAM, 20 GB disk, 200 GB transfer, five datacenters. $19.95/month.
Re:And the worst timing ever award goes to... by Anonymous Coward · 2008-11-22 22:26 · Score: 2, Interesting

according to http://folding.stanford.edu/English/Stats about 250.000 "normal" users are folding proteins at home.
Personally, I would use it as a render farm, but Blender compatibility could take a while if Nvidia keeps the drivers and specification locked up.
What they don't seem to mention is the amount of memory/core (at 960 cores). I'd guess about 32 MB/core, and 240 cores sharing the same memory bus...
Re:And the worst timing ever award goes to... by evilbessie · 2008-11-23 00:55 · Score: 1

err, you seem to have missed something fairly major in your understanding. Specifically about what constitutes a 'core'. These cards are based on the same chip as the GTX 280, so they have 240 stream processors, which are very good at specific types of calculation (If I was wiser I could tell you what types but I'm sure you can use google yourself). I believe that each of the chips has a 512 bit wide bus to 4GiB of memory. I'm not sure what the memory allocation per stream processor is but I think the other parts of the chip control what goes where. There probably are some bottlenecks but I don't know enough about it to be able to give useful information on the subject. There is something like 102GB/s memory bandwidth per 240 core chip.
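The bandwidth figure can be sanity-checked from the bus width with a short Python sketch. The 512-bit bus, 4 GiB and 240 stream processors come from the comment; the 1.6 billion transfers per second is an assumption (800 MHz GDDR3 at double data rate) that is not stated anywhere in this thread. The per-core figure is only a naive division, since, as the comment says, memory is not carved up per stream processor.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Theoretical peak: bytes moved per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


def naive_memory_per_core_mib(total_gib, cores):
    """Memory divided evenly by core count (not how the GPU actually allocates it)."""
    return total_gib * 1024 / cores


# Assumed transfer rate: 1.6e9 per second (800 MHz, double data rate).
print(peak_bandwidth_gb_s(512, 1.6e9))
print(naive_memory_per_core_mib(4, 240))
```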
Re:And the worst timing ever award goes to... by Anonymous Coward · 2008-11-23 03:54 · Score: 0

Balling up the sock afterwards *does not* count.
Re:And the worst timing ever award goes to... by T-Bone-T · 2008-11-23 17:24 · Score: 1

There is a big difference between needing to fold proteins and allowing your computer to be told to fold proteins. How many of those 250,000 people know what proteins they are folding and why?

The question here is... by Anonymous Coward · 2008-11-22 21:56 · Score: 0

But will it run Crys- ...

Oh.

Only in C? Oh dear. by Viol8 · 2008-11-22 22:07 · Score: 0, Flamebait

That'll frighten a lot of the OO fanboys who have to have a friggin inheritance tree and a factory based abstracted class design before they can write Hello World.

Sorry, it's early, I'm feeling grouchy.

Re:Only in C? Oh dear. by GigaplexNZ · 2008-11-22 22:34 · Score: 1, Insightful

OO is very good for graphical interfaces, but it isn't particularly well suited for algorithms and other maths oriented stuff. Why should we care if OO fanboys are scared off? Decent developers know to use the right tool for the job, not try to shoehorn whatever their personal favourite is into every situation.
Re:Only in C? Oh dear. by Joce640k · 2008-11-22 22:58 · Score: 1

"...isn't particularly well suited for algorithms and other maths oriented stuff"
Yeah, all that operator overloading is a real pain in the ass for numerical work.

--
No sig today...
Re:Only in C? Oh dear. by MROD · 2008-11-22 23:20 · Score: 1

Actually, OOP is a bit rubbish for number crunching, far too much overhead.
What is disappointing is that there isn't a high performance FORTRAN compiler. That's where most scientific number crunching is done. (After all, that's what the language was designed for.)

--

Agrajag: "Oh no, not again!"
Re:Only in C? Oh dear. by xororand · 2008-11-22 23:26 · Score: 5, Informative

OO is very good for graphical interfaces, but it isn't particularly well suited for algorithms and other maths oriented stuff.
The term OO is too general to make a statement about its usefulness for mathematics oriented problems. The powerful templating features of modern C++ are indeed very useful for numerical simulations:
It's called C++ Expression Templates, an excellent tool for numerical simulations. ETs can get you very close to the performance of hand optimized C code while they're much more comfortable to use than plain C. Parallelization is also relatively easy to achieve with expression templates.
A research team at my university actually uses expression templates to build some sort of meta compiler which translates C++ ETs into CUDA code. They use it to numerically simulate laser diodes.
Search for papers by David Vandevoorde & Todd Veldhuizen if you want to know more about this. They both developed the technique independently.
Vandevoorde also explains ETs to some degree in his excellent book "C++ Templates - The Complete Guide".
Re:Only in C? Oh dear. by cnettel · 2008-11-22 23:43 · Score: 2, Informative

OOP with virtual and all, yes. OOP with template magic to allow the compiler to do specializations can beat the heck out of even quite tediously hand-written C or FORTRAN, with much superior readability.
Re:Only in C? Oh dear. by ardin,mcallister · 2008-11-23 00:10 · Score: 1

Offtopic, but I actually know someone who writes everything in Fortran. My boss and I have seen the man write OpenGL calls in Fortran... it's a tad unsettling.

--
"Some men just want to watch the world burn..."
Re:Only in C? Oh dear. by GigaplexNZ · 2008-11-23 00:12 · Score: 0

As a C++ programmer myself I generally agree with what you said. But when I say "isn't particularly well suited" I mean it isn't necessarily the best solution even though it may work. I was also referring to the typical inheritance and dynamic polymorphism style of OO that the "fanboys" tend to love.

PS: Thanks for the references, I'll look into them.
Re:Only in C? Oh dear. by Fred_A · 2008-11-23 01:38 · Score: 1

OO is very good for graphical interfaces, but it isn't particularly well suited for algorithms and other maths oriented stuff.
Absolutely, that's what Fortran is for !

--

May contain traces of nut.
Made from the freshest electrons.
Re:Only in C? Oh dear. by VoidCrow · 2008-11-23 02:02 · Score: 1

> but it isn't particularly well suited for algorithms and other maths oriented stuff.
STL? Discrete math up the wazooo? Well, maybe not up the *wazoo*...
Re:Only in C? Oh dear. by HuguesT · 2008-11-23 03:30 · Score: 2, Informative

Actually yes it is. For instance nobody has yet figured out an efficient matrix class in C++ that uses operator overloading. It is basically impossible to write B=A*X*A^t efficiently, which occurs all the time in linear analysis, because in C++ the transpose would require a copy operator, whereas one ought to get the job done simply with a different iterator. C++ is not equipped for this yet.
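The "different iterator" idea can be shown in a few lines of Python: a transposed view swaps the indices on every read instead of copying the matrix. This is a language-neutral sketch with made-up class names (`Matrix`, `TransposedView`), not a statement about what any C++ library does or cannot do.

```python
class Matrix:
    """Small dense matrix stored as a list of rows."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.shape = (len(self.rows), len(self.rows[0]))

    def at(self, i, j):
        return self.rows[i][j]


class TransposedView:
    """Reads element (j, i) of the wrapped matrix on the fly; no data is copied."""

    def __init__(self, m):
        self.m = m
        self.shape = (m.shape[1], m.shape[0])

    def at(self, i, j):
        return self.m.at(j, i)


def matmul(a, b):
    """Plain triple-loop product; works on anything with .shape and .at()."""
    n, k = a.shape
    k2, p = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    return Matrix([[sum(a.at(i, t) * b.at(t, j) for t in range(k))
                    for j in range(p)]
                   for i in range(n)])


A = Matrix([[1, 2], [3, 4]])
X = Matrix([[2, 0], [0, 1]])
B = matmul(matmul(A, X), TransposedView(A))   # B = A * X * A^t
print(B.rows)
```

The intermediate product `A * X` is still materialized here; avoiding that temporary as well is a separate problem.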
Re:Only in C? Oh dear. by Calgary · 2008-11-23 05:40 · Score: 1

Not true. http://www.oonumerics.org/blitz/
It uses template metaprogramming, so any error messages you get are 7 screens long and take hours to decipher. That's probably why the technique never really caught on.
Re:Only in C? Oh dear. by ufoolme · 2008-11-23 06:11 · Score: 1

Absolutely, that's what Fortran is for !
Fortress ftw! http://en.wikipedia.org/wiki/Fortress_(programming_language)
Re:Only in C? Oh dear. by eh2o · 2008-11-23 07:31 · Score: 1

The trick to that sort of optimization is to defer the evaluation until the =, at which point the optimal execution plan is selected. But, you can't overload "=" in C++. The workaround is basically to provide another function to explicitly force the evaluation when you need it.
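A minimal sketch of that deferral, written in Python for brevity. The operators only build an expression tree, and an explicit `evaluate()` call forces a single pass over the data. All class and method names are hypothetical, and Python resolves the tree at run time, whereas C++ expression templates do the equivalent work at compile time.

```python
class Expr:
    """Base class: operators build a tree instead of computing anything."""

    def __add__(self, other):
        return BinOp(self, other, lambda x, y: x + y)

    def __mul__(self, other):
        return BinOp(self, other, lambda x, y: x * y)

    def evaluate(self):
        """Force the computation: one loop, no intermediate vectors."""
        return [self.at(i) for i in range(len(self))]


class Vec(Expr):
    """Leaf node holding actual data."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Expr):
    """Interior node: remembers the operands and the elementwise operation."""

    def __init__(self, left, right, fn):
        self.left, self.right, self.fn = left, right, fn

    def __len__(self):
        return len(self.left)

    def at(self, i):
        return self.fn(self.left.at(i), self.right.at(i))


a, b, c = Vec([1, 2, 3]), Vec([4, 5, 6]), Vec([7, 8, 9])
expr = a * b + c            # nothing computed yet, just a tree of BinOp nodes
print(type(expr).__name__)  # prints "BinOp"
print(expr.evaluate())      # prints [11, 18, 27]
```

The explicit `evaluate()` is the "another function" mentioned above: the caller has to know when to ask for the result.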
Re:Only in C? Oh dear. by Anonymous Coward · 2008-11-23 09:04 · Score: 0

You can use limited templates in CUDA. Please don't go around saying that ETs are the best thing since sliced bread, have you ever done any debugging on a code that's littered with massive usage of ETs? Try it some time...
Re:Only in C? Oh dear. by Anonymous Coward · 2008-11-23 09:59 · Score: 0

Standard Templates are fucking terrible for performance and should only be used by newbie programmers. I'd never touch it when doing non-linear work, let alone in a parallel environment. Period.
Re:Only in C? Oh dear. by HuguesT · 2008-11-23 19:39 · Score: 1

Blitz doesn't do what I describe, at least they didn't when I last checked, unfortunately. In spite of what they claim(ed) AXAt are inefficient.
Re:Only in C? Oh dear. by HuguesT · 2008-11-23 19:41 · Score: 1

Exactly, which sort of defeats the whole operator overloading scheme. You need to know when to use vanilla operators and when to use special functions, which is confusing.

Let me be the first to say... by rdnetto · 2008-11-22 22:08 · Score: 5, Funny

4 Terraflops should be more than enough for anybody...

--
Most human behaviour can be explained in terms of identity.

Re:Let me be the first to say... by Trogre · 2008-11-23 08:39 · Score: 1

You can keep your Terraflops. I demand Martianflops!

--
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife

Wow. by osir · 2008-11-22 22:11 · Score: 1

Well cool. I dunno why people have such a tendency to start commenting on slashdot posts with ridicule. This is laudable, not laughable.

Re:Wow. by Anonymous Coward · 2008-11-22 22:20 · Score: 0

Because a person is smart. People are dumb panicky animals, and you know it.

nerdgasm by Anonymous Coward · 2008-11-22 22:14 · Score: 0

If this can be enabled to work for Digital Audio Workstations, to offer massive processing for VST/RTAS away from the CPU, I can still see a thriving market for this in multimedia...or even with real-time video processing.

Who remembers BionicFX? This will certainly make up for their vaporware...

I would save up for this

---
actual CAPTCHA:
costed

I want one... by frictionless+man · 2008-11-22 22:17 · Score: 1

No wait, I want one of these and the skills to be able to write something cool in C that would actually use it.

Scientist speak by jnnnnn · 2008-11-22 22:25 · Score: 2, Interesting

So many scientists use the word "codes" when they mean "program(s)".

Why is this?

Re:Scientist speak by Anonymous Coward · 2008-11-22 22:56 · Score: 3, Interesting

It's cultural.
You're not even allowed to say that you're "coding", but only that you produce "codes".
Maybe it's because analytic science is based on equations which become algorithms in computing, and you can't say that you're "equationing" nor "algorithming".
In practice it's actually dishonest, because the algorithms don't have the conceptual power of the equations that they represent (they would if programmed in LISP, but "codes" are mostly written in Fortran and C), so the computations are often questionable. Even worse, it's almost impossible for one research group to compare the "codes" that yielded their results against those produced by another group when numerical computing is used, whereas equations are universally portable.
The theoretical half of the scientific method has lost some of the firm foundations upon which it used to build in recent years, as a result of theorizing through numerical simulation. Fortunately it doesn't matter too much in most sciences because experiment soon demolishes any incorrect predictions. However, those sciences which deal with long-term or historic or otherwise untestable areas are suffering, as a fair bit of unsubstantiated nonsense is popping out of poorly approximated simulations and being claimed as "fact", even though reality hasn't agreed yet.
Things are probably going to get worse in this area before they get better.
Re:Scientist speak by VisceralLogic · 2008-11-23 17:53 · Score: 1

A lot of what I've seen really is just some code, that's been poorly wrapped to produce a program.

--
Stop! Dremel time!

Comment removed by account_deleted · 2008-11-22 22:29 · Score: 4, Funny

Comment removed based on user account deletion

Sorry nVidia, but this isn't gaming anymore by Anonymous Coward · 2008-11-22 22:30 · Score: 0

Scientific and rendering outfits don't work to the same M.O. as gamers, ie. "it works, I'll use it, game on".

Open-source has become the name of the game over the last few years, and vendor tie-in has become the arse-end of the computing world, especially for this particular customer base.

While you may like the smell at the arse end, those who need HPC don't. Your blind attachment to the concept of closed-source accelerator solutions is so myopic that it's in danger of becoming an Internet joke meme.

I predict the worst possible outcome for a company with its head in the sand and a chip up its arse.

Re:Sorry nVidia, but this isn't gaming anymore by Gorgonzolanoid · 2008-11-22 22:59 · Score: 1

Cray just announced a new closed-source $25K supercomputer two months ago.
IBM is going open source on its supercomputers, but last August is not what I would call "the last few years".
Re:Sorry nVidia, but this isn't gaming anymore by Wesley+Felter · 2008-11-23 07:49 · Score: 1

There is some truth to what you say; I know the national labs in particular are working on a completely open source HPC stack. But many others in the HPC world have been using proprietary compilers, debuggers, filesystems, etc. for decades.

Nah. by osir · 2008-11-22 22:34 · Score: 1

No, I don't, and that's a piss poor plea (theirs, not yours) for mod points. People are generally smart, they just lower themselves for social reasons, in groups or otherwise. You may or may not know that, but I think it's more accurate than saying groups are inherently stupid.

Re:Nah. by Anonymous Coward · 2008-11-22 23:40 · Score: 0

That was a stupid Men in Black quote.
Re:Nah. by Anonymous Coward · 2008-11-23 00:50 · Score: 0

>People are generally smart
So, you're saying that on average people are smart?

More than 1 by rdnetto · 2008-11-22 22:47 · Score: 1

Imagine a Beowolf cluster of these!

--
Most human behaviour can be explained in terms of identity.

Re:More than 1 by Provocateur · 2008-11-22 23:27 · Score: 1

Owooooo.... (at the full moon)

--
WARNING: Smartphones have side effects--most of them undocumented.

Yes but by Colin+Smith · 2008-11-22 22:50 · Score: 2, Funny

And then there is the whole "ECONOMY" thing.

The whole reason the ECONOMY is in the tank is because there are not enough people like you taking loans out against their house to buy random stuff like this.

Basically... IT'S ALL YOUR FAULT!

--
Deleted

weak DP performance by Henriok · 2008-11-22 22:53 · Score: 5, Informative

In supercomputing circles (i.e. Top500.org) double precision floating point operations seem to be what is desired. 4 TFLOPS single precision, while impressive, is overshadowed by the comparatively weak 80 GFLOPS double precision, beaten by a single PowerXCell 8i (successor to the Cell in PS3) or the latest crop of Xeons. I'm sure tesla will find its users but we won't see them on the Top500 list anytime soon.

--

- Henrik

- when the Shadows descend -

Re:weak DP performance by Anonymous Coward · 2008-11-22 23:07 · Score: 1, Interesting

> Each Tesla GPU has 240 cores and delivers about 1 Teraflop single precision and about 80 Gigaflops double-precision floating point performance.
The 80GFlops are per card. So you end up with 320GFlops total.
Not much better, but still better than nothing ;)
Re:weak DP performance by Anonymous Coward · 2008-11-23 01:06 · Score: 0

> I supercomputing circles (i.e. Top500.org) double precision floating point operations seems to be what is desired.
That depends. In many cases, memory bandwidth is what is most desired, and compared to ordinary CPUs (as soon as your problem is larger than the cache), GPUs deliver very well here.
Either way GPUs are still very much special-purpose, so they sure will not be interesting for everyone.
Re:weak DP performance by timeOday · 2008-11-23 02:25 · Score: 1

I'm just amazed that the performance loss from single to double precision is more than a factor of 10! It's only 2x the bits, what's the holdup?
Re:weak DP performance by Anonymous Coward · 2008-11-23 03:08 · Score: 0

I probably shouldn't say anything since I don't actually know, but whatever. In my understanding, graphics processors are really good at certain vectorized calculations. In a vectorized calculation, the same operation is performed on each number in a column or maybe in a pair of columns. The result of each operation is independent of all the others, so they can be done in parallel. Graphics chips are designed to take advantage of that inherent parallelism.
When you are rendering scenes for a video game, nobody is going to notice the difference between single precision and double precision. So, they don't bother designing the graphics chip to do the fancy parallelized vector operations on double precision numbers. Double precision calculations can still be performed, but only in a much slower, serial, naive way.
Re:weak DP performance by Anonymous Coward · 2008-11-23 03:28 · Score: 0

Check out #29 on the top 500:
GSIC Center, Tokyo Institute of Technology
TSUBAME Grid Cluster with CompView TSUBASA - Sun Fire x4600/x6250, Opteron 2.4/2.6 GHz, Xeon E5440 2.833 GHz, ClearSpeed CSX600, nVidia GT200; Voltaire Infiniband
77 TFLOPS (the GPU contribution is a small fraction)
Re:weak DP performance by Anonymous Coward · 2008-11-23 06:03 · Score: 0

If by "80 GFLOPS", you mean "320 GFLOPS", and by "beaten by" you mean "soundly beats", then yes, you are correct.
There aren't many of the Top500 supercomputers that run on a single chip to sport better DP perf.
Re:weak DP performance by cheier · 2008-11-23 06:44 · Score: 1

They have hit the Top500. They just recently made 29th position with 170 Tesla S1070 systems in tandem with Xeon cluster nodes. Their overall performance in LINPACK was about 77.48 TFLOPS. This installation was done by Tokyo Tech.
Re:weak DP performance by Anonymous Coward · 2008-11-23 07:45 · Score: 0

http://arstechnica.com/news.ars/post/20081118-game-on-nvidia-ps3-hardware-in-top-500-supercomputers-list.html
Re:weak DP performance by Anonymous Coward · 2008-11-23 09:12 · Score: 0

Oh Snap! How about #29 in the latest Top500 rankings?
Re:weak DP performance by Anonymous Coward · 2008-11-23 09:37 · Score: 0

I'm sure tesla will find its users but we won't see them on the Top500 list anytime soon.
5-10 of these hooked up on a decent network will get you on the Top500 for under 100K. It will happen very soon...
Re:weak DP performance by BaverBud · 2008-11-23 11:32 · Score: 1

Check entry number 29. http://top500.org/system/9853 They're already there.

--
Baver
Re:weak DP performance by gupg · 2008-11-23 11:49 · Score: 1

I'm sure tesla will find its users but we won't see them on the Top500 list anytime soon.
It already made the Top 500 last week:
Tokyo Tech's cluster just placed at #29 on the Top 500 with the installation of 170 NVIDIA Tesla S1070 1U systems. Each Tesla S1070 has 4 Tesla GPUs in it.
Re:weak DP performance by Anonymous Coward · 2008-11-23 12:28 · Score: 0

The Tesla 10/Geforce GT200 architecture has eight single precision units per multiprocessor and one double precision unit per multiprocessor. (Tesla 10 GPUs have 30 multiprocessors.)
On the other hand that does not quite explain the entirety of the performance difference. I'm not really sure where the 80GFLOPS double precision figure comes from - I work with Tesla 10's and I've seen double precision performance somewhat better than that. Granted, only on very simple code. It's strange, as nVidia usually is more prone to inflating their figures than deflating.
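For what it's worth, a rough back-of-the-envelope sketch of how a figure near 80 GFLOPS could fall out of those unit counts. The unit counts are from the comment above. The 1.296 GHz shader clock and the flops-per-clock values (3 for single precision via a multiply-add plus a multiply, 2 for double precision via a fused multiply-add) are assumptions not stated anywhere in this thread, so treat the result as illustrative, not as an official derivation.

```python
MULTIPROCESSORS = 30        # from the comment above
SP_UNITS_PER_MP = 8         # from the comment above
DP_UNITS_PER_MP = 1         # from the comment above

SHADER_CLOCK_GHZ = 1.296    # assumption: not stated in the thread
SP_FLOPS_PER_CLOCK = 3      # assumption: multiply-add + multiply per unit
DP_FLOPS_PER_CLOCK = 2      # assumption: fused multiply-add per unit

def peak_gflops(units_per_mp, flops_per_clock):
    """Theoretical peak = units * flops per clock * clock rate (GHz)."""
    return MULTIPROCESSORS * units_per_mp * flops_per_clock * SHADER_CLOCK_GHZ

sp_peak = peak_gflops(SP_UNITS_PER_MP, SP_FLOPS_PER_CLOCK)
dp_peak = peak_gflops(DP_UNITS_PER_MP, DP_FLOPS_PER_CLOCK)

print(f"single precision peak: {sp_peak:.2f} GFLOPS")
print(f"double precision peak: {dp_peak:.2f} GFLOPS")
```

Under those assumptions the single/double ratio is 8 units x 3 flops versus 1 unit x 2 flops, i.e. 12:1.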
Re:weak DP performance by Anonymous Coward · 2008-11-23 20:37 · Score: 0

How about #29 on the latest Top500.org list? Oh snap!
Re:weak DP performance by Anonymous Coward · 2008-11-23 20:50 · Score: 0

According to the specs, it is 933 Gflops single precision and 78 GFlops double precision per processor.
For a 4-processor system, this makes 3.73 TFlop sp and 0.31 Tflop dp.
I am too lazy to^W^W^W^W did not bother to check on the PowerXCell 8i or the Xeons. Can someone submit those figures?
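The per-system numbers above are just the per-processor numbers times four. A small sketch of that arithmetic, using only the figures quoted in this comment (the variable names are made up):

```python
GPUS_PER_SYSTEM = 4
SP_GFLOPS_PER_GPU = 933   # single precision, as quoted above
DP_GFLOPS_PER_GPU = 78    # double precision, as quoted above

sp_tflops = GPUS_PER_SYSTEM * SP_GFLOPS_PER_GPU / 1000
dp_tflops = GPUS_PER_SYSTEM * DP_GFLOPS_PER_GPU / 1000

print(f"system single precision: {sp_tflops:.2f} TFLOPS")
print(f"system double precision: {dp_tflops:.2f} TFLOPS")
print(f"sp/dp ratio: {SP_GFLOPS_PER_GPU / DP_GFLOPS_PER_GPU:.1f}")
```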

FTFL by mangu · 2008-11-22 22:53 · Score: 2, Informative

now what the heck to do with it...

All you need to do is follow the fscking link. Plenty of examples there.

Re:FTFL by Anonymous Coward · 2008-11-23 01:04 · Score: 1, Funny

Now why haven't they developed anything on it which has use for the common Jack like me. There were practically NO examples of its applications in watching porn!
Re:FTFL by SmokeyTheBalrog · 2008-11-23 05:02 · Score: 2, Funny

Once CUDA has deep consumer penetration the 3D CGI furry anime loli porn will come! In droves if not herds.

Oh crap, I forgot to click Post AC.

boring apps... let's have some realtime raytracing by Lazy+Jones · 2008-11-22 22:59 · Score: 3, Insightful

there were a lot of early efforts trying to implement realtime raytracing engines for games (e.g. at Intel recently), let's port that stuff and have some fun.

--
"I love my job, but I hate talking to people like you" (Freddie Mercury)

Weird options by mangu · 2008-11-22 23:03 · Score: 3, Insightful

I went to the site and tried to configure one. The disk partition options are: "General Purpose, Internet Server, Developer's Workstation, File Server". I wonder, who needs three Tesla cards in a file server or an internet server?

Re:Weird options by Anonymous Coward · 2008-11-23 00:09 · Score: 0

Drawball.com?
Re:Weird options by Anonymous Coward · 2008-11-23 05:07 · Score: 0

www.zombo.com
Re:Weird options by Anonymous Coward · 2008-11-23 07:17 · Score: 0

there's a spinning pedobear coming at me!

Can I have a smaller version? by Fuzuli · 2008-11-22 23:08 · Score: 1

Is it possible to build a smaller version of this configuration? I do not have 10K, but I can come up with something smaller for my PhD research. In that case, is this a package that can be replicated via off the shelf nvidia hardware, or do I need to wait for NVidia to release a smaller version?

Re:Can I have a smaller version? by JamesP · 2008-11-22 23:20 · Score: 1

Well, buy any card that supports CUDA (pretty much all offers by nVidia today - except you probably want to stay off the cheapest stuff)
You can also try running a PS3 + Linux or try the similar offers from AMD/ATI

--
how long until /. fixes commenting on Chrome?
Re:Can I have a smaller version? by Fuzuli · 2008-11-23 02:06 · Score: 1

Sorry, I should have been clearer. I'm aware of those solutions, but would it be the same in terms of processing power, software support (cuda, related libraries etc..)
I mean is this a convenient repackaging of what is already out there, or does it have something extra?
Re:Can I have a smaller version? by SpinyNorman · 2008-11-23 02:44 · Score: 3, Informative

From NVidia's CUDA site, most of their regular display cards support CUDA, just with fewer cores (hence less performance) than the Tesla card. The cores that CUDA uses are what used to be called the vertex shaders on your (NVidia) card. The CUDA API is designed so that your code doesn't know/specify how many cores are going to be used - you just code to the CUDA architecture and at runtime it distributes the workload to the available cores... so you can develop for a low end card (or they even have an emulator) then later pay for the hardware/performance you need.
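A toy Python simulation of the "code doesn't know how many cores there are" point above. This is not the CUDA API: `run_kernel`, the round-robin scheduler, and the core counts are all hypothetical, chosen only to show that the same kernel and the same block decomposition give the same result whether 2 or 30 cores pick up the blocks.

```python
def run_kernel(kernel, data, block_size, num_cores):
    """Split the data into blocks and let a variable number of cores process them."""
    out = [None] * len(data)
    blocks = [range(start, min(start + block_size, len(data)))
              for start in range(0, len(data), block_size)]
    # Hypothetical scheduler: hand blocks to cores round-robin.
    per_core = [blocks[core::num_cores] for core in range(num_cores)]
    for core_blocks in per_core:
        for block in core_blocks:
            for i in block:
                out[i] = kernel(data[i])
    return out

def square(x):
    return x * x

data = list(range(10))
low_end = run_kernel(square, data, block_size=4, num_cores=2)
high_end = run_kernel(square, data, block_size=4, num_cores=30)

print(low_end == high_end)
```

The kernel and the calling code never mention the core count except as a property of the "hardware", which is the design point the comment describes.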
Re:Can I have a smaller version? by cheier · 2008-11-23 06:52 · Score: 1

NVIDIA themselves don't make the personal supercomputer. They partner with companies like ours to build and test systems, like our Slipstream S4 that can be sold and branded as a Tesla personal supercomputer. The system I linked in particular will still give you 4 Tesla C1060 GPUs, but for $8,495. Shameless plug? Sure, but it certainly isn't 10k, and is still certified by NVIDIA.
Re:Can I have a smaller version? by kramulous · 2008-11-23 10:09 · Score: 2, Informative

The 10K refers to a rack mount solution containing 4xGPUs. You can still buy a single GPU and try and put it in a standard machine (provided it doesn't melt - I'd read the specs) for about a quarter of the price.

--
.
Re:Can I have a smaller version? by AragornSonOfArathorn · 2008-11-24 10:05 · Score: 1

They are selling $10k desktop systems with 4 of these Tesla cards.
1 card by itself is about $1500.00.

--
sudo eat my shorts

FLOPS not FLOP! by 91degrees · 2008-11-22 23:09 · Score: 1

The S stands for "seconds". The singular is therefore "FLOPS".

Re:FLOPS not FLOP! by TeknoHog · 2008-11-23 01:55 · Score: 4, Funny

What's the plural of FLOPS then? My preciouss FLOPSes?

--
Escher was the first MC and Giger invented the HR department.
Re:FLOPS not FLOP! by 91degrees · 2008-11-23 05:17 · Score: 1

1 FLOPS, 2 FLOPSes, 3 FLOPSeses, 4 FLOPSeseses...

It also runs Python by mangu · 2008-11-22 23:15 · Score: 3, Informative

Look, there's Python here. You can do the low-level high-performance core routines in C, and use Python to do all the OO programming. This is how God intended us to program.

Re:It also runs Python by BOFHelsinki · 2008-11-23 01:08 · Score: 2, Funny

Ah, Parseltongue. So you are of the Slytherin school of programmers?
Re:It also runs Python by OriginalArlen · 2008-11-23 01:28 · Score: 3, Funny

This is how God intended us to program.
Then why did he write Perl?

--

Everything I needed to know about life, I learnt from Blake's Seven
Re:It also runs Python by rhsanborn · 2008-11-23 03:01 · Score: 1

*sigh*...if you insist

XKCD
Re:It also runs Python by Anonymous Coward · 2008-11-23 03:55 · Score: 0

It was the devil who gave us Perl, not God.
Re:It also runs Python by Anonymous Coward · 2008-11-23 05:16 · Score: 0

God created Perl for real programmers - and Python to distract those who THINK they're real programmers and keep them busy so they don't bother those who actually are real programmers.
Re:It also runs Python by Anonymous Coward · 2008-11-23 05:34 · Score: 0

Because he was drunk
Re:It also runs Python by Anpheus · 2008-11-23 05:47 · Score: 1

The Old Testament God was vindictive and angry, and Perl is the unspoken 11th Plague.
Re:It also runs Python by Anonymous Coward · 2008-11-23 07:08 · Score: 0

First god created python, then he wrote the bible and the rest that really didn't make any sense became known as perl.
Re:It also runs Python by OriginalArlen · 2008-11-23 13:24 · Score: 1

Perl's a plague?
OK then: http://uk.youtube.com/watch?v=Ey0eS-Nx6Ks

--

Everything I needed to know about life, I learnt from Blake's Seven
Re:It also runs Python by Anonymous Coward · 2008-11-23 16:58 · Score: 0

That was why Satan was permabanned from heaven.
Re:It also runs Python by Lupu · 2008-11-23 19:57 · Score: 1

Obligatory XKCD reference:
http://xkcd.com/224/

Erlang by Safiire+Arrowny · 2008-11-22 23:25 · Score: 2, Interesting

So how do you get an Erlang system to run on this?

Re:Erlang by stonecypher · 2008-11-23 08:00 · Score: 1

You don't. The cores in these cards need to run the same function. It's not a bunch of discrete cores, it's a large matrix processor. This card cannot meaningfully run distinct processes. Erlang will not run here.
I'm sorry, Mario, but your princess is in another castle.

--
StoneCypher is Full of BS
Re:Erlang by eggnoglatte · 2008-11-23 08:34 · Score: 2, Insightful

By writing an Erlang-to-CUDA compiler?
More seriously though, it is probably not worth even trying, since the GPUs used in the Tesla support a very limited model of parallelism. Shoehorning the flexibility of Erlang into that would at the very least result in a dramatic performance loss, if it is possible at all.

And in other news... by bsDaemon · 2008-11-22 23:58 · Score: 5, Funny

... AMD has announced today its new Edison Personal Supercomputer technology.

The game is on.

Re:And in other news... by ancient_kings · 2008-11-23 03:24 · Score: 1

Yes, and this new "Edison Personal Supercomputer" has a funny habit of importing European Supercomputer Systems and, through sophisticated beyatch slapping robotics, stamping an "Edison Personal Supercomputer" over the logos of those European Supercomputer Systems.
Re:And in other news... by Anonymous Coward · 2008-11-23 04:22 · Score: 0

... and Apple announced the Newton Personal Digital Superassistant

cold hard facts about cuda by Gearoid_Murphy · 2008-11-23 00:16 · Score: 2, Interesting

It's not about how many cores you have but how efficiently they can be used. If your CUDA application is in any way memory intensive, you're going to experience a serious drop in performance. A read from the local cache is 100 times faster than a read from main memory, and this cache is only 16 KB. I spend most of my time figuring out how to minimise data transfers. That said, CUDA is probably the only platform that offers a realistic means for a single machine to tackle problems requiring gargantuan computing resources.

--
prepare the survey weasels.

Taken from bash.org by Artuir · 2008-11-23 00:33 · Score: 1

someone speak python here?
HHHHHSSSSSHSSS
SSSSS
the programming language

http://www.bash.org/?400459

Re: Is that all you got? by itsybitsy · 2008-11-23 00:39 · Score: 1

wow, as if I didn't do that before plunking down the money for the darn card...

I was asking for innovative ideas... not their existing boring ones...

So, who's got some cool ideas of what to do with Tesla?

Re:cold hard facts about cuda- unbalanced by anon+mouse-cow-aard · 2008-11-23 00:52 · Score: 4, Insightful

People are always coming out of the woodwork to claim supercomputer performance with such and such a solution. Go back and look at GRAPE (which is really cool): http://arstechnica.com/news.ars/post/20061212-8408.html, or at a lot of other supercomputer clusters. When you want something flexible, you look for "balance": a good relationship between memory capacity, latency and bandwidth, as well as compute power. In terms of memory capacity, the number people talk about is 1 byte/flop; that is, 1 Tbyte of memory is about right to keep 1 TFLOP flexibly useful. This thing has 4 G of memory for 4 TF, in other words 1 byte / 1000 flops. It's going to be hard to use in a general purpose way.
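The balance arithmetic above, written out as a small Python sketch. The 4 GB and 4 TFLOPS figures are the ones used in this comment; `bytes_per_flop` is just an illustrative helper name.

```python
def bytes_per_flop(memory_bytes, flops_per_second):
    """Memory capacity divided by compute rate, the 'balance' number."""
    return memory_bytes / flops_per_second

# Rule of thumb from the comment: 1 TB of memory per 1 TFLOPS.
balanced = bytes_per_flop(1e12, 1e12)

# The figures used in the comment for this machine: 4 GB for 4 TFLOPS.
tesla = bytes_per_flop(4e9, 4e12)

print(f"balanced system: {balanced:.3f} bytes/flop")
print(f"this system:     {tesla:.3f} bytes/flop")
print(f"i.e. about 1 byte per {round(1 / tesla)} flops")
```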

BrookGPU by Skinkie · 2008-11-23 00:53 · Score: 1

In the past I was not very impressed by things such as http://www-graphics.stanford.edu/projects/brookgpu/ because of the latency that is involved in actually transferring data back and forth between CPU and GPU memory. Thus I observed the same thing. But now it seems the actual transfer latency is reduced because of PCI-e, so one might wonder if decent compiler technology is able to optimise 'normal' code for GPU instructions.

--
Support Eachother, Copy Dutch Property!

Nor turbine. by BOFHelsinki · 2008-11-23 01:04 · Score: 2, Interesting

Shameless exploitation of the good name of one of the greatest inventors of all time. :-)

Development Platform by dreamchaser · 2008-11-23 01:49 · Score: 2, Insightful

On that note, it would be a good development platform for realtime raytraced game engines. That way the code would be mature when affordable GPU's come out that can match that level of performance.

Patmos International by Danzigism · 2008-11-23 01:57 · Score: 3, Interesting

ahh yes the idea of personal supercomputing. Back in '99 I worked for Patmos International. We were at the Linux Expo for that year as well if some of you might remember. Our dream was to have a parallel supercomputer in everyone's home. We used mostly Lisp and Daisy for the programming aspect. The idea was wonderful, but eventually came to a screeching halt when nothing was being sold. It was ahead of its time for sure. You can find out a little more about it here. I find the whole idea of symbolic multiprocessing very fascinating though.

--
*plays the Apogee theme song music*

Re:Patmos International by Anonymous Coward · 2008-11-23 04:37 · Score: 0

Our dream was to have a parallel supercomputer in everyone's home.
Doing what, exactly? I can't honestly think of anything I do on a day-to-day basis that would even take advantage of a "personal supercomputer" let alone benefit from it.
Posted AC because I moderated in this thread already...
Re:Patmos International by Danzigism · 2008-11-23 06:36 · Score: 1

basically just data centralization, optional linux terminal services, and a directory server for controlling user policies on your kids' computers.. backing up is something people are not capable of doing, and USB hard drives didn't even exist yet, and computers were expensive.. we really focused more on a commercial market for weather stations that needed large amounts of computing power to perform predictions and calculations.. sold a few small units to schools and small internet providers as well.. there was an underlying idealism behind our product of connectionism and AI.. we never really got that far, but it was fun while it lasted.. had some very talented programmers that worked on the parallel programming architecture we created.. the machines consisted of a 6, 8, 16, or 24 node system with a central "brain computer" called the Limbix. It controlled all the nodes and would even act as a failover machine if any of the nodes went down temporarily.. why anyone would need something like this in their home I can't really answer besides the reasons I've already given.. but computers became faster and smaller so nothing really became of the company sadly.. the dream turned into a small and somewhat efficient home server which you see companies like HP selling.. even that isn't doing too hot.. but I do think a centralized location for all your family's data, such as pictures, movies, letters, and other multimedia is a good thing.. doesn't require a supercomputer though..

--
*plays the Apogee theme song music*

Re: Is that all you got? by Anonymous Coward · 2008-11-23 02:27 · Score: 0

how about go outside and grow up...

You're probably right about the "mad scientist" ... by PolygamousRanchKid+ · 2008-11-23 02:40 · Score: 2, Insightful

. . . that's probably exactly the person who would buy one of these.

Folks who are professionally working on mainstream problems that require supercomputers, well, they probably have access to one already. (Maybe one of the supercomputing folks might want to chime in here; do you have enough access/time? Would a baby-supercomputer be useful to you?)

But there is certainly someone out there who was denied access, because his idea was rejected by peer review. He is considered a loopy nut bag, because he wants to prove that the Higgs boson is made of cottage cheese, or something like that.

Yep, look for rejected supercomputing program proposals, and you have a list of potential customers.

--
Schroedinger's Brexit: The UK is both in and out of the EU at the same time!

Re: Is that all you got? by neomunk · 2008-11-23 02:47 · Score: 2, Interesting

Neural nets.

This setup sounds ideal for a training bed for fann programs. I can't recall if there's a port of fann for CUDA, but I think there might be.

Re:You're probably right about the "mad scientist" . by rhyder128k · 2008-11-23 02:56 · Score: 1

Perhaps there will be a resurgence in mad, unethical experimentation. In 20 years, this computer might acquire a status similar to that of the Altair 8800 home computer kit.

I still say that 640 human embryos should be enough for anybody.

--
Michael Reed, freelance tech writer.

Re: Is that all you got? by fatphil · 2008-11-23 03:07 · Score: 1

Can you port Dan Bernstein's DJBFFT to it? And then benchmark a complex double-precision 8192-limb FFT against the CUDA libraries. If you can provide me with the benchmark results, then I'll be able to tell if it's a good platform for big-number number-crunching. (In particular, prime number hunting.)

--
Also FatPhil on SoylentNews, id 863

Re: Is that all you got? by Anonymous Coward · 2008-11-23 03:12 · Score: 0

So, you bought a product without having any idea on what to do with it..
Not a smart thing to do, and bitching about nobody else having an idea doesn't make you appear any smarter.
You could ask some community to come up with ideas and offer to lend the board to whoever's idea you like best..

NVCC intermediate assembly by DrYak · 2008-11-23 04:04 · Score: 1

This is relevant because the compiler creates device specific binaries that you can't get the assembler code for.

Yes you can. Just give the proper switch to ask NVCC to keep all intermediate files.
You'll get both the high level shaders that got compiled and the resulting assembler, which is subsequently compiled into op-codes.
(I just don't have CUDA handy at home to check what the options were.)

My main objection is that CUDA is nVidia hardware-specific only, and ties you to a single provider.

The various incarnations of Brook (currently supported by ATI's cards) are much more interesting as they are vendor neutral and support several back-ends (BrookGPU has an OpenGL back-end).

OpenCL looks like another interesting thing to follow regarding interoperability.

(My other objection is that CUDA isn't as high-level as nVidia would like you to believe. Only the code to call kernels has C extensions. Everything else on the CPU side uses an API which is rather low-level management of memory, initialisation, etc. Also, the different types of memory aren't all properly abstracted - texture memory is still accessed with functions in kernels, not simply as plain C arrays like the other types of memory).

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]

Re:NVCC intermediate assembly by oneofthose · 2008-11-23 05:26 · Score: 1

As far as I know what you see when asking NVCC to keep all intermediate files is PTX code. This is not the code the device executes.
Re:NVCC intermediate assembly by MostAwesomeDude · 2008-11-23 09:02 · Score: 1

If you'd like CUDA to be available for all platforms, write a compiler for Gallium for it. Either compile CUDA to LLVM IR, or compile it to TGSI, and all Gallium-aware cards will be able to run it.

--
~ C.

4 Terraflops? by yfkar · 2008-11-23 04:19 · Score: 2, Funny

As opposed to astroflops?

CUDA memory structure by DrYak · 2008-11-23 04:35 · Score: 2, Informative

but I don't know enough about it to be able to give useful information on the subject.

I do write some CUDA code, so I'll try to help.

I believe that each of the chips has a 512 bit wide bus to 4GiB of memory.

Indeed, each physical package has full access to its own whole chunk of memory, regardless of how many "cores" the package contains (between 2 for the lowest-end laptop GPUs and 16 for the highest-end 8/9800 cards. I don't know about the GT280, but the summary is wrong: 240 is probably the number of ALUs or the width of the SIMD) and regardless of how many "stream processors" there are (each core has 8 ALUs, which are exposed as 32-wide SIMD processing units, which in turn can keep up to 768 threads in flight thanks to some clever hyperthreading-like scheduling).

So in one single GPU card all the memory is accessible.
In a dual-GPU SLI card, each GPU has a full access to its own memory.
So, in our situation, it's 4GiB for each Tesla Card.

Then each core has a special internal memory which is shared by all the 32-to-768 threads running in parallel on the SIMD. (A couple of KiB, don't have the exact number handy).

I'm not sure what the memory allocation per stream processor is but I think the other parts of the chip control what goes where.

There's no actual per-stream-processor control of memory. There is something that looks like a "per-thread memory" but it's actually memory auto-allocated from the global memory.
(It's all the same global memory, and the compiler just makes sure that each thread uses a different chunk of it to avoid conflicts).

And you actually do not control the stream-processors themselves.
You write a kernel (a piece of code which will process a mass of data) and throw a number of threads to one GPU (one physical package : i.e.: either 1 normal graphic card, or half of a SLI dual GPU graphic card).
The scheduler will dynamically spread all the concurrent threads among the SIMD processors on the GPU.

There probably are some bottlenecks

Yes, indeed :
- These 4GiB aren't cached at all (that's why it's preferable to use them only at the beginning and the end of a calculation and use other types of memory during the calculations), have a big latency (that's why it's better to have more threads running together so the scheduler can switch threads to hide latency) and you have to access them in a special fashion to group together the read-writes for faster access.

- Then there's the texture access. Using a special set of functions you can access the memory not directly but as if it were textures. It still has a big latency and it is read-only. On the other hand, it has a cache so it has much better bandwidth and the texture units don't require special ordering of the access.

- The last type of memory is an ultra fast on-chip read-write memory which is shared for all the threads executed at the same time on the same core. But its access pattern is weird because everything is accessed in banks (one bank per thread or all threads on the same bank. Never many-to-many).

So, in the end writing good CUDA code requires some voodoo magic to correctly organise your stuff into memory in the most efficient way.
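To make the "access them in a special fashion to group together the read-writes" point a bit more concrete, here is a toy Python model, not real hardware behaviour. It assumes threads are serviced in groups of 16, that each thread reads one 4-byte word, and that global memory is fetched in aligned 64-byte segments. All three numbers are illustrative assumptions, and `segments_touched` is a made-up helper. It simply counts how many segments a group of threads touches for a given access stride.

```python
SEGMENT_BYTES = 64   # assumption: size of one aligned memory transaction
WORD_BYTES = 4       # assumption: each thread reads one 4-byte word
GROUP_SIZE = 16      # assumption: threads serviced together

def segments_touched(stride_words, base_word=0):
    """Count distinct aligned segments hit when thread t reads word base + t * stride."""
    addresses = [(base_word + t * stride_words) * WORD_BYTES
                 for t in range(GROUP_SIZE)]
    return len({addr // SEGMENT_BYTES for addr in addresses})

contiguous = segments_touched(stride_words=1)    # neighbouring threads, neighbouring words
strided = segments_touched(stride_words=16)      # each thread lands in its own segment

print(f"contiguous access: {contiguous} segment(s)")
print(f"stride-16 access:  {strided} segment(s)")
```

In this simplified model the contiguous pattern needs one fetch where the strided pattern needs sixteen, which is the kind of difference the "voodoo magic" of data layout is meant to avoid.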

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]

Re:CUDA memory structure by evilbessie · 2008-11-23 10:15 · Score: 1

There are 240 shaders on the GT280, which is why there is something like 1.4bn transistors on this chip. http://en.wikipedia.org/wiki/GeForce_200_Series#GeForce_GTX_200

OpenCL by JiNG · 2008-11-23 04:50 · Score: 1

The real reason this is interesting is because of OpenCL (http://www.khronos.org/opencl/) which just got approved by Khronos:

"OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs."

It's similar to OpenGL / OpenAL except that it's designed for general purpose computing and is already approved by the vast majority of players in the industry. Developing for proprietary CUDA is riddled with problems, but OpenCL should open up the doors for some very interesting applications. In my opinion, support for OpenCL is the single biggest feature in Apple's Snow Leopard.

I just put up a post on this in my blog - http://blog.expensivedna.com/?p=82

Tesla? WTF? by Warshadow · 2008-11-23 05:44 · Score: 1

If it doesn't shoot arcs of lightning from it, then it shouldn't be named Tesla.

Yes but, by Landshark17 · 2008-11-23 06:17 · Score: 1

Will it run Duke Nukem For... eh, you all know where this is going...

--
This sig is false.

It gets worse... by raftpeople · 2008-11-23 06:21 · Score: 1

Also left out of the calculations are the glial cells. There are 10x more glial cells than neurons. They were previously thought to not be part of brain calculation but have since been shown to modulate the activity of the neurons. We've got a long way to go.

Re:It gets worse... by HiThere · 2008-11-23 09:17 · Score: 1

Well, there are people who think the correct approach is to simulate things at that level.
My personal feeling is that you are abstracting at the wrong level, and that if you abstract at the correct level you'll save many orders of magnitude of calculation. I'll admit, though, that all I definitely know is that my personal computer is seriously too limited to even store the needed data. And it's also blind and deaf and diskinesic. Being able to read and write text isn't sufficient compensation. The text has no grounding in meaning (i.e., a multi-sensory representation of a chunk of knowledge).
So far I haven't thought of a way around this. Virtual games are a possibility, but I'd need direct access to their sensoria, and what I could easily get would be restricted to multiple screen captures. Virtual games are usually intentionally constructed to prohibit bots. (Reasonable from their point of view, but a damn nuisance from my point of view.)

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:It gets worse... by raftpeople · 2008-11-23 17:16 · Score: 1

I don't disagree. It seems that we can create something mathematically equivalent without jumping through the exact same hoops as nature.

Regarding your point about text without context: I completely agree, what we write and speak is really a translation/communication of an inner model based on interaction with our environment. Programming in a bunch of text with rules won't cut it.

It's news because... by raftpeople · 2008-11-23 06:26 · Score: 2, Interesting

NVIDIA has done a good job of making the processing power accessible to programmers that are not GPU coding experts. In addition, they have made hardware changes to better support the type of scientific computation being done on these devices.

So, while in theory you could put together some Radeons, work with their API and achieve the same thing, NVIDIA has significantly reduced the level of effort to make it happen.

Re:cold hard facts about cuda- unbalanced by Anonymous Coward · 2008-11-23 06:42 · Score: 0

in terms of memory capacity, the number people talk about is:

1 byte/flop... that is 1 Tbyte of memory is about right to keep 1 TFLOP flexibly useful.

this thing has 4 G of memory for 4 TF... in other words:
1 byte / 1000 flops.

it's going to be hard to use in a general purpose way.

Note that these processors are decidedly not general purpose. As is typical with many scientific computing applications, these processors specialize in matrix and vector operations.
For most signal/image processing uses memory bandwidth is the limiting factor, something in which GPUs have had a very large lead over their more general CPU counterparts for some time. The real reason people get excited about this technology is cost. For 10K we replaced an SGI with 128 Itanium 2's that cost well over 2 million. Talk about big iron. That thing required a 6 figure cooling unit just to run it.

Heck with games, I want a holodeck ! by OneInEveryCrowd · 2008-11-23 07:51 · Score: 1

10 Gs ? I'd pay that.

The network is the computer by wikinerd · 2008-11-23 08:43 · Score: 1

Personal supercomputer? Surely it's cool, but how about turning the whole Internet into a supercomputer?

Make the Internet fast enough and equip every node with a network operating system to share its resources with all other nodes. Sounds like a security nightmare, but let's focus on the performance part for now. Every one of us has a CPU, a storage device (eg SSD), and some RAM. But not all of us use all of our CPU, SSD, or RAM at the same time. While I play a game, effectively making my CPU work at 100% capacity, my neighbour may let their CPU sleep, but if we had a fast communication link between us and we trusted each other we could just share our work and let my CPU and my neighbour's each work at 50% instead. And two CPUs at 50% deliver faster results than one CPU at 100% if the software is designed to take advantage of multiprocessing and there are no communication overheads.

Similarly for storage: the need to take backup copies would be made obsolete if we could implement a worldwide RAID system of all of our SSDs, HDDs, etc. Our data would be replicated all over the planet's computers in a P2P fashion, and we would never have to worry about backups and lost data. Plus, assuming zero communication costs, such a RAID system would be extremely fast.

The only obstacles to a worldwide supercomputer are communication costs and human trust. Unfortunately with the currently deployed Internet technologies the communication overhead is significant, and we cannot seem to be able to trust our neighbours in this world. The trust problem can potentially be solved (right now whenever you talk on VoIP your data get transmitted through other nodes and yet no one seems to have a problem with this), but I am not so sure about the communication technology and infrastructure. But once we solve the communication problem, the global supercomputer could become reality.

Re:The network is the computer by Thagg · 2008-11-23 09:06 · Score: 1

It's already being done, and very successfully.
Unfortunately, it's all being done by criminal gangs using botnets. But, it is a proof of concept!

--
I love Mondays. On a Monday, anything is possible.

Re:cold hard facts about cuda- unbalanced by Anonymous Coward · 2008-11-23 08:49 · Score: 1, Interesting

The 1 byte/flop ratio is more about memory bandwidth than capacity. Each Tesla processing unit may only have 1GB of onboard memory but that doesn't restrict you from transferring data in and out from your system's main memory, which could be as large as you need it to be. The bandwidth of PCI-Express 2.0 would probably still be a bottleneck though. I don't have the exact specifications on hand, but even a previous generation G80 has about 80GB/s memory bandwidth to its on-board memory, and there are 4 of these so you're looking at a minimum of 320GB/s or about 1/3 of a byte per flop for a nominal input dataset of 4GB, lower than that for larger data sets. Not ideal, but not useless either.
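A quick sketch of the bandwidth-per-flop arithmetic, with the caveat that the answer depends heavily on which flop rate you divide by. The 80 GB/s per GPU figure is the one from this comment. The 933 GFLOPS single precision figure per GPU is borrowed from another comment in this thread and is an assumption here, not something this poster stated.

```python
GPUS = 4
BANDWIDTH_GBPS_PER_GPU = 80     # from the comment above
SP_GFLOPS_PER_GPU = 933         # assumption: quoted elsewhere in the thread

total_bandwidth = GPUS * BANDWIDTH_GBPS_PER_GPU   # GB/s across all four GPUs

# Divided by roughly 1 TFLOPS (one GPU's single precision peak):
vs_one_gpu = total_bandwidth / SP_GFLOPS_PER_GPU

# Divided by the combined peak of all four GPUs:
vs_all_gpus = total_bandwidth / (GPUS * SP_GFLOPS_PER_GPU)

print(f"total bandwidth: {total_bandwidth} GB/s")
print(f"bytes/flop against ~1 TFLOPS:       {vs_one_gpu:.3f}")
print(f"bytes/flop against 4-GPU peak rate: {vs_all_gpus:.3f}")
```

The "about 1/3 of a byte per flop" figure matches the first ratio; measured against the combined peak of four GPUs the number comes out several times lower.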

Re: Is that all you got? by itsybitsy · 2008-11-23 08:52 · Score: 1

Excellent idea! A growth algorithm for an AI. Sweet.

Re:You're probably right about the "mad scientist" . by kramulous · 2008-11-23 09:02 · Score: 1

I'll bite. I manage a cluster as well as what I deem to be a supercomputer. In spare time, I'm running codes on them and try to get the best efficiency out of them as possible. So I can show the guys that write their own codes on these machines.

About 5 months ago I told my boss we need to get one of these. We'll get one but have to wait for end of year budget cleaning. See, I also experiment with the GPU (8800 GTX) in my workstation. I had a client at an institute across town that needed to run 8*10^7 2D ffts with local minimisation and was going to be adding another half million jobs each day with the remote sensing devices he had. Our cluster and SMP machine are totally full and a back-log of work 10x the compute power (no free time for about a month). The GPU did this FFT work wonderfully. I rolled out a machine to the institute and computation caught up the back log and processed any incoming work on the fly.

To get this done on the big machines, we would have had to talk to the security guys to open up some ports (reluctantly = time lost) and then would have to figure out some workflow to push the data across the network to process, compute, and push results back.

--
.

obselete definition of supercomputer by peter303 · 2008-11-23 10:46 · Score: 1

A supercomputer is the world's fastest computer, plus any computer within one order of magnitude of it. That is 100 teraflops and faster these days. The latest list was announced last week.

What's the point... by Anonymous Coward · 2008-11-23 12:04 · Score: 0

What's the point if it can't play Crysis?

Shaders != cores. by DrYak · 2008-11-23 14:29 · Score: 1

There are 240 shaders on the GT280, which is why there is something like 1.4bn transistors on this chip.

Shaders aren't cores.
GPUs tend to use massively wide SIMD architectures (Single Instruction, Multiple Data). When you're going to run the exact same shader on a huge number of pixels, it's redundant to put in a complete pipeline and control logic for each thread. Instead you group the data you have to process into SIMD batches.
Each SIMD processor runs one piece of code, one shader, but applies this shader to several pixels at the same time. In terms of GPGPU: you write one kernel function, and each SIMD processor executes one instance but applies it to lots of elements of the data array at the same time.
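
To make that concrete, here is a toy Python sketch of the idea, not real GPU code. One kernel function is applied to the data in groups of 8 "lanes", and each group counts as a single instruction issue. The names `kernel` and `simd_run` and the width of 8 are made up for illustration, and the lanes are of course processed one after another here rather than truly in parallel.

```python
def kernel(x):
    """The 'shader': the same instructions are applied to every element."""
    return 2 * x + 1


def simd_run(kernel_fn, data, width=8):
    """Apply kernel_fn to data in groups of `width` lanes.

    Returns (results, issues), where `issues` counts how many times the
    shared control logic had to issue the kernel: once per group, not
    once per element.
    """
    results = []
    issues = 0
    for start in range(0, len(data), width):
        lanes = data[start:start + width]
        # On real hardware these lanes would execute in lockstep.
        results.extend(kernel_fn(x) for x in lanes)
        issues += 1
    return results, issues


pixels = list(range(32))
out, issues = simd_run(kernel, pixels)
print(f"{len(pixels)} elements processed with {issues} instruction issues")
```

With 32 elements and 8 lanes the control logic only has to issue the kernel 4 times, which is the saving the paragraph above is describing.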

The 9800GTX was advertised as having "128 shaders" when in fact that is 16 cores, each with 8 ALUs.
There aren't 128 discrete processors. There are only 16, but each can process 8 pieces of data at the same time.

The "240 shaders" of the GT280 are technically 30 cores with 8 ALUs each.
There are only 30 processors. They just have 8 units each, and are thus able to run 32 to 768 threads per processor.
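
The bookkeeping, as a quick Python sketch using only the numbers given above (the dictionary and helper names are made up for illustration):

```python
ALUS_PER_CORE = 8  # ALUs per SIMD core, as stated above

# Advertised "shader" counts for the two chips discussed.
advertised_shaders = {"9800GTX": 128, "GT280": 240}


def actual_cores(shaders, alus_per_core=ALUS_PER_CORE):
    """Recover the number of SIMD cores behind an advertised shader count."""
    cores, leftover = divmod(shaders, alus_per_core)
    if leftover:
        raise ValueError("shader count is not a multiple of the ALU width")
    return cores


cores = {name: actual_cores(n) for name, n in advertised_shaders.items()}
for name, n in cores.items():
    print(f"{name}: {advertised_shaders[name]} shaders = {n} cores x {ALUS_PER_CORE} ALUs")
```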

Check the Appendix A of the CUDA programming guide if you don't trust me.

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]

Re:Shaders != cores. by badkarmadayaccount · 2008-11-25 05:54 · Score: 1

This is OT, but I must ask: are you on DVORAK?

--
I know tobacco is bad for you, so I smoke weed with crack.

THE TITLE IS WRONG!!! by EncryptedSoldier · 2008-11-23 14:37 · Score: 0

If you go to purchase it, depending on the configuration you can get it for well under $10000. More like $8000.

DIS-Heartening... by Anachragnome · 2008-11-23 15:23 · Score: 1

"Sure would be cool to build such a beast, do some random connections, and see what happens..."

Just don't give it a modem.

"...and see what happens..." can be pretty fucking scary sometimes, and letting scary out of the building just might be a bad idea.

Never know if you're going to end up with Skynet or the mother of all spammers...

What a difference a %20 makes by Zero__Kelvin · 2008-11-23 17:56 · Score: 1

It runs Linux and the Windows virus cannot infect it. Sounds like a super computer to me :-)

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun

Re: Is that all you got? by itsybitsy · 2008-11-23 18:03 · Score: 1

Actually I do have a whole bunch of purposes for it. I'm just interested in what you guys would do with it if you had one...

The ultimate thing to do with it by colk99 · 2008-11-24 18:31 · Score: 1

Play Crysis on max settings

Offtopic by Anonymous Coward · 2008-11-25 10:27 · Score: 0

are you on DVORAK?

Nope, not at all. It's fr_CH on buckling springs. Why do you ask?

Slashdot Mirror

236 comments