TACC "Stampede" Supercomputer To Go Live In January

Posted by samzenpus on Wednesday September 12, 2012 @05:58PM from the coming-soon dept.

Nerval's Lobster writes "The Texas Advanced Computing Center plans to go live on January 7 with "Stampede," a ten-petaflop supercomputer predicted to be the most powerful Intel supercomputer in the world once it launches. Stampede should also be among the top five supercomputers in the TOP500 list when it goes live, Jay Boisseau, TACC's director, said at the Intel Developer Forum Sept. 11. Stampede was announced a bit more than two years ago. Specs include 272 terabytes of total memory and 14 petabytes of disk storage. TACC said the compute nodes would include "several thousand" Dell Stallion servers, with each server boasting dual 8-core Intel E5-2680 processors and 32 gigabytes of memory. In addition, TACC will include a special pre-release version of the Intel MIC, or "Knights Bridge" architecture, which has been formally branded as Xeon Phi. Interestingly, the thousands of Xeon compute nodes should generate just 2 teraflops worth of performance, with the remaining 8 generated by the Xeon Phi chips, which provide highly parallelized computational power for specialized workloads."

67 comments

WTF by Anonymous Coward · 2012-09-12 18:08 · Score: 0

I can't find any source at all.
Why so little memory? by afidel · 2012-09-12 18:11 · Score: 4, Interesting

I wonder why it's got so little memory? You can easily run 64GB per socket at full speed with the E5-2600 (16GB x 4 channels) without spending that much money. Heck, for maybe 10% more you can run 128GB per socket (you need RDIMMs to run two 16GB modules per bank). They're apparently only running one 16GB DIMM per socket (any other configuration would be slower on the E5), which IMHO is crazy as you're going to have a hard time keeping 8 cores busy with such a small amount.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
1. Re:Why so little memory? by Taco+Cowboy · 2012-09-12 18:24 · Score: 2
  
  You can easily run 64GB per socket at full speed with the E5-2600 (16GB x 4 channels) without spending that much money. Heck, for maybe 10% more you can run 128GB per socket (you need RDIMMs to run two 16GB modules per bank).
  As TFA has put it:
  
  " ... the compute nodes would include "several thousand" Dell Stallion servers, with each server boasting dual 8-core Intel E5-2680 processors and 32 gigabytes of memory"
  I am guessing it might have something to do with budget
  
  From the way I look at it, they are populating each memory slot with 4GB of el-cheapo DDR3 DRAM and that way they may be saving quite a bit of $$$ to buy more Dell servers
  
  --
  Muchas Gracias, Señor Edward Snowden !
2. Re:Why so little memory? by loufoque · 2012-09-12 18:31 · Score: 1
  
  To program the MIC you need to design your program so that each thread only requires 128 MB of RAM anyway...
3. Re:Why so little memory? by afidel · 2012-09-12 18:43 · Score: 0
  
  Ah, another esoteric, nearly impossible to program for architecture that flies for some problem sets but is nearly unapproachable for the non-CS science folks. I mean it's great that such things exist I guess, and in theory they can have great FLOPS/watt figures, but I wonder how much science will really get accomplished per dollar spent compared to something where standard code just runs?
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
4. Re:Why so little memory? by loufoque · 2012-09-12 19:04 · Score: 1
  
  How is that esoteric? A thread shouldn't require more than this even on a PC. That's also much more than the Cell allowed, which is a similar architecture.
5. Re:Why so little memory? by PCK · 2012-09-12 19:27 · Score: 1
  
  I'm assuming you got the 128MB number from dividing 8GB by 64 cores. I haven't seen anything to indicate that a core is limited in that way; in fact, Knights Corner caches are coherent.
6. Re:Why so little memory? by loufoque · 2012-09-12 19:48 · Score: 1
  
  Yes, a core can access the memory of other cores.
  Your point being?
  If you run all cores at the same time each with their own dataset, which is what you want to do in order to actually use the architecture properly, you'll have that limit for each thread.
7. Re:Why so little memory? by Anonymous Coward · 2012-09-12 20:08 · Score: 2, Insightful
  
  That's 2GB per core, a fine amount for supercomputer problems requiring compute density and bandwidth. No virtualization there and the compilers, middleware and programmers are probably sufficiently educated to know how to split the problem.
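The per-core figures traded in this subthread are plain divisions. A minimal sketch using only numbers quoted above (32 GB per dual 8-core node from the summary; the 8 GB card and 64 cores assumed in PCK's comment); the function name is made up for illustration:

```python
GIB = 1024**3
MIB = 1024**2

def memory_per_core(total_bytes: int, cores: int) -> int:
    """Evenly split a memory pool across cores (whole bytes)."""
    return total_bytes // cores

# Host node from the summary: 32 GB shared by two 8-core Xeons.
host_per_core = memory_per_core(32 * GIB, 2 * 8)

# Coprocessor figures as assumed in the comments: 8 GB card, 64 cores.
card_per_core = memory_per_core(8 * GIB, 64)

print(host_per_core // GIB, "GiB per host core")
print(card_per_core // MIB, "MiB per coprocessor core")
```

This only describes an even split; whether a thread is actually confined to its share is the point being argued in the replies.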
8. Re:Why so little memory? by PCK · 2012-09-12 20:30 · Score: 1
  
  According to the architecture, each core doesn't have its own memory other than the L1 and L2 caches. How the memory is mapped per core is arbitrary; there is nothing stopping you from having, for example, a shared data set using 4GB and 64MB per core for a raytracer, where the scene data is stored in the shared memory and each core works on part of the scene. So no, you don't have to limit memory per thread to use the architecture properly.
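The raytracer layout described above can be checked against the card size with a few lines. This is a sketch under the thread's own assumptions (8 GB card, 64 cores); it ignores whatever the card's OS and runtime reserve, and `fits_on_card` is a hypothetical helper:

```python
MIB = 1024**2
GIB = 1024**3

CARD_MEMORY = 8 * GIB   # card size assumed in the thread
CORES = 64              # core count assumed in the thread

def fits_on_card(shared_bytes: int, private_bytes_per_core: int) -> bool:
    """True if one shared region plus a private region per core fits."""
    return shared_bytes + CORES * private_bytes_per_core <= CARD_MEMORY

# 4 GB of shared scene data plus 64 MB of private data per core.
raytracer_layout = fits_on_card(4 * GIB, 64 * MIB)

# No shared data at all, 128 MB of private data per core.
even_split_layout = fits_on_card(0, 128 * MIB)

# 4 GB shared AND 128 MB private per core would need 12 GB.
oversubscribed_layout = fits_on_card(4 * GIB, 128 * MIB)
```

Both layouts under discussion use exactly the full 8 GB; they differ in how much of it each core can read, not in the total.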
9. Re:Why so little memory? by loufoque · 2012-09-12 20:53 · Score: 1
  
  You can do that, but that will only reduce the amount of memory available to each core, not increase it...
  You could also have something even less homogeneous, but that would be a nightmare to schedule.
10. Re:Why so little memory? by Shinobi · 2012-09-12 21:00 · Score: 2, Informative
  
  Esoteric? Nearly impossible to program for? Methinks you haven't read through the actual docs for it. You can use all the standard Intel tools to program for it, which are also MIC-aware, just like you program for a standard multi-core CPU. That includes the threading and math kernel libraries, as well as OpenCL if you want to go that route.
11. Re:Why so little memory? by PCK · 2012-09-12 21:14 · Score: 1
  
  On a card with 8GB your effective memory accessible per core is 8GB. Lots of problems have large data sets that can be shared across cores, such as the example I gave. In fact this is a major advantage of MIC versus GPUs.
  There is nothing nightmarish about the above; it would just appear as a shared memory area to the process.
12. Re:Why so little memory? by loufoque · 2012-09-12 22:43 · Score: 3, Interesting
  
  You will be parallelizing, and each thread will only ever be able to use max_mem/N for its own processing.
  When you parallelize, you avoid sharing memory between threads. Your data set is split over the threads and synchronization is minimized. In an SMP/NUMA model, this is done transparently by simply not accessing memory that other threads are working on. In other models, you have to explicitly send the chunk of memory that each thread will be working on (through DMA, the network, an in-memory FIFO or whatever), but it doesn't change anything from a conceptual point of view.
  If your parallel decomposition is much more efficient when the data per thread is larger than 1GB, then you cannot possibly run 64 threads set up like this on the MIC platform. There is often a minimum size required for a parallel primitive to be efficient, and if that minimum size is greater than max_mem/N then you have a problem. This is the limiting factor I'm talking about.
  128 MB, however, is IMO quite large enough.
  
  In fact this is a major advantage of MIC versus GPUs.
  The advantage of MIC lies in ease of programming thanks to compatibility with existing tools and the more flexible programming model.
  Memory on GPUs is global as well, so I have no idea what you're talking about. There is also so-called "shared" memory (CUDA terminology, OpenCL is different) which is per block, but that's just some local scratch memory shared by a group of threads.
  
  There is nothing nightmarish about the above
  Please stop distorting what I'm saying. What is nightmarish is finding the optimal work distribution and scheduling for a heterogeneous or irregular system.
  Platforms like GPUs are only fit for regular problems. Most HPC applications written using OpenMP or MPI are regular as well. Whether the MIC will be able to enable good scalability of irregular problems remains to be seen, but the first applications will definitely be regular ones.
13. Re:Why so little memory? by Nite_Hawk · 2012-09-12 23:24 · Score: 2
  
  The cynic in me says that you don't get into the Top 5 by spending all of your budget on memory. :)
  Practically speaking there are a lot of research codes out there that are using 1GB or less of memory per core. Our systems at MSI typically had somewhere between 2-3GB of memory per core and often were only using half of their memory or less. There's a good chance that TACC has looked at the kinds of computations that would happen on the machine and determined that they don't need more.
  We had another, much smaller cluster with significantly more memory per node, which we tried to push the big-memory people to use. They of course didn't like it because they wanted to run on the big fancy glorious machine that gets mentioned in all of the press articles, even though they weren't well suited to use it. Such is the way academia works, though.
14. Re:Why so little memory? by gentryx · 2012-09-12 23:42 · Score: 3, Informative
  
  Agreed. 2 GB/core seems to be the current agreement on almost all machines except for IBM BlueGene which has just 1 GB per core.
  
  --
  Computer simulation made easy -- LibGeoDecomp
15. Re:Why so little memory? by TheRaven64 · 2012-09-13 00:18 · Score: 1
  
  Each core has 320KB of local cache, but the 8 all share 20MB of L3 cache, so for most efficient use you either want each core to have a working set (including code) of under 2.5MB, or for them to be accessing 256KB of independent data from a shared set of 20MB. That's not quite the whole story, because there is no cache coherency problem if two cores are reading the same data.
  
  --
  I am TheRaven on Soylent News
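The 2.5MB and 256KB figures in the comment above follow from the cache sizes it quotes. A small sketch; the breakdown of the 320KB into 32KB L1 instruction, 32KB L1 data and 256KB L2 is an assumption added here, not something the comment states:

```python
KIB = 1024
MIB = 1024 * KIB

CORES_PER_SOCKET = 8
L3_SHARED = 20 * MIB

# Assumed per-core breakdown of the 320KB "local cache".
L1_INSTRUCTION = 32 * KIB
L1_DATA = 32 * KIB
L2 = 256 * KIB

local_cache_per_core = L1_INSTRUCTION + L1_DATA + L2

# Even share of the L3 if every core keeps a fully private working set.
l3_share_per_core = L3_SHARED // CORES_PER_SOCKET

print(local_cache_per_core // KIB, "KiB local per core")
print(l3_share_per_core / MIB, "MiB of L3 per core")
```

The even L3 share is only a rule of thumb, since cores reading the same lines do not each need their own copy in L3.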
16. Re:Why so little memory? by PCK · 2012-09-13 01:07 · Score: 1
  
  You will be parallelizing, and each thread will only ever be able to use max_mem/N for its own processing.
  When you parallelize, you avoid sharing memory between threads. Your data set is split over the threads and synchronization is minimized. In an SMP/NUMA model, this is done transparently by simply not accessing memory that other threads are working on. In other models, you have to explicitly send the chunk of memory that each thread will be working on (through DMA, the network, an in-memory FIFO or whatever), but it doesn't change anything from a conceptual point of view.
  If your parallel decomposition is much more efficient when the data per thread is larger than 1GB, then you cannot possibly run 64 threads set up like this on the MIC platform. There is often a minimum size required for a parallel primitive to be efficient, and if that minimum size is greater than max_mem/N then you have a problem. This is the limiting factor I'm talking about.
  128 MB, however, is IMO quite large enough.
  For algorithms where you have basically regular streaming data then yes, your working data set will be mem/n, but as I mentioned there are a number of problems where you have a large, mainly static dataset, such as raytracing or financial modeling. In these scenarios being able to access a large shared pool of memory has big advantages.
  
  In fact this is a major advantage of MIC versus GPUs.
  The advantage of MIC lies in ease of programming thanks to compatibility with existing tools and the more flexible programming model.
  Memory on GPUs is global as well, so I have no idea what you're talking about. There is also so-called "shared" memory (CUDA terminology, OpenCL is different) which is per block, but that's just some local scratch memory shared by a group of threads.
  Accessing global memory on GPUs is extremely slow and there is a strict memory hierarchy that you have to adhere to in order to get any kind of performance. This hierarchy is what makes it a pain to program for and why you need special tools and kernels in the first place. Any problem where you need random access over a large amount of data is just not feasible on GPUs.
  
  There is nothing nightmarish about the above
  Please stop distorting what I'm saying. What is nightmarish is finding the optimal work distribution and scheduling for a heterogeneous or irregular system.
  Platforms like GPUs are only fit for regular problems. Most HPC applications written using OpenMP or MPI are regular as well. Whether the MIC will be able to enable good scalability of irregular problems remains to be seen, but the first applications will definitely be regular ones.
  For those kinds of problems there isn't anything in the MIC that will set the world on fire other than the easier programming model, as it basically comes down to bandwidth and FLOPS. However, from what I have seen in terms of architecture there are a number of areas where it should perform nicely. FYI, if you haven't already read it:
  http://newsroom.intel.com/servlet/JiveServlet/download/38-11511/Intel_Xeon_Phi_Hotchips_architecture_presentation.pdf
17. Re:Why so little memory? by Anonymous Coward · 2012-09-13 01:07 · Score: 0
  
  I run on other TACC clusters and other UT System clusters. Most clusters have ~24GB per node for regular memory nodes. This is more than enough to keep 8 cores busy. Actually, with 24GB per 24 core node, I can keep 20 nodes running for more than a week.
18. Re:Why so little memory? by loufoque · 2012-09-13 01:51 · Score: 1
  
  Accessing global memory on GPUs is extremely slow and there is a strict memory hierarchy that you have to adhere to in order to get any kind of performance.
  It could be seen as being the same as on the CPU, except that the CPU will automatically cache it to fast memory for you.
  
  Any problem where you need random access over a large amount of data is just not feasible on GPUs.
  What makes you think it would be faster with the MIC?
19. Re:Why so little memory? by loufoque · 2012-09-13 01:53 · Score: 1
  
  Seems like you simply want to make sure you parallelize on the L3 cache line boundary to avoid false sharing (same as with regular CPUs)
20. Re:Why so little memory? by afidel · 2012-09-13 01:55 · Score: 1
  
  I was taking loufoque's comment literally that you were architecturally limited to 128MB per thread which would be fairly difficult to code for.
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
21. Re:Why so little memory? by PCK · 2012-09-13 03:34 · Score: 1
  
  Accessing global memory on GPUs is extremely slow and there is a strict memory hierarchy that you have to adhere to in order to get any kind of performance.
  It could be seen as being the same as on the CPU, except that the CPU will automatically cache it to fast memory for you.
  
  Any problem where you need random access over a large amount of data is just not feasible on GPUs.
  What makes you think it would be faster with the MIC?
  Granted, it is the hardware doing the work, but you have four threads per core on Knights Corner, with a cache-line miss causing a context switch that masks the latencies. You also don't have the extra overhead of your code setting up the copies between global and shared memory (which is limited to 48K on CUDA) every time you want to access a data structure. Obviously you have many more cores on a GPU, but how much performance do you think you will get once you have to jump through all the hoops and basically implement your own caching mechanisms? Ultimately GPUs are limited to simple problems where your dataset can be broken into very small pieces with very little logic and simple random memory access, which is fine for big number-crunching problems; with MIC you at least will have more flexibility.
22. Re:Why so little memory? by tempest69 · 2012-09-13 04:00 · Score: 1
  
  Reliability and power. More memory means more chances for a node to have a failure, and RAM gets hot, so it will need to be cooled. And my bet is that it is ECC DDR3, as el-cheapo isn't remotely worth the price in these applications.
23. Re:Why so little memory? by Anonymous Coward · 2012-09-13 06:14 · Score: 0
  
  272 terabytes should be enough for anybody
Summary: s/tera/peta/ by gentryx · 2012-09-12 18:16 · Score: 3, Informative

The summary mentions that 2 teraflops are generated by the CPUs while 8 are generated by the Knights Bridge chips. It should say petaflops.

--
Computer simulation made easy -- LibGeoDecomp
Time For A New Supercomputer Metric by Jane+Q.+Public · 2012-09-12 18:20 · Score: 3, Insightful

"Petaflops" is not representative of the power of modern supercomputers, many of which use massively parallel integer processing to perform their duties. Sure, you can say that simulating floating point operations with the integer units amounts to the same thing, but it actually doesn't. We have discovered that there are a great many real-world problems for which parallel integer math works just fine, or even better (more efficient) than floating point. And for those, flops is a completely meaningless metric.

We need a standard that actually makes sense.
1. Re:Time For A New Supercomputer Metric by loufoque · 2012-09-12 18:39 · Score: 1
  
  Flops have always been a useless metric. If you want good metrics, look at the instruction reference with the speed in cycles of each instruction, its latency, its pipelining capabilities, the processor frequency, and cross it all with the number of cores and memory and cache interconnect specifications.
  Flops are just a number that gives a value for a single dumb computation in the ideal case; real computations can be up to 100 times slower than that.
2. Re:Time For A New Supercomputer Metric by afidel · 2012-09-12 19:01 · Score: 2
  
  Well, as far as achievable computation goes, that's why Linpack reports Rmax and Rpeak. However, the one big area where Linpack is lacking as a measuring stick for many real workloads is its small communications overhead; it's much easier to achieve high utilization on Linpack than it is for many other workloads.
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
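The Rmax/Rpeak distinction mentioned above boils down to a single ratio: measured Linpack performance over theoretical peak. A sketch; treating the 10-petaflop headline number as Rpeak is an assumption, and the Rmax value is invented purely for illustration:

```python
def linpack_efficiency(rmax_pflops: float, rpeak_pflops: float) -> float:
    """Fraction of theoretical peak (Rpeak) achieved on Linpack (Rmax)."""
    if rpeak_pflops <= 0:
        raise ValueError("Rpeak must be positive")
    return rmax_pflops / rpeak_pflops

RPEAK_PFLOPS = 10.0             # headline figure from the summary (assumed to be Rpeak)
HYPOTHETICAL_RMAX_PFLOPS = 7.5  # made-up value, not a measurement

efficiency = linpack_efficiency(HYPOTHETICAL_RMAX_PFLOPS, RPEAK_PFLOPS)
print(f"{efficiency:.0%} of peak")
```

A workload with heavier communication than Linpack would be expected to land below whatever this ratio turns out to be on the real machine.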
3. Re:Time For A New Supercomputer Metric by gentryx · 2012-09-13 00:02 · Score: 1
  
  ...sort of. And whoever rated the parent "insightful" apparently has little insight into HPC and supercomputing. Interesting might have been appropriate.
  First off, any metric which yields a single number is bound to be misleading as it is easy to find two applications a and b where a runs faster than b on machine 1 and slower than b on machine 2. But since we want such a simple metric, we might just as well settle for the one we already have. Why flops? Because applications use them. I know that the calls at conferences for fixed-point logic (more or less integer arithmetic) are getting louder, as you can actually prove that you can save some power (fixed point needs fewer transistors), but simultaneously users prefer floating point because it's much easier to prove numerical stability with floating point numbers. And correctness always trumps.
  
  --
  Computer simulation made easy -- LibGeoDecomp
4. Re:Time For A New Supercomputer Metric by Jane+Q.+Public · 2012-09-13 00:47 · Score: 1
  
  "First off, any metric which yields a single number is bound to be misleading as it is easy to find two applications a and b where a runs faster than b on machine 1 and slower than b on machine 2."
  Part of the point I was making.
  
  "it's much easier to prove numerical stability with floating point numbers"
  False.
5. Re:Time For A New Supercomputer Metric by Jane+Q.+Public · 2012-09-13 00:50 · Score: 1
  
  To elaborate on the latter:
  
  Floating-point arithmetic in processors is fraught with errors (such as rounding errors) and has quite often turned out to contain very significant bugs. Integer math simply does not have those problems. If you want "numerical stability", you need to stay away from floating-point in hardware.
6. Re:Time For A New Supercomputer Metric by gentryx · 2012-09-13 00:59 · Score: 1
  
  Numerical stability is not the same as exactness, which you are referring to. Exactness is something we can never achieve (just think of irrational numbers: impossible to store all digits). So we have to clip numbers and resort to rounding, which introduces errors. When caring for numerical stability one usually tries to prove that the errors introduced by the imperfect representation of numbers in computers are less than an acceptable limit "foo". And these proofs are simpler for floating-point arithmetic.
  
  --
  Computer simulation made easy -- LibGeoDecomp
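One way to see why error bounds are easier to state for floating point: storing a value in an IEEE 754 double gives the same relative error bound at every magnitude, while a fixed-point format bounds only the absolute error. A sketch using exact rational arithmetic; the 16-fractional-bit format and the sample values are chosen for illustration and are not from the thread:

```python
from fractions import Fraction

UNIT_ROUNDOFF = Fraction(1, 2**53)  # IEEE 754 double, round-to-nearest
FRAC_BITS = 16                      # hypothetical fixed-point format

def float_rel_error(x: Fraction) -> Fraction:
    """Exact relative error of storing x as a Python float (a double)."""
    stored = Fraction(float(x))
    return abs(stored - x) / x

def fixed_rel_error(x: Fraction) -> Fraction:
    """Exact relative error of storing x with FRAC_BITS fractional bits."""
    scaled = round(x * 2**FRAC_BITS)          # nearest representable step
    stored = Fraction(scaled, 2**FRAC_BITS)
    return abs(stored - x) / x

large = Fraction("1.337")
small = Fraction("0.00001337")

for value in (large, small):
    print(float(value),
          float(float_rel_error(value)),
          float(fixed_rel_error(value)))
```

In the floating-point column the relative error stays below the unit roundoff for both values. In the fixed-point column the small value loses more than 10% of its magnitude, so any stability proof has to track the size of every intermediate result.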
7. Re:Time For A New Supercomputer Metric by timeOday · 2012-09-13 01:07 · Score: 1
  
  Of course the Top500 uses Linpack anyways, not flops.
  It would be interesting to see how well flops and the Linpack score correlate across the members of the Top500. My guess would be that they correlate pretty well, just because flops is never used as a serious benchmark so nobody bothers gaming it. But I could certainly be wrong.
8. Re:Time For A New Supercomputer Metric by gentryx · 2012-09-13 01:08 · Score: 1
  
  "First off, any metric which yields a single number is bound to be misleading as it is easy to find two applications a and b where a runs faster than b on machine 1 and slower than b on machine 2."
  Part of the point I was making.
  Sure, but apparently people still want such a metric, however imperfect and misleading it may be.
  
  "it's much easier to prove numerical stability with floating point numbers"
  False.
  How eloquent. Simple example: I have two numbers a and b, approximately of the same size, with their LSB being tainted by rounding errors. If I add them on a floating point machine the last two bits of the mantissa will be tainted, but because during normalization the mantissa will be cut we end up with again just the LSB being tainted.
  On a fixed point machine however adding both numbers will either result in an overflow or in two bits being tainted. And so on. Care to disprove me?
  
  --
  Computer simulation made easy -- LibGeoDecomp
9. Re:Time For A New Supercomputer Metric by Anonymous Coward · 2012-09-13 01:12 · Score: 0
  
  You don't seem to be making sense. If you're representing integer values with integer data types, as long as you don't overflow, your representation IS perfect, and does not introduce ANY errors.
10. Re:Time For A New Supercomputer Metric by gentryx · 2012-09-13 01:29 · Score: 1
  
  I'm talking about fixed-point numbers, which are (almost) the same as integers with respect to the logic, but keep track of the decimal point, which is fixed (hence the name) at a certain place. Who in his right mind would propose using pure integer arithmetic? You need a way to represent, say, 1.337.
  
  --
  Computer simulation made easy -- LibGeoDecomp
11. Re:Time For A New Supercomputer Metric by Jane+Q.+Public · 2012-09-13 08:02 · Score: 1
  
  "On a fixed point machine however adding both numbers will either result in an overflow or in two bits being tainted. And so on. Care to disprove me? "
  I would not attempt to try to "disprove" you on Slashdot. But I will argue with some of your assumptions.
  
  First, you can do "floating point" math using scaled integers of a size to represent the number of decimal points you desire. But the integer math is not subject to either the speed limitations, or the bugs that have been not just known but fairly common in fp hardware. Sure, you still get rounding errors, but you get those anyway. But they aren't the only kind of errors that occur with floating-point. They ARE the only kind of errors you get with integer math.
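The scaled-integer approach described above can be sketched in a few lines. The scale of 1000 (three decimal places) and all function names are chosen for illustration; Python integers never overflow, so the overflow a fixed-width hardware register would hit is not modeled here:

```python
from fractions import Fraction

SCALE = 1000  # three decimal places, picked for illustration

def to_fixed(text: str) -> int:
    """Parse a decimal string into a scaled integer (nearest step)."""
    return round(Fraction(text) * SCALE)

def fixed_add(a: int, b: int) -> int:
    """Addition is exact: both operands share the same scale."""
    return a + b

def fixed_mul(a: int, b: int) -> int:
    """The raw product carries SCALE twice; divide once, rounding half up."""
    return (a * b + SCALE // 2) // SCALE

def to_text(a: int) -> str:
    """Format a scaled integer back into a decimal string."""
    sign = "-" if a < 0 else ""
    whole, frac = divmod(abs(a), SCALE)
    return f"{sign}{whole}.{frac:03d}"

x = to_fixed("1.337")
y = to_fixed("2.5")

total = fixed_add(x, y)
product = fixed_mul(x, y)   # exact answer is 3.3425, which needs a 4th digit
print(to_text(total), to_text(product))
```

Addition introduces no error, while the multiplication has to round away the fourth decimal place, which is the rounding error both posters agree remains.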
12. Re:Time For A New Supercomputer Metric by Jane+Q.+Public · 2012-09-13 08:05 · Score: 1
  
  I, too, was referring to fixed-point numbers. My error for saying "integer", when fixed-point is what I meant.
  
  No, I was not referring to exactness.
13. Re:Time For A New Supercomputer Metric by Jane+Q.+Public · 2012-09-13 08:48 · Score: 1
  
  I should clarify, since at one point in this thread I was talking about fixed-point but improperly calling them integers, and in another part actually talking about integers.
  
  Apologies for any confusion. It was my fault but not intentional.
  
  The solution to overflow in fixed point is to use scaled integers.
  
  I know of no way to absolutely avoid rounding errors, except to simply use more digits than the significant digits you require, and even that is simply a probability game; you can't absolutely avoid them but you can reduce the frequency.
GDDR5 by PCK · 2012-09-12 18:21 · Score: 2

The Knights Corner chips use GDDR5 memory, bandwidth is a big problem when you have 50+ cores to feed.
Ah, memories! by hyades1 · 2012-09-12 18:27 · Score: 4, Funny

This reminds me of an old science fiction story. The designers, builders and programmers assemble. The Switch is flipped. The computer boots. The first question they ask is, "Is there a God?" The machine hums away for a few seconds, then arc welds the power switch open and responds, "There is now!"

--
I've calculated my velocity with such exquisite precision that I have no idea where I am.
1. Re:Ah, memories! by Anonymous Coward · 2012-09-12 19:12 · Score: 0
  
  But can it "play" Quake?
2. Re:Ah, memories! by RicktheBrick · 2012-09-13 00:37 · Score: 1
  
  I can just see on January 8th someone coming in and asking "So what have you invented yet?" So with 864,000 petaflops this is all that you have come up with? IBM's latest supercomputer does 2 gigaflops per watt. I wonder if this one does better. I really think this is the future of computing since they are capable of doing computing at a far lower electrical cost than the average home computer. So any problem that requires a serious amount of computing would be sent to a supercomputer, with the home computer/phone required to just do trivial matters.
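The 864,000 figure appears to be the 10-petaflop peak multiplied by the number of seconds in a day, which gives operations per day rather than a flops rate. A sketch of that arithmetic, plus the power the machine would draw if it only matched the 2 gigaflops per watt quoted above (that it runs at this efficiency is purely an assumption):

```python
PETA = 10**15
GIGA = 10**9

PEAK_FLOPS = 10 * PETA            # headline figure from the summary
SECONDS_PER_DAY = 24 * 60 * 60

# Operations completed in one day at sustained peak.
ops_per_day = PEAK_FLOPS * SECONDS_PER_DAY
peta_ops_per_day = ops_per_day // PETA

# Power required to reach the same peak at 2 gigaflops per watt.
FLOPS_PER_WATT = 2 * GIGA
watts_needed = PEAK_FLOPS // FLOPS_PER_WATT

print(peta_ops_per_day, "peta-operations per day")
print(watts_needed / 10**6, "MW at 2 gigaflops per watt")
```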
3. Re:Ah, memories! by pitchpipe · 2012-09-13 06:48 · Score: 1
  
  The machine hums away for a few seconds, then arc welds the power switch open and responds, "There is now!"
  And then abruptly died as its capacitors drained, because it didn't know the difference between an open switch and a closed switch.
  
  --
  Look where all this talking got us, baby.
Ooops. by PCK · 2012-09-12 18:31 · Score: 2

Ooops, scratch that, misread the summary. There probably isn't a need for that much memory because the kind of problems they are most likely to be dealing with will have massive datasets that don't fit in memory anyway. The limiting factor will be CPU and node interconnect bandwidth, so adding extra memory won't make much if any difference to performance.
1. Re:Ooops. by hairyfeet · 2012-09-13 07:29 · Score: 1
  
  Uhh... you haven't looked at the new AMD monster racks designed to function as a single unit, thanks to the tech they got from the SeaMicro purchase? We are talking 2048 cores and 16TB of RAM in a single rack; with THAT much space, if you've got the cash you should be able to fit your large dataset into RAM and run it from there.
  Man I don't want to even think about what the electric bill for something THAT powerful would be though.
  
  --
  ACs don't waste your time replying, your posts are never seen by me.
2. Re:Ooops. by Crosshair84 · 2012-09-13 09:31 · Score: 1
  
  Someone is gonna run one in their house, for whatever reason, and have the DEA bust down their door because they think they are running a grow op via the power bill.
3. Re:Ooops. by hairyfeet · 2012-09-13 21:25 · Score: 1
  
  I doubt it because grow lights cycle, which means one look at their power bill, which they can get via court order, will show a start/stop cycle that sticks out like a sore thumb.
  These monsters on the other hand will be blowing through power like a drunk hitting a minibar so it'll be pretty obvious whatever they are running isn't grow lights.
  As far as running it at home? Maybe if you were an "At Home" nut and wanted to get the top spot on the leaderboards, but most guys I know have been going the opposite way; with high electricity prices it's often better to use low-power AMD and Intel chips whenever possible. Hell, that's why I use an old 754 Sempron in the shop as a downloader and nettop; the thing sucks practically nothing while still giving me enough computing power to do everyday tasks. I've had a lot of customers have me build them E350 units for the same reason: they make a great downloader and office box while generating almost no heat and sucking less than 16W under load.
  BTW if you haven't tried one of the E350/450 units you really should Crosshair, those things are sweet! You can get the boards for less than $80 on sale, they take almost no power, can easily be dropped into an HTPC style SFF case, no heat, and still have enough power that they make pretty nice HTPCs or office boxes. The trick with 'em is to give them the 1333MHz or better memory since the APU uses system memory for the GPU but for the price they make really great low cost/low power systems.
  
  --
  ACs don't waste your time replying, your posts are never seen by me.
Umm, No. by slew · 2012-09-12 18:52 · Score: 2

I'm pretty sure you are mistaken on this point.
Most modern supercomputers get their "flop" count from SSE3/4 and/or GPUs, which are not integer but floating-point processing machines (at least 32-bit single-precision FP, but also double precision, albeit at a slower rate). These machines most certainly do NOT simulate floating point with their integer units (nor cheat by calling an integer op as an approximate fp op), and they have massive amounts of dedicated hardware SIMD FP processing units to do their heavy lifting.
Of course there are many real-world problems that could use parallel integer math, and CPUs and GPUs are capable of lots of SIMD integer ops as well, but that's not how supercomputers are rated these days; they are rated by the number of IEEE FP operations (mostly FMA or fused multiply-add counting as 2 ops) with at least 32 bits of precision.
The integer OPs currently don't count in the current ratings and I don't see that changing any time soon. Important scientific operations like matrix inversion, finite-element analysis, FFTs, and linear programming don't work the same with integer ops, so it is unfair to compare supercomputers by their integer ops.
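The rating described above is a product of hardware counts, and it can be sketched for the Xeon side of Stampede. Only "dual 8-core E5-2680" comes from the summary. The 2.7 GHz clock, the 8 double-precision operations per core per cycle (a 4-wide AVX add plus a 4-wide AVX multiply rather than FMA on this chip generation) and especially the node count are assumptions added here; the summary says only "several thousand" servers:

```python
def peak_flops(nodes: int, sockets_per_node: int, cores_per_socket: int,
               clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core retires its maximum FP ops each cycle."""
    return nodes * sockets_per_node * cores_per_socket * clock_hz * flops_per_cycle

HYPOTHETICAL_NODES = 6000   # illustrative stand-in for "several thousand"
SOCKETS_PER_NODE = 2        # from the summary
CORES_PER_SOCKET = 8        # from the summary
CLOCK_HZ = 2.7e9            # assumed base clock
DP_FLOPS_PER_CYCLE = 8      # assumed: 4-wide add + 4-wide multiply per cycle

per_node = peak_flops(1, SOCKETS_PER_NODE, CORES_PER_SOCKET,
                      CLOCK_HZ, DP_FLOPS_PER_CYCLE)
xeon_total = peak_flops(HYPOTHETICAL_NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET,
                        CLOCK_HZ, DP_FLOPS_PER_CYCLE)

print(per_node / 1e9, "gigaflops per node")
print(xeon_total / 1e15, "petaflops across the assumed node count")
```

Under these assumptions the Xeons alone land close to the roughly 2 petaflops attributed to them in the summary (as corrected in the comments), which is why the remaining 8 have to come from the coprocessors.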
1. Re:Umm, No. by Jane+Q.+Public · 2012-09-13 00:44 · Score: 1
  
  "These machines most certainly do NOT simulate floating point with their integer units (nor cheat by calling an integer op as an approximate fp op), and they have massive amounts of dedicated hardware SIMD FP processing units to do their heavy lifting."
  I did not say that they did. I said that you could consider integer units that emulated fp hardware to be doing flops. I did not state that this is the usual case.
  
  "... they are rated by the number of IEEE FP operations..."
  I KNOW that... my point is that it probably is not an appropriate rating these days. Not representative of many real-world problems.
  
  "The integer OPs currently don't count in the current ratings and I don't see that changing any time soon."
  Well, thanks for repeating pretty much what I already said.
2. Re:Umm, No. by serviscope_minor · 2012-09-13 03:07 · Score: 1
  
  I KNOW that... my point is that it probably is not an appropriate rating these days. Not representative of many real-world problems.
  What real world problems are you thinking of? Most of the big super computer problems are focussed on scientific simulation of some sort which is very floating point heavy.
  Going through the top 5 on Wikipedia, of the applications mentioned, all are floating point.
  Besides vector units like SSE can churn through either integer or FP instructions at about the same rate and throughput, so the FP performance will be somewhat similar to equally regular integer performance as well.
  
  --
  SJW n. One who posts facts.
3. Re:Umm, No. by slew · 2012-09-13 11:55 · Score: 1
  
  Okay, can you tell me what the following statement means to you?
  
  I said that you could consider integer units that emulated fp hardware to be doing flops.
  1. I don't see any supercomputers emulating fp hardware on integer units...
  2. Even if they did (which they don't), it would be so slow that it would be a rounding error in their flop rating.
  As I (and others) have posted, although there are some interesting integer problems, existing supercomputers issue those instructions to processing units that are essentially the same speed as FP units, so there's not much difference between the FLOP and IOP numbers of these machines, and the scientific codes that require floating point are much more interesting to the current buyers of those machines. That's why they don't rate them in a different dimension.
  It would be like suggesting people rate cars by refueling range and 60-90 time instead of MPG and 0-60 time. Sure, for some applications you would like to know how far you can go on a tank of gas and how fast you can pass people on the highway, but most people buying cars today simply want to know if they can reasonably accelerate to merge onto a highway and how much gas will cost them for commuting. People who care more about the other stuff can look deeper into the numbers for that specific car, but they've gotta know that since most people aren't using their criteria, car manufacturers aren't optimizing for them and likely won't be for the foreseeable future.
What! by aussie.virologist · 2012-09-12 19:14 · Score: 1

No Lego!
Yeah but by Anonymous Coward · 2012-09-12 20:28 · Score: 0

how does it rank in the TOP100? Or the TOP1000?
1. Re:Yeah but by Sique · 2012-09-12 21:44 · Score: 1
  
  It would clock in at rank 3 or 4, because the current rank 3 (which isn't an Intel system, but POWER-based) has 10 petaFLOPS Rpeak, and rank 4 is currently an Intel system at 3 petaFLOPS Rpeak.
  
  --
  .sig: Sique *sigh*
But can I play doom on it? by negativeduck · 2012-09-12 21:04 · Score: 1

Yeah, but will it run Doom?
O well... by jbeaupre · 2012-09-12 21:58 · Score: 1

I'm seriously bothered by the fact they couldn't figure out how to put an O at the end of the acronym.

--
The world is made by those who show up for the job.
Most important question by Anonymous Coward · 2012-09-12 22:54 · Score: 0

The summary is poorly written. It doesn't answer the question most Slashdot users are dying to know about supercomputers. Does it run Linux?
First Projects by Anonymous Coward · 2012-09-13 00:09 · Score: 1

The chief science official of Texas has divined that the computational projects will be:
1. Derive a proof that the universe is only 5000-6000 years old.
2. Derive a proof that God is a silver haired, white man from Texas.
1. Re:First Projects by Anonymous Coward · 2012-09-13 02:13 · Score: 0
  
  The chief science official of Texas has divined that the computational projects will be:
  1. Derive a proof that the universe is only 5000-6000 years old.
  2. Derive a proof that God is a silver haired, white man from Texas.
  3. Derive a proof the Anonymous Coward exists.
2. Re:First Projects by Anonymous Coward · 2012-09-13 17:01 · Score: 0
  
  As I live and breathe, i.e., SELF evident.
Just how many, Earl? by myowntrueself · 2012-09-13 02:01 · Score: 1

Just how many supercomputers are required for a stampede, Earl? I mean, is it like three or more? Is there a minimum speed?

--
In the free world the media isn't government run; the government is media run.
Raspberry Pi Comparison by Anonymous Coward · 2012-09-13 02:03 · Score: 0

I heard about a new Raspberry Pi supercomputer here yesterday; which is more powerful? Which one makes more bitcoins per MPAA lawsuit? Can I get a Beowulf cluster of these in Soviet Russia? Or in Soviet Russia do bitcoins cluster my Beowulf?
Disappointed by 93+Escort+Wagon · 2012-09-13 05:37 · Score: 1

I got excited for a couple seconds, I thought it was talking about a "Taco Stampede".

--
#DeleteChrome