GPU Supercomputer Could Crunch Exabyte of Data Daily For Square Kilometer Array


Posted by Soulskill on Saturday August 4, 2012 @02:40AM from the maybe-they-should-process-it-instead dept.

An anonymous reader writes "Researchers on the Square Kilometer Array project to build the world's largest radio telescope believe that a GPU cluster could be suited to stitching together the more than an exabyte of data that will be gathered by the telescope each day after its completion in 2024. One of the project heads said that graphics cards could be cut out for the job because of their high I/O and core count, adding that a conventional CPU-based supercomputer doesn't have the necessary I/O bandwidth to do the work."
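For a sense of scale, the summary's "more than an exabyte of data ... each day" can be turned into a sustained data rate. This is only a rough sketch: it assumes a decimal exabyte (10^18 bytes) arriving evenly over 24 hours, neither of which the article states, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def sustained_rate_bytes_per_s(bytes_per_day: float) -> float:
    """Average throughput needed to keep up with a given daily data volume."""
    return bytes_per_day / SECONDS_PER_DAY

EXABYTE = 10**18  # assumption: decimal exabyte, not 2**60
rate = sustained_rate_bytes_per_s(EXABYTE)
print(f"{rate / 1e12:.2f} TB/s")  # roughly 11.6 TB/s under these assumptions
```

Any burstiness in the real data flow would push the peak rate above this average.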

40 comments


Oh that's nothing by Anonymous Coward · 2012-08-04 02:49 · Score: 0

I could beat that with my VIC-20.
1. Re:Oh that's nothing by Anonymous Coward · 2012-08-04 06:35 · Score: 0
  
  TYPE sqka(200000007) | /dev/null
  Even TYPEing the output from the Sq. Km. Array through /dev/null would take years on that machine. I was going to do a rough calculation, but I'm too lazy.
Computations by girlintrainingpants · 2012-08-04 02:51 · Score: 1

Interesting but not surprising that they are looking at the GPU route. The fine article doesn't explain enough though. Does anyone know exactly what they are trying to process?
1. Re:Computations by Anonymous Coward · 2012-08-04 02:54 · Score: 0
  
  > Interesting but not surprising that they are looking at the GPU route. The fine article doesn't explain enough though. Does anyone know exactly what they are trying to process?
  Perhaps "stitching together the more than an exabyte of data that will be gathered by the telescope each day" was a clue.
2. Re:Computations by Tony2Heads · 2012-08-04 02:56 · Score: 1
  
  Lots of 3D (fast) Fourier transforms
3. Re:Computations by girlintrainingpants · 2012-08-04 02:59 · Score: 3, Interesting
  
  > Lots of 3D (fast) Fourier transforms
  Again, interesting, we do a lot of this at work: complex 3D FFTs. I write my plan and processing code using CUFFT. I'm curious as to whether they'd be using fully custom code for such a large computer. We're only using 8x Tesla cards at work.
4. Re:Computations by girlintrainingpants · 2012-08-04 03:10 · Score: 1
  
  No, coming from a computer scientist, that is too vague for words.
5. Re:Computations by loufoque · 2012-08-04 03:18 · Score: 1
  
  There are better options than cufft, especially when multiple GPUs are involved.
6. Re:Computations by jkflying · 2012-08-04 04:48 · Score: 1
  
  If you are just doing FFT, a single FPGA is even better than many GPUs.
  
  --
  Help I am stuck in a signature factory!
7. Re:Computations by GumphMaster · 2012-08-04 15:18 · Score: 4, Informative
  
  The SKA will have digitised signals coming from one or more receiving heads and radio receivers mounted on each of the 3000 radio telescopes that form the array. There's a massive amount of data that needs to be time correlated to within a nanosecond or so (over transmission distances > 1000km), corrected for known system distortions, subject to beam forming, corrected for rotation and atmospheric effects, passed through Fourier analysis, analysed for polarisation, filtered, binned, summarised and stored in useful ways. Some of the tasks need to be done in real time, others can wait. Some of those tasks are heavy on the floating point work and easy to parallelise. Much can be done with dedicated hardware but that is much less flexible over the longer term than a programmable device.
  
  --
  Patent litigation: A doctrine of Mutually Assured Destruction... in which everyone seems willing to push the button
2024 by Hans+Lehmann · 2012-08-04 02:53 · Score: 1, Insightful

They're not going to even start collecting data for another 12 years, yet they're basing their hardware estimates on what's available today. Compare today's GPUs with those made 12 years ago. I'm guessing they'll be able to crunch their data in 2024 by just using a video game console.

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
1. Re:2024 by epiphani · 2012-08-04 03:15 · Score: 2
  
  Read the paper. They make rather optimistic assumptions about Moore's law.
  
  --
  .
2. Re:2024 by epiphani · 2012-08-04 03:17 · Score: 4, Informative
  
  And for good measure, now the actual paper:
  http://www.skatelescope.org/uploaded/31235_139_Memo_Ford.pdf
  Funny thing, I was reading this last night.
  
  --
  .
Not well explained by EyeSavant · 2012-08-04 03:11 · Score: 5, Informative

I guess they did not get anyone that technical to write that article or the summary.
For I/O I guess they mean memory bandwidth. GPUs have a LOT of memory bandwidth from their cache memory; the problem is that they sit at the end of a PCIe bus from the CPU, and the CPU has to handle most of the bookkeeping (and the actual I/O, i.e. taking data from an external source).
So what is important is the compute density, i.e. how much computation you do for each piece of data. Getting stuff into the GPU is slow, getting stuff out is slow, but doing stuff on the data is very very fast (because you have so many compute units and so much memory bandwidth).
That is also the way they are programmed, with the main code running on the CPU, and then the kernels getting launched on the GPU with explicit or implicit transfer of data from the CPU memory to the GPU memory and back again.
I do have high hopes for stuff like Fusion ( http://en.wikipedia.org/wiki/AMD_Fusion ) which gets rid of the PCIe bus and makes it a lot easier to get data to the GPU cores and back again.
And if you are going to mention GPU machines, why not mention Titan? ( http://www.olcf.ornl.gov/computing-resources/titan/ )
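The compute-density point can be put into numbers with a back-of-the-envelope model. This is a sketch under loud assumptions: results are the same size as the input, transfers and compute do not overlap, and the 8 GB/s link and 500 GFLOPS device are hypothetical figures, not measurements of any real card. The function names are invented for illustration.

```python
def offload_times_s(n_bytes, flops_per_byte, link_bytes_per_s, device_flops):
    """Return (transfer, compute) seconds for shipping data to a GPU and back."""
    transfer = 2 * n_bytes / link_bytes_per_s           # copy in, copy results out
    compute = n_bytes * flops_per_byte / device_flops   # work done on the device
    return transfer, compute

def break_even_intensity(link_bytes_per_s, device_flops):
    """FLOPs per byte at which compute time equals round-trip transfer time."""
    return 2 * device_flops / link_bytes_per_s

# Hypothetical hardware: 8 GB/s bus, 500 GFLOPS sustained on the device.
print(break_even_intensity(8e9, 500e9))  # FLOPs per byte needed to hide the bus
```

Below the break-even intensity the bus dominates and the GPU mostly waits; above it the GPU's compute units are what you are actually paying for.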
GPU perfect for image analysis by DishpanMan · 2012-08-04 03:12 · Score: 4, Interesting

We use GPU cards for computed tomography, and large reconstructions went from taking days to hours to minutes. OpenCL should be mature in 12 years so they can go with that instead of CUDA, and by then GPGPU computing will probably be using the hybrid APU chips that AMD is starting to market. The bandwidth on the Tesla cards right now is the bottleneck, as the PCI bus transfer speeds can cause huge wait times for large data sets. Plus even the biggest Tesla cards only have 4GB of on-board memory, which is not enough. I'd rather have the chips be on-board and have direct access to 512GB of RAM for large data sets. Although I can't wait for the Kepler chips to come out, they'll probably reduce computation times by another factor of 3 for our image processing problems.
1. Re:GPU perfect for image analysis by loufoque · 2012-08-04 03:25 · Score: 1
  
  A gpu is at most 20 times faster than a cpu while costing 4 times as much. If your code is so much faster on gpu, it's just because your cpu version was crap and not optimized.
2. Re:GPU perfect for image analysis by burisch_research · 2012-08-04 03:46 · Score: 2
  
  You have no idea what you are talking about.
  
  --
  char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
3. Re:GPU perfect for image analysis by Anonymous Coward · 2012-08-04 03:59 · Score: 0
  
  Where is this 20 times coming from?
  I agree that in many cases the speedup is measured completely incorrectly and by comparing a super optimized GPU version to standard gcc compiled CPU version without optimizations enabled. But I haven't seen this 20 number before.
4. Re:GPU perfect for image analysis by Anonymous Coward · 2012-08-04 04:02 · Score: 0
  
  20 times faster and 4 times more expensive = 5 times the performance per cost.
  It would be dumb to use the CPUs: by your own numbers, to get the performance of one, let's say, $1000 GPU, you'd need 20 CPUs that cost $250 each... and I'm pretty sure $250 * 20 is a lot more than $1000... (in case you didn't figure that out, it's $5000 for the CPU setup)
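The arithmetic in this reply, spelled out. The 20x speedup and 4x price ratio are the grandparent's claim, and the $250 CPU price is this poster's illustrative figure, not real pricing; the function names are hypothetical.

```python
def perf_per_cost_ratio(speedup: float, cost_ratio: float) -> float:
    """How much more performance per dollar the faster device delivers."""
    return speedup / cost_ratio

def cpu_cost_to_match(gpu_speedup: int, cpu_price: int) -> int:
    """Cost of enough CPUs to equal one GPU, assuming perfect scaling."""
    return gpu_speedup * cpu_price

print(perf_per_cost_ratio(20, 4))   # 20x faster at 4x the price
print(cpu_cost_to_match(20, 250))   # dollars of CPUs to match one GPU
```

Perfect scaling across 20 CPUs is itself generous to the CPU side, since it ignores interconnect and power costs.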
5. Re:GPU perfect for image analysis by raddan · 2012-08-04 04:05 · Score: 2
  
  Exactly. The degree of parallelism (i.e., the number of independent compute cores) is much higher in a GPU. Having optimized code on a CPU has nothing to do with it. That said, GPUs are extremely limited devices, and they only work well for parallel jobs that operate in lockstep, so if you need asynchrony, traditional parallelism on CPUs is the way to go.
6. Re:GPU perfect for image analysis by Anonymous Coward · 2012-08-04 04:18 · Score: 0
  
  > OpenCL should be mature in 12 years so they can go with that instead of CUDA,
  
  They can use the standard OpenMP at that time for the GPU accelerated code. The OpenMP standard will soon have a new version supporting GPU acceleration, among others.
7. Re:GPU perfect for image analysis by Anonymous Coward · 2012-08-04 04:37 · Score: 0
  
  Or, it's because you have a problem very well suited to the SIMD nature of GPGPU programming.
8. Re:GPU perfect for image analysis by glueball · 2012-08-04 05:03 · Score: 2
  
  Yawn. GPU's are good for CT because you have a medium amount of data and a lot of processing to compensate for crap detectors and low ionizing radiation levels. You're staking the future of your CT on AMD? Good luck with that. I had people from one of the big 3 CT vendors evangelizing to me about the GPU and AMD. My group chose not to work with them because they were being stubborn on their religious choices of product and not on solving the problems. That and stealing some of our IP.
  For an array, and people panicking over FFTs, a CPU today can keep up with 1D FFT data on a PCIe bus and some 2D FFTs. So using a GPU for 1D FFTs is not necessary. But what about the gigantic 3D FFT? A GPU can't fit the 3D data set in memory without non-commercial hardware. So the GPU needs to pull data in through the bus. Multiple times. That is hardly efficient use of the GPU.
  Someday, maybe a GPU will be available with a huge amount of memory but I'm just not seeing it. Even with a fully rendered lighting model, forward projected, there's just not much memory that will be needed for a commercial product. GPU's have an end-life trajectory. We're not that far away from it.
  So what about a custom GPU for this project? If you're going to go custom, there's probably a lot better compute-memory-interconnect choices you can make than using a GPU.
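The memory argument for 3D FFTs is easy to check with a quick calculation. This is a sketch assuming a cubic volume of double-precision complex samples (16 bytes each) and counting only one copy of the data; the grid sizes are arbitrary examples, not SKA figures.

```python
GIB = 2**30

def fft3d_bytes(n: int, bytes_per_complex: int = 16) -> int:
    """Memory for one n x n x n complex volume (16 bytes = complex double)."""
    return n**3 * bytes_per_complex

# Example grid sizes (hypothetical); an out-of-place transform needs twice this.
for n in (256, 512, 1024, 2048):
    print(f"{n}^3: {fft3d_bytes(n) / GIB:g} GiB")
```

Memory grows with the cube of the edge length, so doubling the grid costs eight times the memory, which is why a card with a few gigabytes on board runs out quickly.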
9. Re:GPU perfect for image analysis by PhamTrinli · 2012-08-04 06:32 · Score: 1
  
  Parent is probably comparing performance in FLOPS (i.e. max performance when using fully parallelizable code not limited by i/o etc.). Assuming CPU is always more efficient than GPU (in terms of percent of this theoretical max performance that is achieved), this gives an upper limit on how much improvement a GPU can give.
10. Re:GPU perfect for image analysis by Anonymous Coward · 2012-08-04 07:56 · Score: 5, Informative
  
  You are the one without a clue what you are talking about. Let's look at the fastest shipping devices from Intel and Nvidia.
  Intel SandyBridge CPU (8c 2.6GHz) has a peak compute of 166 DP GFLOPS, peak memory bandwidth of 51.2GB/s.
  GF110 based tesla has peak compute of 666 DP GFLOPS, peak memory bandwidth of 177.4GB/s.
  It has 4 times the raw DP compute, and 3.5x the raw memory bandwidth.
  Now this is best case for the GPU. In reality, they tend to have far lower efficiency (actual versus peak) numbers for a few reasons. Firstly, they are harder to program and need to be driven by a CPU. Secondly, they require far higher parallelism which can fall afoul of Amdahl's law. Thirdly, they have limited memory capacities and a relatively slow, high-latency PCI connection to main memory, over which input data must be copied in and results copied back. Fourthly, the SandyBridge CPU has far greater capabilities to extract performance, it is aggressively out of order, and has several levels of large fast caches.
  Look at the Linpack numbers on the top500 supercomputers (Linpack is very easy and incredibly parallel, i.e., a great case for GPUs). The top Xeon result achieves 91% efficiency. The top NVIDIA result got 54.5%.
  So in a *real* workload when comparing a properly optimized CPU implementation with an optimized GPU implementation, you would be very lucky to see a 4x increase with the GPU. Very lucky indeed. Somewhere around 2-3x would be more typical. No matter how much you stick your head in the sand, you can't get away from the reality of these numbers.
  Now there are some other cases where fixed function units on the GPU have been used to provide a larger speedup. That's all well and good, but it tends to be rather limited. It may be akin to comparing a load using the CPU's encryption or random number acceleration functions.
  Here is some further reading if you're interested.
  www.cs.utexas.edu/users/ckkim/papers/isca10_ckkim.pdf
  www.realworldtech.com/compute-efficiency-2012/
  top500.org
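The ratios in this post can be reproduced from the quoted figures. One loud assumption in the sketch below: the Linpack efficiencies are for whole top500 systems, and applying them to the single-chip peak numbers is only a rough illustration, not a benchmark.

```python
# Peak figures as quoted in the post (DP GFLOPS, GB/s).
cpu_peak, cpu_bw = 166.0, 51.2    # 8-core 2.6GHz SandyBridge
gpu_peak, gpu_bw = 666.0, 177.4   # GF110-based Tesla

# Linpack efficiencies as quoted (actual / peak); assumed to carry over per chip.
cpu_eff, gpu_eff = 0.91, 0.545

peak_ratio = gpu_peak / cpu_peak
bw_ratio = gpu_bw / cpu_bw
effective_ratio = (gpu_peak * gpu_eff) / (cpu_peak * cpu_eff)

print(f"peak compute: {peak_ratio:.2f}x")
print(f"memory bandwidth: {bw_ratio:.2f}x")
print(f"efficiency-adjusted compute: {effective_ratio:.2f}x")
```

Under that assumption the efficiency-adjusted advantage falls inside the 2-3x range the post calls typical.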
11. Re:GPU perfect for image analysis by DishpanMan · 2012-08-04 07:57 · Score: 1
  
  > A gpu is at most 20 times faster than a cpu while costing 4 times as much. If your code is so much faster on gpu, it's just because your cpu version was crap and not optimized.
  Yes, that's why every industrial and medical CT system comes with GPU reconstruction routines unlike 5 years ago. But don't let that little fact stop you from your ignorant post. Please do write your efficient CPU based reconstruction code for your custom hardware, and sell it for cheaper than others in the industry. I would buy it and you would make money. But alas, you have no clue or experience about this topic because you have no idea what a filtered backprojection is, or how to write CUDA code, nor the commercial market for this kind of computing. I suggest some more schooling and getting past "hello world" in more than one programming language for different hardware before you state idiotic posts again.
12. Re:GPU perfect for image analysis by RicktheBrick · 2012-08-04 08:28 · Score: 1
  
  I do volunteer work for World Community Grid. IBM started this project to help in the cure of several diseases. They so far do not have a project which uses CUDA or a GPU. They are doing a huge number of results lately (about 140,000 a day). I would think that they would have the incentive to do that work in the most efficient manner. But since there is still no use for the GPU, I guess IBM does not consider it cost worthy yet. Now for a little bragging on my part. One can join a team, and I have joined the slashdot user team, which is in about 27th position in results returned. Well, I am number one in results returned in that team. I would think that slashdot users would have better equipment than the average user, so being number one in results returned for that team means a lot to me anyway.
13. Re:GPU perfect for image analysis by loufoque · 2012-08-04 09:44 · Score: 3, Interesting
  
  > Yes, that's why every industrial and medical CT system comes with GPU reconstruction routines unlike 5 years ago.
  I didn't say GPUs were not faster, I said they were not as much faster as people claimed.
  The OP said that his code went from taking days to taking minutes. That's an acceleration of the order of several thousand times. A GPU is simply not that much faster than a CPU. If it was made so much faster, it's simply that it was rewritten by competent people that knew how to make it fast, while the original CPU version was not.
  
  > your ignorant post
  
  > alas, you have no clue or experience about this topic
  Just so you know, I am the CEO of a company that edits compilers and libraries for parallel computing. We mostly work in two industries: image/video/multimedia/computer vision (a bit of medical imaging too) and banking/insurance/financial. Our people are seasoned computer architecture experts, many of whom also hold a PhD in various fields, including mathematics, robotics, and computer science. We have strong partnerships not only with NVIDIA, but also with AMD and Intel, which give us future products for evaluation. I myself contribute to the evolution of parallel programming in C++ as an HPC expert at the C++ standards committee.
  If you feel like you'd want to apply for some consulting to have us help you improve the performance of your filtered backprojection -- I myself have no knowledge of that field, but I assume it's similar to tomography for which we have good results already deployed in the industry --, I'm sure our team would be delighted to help you.
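The "days to minutes" claim can be turned into a speedup factor. The run times below are hypothetical, since the original post gives no exact figures, and the 20x hardware bound is the figure claimed earlier in this thread, not a measured limit.

```python
DAY_S, MINUTE_S = 86_400, 60

def speedup(before_s: float, after_s: float) -> float:
    """Overall speedup from an old run time to a new one."""
    return before_s / after_s

# Hypothetical run: 2 days on the CPU version, 1 minute on the GPU version.
total = speedup(2 * DAY_S, 1 * MINUTE_S)
hardware_bound = 20                      # claimed upper limit for GPU vs CPU
software_factor = total / hardware_bound  # what is left for the rewrite to explain

print(total, software_factor)
```

If the hardware bound holds, the remaining factor has to come from the rewrite itself (better algorithms, memory layout, and so on), which is the point being argued.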
14. Re:GPU perfect for image analysis by loufoque · 2012-08-04 09:49 · Score: 1
  
  This is a very insightful post, too bad it is not rated higher.
And I'll have 32,000 cores to spare. by Impy+the+Impiuos+Imp · 2012-08-04 03:30 · Score: 1

> believe that a GPU cluster could be suited to stitching together the more than
> an exabyte of data that will be gathered by the telescope each day after its
> completion in 2024
Nah, I'll let you use an app on my Galaxy S12

--
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
That's amazing by Kimomaru · 2012-08-04 04:13 · Score: 0

They've finally created a computer that can run Crysis at 60 fps.
so little scientific progress in the last 30 years by Anonymous Coward · 2012-08-04 04:24 · Score: 0

Thanks, Reagan+Thatcher. Now academics basically jack around with extremely expensive and utterly useless crap.
I miss the days when scientists applied their brains rather than their grant money.
Faux Supercomputers by rbmyers · 2012-08-04 04:46 · Score: 1

"One of the project heads said that graphics cards could be cut out for the job because of their high I/O and core count, adding that a conventional CPU-based supercomputer doesn't have the necessary I/O bandwidth to do the work." And maybe one of these days even the national labs will realize that billions and billions of CPU's that can barely talk to one another do not a supercomputer make.
GPU I/O bandwidth? by saratchandra · 2012-08-04 05:27 · Score: 3, Informative

Give me a break.

> a conventional CPU-based supercomputer doesn't have the necessary I/O bandwidth to do the work.
I work in HPC and the trend is towards heterogeneous architectures (CPU+accelerators). Moore's law, power requirements and economics are dictating that trend. It's definitely a stretch to claim that you get better I/O bandwidth with GPUs. Even with PCI Gen 3, the effective bandwidth you get per CPU core is greater than that of an 'equivalent' GPU core.
1. Re:GPU I/O bandwidth? by rbmyers · 2012-08-04 09:49 · Score: 1
  
  Yes, and the pervasive use of GPU's will actually make the flops/byte bisection bandwidth problem worse. Actually, almost no one talks about bisection bandwidth any longer because the numbers are already embarrassingly small.
2. Re:GPU I/O bandwidth? by rbmyers · 2012-08-04 11:08 · Score: 1
  
  The previous post should have said bytes/flop, but it hardly matters. I've had the conversation with the people who matter, and flops are just a helluva lot cheaper than bytes per second, and no one is the wiser when the Top 500 list comes out.
Old days by Anonymous Coward · 2012-08-04 05:42 · Score: 0

Remember those old tape fridge days when data buses were 256-bit and could transmit stuff faster than it could be processed?
Why is this even being discussed 12 years out? by Anonymous Coward · 2012-08-04 09:42 · Score: 0

Twelve years is a long time for the state of the art to change, GPUs being a perfect example. Let's have this discussion again in ten years...
Probably not by Xhris · 2012-08-04 12:37 · Score: 1

GPUs have small memory footprints. SKA will be processing HUGE images and data sets. And the image creation cannot be broken up into discrete independent chunks. So the I/O between GPUs is a real problem. Obviously CPUs have the same problem as the on chip memory is (relatively) tiny, but they are designed to pull on the much larger system memory which should be adequate.
Image analysis may well be a different kettle of fish.