Slashdot Mirror


Harvard/MIT Student Creates GPU Database, Hacker-Style

First time accepted submitter IamIanB writes "Harvard Middle Eastern Studies student Todd Mostak's first tangle with big data didn't go well; trying to process and map 40 million geolocated tweets from the Arab Spring uprising took days. So while taking a database course across town at MIT, he developed a massively parallel database that uses GeForce Titan GPUs to do the data processing. The system sees 70x performance increases over CPU-based systems, and can out-crunch a 1000-node MapReduce cluster, in some cases. All for around $5,000 worth of hardware. Mostak plans to release the system under an open source license; you can play with a data set of 125 million tweets hosted at Harvard's WorldMap and see the millisecond response time." I seem to recall a dedicated database query processor that worked by having a few hundred really small processors that was integrated with INGRES in the '80s.

135 comments

  1. Two thoughts based on this story by Anonymous Coward · · Score: 5, Interesting

    1. Facebook would like to have a discussion with him.
    2. The FBI would like to have a discussion with him.

    1. Re:Two thoughts based on this story by Anonymous Coward · · Score: 0

      3. Facebook and the FBI realize that they have the same goal of having a discussion with him and integrate. The new entity is called the FB.

    2. Re:Two thoughts based on this story by Anonymous Coward · · Score: 2, Informative

      Drop the "the", just FB, it's cleaner

    3. Re:Two thoughts based on this story by Jawnn · · Score: 2

      1. Facebook would like to have a discussion with him.
      2. The FBI would like to have a discussion with him.

      Sadly, I offer one more thought...

      3. Some patent attorney from East Texas would like to have a discussion with him.

  2. I'm not a computer scientist, and... by Anonymous Coward · · Score: 1

    I want to know why GPUs are so much better at some tasks than CPUs? And, why aren't they used more often if they are orders of magnitude faster?

    Thanks.

    1. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 4, Insightful

      Sprinters can run really fast. So, if speed is important in other sports, why aren't the other sports full of sprinters? Because being good at one thing doesn't mean you're well-suited to do everything. A sprinter who can't throw a ball is going to be terrible at a lot of sports.

    2. Re:I'm not a computer scientist, and... by gubon13 · · Score: 5, Informative

      Sort of a lazy effort on my part to not summarize, but here's a great explanation: https://en.bitcoin.it/wiki/Why_a_GPU_mines_faster_than_a_CPU.

    3. Re:I'm not a computer scientist, and... by PhamNguyen · · Score: 5, Informative

      GPUs are much faster for code that can be parallelized (basically this means having many cores doing the same thing, but on different data). However, there is significant complexity in isolating the parts of the code that can be done in parallel. Additionally, there is a cost to moving data to the GPU's memory, and also from the GPU memory to the GPU cores. CPUs, on the other hand, have a cache architecture that means that much of the time, memory access is extremely fast.

      Given progress in the last 10 years, the set of algorithms that can be parallelized is very large. So the GPU advantage should be overwhelming. The main issue is that the complexity of writing a program that does things on the GPU is much higher.

    4. Re:I'm not a computer scientist, and... by gatkinso · · Score: 4, Informative

      This is a gross simplification, glossing over the details and not correct in some aspects... but close enough.

      SIMD - single instruction multiple data. If you have thousands or millions of elements/records/whatever that all require the exact same processing (gee, say like a bunch of polygons being rotated x radians perhaps????) then this data can all be arranged into a bitmap and loaded onto the GPU at once. The GPU then performs the same operation on your data elements simultaneously (simplification). You then yank off the resultant bitmap and off you go. CPU arranges data, loads and unloads the data. GPU crunches it.

      A CPU would have to operate on each of these elements serially.

      Think of it this way - you are making pennies. GPU takes a big sheet of copper and stamps out 10000 pennies at a time. CPU takes a ribbon of copper and stamps out 1 penny at a time... but each iteration of the CPU is much faster than each iteration of the GPU. Perhaps the CPU can perform 7000 cycles per second, but the GPU can only perform 1 cycle per second. At the end of that second... the GPU produced 3000 more pennies than the CPU.
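
      The penny arithmetic can be checked with a few lines of Python. This is only a sketch of the analogy: the function name is made up for illustration, and the rates are the illustrative figures from the paragraph above, not measurements of real hardware.

```python
def pennies_per_second(batch_size, batches_per_second):
    """Throughput = pennies stamped per batch * batches completed per second."""
    return batch_size * batches_per_second

# Figures taken from the analogy (illustrative, not real hardware numbers).
cpu_rate = pennies_per_second(batch_size=1, batches_per_second=7000)
gpu_rate = pennies_per_second(batch_size=10000, batches_per_second=1)

# The GPU is slower per iteration but still ahead at the end of the second.
print(cpu_rate, gpu_rate, gpu_rate - cpu_rate)
```

      The point of the sketch is that throughput depends on batch size times iteration rate, so a slow iteration can still win if the batch is wide enough.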

      Some problem sets are not SIMD in nature: lots of branching or reliance on the value of neighboring elements. This will slow the GPU processing down insanely. FPGA is far better (and more expensive, and more difficult to program) than GPU for this. CPU is better as well.

      --
      I am very small, utmostly microscopic.
    5. Re:I'm not a computer scientist, and... by gatkinso · · Score: 2

      >> The main issue is that the complexity of writing a program that does things on the GPU is much higher.

      Not so much. There is programming overhead, but it isn't too bad.

      --
      I am very small, utmostly microscopic.
    6. Re:I'm not a computer scientist, and... by Morpf · · Score: 5, Informative

      Close, but not quite correct.

      The point is GPUs are fast doing the same operation on multiple data. (e.g. multiplying a vector with a scalar) The emphasis is on _same operation_, which might not be the case for every problem one can solve in parallel. You will lose speed as soon as the elements of a wavefront (e.g. 16 threads, executed in lockstep) diverge into multiple execution paths. This happens if you have something like an "if" in your code and for one work item the condition is evaluated to true and for another it's evaluated to false. Your wavefront will only be executed one path at a time, so your code becomes kind of "sequential" at this point. You will lose speed, too, if the way you access your GPU memory does not fulfill some restrictions. And by the way: I'm not speaking about some mere 1% performance loss but quite a number. ;) So generally speaking: not every problem one can solve in parallel can be efficiently solved by a GPU.

      There is something similar to caches in OpenCL: it's called local data storage, but it's the programmer's job to use it efficiently. Memory access is always slow if it's not registers you are accessing, be it CPU or GPU. When using a GPU you can hide part of the memory latency by scheduling way more threads than you can physically run and always switching to those that aren't waiting for memory. This way you waste fewer cycles waiting for memory.

      I support your view that writing for the GPU takes quite a bit of effort. ;)

    7. Re:I'm not a computer scientist, and... by UnknownSoldier · · Score: 5, Informative

      If one woman can have a baby in 9 months, then 9 women can have a baby in one month, right?

      No.

      Not every task can be run in parallel.

      Now however if your data is _independent_ then you can distribute the work out to each core. Let's say you want to search 2000 objects for some matching value. On an 8-core CPU each core would need 2000/8 = 250 searches. On the Titan each core could process 1 object.
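
      The 2000/8 split can be sketched as below. This is a hedged illustration, not how any real scheduler works: `split_work` is a hypothetical helper, and 2688 is the Titan core count quoted further down in this comment.

```python
def split_work(n_items, n_workers):
    """Spread n_items over n_workers as evenly as possible.

    Returns a list with the number of items each worker has to process.
    """
    base, extra = divmod(n_items, n_workers)
    return [base + 1 if worker < extra else base for worker in range(n_workers)]

cpu_share = split_work(2000, 8)      # the 8-way CPU from the example
gpu_share = split_work(2000, 2688)   # Titan core count quoted in this comment

# The longest per-worker queue decides how long the whole search takes.
print(max(cpu_share), max(gpu_share))
```

      With more workers than objects, some workers simply sit idle, which is why the win only shows up when the data set is large enough to keep the cores busy.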

      There are also latency vs bandwidth issues, meaning it takes time to transfer the data from RAM to the GPU, process, and transfer the results back, but if the GPU's processing time is vastly less than the CPU's, you can still have HUGE wins.
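
      A rough way to think about that trade-off, as a sketch only: the GPU wins when copy-in plus compute plus copy-out is still shorter than the CPU doing the job alone. The function name and every timing below are hypothetical, chosen just to show both outcomes.

```python
def gpu_wins(cpu_seconds, gpu_seconds, transfer_in_seconds, transfer_out_seconds):
    """True when transfer + GPU compute + transfer back still beats the CPU alone."""
    gpu_total = transfer_in_seconds + gpu_seconds + transfer_out_seconds
    return gpu_total < cpu_seconds

# Hypothetical timings: a big job amortizes the transfer, a tiny one does not.
big_job = gpu_wins(cpu_seconds=10.0, gpu_seconds=0.2,
                   transfer_in_seconds=0.5, transfer_out_seconds=0.1)
small_job = gpu_wins(cpu_seconds=0.01, gpu_seconds=0.001,
                     transfer_in_seconds=0.5, transfer_out_seconds=0.1)

print(big_job, small_job)
```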

      There are also SIMD / MIMD paradigms which I won't get into, but basically in layman's terms means the SIMD is able to process more data in the same amount of time.

      You may be interested in reading:
      http://perilsofparallel.blogspot.com/2008/09/larrabee-vs-nvidia-mimd-vs-simd.html
      http://stackoverflow.com/questions/7091958/cpu-vs-gpu-when-cpu-is-better

      When your problem domain & data can be run in parallel, GPUs totally beat CPUs in terms of processing power AND price, e.g.:
      An i7 3770K costs around $330. Price/Core is $330/8 = $41.25/core
      A GTX Titan costs around $1000. Price/Core is $1000/2688 = $0.37/core

      Remember computing is about 2 extremes:

      Slow & Flexible < - - - > Fast & Rigid
      CPU (flexible) vs GPU (rigid)

      * http://www.newegg.com/Product/Product.aspx?Item=N82E16819116501
      * http://www.newegg.com/Product/Product.aspx?Item=N82E16814130897

    8. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 2, Insightful

      Yes, it is that bad. Not only is it extremely platform-specific, the toolchains are crap. We're just now transitioning from "impossible to debug" to "difficult to debug".

    9. Re:I'm not a computer scientist, and... by crutchy · · Score: 1

      not sure if i'm right, but i tend to think of any gpu-based application as having to construct data like pixels on a screen or image (since that's what gpu's are primarily designed to handle)

      a cpu treats each pixel separately, whereas a gpu can process multiple pixels simultaneously

      problem comes about if you try to feed data into a gpu that isn't like pixels

      is the programming difficulties in trying to trick the gpu into thinking it's processing pixels even though it may be processing bitcoin algorithms etc?

    10. Re:I'm not a computer scientist, and... by Morpf · · Score: 1

      Well, you don't have to trick the GPU into thinking it processes pixels. You can do general purpose computation with a language quite similar to C99.

      You are right in that you partition your problem into many subelements. In OpenCL those are called work items. But those are more like identical threads than pixels. Sometimes one maps the work items on a 2d or 3d grid if the problem domain fits. (e.g. image manipulation, physics simulation)

      Actually it's not that hard implementing "normal" algorithms on a GPU. For example the bitcoin mining algorithm can be implemented quite straightforwardly. It may even look almost the same as a C method programmed for a CPU. The programming is a bit difficult as you have many restrictions to obey to get good performance out of a GPU.

    11. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      Yes, it is that bad. Not only is it extremely platform-specific, the toolchains are crap. We're just now transitioning from "impossible to debug" to "difficult to debug".

      OpenCL has been a joy to program with, and CUDA, while having interesting quirks, has always built very easily for me.

    12. Re:I'm not a computer scientist, and... by BitZtream · · Score: 1

      They do ONE thing well. Floating point ops. EVERYTHING ELSE THEY SUCK AT, including simple logic checks; if statements are painfully, mind-numbingly slow on the GPU.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    13. Re:I'm not a computer scientist, and... by BitZtream · · Score: 2, Insightful

      Parallelization is not why GPUs are fast, it's a side effect of rendering pixels, nothing more.

      GPUs are fast because they do an extremely limited number of things REALLY REALLY fast, and when you're doing graphics ... well guess what, it's all pretty much doing those few things the GPU does well over and over again, per pixel (or vertex). They are parallelized because those simple, super fast processors are also small from a chip perspective, so stuffing a ton of them on the chip so it can do many pixels in parallel works, again, because all those pixels get treated the same way with a very limited number of well known operations performed on them.

      They are not replacing CPUs because something like a simple if statement doesn't pause one processor, it pauses them ALL, and then top it off with the GPU being absolutely horrible (speed wise) at dealing with an IF statement. In shaders, you can get by with an if on a uniform because it only has to be calculated once and a decent driver can optimize the if away early on before sending it to the cell processors on the GPU. Do IFs on an attribute (say a vertex or texture coord) and watch your GPU crawl like a snail.

      Parallelization in GPUs is a direct result of the fact that they perform the same task on massive arrays of data. Since the code works on individual cells in the array individually, there is no 'race' condition possibility in the code, so it's ready to run concurrently. Adding a new shader cell effectively gives you more speed without any sort of programmer effort whatsoever.

      The reason these parallel cells can work together so fast is also because the silicon works in lock step. (that's why IFs on attributes kill performance). Basically each line of the shader program executes side by side on all the shader cells at once. This makes all sorts of neat silicon based performance tricks possible.

      Where you get screwed however is those IFs (All branching instructions really) because if any one shader cell has to run a branch of code, they ALL run the code, and then just discard the results. So when you write branching code in a shader, you are almost certainly going to run every code path provided if you use the wrong data for your branch.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    14. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      I want to know why GPUs are so much better at some tasks than CPUs? And, why aren't they used more often if they are orders of magnitude faster?

      CPUs are built for multitasking a ton of dynamic, middle-of-the-road workloads.
      GPUs are built more for specific type of workload, and for those they work much better.

      CPUs use tons of die space for local cache, and main memory latency is lower than dedicated graphics memory. A PC is basically rigged to work on smaller, random data sets. That's mostly what they do, switching between tons of different tasks all the time, with low latency.

      GPUs are weighted more toward parallel processing units, with very small caches and higher-latency but much higher-bandwidth memory. They work on consistently huge, parallelized, predictable math/computation heavy workloads.

      Now, look at the PS4's architecture, its main memory is all higher latency GDDR5.
      It will get by because it will still have a decent sized cache on the main processor, and it really doesn't have to multitask much. That could be the future PC gaming architecture =D

    15. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 3, Funny

      Now however if your data is _independent_ then you can distribute the work out to each core.

      Let me translate this into a woman-baby analogy: if one woman can have a baby in 9 months, then 9 women can have 9 babies in 9 months. At first the challenge is juggling the timing of dates and dividing the calendar for conception events as near as possible to each other to keep up the efficiency and synchronization. Afterwards the challenge is the alimony, paying for college and particularly the Thanksgiving, when the fruits of the labor come together.

    16. Re:I'm not a computer scientist, and... by VortexCortex · · Score: 1

      If one woman can have a baby in 9 months, then 9 women can have a baby in one month, right?

      No.

      You're wrong, otherwise we'd need close to 130 million months per year. Furthermore, the 9 women have their 9 babies after ~9 months yielding in an average production rate of 1bpm (one baby per month) from this group of women -- If kept perpetually pregnant. If we put 90 women in the baby farm they will produce TEN Babies Per Month.

      Some people's kids, I swear -- They must have botched the batch of logic circuits in your revision; This is Matrixology 101.

    17. Re:I'm not a computer scientist, and... by loneDreamer · · Score: 1

      True. Nevertheless, using it for databases when data is cached seems like a neat idea. Lots of "_same operation_" for let's say, selecting all tuples with a specific value on a huge table.

    18. Re:I'm not a computer scientist, and... by H0p313ss · · Score: 0

      I want to know why GPUs are so much better at some tasks than CPUs? And, why aren't they used more often if they are orders of magnitude faster?

      Thanks.

      I'm glad you put the preface in there, because it's basic comp. sci.

      --
      XML is a known as a key material required to create SMD: Software of Mass Destruction
    19. Re:I'm not a computer scientist, and... by r2kordmaa · · Score: 1

      A CPU has a small number of very complex cores, good for fast decision making, e.g. managing OS resources. A GPU has lots of very simple cores, useless for decision making but great for parallel number crunching.

    20. Re:I'm not a computer scientist, and... by anagama · · Score: 3, Insightful

      I think you totally missed his point -- tin whiskers on your circuit board? Blown caps?

      The fact that 9 women can have 9 babies in 9 months for an average rate of 1/mo, does not disprove the assertion 9 women cannot have __a__ (i.e. a single) baby in one month. You're talking about something totally different and being awfully smug about it to boot.

      --
      What changed under Obama? Nothing Good
    21. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      so i can see xor'ing the tuple with the query tuple and looking for a zero result, but don't you need to branch or move depending on the result? doesn't this add a dreaded "if" and therefore degrade the parallelism?

    22. Re:I'm not a computer scientist, and... by PhamNguyen · · Score: 4, Informative

      What you are describing is GPU computing 5 to 10 years ago. Now, (1) you don't write shaders, you write kernels. (2) a GPU can do most of the functions of a CPU, the difference is in things like branch prediction and caching. (3) threads execute in blocks of 16 or some other round number. There is no performance loss as long as all threads in the same block take the same execution path.

    23. Re:I'm not a computer scientist, and... by PhamNguyen · · Score: 1

      I was intentionally simplifying, but I agree with your more detailed exposition. I did understate the extent to which fundamental issues related to the GPU architecture are still relevant. My own experience is in embarrassingly parallelizable problems so my knowledge of these issues is not very deep.

    24. Re:I'm not a computer scientist, and... by mwvdlee · · Score: 1

      What if you wanted only one baby? You'd still have to wait nine months, no matter how many women are involved.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    25. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      I want to know why GPUs are so much better at some tasks than CPUs? And, why aren't they used more often if they are orders of magnitude faster?

      Thanks.

      The explanations you've got so far are either way too complex for a non-computer-scientist to understand or just plain useless. I'll try to do better.

      A typical GPU has a very large number of processor cores (somewhere between 32 for low end models up to 128 or so for high end ones). They have a few design aspects that make these processor cores different to standard CPU cores:

      * They have a slower clock speed than CPU cores (typically 1 GHz). This doesn't matter because they do more in each clock cycle (operations run on 4-component vectors rather than single numbers, although CPUs can sometimes do this kind of thing too -- just not as well because they aren't really designed for it) and because there are more than twice as many cores, so a 2x slowdown doesn't matter.
      * They have a faster memory interface than CPUs. A CPU has to integrate with a motherboard that has a variable amount of memory in multiple slots. This makes the memory interface design more complex, which slows it down. GPUs are usually directly soldered to a small board with a fixed amount of memory in a single bank that is also soldered directly to the board. They also often have more pins dedicated to memory, as they don't have to conform to existing standards (i.e. the number of pins on a DIMM), so you often see 256-bit wide memory interfaces rather than the 128-bit wide interface most current CPUs have. These advantages means that while CPUs typically manage about 20GB/s of memory transfer, a modern GPU can perhaps manage about 200GB/s.
      * In order to fit so many cores in a single chip, the cores have to be really simple. They achieve this by making them really good at what they do a lot of (mathematical calculations) but sacrificing what they don't really need to do much of (making decisions between different paths of a calculation). They also sacrifice cache memory which doesn't help very much for the kind of calculations they're designed for (which typically work with very large sets of data with each item accessed with roughly equal probability, whereas the kind of work CPUs do often has a small amount of data that is accessed a lot and the rest only occasionally).

    26. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      It's a completely different style of architecture. CPUs are generic. GPUs on the other hand can implement an entire algorithm in hardware with a single instruction call. That means they're less flexible and can't do many things, but what they can do they can do extremely quickly (just look at the difference between software rendering and hardware acceleration speed/quality). They're not used often because of that lack of flexibility, which makes them unsuitable for many types of program or at least much, much harder to program an algorithm for.

    27. Re:I'm not a computer scientist, and... by gatkinso · · Score: 1

      I guess we all have our strengths and weaknesses.

      --
      I am very small, utmostly microscopic.
    28. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      Not many problems require multi GBs of RAM or could be chopped up into 1k pieces: a GPU is better because you can have 500 to 1,000 parallel threads running looking at the same memory set, e.g. rendering a 3D scene.

    29. Re:I'm not a computer scientist, and... by ciderbrew · · Score: 1

      Baby girl born at just 21 weeks and five days. Five months in neonatal care. That (edited down version) came from the dailymail and I'm not linking to them.
      If it is self sufficient from the parent - then that could be 1 every 30 years. So you draw the line where you want on this one.

    30. Re:I'm not a computer scientist, and... by Skrapion · · Score: 1

      Get back to us when you've written a GPU-powered database.

      --
      The details are trivial and useless; The reasons, as always, purely human ones.
    31. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      The GP wanted to know (quoting) "why GPUs are better at some tasks than CPUs" and you've basically answered with "because GPUs are better at some tasks than CPUs". How this was modded insightful I don't know.

    32. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      well, first of all, a computer scientist can't answer your question. Listen, kids, computer science has little to do with computers and nothing to do with programming or computer administration or computer repair. Pure and simple, computer science is mathematics. That's it... there's no mystery to the discipline. So give up your dreams of studying computer science because you want to work "in computers," and study something useful and worth while... like electrical engineering or computer engineering... or, seriously, medicine.... you just can't go wrong with medicine (nursing, premed, pharmacology, medical technology... just can't go wrong in medicine).

    33. Re:I'm not a computer scientist, and... by raxhelm · · Score: 1

      "When your problem domain & data can be run in parallel, GPUs totally beat CPUs in terms of processing power AND price, e.g.:
      An i7 3770K costs around $330. Price/Core is $330/8 = $41.25/core
      A GTX Titan costs around $1000. Price/Core is $1000/2688 = $0.37/core"

      That's a very unfair comparison. For one, the i7-3770K has SIMD as well (8-wide AVX). A better comparison is maximum GFLOPs/s.
      The max for the 3770K is frequency * 8 (AVX) * 2 (simultaneous multiplication and addition) * 4 cores. Let's assume the frequency is 4.0 GHz (base turbo is 3.8 - mine is overclocked to 4.4). That's 256 SP (single precision) GFLOPs/s and 128 DP (double precision) GFLOPs/s. According to Wikipedia the Titan can do 4500 SP GFLOPs/s and 1300 DP GFLOPs/s. That's 19x as many SP GFLOPs/s and 10x as many DP GFLOPs/s. In USD that's 6x as many SP GFLOPs/s per USD and 3x as many DP GFLOPs/s per USD.
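
      The peak-throughput formula can be sketched as follows. All inputs are the figures quoted in this thread (4.0 GHz, 8-wide AVX, 2 operations per cycle, 4 cores, the Titan's 4500/1300 GFLOPs/s, and the $330 and $1000 prices); `peak_gflops` is a hypothetical helper name. With the 4.0 GHz assumption the single-precision ratio comes out slightly below the 19x quoted above.

```python
def peak_gflops(freq_ghz, simd_width, ops_per_cycle, cores):
    """Theoretical peak = clock * SIMD lanes * operations issued per cycle * cores."""
    return freq_ghz * simd_width * ops_per_cycle * cores

cpu_sp = peak_gflops(4.0, 8, 2, 4)   # 8 single-precision lanes per AVX register
cpu_dp = peak_gflops(4.0, 4, 2, 4)   # only 4 double-precision lanes fit in 256 bits

titan_sp, titan_dp = 4500.0, 1300.0      # GFLOPs/s figures quoted in the thread
cpu_price, titan_price = 330.0, 1000.0   # USD prices quoted in the thread

sp_ratio = titan_sp / cpu_sp
dp_ratio = titan_dp / cpu_dp
price_ratio = titan_price / cpu_price

# Dividing by the price ratio gives the advantage per dollar spent.
sp_per_usd = sp_ratio / price_ratio
dp_per_usd = dp_ratio / price_ratio

print(cpu_sp, cpu_dp, sp_ratio, dp_ratio, sp_per_usd, dp_per_usd)
```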

      When Haswell comes out in June the max will double due to FMA3. That means the Titan will deliver 3x SP GFLOPs/s per USD and only 1.5x DP GFLOPs/s per USD. Haswell's DP performance and price will already be better than the GK104 chip (GTX 680). However, AMD Radeon will still have an advantage in performance and performance/price both for SP and DP.

      Likely the student who made his/her database on the GPU did not optimize the CPU code because if he/she did the advantage would have been a lot less than 70x.

    34. Re:I'm not a computer scientist, and... by Morpf · · Score: 1

      Yes, you will somehow need to branch at one point - well at least I can't think of a way without branching - but not every branch makes your program crawl like a snail. For example the amount of work done in the branches really does matter. If you can't avoid branching try to do as little as possible in the branches. ;)

      I for one would write the current position of a "hit" into the same position of a second array. Otherwise write a zero. So your branches are quite minimal:

      if(hit) {
          secondArray[id] = id;
      } else {
          secondArray[id] = 0;
      }

      Now you can sort secondArray to get rid of the zeros. The result is something similar to a list of all rows your query should fetch. Now you can grab the content of the rows and write them to a buffer and send it to the cpu.
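
      The mark-then-compact idea can be simulated on the CPU as a sketch. Each loop iteration stands in for one work item; the function names and sample data are made up for illustration. One deliberate change from the snippet above: a miss is marked with -1 rather than 0, because with 0 as the "miss" value a hit on row 0 would be indistinguishable from a miss.

```python
MISS = -1  # assumption: a sentinel that can never be a valid row id

def mark_hits(column, wanted):
    """Per-row step: every row writes only to its own slot, so work items never collide."""
    return [row_id if value == wanted else MISS
            for row_id, value in enumerate(column)]

def compact(marks):
    """Drop the misses, leaving the ids of the matching rows.

    The comment above does this with a sort; a filter gives the same result here.
    """
    return [mark for mark in marks if mark != MISS]

column = [7, 3, 7, 9, 7]          # made-up column values
marks = mark_hits(column, wanted=7)
rows = compact(marks)

print(marks, rows)
```

      On a real GPU the second step would more likely be a parallel sort or stream compaction; the filter here only shows what the result looks like.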

    35. Re:I'm not a computer scientist, and... by Bobfrankly1 · · Score: 1

      If one woman can have a baby in 9 months, then 9 women can have a baby in one month, right?...

      I'm sorry, this is slashdot. You must include an obligatory car analogy to get your point across.

    36. Re:I'm not a computer scientist, and... by dj245 · · Score: 1

      Close, but not quite correct.

      The point is GPUs are fast doing the same operation on multiple data. (e.g. multiplying a vector with a scalar) The emphasis is on _same operation_, which might not be the case for every problem one can solve in parallel. You will lose speed as soon as the elements of a wavefront (e.g. 16 threads, executed in lockstep) diverge into multiple execution paths. This happens if you have something like an "if" in your code and for one work item the condition is evaluated to true and for another it's evaluated to false. Your wavefront will only be executed one path at a time, so your code becomes kind of "sequential" at this point. You will lose speed, too, if the way you access your GPU memory does not fulfill some restrictions.

      I'm not an expert on this subject, but with hundreds, or even thousands, of GPU cores, why not just run the calculation for all cases of an if/then and then toss out the ones that don't pan out? It's not a very efficient way to do things, but it could work.

      I believe this is the principle of quantum computing. Process all possible answers simultaneously and then pick the right one(s).

      --
      Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    37. Re:I'm not a computer scientist, and... by Morpf · · Score: 1

      Actually this is what is done on GPUs. Think of it this way: You have a number of "processors" which share one control flow. The number of "processors" sharing one control flow on an AMD 79xx is 64. Now all these "processors" evaluate the if-statement. If it's true for some and false for other "processors", then both paths are executed sequentially. Those "processors" which would normally not run -because they belong to the other branch- are masked, so they don't execute the instructions. If the if-statement evaluates the same way (all true or all false) for every "processor" in this group, only one path is taken.

      Note: Actually there are 16 processors working in parallel but they execute 4 times, each time on a different work item. So the correct word for "processor" would be work item.
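
      The cost of divergence described above can be modelled in a few lines. This is a toy model, not a description of any specific hardware: `run_wavefront` and the cycle costs are hypothetical, and the group size of 64 is the AMD figure mentioned in this comment.

```python
def run_wavefront(conditions, then_cost, else_cost):
    """Cycles one lockstep group spends on an if/else.

    conditions holds one boolean per work item. A path is executed if at
    least one work item needs it; work items on the other path are masked
    off but still have to wait for it to finish.
    """
    cycles = 0
    if any(conditions):        # somebody takes the "then" path
        cycles += then_cost
    if not all(conditions):    # somebody takes the "else" path
        cycles += else_cost
    return cycles

# Hypothetical costs of 10 cycles per path, 64 work items per group.
uniform = run_wavefront([True] * 64, then_cost=10, else_cost=10)
divergent = run_wavefront([True] * 63 + [False], then_cost=10, else_cost=10)

print(uniform, divergent)
```

      A single disagreeing work item is enough to make the whole group pay for both paths, which is why keeping the work inside each branch small matters.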

    38. Re:I'm not a computer scientist, and... by Morpf · · Score: 1

      Actually parallelization IS why GPUs are fast. You have some restrictions but it's the parallel execution which gives you the boost in performance.

      The things a GPU can do are not so limited as you might think.

      The statement about "if" pausing all processors is wrong. On my card 64 work items are executed in lockstep on 16 processors in something called a wavefront. Now I have way more processors on the card. Furthermore only when the if statement in the control flow is evaluated true for some work items and false for others you get hit by degraded performance, as you then need to execute both paths sequentially. If the if statement is evaluated the same for the whole wavefront you don't lose anything. This works on attributes of the work items or other run time data, too.

      There is a good possibility for race conditions in OpenCL code. Depending on the algorithms and optimizations one has to synchronize or use different places for input and output.

    39. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      16 or some other round number

      Hehe, only on /. is 16 considered a round number.

    40. Re:I'm not a computer scientist, and... by sapgau · · Score: 1

      Part of the answer is the "magic" of matrix math. You can represent multiple linear equations in every row of a matrix and when you apply one operation (add, multiply, etc) you performed it on all your encoded equations inside the matrix.

      If you can, for example, represent your problem in a linear equation (algebra) then you can also formulate 50 similar equations. You want to "transform" all your equations by some operation (lets say divide by 20), so instead of calculating 50 times that operation for every equation you just do it once if you encode them in a matrix.

      This started in graphics when you wanted to apply the same operation to many pixels on a screen (i.e. apply a shading rule), so that's why video cards have this massive processing power for pixels.

      https://en.wikipedia.org/wiki/Matrix_(mathematics)

    41. Re:I'm not a computer scientist, and... by Zzzoom · · Score: 0

      That core count is a bit deceptive. GPU makers count an N-wide vector unit as N cores. An i7 3770K has 4 cores that have a 256-bit (8 single precision operands) AVX vector unit each. A GTX Titan has 14 multiprocessors/cores that have 6 32-wide single precision vector units each. Then there's the mess of counting FMAs as two instructions, the 3770k letting you execute one addition and one multiplication per cycle...just look at the FP32 and FP64 GFLOPS instead of counting cores.

    42. Re:I'm not a computer scientist, and... by UnknownSoldier · · Score: 1

      Yeah, it is not exactly an apples to apples comparison. I really should have specified that.

      Thanks for the extra details. I'm not sure where you are getting 6 x 32-bit from though? Are you confusing the 6 ROPs?
      http://images.anandtech.com/doci/6760/GK110_Block_Diagram_FINAL2.png

      I just got my Titan and was kind of surprised to see the 14 SMXes myself. Each SMX has 192 FP32 cores (12*16) which explains where the odd 2688 comes from: 14*12*16 = 14*192 = 2688.

      GK110 is composed of 15 of NVIDIA's SMXes, each of which in turn is composed of a number of functional units. Every SMX packs 192 FP32 CUDA cores, 64 FP64 CUDA cores, 64KB of L1 cache, 65K 32bit registers, and 16 texture units. These SMXes are in turn paired with GK110's 6 ROP partitions, each one composed of 8 ROPs, 256KB of L2 cache, and connected to a 64bit memory controller. Altogether GK110 is a massive chip, coming in at 7.1 billion transistors, occupying 551mm2 on TSMC's 28nm process

      For Titan NVIDIA will be using a partially disabled GK110 GPU. Titan will have all 6 ROP partitions and the full 384bit memory bus enabled, but only 14 of the 15 SMXes will be enabled.

      Reference

      * http://www.anandtech.com/show/6760/nvidias-geforce-gtx-titan-part-1
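
      The counts quoted above multiply out as follows. This is just the arithmetic from the numbers in this comment, written as Python so it can be checked; the variable names are mine.

```python
# Figures taken from the comment and the quoted article text.
SMX_ENABLED_ON_TITAN = 14
SMX_ON_FULL_GK110 = 15
FP32_CORES_PER_SMX = 12 * 16      # 192
FP64_CORES_PER_SMX = 64
ROP_PARTITIONS = 6
ROPS_PER_PARTITION = 8
MEMORY_CONTROLLER_BITS = 64       # per ROP partition

titan_fp32_cores = SMX_ENABLED_ON_TITAN * FP32_CORES_PER_SMX
full_gk110_fp32_cores = SMX_ON_FULL_GK110 * FP32_CORES_PER_SMX
titan_fp64_cores = SMX_ENABLED_ON_TITAN * FP64_CORES_PER_SMX
total_rops = ROP_PARTITIONS * ROPS_PER_PARTITION
memory_bus_bits = ROP_PARTITIONS * MEMORY_CONTROLLER_BITS
```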

    43. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 0

      Remember computing is about 2 extremes:

      Slow & Flexible < - - - > Fast & Rigid

      Close. Computing is about three extremes: flexible, fast, and affordable. Pick two ;)

      We'll pretend power requirements vs. mobility is not an issue for now but this is generally the other big one.

    44. Re:I'm not a computer scientist, and... by gatkinso · · Score: 1

      No GPU powered database - just an H.264 codec, real time LIDAR data processor, and several GIS data processing modules.

      GPGPU development (I use CUDA which makes it even easier) really isn't that hard. But then again I would expect a person with a high six digit user id to be challenged by the fundamentals.

      --
      I am very small, utmostly microscopic.
    45. Re:I'm not a computer scientist, and... by Skrapion · · Score: 1

      Colour me curious. I've never met a programmer who's capable of writing an H.264 codec but still has the arrogance of a sophomore. Do you have a link to your project?

      --
      The details are trivial and useless; The reasons, as always, purely human ones.
  3. That Didn't Take Long: Database Down For Maint. by cmholm · · Score: 2

    Slashdotted? I happened to catch the story just as it went live, and hit the link to the service. After scrolling the map and getting a couple of updates: Database is down for maintenance. The front end may not be as high performance as the back... or it may have been coincidence.

    --
    Luke, help me take this mask off ... Just for once, let me butterfly kiss you with my own eyes.
    1. Re:That Didn't Take Long: Database Down For Maint. by tmostak · · Score: 5, Informative

      Hi... MapD creator here... this is the first time we've been seriously load tested, and I realize I might have a "locking" bug that's creating a deadlock when people hit the server at the exact same time. Todd

    2. Re:That Didn't Take Long: Database Down For Maint. by Phrogman · · Score: 1

      Well since this is apparently from the guy who the article is talking about, perhaps someone could mod it up just a bit?
      No points here

      --
      "The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
    3. Re:That Didn't Take Long: Database Down For Maint. by Frankie70 · · Score: 1

      Is that why it's faster? It doesn't do any synchronization?

    4. Re:That Didn't Take Long: Database Down For Maint. by static0verdrive · · Score: 1, Interesting

      An open source license will help get those bugs squashed in no time! ;)

      --
      ========
      77 77 77 2e 6d 65 6c 76 69 6e 73 2e 63 6f 6d
    5. Re:That Didn't Take Long: Database Down For Maint. by tmostak · · Score: 5, Informative

      Har har... Well things got tricky when I wrote the code to support streaming inserts (not implemented in the current map) so you could view tweets or whatever else as they came in - this required a lot of fine-grained locking. May just bandaid this and give locks to connections as they come in until I can figure out what's going on. Todd

    6. Re:That Didn't Take Long: Database Down For Maint. by BitZtream · · Score: 0, Troll

      Citation Needed (RMS not allowed, sorry, we want reality here)

      Please show quantitative proof that just open sourcing something instantly provides you faster feedback without any other costs or shut the fuck up with that tired bullshit.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    7. Re:That Didn't Take Long: Database Down For Maint. by Indigo · · Score: 1

      You mean, if Open Source isn't magic, it's bullshit? Way to straw man.

    8. Re:That Didn't Take Long: Database Down For Maint. by Anonymous Coward · · Score: 1

      No, he means (and said) if Open Source isn't magic, claiming Open Source is magic is bullshit.

    9. Re:That Didn't Take Long: Database Down For Maint. by Anonymous Coward · · Score: 0

      So are you disputing the fact that many people can spot and fix bugs better than one person?

      That's just nonsense. You can parallelize the work and many people will ALWAYS find bugs better than a single person.

      A nice strawman by the way. The static0verdrive guy said essentially "getting more people to look at the code means they can spot and fix bugs faster than you, a single person, ever would". Then you go in, demanding quantitative proof about "instant" betterment - and here's the strawman - "without any other costs". The original person said nothing about "other costs".

    10. Re:That Didn't Take Long: Database Down For Maint. by Anonymous Coward · · Score: 0

      "Big Data"? Give me a break! 40 million tweats fits in RAM on a single desktop. If your accesses to that weren't quick, then you're shit at database design. Massively-multi-core GPUs are not the solution, not being shit at database design is the solution.

  4. and the most amazing thing by roman_mir · · Score: 4, Funny

    as the TFS states he uses GPUs to do the data processing, but you are never going to believe what he uses to store the actual data, you won't believe it, that's why it's not mentioned in TFS. Sure sure, it's PostgreSQL, but the way the data was stored physically was in the computer monitor itself. Yes, he punched holes in computer monitors with a chisel and used punch card readers to read those holes from the screens.

    1. Re:and the most amazing thing by eyenot · · Score: 4, Funny

      Mod parent up!

      Also: I heard he's using the printer port for communication. By spooling tractor feed paper between two printers in a loop, and by stopping and starting simultaneous paper-feed jobs, he can create a cybernetic feedback between the two printers that results in a series of quickly occurring "error - paper jam" messages that (due to two taped-down "reset" buttons) are quickly translated from the wide bandwidth analog physical matrix into kajamabits of digital codes. The perceived bandwidth gain is much higher than just a single one or zero at a time.

      That way, he can access the mainframe any time, from any physical location, and it will translate directly into a virtual presence.

      --
      "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
    2. Re:and the most amazing thing by roman_mir · · Score: 1

      They don't get it. He solved the speed of processing and the lack of long term durability of storage by doing what's described in the original comment... Worked like a charm without needing to rethink the entire problem of a single bus used to retrieve and store data on the physical storage that still accesses data serially.

    3. Re:and the most amazing thing by crutchy · · Score: 1

      By spooling tractor feed paper between two printers in a loop, and by stopping and starting simultaneous paper-feed jobs, he can create a cybernetic feedback between the two printers that results in a series of quickly occurring "error - paper jam" messages that (due to two taped-down "reset" buttons) are quickly translated from the wide bandwidth analog physical matrix into kajamabits of digital codes

      i would be really careful doing that... the system may become self-aware

  5. How's gpu that much faster?! by Anonymous Coward · · Score: 0

    Could anyone give a brief and non over technical explanation about this?!

    1. Re:How's gpu that much faster?! by roman_mir · · Score: 1

      It's like tons of little fish devouring an elephant carcass rather than one shark doing the same. You asked for a non technical... Of course it's still hard drives (or SSDs today) all the way down.

    2. Re:How's gpu that much faster?! by crutchy · · Score: 1

      does the shark have a laser?

    3. Re:How's gpu that much faster?! by Anonymous Coward · · Score: 0

      that depends on whether or not the shark is equipped with a laser.

  6. sounds like... by stenvar · · Score: 1, Redundant

    It sounds like he's doing standard GPU computations, loading everything into memory, and then calling it a "database", even though it really isn't a "database" in any traditional sense.

    1. Re:sounds like... by tmostak · · Score: 5, Informative

      Hi, MapD creator here - and I have to disagree with you. The database ultimately stores everything on disk, but it caches what it can in GPU memory and performs all the computation there. So all the SQL operations are occurring on the GPU, after which, in case of the tweetmap demo, the results are rendered to a texture before being sent out as a png. But it works equally well as a traditional database - it doesn't do the whole SQL standard yet but can handle aggregations, joins, etc just like a normal database, just much faster. Todd

    2. Re:sounds like... by nebosuke · · Score: 3, Interesting

      Just out of curiosity, did you use PGStrom or roll your own pgsql/GPU solution? If the latter, did you also hook into pgsql via the FDW interface or some other way?

    3. Re:sounds like... by Anonymous Coward · · Score: 1

      I'd be very interested to hear more details about the GPU SQL algorithms (JOIN in particular) if you are willing to share them. Did you use the set operations in Thrust or did you write something custom?

      Some of my colleagues are planning on releasing an open source library and some online tutorials about hash join and sort merge join in CUDA, and I would be very interested to share notes.

    4. Re:sounds like... by tmostak · · Score: 5, Informative

      So I use postgres all the time, but MapD isn't built on Postgres, it actually stores its own data on disk in column-form in (I admit crude) memory-mapped files. I have written a Postgres connector that connects MapD to Postgres though since I use postgres to store the tweets I harvest for long-term archiving. The connector uses pqxx (the C++ Postgres library). Todd
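
      For readers unfamiliar with the idea, here is a rough sketch of "one column per memory-mapped file". It is not MapD's code or file format, which is not shown in this thread; the file name, the 32-bit little-endian layout, and the helper names are all assumptions for illustration.

```python
import mmap
import os
import struct
import tempfile


def write_column(path, values):
    """Store one column of 32-bit signed ints contiguously in its own file."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))


def sum_column(path):
    """Memory-map a single column file and aggregate it.

    Other columns of the same table live in other files and are never touched.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = len(mm) // 4
            return sum(struct.unpack_from(f"<{count}i", mm, 0))


with tempfile.TemporaryDirectory() as tmp:
    retweets_path = os.path.join(tmp, "retweets.col")  # hypothetical column
    write_column(retweets_path, [3, 0, 12, 7])
    total_retweets = sum_column(retweets_path)
```

      Keeping each column in its own contiguous file means a query that aggregates one column only has to map that one file.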

    5. Re:sounds like... by tmostak · · Score: 5, Informative

      I'm not using thrust - I rolled my own hash join algorithm. This is something I still haven't optimized a great deal and I'm sure your stuff runs much better. Would love to talk. Just contact me on Twitter (@toddmostak) and I'll give you my contact details. Todd
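
      For anyone following along, the textbook hash join looks roughly like this. This is a sequential CPU sketch in Python, not the GPU algorithm described above (which is not shown in this thread); the table contents and names are invented.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Classic two-phase hash join on dict rows.

    Build phase: hash the (usually smaller) table on its join key.
    Probe phase: look each row of the other table up in that hash table.
    """
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
tweets = [
    {"user_id": 2, "text": "hi"},
    {"user_id": 1, "text": "yo"},
    {"user_id": 3, "text": "??"},   # no matching user, dropped by the join
]

result = hash_join(users, tweets, "user_id", "user_id")
```

      Each probe lookup is independent of the others, which is why the probe phase is a natural candidate for running in parallel.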

    6. Re:sounds like... by korgitser · · Score: 1

      I wonder what would it mean to the data if you were to lossily compress that png...

      --
      FCKGW 09F9 42
    7. Re:sounds like... by cbhacking · · Score: 1

      Horrible things, probably. Good thing PNG is lossless compression...

      --
      There's no place I could be, since I've found Serenity...
    8. Re:sounds like... by stenvar · · Score: 1

      So, it sounds like you're implementing SQL as a data analytics language for in-memory data (plus a bunch of potentially useful algorithms), but apparently without the features that usually make a database a "database", like persistence, transactions, rollbacks, etc. It's those other features that make real databases slow, which is why you can't claim huge speedups over "databases" since you're not implementing the same thing.

      Data analytics on GPUs is a great thing, which is why tons of people are doing it. SQL isn't usually the language of choice because it isn't a good match and you have to build everything from scratch. GPU support in languages like R and Matlab gives you all the analytics features of SQL with a nicer syntax and really fast performance. Those languages also have tons of useful libraries for GIS, text analysis and visualization built in already.

    9. Re:sounds like... by Inda · · Score: 1

      I don't understand a lot of what you're talking about but thanks for taking the time to reply to the questions on here. I wish more people would do it.

      --
      This post contains benzene, nitrosamines, formaldehyde and hydrogen cyanide.
  7. "a few hundred really small processors" by Anonymous Coward · · Score: 0

    I'd hardly call them "really small processors" haha.

  8. PostgreSQL used GPU 2 years ago by Anonymous Coward · · Score: 1

    The 70x times seem optimistic. Does this include ALL the overheads for the GPU?
    But this was done and patented over 2 years ago.
    http://www.scribd.com/doc/44661593/PostgreSQL-OpenCL-Procedural-Language

    And there has been earlier work using SQLite on GPUs.

  9. First customer? by __aaltlg1547 · · Score: 0

    The Egyptian government...

  10. So this is where all the Titans ended up.... by BulletMagnet · · Score: 1

    Still waiting for one/two....to play games on....

    1. Re:So this is where all the Titans ended up.... by do0b · · Score: 1

      bought one, worth every penny!

      --
      After 12 years and a few days, I finally gave in to the dark side and joined slashdot.
    2. Re:So this is where all the Titans ended up.... by Anonymous Coward · · Score: 0

      But not every dollar.

  11. Re:PostgreSQL used GPU 2 years ago by tmostak · · Score: 5, Informative

    The 70X is actually highly conservative - and this was benched against an optimized parallelized main-memory (i.e. not off of disk) CPU version, not say MySQL. On things like rendering heatmaps, graph query operations, or clustering you can get 300-500X speedups. The database caches what it can in GPU memory (could be 128GB on one node if you have 16 GPUs) and only sends back a bitmap of the results to be joined with data sitting in CPU memory. But yeah, if the data's not cached, then it won't be this fast. That's true, a lot of work has been done on GPU database processing - this is a bit different I think b/c it runs on multiple GPUs and b/c it tries to cache what it can on the GPU. Todd (MapD creator)
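
    The "send back a bitmap and join it with data in CPU memory" step can be pictured roughly like this. It is a plain-Python sketch under my own assumptions (one bit per row, made-up column names and values), not the actual MapD implementation.

```python
def filter_to_bitmap(column, predicate):
    """Evaluate a filter over one column; bit i is set if row i matches.

    In the scheme described above this part would run on the GPU.
    """
    bitmap = 0
    for i, value in enumerate(column):
        if predicate(value):
            bitmap |= 1 << i
    return bitmap


def select_rows(bitmap, rows):
    """Use the bitmap to pick matching rows from data held in CPU memory."""
    return [row for i, row in enumerate(rows) if (bitmap >> i) & 1]


latitudes = [30.0, 51.5, 26.8, 40.7]          # column cached near the filter
texts = ["cairo", "london", "luxor", "nyc"]   # column left in CPU memory

bitmap = filter_to_bitmap(latitudes, lambda lat: 22.0 <= lat <= 31.7)
matches = select_rows(bitmap, texts)
```

    Only one bit per row crosses between the two sides, rather than the rows themselves.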

  12. Am I the only one? by Anonymous Coward · · Score: 0

    That thought that this would be a searchable database of all GPUs that exist? Because that sounded kinda useful.

  13. obvious question by crutchy · · Score: 1

    does it blend?

  14. Re:PostgreSQL used GPU 2 years ago by asicsolutions · · Score: 2

    Altera and Xilinx both have high level synthesis tools out that can target FPGAs using generic C. The Altera one allows you to target GPUs, CPUs or FPGAs. In the case of highly parallel tasks, an FPGA can run many times faster than even a GPU. There are fairly large gate count devices with ARM cores available now, so you can move the tasks around for better performance. I'd love to see some of these tasks targeting these devices.

  15. Could have... by Ghjnut · · Score: 1

    Maybe we should make it a habit of giving the owner some warning before slashdotting them. I know that if I ever get any concept development project up and running, I'm pretty excited to show my friends and tend to make it accessible before it's optimized enough to handle that king of onslaught.

    --
    MouseClass extends ScrollClass, which extends TabClass, which extends SidebarClass, which extends PowerClass, w
    1. Re:Could have... by Ghjnut · · Score: 1

      kind*, I'm not sure whether or not slashdot holds the title of 'king of onslaught'.

      --
      MouseClass extends ScrollClass, which extends TabClass, which extends SidebarClass, which extends PowerClass, w
    2. Re:Could have... by neonmonk · · Score: 1

      Where's Onslaught?

    3. Re:Could have... by BitZtream · · Score: 1

      The owner is the submitter. He knew what he was getting into, or should have.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    4. Re:Could have... by tmostak · · Score: 2

      Umm... no I didn't submit this. Perhaps the author of the article did. But I may have just done a super-hacky bandaid fix (also disallowed click requests - which may be a bit buggy) - we'll see if it holds up. Todd

    5. Re:Could have... by aiht · · Score: 2

      Where's Onslaught?

      It's in Norweight.

    6. Re:Could have... by Anonymous Coward · · Score: 0
  16. Ingres-Actian and Vectorwise by not_quite_a_user · · Score: 1

    Ingres was renamed to Actian and has released an analytic/reporting database called "Vectorwise", which makes use of SIMD and many other innovations in data throughput techniques (everything in the Intel optimisation manual plus a lot more), and it gets more than 70 times performance. Check out the TPC-H results. "This is not an advertisement"

  17. Large datasets are mostly IO limited by zbobet2012 · · Score: 5, Interesting

    While cool and all, 125 million tweets with geo tagging is at most: 1250000000*142bytes = 165 GB. That is not what "big data" considers a large data set. Indeed most "big data" queries are IO limited. For around 16k USD you can fit that entire working set in memory. You are not really in the "big data" realm until you have datasets in the 10's of TB's compressed (100's of TB's uncompressed).
    For these kinds of datasets, and where more compute is necessary there is MARs.

    1. Re:Large datasets are mostly IO limited by Anonymous Coward · · Score: 0

      I don't want to undo my moderation, so I have to post anonymously. Which is probably prudent anyway because I have to ask...

      Does MARs need women?

    2. Re:Large datasets are mostly IO limited by rtaylor · · Score: 1

      Agreed. That easily fits into memory (3 times actually) on our main OLTP DB.

      If it can fit into ram for less than $50K, it's not big data.

      --
      Rod Taylor
    3. Re:Large datasets are mostly IO limited by greg1104 · · Score: 1

      This project's innovation is noting that GPUs have enough RAM now that you can push medium sized data sets into them if you have enough available. With lots of cores and high memory bandwidth, in-memory data sets in a GPU can do multi-core operations faster than in-memory data sets in a standard CPU/memory combination.

      That's great for simple data operations that are easy to run in parallel and when the data set is small enough to fit in your available hardware. Break any of those assumptions, and you've got a whole different set of problems to solve than what this is good for. I suspect none of those three requirements hold in the usual case for what people want out of "big data".

    4. Re:Large datasets are mostly IO limited by tmostak · · Score: 4, Informative

      Hi - MapD creator here. Agreed, GPUs aren't going to be of much use if you have petabytes of data and are I/O bound, but what I think unfortunately gets missed in the rush to indiscriminately throw everything into the "big data bucket" is that a lot of people do have medium-sized (say 5GB-500GB) datasets that they would like to query, visualize and analyze in an iterative, real-time fashion, something that existing solutions won't allow you to do (even big clusters often incur enough latency to make real-time analysis difficult).

      And then you have super-linear algorithms like graph processing, spatial joins, neural nets, clustering, and rendering blurred heatmaps, which do really well on the GPU and where the formerly memory-bound speedup of 70X turns into 400-500X. Particularly since databases are expected to do more and more viz and machine learning, I don't think these are edge cases.

      Finally, although GPU memory will always be more expensive (but faster) than CPU memory, MapD already can run on a 16-card 128GB GPU ram server, and I'm working on a multi-node distributed implementation where you could string many of these together. So having a terabyte of GPU RAM is not out of the question, which, given the column-store architecture of the db can be used more efficiently by caching only the necessary columns in memory. Of course it will cost more, but for some applications the performance benefits may be worth it.

      I just think people need to realize that different problems need different solutions, and just b/c a system is not built to handle a petabyte of data doesn't mean it's not worthwhile.

    5. Re:Large datasets are mostly IO limited by stenvar · · Score: 2

      that a lot of people do have medium-sized (say 5GB-500GB) datasets that they would like to query, visualize and analyze in an iterative, real-time fashion, something that existing solutions won't allow you to do

      Yeah, they actually do. For in-memory queries, analysis, and visualization, people use statistical and numerical languages like R, Matlab, Python, and others (as well as tools with nice graphics frontends). And they have full GPU support available these days. In many cases, the GPU support parallelizes large array operations, in addition to implementing many additional special-purpose operations as well.

    6. Re:Large datasets are mostly IO limited by Anonymous Coward · · Score: 0

      Mkay. And now would you get the right number of zeros in a million?

      This is a *toy* database.

    7. Re:Large datasets are mostly IO limited by Buchenskjoll · · Score: 1

      Mars needs guitars

      --
      -- Make America hate again!
    8. Re:Large datasets are mostly IO limited by foniksonik · · Score: 1

      Check out Twitter's own Storm system built on top of ZeroMQ.

      http://storm-project.net/

      http://www.zeromq.org/

      You may find something you like.

      --
      A fool throws a stone into a well and a thousand sages can not remove it.
    9. Re:Large datasets are mostly IO limited by Muad'Dave · · Score: 1

      Mars needs guitars

      "'Cause the man from Mars stopped eatin' cars and eatin' bars and now he only eats guitars"

      See this video at 3:47.

      --
      Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
    10. Re:Large datasets are mostly IO limited by KingMotley · · Score: 1

      125 million tweets != 1250000000

      125,000,000 * 142 = 16.5GB and easily fits into my desktop's memory.
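
      Both figures in this subthread can be reproduced with the same arithmetic. The 142 bytes per tweet is the number used upthread, not a measured value, and the "GB" results only match if you read them as binary gigabytes (GiB).

```python
BYTES_PER_TWEET = 142          # figure used upthread
GIB = 2 ** 30

tweets = 125_000_000           # 125 million
tweets_with_extra_zero = 1_250_000_000

total_bytes = tweets * BYTES_PER_TWEET
size_gib = total_bytes / GIB
size_decimal_gb = total_bytes / 1e9

mistaken_size_gib = tweets_with_extra_zero * BYTES_PER_TWEET / GIB
```

      With the correct count this is about 16.5 GiB (17.75 GB in decimal units); the extra zero gives about 165 GiB.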

    11. Re:Large datasets are mostly IO limited by Anonymous Coward · · Score: 1

      You are being quite presumptuous when you make sweeping statements like "they do" and people use "statistical and numerical languages" and "...they have full GPU support...". You listed Python (really? Python is a 'statistical and numerical language'? Why not include C, C++, perl, ruby, php, etc.?) -- what GPU support is baked into Python?

      As it so happens, I work in a field (computer security as an umbrella) where rapid query and visualization in real time as an iterative process is very important. And these magical tools you speak of simply don't exist. There is constant improvement, but nothing like what you imply. Whether it is log analysis, intrusion detection work, forensics, or something else -- there is an ongoing need for better tools and speed of query and visualization is a significant component of that.

      Oh, and the size of data he mentions (5GB-500GB) plays well into that. Forensic timelines are definitely in that range, IDS varies with where you are at and how it is configured, but generally is in that domain as well. Logs... well, that varies enormously and it is less generally applicable, but there is definitely a possibility that it could be leveraged. And in any case, iterative queries are par for the course.

      I don't know that this work can be generalized to benefit all of this, but it is interesting work nevertheless.

      Thoromyr

      (really don't want to give up mods, hence the AC post)

  18. Re:PostgreSQL used GPU 2 years ago by anarcobra · · Score: 2

    Actually, depending on the specific problem, GPUs can still be significantly faster than FPGAs, mostly because of the large number of processing units.
    The FPGAs are far more power efficient though.

  19. GPU is good - but you need the IOPS to leverage it by Dave500 · · Score: 1

    For data processing workloads, a frequent problem with GPU acceleration is that the working dataset size is too large to fit into the available GPU memory and the whole thing slows to a crawl on data ingest (physical disk seeks, random much of the time) or disk writes for persisting the results.

    For folks serious about getting good ROI on their GPU hardware in real world scenarios, I strongly recommend you take a look at the fusion IO PCIe flash cards, which now support writing to and reading from them directly from CUDA via DMA, with little to no CPU handling required. (See: http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0619-GTC2012-Flash-Memory-Throttle.pdf).

    I can't talk about what we do with it, but let's just say the following hardware combination has led to interesting results:
    i) 16x PCIe slot chassis: http://www.onestopsystems.com/expansion_platforms_3U.php
    ii) 8x Nvidia Kepler K20x's
    iii) 8x Fusion IO 2.4TB IoDrive 2 Duo's

    We have been able to sustain over 4 million data operations a second, each one processing ~16 K of data in a recoverable, transactionally consistent manner, totaling up to around 50 Gigabytes of data processed per second. All in a 5U deployment drawing less than 4 kilowatts.

  20. Talk to IBM PureData - beat to the punch by Gothmolly · · Score: 2

    Granted it's not free or cheap, but IBM will ship you a prebuilt rack of 'stuff' that will load 5TB/hour and scan 128GB/sec. PGStrom came out in the last year. Custom hardware/ASIC/FPGA for this sort of thing is not new.

    --
    I want to delete my account but Slashdot doesn't allow it.
    1. Re:Talk to IBM PureData - beat to the punch by Anonymous Coward · · Score: 0

      Note that he's not using "custom hardware/ASIC/FPGA" -- he's using GPUs. GPUs aren't as expensive, and don't perform as well -- basically they fall between CPU and custom setups.

  21. Same reason you can buy a $99 supercomputer by Anonymous Coward · · Score: 1

    They're massively more parallel, running many more smaller simpler cores.

    It's the same reason these guys can make a 16 core parallel computer for $99.... the cores are focused on their job so they can be smaller and cheaper and can put more on a die.
    http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone/

    So these guys can run 8 daughter boards, with 64 cores per board, 512 cores, and it looks like they plan on scaling to 4096 cores because they use the top 12 bits of the address as the core routing id.
    The tradeoff with all those cores is that they're dirt simple cores: moves, adds, branches, and some floating point ops (even divide is missing and is done in software, but for signal processing multiply-add is the one that needs to be fast, and it's coded as a single instruction).

    If you read up on your high end graphics card it might have 900+ CUDA cores, really just ALU cores; the actual thread running cores are far fewer than that. But the ALUs can be run in parallel.

    So a vector multiply is done as a parallel operation on these ALU blocks, and many other operations break down to be parallel in the same way.
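
    The 4096 figure falls straight out of the 12-bit core id. A small sketch, assuming a 32-bit address where the remaining 20 bits are an offset inside that core's memory (the 32-bit width and the sample address are my assumptions, not something stated above):

```python
ADDRESS_BITS = 32                      # assumed address width
CORE_ID_BITS = 12                      # "top 12 bits" from the comment
OFFSET_BITS = ADDRESS_BITS - CORE_ID_BITS

MAX_CORES = 1 << CORE_ID_BITS


def split_address(address):
    """Split an address into (core id, offset within that core's memory)."""
    core_id = address >> OFFSET_BITS
    offset = address & ((1 << OFFSET_BITS) - 1)
    return core_id, offset


core_id, offset = split_address(0x80800040)   # made-up example address
```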

  22. Good to see things like this. by idbeholda · · Score: 2

    As a data analyst/software engineer, it makes me glad to see these kind of actual strides are being made to ensure that both data and software will eventually start being designed properly from their inception. To have a single cluster database with anything more than a few thousand entries is nothing short of incompetence, and I believe anyone who does this should be publicly shamed and flogged. When dealing with excessively large amounts of data, it quickly becomes a necessity to have a paralleled database design to ensure that searches aren't hampered by long query times. It genuinely makes me thrilled to see someone else use this kind of design other than me, so when I put out numbers on my end, maybe my results won't seem as fantastical or unbelievable. Even though I don't know you personally, keep up the good work, Todd.

    1. Re:Good to see things like this. by tmostak · · Score: 1

      Thanks for the kind words! Hopefully this is just the start of a fun project... Todd

    2. Re:Good to see things like this. by b4upoo · · Score: 1

      Most normal mortals won't have any knowledge of why a large database is useful. Frankly the first thing that leaps to my humble mind is trying to harvest money from the stock markets. Obviously there are numerous companies applying all kinds of computing power to the stock market. I do wonder if more computing power helps at this point or whether there is some toggle point at which massive data crunching would yield much better results.

      For example I don't know if people buy or sell more stocks if they are happy or if they are in a bit of suspense or under pressure. So how could a system that tries to compute the mood of the qualified buyers help me at all? It is rather like the car salesman's observation that the nonsense about gimmicks and hyping up the vehicle didn't sell many cars. The man told me that he signed deals when the buyers were worn out from going to dealership after dealership trying to decide what to do. They simply get so tired that they want relief and at that point can be sold most any old thing. How does one apply data to such complex beings as humans?

    3. Re:Good to see things like this. by idbeholda · · Score: 1

      Map the human genome with a parallel database. The only "downtime" would be sequencing, but query times to test for different factors in a particular splice would be relatively short. The downside to this would be the amount of space required to group, and tie together relevant data. Something like this would probably be a start, which I still haven't gotten around to releasing in its entirety yet, given that I don't have much free time nowadays.

  23. Or just skip right to the punchline... by Anonymous Coward · · Score: 0

    ...and do big data on an FPGA cluster.

  24. Q: Whats better than a GPU database? by WaffleMonster · · Score: 1

    A: Indexes that don't suck.

    Using GPUs and massively parallel blah blah blah is cool and all but most databases are not processor limited so why should we care?

    1. Re:Q: Whats better than a GPU database? by tmostak · · Score: 3, Insightful

      Try to heatmap or do hierarchical clustering on a billion rows in a few milliseconds with just the aid of indexes - not all applications need lots of cores and high memory bandwidth - but some do.

    2. Re:Q: Whats better than a GPU database? by WaffleMonster · · Score: 1

      Try to heatmap or do hierarchical clustering on a billion rows in a few milliseconds with just the aid of indexes - not all applications need lots of cores and high memory bandwidth - but some do.

      Even your examples are I/O rather than processor limited. Sending billions of rows to a GPU over the southbridge ain't free.
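
For readers following the heatmap exchange above, here is a minimal sketch of what grid binning involves, assuming a plain longitude/latitude window and a fixed grid. The function name `bin_points`, the window bounds, and the sample coordinates are all hypothetical and are not taken from the system in the story. It illustrates both sides of the argument: each row is handled independently of the others (so the work spreads across many cores), yet every row still has to be read once (so memory and bus bandwidth matter).

```python
from collections import Counter


def bin_points(points, lon_min, lon_max, lat_min, lat_max, cols, rows):
    """Count points per grid cell; each point is processed independently."""
    counts = Counter()
    for lon, lat in points:
        # Skip points that fall outside the map window.
        if not (lon_min <= lon < lon_max and lat_min <= lat < lat_max):
            continue
        # Scale the coordinate into a cell index; clamp to guard against
        # floating-point rounding at the upper edge.
        col = min(int((lon - lon_min) / (lon_max - lon_min) * cols), cols - 1)
        row = min(int((lat - lat_min) / (lat_max - lat_min) * rows), rows - 1)
        counts[(row, col)] += 1
    return counts


# Hypothetical sample data: four (longitude, latitude) pairs.
points = [(31.2, 30.0), (31.3, 30.1), (36.3, 33.5), (10.2, 36.8)]
heat = bin_points(points, 0.0, 40.0, 20.0, 40.0, cols=4, rows=2)
```

On a GPU the loop body would typically become one thread per row with an atomic increment on the cell counter; that mapping is an assumption about how such a system might work, not a description of the actual implementation.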

  25. MediumData by biodata · · Score: 0

    40 million rows is what we used to manage in Oracle tables in the late 80s. Jeez, did this guy have no clue how to build a database?

    --
    Korma: Good
  26. Patented technology by Anonymous Coward · · Score: 0

    AFAIR, using a database with a GPU was patented by IBM some years ago.

  27. To state the obvious by Anonymous Coward · · Score: 0

    It's great the GPU is faster than the CPU for massively parallel non-conditional operations. Why not use the CPU in addition to the GPU? Does the computer memory speed or bus bandwidth prevent it?

  28. Code optimization by KPU · · Score: 1

    Student writes inefficient code, learns how to optimize it using known techniques, it becomes faster. Film at 11.

  29. Oblig. In Soviet Russia ... by shikaisi · · Score: 1

    In Soviet Russia, GPU database creates you. Oh wait, wrong GPU.

    --
    No left turn unstoned.
  30. Re:PostgreSQL used GPU 2 years ago by Anonymous Coward · · Score: 0

    I too would hope that a cluster of 16 GPUs would be 300x-500x faster than a non-clustered CPU.

  31. There you go. by Anonymous Coward · · Score: 0

    A truck can deliver 1000 times more goods at once than a compact, but if you need to deliver babies, your truck can only deliver one, in the passenger seat.

  32. CPU/GPU crunching not suited to map-reduce cluster by Anonymous Coward · · Score: 0

    A map-reduce cluster, such as Hadoop, is useful when you have a lot of data to sift through. It brings the data to the CPUs, rather than the other way around. It allows you to do a bunch of I/O in parallel, so you're not I/O bound. Contrast this to crunching numbers with GPUs or CPUs, where the bottleneck is processing throughput instead of I/O. These two architectures are optimized for solving different problems.

    So the comparison between this very cool GPU-centric solution and a 1000-node map-reduce cluster is not useful. It's like saying that printers are better than fax machines because they can print more pages per minute.
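
As a rough illustration of the map-reduce pattern described in this comment, the sketch below counts rows per key across several chunks. It is a single-process toy, assuming each chunk stands in for the data held on one node; the names `map_chunk`, `merge`, and `map_reduce` and the sample rows are hypothetical, and nothing here uses Hadoop's actual API.

```python
from collections import Counter
from functools import reduce


def map_chunk(rows):
    """Map step: count rows per key within one chunk (one node's local data)."""
    return Counter(key for key, _ in rows)


def merge(a, b):
    """Reduce step: combine two partial counts into one."""
    return a + b


def map_reduce(chunks):
    # On a real cluster each chunk would be mapped in parallel, close to
    # where it is stored; here the chunks are simply processed in turn.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(merge, partials, Counter())


# Hypothetical sample data: (key, payload) rows split across three chunks.
chunks = [
    [("cairo", "tweet 1"), ("tunis", "tweet 2")],
    [("cairo", "tweet 3")],
    [("tripoli", "tweet 4"), ("cairo", "tweet 5")],
]
totals = map_reduce(chunks)
```

The point of the pattern is that only the small partial counts travel between nodes, which is why it suits workloads limited by reading the data rather than by arithmetic.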

  33. PDF link for GPU "on-board" flash memory by girlinatrainingbra · · Score: 1

    Thanks for the link to the GPU "on-board" flash memory presentation. Interesting to see that original Apple ][ hardware guru Wozniak is the chief scientist on this for Fusion-io hardware. I hadn't seen that about him on any other sites. Merci!