Slashdot Mirror


Harvard/MIT Student Creates GPU Database, Hacker-Style

First time accepted submitter IamIanB writes "Harvard Middle Eastern Studies student Todd Mostak's first tangle with big data didn't go well; trying to process and map 40 million geolocated tweets from the Arab Spring uprising took days. So while taking a database course across town at MIT, he developed a massively parallel database that uses GeForce Titan GPUs to do the data processing. The system sees 70x performance increases over CPU-based systems, and can out-crunch a 1,000-node MapReduce cluster, in some cases. All for around $5,000 worth of hardware. Mostak plans to release the system under an open source license; you can play with a data set of 125 million tweets hosted at Harvard's WorldMap and see the millisecond response time." I seem to recall a dedicated database query processor from the '80s, built from a few hundred really small processors, that was integrated with INGRES.

4 of 135 comments

  1. Two thoughts based on this story by Anonymous Coward · Score: 5, Interesting

    1. Facebook would like to have a discussion with him.
    2. The FBI would like to have a discussion with him.

  2. Re:sounds like... by nebosuke · Score: 3, Interesting

    Just out of curiosity, did you use PG-Strom or roll your own pgsql/GPU solution? If the latter, did you also hook into pgsql via the FDW interface or some other way?

  3. Large datasets are mostly IO limited by zbobet2012 · Score: 5, Interesting

    While cool and all, 125 million tweets with geotagging is at most 1250000000 * 142 bytes = 165 GB. That is not what "big data" considers a large data set. Indeed, most "big data" queries are IO limited. For around 16k USD you can fit that entire working set in memory. You are not really in the "big data" realm until you have datasets in the tens of TB compressed (hundreds of TB uncompressed).
    For these kinds of datasets, and where more compute is necessary, there is MARs.
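
The size estimate in the comment above multiplies 1,250,000,000 by 142 bytes, whereas 125 million is 125,000,000, so the multiplicand may carry an extra zero. The sketch below simply redoes the arithmetic for both counts, in decimal GB and binary GiB. The 142-byte record size is taken from the comment as given, not measured, and the function name `dataset_size` is made up for illustration.

```python
# Back-of-the-envelope size check for the estimate in the comment above.
# Assumption: every tweet record is a fixed 142 bytes, as the comment states.
BYTES_PER_TWEET = 142


def dataset_size(num_tweets, bytes_per_tweet=BYTES_PER_TWEET):
    """Return (decimal GB, binary GiB) for num_tweets fixed-size records."""
    total_bytes = num_tweets * bytes_per_tweet
    return total_bytes / 10**9, total_bytes / 2**30


if __name__ == "__main__":
    counts = [("125 million", 125_000_000), ("1.25 billion", 1_250_000_000)]
    for label, n in counts:
        gb, gib = dataset_size(n)
        print(f"{label} tweets: {gb:.2f} GB ({gib:.2f} GiB)")
```

Under these assumptions, the 165 figure corresponds to 1.25 billion records measured in GiB; 125 million records come to roughly a tenth of that. Either way the comment's point stands: the working set fits in memory on a single machine.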

  4. Re:That Didn't Take Long: Database Down For Maint. by static0verdrive · Score: 1, Interesting

    An open source license will help get those bugs squashed in no time! ;)

    --
    ========
    77 77 77 2e 6d 65 6c 76 69 6e 73 2e 63 6f 6d