Slashdot Mirror


Harvard/MIT Student Creates GPU Database, Hacker-Style

First time accepted submitter IamIanB writes "Harvard Middle Eastern Studies student Todd Mostak's first tangle with big data didn't go well; trying to process and map 40 million geolocated tweets from the Arab Spring uprising took days. So while taking a database course across town at MIT, he developed a massively parallel database that uses GeForce Titan GPUs to do the data processing. The system sees 70x performance increases over CPU-based systems, and can out-crunch a 1000-node MapReduce cluster, in some cases. All for around $5,000 worth of hardware. Mostak plans to release the system under an open source license; you can play with a data set of 125 million tweets hosted at Harvard's WorldMap and see the millisecond response time." I seem to recall a dedicated database query processor, integrated with INGRES in the '80s, that worked by having a few hundred really small processors.

3 of 135 comments

  1. and the most amazing thing by roman_mir · · Score: 4, Funny

    As TFS states, he uses GPUs to do the data processing, but you are never going to believe what he uses to store the actual data. You won't believe it; that's why it's not mentioned in TFS. Sure, sure, it's PostgreSQL, but the way the data was stored physically was in the computer monitor itself. Yes, he punched holes in computer monitors with a chisel and used punch card readers to read those holes from the screens.

    1. Re:and the most amazing thing by eyenot · · Score: 4, Funny

      Mod parent up!

      Also: I heard he's using the printer port for communication. By spooling tractor feed paper between two printers in a loop, and by stopping and starting simultaneous paper-feed jobs, he can create a cybernetic feedback between the two printers that results in a series of quickly occurring "error - paper jam" messages that (due to two taped-down "reset" buttons) are quickly translated from the wide bandwidth analog physical matrix into kajamabits of digital codes. The perceived bandwidth gain is much higher than just a single one or zero at a time.

      That way, he can access the mainframe any time, from any physical location, and it will translate directly into a virtual presence.

      --
      "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
  2. Re:I'm not a computer scientist, and... by Anonymous Coward · · Score: 3, Funny

    Now however if your data is _independent_ then you can distribute the work out to each core.

    Let me translate this into a woman-baby analogy: if one woman can have a baby in 9 months, then 9 women can have 9 babies in 9 months. At first the challenge is juggling the timing of dates and dividing the calendar so that the conception events fall as near as possible to each other, to keep up the efficiency and synchronization. Afterwards the challenge is the alimony, paying for college, and particularly Thanksgiving, when the fruits of the labor come together.