Harvard/MIT Student Creates GPU Database, Hacker-Style
First time accepted submitter IamIanB writes "Harvard Middle Eastern Studies student Todd Mostak's first tangle with big data didn't go well; trying to process and map 40 million geolocated tweets from the Arab Spring uprising took days. So while taking a database course across town at MIT, he developed a massively parallel database that uses GeForce Titan GPUs to do the data processing. The system sees 70x performance increases over CPU-based systems and can, in some cases, out-crunch a 1,000-node MapReduce cluster, all for around $5,000 worth of hardware. Mostak plans to release the system under an open source license; you can play with a data set of 125 million tweets hosted at Harvard's WorldMap and see the millisecond response time."
I seem to recall a dedicated database query processor from the '80s, integrated with INGRES, that worked by using a few hundred really small processors.
As TFS states, he uses GPUs to do the data processing, but you are never going to believe what he uses to store the actual data. You won't believe it, and that's why it's not mentioned in TFS. Sure, sure, it's PostgreSQL, but the data was physically stored in the computer monitor itself. Yes, he punched holes in computer monitors with a chisel and used punch card readers to read those holes from the screens.
You can't handle the truth.
Now, however, if your data is _independent_, then you can distribute the work out to each core.
Let me translate this into a woman-baby analogy: if one woman can have a baby in 9 months, then 9 women can have 9 babies in 9 months. At first the challenge is juggling the timing of dates and dividing the calendar so the conception events fall as near as possible to each other, to keep up the efficiency and synchronization. Afterwards the challenge is the alimony, paying for college, and particularly Thanksgiving, when the fruits of the labor come together.
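To make the point about independent data concrete, here is a minimal sketch in Python. It is not how Mostak's system works; the function names (count_in_box, split, parallel_count) and the bounding-box query are hypothetical, picked only for illustration. Each chunk of points is counted without looking at any other chunk, so the chunks can be handed to separate workers and the partial counts simply added up at the end. A thread pool is used here to keep the example short; for CPU-bound pure-Python work you would likely swap in a process pool (or a GPU kernel) to get a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def count_in_box(points, box):
    """Count (lon, lat) points inside box = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = box
    return sum(
        1
        for lon, lat in points
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    )


def split(items, n_chunks):
    """Split items into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_count(points, box, workers=4):
    """Count points in box by giving each worker its own independent chunk."""
    chunks = split(points, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No chunk depends on another, so the order of execution does not matter.
        partials = pool.map(lambda chunk: count_in_box(chunk, box), chunks)
        # The only shared step is the final reduction of the partial counts.
        return sum(partials)
```

The same shape (partition, process each piece independently, combine) is what lets this kind of query scale across many cores; work where each row depends on the result for the previous row does not split up this way.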