U.C. Berkeley Offers Free "Big Data" Class This Week
pmdubs writes "The U.C. Berkeley AMPLab research group will be hosting a free 'Big Data Bootcamp' on-campus and online, August 21 and 22. The AMP Camp will feature hands-on tutorials on big data analysis using the AMPLab software stack, including Spark, Shark, and Mesos. These tools work hand-in-hand with technologies like Hadoop to provide high performance, low latency data analysis. AMP Camp will also include high level overviews of warehouse scale computing, presentations on several big data use-cases, and talks on related projects."
Now maybe some of the folks here will actually learn how Big Data methodologies work, rather than just spamming links to a strawman argument starring the word "web-scale"...
Aw, who am I kidding... this is Slashdot! A knee-jerk reaction with little forethought is not only the norm, but the mandate!
You do not have a moral or legal right to do absolutely anything you want.
FTA: "...and walk through’s..."
Can anyone shed some light on whether these technologies are niche/minor technologies, or whether they're actually popular / useful / used technologies?
"I've never heard of AMPLab" means just about nothing, given that I don't really spend a lot of time on Big Data. I recognize Hadoop (and MapReduce, Scala, etc.), but most of the technologies used in this class seem to be specific to Berkeley.
(I'm almost afraid to ask, given that there's a grand total of 13 comments and it's already 1/2 down the /. main page :( )
Some quick random googling turns this up:
ASUS P9X79
Support for up to 64GB of system memory with an 8-DIMM design
The sales literature says 32 GB/s RAM speed. So in 2 seconds I can process and parse the crap out of 64 gigabytes of random unstructured text data. WOW!
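For what it's worth, that back-of-envelope math works out as follows. This is a minimal sketch that takes the 32 GB/s sales-literature figure at face value as sustained memory bandwidth (an assumption; real parsing throughput will be lower), and the function and constant names are made up for illustration:

```python
def scan_seconds(data_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to stream data_gb through memory once at the given bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return data_gb / bandwidth_gb_per_s


RAM_GB = 64.0                   # maximum memory quoted for the board
PEAK_BANDWIDTH_GB_PER_S = 32.0  # sales-literature figure, assumed sustained

# One full pass over a data set that fills all of RAM.
print(scan_seconds(RAM_GB, PEAK_BANDWIDTH_GB_PER_S))  # 2.0
```

That is one pass over memory at peak rate, so treat the 2 seconds as a floor, not an estimate.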
How many businesses have a total data set even that big? (bean counting / customer / text data, not pictures or video)
A 5-drive RAID and a quad-core motherboard with 64 gigs running Windows or Mac or Linux. That's a pretty simple, standard, readily available solution to what, 95% or 99% of 'Big Data' problems?
In 18 months, Moore's law will double that for the same price.
If I put my Marketing Hat on, I do not see Big Data as anything but a small specialty market. And because it will never be mainstream, the big money will never be spent to make it easy to install, easy to program, easy to debug and easy on the end user.
Oh ya. Uptime. These big clusters aren't very fault tolerant and run only a few days without something breaking. This ain't Novell.
iPhone and Android users think 4 gig storage cards are big.
I have played with 'cluster' type computing. I tell you this. I will jump through a lot of hoops to make my application run on a single box under a mainstream OS before ever again trying to keep a room full of boxes running.
Just moving all the right data around to the right nodes is a big pain that likes to break TCP/IP stacks, routers, switches, OSes and the 4 gigabyte limit.
Getting the cluster 'booted' and 'booted reliably' each and 'every time' is a well earned excuse for much drinking.
Doing it 'in the cloud' just exchanges one set of problems for another.
Documentation is there, but you've got to be a true seeker, not deterred by anything.
I would take my estimate of project time and multiply it by 10 if the solution involves multiple boxes.
But if you need to query every byte in a terabyte in under a second, this is the only current solution.
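To put a rough number on that last point, here is a minimal sketch of how many boxes it takes to touch every byte of a terabyte within a deadline. The 32 GB/s per-node memory bandwidth is an assumed figure, the data is assumed to be already in RAM and evenly split, and network, coordination and stragglers are ignored; the function name is hypothetical:

```python
import math


def min_nodes(data_gb: float, per_node_gb_per_s: float, deadline_s: float) -> int:
    """Fewest nodes that can each scan an equal share within the deadline."""
    if per_node_gb_per_s <= 0 or deadline_s <= 0:
        raise ValueError("bandwidth and deadline must be positive")
    return math.ceil(data_gb / (per_node_gb_per_s * deadline_s))


TERABYTE_GB = 1000.0       # decimal terabyte
PER_NODE_GB_PER_S = 32.0   # assumed per-box memory bandwidth

nodes = min_nodes(TERABYTE_GB, PER_NODE_GB_PER_S, deadline_s=1.0)
print(nodes)                                      # 32
print(TERABYTE_GB / (nodes * PER_NODE_GB_PER_S))  # 0.9765625 seconds per scan
```

A single box at the same assumed rate would need about 31 seconds for one pass, which is why the sub-second case pushes you onto multiple machines.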