Getting Students To Think At Internet Scale
Hugh Pickens writes "The NY Times reports that researchers and workers in fields as diverse as biotechnology, astronomy, and computer science will soon find themselves overwhelmed with information, so the next generation of computer scientists will have to learn to think in terms of Internet scale: petabytes of data. For the most part, university students have used rather modest computing systems to support their studies, but these machines fail to churn through enough data to really challenge and train young minds to ponder the mega-scale problems of tomorrow. 'If they imprint on these small systems, that becomes their frame of reference and what they're always thinking about,' said Jim Spohrer, a director at IBM's Almaden Research Center. This year, the National Science Foundation funded 14 universities that want to teach their students how to grapple with big data questions. Students are beginning to work with data sets like the Large Synoptic Survey Telescope, the largest public data set in the world. The telescope takes detailed images of large chunks of the sky and produces about 30 terabytes of data each night. 'Science these days has basically turned into a data-management problem,' says Jimmy Lin, an associate professor at the University of Maryland."
Students are beginning to work with data sets like the Large Synoptic Survey Telescope, the largest public data set in the world. The telescope takes detailed images of large chunks of the sky and produces about 30 terabytes of data each night.
Err, no it doesn't, and no they aren't. The telescope hasn't been built yet; first light isn't scheduled until late 2015.
Al.
The Daily ACK - Eclectic posts by yet another hacker
that's absolutely not true. the process is vastly different when it comes to working with 100 MB versus 10 petabytes. let's take databases, for instance. if you have 100 MB of data, you can just store the entire database on one server. when it comes to 100 PB of data, it's difficult even to find hardware capable of storing that much. you need to start looking at distributed systems, and distributed systems is a broad field in itself.
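to make the "one server vs. many" point concrete, here is a minimal sketch of hash-based sharding, one common way to spread rows across machines. this is only an illustration, not how any particular company does it: the function names (shard_for, partition) and the sample keys are made up, and a real system would also have to deal with replication, rebalancing, and failures.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index in [0, num_shards) using a stable hash.

    hashlib is used instead of the built-in hash() because hash() for
    strings is randomized per process, so it would not route the same
    key to the same server across restarts.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by the shard (hypothetical server) that would store them."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical row keys; in practice these would be primary keys.
    sample_keys = [f"user:{i}" for i in range(1000)]
    for index, shard in enumerate(partition(sample_keys, 4)):
        print(f"shard {index}: {len(shard)} keys")
```

the catch with plain modulo sharding is that changing the number of shards moves most keys to a different server, which is one reason the distributed systems field gets deep quickly.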
when i graduated in 2005, a lot of the techniques i was taught worked great for database systems that handled a few hundred thousand rows. then i got a job at an internet company that had tables with over 80 million rows. all that normalization stuff i learned in school had to be thrown out. times may have changed now, but when i was in school, not only did i not learn how to handle "internet scale" data sets, i was taught the wrong methods for handling large data sets.
undergrad college students should at least get a basic intro to large data sets, if not a class completely dedicated to learning how to work with them. school is supposed to prepare you for the workforce. at least give students the option to take a class that covers those topics if they want to go into those industries. i sure wish i had had that option.