MIT Develops Algorithm To Accelerate Neural Networks By 200x (extremetech.com)
An anonymous reader quotes a report from ExtremeTech: MIT researchers have reportedly developed an algorithm that can accelerate [neural networks] by up to 200x. The NAS (Neural Architecture Search, in this context) algorithm they developed "can directly learn specialized convolutional neural networks (CNNs) for target hardware platforms -- when run on a massive image dataset -- in only 200 GPU hours," MIT News reports. This is a massive improvement over the 48,000 GPU hours Google reported taking to develop a state-of-the-art NAS algorithm for image classification. The researchers' goal is to democratize AI by letting researchers experiment with various aspects of CNN design without needing enormous GPU arrays to do the front-end work. If finding state-of-the-art approaches requires 48,000 GPU hours, precious few people, even at large institutions, will ever have the opportunity to try.
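Taking the two search-cost figures quoted in the summary at face value, their ratio lands in the same ballpark as the headline number. Note that this is a ratio of search (training) cost, not of how fast the resulting CNNs run:

```latex
\frac{48{,}000\ \text{GPU hours}}{200\ \text{GPU hours}} = 240
```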
The CNNs produced by the new NAS were, on average, 1.8x faster on a mobile device than the CNNs they were tested against, at similar accuracy. The new algorithm leveraged techniques like path-level binarization, which stores just one path at a time to reduce memory consumption by an order of magnitude. MIT doesn't actually link out to specific research reports, but from a bit of Google sleuthing, the referenced articles appear to be here and here -- two different research reports from an overlapping group of researchers. The teams focused on pruning entire potential paths for CNNs to use, evaluating each in turn. Lower-probability paths are successively pruned away, leaving the final, best-case path. The new model incorporated other improvements as well. Architectures were checked against hardware platforms for latency when evaluated. In some cases, their model predicted superior performance for design choices that had been dismissed as inefficient. For example, 7x7 filters for image classification are typically not used because they're quite computationally expensive -- but the research team found that these actually worked well for GPUs.
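The "keep one path active, successively prune the least likely" loop can be illustrated with a toy. Everything here is a hypothetical stand-in: each candidate path gets a fixed, made-up reward (playing the role of accuracy with a latency penalty), path probabilities are a softmax over logits updated with a plain REINFORCE rule, and `prune_search`, `path_a`, `path_b`, `path_c` are names invented for this sketch. It is not the researchers' actual training procedure; it only shows how one-active-path sampling and successive pruning fit together.

```python
import math
import random


def softmax(logits):
    """Map {path: logit} to {path: probability}."""
    top = max(logits.values())
    exps = {name: math.exp(value - top) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def prune_search(reward_of, candidates, steps_per_round=2000, lr=0.1, seed=0):
    """Toy progressive pruning (hypothetical): each step activates ONE path,
    nudges the path probabilities with REINFORCE, and each round drops the
    least likely path until a single path remains."""
    rng = random.Random(seed)
    logits = {name: 0.0 for name in candidates}
    evaluations = 0
    while len(logits) > 1:
        baseline = 0.0  # running average reward, reduces update variance
        for _ in range(steps_per_round):
            probs = softmax(logits)
            names = list(probs)
            # Binarized gate: exactly one path is active this step, so only
            # its activations would have to be held in memory.
            active = rng.choices(names, weights=[probs[n] for n in names])[0]
            reward = reward_of(active)
            evaluations += 1
            advantage = reward - baseline
            baseline = 0.9 * baseline + 0.1 * reward
            # d log p(active) / d logit_k = 1[k == active] - p_k
            for name in names:
                indicator = 1.0 if name == active else 0.0
                logits[name] += lr * advantage * (indicator - probs[name])
        # Successive pruning: remove the currently least probable path.
        probs = softmax(logits)
        del logits[min(probs, key=probs.get)]
    return next(iter(logits)), evaluations


# Made-up rewards standing in for "accuracy minus a latency penalty".
scores = {"path_a": 0.2, "path_b": 0.5, "path_c": 0.9}
best, evaluations = prune_search(scores.get, list(scores))
print(best, evaluations)
```

Each step evaluates exactly one candidate, which is the memory argument for binarization: with N candidate paths, only one set of activations needs storing instead of N. REINFORCE is just one swapped-in update rule; a gradient-based update of the path probabilities would be another.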
Because someone is trying to read the article
Sorry Trump traitors!
Even the summary says that the 200x improvement is in the learning cycle. The resulting networks actually run less than 2x faster.
Apparently no one has heard of Wolpert's No-Free-Lunch theorem for search. It says that, when averaged over all possible problems, no search algorithm outperforms another (provided resources are not an issue). So one can have more resource-efficient searches, and one can have search algorithms that do better on some problems than on others. It's great when you find a class of problems your search method is optimal for. But in general, no. Can't be done.
To get a 200x speedup on the test set, they must have a 200x slowdown on average elsewhere.
That said, this could be really useful for a large class of practical problems. So it's the hyperbole that is the bullshit, not the research.
Some drink at the fountain of knowledge. Others just gargle.
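The No-Free-Lunch point can be checked by brute force on a tiny problem. This is an illustration under simplifying assumptions, not a proof: a 4-point search space, objective values in {0, 1, 2}, a budget of two evaluations, and performance measured as the best value found. The searchers `left_to_right`, `right_to_left` and `adaptive` are hypothetical names made up for this example. Averaged over all 81 possible objective functions, every non-revisiting searcher scores the same, even though they differ on individual functions.

```python
from fractions import Fraction
from itertools import product

DOMAIN = range(4)    # hypothetical toy search space {0, 1, 2, 3}
CODOMAIN = range(3)  # possible objective values {0, 1, 2}
BUDGET = 2           # evaluations each searcher may spend


def run(searcher, f, budget=BUDGET):
    """Let `searcher` pick `budget` distinct points; return the best value seen."""
    history = []  # (point, value) pairs observed so far
    for _ in range(budget):
        x = searcher(history)
        assert x not in [seen for seen, _value in history], "no revisiting"
        history.append((x, f[x]))
    return max(value for _point, value in history)


def left_to_right(history):
    return len(history)  # visits 0, 1, 2, ...


def right_to_left(history):
    return len(DOMAIN) - 1 - len(history)  # visits 3, 2, 1, ...


def adaptive(history):
    # Hypothetical heuristic: start at 0; after a good value take the nearest
    # unvisited point, after a poor value jump to the far end.
    if not history:
        return 0
    visited = {x for x, _value in history}
    unvisited = [x for x in DOMAIN if x not in visited]
    return unvisited[0] if history[-1][1] >= 1 else unvisited[-1]


def average_performance(searcher):
    """Average best-found value over every function f: DOMAIN -> CODOMAIN."""
    functions = list(product(CODOMAIN, repeat=len(DOMAIN)))
    total = sum(run(searcher, f) for f in functions)
    return Fraction(total, len(functions))


for searcher in (left_to_right, right_to_left, adaptive):
    print(searcher.__name__, average_performance(searcher))  # 13/9 each
```

On the single function `f = (0, 0, 0, 2)`, `left_to_right` finds only 0 while `right_to_left` finds 2, so each searcher has problems it is good at, yet the averages tie. The theorem covers general non-revisiting algorithms and performance measures; this toy only checks one small case.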
It's not hard to accelerate a process if you are allowed to leave out details without caring about their importance. The hard part is not losing quality in the process.
TFA doesn't seem to say anything about that.
that would do it?
And sheep.
CNN is fake news
I thought it was saying the algorithm would work before 2010.
That's the gist of it.
The teams focused on pruning entire potential paths for CNNs to use, evaluating each in turn. Lower probability paths are successively pruned away, leaving the final, best-case path.
Now, I don't know that much about AI, so in this scenario it may be totally different.
With chess engines, early pruning is a recipe for disaster. You might fool beginners, but you won't fool GMs. Pruning in general is already questionable, as you are deleting possibilities based on assumptions that are in turn based on a limited subset of your data. Early pruning leads to big performance gains but can easily overlook possibilities because certain paths are assumed to be wrong. Sometimes something that appears to be wrong is actually good.
As I said, it may or may not apply to AI, but it looks like it does, as they use similar words to describe a similar process. If it sounds too good to be true, it probably is. Performance gains by a factor of 200 smell like red flags.
A glitch a day keeps the bugs away.
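The chess-engine worry applies to any search that discards branches early. A toy single-player tree (a simplification: no opponent, no minimax) shows the effect: keeping only the best-looking partial path (`beam_width=1`) locks in an early lead and misses the best total, while exhaustive search finds it. The tree, its scores, and the function names are made up for this example.

```python
# Hypothetical toy tree: node -> list of (child, step_score) edges.
TREE = {
    "root": [("a", 5), ("b", 1)],  # "a" looks better after one step...
    "a": [("a1", 1), ("a2", 2)],
    "b": [("b1", 10), ("b2", 0)],  # ...but "b" leads to the best total
}


def best_path_exhaustive(node):
    """Return (total_score, path) for the highest-scoring root-to-leaf path."""
    children = TREE.get(node, [])
    if not children:
        return 0, [node]
    best = None
    for child, step in children:
        sub_total, sub_path = best_path_exhaustive(child)
        candidate = (step + sub_total, [node] + sub_path)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


def best_path_beam(root, beam_width):
    """Beam search: after each expansion keep only `beam_width` partial paths."""
    beam = [(0, [root])]
    finished = []
    while beam:
        expanded = []
        for total, path in beam:
            children = TREE.get(path[-1], [])
            if not children:
                finished.append((total, path))
            for child, step in children:
                expanded.append((total + step, path + [child]))
        # Early pruning: everything outside the top `beam_width` is discarded.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    return max(finished, key=lambda item: item[0])


print(best_path_exhaustive("root"))  # (11, ['root', 'b', 'b1'])
print(best_path_beam("root", 1))     # (7, ['root', 'a', 'a2'])
print(best_path_beam("root", 2))     # (11, ['root', 'b', 'b1'])
```

Widening the beam to 2 recovers the best path here, which is the usual trade-off: less pruning, more work. Whether the NAS pruning suffers the same way depends on how reliable its intermediate probability estimates are, which this toy does not address.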
The two links go to the summary page and to a download of the same PDF; it doesn't actually link to the second PDF or the review page :(
Sorting is not search. Search is minimization of an objective function. Did you take a CS course? If not, don't pretend you understand something you don't.
When I was a senior in high school in Texas, students started asking each other where they were going to go to college. There was a fellow named Allen in my electronics class whom I asked where he was going. He said MIT. I asked him what and where that was, and he was surprised that I had never heard of it. I was even more surprised when he told me it was in Boston. No respectable Texan would ever consider going to school anywhere in Yankeedom, as far as I was concerned. Of course, having heard about MIT for many years since, I now consider it possibly the ultimate of all the world's universities.