The Double Life of Memory Exposed With Automata Processor
An anonymous reader writes "As Nicole Hemsoth over at HPCwire reports 'In a nutshell, the Automata processor is a programmable silicon device that lends itself to handing high speed search and analysis across massive, complex, unstructured data. As an alternate processing engine for targeted areas, it taps into the inner parallelism inherent to memory to provide a robust and absolutely remarkable, if early benchmarks are to be believed, option for certain types of processing.'"
Basically, the chip is designed solely to process Nondeterministic Finite Automata and can explore all valid paths of an NFA in parallel, hiding the whole O(n^2) complexity thing. Micron has a stash of technical documents including a paper covering the design and development of the chip. Imagine how fast you can process regexes now.
that's fast
Well, strictly speaking, you can already explore all paths of an NFA simultaneously. You simply convert it to a DFA, which each state representing a potential subset of states the NFA could be in. It requires exponentially more states, but the running time is equivalent.
It's so full of scripts that keep phoning home the annoyance couldn't possibly justify the maximum possible value gainable from reading the content.
I suppose that was meant to say O(2^n)? Emulating an NFA on a DFA requires exponentially more either memory or time.
See subject.
Every double-life must end in a double-death.
motive always equals results http://www.youtube.com/watch?v=MLO3NmGJuHg
Reminds me of Computational RAM.
There is some existing work on the same basic idea of massively parallelizing regex matching by doing all the NFA branches in parallel, but using a GPU. Now a GPU is not necessarily perfectly suited to this problem, but it does have the advantage of being a mass-market consumer product, which produces economies of mass-market scale that let the average GPU have a ton more processing units and RAM than this Automata processor does. Would be interesting to see if an NFA-specialized processor gets enough of a speedup to overcome the manufacturing advantage of just throwing a beefy GPU at the problem.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Yes, but how many Hashes/s can it crack?!?
The final obstacle to complete surveillance falls.
I skimmed over the article, but I couldn’t tell: Is this another example of someone choosing an O(n^2) algorithm over an equivalent O(n log n) algorithm and then patting themselves on the back because they speed up the O(n^2) algorithm through parallelism, even though it’s still slower than the O(n log n) on a single processor?
I can’t tell because they go on about NFA’s, which would be exponential in time compared to DFA’s. That’s a totally different thing.
From a brief scan, this is an interesting Space-time tradeoff. Definitely has close physical limits to scaling, and we're not that strong algorithmically on what to do with this type of chip. Should be good Ph.D. material for a few years at least.
O(n^2) is the complexity of the problem, not that one of the implementation. That means it is impossible to hide it. This thing just gives some level of parallelism, hence the performance looks a bit better. It is still O(n^2) and when the NFA expands the search tree beyond that parallelism, the thing is back to plain quadratic behavior.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Well you can run any NFA in linear time (special hardware or no)... you don't have to determinize. Simply start in the initial state and carry along the set of all states reachable: if your word is "abbbab" then you check which states can be reached by 'a', then which states can be reached from those with a 'b'.. etc. None of those sets is larger than the number of states (constant) and the input length sets a hard limit on the length of the computation. In effect you produce a run of the powerset automaton without constructing that automaton explicitly.
You can even check if the language is non-empty (simple depth-first search).
What you cannot do with an NFA (efficiently) is to check whether there's an input that's rejected. THAT requires exponential time (it's PSPACE complete).
My guess is that the hardware uses some sort of symbolic representation of the NFA to speed up use of that kind of thing. Or possibly to try speed up some of those PSPACE-hard stuff I mentioned... like non-universality (there exists an input that is not accepted), intersection and so forth.
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-- Zawinski