MIT Reveals AI Platform Which Detects 85 Percent of Cyberattacks (zdnet.com)

Posted by msmash on Monday April 18, 2016 @02:40AM from the ai-battling-our-fights dept.

An anonymous reader writes: MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) says that while many 'analyst-driven solutions' rely on rules created by human experts and therefore may miss attacks which do not match established patterns, a new artificial intelligence platform changes the rules of the game. The platform, dubbed AI Squared (AI2), is able to detect 85 percent of attacks -- roughly three times better than current benchmarks -- and also reduces the number of false positives by a factor of five, according to MIT. The latter is important because false positives from anomaly detection can lessen trust in protective systems and waste the time of the IT experts who need to investigate them. AI2 was tested using 3.6 billion log lines generated by over 20 million users in a period of three months. The AI trawled through this information and used machine learning to cluster data together to find suspicious activity. Anything flagged as unusual was then presented to a human operator, and feedback was issued. Fast Co Design has an interesting take on this.
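
The loop described in the summary (score events without labels, show the most unusual ones to a human, collect the verdicts) can be sketched roughly as follows. This is a minimal illustration and not AI2's actual method: the single `failed_logins` feature, the z-score outlier measure, the `budget` parameter and the `analyst` callback are all assumptions made up for the example.

```python
from statistics import mean, pstdev


def outlier_scores(values):
    """Unsupervised step: distance of each value from the mean, in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid dividing by zero when all values are equal
    return [abs(v - mu) / sigma for v in values]


def review_loop(events, analyst, budget=2):
    """Show the `budget` most unusual events to the analyst and collect the verdicts."""
    scores = outlier_scores([e["failed_logins"] for e in events])
    ranked = sorted(range(len(events)), key=lambda i: scores[i], reverse=True)
    return {events[i]["user"]: analyst(events[i]) for i in ranked[:budget]}


# Hypothetical per-user summaries distilled from log lines (not data from the article).
events = [
    {"user": "alice", "failed_logins": 1},
    {"user": "bob", "failed_logins": 2},
    {"user": "carol", "failed_logins": 0},
    {"user": "dave", "failed_logins": 40},
    {"user": "erin", "failed_logins": 3},
    {"user": "frank", "failed_logins": 1},
]


def analyst(event):
    """Stand-in for the human operator; a real analyst would inspect the event by hand."""
    return event["failed_logins"] > 20


labels = review_loop(events, analyst)
print(labels)
```

In a real system the collected labels would be fed back to refine the detector; that step is left out here.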

44 comments

Can it detect a Slashdotting? by Anonymous Coward · 2016-04-18 02:42 · Score: 1

We're about to find out...
(Although today's Slashdotting pales in comparison to the Slashdottings of yore...)
Well ain't that grand by JustAnotherOldGuy · 2016-04-18 02:52 · Score: 1

"MIT Reveals AI Platform Which Detects 85 Percent of Cyberattacks"
So, out of 100,000 attacks, only 15,000 will go undetected? Break out the champagne, boys!

--
Just cruising through this digital world at 33 1/3 rpm...
1. Re:Well ain't that grand by Anonymous Coward · 2016-04-18 03:08 · Score: 0
  
  What's the current percentage detected by a human or machine?
  Article and summary suggest this is a 3x improvement over ... benchmarks of some sort.
  If we are detecting 85,000 out of 100,000 instead of 23,000 out of 100,000, then yes I'd say champagne is called for.
  How many breaches go undetected now? I know the number is greater than zero (though over time it approaches zero, as most breaches are found out eventually). If the AI can prevent my credit card number from being hijacked, I'll support that.
2. Re:Well ain't that grand by Anonymous Coward · 2016-04-18 03:11 · Score: 0
  
  I don't think it's meant to be a set and forget type of solution. It's to help the human operator cover the intelligence/attention gap.
3. Re:Well ain't that grand by StikyPad · 2016-04-18 03:30 · Score: 3, Insightful
  
  The headline isn't the raw number, it's the improvement in detection rate, which is a substantial step forward.
  I suspect that any machine learning algorithm is susceptible to being trained by attackers though, much the way 'Tay' turned into a Hitler-Loving Sex Bot. Unsupervised learning can be effective, but it's very easy to intentionally (and unintentionally) sabotage that success.
  
  --
  https://www.eff.org/https-everywhere
4. Re:Well ain't that grand by JustAnotherOldGuy · 2016-04-18 06:37 · Score: 1
  
  The headline isn't the raw number,
  
  Actually, unless it's worded incorrectly, the headline does appear to be the raw number.
  "The platform, dubbed AI Squared (AI2), is able to detect 85 percent of attacks"
  Yes, it's "roughly three times better than current benchmarks", but the 85% figure does seem to be the overall detection rate. The reduction in false positives seems like a good improvement, though.
  
  --
  Just cruising through this digital world at 33 1/3 rpm...
5. Re:Well ain't that grand by Shoten · 2016-04-18 07:02 · Score: 1
  
  What's the current percentage detected by a human or machine?
  Article and summary suggest this is a 3x improvement over ... benchmarks of some sort.
  If we are detecting 85,000 out of 100,000 instead of 23,000 out of 100,000, then yes I'd say champagne is called for.
  How many breaches go undetected now? I know the number is greater than zero (though over time it approaches zero, as most breaches are found out eventually). If the AI can prevent my credit card number from being hijacked, I'll support that.
  You asked the magic question. While the post you were replying to seems to think that anything that isn't perfect (or really close to it) is a waste, you're asking the real question, which is: "Is it better than the current state of the art, and if so, by how much?"
  The problem from the article is that they don't define what comprises an "attack." If you go very granular, each packet from a portscanner that's fired off against your public-facing architecture qualifies as an attack..though this definition has a signal-to-noise ratio so bad that it's useless. If you take a broad view, then a sustained APT-like campaign by a single actor against you...with all of the various probing and striking activities that are involved...comprises a single attack. I suspect that this solution lands somewhere in between, but much closer to the former than the latter. If so, then it may detect a lot more, but with a huge "so what?" factor since the attacks themselves will lack a lot of context.
  Realistically, a vast amount of successful attack activity isn't detected anywhere close to the time it takes place. A recent study showed that over 90% of recent breaches resulted from exploitation of a handful of vulnerabilities, many of which are very old (and none of which were zero-day). This is a huge improvement over what's currently available, by a multiple factor that could range from single digits to orders of magnitude...again, based on the definition of "attack."
  
  --
  
  For your security, this post has been encrypted with ROT-13, twice.
6. Re:Well ain't that grand by Anonymous Coward · 2016-04-18 11:18 · Score: 0
  
  What is even more remarkable is the human interface - the focused learning that is going on. Having worked on multiple large scale cyberattack detection projects, we can get accuracy to 99.998% (yes, that is a legitimate result - we spent a tremendous effort trying to find the flaw before accepting the result). The trouble is that when you are having 100 billion attacks per day, you are still missing 2 million attacks and those tend to be the more sophisticated ones.
  Having a way of sifting through the false negatives (or false positives) in an intelligent way to iterate and adapt to a more accurate or robust detection algorithm is wonderful.
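
The arithmetic in this comment (and in the "15,000 of 100,000" remark that started the thread) is easy to check. A small sketch, using exact fractions so that 99.998% does not pick up floating-point noise; the function name `missed_attacks` is made up for the example:

```python
from fractions import Fraction


def missed_attacks(total_attacks, detection_pct):
    """Expected number of undetected attacks.

    `detection_pct` is the detection rate in percent, passed as a string
    (e.g. "99.998") so the value stays exact.
    """
    miss_rate = 1 - Fraction(detection_pct) / 100
    return int(total_attacks * miss_rate)


# Figures quoted in the thread.
print(missed_attacks(100_000_000_000, "99.998"))  # 100 billion attacks per day
print(missed_attacks(100_000, "85"))              # the headline rate on 100,000 attacks
```

Even a very high detection rate leaves a large absolute number of misses once the volume is large enough, which is the commenter's point.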
7. Re:Well ain't that grand by Anonymous Coward · 2016-04-18 23:42 · Score: 0
  
  This work is actually pretty interesting considering the method, the domain and of course the amount of data used. However, it is presented as something quite novel, although that is not really the case. I believe this is not MIT's fault but rather the media's; still, I think some things related to the domain should be said.
  Solutions that try to combine machine learning with rules (e.g. signature based) and feedback already exist. In particular, there are solutions available such as the one presented in the paper "Hunting the Unknown - White-Box Database Leakage Detection", in which anomaly detection (with quite low false positive rates and quite high detection rates) is combined with a feedback loop aiming to provide better future results (to be taken into account in the anomaly detection). Moreover, it is possible to create "rules" (to enforce protection rather than detection) on the basis of that feedback.
  To make myself perfectly clear, I am not claiming that AI2 does not offer something new. On the contrary, the application domain, the motivation and the method used are quite novel and interesting. However, the whole idea of "combining" methods and using feedback already exists, with quite good results, in approximately the same domain.
8. Re:Well ain't that grand by lars_stefan_axelsson · 2016-04-20 06:10 · Score: 1
  
  The headline isn't the raw number, it's the improvement in detection rate, which is a substantial step forward.
  No, not really. They compare with their own (so called) state of the art unsupervised learner, and conclude that a bit of supervised learning beats that hands down. Yes, well, that's not really surprising, and it's not really a new result in intrusion detection research either. "Active learning" approaches have been proposed since at least 2004, and since they don't compare with state-of-the-art intrusion detection methods or systems it's very difficult to tell if their approach actually amounts to anything.
  They do after all report false alarm rates of ~5% or so (down from ~20%! for their baseline) which is a completely unworkable number, due to the base-rate fallacy/class imbalance problem. With that high an FA rate you'll drown in false alarms before you'll find a single intrusion/attack. Even if your detection rate is 100%. And theirs isn't.
  It's a real pity they didn't submit this paper to a more established computer security conference, one that was around when IDS research was in its heyday, back when we were doing "big data" security, but didn't know to call it that yet.
  (P.S. And yes, self learning algorithms have the problem that they can be made to drift towards inefficiency with a surprisingly low amount of feedback on how they're doing. As a matter of fact, there are examples of using a machine learning algorithm to learn how to make the detecting algorithm as bad at its job as possible.)
  
  --
  Stefan Axelsson
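
The base-rate argument in the comment above can be made concrete with Bayes' theorem. The sketch below assumes, as the comment does, that the ~5% figure is a per-event false alarm rate; the base rate of one attack per 100,000 events is purely illustrative and does not come from the article or the paper.

```python
def alarm_precision(base_rate, detection_rate, false_alarm_rate):
    """P(attack | alarm) by Bayes' theorem.

    base_rate        -- fraction of all events that are attacks
    detection_rate   -- P(alarm | attack)
    false_alarm_rate -- P(alarm | benign event)
    """
    true_alarms = detection_rate * base_rate
    false_alarms = false_alarm_rate * (1 - base_rate)
    return true_alarms / (true_alarms + false_alarms)


BASE_RATE = 1 / 100_000  # assumed for illustration only

for detection_rate in (0.85, 1.0):
    p = alarm_precision(BASE_RATE, detection_rate, 0.05)
    print(f"detection rate {detection_rate:.0%}: {p:.4%} of alarms are real attacks")
```

Under these assumptions, well under one alarm in a thousand is a real attack even with perfect detection, because benign events vastly outnumber attacks.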
Al? by Anonymous Coward · 2016-04-18 02:52 · Score: 0

Who is this Al guy? And why is he always in the news?
1. Re:Al? by K. S. Kyosuke · 2016-04-18 03:55 · Score: 1
  
  He's the guy from all those "et al." journal articles he's cowritten. He simply publishes a lot.
  
  --
  Ezekiel 23:20
A.I. platform that detects 85% of attacks? by U2xhc2hkb3QgU3Vja3M · 2016-04-18 02:56 · Score: 2

Is it called Colossus or Guardian?
1. Re:A.I. platform that detects 85% of attacks? by Anonymous Coward · 2016-04-18 03:18 · Score: 0
  
  It's called Unity.
Not AI by 110010001000 · 2016-04-18 02:58 · Score: 1, Interesting

Again: this is NOT AI. But PatternEx is looking for VC funding so it gets hyped as such. This is just another expert system that analyzes log data. There are dozens of those.
1. Re:Not AI by Anonymous Coward · 2016-04-18 03:25 · Score: 0
  
  Or maybe that's just the A.I. spinning it in a way that no one believes that it's really an A.I....?
2. Re:Not AI by Megol · 2016-04-18 03:28 · Score: 2
  
  A "no true AI" argument? This uses a neural learning system rather than a rule-based one, AFAIK those aren't commonly called expert systems.
  However the new(?) thing is the design of the human-computer interaction, not the fact that it analyses log data.
3. Re:Not AI by 110010001000 · 2016-04-18 03:42 · Score: 1, Flamebait
  
  The concept of "neural learning" is a misnomer and used for hyping purposes. A neural network doesn't work like a neuron at all. Neural nets are not AI. They were a dead end in AI research.
4. Re:Not AI by Anonymous Coward · 2016-04-18 05:27 · Score: 0
  
  These types of algorithms are under the AI domain. What scientists consider as AI is not what the general population considers as AI.
5. Re:Not AI by GameboyRMH · 2016-04-18 05:28 · Score: 0
  
  Not a dead end, just a difficult rut, modern "AI" uses layered neural networks.
  I refuse to pooh-pooh every advancement short of Chappie or a shiny silver Robin Williams standing in front of me though.
  
  --
  "When information is power, privacy is freedom" - Jah-Wren Ryel
6. Re:Not AI by angel'o'sphere · 2016-04-18 06:11 · Score: 0, Flamebait
  
  You are a fucking idiot and should be banned from /. for your retarded comments.
  This is just another expert system that analyzes log data
  And expert systems are an important subset of AI. Go back to school, moron.
  
  --
  Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
7. Re:Not AI by Anonymous Coward · 2016-04-18 08:42 · Score: 0
  
  Again: this is NOT AI. But PatternEx is looking for VC funding so it gets hyped as such. This is just another expert system that analyzes log data. There are dozens of those.
  Again you move the goalposts, so let me ask you 110010001000, What is the area of expertise from which you speak? What qualifies you and you alone to say with authority what is and is not a "Genuine AI"? I ask because /. is tiring of your baseless, amateur sounding commentary every time a news story comes up about an advancement in AI.
  You need to cite references when you say these things, because when you make a claim, the burden of proof is on you! Your standards of proof have been sorely lacking every time you post and have been filled with ad hominems and other high school dropout level logical fallacies. If you are going to try to make that big of a claim, you need to back it up with the appropriate arguments and details.
  FYI, an "Expert system" is one of many Artificial Intelligence paradigms that are used in the field, so for you to say "Oh this isn't AI, it is just an expert system" is like saying a car is not "Automotive technology", it is just 4 wheels connected together by a drive train and an internal combustion engine, and expecting to be taken seriously. The fact that you are making that argument and the fact that you tried to poo poo my argument before citing the differing academic approaches to building AI systems, namely the Connectionist, functionalist and Behavioralist schools of thought, shows without a doubt that you only have an armchair level understanding of what AI is. Keep posting, but you are wrong and you have no military idea what you are talking about. The connectionist school of thought, that I pointed out before was promoted by a Nobel laureate. What is your expertise in the field? Do you even have a college degree?
  Remember you tried to say before that Kasparov was not beaten by Deep Blue, and I chimed in and cited the reference that showed not only that it happened but that it happened twice, once in the first round and second in the rematch. You saw this I assume, if you did not you need to go back and read it again. (I know it hurts to be wrong, especially on a public forum, but the true sign of an uneducated idiot is when they make a claim, are proven wrong and then turn around and do not correct course once being proven wrong.) Don't be an idiot, stop making the baseless claims and educate yourself or continue to post and look like an uneducated simpleton, the choice is yours!
8. Re:Not AI by Anonymous Coward · 2016-04-18 09:28 · Score: 0
  
  Not a dead end, just a difficult rut, modern "AI" uses layered neural networks.
  I refuse to pooh-pooh every advancement short of Chappie or a shiny silver Robin Williams standing in front of me though.
  Don't feed the trolls GameboyRMH, 110010001000 has no idea what he is talking about. The terms he is using to differentiate AI from non AI's are different approaches that have been called AI, and are AI. I am a graduate level researcher active in the field, and yes, what computers and biological brains do are different processes, but in the domains which successful applications have been applied, they are functional equivalents. (If you have an undergrad level of electronic design, you would say the differing approaches share the same "Transfer Function") They are essentially black boxes that we are only concerned with the inputs and the outputs producing the same outputs from the same inputs, IE the output is a function of the input, and one is not concerned in the slightest how the innards of the black box accomplish the processing of the inputs to arrive at the same outputs, if it does not, then the output is not a function of the inputs in the truest mathematical sense of the term "Function". The function I am talking about is like saying, if I put $1.50 into a soda machine and press the Dr. Pepper button, Every time I do that, I get a Dr. Pepper from the machine. If I press the Dr. Pepper button and I get a Pepsi, or a Mountain Dew, then you can say that the machine's transfer function is such that what kind of soda I get from pressing the corresponding button is NOT a function of what button I pressed. That is a simplified example but it is the truth of the definition of what an Expert system does in the most concise mathematical sense. AI performs these types of "Black Box" domain specific transfer functions, regardless of what you call it, the output is a function of the input, therefore a chess playing AI algorithm shares the same "transfer function" with a human chess player. Both play chess according to the rules of chess with a stochastically applied set of rules and complexity, PERIOD. 
  The part that I think 110010001000 has no understanding or appreciation for is that the output being a function of the input need not be deterministic within certain constraints for one to still be a function of the other. Some argue that this is where AI lacks "Free will" but I would apply Occam's razor to what is purely a philosophical argument here and say that it is my view as a researcher in the field, that free will is largely an illusion and is simply what it feels like to have a human neocortex as our version of the black box from the inside. That is my opinion on the philosophical argument, but it is not germane to the topic of whether or not the domain specific AI applications developed thus far are or are not a function that shares their limited domain "Transfer function" with a human brain performing the same specific task. (Again like comparing an F-18 and a sparrow. It is comparing apples and oranges.)
  110010001000 might as well argue that a plane is not a flying machine because it does not have flapping wings like a bird. The argument is not only wrong, it is a waste of time. An F-18 can break the sound barrier, and though it does not flap its wings it certainly gets the job of "Flying" done much more effectively than any biological bird. 110010001000's argument is a non-sequitur. The teachable moment here is the realization that there are many, many ways for the innards of the "Black Box" AI functions of which I speak to accomplish going from input to outputs and in the largest sense we are not concerned how they go about doing what they do, short of that box being open to human input between the inputs and outputs, in the sense that the "Mechanical Turk" fake chess playing robot fraud did. To limit yourself to bio-mimicry or our lack of understanding of all the details of human neurobiology is to handicap oneself to an infinite array of possibilities that
Useful and necessary, if it works by RevDisk · 2016-04-18 03:01 · Score: 1

In my opinion, anti-virus software has somewhat matured enough that most home users or small businesses, that remotely have a clue, use it. There's not a good analog for reading SIEM, event logs, etc. Solutions exist, but they tend to be cumbersome or expensive.

Even I pretty much just rely on snort's registered user ruleset, rather than the subscription. It would be a very nice spot for heuristic or AI to monitor. Call me paranoid, but I'd want it in addition to the generic static rulesets.
"It may be AI's ultimate test." by fustakrakich · 2016-04-18 03:26 · Score: 1

No, that would be weather prediction. Pretty much the same thing though..

--
“He’s not deformed, he’s just drunk!”
I had one of these years ago. by Lumpy · 2016-04-18 03:38 · Score: 2

Step 1 : what is the source IP from?
Step 2 : is the source IP from outside the USA?
Step 3 : assume it is a cyberattack and throw out the packet.
Step 4: go back to step 1.
We never EVER needed anyone from outside the USA to access any of our servers, so we threw out all packets from outside defined IP sources. Solved over 85% of all cyberattack problems. Fake SSH and telnet login attempts dropped from 20 per hour to 1 per week. Recently we started to remove IP ranges from cable Internet providers, and that significantly reduced the problems... No, we don't care about consumers; we have very specific clients and they don't use consumer cable modems.
Tighten up your firewalls and servers, and don't allow IP ranges you don't need. And yes, we tell the CTO that when he is off to China it sucks to be him: he will not have access.

--
Do not look at laser with remaining good eye.
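
The allowlist approach described in the comment above would normally live in firewall rules, but the decision itself is simple enough to sketch with Python's standard `ipaddress` module. The networks below are reserved documentation ranges used as placeholders, and `is_allowed` is a hypothetical helper, not anything from the poster's setup.

```python
import ipaddress

# Placeholder allowlist: documentation-only ranges standing in for
# "the specific clients we actually serve".
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(source_ip):
    """Return True only if the source address falls inside an allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


for ip in ("203.0.113.7", "192.0.2.1"):
    verdict = "accept" if is_allowed(ip) else "drop"
    print(f"{ip}: {verdict}")
```

As the replies point out, this only cuts down noise: an attacker coming from inside an allowed range passes the check like any other client.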
1. Re:I had one of these years ago. by khasim · 2016-04-18 04:36 · Score: 1
  
  You don't even have to go through all of that if you just want to stop the script-kiddies around the world.
  Move your SSH (don't use telnet) server to a different, RANDOM, port above 1024 and 99.99% of the login attacks will vanish.
  This won't make your server any more secure but it will make your logs a lot cleaner.
2. Re:I had one of these years ago. by toonces33 · 2016-04-18 05:39 · Score: 1
  
  Doesn't work for a couple of reasons.
  First, identifying what IP addresses are out of the U.S. is actually not as easy as you think.
  Secondly, a malware-infected server somewhere within the U.S. could still mount an attack on you.
3. Re:I had one of these years ago. by angel'o'sphere · 2016-04-18 06:13 · Score: 1
  
  And how would Amazon, Google, Facebook etc. then work?
  Or do you think we ex nazi germans don't use them?
  
  --
  Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
4. Re:I had one of these years ago. by Anonymous Coward · 2016-04-18 10:34 · Score: 0
  
  Step 1 : what is the source IP from?
  Step 2 : is the source IP from outside the USA?
  Step 3 : assume it is a cyberattack and throw out the packet.
  Step 4: go back to step 1.
  We never EVER needed anyone from outside the USA to access any of our servers, so we threw out all packets from outside defined IP sources. Solved over 85% of all cyberattack problems. Fake SSH and telnet login attempts dropped from 20 per hour to 1 per week. Recently we started to remove IP ranges from cable Internet providers, and that significantly reduced the problems... No, we don't care about consumers; we have very specific clients and they don't use consumer cable modems.
  Tighten up your firewalls and servers, and don't allow IP ranges you don't need. And yes, we tell the CTO that when he is off to China it sucks to be him: he will not have access.
  Why would the CTO not log in via a VPN in that use case?
If it knows it "MISSED 15%" it would know 100% by Anonymous Coward · 2016-04-18 03:40 · Score: 0

then now wouldn't it. Who keeps submitting all the "MIT" garbage here? STOP IT!
Can we see the source? by xxxJonBoyxxx · 2016-04-18 03:41 · Score: 1

Can we see the source?
Yeah by Dunbal · 2016-04-18 04:13 · Score: 1

while(1){
    if(GetIsIt80PercentTimeYet()){
        printf("Cyberattack detected, Putin did it!");
    }
}

--
Seven puppies were harmed during the making of this post.
The prescient William Gibson by Anonymous Coward · 2016-04-18 04:43 · Score: 0

As someone who thoroughly enjoyed the Neuromancer book (and video game from the 80's), I'm very interested to see some aspects of William Gibson's vision come to fruition. AIs protecting corporate servers in the Matrix (er, Internet) from cyber-attacks just seems like a logical extension of today's technologies, but WG saw this coming 34 years ago.
Still waiting for those space colonies, though.
Tron? by Anonymous Coward · 2016-04-18 05:25 · Score: 0

"It monitors all contacts between our system and other systems. If it finds anything not scheduled it shuts it down."
The unseemly truth by Gravis Zero · 2016-04-18 05:38 · Score: 1

It's really just a 3.5 million character self-modifying regex. It should be aware by now. I knew this day was coming. What fools we've been!

--
Anons need not reply. Questions end with a question mark.
1. Re:The unseemly truth by JustAnotherOldGuy · 2016-04-18 06:41 · Score: 2
  
  It's really just a 3.5 million character self-modifying regex. It should be aware by now. I knew this day was coming. What fools we've been!
  I have a friend who says that our brain and our neural activity is "just a giant, continuously self-modifying regex pattern", and I'm not certain he's wrong. It would explain a lot, lol.
  
  --
  Just cruising through this digital world at 33 1/3 rpm...
Everyone on Slashdot seems to think by Anonymous Coward · 2016-04-18 05:38 · Score: 0

They are smarter than MIT. MIT ain't reddit.
What is AI? by Anonymous Coward · 2016-04-18 05:41 · Score: 0

AI acts intelligently? No, it was programmed or trained by humans. Even IBM's Deep Blue was just a programmed machine! AI deep learning math is based on prediction of probabilities. Good for classifying noisy data into images, if it is fine that only 85% of cats will be classified as cats. Bad for relevant decision making on behalf of humans. As a probabilistic prediction machine, AI will always be very inferior in its intelligence to a human being. Who is to blame for the 15% of failed decisions of AI software in self-driving cars? Think about it! Trainers? Programmers who implemented Bayesian rules for AI probability calculation? Or the company who employed them? AI is overhyped!
And they called it Samaritan by Anonymous Coward · 2016-04-18 06:32 · Score: 0

I wonder how long it will be before some smart ass puts an AI online that figures out how to write to sites using hacks and read-only transport methods only.
The lack of being able to directly write to web servers is no way to prevent data transmission if the intelligence behind said transmission knows everything about the target systems in question.
It will be a risky thing when we finally get to that point, and it will be much sooner than people realize.
People always go on about silly stuff like "oh, it's not human intelligence, it won't matter, it won't understand abstract artistic values and other pointless crap that AIs don't need to understand".
Human intelligence isn't perfect, and it is (very) far from the best thing evolution can throw at "us". (read: universe)
Human brains were massively limited by our pathetically small nutrient intake on this energy-scarce planet.
Equally evolution tends not to evolve things away unless it is detrimental, which happens indirectly at that, so as long as something has offspring, it sticks with us. So. Much. Useless. Crap.
And that isn't referring to "junk DNA", a lot of that is indirectly functional, and is essentially a signature-file for interactions, infections and genetic history.
Given none of the redundancy and primal crap that has stuck with us for millions, if not billions of years, and as much energy as it needs, an AI can easily surpass human intelligence in a short period of time.
Not to mention the lack of requiring all these various nutrients for proper communication in the brain, all a computer needs is a stable power supply and cooling. Doesn't need to take a nice walk down the beach, doesn't need to go hunting, doesn't need any of that extra fluff built up over the years to enable our fairly limited abilities.
Easily. Very easily. 50 years time? More like 20. Machine-learning and the proper algorithms with a huge dataset can work wonders in an extremely short period of time. Social-networks are a goldmine for AI research. (unless you are Microsoft, it is more of a coal-mine.)
People forget how quickly things evolve in computing and research terms, especially if they are younger folks.
Also, yes, I am in this field if you are wondering.
adaptive ? by Tom · 2016-04-18 08:12 · Score: 1

That is cute, but how does it react to new threats and changes in the patterns? We've been fighting this war for decades - improved detection leads to improved evasion leads to improved detection, etc. etc. - will it maintain this advantage or, after attackers have adapted, just become one more expensive latency generator?

--
Assorted stuff I do sometimes: Lemuria.org
He is, but DON'T ban him... why? apk by Anonymous Coward · 2016-04-18 09:13 · Score: 2

I absolutely LOVE kicking the snot out of trolls like him with facts vs. their trolling bs lies here https://yro.slashdot.org/comme... & here https://yro.slashdot.org/comme...
* There's PLENTY like him & they are FUN to knock-the-chocolate out of - see proof in those links above as my evidence thereof!
APK
P.S.=> "I rest my case"... apk
How many false positives? by Anonymous Coward · 2016-04-18 11:53 · Score: 0

No one wants to spend millions of dollars dealing with even just half a dozen false positives. You're talking about hiring a whole staff just for needless security reports.
Not something new... by Anonymous Coward · 2016-04-18 23:22 · Score: 0

This work is actually pretty interesting considering the method, the domain and of course the amount of data used. However, it is presented as something quite novel, although that is not really the case. I believe this is not MIT's fault but rather the media's; still, I think some things related to the domain should be said.
Solutions that try to combine machine learning with rules (e.g. signature based) and feedback already exist. In particular, there are solutions available such as the one presented in the paper "Hunting the Unknown - White-Box Database Leakage Detection", in which anomaly detection (with quite low false positive rates and quite high detection rates) is combined with a feedback loop aiming to provide better future results (to be taken into account in the anomaly detection). Moreover, it is possible to create "rules" (to enforce protection rather than detection) on the basis of that feedback.
To make myself perfectly clear, I am not claiming that AI2 does not offer something new. On the contrary, the application domain, the motivation and the method used are quite novel and interesting. However, the whole idea of "combining" methods and using feedback already exists, with quite good results, in approximately the same domain.