Xerox's 'Intelligent Redaction' Scanners
coondoggie writes "Xerox today touted software it says can scan documents, understand their meaning and block access to those sensitive or secure areas so that prying eyes cannot read, copy or forward the information. Xerox and researchers from its Palo Alto Research Center debuted "Intelligent Redaction," new software that automates the process of removing confidential information from any document. The software includes a detection tool that uses content analysis and an intelligent user interface to protect sensitive information. It can encrypt only the sensitive sections or paragraphs of a document, a capability previously not available, Xerox said."
I'm sure this will lead to a lot of copiers having "accidental" drownings in their bathtubs and Completely Innocuous single car crashes.
This is a poor idea. It better be 100% accurate at marking classified data as classified. All it will take is one screw-up and some extremely important data out there can be leaked to the wrong people.
99.99% accurate isn't going to be good enough, is it?
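To put a rough number on that, here is a minimal sketch. It assumes the 99.99% figure is a per-item accuracy and that every redaction decision is independent of the others. Both are assumptions made for illustration, not anything Xerox has stated, and the function name is made up.

```python
def prob_at_least_one_leak(per_item_accuracy: float, sensitive_items: int) -> float:
    """Chance that at least one sensitive item slips through unredacted.

    Assumes every item is handled independently with the same accuracy,
    which is a simplification made for this example.
    """
    return 1.0 - per_item_accuracy ** sensitive_items


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        p = prob_at_least_one_leak(0.9999, n)
        print(f"{n:>9} sensitive items -> {p:.1%} chance of at least one leak")
```

Under those assumptions, 10,000 sensitive items already give roughly a 63% chance that at least one gets through.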
Attention corrupt senior corporate management:
Tired of dealing with underlings trying to take you out by blowing the whistle on your illicit financial dealings? We have just the type of business equipment that you're looking for. Stop those do-gooders right in their tracks by automatically keeping them from copying those fudged books and secretive memos. Act now, and we'll throw in the automatic notification upgrade so you can terminate their employment before they have the chance to resort to other means of toppling your investment scam...
(okay, I'll put my tinfoil hat back in the closet, now)
AI is a disaster through-and-through. It never works well. Ever.
Consider handwriting recognition, autonomous robotics, and game theory, just to name a few of the narrowest, most well-defined (read: easiest) AI applications. AI works well in none of these - at best, it's so-so (like the 95-98% success rates in OCR).
Now what you have here, with the automatic redacting copier, is that the copier needs to understand the document it's reading, and determine which parts to redact. Contextual understanding is *HARD* - it's the same class of problem as automated translation - only harder in this case.
This copier idea is a huge flop. I don't know why they waste money on it. Anyone who relies on this copier to redact documents is a fool, because it is bound to make all kinds of mistakes (both type 2 - missing things it should have picked up, and type 1 - redacting things it shouldn't).
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
This way when some critical info gets missed in the redaction process, there's no one to blame! So not only will our (I'm usian) gov't be more efficient about hiding stuff from us, no one will have to take the fall if it goes wrong.
That said, I'm amazed at what modern AI can do. It's not clear, from this rather thin article, how much this system depends on human input to prevent mistakes. There must be some kind of training process. What is the state of these kinds of systems? I remember from some AI courses I took years ago, that they worked well but inevitably someone would end up calling someone else something stupid. Then the machine would start skipping important bits and the coders would look like idiots.
That was hard and a real stretch there at the end. blah.
man, I feel like mold.
Obviously this is not possible in general, since how sensitive information is can and will change over time. Without full AI awareness of the situation that places the document in context, this is not possible. (E.g. the statement "Bob will be leaving the company" could either be highly sensitive or old news, depending entirely on the time and/or reader. Even more fun, what about "accidentally" sensitive statements where the mere fact that the machine hides it flags it as an item of interest to someone who didn't know it was interesting?)
Also, a machine may "blank out" the sensitive part but leave enough around it for an astute hostile actor to still gain something - such things are so highly context sensitive I can't see any general algorithm that could guarantee success in all such cases.
Still, two possibly useful approaches that are closer to hand would be:
1) Supply the machine with a form, and specify certain areas (which will contain an SSN, for example) as containing information that must be treated as sensitive. So long as a standard form is used, the results could be handy.
2) Supply the machine with a complete list of information you want to keep under wraps (and all the various ways that information might appear - drawings, descriptions, what have you) and have it check each document for anything that matches anything on its sensitive list. This also has problems and would be easy to get around but it WOULD be helpful to prevent non-hostile carelessness - i.e. "WHOOPS Bob just scanned something sensitive to add to that email, better blot out the parts that aren't cleared to go outside the organization."
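For what it's worth, option 2 could be sketched roughly like this, assuming the scan has already been OCR'd into plain text. The function name, the mask string and the sample phrases are all invented for the example; this is not how Xerox's software is known to work.

```python
import re


def redact_watchlist(text: str, watchlist: list[str], mask: str = "[REDACTED]") -> str:
    """Replace every case-insensitive occurrence of a watchlist phrase with the mask."""
    # Try longer phrases first so a long entry wins over a shorter one it contains.
    phrases = sorted((p for p in watchlist if p), key=len, reverse=True)
    if not phrases:
        return text
    # re.escape keeps punctuation in the phrases from being read as regex syntax.
    pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    return pattern.sub(mask, text)


if __name__ == "__main__":
    # Hypothetical watchlist and document text, for illustration only.
    secrets = ["Project Bluebird", "Bob will be leaving"]
    memo = "Bob will be leaving the company after Project Bluebird ships."
    print(redact_watchlist(memo, secrets))
```

A literal match like this only catches wording that is already on the list. Write "B0b" instead of "Bob" and it sails straight through, which is why this helps against carelessness and not against anyone actually trying.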
While a general solution isn't possible, I can actually see this being useful in controlled situations. The article mentions medical, financial and government which all have lots of well defined forms that can be used. It won't allow the replacement of human judgement but it might make it easier to stop certain forms of accidental distribution in well defined cases, and that's worth pursuing so long as it doesn't encourage carelessness.
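The fixed-form case (option 1 above) is even simpler to sketch. Assume the scanned page is a grid of grayscale pixel values and that someone has already marked the sensitive boxes on the standard form; the `Zone` class, the function name and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Rectangle on the form in pixel coordinates (right and bottom are exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def redact_zones(page: list[list[int]], zones: list[Zone], fill: int = 0) -> list[list[int]]:
    """Return a copy of the page with every zone overwritten by the fill value."""
    redacted = [row[:] for row in page]  # leave the original scan untouched
    height = len(redacted)
    for zone in zones:
        # Clamp each zone to the page so an oversized zone cannot raise IndexError.
        for y in range(max(zone.top, 0), min(zone.bottom, height)):
            row = redacted[y]
            for x in range(max(zone.left, 0), min(zone.right, len(row))):
                row[x] = fill
    return redacted


if __name__ == "__main__":
    blank_form = [[255] * 6 for _ in range(4)]       # tiny all-white "scan"
    ssn_box = Zone(left=2, top=1, right=5, bottom=3)  # made-up location of the SSN field
    for row in redact_zones(blank_form, [ssn_box]):
        print(row)
```

Note that this overwrites the pixels themselves, so nothing of the original field is left in the output. It also only works as long as everyone really does use the standard form.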
"I object to doing things that computers can do." -- Olin Shivers, lispers.org
I wonder if it prints yellow dots to encode the redacted text for forensic analysis.
You know, it used to be that a "national security" threat was something that could kill millions, or wipe out the White House. Now a kid with some lighter fluid can be arrested for terroristic threats, and it's the White House that authorizes the killing. Can nobody read the Constitution?
We the [REDACTED][
Maybe it's as good as Adobe PDF's redaction feature, and anyone can unredact the document?
To be fair to Adobe, that *isn't* a redaction feature. It's a rectangle drawing feature that happens to get regularly misused.
Or maybe camera phones have already rendered this technology moot.
"No doubt one may quote history to support any cause, as the devil quotes scripture." - Learned Hand
IRC did that years ago...
<Cthon98> hey, if you type in your pw, it will show as stars
<Cthon98> ********* see!
<AzureDiamond> hunter2
<AzureDiamond> doesnt look like stars to me
<Cthon98> <AzureDiamond> *******
<Cthon98> thats what I see
<AzureDiamond> oh, really?
<Cthon98> Absolutely
<AzureDiamond> you can go hunter2 my hunter2-ing hunter2
<AzureDiamond> haha, does that look funny to you?
<Cthon98> lol, yes. See, when YOU type hunter2, it shows to us as *******
<AzureDiamond> thats neat, I didnt know IRC did that
<Cthon98> yep, no matter how many times you type hunter2, it will show to us as *******
<AzureDiamond> awesome!
<AzureDiamond> wait, how do you know my pw?
<Cthon98> er, I just copy pasted YOUR ******'s and it appears to YOU as hunter2 cause its your pw
<AzureDiamond> oh, ok.
Source : http://bash.org/?244321
I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
Great, the next cracker-related headlines will be about some Chinese kiddie who breaks into a copier in a remote corridor of the DoD. Yay, Xerox.
But this list thing actually shows, that the summary:
is totally bogus.
On the other side, this could be a wonderful Clippy revenant: "It looks like you're scanning a secret..."
"Hannibal's plans never work right. They just work." Amy/A-Team
What's to stop it from holding our secrets hostage in an attempt to be given human rights?