IBM Claims Breakthrough In Analysis of Encrypted Data
An anonymous reader writes "An IBM researcher has solved a thorny mathematical problem that has confounded scientists since the invention of public-key encryption several decades ago. The breakthrough, called 'privacy homomorphism,' or 'fully homomorphic encryption,' makes possible the deep and unlimited analysis of encrypted information — data that has been intentionally scrambled — without sacrificing confidentiality." Reader ElasticVapor writes that the solution IBM claims "might better enable a cloud computing vendor to perform computations on clients' data at their request, such as analyzing sales patterns, without exposing the original data. Other potential applications include enabling filters to identify spam, even in encrypted email, or protecting information contained in electronic medical records."
I hope IBM won't be working with NSA on this one too!
Have you seen the new neighbours. I think they're homomorphic.
"might better enable a cloud computing vendor to perform computations on clients' data at their request, such as analyzing sales patterns, without exposing the original data. Other potential applications include enabling filters to identify spam, even in encrypted email, or protecting information contained in electronic medical records."
Right, because we've already figured out everything about cloud computing and it's a totally stable environment ready to be deployed in every company around the globe. Time to take it to the next step.
DRM that works.
"perform computations on clients' data at their request, such as analyzing sales patterns"
Or without their request.
A Magic the Gathering Article and Forum Aggregator
The abstract for Gentry's article can be found at: http://doi.acm.org/10.1145/1536414.1536440
then that form of encryption is useless for highly sensitive information.
It's as simple as that.
There are 1.1... kinds of people.
Okay, maybe I'm a noob when it comes to encryption, but I was under the impression that if you were able to read the encrypted email, you were probably able to read the encrypted recipient address too. Is there something I'm missing here?
First data is created and deemed private. It's then encrypted to prevent unauthorized people from seeing the private data. But other people want to analyze the data, without actually seeing it. Kind of like how it's good to know the income demographics of a city, but not to know the personal income of every person in said city. Never mind the fact that someone had to see the information at some point to render the statistics. Unless of course the results are never audited for accuracy. But's that's another story.
Anyway, so then we put our private data on someone else's server (we call it a 'cloud' - isn't that nice?) and they can't look at it because it's encrypted. But they will run analysis on the data, which requires a special tool to or something....look - all's I know is I get's me my KPI's for my board meetin'. Who really cares if some guys personal health records are sniffed by some computer...whoa! Jim has ass herpes?! I am so updating this on the Interwebz!
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
Apparently the logic is missed altogether:
"makes possible the deep and unlimited analysis of encrypted information - data that has been intentionally scrambled - without sacrificing confidentiality." is a conflicting phrase in and of itself.
If you can analyze the data "with or without confidentiality", you have already sacrificed the confidentiality. Or does nobody remember the aol search results fiasco?
This is more like "we can crack our own encrpytion, as if people are surprised".
Just FYI this site is whole sale cut and paste ripping IBM press off.
http://www-03.ibm.com/press/us/en/pressrelease/27840.wss
I bet multi-modal reflection sorting can determine what the confidential info is.
Abstractly, we accomplish this by enabling the encrypter to start the decryption process, leaving less work for the decrypter, much like the server leaves less work for the decrypter in a server-aided cryptosystem.
Umm... That doesn't ease my concerns about this already disturbing prospect..
Yeah, you can perform calculations on encrypted data without unencrypting it. But it's just a LITTLE slow. The first step is just showing it can be done, but it's a very long way from useful.
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
"IBM Research is home to the largest team of cryptography researchers outside of the academic and government communities. For more information about IBM Research, please visit http://www.research.ibm.com/."
Could someone explain how this "breakthrough might also one day enable computer users to retrieve information from a search engine with more confidentiality"? (Quoted from the article.)
From the article it seems as if this aids in the scanning and searching of encrypted data rather than securing anything. While this might ultimately benefit large corporations or governments hiring third parties to perform analyses, is it of any use to the rest of us?
Even a car analogy would be great. Umm, make that "acceptable".
Cool, but I'm half-convinced that holes will be found. The first time a new encryption scheme is put to the test, it usually fails. Still, hopefully, it'll lead to a truly secure scheme.
PHEM - party like it's 1997-2003!
You can not analyze the data. You can perform calculations on it without knowing what it is. So, for instance, you could encrypt all your tax info, send it to a company that processes the encrypted data without decrypting it, and sends you back your encrypted tax return, without ever having seen any of your financial detail.
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
f(x) = x
Why, are you trying to keep all the gay fun for yourself?!
<SEINFELD>No that there's anything wrong with that!</SEINFELD>
sexy topic... a DEEP, probative analysis of sorts. I bet the analyzers are as gay/giddy as they can morph into... Maybe they can be TRANS-FORMERS.... slap some dap and scrum some bum...
Previously: "Linux... Toward the Sunrise..." Now: "Linux... Toward the-- No, now, part of Every Sunrise"
OK, it looks like a lot of people are missing the point.
What Gentry figured out was a scheme for carrying out arbitrary computations on encrypted data, producing an encrypted result. That way, you can do your computation on encrypted data in the "cloud", but only you can view the results.
If E() is your encryption function, x is your data, and f() is the function you'd like to compute, homomorphic encryption gives you a function f'() such that f'(E(x)) = E(f(x)). But at no point does it actually decrypt your data.
This could be huge for secure computing.
If a human (or computer) looks at the "analyses" performed on encrypted data and can make the same inferences as if the analyses were performed on the unencrypted data, then the encryption process hasn't reduced the possibility of disclosure that arises through analysis (or data access).
Simple analyses like 1) produce a 2x2 table of outcomes or 2) produce a linear regression of an outcome on covariates can be used to identify information about individuals or even identify individuals.
I've seen a push by IT firms to focus on the encryption/transmission issue when discussing privacy concerns. While important, it's only a piece of the HIPAA/privacy issue. If a human being/computer can make correct inferences from analyses of data (encrypted or not), then there is a potential for disclosure that is not covered by the encryption process.
It is extremely difficult to determine a priori what analyses could/would reveal identities or information about individuals. Perhaps simple pre-defined two-way tables/one-way tables could be created for a given fixed set of information. A general purpose non-disclosive analysis system doesn't yet exist (if ever).
iaas (I am a statistician)
but after RTFA my suspicions may be justified:
Two fathers of modern encryption...
ok, so if your sales are tanking your competitors can analyse your encrypted offsite data to find out your sales are tanking. that about it?
up to you but i'll pass.
TFA doesn't seem clear on this point, but what the name of the technique implies is that you can perform the operation, but neither the inputs nor the outputs are ever decrypted. So if you can't see the question, and you can't see the answer, then why would you perform the operation other than at the request of someone who can (i.e. the client)?
Example, I want the total sales figures for all the left handed employees. I cobble together the appropriate SQL processing request send it to my cloud server which rummages throught the data base summing up the figures for some subset of the fields. It sends me back just the sum, encypted. It never knows which employees it was selecting nor any of their sales figures or even the sum. It just has the encrypted result that it sends to me all processed.
otherwise I'd have to pull every encrypted record of every employee across the web to my computer. In effect doing a table dump across the net for even the smallest calculation. Yuck! let alone not working on my iphone.
Another application is encryption of ballots. The achilies heel of almost every voting encryption algorithm is that somewhere somebody has the key and to do any processing you have to decrypt the ballots. THis way you can do the sum and only decrypt the results. you could publish the encrypted sum before the key is even unsealed then publish the key. The key does not have to ever be in the hands of the central tabulators.
Current voting encryption schemes require centralized control with possession of the keys. This way the control is decentralized with the counters and the key keepers not being the same person.
Some drink at the fountain of knowledge. Others just gargle.
Basically, IBM has created a set of cryptographic algorithms that allow fully homomorphic encryption. If you don't want your data to be analyzed, all you have to do is use an algorithm that doesn't support it. You'd want to do that anyway, since you'd want to use algorithms that are already considered strong, such as RSA and AES. Although RSA is homomorphic in theory, in practice it is not, since padding is used to prevent other weaknesses.
This is clearly a plot to turn our encrypted files gay.
Fully homomorphic encryption is a bit like enabling a layperson to perform flawless neurosurgery while blindfolded, and without later remembering the episode.
Oh, I get it! It's like when Dr. McCoy reinstalled Spock's brain. McCoy was an idiot before, got the 1337 skillz, and then forgot it all.
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
I don't suppose the researcher's name was Janik?
If telephones are outlawed, then only outlaws will have telephones.
Great...the net nannies and oppressive governments will have yet another censorship tool in their arsenal.
This article needs some clarification. In particular, a lot of the worried comments here show a lack of understanding of the word "homomorphic".
Here's a very simplified example of a homomorphism. I define a function
f(x) = 3x
This function is a homomorphism on numbers under addition. Its image "preserves" the addition operation. What I mean more precisely is
f(a) + f(b) = f(a + b)
That's pretty easy to verify for the function I've given.
Homomorphic encryption is interested in an encryption function f() that preserves useful computational operations. If we take my example as a very very simplified encryption then, say I have two numbers, 6, and 15, and I lack the computational power to do addtion, but I can encrypt my data with my key--3. (I'm generalizing my function to be multiplication by a key. And yes, for some reason I have the computational power to do multiplication. Humor me). I can encrypt my data, f(6) = 18 and f(15) = 45, and pass these to you, and ask you do do addtion for me. You'll do the addition, get 63, and pass this result to me, which I can then decrypt, which yields 21.
Now, my encryption here is very simple and very, very weak, but if you're willing to suspend disbelief, you'll note that the information I've allowed you to handle does not reveal either my inputs or my outputs. (In fact, with the particular numbers I've chosen, you might guess that my key is 9 instead of 3, (though relying on lucky choices or constraining myself to choices which have this property make my scheme rather useless))
If you generalize this to strong encryption and more useful computational operations, you begin to see how homomorphic encryption can be useful. One should note that, no, homomorphic encryption will not be a drop-in replacement for other forms of encryption. (Sending encrypted emails with homormorphic encryption would be unwise. An attacker can modify the data (though, if my understanding is correct, only with other data encrypted with the same key)) Homomorphic encryption simply fills a need that the other forms do not serve.
Hopefully you now also see how the article's use of the word "analysis" can be rather misleading. In particular, one of the earlier comments notes that it might be useful in allowing you to determine if different people's encrypted information is identical. By my understanding, homomorphic encryption would not allow this.
In any case, if my explanation is not enough, here's the wikipedia article.
If someone can detect spam in public-key encrypted email, how is this different from knowing my private key? Even if they promise they haven't read my mail, only "analysed" it, hasn't my encryption scheme been compromised?
In particular, hasn't the long-standing suspicion that public-key encryption reduces to a trivial case been confirmed? (A suspicion that arose when NSA dropped its objections to Phil Zimmerman's PGP scheme involving public-key encryption of the actual key being used to encipher plaintext using Triple-DES or IDEA or whatever it was. Either NSA knew how to crack a 256-bit key, or public-key encryption was already compromised.)
Similarly, to calculate the time required to discover Major Kong's OPE code prefix, given the 28 SAC B-52 bombers proceding to targets inside Russia on the Dr. Strangelove Big Board, we simply divide 26 cubed by 28, distributing a range of 628 prefix combinations to as many teams of radiomen who can send perhaps 20 prefices a minute to a single B-52. Major Kong's OPE prefix should turn up in 21 minutes, well within the deadline, provided his CRM 114 Discriminator isn't already toast. Now that's useful.
``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_
If I understand the summary correctly, there was a previous Ask Slashdot article dealing with this idea:
"Encrypted But Searchable Online Storage?" (http://ask.slashdot.org/article.pl?sid=09/04/16/1953224)
It was of course instantly tagged as "impossible even in theory".
It has to be quite limited. Otherwise for example, lets suppose I have an integer (encrypted of course) and I have comparison and addition/subtraction and multiply/divide.
I can very easily find the encrypted values of both 0 (a-a for any a) and 1 (a/a)
I can now decrypt the data with repeated additions (or subtractions) of 1 and equality comparisons.
And, I don't see how you can prevent equality tests in the encrypted domain. You might have to calculate a Kernel but surely there is no way to prevent that.
So I don't see how the operations available can be as much as the usual operators on reals.
Squirrel!
A lot of respondents seem to have seized on a spurious notion of what this is all about. That isn't surprising since the Slashdot article and the press release and even the abstract are rather obscure. No sign of a preprint, but the same abstract shows up for a number of colloquiums in the last couple of months. The paper is from a proceedings, so it may itself not be especially profound.
The abstract says: "We propose a fully homomorphic encryption scheme -- i.e., a scheme that allows one to evaluate circuits over encrypted data without being able to decrypt. Our solution comes in three steps. First, we provide a general result -- that, to construct an encryption scheme that permits evaluation of arbitrary circuits, it suffices to construct an encryption scheme that can evaluate (slightly augmented versions of) its own decryption circuit; we call a scheme that can evaluate its (augmented) decryption circuit bootstrappable."
The encryption and compression literature tends to use the word "scheme" where others might say algorithm or transform. "Circuits" here is a term of art (maybe arising originally from actual physical circuits, as in the Enigma machine?)
"An encryption scheme that permits evaluation of arbitrary circuits" suggests only that the possessor of the private key can generate these arbitrary queries, not that anybody and their brother can scavenge the encrypted data. It isn't stated whether such a query also requires the plaintext. It would be pretty cool if one feature were to be able to discard the plaintext post-encryption.
The gimmick appears to be that the arbitrary circuit can include the decryption itself (the bootstrap part). Note that this feature is far more cool (assuming it works) than all the nonsense about cloud computing. Somehow the data are *arbitrarily* available to properly encoded queries without ever being exposed - even to the CPU performing the operations. This processor could be on the same machine, on some remote server, in the cloud or across the galaxy. How cool is that?
A few points that are missing from the summary:
* This is a new encryption algorithm, designed specifically to enable this sort of analysis. You can't do this with AES, RSA etc when they're properly implemented.
* You still can't get any information out without the secret key. The use case is like, the sender sends an encrypted message, the recipient's firewall virus scans it without seeing it, then the recipient decrypts it. But the firewall doesn't know whether any viruses were found. Or we'll all submit encrypted votes in an election, and the voting machines sum up all the votes without seeing them, then forward them to an election authority which sees only the sum.
* It's too slow for practical use. Polynomial-time, but the polynomial is large.
* It's not provably as secure as existing systems, and even if it were, lattice systems are notoriously difficult to actually build.
And yes, I am a cryptographer. And I've written a paper with Craig Gentry. And I don't recommend reading this one unless you're a cryptographer or you really like ideal lattices, because it's pretty long and ugly.
No, the Slashdot headline is, as usual, misleading. The article didn't really help explain the distinction either. This breakthrough doesn't help anybody break otherwise secure, non homomorphic cryptosystems and suddenly make them insecure. What the researcher did was be the first to create a fully homomorphic cryptosystem that allows the types of things described in the article, while still keeping certain desired information secure. This Wikipedia article gives a much better description of the issue, and you don't even really have to understand abstract algebra to understand that section.
This is nonsense: unlimited analysis being possible is the same thing as confidentiality being sacrificed.
Maybe there is something significant and important here, but TFA doesn't provide a clue as to what it is.
I downloaded the PDF paper and it says "We omit full details due to lack of space...". Doh!!!
What use is an ACM account when white papers "omit full details"?
Well I suppose they don't have to kill me as they omitted the full details... that's something at least.
Wouldn't this sort of analysis of the encrypted data potentially provide clues of its nature that would help in the decryption of the data? Seems to me that this new analysis method weakens ALL POSSIBLE encryption techniques...
enabling filters to identify spam, even in encrypted email
I would warmly appreciate them working on the unencrypted junk first..
Homomorphic encryption has been around for quite some time, but differing algorithms only performed certain mathematical functions. For example, Pallier homomorphic encryption will let you do addition in a fully homomorphic manner and there are others for multiplication, XOR, etc. Having done privacy research and written experiments using homomorphic schemes, the main problems with current homomorphic schemes are twofold:
1. Each algorithm is limited in the type of operations it performs, if the paper presented solves this and allows for all basic operations then there will be a way to privatize most any algorithm.
2. The algorithms are very very very slow. To be secure you must use a large key size (ie 4096 bits) making the otherwise small encrypted number quite large for mathematical operations. You must then take these large numbers and multiply, divide, bring to the power of, ect to properly perform the internal mathmatics. For example, in Pallier you multiply the two cryptotexts to add the numbers together [ p(3)*p(5)=p(3+5) ]. While inclreasing computer speed seems to eventually solve this problem, with faster computers come faster brute froce attacks so we need bigger keys... >
A few misconceptions continue to circulate here; let me try to shed some light.
First, the encryption system is apparently not practical in its current form. Maybe improvements will occur some day to make it practical, maybe not. It is still a major theoretical breakthrough because fully homomorphic encryption had often been thought to be impossible in the past. It has been a long sought goal in cryptography and it is remarkable to see it finally achieved. So in practice nobody is going to be doing spam filtering, income tax returns, or anonymous google searches any time soon.
Second, several people have gotten tripped up over an apparent weakness: if you can calculate E(X-Y) you can get an encryption of 0; if you can calculate E(X/Y) you can get an encryption of 1; and from these you could get other encryptions and potentially break the system. This idea fails for two reasons: first, it is a public-key system, so you don't need to go through all this rigamarole to get encryptions of 0, 1, or anything. In public key cryptography, anyone can encrypt data under a given key, without knowing any secrets. So it is already possible to get encryptions of known values, even without the special homomorphic properties. Second, in order for public key systems to be secure, they need to have a randomization property. In randomized encryption, there are multiple ciphertext values that encrypt the same plaintext. Basically, the encryption algorithm takes both the plaintext and a random value, and produces the ciphertext. Each different possible random value causes the same plaintext to go to a different ciphertext. The decryption algorithm nevertheless can take any of these different ciphertext values and produce the same plaintext.
This may be confusing because the most well known public key encryption system, RSA is not randomized. At the time it was invented, this aspect was not well understood. Shortly afterwards it became clear how important randomization is. Other encryption systems like ElGamal do use randomization, and RSA was adapted to allow randomization via what is called a "random padding" layer, known by the technical name PKCS-1. This adds the randomness which allows RSA to be used securely.
One other point is that people are getting hung up about what "fully" homomorphic encryption covers. Exactly what operations can you do? I think the best way to think of it is to go down to the binary level. We know that in our computers, at the lowest level everything is 1's and 0's. These get combined with elementary logical operations like AND, OR, NOT, XOR, and so on. Using these primitive operations, all the complexity of modern programs can be built up.
In the case of the homomorphic encryption, it is probably best to think of the values being encrypted in binary form, as encryptions of 1's and 0's. Keep in mind the point above about randomized encryption: all the encryptions of 1 look different, as do all the encryptions of 0. You can't tell whether a given value encrypts a 1 or a 0. Given these encrypted values, you can compute AND, OR, XOR, NOT and so on with these values, and get new encrypted values as the answers. You don't know the value of the outputs, they are encrypted. Only the holder of the private key, who originally encrypted the data, could decrypt the output. But you can continue to work with these output values, do more calculations with them, and so on.
Let me give an example of how you could do an equality comparison. Suppose you have two encrypted values and want to determine if they are the same. Recall that we are working in binary, so you actually have two sequences of encrypted bits; some are encrypted 1's and some are encrypted 0's, but you can't tell which. So the first thing you compute is the XOR of corresponding bits in the two values: XOR the 1st bits of each value; XOR the 2nd bits of each value, and so on. Now if the values are equal, the results are all encryptions of 0's. If the values are different, some of the results will be encryptions of 1's. But aga
Assume your get a encoded number XJSDASF and you don't know what it is, but you can operate with it and return a result.
You can do the operation XJSDASF - XJSDASF and get the encrypted value of zero.
You can do the operation XJSDASF / XJSDASF and get the encrypted value of one.
You can keep subtracting one to XJSDASF until you get the encrypted value of zero.
If the information is not too big (say, the amount of money in your bank account), I can find how much money do you have, and also return to you any integer value.
But I haven't read the article so I might be wrong.
Isn't there some restriction on your "f" function? For example, it might be nice to compute a diff between two encrypted files, but the resulting size of the diff could reveal a lot of information and thus make the system insecure.
Gentry did this work as part of his PhD at Stanford; IBM played a minor role, if any at all, in this research. The original article is copied word-for-word from this IBM press release: http://www-03.ibm.com/press/us/en/pressrelease/27840.wss
It's no surprise that IBM is trying to claim this as their work, but Craig has been giving talks on this topic for a while in academic settings; I was fortunate to be in the audience at Brown six weeks ago when he blew the audience's minds in a two-hour talk. The conference at which he officially unveiled this technique, STOC 2009, happened at the beginning of this month, so any way you slice it this IBM PR stunt is late to the game. The language in the press release is just depressing, as they transparently try to take credit for this seminal work that has very little to do with the company, and only admit at the very end that he just happened to visit IBM during one summer.
How do this handle branches?
If the algorithm to be performed have IF's in them, the computation would have to go down both paths to do the calculation securely. But is this possible with the proposed system?
If the both branches is not gone through, it will be easy to determine any input number, by comparing it to constants, and looking at the state.
A popular explanation of how it was done can be found here: http://www.mail-archive.com/cryptography@metzdowd.com/msg10571.html
how the flying fuck did this get modded off-topic? its a direct quote from TFA.
I need to get my eyes checked or my mind.
I read that as "an IBM researcher has solved the fully homoerotic erection"
GAH!