I'm not sure what you mean about the labeling thing. Your labels could all be encrypted and shuffled so he doesn't even know which file is file 50 (although it turns out this is not necessary). The process I am describing is a form of private information retrieval if you want more information. It does not strictly require FHE, but is enabled by it.
The serve would always learn the circuit you are using, so he would be able to tell which function you are evaluating over your data. There are methods to hide the function as well, but that feature is not provided by basic FHE.
Not sure where you got this idea, it is certainly necessary that P != NP for almost all of cryptography. If P = NP, then there are no problems in NP which are easy to come up with but hard to solve. Cryptography relies crucially on this asymmetry, it should be easy for a legitimate user but hard for someone without the key.
Sure there are classes harder than NP, but it is necessary that our problems be in NP because otherwise it would be intractable for legitimate users. Take public key cryptography for example. The problem it is based on needs to be in NP because, given a ciphertext encrypted with one half of the key, you need to be able to efficiently verify (decrypt) it with the other half. If your encryption was in NEXP, but not in NP, it would take exponential time to decrypt even for users with the key. This is just one example, but I promise you that all our useful problems need to be in NP.
If our problems need to be in NP, then they also must not be in P. If P = NP, then it takes as long to verify a problem as it does to actually solve it. It would be just as fast to break an encryption as it is to decrypt with the key. A good explanation is in Impagliazzo's worlds. Interestingly, it necessary but not actually sufficient for P != NP. For instance, hard problems could exist but not trapdoor functions, which would make public key encryption impossible.
It's only possible if P != NP. Hashing itself is impossible if P=NP, along with all the rest of cryptography save for information theoretically secure constructions. You might as well say encryption only exists if P != NP.
If you search and the search returns results, you are probably leaking because an observer can keep track of what sectors of the disk the results came from, and what bits passed through the registers of the CPU as it executed compare instructions, and use that to gain information about the data using statistical analysis.
A homomorphic search would not operate the way you are thinking. It would have to compute over the entire set of possible results (otherwise, as you say, the adversary learns at least that some results are not possible) and piece by piece obliviously combine them. As a simplification, if you know your search is going to only return one result then you can have the server run a circuit over each file which evaluates whether that file will be the result or not. If it is, out comes an encrypted one, if it is not then it will be an encrypted zero. The server then multiplies the file itself by this encrypted result and adds it to a running total. All the files which weren't chosen will be zeroed out and the single file you want will end up in this "register", although the server has no idea which one you wanted and which ones were zeroed out.
Nobody said these things weren't possible, just that homomorphic encryption out of the box does not do them. There are recent techniques for functional encryption, which use FHE as a component, that allow these exact scenarios. As you pointed out though, you have to be very careful if you don't want to ruin security. The way they work now, you can supply a server with a specific token which allows him to evaluate one very specific function on your encrypted data and get the plaintext result of that function. For instance, you could give your email server a token which allows him to run spam filtering over your incoming emails and output a plaintext bit which is '1' if it is spam and '0' if it is not. The security property of these schemes is that you cannot learn any information other than the output of this function run over encrypted data. It is veeery tricky at this point because you could leak some dangerous information unknowingly, but the techniques do exist.
Nah because it is also semantically secure. For every plaintext value, there are lots and lots of different ciphertext values (it is a one-to-many mapping). So many, in fact, that knowing the plaintext value of any particular ciphertext doesn't give you any good information.
You do have the problem that anyone can come up with any valid ciphertext and jam it into your computation wherever they want, but we have some other cool stuff for verifying that only authenticated ciphertexts were used and that the function the guy evaluated was the one he was supposed to do. Lots of very cool stuff:-)
So, let me first say that the main selling point for this technology is that it allows you to outsource your computation. You can use a low powered device like a cell phone and take advantage of more powerful computation in the cloud, while maintaining data privacy. You are correct that certain things will be leaked, like if I am storing encrypted email and I search for all emails sent by so-and-so then the server would learn how many emails I have from that guy. This is still a huge advantage over what we have now.
Now, I can outline a cool use that you probably have not thought of which is a little different. Imagine that a server is storing some really sensitive stuff for me. Obviously I don't trust the server so I am encrypting all my files. If he is really sneaky, however, he can learn something about the contents of those files by watching when, where and how often I access them. We call this the access pattern, and usually people just write this off as a cost of doing business. However, with homomorphic encryption we can hide even that!
Since I can evaluate any program homomorphically over my data, I write a program that says "return file number x" and give it an encrypted value, say 50, for x. The server now evaluates this program, with my encrypted 50, over the entire set of files. What he gives back to me is my file that I wanted, but from his point of view he can't actually tell which file he gave me! All he knows is he ran a circuit over all the files in the database, with my input that specifies which one I want, but he can't tell what my input is because it is encrypted.
If your scheme is over a finite field, then E(a) - E(a) will just be a literal zero. Technically this is an encryption of zero, but it has not security as it is an encryption of zero under every possible key. Knowing it doesn't really give anything to an adversary.
It is not that the algorithms have not been well tested. Actually the "algorithms" do not really need to be tested as we have security proofs that reduce to hardness assumptions. The problem is that the hardness assumptions simply have not been around long enough to be thoroughly vetted. RSA is based on the hardness of the RSA assumption (related to the hardness of factoring) which has had 40 years to be broken. These new homomorphic encryption schemes have been around for at most 4 years, which is NOTHING in the scheme of things.
You're missing the point, this is not being released so that people can actually use it. Keys in this system measure in the gigabytes, and single multiplications take seconds. It is so researchers can apply their new ideas on how to make it more efficient without having to code this whole thing from scratch. It is a great help for research.
Sure, that's pretty easy. We can even do it without homomorphic encryption (do a google scholar search for encrypted keyword search, there is lots of stuff with varying levels of security and utility). A quick explanation, which might convince you, is that you can actually run any program you can think of using fully homomorphic encryption.
First, suppose that any program can be written down as a circuit (it can). Now, we know from boolean theory that AND gates and XOR gates are functionally complete, that is any circuit you can write with other gates, you can rewrite with only AND and XOR gates. You might know it with AND and NOT gates, but it is easy to show that XOR can actually implement NOT so that also works. Now, with two bits, AND is really just multiplying the bits (1 AND 1 = 1, any other combo has at least one zero and is 0). XOR, on the other hand is just adding the bits, with the weird case that 1 + 1 = 0, which actually is bitwise addition ignoring the carry (mod 2). So, now we can implement any program as a circuit, any circuit with only AND and XOR gates, and we can do those AND and XOR gates with addition and multiplication. Therefore, we can do anything we want over homomorphically encrypted ciphertexts, with a suitable compiler that will translate our program into those operations!
Now that is a very general construction, so it might not be particularly interesting to your problem, but what it does show is that we can really do anything over the encrypted inputs that we could do with unencrypted inputs. Since Google can run their search algorithm over your unencrypted query, they would also be able to do it over your encrypted query.
It can be made public key. It is also semantically secure, so knowing the plaintext value of any one ciphertext gives you no information about the value of other ciphertexts.
I can't say for sure, I am not familiar with that guy's work. There are many other systems that provide verifiable end-to-end voting (Punchscan and Scantegrity for example, which I have personall worked on) which do not require anything more than a secure symmetric key encryption scheme.
There is a famous voting system by Rivest which does use homomorphic encryption though, and I can explain to you why it still remains efficient. This article is specifically about fully homomorphic encryption schemes, which are universally very very slow. For a long time now, however, we have had singularly homomorphic encryption schemes, under which you can calculate either sums or products of ciphertexts, but not both. For instance, RSA is actually homomorphic under multiplication (c1^e * c2^e = (c1*c2)^e). There is another encryption called Paillier which is very well understood and is homomorphic under addition. Using that, we can have secure elections which require only homomorphic addition.
To see why this suffices, you have to know how such a system would actually work. In it's most basic form, the voters could encrypt a one for the candidate they want to vote for and a zero for the candidates they don't want to vote for. The judge then adds up all the ciphertexts to obtain an encrypted sum of the votes for each candidate. There are some subtleties to deal with, like how do you prevent someone from encrypting a 10 and getting more votes for their guy, but we can deal with these too. The point is, for election purposes we can get by with just a single homomorphism and it can actually be pretty fast.
I think you should be covered in this case just by IND-CPA security of your encryption, but I'm not 100% on that. The randomness is an input to your function, and the inputs should be hidden no matter what. I'll ponder it when I get a chance and see if I can't come up with a proof. What paper did you get this technique from by the way?
I don't think this release is targeted at general developers or people who just want to toy around with it. FHE is not anywhere near that level of maturity. It is mostly for researchers who are trying to make this stuff faster and want to try out some new theoretical twist that they came up with, but not have to code this whole mess of a thing from scratch. I haven't looked too deeply at the code, but it seems you need a good understanding of the ciphertext representation at least (which is very different from traditional ciphers) in order to make use of it.
Gotcha that makes sense. A lot less efficient but it doesn't require any additional assumptions. It doesn't seem to me that it would require function privacy, the encryption function should be public anyway. Also you have to be sure that your encryption homomorphically supports its own Eval function, which I don't think is guaranteed. Otherwise, you would not be able to do any homomorphic operations on your double-encrypted ciphertexts.
Gotcha:D It is actually not that bad to do division though, depending on your definition of division. Usually these schemes operate over a finite field, where division is just multiplication by an inverse element. If you mean floating point division, then yeah that would be a mess I suppose.
Fortunately we are way far off from that kind of reality. It takes on the order of seconds, at least, to do a homomorphic multiplication. To emulate an entire instruction set would be, conservatively, trillions of times slower than running it unencrypted.
Nah, this wouldn't work because the encryption would be deterministic. If I wanted to test whether you had encrypted a certain bitstring I could recreate the process you used and compare the results. The actual way they do it in the Gentry paper is to publish an encryption of 1 and a whole bunch of encryptions of 0. Then your encryption process is to pick a random subset of those zeroes, add them together to get a new encryption of zero and then add the one if you are encrypting a 1. If the sparse subset sum assumption is true, then it will be hard to determine whether this ciphertext is the result of summing a bunch of zeroes and a one or just a bunch of zeroes.
This is interesting for me as I am doing my PhD dissertation on homomorphic encryption. The problem we have, as a community, is that the computational assumptions used in these schemes are very new and we cannot come to a consensus as to whether the encryption schemes that result from them are going to remain secure 5, 10 or 20 years from now. For instance, NSA recommends that you only use AES, El Gamal, elliptic curve Diffie Hellman, and SHA-256. The community has butt loads of awesome and interesting protocols and techniques which use things not on this list. NSA is obviously being conservative because they have very important secrets to hide, but it leads to a problem where nothing can new can be "certified" to be secure. How do you even judge that something is "encrypted" if you are using a scheme that some grad student made up like a month ago?
It is a probabilistic encryption. That is, there are many different encryptions of the same value, to the point that knowing the plaintext that corresponds to a particular ciphertext does not give you any information. All modern encryption schemes are like this in order to prevent the attack you describe (or a similar one).
Couldn't the television just decompress as well in that case? I am not saying this has no use in DRM, but I have been doing my PhD work on homomorphic encryption and I personally cannot see it if it exists.
I think he just didn't read the sentence correctly.
I'm not sure what you mean about the labeling thing. Your labels could all be encrypted and shuffled so he doesn't even know which file is file 50 (although it turns out this is not necessary). The process I am describing is a form of private information retrieval if you want more information. It does not strictly require FHE, but is enabled by it.
The serve would always learn the circuit you are using, so he would be able to tell which function you are evaluating over your data. There are methods to hide the function as well, but that feature is not provided by basic FHE.
Not sure where you got this idea, it is certainly necessary that P != NP for almost all of cryptography. If P = NP, then there are no problems in NP which are easy to come up with but hard to solve. Cryptography relies crucially on this asymmetry, it should be easy for a legitimate user but hard for someone without the key.
Sure there are classes harder than NP, but it is necessary that our problems be in NP because otherwise it would be intractable for legitimate users. Take public key cryptography for example. The problem it is based on needs to be in NP because, given a ciphertext encrypted with one half of the key, you need to be able to efficiently verify (decrypt) it with the other half. If your encryption was in NEXP, but not in NP, it would take exponential time to decrypt even for users with the key. This is just one example, but I promise you that all our useful problems need to be in NP.
If our problems need to be in NP, then they also must not be in P. If P = NP, then it takes as long to verify a problem as it does to actually solve it. It would be just as fast to break an encryption as it is to decrypt with the key. A good explanation is in Impagliazzo's worlds. Interestingly, it necessary but not actually sufficient for P != NP. For instance, hard problems could exist but not trapdoor functions, which would make public key encryption impossible.
It's only possible if P != NP. Hashing itself is impossible if P=NP, along with all the rest of cryptography save for information theoretically secure constructions. You might as well say encryption only exists if P != NP.
If you search and the search returns results, you are probably leaking because an observer can keep track of what sectors of the disk the results came from, and what bits passed through the registers of the CPU as it executed compare instructions, and use that to gain information about the data using statistical analysis.
A homomorphic search would not operate the way you are thinking. It would have to compute over the entire set of possible results (otherwise, as you say, the adversary learns at least that some results are not possible) and piece by piece obliviously combine them. As a simplification, if you know your search is going to only return one result then you can have the server run a circuit over each file which evaluates whether that file will be the result or not. If it is, out comes an encrypted one, if it is not then it will be an encrypted zero. The server then multiplies the file itself by this encrypted result and adds it to a running total. All the files which weren't chosen will be zeroed out and the single file you want will end up in this "register", although the server has no idea which one you wanted and which ones were zeroed out.
Nobody said these things weren't possible, just that homomorphic encryption out of the box does not do them. There are recent techniques for functional encryption, which use FHE as a component, that allow these exact scenarios. As you pointed out though, you have to be very careful if you don't want to ruin security. The way they work now, you can supply a server with a specific token which allows him to evaluate one very specific function on your encrypted data and get the plaintext result of that function. For instance, you could give your email server a token which allows him to run spam filtering over your incoming emails and output a plaintext bit which is '1' if it is spam and '0' if it is not. The security property of these schemes is that you cannot learn any information other than the output of this function run over encrypted data. It is veeery tricky at this point because you could leak some dangerous information unknowingly, but the techniques do exist.
Nah because it is also semantically secure. For every plaintext value, there are lots and lots of different ciphertext values (it is a one-to-many mapping). So many, in fact, that knowing the plaintext value of any particular ciphertext doesn't give you any good information.
:-)
You do have the problem that anyone can come up with any valid ciphertext and jam it into your computation wherever they want, but we have some other cool stuff for verifying that only authenticated ciphertexts were used and that the function the guy evaluated was the one he was supposed to do. Lots of very cool stuff
So, let me first say that the main selling point for this technology is that it allows you to outsource your computation. You can use a low powered device like a cell phone and take advantage of more powerful computation in the cloud, while maintaining data privacy. You are correct that certain things will be leaked, like if I am storing encrypted email and I search for all emails sent by so-and-so then the server would learn how many emails I have from that guy. This is still a huge advantage over what we have now.
Now, I can outline a cool use that you probably have not thought of which is a little different. Imagine that a server is storing some really sensitive stuff for me. Obviously I don't trust the server so I am encrypting all my files. If he is really sneaky, however, he can learn something about the contents of those files by watching when, where and how often I access them. We call this the access pattern, and usually people just write this off as a cost of doing business. However, with homomorphic encryption we can hide even that!
Since I can evaluate any program homomorphically over my data, I write a program that says "return file number x" and give it an encrypted value, say 50, for x. The server now evaluates this program, with my encrypted 50, over the entire set of files. What he gives back to me is my file that I wanted, but from his point of view he can't actually tell which file he gave me! All he knows is he ran a circuit over all the files in the database, with my input that specifies which one I want, but he can't tell what my input is because it is encrypted.
If your scheme is over a finite field, then E(a) - E(a) will just be a literal zero. Technically this is an encryption of zero, but it has not security as it is an encryption of zero under every possible key. Knowing it doesn't really give anything to an adversary.
It is not that the algorithms have not been well tested. Actually the "algorithms" do not really need to be tested as we have security proofs that reduce to hardness assumptions. The problem is that the hardness assumptions simply have not been around long enough to be thoroughly vetted. RSA is based on the hardness of the RSA assumption (related to the hardness of factoring) which has had 40 years to be broken. These new homomorphic encryption schemes have been around for at most 4 years, which is NOTHING in the scheme of things.
You're missing the point, this is not being released so that people can actually use it. Keys in this system measure in the gigabytes, and single multiplications take seconds. It is so researchers can apply their new ideas on how to make it more efficient without having to code this whole thing from scratch. It is a great help for research.
Sure, that's pretty easy. We can even do it without homomorphic encryption (do a google scholar search for encrypted keyword search, there is lots of stuff with varying levels of security and utility). A quick explanation, which might convince you, is that you can actually run any program you can think of using fully homomorphic encryption.
First, suppose that any program can be written down as a circuit (it can). Now, we know from boolean theory that AND gates and XOR gates are functionally complete, that is any circuit you can write with other gates, you can rewrite with only AND and XOR gates. You might know it with AND and NOT gates, but it is easy to show that XOR can actually implement NOT so that also works. Now, with two bits, AND is really just multiplying the bits (1 AND 1 = 1, any other combo has at least one zero and is 0). XOR, on the other hand is just adding the bits, with the weird case that 1 + 1 = 0, which actually is bitwise addition ignoring the carry (mod 2). So, now we can implement any program as a circuit, any circuit with only AND and XOR gates, and we can do those AND and XOR gates with addition and multiplication. Therefore, we can do anything we want over homomorphically encrypted ciphertexts, with a suitable compiler that will translate our program into those operations!
Now that is a very general construction, so it might not be particularly interesting to your problem, but what it does show is that we can really do anything over the encrypted inputs that we could do with unencrypted inputs. Since Google can run their search algorithm over your unencrypted query, they would also be able to do it over your encrypted query.
It can be made public key. It is also semantically secure, so knowing the plaintext value of any one ciphertext gives you no information about the value of other ciphertexts.
I can't say for sure, I am not familiar with that guy's work. There are many other systems that provide verifiable end-to-end voting (Punchscan and Scantegrity for example, which I have personall worked on) which do not require anything more than a secure symmetric key encryption scheme.
There is a famous voting system by Rivest which does use homomorphic encryption though, and I can explain to you why it still remains efficient. This article is specifically about fully homomorphic encryption schemes, which are universally very very slow. For a long time now, however, we have had singularly homomorphic encryption schemes, under which you can calculate either sums or products of ciphertexts, but not both. For instance, RSA is actually homomorphic under multiplication (c1^e * c2^e = (c1*c2)^e). There is another encryption called Paillier which is very well understood and is homomorphic under addition. Using that, we can have secure elections which require only homomorphic addition.
To see why this suffices, you have to know how such a system would actually work. In it's most basic form, the voters could encrypt a one for the candidate they want to vote for and a zero for the candidates they don't want to vote for. The judge then adds up all the ciphertexts to obtain an encrypted sum of the votes for each candidate. There are some subtleties to deal with, like how do you prevent someone from encrypting a 10 and getting more votes for their guy, but we can deal with these too. The point is, for election purposes we can get by with just a single homomorphism and it can actually be pretty fast.
I think you should be covered in this case just by IND-CPA security of your encryption, but I'm not 100% on that. The randomness is an input to your function, and the inputs should be hidden no matter what. I'll ponder it when I get a chance and see if I can't come up with a proof. What paper did you get this technique from by the way?
I don't think this release is targeted at general developers or people who just want to toy around with it. FHE is not anywhere near that level of maturity. It is mostly for researchers who are trying to make this stuff faster and want to try out some new theoretical twist that they came up with, but not have to code this whole mess of a thing from scratch. I haven't looked too deeply at the code, but it seems you need a good understanding of the ciphertext representation at least (which is very different from traditional ciphers) in order to make use of it.
Gotcha that makes sense. A lot less efficient but it doesn't require any additional assumptions. It doesn't seem to me that it would require function privacy, the encryption function should be public anyway. Also you have to be sure that your encryption homomorphically supports its own Eval function, which I don't think is guaranteed. Otherwise, you would not be able to do any homomorphic operations on your double-encrypted ciphertexts.
Gotcha :D It is actually not that bad to do division though, depending on your definition of division. Usually these schemes operate over a finite field, where division is just multiplication by an inverse element. If you mean floating point division, then yeah that would be a mess I suppose.
Fortunately we are way far off from that kind of reality. It takes on the order of seconds, at least, to do a homomorphic multiplication. To emulate an entire instruction set would be, conservatively, trillions of times slower than running it unencrypted.
Yes, unfortunately the size of the key is measured in gigabytes and it takes several seconds to compute one multiplication.
Nah, this wouldn't work because the encryption would be deterministic. If I wanted to test whether you had encrypted a certain bitstring I could recreate the process you used and compare the results. The actual way they do it in the Gentry paper is to publish an encryption of 1 and a whole bunch of encryptions of 0. Then your encryption process is to pick a random subset of those zeroes, add them together to get a new encryption of zero and then add the one if you are encrypting a 1. If the sparse subset sum assumption is true, then it will be hard to determine whether this ciphertext is the result of summing a bunch of zeroes and a one or just a bunch of zeroes.
Correct, but how does this relate to the above comment?
This is interesting for me as I am doing my PhD dissertation on homomorphic encryption. The problem we have, as a community, is that the computational assumptions used in these schemes are very new and we cannot come to a consensus as to whether the encryption schemes that result from them are going to remain secure 5, 10 or 20 years from now. For instance, NSA recommends that you only use AES, El Gamal, elliptic curve Diffie Hellman, and SHA-256. The community has butt loads of awesome and interesting protocols and techniques which use things not on this list. NSA is obviously being conservative because they have very important secrets to hide, but it leads to a problem where nothing can new can be "certified" to be secure. How do you even judge that something is "encrypted" if you are using a scheme that some grad student made up like a month ago?
It is a probabilistic encryption. That is, there are many different encryptions of the same value, to the point that knowing the plaintext that corresponds to a particular ciphertext does not give you any information. All modern encryption schemes are like this in order to prevent the attack you describe (or a similar one).
Couldn't the television just decompress as well in that case? I am not saying this has no use in DRM, but I have been doing my PhD work on homomorphic encryption and I personally cannot see it if it exists.