MIT Software Allows Queries On Encrypted Databases
Sparrowvsrevolution writes "CryptDB, a piece of database software that MIT researchers presented at the Symposium on Operating Systems Principles in October, allows users to send queries to an encrypted SQL database and get results without decrypting the stored information. CryptDB works by nesting data in several layers of cryptography (PDF), each of which has a different key and allows a different kind of simple operation on encrypted data. It doesn't work with every kind of calculation, and it's not the first system to offer this sort of computation on encrypted data. But it may be the only practical one. A previous crypto scheme that allowed operations on encrypted data multiplied computing time by a factor of a trillion. This one adds only 15-26%."
Why not just encrypt the database files on the HDD and in memory directly? That way the database can still run really fast and you can use any existing database software.
Mine too... Perhaps AC isn't the way to go.
This is not really the first practical such system, nor have all previous systems been a trillion times slower. As seems to be a pattern with MIT press releases, the press release makes exaggerated claims, but the paper itself is actually quite good and gives proper credit where it's due, discussing a number of previous systems that implement related functionality, and some existing algorithms from the literature that they borrow and implement directly in CryptDB.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Order Preserving Encryption: how is it implemented? Page 4 of the paper simply notes that it exists and points to an article that I have no access to.
I don't understand how this defends against "known plaintext" attacks. Perhaps it's not intended to. Like I said, I have no access to the footnoted OPE article. So, let's say you've got a medical database of private health care info, where the diagnosis is a column. If you can sort it, all the folks with "aids" sort at the top, right above the "alcoholism" diagnosis, with the "worms, intestinal" and I suppose the "zoophilia" people at the bottom.
I suppose the solution is that, unless there is a business need to sort by diagnosis, you don't use OPE for that column; you use DET or, if there is no need for "group by", RND.
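To make the sorting concern concrete, here is a minimal sketch of a toy order-preserving mapping. It is not the scheme from the footnoted OPE article or from CryptDB; the function name `build_toy_ope_table`, the sample diagnoses, and the random-gap construction are all assumptions made for illustration. It only shows the property under discussion: whoever holds the ciphertexts can rank the rows without the key.

```python
import random

def build_toy_ope_table(domain, seed=None, max_gap=1000):
    """Map each plaintext to a random but strictly increasing integer.

    Toy stand-in for order-preserving encryption (hypothetical helper):
    the secret "key" is the lookup table itself.
    """
    rng = random.Random(seed)
    table = {}
    ciphertext = 0
    for value in sorted(set(domain)):
        ciphertext += rng.randint(1, max_gap)  # always larger than the previous one
        table[value] = ciphertext
    return table

# Made-up column values, for illustration only.
diagnoses = ["influenza", "alcoholism", "worms, intestinal", "asthma"]
ope = build_toy_ope_table(diagnoses, seed=42)

# What the server stores: one ciphertext per row.
rows = ["worms, intestinal", "asthma", "influenza", "alcoholism", "asthma"]
stored = [ope[d] for d in rows]

# The server can sort the ciphertexts without the key...
server_order = sorted(range(len(stored)), key=lambda i: stored[i])

# ...and that order matches the alphabetical order of the plaintexts.
leaked_ranking = [rows[i] for i in server_order]
```

Because equal plaintexts also map to equal ciphertexts here, the server learns both the ranking and the frequency of each value, which is why an order-preserving layer only makes sense on columns that really need range queries or sorting.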
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
"MIT is overrated because I can't get into MIT."
Yeah. Keep telling yourself that.
Performing searches -- and other operations -- on encrypted data has some big potential to help protect our privacy in an age where people are losing control of their personal information. Since the data is *never* decrypted by the server, no information will be leaked even if the communication channel between the client and the server is compromised, or the server itself is hacked. If the query is also encrypted before sending, which is the case for most schemes, the server does not learn anything about the operation or the contents of the data. It makes central storage of data a LOT less risky (take note Sony).
In addition to searching, researchers have had some success in doing other operations on encrypted data, such as multiplication and addition. This means that if you encrypt the value 20, and multiply it by the encryption of 2, the decryption of the result will be 40! Pretty amazing, if you ask me. While processing power is still a major problem for most of these schemes (they are far from ready for production-level data volumes), the next 10 years in this field will surely be very exciting.
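The "20 times 2" example can be reproduced with unpadded ("textbook") RSA, which happens to be multiplicatively homomorphic. This is a sketch under stated assumptions: the tiny primes and the helper names `encrypt` and `decrypt` are chosen for illustration, it is not claimed to be the scheme CryptDB uses, and textbook RSA without padding is not secure in practice.

```python
# Toy parameters only: real RSA uses moduli of 2048 bits or more, plus padding
# (and padding destroys the homomorphic property shown here).
p, q = 61, 53
n = p * q          # public modulus, 3233
e = 17             # public exponent
d = 2753           # private exponent: (e * d) % ((p - 1) * (q - 1)) == 1

def encrypt(m):
    """Textbook RSA encryption: m^e mod n."""
    return pow(m, e, n)

def decrypt(c):
    """Textbook RSA decryption: c^d mod n."""
    return pow(c, d, n)

c20 = encrypt(20)
c2 = encrypt(2)

# The server multiplies ciphertexts; it never sees 20, 2, or the private key.
c_product = (c20 * c2) % n

# Only the key holder can decrypt, and gets the product of the plaintexts.
result = decrypt(c_product)
```

This works because (20^e)(2^e) = (20 * 2)^e mod n, so the product of the two ciphertexts is itself a valid encryption of 40, as long as the plaintext product stays below n.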
Mod parent up
I'm a dreamer, the world is my playpen. But hey, I'm a serious person, I can't dream all the time.
Can you even formulate a query if you don't have the key?
The Tao of math: The numbers you can count are not the real numbers.
"A previous crypto scheme that allowed operations on encrypted data multiplied computing time by a factor of a trillion. This one adds only 15-26%."
So, basically you're saying that both schemes slow things down, right?
--PHB
Everything except the Riemann hypothesis. Fuck complex analysis.
No. Encrypted queries operating directly on an encrypted database. Sounds really rad! A snooping third party will only see random gibberish.
Can you even formulate a query if you don't have the key?
As DBA:
show databases;
... list of databases returned.
use patientData; show tables;
"Database changed"
... list of tables returned.
select * from patients;
... encrypted data returned. Damn...
The DBMS is unmodified. It's the data that's encrypted, in a way that SQL can deal with. Mostly: SUM and other math ops are redefined.
I'd have to try it out first, but this is one of the hardest security problems to solve: how do you trust others to maintain the servers your database is on? Even if there are attacks against this, it seriously raises the bar for attackers, especially if your attacker only has access to your database for a brief period of time. I can see attacks against this if your adversary can analyze your queries over a long period of time.
In that case, if I were using such a service I'd make sure that my database name and tables are generated from the real ones with a salted hash (where the salt never leaves my system). Therefore the DBA would not see a database "patientData" but a database "A3FE5653A554ADEC" or similar. Of course if the customer is a hospital, they could guess that it contains patient data. But then, it also could contain staff information, financial data, data about contracts with suppliers of medical equipment, ...
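A minimal sketch of that naming idea, assuming HMAC-SHA256 as the salted hash and a 16-character truncation to match the example above. The function name `obfuscate_name` and the salt value are hypothetical; the salt stands in for a secret that never leaves the client.

```python
import hashlib
import hmac

def obfuscate_name(real_name: str, salt: bytes, length: int = 16) -> str:
    """Derive an opaque, repeatable identifier from a real database or table name."""
    digest = hmac.new(salt, real_name.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length].upper()

# Placeholder secret; in practice this would be generated randomly and kept client-side.
salt = b"example-secret-kept-on-client"

db_alias = obfuscate_name("patientData", salt)
table_alias = obfuscate_name("patients", salt)

# The client maps real names to aliases before sending any SQL to the server.
query = f'select * from "{table_alias}";'
```

The mapping is deterministic, so the client can recompute an alias whenever it needs one instead of storing a lookup table. The identifier is quoted in the query because a hex string may start with a digit, which some databases do not accept in an unquoted name.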
The Tao of math: The numbers you can count are not the real numbers.