Slashdot Mirror


MIT Software Allows Queries On Encrypted Databases

Sparrowvsrevolution writes "CryptDB, a piece of database software that MIT researchers presented at the Symposium on Operating System Principles in October, allows users to send queries to an encrypted SQL database and get results without decrypting the stored information. CryptDB works by nesting data in several layers of cryptography (PDF), each of which has a different key and allows a different kind of simple operation on encrypted data. It doesn't work with every kind of calculation, and it's not the first system to offer this sort of computation on encrypted data. But it may be the only practical one. A previous crypto scheme that allowed operations on encrypted data multiplied computing time by a factor of a trillion. This one adds only 15-26%."
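
A minimal sketch of the layering idea above. It assumes a two-layer "onion": a randomized (RND) outer layer over a deterministic (DET) inner layer, with each layer under its own key. The toy ciphers are built from HMAC-SHA256 in Python's standard library purely for illustration. They are not CryptDB's actual constructions (real systems use vetted block-cipher modes), and all function and key names are hypothetical. As I understand the onion idea, the proxy gives the server a layer key only when a query first needs the operation the inner layer supports.

```python
import hashlib
import hmac
import os


def _prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def _stream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Toy stream cipher: XOR data with HMAC-SHA256(key, iv || counter) blocks.
    blocks = (len(data) + 31) // 32
    stream = b"".join(_prf(key, iv + i.to_bytes(4, "big")) for i in range(blocks))
    return bytes(a ^ b for a, b in zip(data, stream))


def rnd_encrypt(key: bytes, pt: bytes) -> bytes:
    # RND layer: a fresh random IV, so equal plaintexts give unrelated ciphertexts.
    iv = os.urandom(16)
    return iv + _stream_xor(key, iv, pt)


def rnd_decrypt(key: bytes, ct: bytes) -> bytes:
    return _stream_xor(key, ct[:16], ct[16:])


def det_encrypt(key: bytes, pt: bytes) -> bytes:
    # DET layer (SIV-style): the IV is a MAC of the plaintext under a derived
    # subkey, so equal plaintexts always give equal ciphertexts.
    iv = _prf(_prf(key, b"mac"), pt)[:16]
    return iv + _stream_xor(_prf(key, b"enc"), iv, pt)


def det_decrypt(key: bytes, ct: bytes) -> bytes:
    iv, body = ct[:16], ct[16:]
    pt = _stream_xor(_prf(key, b"enc"), iv, body)
    if not hmac.compare_digest(iv, _prf(_prf(key, b"mac"), pt)[:16]):
        raise ValueError("ciphertext failed integrity check")
    return pt


# Hypothetical column keys, held only by the trusted proxy.
k_det, k_rnd = os.urandom(32), os.urandom(32)
column = [b"Boston", b"Cambridge", b"Boston"]

# The server stores RND(DET(value)); the outer layer hides even equality.
stored = [rnd_encrypt(k_rnd, det_encrypt(k_det, v)) for v in column]

# When a query first needs equality on this column, the proxy hands the
# server k_rnd; the server peels the outer layer and can compare DET values.
peeled = [rnd_decrypt(k_rnd, c) for c in stored]
print(stored[0] == stored[2], peeled[0] == peeled[2])  # False True
```

Only the proxy, holding `k_det`, can turn a peeled value back into plaintext, so the server learns which rows are equal but not what they contain.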

16 of 68 comments

  1. Why? by InsightIn140Bytes · · Score: 2

    Why not just encrypt the database files on HDD and memory directly? That way the database can still run really fast and you can use any existing database software.

    1. Re:Why? by Niobe · · Score: 4, Informative

      Reasons I can surmise:
      1 no decryption operation required on server
      2 the data can stay encrypted in transit
      1+2 = more security than on-disk encryption

    2. Re:Why? by Anonymous Coward · · Score: 5, Informative

      Because you want to run your database in the Cloud(tm) for reliability purposes, and you don't want the provider to peek at your data.

    3. Re:Why? by Rary · · Score: 5, Informative

      Why not just encrypt the database files on HDD and memory directly? That way the database can still run really fast and you can use any existing database software.

      A few key phrases from TFA: "...a trick that keeps the info safe from hackers, accidental loss and even snooping administrators ... a useful trick if you need to perform operations on health care or financial data in a situation like cloud computing, where the computer (or the IT administrator) doing the calculations can’t always be trusted to access the private numbers being crunched".

      --

      "You cannot simultaneously prevent and prepare for war." -- Albert Einstein

    4. Re:Why? by Kaz+Kylheku · · Score: 3, Informative

      Because the database is on a remote server, and that is where the queries are executing!

      The model you're describing is that of the database running on the local machine. Data is encrypted between the database server and disk, but not encrypted in the database and not between the database and client. So the database is just a stock program running SQL queries or whatever in the usual way.

      But what if the database must be a remote server? That's how most people use databases, for the purpose of sharing data among many people, scalability, and availability.

      If the data in a database is naively encrypted, then the server cannot perform complex queries. The client must download entire tables, decrypt them, and perform the joins locally. Or so you would think.

      This is the part that these researchers seem to have attacked, from my understanding: somehow get the server to do useful queries on encrypted data without decrypting it, and without the monstrous overhead of the naive solutions.

    5. Re:Why? by Kaz+Kylheku · · Score: 4, Insightful

      Sorry, I don't see how that helps. The idea is that no program on the database server has the key to actually decrypt the data.

      The problem isn't only that you don't trust the network in between, but that you don't trust the database server admins.

    6. Re:Why? by Obfuscant · · Score: 2

      This is the part that these researchers seem to have attacked, from my understanding: somehow get the server to do useful queries on encrypted data without decrypting it, and without the monstrous overhead of the naive solutions.

      I looked through the first few pages of the article. It is very much like how Unix passwords work. You don't decrypt the password in /etc/passwd to see if the user can log in; you encrypt the entered password with the same salt and see if there is a match. The trick is that here the DBMS is not doing the encrypting: a proxy takes the performance hit, allowing the DBMS to run at full speed.
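
To illustrate the password analogy, here is a minimal sketch in which a hypothetical proxy holds the key, replaces a column's values with deterministic keyed tags on insert, and rewrites the query constant the same way. An unmodified DBMS (Python's in-memory `sqlite3` standing in) then only ever compares ciphertexts. The HMAC tag, key, and helper names are assumptions; the tag is one-way, like a password hash, whereas an actual deterministic encryption layer would also let the proxy decrypt results.

```python
import hashlib
import hmac
import sqlite3

KEY = b"proxy-only secret key"  # hypothetical key held by the proxy, never the DBMS


def eq_tag(value: str) -> str:
    # Deterministic keyed tag: same input -> same tag, as with a salted
    # password hash. One-way here; a real DET layer would be decryptable.
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()


db = sqlite3.connect(":memory:")  # stands in for the untrusted DBMS
db.execute("CREATE TABLE employees (id INTEGER, city TEXT)")


def proxy_insert(emp_id: int, city: str) -> None:
    # The DBMS never sees the plaintext city.
    db.execute("INSERT INTO employees VALUES (?, ?)", (emp_id, eq_tag(city)))


def proxy_select_ids_by_city(city: str) -> list[int]:
    # The proxy rewrites the constant; the DBMS compares ciphertexts at full speed.
    rows = db.execute(
        "SELECT id FROM employees WHERE city = ? ORDER BY id", (eq_tag(city),)
    )
    return [r[0] for r in rows]


for i, c in enumerate(["Boston", "Chicago", "Boston"]):
    proxy_insert(i, c)
print(proxy_select_ids_by_city("Boston"))  # [0, 2]
```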

      The text comparison (LIKE) is done by encrypting each token in the DB text and allowing a token equality comparison. You can't, therefore, do a "LIKE 'Boston%'" to find things like "Bostonian".
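
A sketch of the word-level matching described above, assuming each whitespace-separated, lower-cased word gets its own deterministic keyed tag (the key and helper names are hypothetical). It shows why a whole-word query for "Boston" works while a prefix match against "Bostonian" cannot: the two words produce unrelated tags.

```python
import hashlib
import hmac

KEY = b"hypothetical per-column search key"


def word_tag(word: str) -> str:
    # Deterministic keyed tag for one whole word (lower-casing is an assumption).
    return hmac.new(KEY, word.lower().encode(), hashlib.sha256).hexdigest()


def encrypt_for_search(text: str) -> set[str]:
    # The server stores only the set of word tags; word order and repeats are dropped.
    return {word_tag(w) for w in text.split()}


docs = ["moved to Boston last year", "a proud Bostonian"]
encrypted_docs = [encrypt_for_search(d) for d in docs]

# Query for the whole word "Boston": the proxy sends only its tag.
query = word_tag("Boston")
hits = [i for i, tags in enumerate(encrypted_docs) if query in tags]
print(hits)  # [0]
```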

      For comparison operations (select * where salary > 60000), the encryption used maintains order: the encrypted value of 59,999 is less than the encrypted value of 60,000, for example. The paper seems to imply that the equality encryption (cleartext always encrypts to the same ciphertext, so equality of ciphertexts means equality of cleartext) is optional. In reality, order preservation always implies equality: if I search for $val>$x-eps and $val<$x+eps (where eps is the epsilon, or smallest interval in $x), the only answer can be where $val == $x.
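
The order-preservation point, including the epsilon argument, can be sketched with a toy order-preserving encryption: a secret, strictly increasing table built from seeded random gaps over a fixed integer domain. This is an assumption for illustration only. It is not the scheme the paper uses and it is not secure; the class name, key, and salaries are hypothetical. Because the gaps are random, this particular mapping is not a constant offset, although it does require fixing the domain (here 0 to 200,000) in advance.

```python
import itertools
import random


class ToyOPE:
    """Toy order-preserving encryption over integers 0..max_value.

    The secret key seeds random positive gaps; ciphertext(v) is the running
    sum of the first v + 1 gaps, so the mapping is strictly increasing.
    Illustrative only; not a secure OPE scheme.
    """

    def __init__(self, key: int, max_value: int):
        rng = random.Random(key)
        gaps = (rng.randint(1, 1000) for _ in range(max_value + 1))
        self._table = list(itertools.accumulate(gaps))

    def encrypt(self, v: int) -> int:
        return self._table[v]


ope = ToyOPE(key=1234, max_value=200_000)
salaries = {"alice": 59_999, "bob": 60_000, "carol": 85_000}
stored = {name: ope.encrypt(s) for name, s in salaries.items()}

# "salary > 60000": the proxy encrypts the constant, the server compares ciphertexts.
threshold = ope.encrypt(60_000)
over = sorted(n for n, c in stored.items() if c > threshold)
print(over)  # ['carol']

# The epsilon argument: a range one step wide on each side of x is equality.
lo, hi = ope.encrypt(60_000 - 1), ope.encrypt(60_000 + 1)
exact = [n for n, c in stored.items() if lo < c < hi]
print(exact)  # ['bob']
```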

      Hmmm. Just saying that, I realize that, unless the encryption of data in the DBMS is highly dependent on the actual data in the DB, eps must be the smallest step in the encrypted data, and since order is preserved, the only "encryption" is thus an offset (add or subtract a constant). Thus the DB encryption of data must be dependent on the range of data. I wonder if there is any useful information that can be extracted from that fact?

      For corporations, this system would be great. If the DBA didn't pre-define the salary column to be comparable, then nobody could do a "where salary > 100000" to find all the highly paid employees (or "bonus > 1000000", either).

    7. Re:Why? by lgw · · Score: 2

      comparison operations (select * where salary > 60000) the encryption used maintains order. The encrypted value of 59,999 is less than the encrypted value of 60,000,

      I've never understood this bit. If, without the encryption key, I can compare two pieces of data to see which plaintext is less than the other, that seems like a huge hole. For normalized data in the DB, if some of the plaintext is known or guessable, I can probably guess all the values (since normalized values are generally represented by small integers). Heck, if I have "less than", can't I find the plaintext result of subtracting one plaintext value from another, without the key? That's effectively the same as decrypting English text.

      --
      Socialism: a lie told by totalitarians and believed by fools.
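
lgw's concern about small, guessable domains can be made concrete. A minimal sketch, assuming the attacker knows a column holds the codes 1 through 5 and that every code appears at least once: sorting the distinct ciphertexts recovers every value without the key. The toy encryption, column, and data are hypothetical, and the sketch covers only this dense-domain case, not the subtraction question.

```python
import random

# Hypothetical setup: a normalized "status_code" column holding small
# integers 1..5, stored under some order-preserving encryption.
rng = random.Random(42)
secret_cts = sorted(rng.sample(range(10**9), 5))  # toy OPE: enc(code) = secret_cts[code - 1]
enc = {code: secret_cts[code - 1] for code in range(1, 6)}

rows = [3, 1, 5, 3, 2, 4, 1, 5]    # plaintext the attacker never sees
observed = [enc[r] for r in rows]  # what a snooping admin sees

# Attack: if every code appears at least once, the k-th smallest distinct
# ciphertext must be code k. No key is needed.
rank = {ct: i + 1 for i, ct in enumerate(sorted(set(observed)))}
recovered = [rank[ct] for ct in observed]
print(recovered == rows)  # True
```
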
    8. Re:Why? by Kjella · · Score: 3, Insightful

      Well, strictly speaking, they don't need to know. The DBA (the person who makes sure the database is running, upgrades are done, backups are made and so on) often isn't really supposed to be privy to all the information in the database. It's probably the same kind of place where you won't let your developers see production data: the development server has a different encryption key, the production key is set once during install and backed up in a safe, and the production application server is logged to hell and back, including remote logging and audits. The only access anyone is supposed to have to the system is through the application that enforces permissions, logging and all that.

      I've only worked in relatively low-security environments, but I'm perfectly aware that "SELECT * FROM [table]" circumvents anything and everything the application does to protect the data. In many environments that's fine and an accepted risk; if you're managing the database, you should be trusted enough not to go poking about. But I can easily see situations where that's not the case, without everybody jumping up and down about outsourcing. It's nothing personal that they don't trust IT: just as in accounting you don't want one person who can enter an invoice, approve it and take delivery, you don't want one person in IT with all the keys to the castle. That this is the practical reality in many places is because there hasn't been any other convenient enough way; it's not by design.

      --
      Live today, because you never know what tomorrow brings

  2. Re:MIT is overrated by Anonymous Coward · · Score: 4, Funny

    Mine too... Perhaps AC isn't the way to go.

  3. a little bit strong claim by Trepidity · · Score: 4, Informative

    This is not really the first practical system of its kind, nor have all previous systems been a trillion times slower. As seems to be a pattern with MIT press releases, the press release makes exaggerated claims, but the paper itself is actually quite good and gives proper credit where it's due, discussing a number of previous systems that implement related functionality and some existing algorithms from the literature that they borrow and implement directly in CryptDB.

    1. Re:a little bit strong claim by Anonymous Coward · · Score: 5, Insightful

      It's a fundamental tension between the scientists and the PR departments. I see this where I work (at a DoE national lab). Basically, we scientists publish cool results and submit them to the PR department as candidates for press releases. The PR department of course tries to jazz them up as much as it can. So we go back and forth with them for a bit, trying to compromise on something that isn't factually wrong while still being accessible to the general public and giving people a good feel for why our work is important.

      Then the press release is interpreted by media outlets, which dumb it down even more and stretch the claims even further. After even just two or three levels of this, honest, sensible papers turn into grandiose hyperbole. A nice theoretical result on metamaterials becomes "scientists invent invisibility cloak"; work on new semiconductors becomes "world's fastest transistor"; and a paper on tentative correlations between X and Y becomes "X causes Y!" Believe me when I say that most scientists are embarrassed when they see their results exaggerated and misinterpreted like this.

      This is not meant to excuse such behavior. Some PR departments are better than others. At some institutes there is too much pressure from on high to be seen in the media as innovative, revolutionary, and all those other buzzwords. But at the end of the day, scientists have to have the courage (and the authority) to prevent press releases from going out that are so stretched as to be factually incorrect.

    2. Re:a little bit strong claim by reve_etrange · · Score: 2

      It seems typical of most universities' press releases. They have PR divisions which troll the research faculty for new developments they can turn into whiz-bang popularized "articles."

      I think that it's sort of the paradigm for how things are done at most large institutions: the researchers can't be bothered or don't have time to write popular accounts, do extraneous paperwork or file patents, so others are made to do it for them. The result is extraordinary claims in the press releases at best, and serious clerical mistakes or invalid patents at worst.

      --
      .: Semper Absurda :.

    3. Re:a little bit strong claim by icebraining · · Score: 2

      the need to do a shitload of work just to unencrypt each value before using it

      I think the point of the system is that you don't need to unencrypt the values at all to perform the calculations. It's homomorphic encryption.
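
Additively homomorphic schemes such as Paillier's let a server sum values it cannot read, which is the kind of operation meant here (strictly speaking this is partially, not fully, homomorphic encryption). A toy sketch with tiny primes, for illustration only; real deployments use keys of 2048 bits or more, and the function names and salary figures are hypothetical.

```python
import math
import random


def keygen(p: int, q: int):
    # Toy Paillier key generation with g = n + 1 (p and q must be distinct primes).
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, n + 1), (lam, mu)


def encrypt(pub, m: int) -> int:
    n, g = pub
    n2 = n * n
    rng = random.SystemRandom()
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the plaintexts (mod n); no key needed.
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen(1009, 1013)
salaries = [59_999, 60_000, 85_000]
ciphertexts = [encrypt(pub, s) for s in salaries]

# The server computes an encrypted SUM(salary) without the private key.
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(pub, total, c)
print(decrypt(pub, priv, total))  # 204999
```

Sums are only correct while they stay below n, which is one more reason the toy primes here are not usable in practice.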

  4. Order preserving encryption by vlm · · Score: 3, Interesting

    Order Preserving Encryption: how is it implemented? The paper, on page 4, simply states that it exists and points to an article I have no access to.

    I'm not understanding how this resists "known plaintext" attacks. Perhaps it's not intended to. Like I said, I have no access to the footnoted OPE article. So, let's say you've got a medical database of private health care info, where the diagnosis is a column. If you can sort it, all the folks with "aids" sort at the top, right above the "alcoholism" diagnosis, with the "worms, intestinal" and I suppose the "zoophilia" people at the bottom.

    I suppose the solution is that, unless there is a business need to sort by diagnosis, you don't use OPE for that column; you use DET, or, if there is no need for "group by", RND.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
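
vlm's rule of thumb amounts to picking, per column, the least-revealing layer that still supports the queries you need. A minimal sketch; the `Layer` enum, the `choose_layer` helper, and the medical schema are hypothetical, and the one-line leakage summaries are simplified.

```python
from enum import Enum


class Layer(Enum):
    RND = "randomized: reveals nothing, supports no server-side comparisons"
    DET = "deterministic: reveals which values are equal; supports =, GROUP BY"
    OPE = "order-preserving: also reveals order; supports <, >, ORDER BY"


def choose_layer(needs_sort_or_range: bool, needs_equality: bool) -> Layer:
    # Pick the least-revealing layer that still supports the column's queries.
    if needs_sort_or_range:
        return Layer.OPE
    if needs_equality:
        return Layer.DET
    return Layer.RND


# Hypothetical schema for the medical example above.
schema = {
    "diagnosis": choose_layer(needs_sort_or_range=False, needs_equality=True),   # GROUP BY only
    "admitted_on": choose_layer(needs_sort_or_range=True, needs_equality=True),  # date ranges
    "notes": choose_layer(needs_sort_or_range=False, needs_equality=False),
}
for column, layer in schema.items():
    print(f"{column}: {layer.name}")
```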
  5. Re:MIT is overrated by Unoriginal_Nickname · · Score: 2

    "MIT is overrated because I can't get into MIT."

    Yeah. Keep telling yourself that.