Slashdot Mirror


Encrypted But Searchable Online Storage?

An anonymous reader asks "Is there a solution for online storage of encrypted data providing encrypted search and similar functions over the encrypted data? Is there an API/software/solution or even some online storage company providing this? I don't like Google understanding all my unencrypted data, but I like that Google can search them when they are unencrypted. So I would like to have both: the online storage provider does not understand my data, but he can still help me with searching in them, and doing other useful stuff. I mean: I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there — the result of my encrypted query. Or I ask for the directory structure of my encrypted data (somehow stored in my data too — like in a tar archive), and the server sends it back, without knowing that this encrypted chunk is the directory structure. I googled for this and found some papers, however no software and no online service providing this yet." Can anyone point to an available implementation?

13 of 266 comments

  1. It's not possible even in theory by nahdude812 · · Score: 4, Informative

    It's not possible to do this even in theory, unless you're relying on very weak encryption. The point of encryption is that you can't infer anything about the contents. If Google were able to infer enough to give you meaningful search results (for example, if each word were encrypted by itself and you searched for the encrypted version of the word), it would necessarily know enough to mount a frequency-analysis attack on your data and break it in no time flat, unless the data set were very small (in which case search isn't of much value anyway).

    You'll find that a similar problem plagues any attempt at searching. Searching requires some knowledge, or meta-knowledge, of the material being searched, and that knowledge necessarily weakens your encryption dramatically.
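    To make the frequency-analysis point concrete, here is a small Python sketch of the weak scheme described above (function names and sample text are hypothetical): each word becomes a deterministic keyed token so a server could match queries by equality, and an attacker who never holds the key simply ranks the tokens by frequency. The reference ranking is a tiny illustrative list, not real corpus statistics.

```python
import hashlib
import hmac
import secrets
from collections import Counter

def encrypt_words(key: bytes, text: str) -> list:
    # Each word "encrypted by itself" with a deterministic keyed function, so a
    # query for a word can be matched by equality. Equal words -> equal tokens.
    return [hmac.new(key, w.encode(), hashlib.sha256).hexdigest()[:12]
            for w in text.lower().split()]

def frequency_attack(tokens: list, reference_ranking: list) -> dict:
    # Attacker side, no key needed: rank tokens by how often they occur and
    # pair them with a ranking of common words in the guessed language.
    ranked = [tok for tok, _ in Counter(tokens).most_common()]
    return dict(zip(ranked, reference_ranking))

key = secrets.token_bytes(32)
doc = "the ship leaves at dawn and the crew meets the captain at the dock"
guesses = frequency_attack(encrypt_words(key, doc), ["the", "of", "and"])
the_token = encrypt_words(key, "the")[0]  # computed here only to check the guess
print(guesses[the_token])  # the
```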

    1. Re:It's not possible even in theory by TheRaven64 · · Score: 5, Interesting

      It is possible. When you upload the data, you also upload an index. When you connect again, you download the index (which is much smaller than the data) and search that on the local machine. Neither the index, nor the data, is ever unencrypted on the server.
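      A minimal Python sketch of this index-based approach, assuming a single symmetric key that only the client holds. The names `seal`, `unseal`, `upload` and `search` are hypothetical, and the HMAC-SHA256 stream construction is a stdlib-only stand-in so the example runs without dependencies; a real system would use a vetted AEAD cipher. The server still sees blob sizes and which blobs get fetched.

```python
import hashlib
import hmac
import json
import re
import secrets

# Illustration-only authenticated encryption built from the stdlib:
# an HMAC-SHA256 keystream (counter mode) plus an HMAC tag over nonce||body.
# Real code should use a vetted AEAD (e.g. AES-GCM) from a crypto library.
def _subkey(key: bytes, label: bytes) -> bytes:
    return hmac.new(key, label, hashlib.sha256).digest()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    return nonce + body + tag

def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("blob failed authentication")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

def upload(key: bytes, docs: dict) -> dict:
    """Client side: encrypt each document and a word -> [blob id] index.
    Returns what the server stores: opaque blobs under random ids."""
    ids = {name: secrets.token_hex(8) for name in docs}
    words = {}
    for name, text in docs.items():
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            words.setdefault(word, []).append(ids[name])
    index = {"words": words, "names": {i: n for n, i in ids.items()}}
    server = {"index": seal(key, json.dumps(index).encode())}
    for name, text in docs.items():
        server[ids[name]] = seal(key, text.encode())
    return server

def search(key: bytes, server: dict, word: str) -> dict:
    """Client side: download and decrypt the (small) index, look the word up
    locally, then fetch and decrypt only the matching documents."""
    index = json.loads(unseal(key, server["index"]))
    hits = index["words"].get(word.lower(), [])
    return {index["names"][i]: unseal(key, server[i]).decode() for i in hits}

key = secrets.token_bytes(32)
store = upload(key, {"notes.txt": "Meet Alice at noon", "todo.txt": "Buy milk"})
print(search(key, store, "alice"))  # {'notes.txt': 'Meet Alice at noon'}
```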

      As for frequency analysis, I don't think any encryption algorithms published in the last 40 years have been vulnerable to this sort of attack...
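      A caveat worth illustrating: this holds when a cipher is used with a fresh random IV or nonce. The per-word searchable schemes discussed in this thread need equal words to encrypt equally, and that reintroduces word-level frequency patterns. A toy Python comparison (illustration-only construction, hypothetical names):

```python
import hashlib
import hmac
import secrets
from collections import Counter

KEY = secrets.token_bytes(32)

def deterministic_token(word: str) -> bytes:
    # Same word -> same output every time; this is what lets a server match a
    # per-word query, and also what exposes word frequencies.
    return hmac.new(KEY, word.encode(), hashlib.sha256).digest()

def randomized_ciphertext(word: str) -> bytes:
    # Toy stream encryption with a fresh random nonce per call, so equal words
    # produce unrelated ciphertexts (illustration only; no integrity check).
    data = word.encode()
    if len(data) > 32:
        raise ValueError("toy example handles words of up to 32 bytes")
    nonce = secrets.token_bytes(16)
    pad = hmac.new(KEY, nonce, hashlib.sha256).digest()
    return nonce + bytes(b ^ p for b, p in zip(data, pad))

words = "the cat and the dog and the bird".split()
det = Counter(deterministic_token(w) for w in words)
rnd = Counter(randomized_ciphertext(w) for w in words)
print(max(det.values()))  # 3: "the" repeats as the same token
print(max(rnd.values()))  # 1: every ciphertext is distinct
```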

      --
      I am TheRaven on Soylent News
    2. Re:It's not possible even in theory by TheRaven64 · · Score: 5, Informative

      Replying to myself: the scheme in the linked paper is not feasible. Its searches are O(n), but the scheme also means that the amount of data you need to upload for a query is equal to the total amount stored. Since most consumer Internet links are asymmetric, it would be cheaper and easier to simply download the entire data set and search it locally. The paper proposes having a server-side cache. For a typical block cypher, this means you would have a cache of every search term encrypted for each block. The server could then compare this cache to each block without knowing what the plaintext is. This is not useful in any real-world scenario: the cache would be orders of magnitude bigger than the stored data, and the search would still be O(n), which is painfully slow. As I suggested above, uploading an encrypted index with the data makes more sense. Look at Apache Lucene or Apple's SearchKit for how to do this.
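      The asymmetric-link argument is simple arithmetic. A sketch with hypothetical figures (a 10 GB archive on a link with 20 Mbit/s down and 2 Mbit/s up), taking as given the premise that each query is as large as the stored data:

```python
def transfer_hours(gigabytes: float, megabits_per_second: float) -> float:
    # Decimal units: 1 GB = 8,000 megabits; seconds -> hours at the end.
    return gigabytes * 8000 / megabits_per_second / 3600

stored_gb = 10.0   # hypothetical amount of data kept online
down_mbps = 20.0   # hypothetical downstream speed
up_mbps = 2.0      # hypothetical upstream speed (asymmetric link)

# Premise from the comment: a query is as large as the stored data, and it goes *up*.
print(round(transfer_hours(stored_gb, up_mbps), 2))    # 11.11 hours to send one query
print(round(transfer_hours(stored_gb, down_mbps), 2))  # 1.11 hours to fetch everything
```

    With these numbers, one query costs ten times as long as simply downloading the whole archive once.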

      --
      I am TheRaven on Soylent News
    3. Re:It's not possible even in theory by felipekk · · Score: 5, Funny

      Gee guys, isn't this a little bit too much work just to hide your porn?

      Just mark the directory as hidden, your mom will not find it.

  2. You want to... by mhkohne · · Score: 4, Insightful

    Use an encrypted query to match against the encrypted text. The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this: encrypting the original text should make it impossible even to match against it.

    If it didn't, then an attacker who got hold of the encrypted text and some of your encrypted queries might well be able to mount an attack based on commonalities between the two.
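    A toy Python illustration of that leak, assuming deterministic keyed tags per word (hypothetical names and documents): the attacker never learns the queried word, but sees which documents each query matches and which documents share vocabulary.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)

def tag(word: str) -> str:
    # Deterministic keyed tag: the server can match it, and so can any observer.
    return hmac.new(KEY, word.lower().encode(), hashlib.sha256).hexdigest()[:12]

# What the server (or whoever steals its disks) holds: per-document tag sets.
stored = {
    "doc1": {tag(w) for w in "quarterly layoffs planned".split()},
    "doc2": {tag(w) for w in "layoffs denied publicly".split()},
    "doc3": {tag(w) for w in "lunch menu".split()},
}

def server_match(query_tag: str) -> list:
    return sorted(doc for doc, tags in stored.items() if query_tag in tags)

observed = tag("layoffs")                      # the query as the attacker sees it
print(server_match(observed))                  # ['doc1', 'doc2']
print(len(stored["doc1"] & stored["doc2"]))    # 1 shared (still unknown) word
```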

    --
    A thousand pounds of wood moving at 300 feet per minute. Don't get in the way.
  3. Re:Am I missing something? by qbzzt · · Score: 4, Insightful

    You're missing something. SSL is for data that is in transit. The poster wants the data to be encrypted on the server. That's easy - any encryption program can do it. But then s/he also wants to search it. That is harder.

    --
    -- Support a free market in the field of government
  4. Re:Am I missing something? by 3p1ph4ny · · Score: 4, Insightful

    No, this is not what SSL is for at all. SSL you have a party you wish to communicate with, but an insecure channel.

    Here, you don't want to communicate anything useful to anyone. This is more a privacy preserving data mining problem. It goes something like this:

    I have a long list of secret numbers 1...n. I do something to these numbers so that Google doesn't know what they are, and then I send them to Google. Next, I want to know how many numbers are larger than, say, k. So I ask Google, but in a clever way, so that Google doesn't know what I'm asking.

    Google then tells me how many of my original numbers were larger than k. However, Google doesn't know my original numbers, and they don't know what question I asked. There needs to be some theoretical mapping that preserves this privacy, but still allows the data mining to occur.
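    The family of "mappings" described here is homomorphic encryption. Counting values above a hidden threshold needs encrypted comparisons, which take considerably heavier machinery, so this Python sketch shows only the simplest member: textbook Paillier encryption, which is additively homomorphic, so the server can add numbers it cannot read. The primes 10007 and 10009 are toy values, far too small to be secure, and the function names are hypothetical (Python 3.9+ for `math.lcm`).

```python
import math
import secrets

def keygen(p: int, q: int):
    # Textbook Paillier with generator g = n + 1; p and q distinct primes.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, mu is simply lam^-1 mod n
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    # Randomized: a fresh r each time, so equal plaintexts encrypt differently.
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) * pow(r, n, n2) % n2  # (n + 1)^m == 1 + m*n (mod n^2)

def decrypt(n: int, sk, c: int) -> int:
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    # Server side: multiplying ciphertexts adds the hidden plaintexts (mod n).
    return c1 * c2 % (n * n)

n, sk = keygen(10007, 10009)            # toy primes, insecure sizes
secret_numbers = [12, 7, 30, 1]
uploaded = [encrypt(n, x) for x in secret_numbers]
total = uploaded[0]
for c in uploaded[1:]:
    total = add_encrypted(n, total, c)  # the server never decrypts anything
print(decrypt(n, sk, total))  # 50
```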

  5. A guy walks into a bar... by skathe · · Score: 5, Insightful

    ...and when the bartender asks him what he would like to drink, the guy says "I want what I always get, but I don't want you to actually pour the drink, just help me search behind the bar for the liquor I want, and then hand it to me without seeing what it actually is, and charge me correctly without any knowledge of what it is you just helped me find."

  6. Re:huh? by HTH+NE1 · · Score: 5, Funny

    if the server cannot decipher the query it cannot execute it on a binary blob of encrypted data. FAIL.

    Gung jbhyq qrcraq ba ubj gevivny lbhe rapelcgvba zrgubq vf.

    --
    Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
  7. Re:huh? by oldspewey · · Score: 4, Insightful

    Well that depends whether the OP wants to perform something like a fulltext search (i.e. the ability to look for keywords within the content of each document) or a metadata search.

    There's nothing to prevent you setting up a CMS where each piece of content is encrypted, but the metadata describing that content is out in the clear and searchable. Security in such a scenario would be less than optimal (e.g. people could guess certain things about your content based on the statistical pattern of length for each of the millions of encrypted content items), and of course you'd have to be very careful about the metadata fields and how you are populating them.
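    A minimal Python sketch of that split, with hypothetical field names and a stdlib-only toy cipher: the server filters on cleartext metadata and never touches the encrypted bodies, while the ciphertext lengths still leak plaintext lengths.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

@dataclass
class Item:
    metadata: dict     # stored in the clear, so the server can search it
    ciphertext: bytes  # opaque to the server

def toy_encrypt(key: bytes, text: str) -> bytes:
    # Illustration-only stream cipher: HMAC-SHA256 keystream under a random
    # nonce, no integrity tag. Use a vetted AEAD in practice.
    data = text.encode()
    nonce = secrets.token_bytes(16)
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return nonce + bytes(d ^ s for d, s in zip(data, stream))

def server_search(items: list, **criteria) -> list:
    # Runs server-side over plaintext metadata only; content is never decrypted.
    return [it for it in items
            if all(it.metadata.get(k) == v for k, v in criteria.items())]

key = secrets.token_bytes(32)
items = [
    Item({"type": "invoice", "year": 2008}, toy_encrypt(key, "ACME, $1,200")),
    Item({"type": "letter", "year": 2008}, toy_encrypt(key, "Dear Bob")),
    Item({"type": "invoice", "year": 2007}, toy_encrypt(key, "Widgets, $90")),
]
print(len(server_search(items, type="invoice", year=2008)))  # 1
# The length leak: ciphertext size (minus the 16-byte nonce) equals plaintext size.
print([len(it.ciphertext) - 16 for it in items])  # [12, 8, 12]
```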

    --
    If libertarians are so opposed to effective government, why don't they all move to Somalia?
  8. Re:huh? by needs2bfree · · Score: 4, Informative

    For the n00bs, the above post is in ROT13. Here is a link for a converter.
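    The linked converter did not survive in this mirror; Python's standard library includes a ROT13 codec, so a one-line decoder is enough:

```python
import codecs

def rot13(text: str) -> str:
    # ROT13 shifts each ASCII letter 13 places; applying it twice restores the text.
    return codecs.decode(text, "rot_13")

print(rot13("Gung jbhyq qrcraq ba ubj gevivny lbhe rapelcgvba zrgubq vf."))
# That would depend on how trivial your encryption method is.
```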

  9. Re:Am I missing something? by jcwayne · · Score: 4, Funny

    It can't, that's why I use Live Search. It doesn't understand the query, the data, or the result. Unfortunately, for the OP, it doesn't support encryption.

    --
    Failure to follow this advice may result in non-deterministic behavior.
  10. Re:Am I missing something? by vidarh · · Score: 4, Informative

    No, it's not impossible. It's not even particularly hard. You do have some limitations, though:

    Search works by tokenizing a document and creating an inverted index from tokens to documents. The tokens do not need to mean anything to the search engine. If you generate the tokens on the client and don't transmit the dictionary that maps each word to its token id, you can have "encrypted search".
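    A minimal Python sketch of that split, with hypothetical class names: the client keeps the word-to-token dictionary, and the server builds a normal inverted index over tokens it cannot interpret. Document bodies would be uploaded encrypted separately (not shown).

```python
import re
import secrets
from collections import defaultdict

class Client:
    """Keeps the private word -> token-id dictionary; it is never uploaded."""

    def __init__(self):
        self.dictionary = {}

    def token_for(self, word: str) -> str:
        word = word.lower()
        if word not in self.dictionary:
            # Random id: carries no information about the word itself.
            # (Querying an unseen word just mints an id that matches nothing.)
            self.dictionary[word] = secrets.token_hex(8)
        return self.dictionary[word]

    def tokenize(self, text: str) -> set:
        # A set: each token appears at most once per document (no positions).
        return {self.token_for(w) for w in re.findall(r"[a-z0-9]+", text.lower())}

class Server:
    """An ordinary inverted index, except that its keys are opaque token ids."""

    def __init__(self):
        self.index = defaultdict(set)  # token id -> document ids

    def add(self, doc_id: str, tokens: set) -> None:
        for tok in tokens:
            self.index[tok].add(doc_id)

    def lookup(self, token: str) -> set:
        return set(self.index.get(token, ()))

client, server = Client(), Server()
server.add("doc-1", client.tokenize("Encrypted search is possible"))
server.add("doc-2", client.tokenize("Search engines build inverted indexes"))
print(sorted(server.lookup(client.token_for("search"))))  # ['doc-1', 'doc-2']
```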

    The problem with doing that directly is that proximity-based search needs information on token order, and the provider could run frequency analysis to come up with plaintext guesses if they guess the language right. You can counteract that by mapping the same word to multiple tokens to even out the frequency of each token id, but then you need to search for multiple tokens to find all occurrences of a word.
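    A sketch of the frequency-flattening idea, assuming the client rotates each word through `k` random token ids (the class, `k`, and the rotation policy are illustrative choices, not from the post). The index keeps positions so proximity search stays possible, and a one-word query has to ask for all `k` ids.

```python
import secrets
from collections import Counter, defaultdict

class FlatteningClient:
    """Hypothetical client that maps each word to k interchangeable token ids
    and rotates through them, spreading a frequent word over k ids."""

    def __init__(self, k: int):
        self.k = k
        self.aliases = {}             # word -> k random token ids (kept private)
        self.uses = defaultdict(int)  # word -> occurrences tokenized so far

    def _ids(self, word: str) -> list:
        if word not in self.aliases:
            self.aliases[word] = [secrets.token_hex(8) for _ in range(self.k)]
        return self.aliases[word]

    def tokenize(self, text: str) -> list:
        """Return the token stream in document order (order kept for proximity)."""
        stream = []
        for word in text.lower().split():
            stream.append(self._ids(word)[self.uses[word] % self.k])
            self.uses[word] += 1
        return stream

    def query_tokens(self, word: str) -> list:
        """A search for one word must ask for all k of its token ids."""
        return list(self._ids(word.lower()))

def build_index(doc_id: str, stream: list) -> dict:
    index = defaultdict(set)  # token id -> {(doc id, position)}
    for pos, tok in enumerate(stream):
        index[tok].add((doc_id, pos))
    return index

text = "the " * 8 + "cat dog"
plain = FlatteningClient(k=1).tokenize(text)
flat_client = FlatteningClient(k=4)
flat = flat_client.tokenize(text)
print(max(Counter(plain).values()))  # 8: "the" stands out
print(max(Counter(flat).values()))   # 2: "the" is split over 4 ids
index = build_index("doc-1", flat)
hits = set().union(*(index.get(t, set()) for t in flat_client.query_tokens("the")))
print(len(hits))  # 8
```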

    If you don't care about word proximity, it's much safer, as the index would contain each token at most once per document.
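    The difference shows up directly in what the server stores. A small Python comparison over hypothetical token ids: a positional index exposes how often each token occurs in a document, while a presence-only index records each token at most once per document.

```python
from collections import defaultdict

def positional_index(docs: dict) -> dict:
    # token -> list of (doc, position): supports phrase/proximity queries,
    # but also exposes per-document counts and token order to the server.
    idx = defaultdict(list)
    for doc, tokens in docs.items():
        for pos, tok in enumerate(tokens):
            idx[tok].append((doc, pos))
    return idx

def presence_index(docs: dict) -> dict:
    # token -> set of docs: each token is recorded at most once per document.
    idx = defaultdict(set)
    for doc, tokens in docs.items():
        for tok in tokens:
            idx[tok].add(doc)
    return idx

# Opaque token ids as the server would see them (hypothetical values).
docs = {"d1": ["t7", "t3", "t7", "t7", "t9"], "d2": ["t3", "t7"]}
print(len(positional_index(docs)["t7"]))   # 4 occurrences: frequency leaks
print(sorted(presence_index(docs)["t7"]))  # ['d1', 'd2']
```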