Encrypted But Searchable Online Storage?

Posted by timothy on Thursday April 16, 2009 @08:20AM from the give-some-to-that-lawyer dept.

An anonymous reader asks "Is there a solution for online storage of encrypted data providing encrypted search and similar functions over the encrypted data? Is there an API/software/solution or even some online storage company providing this? I don't like Google understanding all my unencrypted data, but I like that Google can search them when they are unencrypted. So I would like to have both: the online storage provider does not understand my data, but he can still help me with searching in them, and doing other useful stuff. I mean: I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there — the result of my encrypted query. Or I ask for the directory structure of my encrypted data (somehow stored in my data too — like in a tar archive), and the server sends it back, without knowing that this encrypted chunk is the directory structure. I googled for this and found some papers, however no software and no online service providing this yet." Can anyone point to an available implementation?

It's not possible even in theory by nahdude812 · 2009-04-16 08:25 · Score: 4, Informative

It's not possible to do this even in theory, unless you're relying on very weak encryption. The point of encryption is that you can't infer anything about the contents. If Google was able to infer enough to give you meaningful search results (if for example each word was encrypted by itself, and you searched for the encrypted version of the word), they would therefore necessarily be able to know enough to perform a frequency analysis attack on your data and compromise it in no time flat unless it was a very small amount of data (thus meaning search isn't really of value anyway).
You'll find a similar problem plagues any attempt at searching. Searching requires a certain knowledge or meta knowledge of the material being searched; and that knowledge necessarily dramatically weakens your encryption.
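To make the frequency-analysis point concrete, here is a minimal Python sketch. It assumes a deterministic per-word scheme, with HMAC standing in for "each word encrypted by itself"; the key, the sample text, and the `encrypt_word` helper are invented for illustration.

```python
import hashlib
import hmac
from collections import Counter

def encrypt_word(key: bytes, word: str) -> str:
    # Deterministic stand-in for per-word encryption (an assumption of this
    # sketch): the same word always maps to the same opaque token.
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()[:8]

key = b"hypothetical-secret-key"
text = "the cat saw the dog and the dog saw the cat run"
tokens = [encrypt_word(key, w) for w in text.split()]

# The server sees only tokens, but their frequency profile matches
# the frequency profile of the plaintext words.
plain_counts = sorted(Counter(text.split()).values(), reverse=True)
token_counts = sorted(Counter(tokens).values(), reverse=True)
most_common_token = Counter(tokens).most_common(1)[0][0]
```

An observer who guesses the language can pair the most frequent token with the most frequent word and work down the list from there.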

--
Slay a dragon... over lunch!
1. Re:It's not possible even in theory by TheRaven64 · 2009-04-16 08:28 · Score: 5, Interesting
  
  It is possible. When you upload the data, you also upload an index. When you connect again, you download the index (which is much smaller than the data) and search that on the local machine. Neither the index, nor the data, is ever unencrypted on the server.
  As for frequency analysis, I don't think any encryption algorithms published in the last 40 years have been vulnerable to this sort of attack...
  
  --
  I am TheRaven on Soylent News
2. Re:It's not possible even in theory by TheRaven64 · 2009-04-16 08:34 · Score: 5, Informative
  
  Replying to myself: the scheme in the linked paper is not feasible. It performs O(n) searches, but this means that the amount of data you need to upload for the query is equal to the total amount stored. Since most consumer Internet links are asymmetric, it would be cheaper and easier to simply download the entire data set and search it locally. The paper proposes having a server-side cache. This means that, for a typical block cypher, you would have a cache of every search term encrypted for each block. The server could then compare this to each block, but would not know what the plaintext is. This is not useful in any real-world scenario. The cache would be orders of magnitude bigger than the stored data and the search would still be O(n), which is painfully slow. As I suggested above, uploading an encrypted index with the data makes more sense. Look at Apache Lucene or Apple's SearchKit for how to do this.
  
  --
  I am TheRaven on Soylent News
3. Re:It's not possible even in theory by FredFredrickson · 2009-04-16 08:34 · Score: 3, Interesting
  
  Mozy does this for personal/business backups. You can use a completely private key, but search your own data.
  
  --
  Belief? Hope? Preference?The Existential Vortex
4. Re:It's not possible even in theory by smallfries · 2009-04-16 08:40 · Score: 3, Insightful
  
  I'm curious - why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
5. Re:It's not possible even in theory by cakeninja · 2009-04-16 08:57 · Score: 3, Informative
  
  Mozy does not encrypt your file names. Someone without your private key could still view your file names if they had your Mozy login information.
6. Re:It's not possible even in theory by flaming+error · 2009-04-16 09:00 · Score: 2, Insightful
  
  > the amount of data you need to upload for the query is equal to the total amount stored
  That's not how I read it. But the approach still sounds useless:
  
  If Alice wants to search for the word W, she can tell Bob (the server) the word W and the k_i corresponding to each location i in which W may occur
  What's the use of encrypting the data if you're going to send keywords in cleartext to a party you're trying to hide the data from?
7. Re:It's not possible even in theory by felipekk · 2009-04-16 09:04 · Score: 5, Funny
  
  Gee guys, isn't this a little bit too much work just to hide your porn?
  Just mark the directory as hidden, your mom will not find it.
8. Re:It's not possible even in theory by Anonymous Coward · 2009-04-16 09:18 · Score: 2, Informative
  
  "why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?"
  Because it can't. The one paper proposes (unless I'm missing something!) giving the server the word to search for AND the keys! The security is by frequently rotating the key, and if you KNOW you only wanted to search, say, chapter 1 of a longer document, only give the key for chapter 1. Not very secure!
  If the encrypted data has ANY types of patterns that can be used to infer the contents, the encryption system is weak. The only way to do this is to generate some kind of metadata (search indexes basically) locally, BEFORE you send up the encrypted files, send the metadata up *unencrypted*, and hope the metadata doesn't have sensitive data.
9. Re:It's not possible even in theory by BitZtream · 2009-04-16 09:30 · Score: 2, Informative
  
  And that would practically defeat the purpose of the encryption.
  For the index to be useful it has to provide too much information about the encrypted data. The point of encryption is to ensure that nothing can be inferred about the contents of the encrypted data. If you give them a nice big bunch of information about what's encrypted, why bother encrypting it in the first place?
  Given enough information in the index they could actually derive your encryption key as well with some simple brute forcing.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
10. Re:It's not possible even in theory by goodmanj · 2009-04-16 09:34 · Score: 3, Insightful
  
  Can I have an anti-theft system for my car, so that nobody can steal it but anybody who wants to can take it for an anonymous test-drive?
11. Re:It's not possible even in theory by ecevans · 2009-04-16 10:23 · Score: 2, Insightful
  
  This is true that what you're describing would work, but you're talking about a translation, not an encryption. Using a good encryption scheme, all encrypted instances of a given string would not be the same.
12. Re:It's not possible even in theory by raju1kabir · 2009-04-16 18:43 · Score: 2, Interesting
  
  I'm curious - why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?
  Because the paper doesn't propose any solution that is practical, or which even leads to a practical solution.
  In theory I can cure all forms of cancer - all I have to do is go through each cell in the victim's body and pluck out the cancerous ones.
  
  --
  "Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
You want to... by mhkohne · 2009-04-16 08:25 · Score: 4, Insightful

Use an encrypted query to match against the encrypted text. The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.
If it didn't, then an attacker who got hold of the encrypted text and some of your encrypted queries might well be able to mount an attack based on commonalities between the two.

--
A thousand pounds of wood moving at 300 feet per minute. Don't get in the way.
1. Re:You want to... by noidentity · 2009-04-16 09:27 · Score: 3, Funny
  
  The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.
  NOT TRUE! I use a combination of XOR and rot-13 encryption and I'm able to do text searches just fine. The trick is to encrypt the search string, then it'll work perfectly. This is because the encryption doesn't depend on the position within the text, but that shouldn't hurt security too much.
2. Re:You want to... by Thad+Zurich · 2009-04-16 10:46 · Score: 2, Informative
  
  ROT13 is encoding, not encryption. You transform the information, but you don't conceal any of it. Ziad El Bizri (OP cit.) apparently observes that if you encrypt the keywords individually, then you can submit encrypted keyword queries, and the server can search for them for you. This is great, but why would you want to? The object of a search server is for other people to be able to search the data (otherwise why index it on the server?) With the suggested scheme, only the data owner (or shared key holders) will be able to search the data. It would seem to be just as easy to construct a trustworthy server and then encipher the query traffic, as has already been observed.
Re:Am I missing something? by qbzzt · 2009-04-16 08:26 · Score: 4, Insightful

You're missing something. SSL is for data that is in transit. The poster wants the data to be encrypted on the server. That's easy - any encryption program can do it. But then s/he also wants to search it. That is harder.

--
-- Support a free market in the field of government
Re:Am I missing something? by 3p1ph4ny · 2009-04-16 08:30 · Score: 4, Insightful

No, this is not what SSL is for at all. With SSL, you have a party you wish to communicate with, but an insecure channel.
Here, you don't want to communicate anything useful to anyone. This is more a privacy preserving data mining problem. It goes something like this:
I have a long list of secret numbers 1...n. I do something to these numbers, so that Google doesn't know what they are, and then I send them to Google. Next, I want to know how many numbers are larger than, say k. So, I ask Google, but in a clever way, so that Google doesn't know what I'm asking.
Google then tells me how many of my original numbers were larger than k. However, Google doesn't know my original numbers, and they don't know what question I asked. There needs to be some theoretical mapping that preserves this privacy, but still allows the data mining to occur.
It depends on the encryption by davidwr · 2009-04-16 08:32 · Score: 3, Insightful

If the data is encrypted in independent "chunks" from which search terms can be built then this is trivial: You pre-encrypt your search terms and search for them. Searching a ROT13-encoded document for a word works this way, as each character is encrypted individually and an encrypted search term is made up of encrypted characters.
Once you get past this, it's no longer easy. You basically have to either make the term you are searching for look like all possible values of the encrypted text and return all matches, or decrypt the document somewhere.
If the encryption is good and any particular chunk, extract, or other slicing-and-dicing of the encrypted data without the key looks random, you are pretty much stuck with decrypting it somewhere.
The alternative is to store an index, or at least a list of keywords, in clear text. For example, a document describing how to build a nuclear bomb could have a list of 10 or 20 non-classified keywords attached to it to aid searching. But that's not what you are asking for.
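The ROT13 case can be shown in a few lines of Python, as a sketch only. ROT13 hides nothing, so this illustrates the matching mechanism and not a usable scheme; the sample strings are made up.

```python
import codecs

def rot13(s: str) -> str:
    # Python's standard library ships ROT13 as a text-to-text codec.
    return codecs.encode(s, "rot_13")

stored = rot13("Attack at dawn")   # what the server holds
query = rot13("dawn")              # what the client sends
found = query in stored            # plain substring match on the server
```

The match works only because each character is transformed independently of its position, which is also why the transformation offers no protection.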

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Re:Am I missing something? by deroby · 2009-04-16 08:32 · Score: 2, Insightful

Yes you are =)
SSL only encrypts the transport.
It seems that the poster wants to have his data _stored_ in an encrypted way that is only decipherable by him, not by any of the machines/users at the storage facility. Yet, when he wants to do some search, he somehow expects the server to be able to do so... AFAIK that's not feasible.
(you could store whatever encrypted stuff remotely, but querying will require fetching, reading and decrypting the (relevant portions of) data locally...)

--
If there is one thing to be learned on slashdot, it has to be sarcasm.
A guy walks into a bar... by skathe · 2009-04-16 08:37 · Score: 5, Insightful

...and when the bartender asks him what he would like to drink, the guy says "I want what I always get, but I don't want you to actually pour the drink, just help me search behind the bar for the liquor I want, and then hand it to me without seeing what it actually is, and charge me correctly without any knowledge of what it is you just helped me find."
1. Re:A guy walks into a bar... by richie2000 · 2009-04-16 08:52 · Score: 2, Interesting
  
  But... That's not a valid car analogy since you're not allowed to drink and drive.
  
  --
  Money for nothing, pix for free
2. Re:A guy walks into a bar... by HTH+NE1 · 2009-04-16 08:59 · Score: 3, Funny
  
  Not good enough. The bartender could audit his liquor to see how much of each bottle was dispensed.
  This is why when they do this sort of thing, the gentleman just serves the bartender a National Security Letter and takes more than what he wants without paying a dime.
  
  --
  Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
3. Re:A guy walks into a bar... by dimeglio · 2009-04-16 12:19 · Score: 2, Interesting
  
  ...if I tell you a story in French and you don't understand it, you will have no idea what I told you and will not be able to answer questions about my story. However, if you are able to memorize all I told you phonetically I can ask if I said a word or not just by the sound. Yet you don't know exactly what I asked for, nor the meaning of the answer but you are able to answer that question since it doesn't imply meaning.
  So a possibility for the OP would be to store the information in a language unknown to anyone but the poster. This language would need to be compatible with the search algorithms used by Google. Not very practical but maybe someone can build on this.
  
  --
  Views expressed do not necessarily reflect those of the author.
not impossible; not easy by Lord+Ender · 2009-04-16 08:37 · Score: 2, Interesting

Keep the files on the remote server, encrypted. Keep the search index in a database, encrypted in chunks. Rsync your search database between your local machine and the server. Actual searches of the databases would be done locally.
Result: terrible performance whenever you access your data from a new machine (must sync entire search database). Good performance the rest of the time. Remote server never sees anything but cyphertext.
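A rough Python sketch of that flow, under loud assumptions: the "cipher" is a toy SHA-256 counter keystream used only to keep the example self-contained, the index is encrypted as one blob instead of rsync-friendly chunks, and the key and file names are invented.

```python
import hashlib
import json

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # TOY cipher for illustration only: XOR with a SHA-256 counter keystream.
    # A real system would use an authenticated cipher from a vetted library.
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def build_index(docs: dict) -> dict:
    # Map each word to the sorted list of files that contain it.
    index = {}
    for name, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(name)
    return {w: sorted(names) for w, names in index.items()}

key = b"hypothetical-local-key"
docs = {"a.txt": "attack at dawn", "b.txt": "retreat at dusk"}

# Client: build the index, encrypt it, upload only the ciphertext blob.
blob = keystream_xor(key, b"index-v1", json.dumps(build_index(docs)).encode())

# Later, on a machine holding the key: sync the blob, decrypt, search locally.
local_index = json.loads(keystream_xor(key, b"index-v1", blob))
hits = local_index.get("at", [])
```

The server stores and syncs `blob` but never runs a query, which is where the first-sync cost on a new machine comes from.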

--
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
Re:huh? by HTH+NE1 · 2009-04-16 08:37 · Score: 5, Funny

if the server cannot decipher the query it cannot execute it on a binary blob of encrypted data. FAIL.
Gung jbhyq qrcraq ba ubj gevivny lbhe rapelcgvba zrgubq vf.

--
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
Re:huh? by oldspewey · 2009-04-16 08:38 · Score: 4, Insightful

Well that depends whether the OP wants to perform something like a fulltext search (i.e. the ability to look for keywords within the content of each document) or a metadata search.
There's nothing to prevent you setting up a CMS where each piece of content is encrypted, but the metadata describing that content is out in the clear and searchable. Security in such a scenario would be less than optimal (e.g. people could guess certain things about your content based on the statistical pattern of length for each of the millions of encrypted content items), and of course you'd have to be very careful about the metadata fields and how you are populating them.

--
If libertarians are so opposed to effective government, why don't they all move to Somalia?
I don't understand... by dschuetz · 2009-04-16 08:41 · Score: 2, Funny

...isn't this easy?
Plaintext: "Attack at dawn"
Ciphertext: "lkaoiuast98u;aw"
Search query: "oiua"
Result: "lkaoiuast98u;aw"
What could be simpler?
(no, I'm not an idiot, this is a joke.)
Re:Maybe, maybe not by The+Moof · 2009-04-16 08:42 · Score: 3, Interesting

Maybe something like this -
Create an index of hashes using the unencrypted data on the client.
Encrypt the data on the client so we now have an index of hashes that apply to an encrypted file.
Upload the hash index and the encrypted data file to the server.
To search, hash the search criteria on the client.
Server search the indexes for the hash value, returning a list of encrypted files with an index matching the criteria hash.
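The steps above can be sketched in Python roughly as follows. One deliberate swap: the sketch uses a keyed HMAC where the post says "hash", on the assumption that an unkeyed hash would let the server test guesses from a dictionary. The key, file ids, and function names are invented for illustration.

```python
import hashlib
import hmac

def blind(key: bytes, word: str) -> str:
    # Keyed hash of a single search term; the key stays on the client.
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def client_index(key: bytes, docs: dict) -> dict:
    # docs maps an opaque file id to plaintext; only hashes leave the client.
    return {fid: {blind(key, w) for w in text.split()} for fid, text in docs.items()}

def server_search(index: dict, hashed_term: str) -> list:
    # The server compares opaque hashes; it never sees words or plaintext.
    return sorted(fid for fid, hashes in index.items() if hashed_term in hashes)

key = b"hypothetical-index-key"
index = client_index(key, {"f1": "Attack at dawn", "f2": "Retreat at dusk"})
result = server_search(index, blind(key, "dawn"))
```

The server still learns which files share a term and how often each hashed term is queried, so the frequency concerns raised elsewhere in the thread apply here too.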
querying encrypted data howto by burnin1965 · 2009-04-16 08:45 · Score: 2, Interesting

As long as your query looks something like this...
SELECT * FROM mydata WHERE stuff LIKE '%YToyOntzOjc6InBhY2thZ2UiO3M6MjM5OiKyKHPh9ZawDX6KyA62cMd6p+mjBybGwJyCaNfFb7S.........
Seriously though, if I understand your objective I think it would be feasible to develop something like that, but I don't think it's something you could integrate into Google's search services unless they added something on their end.
You could pass a decryption key along with your query and the server would then decrypt records as it performed the search. It would be very resource intensive.
As a close example, I have a web based password storage application in which I did not want to keep the encryption keys on the same server as the password database. So I generate a key with which to encrypt the records and the user keeps their key and must supply it every time they want to decrypt a record. I don't go so far as to enable searching of the encrypted data, I have a description field specifically for that purpose. The web application is called Passbox and is written in PHP.
What an oxymoron! by hesaigo999ca · 2009-04-16 08:47 · Score: 2, Interesting

Yeah, I'd like to have my cake and eat it too!
The only way this could work is if you had tags in the meta header of the encrypted file telling you that yes, I am encrypted, but I have an image in me, or my encrypted data is of the type accounting.
This might work for indexing searches where you want to be able to return all the files on the pc (encrypted or not) that are images or etc...
Re:huh? by needs2bfree · 2009-04-16 08:47 · Score: 4, Informative

For the n00bs, the above post is in ROT13. Here is a link for a converter.
ROT26 by davidwr · 2009-04-16 08:55 · Score: 2, Funny

I prefer ROT26. It's got built-in steganography to boot.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
It is possible to a certain extent by goombah99 · 2009-04-16 08:56 · Score: 2, Interesting

There are encryption algorithms that allow addition. That is, the sum of two encrypted messages is an encryption of the sum. I've forgotten how these work exactly, I think they are some many to one mapping, and the addition operation is not simply adding the encrypted numerical representations.
I came across these when looking at voting systems that allow N distributed people to vote in a way that sums the result before it is decrypted rather than decrypting to do the sum.
Anyhow what this means is that it is possible to do certain operations on a remote database, like sum a column, without the database knowing the result and without transmitting any additional information inbound or outbound.
You could presumably have your data stored in many forms on the database, each form suited for one type of query. Then you just query the appropriate form to perform the operation of interest.
I'm reasonably sure there is no way to perform very high order operations that one might typically do in a relational database however.
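One well-known scheme with this additive property is the Paillier cryptosystem; whether it is the one remembered above is an assumption. A toy Python sketch with tiny primes (not secure, for illustration only) shows a column being summed while still encrypted. The column values are made up.

```python
import math
import random
from functools import reduce

# Toy Paillier cryptosystem with tiny primes: NOT secure, illustration only.
p, q = 1009, 1013
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)   # modular inverse; valid because the generator is n + 1

def encrypt(m: int) -> int:
    # Randomized: the same plaintext yields many different ciphertexts.
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def add_encrypted(c1: int, c2: int) -> int:
    # "Adding" two ciphertexts is multiplication modulo n squared.
    return (c1 * c2) % n2

column = [120, 45, 300]
encrypted_column = [encrypt(v) for v in column]         # stored on the server
encrypted_total = reduce(add_encrypted, encrypted_column)  # computed on the server
total = decrypt(encrypted_total)                        # only the key holder can do this
```

The server never holds `lam` or `mu`, so it can produce `encrypted_total` without learning the individual values or the sum.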

--
Some drink at the fountain of knowledge. Others just gargle.
That's because they don't encrypt the filenames. by alanfairless · 2009-04-16 09:00 · Score: 2, Insightful

And they can't search inside your documents.
There is a way, kind of: PIR by Naerbnic · 2009-04-16 09:06 · Score: 2, Informative

There is a cryptography technique called Private Information Retrieval which allows you to do just that: Send an encrypted query to a server, let it perform some operations on your behalf, and send you an encrypted query result. The server neither knows the contents of the encrypted data, nor the content of the query, but you have your result nonetheless.
The intuition is that there exists a sort of "black-box" operation which some cryptographic techniques can use. For example, if I have two encrypted bits a and b (where I can't tell what a and b actually are), I can still perform the operation a xor b. The result is encrypted, and I don't know the actual operands or the result, but I know that what came out is indeed the encryption of the xor of the encrypted bits. Such cryptosystems are forms of "Homomorphic Encryption".
Using this, we can then give the server a search term thus encrypted and, using the black-box operation, have it do some set of operations which will reveal the result. The server will execute the exact same set of operations independent of the search term, so it knows nothing (and needs to know nothing) of the search term contents. Of course, this implies that the server has to operate on every element of the encrypted data to do its job, but that's the fundamental tradeoff. If you're willing to accept that, and the additional computational overhead, you can design such a system.
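The "xor of two encrypted bits" building block can be illustrated with the Goldwasser-Micali scheme, which is one cryptosystem that has this property; picking it here is an assumption, not something the description above names. This is a toy Python sketch with tiny primes and is not secure.

```python
import math
import random

# Toy Goldwasser-Micali bit encryption with tiny primes: NOT secure.
p, q = 499, 547        # both are 3 mod 4, so -1 is a non-residue mod each
N = p * q
x = N - 1              # public value: a non-square mod p and mod q

def encrypt_bit(b: int) -> int:
    # A random square encodes 0; a random square times x encodes 1.
    y = random.randrange(2, N)
    while math.gcd(y, N) != 1:
        y = random.randrange(2, N)
    return (y * y * pow(x, b, N)) % N

def decrypt_bit(c: int) -> int:
    # Key holder only: c is a square mod p exactly when the bit is 0.
    return 0 if pow(c, (p - 1) // 2, p) == 1 else 1

def xor_encrypted(c1: int, c2: int) -> int:
    # The server multiplies ciphertexts; the hidden bits get XORed.
    return (c1 * c2) % N

a, b = encrypt_bit(1), encrypt_bit(0)
combined = xor_encrypted(a, b)   # server side, no key needed
```

Decrypting `combined` with the private factor `p` gives 1, the xor of the two hidden bits, even though the server only ever multiplied two opaque numbers.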

--

So there I was, juggling apples and small animals, when I accidentally bit into the wrong one...
Re:Am I missing something? by thtrgremlin · 2009-04-16 09:22 · Score: 2, Interesting

What it sounded like was that he wanted to keep a database with Google that was encrypted and wanted to search it remotely and securely, but without Google being able to look at the data. Even if that were possible, why are you trusting Google with that in the first place? Why not store it somewhere else? I would think you would keep encrypted data on a server and make a secure connection to it. You send your normal query across the encrypted channel to the secure server, it does its regular search and sends the result back across the secured channel. Add to that some secure authentication, and I thought that would have met the objective (even if the implementation is slightly different than described). If you wanted encrypted data stored in an untrusted location (why are you doing this again?) then you would think it would be necessary to hash specific queries as keys to encrypted data lacking the necessary information to decrypt the information remotely. Either way, guess I will see what other people are saying if this question seems more obvious (differently) to someone else.

--
Want Big Business out of government? Take away the incentive and start by getting government out of big business!
I put something similar to this together myself.. by airjrdn · 2009-04-16 09:30 · Score: 3, Interesting

But it may not be everything you're looking for. My requirements were:
1 - Mask the filename
2 - Encrypt the contents
3 - Add recovery data in case the file got damaged
4 - Ability to view unmasked filename from web

I put together a batch file I could drag/drop multiple files onto that used WinRAR to compress the files (individually), with encrypted filenames, a password (of course), and included archive recovery data. It then used ReNamer to encrypt the .rar filenames. After that, I simply FTP'd the files to the server.

I had a webpage that would accept a password, and unencrypt the filenames so they were viewable in readable form on the page. Each one was a hyperlink. There was an extra step required if you wanted the downloaded filename to be unencrypted as well.

After uploading 115G or so, my host alerted me to the fact that they didn't allow me to keep offsite backups there. :) So in the end, I'm not even using it at the moment.

My solution didn't allow me to search within the files, but it did allow me to store files on the server that they had no way of viewing the contents of, or guessing the contents of based on filename.

--

My Tech Posts on Twitter
not hard by zogger · 2009-04-16 09:51 · Score: 2, Interesting

Just use a book (or multiple books) code cipher for your index. You don't need to remember a thing beyond which books and what your key starting number is, the pattern. And if someone is in your house throwing all your books at cracking the remote server, you are already screwed and have much bigger problems, such as they probably already installed a keylogger on you. If you are that much of a target for someone to take that much interest....time for plan B or C then, involving plastic surgery, new ID and some nation where there is no extradition treaty 0_o
Re:huh? by DaveM753 · 2009-04-16 09:51 · Score: 2, Funny

I tried the vi command, but I get this weird error:
'vi' is not recognized as an internal or external command,
operable program or batch file.
Does this mean ROT13 is not compatible with Windows?
Re:Am I missing something? by jcwayne · 2009-04-16 09:59 · Score: 4, Funny

It can't, that's why I use Live Search. It doesn't understand the query, the data, or the result. Unfortunately, for the OP, it doesn't support encryption.

--
Failure to follow this advice may result in non-deterministic behavior.
Re:CONFIRMED: You are missing something. by mcrbids · 2009-04-16 10:45 · Score: 2, Interesting

Sure the *NAME* is "Secure Sockets Layer", and perhaps that was what it was originally developed for, but it's just wrong to say that it can't be used otherwise, and/or that it only encrypts data "in transit", not on a server. Take a look at this:
http://us2.php.net/manual/en/function.openssl-public-encrypt.php
Here's the use of SSL functionality without (ahem) a socket. Right from the docs:
This function can be used e.g. to encrypt message which can be then read only by owner of the private key. It can be also used to store secure data in database.
I routinely use SSL to sign files in order to prove whodunnit. This information is stored alongside the signed document. Whether it's transported subsequently is inconsequential.

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Re:huh? by grassy_knoll · 2009-04-16 13:27 · Score: 2, Funny

I think so... perhaps you should try another operating system?
Be aware though, it's got a weird text editor...

--
A Human Right
Re:Am I missing something? by vidarh · 2009-04-16 17:57 · Score: 4, Informative

No, it's not impossible. It's not even particularly hard. You do have some limitations though:
Search works by tokenizing a document, and creating an inverted index from tokens to documents. The tokens do not need to mean anything to the search engine. If you generate the tokens on the client, and don't transmit the dictionary that maps from word to token id, you can have "encrypted search".
The problem with doing that directly is that if you want to do proximity based search you need information on the token order, and they could do frequency analysis to come up with plain text guesses if they guess the language right. You can counteract that by mapping the same word to multiple tokens to even out the frequency of each token id, but it means you would need to search for multiple tokens to find all occurrences of a word.
If you don't care about word proximity it's much safer, as the index would only contain each token once per document at most.
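The "same word to multiple tokens" idea might look something like this in Python. It is a sketch only: the class name, the token counts per word, and the sample sentence are invented, and a real client would size each word's token pool from its measured frequency.

```python
import random

class HomophonicTokenizer:
    """Client-side dictionary mapping each word to one or more opaque token ids."""

    def __init__(self, tokens_per_word: dict, seed=None):
        self._rng = random.Random(seed)
        self.word_to_tokens = {}
        next_id = 1
        for word, count in tokens_per_word.items():
            # Frequent words get a larger pool of token ids.
            self.word_to_tokens[word] = list(range(next_id, next_id + count))
            next_id += count

    def encode(self, words):
        # Each occurrence picks one of the word's token ids at random.
        return [self._rng.choice(self.word_to_tokens[w]) for w in words]

    def query_tokens(self, word):
        # A search has to ask for every token id the word may have used.
        return set(self.word_to_tokens[word])

tok = HomophonicTokenizer({"the": 4, "cat": 1, "dog": 1}, seed=0)
encoded = tok.encode("the cat the dog the the".split())
the_query = tok.query_tokens("the")
```

Here "the" is spread over four token ids, so no single id stands out, at the cost of a four-token query whenever the client searches for it.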
Re:Am I missing something? by vidarh · 2009-04-16 18:01 · Score: 2, Insightful

Here's a search index: [1,55] [2,103] [2,178] [3,1] [3,2]. Give me all documents with a document id matching the second entry in each pair where the first entry is 2.
Do you know which word "2" represents, or what is in documents 103 and 178?
That's how you do it. You need to ensure there's no way of doing statistical analysis on the token list to recover plaintext info, and you need to not give them the dictionary mapping from plaintext to tokens.
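For illustration, the example index can be run through a small Python sketch. The pairs are the ones given above; the client-side dictionary and its words are invented, since the point is that the server never sees them.

```python
from collections import defaultdict

# (token_id, document_id) pairs exactly as in the example above.
pairs = [(1, 55), (2, 103), (2, 178), (3, 1), (3, 2)]

# Server side: an inverted index from opaque token id to document ids.
index = defaultdict(list)
for token_id, doc_id in pairs:
    index[token_id].append(doc_id)

def server_lookup(token_id: int) -> list:
    # The server matches numbers only; it has no idea what they stand for.
    return index.get(token_id, [])

# Client side: the secret dictionary never leaves the client.
# These words are hypothetical, made up for this sketch.
dictionary = {"dawn": 1, "attack": 2, "retreat": 3}

def client_search(word: str) -> list:
    return server_lookup(dictionary[word])

matches = client_search("attack")
```

The server answers the query for token 2 with documents 103 and 178 without learning which word the client had in mind.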
Re:Am I missing something? by seifried · 2009-04-16 19:19 · Score: 2, Insightful

And these tokens are generated how? Oh yeah, by Google's search engine. Whoops. If you want someone to extract information from data they will by definition be able to extract some amount of information from the data, even if you have everything encrypted, etc. They could do frequency counts of the tokens and convert them to words, traffic analysis (can't encrypt the from/to), etc.