Slashdot Mirror


Encrypted But Searchable Online Storage?

An anonymous reader asks "Is there a solution for online storage of encrypted data providing encrypted search and similar functions over the encrypted data? Is there an API/software/solution or even some online storage company providing this? I don't like Google understanding all my unencrypted data, but I like that Google can search them when they are unencrypted. So I would like to have both: the online storage provider does not understand my data, but he can still help me with searching in them, and doing other useful stuff. I mean: I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there — the result of my encrypted query. Or I ask for the directory structure of my encrypted data (somehow stored in my data too — like in a tar archive), and the server sends it back, without knowing that this encrypted chunk is the directory structure. I googled for this and found some papers, however no software and no online service providing this yet." Can anyone point to an available implementation?

266 comments

  1. Am I missing something? by thtrgremlin · · Score: 0

    I thought that was what ssl was for.

    --
    Want Big Business out of government? Take away the incentive and start by getting government out of big business!
    1. Re:Am I missing something? by qbzzt · · Score: 4, Insightful

      You're missing something. SSL is for data that is in transit. The poster wants the data to be encrypted on the server. That's easy - any encryption program can do it. But then s/he also wants to search it. That is harder.

      --
      -- Support a free market in the field of government
    2. Re:Am I missing something? by 3p1ph4ny · · Score: 4, Insightful

      No, this is not what SSL is for at all. With SSL, you have a party you wish to communicate with, but an insecure channel.

      Here, you don't want to communicate anything useful to anyone. This is more a privacy preserving data mining problem. It goes something like this:

      I have a long list of secret numbers 1...n. I do something to these numbers, so that Google doesn't know what they are, and then I send them to Google. Next, I want to know how many numbers are larger than, say k. So, I ask Google, but in a clever way, so that Google doesn't know what I'm asking.

      Google then tells me how many of my original numbers were larger than k. However, Google doesn't know my original numbers, and they don't know what question I asked. There needs to be some theoretical mapping that preserves this privacy, but still allows the data mining to occur.

    3. Re:Am I missing something? by deroby · · Score: 2, Insightful

      Yes you are =)

      SSL only encrypts the transport.

      It seems that the poster wants to have his data _stored_ in an encrypted way that is only decipherable by him, not by any of the machines/users at the storage facility. Yet, when he wants to do some search, he somehow expects the server to be able to do so... AFAIK that's not feasible.

      (you could store whatever encrypted stuff remotely, but querying will require fetching, reading and decrypting the (relevant portions of) data locally...)

      --
      If there is one thing to be learned on slashdot, it has to be sarcasm.
    4. Re:Am I missing something? by thtrgremlin · · Score: 2, Interesting

      What it sounded like was that he wanted to keep a database with Google that was encrypted and wanted to search it remotely and securely, but without Google being able to look at the data. Even if that were possible, why are you trusting Google with that in the first place? Why not store it somewhere else? I would think you could keep encrypted data on a server and make a secure connection to it. You send your normal query across the encrypted channel to the secure server, it does its regular search and sends the result back across the secured channel. Add to that some secure authentication, and I thought that would have met the objective (even if the implementation is slightly different than described). If you wanted encrypted data stored in an untrusted location (why are you doing this again?) then I would think it would be necessary to hash specific queries as keys to encrypted data, with the server lacking the information needed to decrypt that data remotely. Either way, guess I will see what other people are saying if this question seems more obvious (differently) to someone else.

      --
      Want Big Business out of government? Take away the incentive and start by getting government out of big business!
    5. Re:Am I missing something? by xOneca · · Score: 1

      How the hell is Google going to answer a question that it doesn't understand, and give an answer that it can't understand either?

    6. Re:Am I missing something? by __aaclcg7560 · · Score: 1

      That's why they hire the best and the brightest out there to solve problems like that. Discovering the future before the future is known.

    7. Re:Am I missing something? by jcwayne · · Score: 4, Funny

      It can't, that's why I use Live Search. It doesn't understand the query, the data, or the result. Unfortunately, for the OP, it doesn't support encryption.

      --
      Failure to follow this advice may result in non-deterministic behavior.
    8. Re:Am I missing something? by Lodragandraoidh · · Score: 1

      But then s/he also wants to search it. That is harder.

      That is not harder - that is impossible.

      The reason is that in order to search the data, you first have to decrypt the data. If you decrypt the data on the server side, you just compromised your security. End of game.

      --

      Lodragan Draoidh
      The more you explain it, the more I don't understand it. - Mark Twain
    9. Re:Am I missing something? by MaxVT · · Score: 1

      42.

    10. Re:Am I missing something? by vidarh · · Score: 4, Informative
      No, it's not impossible. It's not even particularly hard. You do have some limitations though:

      Search works by tokenizing a document, and creating an inverted index from tokens to documents. The tokens do not need to mean anything to the search engine. If you generate the tokens on the client, and don't transmit the dictionary that maps from word to token id, you can have "encrypted search".

      The problem with doing that directly is that if you want to do proximity based search you need information on the token order, and they could do frequency analysis to come up with plain text guesses if they guess the language right. You can counteract that by mapping the same word to multiple tokens to even out the frequency of each token id, but it means you would need to search for multiple tokens to find all occurrences of a word.

      If you don't care about word proximity it's much safer, as the index would only contain each token once per document at most.
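A minimal sketch of the idea described above, assuming the client derives tokens with a keyed hash (HMAC-SHA256) instead of keeping an explicit word-to-id dictionary. That choice, the key, the function names, and the sample documents are all illustrative assumptions rather than anything specified in the comment. Each token is stored at most once per document, so no word order reaches the server.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret that never leaves the client.
SECRET_KEY = b"client-only secret (illustrative)"


def token_for(word: str) -> str:
    """Map a word to an opaque token using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, word.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def index_document(index: dict, doc_id: int, text: str) -> None:
    """Client side: record each distinct token once per document (no word order)."""
    for word in set(text.split()):
        index[token_for(word)].add(doc_id)


def server_lookup(index: dict, token: str) -> list:
    """Server side: only opaque tokens and document ids are visible here."""
    return sorted(index.get(token, ()))


index = defaultdict(set)
index_document(index, 103, "quarterly tax report")
index_document(index, 178, "tax receipts for travel")
index_document(index, 55, "holiday photos")

print(server_lookup(index, token_for("tax")))     # [103, 178]
print(server_lookup(index, token_for("budget")))  # []
```

The server can still count how often each token occurs across documents, so the frequency-analysis caveat raised above applies to this sketch as well.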

    11. Re:Am I missing something? by vidarh · · Score: 2, Insightful
      Here's a search index: [1,55] [2,103] [2,178] [3,1] [3,2]. Give me all documents with a document id matching the second entry in each pair where the first entry is 2.

      Do you know which word "2" represents, or what is in documents 103 and 178?

      That's how you do it. You need to ensure there's no way of doing statistical analysis on the token list to recover plaintext info, and you need to not give them the dictionary mapping from plaintext to tokens.
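The lookup described in this comment can be sketched in a few lines. The pairs are the ones given above; the `postings` structure and the `documents_for` helper are illustrative names, not part of any real service.

```python
from collections import defaultdict

# (token_id, document_id) pairs exactly as listed in the comment.
pairs = [(1, 55), (2, 103), (2, 178), (3, 1), (3, 2)]

# Group document ids by token id; the server never learns what a token means.
postings = defaultdict(list)
for token_id, doc_id in pairs:
    postings[token_id].append(doc_id)


def documents_for(token_id: int) -> list:
    """Return the document ids stored under an opaque token id."""
    return postings.get(token_id, [])


print(documents_for(2))  # [103, 178]
```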

    12. Re:Am I missing something? by seifried · · Score: 2, Insightful

      And these tokens are generated how? Oh yeah, by Google's search engine. Whoops. If you want someone to extract information from data, they will by definition be able to extract some amount of information from the data, even if you have everything encrypted, etc. They could do frequency counts of the tokens and convert them to words, traffic analysis (can't encrypt the from/to), etc.

    13. Re:Am I missing something? by Anonymous Coward · · Score: 0

      Do you know which word "2" represents, or what is in documents 103 and 178?

      neither does he, unless he works that out on the plaintext data.

    14. Re:Am I missing something? by CarpetShark · · Score: 1

      wanted to search it remotely and securely, but without Google being able to look at the data. Even if that were possible, why are you trusting Google with that in the first place? Why not store it somewhere else?

      Probably because, if it's encrypted and searchable remotely, then you can potentially store it anywhere. And if you can store it anywhere, then backups are going to be cheap.

      Personally, I'd be looking at ways to keep the search index local.

    15. Re:Am I missing something? by mark0978 · · Score: 1

      This kind of tokenization allows token frequency analysis across all the documents; you essentially destroy the value of the encryption by using the same key for all documents.

      Standard PGP uses a different key for each document so the tokens never appear in the same format. When you give this up, you give up a large portion of the security of your documents.

      The only way I see to do this is build the index on your machine and point to the offline storage.

    16. Re:Am I missing something? by psydeshow · · Score: 1

      And these tokens are generated how?

      Keywords and meta-data entered in the client while everything is decrypted. Or automatic indexing by whatever does the encryption.

      You're not wrong to point out that frequency analysis and other techniques would render this less secure than a perfect black box. But it's also significantly better than trusting Google or your ISP with plain text.

    17. Re:Am I missing something? by psydeshow · · Score: 1

      What makes you trust your colo provider or ISP more than you trust Google?

      I have this same feeling myself, but I've never been able to articulate exactly how Google are any different from any other commercial provider. They employ engineers who are smart enough to avoid the kinds of dumb mistakes that result in data leaks. They are too big to care about you personally.

      If I can manage to trust Verizon, why shouldn't I trust Google?

  2. huh? by Anonymous Coward · · Score: 0

    if the server cannot decipher the query it cannot execute it on a binary blob of encrypted data. FAIL.

    1. Re:huh? by HTH+NE1 · · Score: 5, Funny

      if the server cannot decipher the query it cannot execute it on a binary blob of encrypted data. FAIL.

      Gung jbhyq qrcraq ba ubj gevivny lbhe rapelcgvba zrgubq vf.

      --
      Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
    2. Re:huh? by oldspewey · · Score: 4, Insightful

      Well that depends whether the OP wants to perform something like a fulltext search (i.e. the ability to look for keywords within the content of each document) or a metadata search.

      There's nothing to prevent you setting up a CMS where each piece of content is encrypted, but the metadata describing that content is out in the clear and searchable. Security in such a scenario would be less than optimal (e.g. people could guess certain things about your content based on the statistical pattern of length for each of the millions of encrypted content items), and of course you'd have to be very careful about the metadata fields and how you are populating them.

      --
      If libertarians are so opposed to effective government, why don't they all move to Somalia?
    3. Re:huh? by needs2bfree · · Score: 4, Informative

      For the n00bs, the above post is in ROT13. Here is a link for a converter.
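For anyone without a converter to hand, the ROT13 post can be decoded with Python's standard `codecs` module. This is just a quick sketch; the variable names are arbitrary.

```python
import codecs

ciphertext = "Gung jbhyq qrcraq ba ubj gevivny lbhe rapelcgvba zrgubq vf."

# ROT13 shifts each letter 13 places, so applying it twice restores the input.
plaintext = codecs.decode(ciphertext, "rot13")
print(plaintext)  # That would depend on how trivial your encryption method is.
```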

    4. Re:huh? by Anonymous Coward · · Score: 0

      [rot13]That would depend on how trivial your encryption method is.[/rot13]

      Not quite. It would depend on how trivial your definition of encryption is.

    5. Re:huh? by Hognoxious · · Score: 1

      There's nothing to prevent you setting up a CMS where each piece of content is encrypted, but the metadata describing that content is out in the clear and searchable.

      I'd assume that's exactly not what the OP means, on the grounds that it's so trivially obvious that nobody would need to ask it.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    6. Re:huh? by grassy_knoll · · Score: 1

      Could also just use VI with the g?? command...

      ohg gura gung zvtug or gbb zhpu jbex sbe fbzr.

    7. Re:huh? by Anonymous Coward · · Score: 0

      Technically it would depend on your definition of security.

    8. Re:huh? by Anonymous Coward · · Score: 0

      True, but I'd still feel sorry for the goat.

    9. Re:huh? by DaveM753 · · Score: 2, Funny

      I tried the vi command, but I get this weird error:

      'vi' is not recognized as an internal or external command,
      operable program or batch file.

      Does this mean ROT13 is not compatible with Windows?

    10. Re:huh? by Anonymous Coward · · Score: 0

      No. It just means you are not qualified to post on Slashdot.

    11. Re:huh? by jcwayne · · Score: 1

      What's an OP?

      --
      Failure to follow this advice may result in non-deterministic behavior.
    12. Re:huh? by cybernanga · · Score: 1

      Original Poster

      --
      www.Buy-Proxy.com - A "buyer-driven" global marketplace.
    13. Re:huh? by grassy_knoll · · Score: 2, Funny

      I think so... perhaps you should try another operating system?

      Be aware though, it's got a weird text editor...

    14. Re:huh? by Anonymous Coward · · Score: 1, Funny

      For the n00bs, the above post is in double-ROT13.

    15. Re:huh? by PensivePeter · · Score: 1

      That's exactly the principle behind some companies' implementations of the ISO 13250 Topic Maps standard: easily accessible and searchable "maps" of your information and relevant relationships between them, but without access to the information itself. By analogy, I can get a good idea and picture of where people live and organizations are housed but that does not mean that I can access their properties.

  3. It's not possible even in theory by nahdude812 · · Score: 4, Informative

    It's not possible to do this even in theory, unless you're relying on very weak encryption. The point of encryption is that you can't infer anything about the contents. If Google was able to infer enough to give you meaningful search results (if for example each word was encrypted by itself, and you searched for the encrypted version of the word), they would therefore necessarily be able to know enough to perform a frequency analysis attack on your data and compromise it in no time flat unless it was a very small amount of data (thus meaning search isn't really of value anyway).

    You'll find a similar problem plagues any attempt at searching. Searching requires a certain knowledge or meta knowledge of the material being searched; and that knowledge necessarily dramatically weakens your encryption.

    1. Re:It's not possible even in theory by TheRaven64 · · Score: 5, Interesting

      It is possible. When you upload the data, you also upload an index. When you connect again, you download the index (which is much smaller than the data) and search that on the local machine. Neither the index, nor the data, is ever unencrypted on the server.
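A rough sketch of that workflow, under loud assumptions: the "cipher" below is a toy SHA-256 keystream XOR that stands in for a real authenticated cipher and must not be used for actual data, and the `server` dictionary, document names, and helper functions are all hypothetical.

```python
import hashlib
import json
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Prefix a fresh random nonce, then XOR the plaintext with the keystream."""
    nonce = secrets.token_bytes(16)
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


key = secrets.token_bytes(32)  # stays on the client
documents = {"doc1": "tax report 2008", "doc2": "holiday photos", "doc3": "tax receipts"}

# Build a word -> document index locally, before anything is uploaded.
index = {}
for name, text in documents.items():
    for word in set(text.split()):
        index.setdefault(word, []).append(name)

# "Upload": the server only ever holds encrypted blobs, including the index.
server = {name: toy_encrypt(key, text.encode()) for name, text in documents.items()}
server["index"] = toy_encrypt(key, json.dumps(index).encode())

# Later: download the (small) index, search it locally, fetch only the hits.
local_index = json.loads(toy_decrypt(key, server["index"]))
hits = sorted(local_index.get("tax", []))
print(hits)  # ['doc1', 'doc3']
print([toy_decrypt(key, server[name]).decode() for name in hits])
```

Because every blob gets a fresh random nonce, encrypting the same document twice yields different ciphertext, which is what keeps simple frequency analysis of the stored blobs from working.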

      As for frequency analysis, I don't think any encryption algorithms published in the last 40 years have been vulnerable to this sort of attack...

      --
      I am TheRaven on Soylent News
    2. Re:It's not possible even in theory by blueg3 · · Score: 1

      Not possible in theory? You should tell the authors of the linked paper that describe how to do it in theory.

    3. Re:It's not possible even in theory by nine-times · · Score: 1

      Yeah, I'm not sure I understand how meaningful searches can be done without decryption-- but then I don't pretend to be any kind of a genius about these things. It seems much more likely to me that there could be some kind of a system where unencrypted search indexes are kept locally while the files are encrypted and sent to an online storage service. Then you could search locally for the file you're looking for, fetch the encrypted information from the online storage, and then decrypt it locally.

      That sort of thing seems entirely possible to me, but I'm not aware of any service that specifically offers that. I bet someone here could whip something up in 10 minutes to do it.

    4. Re:It's not possible even in theory by jgtg32a · · Score: 1

      That's because all encryption produced in the last 40 years has been based off of Division not Addition

    5. Re:It's not possible even in theory by TheRaven64 · · Score: 5, Informative

      Replying to myself: the scheme in the linked paper is not feasible. It performs O(n) searches, but this means that the amount of data you need to upload for the query is equal to the total amount stored. Since most consumer Internet links are asymmetric, it would be cheaper and easier to simply download the entire data set and search it locally. The paper proposes having a server-side cache. This means that, for a typical block cypher, you would have a cache of every search term encrypted for each block. The server could then compare this to each block, but would not know what the plaintext is. This is not useful in any real-world scenario. The cache would be orders of magnitude bigger than the stored data and the search would still be O(n), which is painfully slow. As I suggested above, uploading an encrypted index with the data makes more sense. Look at Apache Lucene or Apple's SearchKit for how to do this.

      --
      I am TheRaven on Soylent News
    6. Re:It's not possible even in theory by FredFredrickson · · Score: 3, Interesting

      Mozy does this for personal/business backups. You can use a completely private key, but search your own data.

      --
      Belief? Hope? Preference? The Existential Vortex
    7. Re:It's not possible even in theory by TheRaven64 · · Score: 1

      The algorithm in the linked paper requires you to upload at least as much data as is stored remotely for every search query. This is technically possible, but it would be cheaper and easier to download and decrypt all of the data locally then run all of your searches, which seems to defeat the point. The only occasion when their algorithm makes sense is when you are repeatedly searching for the same terms, but if you're doing that then you should just save your search results.

      --
      I am TheRaven on Soylent News
    8. Re:It's not possible even in theory by smallfries · · Score: 3, Insightful

      I'm curious - why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    9. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      Whether it is possible really depends on what type of operations you are going to allow. For example, it is possible to perform basic arithmetic on encrypted integers, without ever needing to decrypt them.
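One well-known instance of arithmetic on encrypted integers is the multiplicative property of unpadded ("textbook") RSA. The sketch below assumes that is the kind of scheme meant here; it covers multiplication only, and the tiny parameters are purely illustrative and completely insecure.

```python
# Toy textbook RSA parameters: far too small and unpadded, illustration only.
p, q = 61, 53
n = p * q                           # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse, needs Python 3.8+


def encrypt(m: int) -> int:
    return pow(m, e, n)


def decrypt(c: int) -> int:
    return pow(c, d, n)


a, b = 7, 6

# The "server" multiplies two ciphertexts without ever seeing a or b.
product_ciphertext = (encrypt(a) * encrypt(b)) % n

# Only the key holder can decrypt, and the result is a * b (mod n).
print(decrypt(product_ciphertext))  # 42
```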

      It may also be possible to achieve some form of security/searchability trade-off, in which the data which may be leaked is strictly limited while still providing the necessary usability.

      Regardless of which ends up being the case, it certainly is not nearly as black and white as you imply.

    10. Re:It's not possible even in theory by hesaigo999ca · · Score: 1

      You could assign tags to the meta headers of the encrypted file, that can be grouped into sub categories, hence some file that says I am encrypted but I can vouch that I am an image, could prove useful

    11. Re:It's not possible even in theory by Homburg · · Score: 1

      That's true for their Scheme I, but I don't think it's true for Scheme II, or any of the subsequent schemes, is it? Scheme II and all subsequent schemes make the key for any word a function of that word, so, to search for a word, you just need to upload the word and its related key. I don't see why that would be anything like as much data as is stored remotely.

      Now, the idea of making the key used to encrypt a given word a function of that word kind of sounds insecure to me, but I don't have the cryptography chops to back that feeling up.

    12. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      Exactly. This is a stupid proposal.
      I'm sorry to flame, but even considering this as a subject indicates to me that slashdot has gone downhill.
      Meanwhile there are some real scientific curiosities that need to get discussed. WTC7, Nanothermate, and Jane Stanley come to mind.

    13. Re:It's not possible even in theory by cakeninja · · Score: 3, Informative

      Mozy does not encrypt your file names. Someone without your private key could still view your file names if they had your Mozy login information.

    14. Re:It's not possible even in theory by flaming+error · · Score: 2, Insightful
      > the amount of data you need to upload for the query is equal to the total amount stored
      That's not how I read it. But the approach still sounds useless:

      If Alice wants to search for the word W, she can tell Bob (the server) the word W and the k_i corresponding to each location i in which W may occur

      What's the use of encrypting the data if you're going to send keywords in cleartext to a party you're trying to hide the data from?

    15. Re:It's not possible even in theory by felipekk · · Score: 5, Funny

      Gee guys, isn't this a little bit too much work just to hide your porn?

      Just mark the directory as hidden, your mom will not find it.

    16. Re:It's not possible even in theory by TheRaven64 · · Score: 1

      Their subsequent schemes appear to rely on using asymmetric encryption (e.g. RSA) where you can provide the server with the public key and have it be able to encrypt, but not decrypt, data. Asymmetric encryption is massively more computationally expensive than symmetric, which is why it is never used for this kind of application.

      --
      I am TheRaven on Soylent News
    17. Re:It's not possible even in theory by MobyDisk · · Score: 1

      So does Spare Backup. Just like Mozy, they don't encrypt the file names though. So you can search the file names, but not the contents.

    18. Re:It's not possible even in theory by Anonymous Coward · · Score: 2, Informative

      "why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?"

      Because it can't. The one paper proposes (unless I'm missing something!) giving the server the word to search for AND the keys! The security is by frequently rotating the key, and if you KNOW you only wanted to search, say, chapter 1 of a longer document, only give the key for chapter 1. Not very secure!

      If the encrypted data has ANY types of patterns that can be used to infer the contents, the encryption system is weak. The only way to do this is to generate some kind of metadata (search indexes basically) locally, BEFORE you send up the encrypted files, send the metadata up *unencrypted*, and hope the metadata doesn't have sensitive data.

    19. Re:It's not possible even in theory by guywcole · · Score: 1

      I disagree. Consider the case of journaling a file structure. It is possible to encrypt the data AND the journal. Then you need only retrieve, decrypt, and analyze the journal.

      This retrieval can be done client side. The trick would be creating the journal. The only practical ways I see to do it are:
      1. Have the server do it, which requires them (temporarily) seeing the decrypted data.
      2. Do it before uploading, which requires having the entire data set client-side, which defeats the online storage.
      3. Do it client side, which requires passing all the data over the connection but not storing it entirely. Consumes network resources, but doesn't violate privacy or require substantial client-side storage.

    20. Re:It's not possible even in theory by BitZtream · · Score: 2, Informative

      And that would practically defeat the purpose of the encryption.

      For the index to be useful it has to provide too much information about the encrypted data. The point of encryption is to ensure that nothing can be inferred about the contents of the encrypted data. If you give them a nice big bunch of information about what's encrypted, why bother encrypting it in the first place?

      Given enough information in the index they could actually derive your encryption key as well with some simple brute forcing.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    21. Re:It's not possible even in theory by goodmanj · · Score: 3, Insightful

      Can I have an anti-theft system for my car, so that nobody can steal it but anybody who wants to can take it for an anonymous test-drive?

    22. Re:It's not possible even in theory by goodmanj · · Score: 1

      Booyah, car analogy, thread over.

    23. Re:It's not possible even in theory by Clandestine_Blaze · · Score: 1

      In theory, I can think of one way where this would actually work. I could be wrong though.

      I don't speak, read, or understand Russian, at all. But if you gave me a sheet that had Cyrillic text, and gave me a request to retrieve a phrase or a portion of the text, I could probably do it given enough time by matching the characters exactly, but won't have any idea what I'm reading. I'm not translating anything, just retrieving a portion of the text based on a character map.

      Of course, I'm not sure if this analogy is really doable, but it's how I interpret what they want.

    24. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      It depends upon the kind of search, and the kind of encryption.

      Let's say you have a simple database with a few fields -- name, address, phone, etc. You can search those fields quickly if you index them.

      Now let's say that all of the data in those fields is encrypted using a public key. So you have unreadable stuff in each field. You can index an encrypted field just like you can index a plaintext field.

      And when you do a search, say for people named "Smith", you can just encrypt Smith, and then look up the result in your index, and then decrypt whatever comes back.

      The larger point, though, is that if you're that paranoid, using the cloud is a bad idea.

    25. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      Sorry but Mozy does not do this. You are referring to the Pi offering which works ONLY if you use the Mozy provided encryption key. Using the Mozy key, your data is completely decrypted, stored, indexed and searched. If you choose to use your own private encryption key, Mozy cannot access any of your data. Also Mozy doesn't encrypt file names regardless of what encryption key you choose.

    26. Re:It's not possible even in theory by Vu1turEMaN · · Score: 1

      The only thing that I know is that it's gonna somehow involve Locate32 http://www.locate32.net/

    27. Re:It's not possible even in theory by Fwipp · · Score: 1

      If by 'Division' you mean multiplication in modular sets or Galois fields, or addition in more complex sets like elliptic curves or Braid groups, then yes.

    28. Re:It's not possible even in theory by fwr · · Score: 1

      That is not what was originally requested. What was originally requested was to have the provider do the search. Uploading keywords that the provider can search, or any other sort of index, is also not what was requested. What was requested was to have the provider search the actual data, not some cleartext index or keywords attached to the encrypted data. It also makes little to no sense to upload the encrypted index if you are just going to download it again to search it locally. You may as well just keep the index locally, and not upload it at all. If you do that, well then the provider isn't doing the search at all now, is it? You are doing the search, and the provider is just storing the encrypted files. It just serves up file 000001 when requested. So, doing exactly what was requested is not theoretically possible. Doing something else that was not requested is of course theoretically possible, because you get to define what those requirements are.

    29. Re:It's not possible even in theory by fastbiker · · Score: 1

      Sorry but Mozy does not search or index your backup data. They're starting to offer a service like this but it works ONLY if you use the Mozy supplied encryption key. With the Mozy supplied key your data can then be unencrypted, stored and indexed. Yes it's then stored unencrypted on the servers. If you choose to use your own private encryption key, then Mozy cannot do anything with your data. In any case, Mozy does not encrypt file names. File names and directory paths are stored on the servers unencrypted.

    30. Re:It's not possible even in theory by Maelwryth · · Score: 1

      "Just mark the directory as hidden, your mom will not find it."

      Just make sure you aren't sharing a parent folder over a network as hidden folders may not be hidden to other operating systems.

      --
      I reserve the write to mangle english.
    31. Re:It's not possible even in theory by ecevans · · Score: 2, Insightful

      It is true that what you're describing would work, but you're talking about a translation, not an encryption. Using a good encryption scheme, encrypted instances of a given string would not all be the same.

    32. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      Wuala does this as well. However, the search is limited to file metadata (name, tags, etc.). If that satisfies your needs and you don't have more than a few thousand files, this might be a good option.

    33. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      As for frequency analysis, I don't think any encryption algorithms published in the last 40 years have been vulnerable to this sort of attack...

      The "frequency analysis" is not the frequency analysis done to crack the cipher, but rather an analysis of the "search" or "directory info" requests such that the attacker could then begin to infer what the searches were. Related, but not exactly quite the same.

    34. Re:It's not possible even in theory by Clandestine_Blaze · · Score: 1

      I figured what I described was too good to be true. :) In that context, I can see how this would differ with encryption. Thanks for the clarification.

    35. Re:It's not possible even in theory by DigitalCrackPipe · · Score: 1

      What you describe sounds like a viable alternative. However, the OP question involved the service doing the search, which should send off all kinds of warning bells - that would require unencrypted access for the server to at least part of the data.

    36. Re:It's not possible even in theory by Estanislao+Martínez · · Score: 1

      Let's say you have a simple database with a few fields -- name, address, phone, etc. You can search those fields quickly if you index them. Now let's say that all of the data in those fields is encrypted using a public key. So you have unreadable stuff in each field. You can index an encrypted field just like you can index a plaintext field. And when you do a search, say for people named "Smith", you can just encrypt Smith, and then look up the result in your index, and then decrypt whatever comes back.

      But now suppose that one of your fields stores a boolean value. Your scheme requires that the cyphertext for "true" in that field be the same in the index and in all of the records; likewise for "false." This means that I can learn a lot about your database by looking at the cyphertext values--for example, I can trivially learn which fields are boolean.

      This generalizes beyond just boolean fields. For example, "Smith" is a fairly common last name. If I can discover which field in your database stores the last name of a record, and the last names in your database are a representative sample of the USA, I can use data on relative frequency of last names in the USA to formulate hypotheses about what last name cyphertexts correspond to which plaintexts. Each of these hypotheses corresponds to some hypothesis about your encryption key; this means I can try each of the implied keys on other fields and records of the database, and if the resulting plaintext is not garbage, then that provides independent confirmation that I guessed your key right.

      Basically, the scheme you're proposing is a substitution cypher over words.
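
      A minimal sketch of that leak, under stated assumptions: the key, the sample surnames, and the `token` helper are all invented for illustration, and a keyed HMAC stands in for "encrypt the field" (the quoted scheme talks about a public key; any deterministic mapping behaves the same way for this purpose).

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key and sample data, invented for this illustration.
KEY = b"demo-key-not-for-real-use"


def token(value: str) -> str:
    """Deterministic stand-in for 'encrypt the field': same input, same output."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


last_names = ["Smith", "Jones", "Smith", "Garcia", "Smith", "Jones"]

# This is all the server stores: one opaque token per record.
column = [token(name) for name in last_names]

# Equal plaintexts give equal tokens, so counting tokens needs no key at all.
counts = Counter(column)
most_common_token, hits = counts.most_common(1)[0]

# An observer who knows "Smith" is the most common surname can guess the mapping.
guess = {most_common_token: "Smith"}
print(hits, "records share the most frequent token")
```

      The observer never touches the key here; the repetition alone gives the mapping away, which is the "substitution cypher over words" point.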

    37. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      What are the odds his mom uses Linux

    38. Re:It's not possible even in theory by ewanm89 · · Score: 1

      My shares are limited to certain other hosts on the network anyway, plus my mum can't even manage to connect to her own shares. Finally, most of the time I use sshfs and so do not have samba/nfs running.

    39. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      I'm not the poster of said comment, but I think the difference stems from the level of security that is being provided. There seems to be an implicit assumption that an encrypted search query of encrypted data doesn't give the server any information about the data. This is not what the paper is claiming, and I, along with other readers, believe this level of security is theoretically impossible. This is due to the fact that searching for a word reveals to the server where those words occur in the encrypted data. A statistical analysis can then be used to try to figure out what the words are.

      In the paper, they mention that the first two schemes use a plain text word, W, for the search, thus it reveals the locations where W appears. Do enough searches for different words, and you've essentially allowed the server to decrypt your data a little at a time. The later schemes basically encrypt twice so that the search is for an encrypted word, X. The authors themselves mention statistical attacks, which refers to the fact that since the value of X depends only on the plain text word, W, statistical data on where particular values of X occur can be used to find the corresponding values of W. The paper even mentions a "reset button" of sorts where everything is encrypted with a new key and the order of the words is randomized to prevent the server from building up too much information.

      The paper also mentions an additional knob - the rate of false positives. Generating more false positives means the statistics collected by the server are less accurate. However, I believe this only makes it more secure in that it increases the time between resets.

    40. Re:It's not possible even in theory by aztektum · · Score: 1

      An encrypted file filled with financial information, locked up and merely tagged "financial information" would only tell you what sort of data is in there. If you can't view the data, no harm no foul?

      --
      :: aztek ::
      No sig for you!!
    41. Re:It's not possible even in theory by gbh1935 · · Score: 1

      You are forgetting, the index itself reveals data.

    42. Re:It's not possible even in theory by blackest_k · · Score: 1

      What if you could group your keys so that, rather than returning matches for Smith, the server returned matches for all keys starting with S (to oversimplify)? Then when the encrypted data comes down, you only need to decrypt part of the db locally. Or maybe have one encryption for the key and a separate encryption for the data associated with that key. The portions returned might never hit disk storage, and anything decrypted would be lost as soon as the power was pulled.

      Actually, you could perhaps arrange storage of files in a system of directories and put your info in doc files or spreadsheets as what looks like random text. Then you need a scheme for where to store the data and where to retrieve it from. You could fill in the gaps with white noise.

    43. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      Actually I hide my encrypted plan to acquire world domination in the meta-channels and color manipulation layer of my porn videos. Lots of space to store information in there. I always try to act embarrassed when my mom finds the porn, but she will never know the truth before it is too late. *evil laugh*. Oh, you were trying to be funny?

    44. Re:It's not possible even in theory by Seth+Kriticos · · Score: 1

      GP meant reasonable theory. Take a good book about cryptography and read it, then you will maybe understand. There are a lot of smart people thinking about these kinds of problems. It is simply a collision of the basic principles of cryptography. Like saying a normative car is faster than a normative bicycle at maximum speed. Or 1 + 1 != 42

      Ps. and yes, I checked TFA and it's wishful thinking at best, arrogant lie at worst.

    45. Re:It's not possible even in theory by martin-boundary · · Score: 1
      I'm sorry but your claim of impossibility is not very rigorous at best, and clearly wrong at worst.

      An encryption is simply a transformation mapping one string of symbols into another string of symbols. The aim is of course that this mapping is hard (but not impossible) to invert.

      The search problem requires that this encryption mapping has good enough properties, which allow the search string to be found using some algorithm with a given complexity.

      This property is trivial with some encryption methods, for example with a letter substitution cypher: Simply encrypt the search string itself, and do a normal string search on the encrypted target document. In this case, the encryption mapping preserves substring inclusion.

      Once you have the above example, you know that the search problem is solvable, and you can certainly go looking for other solutions.

      You'll note that this example not only disproves your impossibility claim, but can be trivially adapted to disprove other kinds of claims, such as that being able to search makes the encryption necessarily weak.

      Indeed, it is easy to design a user interface which decouples the underlying encryption method from the string search box and results display, in such a way that the results do not give any usable information about which encryption method was used or what the document really looks like. In particular, you could be given a full copy of the encrypted document, and the search facility itself would be of no help whatsoever in decrypting the full document.

    46. Re:It's not possible even in theory by turbidostato · · Score: 1

      Whether it is possible really depends on what type of operations you are going to allow. For example, it is "possible to perform basic arithmetic on encrypted integers, without ever needing to decrypt them."

      Yeah, but not without an observer learning that you are doing arithmetic on encrypted integers, if they watch for long enough. Link this to any other knowledge (like the fact that you trade on the stock market) so the observer can guess at the rough numbers involved, and you are done.
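
      For readers wondering what "arithmetic on encrypted integers" can look like, here is a toy sketch, assuming unpadded textbook RSA with the classic small demonstration parameters (61, 53, 17, 2753). It is not the scheme the quoted text refers to and is far too small and too simple to be secure; the function names are made up for the example. It only shows that multiplying two ciphertexts yields a ciphertext of the product.

```python
# Toy parameters from the usual textbook example; hopelessly small for real use.
P, Q = 61, 53
N = P * Q   # public modulus, 3233
E = 17      # public exponent
D = 2753    # private exponent, chosen so that (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt(m: int) -> int:
    """Unpadded RSA encryption of an integer smaller than N."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Matching decryption; only the key holder can run this."""
    return pow(c, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """What an untrusted server could do: combine ciphertexts without the key."""
    return (c1 * c2) % N


c = multiply_encrypted(encrypt(6), encrypt(7))
print(decrypt(c))  # 42, because 6 * 7 is smaller than N
```

      Note that this toy is deterministic, so it leaks repeated values in exactly the way the rest of the thread complains about, and the server still sees which ciphertexts get combined.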

    47. Re:It's not possible even in theory by Tacvek · · Score: 1

      Sure you can. You just have it phone home once an hour on the hour, with its current location, and add a sign to the car that says so. Then have the systems check each hour for a secure indicator on a website you own. If the system finds the indicator it turns on continuous tracking, but also turns on a warning light that indicates to any potential test-driver that a test drive would not be anonymous.

      People can test drive anonymously, but cannot steal the car, since once stolen you can activate the tracking, and find them.

      Add in a system where tampering with the security system causes the car to explode, and add the words "Security system's anti-tamper defenses are lethal", And you are good to go.

      --
      Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
    48. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      Indeed, it is easy to design a user interface which decouples the underlying encryption method from the string search box and results display, in such a way that the results do not give any usable information about which encryption method was used or what the document really looks like. In particular, you could be given a full copy of the encrypted document, and the search facility itself would be of no help whatsoever in decrypting the full document.

      You brute-force the list of words in the document by searching for every word in the dictionary and seeing which ones return the document. There you go: now you have every word in the document. That really helps in determining the contents of the document, especially for short documents.

    49. Re:It's not possible even in theory by martin-boundary · · Score: 1
      Why even stop there? If you pick the search strings as two consecutive words, (eg "in the", "the beginning"), then you can reconstruct the full document with high probability, without ever attempting to decrypt the document itself.

      This is really using search as an oracle though. I suspect that a reasonable UI would limit the number of searches allowed in some time interval, not unlike typical password entry screens.
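
      The oracle attack from the last two comments can be sketched in a few lines. Everything here is hypothetical: `make_search_oracle` stands in for whatever search box the UI exposes, and the tiny word list stands in for a real dictionary. The point is only that yes/no answers are enough to recover the vocabulary.

```python
from typing import Callable, Iterable, Set


def make_search_oracle(secret_document: str) -> Callable[[str], bool]:
    """Hypothetical search box: it answers only 'match' or 'no match'."""
    words = set(secret_document.lower().split())
    return lambda query: query.lower() in words


def recover_vocabulary(oracle: Callable[[str], bool],
                       dictionary: Iterable[str]) -> Set[str]:
    """Ask about every dictionary word and keep the ones that hit."""
    return {word for word in dictionary if oracle(word)}


# The attacker never sees this string, only the oracle built from it.
oracle = make_search_oracle("in the beginning was the word")
dictionary = ["apple", "beginning", "in", "the", "was", "word", "zebra"]

recovered = recover_vocabulary(oracle, dictionary)
print(sorted(recovered))
```

      Extending the queries to pairs of consecutive words, as suggested above, would recover ordering as well; limiting the number of queries per time interval slows this down but does not change what each answer reveals.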

    50. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      I think that's called LoJack.

    51. Re:It's not possible even in theory by Lord+Bitman · · Score: 1

      Can I have an anti-theft system for my music, so that nobody can copy it but anybody who wants to can buy it and listen to it?

      --
      -- 'The' Lord and Master Bitman On High, Master Of All
    52. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      Agreed. If the "encrypted junk" was searchable, it would be readable. Sorry, but this doesn't even sound like a serious query, more like an April Fools' joke, or a quick way to see if someone has the logical ability and understanding of what encryption does. Your best bet is a virtual dedicated server with its own search engine software installed. The whole server is password protected, but once you get through the gate, everything is searchable.

    53. Re:It's not possible even in theory by raju1kabir · · Score: 2, Interesting

      I'm curious - why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?

      Because the paper doesn't propose any solution that is practical, or which even leads to a practical solution.

      In theory I can cure all forms of cancer - all I have to do is go through each cell in the victim's body and pluck out the cancerous ones.

      --
      "Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
    54. Re:It's not possible even in theory by raju1kabir · · Score: 1

      Once you have the above example, you know that the search problem is solvable, and you can certainly go looking for other solutions.

      It is intrinsic to the nature of substitution ciphers that the same encrypted text will map to the same plaintext. This aspect is what allows your proposal (submitting encrypted search keywords) to function. Take away that aspect of the cipher and you're back at square one.

      --
      "Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
    55. Re:It's not possible even in theory by philosopher3000 · · Score: 1

      You're still not searching the data, so unless the index has been carefully set up with a set of key words for each article or page, then you will miss search terms, and it will never be as good as a Google AI search.

    56. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      No, the index is encrypted too. The poster's idea is that you download the (smallish) index and do the search client-side.

    57. Re:It's not possible even in theory by philosopher3000 · · Score: 1

      Agreed. But like any good fiction, if this magic were miraculously plausible, the possibilities would be amazing. It would guarantee free speech forever, make money laundering a meaningless exercise, and create un-copy-able data. True freedom and real secure intellectual property are not dreams easily dissuaded.

    58. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      A guy with a gun that sits in the car and ensures the return of the car.

    59. Re:It's not possible even in theory by Anonymous Coward · · Score: 0

      Unfortunately I taught the wife too much about computers. Anonymous as she knows I post on slashdot.

    60. Re:It's not possible even in theory by Hognoxious · · Score: 1

      What about tags like "KatyaRussia13yrsdonkeyXXX" or "Area51secretpicz"?

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    61. Re:It's not possible even in theory by scruffy · · Score: 1

      This doesn't seem so hard.

      For each encrypted data chunk, you supply a corresponding bag of encrypted tokens. Each search is expressed with encrypted tokens. The information that an attacker can infer depends on the size of the chunk and the viability of a frequency analysis on the tokens, as well as the possibility of the attacker having access to some of the plaintext.

      Now if you want to search for phrases, you will have to supply a bigger bag of tokens, and that will probably have more weaknesses to attack.
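
      A rough sketch of the bag-of-tokens idea, assuming a keyed HMAC as the token function and made-up chunk names and contents (none of this comes from a real product). The server holds only opaque tokens per chunk and does set containment.

```python
import hashlib
import hmac
from typing import Dict, List, Set

KEY = b"client-side-secret"  # hypothetical; never leaves the client


def tok(word: str) -> str:
    """Client side: turn a word into an opaque, deterministic token."""
    return hmac.new(KEY, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def make_bag(chunk_text: str) -> Set[str]:
    """Client side: the bag of tokens uploaded next to an encrypted chunk."""
    return {tok(word) for word in chunk_text.split()}


# What the server stores: chunk id -> bag of tokens (ciphertext stored elsewhere).
server_bags: Dict[str, Set[str]] = {
    "chunk-1": make_bag("tax return 2008"),
    "chunk-2": make_bag("holiday photos spain"),
}


def server_search(bags: Dict[str, Set[str]], query_tokens: Set[str]) -> List[str]:
    """Server side: return chunks whose bag contains every query token."""
    return sorted(cid for cid, bag in bags.items() if query_tokens <= bag)


print(server_search(server_bags, {tok("Spain")}))
```

      The server can still count how often each token appears across bags and which tokens are queried most, which is the frequency-analysis exposure mentioned above.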

    62. Re:It's not possible even in theory by nahdude812 · · Score: 1

      1) This is not your storage provider offering search, this is you storing a file in your provider's space which helps you to do a search yourself. There is nothing which stops you from doing this already, and no special support is required from your storage provider.

      2) Each time you modify your data and upload the modified data with its revised index, this exposes information about the underlying data, as you have essentially two messages with the same information stored using a different scheme - and this allows for a conflict analysis where you observe that certain patterns in the relationship between the two can emerge and can be exploited to reveal more about the encrypted information than is ideal.

    63. Re:It's not possible even in theory by nahdude812 · · Score: 1

      You would only be able to search by complete tokens, and either the token size is small so you can glean a (small) amount of benefit out of this, but you're exposed to a frequency analysis attack, or it's large, and there are few meaningful searches you can do, and you're exposed to a frequency analysis attack but which is harder to exploit.

      And still it's not very useful. Imagine that your data is:
      The rain in Spain fallss mostly on the plain.
      Assume your storage block size is 4 bytes (I know you'd use something bigger, but this is just to demonstrate how it wouldn't offer you much value).

      Here is what your storage looks like:

      The rain in Spain fallss mostly on the plain
      aaaabbbbccccddddeeeeffffgggghhhhiiiijjjjkkkk

      I want to search for the word "rain" so I search for the token identified by "bbbb" above. I get my result, and it works fine. However if I want to search for "the" I have several problems.

      First, I have "The" with a capital T, which is different from "the" with a lower case T. Search is typically expected to be case insensitive, so there's no way to know that the first word is my expected term. Maybe you can lowercase the search text before generating your tokens, but you're opening yourself up even more to a frequency analysis attack since you're now even limiting the characters permitted to fit within tokens.

      Second, even if I search for "The", it will not agree with my token as my token is actually for "The " (with a space).

      Third, the second "the" in the sentence is actually divided up among two tokens. To successfully find that word, I need to search for the tokens iiii and jjjj, so I need to search for "on the p" to find the word "the".

      Fourth, I later realize that I made a typo on the word "fallss" and correct it. Now instead of having to find that second "the" with the term "on the p" I now have to find it with "n the pl".

      So maybe you have a token per word, and that token is always based on the lower case version of that word (ignoring the fact that transforming case at this level is a layering violation, the same as having a "case insensitive" file system), a given token for a given word would then always have to be consistent regardless of position within a document. You've essentially got a cryptogram like I used to solve out of the newspaper comic pages over a bowl of cereal, except that it's a word per token instead of a letter per token.
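
      The block-boundary problem in the third and fourth points can be checked mechanically. This sketch works on plaintext only (no encryption at all) and uses invented helper names; it just shows which whole 4-byte blocks you would have to reproduce to hit the second "the", before and after the typo fix.

```python
from typing import List


def split_blocks(text: str, size: int = 4) -> List[str]:
    """Cut the text into fixed-size blocks, as a block-per-token store would."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def query_needed(text: str, word: str, size: int = 4) -> str:
    """Plaintext of the whole blocks covering the last occurrence of `word`."""
    start = text.rfind(word)
    first = start // size
    last = (start + len(word) - 1) // size
    return "".join(split_blocks(text, size)[first:last + 1])


original = "The rain in Spain fallss mostly on the plain"
corrected = original.replace("fallss", "falls")

print(split_blocks(original)[1])            # rain
print(repr(query_needed(original, "the")))   # 'on the p'
print(repr(query_needed(corrected, "the")))  # 'n the pl'
```

      Removing a single character shifts every later block boundary, so the tokens for everything after the edit change too.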

    64. Re:It's not possible even in theory by nahdude812 · · Score: 1

      Giving an untrusted party enough information to do more with your encrypted data than simply transfer it around or attempt to brute force crack it automatically weakens your encryption.

      You're right that you could devise a system that you could label "encryption," which could be searchable, but any such system exposes so much information about the encrypted data that reversing the encryption is trivial (eg with letter substitution cyphers, you perform letter and word frequency analysis, and the brute force space is simply the size of your substitution dictionary).

      So while you certainly could come up with a system that was searchable, any system that exposed that much information about my data, I wouldn't accept the term 'encryption' as applied to it any more than I would consider a marble wrapped in a roll of paper a telescope, even if some might call it that. The point of modern encryption is that there is nothing which can be inferred about the underlying data, apart perhaps from a relative size of the data (assuming they have not added padding). It's not possible to distinguish data encrypted with a strong encryption algorithm from random noise.

      Any meaningful search you could perform against the data exposes too much about that data to the untrusted party (in this case the search/storage provider). As others have said in this thread, it makes more sense to create a search index and encrypt and upload that along with your data (which is not really your provider doing search, it's you storing search data on your provider, and it can lead to some form of exposure anyway if you're not very careful).

    65. Re:It's not possible even in theory by smallfries · · Score: 1

      Really? Which basic principles of cryptography do you think collide? The OP describes a non-problem. There is already work that solves the basic search problem without leaking frequency information. The basic technique returns constant-length results to any query.

      When you flick through your "good book about cryptography" you may want to look up Randomised Encryption Schemes, Homomorphic Encryption and Yao Circuits. Then you will maybe understand.

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  4. You want to... by mhkohne · · Score: 4, Insightful

    Use an encrypted query to match against the encrypted text. The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.

    If it didn't, then an attacker who got hold of the encrypted text and some of your encrypted queries might well be able to mount an attack based on commonalities between the two.

    --
    A thousand pounds of wood moving at 300 feet per minute. Don't get in the way.
    1. Re:You want to... by noidentity · · Score: 3, Funny

      The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.

      NOT TRUE! I use a combination of XOR and rot-13 encryption and I'm able to do text searches just fine. The trick is to encrypt the search string, then it'll work perfectly. This is because the encryption doesn't depend on the position within the text, but that shouldn't hurt security too much.

    2. Re:You want to... by Thad+Zurich · · Score: 2, Informative

      ROT13 is encoding, not encryption. You transform the information, but you don't conceal any of it. Ziad El Bizri (OP cit.) apparently observes that if you encrypt the keywords individually, then you can submit encrypted keyword queries, and the server can search for them for you. This is great, but why would you want to? The object of a search server is for other people to be able to search the data (otherwise why index it on the server?) With the suggested scheme, only the data owner (or shared key holders) will be able to search the data. It would seem to be just as easy to construct a trustworthy server and then encipher the query traffic, as has already been observed.

    3. Re:You want to... by Anonymous Coward · · Score: 1

      This is modded insightful on slashdot? Even for those not versed in crypto, I'd have hoped that "rot13 twice" was enough of a meme to make people think twice any time it's mentioned . . .

      Sad day.

    4. Re:You want to... by abhi_beckert · · Score: 1

      Your solution only works with "kids in tree-house" strength encryption. With any of the modern and well-regarded encryption algorithms, the position *does* affect the output.

      For example, sha1 of two similar strings give completely different output (sha1 is a hashing algorithm, but modern encryption systems have the same behaviour):

      sha1 'abc'
      a9993e364706816aba3e25717850c26c9cd0d89d

      sha1 ' abc'
      3a2d0af63d31343a13054b9758c00398c772c5fd
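
      To try this yourself, here is a small sketch using Python's standard `hashlib` (the helper name `sha1_hex` is made up for the example). It hashes the same two strings and counts how many hex digits differ.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Hex SHA-1 digest of the UTF-8 bytes of `text`."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


a = sha1_hex("abc")
b = sha1_hex(" abc")  # same text with one leading space

# Compare the two 40-digit hex strings position by position.
differing = sum(x != y for x, y in zip(a, b))

print(a)
print(b)
print(differing, "of", len(a), "hex digits differ")
```

      As the comment says, SHA-1 is a hash rather than a cipher; it is used here only to show how a one-character change in the input scrambles the whole output.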

    5. Re:You want to... by Anonymous Coward · · Score: 0

      Why? They don't know what the encrypted queries are either, so it shouldn't help them.

      It's not easy, but in theory it is possible (although of course traditionally this has not been a use case that's accounted for, so you have to be really careful not to add exploits when you add the searching).

    6. Re:You want to... by Anonymous Coward · · Score: 0

      I think it's hilarious you were modded insightful when you were clearly (I hope) joking. Yet strangely disturbing, considering that Slashdot is supposed to contain very technical people.

    7. Re:You want to... by noidentity · · Score: 1

      This is modded insightful on slashdot? Even for those not versed in crypto, I'd have hoped that "rot13 twice" was enough of a meme to make people think twice any time it's mentioned . . .

      Nono, it doesn't apply rot-13 TWICE, just once. It's the COMBINATION of that and XOR that makes it so powerful. I XOR with the byte value 0x00, but you can choose any of the 256 available values. Just don't tell anyone which one you use. It'd take them forever to try all 256! (and I too sincerely hope that everyone replying to this in a serious tone is just playing along with the joke)

    8. Re:You want to... by Anonymous Coward · · Score: 0

      Who the fuck modded that insightful!?

      Have you not been following the "I just rot-13 it twice" meme 'round these parts? I think funny was what you were looking for!

    9. Re:You want to... by ewanm89 · · Score: 1

      This is also untrue with truly random keystream generation, or with any of the block cipher modes (though you could encrypt the search string a number of times with one of the modes to make it possible).

    10. Re:You want to... by noidentity · · Score: 1

      Your solution only works with "kids in tree-house" strength encryption. With any of the modern and well-regarded encryption algorithms, the position *does* affect the output.

      Yes, obviously. If position couldn't affect the encoding, then all you can use is a substitution cipher, because anything else depends on state, of which there is none. That, along with rot-13, were my hints that I wasn't being serious.

    11. Re:You want to... by philosopher3000 · · Score: 1

      Wouldn't this depend upon the bit length of the encrypted string? Because, as I understand it, the common public key encryption algorithms use repeating periodic "chunks" and use the value of these "chunks" of data to help randomize the code. Essentially each "chunk" of data is randomized differently, forcing you to decrypt the whole database in order to search it.

    12. Re:You want to... by Anonymous Coward · · Score: 0

      Umm, rot13 is not considered strong encryption in the modern sense. Also, what chaining algorithm are you using? The chaining algorithm is as important as the encryption used if you want to avoid certain types of attacks.

    13. Re:You want to... by Anonymous Coward · · Score: 0

      Lol, some people actually use XOR encryption

  5. Easy by AchiIIe · · Score: 1

    This sounds pretty easy,
    a) obtain database, indexing tools, search tool
    b) install on the machine and encrypt the entire hard drive with any of the many available whole-disk encryption tools
    c) ssh in and run queries.

    --
    Nature journal lied in Britannica vs Wikipedia Ask to retrac
    1. Re:Easy by Anonymous Coward · · Score: 1, Insightful

      RTFQ read the question again, please. With disk encryption the data would still be unencrypted in the server's RAM. The OP wants something much more sophisticated... data always encrypted in the server (HDD, RAM, CPU) but with the ability to search it. Not that easy to me.

    2. Re:Easy by icebraining · · Score: 1

      Is that so difficult?!

      Install GmailFS. Mount it somewhere. Install encfs. Use the gmailfs mounted folder as the encrypted folder for encfs and mount it on another folder. Install Trackerd and configure it to scan that folder and save the index data there.

      Presto. Was that difficult?

  6. Maybe, maybe not by MrEricSir · · Score: 1

    Unless you do the indexing client-side, and upload an index that's somehow encrypted...

    I'm not saying I know how to do this, but it seems possible.

    --
    There's no -1 for "I don't get it."
    1. Re:Maybe, maybe not by Anonymous Coward · · Score: 0

      Couldn't you also use meta-data? Attach a description of the file as meta-data and encrypt the file contents. That way you can search the meta data for key words, but your actual data remains safe. I suppose someone determined enough could use your meta-data as a crib and crack your encryption, but that's the risk you take.

    2. Re:Maybe, maybe not by The+Moof · · Score: 3, Interesting

      Maybe something like this -

      Create an index of hashes using the unencrypted data on the client.
      Encrypt the data on the client so we now have an index of hashes that apply to an encrypted file.
      Upload the hash index and the encrypted data file to the server.
      To search, hash the search criteria on the client.
      The server searches the indexes for the hash value, returning a list of encrypted files with an index matching the criteria hash.
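
      The steps above can be sketched roughly as follows, assuming a plain unkeyed SHA-256 per word (the comment does not say which hash) and invented file names and contents. The last lines show one caveat with an unkeyed hash: the server can hash its own guesses and test them against the index.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set


def h(word: str) -> str:
    """Unkeyed hash of a lowercased word (an assumption for this sketch)."""
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()


def build_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Client side: map hash(word) -> names of the encrypted files containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, plaintext in files.items():
        for word in plaintext.split():
            index[h(word)].add(name)
    return dict(index)


# Hypothetical plaintext, seen only by the client before encryption.
files = {"report.enc": "quarterly tax summary", "notes.enc": "holiday packing list"}
index = build_index(files)  # uploaded alongside the encrypted files


def server_lookup(index: Dict[str, Set[str]], hashed_term: str) -> List[str]:
    """Server side: exact match on the hash, nothing else."""
    return sorted(index.get(hashed_term, set()))


print(server_lookup(index, h("tax")))

# Caveat: with no secret key involved, the server can test guesses by itself.
server_guess_hits = h("holiday") in index
print(server_guess_hits)
```

      Swapping the plain hash for a keyed one (for example an HMAC with a key that stays on the client) would stop the server from testing guesses, though repeated tokens would still be visible.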

    3. Re:Maybe, maybe not by Anonymous Coward · · Score: 0

      Cribs are useless on properly done modern encryption.

    4. Re:Maybe, maybe not by MrEricSir · · Score: 1

      Sounds good to me. You wouldn't be able to get a "ransom note" but I guess that's an acceptable limitation.

      --
      There's no -1 for "I don't get it."
    5. Re:Maybe, maybe not by Estanislao+Martínez · · Score: 1

      Unless you do the indexing client-side, and upload an index that's somehow encrypted... I'm not saying I know how to do this, but it seems possible.

      How is the server going to consult the encrypted index, and correlate its entries with entries in the encrypted main file? Not that I'm a crypto expert or anything, but I bet you that any scheme you propose there will at the very least make it astronomically easier to decrypt the main file. Basically, the server can trivially discover facts like, e.g., 'asdgaerg' in the index corresponds to 'bslhuerl' at index 1234 in the main file, 'fuyiljfggre' at 4578, and 'rept8hljasl' at 8703. Put a lot of facts like this together and you can cut down the search space dramatically.

    6. Re:Maybe, maybe not by ewanm89 · · Score: 1

      Depends on the block cipher mode; some still use modes where dictionary-based attacks can work...

    7. Re:Maybe, maybe not by MBHkewl · · Score: 1

      So in the first time my query is: fish blue short fins

      And then I hash it, but the next time I search, it could be: blue fish short fins, which would result in a different hash...
      This is if you want to search the contents of the files.

      If you're just searching for an exact file, then what you suggest is correct, but that's not how Google does it, nor what the submitter wants.
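
      A quick sketch of the word-order problem, assuming SHA-256 as the hash (the thread does not name one). Hashing the whole query is order-sensitive, while hashing each word separately gives the same set of values either way, at the price of losing phrase information.

```python
import hashlib
from typing import Set


def h(text: str) -> str:
    """Hex SHA-256 digest of the UTF-8 bytes of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def word_hashes(query: str) -> Set[str]:
    """Hash each word on its own, ignoring order."""
    return {h(word) for word in query.split()}


q1 = "fish blue short fins"
q2 = "blue fish short fins"

whole_query_differs = h(q1) != h(q2)
per_word_sets_match = word_hashes(q1) == word_hashes(q2)

print(whole_query_differs, per_word_sets_match)
```

      Per-word hashing is what a keyword index would need, but it is still exact-match only: no stemming, ranking, or partial words.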

      --
      Mod points are a dangerous tool. Abuse them wisely.
  7. Good luck with that.... by Statecraftsman · · Score: 1

    Just to clarify the OP's idea: they want to store only encrypted data on the server, send only encrypted queries to the server (that the server can't even decrypt), yet they expect that the server will be able to send them back results. I don't think it can happen, but surprise me.

    The best I think you can do is store and transfer the data in encrypted form and put the indexes and any search logic on the client. Maybe the index could be stored on the server as well and synced to the client, but creating the index will require access to the plaintext.

    1. Re:Good luck with that.... by Anonymous Coward · · Score: 0

      It *IS* doable, but not the way you think ... and it only works for exact searches, not less than, greater than, etc...

      1. Encrypt the data locally
      2. then send the encrypted data to the server for storage.
      3. Encrypt the search term you're looking for (example: "John Smith") with the same encryption algorithm
      4. Search the db for records with the result of the previous step.
      5. decrypt your results.

      -- tomhudson (not logged in :-(

    2. Re:Good luck with that.... by billcopc · · Score: 1

      That only works for very trivial encryption algorithms, where you can map the unencrypted string character-by-character to its enciphered value.

      The reason for this is you don't know where the text resides within the document, so if your cipher is not position-independent, you're screwed. Hint: anything more robust than XOR or ROT13 will be position-dependent.

      Example for the truly dense: Let's say you have two strings. One is "Harry" and the other is "Barry". They only differ by one character, so if your cipher results in encrypted data that still differs by a single character, it becomes quite obvious that the two strings are very similar. To an attacker, this says your data is very easy to decipher, as it is effectively a 1-to-1 code list. You can perform a frequency attack or a few other statistical techniques to very easily find a decryption table, and in the case of binary data, you can look for telltale patterns like JPEG headers or other predictable, repetitive structures.
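
      The Harry/Barry point is easy to reproduce with ROT13 from Python's standard `codecs` module (the `rot13` wrapper name is just for this sketch). The two outputs differ in exactly the position where the inputs differ.

```python
import codecs


def rot13(text: str) -> str:
    """Position-independent letter substitution via the built-in rot13 codec."""
    return codecs.encode(text, "rot13")


a = rot13("Harry")
b = rot13("Barry")

# Positions at which the two "ciphertexts" differ.
diff_positions = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

print(a, b, diff_positions)  # Uneel Oneel [0]
```

      The repeated "ee" in both outputs also gives away the doubled letter in the plaintext, which is the kind of pattern a frequency attack feeds on.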

      --
      -Billco, Fnarg.com
    3. Re:Good luck with that.... by deroby · · Score: 1

      That won't help either...

      Assuming you encrypt this piece of text : "three wrongs don't make a right, but three lefts do !"
      You'll get some binary blob A

      Encrypting the word "three" will give you another binary blob B

      The chances that you'll find B inside A is practically zero due to the way modern encryption works.

      (It would work in the mighty ROT13 system though !)

      --
      If there is one thing to be learned on slashdot, it has to be sarcasm.
    4. Re:Good luck with that.... by KDR_11k · · Score: 1

      That only works if you have a block cypher with no chaining or other alteration of the blocks (which means an attacker could e.g. modify your data by copy-pasting blocks) and a search query that fits neatly into full blocks and only has to match when the data also contains the query spread out exactly like that (if you specifically store your data like that by e.g. padding with whitespaces after every word an attacker can perform a frequency analysis on the queryable words). That's extremely weak. An important part of good cryptography is that you can't see when a part of the plaintext is repeated which conflicts entirely with the requirement that the server can recognize the cyphertext of your query.

      --
      Justice is the sheep getting arrested while an impartial judge declares the vote void.
    5. Re:Good luck with that.... by sy5t3m · · Score: 1

      Even an XOR would throw it out. It would only work with a simple substitution cypher.
      Say we have the text "this text is about flowers" and the key "some very long key string goes here", we then want to search for "flower".
      In the original text, "flower" will be XOR'd against "string". In the search text, "flower" will be XOR'd against "some v".
      The URL encoded version of the first is %15%18%1D%1E%0B%15 and the second is %15%03%02%12E%04

      You could generate multiple search strings with the position on the key moved by one each time, but this could also generate false hits in the search.
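
      The example above can be reproduced with a short Python sketch. The helper name xor_at is made up here; the text, key and search term are the ones from the comment.

```python
from urllib.parse import quote_from_bytes

def xor_at(text: bytes, key: bytes, offset: int = 0) -> bytes:
    """XOR text against key, starting at a given key offset (key wraps around)."""
    return bytes(b ^ key[(offset + i) % len(key)] for i, b in enumerate(text))

plaintext = b"this text is about flowers"
key = b"some very long key string goes here"
term = b"flower"

ciphertext = xor_at(plaintext, key)
pos = plaintext.index(term)                    # where "flower" sits in the text
in_document = ciphertext[pos:pos + len(term)]  # "flower" XOR "string"
naive_query = xor_at(term, key)                # "flower" XOR "some v"

# One candidate query per key offset, as suggested above.
candidates = {xor_at(term, key, off) for off in range(len(key))}

print(quote_from_bytes(in_document))  # %15%18%1D%1E%0B%15
print(quote_from_bytes(naive_query))  # %15%03%02%12E%04
```

      The in-document bytes do show up among the per-offset candidates, but the server then has to test every candidate, and an unrelated stretch of ciphertext can match one of them by accident.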

    6. Re:Good luck with that.... by RiotingPacifist · · Score: 1

      But then your encryption is trivial. If "john smith" always goes to "wbua fzvgu" then your data can be scrutinized using frequency analysis, if "john smith" doesn't always go to the same thing then you need to upload what "john smith" would be at any given point in the data, at which point it makes it more efficient to download the data and then do a local search on unencrypted data.
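
      A small Python sketch of the frequency leak described above, assuming the client uploads one deterministic token per record. The key, the names and the det_token helper are invented for illustration; HMAC stands in for whatever deterministic scheme is used.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"client-only secret key"  # hypothetical key, unknown to the server

def det_token(value: str) -> str:
    """Deterministic token: equal plaintexts always give equal tokens."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

# What the client uploads: one token per record.
names = ["john smith", "jane doe", "john smith", "john smith", "bob ray", "jane doe"]
uploaded = [det_token(n) for n in names]

# What the server can compute without the key: how often each token repeats.
observed = sorted(Counter(uploaded).values(), reverse=True)
actual = sorted(Counter(names).values(), reverse=True)
```

      The two histograms come out identical, which is exactly the handle frequency analysis needs: the server cannot read the names, but it can see which value is the most common.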

      --
      IranAir Flight 655 never forget!
    7. Re:Good luck with that.... by Anonymous Coward · · Score: 0

      Won't work. Using the same algorithm and the same key, "John Smith" will not encrypt to the same result: it will depend on the surrounding plaintext, its position within the plaintext document, etc. Unless you use a weak algorithm.

    8. Re:Good luck with that.... by Anonymous Coward · · Score: 0

      To all the naysayers, it will work *if* you leave the DB structure unencrypted (only encrypt the field contents) and you don't use a nonce. This reduces the security, but it would work.

    9. Re:Good luck with that.... by tomhudson · · Score: 1
      It doesn't work for plain-text, but it DOES work for text or other values that are indexed.

      If you have an index with the field "user" for "John Smith" (say it was "109fcu91833c098890"), there's no reason you can't search for user "109fcu91833c098890" and find the record associated with it.

      Remember, you're not searching the text, just the index, which contains encrypted values.

      I do this all the time - store a crc64 of a much longer datum - and I can now use the crc64 (8 bytes) instead of the datum itself, as an index value. Much better when your datum might be 100 or more characters, like a url.

      Use your imagination a bit ... any method of generating an encrypted value will work with this, provided you use the same algorithm, the same salt, and the same data. Think about session data as one example - you don't search it by the user name, but by a generated datum.

    10. Re:Good luck with that.... by tomhudson · · Score: 1
      You don't search the data - you search the index. The index is pre-encrypted, with the values you want.

      If you want to search a table of users, by user name, you only need an index holding the encrypted values of their user names.

      If "John Smith" is "de9ld933dd9ddd93d9da8080", you search for "de9ld933dd9ddd93d9da8080", and you get the encrypted data for that row in the table.

      This is NOT the same as the case you're thinking of, where you want to do a plain-text search. In that case, you would have to drop all the stop words, then create indexes for the encrypted value of each significant phrase chunk. For "The cat is black", you would drop the words "The" and "is", then create indexes to that "document" on the encrypted values for "CAT", "BLACK", "CAT BLACK", and "BLACK CAT". Not that much different from existing full-text search (though this IS an over-simplification) - but it WILL work.
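
      A rough Python sketch of the keyword index described above, under a few assumptions: HMAC-SHA256 stands in for "the encrypted value", the stop-word list is a tiny made-up one, and phrases are limited to ordered pairs of significant words. Punctuation handling is left out.

```python
import hashlib
import hmac
from itertools import permutations

KEY = b"client-only secret key"               # hypothetical; never uploaded
STOP_WORDS = {"the", "is", "a", "an", "of"}   # tiny illustrative list

def token(phrase: str) -> str:
    """Keyed, deterministic token for one word or phrase."""
    return hmac.new(KEY, phrase.upper().encode("utf-8"), hashlib.sha256).hexdigest()

def index_terms(text: str) -> set:
    """Significant words plus every ordered pair of them."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    phrases = set(words) | {" ".join(p) for p in permutations(words, 2)}
    return {p.upper() for p in phrases}

def build_index(docs: dict) -> dict:
    """Client-side: map each token to the ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in index_terms(text):
            index.setdefault(token(term), set()).add(doc_id)
    return index

docs = {"doc1": "The cat is black", "doc2": "The dog is black"}
index = build_index(docs)                      # all the server would store

print(sorted(index_terms("The cat is black")))
# ['BLACK', 'BLACK CAT', 'CAT', 'CAT BLACK']
```

      A lookup is then index.get(token("black cat")). The number of pair entries grows roughly with the square of the significant words per document, and the index still has to be built by a client that can see the plaintext.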

    11. Re:Good luck with that.... by sy5t3m · · Score: 1

      As you cannot generate the same output for any set of input data with any decent encryption algorithm, the only way to know that "John Smith" is "de9ld933dd9ddd93d9da8080" in the encrypted data is to store an index of username to encrypted value. If the same applies to any other fields you may want to search on, then you are keeping an unencrypted version of the data lying around, and may as well simply search that.

      The OP wants the data encrypted at all times, so this won't work.

    12. Re:Good luck with that.... by Anonymous Coward · · Score: 0

      You don't search the data - you search the index.

      In which case you might as well keep the index locally, which is not what was asked.

    13. Re:Good luck with that.... by tomhudson · · Score: 1

      You don't search the data - you search the index.

      In which case you might as well keep the index locally, which is not what was asked.

      Don't be ridiculous - there is NO need to do that. You can recreate the value of the index key locally, and you send ONLY that to the server, which returns one or more values that match.

      If you've ever written a program that searches by the crc64 value of a text string, rather than the text string itself, you'd know that there is no need to store anything locally. I suspect the people who are whining about storing the index locally don't know what they're talking about, or there's been a serious failure of vision on their part. As long as you're looking for exact matches on the index, you don't need anything locally except the algorithm to generate the index key.

  8. Short answer: No by Anonymous Coward · · Score: 0

    Long answer: Nope

    1. Re:Short answer: No by Anonymous Coward · · Score: 0

      tl;dr

    2. Re:Short answer: No by psergiu · · Score: 1

      Please tag story as: no

      --
      1% APY, No fees, Online Bank https://captl1.co/2uIErYq Don't let your $$$ sit in a no-interest acct.
  9. It depends on the encryption by davidwr · · Score: 3, Insightful

    If the data is encrypted in independent "chunks" from which search terms can be built then this is trivial: You pre-encrypt your search terms and search for them. Searching a word ROT13-encoded document works this way, as each character is encrypted individually and an encrypted search term is made up of encrypted characters.

    Once you get past this, it's no longer easy. You basically have to either make the term you are searching for look like all possible values of the encrypted text and return all matches, or decrypt the document somewhere.

    If the encryption is good and any particular chunk, extract, or other slicing-and-dicing of the encrypted data without the key looks random, you are pretty much stuck with decrypting it somewhere.

    The alternative is to store an index, or at least a list of keywords, in clear text. For example, a document describing how to build a nuclear bomb could have a list of 10 or 20 non-classified keywords attached to it to aid searching. But that's not what you are asking for.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  10. Roll your own solution by Anonymous+Showered · · Score: 1

    If you don't trust your data in others' hands, don't give it to them in the first place.

    The (costly) solution:
    1) Get a 1U server from ACME with appropriate hardware
    2) Install favourite Unix-based OS, e.g. FreeBSD
    3) Configure server with appropriate software, e.g. Truecrypt, SSH, etc.
    4) Find open source search engine software to index your data, see sourceforge.net (or look for recommendations on /. ... a future Ask Slashdot, anyone?)
    5) Place server in a secure co-location facility
    6) ???
    7) Profit.

    1. Re:Roll your own solution by HTH+NE1 · · Score: 1

      Isn't that just another way of giving the co-location facility the method to decrypt your data and search terms? They have physical access to your hardware, even if you do encase it in Gloopstik®.

      --
      Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
    2. Re:Roll your own solution by JBdH · · Score: 1

      If you store all harddisk data on your co-located server encrypted and use the Intrusion Detection feature present on most servers to instantly erase any unencrypted data from memory, you're pretty safe I guess.

  11. searching encrypted data by Bert690 · · Score: 1
    There are techniques to do this but none have made it out of academia. Most are quite inefficient and support very restricted querying models. Here's one paper that claims their methods are "practical" (but always keep in mind that academic claims of practicality should always be taken with a grain of salt):

    http://www.cs.berkeley.edu/~dawnsong/papers/se.pdf

    1. Re:searching encrypted data by Bert690 · · Score: 1
      http://www.cs.berkeley.edu/~dawnsong/papers/se.pdf

      Doh! Just noticed you already are aware of that particular work. Anyway, congrats, you're already aware of the state of the art!

  12. Hand over the keys by Anonymous Coward · · Score: 0

    If you want the server to do a meaningful search, you have to hand over the encryption keys. Otherwise how is the server knowing what it should look for? It is the same situation as having a safe in a bank with a secret code, and then asking the bank to look in the safe for you. You have to provide them with the code, otherwise they can't open it. Since you mention at the same time you don't trust the server (bank), and want it to peek in your data (safe), how can you simultaneously ask them to do exactly that?

  13. Slight correction by davidwr · · Score: 1

    As pointed out by others, the index can be stored encrypted, then downloaded locally. However, this means the index is what is being searched, and it - the item being searched - is in fact not being searched on the server. In practice this has value, but it's not what this thread asks.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  14. Easy solution by junglebeast · · Score: 1

    " I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there â" the result of my encrypted query. " There are only 2 ways in the universe for accomplishing this, but they are both simple: Method 1: Send entire database to user for any search query. The results are bound to be in there somewhere! Method 2: If the server is powerful enough, brute force crack the encryption scheme, find the results, then re-encrypt it and send back to user. Anything else would violate the definition of full encryption. Of course, you could have "partial" encryption with unencrypted meta-data that the search is performed on.

  15. GNUnet by diablovision · · Score: 1

    It's been done. GNUnet.

    --
    120 characters isn't enough to explain it.
  16. A guy walks into a bar... by skathe · · Score: 5, Insightful

    ...and when the bartender asks him what he would like to drink, the guy says "I want what I always get, but I don't want you to actually pour the drink, just help me search behind the bar for the liquor I want, and then hand it to me without seeing what it actually is, and charge me correctly without any knowledge of what it is you just helped me find."

    1. Re:A guy walks into a bar... by richie2000 · · Score: 2, Interesting

      But... That's not a valid car analogy since you're not allowed to drink and drive.

      --
      Money for nothing, pix for free
    2. Re:A guy walks into a bar... by HTH+NE1 · · Score: 3, Funny

      Not good enough. The bartender could audit his liquor to see how much of each bottle was dispensed.

      This is why when they do this sort of thing, the gentleman just serves the bartender a National Security Letter and takes more than what he wants without paying a dime.

      --
      Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
    3. Re:A guy walks into a bar... by Anonymous Coward · · Score: 1

      Fine, here's your car analogy:

      An English braille-reading deaf and blind taxi driver gets into his car at the airport and puts the in service light on. A foreign passenger gets in and says "daba dooba dooba da" and expects to arrive at his requested destination, regardless of the fact that the taxi driver is blind, deaf, and can't comprehend dabadoo.

    4. Re:A guy walks into a bar... by maxume · · Score: 1

      Sure you are. That you open yourself up to various legal consequences is quite different from not being allowed to do it.

      --
      Nerd rage is the funniest rage.
    5. Re:A guy walks into a bar... by dimeglio · · Score: 2, Interesting

      ...if I tell you a story in French and you don't understand it, you will have no idea what I told you and will not be able to answer questions about my story. However, if you are able to memorize all I told you phonetically I can ask if I said a word or not just by the sound. Yet you don't know exactly what I asked for, nor the meaning of the answer but you are able to answer that question since it doesn't imply meaning.

      So a possibility for the OP would be to store the information in a language unknown to anyone but the poster. This language would need to be compatible with the search algorithms used by Google. Not very practical but maybe someone can build on this.

      --
      Views expressed do not necessarily reflect those of the author.
  17. not impossible; not easy by Lord+Ender · · Score: 2, Interesting

    Keep the files on the remote server, encrypted. Keep the search index in a database, encrypted in chunks. Rsync your search database between your local machine and the server. Actual searches of the databases would be done locally.

    Result: terrible performance whenever you access your data from a new machine (must sync entire search database). Good performance the rest of the time. Remote server never sees anything but cyphertext.

    --
    A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    1. Re:not impossible; not easy by maxume · · Score: 1

      This is a decent way to make searching the encrypted content easier, but it doesn't offload any processing to the server (any indexing work would need to be done on a client with access to the data, prior to encryption (or after decryption)).

      --
      Nerd rage is the funniest rage.
    2. Re:not impossible; not easy by Lord+Ender · · Score: 1

      Yep. But the only agent which could possibly index your data is one that has access to your data in cleartext: you. That job can't be offloaded to the server while still satisfying encryption requirements.

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    3. Re:not impossible; not easy by maxume · · Score: 1

      I can't see how to do it, but I'm not ready to say that it can't be done (in any case, my saying so wouldn't mean very much).

      --
      Nerd rage is the funniest rage.
  18. Seriously, how did this post get green lit? by Anonymous Coward · · Score: 0

    Is there no moderation?

    1. Re:Seriously, how did this post get green lit? by KDR_11k · · Score: 0, Offtopic

      You mean how did this post get filed under Ask Slashdot instead of Humor?

      --
      Justice is the sheep getting arrested while an impartial judge declares the vote void.
  19. Now there's an oxymoron if there was one by DaleGlass · · Score: 1

    There's plenty of meaning that can be derived from just filenames.

    Does it really matter that Google or whoever can't see the exact text or images, but has enough information from filenames, tags and descriptions to accurately find out what kind of furry porn you like?

    People who encrypt their data often don't want to disclose even what kind of content they have. Knowledge of what sort of porn is there, or that you're having an affair, or private internal company data are things that can be disclosed from just knowing document titles without having to even look at the exact file.

    The solution to this is to take Google out of the equation. Encrypt your computer's hard disk, encrypt all your mail, build your own search database that will be stored on the encrypted disk, and search that.

    1. Re:Now there's an oxymoron if there was one by Anonymous Coward · · Score: 0

      File names aren't really part of the problem here. It's trivial to set up a hash table to hide them.

    2. Re:Now there's an oxymoron if there was one by DaleGlass · · Score: 1

      File names aren't really part of the problem here. It's trivial to set up a hash table to hide them.

      Well, and what are you going to search by, then? The way I see it, the way to do this is to encrypt the content, but provide unencrypted, searchable metadata.

      What I'm saying is that the metadata often contains plenty of private data. People want to hide the fact they have porn at all. Making it so that a porn collection is searchable but not viewable doesn't really help much. Mails with titles like "My wife will come home late tomorrow, let's meet" contain enough suspicious data right there that hiding the text doesn't do much good.

  20. I'm confused... by Manip · · Score: 1

    So you either want to:
    - Decrypt
    - Search

    If so, then just mount an encrypted drive and put the Search Index on the drive itself... Basically any encryption filter driver will do the mounting for you (Windows and Linux ship with these) and any old Search Software will work for the searching, just move the index.

    Or you want to:
    - Search Encrypted Content
    - For other encrypted content (or decrypted content)

    In either case this isn't possible, at least assuming you're using a crypto algorithm written in the last thirty or so years. Even in World War 2 they had encryption that would make this harder than just decrypting it.

  21. I don't understand... by dschuetz · · Score: 2, Funny

    ...isn't this easy?

    Plaintext: "Attack at dawn"
    Ciphertext: "lkaoiuast98u;aw"
    Search query: "oiua"
    Result: "lkaoiuast98u;aw"

    What could be simpler?

    (no, I'm not an idiot, this is a joke.)

  22. My way of tackling the problem... by migarg · · Score: 1

    would be to first encrypt each document word-by-word (this can lead to really big documents because of padding), then the client would transmit the document together with the encrypted words as plain text. In this way, the search engine indexes meaningless words which point to the encrypted documents (you can use two different algorithms and/or keys for word-by-word encryption and for documents). For searching your client encrypts the keywords (asking for the encryption key) and once you have a link you have to decrypt the document.

    There should be some weak link in this chain, but I don't find any: be the first to claim my two cents.

    1. Re:My way of tackling the problem... by Anonymous Coward · · Score: 0

      Yep. Weak link is that identical plaintext will always produce the same cyphertext. Very bad for secure encryption.

      One very common way to avoid that little problem is called Cypher Block Chaining. In a nutshell, each block of plaintext is XORed with the previously generated block of cyphertext, and the result of the XOR is then encrypted. This works for every chunk of data from the 2nd through the last, but for the 1st chunk there isn't a prior piece of cyphertext to XOR with. The solution is to create what's called an Initialization Vector (IV), which acts as the 1st piece of cyphertext. The IV is usually randomly generated. Using a randomly generated IV and cypher block chaining, the exact same message may be encrypted with the exact same key and result in a HUGE number of different-looking cyphertexts. For a cypher with a block size of 8 bytes, there can be 2^64 different cyphertexts that all represent the exact same plaintext message. And none of the cyphertexts will look anything like each other except for their all being the same length.

      And you want to index this cyphertext how? And obtain a meaningful search?

      Nope. Not gonna happen.
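
      The chaining described above can be sketched in Python. The standard library has no block cipher, so this uses a toy 4-round Feistel construction built on SHA-256 as a stand-in; it is an assumption for illustration only and NOT a real cipher. Block size, key and message are made up, and padding is skipped.

```python
import hashlib
import os

BLOCK = 16  # toy block size in bytes

def _round(key: bytes, i: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for a block cipher."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    assert len(plaintext) % BLOCK == 0, "sketch skips padding"
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        # XOR with the previous ciphertext block (or the IV), then encrypt.
        prev = encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    out, prev = [], iv
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)

key = b"demo key"
message = b"John Smith......John Smith......"  # two identical 16-byte blocks
iv1, iv2 = os.urandom(BLOCK), os.urandom(BLOCK)
c1 = cbc_encrypt(key, iv1, message)
c2 = cbc_encrypt(key, iv2, message)
```

      With two random IVs, c1 and c2 differ even though key and message are the same, and the two identical plaintext blocks inside one message no longer produce identical ciphertext blocks. That is the property that leaves a server nothing stable to match a query against.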

    2. Re:My way of tackling the problem... by ewanm89 · · Score: 1

      Better to cryptographically hash word by word rather than encrypt them anyway; then one would have to find the same collision. Problem with this is the possibility of false positives (hey at least I'm not decrypting and searching all the plain text...) All one would have to do is search the hash index by using the same algorithm on the search terms. Might be noted that with such an approach, generating the index will take a long time and must be done somewhere with the plaintext content. And even if we do discard repeated word hashes, for a couple of documents the index is likely to be bigger than the documents themselves. At a few hundred documents it could work...

  23. Anon. networks are something else by davidwr · · Score: 1

    What this thread is about is "I have a file that is secret. I want to encrypt it into an opaque, un-decryptable-without-the-key blob. I want to upload it to a search engine. I want to do searches against it."

    The answer is "By definition, it can't be done, not in the way you want. If it's transparent enough to search, it's no longer encrypted enough to be called encrypted. Other solutions, such as using indexes, may provide some of the practical benefits you want, but they are not without risk."

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  24. querying encrypted data howto by burnin1965 · · Score: 2, Interesting

    As long as your query looks something like this...

    SELECT * FROM mydata WHERE stuff LIKE '%YToyOntzOjc6InBhY2thZ2UiO3M6MjM5OiKyKHPh9ZawDX6KyA62cMd6p+mjBybGwJyCaNfFb7S.........

    Seriously though, if I understand your objective I think it would be feasible to develop something like that, but I don't think it's something you could integrate into Google's search services unless they added something on their end.

    You could pass a decryption key along with your query and the server would then decrypt records as it performed the search. It would be very resource intensive.

    As a close example, I have a web based password storage application in which I did not want to keep the encryption keys on the same server as the password database. So I generate a key with which to encrypt the records and the user keeps their key and must supply it every time they want to decrypt a record. I don't go so far as to enable searching of the encrypted data, I have a description field specifically for that purpose. The web application is called Passbox and is written in PHP.

  25. What an oxymoron! by hesaigo999ca · · Score: 2, Interesting

    Yeah, I'd like to have my cake and eat it too!

    The only way this could work is if you had tags in the meta header of the encrypted file
    telling you that yes I am encrypted, but I have an image in me or my encrypted data is of the type accounting.

    This might work for indexing searches where you want to be able to return all the files on the pc (encrypted or not) that are images or etc...

  26. The "editors" are idiots. by Anonymous Coward · · Score: 0

    I'm surprised this wasn't kdawson's doing.

  27. The Hack you're looking for.... by Anonymous Coward · · Score: 0

    With any strong encryption, the server's copy of the data will be unsearchable.

    A solution provider like Google could:
    Write an AJAX app that indexes the data before you send it, and then sends the data and its encrypted search terms to the server to store. This will let you encrypt your search terms (like a "very well distributed hash") and have the server return all the documents that match your keys. I have yet to see this done well.

    Or, you can https to a server you have placed in a colocate, hand the web page your credentials and have it mount an encrypted growable volume of your data that you can act on with Perl and flat indexes. A serious Linux Hacker could put this together for you as a couple of weeks' work. I've done this with one of my servers online, but ultimately it proved easier just to ssh to the box, mount the encrypted volume with a single command and grep for the files of interest. Command line affection is not a disease.

    Good luck.

  28. Encrypted blob in cloud, unencrypted index locally by koick · · Score: 1

    Just throwing out an idea for an implementation:

    The uploaded blob to the cloud is encrypted. But there resides a local index for searching it.

    I haven't had a need for this (as I inherently don't trust the cloud) but if someone knows of this type of implementation perhaps it's enough for the poster.

  29. Easy by Smallpond · · Score: 1

    Randomly say that you found or did not find the search pattern. Since you're not decrypting it, nobody can tell if you're lying.

  30. This seems obviously impossible but it isn't by Anonymous Coward · · Score: 1, Insightful

    This seems obviously impossible, but it isn't. The problem, of course, is in how the server can perform a search when it isn't even able to decrypt the message telling it to do a search.

    However, there is nothing inherently impossible in defining an encrypted datastructure and an algorithm where you can perform computations on the *encrypted* data, without having any idea about what it is you are computing. There is no reason that you need to decrypt data before you can do computations with it. It just needs to be the case that when you perform an operation on the encrypted data, some predictable other operation happens on the data inside the encryption. The result of this encrypted computation will then be something still encrypted, which can be sent to the client who can then decrypt it and find inside the result of his query.

    So it isn't obviously impossible. In fact the theory of multiparty computation makes it clearly possible, though the overhead of doing it that way would probably be too high.

  31. Ask the NSA. by fahrbot-bot · · Score: 1

    I'm sure they copied and decrypted the data when you uploaded it.
    (This is why I wrap all my data in tin foil.)

    --
    It must have been something you assimilated. . . .
  32. ROT26 by davidwr · · Score: 2, Funny

    I prefer ROT26. It's got built-in steganography to boot.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:ROT26 by Andy+Dodd · · Score: 1

      ROT26 is too weak. You need to be using ROT104 or higher.

      --
      retrorocket.o not found, launch anyway?
  33. It is possible to a certain extent by goombah99 · · Score: 2, Interesting

    There are encryption algorithms that allow addition. That is, the sum of two encrypted messages is an encryption of the sum. I've forgotten how these work exactly, I think they are some many to one mapping, and the addition operation is not simply adding the encrypted numerical representations.

    I came across these when looking at voting systems that allow N distributed people to vote in a way that sums the result before it is decrypted rather than decrypting to do the sum.

    Anyhow, what this means is that it is possible to do certain operations on a remote database, like sum a column, without the database knowing the result and without transmitting any additional information inbound or outbound.

    You could presumably have your data stored in many forms on the database, each form suited for one type of query. Then you just query the appropriate form to perform the operation of interest.

    I'm reasonably sure there is no way to perform very high order operations that one might typically do in a relational database however.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:It is possible to a certain extent by tomtomtom · · Score: 1

      Encryption systems with this ability are called homomorphic. Systems where this can be achieved efficiently include ElGamal (in which it is possible to compute an encryption of the product of two plaintexts given only two ciphertexts) and the Paillier system (in which it is possible to compute an encryption of the sum of the two plaintexts given only the ciphertexts).

      As you say, this property is pretty useful for applications such as secret ballots (when combined with distributed secure computation protocols which are mostly derived from this).

      The difficulty with using this approach to database operations is that it is likely to involve transmitting an awfully large set of data which sort-of negates the point again as others have said about the papers cited in the original question.
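
      For the additive case, here is a toy Python sketch of the Paillier scheme mentioned above, using the common simplification g = n + 1. The primes are small made-up values and far too small to be secure; the function names are chosen here for illustration. Python 3.8+ is assumed for the modular inverse via pow.

```python
import math
import secrets

def keygen(p: int = 293, q: int = 433):
    """Toy key from two small fixed primes (hypothetical, insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)   # modular inverse; valid because g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    """c = (n+1)^m * r^n mod n^2, with a fresh random r each time."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Server-side: multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
column = [1200, 30, 4]                      # values the server never sees
encrypted_column = [encrypt(n, v) for v in column]

total = 1                                   # 1 is a valid encryption of 0
for c in encrypted_column:
    total = add_encrypted(n, total, c)

print(decrypt(n, priv, total))              # 1234
```

      The server can sum the column without learning any entry or the total, but this gives addition only, and plaintexts have to stay below n.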

  34. That's because they don't encrypt the filenames. by alanfairless · · Score: 2, Insightful

    And they can't search inside your documents.

  35. Re:Encrypted blob in cloud, unencrypted index loca by dalhamir · · Score: 1

    damn, beat me to it. only efficient way to do it. Basically, you'd be doing your own searching, and not relying on google's search algorithms.

  36. There is a way, kind of: PIR by Naerbnic · · Score: 2, Informative

    There is a cryptography technique called Private Information Retrieval which allows you to do just that: Send an encrypted query to a server, let it perform some operations on your behalf, and send you an encrypted query result. The server neither knows the contents of the encrypted data, nor the content of the query, but you have your result nonetheless.

    The intuition is that there exists a sort of "black-box" operation which some cryptographic techniques can use. For example, if I have two encrypted bits a and b (where I can't tell what a and b actually are), I can still perform the operation a xor b. The result is encrypted, and I don't know the actual operands or the result, but I know that what came out is indeed the encryption of the xor of the encrypted bits. Such cryptosystems are forms of "Homomorphic Encryption".

    Using this, we can then give the server a search term thus encrypted and, using the black-box operation, have it do some set of operations which will reveal the result. The server will execute the exact same set of operations independent of the search term, so it knows nothing (and needs to know nothing) of the search term contents. Of course, this implies that the server has to operate on every element of the encrypted data to do its job, but that's the fundamental tradeoff. If you're willing to accept that, and the additional computational overhead, you can design such a system.
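
    One way to make the idea concrete is a toy single-server PIR sketch in Python. It uses additively homomorphic Paillier encryption rather than the XOR example above, which is a substitution made here for brevity. The primes are tiny made-up values (insecure), the database entries are small integers standing in for stored chunks, and all names are hypothetical. Python 3.8+ is assumed.

```python
import math
import secrets

P, Q = 293, 433                          # toy primes (hypothetical, insecure)
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # client secret
MU = pow(LAM, -1, N)                                 # client secret

def encrypt(m: int) -> int:
    """Paillier encryption with g = N + 1 and fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def make_query(index: int, size: int) -> list:
    """Client: encrypted one-hot vector; every entry looks random to the server."""
    return [encrypt(1 if j == index else 0) for j in range(size)]

def answer(query: list, database: list) -> int:
    """Server: identical work for every query, touching every element."""
    acc = 1
    for c, d in zip(query, database):
        acc = (acc * pow(c, d, N2)) % N2   # adds d * (hidden 0 or 1)
    return acc

database = [1111, 2222, 3333, 4444]      # entries must be smaller than N
reply = answer(make_query(2, len(database)), database)
print(decrypt(reply))                    # 3333
```

    The server never learns which index was requested, and the price is the one stated above: the query is as long as the database, and the server does a modular exponentiation per element on every lookup.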

    --


    So there I was, juggling apples and small animals, when I accidentally bit into the wrong one...
    1. Re:There is a way, kind of: PIR by Qubit · · Score: 1

      Of course, this implies that the server has to operate on every element of the encrypted data to do its job...If you're willing to accept...the additional computational overhead, you can design such a system.

      Where's Bruce? He has the right combination of math and cs theory to spout off some usefulness on this thread :-)

      Anyhow, let's think about this plan:

      Let's say we store n chunks of encrypted data on the server d_1, d_2, ..., d_n. If they were PDFs, we could just store a corresponding text file for full text search on our local machine t_1, ..., t_n, which could be much smaller. If space were at a premium, we could even store each t_i (encrypted) on the server as well, but in order to get to the text, we'd have to download them.

      Now it sounds like someone is suggesting that if (and this is a big if) we know our search domain before we upload the data, we can encrypt a special search table T along with our data and upload that as a separate d_i on the remote machine.

      For example, if we have 30 documents and know that the word "tenacious" appears in docs 2, 15, 18, along with the rest of the d_i we can create T:
      tenacious (2 15 18)

      Then we encode each field separately to get:
      xxxxx yyyyy

      Allowing us to use the same encoding on our local machine to search for "tenacious", which is encoded to "xxxxx", which we pass to the remote machine to look up in our table and return to us.
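
      A rough sketch of that table T, under some assumptions of mine: the keyword field ("xxxxx") is a keyed HMAC so the same word always encodes the same way, and the document list ("yyyyy") is encrypted with a stand-in stream cipher built from SHAKE-256. The key names and helper functions are hypothetical, and a real system would use a vetted authenticated cipher instead of the toy one here.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical client-side secrets; neither is ever sent to the server.
TOKEN_KEY = secrets.token_bytes(32)
LIST_KEY = secrets.token_bytes(32)


def encode_term(word):
    """Deterministic keyed token: the 'xxxxx' field of the table."""
    return hmac.new(TOKEN_KEY, word.lower().encode(), hashlib.sha256).hexdigest()


def _keystream(nonce, length):
    # Stand-in stream cipher built from SHAKE-256; a real system would use
    # an authenticated cipher such as AES-GCM from a vetted library.
    return hashlib.shake_256(LIST_KEY + nonce).digest(length)


def encode_postings(doc_ids):
    """Randomized encryption of the document list: the 'yyyyy' field."""
    plain = json.dumps(doc_ids).encode()
    nonce = secrets.token_bytes(16)
    body = bytes(a ^ b for a, b in zip(plain, _keystream(nonce, len(plain))))
    return nonce + body


def decode_postings(blob):
    """Client-side inverse of encode_postings."""
    nonce, body = blob[:16], blob[16:]
    plain = bytes(a ^ b for a, b in zip(body, _keystream(nonce, len(body))))
    return json.loads(plain)


def build_table(index):
    """index maps word -> list of document numbers; result is uploadable."""
    return {encode_term(w): encode_postings(ids) for w, ids in index.items()}


if __name__ == "__main__":
    table = build_table({"tenacious": [2, 15, 18], "gremlin": [4]})
    # The server only does a dictionary lookup on an opaque token.
    blob = table.get(encode_term("tenacious"))
    print(decode_postings(blob))
```

      The server can answer the lookup without learning the word, but it still sees which token was asked for and which documents get fetched afterwards.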

      Of course, once we decode "yyyyy" and get (2 15 18), we're going to reveal information to an attacker that "xxxxx" and "yyyyy" relate to d_2, d_15, and d_18. We could randomly select a few more documents to pull down to disguise our search, but that only protects us if (1) we choose enough random data at the same time and (2) we never perform the same search twice. We could keep track (on the local machine) of what docs we request for a given search, but then we're storing data on the local machine, which we wanted to avoid.

      Even putting these hurdles aside, this only works if you set up the search tables in advance. This might work well in practice if you basically store full text/directory structure locally. Then you can just search the local database and the only information you reveal is what d_i blocks you accessed and when.

      But if you want to do other types of searching on the data, you'll have to download the entire dataset first because you don't trust the remote machine.

      --

      coding is life /* the rest is */
    2. Re:There is a way, kind of: PIR by sjhs · · Score: 1

      http://en.wikipedia.org/wiki/Homomorphic_encryption:

      Homomorphic encryption schemes are malleable by design and are thus unsuited for secure data transmission.

      http://en.wikipedia.org/wiki/Malleability_(cryptography):

      security against adaptive chosen ciphertext attacks (CCA2) is equivalent to non-malleability

    3. Re:There is a way, kind of: PIR by dimeglio · · Score: 1

      Brainstorming here...

      There's always steganography. You hide your AES encrypted data inside another container which is itself searchable. You'll need to develop clever meta data for the images which would help you search for the contents without revealing the real intention of the search. For example, you hide a letter to your boss in a jpg of a man reading a letter. The meta data could say: boss, letter, new project, reading.
      Well, that may sound like a lot of work, but using Google itself to find the source images and a bit of scripting and tagging might make it doable.

      --
      Views expressed do not necessarily reflect those of the author.
    4. Re:There is a way, kind of: PIR by philosopher3000 · · Score: 1

      Wouldn't that set up require the searcher to share a private key with the server in advance, and send that same key (encrypted with a public key) every time? Essentially negating the ability for blind searches, because the server uses the private key to break the private key encryption locally, then uses the public key to return the results, which can be viewed by the searcher.

  37. Have you tried PGP NetShare? by Anonymous Coward · · Score: 0

    I played around with it and I believe with some more time and effort it could have worked. Wasn't that concerned about data security however.

  38. Impossible? Not true by ccleve · · Score: 1

    It's very possible to do this.

    The trick is that search engines deal with symbols, not necessarily words or characters. If you change the words and characters to different symbols then you're set. Imagine a dictionary of words that associated each word with a number. You keep the dictionary and don't give it to the vendor. You just give the numbers, and send your query in numbers. It works.

    This particular scheme wouldn't be very secure, but it is easy to imagine better ones.
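
    As a minimal sketch of the word-to-number idea (the class and function names are hypothetical, and assigning numbers in first-seen order is a simplification of mine), the client keeps the dictionary and the vendor only ever sees integers:

```python
from collections import Counter


class WordCodebook:
    """Client-side dictionary mapping each word to a number (kept private)."""

    def __init__(self):
        self.word_to_id = {}

    def encode(self, text):
        """Turn a document into a list of numbers, growing the dictionary."""
        ids = []
        for word in text.lower().split():
            if word not in self.word_to_id:
                self.word_to_id[word] = len(self.word_to_id) + 1
            ids.append(self.word_to_id[word])
        return ids

    def encode_query(self, word):
        """Return the number for a query word, or None if never indexed."""
        return self.word_to_id.get(word.lower())


def server_search(encoded_docs, term_id):
    """What the vendor runs: match numbers without knowing the words."""
    return [doc for doc, ids in encoded_docs.items() if term_id in ids]


if __name__ == "__main__":
    book = WordCodebook()
    docs = {
        "d1": book.encode("the quick brown fox"),
        "d2": book.encode("the lazy dog and the fox"),
    }
    print(server_search(docs, book.encode_query("fox")))
    # The vendor still sees how often each number occurs.
    print(Counter(n for ids in docs.values() for n in ids).most_common(1))
```

    The last line hints at why this particular scheme is weak: the frequency of each number is the frequency of the word behind it.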

    Here's what you need: a search engine that allows you to modify documents as they go into the index, and also allows you to specify custom tokenizers, morphological analyzers, and whatnot.

    The search engine I developed does this. http://dieselpoint.com/

    1. Re:Impossible? Not true by Hognoxious · · Score: 1

      Imagine a dictionary of words that associated each word with a number.

      You'd have a 1:1 substitution cypher. Those are rather easy to break.

      The better ones you imagine do exist, but they don't obey the 1:1 mapping, so you couldn't use your dictionary.

      Also, you'd need to index every word of every document in advance. Not very efficient.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    2. Re:Impossible? Not true by Anonymous Coward · · Score: 0

      Imagine a dictionary of words that associated each word with a number.

      You'd have a 1:1 substitution cypher. Those are rather easy to break.

      The better ones you imagine do exist, but they don't obey the 1:1 mapping, so you couldn't use your dictionary.

      Sure you can. So long as the tokenizer used at index and at query time can get a stream of numbers, you can use any encryption scheme you want.

      Also, you'd need to index every word of every document in advance. Not very efficient.

      Umm... that's how search engines work. Almost all of them. Even Google. The words go into an inverted index.

  39. Privacy enhanced databases by peterwayner · · Score: 1

    This is a great challenge and an active area of research for some time. Many researchers would like to build databases that protect the users without creating some huge pile of aggregated personal information.

    Encrypting the data at the client is a good solution. I've posted several good case studies from my book, Translucent Databases.

    Here's what I wrote for a library and here's a case study of helping an online store.

    Let me know if you have questions or suggestions.

  40. Windows Search 4.0 by __aamisb9940 · · Score: 1

    ...has an option to index encrypted files.

  41. create your own search index. by Anonymous Coward · · Score: 0

    You need a search index on that server. Attach to that server via SSL and query it using encrypted text. That text will be decrypted and processed via the index on that server. Results are encrypted and sent back. You then decrypt your results.

  42. Unfortunately SharePoint provides the equivalent by Anonymous Coward · · Score: 0

    SharePoint can prevent even server-admins from accessing the uploaded/stored data, while still allowing users/groups with authorization to the data to search it. I know this is a missing feature in Google's Mini/Appliance, and one of the reasons corporations have a problem with Google's solution.
    Some other search providers have similar authorization-based solutions, which indirectly fulfill your need. Be wise.

  43. Anonymous Coward by Anonymous Coward · · Score: 0

    Freenet uses a search feature that searches encrypted data.

  44. You need a private VM or a hosted machine by wiresquire · · Score: 1

    As pointed out above, if the data is encrypted, the service can't search on it.

    So:
    - you get a VM or a hosted machine that you have complete control over.
    - You set up all your encryption as necessary, eg encrypting the file system. SSL to the machine, etc
    - You set up a search system, eg lucene, or maybe database as SQL queries are needed or whatever.
    - Profit(?)

    Of course, you could do all the same in-house as well, without the need for encryption etc.

    ws

    --

    So does Anonymous Coward have good karma?

  45. I put something similar to this together myself.. by airjrdn · · Score: 3, Interesting

    But it may not be everything you're looking for. My requirements were:
    1 - Mask the filename
    2 - Encrypt the contents
    3 - Add recovery data in case the file got damaged
    4 - Ability to view unmasked filename from web

    I put together a batch file I could drag/drop multiple files onto that used WinRAR to compress the files (individually), with encrypted filenames, a password (of course), and included archive recovery data. It then used ReNamer to encrypt the .rar filenames. After that, I simply FTP'd the files to the server.

    I had a webpage that would accept a password, and unencrypt the filenames so they were viewable in readable form on the page. Each one was a hyperlink. There was an extra step required if you wanted the downloaded filename to be unencrypted as well.

    After uploading 115G or so, my host alerted me to the fact that they didn't allow me to keep offsite backups there. :) So in the end, I'm not even using it at the moment.

    My solution didn't allow me to search within the files, but it did allow me to store files on the server that they had no way of viewing the contents of, or guessing the contents of based on filename.

  46. Not really feasible by gweihir · · Score: 1

    There are some solutions for this. I think the first approaches were called "Iraiksan". However, there is a massive performance penalty, so you are unlikely to find this offered anywhere. Better to keep metadata on your local machine and search that.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  47. CONFIRMED: You are missing something. by mcrbids · · Score: 1, Insightful

    You say 'SSL only encrypts the transport' as if that means something. What is a file if it's not a way to transport information from the file writer to the file reader?

    I use SSL daily to encrypt files with keys to be stored for later retrieval by the intended recipient. I think you are confusing SSL (the ability to asymmetrically encrypt data) with HTTPS (a use of SSL to encrypt HTTP data transfers)?

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
    1. Re:CONFIRMED: You are missing something. by Warped-Reality · · Score: 0, Offtopic

      SSL = secure sockets layer

      the ability to asymmetrically encrypt data = public key cryptography (See RSA, El Gamal, NTRU, etc)

      --
      This is not the greatest sig in the world, no. This is just a tribute.
    2. Re:CONFIRMED: You are missing something. by mcrbids · · Score: 2, Interesting

      Sure the *NAME* is "Secure Sockets Layer", and perhaps that was what it was originally developed for, but it's just wrong to say that it can't be used otherwise, and/or that it only encrypts data "in transit", not on a server. Take a look at this:

      http://us2.php.net/manual/en/function.openssl-public-encrypt.php

      Here's the use of SSL functionality without (ahem) a socket. Right from the docs:

      This function can be used e.g. to encrypt message which can be then read only by owner of the private key. It can be also used to store secure data in database.

      I routinely use SSL to sign files in order to prove whodunnit. This information is stored alongside the signed document. Whether it's transported subsequently is inconsequential.

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
    3. Re:CONFIRMED: You are missing something. by Warped-Reality · · Score: 1

      ah, i was commenting more about equating SSL with PKC than the overall meaning of your comment.

      --
      This is not the greatest sig in the world, no. This is just a tribute.
    4. Re:CONFIRMED: You are missing something. by Nikker · · Score: 1
      I think the point the parent makes about "the transport" is that you do indeed need a key to access the data, but once the data is on your local drive anyone with access to that data can read it without the keys. So if you were to use scp to do a backup of non-encrypted data, the result on the server would be unencrypted. Unless, of course, you are capturing the raw encrypted session on the client and you write software for recreating the data structure from a raw ssh session.

      I would have to say to the OP though to encrypt the directory structure separately with different keys and store it locally with version information, size change (delta), owner, etc... This will benefit in a few ways.
      • First, the digest data of the structure is never publicly exposed, and the keys to either the digest of the raw data or the data structure are never transmitted over a network: you view the contents by decrypting the data locally, determine the file you require using that data, download the encrypted digest over ssh/scp, and again decrypt it locally. This also helps to prevent your data being taken or hacked by someone via the storage center, since you are strictly storing raw data and that machine will never touch the keys.
      • Since you have a local copy, you can access it and create it much more easily and securely. You could even write a simple https page that could query the data and never have your queries exposed to an external network.
      • You can choose the cipher without any setup on the storage server; this limits the exposure of any information about the version or type of cipher used to create the digest.

      In short this method only really gives up to brute force attacks and selecting a good cipher will help.

      --
      A loop, by its nature, continues. If that didn't make sense, start reading this sentence again.
  48. Because he, like most Slashbots, is a cretin. by Anonymous Coward · · Score: 0

    God damned arrogant know-it-alls.

  49. wuala anyone? by Anonymous Coward · · Score: 0

    how about wuala (http://www.wuala.com)?

  50. not hard by zogger · · Score: 2, Interesting

    Just use a book (or multiple books) code cipher for your index. You don't need to remember a thing beyond which books and what your key starting number is, the pattern. And if someone is in your house throwing all your books at cracking the remote server, you are already screwed and have much bigger problems, such as they probably already installed a keylogger on you. If you are that much of a target for someone to take that much interest....time for plan B or C then, involving plastic surgery, new ID and some nation where there is no extradition treaty 0_o

  51. Do this maybe? by pentalive · · Score: 1, Insightful

    1 Encrypt the file (or record for databases)
    1.5 (for a database) Encrypt the key fields each separately
    2 Encrypt the file name separately
    3 store on server

    To search for a file:
    1 Encrypt the search criteria (file name or key value)
    2 search for encrypted thing on server
    3 Retrieve matches.

    1. Re:Do this maybe? by SpazmodeusG · · Score: 1

      JLAN is a java application that does just this.
      It creates a filesystem that can be stored on a database and can be encrypted. That database can be remote or local. So all you need to do is pay a monthly fee for access to an online database and run JLAN.

    2. Re:Do this maybe? by timothyf · · Score: 1

      Except real encryption doesn't work this way. Almost all encryption contains a feedback loop in it where the results of the previous block of encryption is fed into the next block of encryption--specifically to thwart statistical analysis of its contents. This is how you can encrypt something with a lot of repeating bits (like, say, a bitmap with large blocks of color) and still get something that looks like random noise out.
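
      Here is a small sketch of that feedback loop, comparing a mode with no chaining (ECB) against cipher block chaining (CBC). The block cipher is a toy 4-round Feistel network built on HMAC, a stand-in I made up so the example runs with the standard library alone; it is not AES and not a secure design. Input length must be a multiple of the block size, since padding is left out.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # bytes per block
HALF = BLOCK // 2


def _round_fn(key, rnd, half):
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key, block):
    """Toy 4-round Feistel cipher; a stand-in for AES, not a secure design."""
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _round_fn(key, rnd, right))
    return left + right


def decrypt_block(key, block):
    """Inverse of encrypt_block: undo the rounds in reverse order."""
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _round_fn(key, rnd, left)), left
    return left + right


def split(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key, data):
    """No feedback: equal plaintext blocks give equal ciphertext blocks."""
    return b"".join(encrypt_block(key, b) for b in split(data))


def cbc_encrypt(key, iv, data):
    """Each block is XORed with the previous ciphertext block first."""
    out, prev = [], iv
    for b in split(data):
        prev = encrypt_block(key, _xor(b, prev))
        out.append(prev)
    return b"".join(out)


if __name__ == "__main__":
    key, iv = secrets.token_bytes(32), secrets.token_bytes(BLOCK)
    data = b"A" * BLOCK * 3  # three identical blocks, like a flat bitmap
    print(len(set(split(ecb_encrypt(key, data)))))  # repeats are visible
    print(len(set(split(cbc_encrypt(key, iv, data)))))  # repeats are hidden
```

      With chaining, the same word or field encrypts differently depending on everything before it, which is why a server cannot simply compare ciphertext fragments.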

      Personally, I'd say that, almost by definition, any completely opaque encrypted blob is unsearchable, though I won't discount the idea that someone smarter than I am could make it work. The scenarios for using that sort of tech don't seem all that compelling to me, though.

      http://www.cs.bham.ac.uk/~mdr/teaching/modules/security/lectures/symmetric-key.html - see the block cipher modes section for info on that.

    3. Re:Do this maybe? by SpazmodeusG · · Score: 1

      The following seems to work to me (assume you are using a customisable DB filesystem like JLAN i mentioned above)...

      When you do a search you send a SELECT * FROM table_entries WHERE MD5_hashed_filename = encrypted_filename.
      This returns an index to the encrypted database binary blob that contains that cluster that the file starts in.
      When you want a listing of a directory you do a SELECT * FROM table_entries WHERE MD5_hashed_directory_name = encrypted_directory_name and get a result similar to above.

      In the actual binary blob of the cluster there's a copy of the filename+directory string, file date etc. preceding the data of the files and it is encrypted using a standard encryption algorithm rather than a hash.

      You are storing 2 copies of each filename string, once in the file table and again in the binary blob but that isn't much considering the data of the filename string is peanuts compared to the data of the file itself.

      This makes it rather quick to list directory contents (look up the entries for the directory in the MD5 hashed filetable then only decrypt the clusters that have data from that directory). It also makes it quick to search for files as well.

      The clusters are each individually encrypted but their size is well above the limit where someone can guess the block and thus extract the key.
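
      The lookups above could be sketched roughly like this. Assumptions are labeled: an in-memory SQLite database stands in for the remote database, the column names are shortened from the ones in the queries above, and a keyed HMAC-SHA256 is swapped in for plain MD5, because an unkeyed hash of a filename can be guessed by hashing candidate names. Encrypting the cluster itself is left to the caller.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical client-side secret; the database host never sees it.
NAME_KEY = b"replace-with-a-random-32-byte-key"


def blind(name):
    """Keyed hash of a file or directory name (HMAC-SHA256 swapped in for MD5)."""
    return hmac.new(NAME_KEY, name.encode(), hashlib.sha256).hexdigest()


def open_store():
    db = sqlite3.connect(":memory:")  # stand-in for the remote database
    db.execute(
        "CREATE TABLE table_entries ("
        " hashed_directory_name TEXT,"
        " hashed_filename TEXT,"
        " cluster BLOB)"
    )
    db.execute("CREATE INDEX by_name ON table_entries (hashed_filename)")
    db.execute("CREATE INDEX by_dir ON table_entries (hashed_directory_name)")
    return db


def put(db, directory, filename, encrypted_cluster):
    """encrypted_cluster is assumed to be encrypted by the caller already."""
    db.execute(
        "INSERT INTO table_entries VALUES (?, ?, ?)",
        (blind(directory), blind(directory + "/" + filename), encrypted_cluster),
    )


def find_file(db, directory, filename):
    """Exact-match search by full path; the server compares hashes only."""
    sql = "SELECT cluster FROM table_entries WHERE hashed_filename = ?"
    return db.execute(sql, (blind(directory + "/" + filename),)).fetchall()


def list_directory(db, directory):
    """Fetch the clusters of every file stored under one directory."""
    sql = "SELECT cluster FROM table_entries WHERE hashed_directory_name = ?"
    return db.execute(sql, (blind(directory),)).fetchall()


if __name__ == "__main__":
    db = open_store()
    put(db, "/docs", "taxes.pdf", b"\x8f\x12...opaque...")
    put(db, "/docs", "letter.txt", b"\x03\xaa...opaque...")
    print(len(list_directory(db, "/docs")))
    print(find_file(db, "/docs", "taxes.pdf"))
```

      This only supports exact matches on names the client already knows; it does nothing for searching inside file contents.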

    4. Re:Do this maybe? by turbidostato · · Score: 1

      "When you do a search you send a SELECT * FROM table_entries WHERE MD5_hashed_filename = encrypted_filename."

      You are adding redundancy without benefit. The database is just a way to index information... but directories and filenames are just ways to index information too! So you have an indexing engine (the database) for another indexing engine (the filesystem layout) but you haven't started to get to the real stuff: all that indexing is for easy retrieval of actual information, which you still haven't addressed. I.e.: it is not "give me all the files that start with 'a'" but "give me all files that contain information about 'x'". Since these files are still encrypted you still have no means to make such a search.

      If all you want is protecting your directory layout (directory and file names), simply cypher the whole filesystem and remotely mount it. You get free of the database step and are no worse about searching on the real contents which you still will have to do it client-side.

    5. Re:Do this maybe? by JWSmythe · · Score: 1

      In response to your sig.... 4 8 15 16 23 42

          1) Coordinates in the Central African Republic according to Google Maps

          2) Somewhere in the pacific according to Mapquest

          3) The mystery numbers on "Lost"

          4) The lottery numbers I lost with last week

          5) An arbitrary set of numbers to make people wonder what you're on about.

          6) 42 = the answer to the life, the universe, and everything. The others are numbers leading to that answer. To understand the answer, you must understand the question. For that, you'll need a much bigger computer. We can build it. We'll call it "Earth".

          7) As a single integer, the number of years old the earth will be, when it simply falls apart. :) (start singing REM "It's the end of the world as we know it")

          8) A phone number in a nonexistent area code.

          9) A local number in any of a number of countries, depending on the dialed country code.

          Did I get it right?

      --
      Serious? Seriousness is well above my pay grade.
    6. Re:Do this maybe? by Hognoxious · · Score: 1

      Almost all encryption contains a feedback loop in it where the results of the previous block of encryption is fed into the next block of encryption--specifically to thwart statistical analysis of its contents.

      That's what I was just about to say, but you explained it better. Now you could do it on structured data, say a database, by encrypting each field in isolation. But I fail to see the use; why not just keep the index at home? And it wouldn't work for large amounts of freeform text for the reasons you mention. It might help if we knew what sort of data he's trying to store, but since the question is typically half-assed we don't.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    7. Re:Do this maybe? by Anonymous Coward · · Score: 0

      You can maintain that state client-side, merely using a remote file system to store encrypted files. The English language has about 20,000 words. Only about 4,000 are commonly used. This is not a particularly big data structure. Searching through binaries is pointless, so they can be ignored. Indeed, version control systems already deal with most of these issues, and one could be modified to support using a remote, encrypted back end using AES and an arbitrary transfer protocol.

    8. Re:Do this maybe? by pentalive · · Score: 1

      The truth is "out there".

    9. Re:Do this maybe? by pentalive · · Score: 1

      Oh, and about item 4, I would not take any airplane rides any time soon, if I were you.

    10. Re:Do this maybe? by JWSmythe · · Score: 1

      Ok, canceling my plane ticket now. :)

          Why does my radio keep picking up those numbers. Oh wait, I don't have a radio. Why do I keep hearing them?

      --
      Serious? Seriousness is well above my pay grade.
  52. Anyone ever used Wuala before by Anonymous Coward · · Score: 0

    This thing is quite neat. Cross platform (Java Web Start), secure and free. You can donate local storage to gain more remote storage.

  53. Re:this is just DRM, correct? by Dare+nMc · · Score: 1

    Exactly what DRM is, i.e. if I want to give out encrypted data to a computer/user that I don't trust, yet I want them/that to do useful stuff with my encrypted data, but never give full read access to my encrypted data.

    So a VM could be deployed on the remote server that would only allow your signed app to perform only the acts you allow on the data, and only allow your client to connect securely. This would provide the desired functionality. Same as DRM, the security is through obscurity at some point, since you must give the key to your data out in some fashion, but it can be hidden deep in a big program. So a true open source solution likely can't exist...

  54. Beta API to search across your data by Veni+Vidi+Dormi · · Score: 1

    This company has a Beta API to do just that I think.

    I know of some people that are using it for searching across anonymized medical data.

    BeliefNetworks Web Services API
    http://beliefnetworks.net/bnws/

    They have code samples too and a downloadable Java Library.
    http://beliefnetworks.net/bnws/examples.html
    http://beliefnetworks.net/bnws/security.html

    1. Re:Beta API to search across your data by sy5t3m · · Score: 1

      On that security page, there is only information about authenticating the calling user before doing a delete.

      Their actual service does not seem suited at all to encrypted data, as they are pulling keywords and using them to find related documents. If you could even find keywords in encrypted data, matches in other documents might not even decrypt to the same word.

    2. Re:Beta API to search across your data by Anonymous Coward · · Score: 0

      They have a white-paper out somewhere that makes it seem like it is an entirely secure system to input, eat and output.

      Again, it looks like a similar service if not exactly what the poster wanted.

  55. Fuse - EncFS by Anonymous Coward · · Score: 0

    EncFS (Fuse)
    or, if just trying to conceal data, make a disk image on your hosted environment and mount that.

    I use sshfs (again, fuse) to mount a remote directory, and then EncFS to mount an encrypted drive within the sshfs mount.

    Convoluted, but possible - unix only of course... :)

  56. zero-knowledge web applications by Anonymous Coward · · Score: 0

    Have a look at http://www.clipperz.com/about. The application knows nothing about your actual data but you can do stuff on your data, like searching (which is atm not possible, but would be).

  57. Kinda doable? by Anonymous Coward · · Score: 0

    Would this not be do-able right now today, in a sorta round-about way? You'd have to do the 'searching' locally, so if you were hoping to offload CPU use to this provider that's out.. but..

    If the provider gives you an NFS mount, and you create a file inside using the crypto filesystem, and mount it over the NFS mount -- I'm pretty sure the provider has no ability to know jack crap about your data, and you now have unencrypted data on your end you could hit with whatever search program/indexing program/whatever you want. Assuming it maintained its indexes in that mounted filesystem, you could mount it up anywhere, potentially even multiple places at once, to do your searches.. and unmount it when not in use.

    Maybe not the most elegant, and certainly loses the ability for the provider-end to do your search for you.. but maybe fits the bill?

  58. Encrypt the Index by hax0r_this · · Score: 1

    I assume FredFredrickson meant that the index would be encrypted.

  59. I don't think you can have it both ways by BattyMan · · Score: 1

    Either you send your storage provider clear data, in which case he can understand and work with it (including search through it), or you can send him (and ask him to store) encrypted data.

    One of the principal characteristics of (well-)encrypted data is that it is essentially random gibberish. Encrypting your search query won't somehow help him understand your encrypted data. The purpose of encrypting it is to keep (all) others out of it.

    Sorry.

    --
    Exceeding the recommended torque is not recommended.
  60. Or... just don't upload it. by Estanislao+Martínez · · Score: 1

    For FredFredrickson's scheme to work, whether the index is encrypted or not is actually irrelevant. The scheme relies on nothing more than the server not having any effective use of the index file. That can be achieved simply by not uploading the index to the server. The client would use the index locally to figure out which chunk of the encrypted file to request from the server, and request that.

    I can see two problems with that, though:

    1. It is not server-side search at all. All of the searching happens on the client. To put it this way: the only functionality that the server is giving you is random access to a blob of bytes.
    2. An attacker who has control of the server can log requests for chunks of the cyphertext. If the attacker can correlate this with other information that they've obtained independently, this could be useful for breaking the code, or at least for inferring some secrets.

      For example, the attacker can discover the frequency at which various chunks of the cyphertext are accessed; if this is a client information database, and the access frequencies can be correlated with independent knowledge of, say, how frequently you deal with your various clients, the attacker can formulate hypotheses about which cyphertext chunks have information about which clients.
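
    Both points can be seen in a small sketch. The class names, the keyword index and the byte ranges are hypothetical; the blob stands for already-encrypted data, and the server does nothing but serve byte ranges while its operator keeps a tally.

```python
from collections import Counter


class BlobServer:
    """Stand-in for the storage host: random access to opaque bytes."""

    def __init__(self, blob):
        self.blob = blob
        self.access_log = Counter()  # what a curious operator could record

    def read(self, offset, length):
        self.access_log[(offset, length)] += 1
        return self.blob[offset:offset + length]


class LocalIndexClient:
    """Keeps the keyword -> (offset, length) index on the client only."""

    def __init__(self, server, index):
        self.server = server
        self.index = index

    def search(self, keyword):
        """All searching happens here; the server just returns chunks."""
        return [self.server.read(off, ln) for off, ln in self.index.get(keyword, [])]


if __name__ == "__main__":
    server = BlobServer(b"\x00" * 30)  # pretend this is ciphertext
    client = LocalIndexClient(server, {"acme": [(0, 10)], "globex": [(10, 20)]})
    for _ in range(3):
        client.search("acme")
    client.search("globex")
    # The operator never sees a keyword, yet learns which chunk is hot.
    print(server.access_log.most_common())
```

    The index never leaves the client, but the access counts alone tell the operator which chunk matters most to you.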

  61. Why does the server need to perform the search? by DamnStupidElf · · Score: 1

    The server just stores a bunch of indexes into your data and searches them when you supply the keywords. It sounds like what you really need is an efficient index (it requires few reads to determine whether what you are searching for is there, or that it isn't anywhere). Then you can build and encrypt the index and store it online in chunks, and download the pieces of it that you need to search for your keywords, and then retrieve the encrypted data that the index entries points to.

    For instance, if you want to do keyword searches you build a word index from all the keywords in your documents and then put links to the documents into buckets for each keyword. You could make this relatively efficient by creating the following structure for each bucket: "document1,document2,document3,..." and then storing it by encrypting the structure and naming it with the encrypted value of the keyword. E.g. for "slashdot" create a bucket named "fynfuqbg" and in it store links to any of your documents containing the string "slashdot".

    To perform a keyword search, encrypt all the keywords separately and ask the online storage for all the files named with the set of encrypted keywords, and then once you have the index entries do a simple intersection on them to find links to documents containing all the keywords. If you want to support searching for variations in the spelling of a keyword, just generate and encrypt all the possible variations that you want to search for and see if there are index buckets for any of those variations.

    Obviously the online storage facility will know that you are performing searches, and can figure out what your most popular searches are in terms of the buckets you access, and they could statistically determine what likely plaintext keyword belongs to a bucket based on the common word frequencies in documents, and the general frequency of searches for particular keywords. One way to obfuscate your searches is to always include several requests for other blocks, using a statistical method to try to make all your searches obey a uniform distribution. Storing keyword buckets in a uniform size is also imperative to prevent statistical analysis as you build the index (otherwise watching index buckets grow would allow the online storage facility to associate the indexes that grew with recently added documents).
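
    A rough sketch of the bucket scheme follows, with several assumptions of mine: bucket names come from a keyed HMAC rather than whatever encoding produced "fynfuqbg", every bucket is padded to a fixed `BUCKET_SLOTS`, and the decoy requests are picked at random, which is much cruder than a statistical method aiming at a uniform distribution. Encrypting the bucket contents before upload is omitted to keep it short.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)  # hypothetical client-side secret
BUCKET_SLOTS = 8               # assumed fixed bucket size
PAD = ""                       # dummy entry used to fill empty slots


def bucket_name(keyword):
    """Opaque bucket name; plays the role of 'fynfuqbg' for 'slashdot'."""
    return hmac.new(KEY, keyword.lower().encode(), hashlib.sha256).hexdigest()


def make_bucket(doc_links):
    """Pad every bucket to the same number of slots before encryption."""
    if len(doc_links) > BUCKET_SLOTS:
        raise ValueError("bucket overflow; a real design would chain buckets")
    return list(doc_links) + [PAD] * (BUCKET_SLOTS - len(doc_links))


def build_index(keyword_to_docs):
    # Encrypting each bucket's contents is omitted here for brevity.
    return {bucket_name(k): make_bucket(v) for k, v in keyword_to_docs.items()}


def search(store, keywords, decoys=2):
    """Fetch the wanted buckets plus a few random ones, then intersect."""
    wanted = [bucket_name(k) for k in keywords]
    others = [n for n in store if n not in wanted]
    extra = secrets.SystemRandom().sample(others, min(decoys, len(others)))
    fetched = {name: store.get(name, []) for name in wanted + extra}
    result = None
    for name in wanted:
        docs = {d for d in fetched[name] if d != PAD}
        result = docs if result is None else result & docs
    return sorted(result or [])


if __name__ == "__main__":
    store = build_index({
        "slashdot": ["doc1", "doc2", "doc3"],
        "encrypted": ["doc2", "doc3"],
        "search": ["doc3", "doc9"],
        "storage": ["doc7"],
    })
    print(search(store, ["slashdot", "encrypted"]))
```

    The intersection runs on the client after download, so the storage side only ever sees opaque bucket names of identical size.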

  62. Links to Articles by navhali · · Score: 1

    The problem is not that it is impossible, just that most current implementations are extremely slow. Song implemented ranged query over encrypted data on gMail and even with encryption accelerators the performance was low. Some more papers:

    http://www.springerlink.com/index/370086k273w1587t.pdf
    http://www.cs.berkeley.edu/~dawnsong/papers/rangequery-full.pdf
    http://www.springerlink.com/index/u2007h5706482j51.pdf

    There have also been some different multi-server database schemes that do the same thing, although, once again, due to performance and the cost of maintenance I do not know of any that have actually gone to market.

    Hope those help. Hit me up if you want more info.

    -Nav

  63. FTP anyone? by Anonymous Coward · · Score: 0

    Think FTP... just with the contents encrypted. You still get to browse the files and know their names (e.g., it is searchable), but each file is encrypted for security. You even get the index of the current directory each time you connect, and the entire index if you want. Done.

  64. security is not easy by Bork · · Score: 1

    Various ways have been brought up already, so I will add my comment too.

    If you are storing the files encrypted, that means you do not want others to be able to know the content; if it's for some other reason you had better rethink what you are doing.

    Being able to search the information means that whoever does the searching has the keys to open up the file and pull out information (decryption). If only you have the keys, then only you can open up the files, which means a third party (Google) will not be able to.

    Some have put out the idea of an index that is stored along with the secured file - BAD idea! If you give away any information about the contents of an encrypted file, you have just given a third party information on how to possibly get through the encryption; you have weakened the security.

    I have taken enough security courses to understand that unless you get the proper education in security, you will absolutely do the wrong thing when it comes to security. In fact, trying to secure something may result in it becoming more vulnerable.

  65. Shameless Plug by lomiel · · Score: 1

    I run this company, so it is a shameless plug, but it's the best solution imho.
    It is a disaster recovery data storage company that takes daily snapshots of business data, transmits them via ssh, stores them encrypted on our servers, and allows https access for customers to find what they need if they lose their data.
    Naturally, we don't give out a lot of tech specs, but we have a large client base in Australia, and have been operating on the same premise for the last 6 years.

    www.sns-storage.com

    Have a nice day :)

    Julian Field

  66. Look at JLAN by SpazmodeusG · · Score: 1

    I use JLAN for this. I have a virtual private online server that I don't have root access to, so I can't install FUSE.
    Instead I installed JLAN, which is a user-mode Java application that stores your data either in a file, a set of files or in a database. I store the data in a database (my provider gives unlimited database access with the virtual private server subscription).
    JLAN exposes the data as either an FTP, NFS or SMB share/filesystem. So it doesn't create a filesystem like FUSE does, but it is still trivial to get to the data either directly (//myhost/myshare) or via a permanent share->drive mapping. This is why JLAN doesn't need admin permissions to install a filesystem driver on the remote server.

    It all works perfectly and it is GPLed these days.

    One great use for this is that for ~$10 a month you can get a virtual private server with shell access and database access. The shell access is all you need to run JLAN (it is a user-mode application). Set up JLAN to store files encrypted in a database and share out those files as an SMB share. Let all your trusted friends know the address and you all have an encrypted filesystem you can easily access remotely for a very low cost. Another thing you can do with the shell access is run a torrent program and set the download path as the path of the JLAN shared drive. That way all your torrents are stored in that filesystem.

    It isn't slow. In fact, even with the database overhead of the filesystem I'm using, it is still faster than my 20 Mbps net connection. It is also blazingly fast to browse thanks to the database structure JLAN uses for the filesystem (the file table entries are in one area and typically get cached, and the blobs of the actual file data are in another).

    1. Re:Look at JLAN by SpazmodeusG · · Score: 1

      Oh, one thing I meant to add. You have to use SSH tunneling to get to the remote share. Not just because of security but also because Windows doesn't like using SMB on anything but the default port. It's only a minor inconvenience though.
      Tutorial with help on SMB -> SSH tunneling
      http://kign.blogspot.com/2008/07/accessing-smb-shares-under-firewall.html

  67. Yes! by NemosomeN · · Score: 1

    Encrypt all of your documents, word by word, with your private key. Each word would have to be in a separate file, named sequentially. If you need to search for a word, you sign your query, then search for that query. In short, this is a retarded idea, and even the best-case is garbage. And why wasn't this posted to Idle?

    --
    I hate grammar Nazi's.
  68. Not qualitatively different. by Estanislao Martínez · · Score: 1

    The scheme you're proposing here requires the server have full understanding of an index that maps properties of interest (encoded as hashes) to the data items in the database (which represent files). This index says quite plainly that certain data items share certain properties with other data items (i.e., are both listed under the same hash). This reveals some information about the encrypted data that is subject to statistical analysis, and to correlation with other, independently obtained information.

    For example, if by social engineering I discover that your index is indexing last names of Americans, I can formulate hypotheses about which hash represents which name. By observing how the hashes cluster across the set of documents, I can further test that against information about, say, the last names of your contact persons at your various clients.

    Your proposal really isn't qualitatively different from a full-text index. The only difference is that the granularity of the index you're describing is coarser; instead of pinpointing the location of every individual word in the database that satisfies the search, the index might pinpoint largeish "files" that mention a certain last name. Makes it much harder to crack, sure, but the point is that the difference is quantitative, not qualitative.

  69. Along these lines, consider Bloom filters by Anonymous Coward · · Score: 0

    If you can boil down your query terms to presence of one or more search terms, and if you can efficiently generate the full set of terms for a region of plaintext (e.g. there are a finite number of whole words in a text and you only support whole word matching), then you could produce a Bloom filter bitmap representing all of the "present matches" for a given region. By introducing a secret key into the hash functions of the Bloom filter, you could enable someone to do the bitmap searching for you, without revealing what you were searching for (ignoring traffic analysis on populations of searches).

    Bloom filters also allow you to easily compose bitmaps, so if you can segment your plaintext to the smallest atomic search region, you can compute the bitmaps for all regions in parallel (or sequentially) and then build up a hierarchy of combined bitmaps for the powerset of regions (or some heuristic subset of the powerset).
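
    A rough sketch of the keyed Bloom filter idea, assuming whole-word matching and using HMAC-SHA256 from the Python standard library as the keyed hash. The parameters M_BITS and K_HASHES and all function names are made up for illustration; a real deployment would size the filter for its expected false-positive rate.

```python
import hashlib
import hmac

M_BITS = 1024   # hypothetical filter size in bits
K_HASHES = 4    # hypothetical number of hash functions


def positions(key: bytes, term: str) -> list[int]:
    """Bit positions for a term; only the holder of the secret key can compute them."""
    out = []
    for i in range(K_HASHES):
        msg = bytes([i]) + term.lower().encode("utf-8")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        out.append(int.from_bytes(digest[:8], "big") % M_BITS)
    return out


def build_filter(key: bytes, words: list[str]) -> int:
    """Build the bitmap for one region of plaintext, stored as a Python int."""
    bits = 0
    for word in words:
        for pos in positions(key, word):
            bits |= 1 << pos
    return bits


def might_contain(bits: int, probe: list[int]) -> bool:
    """Server-side test: sees only a bitmap and bit positions, never the term."""
    return all((bits >> pos) & 1 for pos in probe)


def combine(a: int, b: int) -> int:
    """Compose two region bitmaps into the bitmap for their union."""
    return a | b
```

    The client sends positions(key, term) as the query; the server runs might_contain against each stored bitmap, starting with the combined ones and descending only into regions that match. False positives are possible, false negatives are not.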

  70. Semantic comparison across encrypted volumes ? by Anonymous Coward · · Score: 0

    Wouldn't that be the best way?

    Well. I have to wonder. If you have two encrypted volumes and compare semantic information across them . . . isn't this the 'dark veil' approach to doing full text searches across encrypted volumes?

  71. Re:I put something similiar to this together mysel by CXI · · Score: 1
    I had a webpage that would accept a password, and unencrypt the filenames so they were viewable in readable form on the page. Each one was a hyperlink. There was an extra step required if you wanted the downloaded filename to be unencrypted as well.

    If you had a web page on the host that would decrypt the file names (or files), they could have just stored a copy after your code generated it. Not only that, but they could have trivially captured any password you put into it. That's not a secure system at all if you assume a malicious host like the OP assumes.

  72. Amazon S3 + JungleDisk does this. by Anonymous Coward · · Score: 0

    We do exactly this with Amazon S3 & JungleDisk. All our files are encrypted on the S3 servers (including file names). Since JungleDisk is the proxy that handles the encryption/decryption and provides a mount point which my desktop search indexer has access to, I can use my normal desktop search to quickly find the encrypted data I want on S3.

    Of course, the search index is local and not handled remotely... but... meh. Works fine for our needs.

  73. ROT13 by bobbuck · · Score: 0

    Sounds like somebody doesn't know ROT13!!

  74. Encrypt Word By Word by kudBwrong · · Score: 1

    It's not possible to search for a keyword within a larger encrypted text without decrypting the text. So there have been numerous proposals for indexing methods with various pros and cons. Suppose we encrypt each word separately? "Beethoven" becomes "mxP370e8". If I want to search for "Beethoven" without letting Google know (put aside, for the moment, the objection that Google _already_ knows everything) I search for "mxP370e8" instead of "Beethoven", and my search returns a link to a word that is surrounded by other encrypted words, perhaps a file. It may be secure enough to let Google know that "mxP370e8" is the third word of a file of 18132 words, and that I searched for it. Encrypting word-by-word is vulnerable to statistical and traffic analysis, but there are ways to mitigate this, such as by using lots of salt to make all words the same size, changing keys for different files or parts of files (now there is more than one encryption that maps to "Beethoven") and so on. I think my basic point is that if you want to do what we normally think of as a full-text search, then each searchable word has to be standalone encrypted all by itself, if the third party is going to do the searching.
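
    A small sketch of the word-by-word idea, with one substitution stated plainly: it uses a keyed hash (HMAC-SHA256) as a stand-in for deterministic per-word encryption, so the tokens are searchable but cannot be decrypted back into words; the readable copy of the file would be encrypted separately. The function names and the 8-character token length are assumptions for illustration.

```python
import base64
import hashlib
import hmac


def word_token(key: bytes, word: str, length: int = 8) -> str:
    """Fixed-length opaque token for a word; the same key and word always match."""
    digest = hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


def tokenize_document(key: bytes, text: str) -> list[str]:
    """What gets uploaded for searching: one token per whitespace-separated word."""
    return [word_token(key, w) for w in text.split()]


def search(tokens: list[str], query_token: str) -> list[int]:
    """Server-side search: plain equality on tokens, no key required."""
    return [i for i, t in enumerate(tokens) if t == query_token]


key = b"per-file secret key"
uploaded = tokenize_document(key, "Beethoven wrote nine symphonies Beethoven")
hits = search(uploaded, word_token(key, "beethoven"))
```

    Here hits is [0, 4]: the server learns which positions matched and that the two tokens are equal, which is exactly the statistical leak described above. Using a different key per file means the same word gets a different token in each file, at the cost of one query token per file.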

  75. Blinding by chkn0 · · Score: 1

    What you're looking for is called blinding.

  76. Encrypted Block Storage? by nghate · · Score: 1

    Hmm... You could think of using the remote storage as a block storage for encrypted data... might not be the most efficient, but it could get you the functionality you seek...

  77. A different approach by Anonymous Coward · · Score: 0

    You'll probably end up having to pick between something that's either CPU-intensive, bandwidth-intensive or storage-intensive...

    I don't think anyone proposed a solution similar to this, but you could send and encrypt two packs of data.. First you store your actual data, encrypted as you so wish.. then, you store your search data (also encrypted). Think like maybe what Google Desktop uses to search your data on your own computer.. It's smaller than the actual data, but it's still a different content.. Then when you need to search for something, you can either use your local copy (which is kept encrypted, too), or if you don't have it, you can download only the search data from the server, and search locally, only retrieving the data you want with the pointers you got from your search.

  78. It can't by SmallFurryCreature · · Score: 1

    The idea is idiotic and one of those crypto-idiot fantasies that the totally clueless and paranoid seem to have.

    About the only way you could do this is to control the server itself, since then the only one who knows the encryption keys is yourself, or rather your server.

    But the poster wants a third party to have his encrypted data, send him the key to that data and then open that data, look at it but not be able to look at it.

    This is DRM. It don't work. If I want to encrypt something I have several things.

    The sender, this entity MUST have the original data, the encryption key and the encrypted result.

    The reader, this person must have the encrypted data, the decryption key and can with those two, obtain a copy of the original data.

    In between is the messenger or untrusted party. The messenger should NEVER have the encrypted data and decryption key at the same time or they will be able to do what only the reader should be able to do.

    Traditionally this means the sender and reader meet, exchange keys and then part. The sender then uses a messenger to send the encrypted data to the reader. The messenger does not have the key and so is safe. If said messenger turns out to be unreliable, you only lose the encrypted data; the key is safe with you and the reader.

    DRM fails, because it trusts the messenger but not the reader. DRM wishes to give the reader everything so it can read the message but not be able to read the message. This cannot be done and is the reason DRM fails.

    I have seen some people be confused by SSH. SSH seemingly allows you to securely connect to a remote system without a separate exchange of keys. The problem is that SSH doesn't allow that at all. If you just ssh to a remote system you are NOT secure at all. How do you know you are connecting to the system you think you are connecting to and not something else? You are trusting the messenger, the internet, to be trustworthy. SSH warns you about this when you first connect to a system, asking you to accept the remote machine's key; if you have NOT verified in a separate communication that this key belongs to the remote machine, then you are gambling that the internet is trustworthy.

    Back to the system proposed. DRM's wet dream is to control the reader's hardware so they can only read the message in a way that doesn't allow them to reproduce it. The Trusted Computing dream. If the whole end machine is encrypted, only the analog hole remains.

    You cannot send a reader all the data they need to read the data but not be able to read the data for their own purposes. If I want google to search my email, they must be able to read my email.

    The idea to search in encrypted data is just plain silly. The whole point of encryption is to not be able to read in it. If you encrypt a piece of text in such a way that a given word encrypts to the same ciphertext everywhere it appears in the document, then you are asking to be cracked in no time.

    Consider how human-usable encryptions are attacked: by looking for often-repeated encodings that might correspond to common words. If you know the text is in English, then in an encrypted text "4 231231 421 4 534534 4" it would be fairly easy to figure out that 4 = a. Find more common words by statistical analysis and then you only need to figure out the encryption that results in that encoding of a very short string and voila, you can decode everything. Good encryption does NOT allow the same data parts to be encrypted the same.
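
    The frequency attack described above takes only a few lines. This is a toy sketch using the example ciphertext from the paragraph; the function name rank_tokens is made up, and a real attack would compare the ranking against word-frequency tables for the suspected language.

```python
from collections import Counter


def rank_tokens(ciphertext: str) -> list[tuple[str, int]]:
    """Count how often each encrypted word appears, most frequent first."""
    return Counter(ciphertext.split()).most_common()


ranked = rank_tokens("4 231231 421 4 534534 4")
# The most frequent token is the best candidate for a very common word such as "a".
most_common_token, count = ranked[0]
```

    With this input, most_common_token is "4" and count is 3, which is all an attacker needs to start guessing.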

    So the idea of sending encrypted words to search for is idiotic. Even if you have the original data on your PC so you can create the same encoded data as on the server (but why then search on it remotely?) you still wouldn't be able to snip the bit you wanted out of it because the encryption shouldn't allow you to do that.

    The entire idea is idiotic.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

  79. Search the Freenet project. by Anonymous Coward · · Score: 0

    Data is stored in an encrypted format but is searchable. I'm not sure how they do it but they have a good website that explains it all.

    FreeNet project.

    1. Re:Search the Freenet project. by ewanm89 · · Score: 1

      They use plaintext indices.

  80. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  81. Re:I put something similiar to this together mysel by Anonymous Coward · · Score: 0

    I was just giving him some ideas, not providing a perfect solution. I was on a shared host, so my files were mixed in with probably tens of thousands of other files.

    I think it's safe to say you're never fully protected; you just take the precautions that provide the most safety for the hoops you're willing to deal with.

  82. tahoe by Anonymous Coward · · Score: 0

    You should take a look at allmydata.com

    Behind the scenes, it's tahoe-lafs which is being used, and you can ask the hoster to join the cloud with your own nodes. This way, you can get the best of both worlds.

    http://allmydata.org/trac/tahoe

  83. Apple Mail by fulldecent · · Score: 1

    Just a note, from the Apple Mail help pages. This application saves all your messages in encrypted format (if they were received encrypted). When the message is opened, an index is created and saved in plaintext. This allows you to use Spotlight on encrypted messages.

    I think it is reasonable to save plaintext indexes in this scenario.

    --

    -- I was raised on the command line, bitch

  84. Spideroak by TheKeyboardSlayer · · Score: 1

    Spideroak is the only company I know of that DOESN'T HOLD THE KEYS to your encrypted data. Even if they wanted to 'see' your data, they couldn't. https://spideroak.com/

    --
    Insert_Ending_Here
  85. I only know of one way... by Half-pint HAL · · Score: 1

    When I want to encrypt my data securely, I cambio de langue beaucoup des tursan en gach frase. In this way, ich puedo tres cinnteach sein that bheil only a handful of people ann a soussteheneas nere mensajea.

    HAL.

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  86. Insider Trading is fun by Anonymous Coward · · Score: 0

    Try Memopal. It's backup software that lets you search through your data while keeping it encrypted.

  87. Host-proof hosting by Anonymous Coward · · Score: 0

    Here is a description of similar functionality/API: http://en.wikipedia.org/wiki/Host-proof_hosting .