Encrypted But Searchable Online Storage?
An anonymous reader asks "Is there a solution for online storage of encrypted data providing encrypted search and similar functions over the encrypted data? Is there an API/software/solution or even some online storage company providing this? I don't like Google understanding all my unencrypted data, but I like that Google can search them when they are unencrypted. So I would like to have both: the online storage provider does not understand my data, but he can still help me with searching in them, and doing other useful stuff. I mean: I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there — the result of my encrypted query. Or I ask for the directory structure of my encrypted data (somehow stored in my data too — like in a tar archive), and the server sends it back, without knowing that this encrypted chunk is the directory structure. I googled for this and found some papers, however no software and no online service providing this yet." Can anyone point to an available implementation?
I thought that was what ssl was for.
Want Big Business out of government? Take away the incentive and start by getting government out of big business!
if the server cannot decipher the query it cannot execute it on a binary blob of encrypted data. FAIL.
It's not possible to do this even in theory, unless you're relying on very weak encryption. The point of encryption is that you can't infer anything about the contents. If Google was able to infer enough to give you meaningful search results (if for example each word was encrypted by itself, and you searched for the encrypted version of the word), they would therefore necessarily be able to know enough to perform a frequency analysis attack on your data and compromise it in no time flat unless it was a very small amount of data (thus meaning search isn't really of value anyway).
You'll find a similar problem plagues any attempt at searching. Searching requires a certain knowledge or meta knowledge of the material being searched; and that knowledge necessarily dramatically weakens your encryption.
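A rough sketch of the frequency-analysis point above, assuming each word is replaced by a deterministic keyed token (HMAC stands in here for "each word encrypted by itself"; the key and sample text are made up for illustration). The server never sees a word, yet the token counts mirror the word counts exactly:

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical client-side secret; the server never sees it.
KEY = b"hypothetical-client-secret"

def token(word: str) -> str:
    """Deterministic per-word token: the same word always maps to the same token."""
    return hmac.new(KEY, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()[:12]

plaintext = "the cat saw the dog and the dog saw the cat run"
words = plaintext.split()
tokens = [token(w) for w in words]

# What the client knows versus what the server can count on its own.
plain_counts = sorted(Counter(words).values(), reverse=True)
token_counts = sorted(Counter(tokens).values(), reverse=True)

# The most frequent token is almost certainly a very common word.
most_common_token, _ = Counter(tokens).most_common(1)[0]
```

With enough text, matching the token histogram against known word frequencies for the language is the attack described above.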
Slay a dragon... over lunch!
Use an encrypted query to match against the encrypted text. The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.
If it didn't, then an attacker who got hold of the encrypted text and some of your encrypted queries might well be able to mount an attack based on commonalities between the two.
A thousand pounds of wood moving at 300 feet per minute. Don't get in the way.
This sounds pretty easy,
a) obtain database, indexing tools, search tool
b) install on the machine and encrypt the entire hard drive with any of the many available whole-disk encryption tools
c) ssh in and run queries.
Nature journal lied in Britannica vs Wikipedia Ask to retrac
Unless you do the indexing client-side, and upload an index that's somehow encrypted...
I'm not saying I know how to do this, but it seems possible.
There's no -1 for "I don't get it."
Just to clarify the OP's idea. They want to store only encrypted data on the server, send only encrypted queries to the server (that the server can't even decrypt), yet they expect that the server will be able to send them back results. I don't think it can happen but surprise me.
The best I think you can do is store and transfer the data in encrypted form and put the indexes and any search logic on the client. Maybe the index could be stored on the server as well and synced to the client, but creating the index will require access to the plaintext.
Long answer: Nope
If the data is encrypted in independent "chunks" from which search terms can be built then this is trivial: You pre-encrypt your search terms and search for them. Searching a ROT13-encoded document for a word works this way, as each character is encrypted individually and an encrypted search term is made up of encrypted characters.
Once you get past this, it's no longer easy. You basically have to either make the term you are searching for look like all possible values of the encrypted text and return all matches, or decrypt the document somewhere.
If the encryption is good and any particular chunk, extract, or other slicing-and-dicing of the encrypted data without the key looks random, you are pretty much stuck with decrypting it somewhere.
The alternative is to store an index, or at least a list of keywords, in clear text. For example, a document describing how to build a nuclear bomb could have a list of 10 or 20 non-classified keywords attached to it to aid searching. But that's not what you are asking for.
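For what it's worth, the ROT13 case above fits in a few lines. This is only a sketch of the idea (ROT13 is of course not real encryption), and the sample document is made up:

```python
import codecs

def rot13(text: str) -> str:
    """Encode or decode with ROT13 (the operation is its own inverse)."""
    return codecs.encode(text, "rot13")

def search_encrypted(ciphertext: str, plaintext_term: str) -> int:
    """Pre-encrypt the search term, then do a plain substring search on the ciphertext."""
    return ciphertext.find(rot13(plaintext_term))

document = "Meet me at the old bridge at dawn"
stored = rot13(document)          # what the server would hold
position = search_encrypted(stored, "bridge")
```

The match position in the ciphertext is the same as in the plaintext, because every character is encrypted independently. That is also exactly why this kind of scheme leaks so much.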
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
If you don't trust your data in others' hands, don't give it to them in the first place.
The (costly) solution:
1) Get a 1U server from ACME with appropriate hardware
2) Install favourite Unix-based OS, e.g. FreeBSD
3) Configure server with appropriate software, e.g. Truecrypt, SSH, etc.
4) Find open source search engine software to index your data, see sourceforge.net (or look for recommendations on /. ... a future Ask Slashdot, anyone?)
5) Place server in a secure co-location facility
6) ???
7) Profit.
http://www.cs.berkeley.edu/~dawnsong/papers/se.pdf
If you want the server to do a meaningful search, you have to hand over the encryption keys. Otherwise how is the server knowing what it should look for? It is the same situation as having a safe in a bank with a secret code, and then asking the bank to look in the safe for you. You have to provide them with the code, otherwise they can't open it. Since you mention at the same time you don't trust the server (bank), and want it to peek in your data (safe), how can you simultaneously ask them to do exactly that?
As pointed out by others, the index can be stored encrypted, then downloaded locally. However, this means the index is what is being searched, and it - the item being searched - is in fact not being searched on the server. In practice this has value, but it's not what this thread asks.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
"I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there - the result of my encrypted query."
There are only 2 ways in the universe for accomplishing this, but they are both simple:
Method 1: Send entire database to user for any search query. The results are bound to be in there somewhere!
Method 2: If the server is powerful enough, brute force crack the encryption scheme, find the results, then re-encrypt it and send back to user.
Anything else would violate the definition of full encryption. Of course, you could have "partial" encryption with unencrypted meta-data that the search is performed on.
It's been done. GNUnet.
120 characters isn't enough to explain it.
...and when the bartender asks him what he would like to drink, the guy says "I want what I always get, but I don't want you to actually pour the drink, just help me search behind the bar for the liquor I want, and then hand it to me without seeing what it actually is, and charge me correctly without any knowledge of what it is you just helped me find."
Keep the files on the remote server, encrypted. Keep the search index in a database, encrypted in chunks. Rsync your search database between your local machine and the server. Actual searches of the databases would be done locally.
Result: terrible performance whenever you access your data from a new machine (must sync entire search database). Good performance the rest of the time. Remote server never sees anything but cyphertext.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
Is there no moderation?
There's plenty meaning that can be derived from just filenames.
Does it really matter that Google or whoever can't see the exact text or images, but has enough information from filenames, tags and descriptions to accurately find out what kind of furry porn you like?
People who encrypt their data often don't want to disclose even what kind of content they have. Knowledge of what sort of porn is there, or that you're having an affair, or private internal company data are things that can be disclosed from just knowing document titles without having to even look at the exact file.
The solution to this is to take Google out of the equation. Encrypt your computer's hard disk, encrypt all your mail, build your own search database that will be stored on the encrypted disk, and search that.
So you either want to:
- Decrypted
- Search
If so, then just mount an encrypted drive and put the Search Index on the drive itself... Basically any encryption filter driver will do the mounting for you (Windows and Linux ship with these) and any old Search Software will work for the searching, just move the index.
Or you want to:
- Search Encrypted Content
- For other encrypted content (or decrypted content)
In either case this isn't possible. At least assuming you're using a Crypto algorithm written in the last thirty or so years then it won't work. Even in World War 2 they had encryption that would make this harder than just decrypting it.
...isn't this easy?
Plaintext: "Attack at dawn"
Ciphertext: "lkaoiuast98u;aw"
Search query: "oiua"
Result: "lkaoiuast98u;aw"
What could be simpler?
(no, I'm not an idiot, this is a joke.)
would be to first encrypt each document word-by-word (this can lead to really big documents because of padding), then the client would transmit the document together with the encrypted words as plain text. In this way, the search engine indexes meaningless words which point to the encrypted documents (you can use two different algorithms and/or keys for word-by-word encryption and for documents). For searching your client encrypts the keywords (asking for the encryption key) and once you have a link you have to decrypt the document.
There should be some weak link in this chain, but I don't find any: be the first to claim my two cents.
What this thread is about is "I have a file that is secret. I want to encrypt it into an opaque, un-encryptable-without-the-key blob. I want to upload it to a search engine. I want to do searches against it."
The answer is "By definition, it can't be done, not in the way you want. If it's transparent enough to search, it's no longer encrypted enough to be called encrypted. Other solutions, such as using indexes, may provide some of the practical benefits you want, but they are not without risk."
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
As long as your query looks something like this...
SELECT * FROM mydata WHERE stuff LIKE '%YToyOntzOjc6InBhY2thZ2UiO3M6MjM5OiKyKHPh9ZawDX6KyA62cMd6p+mjBybGwJyCaNfFb7S.........
Seriously though, if I understand your objective I think it would be feasible to develop something like that, but I don't think it's something you could integrate into Google's search services unless they added something on their end.
You could pass a decryption key along with your query and the server would then decrypt records as it performed the search. It would be very resource intensive.
As a close example, I have a web based password storage application in which I did not want to keep the encryption keys on the same server as the password database. So I generate a key with which to encrypt the records and the user keeps their key and must supply it every time they want to decrypt a record. I don't go so far as to enable searching of the encrypted data, I have a description field specifically for that purpose. The web application is called Passbox and is written in PHP.
Yeah, I'd like my cake and eat it too!
The only way this could work is if you had tags in the meta header of the encrypted file
telling you that yes I am encrypted, but I have an image in me or my encrypted data is of the type accounting.
This might work for indexing searches where you want to be able to return all the files on the pc (encrypted or not) that are images or etc...
I'm surprised this wasn't kdawson's doing.
With any strong encryption, the server's copy of the data will be unsearchable.
A solution provider like Google could:
Write an AJAX app that indexes the data before you send it, and then sends the data and its encrypted search terms to the server to store. This will let you encrypt your search terms (like a "very well distributed hash") and have the server return all the documents that match your keys. I have yet to see this done well.
Or, you can https to a server you have placed in a colocate, hand the web page your credentials and have it mount an encrypted growable volume of your data that you can act on with Perl and flat indexes. A serious Linux Hacker could put this together for you as a couple weeks work. I've done this with one of my servers online, but ultimately it proved easier just to ssh to the box mount the encrypted volume with a single command and grep for the files of interest. Command line affection is not a disease.
Good luck.
Just throwing out an idea for an implementation:
The uploaded blob to the cloud is encrypted. But there resides a local index for searching it.
I haven't had a need for this (as I inherently don't trust the cloud) but if someone knows of this type of implementation perhaps it's enough for the poster.
Randomly say that you found or did not find the search pattern. Since you're not decrypting it, nobody can tell if you're lying.
This seems obviously impossible, but it isn't. The problem, of course, is in how the server can perform a search when it isn't even able to decrypt the message telling it to do a search.
However, there is nothing inherently impossible in defining an encrypted datastructure and an algorithm where you can perform computations on the *encrypted* data, without having any idea about what it is you are computing. There is no reason that you need to decrypt data before you can do computations with it. It just needs to be the case that when you perform an operation on the encrypted data, some predictable other operation happens on the data inside the encryption. The result of this encrypted computation will then be something still encrypted, which can be sent to the client who can then decrypt it and find inside the result of his query.
So it isn't obviously impossible. In fact the theory of multiparty computation makes it clearly possible, though the overhead of doing it that way would probably be too high.
I'm sure they copied and decrypted the data when you uploaded it.
(This is why I wrap all my data in tin foil.)
It must have been something you assimilated. . . .
I prefer ROT26. It's got built-in steganography to boot.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
There are encryption algorithms that allow addition. That is, the sum of two encrypted messages is an encryption of the sum. I've forgotten how these work exactly, I think they are some many to one mapping, and the addition operation is not simply adding the encrypted numerical representations.
I came across these when looking at voting systems that allow N distributed people to vote in a way that sums the result before it is decrypted rather than decrypting to do the sum.
Anyhow what this means is that it is possible to do certain operations on a remote database, like sum a column, without the database knowing the result and without transmitting any additional information inbound or outbound.
You could presumably have your data stored in many forms on the database, each form suited for one type of query. Then you just query the appropriate form to perform the operation of interest.
I'm reasonably sure there is no way to perform very high order operations that one might typically do in a relational database however.
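One scheme with the additive property described above is Paillier. The comment does not name a scheme, so treat this as an illustrative stand-in: a toy sketch with tiny hard-coded primes (an assumption made purely for readability, far too small to be secure) showing a "server" summing a column it cannot read. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import secrets

# Toy parameters (assumption: real keys use primes of 1024+ bits).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)      # private key
MU = pow(PHI, -1, N)         # modular inverse of PHI mod N (Python 3.8+)

def encrypt(m: int) -> int:
    """Paillier encryption with generator g = N + 1, so g^m = 1 + m*N (mod N^2)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    """Recover m from c using the private values PHI and MU."""
    u = pow(c, PHI, N2)
    return ((u - 1) // N * MU) % N

def add_encrypted(c1: int, c2: int) -> int:
    """Server-side 'addition': multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % N2

# Client encrypts a column; the server sums it without the key.
column = [120, 35, 7, 990]
encrypted_column = [encrypt(v) for v in column]
encrypted_total = encrypted_column[0]
for c in encrypted_column[1:]:
    encrypted_total = add_encrypted(encrypted_total, c)

total = decrypt(encrypted_total)   # only the client can do this step
```

Note that the "addition" on ciphertexts is a modular multiplication, which matches the remark that it is not simply adding the encrypted numbers. Sums wrap around modulo N, so the plaintext total has to stay below N.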
Some drink at the fountain of knowledge. Others just gargle.
And they can't search inside your documents.
damn, beat me to it. only efficient way to do it. Basically, you'd be doing your own searching, and not relying on google's search algorithms.
There is a cryptography technique called Private Information Retrieval which allows you to do just that: Send an encrypted query to a server, let it perform some operations on your behalf, and send you an encrypted query result. The server neither knows the contents of the encrypted data, nor the content of the query, but you have your result nonetheless.
The intuition is that there exists a sort of "black-box" operation which some cryptographic techniques can use. For example, if I have two encrypted bits a and b (where I can't tell what a and b actually are), I can still perform the operation a xor b. The result is encrypted, and I don't know the actual operands or the result, but I know that what came out is indeed the encryption of the xor of the encrypted bits. Such cryptosystems are forms of "Homomorphic Encryption".
Using this, we can then give the server a search term thus encrypted and, using the black-box operation, have it do some set of operations which will reveal the result. The server will execute the exact same set of operations independent of the search term, so it knows nothing (and needs to know nothing) of the search term contents. Of course, this implies that the server has to operate on every element of the encrypted data to do its job, but that's the fundamental tradeoff. If you're willing to accept that, and the additional computational overhead, you can design such a system.
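To make the "encrypted xor" black box a bit more concrete: Goldwasser-Micali is one cryptosystem where multiplying two ciphertexts yields an encryption of the XOR of the two bits. The sketch below is only the primitive, not a full retrieval protocol, and it uses toy primes chosen for readability (an assumption; they offer no security).

```python
import math
import secrets

# Toy key (assumption: real keys use large primes). Both primes are 3 mod 4,
# so X = N - 1 is a quadratic non-residue modulo each of them.
P, Q = 499, 547
N = P * Q
X = N - 1

def encrypt_bit(bit: int) -> int:
    """Encrypt one bit as y^2 * X^bit mod N for a random y coprime to N."""
    while True:
        y = secrets.randbelow(N - 2) + 2
        if math.gcd(y, N) == 1:
            return (y * y * pow(X, bit, N)) % N

def decrypt_bit(c: int) -> int:
    """Needs the secret factor P: the bit is 0 iff c is a quadratic residue mod P."""
    return 0 if pow(c % P, (P - 1) // 2, P) == 1 else 1

def xor_encrypted(c1: int, c2: int) -> int:
    """What the server can do without any key: multiply ciphertexts to XOR the bits."""
    return (c1 * c2) % N

a, b = 1, 0
result = decrypt_bit(xor_encrypted(encrypt_bit(a), encrypt_bit(b)))
```

The server only ever calls `xor_encrypted`, which uses nothing but the public modulus, so it learns neither operand nor the result.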
So there I was, juggling apples and small animals, when I accidentally bit into the wrong one...
I played around with it and I believe with some more time and effort it could have worked. Wasn't that concerned about data security however.
It's very possible to do this.
The trick is that search engines deal with symbols, not necessarily words or characters. If you change the words and characters to different symbols then you're set. Imagine a dictionary of words that associated each word with a number. You keep the dictionary and don't give it to the vendor. You just give the numbers, and send your query in numbers. It works.
This particular scheme wouldn't be very secure, but it is easy to imagine better ones.
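A minimal sketch of the dictionary-of-numbers idea, with made-up documents and hypothetical helper names. The codebook stays on the client; the "vendor" only ever indexes and looks up numbers:

```python
import random

def build_codebook(documents, seed=None):
    """Client side: assign every distinct word a shuffled number. Kept private."""
    vocabulary = sorted({w for text in documents.values() for w in text.lower().split()})
    numbers = list(range(len(vocabulary)))
    random.Random(seed).shuffle(numbers)
    return dict(zip(vocabulary, numbers))

def encode(text, codebook):
    """Client side: turn a document into the list of numbers sent to the vendor."""
    return [codebook[w] for w in text.lower().split()]

def server_build_index(encoded_docs):
    """Vendor side: an ordinary inverted index, but over numbers instead of words."""
    index = {}
    for doc_id, numbers in encoded_docs.items():
        for n in numbers:
            index.setdefault(n, set()).add(doc_id)
    return index

documents = {"memo": "attack at dawn", "note": "retreat at dusk"}
codebook = build_codebook(documents)
encoded = {doc_id: encode(text, codebook) for doc_id, text in documents.items()}
server_index = server_build_index(encoded)

def search(word):
    """Client translates the word to its number; the vendor looks the number up."""
    number = codebook.get(word.lower())
    return server_index.get(number, set()) if number is not None else set()
```

As noted, this is not very secure: the same word always maps to the same number, so word frequencies and co-occurrence are fully visible to the vendor.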
Here's what you need: a search engine that allows you to modify documents as they go into the index, and also allows you to specify custom tokenizers, morphological analyzers, and whatnot.
The search engine I developed does this. http://dieselpoint.com/
This is a great challenge and an active area of research for some time. Many researchers would like to build databases that protect the users without creating some huge pile of aggregated personal information.
Encrypting the data at the client is a good solution. I've posted several good case studies from my book, Translucent Databases .
Here's what I wrote for a library and here's a case study of helping an online store.
Let me know if you have questions or suggestions.
...has an option to index encrypted files.
you need a search index on that server. Attach to that server via ssl, query it using encrypted text. that text will be decrypted and processed via the index on that server. Results are encrypted and sent back. You then unencrypt your results.
SharePoint can prevent even server-admins from accessing the uploaded/stored data, while still allowing users/groups with authorization to the data to search it. I know this is a missing feature in Google's Mini/Appliance, and one of the reasons corporations have a problem with Google's solution.
Some other search providers have similar authorization-based solutions, which indirectly fulfill your need. Be wise.
Freenet uses a search feature that searches encrypted data.
As pointed out above, if the data is encrypted, the service can't search on it.
So:
- you get a VM or a hosted machine that you have complete control over.
- You set up all your encryption as necessary, eg encrypting the file system. SSL to the machine, etc
- You set up a search system, eg lucene, or maybe database as SQL queries are needed or whatever.
- Profit(?)
Of course, you could do all the same in-house as well, without the need for encryption etc.
ws
So does Anonymous Coward have good karma?
But it may not be everything you're looking for. My requirements were:
1 - Mask the filename
2 - Encrypt the contents
3 - Add recovery data in case the file got damaged
4 - Ability to view unmasked filename from web
I put together a batch file I could drag/drop multiple files onto that used WinRAR to compress the files (individually), with encrypted filenames, a password (of course), and included archive recovery data. It then used ReNamer to encrypt the .rar filenames. After that, I simply FTP'd the files to the server.
I had a webpage that would accept a password, and unencrypt the filenames so they were viewable in readable form on the page. Each one was a hyperlink. There was an extra step required if you wanted the downloaded filename to be unencrypted as well.
After uploading 115G or so, my host alerted me to the fact that they didn't allow me to keep offsite backups there. :) So in the end, I'm not even using it at the moment.
My solution didn't allow me to search within the files, but it did allow me to store files on the server that they had no way of viewing the contents of, or guessing the contents of based on filename.
My Tech Posts on Twitter
There are some solutions for this. I think the first approaches were called "Iraiksan". However there is a massive performance penalty so you are unlikely to find this offered anywhere. Better keep metadata on your local machine and search that.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
You say 'SSL only encrypts the transport' as if that means something. What is a file if it's not a way to transport information from the file writer to the file reader?
I use SSL daily to encrypt files with keys to be stored for later retrieval by the intended recipient. I think you are confusing SSL (the ability to asymmetrically encrypt data) with HTTPS (a use of SSL to encrypt HTTP data transfers)?
I have no problem with your religion until you decide it's reason to deprive others of the truth.
God damned arrogant know-it-alls.
how about wuala (http://www.wuala.com)?
Just use a book (or multiple books) code cipher for your index. You don't need to remember a thing beyond which books and what your key starting number is, the pattern. And if someone is in your house throwing all your books at cracking the remote server, you are already screwed and have much bigger problems, such as they probably already installed a keylogger on you. If you are that much of a target for someone to take that much interest....time for plan B or C then, involving plastic surgery, new ID and some nation where there is no extradition treaty 0_o
1 Encrypt the file (or record for databases)
1.5 (for a database) Encrypt the key fields each separately
2 Encrypt the file name separately
3 store on server
To search for a file:
1 Encrypt the search criteria (file name or key value)
2 search for encrypted thing on server
3 Retrieve matches.
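A sketch of the store-and-search steps above, with two loud assumptions: the lookup value is a keyed HMAC of the file name rather than a reversible encryption (a swapped-in technique that gives the same exact-match behaviour), and the file contents are shown as opaque placeholder bytes that some real cipher would have produced on the client. The `Server` class and file names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret that never leaves the client.
SEARCH_KEY = b"hypothetical-search-key"

def search_token(value: str) -> str:
    """Deterministic keyed token for a file name or key field."""
    return hmac.new(SEARCH_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

class Server:
    """Stand-in for the remote store: it sees only tokens and opaque blobs."""
    def __init__(self):
        self._blobs = {}

    def put(self, token: str, blob: bytes) -> None:
        self._blobs[token] = blob

    def get(self, token: str):
        return self._blobs.get(token)

server = Server()
# Blobs are placeholders for contents already encrypted on the client.
server.put(search_token("taxes-2008.ods"), b"<opaque ciphertext 1>")
server.put(search_token("letter.txt"), b"<opaque ciphertext 2>")

hit = server.get(search_token("taxes-2008.ods"))   # exact name matches
miss = server.get(search_token("taxes"))           # partial names do not
```

The limitation is visible in the last line: only exact matches work, and the server can still see which token is requested how often.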
This is quite neat. Cross platform (Java Web Start), secure and free. You can donate local storage to gain more remote storage.
exactly what DRM is, IE if I want to give out encrypted data, to a computer/user that I don't trust, yet I want them/that to do useful stuff with my encrypted data, but never give full read access to my encrypted data.
So a VM could be deployed on the remote server that would only allow your signed app to perform only the acts you allow on the data, and only allow your client to connect securely. This would provide the desired functionality. Same as DRM, the security is through obscurity at some point, since you must give the key to your data out in some fashion, but it can be hidden deep in a big program. So a true open source solution likely can't exist...
This company has a Beta API to do just that I think.
I know of some people that are using it for searching across anonymized medical data.
BeliefNetworks Web Services API
http://beliefnetworks.net/bnws/
They have code samples too and a downloadable Java Library.
http://beliefnetworks.net/bnws/examples.html
http://beliefnetworks.net/bnws/security.html
EncFS (Fuse)
or, if just trying to conceal data, make a disk image on your hosted environment and mount that.
I use sshfs (again, fuse) to mount a remote directory, and then EncFS to mount an encrypted drive within the sshfs mount.
Convoluted, but possible - unix only of course... :)
Have a look at http://www.clipperz.com/about. The application knows nothing about your actual data but you can do stuff on your data, like searching (which is atm not possible, but would be).
Would this not be do-able right now today, in a sorta round-about way? You'd have to do the 'searching' locally, so if you were hoping to offload CPU use to this provider that's out.. but..
If the provider gives you an NFS mount, and you create a file inside using the crypto filesystem, and mount it over the NFS mount -- I'm pretty sure the provider has no ability to know jack crap about your data, and you now have unencrypted data on your end you could hit with whatever search program/indexing program/whatever you want. Assuming it maintained its indexes in that mounted filesystem, you could mount it up anywhere, potentially even multiple places at once, to do your searches.. and unmount it when not in use.
Maybe not the most elegant, and certainly loses the ability for the provider-end to do your search for you.. but maybe fits the bill?
I assume FredFredrickson meant that the index would be encrypted.
Either you send your storage provider clear data, in which case he can understand and work with it (including search through it), or you can send him (and ask him to store) encrypted data.
One of the principal characteristics of (well-)encrypted data is that it is essentially random gibberish. Encrypting your search query won't somehow help him understand your encrypted data. The purpose of encrypting it is to keep (all) others out of it.
Sorry.
Exceeding the recommended torque is not recommended.
For FredFredrickson's scheme to work, whether the index is encrypted or not is actually irrelevant. The scheme relies on nothing more than the server not having any effective use of the index file. That can be achieved simply by not uploading the index to the server. The client would use the index locally to figure out which chunk of the encrypted file to request from the server, and request that.
I can see two problems with that, though:
For example, the attacker can discover the frequency at which various chunks of the cyphertext are accessed; if this is a client information database, and the access frequencies can be correlated with independent knowledge of, say, how frequently you deal with your various clients, the attacker can formulate hypotheses about which cyphertext chunks have information about which clients.
Are you adequate?
The server just stores a bunch of indexes into your data and searches them when you supply the keywords. It sounds like what you really need is an efficient index (it requires few reads to determine whether what you are searching for is there, or that it isn't anywhere). Then you can build and encrypt the index and store it online in chunks, and download the pieces of it that you need to search for your keywords, and then retrieve the encrypted data that the index entries point to.
For instance, if you want to do keyword searches you build a word index from all the keywords in your documents and then put links to the documents into buckets for each keyword. You could make this relatively efficient by creating the following structure for each bucket: "document1,document2,document3,..." and then storing it by encrypting the structure and naming it with the encrypted value of the keyword. E.g. for "slashdot" create a bucket named "fynfuqbg" and in it store links to any of your documents containing the string "slashdot".
To perform a keyword search, encrypt all the keywords separately and ask the online storage for all the files named with the set of encrypted keywords, and then once you have the index entries do a simple intersection on them to find links to documents containing all the keywords. If you want to support searching for variations in the spelling of a keyword, just generate and encrypt all the possible variations that you want to search for and see if there are index buckets for any of those variations.
Obviously the online storage facility will know that you are performing searches, and can figure out what your most popular searches are in terms of the buckets you access, and they could statistically determine what likely plaintext keyword belongs to a bucket based on the common word frequencies in documents, and the general frequency of searches for particular keywords. One way to obfuscate your searches is to always include several requests for other blocks, using a statistical method to try to make all your searches obey a uniform distribution. Storing keyword buckets in a uniform size is also imperative to prevent statistical analysis as you build the index (otherwise watching index buckets grow would allow the online storage facility to associate the indexes that grew with recently added documents).
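The bucket scheme above could be sketched roughly like this. Assumptions: bucket names are derived with HMAC instead of the ROT13-style naming in the example, the bucket contents are left as plain document ids (a real client would encrypt and pad them to a uniform size before upload), and a dict stands in for the online storage.

```python
import hashlib
import hmac

# Hypothetical secret that stays on the client.
INDEX_KEY = b"hypothetical-index-key"

def bucket_name(keyword: str) -> str:
    """Opaque, deterministic bucket name for a keyword."""
    return hmac.new(INDEX_KEY, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(documents):
    """Client side: one bucket per keyword, holding links to matching documents."""
    index = {}
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(bucket_name(word), set()).add(doc_id)
    return index

def search(remote_index, keywords):
    """Fetch one bucket per keyword by its opaque name, then intersect locally."""
    buckets = [remote_index.get(bucket_name(k), set()) for k in keywords]
    return set.intersection(*buckets) if buckets else set()

documents = {
    "d1": "slashdot encrypted storage",
    "d2": "encrypted search index",
    "d3": "slashdot search",
}
remote_index = build_index(documents)   # what would be uploaded

both = search(remote_index, ["slashdot", "search"])
one = search(remote_index, ["encrypted"])
none = search(remote_index, ["missing"])
```

The sketch does nothing about the access-pattern and bucket-size leaks just described; the dummy requests and uniform padding would have to be layered on top.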
The problem is not that it is impossible, just that most current implementations are extremely slow. Song implemented ranged query over encrypted data on gMail and even with encryption accelerators the performance was low.
Some more papers:
http://www.springerlink.com/index/370086k273w1587t.pdf
http://www.cs.berkeley.edu/~dawnsong/papers/rangequery-full.pdf
http://www.springerlink.com/index/u2007h5706482j51.pdf
There have also been some different multi-server database schemes that do the same thing, although, once again, due to performance and the cost of maintenance I do not know of any that have actually gone to market.
Hope those help. Hit me up if you want more info.
-Nav
Think FTP... just with the contents encrypted. You still get to browse the files and know their names (e.g., it is searchable), but each file is encrypted for security. You even get the index of the current directory each time you connect, and the entire index if you want. Done.
Various ways have been brought up, so I will add my comment as well.
If you are storing the files encrypted, that means you do not want others to be able to know the content; if it's for some other reason, you had better rethink what you are doing.
Being able to search the information means that whoever does the searching has the keys to open up the file and pull out information (decryption). If only you have the keys, then only you can open the files, which means a third party (Google) will not be able to.
Some have put out the idea of an index that is stored along with the secured file - BAD idea! If you reveal any information about the contents of an encrypted file, you have just given a third party information on how to possibly get through the encryption; you have weakened the security.
I have taken enough security courses to understand that unless you get the proper education in security, you will absolutely do the wrong thing when it comes to security. In fact, trying to secure it may result in it becoming more vulnerable.
I run this company, so it is a shameless plug, but its the best solution imho.
It is a disaster recovery data storage company that takes daily snapshots of business data, transmits them via ssh, stores them encrypted on our servers, and allows https access for customers to find what they need if they lose their data.
Naturally, we don't give out a lot of tech specs, but we have a large client base in Australia, and have been operating on the same premise for the last 6 years.
www.sns-storage.com
Have a nice day :)
Julian Field
I use JLAN for this. I have a virtual private online server that I don't have root access to, so I can't install FUSE.
Instead I installed JLAN, which is a user-mode Java application that stores your data either in a file, a set of files or in a database. I store the data in a database (my provider gives unlimited database access with the virtual private server subscription).
JLAN exposes the data as either an FTP, NFS or SMB share/filesystem. So it doesn't create a filesystem like FUSE does, but it is still trivial to get to the data either directly (//myhost/myshare) or via a permanent share->drive mapping. This is why JLAN doesn't need admin permissions to install a filesystem driver on the remote server.
It all works perfectly and it is GPLed these days.
One great use for this is that for ~$10 a month you can get a virtual private server with shell access and database access. The shell access is all you need to run JLAN (it is a user-mode application). Set up JLAN to store files encrypted in a database and share out those files as an SMB share. Let all your trusted friends know the address and you all have an encrypted filesystem you can easily access remotely for a very low cost. Another thing you can do with the shell access is run a torrent program and set the download path to the path of the JLAN shared drive. That way all your torrents are stored in that filesystem.
It isn't slow. In fact, even with the database overhead of the filesystem I'm using, it is still faster than my 20mbps net connection. It is also blazingly fast to browse thanks to the database structure JLAN uses for the filesystem (the file table entries are in one area and typically get cached, and the blobs of the actual file data are in another).
Encrypt all of your documents, word by word, with your private key. Each word would have to be in a separate file, named sequentially. If you need to search for a word, you sign your query, then search for that query. In short, this is a retarded idea, and even the best-case is garbage. And why wasn't this posted to Idle?
I hate grammar Nazi's.
The scheme you're proposing here requires the server have full understanding of an index that maps properties of interest (encoded as hashes) to the data items in the database (which represent files). This index says quite plainly that certain data items share certain properties with other data items (i.e., are both listed under the same hash). This reveals some information about the encrypted data that is subject to statistical analysis, and to correlation with other, independently obtained information.
For example, if by social engineering I discover that your index is indexing last names of Americans, I can formulate hypotheses about which hash represents which name. By observing how the hashes cluster across the set of documents, I can further test that against information about, say, the last names of your contact persons at your various clients.
Your proposal really isn't qualitatively different from a full-text index. The only difference is that the granularity of the index you're describing is coarser; instead of pinpointing the location of every individual word in the database that satisfies the search, the index might pinpoint largeish "files" that mention a certain last name. Makes it much harder to crack, sure, but the point is that the difference is quantitative, not qualitative.
Are you adequate?
If you can boil down your query terms to presence of one or more search terms, and if you can efficiently generate the full set of terms for a region of plaintext (e.g. there are a finite number of whole words in a text and you only support whole word matching), then you could produce a Bloom filter bitmap representing all of the "present matches" for a given region. By introducing a secret key into the hash functions of the Bloom filter, you could enable someone to do the bitmap searching for you, without revealing what you were searching for (ignoring traffic analysis on populations of searches).
Bloom filters also allow you to easily compose bitmaps, so if you can segment your plaintext to the smallest atomic search region, you can compute the bitmaps for all regions in parallel (or sequentially) and then build up a hierarchy of combined bitmaps for the powerset of regions (or some heuristic subset of the powerset).
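To make the keyed Bloom filter idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names (positions, build_filter, server_may_contain, combine), the 1024-bit size, the four hash functions, and the choice of HMAC-SHA256 as the keyed hash. It ignores false-positive tuning and the traffic analysis issue mentioned above.

```python
import hashlib
import hmac

SIZE_BITS = 1024   # assumed bitmap size per region
NUM_HASHES = 4     # assumed number of keyed hash functions

def positions(key, word):
    """Client side: derive the bit positions for a word under the secret key."""
    result = []
    for i in range(NUM_HASHES):
        # A one-byte counter prefix turns one HMAC into several hash functions.
        msg = bytes([i]) + word.lower().encode("utf-8")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        result.append(int.from_bytes(digest[:8], "big") % SIZE_BITS)
    return result

def build_filter(key, words):
    """Client side: bitmap (stored as an int) for one region of plaintext."""
    bitmap = 0
    for word in words:
        for p in positions(key, word):
            bitmap |= 1 << p
    return bitmap

def combine(*bitmaps):
    """Either side: the filter for a union of regions is the bitwise OR."""
    merged = 0
    for b in bitmaps:
        merged |= b
    return merged

def server_may_contain(bitmap, query_positions):
    """Server side: sees only bit positions, never the word or the key."""
    return all((bitmap >> p) & 1 for p in query_positions)
```

The client would upload build_filter(key, words) next to each encrypted region and later send only positions(key, "beethoven") as the query. A True answer means "possibly present" (Bloom filters give false positives), so the client still has to download and decrypt the region to confirm.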
Wouldn't that be the best way?
Well. I have to wonder. If you have two encrypted volumes and compare semantic information across them . . . isn't this the 'dark veil' approach to doing full text searches across encrypted volumes?
If you had a web page on the host that would decrypt the file names (or files), they could have just stored a copy after your code generated it. Not only that, but they could have trivially captured any password you put into it. That's not a secure system at all if you assume a malicious host like the OP assumes.
We do exactly this with Amazon S3 & JungleDisk. All our files are encrypted on the S3 servers (including file names). Since JungleDisk is the proxy that handles the encryption/decryption and provides a mount point which my desktop search indexer has access to, I can use my normal desktop search to quickly find the encrypted data I want on S3.
Of course, the search index is local and not handled remotely... but... meh. Works fine for our needs.
Sounds like somebody doesn't know ROT13!!
It's not possible to search for a keyword within a larger encrypted text without decrypting the text. So there have been numerous proposals for indexing methods with various pros and cons. Suppose we encrypt each word separately? "Beethoven" becomes "mxP370e8". If I want to search for "Beethoven" without letting Google know (put aside, for the moment, the objection that Google _already_ knows everything), I search for "mxP370e8" instead of "Beethoven", and my search returns a link to a word that is surrounded by other encrypted words, perhaps a file. It may be secure enough to let Google know that "mxP370e8" is the third word of a file of 18132 words, and that I searched for it. Encrypting word-by-word is vulnerable to statistical and traffic analysis, but there are ways to mitigate this, such as by using lots of salt to make all words the same size, changing keys for different files or parts of files (now there is more than one encryption that maps to "Beethoven") and so on. I think my basic point is that if you want to do what we normally think of as a full-text search, then each searchable word has to be encrypted standalone, all by itself, if the third party is going to do the searching.
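A rough sketch of the word-by-word idea, with one substitution stated plainly: instead of reversible per-word encryption it uses a truncated HMAC-SHA256 as a search token, so the server can match words but this code alone cannot turn a token back into text (the real file would be stored encrypted separately). The function names and the 8-character token length are made up for the example.

```python
import base64
import hashlib
import hmac

def word_token(key, word):
    """Client side: deterministic 8-character token for one word."""
    digest = hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).digest()
    # 6 bytes encode to exactly 8 base64 characters with no padding.
    return base64.urlsafe_b64encode(digest[:6]).decode("ascii")

def tokenize(key, text):
    """Client side: replace every whitespace-separated word with its token."""
    return [word_token(key, w) for w in text.split()]

def server_search(stored, token):
    """Server side: stored maps file name -> token list; returns (name, index) hits."""
    return [
        (name, i)
        for name, tokens in stored.items()
        for i, t in enumerate(tokens)
        if t == token
    ]
```

With a different key per file, the same word gets a different token in each file, which blunts cross-file statistics, but the client then has to send one token per file key when searching. Within a single file the repetition pattern is still visible to the server.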
What you're looking for is called blinding.
Hmm... You could think of using the remote storage as a block storage for encrypted data... might not be the most efficient, but it could get you the functionality you seek...
You'll probably end up having to pick between something that's either CPU-intensive, bandwidth-intensive or storage-intensive...
I don't think anyone proposed a solution similar to this, but you could encrypt and send two packs of data. First you store your actual data, encrypted however you wish. Then you store your search data (also encrypted). Think of something like what Google Desktop uses to search your data on your own computer: it's smaller than the actual data, but it's still separate content. Then when you need to search for something, you can either use your local copy (which is kept encrypted, too), or if you don't have it, you can download only the search data from the server and search locally, retrieving only the data you want using the pointers you got from your search.
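The two-pack scheme could look roughly like this. It is only a sketch: the index format (JSON mapping each word to a list of object ids) and the function names are invented here, and the cipher is deliberately left as a pair of caller-supplied encrypt/decrypt callables rather than tied to any particular library.

```python
import json
from collections import defaultdict

def build_index(documents):
    """documents maps object id -> plaintext; returns word -> sorted object ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def pack_index(index, encrypt):
    """Serialize the index and encrypt it before uploading it as its own blob."""
    return encrypt(json.dumps(index, sort_keys=True).encode("utf-8"))

def search_index(blob, decrypt, word):
    """After downloading the blob: decrypt locally, then look up the pointers."""
    index = json.loads(decrypt(blob).decode("utf-8"))
    return index.get(word.lower(), [])
```

The ids that search_index returns are the pointers: the client then fetches just those encrypted objects. The server only ever sees two kinds of opaque blobs plus which objects were fetched afterwards.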
The idea is idiotic and one of those crypto-idiot fantasies that the totally clueless and paranoid seem to have.
About the only way you could do this is to control the server itself, since then the only one who knows the encryption keys is yourself, or rather your server.
But the poster wants a third party to have his encrypted data, send him the key to that data and then open that data, look at it but not be able to look at it.
This is DRM. It doesn't work. If I want to encrypt something, there are several parties involved.
The sender, this entity MUST have the original data, the encryption key and the encrypted result.
The reader, this person must have the encrypted data, the decryption key and can with those two, obtain a copy of the original data.
In between is the messenger or untrusted party. The messenger should NEVER have the encrypted data and decryption key at the same time or they will be able to do what only the reader should be able to do.
Traditionally this means the sender and reader meet, exchange keys and then part. The sender then uses a messenger to send the encrypted data to the reader. The messenger does not have the key and so is safe. If said messenger turns out to be unreliable, you only lose the encrypted data; the key is safe with you and the reader.
DRM fails, because it trusts the messenger but not the reader. DRM wishes to give the reader everything so it can read the message but not be able to read the message. This cannot be done and is the reason DRM fails.
I have seen some people be confused by SSH. SSH seemingly allows you to securely connect to a remote system without a separate exchange of keys. The problem is that SSH doesn't allow that at all. If you just ssh to a remote system you are NOT secure at all. How do you know you are connecting to the system you think you are connecting to and not something else? You are trusting the messenger, the internet, to be trustworthy. SSH warns you about this when you first connect to a system, asking you to accept the remote machine's key. If you have NOT verified in a separate communication that this key belongs to the remote machine, then you are gambling that the internet is trustworthy.
Back to the system proposed. DRM's wet dream is to control the reader's hardware so they can only read the message in a way that doesn't allow them to reproduce it. The Trusted Computing dream. If the whole end machine is encrypted, only the analog hole remains.
You cannot send a reader all the data they need to read the data but not be able to read the data for their own purposes. If I want google to search my email, they must be able to read my email.
The idea of searching in encrypted data is just plain silly. The whole point of encryption is that nobody can read it. If you encrypt a piece of text in such a way that a search word encrypts to the same ciphertext as that word inside the document, then you are asking to be cracked in no time.
Consider how human-usable encryptions are attacked: by looking for frequently repeating encodings that might correspond to common words. If you know the text is in English, then in an encrypted text "4 231231 421 4 534534 4" it would be fairly easy to figure out that 4 = a. Find more common words by statistical analysis and then you only need to figure out the encryption that results in that encoding of a very short string and voila, you can decode everything. Good encryption does NOT allow the same data parts to be encrypted the same.
So the idea of sending encrypted words to search for is idiotic. Even if you have the original data on your PC so you can create the same encoded data as on the server (but why then search on it remotely?), you still wouldn't be able to snip the bit you wanted out of it, because the encryption shouldn't allow you to do that.
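The counting step of that attack is tiny. A sketch in Python (rank_tokens is just an illustrative name), run on the same toy ciphertext:

```python
from collections import Counter

def rank_tokens(ciphertext):
    """Count how often each encrypted word appears, most frequent first."""
    return Counter(ciphertext.split()).most_common()

ranking = rank_tokens("4 231231 421 4 534534 4")
# The token "4" shows up three times and every other token once,
# so it is the first candidate for a very common short English word.
print(ranking[0])  # ('4', 3)
```

An attacker would line this ranking up against known word frequencies for the suspected language; the more text is stored, the better the guesses get.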
The entire idea is idiotic.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
data is stored in an encrypted format but is searchable. I'm not sure how they do it but they have a good website that explains it all.
FreeNet project.
I was just giving him some ideas, not providing a perfect solution. I was on a shared host, so my files were mixed in with probably tens of thousands of other files.
I think it's safe to say you're never fully protected; you just take the precautions that provide the most safety for the hoops you're willing to deal with.
You should take a look at allmydata.com
Behind the scenes it uses tahoe-lafs, and you can ask the hoster to let you join the cloud with your own nodes. This way, you can get the best of both worlds.
http://allmydata.org/trac/tahoe
Just a note, from the Apple Mail help pages. This application saves all your messages in encrypted format (if they were received encrypted). When a message is opened, an index is created and saved in plaintext. This allows you to use Spotlight on encrypted messages.
I think it is reasonable to save plaintext indexes in this scenario.
-- I was raised on the command line, bitch
Spideroak is the only company I know of that DOESN'T HOLD THE KEYS to your encrypted data. Even if they wanted to 'see' your data, they couldn't. https://spideroak.com/
Insert_Ending_Here
When I want to encrypt my data securely, I cambio de langue beaucoup des tursan en gach frase. In this way, ich puedo tres cinnteach sein that bheil only a handful of people ann a soussteheneas nere mensajea.
HAL.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
Try Memopal. It's backup software that lets you search through your data while keeping it encrypted.
Here is a description of similar functionality/API: http://en.wikipedia.org/wiki/Host-proof_hosting