Encrypted But Searchable Online Storage?
An anonymous reader asks "Is there a solution for online storage of encrypted data providing encrypted search and similar functions over the encrypted data? Is there an API/software/solution or even some online storage company providing this? I don't like Google understanding all my unencrypted data, but I like that Google can search them when they are unencrypted. So I would like to have both: the online storage provider does not understand my data, but he can still help me with searching in them, and doing other useful stuff. I mean: I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there — the result of my encrypted query. Or I ask for the directory structure of my encrypted data (somehow stored in my data too — like in a tar archive), and the server sends it back, without knowing that this encrypted chunk is the directory structure. I googled for this and found some papers, however no software and no online service providing this yet." Can anyone point to an available implementation?
It's not possible to do this even in theory, unless you're relying on very weak encryption. The point of encryption is that you can't infer anything about the contents. If Google was able to infer enough to give you meaningful search results (if for example each word was encrypted by itself, and you searched for the encrypted version of the word), they would therefore necessarily be able to know enough to perform a frequency analysis attack on your data and compromise it in no time flat unless it was a very small amount of data (thus meaning search isn't really of value anyway).
You'll find a similar problem plagues any attempt at searching. Searching requires a certain knowledge or meta knowledge of the material being searched; and that knowledge necessarily dramatically weakens your encryption.
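To make the frequency-analysis point concrete, here is a minimal sketch, assuming a deterministic per-word scheme. HMAC-SHA256 is only a stand-in for "each word encrypted by itself" (it is a keyed one-way token, not reversible encryption), and `KEY`, `encrypt_word` and `token_frequencies` are hypothetical names. The server never sees plaintext, yet the token counts mirror the word counts exactly.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key"  # hypothetical client secret, never given to the server


def encrypt_word(word: str) -> str:
    """Deterministic stand-in for per-word encryption: same word, same token."""
    return hmac.new(KEY, word.lower().encode(), hashlib.sha256).hexdigest()[:8]


def token_frequencies(text: str) -> list[tuple[str, int]]:
    """What the server can compute from the tokens alone, without the key."""
    tokens = [encrypt_word(w) for w in text.split()]
    return Counter(tokens).most_common()


text = "the cat saw the dog and the dog saw the bird"
freqs = token_frequencies(text)
for token, count in freqs:
    print(token, count)
```

An observer who knows the text is English can guess that the most frequent token is probably "the", and work outward from there.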
Slay a dragon... over lunch!
Use an encrypted query to match against the encrypted text. The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.
If it didn't, then an attacker who got hold of the encrypted text and some of your encrypted queries might well be able to mount an attack based on commonalities between the two.
A thousand pounds of wood moving at 300 feet per minute. Don't get in the way.
You're missing something. SSL is for data that is in transit. The poster wants the data to be encrypted on the server. That's easy - any encryption program can do it. But then s/he also wants to search it. That is harder.
-- Support a free market in the field of government
This sounds pretty easy,
a) obtain database, indexing tools, search tool
b) install on the machine and encrypt the entire hard drive with any of the many available whole-disk encryption tools
c) ssh in and run queries.
Nature journal lied in Britannica vs Wikipedia Ask to retrac
Unless you do the indexing client-side, and upload an index that's somehow encrypted...
I'm not saying I know how to do this, but it seems possible.
There's no -1 for "I don't get it."
Just to clarify the OP's idea: they want to store only encrypted data on the server and send only encrypted queries to the server (queries that the server can't even decrypt), yet they expect that the server will be able to send them back results. I don't think it can happen, but surprise me.
The best I think you can do is store and transfer the data in encrypted form and put the indexes and any search logic on the client. Maybe the index could be stored on the server as well and synced to the client, but creating the index will require access to the plaintext.
No, this is not what SSL is for at all. With SSL, you have a party you wish to communicate with, but an insecure channel.
Here, you don't want to communicate anything useful to anyone. This is more a privacy preserving data mining problem. It goes something like this:
I have a long list of secret numbers 1...n. I do something to these numbers, so that Google doesn't know what they are, and then I send them to Google. Next, I want to know how many numbers are larger than, say k. So, I ask Google, but in a clever way, so that Google doesn't know what I'm asking.
Google then tells me how many of my original numbers were larger than k. However, Google doesn't know my original numbers, and they don't know what question I asked. There needs to be some theoretical mapping that preserves this privacy, but still allows the data mining to occur.
If the data is encrypted in independent "chunks" from which search terms can be built then this is trivial: You pre-encrypt your search terms and search for them. Searching a word ROT13-encoded document works this way, as each character is encrypted individually and an encrypted search term is made up of encrypted characters.
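The ROT13 case can be sketched in a few lines. This is only an illustration of the "pre-encrypt your search terms" idea using Python's built-in `rot_13` codec; the sample sentence and the helper name `rot13` are made up for the example, and ROT13 of course offers no real secrecy.

```python
import codecs


def rot13(text: str) -> str:
    """Apply ROT13, which substitutes each letter independently."""
    return codecs.encode(text, "rot_13")


# What the server stores: the ROT13-encoded document.
stored = rot13("Encrypted but searchable online storage")

# What the client sends: the search term, encoded the same way.
query = rot13("searchable")

# The server can do a plain substring search on the encoded text.
position = stored.find(query)
print(stored)
print(query, "found at offset", position)
```

Because every character is transformed on its own, the match offset in the encoded text is the same as it would be in the plaintext.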
Once you get past this, it's no longer easy. You basically have to either make the term you are searching for look like all possible values of the encrypted text and return all matches, or decrypt the document somewhere.
If the encryption is good and any particular chunk, extract, or other slicing-and-dicing of the encrypted data without the key looks random, you are pretty much stuck with decrypting it somewhere.
The alternative is to store an index, or at least a list of keywords, in clear text. For example, a document describing how to build a nuclear bomb could have a list of 10 or 20 non-classified keywords attached to it to aid searching. But that's not what you are asking for.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
If you don't trust your data in others' hands, don't give it to them in the first place.
The (costly) solution:
1) Get a 1U server from ACME with appropriate hardware
2) Install favourite Unix-based OS, e.g. FreeBSD
3) Configure server with appropriate software, e.g. Truecrypt, SSH, etc.
4) Find open source search engine software to index your data, see sourceforge.net (or look for recommendations on /. ... a future Ask Slashdot, anyone?)
5) Place server in a secure co-location facility
6) ???
7) Profit.
Yes you are =)
SSL only encrypts the transport.
It seems that the poster wants to have his data _stored_ in an encrypted way that is only decipherable by him, not by any of the machines/users at the storage facility. Yet, when he wants to do some search, he somehow expects the server to be able to do so... AFAIK that's not feasible.
(you could store whatever encrypted stuff remotely, but querying will require fetching, reading and decrypting the (relevant portions of) data locally...)
If there is one thing to be learned on slashdot, it has to be sarcasm.
http://www.cs.berkeley.edu/~dawnsong/papers/se.pdf
As pointed out by others, the index can be stored encrypted, then downloaded locally. However, this means the index is what is being searched, and it - the item being searched - is in fact not being searched on the server. In practice this has value, but it's not what this thread asks.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
"I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there - the result of my encrypted query."
There are only 2 ways in the universe for accomplishing this, but they are both simple:
Method 1: Send entire database to user for any search query. The results are bound to be in there somewhere!
Method 2: If the server is powerful enough, brute force crack the encryption scheme, find the results, then re-encrypt it and send back to user.
Anything else would violate the definition of full encryption. Of course, you could have "partial" encryption with unencrypted meta-data that the search is performed on.
It's been done. GNUnet.
120 characters isn't enough to explain it.
...and when the bartender asks him what he would like to drink, the guy says "I want what I always get, but I don't want you to actually pour the drink, just help me search behind the bar for the liquor I want, and then hand it to me without seeing what it actually is, and charge me correctly without any knowledge of what it is you just helped me find."
Keep the files on the remote server, encrypted. Keep the search index in a database, encrypted in chunks. Rsync your search database between your local machine and the server. Actual searches of the databases would be done locally.
Result: terrible performance whenever you access your data from a new machine (must sync entire search database). Good performance the rest of the time. Remote server never sees anything but cyphertext.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
if the server cannot decipher the query it cannot execute it on a binary blob of encrypted data. FAIL.
Gung jbhyq qrcraq ba ubj gevivny lbhe rapelcgvba zrgubq vf.
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
There's plenty meaning that can be derived from just filenames.
Does it really matter that Google or whoever can't see the exact text or images, but has enough information from filenames, tags and descriptions to accurately find out what kind of furry porn you like?
People who encrypt their data often don't want to disclose even what kind of content they have. Knowledge of what sort of porn is there, or that you're having an affair, or private internal company data are things that can be disclosed from just knowing document titles without having to even look at the exact file.
The solution to this is to take Google out of the equation. Encrypt your computer's hard disk, encrypt all your mail, build your own search database that will be stored on the encrypted disk, and search that.
So you either want to:
- Decrypted
- Search
If so, then just mount an encrypted drive and put the Search Index on the drive itself... Basically any encryption filter driver will do the mounting for you (Windows and Linux ship with these) and any old Search Software will work for the searching, just move the index.
Or you want to:
- Search Encrypted Content
- For other encrypted content (or decrypted content)
In either case this isn't possible, at least assuming you're using a crypto algorithm written in the last thirty or so years. Even in World War 2 they had encryption that would make this harder than just decrypting it.
Well that depends whether the OP wants to perform something like a fulltext search (i.e. the ability to look for keywords within the content of each document) or a metadata search.
There's nothing to prevent you setting up a CMS where each piece of content is encrypted, but the metadata describing that content is out in the clear and searchable. Security in such a scenario would be less than optimal (e.g. people could guess certain things about your content based on the statistical pattern of length for each of the millions of encrypted content items), and of course you'd have to be very careful about the metadata fields and how you are populating them.
If libertarians are so opposed to effective government, why don't they all move to Somalia?
...isn't this easy?
Plaintext: "Attack at dawn"
Ciphertext: "lkaoiuast98u;aw"
Search query: "oiua"
Result: "lkaoiuast98u;aw"
What could be simpler?
(no, I'm not an idiot, this is a joke.)
would be to first encrypt each document word-by-word (this can lead to really big documents because of padding), then the client would transmit the document together with the encrypted words as plain text. In this way, the search engine indexes meaningless words which point to the encrypted documents (you can use two different algorithms and/or keys for word-by-word encryption and for documents). For searching, your client encrypts the keywords (asking for the encryption key), and once you have a link you have to decrypt the document.
There should be some weak link in this chain, but I don't find any: be the first to claim my two cents.
What this thread is about is "I have a file that is secret. I want to encrypt it into an opaque, un-encryptable-without-the-key blob. I want to upload it to a search engine. I want to do searches against it."
The answer is "By definition, it can't be done, not in the way you want. If it's transparent enough to search, it's no longer encrypted enough to be called encrypted. Other solutions, such as using indexes, may provide some of the practical benefits you want, but they are not without risk."
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
As long as your query looks something like this...
SELECT * FROM mydata WHERE stuff LIKE '%YToyOntzOjc6InBhY2thZ2UiO3M6MjM5OiKyKHPh9ZawDX6KyA62cMd6p+mjBybGwJyCaNfFb7S.........
Seriously though, if I understand your objective I think it would be feasible to develop something like that, but I don't think it's something you could integrate into Google's search services unless they added something on their end.
You could pass a decryption key along with your query and the server would then decrypt records as it performed the search. It would be very resource intensive.
As a close example, I have a web based password storage application in which I did not want to keep the encryption keys on the same server as the password database. So I generate a key with which to encrypt the records and the user keeps their key and must supply it every time they want to decrypt a record. I don't go so far as to enable searching of the encrypted data, I have a description field specifically for that purpose. The web application is called Passbox and is written in PHP.
Yeah, I'd like to have my cake and eat it too!
The only way this could work is if you had tags in the meta header of the encrypted file telling you that yes, I am encrypted, but I have an image in me, or my encrypted data is of the type accounting.
This might work for indexing searches where you want to be able to return all the files on the pc (encrypted or not) that are images or etc...
For the n00bs, the above post is in ROT13. Here is a link for a converter.
Just throwing out an idea for an implementation:
The uploaded blob to the cloud is encrypted. But there resides a local index for searching it.
I haven't had a need for this (as I inherently don't trust the cloud) but if someone knows of this type of implementation perhaps it's enough for the poster.
Randomly say that you found or did not find the search pattern. Since you're not decrypting it, nobody can tell if you're lying.
This seems obviously impossible, but it isn't. The problem, of course, is in how the server can perform a search when it isn't even able to decrypt the message telling it to do a search.
However, there is nothing inherently impossible in defining an encrypted datastructure and an algorithm where you can perform computations on the *encrypted* data, without having any idea about what it is you are computing. There is no reason that you need to decrypt data before you can do computations with it. It just needs to be the case that when you perform an operation on the encrypted data, some predictable other operation happens on the data inside the encryption. The result of this encrypted computation will then be something still encrypted, which can be sent to the client who can then decrypt it and find inside the result of his query.
So it isn't obviously impossible. In fact the theory of multiparty computation makes it clearly possible, though the overhead of doing it that way would probably be too high.
I'm sure they copied and decrypted the data when you uploaded it.
(This is why I wrap all my data in tin foil.)
It must have been something you assimilated. . . .
I prefer ROT26. It's got built-in steganography to boot.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
There are encryption algorithms that allow addition. That is, the sum of two encrypted messages is an encryption of the sum. I've forgotten how these work exactly; I think they are some many-to-one mapping, and the addition operation is not simply adding the encrypted numerical representations.
I came across these when looking at voting systems that allow N distributed people to vote in a way that sums the result before it is decrypted rather than decrypting to do the sum.
Anyhow, what this means is that it is possible to do certain operations on a remote database, like summing a column, without the database knowing the result and without transmitting any additional information inbound or outbound.
You could presumably have your data stored in many forms on the database, each form suited for one type of query. Then you just query the appropriate form to perform the operation of interest.
I'm reasonably sure there is no way to perform very high order operations that one might typically do in a relational database however.
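One well-known scheme with this additive property is Paillier. The poster does not name a scheme, so treating Paillier as the one meant is an assumption. Below is a toy sketch with deliberately tiny parameters (two Mersenne primes, far too small for real use); the function names are hypothetical. As the poster recalls, "adding" is not plain addition of ciphertexts: the server multiplies them, and the product decrypts to the sum.

```python
import math
import secrets

# Toy parameters (assumption for illustration only, NOT secure key sizes).
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p: int = P, q: int = Q):
    """Return (public n, private (lam, mu)) using the generator g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam; valid because g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Randomized encryption of an integer 0 <= m < n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n * mu) % n


def encrypted_sum(n: int, ciphertexts) -> int:
    """Server side: multiply ciphertexts; needs only the public n."""
    total = 1  # 1 is a valid encryption of 0
    for c in ciphertexts:
        total = (total * c) % (n * n)
    return total


n, priv = keygen()
column = [120, 75, 305]                      # client's secret values
stored = [encrypt(n, v) for v in column]     # what the database holds
result = decrypt(n, priv, encrypted_sum(n, stored))
print(result)
```

The database computes `encrypted_sum` without ever holding `priv`, so it learns neither the individual values nor the total.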
Some drink at the fountain of knowledge. Others just gargle.
I'd assume that's exactly not what the OP means, on the grounds that it's so trivially obvious that nobody would need to ask it.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
And they can't search inside your documents.
damn, beat me to it. only efficient way to do it. Basically, you'd be doing your own searching, and not relying on google's search algorithms.
There is a cryptography technique called Private Information Retrieval which allows you to do just that: Send an encrypted query to a server, let it perform some operations on your behalf, and send you an encrypted query result. The server neither knows the contents of the encrypted data, nor the content of the query, but you have your result nonetheless.
The intuition is that there exists a sort of "black-box" operation which some cryptographic techniques can use. For example, if I have two encrypted bits a and b (where I can't tell what a and b actually are), I can still perform the operation a xor b. The result is encrypted, and I don't know the actual operands or the result, but I know that what came out is indeed the encryption of the xor of the encrypted bits. Such cryptosystems are forms of "Homomorphic Encryption".
Using this, we can then give the server a search term thus encrypted and, using the black-box operation, have it do some set of operations which will reveal the result. The server will execute the exact same set of operations independent of the search term, so it knows nothing (and needs to know nothing) of the search term contents. Of course, this implies that the server has to operate on every element of the encrypted data to do its job, but that's the fundamental tradeoff. If you're willing to accept that, and the additional computational overhead, you can design such a system.
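The "a xor b" black box can be sketched with the Goldwasser-Micali cryptosystem, in which multiplying two ciphertexts yields an encryption of the XOR of the two bits. Choosing Goldwasser-Micali is an assumption, since the post names no scheme; the tiny primes, the 16-byte padding and all function names are likewise hypothetical. The server XORs the encrypted query against an encrypted stored word bit by bit, and only the client can tell whether the result is all zeros (a match).

```python
import math
import secrets

# Toy parameters (assumption, NOT secure): two Mersenne primes, both 3 mod 4.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
X = N - 1  # -1 mod N is a quadratic non-residue mod P and mod Q


def encrypt_bit(bit: int) -> int:
    """Encrypt one bit as y^2 * X^bit mod N with fresh randomness y."""
    while True:
        y = secrets.randbelow(N - 1) + 1
        if math.gcd(y, N) == 1:
            break
    return (y * y * pow(X, bit, N)) % N


def decrypt_bit(c: int) -> int:
    """Client side: Euler's criterion mod P tells residues (0) from non-residues (1)."""
    return 0 if pow(c, (P - 1) // 2, P) == 1 else 1


def xor_encrypted(c1: int, c2: int) -> int:
    """Server side: product of ciphertexts encrypts the XOR of the bits."""
    return (c1 * c2) % N


def encrypt_word(word: str, width: int = 16) -> list[int]:
    """Pad to a fixed width so lengths do not leak, then encrypt every bit."""
    data = word.encode().ljust(width, b"\0")[:width]
    return [encrypt_bit((byte >> i) & 1) for byte in data for i in range(8)]


def server_compare(enc_query: list[int], enc_stored: list[int]) -> list[int]:
    """Same work for every query; the server cannot read the outcome."""
    return [xor_encrypted(a, b) for a, b in zip(enc_query, enc_stored)]


def client_is_match(enc_diff: list[int]) -> bool:
    return all(decrypt_bit(c) == 0 for c in enc_diff)


stored = encrypt_word("dawn")
hit = client_is_match(server_compare(encrypt_word("dawn"), stored))
miss = client_is_match(server_compare(encrypt_word("dusk"), stored))
print(hit, miss)
```

This shows the tradeoff described above: the server must touch every stored element for every query, and the client does the final decryption.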
So there I was, juggling apples and small animals, when I accidentally bit into the wrong one...
It's very possible to do this.
The trick is that search engines deal with symbols, not necessarily words or characters. If you change the words and characters to different symbols then you're set. Imagine a dictionary of words that associated each word with a number. You keep the dictionary and don't give it to the vendor. You just give the numbers, and send your query in numbers. It works.
This particular scheme wouldn't be very secure, but it is easy to imagine better ones.
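A minimal sketch of the word-to-number idea follows. The class and function names are hypothetical, and this is not how any particular search engine implements it. As noted above, it is not very secure: the numbers still leak word frequencies and co-occurrence.

```python
class PrivateDictionary:
    """Client-side word -> number table; it is never sent to the vendor."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}

    def encode(self, text: str) -> list[int]:
        """Translate text to numbers, assigning a new number to each unseen word."""
        return [self._ids.setdefault(w, len(self._ids)) for w in text.lower().split()]

    def lookup(self, word: str):
        """Return the number for a known word, or None if it was never indexed."""
        return self._ids.get(word.lower())


def vendor_search(index: dict[str, list[int]], symbol) -> list[str]:
    """Vendor side: match on numbers only, with no idea what they mean."""
    return sorted(doc for doc, symbols in index.items() if symbol in symbols)


d = PrivateDictionary()
vendor_index = {
    "memo1": d.encode("attack at dawn"),
    "memo2": d.encode("retreat at dusk"),
}
hits = vendor_search(vendor_index, d.lookup("dawn"))
print(hits)
```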
Here's what you need: a search engine that allows you to modify documents as they go into the index, and also allows you to specify custom tokenizers, morphological analyzers, and whatnot.
The search engine I developed does this. http://dieselpoint.com/
This is a great challenge and an active area of research for some time. Many researchers would like to build databases that protect the users without creating some huge pile of aggregated personal information.
Encrypting the data at the client is a good solution. I've posted several good case studies from my book, Translucent Databases.
Here's what I wrote for a library and here's a case study of helping an online store.
Let me know if you have questions or suggestions.
...has an option to index encrypted files.
Could also just use VI with the g?? command...
ohg gura gung zvtug or gbb zhpu jbex sbe fbzr.
A Human Right
What it sounded like was that he wanted to keep a database with Google that was encrypted and wanted to search it remotely and securely, but without Google being able to look at the data. Even if that were possible, why are you trusting Google with that in the first place? Why not store it somewhere else? I would think you could keep the encrypted data on a server and make a secure connection to it. You send your normal query across the encrypted channel to the secure server, it does its regular search and sends the result back across the secured channel. Add to that some secure authentication, and I thought that would have met the objective (even if the implementation is slightly different than described). If you wanted encrypted data stored in an untrusted location (why are you doing this again?) then I would think it would be necessary to hash specific queries as keys to the encrypted data, so that the server lacks the information needed to decrypt it remotely. Either way, I guess I will see what other people are saying, in case this question reads differently to someone else.
Want Big Business out of government? Take away the incentive and start by getting government out of big business!
As pointed out above, if the data is encrypted, the service can't search on it.
So:
- you get a VM or a hosted machine that you have complete control over.
- You set up all your encryption as necessary, eg encrypting the file system. SSL to the machine, etc
- You set up a search system, eg lucene, or maybe database as SQL queries are needed or whatever.
- Profit(?)
Of course, you could do all the same in-house as well, without the need for encryption etc.
ws
So does Anonymous Coward have good karma?
But it may not be everything you're looking for. My requirements were:
1 - Mask the filename
2 - Encrypt the contents
3 - Add recovery data in case the file got damaged
4 - Ability to view unmasked filename from web
I put together a batch file I could drag/drop multiple files onto that used WinRAR to compress the files (individually), with encrypted filenames, a password (of course), and included archive recovery data. It then used ReNamer to encrypt the .rar filenames. After that, I simply FTP'd the files to the server.
I had a webpage that would accept a password, and unencrypt the filenames so they were viewable in readable form on the page. Each one was a hyperlink. There was an extra step required if you wanted the downloaded filename to be unencrypted as well.
After uploading 115G or so, my host alerted me to the fact that they didn't allow me to keep offsite backups there. :) So in the end, I'm not even using it at the moment.
My solution didn't allow me to search within the files, but it did allow me to store files on the server that they had no way of viewing the contents of, or guessing the contents of based on filename.
My Tech Posts on Twitter
There are some solutions for this. I think the first approaches were called "Iraiksan". However, there is a massive performance penalty, so you are unlikely to find this offered anywhere. Better to keep metadata on your local machine and search that.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Please tag story as: no
You say 'SSL only encrypts the transport' as if that means something. What is a file if it's not a way to transport information from the file writer to the file reader?
I use SSL daily to encrypt files with keys to be stored for later retrieval by the intended recipient. I think you are confusing SSL (the ability to asymmetrically encrypt data) with HTTPS (a use of SSL to encrypt HTTP data transfers)?
I have no problem with your religion until you decide it's reason to deprive others of the truth.
How the hell is Google going to answer a question that it doesn't understand, and give an answer that it can't understand either?
Just use a book (or multiple books) code cipher for your index. You don't need to remember a thing beyond which books and what your key starting number is, the pattern. And if someone is in your house throwing all your books at cracking the remote server, you are already screwed and have much bigger problems, such as they probably already installed a keylogger on you. If you are that much of a target for someone to take that much interest....time for plan B or C then, involving plastic surgery, new ID and some nation where there is no extradition treaty 0_o
I tried the vi command, but I get this weird error:
'vi' is not recognized as an internal or external command,
operable program or batch file.
Does this mean ROT13 is not compatible with Windows?
1 Encrypt the file (or record for databases)
1.5 (for a database) Encrypt the key fields each separately
2 Encrypt the file name separately
3 store on server
To search for a file:
1 Encrypt the search criteria (file name or key value)
2 search for encrypted thing on server
3 Retrieve matches.
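The steps above amount to exact-match lookup on a deterministic token. Here is a minimal sketch under stated assumptions: HMAC-SHA256 stands in for "encrypt the file name separately" (it is a keyed one-way token, so the client must remember the real names), the file contents are assumed to be encrypted elsewhere, and `NAME_KEY`, `upload` and `retrieve` are hypothetical names.

```python
import hashlib
import hmac

NAME_KEY = b"hypothetical-name-key"  # client-side secret, never uploaded

# Stand-in for the remote server: opaque token -> opaque encrypted blob.
server_store: dict[str, bytes] = {}


def name_token(filename: str) -> str:
    """Deterministic token: the same name always maps to the same value."""
    return hmac.new(NAME_KEY, filename.encode(), hashlib.sha256).hexdigest()


def upload(filename: str, encrypted_blob: bytes) -> None:
    server_store[name_token(filename)] = encrypted_blob


def retrieve(filename: str):
    """Search by sending only the token; returns None when nothing matches."""
    return server_store.get(name_token(filename))


upload("taxes-2008.pdf", b"<ciphertext produced elsewhere>")
exact = retrieve("taxes-2008.pdf")
partial = retrieve("taxes")
print(exact, partial)
```

Note the limitation: only exact matches on the whole name or key value work. A partial name produces an unrelated token, so substring or fuzzy search is not possible this way.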
That's why they hire the best and the brightest out there to solve problems like that. Discovering the future before the future is known.
It can't, that's why I use Live Search. It doesn't understand the query, the data, or the result. Unfortunately, for the OP, it doesn't support encryption.
Failure to follow this advice may result in non-deterministic behavior.
exactly what DRM is, i.e. I want to give out encrypted data to a computer/user that I don't trust, yet I want them to do useful stuff with my encrypted data without ever giving them full read access to it.
So a VM could be deployed on the remote server that would only allow your signed app to perform only the acts you allow on the data, and only allow your client to connect securely. This would provide the desired functionality. Same as DRM, the security is through obscurity at some point, since you must give the key to your data out in some fashion, but it can be hidden deep in a big program. So a true open source solution likely can't exist...
This company has a Beta API to do just that I think.
I know of some people that are using it for searching across anonymized medical data.
BeliefNetworks Web Services API
http://beliefnetworks.net/bnws/
They have code samples too and a downloadable Java Library.
http://beliefnetworks.net/bnws/examples.html
http://beliefnetworks.net/bnws/security.html
I assume FredFredrickson meant that the index would be encrypted.
But then s/he also wants to search it. That is harder.
That is not harder - that is impossible.
The reason is that in order to search the data, you first have to decrypt the data. If you decrypt the data on the server side, you just compromised your security. End of game.
Lodragan Draoidh
The more you explain it, the more I don't understand it. - Mark Twain
What's an OP?
Failure to follow this advice may result in non-deterministic behavior.
Either you send your storage provider clear data, in which case he can understand and work with it (including search through it), or you can send him (and ask him to store) encrypted data.
One of the principal characteristics of (well-)encrypted data is that it is essentially random gibberish. Encrypting your search query won't somehow help him understand your encrypted data. The purpose of encrypting it is to keep (all) others out of it.
Sorry.
Exceeding the recommended torque is not recommended.
For FredFredrickson's scheme to work, whether the index is encrypted or not is actually irrelevant. The scheme relies on nothing more than the server not having any effective use of the index file. That can be achieved simply by not uploading the index to the server. The client would use the index locally to figure out which chunk of the encrypted file to request from the server, and request that.
I can see two problems with that, though:
For example, the attacker can discover the frequency at which various chunks of the cyphertext are accessed; if this is a client information database, and the access frequencies can be correlated with independent knowledge of, say, how frequently you deal with your various clients, the attacker can formulate hypotheses about which cyphertext chunks have information about which clients.
Are you adequate?
The server just stores a bunch of indexes into your data and searches them when you supply the keywords. It sounds like what you really need is an efficient index (one that requires few reads to determine whether what you are searching for is there, or that it isn't anywhere). Then you can build and encrypt the index and store it online in chunks, download the pieces of it that you need to search for your keywords, and then retrieve the encrypted data that the index entries point to.
For instance, if you want to do keyword searches you build a word index from all the keywords in your documents and then put links to the documents into buckets for each keyword. You could make this relatively efficient by creating the following structure for each bucket: "document1,document2,document3,..." and then storing it by encrypting the structure and naming it with the encrypted value of the keyword. E.g. for "slashdot" create a bucket named "fynfuqbg" and in it store links to any of your documents containing the string "slashdot".
To perform a keyword search, encrypt all the keywords separately and ask the online storage for all the files named with the set of encrypted keywords, and then once you have the index entries do a simple intersection on them to find links to documents containing all the keywords. If you want to support searching for variations in the spelling of a keyword, just generate and encrypt all the possible variations that you want to search for and see if there are index buckets for any of those variations.
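The bucket scheme can be sketched as follows. This is an illustration under assumptions, not a vetted design: HMAC-SHA256 replaces the ROT13-style naming in the "fynfuqbg" example, the bucket contents are left unencrypted for brevity (the post calls for encrypting them too), and `INDEX_KEY`, `build_buckets` and `search` are hypothetical names.

```python
import hashlib
import hmac

INDEX_KEY = b"hypothetical-index-key"  # stays on the client


def bucket_name(keyword: str) -> str:
    """Opaque, deterministic bucket name for a keyword."""
    return hmac.new(INDEX_KEY, keyword.lower().encode(), hashlib.sha256).hexdigest()


def build_buckets(docs: dict[str, str]) -> dict[str, set[str]]:
    """Client side: map each keyword's bucket name to the documents containing it."""
    buckets: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            buckets.setdefault(bucket_name(word), set()).add(doc_id)
    return buckets


def search(buckets: dict[str, set[str]], keywords: list[str]) -> set[str]:
    """Fetch one bucket per keyword by name, then intersect them locally."""
    fetched = [buckets.get(bucket_name(k), set()) for k in keywords]
    return set.intersection(*fetched) if fetched else set()


docs = {
    "a.txt": "slashdot news for nerds",
    "b.txt": "news about encryption",
    "c.txt": "slashdot encryption thread",
}
buckets = build_buckets(docs)  # this dict is what would be uploaded
both = search(buckets, ["slashdot", "encryption"])
print(both)
```

The storage provider only ever sees hex bucket names and which ones are requested, which is exactly the access-pattern leak discussed next.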
Obviously the online storage facility will know that you are performing searches, and can figure out what your most popular searches are in terms of the buckets you access, and they could statistically determine what likely plaintext keyword belongs to a bucket based on the common word frequencies in documents, and the general frequency of searches for particular keywords. One way to obfuscate your searches is to always include several requests for other blocks, using a statistical method to try to make all your searches obey a uniform distribution. Storing keyword buckets in a uniform size is also imperative to prevent statistical analysis as you build the index (otherwise watching index buckets grow would allow the online storage facility to associate the indexes that grew with recently added documents).
The problem is not that it is impossible, just that most current implementations are extremely slow. Song implemented range queries over encrypted data on Gmail, and even with encryption accelerators the performance was low. Some more papers:
http://www.springerlink.com/index/370086k273w1587t.pdf
http://www.cs.berkeley.edu/~dawnsong/papers/rangequery-full.pdf
http://www.springerlink.com/index/u2007h5706482j51.pdf
There have also been some different multi-server database schemes that do the same thing, although, once again, due to performance and the cost of maintenance I do not know of any that have actually gone to market.
Hope those help. Hit me up if you want more info.
-Nav
Various approaches have been brought up, so I will add my comment too.
If you are storing the files encrypted, that means you do not want others to be able to know the content; if it's for some other reason, you had better rethink what you are doing.
Being able to search the information means that whoever does the searching has the keys to open up the file and pull out information (decryption). If only you have the keys, then only you can open up the files, which means a third party (Google) will not be able to.
Some have put out the idea of an index that is stored along with the secured file - BAD idea! If you have any information about the contents of an encrypted file, you have just given a third party information on how to possibly get through the encryption; you have weakened the security.
I have taken enough security courses to understand that unless you get the proper education in security, you will absolutely do the wrong thing when it comes to security. In fact, trying to secure something may result in it becoming more vulnerable.
Original Poster
I run this company, so it is a shameless plug, but it's the best solution imho.
It is a disaster recovery data storage company that takes daily snapshots of business data, transmits them via ssh, stores them encrypted on our servers, and allows https access for customers to find what they need if they lose their data.
Naturally, we don't give out a lot of tech specs, but we have a large client base in Australia, and have been operating on the same premise for the last 6 years.
www.sns-storage.com
Have a nice day :)
Julian Field
I use JLAN for this. I have a virtual private online server that I don't have root access to, so I can't install FUSE.
Instead I installed JLAN, which is a user-mode Java application that stores your data either in a file, a set of files, or a database. I store the data in a database (my provider gives unlimited database access with the virtual private server subscription).
JLAN exposes the data as an FTP, NFS, or SMB share/filesystem. So it doesn't create a filesystem like FUSE does, but it is still trivial to get to the data either directly (//myhost/myshare) or via a permanent share->drive mapping. This is why JLAN doesn't need admin permissions to install a filesystem driver on the remote server.
It all works perfectly and it is GPLed these days.
One great use for this is that for ~$10 a month you can get a virtual private server with shell access and database access. The shell access is all you need to run JLAN (it is a user-mode application). Set up JLAN to store files encrypted in a database and share out those files as an SMB share. Let all your trusted friends know the address and you all have an encrypted filesystem you can easily access remotely for a very low cost. Another thing you can do with the shell access is run a torrent program and set the download path to the path of the JLAN shared drive. That way all your torrents are stored in that filesystem.
It isn't slow. In fact, even with the database overhead of the filesystem I'm using, it is still faster than my 20 Mbps net connection. It is also blazingly fast to browse thanks to the database structure JLAN uses for the filesystem (the file table entries are in one area and typically get cached, and the blobs of the actual file data are in another).
Encrypt all of your documents, word by word, with your private key. Each word would have to be in a separate file, named sequentially. If you need to search for a word, you sign your query, then search for that query. In short, this is a retarded idea, and even the best-case is garbage. And why wasn't this posted to Idle?
I hate grammar Nazi's.
The scheme you're proposing here requires the server have full understanding of an index that maps properties of interest (encoded as hashes) to the data items in the database (which represent files). This index says quite plainly that certain data items share certain properties with other data items (i.e., are both listed under the same hash). This reveals some information about the encrypted data that is subject to statistical analysis, and to correlation with other, independently obtained information.
For example, if by social engineering I discover that your index is indexing last names of Americans, I can formulate hypotheses about which hash represents which name. By observing how the hashes cluster across the set of documents, I can further test that against information about, say, the last names of your contact persons at your various clients.
Your proposal really isn't qualitatively different from a full-text index. The only difference is that the granularity of the index you're describing is coarser; instead of pinpointing the location of every individual word in the database that satisfies the search, the index might pinpoint largish "files" that mention a certain last name. That makes it much harder to crack, sure, but the point is that the difference is quantitative, not qualitative.
Are you adequate?
I think so... perhaps you should try another operating system?
Be aware though, it's got a weird text editor...
A Human Right
If you had a web page on the host that would decrypt the file names (or files), they could have just stored a copy after your code generated it. Not only that, but they could have trivially captured any password you put into it. That's not a secure system at all if you assume a malicious host like the OP assumes.
It's not possible to search for a keyword within a larger encrypted text without decrypting the text. So there have been numerous proposals for indexing methods with various pros and cons. Suppose we encrypt each word separately? "Beethoven" becomes "mxP370e8". If I want to search for "Beethoven" without letting Google know (put aside, for the moment, the objection that Google _already_ knows everything), I search for "mxP370e8" instead of "Beethoven", and my search returns a link to a word that is surrounded by other encrypted words, perhaps a file. It may be secure enough to let Google know that "mxP370e8" is the third word of a file of 18132 words, and that I searched for it. Encrypting word-by-word is vulnerable to statistical and traffic analysis, but there are ways to mitigate this, such as using lots of salt and padding to make all words the same size, changing keys for different files or parts of files (now there is more than one encryption that maps to "Beethoven"), and so on. I think my basic point is that if you want to do what we normally think of as a full-text search, then each searchable word has to be encrypted standalone, all by itself, if the third party is going to do the searching.
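Here is a rough sketch of the "changing keys for different files" mitigation. It assumes each file gets its own key derived from a master secret, and it uses a truncated HMAC as the per-word encoding (a stand-in I picked, not a real cipher). MASTER_KEY and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical master secret, kept on the client only.
MASTER_KEY = b"hypothetical master secret"

def file_key(file_id):
    """Derive a separate key for each file from the master secret."""
    return hmac.new(MASTER_KEY, file_id.encode("utf-8"), hashlib.sha256).digest()

def word_token(file_id, word):
    """Fixed-length token for a word; differs from file to file."""
    digest = hmac.new(file_key(file_id), word.lower().encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return digest[:16]  # every token is 16 hex characters, whatever the word

def query_tokens(word, file_ids):
    """One search term per file, since each file encodes the word differently."""
    return {fid: word_token(fid, word) for fid in file_ids}

tokens = query_tokens("Beethoven", ["letters.txt", "notes.txt"])
```

The trade-off is visible in query_tokens: the same word no longer repeats across files, but a search now costs one token per file instead of one token overall.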
What you're looking for is called blinding.
Hmm... You could think of using the remote storage as a block storage for encrypted data... might not be the most efficient, but it could get you the functionality you seek...
42.
Search works by tokenizing a document and creating an inverted index from tokens to documents. The tokens do not need to mean anything to the search engine. If you generate the tokens on the client, and don't transmit the dictionary that maps from word to token id, you can have "encrypted search".
The problem with doing that directly is that if you want to do proximity based search you need information on the token order, and they could do frequency analysis to come up with plain text guesses if they guess the language right. You can counteract that by mapping the same word to multiple tokens to even out the frequency of each token id, but it means you would need to search for multiple tokens to find all occurrences of a word.
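Mapping one word to several tokens could look roughly like this. The alias counts are invented for illustration (a real setup would derive them from measured word frequencies), and KEY, ALIASES and the function names are all hypothetical.

```python
import hashlib
import hmac
import secrets

KEY = b"hypothetical client-side key"

# Assumed alias counts: frequent words are spread over more tokens.
ALIASES = {"the": 8, "and": 4}
DEFAULT_ALIASES = 1

def alias_tokens(word):
    """All tokens that stand for this word."""
    word = word.lower()
    count = ALIASES.get(word, DEFAULT_ALIASES)
    return [
        hmac.new(KEY, f"{word}#{i}".encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        for i in range(count)
    ]

def token_for_indexing(word):
    """Pick one alias at random each time the word is indexed."""
    return secrets.choice(alias_tokens(word))

def tokens_for_search(word):
    """A search has to ask for every alias to find all occurrences."""
    return alias_tokens(word)
```

So "the" is indexed under one of eight tokens each time it occurs, which flattens its frequency, and a search for "the" has to send all eight.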
If you don't care about word proximity it's much safer, as the index would contain each token at most once per document.
Do you know which word "2" represents, or what is in documents 103 and 178?
That's how you do it. You need to ensure there's no way of doing statistical analysis on the token list to recover plaintext info, and you need to not give them the dictionary mapping from plaintext to tokens.
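A toy version of that split, as a sketch only. It assumes token ids are handed out sequentially on the client (a real scheme would want ids that reveal less), and the sample documents are made up. Only server_index would ever be uploaded.

```python
from collections import defaultdict

dictionary = {}                  # word -> token id; stays on the client
server_index = defaultdict(set)  # token id -> document ids; all the server sees

def token_id(word):
    """Look up the token id for a word, assigning the next id if it is new."""
    word = word.lower()
    if word not in dictionary:
        dictionary[word] = len(dictionary) + 1
    return dictionary[word]

def add_document(doc_id, text):
    """Index a document; each token is recorded at most once per document."""
    for word in text.split():
        server_index[token_id(word)].add(doc_id)

def lookup(word):
    """Translate the word locally, then ask the index for that token id."""
    tid = dictionary.get(word.lower())
    return set() if tid is None else set(server_index.get(tid, set()))

add_document(103, "quarterly budget review")
add_document(178, "budget meeting notes")
```

The server ends up holding entries like token 2 -> {103, 178} and nothing else. Without the dictionary there is no direct way to tell that token 2 is "budget".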
And these tokens are generated how? Oh yeah, by Google's search engine. Whoops. If you want someone to extract information from data, they will by definition be able to extract some amount of information from the data, even if you have everything encrypted. They could do frequency counts of the tokens and convert them to words, traffic analysis (you can't encrypt the from/to), etc.
The idea is idiotic and one of those crypto-idiot fantasies that the totally clueless and paranoid seem to have.
About the only way you could do this is to control the server itself, since then the only one who knows the encryption keys is you, or rather your server.
But the poster wants a third party to have his encrypted data, send him the key to that data and then open that data, look at it but not be able to look at it.
This is DRM. It don't work. If I want to encrypt something I have several things.
The sender, this entity MUST have the original data, the encryption key and the encrypted result.
The reader, this person must have the encrypted data, the decryption key and can with those two, obtain a copy of the original data.
In between is the messenger or untrusted party. The messenger should NEVER have the encrypted data and decryption key at the same time or they will be able to do what only the reader should be able to do.
Traditionally this means the sender and reader meet, exchange keys, and then part. The sender then uses a messenger to send the encrypted data to the reader. The messenger does not have the key and so the data is safe. If said messenger turns out to be unreliable, you only lose the encrypted data; the key is safe with you and the reader.
DRM fails, because it trusts the messenger but not the reader. DRM wishes to give the reader everything so it can read the message but not be able to read the message. This cannot be done and is the reason DRM fails.
I have seen some people get confused by SSH. SSH seemingly allows you to securely connect to a remote system without a separate exchange of keys. The problem is that SSH doesn't allow that at all. If you just ssh to a remote system, you are NOT secure at all. How do you know you are connecting to the system you think you are connecting to and not something else? You are trusting the messenger, the internet, to be trustworthy. SSH warns you about this when you first connect to a system, asking you to accept the remote machine's key. If you have NOT verified in a separate communication that this key belongs to the remote machine, then you are gambling that the internet is trustworthy.
Back to the system proposed. DRM's wet dream is to control the reader's hardware so they can only read the message in a way that doesn't allow them to reproduce it. The Trusted Computing dream. If the whole end machine is encrypted, only the analog hole remains.
You cannot send a reader all the data they need to read the data but not be able to read the data for their own purposes. If I want google to search my email, they must be able to read my email.
The idea of searching in encrypted data is just plain silly. The whole point of encryption is that the data cannot be read. If you encrypt a piece of text in such a way that a given word always encrypts to the same ciphertext wherever it appears in the document, then you are asking to be cracked in no time.
Consider how human-usable ciphers are attacked: by looking for frequently repeated encodings that might correspond to common words. If you know the text is in English, then in an encrypted text "4 231231 421 4 534534 4" it would be fairly easy to figure out that 4 = a. Find more common words by statistical analysis, and then you only need to figure out the encryption that produces that encoding of a very short string and voila, you can decode everything. Good encryption does NOT allow the same pieces of data to be encrypted the same way.
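The counting step of that attack is trivial to automate. A small sketch using the toy ciphertext above (rank_tokens is just a name made up here):

```python
from collections import Counter

def rank_tokens(ciphertext):
    """Count how often each encoded word occurs, most frequent first."""
    return Counter(ciphertext.split()).most_common()

ranked = rank_tokens("4 231231 421 4 534534 4")
top_token, top_count = ranked[0]
```

Here top_token is "4" with a count of 3, so it is the first candidate for a very common short word. With a realistic amount of text the same ranking gets lined up against known word frequencies for the language.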
So the idea of sending encrypted words to search for is idiotic. Even if you have the original data on your PC so you can create the same encoded data as on the server (but why then search on it remotely?), you still wouldn't be able to snip the bit you wanted out of it, because the encryption shouldn't allow you to do that.
The entire idea is idiotic.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
They use plaintext indices.
Probably because, if it's encrypted and searchable remotely, then you can potentially store it anywhere. And if you can store it anywhere, then backups are going to be cheap.
Personally, I'd be looking at ways to keep the search index local.
For the n00bs, the above post is in double-ROT13.
Just a note, from the Apple Mail help pages. This application saves all your messages in encrypted format (if they were received encrypted). When a message is opened, an index is created and saved in plaintext. This allows you to use Spotlight on encrypted messages.
I think it is reasonable to save plaintext indexes in this scenario.
-- I was raised on the command line, bitch
This kind of tokenization allows token frequency analysis across all the documents; you essentially destroy the value of the encryption by using the same key for all documents.
Standard PGP uses a different key for each document so the tokens never appear in the same format. When you give this up, you give up a large portion of the security of your documents.
The only way I see to do this is build the index on your machine and point to the offline storage.
Spideroak is the only company I know of that DOESN'T HOLD THE KEYS to your encrypted data. Even if they wanted to 'see' your data, they couldn't. https://spideroak.com/
Insert_Ending_Here
When I want to encrypt my data securely, I cambio de langue beaucoup des tursan en gach frase. In this way, ich puedo tres cinnteach sein that bheil only a handful of people ann a soussteheneas nere mensajea.
HAL.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
That's exactly the principle behind some companies' implementations of the ISO 13250 Topic Maps standard: easily accessible and searchable "maps" of your information and the relevant relationships between them, but without access to the information itself. By analogy, I can get a good idea and picture of where people live and organizations are housed, but that does not mean that I can access their properties.
And these tokens are generated how?
Keywords and meta-data entered in the client while everything is decrypted. Or automatic indexing by whatever does the encryption.
You're not wrong to point out that frequency analysis and other techniques would render this less secure than a perfect black box. But it's also significantly better than trusting Google or your ISP with plain text.
What makes you trust your colo provider or ISP more than you trust Google?
I have this same feeling myself, but I've never been able to articulate exactly how Google are any different from any other commercial provider. They employ engineers who are smart enough to avoid the kinds of dumb mistakes that result in data leaks. They are too big to care about you personally.
If I can manage to trust Verizon, why shouldn't I trust Google?