LOAF - Distributed Social Networking Over Email
FamousLongAgo writes "LOAF (List Of All Friends) is an extension to email that lets you send out address book data without compromising your privacy. LOAF appends a hash-like data structure to each outgoing email, and collects similar attachments from the people who write to you. These files can be queried to see if they contain a given email address, but they can't be reverse-engineered to reveal the list of addresses used to construct them. LOAF lets you check whether someone emailing you for the first time is a complete stranger, or appears in the address books of some of your trusted correspondents. And as a decentralized application, LOAF offers an interesting alternative to current social networking sites like Orkut or Friendster."
They've included a nice analysis of the types of attacks including the Ex-Girlfriend attack, Marc Canter attack, and Dictionary Attacks in the writeup
The configurable false positive rate can make Bloom filters resistant to dictionary attack, but it also renders them less useful. Given a false positive rate of c, and a dictionary with k elements, a dictionary attack will result in ck false hits. This rate goes down if you can collect multiple filters from the same user that are either 1) of different length, or 2) use different hash functions (salts, in our implementation). False positives in either case will be different, so for n filters the false positive rate will drop to c^n.
This implies that the truly paranoid should use a presized filter large enough to contain as many correspondents as they ever expect to have on record, and an invariant set of salts. Under those conditions, collecting multiple filters will not change the false positive rate. A mostly empty large filter might have an unacceptably low false positive rate, so you would want to pad the list of real emails out with random data, to maintain a constant ratio of on/off bits as well.
The tradeoff with a high false positive rate is that the filter will be less useful to legitimate recipients. An intriguing possibility is that of sending out very inaccurate filters that are updated on a regular basis (for example weekly) so that a user has to accumulate a certain number of the filters in order to run queries with a good degree of certitude. This spreads private information over several filters and ensures that an eavesdropper who intercepts only one file will find it of very limited value.
And most importantly they say: Of course, the truly paranoid would be crazy to use LOAF.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
An (bad) example would be that the "encoding" function is the ascii values for the first and third character before the @ and the first character after the @ - those bits of a 128 bit Bloom filter are "lit up" for your address, so that means:
all map to the same bits being lit up in the bloom filter, there is no real way to "reverse engineer" it and since it does not assume no collisions (unlike MD5 and SHA*) it is not expected to have unique mappings - that's a feature, as they say.
There is no central list. The concept is that you append a list of YOUR friends to the end of each email you send. No one can read the list alone, but they can check if specific addresses are in it. So when someone new emails you, you check their address against all the known-good LOAF hashes youve recieved, this will tell you if they are a friend of a friend of yours.
Reverse-engineer is the wrong word. Nothing about this process is hidden, so there is nothing to reverse engineer. This simply hashes or "encrypts" the data using a one way function called a bloom filter. One way functions are easy to compute in one direction, but are extremely computationally intensive to compute in the other direction, for example multiplying c=a*b is easy, factoring c into a and b is hard. They are a cornerstone of cryptography, and all of the important, widely used types of one-way functions have been studied extensively by the some of the most brilliant mathmaticians in the world so we have a pretty good handle on how long it will take to break anything encrypted with them. However, they are not loth, and occasionally do suprise us with new findings.
However, this is quite different than DeCSS, which was fundamentally insecure, as they distributed the key with every single DVD player in existence, and relied on people just not looking. I don't know much about Bloom filters in specific, so I can't comment on this implementation, but methods like it are employed everyday to keep password secure, when sending across the internet, or storing it in the server.
Argh. RTFA. There is a configurable false-positive value that would make this kind of attack less than useful.
Trust not a man who's rich in flax / His morals may be sadly lax
Indeed, Bloom Filters are the shit.
;)
These days, in my spare time, I'm writing a p2p program -- think of it as a swarm-download system, like BitTorrent, on an overlay network topology, like eMule (only eMule uses Kademlia, and I'm using Pastry). It has been shown, here and here, that Bloom Filters can drastically reduce the traffic generated when searching peer to peer networks. I recently coded a Java implementation of a Bloom Filter for my p2p program, and it works great in testing. (But the p2p program isn't anywhere near done, so don't ask about it
Furthermore, Bloom Filters can be compressed -- see Michael Mitzenmacher's work here. The idea that you can compress a Bloom Filter is a little counter-intuitive, because the size of the bit vector and the number of hash functions are derived using calculus to maximize the compactness of the set, for a given false positive rate -- thus, in this state, it is non-compressable (it is "already compressed" by simply being an optimal Bloom Filter). To compress a bloom filter, you must choose a large bit vector, and a non-optimal number of hash functions, then apply the compression algorithm (typically arithmetic coding). Because the bit vector is so large, it is sparsely populated -- and so compression works.
Often you can save 10% and 20% on the size of your bloom filter, while having a lower false positive rate. Score!
A very nice, very interesting survey of all the applications of Bloom Filters can be found here.
- sm
"Address book" is a misnomer - what this is based on is email addresses you have sent email to (or specifically imported into LOAF). The app monitors your outbound mail (through a sendmail wrapper, for example) and adds all new addresses to its list of seen recipients.
Right now there is a reference implementation for Pine/procmail, we are hoping for help with implementations for Outlook, Mail.app, and other clients.
A customer service representative will be with me shortly.
Bloom filters have been around since 1970 (link to acm digital library - you probably need a subscription to get in), and can be based on any crytographic hash function, such as sha-1.
Bloom filters tell you if something is (probably) a member of a set. If you know an email address, you can ask "is this email address in this address book?", but you can't ask "what are all the email addresses in this address book?" without guessing every address. Essentially, if a spammer already has you email addrees, he can verify that it's actually in use, but if he doesn't already have it, guessing it is likely to be fairly hard (unless it's something like bob@hotmail.com, or if loaf uses a weak cryptographic hash function).
In other words, loaf is as difficult to break as reversing a hash of your email address. The longer your email address is, the safer you are.
-jim
"If you want to talk about elitism- thinking your way is better because its your way, now THATS elitism."
Elitism:
1. The belief that certain persons or members of certain classes or groups deserve favored treatment by virtue of their perceived superiority, as in intellect, social status, or financial resources. 2a. The sense of entitlement enjoyed by such a group or class. b. Control, rule, or domination by such a group or class.
(The American Heritage® Dictionary of the English Language: Fourth Edition. 2000.)
You're free to disagree with my opinion, but words have established definitions. By the way - I don't think "my way" is "better," I just prefer it (I'm also morally opposed to "your way").
G