The Ultimate Weapon Against Censorship?
Every byte in the source file is XOR'd with exactly one byte in the random file. The result file, by itself, is totally indistinguishable from white noise, provided that the pad used is truly random. Madore now suggests that users store pads on different servers and use several of them in combination to encrypt data.
An FTP or WWW site that stores one of the pads could argue that it is only storing random noise, and another might do the same. It would be mathematically impossible to prove them guilty of storing illegal information (unless there is a way to prove that one pad was created after the other). Only by combining the two (or more) files am I able to retrieve the original controversial information. The critical parts are the links to the pads I need to obtain the information, but those might be traded on a distributed system like Gnutella or Freenet. Plus, links take very little space and can easily be relocated to free webspace providers.
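To make the splitting idea concrete, here is a minimal Python sketch, assuming all pads have the same length as the data and using the standard `secrets` module for randomness. The helper names `xor_bytes`, `split_into_pads` and `combine_pads` are hypothetical, not taken from Madore's scripts. All pads but one are fresh random bytes; the last one is the data XORed with all of them, so any set short of the complete one looks like noise.

```python
from __future__ import annotations

import secrets


def xor_bytes(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together, byte by byte."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def split_into_pads(data: bytes, n: int) -> list[bytes]:
    """Return n pads whose XOR is data; any n - 1 of them are uniformly random."""
    if n < 2:
        raise ValueError("need at least two pads")
    random_pads = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    last_pad = xor_bytes(data, *random_pads)
    return random_pads + [last_pad]


def combine_pads(pads: list[bytes]) -> bytes:
    """Recover the data by XORing the complete set of pads."""
    return xor_bytes(*pads)


message = b"controversial document"
pads = split_into_pads(message, 3)
print(combine_pads(pads) == message)  # True
```

Each pad could then sit on a different server; only someone holding all three gets the message back.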
The concept is a little more complicated than my summary here, so please read the paper (and mirror it; it's GPL'd!). There are already scripts and programs to create pads and restore the original files (including a GUI program for Win32). I might add that the idea of pad encryption is fairly old; it was already used in WWII. Its advantage is that it is mathematically secure if the pads are truly random and used only once, hence its name, "One-Time Pad."
*Sigh*
/. reaction, but it's still Bad Crypto.
Everybody loves the One Time Pad.
Can't imagine why. It's like, a couple of words out of Shannon saying a system can be provably uncrackable, as long as it's far too annoying to actually use, and people convert that to:
Let's just make it not annoying to use.
Problem is, the security comes from that annoyance, and it degrades ungracefully. Very, very ungracefully. As in, the moment one pad gets compromised, or even reused, boom. Game over. You're done.
Compound that by having the encryptor retrieve key material over a network (as this system depends on), and you're even more done. Let's analyze what's going on here a bit.
All cryptosystems are essentially engines for extracting the secrecy from a set of data. Secrecy is something even more intangible than the raw data that is itself secret; a very large quantity of information can be stored and transferred, but a secret is only transferred if that data can be understood. Cryptography essentially works by allowing the comprehensibility of data (though not the data itself) to be extracted and simplified down to some other piece of data.
Now, often that data can be much, much smaller. Broadbridge Media, for instance, takes direct advantage of this for reasonably secure mass data distribution of music videos on CDs--some large ciphertext gets mass distributed on CDs or DVDs, while a small, personalized transaction over the Internet allows an individual to retrieve the key which decrypts the ciphertext into plaintext. The mass data is moved, but remains incomprehensible until a relatively tiny amount of key material is transmitted to the destination host.
Madore's system is somewhat similar; he still has a chunk of extracted secrecy composed of a "recipe of pads" which, when XOR'ed together, reveal the plaintext. This recipe can be as small as literally two pads; an innocent "complete works of Shakespeare" page and some extension thereof.
First problem? Madore derives his pad indexes from the first few bytes of whatever pad he's come across. PGP has survived reasonably well with a 2^32 complexity attack against its public keyspace indexes (it's called the DEADBEEF attack); Madore's system, however, is likely to find collisions in everyday use.
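How soon accidental name collisions show up can be estimated with the standard birthday approximation, P ≈ 1 - exp(-n(n-1) / 2N), for n names drawn uniformly from N = 2^bits possible values. The sketch below assumes the name bits are uniformly random, which is the best case; names cut from non-random content (two pads that both start with the same file header, say) collide far sooner than this estimate suggests. The function name is illustrative.

```python
import math


def collision_probability(num_names: int, bits: int) -> float:
    """Birthday approximation of P(at least one collision) among num_names
    names drawn uniformly at random from 2**bits possible values."""
    space = 2 ** bits
    exponent = num_names * (num_names - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return -math.expm1(-exponent)


for bits in (32, 64):
    for n in (10**4, 10**6, 2 ** (bits // 2)):
        p = collision_probability(n, bits)
        print(f"{bits}-bit names, {n} pads: P(collision) ~ {p:.3g}")
```

At n = 2^(bits/2) names the estimate is already about 39%, so uniformly random 64-bit names only start colliding in earnest around four billion pads.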
It never ceases to amaze cryptographers that, for all the functionality of the fixed-output, one-way hash (password storage, small indexes to arbitrarily sized inputs), people don't use it. There really aren't that many flat-out solved problems in all of crypto, and this is one of them. IF YOU'RE NOT STORING YOUR PASSWORDS AS EITHER MD5 OR SHA-1 HASHES, YOU'RE WAITING TO GET HACKED. *sigh*
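The fix being pointed at here is content addressing: name a block by a hash of all of it rather than by its opening bytes. This sketch compares the two rules on two blocks that share a common header; `prefix_name` approximates the 16-hex-digit scheme as described in this thread, and both function names are hypothetical. Truly random pads would not start with a file header; the example only shows that prefix names inherit whatever structure the content has. SHA-1 is used because the post names it; `hashlib.sha256` drops in the same way.

```python
import hashlib


def prefix_name(pad: bytes) -> str:
    """Name a pad by its first 8 bytes, written as 16 hex digits."""
    return pad[:8].hex()


def sha1_name(pad: bytes) -> str:
    """Name a pad by the SHA-1 digest of its entire contents."""
    return hashlib.sha1(pad).hexdigest()


# Two different blocks that share a common 8-byte header (the PNG signature).
pad_a = b"\x89PNG\r\n\x1a\n" + b"first body"
pad_b = b"\x89PNG\r\n\x1a\n" + b"second body"

print(prefix_name(pad_a), prefix_name(pad_b))  # 89504e470d0a1a0a 89504e470d0a1a0a
print(sha1_name(pad_a) == sha1_name(pad_b))    # False
```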
Anyway, beyond that small chunk of data which gives the recipe of which block to use, there's also the censorworthy-but-XOR-obfuscated block which will supposedly diffuse itself throughout the network. Whereas Broadbridge got its incomprehensible data out the door on CDs, Madore's system invokes the distributed nature of many, many XORable keyblocks to hide which block on the network is the actual censor-worthy block.
But how many blocks do I need to use for a recipe? Suppose I have 200 random blocks to choose from, and I download one block of random key material. Wait. Let's say I'm really paranoid, and I generate my own random block to XOR against, and upload it to a server. OK. So I've gotten my single block to XOR against, I do so, and I upload my data-containing block to the padservers.
I've already lost.
Whether I downloaded my keyblock from the network, or uploaded it to the network, anybody sniffing my network traffic will see the exact block I used to encrypt against. They'll either watch it leaving the keyserver or going back in.
Worse, let's assume there was no sniffer, just 201 random blocks, any two of which can be XORed together to reach plaintext. The complexity isn't one in fifty billion; it's 201*201, or a good 40,401 operations. Use of two pads isn't particularly specified... but then, use of this as a viable encryption system isn't particularly specified either. You can tell by this line:
"Your first task is to locate an announcement stating that the data you want are recoverable by XORing such a set of pads."
Oh, that's all.
"Go find your key."
Obviously, with no special complexity applied to locating your key, there's nothing that separates You As Reader from You As Censor. And, since whoever determines a key used *once* for secret information determines it for all time...boom.
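The reader's search and the censor's search really are the same loop. A minimal sketch, assuming two-pad recipes and a crude "mostly printable ASCII" test for recognizable plaintext (both are assumptions for illustration, not part of Madore's page); the function names are hypothetical. `itertools.combinations` tries the 201 × 200 / 2 = 20,100 unordered pairs, the same order of magnitude as the 40,401 figure above.

```python
import itertools
import secrets
import string

PRINTABLE = set(string.printable.encode("ascii"))


def looks_like_text(block: bytes, threshold: float = 0.95) -> bool:
    """Crude plaintext test: at least `threshold` of the bytes are printable ASCII."""
    if not block:
        return False
    printable = sum(1 for b in block if b in PRINTABLE)
    return printable / len(block) >= threshold


def find_plaintext_pairs(pads):
    """XOR every unordered pair of pads and keep the results that look like text."""
    hits = []
    for (i, a), (j, b) in itertools.combinations(enumerate(pads), 2):
        candidate = bytes(x ^ y for x, y in zip(a, b))
        if looks_like_text(candidate):
            hits.append((i, j, candidate))
    return hits


# 200 random pads, plus one data pad made by XORing the message with pad 17.
message = b"This is the censor-worthy plaintext."
pads = [secrets.token_bytes(len(message)) for _ in range(200)]
key_index = 17
pads.append(bytes(m ^ k for m, k in zip(message, pads[key_index])))

for i, j, plaintext in find_plaintext_pairs(pads):
    print(i, j, plaintext)
```

With overwhelming probability this prints a single hit, the pair (17, 200) together with the message; random pairs essentially never pass the printable test.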
But, let's be fair. Madore's goal mainly seems to be giving websites the capability to host information they can't recognize. Freenet did this; Madore doesn't actually even come close. Among other things, the system isn't particularly fault tolerant. Good secret sharing systems allow m-of-n functionality, i.e. retrieval of any m shares out of n total (like 3-of-5) reveals the data. This system? If any block is missing (and there doesn't need to be more than two), your data is gone. Loss of a single pad archive is likely to cause some data to disappear forever. Ouch.
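For contrast, here is a minimal sketch of Shamir's secret sharing, the textbook way to get the m-of-n property described above. It is not part of Madore's design, which behaves like n-of-n XOR splitting. The prime 2^127 - 1 and the function names are choices made for this example; the secret must be an integer smaller than the prime, so longer data would have to be chunked first. `pow(den, -1, PRIME)` needs Python 3.8 or later.

```python
from __future__ import annotations

import secrets

PRIME = 2**127 - 1  # Mersenne prime; all arithmetic is modulo this field size


def make_shares(secret: int, threshold: int, count: int) -> list[tuple[int, int]]:
    """Split secret into count shares; any threshold of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= count:
        raise ValueError("need 1 <= threshold <= count")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


secret = int.from_bytes(b"short secret", "big")
shares = make_shares(secret, threshold=3, count=5)
print(recover_secret(shares[:3]) == secret)                         # True
print(recover_secret([shares[0], shares[2], shares[4]]) == secret)  # True
```

Losing any two of the five shares costs nothing, which is exactly the fault tolerance the XOR recipe lacks.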
Honestly, I'm putting too much energy into this. Madore writes the following:
The pads, of course, are just named by their 16-hex-digit names (thus, strictly speaking, the announcement makes it possible to recover the first eight characters of the data; but that should not be a problem).
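The leak is easy to reproduce. Assuming, as the quote implies, that a pad's name is simply its first 8 bytes in hex, XORing the decoded names in a recipe yields exactly the first 8 bytes of the plaintext, because XOR works byte by byte. The helper names below are hypothetical.

```python
from __future__ import annotations

import secrets


def pad_name(pad: bytes) -> str:
    """Assumed naming rule: the pad's first 8 bytes as 16 hex digits."""
    return pad[:8].hex()


def leaked_prefix(recipe: list[str]) -> bytes:
    """XOR the decoded pad names of a recipe together."""
    out = bytearray(8)
    for name in recipe:
        for i, b in enumerate(bytes.fromhex(name)):
            out[i] ^= b
    return bytes(out)


data = b"Secret memo: meet at dawn."
key_pad = secrets.token_bytes(len(data))
data_pad = bytes(d ^ k for d, k in zip(data, key_pad))

recipe = [pad_name(key_pad), pad_name(data_pad)]
print(leaked_prefix(recipe))  # b'Secret m'
```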
Any cryptosystem which leakes information about the plaintext in the key material never should have left the drawing boards. I congratulate Madore on noticing this, of many flaws in his design, but this really is Bad Crypto. It's timely, and it's useful, and it'll hopefully prevent people from falling for other Pad scams by sheer nature of the
*Sigh* At least he wasn't trying to sell us anything.
Yours Truly,
Dan Kaminsky
DoxPara Research
http://www.doxpara.com
Hi. I'm the author of the page in question, and an unaware victim of the Slashdot effect (well, not truly unaware: Erik Moeller, who posted the story, was kind enough to notify me in time). I received many emails about it, all of which I've read, as well as a good many posts in the current discussion. I can't possibly reply to them all, but I'll try to answer some of the most frequent or important comments here.
First, note that the page was written in February (2000/02/19 to 2000/02/23, to be precise), so it is not new. Nor do I claim any kind of originality or paternity of the idea: it is a small variation on the protocol described in section 6.3 ("Anonymous Message Broadcast") of Bruce Schneier's book on cryptography. In any case, I think it is pretty obvious in the first place; I am merely suggesting a few practical ideas to make it workable. There is nothing great or revolutionary about any of this, and I never made that claim.
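The anonymous broadcast protocol in that section is usually known as the dining cryptographers protocol. A one-bit round fits in a few lines; this sketch assumes the participants sit in a ring and each adjacent pair already shares a secret coin flip (simulated here with `secrets`). Every coin enters exactly two announcements and cancels, so the XOR of all announcements is the message bit, while an outside observer cannot tell from the announcements who sent it. The function name is illustrative.

```python
import secrets


def dc_net_round(num_participants: int, sender: int, message_bit: int) -> int:
    """Run one round of a ring-shaped DC-net and return the broadcast bit."""
    # coins[i] is the secret coin shared by participant i and participant i + 1 (mod n).
    coins = [secrets.randbelow(2) for _ in range(num_participants)]
    announcements = []
    for i in range(num_participants):
        # For i = 0, coins[i - 1] wraps to coins[-1], the coin shared with the last participant.
        bit = coins[i - 1] ^ coins[i]
        if i == sender:
            bit ^= message_bit  # only the sender also folds in the message
        announcements.append(bit)
    result = 0
    for bit in announcements:
        result ^= bit
    return result


print(dc_net_round(5, sender=2, message_bit=1))  # 1
print(dc_net_round(5, sender=2, message_bit=0))  # 0
```

Roughly speaking, the published pads in the scheme above stand in for the shared coins, stretched from single bits to whole files.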
One thing should be made clear from the start: the whole idea is not about obscuring what the data is (i.e. it is not, strictly speaking, cryptography) but about who is sending the data. And, even more specifically, it is about making legal conviction impossible so long as the presumption of innocence is maintained (whether the presumption of innocence still means anything in these dark days is another question :-/ ); so it is natural that the story appeared in Slashdot's "Your Rights Online" section.
Please also note that I am not making a political statement. This is not a libertarian manifesto. I am not stating that you should use this system to send out assassination messages against the President / the Prime Minister / the King / the Pope / <insert your favorite assassination victim here>; I am merely stating that you can, and that this is none of my business.
Many have pointed out that my suggested way of naming pads is bad. That's true: using the MD5 (or SHA-1, or any other kind of hash) digest would be a better idea. But it doesn't really matter all that much what the pads are named unless we want the system to be resistant to malicious tampering, which was not one of my avowed goals. Still, we can get this almost for free, so we might as well. Say we have a symlink pointing from pad_md5_whatever.dat to the pad with the given MD5 for each pad in each repository; "combination recipes" could then be given in terms of these links, making them resistant to tampering.
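A minimal sketch of that tamper check, assuming pads are addressed as `pad_md5_<hex digest>.dat` (following the example name above) and that some retrieval function maps a name to bytes; `fetch` is hypothetical, and a plain dictionary stands in for the repositories. Each pad is re-hashed before use, so a repository that swaps in a different file is caught. MD5 follows the comment above; `hashlib.sha256` is the sturdier drop-in, since MD5 collisions can be produced in practice.

```python
from __future__ import annotations

import hashlib
import secrets


def md5_name(pad: bytes) -> str:
    """Content-derived name: pad_md5_<32 hex digits>.dat."""
    return "pad_md5_" + hashlib.md5(pad).hexdigest() + ".dat"


def combine_verified(recipe: list[str], fetch) -> bytes:
    """Fetch every pad in the recipe, check it against its name, then XOR them."""
    pads = []
    for name in recipe:
        pad = fetch(name)
        if md5_name(pad) != name:
            raise ValueError(f"{name}: contents do not match the digest in the name")
        pads.append(pad)
    if len({len(p) for p in pads}) != 1:
        raise ValueError("a recipe needs at least one pad, all of the same length")
    out = bytearray(len(pads[0]))
    for pad in pads:
        for i, b in enumerate(pad):
            out[i] ^= b
    return bytes(out)


# A dictionary stands in for the pad repositories.
message = b"hello, pad world"
key_pad = secrets.token_bytes(len(message))
data_pad = bytes(m ^ k for m, k in zip(message, key_pad))
store = {md5_name(p): p for p in (key_pad, data_pad)}
recipe = [md5_name(key_pad), md5_name(data_pad)]

print(combine_verified(recipe, store.__getitem__))  # b'hello, pad world'
```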
Similarly for secret sharing: my idea was not to have a system which is hard to censor (there are other, far better, solutions for this), but to have one which is hard to track.
Another thing I should make quite clear is that the system in itself is not used to hide data: it is used to hide the origin of data. This is why all comments along the lines of "OTP is secure as long as the pad is truly one-time", or all remarks to the effect that it is trivial to find all relevant data among the padset, are quite true but completely irrelevant. If you want to hide the data on top of hiding the origin, then you use a traditional cipher; for example, you encrypt your data using Blowfish and you use that data (the ciphertext, which for all intents and purposes is random) as input to the pad system. So long as you don't release the key, nobody can tell that there's Blowfish-encrypted data hidden in the pad system. The two are completely orthogonal. (It is true that my remark about the difficulty of finding "recognizable data" in the pad system is very misleading and irrelevant. I should remove that: never mind that part.) As for my comment about the birthday effect, it is merely about accidental collisions, not at all about malicious action.
Somebody asks what is wrong with storing all pads in the same place, since anyone can download them all. That is true, but it is beside the point. The point is that as long as a site does not have a complete set of pads yielding readable data, it is not, by itself, breaking any law, and all it is distributing is white noise; whereas if it stores one complete set of pads, then it is distributing the forbidden document in some form. Naturally, collecting a complete set of pads for yourself is a fine idea; distributing it is dangerous.
Finally, there is the central question of whether the legal argument (which is the crux of the matter) holds water. Presumably it doesn't, but even then it proves one thing: any kind of law restricting free speech contradicts the presumption of innocence. Some have pointed out that one could monitor the pad system, and that the last pad published in a set of pads would always be the culprit: this is not true, because it might have been delayed, or it might be provably innocent (which actually implies the former), and you can never quite be sure.
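The pad contents themselves say nothing about order, which is easy to check. In this sketch (the sizes, names and number of pads are illustrative assumptions), one pad is computed last from the document and three earlier random pads, yet every pad in the set equals the document XORed with all the others, so the same algebra would "incriminate" any of them. Only outside evidence such as publication times could break the tie, and that is what the argument above says cannot be trusted.

```python
import secrets


def xor_all(blocks):
    """XOR a list of equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


document = b"some forbidden document"
early_pads = [secrets.token_bytes(len(document)) for _ in range(3)]
last_pad = xor_all([document] + early_pads)  # the pad that was "published last"
pads = early_pads + [last_pad]

# Each pad is determined by the document and the remaining pads, whichever came last.
for k, pad in enumerate(pads):
    others = pads[:k] + pads[k + 1:]
    assert pad == xor_all([document] + others)
print("every pad is the document XOR the other pads")
```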
Imagine the following scenario: someone points out on some Usenet group that eight publicly available pads, when XORed together, give something like DeCSS code. Judge summons the 'someone' in question, who claims he just noticed it by randomly XORing pads together; not unconvincing, so judge lets the guy go. Then judge summons the pad owners, starting with the most recently published pad: but the owner explains "look, my pad is just an encryption using the key 'foobar' of the first 128 KB of (some standard transcription of) Shakespeare's Tempest; the idea had been floating around for some time, I just decided to publish it". Judge checks the statement: it's true. So apparently the data was "published" earlier than was thought, it just took some time to come out; that makes things rather difficult to track. Second owner similarly points out that his pad is just a sequence of digits of pi in binary. Third owner is in a country over which judge has no jurisdiction, so nothing to do there. Fourth and fifth owners seem to have created their pads at the very same time, and both state obstinately that they generated pure white noise (following, say, a story on Slashdot about pads being a great idea). Sixth owner says he generated his pad by XORing a dozen other pads with an innocent message (which he shows to judge). Seventh owner refuses to answer judge's question. Eighth owner posted his pad before DeCSS even appeared, so he must be innocent (or must he?). Now what does judge do? Convict some owners? All? None? Problem is, judge is impressed with first poster's proof, and can't run the risk of convicting someone who might afterward prove that his pad was innocent. Presumption of innocence.
Even if judge merely issues an injunction that the pads be taken off the network, every owner appeals on the grounds that the pads were reused in making some other (innocuous) messages and that removing them would be a serious breach of the First Amendment (or whatever you call this thing about free speech).
Anyhow, this is the summary: there's nothing new or revolutionary about the whole pad system; in fact, it's pretty trivial. But it does make one point: information is fundamentally delocalized, and any attempt to pinpoint it or to find a culprit will fail. For better or for worse.