Slashdot Mirror


Fast Random Number Generation For Encrypted FS?

Signal 11 asks: "I've been reviewing different filesystem-level encryption schemes and so far, I have found only one solution - applying a kernel patch and using the loopback device. The problem is in generating large amounts of random data to seed the initial filesystem - it takes about 16 hours to create 20GB of pseudo-random data from /dev/urandom. Is there a faster (and equally secure) way to generate large amounts of random data?" Any clues? I'd figure a kernel patch to turbocharge /dev/urandom (while not losing too much security) might be another route to take.

11 of 24 comments (clear)

  1. How much randomness do you actually need? by nweaver · · Score: 3

    One question to ask is how much randomness do you really need? Depending on the application, you either really have no choice but to sit down and let the system's mix of pseudo random number generation and physcial system measurements generate enough bits. IIRC, linux systems use a mix of measurements (disk timing, keystrokes, mouse behavior) and a pseudo random number generator to create these bits. This makes them pretty GOOD random bits, but slow to gather.

    But often you can get away with a GOOD pseudo random number generator, that is well seeded, to generate the mass of bits you need. You only actually need, say, 256 truely random bits to get the job done.

    In which case, the easiest thing to do is grab a bunch of 8 sided dice, and roll them a bunch of times to seed your favorite cryptographically secure PRNG, and then use that to get all the pseudorandom bits you want. (8 sided dice are nice, because each roll generates 3 bits of entropy).

    For information on PRNGs, go look through Applied Cryptography or a similar text.


    Nicholas C Weaver
    nweaver@cs.berkeley.edu

    --
    Test your net with Netalyzr
  2. slightly OT question: leaky encrypted fs? by Daffy+Duck · · Score: 3
    My main question is: what is the reason for filling the file with random data before creating the filesystem?

    And the reason I ask is that I've just installed the international patch and started to mess around with the loopback filesystem, and I've noticed that even after the random initialization the encrypted data isn't completely random-looking.

    As a simple test, I made a 1MB encrypted filesystem using the serpent cipher, and inside that I created a 900K file full of a repeating 8-byte sequence. Then I unmounted the filesystem.

    The 1MB container file is compressible down to 320K by gzip. This doesn't seem right to me - shouldn't there be almost no redundancy in the encrypted file? Further investigation shows that about 930 of the 1024 1K blocks in the container occur in repeated groups of four blocks. The data inside each block are certainly scrambled beyond human recognition, but isn't this exposure of the cleartext's redundancy a sign of something wrong?

    Is it my fault? I followed the encrypted loopback HOWTO to the letter. Anyone know what's up?

    1. Re:slightly OT question: leaky encrypted fs? by DaveHowe · · Score: 2

      The 1MB container file is compressible down to 320K by gzip. This doesn't seem right to me - shouldn't there be almost no redundancy in the encrypted file? Further investigation shows that about 930 of the 1024 1K blocks in the container occur in repeated groups of four blocks. The data inside each block are certainly scrambled beyond human recognition, but isn't this exposure of the cleartext's redundancy a sign of something wrong?
      It varies on the scheme used. It appears to me that each of the "virtual sectors" of your encrypted filesystem is being encrypted separately with one of four keys - and of course, identical data encrypted with identical keys will produce identical cyphertext!
      What you actually want to happen is for each VSector to be "seeded" with a random value (probably as part of the key) to prevent this - but in practice, having large numbers of sectors with identical data in them isn't likely to happen, so the additional overhead in normal use isn't worth the special-case.
      It may be worth you bearing in mind though, if you ever need to hide *how much data* you have - seed unused VSectors with pseudorandom or straight counter values, and make sure any filestructures have sufficient internal structure(pointers and so forth) that identical blocks of plaintext will come out as different file-data
      --

      --
      -=DaveHowe=-
    2. Re:slightly OT question: leaky encrypted fs? by coyote-san · · Score: 2

      1) I'm not sure about the serpent cipher, but DES (and others) use an "initial vector" dependent upon the disk blocks relative (or absolute) position. Identical plaintext blocks will *not* give you identical ciphertext blocks, so something is wrong. (If serpent doesn't have an IV key, then it's an inappropriate cipher to use in this application.)

      A secondary factor, already mentioned, is that the cipher should be run in CBC mode within the disk block. That shouldn't be a problem since any block cipher can be run in CBC mode - it's coded at a higher abstraction level.

      2) the "randomness" in a well-encrypted file is not due to the encryption - it's due to the compression that occurs before the encryption. Non-randomness is a problem with byte-wide encryption (including variants like Viriege(sp?) and Playfair ciphers), but it's far harder to attribute any significance to non-randomness in a 8- or 16-byte cipher running in CBC mode.

      3) the reason for preloading the disk with random data is simple: you eliminate the implicit "known-text" problem that occurs with empty (and nulled) sectors. With random data the only known-text on the entire encrypted disk is the superblock, plus *maybe* a file or two at an unknown location.

      --
      For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  3. Strength of Hash function by drig · · Score: 3

    Linux's random device uses a SHA hash to mix the seed and produce decent random numbers. SHA may be a bit more than you need. If you were to change linux/drivers/char/random.c to use MD2 instead of SHA1, you'd probably get a noticeable speed up.

    --
    Citizens Against Plate Tectonics
  4. Re:fractal algorithm? by drig · · Score: 2

    One big issue with cryptography is the need to reduce patterns in the ciphertext. One prime qualty of a fractal algorithm is that it makes repeated patterns. As such, most fractals would not be directly applicable to cryptography. Of course, if there was some way of hiding the patterns (maybe by concentrating on the borderland areas), this wouldn't apply.

    --
    Citizens Against Plate Tectonics
  5. Hardware or CD-ROM? by extra88 · · Score: 2
    Maybe you should look at hardware designed to deliver random numbers. I just found this page which has some listed (use Find to jump down to the Random Numbers section).

    I think many of these are serial port attachments, obviously a lot slower than what /dev/urandom is producing so dumping straight to disk isn't an option, but I think what such hardware gets you is a reliable, high-quality source of random bits to seed a pseudo-random process. Looking at the man page for urandom on OpenBSD (I'm assuming the one in Linux is no different), it doesn't check the entropy pool for quality so without a high-quality source of randomness, in 20GB you're entropy pool's quality is pretty likely to run low, relying just on system activity.

    /dev/urandom isn't *that* slow, producing 364KB/sec (unless you meant 20Gb, then it's only about 45KB/sec). I don't know how cryptographically sound it would be but you could consider using a CD-ROM of random data (see the link above) as a starting point. A CD-ROM drive should be able to deliver a lot more KB/sec for /dev/urandom or something else to process to get your 20GB.

  6. Video In? by SEWilco · · Score: 2
    Well, how fast can you grab input from video boards? A 1/4 screen 320*240 stream of 8-bit bytes is 76KB. Feed it an often-changing channel or "snow" on an unused channel. Or several channels on several video cards, mixed together.

    This is of course fast and merely somewhat random. There are a lot of similarities between consecutive frames. An unused channel is also particularly susceptible to intential attack by a nearby transmitter.

    For more randomness, this can be run through further randomizing algorithms. The obvious example is the SGI LavaRand.

  7. Try YARROW. by rjh · · Score: 2

    I am an InfoSec professional, but this is not professional advice. (Gads, I hate lawsuit-happy cultures like ours...)

    Bruce Schneier's YARROW is the only PRNG I'm aware of which has actually gone through formal cryptanalysis. I'm not overly fond of YARROW--it's extremely slow with its 3DES/SHA-1 core--but the fact that it's been cryptanalyzed makes it much more trusted than almost any other PRNG.

    Substituting a fast cipher like Blowfish or an AES candidate, and replacing the hash algo with MD5 or somesuch, would result in a much faster YARROW. Unfortunately, this also invalidates the cryptanalysis, since the modified version wouldn't have undergone the same level of cryptanalysis: still, if I remember correctly, YARROW is intended to be both hash- and cipher-agnostic.

  8. Re:Why so much data? by coyote-san · · Score: 2

    You use that random data to eliminate most of the "known plaintext" on the disk. That's the same reason you need to use a real cryptographic PRNG, not a weak one which is easily cracked.

    On the one hand, this doesn't offer you *that* much more protection - I would rather use a strong algorithm and zero-initialized blocks than a weak algorithm and random blocks. On the other hand, if you feel it's appropriate to use an encrypted FS in the first place you shouldn't begrudge small improvements with definite benefits.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  9. Johnny Mnemonic? by technos · · Score: 2

    Using a cheap BT848 you can grab and write 30 fps at that resolution and depth, so you could potentially get 2.2 megs a second.. For a 20G volume, that's just over two and a half hours..

    By a simple increase of resolution to the break-even point (512x384 at 16bpp I believe is where the older Brooktree flatline) you could easily saturate any IDE device, and mow that sucker in under an hour..

    A queer side effect would be that any super-hyper-sharp NSA cracker might be able to make out a snowy episode of Dragonball-Z through your data, and know the key off the bat!!

    Oh, and does anyone else get Johnny Mnemonic deja vu?

    --
    .sig: Now legally binding!