Resisting the PGP Whole Disk Encryption Craze
alaederach writes "I run a lab in a non-profit academic life sciences research institute. Our IT recently decided it would be a good idea to use PGP whole disk encryption on all of our computers, laptops and servers and picked PGP's suite of software. The main reason is that a small subset of our researchers work with patient information which we obviously are mandated to keep confidential. My lab does a lot of high-performance computational work (on genes from Tetrahymena, no humans here) and I am concerned that the overhead of complying with our ITs new security policy will be quite detrimental to my research program. For example, dynamically reallocating a partition on a PGP encrypted disk is apparently not possible. Furthermore, there is some evidence that certain forms of compression are also incompatible with PGP whole disk encryption. Interestingly, it is hard to find any negative articles on PGP, probably because most of them are written by IT pros who are only focused on the security, and not usability. I therefore ask the Slashdot community, what are the disadvantages of PGP in terms of performance, Linux, and high-performance computational research?"
Truecrypt Whole Disk Encryption has less than 1% over head. I can't see the problem. Surely the patent and IP information security outweighs this minimal overhead.
Whole disk encryption is excellent for security, but it will bog you down in disk access times. Depends on a lot of things, but reading and writing files can slow down up to 50%, but usually the slow-down is much less. If you are doing something that involves a lot of disk access and it doesn't need to be encrypted, then create a special, non encrypted partition for that.
Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/
An IT policy is a general rule which has to be interpreted and adopted. It's not supposed to be followed by the letter. Ask your IT department what they want to accomplish with the policy, and how you can help them accomplish that without having your work ruined.
what are the disadvantages of PGP in terms of high-performance computational research?
O(1) ;)
Here's a brief experiment I ran: dd if=/dev/zero of=/home/jonas/zeroes bs=1048576 count=1024; that is, writing one gig of zeroes to a disk encrypted with ubuntu's disk encryption from the 8.04 alternative installer.
I saw a roughly constant ~30% CPU usage from kcryptd, going from 25% to 35%, on a 2.13GHz Pentium M (in a thinkpad t43p). So I have 1.5 GHz worth of cycles left.
Hard disk write speed was about 30 megs per second, but oscillating in big leaps. I did my observations with conky, sampling in one-second intervals, but conky is known to sometimes merge two samples. That's probably not the only factor, disk writes are most efficient when clumped together into one big (much preferably sequential) write, so I'd assume the kernel does this.
You haven't told us what your disk usage patterns are. But if you're doing one big read, one big computation, and then one big write, there's going to be zero impact (almost): there was lots of CPU capacity left.
Another low impact scenario is that you have a server that reads work units from disk, hand them to clients, gets results and writes the results back [I assume clients don't need any disk activity]. There you can read a bunch of work units in advance while the server is idle, then hand them out instantaneously when needed.
Aside: bugger, fault in my experiment: I didn't look at the CPU usage of kernel code that's not in the process table. Take what I say with a grain of salt.
But: do the measurement in your own world. My software, hardware and artificial measured usage pattern may differ from yours, subtly but enough that my conclusion doesn't transfer. Be scientific about it :)
You really want blanket encryption because you to worry about such things as swap space, scratch copies made and then deleted and people forgetting to encrypt files. /boot encrypted during install.
If the encryption is done at the block device level (such as dmcrypt on linux) the impact is minimal on how things work and overhead and you are fairly well protected (unless the machine is accessed while powered up by someone wants the data as opposed to just the machine).
Fedora can make all partitions except
Surely what is required is to isolate the sensitive information, so that it can be protected.
That's a great idea that in practice will leak your information. The reason is that _every_ application that touches your data needs to know that it should keep your data confidential.
Broswers know to not cache data transfered over https. It knows the data was encrypted, it knows to be smart with it [for "protective" value of smart].
When you have a program that reads a file through a transparent layer of encryption, it never sees the "please-be-careful-with-this" label, and so the desktop search engine will index all the strings, the editor will write backups to . or /tmp, and so forth. All the apps think they need to do is respect what you meant by your mode bits (if you're on *nix), so it'll chmod/umask the /tmp copy the right way. If someone grabs your disk and you didn't encrypt /tmp, you lose.
And no, encrypting /tmp won't fix it: you need to know that everything the user of the data can write to is encrypted if you want to be sure. I only know one way that I can somewhat confidently say solves the problem: encrypt everything. [and then there's the network, but we'll save that for another decade ;)]
Only encrypting the sensitive data is like carrying water in bucket used for target practice: stuff will leak.
Sorry, they "claim" that.
But on my core 2 2.4 Ghz machine, windows boottime more than doubled after encoding the system partition.
Yeah, i can get 100Mbyte/s linear reads and writes.
But for some reason, random or semi random access get hosed quite a bit.
Maybe it messes with the comand queueing, or the internal prefetch alorithmns, i dont know. Never had a problem on data partitions, but the performance impact on the system drive was enourmous (up to the point that even with 6Gbyte RAM, it wasnt fun anymore)
Ah, and i forgot one thing: the 100Mbyte/s is nearly 100% cpu load on both cores. I dont know where you get 1% overhead from... Even the in-memory benchmark only gets about 150Mbyte under full load on two cores.
S
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
Positive:
- added security
Negative:
- worse performance
- you may forget the password (it has happened before.)
- has to be mounted manually (or at least type in password each time you need access to the data.)
- it's painful to backup
- it's painful to do a proper file systems check
- if the discs are somehow taken by the authorities you might have to give up your password (or be sentenced for whatever they think you have on the discs.)
- discs are only secure if they are not mounted.
There are a few negative sides, but usually they make up for the positive, i.e. if you really need the security then of course this is the way to go. Also remember to secure the other aspects of the machine, like physical access (including fire/theft), software protection (anti malware and virus) and network protection (firewalls, etc.)
"Marketing is not a science even if its an Open Source project"
Run some tests on a drive. Run TrueCrypt, re-run the tests, look the difference in CPU load and performance and then try and work out where the 1% number comes from.
Personally I think its based on averaging time across when you aren't using the machine.
An Eye for an Eye will make the whole world blind - Gandhi
If you do encrypt why use PGP? It costs money and its proprietary. Use Truecrypt which is free and open source, does whole disk encryption which according to this can sometimes actually *boost* performance. I use Truecrypt daily and its awesome. http://en.wikipedia.org/wiki/Truecrypt#Performance http://www.truecrypt.org/
The numbers on my machine are about 20% slower read and 30% slower write. I'm using 256 bit LUKS with serpent-xts-essiv:sha256.
Might I also suggest hardware encryption? Seagate (and others I believe) make drives that do AES128 (good enouhg for this sort of thing I believe) in hardware. Zero performance hit. No software required. Set a drive password and go.
93rd rule of Slashdot: No matter how obvious my sarcasm is, my comment will be taken seriously by someone.
I second that.
If you're looking for an excuse not to protect the data, that's one thing. But TrueCrypt has lots of support and does a good job. PGP in general is well-known and has been refined frequently. That's the reason you don't find a lot of negative criticism-- there isn't any because it works fairly seemlessly. You'll find hard disk controllers don't help the process much, but if the machine does work in batches, and you backup frequently (presuming you're backing up an encrypted partition) and you use a UPS (or your controller supports battery-backed write cache), you can use various write cacheing driver options and techniques to boost performance dramatically. What write cacheing *can* do is to also cause transactional integrity problems if there's a machine hickup. Otherwise, writes are queued up and get batched onto disk. Performance can be 10x, so long as you understand the potential evils involved. It takes the sting out of the disk I/O degradation, but how much will vary with the duty cycles of your application's I/O profile.
---- Teach Peace. It's Cheaper Than War.
My concern with encrypting an entire disk would be fault tolerance. If a sector goes bad on a non-encrypted drive, you might lose a file. If it goes bad on an encrypted drive, do you risk losing more data or even the entire drive?
Of course, one could say that's why you make backups. But presumably the backups would also be using encryption. Therefore, they would be susceptible to the same effect. If there is a greater chance of total data loss on each device, the chance of multiple device failures leading to unrecoverable data also increases.
That is interesting - if the overhead was really 1%, then why even bother with optimizations for multi cores?
The other thing I cannot understand is why anyone would want to run whole-disk encryption on a compute server. Even the US DoD machines that are used for classified research do not do this!
I'm not sure that assuming that just because somethings done in hardware, that it happens in zero time (or even near zero time) is at all accurate. A review I read of a different encrypted drive, said it was 5-10% slower than it's non-encrypted equivalent. It wasn't the Seagate you're talking about, but I doubt that even hardware encryption can do it instantly, so I think your "zero" is an exaggeration.
If there really is a performance loss, and you can quantify it, then you can attack it from another angle, eg an impact statement to management along the lines of "This will introduce a %% performance loss to our workloads, at a cost of $$$. In order to maintain the same level of productivity we will require upgraded hardware at a cost of $$$".
Having a manager who is concerned about his departments budgets on your side can help your case too :)
Linux software RAID 5 uses 2% CPU under heavy load.
Given the fact that you can always recover your data with any Linux livecd gives it a definite edge over a hardware raid solution where you need a similar model to read the data.
My workplace recently mandated that all laptops/portable media be encrypted. The impact to the system cpu usage isn't that significant to be honest, except when attempting to access, say, USB drives.
What's more important is the reliability of the disk itself.
As everyone knows, drivers shipped with laptops tend to be the first casualties of boot-sector-loading programs, like disk encryption and certain virus scanners.
Guess what happens when your encrypted disk can't be booted? You can't boot under a windows/emergency restore disk, because your partition is not readable. You can't boot off anything other than the hard drive. Guess what happens if the corruption doesn't allow you to run the encryption app's boot loader? Only solution is to format the disk.
Some of us who have been hit by this already have gone through the trouble of ensuring that any data we want to keep is stored on a shared drive, and that all work is done in a VM, which is occasionally uploaded to the shared drive as well. Since any given windows or driver-affecting update could kill our machine at any minute and make it entirely unrestorable, that's what's required.
So in essence, we're switching back to storing the media on a non-encrypted device because the loss of the data is more important than the security of the data.
This reminds me of the policies surrounding passwords I've seen at many companies; limiting the set of choices by making password creation requirements, and forcing them to change so often that people end up writing them down and leaving them on their desk. Defeats much of the purpose of having them in the first place.
In the time you spent writing this post to Slashdot, you could have written a friendly letter to your IT department stating that you want some machines to not use this encryption, because these machines need maximum performance and anyway do not store any kind of personal information.
Every expression is true, for a given value of 'true'
The only protection that Full Disk Encryption gives is if someone physically gets their hands on the machine that they can not boot the machine and read its contents. This make perfect sense for laptops but makes little sense for any pertinently fixed location workstations. A laptop will physically leave the premises so it leaves itself open to theft, but a workstation (assuming you have some decent form of physical security) is much less likely to need this protection. Once a workstation is booted and the disk drive unlocked digitally then any hacker that gets a foothold on the system would then have access to it, so all that overhead of full disk encryption does no good unless the encryption is done per-user-session. When you need assess to the data you authenticate and start decrypting then, and keep it encrypted across the network. Yes, that data that you speak of should be encrypted, but you must encrypt it at the correct level to actually increase its security rather than just slowing down the machine. Anything short of that level of control and you are just fooling yourself into thinking you have protected the data. Fool-Disk-Encryption is not always the answer.
I work with the DoD on a classified program. You're right, we don't use encryption on any of our desktops, but the only reason is because you go through 2 security gates with guards, then finally enter a closed room with a giant digital lock with a badge swipe and keypad on the door, not to mention a giant separately digitally controlled deadbolt in addition to the digital lock.
You better bet your ass that we use whole-disk encryption on any machine that would leave the building, though (such as laptops). And those are unclassified!
The submitter is in a research institute. Some labs in that institute have patient data, and therefore require significant security like disk encryption.
His lab works with a protozoa, and has massive computational requirements. There will never be any patient data near his lab, because the people who work with patients are in a different lab (think different department in business). They do not need disk encryption.
You say Truecrypt has "1% overhead", PGP presumably has some other "% overhead." The submitter is asking what the details of that overhead for PGP, truecrypt etc are. Whats the CPU usage, memory usage? Are disk performance penalties constant, or are they dependent on average file size, number of files, format of those files, etc etc etc. "1% overhead" may hide whopping huge performance penalties for specialist users.
I have serious doubt we even need hardware RAID anymore with current CPU speeds.
At some point in time I believed the same thing. I did a test a few years ago to see if it's still worth it to bother with hardware RAID and configured an system with linux and software RAID.
This was for a fileserver in a high performance cluster so speed mattered. I don't have the exact figures here right now, but from what I remember two years ago the software RAID solution was between 7 and 15% slower. Once you start hitting the performance limit your processes hit I/O wait and your performance goes down. When I added LVM to that back then performance got shot to hell.
Now, it's not as bad as it seems, you still get decent performance (especially considering that your setup suddenly costs a lot less and can be done on commodity hardware), and with a fair bit of tinkering with blockdev and your read-ahead buffer (provided you have enough RAM, and your usage fits that particular pattern) you can still get some very nice performance.
The reason that we went with hardware RAID in the end was because hardware RAID isn't all that expensive, and the performance gains were noticeable especially on systems that have to run 24/7 at maximum throughput.
Again, for consumer systems and services where performance isn't a primary concern software RAID is an attractive option, especially if you're on a budget.
As for overhead with encryption: it would make a nice experiment but I think 1% overhead is very optimistic especially on a busy system. The only way to be sure is to compare your performance now to the performance when you encrypt the entire disk. The only time I tested truecrypt I got a throughput of 80MByte/s, while unencrypted I got 120MByte/s, and it's been a while since I tested this. Those truecrypt tests weren't finetuned either, it was basicly a test to see if it was easy to implement.
Anything I mention here has to be taken with a grain of salt since a lot of time has passed and a lot has changed since those tests.
If policy dictates that you have to setup X, the best way to become an exception to this policy is to prove that that policy is detrimental to your project and might end up costing a lot of money. Policy doesn't care about performance, but it cares greatly about money and lost time. Do your tests, do the math, add a pricetag and talk with your manager.
I don't understand people who think that if they encrypt something it automatically becomes secure. For that data to be of any use to someone it will need to be decrypted and relevant people given access, so that destroys the notion of defacto encryption for security right there.
Encryption assumes that bad people are going to get access to your data whatever happens, and if you are using whole disk encryption then you really need to be seriously asking yourself who has physical access to your disks and where your data is located. That needs to be sorted out first, and once it is with data held centrally, I doubt whether disk encryption will be needed. You will probably need some form of encryption between the data and the remote users though. Using full disk encryption gives you something else to go wrong, is a variable in performance impairment you probably can't account, is something else to support for and will almost certainly be unnecessary once you've taken other steps first.
If you're keeping confidential patient information where it would be a Bad Thing(tm) if it ever got mislaid (even if it is encrypted, you don't want a computer with stuff on it lost I assume), in the name of all that is holy, please centralise your data and vet access. Stop people from passing around Excel spreadsheets of data, regardless of when and how it is encrypted.
I really am aghast as to how stupid people are about how and where their data needs to be protected. PGP is the wrong solution here, if you can call it a solution.
actually there's not much disk hit. The CPU loss does exist but isn't awful. I don't do anything that computationally intensive on my laptop.
I ran quite a few tests on my solution; I don't really care if some other software costs you 50% overhead and makes it impossible to use compression software [impressive kernel hack?], for me I lose about 20% write speed 30% read speed, and that's only for sustained read/write.
Day to day use? Didn't slow down a bit. Just as responsive. Battery life? Lost about 10 mins. CPU? Still idles at 0.00.
The cost to me was $20 for the encrypting hdd (that's the differential) and a bit slower for copying massive amounts of data. The upshot? When my laptop with all my financial documents, years of personal email, credit cards, and login credentials for root on some servers I'm responsible for was stolen last year, I lost no data and no one else gained any. The Debian ssl bug hurt me more than that loss (the laptop was actually insured).
The benefit to my using encryption is marginal. So's the cost. The hdd was a toy to play with. The software was a checkbox during installation.
So no, I wouldn't do this to a work computer unless there were a good reason (like being a laptop). But for my personal machine it makes a lot of sense.
93rd rule of Slashdot: No matter how obvious my sarcasm is, my comment will be taken seriously by someone.
Read the FAQ; drives usually have larger block sizes than the block size used for encryption, so there is not much difference.
It depends a lot on what you're doing with the data. If you've got a single-threaded process that's consuming 50MB/s and you can read 100MB/s from the disk and run 100MB/s decodes on the other core, you won't notice the speed difference. If you're doing random access then you will have, say, a 9ms seek time to get the data and then a few more ms to decompress it. If your process is already I/O bound (many scientific computing tasks are) then a 9ms decode per block will halve the speed of your computation.
The correct solution for this lab seems to be to borrow a policy from most defence-related sites. Have a secure and an insecure network. The secure network is allowed to access confidential data, the insecure network isn't. Run encryption on the machines on the insecure network, don't bother with it on the insecure machines. If one of the insecure machines is compromised or stolen then nothing confidential is lost.
I am TheRaven on Soylent News
The submitter is in a research institute. Some labs in that institute have patient data, and therefore require significant security like disk encryption.
Repeat after me: "The first line of security is physical."
If the servers are locked in a room with limited access (like, oh, say, 95+% of servers in the corporate world), then the probably not.
Data security is about securing the data using reasonable compensating controls. If no one can get to the disks, and those who can comprise a limited list of, say, trusted sysadmins, then it doesn't matter whether they're encrypted or not.
Requirements, if properly written, never specify implementation details -- the means. They only specify what is needed. How that is achieved is irrelevant so long as it the requirement is achieved completely.
So other than for devices that are not in access-controlled environment (like laptops or, in some cases, workstations), the need for whole disk encryption at most places is nil.
My blog