Resisting the PGP Whole Disk Encryption Craze
alaederach writes "I run a lab in a non-profit academic life sciences research institute. Our IT recently decided it would be a good idea to use PGP whole disk encryption on all of our computers, laptops and servers and picked PGP's suite of software. The main reason is that a small subset of our researchers work with patient information which we obviously are mandated to keep confidential. My lab does a lot of high-performance computational work (on genes from Tetrahymena, no humans here) and I am concerned that the overhead of complying with our IT's new security policy will be quite detrimental to my research program. For example, dynamically reallocating a partition on a PGP encrypted disk is apparently not possible. Furthermore, there is some evidence that certain forms of compression are also incompatible with PGP whole disk encryption. Interestingly, it is hard to find any negative articles on PGP, probably because most of them are written by IT pros who are only focused on the security, and not usability. I therefore ask the Slashdot community, what are the disadvantages of PGP in terms of performance, Linux, and high-performance computational research?"
TrueCrypt Whole Disk Encryption has less than 1% overhead. I can't see the problem. Surely the patent and IP information security outweighs this minimal overhead.
what are the disadvantages of PGP in terms of high-performance computational research?
O(1) ;)
Here's a brief experiment I ran: dd if=/dev/zero of=/home/jonas/zeroes bs=1048576 count=1024; that is, writing one gig of zeroes to a disk encrypted with Ubuntu's disk encryption from the 8.04 alternative installer.
I saw a roughly constant ~30% CPU usage from kcryptd, going from 25% to 35%, on a 2.13GHz Pentium M (in a thinkpad t43p). So I have 1.5 GHz worth of cycles left.
Hard disk write speed was about 30 megs per second, but oscillating in big leaps. I did my observations with conky, sampling at one-second intervals, but conky is known to sometimes merge two samples. That's probably not the only factor: disk writes are most efficient when clumped together into one big (preferably sequential) write, so I'd assume the kernel does this.
You haven't told us what your disk usage patterns are. But if you're doing one big read, one big computation, and then one big write, there's going to be zero impact (almost): there was lots of CPU capacity left.
Another low-impact scenario is that you have a server that reads work units from disk, hands them to clients, gets results and writes the results back [I assume clients don't need any disk activity]. There you can read a bunch of work units in advance while the server is idle, then hand them out instantaneously when needed.
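The read-ahead idea could be sketched roughly like this in Python. It is only an illustration: the Prefetcher class, the depth parameter and the one-file-per-work-unit layout are assumptions made for the example, not anything taken from a real cluster setup.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the work-unit stream


class Prefetcher:
    """Read work units from disk in a background thread, ahead of demand."""

    def __init__(self, paths, depth=8):
        # A bounded queue caps how many units are held in RAM at once.
        self._queue = queue.Queue(maxsize=depth)
        self._thread = threading.Thread(
            target=self._fill, args=(list(paths),), daemon=True
        )
        self._thread.start()

    def _fill(self, paths):
        try:
            for path in paths:
                with open(path, "rb") as f:
                    # put() blocks while the queue is full, so disk reads
                    # (and any decryption) happen only as slots free up.
                    self._queue.put(f.read())
        finally:
            # Always signal the end, even if a read failed.
            self._queue.put(_DONE)

    def __iter__(self):
        while True:
            unit = self._queue.get()
            if unit is _DONE:
                return
            yield unit
```

A server loop would then iterate over Prefetcher(paths) and hand each unit to the next client; as long as the queue stays non-empty, the client never waits on the disk.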
Aside: bugger, fault in my experiment: I didn't look at the CPU usage of kernel code that's not in the process table. Take what I say with a grain of salt.
But: do the measurement in your own world. My software, hardware and artificial measured usage pattern may differ from yours, subtly but enough that my conclusion doesn't transfer. Be scientific about it :)
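If you'd rather script the measurement than eyeball conky, here is a minimal Python sketch of the same kind of sequential-write test as the dd run above. The function name and defaults are made up for illustration, and, like the dd experiment, it says nothing about CPU time spent in kernel threads such as kcryptd.

```python
import os
import time


def write_zeroes(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of zeroes to path and return throughput in MiB/s."""
    block = bytes(block_size)  # one block of zero bytes, reused for every write
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: total_bytes - written]  # last chunk may be short
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the time to push data to the device
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / (1024 * 1024) / elapsed


# Example (paths are placeholders): run once on the encrypted volume and
# once on an unencrypted one, then compare the two numbers.
# rate = write_zeroes("/path/on/encrypted/disk/zeroes", 1024 * 1024 * 1024)
```

Zeroes are about the friendliest possible input, so a run with your real data files is a better guide than this one.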
You really want blanket encryption because you have to worry about such things as swap space, scratch copies made and then deleted, and people forgetting to encrypt files.
If the encryption is done at the block device level (such as dm-crypt on Linux), the impact on how things work and on overhead is minimal, and you are fairly well protected (unless the machine is accessed while powered up by someone who wants the data as opposed to just the machine).
Fedora can make all partitions except /boot encrypted during install.
Surely what is required is to isolate the sensitive information, so that it can be protected.
That's a great idea that in practice will leak your information. The reason is that _every_ application that touches your data needs to know that it should keep your data confidential.
Browsers know not to cache data transferred over https. The browser knows the data was encrypted, and it knows to be smart with it [for "protective" value of smart].
When you have a program that reads a file through a transparent layer of encryption, it never sees the "please-be-careful-with-this" label, and so the desktop search engine will index all the strings, the editor will write backups to . or /tmp, and so forth. All the apps think they need to do is respect what you meant by your mode bits (if you're on *nix), so it'll chmod/umask the /tmp copy the right way. If someone grabs your disk and you didn't encrypt /tmp, you lose.
And no, encrypting /tmp won't fix it: you need to know that everything the user of the data can write to is encrypted if you want to be sure. I only know one way that I can somewhat confidently say solves the problem: encrypt everything. [and then there's the network, but we'll save that for another decade ;)]
Only encrypting the sensitive data is like carrying water in a bucket used for target practice: stuff will leak.
Positive:
- added security
Negative:
- worse performance
- you may forget the password (it has happened before.)
- has to be mounted manually (or at least you have to type in the password each time you need access to the data.)
- it's painful to back up
- it's painful to do a proper file system check
- if the discs are somehow taken by the authorities you might have to give up your password (or be sentenced for whatever they think you have on the discs.)
- discs are only secure if they are not mounted.
There are a few negative sides, but the positives usually make up for them, i.e. if you really need the security then of course this is the way to go. Also remember to secure the other aspects of the machine, like physical access (including fire/theft), software protection (anti-malware and antivirus) and network protection (firewalls, etc.).
My concern with encrypting an entire disk would be fault tolerance. If a sector goes bad on a non-encrypted drive, you might lose a file. If it goes bad on an encrypted drive, do you risk losing more data or even the entire drive?
Of course, one could say that's why you make backups. But presumably the backups would also be using encryption. Therefore, they would be susceptible to the same effect. If there is a greater chance of total data loss on each device, the chance of multiple device failures leading to unrecoverable data also increases.
My workplace recently mandated that all laptops/portable media be encrypted. The impact to the system cpu usage isn't that significant to be honest, except when attempting to access, say, USB drives.
What's more important is the reliability of the disk itself.
As everyone knows, drivers shipped with laptops tend to be the first casualties of boot-sector-loading programs, like disk encryption and certain virus scanners.
Guess what happens when your encrypted disk can't be booted? You can't boot under a windows/emergency restore disk, because your partition is not readable. You can't boot off anything other than the hard drive. Guess what happens if the corruption doesn't allow you to run the encryption app's boot loader? Only solution is to format the disk.
Some of us who have been hit by this already have gone through the trouble of ensuring that any data we want to keep is stored on a shared drive, and that all work is done in a VM, which is occasionally uploaded to the shared drive as well. Since any given windows or driver-affecting update could kill our machine at any minute and make it entirely unrestorable, that's what's required.
So in essence, we're switching back to storing the media on a non-encrypted device because the loss of the data is more important than the security of the data.
This reminds me of the policies surrounding passwords I've seen at many companies; limiting the set of choices by making password creation requirements, and forcing them to change so often that people end up writing them down and leaving them on their desk. Defeats much of the purpose of having them in the first place.
In the time you spent writing this post to Slashdot, you could have written a friendly letter to your IT department stating that you want some machines to not use this encryption, because these machines need maximum performance and anyway do not store any kind of personal information.
Every expression is true, for a given value of 'true'
The only protection that Full Disk Encryption gives is that if someone physically gets their hands on the machine, they cannot boot it and read its contents. This makes perfect sense for laptops but makes little sense for permanently fixed-location workstations. A laptop will physically leave the premises, so it leaves itself open to theft, but a workstation (assuming you have some decent form of physical security) is much less likely to need this protection. Once a workstation is booted and the disk drive digitally unlocked, any hacker who gets a foothold on the system has access to it, so all that overhead of full disk encryption does no good unless the encryption is done per user session. When you need access to the data, you authenticate and start decrypting then, and keep it encrypted across the network. Yes, the data that you speak of should be encrypted, but you must encrypt it at the correct level to actually increase its security rather than just slowing down the machine. Anything short of that level of control and you are just fooling yourself into thinking you have protected the data. Fool-Disk-Encryption is not always the answer.
The submitter is in a research institute. Some labs in that institute have patient data, and therefore require significant security like disk encryption.
His lab works with a protozoan, and has massive computational requirements. There will never be any patient data near his lab, because the people who work with patients are in a different lab (think different department in business). They do not need disk encryption.
You say Truecrypt has "1% overhead", PGP presumably has some other "% overhead." The submitter is asking what the details of that overhead for PGP, Truecrypt etc. are. What's the CPU usage, memory usage? Are disk performance penalties constant, or are they dependent on average file size, number of files, format of those files, etc.? "1% overhead" may hide whopping huge performance penalties for specialist users.
I have serious doubt we even need hardware RAID anymore with current CPU speeds.
At some point in time I believed the same thing. I did a test a few years ago to see if it's still worth it to bother with hardware RAID and configured a system with Linux and software RAID.
This was for a fileserver in a high performance cluster so speed mattered. I don't have the exact figures here right now, but from what I remember two years ago the software RAID solution was between 7 and 15% slower. Once you start hitting the performance limit your processes hit I/O wait and your performance goes down. When I added LVM to that back then performance got shot to hell.
Now, it's not as bad as it seems, you still get decent performance (especially considering that your setup suddenly costs a lot less and can be done on commodity hardware), and with a fair bit of tinkering with blockdev and your read-ahead buffer (provided you have enough RAM, and your usage fits that particular pattern) you can still get some very nice performance.
The reason that we went with hardware RAID in the end was because hardware RAID isn't all that expensive, and the performance gains were noticeable especially on systems that have to run 24/7 at maximum throughput.
Again, for consumer systems and services where performance isn't a primary concern software RAID is an attractive option, especially if you're on a budget.
As for overhead with encryption: it would make a nice experiment, but I think 1% overhead is very optimistic, especially on a busy system. The only way to be sure is to compare your performance now to the performance when you encrypt the entire disk. The only time I tested truecrypt I got a throughput of 80MByte/s, while unencrypted I got 120MByte/s, and it's been a while since I tested this. Those truecrypt tests weren't fine-tuned either; it was basically a test to see if it was easy to implement.
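To put those two throughput figures in "% overhead" terms, a quick Python calculation helps. The function names are just illustrative; the only inputs are the 120 and 80 MByte/s numbers quoted above, which come from an old, untuned test.

```python
def throughput_loss(baseline_mb_s, encrypted_mb_s):
    """Fraction of throughput lost relative to the unencrypted baseline."""
    return (baseline_mb_s - encrypted_mb_s) / baseline_mb_s


def extra_io_time(baseline_mb_s, encrypted_mb_s):
    """Fractional increase in time needed to move the same amount of data."""
    return baseline_mb_s / encrypted_mb_s - 1.0


print(f"{throughput_loss(120, 80):.1%}")  # 33.3%
print(f"{extra_io_time(120, 80):.1%}")    # 50.0%
```

So the same measurement reads as "a third less throughput" or "I/O takes half again as long", depending on which way you divide. Either way it is a long way from 1%.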
Anything I mention here has to be taken with a grain of salt since a lot of time has passed and a lot has changed since those tests.
If policy dictates that you have to set up X, the best way to become an exception to this policy is to prove that the policy is detrimental to your project and might end up costing a lot of money. Policy doesn't care about performance, but it cares greatly about money and lost time. Do your tests, do the math, add a price tag and talk with your manager.
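"Do the math" can be as simple as the following Python sketch. Every number in the example call is a made-up placeholder (hours of I/O-bound work, measured slowdown, cost per hour); plug in your own measurements and rates.

```python
def weekly_cost(io_hours_per_week, extra_io_time, hourly_rate):
    """Estimated weekly cost of the extra time spent waiting on I/O.

    io_hours_per_week: hours of I/O-bound work per week before encryption
    extra_io_time:     fractional slowdown of that work (0.5 means 50% longer)
    hourly_rate:       cost of one hour of machine or staff time
    """
    return io_hours_per_week * extra_io_time * hourly_rate


# Hypothetical inputs, for illustration only.
print(weekly_cost(io_hours_per_week=20, extra_io_time=0.5, hourly_rate=50.0))  # 500.0
```

Only the I/O-bound part of the workload slows down, so counting total machine hours instead would overstate the cost.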
I don't understand people who think that if they encrypt something it automatically becomes secure. For that data to be of any use to someone it will need to be decrypted and relevant people given access, so that destroys the notion of de facto encryption for security right there.
Encryption assumes that bad people are going to get access to your data whatever happens, and if you are using whole disk encryption then you really need to be seriously asking yourself who has physical access to your disks and where your data is located. That needs to be sorted out first, and once it is, with data held centrally, I doubt whether disk encryption will be needed. You will probably need some form of encryption between the data and the remote users though. Using full disk encryption gives you something else to go wrong, is a variable in performance impairment you probably can't account for, is something else to support, and will almost certainly be unnecessary once you've taken other steps first.
If you're keeping confidential patient information where it would be a Bad Thing(tm) if it ever got mislaid (even if it is encrypted, you don't want a computer with stuff on it lost I assume), in the name of all that is holy, please centralise your data and vet access. Stop people from passing around Excel spreadsheets of data, regardless of when and how it is encrypted.
I really am aghast as to how stupid people are about how and where their data needs to be protected. PGP is the wrong solution here, if you can call it a solution.
I'm with Smertrios on this one.. IT policy is just that.. a corporate policy. It's not subject to end-user interpretation, it's a definition of how IT resources are to be deployed and utilized. The written policy itself is what gives the company the "teeth" to discipline employees who choose to make their own interpretations and NOT comply.
Now back on topic: Whole disk encryption? For removable / transportable media, ABSOLUTELY! For enterprise data backups, ABSOLUTELY! For live data on active servers, meh.. not as critical. If your data center employs appropriate physical, network and host security, your data is reasonably safe. If someone compromises your network -> system security, they've got your data.. encrypted or not. It's wonderful that your IT department has the desire to achieve the highest level of security possible, but there is always a balance that needs to be struck between the holy grail of ultimate security and the ability to do business. The OP needs to help everyone find that balance. A good place to start would be his local neighborhood HIPAA expert to make sure that no "business needs" prevent the company from maintaining regulatory compliance. Once the specific requirements for his continued compliance have been identified, then anything beyond that becomes somewhat negotiable.
chown -R us
"Marketing is not a science even if its an Open Source project"
The submitter is in a research institute. Some labs in that institute have patient data, and therefore require significant security like disk encryption.
Repeat after me: "The first line of security is physical."
If the servers are locked in a room with limited access (like, oh, say, 95+% of servers in the corporate world), then probably not.
Data security is about securing the data using reasonable compensating controls. If no one can get to the disks, and those who can comprise a limited list of, say, trusted sysadmins, then it doesn't matter whether they're encrypted or not.
Requirements, if properly written, never specify implementation details -- the means. They only specify what is needed. How that is achieved is irrelevant so long as the requirement is achieved completely.
So other than for devices that are not in an access-controlled environment (like laptops or, in some cases, workstations), the need for whole disk encryption at most places is nil.
Parent is on the right track, imo. Submitter should work with the IT dept to assess the impact of this.
Set up two machines running the same processing task that is actual work that he does, one with encryption and one without. Compare the difference in processing. If the performance loss is acceptable, all done. If it's not acceptable, submitter needs to start agitating now that this will seriously hamper his/her ability to do the job, and push IT to come up with a different solution.
A previous employer rolled this out, and after my work productivity got killed, I found their assessment consisted of two guys opening MS Word, making some edits, saving, and exiting Word.
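A rough Python sketch of that comparison, assuming the real job can be wrapped in a callable (for instance a function that launches it with subprocess.run). The helper names and the number of runs are arbitrary choices for illustration.

```python
import statistics
import time


def median_runtime(task, runs=5):
    """Run task() several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one unusually slow run.
    return statistics.median(samples)


def slowdown(baseline_seconds, encrypted_seconds):
    """Fractional slowdown of the encrypted machine relative to the baseline."""
    return encrypted_seconds / baseline_seconds - 1.0
```

Run median_runtime on each machine with the same input data, then feed the two results to slowdown(); a result of 0.15 would mean the job takes 15% longer on the encrypted machine.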
RTFA FTW!!!
The Submitter him/herself doesn't work with sensitive info, just other depts. IT is enforcing an overly broad solution on everyone, without considering the downside. I agree with you that sensitive data needs to be secured, but rolling out disk encryption to everyone in a company when only a subset of everyone is dealing with sensitive info is maybe overkill, and the impacts to the primary activity of other depts need to at least be quantified and considered.