How Cornell Plans To Purge Campus Computers of Personal Data
and so forth writes "Cornell lost a laptop last year with SSNs. Now, they've mandated scanning every computer at the University for the following items: social security numbers; credit card numbers; driver's license numbers; bank account numbers; and protected health information, as defined by HIPAA. The main tools are Identityfinder (commercial software for Windows and Mac), spider (Cornell software for Windows from 2008) and Find_SSN (python script from Virginia Tech). The effort raises both technical questions (false positives, anyone?) and practical issues (should I trust closed source software to do this?). Have other Universities succeeded at removing confidential data? Success, here, should probably be gauged in terms of diminished legal liability after the attempted clean up has been completed." Note: this program affects the computers of university employees and offices, rather than students' personal machines.
Get that data out of there! If it isn't your system and the data shouldn't be there, no problem scanning for the bad stuff.
After logging off, revert to the last backup. If there's no data on the computer, there's no personal data on the computer. Anything you need saved goes on removable storage.
Give me Classic Slashdot or give me death!
Like any hard disk, it soon fills to near capacity. Sort of like the only elevator in a 4-story housing project when the plumbing is out.
Does this include professors?
I know a lot of scientists who would be quite annoyed if the people from the IT department (who are clueless policy-obsessed wankers at my institution) came in and wanted to search through a bunch of simulation results and LaTeX files looking for SSN's.
We'll get on that, just as soon as our Y2K-bug vulnerability scan is done running.
I'm 100% for this. Personal computers account for very little in data losses. It's these "work" machines that account for the majority of the major information losses around the world.
As long as people are dumb / lazy enough to keep documents in the clear on their machines there will be losses.
I would also go as far as to make storing certain quantities of certain types of information on a machine illegal. For example: 1,000 SSNs stored unencrypted on a portable data device is a fine of $10,000. 100,000 SSNs stored unencrypted on a portable data device is jail time.
a) too fucking bad.
b) Sign this waiver that says you are legally responsible if your repository of data were to contain information such as SSN/Credit Card etc.
I don't get the premise of the article. Scanning for credit card data and SSNs is quite easy and simple. It's no more intrusive than a virus scan. Being open or closed source doesn't make any bloody difference either.
Intrusion detection systems should also be running and scanning for data that conforms to SSN or credit card formats.
It sounds like they are looking to catch accidental leaks.
I would like to know if they have examined their policies to reduce over-collection of unnecessary data.
If they never collect it in the first place, then they never have to worry about losing control of it later on.
When information is power, privacy is freedom.
Does this include professors?
I know a lot of scientists who would be quite annoyed if the people from the IT department (who are clueless policy-obsessed wankers at my institution) came in and wanted to search through a bunch of simulation results and LaTeX files looking for SSN's.
As someone who has worked in an academic research group, I can attest to that. If such a program were instituted at my university, others in our group and I would probably be less than forthcoming about the number and location of computers in our group. We certainly wouldn't relish the idea of giving folks from the IT department root access to all our Unix/Linux boxes, which they would probably need to perform the kind of scan they're trying to perform.
I'm guessing, however, that this measure applies to administrative computers and not academic/research. That would only make sense.
If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?
SSNs are not as strictly structured as credit card numbers, and this results in false positives. There are things you can do to narrow this down, such as rejecting all-0 values for any of the three sets, and the first set goes up to something like 760 as its highest ... but that's it. They could have spaces, dashes, nothing.
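For what it's worth, the narrowing rules described above can be sketched in a few lines of Python. This is only an illustration, not how Identity Finder, Spider or Find_SSN actually work. The function name and the `max_area` parameter are made up for the example, and the ceiling on the first group is an assumption you would set according to which issuance rules you care about.

```python
import re

# Nine digits written as 123-45-6789, 123 45 6789, or 123456789.
# The backreference \2 requires the same separator in both positions, and the
# lookarounds stop a match from starting or ending inside a longer digit run.
SSN_CANDIDATE = re.compile(r"(?<!\d)(\d{3})([- ]?)(\d{2})\2(\d{4})(?!\d)")


def find_ssn_candidates(text, max_area=899):
    """Return SSN-shaped strings that survive the basic structural rules.

    max_area is an assumed ceiling for the first three digits; lower it to
    mimic older issuance ranges.
    """
    hits = []
    for m in SSN_CANDIDATE.finditer(text):
        area, group, serial = m.group(1), m.group(3), m.group(4)
        # No group may be all zeros.
        if area == "000" or group == "00" or serial == "0000":
            continue
        # 666 is never issued, and anything above the ceiling is rejected.
        if area == "666" or int(area) > max_area:
            continue
        hits.append(m.group(0))
    return hits
```

Everything that comes back is still only a candidate: a nine-digit part number or a ZIP+4 written without the dash passes just as easily.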
http://www.langston.com/Fun_People/1994/1994AXP.html
Excerpt:
My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
Have they tried DBAN?
http://alternatives.rzero.com/
DD is open source though, and it's hard to beat erasure-by-thermite for entertainment value.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
encrypting the hard drive!
Are they employees? Do they conduct "university business"?
All employees must acknowledge their custodial responsibility for the university information on the computer(s) and associated storage they use in the conduct of university business, whether university property or personally owned. This includes:
I have, in the past, used my desktop system to order things from suppliers. That is clearly "doing university business". Sometimes I save mail messages on my file servers. I sometimes plug a USB stick into one of them. I have about two dozen USB sticks.
In an age of always-connected, treating computers as "smart terminals" with no long-term local storage save an encrypted self-destructing-on-wrong-password cache can be very useful.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
To Rice U.: http://www.media.rice.edu/media/NewsBot.asp?MODE=VIEW&ID=14734
Seriously. Has no one ever heard of encryption? Or just not allowing people to copy personal data onto computers/media not behind at least 1 locked door?
No trespassing. Violators will be shot. Survivors will be shot again.
Constantly scanning every machine for sensitive data is too difficult to be effective.
Simply encrypt active machines, and use secure erase/destruction policies for retired hard drives.
Cornell lost a laptop last year with SSNs. Now, they've mandated scanning every computer
Are they going to at least FUCKING ENCRYPT the laptops now?
Then again, the more SSNs leak, the more likely people are to get pissed off enough to convince the Govt/Banks/etc not to use the SSN as both a username AND password.
And a) is the reason my department does not trust IT cowboys with any of our data. This is data that cost actual money to generate, not some shit we downloaded off BitTorrent for fun. I hope you get fired.
What's bad is when a university exposes one's SSN when simply logging into one's e-mail. I mean, the way e-mail is accessed, you can see one's entire SSN on one of the webpages. So, if someone somehow gets your university's e-mail address password, they have your SSN too.
Ohio State relies on their institutional data policy and Disclosure or Exposure of Personal Information policy. Essentially, any protected information has to be kept on encrypted devices. That worked fairly well, except once they had all their computers encrypted they quit paying the license fees to PGP. They didn't know the software, which they thought was only pre-boot authentication, phoned home and had a DRM time-bomb in it to automatically drop everything Windows was doing, and spend a couple hours decrypting the whole drive after a certain date if the subscription wasn't renewed. I'd be pretty wary of trusting that kind of task to proprietary software, especially if it requires a subscription like ours did. Posted AC for obvious reasons. If it's closed source, you never know what kind of trick the vendor might be able to pull on you.
Whole disk encryption.
"Nuke them from orbit, it's the only way to be sure"
I work at a university, and I generally agree with your assessment. The vast majority of academic types get uncomfortable with any kind of monitoring. They do seem to accept that IT has admin rights on most things. What's great is that they refuse to accept any kind of content filtering on the campus network connection. I've also heard of professors having their connections shut down for excessive bandwidth use who raised hell because it interfered with their academic freedom. I remember one story about a professor who got shut down while streaming a video to his class; apparently that is a very good way to piss the entire academic division of the college off.
We did this where I work recently, small-ish private university, lots of science, a hospital, etc. All the faculty and staff had to run IDF. The tech guys came in and installed it and showed everyone how to run it but weren't allowed to see it being run. The person was required to run it and sort through the results themselves. All of my department ran it fine, no problems, no complaints, other than spending time sorting results. It really wasn't that big a deal.
http://www.newsobserver.com/2010/10/14/739551/unc-cancer-scientist-appeals-her.html.
All I have to do now is infect the (probably Windows-based) servers that host the scanning software and scan the memory for patterns resembling SSNs, etc., and make off with potentially an entire university's personal information? I say memory, 'cause I know no one would be dumb enough to search for that sort of sensitive information and then actually just log it into a centralized location for no reason. Right? Right?
The eternal struggle of good vs. evil begins within one's self.
And a) is the reason my department does not trust IT cowboys with any of our data. This is data that cost actual money to generate, not some shit we downloaded off BitTorrent for fun. I hope you get fired.
Well, aren't you an arrogant and self-important little bugger. The fact is that improperly retaining and losing Privacy Act data costs money and reputation too (just ask the Veterans Administration). Potentially a lot more than some professor's grading data where he stupidly tracks students by their full soc number. Or the sociology researcher keeping a huge database of personal info on their test subjects. The mandate for this action did not originate with the IT folks, but they were tasked to implement the policy. Stop being a little prick and try to understand the bigger picture.
Besides the article didn't say it was going to delete the data. It said "cleanup" which could be anything from a script that pops up when it detects questionable data, or even maybe it just moves it off of theft-prone laptops and desktops onto a central file server.
Many institutions are going the route of encryption. Hard drives are encrypted, and anything stored onto removable media gets encrypted. A pain in the ass to be sure, but it does allow management to claim that no data was compromised if a laptop disappears.
As a consultant in the data warehousing & financial services industry, I have to deal with data security on a daily basis, and scanning is not the solution. You'll never catch everything 100% of the time. Nor do scanners typically scan in real time, and they typically can't, as the data needs to reside on the local disk until it's processed. What should be happening is that certain data sets should be classified at defined security levels. Depending on the security level, users who require access should be provided with the appropriate "Controls" to secure the data from unforeseen events. "Controls" mostly means encryption, which usually equates to some type of whole disk encryption. There are plenty of software packages out there that provide anything from single laptop installations to enterprise-wide deployments at very acceptable price points (including free). I've worked with several of these and they hardly add any complexity or performance loss to the end user's experience. For those environments that are sensitive to performance and can't have a layer of software encryption, "Controls" are usually physical, which requires secured data centers and such, and one should ask why the data would need to be on a laptop in the first place.
It is good to make sure that no computer holds any unnecessary personal/private data, and also good to have searching software that might help locate some or most of it. But it is unrealistic to expect to be able to ensure that such data will be kept off all computers, especially when there might be some situations where there is a legitimate need to have access to such data offline.
The best solution is to use whole disk encryption with the free opensource TrueCrypt software.
Although it is a shame that TrueCrypt does not support whole disk encryption on the Mac yet. At least there are some less trust-worthy closed options like PGP Whole Disk Encryption, which would be better than nothing.
(should I trust closed source software to do this?)
Cornell is a large enough entity to request an audit of the software used, including a source audit. If they want to make it a condition of the contract that they get to do so, that would be an acceptable term.
Whether or not they are qualified to conduct such an audit, well, that's another story.
But if they want to do it, they can...or they'll take their money elsewhere, and well, nobody's going to want that.
I do think they should conduct such an audit though, just like they should if the software were open source.
And that "I know better" attitude is precisely why the university is going to be putting this program in place. To say nothing of the reputation damage, HIPAA violations ain't cheap. So your "this data cost money" argument falls completely flat when doing nothing can cost money as well.
"16MB (fuck off, MiB fascists)" - The Mighty Buzzard
Are you a PHD?
"Why the $#%$ is the 834,734,123,233rd digit of pi wrong in my code?"
Then 6 weeks of debugging by a grad student...
Does having a witty signature really indicate normality?
Yes it does include professors. As a Ph.D. doing medically related research at a university, I've got some PHI data I need to include in some studies. It's encrypted and stored on secured servers. That's the way it's supposed to be. All the scanning software does is make sure you have it encrypted and not just lying around. THAT'S A GOOD THING.
There are other reasons why professors who aren't working in medical research need to do it. Some of our departments used to use student SSNs for a lot of things. Data tends to just accumulate over the years and most of them didn't think a thing of it. Then some desktops/laptops with that data on them got stolen. Suddenly the University found it had to send out letters to LOTS of alumni (you know, those folks they rely on for donations) telling them that their SSNs and other personal data had been stolen, and they were now at risk for identity theft. Lots of those alums then said they would never donate to the university again till it was ensured that couldn't happen to them or anyone else so easily again. Hence the big push this year to secure data that should have been secured years ago.
If you worked at our university and decided to 'be less than forthcoming about the number and location of computers in your group', you would soon find yourself looking for another university to employ you.
...an alternate solution?
1) encrypt the drive
2) don't lose the damn machine
For the people who blast me on #2 ... that's what #1 is for.
The OP says that a practical issue is whether one should trust closed source software to do this? Because, of course, being closed source should implicitly invoke gloomy music, dark clouds and cause people to break out in a cold sweat? Seriously, enough with this bullc*** already... There's nothing inherently wrong with running closed source software, nor is a given piece of software magically better by virtue of being open-source, nor are open-source developers somehow better than those who develop closed-source software. There's legitimate arguments to be made that open-source has advantages. That open-source is, somehow, more trustworthy, isn't one such argument. And it's high time we stopped peddling it as one, or accepting it as one.
I have 20 years of IT experience, including bringing companies into PCI compliance after a breach.
Scanning data, identifying what it contains and locking it down are not difficult tasks. 99% of the data scanned is unlikely to trip false positives and is a complete non-issue. The remaining data can be quickly categorized as likely, or unlikely to be relevant with an appropriate perusal. The remaining data, actual non-compliant data, will consume the most time in dealing with properly.
The first step is to identify who has data that actually meets the qualifications and which has to be dealt with. If you simply ask you -WILL- be lied to.
Graduated in four years, never studied once, and was drunk all the time!
Just thought I'd mention that...
b) Sign this waiver that says you are legally responsible if your repository of data were to contain information such as SSN/Credit Card etc.
Unless he then shoves the waiver up the manager of the IT department's nose, that waiver won't do anything. The IT department will refer him to a secretary, who will refer him to some policy and the committee for something or other, who will meet once a year and won't discuss it with him. Universities are usually more bureaucratic and inflexible than your local DMV.
Which is why Cornell will try to scan every computer on campus, not just the ones which are likely to have student or employee information on them. Got an Apple IIe running a very old but still functional instrument? It may be more convenient to just lie to the IT department. Some are understanding, whereas others would insist you get a new computer. If you would have to spend $10k to replace the equipment, that's not really their department.
so what you are saying is that i need to be storing socials as integers rather than strings, so they don't look like socials?
Snowden and Manning are heroes.
why wouldn't they just wipe and rebuild everything?
Maybe I'm being dense.
If they are so worried about it then just wipe the drives. You can purchase, for a reasonable amount of money, a utility that will wipe a drive to DOD spec, and it couldn't possibly take any longer to do that than to run 3 or 4 different scans from various companies.
You cannot tell SSN from any number (pretty much; there are 2 or 3 prefixes (e.g. 666) that are not used) from 001000000 to about 778000000 since there is not so much as a parity bit in there. Even if you look for nnn-nn-nnnn, you get some false positives. I have seen scripts that attempt this, but they wind up flooding you with other strings (zip+4 strings being common cases). If you find labels like "SSN" near the patterns you do somewhat better but looking for that may miss lots of cases.
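To make the trade-off concrete, here is a rough Python sketch of the "look for a label near the pattern" idea. It is an illustration only: the label list, the 40-character window and the function name are all assumptions picked for the example, not anything taken from the tools Cornell is using.

```python
import re

# Any nine-digit run, with or without dashes (hypothetical, deliberately loose).
NINE_DIGITS = re.compile(r"(?<!\d)\d{3}-?\d{2}-?\d{4}(?!\d)")
# Assumed label vocabulary; a real scan would need a longer list.
LABEL = re.compile(r"\b(ssn|social security)\b", re.IGNORECASE)


def score_candidates(text, window=40):
    """Return (candidate, has_label) pairs for every nine-digit run.

    has_label is True when an SSN-like label appears within `window`
    characters on either side of the candidate.
    """
    results = []
    for m in NINE_DIGITS.finditer(text):
        start = max(0, m.start() - window)
        context = text[start:m.end() + window]
        results.append((m.group(0), bool(LABEL.search(context))))
    return results
```

A ZIP+4 written without its dash still shows up as a candidate, just without the label flag, and a spreadsheet column of bare SSNs with no header gets no flag either, which is exactly the "may miss lots of cases" problem.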
Credit/debit card numbers and bank routing numbers have construction rules that allow one to tell they follow the rules, but it works a LOT better if you limit yourself to actual BIN numbers (1st 6 digits) that some bank uses. Many are unused and unassigned.
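The construction rule for card numbers is the Luhn check digit. A minimal Python sketch of it, plus the BIN filter suggested above, might look like this. The `known_bins` set is a placeholder the caller has to supply, since no real issuer list is included here, and the 13 to 19 digit length range is an assumption about typical card lengths.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn check."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def card_candidate(number, known_bins):
    """Luhn check plus a prefix filter.

    known_bins is a hypothetical set of six-digit prefixes supplied by the
    caller; it is not real issuer data.
    """
    digits = "".join(c for c in number if c.isdigit())
    return (13 <= len(digits) <= 19
            and digits[:6] in known_bins
            and luhn_valid(digits))
```

Note that for any fixed leading digits exactly one of the ten possible final digits passes, so roughly one in ten random digit strings of the right length will look "valid" on the check digit alone. That is why the prefix filter matters.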
You can look for, say, the most common first and last names and state abbreviations near zip codes and get some likelihood that there is a name or address there, but these things give candidates. Nobody should be punished for that. Besides, that kind of thing is present in much innocent correspondence. (Same goes for real cc numbers if in ones or twos. Correspondence about problems with an account typically contains the account number, sometimes has to for legal reasons. (Affidavits.))
Finding bank account numbers will in practice have to look for routing numbers (1st 9 digits) rather than the last, since the last part varies widely and does not have fixed format.
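US routing numbers carry their own checksum, so a scanner can at least discard nine-digit strings that fail it. A small Python sketch follows, assuming the standard 3-7-1 weighting used for ABA routing numbers; the function name is made up for the example.

```python
def aba_routing_valid(candidate):
    """Return True if a nine-digit string passes the ABA routing checksum.

    The digits are weighted 3, 7, 1, 3, 7, 1, 3, 7, 1 and the weighted sum
    must be a multiple of 10.
    """
    if len(candidate) != 9 or not candidate.isdigit():
        return False
    weights = (3, 7, 1, 3, 7, 1, 3, 7, 1)
    total = sum(w * int(d) for w, d in zip(weights, candidate))
    return total % 10 == 0
```

As with card numbers, about one random nine-digit string in ten passes, so a hit is a reason to look at the file, not proof that a bank account is in it.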
The biggest problem is that there are formats (.ppt for example) that contain lots of long numeric strings. If they are scanned it is very easy to find "interesting" patterns in plenty in them, even though you can see looking at the actual data that this is just chance matching. Many "cc" strings will happen to get the Luhn check digit right.
If you are willing to contact the Social Security Admin to see if a 9 digit string is a real SSN, that gets slow and costly, and the number space STILL overlaps zip+4 and many other pieces of strings. SSN is hopeless (and should never be an authenticator anyway; at best it might be treated as an identifier only). Card numbers, if recognized only by being numeric and having check digit, will still give many false positives.
Also anything trying to parse MS Office formats may well miss data that is hidden in the file but that doesn't show up from the utility. Word and Excel are notorious for not really expunging information that was "deleted"; the data can still be there big as life when you examine the file as a byte string. Other Office formats have similar behavior at times.
Thus while scans for this kind of thing can be used to ask questions, they should not be taken as definitive that something "contraband" is actually present.
Well, a closed source solution could intermittently save a convenient credit card, and send it over the network to the author.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
I just facepalmed at this, and then again when I realized it is probably happening. Hurray for obscurity.
I once used one of these tools, and it found some false positives in its own source when I ran it. Not kidding.
Should you trust closed source software to do this scan?
Should you trust the bank managing your transactions?
Should you trust closed source software in medical equipment?
Should you trust SAP to manage your financial transactions?
Should you trust a Windows computer for anything more important than your gmail password?
Should you trust Google Chrome when logging into your netbanking?
You know what? I think on the grand scheme of things trusting a piece of closed source software specifically designed to search for information made by a company which would literally be sued into oblivion if they did what the article was hinting at, ranks pretty damn low on the list of things I worry about.
As someone who works rather intimately with the department-level IT guys at a university, this would be a disaster. They don't have time to install automatic encryption software on everyone's disk, and *we* don't have time to wait on the computer to run crypto on it. Yes, disk encryption software is pretty fast -- but it's slower than the RAID we're pulling the data from, and the CPU is busy doing other things.
If you can't trust your professors to follow *reasonable* instructions about the protection of personal data without the nonsense of installing software on everyone's boxes, then you shouldn't trust them to work for you.
772 at its highest for now. Starting in June of 2011 the SSA will issue "random" numbers up through 899 for the first three digits.
Luckily the physics department I work for has our own IT guy, who knows the students and faculty and can cut through the bullshit.
Then the correct policy is "Don't haphazardly store personal data on machines without considering what you are doing". There is no reason to barge into Dr. Smith's office, who's madly creating his slides for the conference next week while trying to babysit a supercomputer at Berkeley while fending off emails from his students, and insist in a very bureaucratic tone that you have to scan his workstation, the RAID, his other computer, his student's computer, and the two computers used to monitor various instruments (which the other students are taking data on) for SSN's.
too easy for someone to get the source and "tweak" it before compiling. Most problems happen on the "inside" - a rogue sys-admin could do anything with open source - no-one is going to be able to prove the binary doesn't match the source or who changed it
Something like a Google Search Appliance would have probably taken care of this with much more of a guarantee, though I'm sure the boys at Cornell looked into that.
Why bother? First of all, if a laptop is lost, how would you scan it for anything? Regardless, the best and just about only way to be certain that there isn't a problem is to remove the hard drive and sand it down.
so what you are saying is that i need to be storing socials as integers rather than strings, so they don't look like socials?
No it means you need to be storing the data in an encrypted file/folder. Believe it or not, doing it right is sometimes easier than trying to hide what is arguably illegal activity.
My employer (you'll see who in a minute, which is why this is anonymous...) has this exact same policy, informally in 2008 and formalized in 2009.
from http://rules-saps.tamu.edu/PDFs/29.01.99.M1.29.pdf :
"4.3 Where feasible, all data files are to be scanned on an annual basis to determine if those files contain SSNs. If SSNs are found or known to be present in a file, they are to be removed or appropriate risk mitigation measures applied (e.g., encryption) if their continued presence is required. The results of the file scanning and risk mitigation measures taken shall be reported during the annual ISAAC process. All SSNs that are to be retained and stored are to be reported to and approved by the Vice President and Associate Provost for Information Technology. The reporting and approval process will be in the manner indicated in the ISAAC process. Specialized information systems that cannot be scanned and are not capable of storing SSNs shall also be documented accordingly as part of the ISAAC process."
We use Identity Finder and Spider to scan. I can honestly say I'm impressed with the accuracy of Identity Finder, and it's really easy to roll out via Group Policy in an Active Directory environment. It's also pretty easy for users to scan their own personal drive space (both local profile and network shares), and for the admin to see everything in a unified console. Spider, on the other hand, for *nix and Mac systems is a pain in the rear, requiring customized regexes to prevent false positives.
As far as those people saying they'll just lie about it? If you have a computer on the network here, there is a record of it. Each device with an IP is assigned an owner and that owner, or their supervisor depending on the department, is responsible for complying with all university policies regarding IT services and data security. There is an annual IT security audit that is required for every system on the network (Texas Administrative Code section 202 plays a large part in that), and the person responsible for filling it out *and* their supervisor are required to sign it verifying that they comply with all the policies and procedures in effect. In other words, it's state law. There are no "academic freedoms" being violated in requiring a scan for confidential or privileged information which the user is not supposed to have stored on their computer in the first place.
There's a pretty easy solution for people who lie about systems or try to hide things. There is much less security risk if the computers in question no longer have network connectivity.
Then the correct policy is "Don't haphazardly store personal data on machines without considering what you are doing". There is no reason to barge into Dr. Smith's office, who's madly creating his slides for the conference next week while trying to babysit a supercomputer at Berkeley while fending off emails from his students, and insist in a very bureaucratic tone that you have to scan his workstation, the RAID, his other computer, his student's computer, and the two computers used to monitor various instruments (which the other students are taking data on) for SSN's.
Unfortunately, Dr Smith is taking his laptop to the conference. He's much too busy to go on travel without taking all of his data with him on the laptop, such as his students' grading info (SSNs) or info on the other proprietary projects he's working on. He's too important to worry about trivialities such as data protection policies issued by those idiots on the Board of Directors. After all, drive encryption slows things down too much, he hears, but in truth he doesn't know how to set it up. Of course his laptop gets stolen and now the University has to report that data was compromised. Suddenly Dr Smith is no longer an asset to the university but rather a liability.
Sorry, but anyone who has worked in IT or even law enforcement knows damn well that users will ignore written policies unless there is some level of monitoring and enforcement. Just scroll up a bit and you'll see examples of those guys posting stuff like "just store the ssn as an integer so the scripts don't find it".
Don't be an idiot. They'll scan and report. You'd get an email listing files that contain numbers that look like SSNs, credit card info etc. As someone who recently ran such a scan (using Spider) on corporate file servers I can confirm that the effort of following up on the false positives would have killed me. So I discussed the process with the users first - and agreed that they'll investigate the scan results. That way they manage their own data.
main campus IT can burn in hell. Bunch of incompetent power hungry monkeys. Always instituting some stupid rule. Ugh I loathe any new restrictions. Now departmental IT. Those guys are cool.
The only difficulty with this attitude is that it's only going to work for the Russian and Dance Departments. If you try it in Physics or Chemistry or Engineering, where a generic professor can be responsible for $1 to $2 million a year in no strings attached research overhead that goes straight into the university's hungry coffers, you will be quickly educated in the different levels of deference applied to cost centers (like IT) and profit centers (like research departments).
I might add that it's possible a place as prestigious in these fields as Cornell might be able to get away with it, because they think, not unreasonably, that for any professor pissed enough to start looking at moving they can find 10 eager replacements, but few universities further down the academic pecking order will be able to do the same.
Is that while they can have sensitive data, they need to protect it. That does often mean working with the "mean ol' IT department." I highly doubt Cornell will say "Nobody can have any private data, at all, ever." Not only would it be dumb, since it would preclude them from getting some grants, it wouldn't work, since their student records, payroll, etc. will have said info. What they are probably doing is making sure that if you do have it, you secure it.
This is a real problem with some researchers. They just wanna do their own thing, they can't be bothered with things like patching their computers, having good passwords, and encryption. That all works fine... Until it doesn't, until something like this happens.
We have that problem all the time in our department. Not patient files getting exposed but researchers that don't want to play ball. They want all their grad students to have administrative access to all systems. They don't want to run a virus scanner because it is "slow." They don't want to patch their system because they can't have them "rebooting all the time." More or less they are just being lazy and arrogant and thus refusing to take any security precautions. This does, of course, lead to systems getting virused. Now given our line of work (engineering) this has thus far not been a major issue. However if there was personal data on a system, it would be.
So that is the kind of thing that schools are getting proactive about. They are saying "If you have sensitive data on your system you WILL meet some standards, you can't just do whatever you want."
This kind of shit has to happen. While an academic environment demands freedoms that a corporate environment doesn't have, that doesn't mean it can be the wild west, not anymore. The Internet is a dangerous place, so you have to take precautions with computers connected to it. Not a ton, nothing real arduous, you just have to be careful. When it comes to sensitive data, additional precautions need to be taken.
I really doubt they would bother the professors with this. I am a student at Cornell and I really doubt anyone in the physics group I work for would want to take the time to run a scan. The article didn't mention that faculty here don't have access to much personal data; my major advisor cannot even access my grades, so there isn't much to worry about from them. The thing I worry about is a staff member downloading the data to their personal computer on an unencrypted drive and then losing it. All of our data is stored on a server and managed through PeopleSoft (yuck) so there is no reason for any of them to have a local copy of anything. The only users that use the data that heavily have extremely fast connections to the servers so having to do everything remotely doesn't really add that much latency.
Our IT department is actually pretty good which has always made me wonder why it's taken so long to implement something like this. Scanning every university computer across campus is a large task but they have had the infrastructure for doing that for a long time. They do stuff like this regularly to several thousand university computers when deploying updates across several different platforms (win7, XP, OSX, and Ubuntu mainly).
re: PeopleSoft:
I'm at University of Arizona. We switched to PeopleSoft this summer, at the cost of $60 million or so (which we can't afford at all -- maybe you have read about our crazy governor?).
Semester starts, grad students don't get paid (sometimes) for a month, grad students get bills for tuition we're not supposed to and get charged late fees, secretaries can't do their jobs helping students ... TOTAL NIGHTMARE, and everyone blames it on this PeopleSoft thing.
Look, coming from Penn State University. They are already 5 years behind. Faculty do not care who is working as IT, I have had to force people like Dr Bader from PSU Hazleton to abide by computer policy for SSN. Did he care? NO!!!. Faculty ignore it until the DOD claims it at a customs inspection and the only thing people hear is "woops, faculty member lost laptop buy new one now". Too many family and students put too much power into faculty since staff cannot control anyone with tenure. Tenure track still rules at a University so I wouldn't also be surprised if staff and faculty that are not tenured seem to be the biggest offenders and the dumbest about it. Why is it parents trust tenured faculty and do not question their ability to understand confidential information? Oh wait, there "ANY" University does not tell them these exceptions that tenured faculty have.
Why does anyone ever need to store the SSN's of students, anyway?
Being open or closed source doesn't make any bloody difference either.
You're obviously new here. Closed source software written by trained professionals at trusted security corporations under government scrutiny is evil, and considered insecure.
Bloated open source software, written by 52 basement dwellers who flunked out of college, and 2 Iranian intelligence officers posing as basement dwellers who flunked out of college, is considered secure.
That's nothing. California spent $650 million switching the CSU system over to PeopleSoft.
a) Nowadays there is no reason that information (especially in something like a university network) is stored locally.
b) since windows, osx and linux have out-of-the box encryption there is no need to subscribe to any software beyond the normal distribution with support.
c) most tasks which require ssn or any other information are so standardized that a thin client will do the job. A thin client can also be a laptop with 3g, no information to be stored locally.
d) Use VPNs to avoid the rest of the trouble.
e) fire employees when you figure out they use USB sticks to take work home, instead of taking the approved laptop home and working via the network. And sue them for cleaning up the mess.
If they use some kind of domain administrator passwords for this software to run guess what, the domain administrator credentials could be cached in every single computer. If a local administrator was on the machine and wanted to compromise, he/she could run hash stealing software when this "process" runs and compromise the domain. The least possible privileges for a process that does this data searching will be difficult to determine. A trade-off will have to be done between accessibility of files and a lesser privileged account to be used for an exercise like this on Windows.
You want people to trust academic staff? The only people that would do so willingly are the ones that haven't seen what academics are like. Care for an example?
In our computer science department we have multi-function devices that allow digital scans onto a publicly accessible hard drive that resides inside the device. I've seen scans of credit card details, passports etc left for anybody to view for weeks after they were scanned. Now if you can't even trust academics to look after their own personal data (in a field that ought to know better) why on earth would you hope they'll do better with somebody else's? I hate to imagine what other departments hold on theirs.
Of course, if it's an encrypted partition with a freshly generated key each time you boot that's most of the way there - it requires a very competent attacker to extract the key from powered down RAM.
Potentially a lot more than some professor's grading data where he stupidly tracks students by their full soc number.
How would a professor get the students' SSNs in the first place? The university should have no need for SSNs assigned to anyone except employees.
It absolutely includes profs.
It's the people who believe they are above these rules that usually end up spilling personal data.
I've taught at a university. I can tell you right now I would definitely audit profs machines.
And to be honest, too bad if they are annoyed. Suck it up as they say.
Above got marked a troll?
It was not. It was an honest opinion.
Firstly, and obviously, but organisations seem to have trouble with it: Don't collect sensitive data unless you really have to. If data is collected for a purpose then it should be held for a purpose with some method.
Secondly, a scanning system, even if it is security theatre, will help enforce this on the dim-wits who think that 'firstly' doesn't apply to them. If the scanner actually works then it may find some transgressors.
It's actually something I quite liked as an idea - taking a look at Stelios' Easy Internet Cafes in the UK - when the user logs off, the PC becomes unavailable for about 6-7 mins, while it completely restores the harddrive, from a multicast broadcast on the local network.
The storage on the machine is strongly limited - but you're sure that unless someone were to infect the master (difficult) that no one could leave anything on the machine that you don't like.
Combine that with user network storage, and the PCs themselves stay safe, while your data stays 'safe' on the network drive.
So they plan on unzipping every zip file, decoding every pdf and docx, sfw and every other crap-laden file format 'export to' option? Not to mention parsing email attachments in god-knows-what format (can't trust the mimetype header)? And where is the cpu time coming from? Do you expect users to sit there while their laptop turns into a george foreman grill? I just don't see how any of this is possible given most infrastructures.
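To make the cost concrete, here is a minimal sketch of what "unzipping every zip file" involves, using only Python's standard `zipfile` module. It is an illustration, not how Identity Finder, Spider or Find_SSN actually work; the function name `scan_zip`, the depth limit and the crude nine-digit pattern are all assumptions made for the example. Since a docx file is itself a ZIP container, the same walk reaches its XML parts, but PDFs and other formats would each need their own decoder.

```python
import io
import re
import zipfile

# Crude nine-digit pattern (hypothetical, for illustration only); bytes
# pattern because archive members are read as raw bytes.
NINE_DIGITS = re.compile(rb"(?<!\d)\d{3}-?\d{2}-?\d{4}(?!\d)")


def scan_zip(data, name="<archive>", max_depth=3):
    """Yield (member path, hit count) for members with nine-digit candidates.

    Nested archives are opened recursively up to max_depth levels; anything
    deeper is scanned as opaque bytes.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            payload = zf.read(info)  # decompresses the whole member in memory
            path = f"{name}!{info.filename}"
            if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(payload)):
                yield from scan_zip(payload, path, max_depth - 1)
            else:
                hits = len(NINE_DIGITS.findall(payload))
                if hits:
                    yield path, hits
```

Every member is fully decompressed before it can be searched, which is where the CPU time and memory go on a disk full of archives.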
Because not so long ago, it was common practice to use a student's SSN as their student ID number. In ~2001 and ~2004, I attended schools which changed their policies on this matter in those years, respectively. For each school, I started with a student ID that was the same digits as my SSN, and when I was graduated, I had a new student ID that was an unrelated string of digits.
Using the SSN as an ID is very convenient. For every incoming person, you have a unique number that they probably already have memorized. From there, it should be no surprise when professors get lists of SSNs on class rosters at the beginning of a semester, and they might store it in one form or another over the course of grading, and similar activities.
I worked in IT for Cornell both as a student and for a short time after graduating (8 years ago). This honestly isn't news.
Hardly any employee computers have this kind of stuff on them to begin with. Most of it is stored on servers and not in a format you can say 'dump to Excel on my laptop.' I did some work with the admissions database as a student and I had to promise in writing in triplicate that I would be very careful with that data Or Else and even so, I never needed to download any of it. If I'd suddenly started sporting the admissions database on my laptop they'd have probably sent in the cops.
The number of people who actually need this stuff on their local system is probably paltry, but they now have to be seen to 'do something' and so subject everybody's computer to this stuff. Cornell IT used to be huge and decentralized and the skill levels were all over the map - Cornell is the only big employer in the area and so the pay sucks (it's better if you're a developer). We had some great people and some really terrible people, probably like most large institutions. I really wonder how many employees actually need this kind of thing in the first place, though - seems like HR and admissions would cover it, and in both places I'm not sure that you'd need to download bulk data to use offline.
Why not just use Full Disk Encryption software like a lot of other companies and organisations? In case of loss or theft of the notebook, you are safe. There are various commercial and open source offerings to choose from.
Besides that, they should also consider removable media encryption. USB sticks can be a threat for PII. Remember the problems in the UK 2 years ago.
No one barged into anyone's office at a moment's notice. They notified us months before that this would be coming down, and gave a few more months to install the scanning software. Why is everyone on the other side of the conversation so out of touch with reality?
I also work at Cornell. What some folks may fail to realize is that a University is a large organization and the colleges are allowed quite a bit of freedom in how they operate. I see a lot of posts mentioning encryption, well our dept does encrypt the user storage areas of their hard drives, but not every college/dept does. Perhaps some day it will be mandated, but currently it's not. Still, the Identity Finder really targets the large set of users that pulled a spreadsheet (perhaps many years ago) onto their machine with sensitive info, and probably don't even realize it's there. Is it perfect? Of course not, but it definitely helps remove a lot of low hanging fruit and it also reminds everyone that they need to be responsible with the data on their pc's.
First of all, there is no reason that there would be any privacy information on a research lab computer. The only thing that is on my research lab computers is the software and data for the research. The grants that paid for the equipment prevent anything else from being stored on it anyways. Second of all, if professors know the social insurance number of their students then there is something even more wrong with the administration of the university. Students at our university are identified by a student number which is completely independent of any government issued identification number, and only people in administration are permitted to view the private data. I certainly don't have access to SINs of students (Canadian equivalent of SSN).
Atlas stands on the earth and carries the celestial sphere on his shoulders.
Because this is Slashdot?
Why exactly is it the job of IT cowboys (I'll reuse the term from earlier as it is apt) to make sure nobody anywhere on campus does something stupid with a computer?
It's not the job of the lab safety mavens to stop people from eating plutonium.
Seems like you need to stop using SSN's as ID numbers, then, or tell students that they have the option of choosing another student ID number (and that if they use their SSN and Bad Stuff happens then it's their issue).
It depends on the field of research. Medical researchers will often have 'sensitive' (HIPAA in the US) data on their test subjects. My university, like many others, until recently, indexed all students with SSNs, and if I downloaded a roster for use in Excel, I got the SSNs with no option to delete them. That's what really angered me; I didn't want or need the data, but they (the Uni) shoved it down my throat & then threw a fit years later and pushed the cost of fixing the problem down to the departments.
Further, it depends on your definition of student data. I am a professor at a Big10 uni in the US, and our institutional definition of student records (which are declared sensitive by federal law) includes things as inane as emails from a student stating that the grader mis-added their score & they should've gotten an 80 on a quiz instead of a 75. So pretty much every communication to/from a student must be protected.
We're going to be doing this where I work. I believe it does, if the computer is University owned/administered. IIRC the software is not going to autoclean anything, just generate a list, which can then be followed up on by humans. The whole software installation and execution is centrally automated for desktops. Servers will probably need special attention, and as such, if a server cannot be reasonably expected to interact with personally identifiable information, it likely won't be scanned. However large centrally administered servers will probably be scanned, e.g. the campus email server.
What they are looking for is accidental caching of this information by people who have legitimate access to it, it is not a hunt for outright theft. Mostly professors do not have access to this type of information, so unless the machine is fantastically easy to push this software out to, or there is a record of someone who did have such access logging into it, the rest of professors' machines will probably not get touched.
Someone had to do it.
"Scanning for credit card data and SSN is quite easy and simple"
Sure, if you don't mind a false positive rate of 99%, which is what a colleague of mine got when he ran an automated tool on his machine that contained hundreds of GBs of particle physics data. Shocking, but it's not hard to find 9 digit numbers (SSNs) or 15-18 digit numbers (CC) when you look at large repositories of quantitative data.
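For the credit card half of that problem there is at least a cheap filter: card numbers carry a Luhn check digit, so a scanner can discard digit runs that fail the checksum. The sketch below is an illustration only; the 13 to 19 digit length range and the names `luhn_ok` and `card_candidates` are assumptions for the example, not taken from any of the tools mentioned.

```python
import re

# Unbroken runs of 13 to 19 digits (assumed length range); the lookarounds
# keep the pattern from matching a slice of a longer number.
CARD_PATTERN = re.compile(r"(?<!\d)\d{13,19}(?!\d)")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def card_candidates(text):
    """Return digit runs in text that look like card numbers and pass Luhn."""
    return [m.group(0) for m in CARD_PATTERN.finditer(text)
            if luhn_ok(m.group(0))]
```

Roughly one random digit string in ten passes the checksum by chance, so on hundreds of GBs of numerical data this cuts the noise by about a factor of ten rather than removing it. SSNs have no check digit at all, so nothing comparable exists for the nine-digit case.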
So the problem is, if you are mandated to run such software, and you get 1 million possible #s, what do you do then? If you are a typical clueless IT lackey (which luckily, we have dept-level IT, so ours are quite good & clued-in) he tells you to delete all of the dangerous data, or _prove_ that it's not dangerous. (This has happened to a number of fellow profs at my Uni, which is one of the largest in the US).
The reality is that (a) the Uni should've stopped giving access to such info to anybody w/o a strict need-to-know decades ago. But they didn't. Now it's everybody else's problem.
Well, I can tell you that at Ohio State University, this is exactly what has happened. Effectively, every single machine that _may_ have ever had 'sensitive' (FERPA or HIPAA or Grant-defined) data on it must be encrypted. If it is lost & not encrypted, then it is the owner's burden of proof to prove that no sensitive data was on the machine; which is only possible if you have a complete & recent backup.
So, it can be done, but it is very expensive (though much cheaper now--BitLocker is really nice on Win7, FileVault on Mac) in terms of software and time. When we had to implement this ('08, I think) it cost our department of 25 faculty about $10,000-$15,000 to implement in software & time.
Actually, what I meant to say was that they did stop. Each school changed its policy while I was a student there; they do not use SSNs as IDs now. The point was that these policies were changed within the last ten years at both schools, which is a short enough time for many faculty members to have the older SSNs on file for one reason or another.
Potentially a lot more than some professor's grading data where he stupidly tracks students by their full soc number.
How would a professor get the students' SSNs in the first place? The university should have no need for SSNs assigned to anyone except employees.
Obviously you are younger. It was very common practice for schools, universities, public libraries, health professionals, and even some small businesses to request and use your soc number as your ID. A good deal of this cleanup is to find that old data that probably isn't even being actively used anymore.
Also keep in mind that in some cases the University does need your soc number for doing tax forms and dealing with some govt grants. Obviously the profs don't need it and it shouldn't be your uni library account number, but the bean counters would need it.
You sound like an IT administrator who has never done research.
* When you run computations that take weeks to complete, reboots *need* to be rare events that are negotiated between IT staff and researchers. Otherwise, productivity goes to zero, and then what's the point of having computers or IT staff?
* When you have terabytes of data, scans can be a major time sink. And false positives? If it eats some of my data by accident, that's big trouble.
* When you've spent a few weeks compiling some odd software and getting 17 libraries to work in concert, the last thing you want is some heavy-handed sysadmin messing with your system. Everyone's big fear is that if you touch the system, you might break it in some subtle way. Subtle problems lead to weeks of debugging or wrong answers, and if you get the wrong answer, what's the point of having computers or IT staff?
The sensible approach is that you have to have a sysadmin who works with the researchers and actually believes that his job is to help do some research. And, sure, the researchers have to make some compromises, too. But research is hard enough to do, even with minimal annoyance from outside.
You need a sysadmin who is willing to say "OK, let's figure out how to minimize the risk" instead of some jobsworth who says "Tough sh**, them's the rules, and I don't much care that the rules were designed for the typical desktop system." In my experience, given good IT staff with the right attitude, most researchers are happy to cooperate. IT staff who cause trouble without providing help will get a different reception.
Maybe they left it there because they didn't know how you set up the system? Maybe you think they should take the time to understand your system, and perhaps you don't take the time to understand your users?
Maybe it wasn't a good idea to set up a system that dumps scan data to a publicly accessible hard disk?
No not so much. I'm not an IT administrator, just an IT support guy, and I work for a very nice, very accommodating boss. I also work with a lot of idiotic researchers. Well ok not a lot, but a few. Most of the researchers are pretty reasonable people, but some aren't. They just want to do their own thing and not be bothered, and then wail and howl when something gets infected and we have the gall not to make it the #1 highest priority to fix (by departmental policy, research systems are 5th priority).
The problem is that they aren't willing to negotiate. We've said that it is perfectly ok to not reboot on patch Tuesday, however you need to apply the patches. We can change things so that you get to choose when. Know what happens? They just never apply them. They always put them off. This isn't just on computation systems, this is on desktops that are used for Word and web surfing. They just "can't be bothered." Also, as a practical matter, you need to write computations that save the state periodically. We lose power to the building sometimes. Sometimes it is for a long time (hours). That's life on the US power grid (you'll recall a story on Slashdot about this not long ago). I've never met the researcher that was willing to buy the massive UPS to back up bigass computation systems. To the extent they have UPSes (most don't) they are little things that'll last 10 minutes. Means if you don't want to lose data, you save your damn state.
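"Save your damn state" can be as simple as the following sketch, which assumes the computation's state fits in a small JSON-serialisable dict. The function names, the file layout and the toy workload are all hypothetical. The one design point worth copying is writing to a temporary file and then renaming it over the old checkpoint, so a power cut in the middle of a save never leaves a half-written file behind.

```python
import json
import os
import tempfile


def save_state(state, path):
    """Write state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())   # push the bytes to disk before the rename
    os.replace(tmp, path)      # replaces any previous checkpoint in one step


def load_state(path, default):
    """Return the last checkpoint, or default if none exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, every=1000):
    """Toy computation (a running sum) that resumes from its checkpoint."""
    state = load_state(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        state["acc"] += state["step"]   # stand-in for the real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_state(state, path)
    save_state(state, path)
    return state
```

After a reboot or outage, calling `run` again with the same path picks up from the last saved step instead of from zero, which makes a negotiated patch window much cheaper.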
There's other options too, of course. In some cases a disconnected computer can work fine, or things like a cluster where only the head node is on the public net, the others are a private network only talking to the head.
However the simple fact of the matter is the net is a nasty place. You have to protect your systems, you have to patch, and you really need a virus scanner if every grad student is going to be an administrator. If not, your shit WILL get infected. Things like sensitive information scans are to make sure that when that happens, it isn't a crisis. If ever a system got owned with that kind of data (hasn't happened yet in our department) it would be hell for the researcher. They'd be lucky to keep their job.
The problem is that researchers don't want to compromise. They want to be able to do anything they please, including giving their grad students admin access and letting them download pirated software (and for some reason grad students don't seem to be aware of places that actually check their warez for viruses) and then expect any problems to be repaired in minutes. That is just not how it works, not how it can work. I don't make the rules, I didn't make the Internet dangerous, I'm just telling you the reality of the situation and what needs to happen.
You can bitch all you like about how important your research is, universities are deciding that not having sensitive data being compromised is MORE important. Reason is that your research might bring in a few million in grants, if you are really good. Compromised data could bring in a hundred times that in lawsuits.
there is a product to keep sensitive data from leaking accidentally. it's called a security appliance.
Maybe the OP thought someone would read the article....
One implication by OP is that this is legal posturing by the University.
Let's look at a few of the details (http://www.cit.cornell.edu/datacleanup/requirements/index.cfm)
1. Any employee (staff, faculty and students [meaning student employees]) "falls under the mandate".
2. Each employee is individually responsible:
"All employees must acknowledge their custodial responsibility for the university information on the computer(s)
and associated storage they use in the conduct of university business, whether university property or personally owned."
and
"IT or administrative staff may assist the employee in determining what information is present on his/her computer(s)
and in taking appropriate remedial actions for any data that should not reside there, but this does not replace the
individual’s custodial responsibility."
3. All employees are required to scan using one of the tools mentioned by OP. The scan applies to all parts of machine that
are writeable (internal drives, external drives, mobile devices, flash drives, email, network file
spaces are all mentioned).
"If you can write to the space, you need to scan it."
4. The requirement to scan equipment EXEMPTS personally owned equipment, mobile devices and copies of email
stored on a server.
5. Encrypted volumes must be decrypted and scanned.
6. Virtual machines must be scanned.
It goes on and the description includes a number of other points about backups, public workstations, etc.
A policy is by its nature legalistic. On the other hand, this data cleanup seems to over reach
in some respects (i.e. why scan encrypted volumes?) and under reach in others (i.e. why NOT require
individuals to scan their personal machines if they might hold university data?).
My personal theory is that in the first case, the University bears the bulk of the responsibility whether or not
the data is encrypted. Should another laptop disappear maybe it's important to show that they've tried
to find every scrap of sensitive data. By contrast, in the second case, the individual is the one in hot
water if there is a loss. Not only is it their personal machine but, as far as the University is concerned,
they are the custodian of the data (who made a poor decision to work outside the university environment).
I have no idea if that relieves the University of liability for data loss. But they're clearly worried about
it (http://www.cit.cornell.edu/security/data/consequences.cfm).
It absolutely includes profs.
It's the people who believe they are above these rules that usually end up spilling personal data.
I've taught at a university. I can tell you right now I would definitely audit profs machines.
And to be honest, too bad if they are annoyed. Suck it up as they say.
Or another alternative is simply to lock the laptop in a desk drawer and when the IT guy asks if you have a computer in your office, say no.
Seriously though, why would you expect a professor to have credit card numbers or SSNs on a research computer? Would you mind if your home computer were searched to make sure that it isn't storing expunged criminal records from juveniles in Botswana?
If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?
You make the point for me.
"Seriously though, why would you expect a professor to have credit card numbers or SSNs on a research computer? "
This is exactly the reason for the scan.
My home computer is not an institutional computer. It does not carry a burden of trust. Thus it is not under the dual responsibility of institutional governance and private. It's the institutional scan that is being discussed. They, the institution, have a responsibility to scan its equipment.
I of course also carry a moral responsibility to make sure my personal equipment does not contain information that it should not.
So yes I would mind if someone decided to scan my personal equipment. It's not under their authority.
My home computer is not an institutional computer. It does not carry a burden of trust.
If you mean by "institutional" that the computers are the property of the university, then I suppose they are but do they carry a "burden of trust"? I hardly think so.
I think it's relevant to ask what is the probability that these computers will be storing credit card numbers and personal data. I don't see why a physics professor's research computer would. Should the university also examine the contents of the professor's pencil sharpener and coffee pot for state secrets? They are after all "institutional property."
I've used IdentityFinder for the purpose of attempting to implement university policy. It's got a horribly awkward interface on top of a very weak infrastructure. Its parsing of file formats is rudimentary, and its matching algorithms generate tons of false positives. (It'll tell you that every zip+4 is a social security number, because, hey, nine digits.)
"Spider" is even more primitive, and isn't up to the code standards of the most basic toss-over-the-wall open source projects on Sourceforge or Freshmeat. It's hard to propose basing any serious policy around it with a straight face.
The "Find_SSN" script I haven't looked at, but hopefully it does a decent enough job of that one very simple task.
And let me tell you, even with this junky software, it's clear that there is a problem. Social Security numbers are used in grant applications, in lists of study participants where the subjects get compensation, etc., etc. Professors and grad students *do* have this stuff in their home directories. So, yeah, given the potential expense of data exposure, something needs to be done.
A physics professor is not exclusively involved in physics research. At some point almost every professor will also be an educator. As an educator they will come into contact with personal information that is put into their trust.
There are no exceptions, as coming into contact with personal information is likely for most institutional workers.
-- Note this article was about personal information. Not state secrets. That is a completely different matter when it comes to information handling. Thus the topic of state secrets is out of scope here.
Otherwise, the snarky are going to point out that they have to run it on the university mainframe....and all faculty, employee and student data will be GONE
My university tried this a while back. They had everybody run a program that basically grepped through every ascii file for any 9-digit number it could find, allowing for delimiters in appropriate places.
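A scan of that kind amounts to something like the sketch below, with one refinement added: it drops matches that break the structural rules the Social Security Administration publishes (area 000, 666 or 900-999; group 00; serial 0000). This is a hypothetical illustration, not the program that university used, and the names `SSN_PATTERN`, `plausible_ssn` and `find_candidates` are made up for the example.

```python
import re

# Nine digits, optionally split 3-2-4 by a hyphen or a space. The
# backreference \2 requires the same delimiter in both places, and the
# lookarounds reject slices of longer digit runs.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})([- ]?)(\d{2})\2(\d{4})(?!\d)")


def plausible_ssn(area, group, serial):
    """Reject digit groups that can never appear in an issued SSN."""
    if area == "000" or area == "666" or area >= "900":
        return False
    if group == "00" or serial == "0000":
        return False
    return True


def find_candidates(text):
    """Return the matched strings in text that pass the structural rules."""
    hits = []
    for m in SSN_PATTERN.finditer(text):
        area, _sep, group, serial = m.groups()
        if plausible_ssn(area, group, serial):
            hits.append(m.group(0))
    return hits
```

The structural rules only remove a small slice of the number space. A value like `0.123456789` in a data column still comes back as a candidate, which is why half a gig of numerical files lights up no matter how the delimiters are handled.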
My machine has about a half a gig of numerical data files. We went along with it until they tried to make us manually inspect (line-by-line, theoretically) several hundred megs of data that apparently flagged as an infringing pattern.
We eventually just dragged out the protocol sheets for the appropriate experiments to show that we hadn't collected any SSNs for the experiments that had flagged data files. But wow, what a hassle.
If the grading info requires SSNs, you have bigger problems.
Most human behaviour can be explained in terms of identity.
I'm one of the folks at Virginia Tech developing Find_SSNs (and also this is my first Slashdot post after lurking for years). We're working on improving it, so I was wondering if anyone had used it and had suggestions. Right now we want pdf support, better interface, better ways of dealing with false positives, but in the future we want to be able to deploy the scan remotely.