Slashdot Mirror


Checking Web Content for Sensitive Data?

NetFiber asks: "I work as a security analyst for a large university. We have recently been tasked to scour our network in the hopes of finding and removing sensitive information such as credit card numbers, social security numbers, and such on all publicly available web servers. Our current method of analysis is to archive all the content (which often grows over 100GB) and later parse the data with various utilities and regexes that search for patterns and other pertinent information. So far, this process has proven to be rather cumbersome and time consuming. Does anyone have any experience collecting and sanitizing large amounts of web content? If so, what procedures/utilities do you use to accomplish this?"

3 of 44 comments (clear)

  1. Visa PCI CISP is a good set of practices by plover · · Score: 4, Informative
    As a large merchant that handles Visa card numbers, we have to undergo an annual Visa PCI CISP audit. The questions are pretty thorough, and if you can fully pass the audit you can tell management that you've reduced your risk of exposure. The link to the pages are here: CISP.

    Of course, you're probably not interested specifically in protecting "Visa's track data" but in whatever data you consider sensitive. Applying the listed policies and practices would go a long way towards securing your resources, whatever it is you want to secure.

    As a large corporation, failure to comply would mean the penalties would be severe (and most likely business-damaging.) If you're not handling card data, you won't have the same consequences, of course. What the penalties meant to us, though, is that top management made a decree: 'fix the problems and pass the audit -- we can't afford not to.' Having top-down pressure means that if we have sensitive data that we're passing to another team, we're both inclined to work together to solve the issues. If one team balks, a phone call up the pyramid gets things back on track. If your university is serious about this, a similar edict will go a long way towards cleanup.

    Another boost in the direction of securing our data was hiring an external consultant to perform the audit. Our auditor is very knowledgeable about ways to follow the data: where does it enter the system, where does it go from there, who writes it to disc, why do they save it, and do they have a business need to save it? Can the data be eliminated? Can a token be substituted for the data? Can the data be truncated? If not, can it at least be masked on reports where the details aren't needed?

    As far as specifics go, each development and maintenance director's pyramid was required to assign a manager to own the PCI process. Each team had to go through their code, identify sensitive data, and take steps to protect it. They also had to go to the data owners, and have them redact their archives.

    It's huge. But given the security breaches that are almost a daily occurrance, we can't afford not to.

    --
    John
  2. Which is entirely the wrong approach by Moraelin · · Score: 4, Informative

    I believe the original requestor is asking about software to help automate/speed-up monitoring and scanning of content that's being put up on web sites by staff and/or students.

    And I never cease to be amazed by the sheer number of people sharing that belief that there's some magical amulet (uber-security program/appliance/whatever) that you can just tack onto a site and make it auto-magically secure.

    Unfortunately that kind of thinking is outright counter-productive. It's dangerous. It's the kind of thinking that breeds such disasters as "we use SSL, so we're secure." (Shame that someone uploaded confidential documents on the web site anyway, so they can be downloaded by anyone. _Securely_ downloaded, to be sure;) Or "we have a Snake Oil (TM) gateway that can scan SOAP requests, so we're secure." (Shame that noone actually configured the rules for it, though. Or shame that the Web front-end there allowed users to escalate their privileges _before_ it all got packed in a SOAP request: the gateway can't detect whether it's genuinely a site admin or a regular user who escalated their privileges.) Or "we have a hardened Single Sign-On front-end in front of the servers, enforcing login and access rights, so we're secure." (Shame, that, literally, one application allowed users to escalate their privileges and see any content, by just editing the URL. E.g., someone could edit the admin's password by just editing the admin's user ID in the URL for the password change page, _then_ properly log in as the admin through that hardened SSO front-end. Literally. I'm not making it up.) Etc.

    But to address your actual point: content scanners aren't the answer, or rather are a bad and incomplete answer. E.g., I've seen one company deploy such a thing in front of the back-end, in their case to supposedly protect against SQL injection in the front-end. So it rejected anything that looked like an SQL keyword. Should be secure, right? But what do you do if it's not as secure or well-programmed as you think? E.g., the thing would cause a form submit to fail if you wrote something like "Visa Select" in a field, because it contained "select", but actually failed to protect against actual SQL injection using the quote sign, or XSS injection using the greater-then and less-than signs.

    Worse yet, it encouraged everyone to be lax and don't bother thinking about security or doing a code review, because, hey, they have the magical amulet on the backend. Even worse, it encouraged managers to not allocate time or resources for an actual security review.

    Security isn't about magical amulets, it's "holistic", so to speak. The security chain is literally as weak as the weakest link. People need to be educated to actually sit and think about the whole and about every single piece and scenario, not to throw in a couple of +5 Security amulets and call it a day. Throwing in the towel and relying on some magical amulet which somehow makes it all secure just because it's there, is actually the antonym and nemesis of security.

    Even if such appliances and programs are used, someone needs to sit and think about how they're used, how they affect their own program, what they prevent, and most importantly what they _don't_ prevent. What data and how does it prevent from being stolen, and what happens when (not if) someone _does_ get through. E.g., what data you shouldn't be collecting in the first place anyway, because you don't actually need it. (If it's not there at all, it can't get stolen.) And most often the right thing to do is _not_ to rely on them: they're there as a last ditch defense, that can't catch everything, but it's one last chance to _maybe_ catch something that got through the other layers of defense. Not as a replacement for the other layers.

    And teams and managers need to be educated that they _need_ to do just that: sit and do a proper analysis. And not just the technical implementation parts, but also, yes, the people processes involved. E.g., if a process can w

    --
    A polar bear is a cartesian bear after a coordinate transform.
  3. McAfee/Foundstone's free SiteDigger by CFrankBernard · · Score: 2, Informative