Slashdot Mirror


Ask Slashdot: Best Practices For Collecting and Storing User Information?

New submitter isaaccs writes "I'm a mobile developer at a startup. My experience is in building user-facing applications, but in this case, a component of an app I'm building involves observing and collecting certain pieces of user information and then storing them in a web service. This is for purposes of analysis and ultimately functionality, not persistence. This would include some obvious items like names and e-mail addresses, and some less obvious items involving user behavior. We aim to be completely transparent and honest about what it is we're collecting by way of our privacy disclosure. I'm an experienced developer, and I'm aware of a handful of considerations (e.g., the need to hash personal identifiers stored remotely), but I've seen quite a few startups caught with their pants down on security/privacy of what they've collected — and I'd like to avoid it to the degree reasonably possible given we can't afford to hire an expert on the topic. I'm seeking input from the community on best-practices for data collection and the remote storage of personal (not social security numbers, but names and birthdays) information. How would you like information collected about you to be stored? If you could write your own privacy policy, what would it contain? To be clear, I'm not requesting stack or infrastructural recommendations."

27 of 120 comments

  1. Just don't do it by sublayer · · Score: 5, Insightful

    Best practice from my perspective: do not collect the data at all.

    1. Re:Just don't do it by puterguy · · Score: 5, Insightful

      If you really feel the need to collect personal data and you *truly* care about the privacy concerns and needs of your customers, then don't go burying such disclosures in a privacy statement that the average user is unlikely to ever see let alone read.

      If you truly care about privacy, then either require the user to *opt-in* to such sharing or prominently display the lack of such privacy on the initial splash screen.

      Burying the collection of personal data in the middle of some lawyerly gobbledygook privacy statement is like mortgage lenders burying key terms in the middle of hundreds of pages of documentation. Yeah, it's legally there, but no one is actually going to read or understand it.

    2. Re:Just don't do it by fm6 · · Score: 2

      So, Slashdot made a mistake in allowing you to create an account?

    3. Re:Just don't do it by c0lo · · Score: 2

      Best practice from my perspective: do not collect the data at all.

      More detailed:
      Rule 1. don't do it
      Rule 2. if for some reason rule 1 cannot be followed, collect the data but discard it immediately
      Rule 3. if for some reason the previous 2 rules cannot be obeyed, put the data on WORN storage after collection (that is: "Write Only, Read Never" media)

      --
      Questions raise, answers kill. Raise questions to stay alive.
    4. Re:Just don't do it by philip.paradis · · Score: 2, Insightful

      Alternately, people could simply take responsibility for themselves and choose to avoid services which require agreement to miles of terms. Given your attitude on the topic, you probably haven't even bothered to read the terms of service for anything you're using right now. It seems you're trying to divert responsibility for yourself onto the backs of the service organizations you choose to deal with. Again, note the word "choose."

      You've also managed to miss the opportunity to discuss where data goes and how it's protected after it's submitted. Oddly enough, this is the essential question posed by the submitter, and regardless of what any given set of terms says, it is actually the most important piece that very few people think about at all. In other words, you can trust an organization to high heaven based on what they say they will or won't do with your data, but if their infrastructure is a gaping mess of channels by which your information could get compromised, all of a sudden those terms don't mean much. I applaud the submitter for asking the right questions, and remind you to think more about your responses in terms of real-world data acquisition and retention mechanisms before posting again.

      --
      Write failed: Broken pipe
    5. Re:Just don't do it by davester666 · · Score: 5, Funny

      Yes, just store the data in plaintext, in a mysql database connected directly to the internet.

      Bonus points if you create mysql users for each unique user and use their username/password to authenticate connections to the database.

      --
      Sleep your way to a whiter smile...date a dentist!
    6. Re:Just don't do it by CodeBuster · · Score: 3, Interesting

      Whenever I'm signing up for a new site or using a service for the first time, I always do a recon of their sign-up procedures using a fake name / email address so I can see what sort of information they "require" before I even get started, and even then I only give up what I absolutely have to. If I can get away with using the fake information permanently, then I do that. I keep track of all my fake identities in an encrypted file container by site name so that I can be consistent with my aliases. This strategy works well for me and I'm sure that I can't be the only person out there who does this. As Robert De Niro's character, Jack Byrnes, said in Meet the Fockers (paraphrased), "If you're outside the circle of trust, you're on a need to know basis and right now you don't need to know."

    7. Re:Just don't do it by isaaccs · · Score: 2

      There was little to no detail given as to how the privacy disclosure would be phrased or provided to users. As it happens, your assumption is wrong. There is no desire to squirrel away anything in legalese. Indeed, the question asks: "If you could write your own privacy policy, what would it contain?". You describe the "hidden" (which you've assumed) solution as unoriginal, but provide no alternative suggestions (which was the point of submitting the question to the community in the first place).

    8. Re:Just don't do it by jittles · · Score: 4, Interesting

      >Burying the collection of personal data in the middle of some lawyerly gobbledygook privacy statement is like mortgage lenders burying key terms in the middle of hundreds of pages of documentation. Yeah, it's legally there, but no one is actually going to read or understand it.

      When I bought my house, I spent about 3 hours at the title company reading and signing the mountain of paperwork. I would never commit myself to 30 years of anything without knowing and understanding the details. I will say that the notary was pissed. After 30 minutes she said "Are you really going to read the entire thing?" And later "I have an appointment, you're going to make me late." My responses were "Yes, I'd be stupid not to." and "You scheduled this entire block with me, it's not my fault you double booked yourself, you'll have to cancel your other appointment."

  2. risk vs. investment tradeoffs by noh8rz10 · · Score: 4, Informative

    I think your mind is on the right track in identifying your resource limits (i.e., no tip-of-the-spear experts) and the sensitivity of the data (i.e., it's not all nuclear bomb codes). That is the first step. Next, think about the exact types of data that you're collecting, and try to group like data together (for example, all text data, screen caps, keylogging, audio or webcam video if you have it), and find a way to store them in an efficient structure while everything stays linked together. Finally, if possible, associate all data collection events with time (timestamp) and location (GPS). This will allow a more complete analysis on the back end.

    1. Re:risk vs. investment tradeoffs by SomePgmr · · Score: 3, Insightful

      Finally, if possible, associate all data collection events with time (timestamp) and location (gps).

      It started getting a little creepy there at the end, bud. ;)

  3. Don't by SmartyPants · · Score: 4, Informative

    honestly... try not to store it.

    You need to examine why you actually need the data, and if you can't think of a good reason (except it might be valuable in the future), then don't store it.
    If you do need it for analysis, machine learning apps, etc, try to anonymize it as early as possible, and not to keep raw data longer than you need it. (say raw data for 3 months, then just store aggregate info).
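
    The raw-then-aggregate idea above could look roughly like the sketch below. It is only an illustration: the event shape (user id, action, timestamp), the name roll_up, and reading "3 months" as 90 days are all assumptions, not anything from the question.

```python
from collections import Counter
from datetime import datetime, timedelta

RAW_RETENTION = timedelta(days=90)  # assumption: "3 months" read as 90 days


def roll_up(events, now):
    """Split events into raw rows still inside the retention window and
    aggregate counts for everything older.

    Each event is a hypothetical (user_id, action, timestamp) tuple.
    """
    cutoff = now - RAW_RETENTION
    kept_raw = [e for e in events if e[2] >= cutoff]
    # The aggregate keeps only (action, month) counts; no user id survives.
    aggregates = Counter(
        (action, ts.strftime("%Y-%m")) for _, action, ts in events if ts < cutoff
    )
    return kept_raw, aggregates
```

    Run something like this on a schedule and drop the old raw rows once the counts are written. Very small counts can still point at one person, so the buckets may need to be coarser than a month.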

    Also, for behavior, you don't need years of information. Studies have shown that people change, so make sure the things people did recently count for more and the old stuff gradually decays.
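
    One simple way to make old behavior fade is an exponential decay on each observation's age. This is just a sketch; the 30-day half-life and the function names are made-up illustrations, so pick whatever fits your data.

```python
HALF_LIFE_DAYS = 30.0  # assumption: illustrative value only


def decayed_weight(age_days, half_life_days=HALF_LIFE_DAYS):
    """Weight of an observation that is age_days old.

    The weight is 1.0 for something that just happened and halves
    every half_life_days after that.
    """
    return 0.5 ** (age_days / half_life_days)


def decayed_score(ages_days, half_life_days=HALF_LIFE_DAYS):
    """Sum of decayed weights for a list of observation ages (in days)."""
    return sum(decayed_weight(a, half_life_days) for a in ages_days)
```

    With weights like these, anything older than a few half-lives contributes almost nothing, which is a decent argument for deleting it outright.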

  4. Start reading about PII by Anonymous Coward · · Score: 3, Informative

    Wikipedia (http://en.wikipedia.org/wiki/Personally_identifiable_information) is a good start.

  5. Re:I'm an experienced developer by Anonymous Coward · · Score: 2, Interesting

    Agreed. People mistake this for a technical forum.

  6. Break the association by cheros · · Score: 4, Insightful

    If at all possible, stay away from personally identifiable data. If your aim is to use identity as an index, work out a way in which you can translate an identity into an index or hash value (i.e. one way). This is not going to be perfect (there will be about a million "John Smith"s out there), but if you have a consistent pair such as name and phone number, turn that into a hash and use it as the data index.

    That means you can still do correlations, but a leak will not result in exposure of personal data.
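
    A minimal sketch of the name-plus-phone index, with one change that is my own assumption rather than part of the suggestion above: it uses a keyed hash (HMAC-SHA256) instead of a bare hash, because names and phone numbers are guessable and an unkeyed hash of them can be brute-forced. The function name, the normalization rules, and the key handling are all hypothetical.

```python
import hashlib
import hmac


def pseudonym(name, phone, secret_key):
    """Derive a stable one-way index from a (name, phone) pair."""
    # Normalize so trivial formatting differences map to the same index.
    norm_name = " ".join(name.lower().split())
    norm_phone = "".join(ch for ch in phone if ch.isdigit())
    # The unit-separator byte keeps the two fields from running together.
    message = f"{norm_name}\x1f{norm_phone}".encode("utf-8")
    # Keyed hash: without secret_key an attacker cannot simply hash guesses.
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

    The key has to live somewhere other than the database holding the indexes; if both leak together, you are back to a guessable hash.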

    However, first of all, look at what you're holding on personal data and simply assume you got hacked and it's "out there" - plan for that crisis first because there is one question you need to answer:

    If you cannot afford to pay for security advice, can you afford to pay for the inevitable consequences?

    --
    Insert .sig here. Send no money now. Owner may sue, contents will settle. Batteries not included.
    1. Re:Break the association by cheros · · Score: 2

      He said he had little money available, so I figured I'd give him something that was easy vs. perfect. The key question is whether the delta introduced by the odd hash collision is actually significant in the volume of data he is planning to process. If it isn't, I would not try to develop perfection - he can use his little funding better elsewhere.

      In other words, in theory you're absolutely right, in practice I suspect there is little difference. But my favourite way of avoiding issues with personal data is simply not collecting them in the first place. Unless you are Google and get away with a pathetic fine, of course..

      --
      Insert .sig here. Send no money now. Owner may sue, contents will settle. Batteries not included.
  7. Collect as little as possible, throw it away... by IBitOBear · · Score: 4, Interesting

    I have been toying with a site idea. Your account name is your public key fingerprint. Your public nickname is whatever you use in the message. Your login is validated because everything you send is signed with the key that matches the fingerprint (and encrypted with my public key for transmission). Input to the user form is constrained and validated within those constraints (to prevent padding attacks).

    I would then have a database "key x","paid through date y".
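
    Roughly, the storage side could be as small as the sketch below. It only covers the fingerprint and the "paid through" lookup. Signature checking is left out on purpose, since that needs a vetted crypto library. Using SHA-256 over the raw key bytes as the fingerprint, and every name here, is an assumption for illustration.

```python
import hashlib
from datetime import date

# The entire user database: fingerprint -> paid-through date.
accounts = {}


def fingerprint(public_key_bytes):
    """Account name: hex SHA-256 digest of the user's public key bytes."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def register(public_key_bytes, paid_through):
    """Store nothing about the user except the fingerprint and a date."""
    fp = fingerprint(public_key_bytes)
    accounts[fp] = paid_through
    return fp


def is_paid(fp, today):
    """True if the account exists and is paid up on the given day."""
    paid_through = accounts.get(fp)
    return paid_through is not None and today <= paid_through
```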

    Sure, I couldn't sell any farmed data à la Facebook, but subpoena requests would be a breeze... "here's your hex dump..."

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
  8. Give me control and earn my trust by johnnick · · Score: 3, Insightful

    The short requirements:

    1) Explain what you're collecting in real-time at the moment when you give me the option whether or not to permit you to collect it. Tell me what you will use it for, when you will delete it and the consequences if I don't give it to you. People don't read privacy disclosures. Give notice and ask permission at the moment of proposed collection. Make it opt-in, not opt-out.

    2) Only request the information required to perform the service I've requested. Use the information I provide only to provide the service I've requested. Only share the information I provide with third parties to the limited extent necessary to provide the services I've requested. Obtain contractual commitments from those third parties that cause them to protect my information and delete it as soon as they've done what's required to provide the service I've requested. Keep information only as long as necessary to provide the service I've requested and delete it after you've done what's required to provide the service I've requested.

    3) Protect my information. Encrypt in transit and at rest. Delete thoroughly and don't give in to the urge to collect and keep information just because it might be useful some time in the future. You can't lose what you don't have.
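
    As a rough sketch of how requirements 1 and 2 might show up in code: nothing is stored for a field until the user has opted in, and every grant carries a purpose and a delete-by date. The class and field names are hypothetical, and this leaves out requirement 3 (encryption) entirely.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Consent:
    field: str
    purpose: str        # what the user was told the data is for
    delete_after: date  # when the user was told it would be deleted


class Collector:
    def __init__(self):
        self.consents = {}
        self.data = {}

    def grant(self, field, purpose, today, keep_days):
        """Record an explicit opt-in made at the moment of collection."""
        self.consents[field] = Consent(field, purpose, today + timedelta(days=keep_days))

    def collect(self, field, value):
        """Store a value only if the user has opted in for that field."""
        if field not in self.consents:
            return False
        self.data[field] = value
        return True

    def purge(self, today):
        """Delete data, and the consent itself, once the promised date passes."""
        for field, consent in list(self.consents.items()):
            if today > consent.delete_after:
                self.data.pop(field, None)
                del self.consents[field]
```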

    You say the collection "... is for purposes of analysis and ultimately functionality, not persistence." That seems inconsistent with the collection of name and email address. I can't think of too many use cases where you're collecting my name and email address and don't plan to keep it (and use it for marketing or otherwise share it in some way). If you need to contact me or I need to create a user-id that is my email address, you don't need my name.

    Your privacy policy is your contract with your user. It is an operational document that must be consistent with your practices. The privacy policy should be consistent with your policies and procedures. If the information you collect, or the way you handle it changes, you must change your privacy policy.

    --
    "The plural of anecdote is not data."
  9. Re:I'm an experienced developer by SomePgmr · · Score: 3, Insightful

    I'd give him the benefit of the doubt, and assume this isn't the only place he's looking for best practices.

    Meanwhile, "I'm an experienced developer, I'm familiar with all the general rules for securing customer data, but I'd like to hear of any 'gotchas' that you know about"? That seems like a reasonable thing to ask.

    Again, assuming this isn't the one-and-only source. So instead of grabbing our pitchforks, maybe someone has some examples of what he asked about?

  10. You can't afford it, by your own admission. by VendettaMF · · Score: 3, Insightful

    If you can't afford the expert then you can't afford to collect such data. Move away from this project to something you have the ability to do.

    --
    kartune85 : Incapable of reason, observation or learning. A kind of dim, drab, flightless parrot.
    1. Re:You can't afford it, by your own admission. by Mike610544 · · Score: 2

      If you can't afford the expert then you can't afford to collect such data. Move away from this project to something you have the ability to do.

      I'm surprised it took this long for someone to say that. The people who will exploit your system and extract something valuable from it can afford those experts.

      --
      ... also, I can kill you with my brain.
  11. OWASP by FormOfActionBanana · · Score: 5, Informative

    OWASP has guidance; for instance, here: https://www.owasp.org/index.php/IOS_Developer_Cheat_Sheet#Insecure_Data_Storage_.28M1.29

    From https://www.owasp.org/images/5/5e/Mobile_Security_-_Android_and_iOS_-_OWASP_NY_-_Final.pdf
    2. Insecure data storage
    Solution
      Avoid local storage inside the device for sensitive information
      If local storage is “required” encrypt data securely and then store
      Use the Crypto APIs provided by Apple and Google
      Avoid writing custom crypto code – prone to vulnerability

    --
    Take off every 'sig' !!
  12. Book of best practices by Okian+Warrior · · Score: 5, Insightful

    In the US, we have the National Electrical Code which explains in clear detail how house wiring is constructed.

    Following the code is a legal requirement in many (most?) states, but from the point of view of an electrician it's a "book of best practices". Use this gauge wire for this current, staple the wire within 6" of the box, and so on. The code gets revised and added to over time as questions crop up, new technologies get added, and people get more experience.

    There's a reason for everything. For example, the light in a bathroom should be on a separate breaker from the outlet next to the sink. It makes sense in retrospect, but this is not something that is obvious beforehand.

    It's very detailed, but also very clear. Homeowners routinely understand the instructions and are able to make simple repairs and modifications to their home wiring which conform to the code.

    We throw a lot of "best practices" around here as if they were simple and obvious at the outset, but maybe they're not. Hash your passwords, salt the hash, sanitize the form inputs, don't keep CC info... lots of best practices which in hindsight make sense but which aren't necessarily obvious beforehand.
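
    To show how non-obvious even "hash your passwords, salt the hash" is in practice, here is a minimal sketch using only the Python standard library. The choice of PBKDF2-HMAC-SHA256, the iteration count, and the function names are assumptions for illustration; a memory-hard scheme may be the better choice where one is available.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumption: illustrative; tune upward for your hardware


def hash_password(password):
    """Return (salt, digest) for storage. The salt is random per password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)
```

    The per-password salt means two users with the same password get different digests, and the slow key-derivation step is what makes offline guessing expensive after a leak.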

    Most web apps have common requirements for login, identity management, privacy, various forms of functionality, and so on.

    Should we have a "book of best practices"?

  13. Also consider TLDR-TOS by Krishnoid · · Score: 2

    This site provides summaries of the terms-of-service policies for various companies covering privacy, retention, and use of user information. You can use it to compare your plans with those of major companies and identify privacy or TOS concerns you may have overlooked.

  14. "We aim to be completely transparent and honest" by stiebing.ja · · Score: 2

    +5 Funny

    --
    I lag
  15. Collecting Personally Identifiable Information by Rozzin · · Score: 2

    On passwords, I liked Jeff Atwood's article, `You're Probably Storing Passwords Incorrectly'.

    For Personally Identifiable Information (PII), I liked Brian Danger Graham's article, `What's in a name database?'.

    --
    -rozzin.
  16. Reading it. by hendrikboom · · Score: 2

    Here in Quebec, the notary actually reads the entire document to you and asks you enough questions that he is sure you've understood it.