Ask Slashdot: Best Practices For Collecting and Storing User Information?
New submitter isaaccs writes "I'm a mobile developer at a startup. My experience is in building user-facing applications, but in this case, a component of an app I'm building involves observing and collecting certain pieces of user information and then storing them in a web service. This is for purposes of analysis and ultimately functionality, not persistence. This would include some obvious items like names and e-mail addresses, and some less obvious items involving user behavior. We aim to be completely transparent and honest about what it is we're collecting by way of our privacy disclosure. I'm an experienced developer, and I'm aware of a handful of considerations (e.g., the need to hash personal identifiers stored remotely), but I've seen quite a few startups caught with their pants down on security/privacy of what they've collected — and I'd like to avoid it to the degree reasonably possible given we can't afford to hire an expert on the topic. I'm seeking input from the community on best-practices for data collection and the remote storage of personal (not social security numbers, but names and birthdays) information. How would you like information collected about you to be stored? If you could write your own privacy policy, what would it contain? To be clear, I'm not requesting stack or infrastructural recommendations."
Best practice from my perspective: do not collect the data at all.
I think your mind is on the right track in identifying your resource limits (i.e. no tip-of-the-spear experts) and the sensitivity of the data (i.e., it's not all nuclear bomb codes). That is the first step. Next, think on the exact types of data that you're collecting, and try to group like data together, for example, all text data, screen caps, keylogging, audio or webcam video if you have it, and find a way to store them in an efficient structure while everything stays linked together. Finally, if possible, associate all data collection events with time (timestamp) and location (gps). this will allow a more complete analysis on the back end.
honestly... try not to store it.
You need to examine why you actually need the data, and if you can't think of a good reason (except it might be valuable in the future), then don't store it.
If you do need it for analysis, machine learning apps, etc, try to anonymize it as early as possible, and not to keep raw data longer than you need it. (say raw data for 3 months, then just store aggregate info).
also.. for behavior.. you don't need years of information, studies have shown people change, so make sure the things people do recently are more important, and the old stuff gradually decays.
Wikipedia (http://en.wikipedia.org/wiki/Personally_identifiable_information) is a good start.
Agreed. People mistake this for a technical forum.
If at all possible, stay away from personally identifiable data. If your aim is to use identity as an index, work out a way in which you can translate an identity into an an index or hash value (i.e. one way). This is not going to be perfect (there will be about a million "John Smith"s out there), but if you have a consistent pair such as name and phone number, turn that into a hash and use it as data index.
That means you can still do correlations, but a leak will not result in exposure of personal data.
However, first of all, look at what you're holding on personal data and simply assume you got hacked and it's "out there" - plan for that crisis first because there is one question you need to answer:
If you cannot afford to pay for security advice, can you afford to pay for the inevitable consequences?
Insert
I have been toying with a site idea. Your account name is your public key fingerprint. You public nicname is whatever you use in the message. Your login is validated because everything you send is signed wiht the key that matches the fingerprint (and encrypted with my public key for transmision). Input to user form is constrained and validated within those constraints (to prevent padding attacks).
I would then have a database "key x","paid through date y".
Sure, I couldn't sell any farmed data a-la facebook, but suppoena requests woudl be a breze... "here's your hex dump..."
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
The short requirements:
1) Explain what you're collecting in real-time at the moment when you give me the option whether or not to permit you to collect it. Tell me what you will use it for, when you will delete it and the consequences if I don't give it to you. People don't read privacy disclosures. Give notice and ask permission at the moment of proposed collection. Make it opt-in, not opt-out.
2) Only request the information required to perform the service I've requested. Use the information I provide only to provide the service I've requested. Only share the information I provide with third parties to the limited extent necessary to provide the services I've requested. Obtain contractual commitments from those third parties that cause them to protect my information and delete it as soon as they've done what's required to provide the service I've requested. Keep information only as long as necessary to provide the service I've requested and delete it after you've done what's required to provide the service I've requested.
3) Protect my information. Encrypt in transit and at rest. Delete thoroughly and don't give in to the urge to collect and keep information just because it might be useful some time in the future. You can't lose what you don't have.
You say the collection "... is for purposes of analysis and ultimately functionality, not persistence." That seems inconsistent with the collection of name and email address. I can't think of too many use cases where you're collecting my name and email address and don't plan to keep it (and use it for marketing or otherwise share it in some way). If you need to contact me or I need to create a user-id that is my email address, you don't need my name.
Your privacy policy is your contract with your user. It is an operational document that must be consistent with your practices. The privacy policy should be consistent with your policies and procedures. If the information you collect, or the way you handle it changes, you must change your privacy policy.
"The plural of anecdote is not data."
I'd give him the benefit of the doubt, and assume this isn't the only place he's looking for best practices.
Meanwhile, "I'm an experienced developer, I'm familiar with all the general rules for securing customer data, but I'd like to hear of any 'gotchas' that you know about"? That seems like a reasonable thing to ask.
Again, assuming this isn't the one-and-only source. So instead of grabbing our pitchforks, maybe someone has some examples of what he asked about?
If you can't afford the expert then you can't afford to collect such data. Move away from this project to something you have the ability to do.
kartune85 : Incapable of reason, observation or learning. A kind of dim, drab, flightless parrot.
OWASP has guidance; for instance, here: https://www.owasp.org/index.php/IOS_Developer_Cheat_Sheet#Insecure_Data_Storage_.28M1.29
From https://www.owasp.org/images/5/5e/Mobile_Security_-_Android_and_iOS_-_OWASP_NY_-_Final.pdf
2. Insecure data storage
Solution
Avoid local storage inside the device for sensitive information
If local storage is “required” encrypt data securely and then store Use the Crypto APIs provided by Apple and Google
Avoid writing custom crypto code – prone to vulnerability
Take off every 'sig' !!
In the US, we have the National Electrical Code which explains in clear detail how house wiring is constructed.
Following the code a legal requirement in many (most?) states, but from the point of an electrician it's a "book of best practices". Use this gauge wire for this current, staple the wire within 6" of the box, and so on. The code gets revised and added to over time as questions crop up and new technologies get added and people get more experience.
There's a reason for everything. For example, the light in a bathroom should be on a separate breaker from the outlet next to the sink. It makes sense in retrospect, but this is not something that is obvious beforehand.
It's very detailed, but also very clear. Homeowners routinely understand the instructions and are able to make simple repairs and modifications to their home wiring which conform to the code.
We throw a lot of "best practices" around here as if they were simple and obvious at the outset, but maybe they're not. Hash your passwords, salt the hash, sanitize the form inputs, don't keep CC info... lots of best practices which in hindsight make sense but which aren't necessarily obvious beforehand.
Most web apps have common requirements for login, identity management, privacy, various forms of functionality, and so on.
Should we have a "book of best practices"?
This site provides summaries of the terms-of-service policies for various companies covering privacy, retention, and use of user information. You can use it to compare your plans with those of major companies and identify privacy or TOS concerns you may have overlooked.
+5 Funny
I lag
On passwords, I liked Jeff Atwood's article, `You're Probably Storing Passwords Incorrectly'.
For Personally Identifiable Information (PII), I liked Brian Danger Graham's article, `What's in a name database?'.
-rozzin.
Here in Quebec, the notary actually reads the entire document to you and asks you enough questions that he is sure you've understood it.