Ask Slashdot: Best Practices For Collecting and Storing User Information?
New submitter isaaccs writes "I'm a mobile developer at a startup. My experience is in building user-facing applications, but in this case, a component of an app I'm building involves observing and collecting certain pieces of user information and then storing them in a web service. This is for purposes of analysis and ultimately functionality, not persistence. This would include some obvious items like names and e-mail addresses, and some less obvious items involving user behavior. We aim to be completely transparent and honest about what it is we're collecting by way of our privacy disclosure. I'm an experienced developer, and I'm aware of a handful of considerations (e.g., the need to hash personal identifiers stored remotely), but I've seen quite a few startups caught with their pants down on security/privacy of what they've collected — and I'd like to avoid it to the degree reasonably possible given we can't afford to hire an expert on the topic. I'm seeking input from the community on best-practices for data collection and the remote storage of personal (not social security numbers, but names and birthdays) information. How would you like information collected about you to be stored? If you could write your own privacy policy, what would it contain? To be clear, I'm not requesting stack or infrastructural recommendations."
Best practice from my perspective: do not collect the data at all.
If at all possible, stay away from personally identifiable data. If your aim is to use identity as an index, work out a way in which you can translate an identity into an an index or hash value (i.e. one way). This is not going to be perfect (there will be about a million "John Smith"s out there), but if you have a consistent pair such as name and phone number, turn that into a hash and use it as data index.
That means you can still do correlations, but a leak will not result in exposure of personal data.
However, first of all, look at what you're holding on personal data and simply assume you got hacked and it's "out there" - plan for that crisis first because there is one question you need to answer:
If you cannot afford to pay for security advice, can you afford to pay for the inevitable consequences?
Insert
In the US, we have the National Electrical Code which explains in clear detail how house wiring is constructed.
Following the code a legal requirement in many (most?) states, but from the point of an electrician it's a "book of best practices". Use this gauge wire for this current, staple the wire within 6" of the box, and so on. The code gets revised and added to over time as questions crop up and new technologies get added and people get more experience.
There's a reason for everything. For example, the light in a bathroom should be on a separate breaker from the outlet next to the sink. It makes sense in retrospect, but this is not something that is obvious beforehand.
It's very detailed, but also very clear. Homeowners routinely understand the instructions and are able to make simple repairs and modifications to their home wiring which conform to the code.
We throw a lot of "best practices" around here as if they were simple and obvious at the outset, but maybe they're not. Hash your passwords, salt the hash, sanitize the form inputs, don't keep CC info... lots of best practices which in hindsight make sense but which aren't necessarily obvious beforehand.
Most web apps have common requirements for login, identity management, privacy, various forms of functionality, and so on.
Should we have a "book of best practices"?