More Web Site User Data Gathering Revealed
Emmett Interviews Interhack
Slashdot: For the uninitiated, what's Interhack all about?
Basically, we're a firm of hackers interested in pushing technology forward through research, making computing apply to people by developing custom products and consulting for folks who want to put the technology to use, and helping people understand exactly what the ramifications of these systems are. That's a pretty broad way of saying that we're all about the Internet and making it work.
Slashdot: When did you start researching this story, and how long did it take to put the pieces together?
Sometime in May, someone sent us a tip about Coremetrics and what it's doing. We took a quick look over their web site to see their advertised services and then started to look at how the service is actually implemented on various client sites. We examined several sites, most of which very clearly stated in their privacy policies that they're using Coremetrics for site monitoring and provided the links that people who object need in order to opt out of the system. Most of the sites with clear, full disclosure policies weren't even sending Coremetrics personally identifiable information like names and addresses.
The more interesting part of our find was in the sites that did send personal information to Coremetrics, particularly those that carried the TRUSTe privacy seal. Over the course of about three weeks, we performed an investigation of these sites, gathering as much information as possible from them. We reverse-engineered the system by reading the sites' code, reading through the obfuscation, and comparing logs of our network's activity with the activity that would be perceived by an end user.
What we found was a clear difference between user expectations and what was actually happening, as well as a clear difference between what Coremetrics says it offers and what its eLuminate service makes technically feasible. After writing drafts of our report and press release, we decided to take a wait-and-see approach to the release. Specifically, we wanted to ensure that sites that had just started to use the Coremetrics service had adequate time to update their policies, and we wanted an accurate picture of how the system behaved once it had been in production for a while.
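The phrase "technically feasible" is worth unpacking. A minimal sketch, assuming (hypothetically) that a tracker sets a single cookie ID under its own domain and logs which client site each hit came from: grouping those hits by cookie ID merges a visitor's activity on every client site into one profile. The records and field layout below are invented for illustration and do not describe Coremetrics' actual logs; the point is only that the join takes a few lines, so it is policy and contract, not technology, that keep it from happening.

```python
from collections import defaultdict

# Hypothetical hit records as a third-party tracker might log them:
# (tracker_cookie_id, client_site, page, extra_data_sent_by_the_page)
HITS = [
    ("id-7f3a", "toys.example", "/cart", {"name": "Pat Doe"}),
    ("id-7f3a", "clothes.example", "/checkout", {"zip": "12345"}),
    ("id-19bc", "toys.example", "/home", {}),
]


def cross_site_profiles(hits):
    """Group hits by the tracker's cookie ID across *all* client sites."""
    profiles = defaultdict(lambda: {"sites": set(), "data": {}})
    for cookie_id, site, _page, extra in hits:
        profile = profiles[cookie_id]
        profile["sites"].add(site)     # every client site this visitor touched
        profile["data"].update(extra)  # personal data merged from all of them
    return dict(profiles)
```

Run on the sample data, the visitor `id-7f3a` comes out with both sites and both pieces of data (a name from one site, a zip code from the other) in a single record.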
After waiting and watching for more than a month, we decided to release our findings. So, on Monday morning, we sent a pre-release copy of our report to Richard Smith and some folks at Zero Knowledge Systems. In addition, we contacted each of the firms named in our report and Coremetrics so that if the failure to disclose or the ability to profile people across web sites was unintentional, there would be time for some investigation and a decision about how to fix the problem. After the end of business Monday, we released our report.
Slashdot: What needs to change? In a perfect world, how do we deal with this?
This is a very interesting question. In my perfect world, detailed levels of profiling would not take place at all. There would be no such thing as persistent cookies. In general, I'm just not comfortable with the level of privacy that the industry as a whole has given up for the sake of a little convenience.
How big of a deal, really, is it to have to enter your password when you log in to a web site? Don't forget that the reason we have passwords in the first place is so that you'll have to do something at the beginning of the session to prove who you are.
Web browsers also need to be more intelligent. That is, they need to be able to identify dependencies on third parties, such as images served from another company's machines, so the user can decide whether those images should be fetched or ignored. Right now, browsers -- for the most part at least -- just aren't very defensive. The model of parsing everything you're given worked fine in the Old Days, for which some of us long so much, but the fact of the matter is that you really can't blindly trust anyone on the Internet.
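As a minimal sketch of the kind of check a more defensive browser could run, the Python below lists every <img> on a page whose host differs from the page's own host, which is how a third-party web bug shows up in HTML. It assumes static HTML (images injected by JavaScript would be missed) and compares hostnames exactly, so a real tool would also need a notion of which subdomains count as "the same site". The function and class names and the URLs used are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ThirdPartyImageFinder(HTMLParser):
    """Collects <img> URLs whose host differs from the page's own host."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.third_party = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> also arrives here via handle_startendtag.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        absolute = urljoin(self.page_url, src)  # resolve relative paths
        host = urlparse(absolute).hostname
        if host and host != self.page_host:
            self.third_party.append(absolute)


def third_party_images(page_url, html):
    """Return the absolute URLs of images hosted somewhere other than page_url's host."""
    finder = ThirdPartyImageFinder(page_url)
    finder.feed(html)
    finder.close()
    return finder.third_party
```

For example, on a page at `http://shop.example/index.html`, a relative `/logo.gif` resolves to the page's own host and is ignored, while a 1x1 image from `http://tracker.example/bug.gif?id=42` is reported.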
I'm not suggesting becoming a Luddite. I'm suggesting that folks take a sort of "trust, but verify" approach à la Ronald Reagan. Right now, there's a lot of trust and almost no way to verify.
Slashdot: This all comes down to trust. How many policies are just there so people will shut up about personal information so they'll start buying stuff online?
I couldn't say. Policies are almost always written by lawyers. That probably speaks to the covering-one's-posterior-position value of privacy policies.
Slashdot: Since we can't trust written policies, what should people be doing before they start conducting business with these websites?
Verify everything. As I said earlier, though, we're severely lacking in tools that are accessible to most people that can help in that regard. I think Zero Knowledge Systems' Freedom network is a huge step in the right direction. Tools like Muffin (muffin.doit.org) also help, but it would be cooler for that kind of functionality to live right in the browser itself. There are opportunities for eager hackers on this front.
It's also important to stress that tools alone won't do it -- there is no silver bullet. People are going to have to have some understanding of what's happening in order to use these tools effectively.
Finally, where you see discrepancies, point them out. Most of the time, they're oversights. Look at how Lucy.com and Fusion.com dealt with this problem: they updated their sites. So although the problem shouldn't have happened in the first place, they did the right thing. Contrast that with Toys "R" Us, which issued a statement saying that what they're doing isn't a violation. And their privacy policy still doesn't say a word about Coremetrics. They still haven't said anything to address the issue of having information collected on children.
Companies that don't fix their problems don't take your privacy seriously, no matter how much lip service they pay. So don't go to their sites. Don't buy their stuff. Tell them why you're not buying their stuff. Tell their competitors why you shop where you do, lest the new places you shop get the bright idea to try to hide something.
Jamie Talks to Coremetrics
Here's the service Coremetrics provides to corporate websites:
Many companies demand accurate knowledge of how their sites are being used: what sections are popular, what paths visitors take through the site, where people click over from, and so on. It's like web log analysis but more specialized for large shopping sites.
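A toy sketch of the core of that kind of analysis, assuming each hit arrives as a (visitor ID, page) pair in time order: count page views, and count page-to-page transitions per visitor to see which paths are common. The function name and data shapes are hypothetical, not Coremetrics' code. Nothing in this computation uses a name or email address; an anonymous visitor ID is enough.

```python
from collections import Counter


def summarize(events):
    """events: iterable of (visitor_id, page) pairs in the order they occurred.

    Returns (views, transitions): page-view counts, and counts of
    (previous_page, next_page) steps taken by the same visitor.
    """
    views = Counter()
    transitions = Counter()
    last_page = {}  # visitor_id -> the previous page seen for that visitor
    for visitor, page in events:
        views[page] += 1
        prev = last_page.get(visitor)
        if prev is not None:
            transitions[(prev, page)] += 1
        last_page[visitor] = page
    return views, transitions
```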
Since these demands are very much the same, and the code to do the analysis is similar, outsourcing happens. From a CEO's viewpoint, Coremetrics fiddles with the website to do better-quality tracking than the company could do on its own, and then makes the resulting statistics available over SSL.
But from your viewpoint and mine, that "fiddling" results in cookie-carrying web bugs all over the sites we visit -- web bugs which usually send back to the Coremetrics servers a unique visitor tag, like any other cookie, but one that sometimes includes your name, email address or other personally identifying information.
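To make the "web bug" idea concrete: a web bug is just an image request, so whatever the page reports typically rides along in the image URL's query string, next to the tracker's cookie. The sketch below is a rough heuristic for spotting identifying data in such a URL. The beacon URL, the parameter names in SUSPECT_KEYS, and the email pattern are all illustrative assumptions, not Coremetrics' actual format, and a check like this will miss data that has been encoded or hashed.

```python
import re
from urllib.parse import parse_qsl, urlparse

# Loose email pattern: something@something.tld
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
# Hypothetical parameter names that suggest identifying data; adjust as needed.
SUSPECT_KEYS = {"name", "email", "fullname", "addr"}


def pii_in_beacon(url):
    """Return (key, value) pairs in a beacon URL's query string that look identifying."""
    hits = []
    # parse_qsl decodes %-escapes and '+' so values can be inspected as plain text.
    for key, value in parse_qsl(urlparse(url).query):
        if EMAIL.match(value) or key.lower() in SUSPECT_KEYS:
            hits.append((key, value))
    return hits
```

A beacon carrying only an opaque visitor number passes cleanly; one that also carries `email=pat%40mail.example` is flagged.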
Coremetrics promises that this information remains private. By contrast, when DoubleClick collects data from <img> cookies across multiple websites, it does so with the stated intention of tracking you personally; this is part of its business plan.
According to Coremetrics, they do things very differently. Data is not cross-correlated between their client websites, they say, because their contracts with their clients prohibit this. In fact, their contract forbids them from doing much of anything with that data except statistical analysis.
I gave the Coremetrics PR person I talked to a chance to explain, using the example of their client Toys 'R' Us:
"Coremetrics is merely an agent that collects this data on behalf of an individual customer, for that individual's sole use only. We do not collect data, as was inferred very incorrectly by Interhack, across multiple unrelated websites, with any intention of selling it to third parties -- or even distribution to third parties. That's because we, as the agent, do not own that data, nor do we have any rights to that data. Toys 'R' Us, and Toys 'R' Us only, is the sole owner of that data. So legally, we cannot do any of the possibilities that Interhack had alluded to in their report."
But here's the interesting thing.
If I'm browsing my favorite website, Coremetrics is clearly a third party. They have a special contractual relationship to keep my data private, which we shouldn't ignore. But nevertheless -- a third party.
So why do some of their clients' privacy policies not mention this?
Toys 'R' Us is a good example. As Interhack made clear, they do send personal data to Coremetrics' servers. But their privacy policy reads, "We do not share any personally identifying data about our guests with anyone outside of Toysrus.com, its parent, affiliates, subsidiaries, operating companies and other related entities."
So is Coremetrics one of their affiliates or a related entity? I wouldn't think so, but I'm not a lawyer. One interesting thing is hidden in that privacy policy's HTML; after the closing </html> tag is the hidden message: "<!--CoreMetrics Information if enabled-->." Hmmmmmm.
Coremetrics lists twenty clients; I tried to contact seventeen of them for comment, with marginal success by press time. Three reported that they had not yet activated Coremetrics or had decided not to use the service at all. One (guru.com) reported not sending any personal information -- presumably, only tracking visitors with a non-identifying unique ID.
Two sites (lucy.com and fusion.com) began mentioning Coremetrics in their privacy policies on August 1, the day after the Interhack report. One site (thewest.com) did not even have a privacy policy until yesterday; they'd been working on it, and my email may have made it a priority because it was on their site three hours later.
According to Coremetrics, they encourage all their clients to disclose the use of their service in their privacy policies and to include a link for users to opt out. But some sites reported as using or planning to use Coremetrics' services have privacy policies that could use some clarification.
Altrec.com informs me that "...in the near future ... we plan to add to our privacy statement our use of Coremetrics and the fact that Coremetrics neither owns, distributes, nor has rights to the data it sorts on Altrec.com's behalf." However, their current privacy policy states very simply: "Altrec.com will never sell or give your e-mail address (or any other information about you) to anyone else without your permission. Period."
(Last-minute update -- just before press time, Altrec.com clarified that they are "sending unique ID (unique to Altrec.com) and city, state and zip. No other personally identifiable information is being sent to Coremetrics.")
Bravanta.com bounced me between different people until I got to leave voicemail that wasn't returned by press time. Their policy says they "do not and will not sell, trade or rent the personal information of our customers or gift recipients to any third parties."
(Update two hours later: Bravanta reports that they also have decided not to use Coremetrics' service, and are not currently using it.)
Mall.com didn't get back to me either, and their policy reads "We will NEVER release your name and personal information to a third party..."
Getplugged.com has a rather confusing privacy statement that begins, "Any personally identifiable information GetPlugged.com collects will be used solely for the purposes stated within this Privacy Statement" and wanders around from there. I'm not sure what to make of it, frankly.
All these policies may indeed be correct, if the sites are stingy with personal data. Like guru.com (and altrec.com), they may be using the Coremetrics service only with non-personal IDs. But, as with Toys 'R' Us, that may also not be the case.
(fusion.com, getplugged.com, and altrec.com also happen to be TRUSTe licensees, but TRUSTe wasn't able to comment by press time. In the AP wire story on Monday, they had harsh words but were speaking hypothetically; no comment since then.)
It's hard enough to read privacy policies already. Most of them are designed to protect companies legally, and succeed mainly in confusing users. The distinction between Coremetrics as a third party, an affiliate, or an agent is a little too fine for the average consumer, and needs to be spelled out in each policy, as Coremetrics itself recommends.
But is all this a tempest in a teapot? If a signed contract forbids a company from misusing data, is that all we need to know?
I don't think so. In the first place, at the very least, companies like Toys 'R' Us need to disclose such things in their privacy policies. That's just common sense.
In fact, according to Coremetrics privacy advisor Dave Farber, they plan contractually to require such disclosure with future clients. (The company could not confirm or deny this at this time.)
More importantly, we as consumers are being asked to trust a third party whose reputation we know nothing about. In fact, 99% of us will never even have heard of them and might not understand what they do. We're told that a contract protects us, but we're still being asked to trust something we can't see. And when evidence of policy violations is turned up by a group of hackers, that erodes our trust.
After speaking at length with Coremetrics' PR, I get a general feeling of trust from them. (Of course that's a large part of their PR staff's job, earning reporters' trust.) More importantly, Dave Farber is well-respected, and his confidence carries weight -- with me at least.
Still, as Interhack says, our motto should be "trust but verify." That's why I proposed, to Coremetrics, that they publicly post, on their website, the paragraphs from their clients' contracts which assure that our private data remains private. If the actual legal words that protect our data are up there for us to see, we don't have to trust anyone.
When I mentioned this to Coremetrics' PR person, he promised to consider it; Dave Farber thought it was "a very good idea." It's unusual for corporations to make contracts public, even in part, but in this case it would do a great deal to put everyone's fears to rest.
Simple fix:
    ln -sf /dev/null ~/.netscape/cookies
Your cookies will all be accepted and remain valid while they are in memory (that is, as long as you keep the web browser open), but they will be flushed every time you close Netscape, giving you the best of both worlds.
Matt
Please see my reply above, in which I answered the same questions.
The basic problem is that a huge percentage of advertisers outsource their advertising operations to DoubleClick. To run their ads, you have to pull the images from DoubleClick. Unfortunately, that's not something we have control over, as it's the advertisers' choice to go through DoubleClick. I wish it were otherwise.
Yeah, I'm that guy.
I don't care if it's 90,000 hectares. That lake was not my doing.
Profiling is an incredibly important tool to promote good customer service! We shouldn't do away with it because it COULD constitute a violation of privacy. That's like saying that we should do away with telephones just because they allow telemarketers to invade our privacy (try Caller ID).
Amazon, for instance, tracks all of my purchases, and, in return, gives me the only useful product recommendations I've seen on any commercial web site. Other sites could track my reading patterns (within their own site, not across others!) to figure out what types of articles actually interest me so that they can provide better content in the future. They need to plant a cookie on my browser to do that tracking, and they may even benefit from demographic information from me (to see what 20-year-old white males like to read), but they never need to know my real name, address, or phone number.
For me, the biggest privacy concern is spam and telemarketing. I WANT people to get enough data about me to serve me targeted banner ads, because those are more likely to be interesting to me (I might buy a boxed copy of Enhydra, but I probably won't buy a copy of Cosmopolitan), as long as they don't invade my inbox with those ads.
--JRZ
Not only does this Web designer use one-pixel GIFs... pretty much every Web designer does. The reason is that browsers suck. In theory, the visual presentation of information can be managed with CSS. But CSS support is horrible -- among browsers released so far, only IE 5 for Mac really supports it.
So Web designers are forced to use HTML for the visual presentation of information (no, just putting it in a simple list isn't good enough -- 400 years of learning how to present information effectively says otherwise; see Edward Tufte's works for more). And the only way to do that is to micromanage details like spacing.
But all that's moot. The worst part about this whole article is that the companies are lying to their customers about how their information is being used. There is almost no way an educated user, without the benefit of infinite time and tools, could have known to protect him- or herself from this information theft. That's why TRUSTe needs to sue and the FTC needs to get involved. Personally, I think that the companies who did this need to be permanently banned from having a Web presence in order to set an example, but I don't know how that could be done legally.
You can do something: opt out
http://www.coremetrics.com/opt_out_options.html
Please note that all these images come from Slashdot's own servers. They're page-counter images. I'll just forward along the email I got from Richard M. Smith, the guy who coined the term "web bug", when I asked him about it:
Jamie McCarthy
jamie.mccarthy.vg
Let's face it. The days of the Internet being a free-for-all are over. Corporations are going to find ways to collect demographic and personal data. Trying to legislate this out of existence is like trying to legislate Napster and Gnutella out of existence: it isn't going to happen.
The best you can do is write a browser plug-in that will reject such data and prevent the corporation from gaining any valuable data from your visit.
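The heart of such a plug-in is a per-request decision. A minimal sketch, assuming a user-maintained blocklist of tracking domains (the two entries below are placeholders, and the function name is hypothetical): block a request if its host is a listed domain or any subdomain of one. Requiring a leading dot for subdomain matches keeps lookalike names such as nottracker.example from being caught by mistake.

```python
from urllib.parse import urlparse

# Placeholder blocklist; a real one would be maintained by the user.
BLOCKED_DOMAINS = {"tracker.example", "ads.example"}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Exact match, or ends with ".<domain>" so "img.tracker.example" matches
    # but "nottracker.example" does not.
    return any(host == d or host.endswith("." + d) for d in blocked)
```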
No amount of legislation can stop this kind of thing. If you ban companies from collecting data like this in the United States, they will simply move their servers outside the border and continue to do business as usual.
In the information age, it is no longer the job of government to protect our privacy: they can't, because it's an insurmountable job. The only way to protect online privacy is to do it yourself.
Brought to you by Frobozz Magic Penguin Fodder.
Mostly I avoid the problem by using a filtering proxy (e.g., Internet Junkbuster), but just for kicks I'll sometimes skip that, collect a few cookies, and then go and edit my cookies.txt file.
Interesting things to do with entries in the cookies file:
- randomly change some of the ID numbers -- let them think you're somebody else (or nobody)
- if there's a timestamp, change the date to something bogus -- 1956, or 1842, or 2003. Maybe somebody's database will break.
- insert really really long strings of random characters (or numbers if numeric) into the cookie values -- maybe it'll overflow a buffer somewhere.
- add a few hundred or thousand bogus cookie entries for some domains, maybe the cookie eater will choke.
How much of this actually adversely affects the cookie server I don't know -- not my area of expertise -- but it at least screws up their tracking somewhat. You want cookies? Here, I'll give you cookies....
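The first trick on that list can be scripted. This is a sketch, assuming the classic Netscape cookies.txt layout of seven tab-separated fields per entry (domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp, name, value), the same layout Python's http.cookiejar.MozillaCookieJar reads. The function name and the tracker.example domain are hypothetical. Edit the file with the browser closed, since a browser that keeps cookies in memory may rewrite the file on exit and undo your changes.

```python
import random
import string


def scramble_cookie_values(text, domain, rng=None):
    """Return a copy of Netscape-format cookies.txt text in which every cookie
    set by `domain` (or a subdomain of it) has its value replaced by random
    characters of the same length."""
    if rng is None:
        rng = random.Random()
    out = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Leave comments, blank lines, and anything that isn't a 7-field entry alone.
        if line.startswith("#") or len(fields) != 7:
            out.append(line)
            continue
        host = fields[0].lstrip(".").lower()
        if host == domain or host.endswith("." + domain):
            value = fields[6]
            # Keep numeric IDs numeric so the result still looks plausible.
            if value.isdigit():
                alphabet = string.digits
            else:
                alphabet = string.ascii_letters + string.digits
            fields[6] = "".join(rng.choice(alphabet) for _ in value)
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

Usage: read the file, pass its text and the domain to scramble, and write the result back over the file.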
-- Alastair
I don't see what the big deal is; these companies decided to outsource their traffic analysis. While the capability surely exists for Coremetrics to track users across websites à la DoubleClick, its customers would be terribly pissed if it did.
Personally, I don't see the issue of online tracking as being more than 'a tempest in a teapot'. Those that do not wish to be tracked can surely disable it, and the tracking companies and user data mining companies will continue to make money off the mindless drones that populate the net.
It's always been 'buyer beware'. What is so special about the net that it no longer applies? So the tracking is easier to do, and easier to analyze, and there is more of it, and it is more meaningful. Do you honestly think your bank, the telephone company, and the credit agencies aren't selling your spending habits to marketers?
Um, uh.. Damn, I'll think of something after the hangover.