Slashdot Mirror


Google, Bing, Yahoo Data Retention Doesn't Improve Search Quality, Study Claims (theregister.co.uk)

A new paper released on Monday via the National Bureau of Economic Research claims that retaining search log data doesn't do much for search quality. "Data retention has implications in the debate over Europe's right to be forgotten, the authors suggest, because retained data undermines that right," reports The Register. "It's also relevant to U.S. policy discussions about privacy regulations." From the report: To determine whether retention policies affected the accuracy of search results, Chiou and Tucker used data from metrics biz Hitwise to assess web traffic being driven by search sites. They looked at Microsoft Bing and Yahoo! Search during a period when Bing changed its search data retention period from 18 months to 6 months and when Yahoo! changed its retention period from 13 months to 3 months, as well as when Yahoo! had second thoughts and shifted to an 18-month retention period. According to Chiou and Tucker, data retention periods didn't affect the flow of traffic from search engines to downstream websites. "Our findings suggest that long periods of data storage do not confer advantages in search quality, which is an often-cited benefit of data retention by companies," their paper states. Chiou and Tucker observe that the supposed cost of privacy laws to consumers and to companies may be lower than perceived. They also contend that their findings weaken the claim that data retention affects search market dominance, which could make data retention less relevant in antitrust discussions of Google.

5 of 38 comments

  1. Data retention at all, or more than 3 months? by AvitarX · · Score: 3, Informative

    Because I bet the 3-month retention is a huge boost, if only in giving me a history of older searches in autocomplete.

    Much more than that doesn't seem too helpful, though; three months is a whole lot of searches and should give plenty of information about what I'm searching for right now.

    --
    Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
    1. Re:Data retention at all, or more than 3 months? by olau · · Score: 2

      Won't argue about the study, which may very well be flawed, but I don't think your last assertion is correct.

      The data centers certainly aren't full of search histories. Let's say each person generates 1 KB of data per day in search history (with compression) - that's 1 TB/day to store data from 1 billion people. What's the marginal cost of storing that data per year? 100,000 dollars?
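
      The arithmetic above can be checked with a minimal sketch. It uses only the figures from the comment (1 KB per person per day, 1 billion people, and the guessed 100,000 dollars for a year's worth of data); the function names are hypothetical, and decimal units (1 TB = 10^12 bytes) are an assumption.

```python
# Back-of-envelope check of the storage estimate in the comment above.
# Inputs come from the comment itself; function names are illustrative only.
# Assumption: decimal units, i.e. 1 KB = 10**3 bytes and 1 TB = 10**12 bytes.

BYTES_PER_KB = 10**3
BYTES_PER_TB = 10**12


def daily_storage_tb(users, kb_per_user_per_day):
    """Terabytes of search history generated per day by all users."""
    return users * kb_per_user_per_day * BYTES_PER_KB / BYTES_PER_TB


def yearly_storage_tb(users, kb_per_user_per_day, days=365):
    """Terabytes accumulated over one year of daily logging."""
    return daily_storage_tb(users, kb_per_user_per_day) * days


def implied_cost_per_tb(total_cost, total_tb):
    """Per-terabyte cost implied by a guessed total cost (not a real price)."""
    return total_cost / total_tb


if __name__ == "__main__":
    per_day = daily_storage_tb(1_000_000_000, 1)
    per_year = yearly_storage_tb(1_000_000_000, 1)
    print(f"per day:  {per_day} TB")
    print(f"per year: {per_year} TB")
    print(f"implied cost per TB: {implied_cost_per_tb(100_000, per_year):.2f} dollars")
```

      Under those assumptions, a year of logging comes to 365 TB, so the 100,000 dollar guess works out to roughly 274 dollars per TB. Whether that is a realistic storage price is not something the comment establishes.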

      One thing you need to keep in mind is that a company like Google ultimately isn't storing data because of the value it provides to their users. They are storing the data because of the value Google themselves derive from it.

      This old data may be a treasure trove for Google, but only of marginal value to each user, and they would still fight very hard to keep it.

      The best way to keep the data is of course to tell people how important it is for helping us, and not to tell us about any analysis that people may find disgusting.

      The other day I read an example about data mining transactions in a bank. One of the goals was to identify alcoholics by looking at how much you're spending on booze or in bars. Telling, isn't it, how much you can infer about people just by looking at where they've been and what they've bought.

      Examples like that make you wonder what kind of labels we all might have inside the googleplex and similar.

  2. Suckers buy "predict the past" by davecb · · Score: 2

    And then the suckers "have" their advertiser send me ads for something they know I like... because I just bought it, and the advertisers know they can prove my interest to their customers/suckers.

    Net result? You get ads for stuff you bought.

    --
    davecb@spamcop.net
  3. How's that again? by 93+Escort+Wagon · · Score: 3, Insightful

    What has data retention got to do with search results? Advertising is why they want to hold onto all your data.

    --
    #DeleteChrome
    1. Re:How's that again? by GuB-42 · · Score: 2

      Both search results and advertising work the same way: they try to find the most relevant site for you. The fundamental difference is that one is paid and the other is not.
      And in both cases short-term data retention definitely helps. Long-term retention may give a marginal improvement. One area where long-term retention may help is with periodic tasks. For example, if you are doing your taxes, remembering what you did the year before may be helpful for both you (e.g., you found a great site listing deductibles) and advertisers (e.g., you considered hiring an accountant).