Google, Bing, Yahoo Data Retention Doesn't Improve Search Quality, Study Claims (theregister.co.uk)
A new paper released on Monday via the National Bureau of Economic Research claims that retaining search log data doesn't do much for search quality. "Data retention has implications in the debate over Europe's right to be forgotten, the authors suggest, because retained data undermines that right," reports The Register. "It's also relevant to U.S. policy discussions about privacy regulations." From the report: To determine whether retention policies affected the accuracy of search results, Chiou and Tucker used data from metrics biz Hitwise to assess web traffic being driven by search sites. They looked at Microsoft Bing and Yahoo! Search during a period when Bing changed its search data retention period from 18 months to 6 months and when Yahoo! changed its retention period from 13 months to 3 months, as well as when Yahoo! had second thoughts and shifted to an 18-month retention period. According to Chiou and Tucker, data retention periods didn't affect the flow of traffic from search engines to downstream websites. "Our findings suggest that long periods of data storage do not confer advantages in search quality, which is an often-cited benefit of data retention by companies," their paper states. Chiou and Tucker observe that the supposed cost of privacy laws to consumers and to companies may be lower than perceived. They also contend that their findings weaken the claim that data retention affects search market dominance, which could make data retention less relevant in antitrust discussions of Google.
Because I bet the 3-month retention is a huge boost, if only in giving me history of older searches in autocomplete.
Much more than that doesn't seem too helpful, though; three months is a whole lot of searches, and should give plenty of information about what I'm searching for right now.
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
Constantly looking at old, trivial info leads to jack shit.
News at 11.
SEO, sponsored results and the well-established fact that almost nobody goes past a page or two of results... this seems like an obvious result.
As they are being taken out of context. Many websites imported Usenet newsgroups; a popular one was one I frequented.
Those would be best removed, yet none I regret; other than some of the websites they ended up at.
for liquidation.
The only reason for data retention is tracking. 3 months ago I was searching for info on what DVDs came out recently. Yesterday I searched for what DVDs came out recently.
My searches tend to be pretty random. Someone started showing The Incredible Hulk a few months ago. I searched for the show, Bill Bixby, the guy who played the reporter, and Lou Ferrigno. Why? Not cuz I want to buy them, but because I don't have it in me to just sit back and watch a TV show nowadays.
Hey, Hill Street Blues! Never caught it 30 years ago, let's google the hell out of it.
Hey, NYPD Blue! Never caught it 30 years ago, let's google the hell out of it.
And then the suckers "have" their advertiser send me ads for something they know I like... because I just bought it, and the advertisers know they can prove my interest to their customers/suckers.
Net result? You get ads for stuff you bought.
davecb@spamcop.net
What impact does this have on I.T.?
I regularly search for things three to five years old; sometimes I even find my own solutions on a website somewhere.
If the data retention has no effect on searches three to five years apart, on well aged data, then I've no problem with lower data retention.
Over time I've noticed various programming-related phrases that come up as the first result if I'm on my account, but are buried if I'm not.
So, I'd say it works good for me. Now, if things need to be stored long-term to get the same benefit versus, say, only a couple months, I have no idea.
What has data retention got to do with search results? Advertising is why they want to hold onto all your data.
#DeleteChrome
For me, there were two articles from the 80's that I remember in either Popular Science or Popular Mechanics that were relevant to /. stories in the past couple months. Unfortunately, not even the respective websites could be of any help. I would have really gotten a kick out of reading both articles, but it wasn't to be.
It comes down to information quality. Most forum answers to a question have a five-year or less value, and while an archive of my travel stories from a couple decades ago might be fun for my personal nostalgia... I doubt people are really going to be searching Google for Scuba J 2000 and stumble across something with all the noise from Facebook and Instagram today.
A 12 month warranty, with the 13 month retention you can bet they'll be looking for the same product once it goes kaput shortly after the warranty expires... another brick in the wall...
I could have told them that. The quality of Google searches has been going down for at least a decade, probably for as long as they have been tracking users.
As others have mentioned, the problem is that all that retention can be used for is predicting the past. Using my previous searches to predict what I'm going to search for? More like predicting what I'm not going to search for again. Using what I previously clicked on to predict which search results I'm likely to click on again? The reason I'm still searching is that none of those results solved my problem.
Now that I think about it, I wonder if that's how StackOverflow and competitors keep getting on top. By having hundreds of posts without solutions, they get a ton of clicks from people hoping to solve a problem, and every single one of those clicks pushes those posts up even further. A site with a solution to every problem would get one click from a person with a problem. One with hundreds of questions and no solutions gets hundreds of clicks.
"As a measure of accuracy, we examine whether a consumer repeats a search or navigates to a new site." This is a pretty awful way to measure search accuracy. It's much more complicated than this. There's a gradation of results. I might click on a result that's "good enough" even though it's not really what I'm looking for. I might click on a result because I can't tell from the title and snippet whether or not it's what I want. There could be a degradation in quality that this simple metric completely misses. They really need to do some search quality annotation to be sure.
Logging your searches has nothing whatsoever to do with improving the quality of search results because Google, Bing, and Yahoo don't give a damn about YOU, you're just a farm animal that produces data that they sell to so-called 'partner companies' that turn around and shove ads in your feed-box, expecting you to gobble them up, then defecate money that businesses scoop up to put in their pockets. I'm only half-surprised that they don't claim rights to our corpses when we die so they can sell our organs and render the rest of our body parts to make glue or lampshades or whatever. #CapitalismGoneBad
Data retention does improve the only thing they care about: monetization.
So I'm confused... the study is about "search quality", but I don't understand how they define that term. They were looking at search engines that changed their retention policy. They evaluated search quality before and after. That part sounds good.
It seems that they counted the number of users coming from search engine A and landing at site B before and after. Can anyone explain how that's an indicator of search quality? Perhaps they want to measure if the search engine lost or gained users?
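One possible reading, going by the line from the paper quoted elsewhere in this thread ("whether a consumer repeats a search or navigates to a new site"): treat a search as a miss if the user's next action is another search, and as a hit if the next action is a site visit, then compare the miss rate before and after the retention change. Here is a rough Python sketch of that idea. To be clear, this is a guess at the general shape of such a metric, not the authors' actual method; the `(day, user, action)` tuple layout, the function names, and the dates are all made up for illustration.

```python
from collections import defaultdict
from datetime import date


def repeat_search_rate(events):
    """Share of searches followed by another search instead of a site visit.

    `events` is a time-ordered list of (day, user, action) tuples, where
    action is "search" or "visit". This layout is hypothetical.
    Returns None when there are no searches at all.
    """
    # Group each user's actions, preserving their order.
    per_user = defaultdict(list)
    for _day, user, action in events:
        per_user[user].append(action)

    searches = repeats = 0
    for actions in per_user.values():
        # Pair every action with the one that follows it (None at the end).
        for current, following in zip(actions, actions[1:] + [None]):
            if current != "search":
                continue
            searches += 1
            if following == "search":
                repeats += 1
    return repeats / searches if searches else None


def before_after(events, change_day):
    """Repeat-search rate before and from a (hypothetical) policy change day."""
    before = [e for e in events if e[0] < change_day]
    after = [e for e in events if e[0] >= change_day]
    return repeat_search_rate(before), repeat_search_rate(after)


# Made-up sample: user "a" searches twice before visiting a site,
# user "b" searches once and visits a site straight away.
sample = [
    (date(2000, 1, 1), "a", "search"),
    (date(2000, 1, 1), "a", "search"),
    (date(2000, 1, 1), "a", "visit"),
    (date(2000, 3, 1), "b", "search"),
    (date(2000, 3, 1), "b", "visit"),
]
rates = before_after(sample, date(2000, 2, 1))
```

If the two rates come out about the same, you would say the change made no measurable difference by this metric. A naive before/after split like this can't tell a policy effect apart from anything else that changed over the same period, so a real analysis would need some kind of control on top of it.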