Yahoo Releases Largest Ever Machine Learning Dataset To Researchers (tumblr.com)
An anonymous reader writes: Yahoo Labs has released a record-breaking dataset containing 110 billion interactions from 20 million Yahoo News users in 1.5TB of zipped data. The anonymized data is intended for research initiatives in artificial intelligence, including user-behavior modeling, collaborative filtering techniques and unsupervised learning methods.
Garbage out. Enjoy your 1.5TB of crap.
I hope it's populated with content from Yahoo Answers; that way the AI will never be as smart as a typical human.
Otherwise no access is granted. Which means I'll have to wait a few hours for a torrent to appear, fine...
best of intentions, road paved.
Holy crap! Yahoo released something actually useful and arguably innovative? I'm genuinely surprised.
This could be an interesting direction for Yahoo.
ML is the bee's knees.
PS: I just looked up the etymology of 'the bee's knees' and it's moderately interesting:
https://en.wiktionary.org/wiki...
The data release, part of the company’s Webscope initiative and announced on Yahoo’s Tumblr blog
What kind of news company is Yahoo when they have to make their own press releases on Tumblr?
For the last couple of years I've been hitting their comics page daily; from there I'd sometimes go to finance and then regular news. Last month they nuked the comics page, and when I went to the finance page they had one of those annoying floating opaque ads that want you to click on them to make them go away. No thanks.
Haven't been to Yahoo since. My reasons for going have been either A) removed or B) made untrustworthy.
Icing on the cake? For about a week I kept trying to get the comics page, hoping it was a mistake. Then my Google News feed told me that Yahoo had deliberately deleted it. Not Yahoo News, Google News. Good job, Yahoo.
This from the company who thought it'd be a good idea to get into video, but not put commercials in it or even charge? (Talking about Yahoo! Screen on certain devices.)
Porn bot AI...
My evil AI machine learning algorithms should have this problem licked post haste.
According to all the latest reports, there was no truth in any of the earlier reports.
The file is named "how_not_to_build_a_news_site.zip"
I'm guessing the university email address requirement is because they don't want someone using the data for commercial purposes, and ending up becoming as successful as Yahoo currently is...
It's nice of them to look out for us like that.
How many SOMADs will this dataset create? I shudder to contemplate what pure depravities will be distilled from these "interactions."
Scarce, scared, scarred, sacred... -Col. Bruce Hampton
1.5TB of user data, and all it does is download porn all day long.
Yahoo? Learn from 20 million old people, build a Geezer AI.
>> 110 billion interactions from 20 million Yahoo News users in 1.5TB of zipped data. The anonymized data
Which will be DE-anonymized in 3...2...1...