Slashdot Mirror


Yahoo Releases Largest Ever Machine Learning Dataset To Researchers (tumblr.com)

An anonymous reader writes: Yahoo Labs has released a record-breaking dataset containing 110 billion interactions from 20 million Yahoo News users in 1.5TB of zipped data. The anonymized data is intended for research initiatives in artificial intelligence, including user-behavior modeling, collaborative filtering techniques and unsupervised learning methods.

41 comments

  1. Garbage in... by Anonymous Coward · · Score: 2, Insightful

    Garbage out. Enjoy your 1.5tb of crap.

    1. Re:Garbage in... by Anonymous Coward · · Score: 0

      It's more than 1.5 TB. It's zipped currently. You are so stupid!

    2. Re:Garbage in... by Anonymous Coward · · Score: 0

      Maybe you're unfamiliar with machine learning.

      That's kinda the deal, always. Still works out pretty well, not always garbage on the end of it.

    3. Re:Garbage in... by Anonymous Coward · · Score: 0

      Ok, so it's 1.6TB unzipped.

      If they had used 7z, it would only be a few hundred megs.

    4. Re:Garbage in... by Anonymous Coward · · Score: 0

      Computers can only do what people tell them to do. They can't formulate their own independent thoughts.

      Like GP said, garbage in, garbage out.

  2. Yahoo Answers by Anonymous Coward · · Score: 0

    I hope it's populated with content from Yahoo Answers; that way the AI will never be as smart as a typical human.

    1. Re:Yahoo Answers by Anonymous Coward · · Score: 0

      Or: it will be as smart as a typical human. Which is not very much, crisis averted.

    2. Re:Yahoo Answers by epyT-R · · Score: 1

      because, you know, typical humans are intellectual powerhouses.

    3. Re:Yahoo Answers by Ol+Olsoc · · Score: 1

      because, you know, typical humans are intellectual powerhouses.

      Yup - which is why I often suggest to some of our dumber users to go to Yahoo and comment there.

      The whole issue with website commentary is that most postings are based on disagreeing with whatever was being commented on. In Yahoo's case, someone gets shot and killed, probably 90 percent of the comments are about anti-Gun control. A very small number is about any sympathy for the dead person's family.

      A negative story about Donald Trump is immediately responded to by supporters who believe that our constitutional rights can be preserved by suspending our constitutional rights.

      You would get the impression that loud wins. That stupid is the new smart. We even have a bit of that in here, although the level of smarts is a lot higher.

      --
      The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
    4. Re:Yahoo Answers by hairyfeet · · Score: 1

      That hasn't been what I've been seeing, in fact I find Yahoo comments to be quite fascinating as it shows the HUGE divide between the opinion of the average Joe and the political wonks.

      For those that haven't watched the sideshow lately that is Yahoo "News" let old Hairy fill you in, all the domestic stories? Have been completely taken over by a staff of the super left SJW/regressive type which is about as far from mainstream opinion as PETA and their "sea kittens" to the average person. When it came out that Trayvon Martin had posted love letters to assaulting people and had done a bunch of gangbanger poses with guns and dope? Yahoo refused to run any of that and instead ran interviews with his third grade teachers while replacing the already bullshit pic of him at 14 for one in the third grade, causing anonymous to cook up that Martin/Zimmerman meme pic. They did the same thing to Mike Brown, was quick to jump on the "poor Syrians didn do nuffins!" when the sky high rapes and assaults started pouring in, I swear it's News by Tumblr blog.

      So you will see a lot of backlash in their comments, NOT just to be contrary though, but to show just how out of touch the SJW/Regressive faction of the left is to the mainstream, it's pretty fascinating from a cultural point of view.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    5. Re:Yahoo Answers by Anonymous Coward · · Score: 0

      You idiot, this dataset is not of user comments. It is of user *interactions* on the Yahoo news site.

    6. Re:Yahoo Answers by Ol+Olsoc · · Score: 1

      > So you will see a lot of backlash in their comments, NOT just to be contrary though, but to show just how out of touch the SJW/Regressive faction of the left is to the mainstream, it's pretty fascinating from a cultural point of view.

      You are proving my point. You are seeing and replying to stuff that pisses you off, and using that as a universal attribute.

      People who might have a liberal bent will probably go to conservative looking articles to troll, and those with a conservative or Neocon bent will probably go likewise to any story that sounds liberal and troll there.

      I mostly go to the sports section, particularly NHL ice hockey. As a Pittsburgh Penguins fan, I see that the comments after every story are largely fans of other teams whining about Sid Crosby, the team captain, or as they call him, "Cindi Crysby". If you take the comments at face value, the most popular players are the least popular.

      Even from a pop culture perspective, let's take, for example, Kim Kardashian. If you read the comments on every story, they are almost all people begging for fewer stories about her. I even added a few of those until I figured out that Yahoo was trolling everyone.

      At this point, I find Yahoo comments are almost a 180 out of reality.

      --
      The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
    7. Re:Yahoo Answers by Anonymous Coward · · Score: 0

      Did you get your hairy feet from jacking off lots of cocks with your feet?

    8. Re:Yahoo Answers by hairyfeet · · Score: 1

      Actually YOU are proving my point by automatically ASSUMING based on your beliefs, want proof?

      Read the post again, can you find a single instance where it said I actually posted there? I simply pointed out how huge the divide was between the average person and the regressive left, which just FYI was coined by socialists to describe those trying to take over the democratic party in the same vein as the moral majority took over the republicans in the early 80s.

      So next time instead of bringing your own biases into every conversation why don't you try actually reading what has been posted, hmm?

      --
      ACs don't waste your time replying, your posts are never seen by me.
  3. Only if student or faculty at university... by kbonin · · Score: 1

    Otherwise no access is granted. Which means I'll have to wait a few hours for a torrent to appear, fine...

    1. Re:Only if student or faculty at university... by webmistressrachel · · Score: 0

      Troll, incorrect. I'm downloading my first 423Mb chunk now. Requires Yahoo! account, which I just created.

      --
      This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
    2. Re:Only if student or faculty at university... by webmistressrachel · · Score: 5, Informative

      Wait wait wait... mod me down... it made me sign up, then it made me fill in more forms, then agree to all sorts of EULAs, THEN it demanded a university email address.... Sorry everyone. My download is stopped. And I just corrected the GP, wrongly. Sorry! (ducks and prepares to lose karma)

      --
      This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
    3. Re:Only if student or faculty at university... by Anonymous Coward · · Score: 0

      I agree - hit the same problem. I don't understand why a university email address is required. I am legitimately interested in the N-Gram dataset.

    4. Re:Only if student or faculty at university... by Cederic · · Score: 1

      At a guess, it's because they've provided the data set for academic research purposes.

      It's their data, it's a reasonable restriction and if there isn't already a torrent available with it anyway then I'll be surprised.

    5. Re:Only if student or faculty at university... by sbrown7792 · · Score: 1

      If all they're looking for is a .edu email address (which they probably are), sign up for a free account at australia.edu

    6. Re:Only if student or faculty at university... by Anonymous Coward · · Score: 0

      No worries there, guy. Every man makes mistakes from time to time.

  4. "is intended for research" by turkeydance · · Score: 1

    best of intentions, road paved.

  5. Something Useful/Relevant?! by Scottingham · · Score: 1

    Holy crap! Yahoo released something actually useful and arguably innovative? I'm genuinely surprised.

    This could be an interesting direction for Yahoo.

    ML is the bee's knees.

    PS-I just looked up the etymology on 'the bee's knees' and it's moderately interesting:
    https://en.wiktionary.org/wiki...

  6. Tumblr? by Anonymous Coward · · Score: 0

    The data release, part of the company’s Webscope initiative and announced on Yahoo’s Tumblr blog

    What kind of news company is Yahoo when they have to make their own press releases on Tumblr.

    1. Re:Tumblr? by xxxJonBoyxxx · · Score: 1

      >> What kind of news company is Yahoo when they have to make their own press releases on Tumblr.

      I think Couric was busy covering the Kardashians' newest pet.

  7. Somewhat outdated by Snotnose · · Score: 2

    For the last couple years I've been hitting their comics page daily, from there I'd sometimes go to finance and then regular news. Last month they nuked the comics page, and when I went to the finance page they had one of those annoying floating opaque ads that want you to click in them to make them go away. No thanks.

    Haven't been to yahoo since. My reasons for going have been either A) removed; or B) made untrustworthy.

    Icing on the cake? For about a week I kept trying to get the comics page, hoping it was a mistake. Then my google newsfeed told me that yahoo had deliberately deleted it. Not yahoo news, google news. Good job, yahoo.

    1. Re:Somewhat outdated by onepoint · · Score: 1

      Forbes and Yahoo seem to be the leading attack point for virus entry. I consistently read about it, so you might be very lucky

      and to cite sources :

      Forbes https://www.hackread.com/forbe...

      and yahoo's https://blog.malwarebytes.org/...

      SideNote: Yahoo's finance page was considered one of the best until recently (sorry, no source to cite), so I am going to guess that a new attack point will show up in due time

      --
      if you see me, smile and say hello.
  8. This? by Anonymous Coward · · Score: 0

    This from the company who thought it'd be a good idea to get into video, but not put commercials in it or even charge? (Talking about Yahoo! Screen on certain devices.)

    1. Re:This? by Anonymous Coward · · Score: 0

      This from the company that paid Mark Cuban $5 Billion for his worthless bullshit company and promptly dismantled it.

    2. Re:This? by BeaverCleaver · · Score: 1

      Um.... that DOES sound like a good idea. Maybe not for Yahoo stockholders, but for anyone who wants free video without those annoying ads...

  9. Bot by Anonymous Coward · · Score: 0

    Porn bot ai..

  10. Not going to be anonymous for long. by KingBozo · · Score: 2

    My evil AI machine learning algorithms should have this problem licked post haste.

    1. Re:Not going to be anonymous for long. by Anonymous Coward · · Score: 0

      Clustering != identity unless there are components of personally identifiable information remaining.

  11. Apropos tagline: by Anonymous Coward · · Score: 0

    According to all the latest reports, there was no truth in any of the earlier reports.

  12. how_not_to_build_a_news_site.zip by tlambert · · Score: 1

    The file is named "how_not_to_build_a_news_site.zip"

    I'm guessing the university email address requirement is because they don't want someone using the data for commercial purposes, and ending up becoming as successful as Yahoo currently is...

    It's nice of them to look out for us like that.

  13. I humbly request the wisdom of the Cube and Xalton by Bosconian · · Score: 1

    How many SOMADs will this dataset create? I shudder to contemplate what pure depravities will be distilled from these "interactions."

    --
    Scarce, scared, scarred, sacred... -Col. Bruce Hampton
  14. Don't hook it up by Anonymous Coward · · Score: 0

    1.5TB of user data, and all it does is download porn all day long.

  15. Way to go! by Anonymous Coward · · Score: 0

    Yahoo? Learn from 20 million old people, build a Geezer AI.

  16. "Anonymized Data" by xxxJonBoyxxx · · Score: 1

    >> 110 billion interactions from 20 million Yahoo News users in 1.5TB of zipped data. The anonymized data

    Which will be DE-anonymized in 3...2...1...

    1. Re:"Anonymized Data" by ConceptJunkie · · Score: 1

      Yeah, I recall when AOL released some anonymized data about 10 years ago, and it was de-anonymized pretty quickly.

      --
      You are in a maze of twisty little passages, all alike.