Slashdot Mirror


Obtaining Archives of USENET?

Academic Researcher asks: "Google took over Deja and built the mammoth Google Groups, which is a near-complete archive of USENET dating back to the early 1980s. For an academic project, I need to analyse a lot of USENET data. The Terms of Service for Google Groups do not allow automated access (and even if they did, I'd have to write a bunch of tools and reverse engineer it all). Inquiries about purchasing copies of the archive have gone unanswered. No one else seems to have such an archive. Apart from this meaning that the world has one single USENET archive (I hope they have backup floppies!), how can I obtain historical data for research purposes? I'd happily pay money for DVDs of archival material if they were available. Can anyone help?"

6 of 86 comments

  1. You understand how much data you're talking about? by rthille · · Score: 4, Interesting


    You're going to want to talk with Jim Gray of this article:
    http://slashdot.org/article.pl?sid=03/07/10/2352251

    --
    Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
  2. History of selling Usenet archives by shoppa · · Score: 5, Interesting
    In the early 1990s, a company in Vancouver BC proposed a monthly distribution of Usenet via CD. This sparked extensive discussions in the newsgroups (at the time almost exclusively dominated by academic people, of course) with a lot of resentment that some company was going to be making money by selling *their* posts. (Do a Google Groups search for "usenet on CD" to see some of these. They also mention Walnut Creek.)

    In any event the massive number of binary posts (porn, movies, warez, etc) on usenet in the past few years would make the "full" archive of the past few years number in the tens of thousands of CDs. A "full" usenet feed passed up the bandwidth of a T1 about 1998 IIRC.

    Some individuals archive individual usenet groups, or the group is gatewayed back and forth to a mailing list that is archived. This IMHO is more appropriately manageable for research.

    The announcement of Google Groups with a 20 year archive acknowledges several sources for the broad timeframe of the archive (as well as the donors to the preceding Dejanews archive); you might want to check out their specific work.

    1. Re:History of selling Usenet archives by dougmc · · Score: 4, Interesting
      "In any event the massive number of binary posts (porn, movies, warez, etc) on usenet in the past few years would make the "full" archive of the past few years number in the tens of thousands of CDs. A "full" usenet feed passed up the bandwidth of a T1 about 1998 IIRC."
      NOBODY has a full archive of the posts made to alt.binaries.* -- I doubt that even the NSA has that much storage to spare for it.

      I believe that a full feed is around 300 GB per day now, with 99+% of that being alt.binaries.*.
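      For a rough sense of scale, here is a back-of-the-envelope sketch of those figures. The 300 GB/day feed size is the estimate given above and the T1 rate is the standard 1.544 Mbit/s; the 700 MB CD capacity and the use of decimal units are assumptions made for illustration, not figures from this thread.

```python
# Back-of-the-envelope Usenet feed arithmetic (decimal units throughout).
T1_BITS_PER_SECOND = 1_544_000        # standard T1 line rate
SECONDS_PER_DAY = 86_400
CD_BYTES = 700 * 10**6                # assumed capacity of one CD
FEED_BYTES_PER_DAY = 300 * 10**9      # full-feed estimate quoted in the thread


def t1_bytes_per_day() -> int:
    """Bytes a fully saturated T1 can carry in 24 hours."""
    return T1_BITS_PER_SECOND * SECONDS_PER_DAY // 8


def cds_needed(num_bytes: int) -> int:
    """Number of CDs needed to hold num_bytes, rounded up."""
    return -(-num_bytes // CD_BYTES)


if __name__ == "__main__":
    print("T1 capacity, bytes/day:", t1_bytes_per_day())
    print("T1 lines needed for the feed:",
          round(FEED_BYTES_PER_DAY / t1_bytes_per_day(), 1))
    print("CDs per day of full feed:", cds_needed(FEED_BYTES_PER_DAY))
    print("CDs per year of full feed:", cds_needed(365 * FEED_BYTES_PER_DAY))
```

      Under these assumptions a saturated T1 moves under 17 GB a day, so a 300 GB/day feed needs roughly 18 of them, and a single year of the full feed would fill well over a hundred thousand CDs.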

  3. Re:Google by Big+Sean+O · · Score: 4, Interesting

    If it's really for an academic project (say a master's thesis), you might want to direct your inquiry to Craig Silverstein. Since he's a grad student, it's likely he would be more interested in your project than the Google corporate types.

    Who knows, mebbe you can parlay the project into an internship into a real live job.

    --
    My father is a blogger.
  4. Re:Hey, that's my copyrighted data... by Arioch+of+Chaos · · Score: 3, Interesting

    Seriously, this should actually work. You do not automatically waive all rights just by posting to usenet. I don't think disk space or bandwidth is the only thing stopping them from archiving binaries. If they were not concerned about getting sued, wouldn't they at least archive all the popular binaries (i.e. porn and warez)? They could easily become the world's largest pay site ;-)

    --
    IAAAL - I am actually a lawyer ;-)
  5. Re:Hey, that's my copyrighted data... by Anonymous+Brave+Guy · · Score: 2, Interesting

    Hey... Cunning thought... <ahem> DMCA... <ahem> subpoena... <ahem>

    Given that Google may well not have a legal leg to stand on making any sort of money out of using posts where copyright is owned by the poster (no, I don't buy the "you've given implicit agreement" arguments for a second, and I'm betting they don't want to risk their whole business finding out whether a court does either) there's an interesting "deal" to be made here. :-)

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.