Slashdot Mirror


PetaBox: Big Storage in Small Boxes

An anonymous reader writes "LinuxDevices.com is reporting that a Linux-based system comprising more than a petabyte of storage has been delivered to the Internet Archive, the non-profit organization that creates periodic snapshots of the Internet. The PetaBox products, made by Capricorn Technologies, are based on Via mini-ITX motherboards running Debian or Fedora Linux. The IA's PetaBox installation consists of about 16 racks housing 600 systems with 2,500 spinning drives, for a total capacity of roughly 1.5 petabytes, according to the article. Now to strap one of those puppies to my iPod!" The Internet Archive continues to astound.
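
As a rough sanity check on the figures quoted above, the per-system and per-drive averages can be worked out directly. This is only a back-of-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes) and treats the article's "about" and "roughly" numbers as exact, which they are not.

```python
# Back-of-envelope averages from the figures quoted in the summary.
# Assumptions: decimal units (1 PB = 10**15 bytes) and exact counts.
RACKS = 16
SYSTEMS = 600
DRIVES = 2500
TOTAL_BYTES = 1_500_000_000_000_000  # roughly 1.5 PB

drives_per_system = DRIVES / SYSTEMS            # average drives in one system
gb_per_drive = TOTAL_BYTES / DRIVES / 10**9     # average drive size in GB
tb_per_system = TOTAL_BYTES / SYSTEMS / 10**12  # average capacity per system in TB
systems_per_rack = SYSTEMS / RACKS              # average systems in one rack

print(f"{drives_per_system:.2f} drives per system")
print(f"{gb_per_drive:.0f} GB per drive")
print(f"{tb_per_system:.1f} TB per system")
print(f"{systems_per_rack:.1f} systems per rack")
```

Since the inputs are rounded, the results are only averages implied by the summary, not specifications of the actual hardware.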

295 comments

  1. Good to see. by Anonymous Coward · · Score: 5, Funny

    For all the jokes out there about people 'downloading the internet' it's good to know someone is actually doing it.

    1. Re:Good to see. by FireballX301 · · Score: 4, Funny

      Who the heck cares about the rest of the internet, can this thing hold all the pr0n?

    2. Re:Good to see. by Anonymous Coward · · Score: 5, Funny

      But does it run Lin... um.

      How about a Beo.. oh damn

    3. Re:Good to see. by BlackMesaLabs · · Score: 0, Redundant

      In soviet russia, internet downloads YOU!

    4. Re:Good to see. by bigberk · · Score: 2, Insightful

      people from my univ might recognize this... there was a famous guy in our engineering faculty who, back in the 90s, had written some kind of an automated porn downloading app. It was running on their UNIX servers but he left it running unattended. apparently he had no quota because within a few days he had filled up the entire system storage with porn, several hundreds of megabytes worth which was very substantial back then.

      I had a similar experience, I was playing around on irc back when we were swapping video files through DCC. apparently some downloading got out of hand and paged the admin, who contacted me and politely pointed out that I had a process running wild and filling /tmp... oops, must be an experiment gone wrong I had to say

    5. Re:Good to see. by Panaphonix · · Score: 1

      In America, Google does it for no money!

    6. Re:Good to see. by -brazil- · · Score: 1

      Reminds me of that little program I had to write for a class whose job it was to print all the permutations of the numbers from 1 to n (n being a parameter). Happy that I'd found a very efficient solution I tested it with n=3, n=4, and all was fine. Then, for kicks, I started it with n=20. I knew that it would take a while, so I did a back-of-an-envelope calculation of how much RAM it might need to assemble the result... and came up with something like 20GB. Which was more than all the HDs of the 50 computers in the pool together had at that time. Oops.
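
      How big the result gets depends on what is assumed. A minimal sketch of the arithmetic, assuming all n! permutations are held in memory at once and each number takes one byte (neither is stated in the post; `permutation_storage_bytes` is a hypothetical helper):

```python
import math

def permutation_storage_bytes(n, bytes_per_number=1):
    """Bytes needed to hold all n! permutations of 1..n at the same time.

    Assumption: each permutation is stored in full as n numbers of
    bytes_per_number bytes each, with no per-object overhead.
    """
    return math.factorial(n) * n * bytes_per_number

for n in (4, 12, 20):
    print(n, math.factorial(n), permutation_storage_bytes(n))
```

      Under these assumptions n=12 already needs about 5.7 GB, and n=20 about 4.9 × 10^19 bytes, so the total grows far faster than intuition suggests.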

      --

      The illegal we do immediately. The unconstitutional takes a little longer.
      --Henry Kissinger

    7. Re:Good to see. by paulatz · · Score: 1

      Have you ever tried to compile a LaTeX document including itself?

      --
      this post contain no useful information, no need to mod it down
    8. Re:Good to see. by poopdeville · · Score: 1

      Yep. I got the error:

      ! LaTeX Error: \include cannot be nested.

      Not particularly impressive.

      --
      After all, I am strangely colored.
    9. Re:Good to see. by Council · · Score: 5, Interesting

      In one of the weirder perspective exercises I've ever conceived:

      5 petabytes of storage is enough for a brief five-minute DVD-quality sex scene for each person of legal age in the US (two to a scene). 100 petabytes would be five minutes of porn of every pair of people in the world.

      I actually wonder about this a little; how many women have posed nude on the internet? There seem to be an awful lot; I haven't been able to see them all (though I will continue to try). Where do they mostly come from, I wonder.

      --
      xkcd.com - a webcomic of mathematics, love, and language.
    10. Re:Good to see. by DJCF · · Score: 1

      *Writes above factoid down* I love it!

      I've always wanted to know the answer to your second question as well, actually, hopefully someone else will be able to answer or at least give some interesting insights. Another question is "why do they do it?", and it's not something I've easily been able to work out. A friend of mine (one of those born-again Christian types) admitted in one of those email forward-things to posing nude, which I can't quite believe. Another friend's ex-girlfriend apparently has a suicidegirls.com account, which is why he never visits the site anymore. Unfortunately, going up to people and asking "Why do you pose nude?" isn't exactly the best way to make friends or influence people.

      So yeah. How many have done it, and why do they do it?

    11. Re:Good to see. by budgenator · · Score: 1

      Most of them probably have an exhibitionist streak in them and tend to need their self-esteem externally reinforced. A good photographer/director makes a session almost seductive for the model and can get many to go a lot farther than the model/actress intended. It's interesting how a house-mouse can turn into a wild-cat with the right push. Photographers almost always retain full rights to their photos, which can be interesting because a set of nudes can be taken of a young starving wannabe actress, forgotten for years, only to resurface just after the big-name actress's new movie sets box-office records. Money is always a strong motivator also.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    12. Re:Good to see. by Moderatbastard · · Score: 1, Funny

      Even the *cough* russian *cough* sort? In that case they should call it a pedobox.

      --
      1/3 of jokes get modded OT. If you get the joke, mod 1 in 3 insightful/interesting/underrated to restore karma balance.
    13. Re:Good to see. by clone22 · · Score: 1

      That would make you a peta-file.

      --
      Ask me about my vow of silence!
    14. Re:Good to see. by zkn · · Score: 0, Redundant

      How about: In soviet russia, the internet downloads you.

    15. Re:Good to see. by Antique+Geekmeister · · Score: 1

      They're all the same woman. It's amazing what you can do with a false nose and glasses....

    16. Re:Good to see. by Mark+Hood · · Score: 4, Funny

      There seem to be an awful lot; I haven't been able to see them all (though I will continue to try). Where do they mostly come from, I wonder.

      Let me get this straight, you're trying to see all the porn in the world, and you still don't know where babies come from? :)

      --
      Liked this comment? Why not buy me something nice
    17. Re:Good to see. by Anonymous Coward · · Score: 0

      I'm glad I'm not the only one who's wondered this too.

      But you have to admit, there are cycles. Popular, *legitimate* pornstars aside, you generally see a handful of the same girls all around the same time. A lot of them, I figure are licensed in a sense by those larger content providing companies. You see the link on the bottom of the page looking for Webmasters to make money. Once they get too old, I suppose, then they find new girls.

      Even taking those girls out of the equation, there still seems to be a large pool of amateur *talent* that is seemingly endless... and the things they will do, unbelievable. If there are so many girls out there who will do a threesome (which is now mild) for the camera, how come they're so hard to find on a night on the town? ;)

      Many of these girls are very pretty. You think they'd just do what other pretty girls do... control men with the possibility of sex!

    18. Re:Good to see. by callipygian-showsyst · · Score: 0, Offtopic
      5 petabytes of storage is enough for a brief five-minute DVD-quality sex scene for each person of legal age in the US (two to a scene). 100 petabytes would be five minutes of porn of every pair of people in the world.

      If you're dealing with petabytes, you'll have "petafiles". Why constrain yourself to "legal age?"

    19. Re:Good to see. by jzuska · · Score: 1

      California

    20. Re:Good to see. by paulatz · · Score: 1
      You must use a trick, save this code as test.tex:
      \documentclass[12pt,a4paper,oneside,italian]{amsbook}
      \usepackage{pslatex}
      \usepackage[T1]{fontenc}
      \usepackage[latin1]{inputenc}
      \usepackage{graphicx}

      \makeatletter
      \usepackage{babel}
      \makeatother
      \begin{document}
      test

      \includegraphics{test.ps}
      \end{document}
      then run
      $ latex test.tex
      $ dvips -o test.ps test.dvi
      and wait until the rendering is finished. (Yes, I'm offtopic)
      --
      this post contain no useful information, no need to mod it down
    21. Re:Good to see. by Anonymous Coward · · Score: 0

      For a few months, I dated one of the fine ladies on nakkidnerds. She posed mostly for the money. She also, I think, liked the sort of attention she sometimes got from being a pin-up girl... but mostly it was the dough. She's in one of those very difficult jobs where you do a lot of hard, worthwhile work for very little money.

    22. Re:Good to see. by Anonymous Coward · · Score: 0

      5 petabytes of storage is enough for a brief five-minute DVD-quality sex scene for each person of legal age in the US (two to a scene).

      Five minutes? Egad, have you not heard of foreplay and lots of it?

      Where do they mostly come from, I wonder.

      Speaking of professionally-shot amateur-model porn:
      Florida and California, the only places in the US where porn laws are open enough to be good for business. LA, of course, but also Tampa and Miami. Drop an ad for "nude modelling" in any college paper and you'll get more women than you can possibly shoot. It's all about the money for most of them (Ladies: Don't let yourself get gypped! Good producers pay *very* well for video.) A few of them definitely have "issues" that they're trying to work out (with mixed success).

      Also, a surprisingly large amount of what you see in the US is made in Canada.

      (I used to edit porn. There's no better way to suck the fun out of it.)

    23. Re:Good to see. by Lilah · · Score: 1

      You know what else is good to see? Two sys admins macin in a DC. http://www.7simga.biz/ Check it out. It's free. Peace.

      --
      The mark of your ignorance is the depth of your belief in injustice and tradgedy. What the caterpillar calls the end of
    24. Re:Good to see. by SaltyMonkey · · Score: 1

      Sysadmin soft porn. More of what this industry needs.

    25. Re:Good to see. by Anonymous Coward · · Score: 0

      I'm packing right now for Canada and/or Miami! Do you think HotJobs.com has opening for porn video editor?

    26. Re:Good to see. by billcopc · · Score: 1

      Cuz it's fun, silly!

      And it's basically free money for anyone who doesn't have their head stuck in Jesusland.

      --
      -Billco, Fnarg.com
    27. Re:Good to see. by DJCF · · Score: 1
      free money for anyone who doesn't have their head stuck in Jesusland.

      And even those who do, apparently. And the girl with the SG account doesn't get paid for it though because she's not a featured model. Although "It's fun" wasn't quite the kind of long, detailed, insightful answer I was anticipating, cheers anyway.

    28. Re:Good to see. by ncc74656 · · Score: 1
      For all the jokes out there about people 'downloading the internet' it's good to know someone is actually doing it.

      It's gonna take a sh*tload of paper to print it all out so the PHB can read it, though...or has his Etch-a-Sketch been suitably upgraded yet?

      --
      20 January 2017: the End of an Error.
    29. Re:Good to see. by PingPongBoy · · Score: 1



      to print all the permutations of the numbers from 1 to n (n being a parameter).

      Then, for kicks, I started it with n=20. I knew that it would take a while, so I did a back-of-an-envelope calculation of how much RAM it might need to assemble the result... and came up with something like 20GB


      Now wait. Once you've considered a permutation, you can forget it. You don't need more than O(n) memory.

      If you were selecting without replacement the permutations at random, you will need to store indexes to the permutations, but the indexes can be compressed whenever blocks of permutations are selected.
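
      The O(n) claim can be illustrated with the classic "next permutation" step, which turns each permutation into its lexicographic successor in place. This is a sketch of one well-known approach, not the algorithm from the original assignment; `permutations_in_place` is a name chosen here for illustration.

```python
def permutations_in_place(n):
    """Yield every permutation of 1..n in lexicographic order.

    Only one working list of length n is kept; each permutation is
    derived from the previous one, so extra memory stays O(n).
    """
    perm = list(range(1, n + 1))
    while True:
        yield tuple(perm)
        # Find the rightmost position i with perm[i] < perm[i + 1].
        i = n - 2
        while i >= 0 and perm[i] >= perm[i + 1]:
            i -= 1
        if i < 0:
            return  # perm is fully descending, so it was the last one
        # Find the rightmost element larger than perm[i] and swap them.
        j = n - 1
        while perm[j] <= perm[i]:
            j -= 1
        perm[i], perm[j] = perm[j], perm[i]
        # Reverse the suffix so it is in ascending order again.
        perm[i + 1:] = reversed(perm[i + 1:])

for p in permutations_in_place(3):
    print(p)
```

      Each permutation can be printed and forgotten as soon as it is produced, so memory use does not depend on how many permutations there are, only on n.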

      --
      Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
    30. Re:Good to see. by -brazil- · · Score: 1

      Now wait. Once you've considered a permutation, you can forget it. You don't need more than O(n) memory.

      Depends on the algorithm. My algorithm required keeping all permutations for 1..n-1 around in order to compute those for 1..n. It was faster than all other solutions (including that of the professor).
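
      The post does not say what the algorithm actually was. One scheme with the memory behaviour described, shown here purely as an assumed illustration, builds each level by inserting the new number at every position of every shorter permutation (`permutations_by_insertion` is a hypothetical name):

```python
def permutations_by_insertion(n):
    """Build all permutations of 1..n from those of 1..n-1.

    The whole previous level must be kept while the next one is built,
    so memory grows with the number of permutations, not just with n.
    """
    level = [()]  # the single permutation of the empty set
    for k in range(1, n + 1):
        # Insert k at every possible position of every shorter permutation.
        level = [
            p[:i] + (k,) + p[i:]
            for p in level
            for i in range(len(p) + 1)
        ]
    return level

print(len(permutations_by_insertion(4)))
```

      While the final level is being assembled, both the (n-1)! shorter permutations and the growing list of n! results are alive, which is where the large memory estimate comes from.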

      --

      The illegal we do immediately. The unconstitutional takes a little longer.
      --Henry Kissinger

  2. Storage galore! by Bananatree3 · · Score: 2, Funny

    If, If only I could get a hold of one of those, I could Rival GOOGLE! Yes! I can become the next internet craze with my super, duper search engine crawling the web! I have the space, now I just need a connection in the middle of Alaska fast enough to rival google...

    1. Re:Storage galore! by Anonymous Coward · · Score: 0

      Screw that! You could have your own personal mirror of alt.binaries.pictures.erotica.*

    2. Re:Storage galore! by Mehtuus · · Score: 1

      Are you saying that our dog sleds aren't fast enough?

      --
      http://mehtuus.googlepages.com
  3. You hear about the Petabox? by Dancin_Santa · · Score: 5, Funny

    Michael Jackson was heard breathing a sigh of relief. He thought it was where they sent Petafiles.

    R. Kelly was scrambling to find the company's phone number.

    1. Re:You hear about the Petabox? by pyrrhonist · · Score: 4, Funny
      Michael Jackson was heard breathing a sigh of relief. He thought it was where they sent Petafiles.

      Hmmm, this seems almost familiar...

      Let's analyze this situation:

      • The time on our posts is exactly the same.
      • There's a difference of only 3 in the post id values.
      • I was unable to foresee the R. Kelly connection.
      This can only mean one thing... You are the Kwisatz Haderach!

      GET OUT OF MY MIND!!!

      --
      Show me on the doll where his noodly appendage touched you.
    2. Re:You hear about the Petabox? by pyrrhonist · · Score: 1

      The really funny part is that my post got modded redundant.

      --
      Show me on the doll where his noodly appendage touched you.
    3. Re:You hear about the Petabox? by kyrre · · Score: 0, Redundant

      The really funny part is that my post got modded redundant.

      What is funny about that? It is redundant. It had already been said. The idea behind moderation is not to reward witty types for their wittiness, it is about filtering out noise for the reader. If a joke is told two times, one of them really is redundant. It would be noise.

      Oh, this comment is also redundant. One can probably read all about this here in the Slashdot FAQ

      If you don't agree with the redundant modifier you could always set redundant to be +/-0 points instead of -1.

    4. Re:You hear about the Petabox? by Patik · · Score: 1

      Perhaps that's why your other post was modded -1 Redundant.

    5. Re:You hear about the Petabox? by Arzach · · Score: 1

      "This can only mean one thing... You are the Kwisatz Haderach!"

      So, does that make *YOU* the Ersatz Haderach?

    6. Re:You hear about the Petabox? by pyrrhonist · · Score: 1
      What is funny about that?

      It's funny, because the times on the posts are exactly the same.

      Oh, this comment is also redundant.

      No, it's flamebait.

      --
      Show me on the doll where his noodly appendage touched you.
  4. It's for PetaFiles! by pyrrhonist · · Score: 1, Funny
    LinuxDevices.com is reporting that a Linux-based system comprising more than a petabyte of storage has been delivered to the Internet Archive

    Wait, is a petabyte sized file called a petafile?

    If so, then this storage must be for all the recent Michael Jackson coverage.

    --
    Show me on the doll where his noodly appendage touched you.
    1. Re:It's for PetaFiles! by Anonymous Coward · · Score: 0

      Err, do you call a gigabyte file a gigafile? Or a megabyte file a megafile? Didn't think so.

  5. Mandatory by huntse · · Score: 1, Funny

    Imagine a beowulf cluster of these babies... ...oh, it already is one. nevermind, I'll get my coat.

    1. Re:Mandatory by MrDoh! · · Score: 2, Funny

      Ah, you must be new here.
      (sorry)

      --
      Waiting for an amusing sig.
    2. Re:Mandatory by jcuervo · · Score: 1
      Aren't you tired of the same old shit, repeated over and over again?
      Like a Beowulf cluster of 486 boxes? ...
      --
      Assume I was drunk when I posted this.
    3. Re:Mandatory by Anonymous Coward · · Score: 0

      Aren't you tired of the same old shit, repeated over and over again?

      No. And I for one welcome our new slashdot meme reposting overlords

    4. Re:Mandatory by farnett · · Score: 1

      Thank you!

      beowulf/soviet russia/i for one/all your base etc...

      Do you laugh when you read these?

      You might have once, but do you now?

    5. Re:Mandatory by afd8856 · · Score: 0, Offtopic

      Then I, for one, welcome our incredibly smart, witty and original Slashdotter overlords.

      --
      I'll do the stupid thing first and then you shy people follow...
    6. Re:Mandatory by RichardX · · Score: 1

      Not funny. I'm sick of hearing the same joke on every hardware related story. Really. Get over it.

      It's almost like a... Beowulf Cluster of unoriginal, unfunny jokes! I wonder if it runs Linux? Maybe in Soviet Russia, at least. Or maybe only for old people in Korea

      Seriously, you're never gonna be free of the tyranny of Slashdot memes - at least, not without leaving Slashdot. Yes, they're done to death, run into the ground, dug up, and done to death all over again.. but that's the way it goes on here.

      Just look at any story remotely involving optics or lasers, and count the number of "do not look into laser with remaining eye" posts. It's like some kind of special nerd tourettes.. people on here just SOVIETRUSSIA! suddenly scream out various Slashdot HOTGRITS! memes DOESITRUNLINUX!

      --
      Curiosity was framed. Ignorance killed the cat.
    7. Re:Mandatory by QMO · · Score: 1

      "If you would read it from someone else, would it still be funny?"
      Sometimes.

      "Aren't you tired of the same old . . ."
      Nope.

      "Now I feel better :)"
      Slashdot cathartic therapy does it again!

      --
      Exam 4/C again. Maybe I'll do better this time.
    8. Re:Mandatory by daikokatana · · Score: 1
      Of course!

      I for one laugh when all your beowulfs belong to russians.

      No, seriously: I will laugh at that moment.

      --
      http://jcsnippets.atspace.com/ - a collection of Java & C# snippets
    9. Re:Mandatory by GraemeDonaldson · · Score: 1

      In Soviet Russia, same old shit is tired of you repeated over and over again.

      --
      I think, therefore I am. I think?
    10. Re:Mandatory by sharpestmarble · · Score: 1

      Indeed.

      Want to get a comment modded funny on /.? Make some reference to porn or the /. effect, and do so fairly soon after the article is posted. And make it on topic.

      --
      AC's modded -6. I don't see you, I don't mod you, anything you say is lost. Don't like it? Don't be a coward.
  6. archive.org by Nasarius · · Score: 4, Interesting
    Internet Archive, the non-profit organization that creates periodic snapshots of the Internet.

    They do a lot more than that! I've just been downloading some Warren Zevon shows from their Live Music Archive.

    --
    LOAD "SIG",8,1
    1. Re:archive.org by BrianGa · · Score: 1

      Don't skip over the 2,851 Grateful Dead shows!

    2. Re:archive.org by SPY_jmr1 · · Score: 1

      What do they do when Archive.org tries to index itself?

    3. Re:archive.org by WeblionX · · Score: 1

      It shouldn't! But if it tried, it wouldn't, unless they forgot to deny themselves in their robots.txt file. Which they did.

      --
      (\(\
      (=_=) Bani!
      (")")
  7. copyright by DualG5GUNZ · · Score: 5, Interesting

    Not to sound like an advocate or anything... But how is it that the Internet Archives project resists claims of copyright infringement and the likes when they have copies of entire websites in their records?

    --
    "I'm a philosophy major. That means I can think deep thoughts about being unemployed." -- Bruce Lee
    1. Re:copyright by seifried · · Score: 3, Informative

      You can exclude them from your website using the robots.txt:

      User-agent: ia_archiver
      Disallow: /

      For example if you go to archive.org and plug my site into the wayback machine:

      We're sorry, access to http://www.seifried.org/ has been blocked by the site owner via robots.txt.

      and you can also request them to expunge your site from the archive.

      They go out of their way to make it easy to prevent your site being copied (more so than most search engines).
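
      The effect of that two-line rule can be checked locally with Python's standard `urllib.robotparser` module. This is a minimal sketch: `is_allowed` and the `SomeOtherBot` agent name are made up for illustration, and it only shows how a rule-following parser reads the file, not how the Internet Archive's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post: block only the Internet Archive's crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

def is_allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

print(is_allowed("ia_archiver", "http://www.seifried.org/"))   # False
print(is_allowed("SomeOtherBot", "http://www.seifried.org/"))  # True
```

      Because the file names only `ia_archiver` and has no `User-agent: *` section, other crawlers are left unrestricted.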

    2. Re:copyright by Baricom · · Score: 1

      I feel they make it too easy. IA blocks not only the present version of the site, but also every page of every past version.

      I can't get older pages of a web site I operated several years ago because a robots.txt file was inadvertently added that blocks it. At the time, I didn't know about the Internet Archive, and as a result potentially years of this site's history is gone.

    3. Re:copyright by Anonymous Coward · · Score: 0

      Unfortunately I'm sure that legal pressures and the cost of legal defense will almost guarantee that the Archive never gets much more aggressive with its archiving than it is now.

    4. Re:copyright by Leroy_Brown242 · · Score: 1

      "Historical Purposes"

    5. Re:copyright by Anonymous Coward · · Score: 0

      Not good enough. They need my express permission to use my copyrighted material, not a note from me to make it not ok.

    6. Re:copyright by Anonymous Coward · · Score: 1, Funny

      I accidentally clicked your site and somehow it copied itself into my computer's memory. Please don't sue me.

    7. Re:copyright by IntergalacticWalrus · · Score: 2, Interesting

      If they actually did that, the archive would be worthless.

      Besides, the IA only archives HTML pages, and small images in them, nothing else. If you consider your HTML content to be unproductible copyrighted material, might I ask why the hell is it publically accessible on the Web in the first place?

    8. Re:copyright by spacefight · · Score: 2, Interesting

      The Internet Archive is fucking up big time with their robots.txt stuff. If you exclude a site from being shown, it doesn't show anything, correct. But: if this site goes offline, the archived pages of that former site are all available, not blocked at all.

    9. Re:copyright by ubernostrum · · Score: 1

      I know that the US Copyright Office has granted a DMCA exemption for at least some of the material they archive.

    10. Re:copyright by Anonymous Coward · · Score: 0

      And, tell me, why do you block them ?

    11. Re:copyright by trifish · · Score: 2, Interesting

      But how is it that the Internet Archives project resists claims of copyright infringement and the likes when they have copies of entire websites in their records?


      Did you ask this question when Google introduced site cache several years ago?

    12. Re:copyright by Ronald+Dumsfeld · · Score: 1
      You can exclude them from your website using the robots.txt:
      They should ignore robots.txt altogether if they want to be a truly useful resource.

      Particularly for a robots.txt like this.
      --
      Where's the Kaboom?
      There's supposed to be an Earth-shattering Kaboom.
    13. Re:copyright by generic-man · · Score: 2, Insightful

      Yes, I did. I got two responses, neither of which answered my question.

      1. FAIR USE!
      2. Google is merely providing a service. If you don't like it you can opt out.

      The Google Cache is not fair use, as it reproduces the entirety of a web page's text for none of the purposes for which Fair Use is defined. (Under Fair Use you are entitled to use a portion of a copyrighted work, not the whole thing.)

      The second one just cracks me up. I thought the Slashdot crowd didn't like being asked to opt out.

      Now, trifish, how can the Internet Archive evade copyright laws by reproducing the entirety of many copyrighted pages? Don't try and argue that they're a library. Libraries buy books; they don't photocopy them.

      --
      For more information, click here.
    14. Re:copyright by Anonymous Coward · · Score: 0

      they're non-profit, so nobody bothers to sue them.

    15. Re:copyright by generic-man · · Score: 2, Insightful

      If you consider your HTML content to be unproductible copyrighted material, might I ask why the hell is it publically accessible on the Web in the first place?

      If you consider your music to be copyrighted material, might I ask why the hell it's being played on the radio in the first place?

      If you consider your book to be copyrighted material, might I ask why the hell it's being lent out in the library in the first place?

      If you consider your movie to be copyrighted material, might I ask why the hell it's being broadcast on HBO in the first place?

      Just because something is available for free doesn't mean that the producer has granted you a permanent license to distribute it for commercial gain, as Google does with its cache.

      --
      For more information, click here.
    16. Re:copyright by budgenator · · Score: 1

      They really saved my ass more than once, I'm sure I'm not special or anything.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    17. Re:copyright by drsquare · · Score: 1

      I'm afraid that the burden is on archive.org not to archive copyrighted material, not on the copyright holder to explicitly deny people permission.

      If they really wanted to go out of their way, they would ask permission before illegally copying and distributing copyrighted material for which they do not have permission.

    18. Re:copyright by telecsan · · Score: 1

      If you publish a book and hand out copies for free, then the cost to the library is zero.

    19. Re:copyright by generic-man · · Score: 1

      True. But that still doesn't make the book public domain solely because it's free.

      --
      For more information, click here.
    20. Re:copyright by QMO · · Score: 1

      Copying music isn't necessary to listen to it on the radio.
      Copying a book isn't standard practice just to know the story.
      Copying a movie isn't required to see it on HBO.

      Making a copy of html from the internet IS part of getting your computer to display it.

      I hope that clears up the difference that was implied in the GP.

      --
      Exam 4/C again. Maybe I'll do better this time.
    21. Re:copyright by telecsan · · Score: 1

      True. As for myself, I would think the more appropriate thing from a true archival point of view would be to archive .pdf'd or .ps'd versions, almost similar to libraries turning newspapers into microfiche.

    22. Re:copyright by mlk · · Score: 1

      Google does with its cache

      Does IA "distribute it for commercial gain"?

      --

      The OP's question will not be answered until someone takes it to court.
      But then I don't think it would go far, IA will bend over backwards to remove on request.

      --
      Wow, I should not post when knackered.
    23. Re:copyright by generic-man · · Score: 2, Informative

      Just because you have a cache of something doesn't give you the right to redistribute it for commercial gain. The initial author still retains ownership.

      Imagine if you had a device designed to record audio and reproduce it. That doesn't mean that you can resell your recordings; the original author retains ownership.

      I'm not claiming that it is unethical to cache web pages, just that companies such as Google presume that they have the right to redistribute content to which they own no rights. The web is not like Usenet, where each server hosts others' posts; content is served by an author for as long as the author wants.

      --
      For more information, click here.
    24. Re:copyright by Anonymous Coward · · Score: 0

      Of course not. No .org site would ever seek commercial gain from content, especially others' content.

      As a lawyer, I believe that the archive.org doesn't distribute its services for profit.

    25. Re:copyright by deimtee · · Score: 1

      They do ask permission. They say "Hello Mr Robots.txt, am I allowed to copy this?", and Mr Robots.txt says either "Yes" or "No".

      --
      I'm guessing that wasn't on their radar screen...
    26. Re:copyright by RandomLetters · · Score: 1

      It's scary how often the word Iraq appears in that file...

      Can you just block the whole site with Robots.txt or do you have to block each directory?

      I would love to see what they are not blocking. Promotional stuff, i suspect. Kissing babies, Happy Soldiers...

    27. Re:copyright by swelke · · Score: 1

      From what I have read, it would appear that they are using the fair use exclusion along with (possibly) the DMCA's network-provider loophole. I would guess that they're using the network-provider loophole based on the text of their copyright policy here (scroll to the bottom). The way the DMCA exclusion works is that they have to fix copyright infringement as soon as they're notified, and that looks like what they're asking for (and making it as difficult as they're legally allowed to, I might add).

      --
      Have you ever wondered How to Take Over
    28. Re:copyright by NanoGator · · Score: 1

      "If you consider your HTML content to be unproductible copyrighted material, might I ask why the hell is it publically accessible on the Web in the first place?"

      Ad revenue.

      --
      "Derp de derp."
    29. Re:copyright by MushMouth · · Score: 1

      You can still contact them to have them remove years and date. If the archive was opt-in it would be useless.

    30. Re:copyright by Spider-X · · Score: 1

      actually they archive big images too. check out www.cat-scan.com (from 1997 or 1998 and you'll see).

      --
      witty sig goes here
    31. Re:copyright by mmkkbb · · Score: 1

      the purposes for which Fair Use is defined. (Under Fair Use you are entitled to use a portion of a copyrighted work, not the whole thing.)

      Fair Use is not defined as a list of things which are acceptable. The US Copyright Office explains this here, and the relevant section of the copyright law which I believe you are referring to reads:

      "the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright."

      If this ever got to court (which I doubt would happen if they respond quickly to requests to expunge) then they could probably argue that they fall under scholarship or research, but they don't even need to. Other purposes may be considered as well.

      --
      -mkb
    32. Re:copyright by bhsx · · Score: 1

      Go to archive.org. It is funded through grants and is a non-profit org. This is not done for commercial gain. It is done to record the history of human documentation when production/distribution costs drop to zero.

      --
      put the what in the where?
    33. Re:copyright by joeljkp · · Score: 1

      A question about fair use (I know you're not a lawyer, just wondering what you think):

      I run a site called the [url=http://freemedia.ballsome.org]FreeMedia Project[/url]. The goal is to harvest closed media on the internet, like WMV-encoded video clips and WMA-encoded audio, and transcode them to an open format, providing a repository of these copies as a public service.

      So far, I've been asking for permission for each video I do, but this obviously doesn't allow for much of a service. Mostly I get tech-related films by small-time groups.

      I'd like to do a movie trailer or something like that, but the copyright issue has prevented me from trying it. Do you think this would fall under any kind of fair use exception like the IA may have?

      I've emailed the EFF about it, but haven't heard anything back.

      --
      WeRelate.org - wiki-based genealogy
    34. Re:copyright by generic-man · · Score: 1

      Okay. Now how can Google and Yahoo (yes, Yahoo has a cache too) get away with this? They're clearly for-profit companies.

      --
      For more information, click here.
    35. Re:copyright by joeljkp · · Score: 1

      Sorry about the bbcode link. Use the link in my sig.

      --
      WeRelate.org - wiki-based genealogy
    36. Re:copyright by mmkkbb · · Score: 1

      I think you should ask a lawyer!

      --
      -mkb
    37. Re:copyright by bbc · · Score: 1


      "Do you think this would fall under any kind of fair use exception like the IA may have?"

      Fair use is a legal defense. If Miramax or Paramount or whoever decides to sue you, your cheap little lawyer can try and convince a grumpy judge that your use was fair and that he should not put you in a position where you have to pick up the soap.

      "I've emailed the EFF about it, but haven't heard anything back."

      Mail 'em again. Sometimes they're busy.

    38. Re:copyright by petermgreen · · Score: 1

      The Internet Archive is a sufficiently major project that I'd expect them to have competent lawyers advising them on the legality of what they are doing.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    39. Re:copyright by drsquare · · Score: 1

      I'm afraid that a text file which may or may not exist is not the copyright holder. The copyright holder is NEVER asked. This is a legal time-bomb waiting to happen.

      An analogy: "I took some things from someone's house, but it's not burglary, I asked permission. I went to the birdtable and said 'Hello, am I allowed to take things from this house?' The bird table didn't say no, so I was allowed to break in."

      Can you see how stupid you are?

    40. Re:copyright by deimtee · · Score: 1

      No idiot, no-one is breaking in or stealing anything. The content is still there.
      Robots.txt is an electronic version of the little notice at the front of books which states under what conditions you can copy the book, or parts of it. If you don't put the notice there then you don't care to protect your content.

      --
      I'm guessing that wasn't on their radar screen...
  8. Petabox? by eclectro · · Score: 4, Funny


    Isn't that what naked girls climb out of to protest fur coats?

    Thank you, I'll be here all week.

    --
    Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
    1. Re:Petabox? by Anonymous Coward · · Score: 2, Funny

      Actually, it's what geeks would like to do, but are seldom given the chance.

  9. Mega Systems by Anonymous Coward · · Score: 0

    Everyone, please feel free to chime in so I don't feel like such a goon for saying this, but damn, these big systems are a reason to live, aren't they? I mean, I saw the rack of red on the link, and it just makes me drool. It's not so much the storage, but the logistics of the thing. I mean, I get the same feeling when I watch Jurassic Park, and whats-her-name is pumping up the electrical charger to get the main switch going. Charging a switch? Welcome to flavour country.

    Anywho, just wanted to expose my hard-on for hardware, my raison d'etre. Someone, give me a job at a datacenter, or a power plant. I beg you.

    1. Re:Mega Systems by name773 · · Score: 2, Funny

      large bundles of neatly organized cable... ohh man.

    2. Re:Mega Systems by poor_boi · · Score: 1

      You're a twisted, demented man. You need to have your petaphiliac tendencies treated by a squad of trained monkeys wielding high voltage cattle prods powered by UPS.

  10. Okay, I admit it. by halcyon1234 · · Score: 1

    Forget the jokes. That setup kicks the ass out of any beowulf cluster. Heh.

  11. Redundancy by Anonymous Coward · · Score: 0

    Haven't read TFA yet, but what are they doing with regard to redundancy? With that many drives whirring around more than a couple are likely to go bad over time. Do they have a set of dedicated redundant drives to serve as backups?

    1. Re:Redundancy by Eric604 · · Score: 1

      TFA only says it does NOT use RAID but JBOD (just a bunch of disks).

    2. Re:Redundancy by MushMouth · · Score: 1

      When a drive dies, its data tends to go with it. It's sad, but similar to a drinking bender, or e.c.t., sometimes things are better that way. One of Brewster's driving ideals is "do it as cheaply as possible" (sometimes penny wise/dollar foolish). At first, in 1996 or so, the archive used tapes on a tape robot, which was cool looking but slow and really painful to use: the tapes degraded, and as the drives aged they stopped being able to read old tapes (the heads had to be re-aligned far too often). Finally, circa 2000, the Archive was able to get a killer deal (1/3 retail) on consumer HP boxes and 80 gig drives, and now the data has never been safer. Every now and then someone wants to get a copy of the archive (there is one in Egypt and, I think, the LOC), but I don't know how up to date they are kept.

  12. Umm.. by coldeeze · · Score: 0

    What kind of power bill are those guys getting and is their service really worth it?

    1. Re:Umm.. by Wdomburg · · Score: 1

      "Despite its large size, the IA's PetaBox installation draws only about 50kW of power..."

      There are advantages to the VIA Eden platform.

  13. Only 1.5 petabytes? by Anonymous Coward · · Score: 0

    1.5 petabytes? That's hardly enough to hold a decent porn collection.

  14. IPod? by NegativeOneUserID · · Score: 2, Funny

    Right, sure, like anyone believes that you want that much storage for music. You just want to use it for pr0n.

    1. Re:IPod? by BlackMesaLabs · · Score: 2, Funny

      Decide to use it for "Pr0n" and you're gonna NEED a beowulf cluster of them...

    2. Re:IPod? by Anonymous Coward · · Score: 0

      Hmmm. If a Gigabyte holds 10 hours of music, then a Petabyte will hold 10,000,000 hours of music.

      Which means that, at 24 hours per day, it will take 1141 years before I will hear a repeat of Los Del Rio's Macarena.

      That's not long enough.
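
      A quick check of the arithmetic above, as a minimal Python sketch. The 10 hours per gigabyte figure is the comment's own assumption, and decimal units (1 PB = 1,000,000 GB) and 365-day years are assumed here:

```python
HOURS_PER_GB = 10        # assumption taken from the comment, not a measured bitrate
GB_PER_PB = 1_000_000    # decimal units assumed: 1 PB = 10**6 GB

hours = HOURS_PER_GB * GB_PER_PB
whole_years = hours // (24 * 365)   # continuous playback, leap years ignored

print(f"{hours:,} hours of music")
print(f"{whole_years} whole years before a repeat")
```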

    3. Re:IPod? by mlk · · Score: 1

      I encode my music 100 times better than the human ear can handle.
      Dogs for miles around bark out in pain when I use my iPod.

      --
      Wow, I should not post when knackered.
    4. Re:IPod? by lullabud · · Score: 1

      Seriously, what's up with the iPod quote? People say the most random shit around here.

  15. great usage. by Bananatree3 · · Score: 4, Informative

    Seriously, I think archive.org deserves such a storage system. I have very often wanted to go back and view an older version of a website, but the cache on Google was from yesterday. It also gives multiple archives of the website by date, which can be quite handy, especially for news-related sites. I think they quite well deserve it.

  16. Ouch by Dancin_Santa · · Score: 1

    Didn't realize the moderators were Michael Jackson supporters.

    Which reminds me of why Michael Jackson likes twenty eight year-olds. Because there's twenty of them.

  17. Re:Downloading Kazaa by HyperChicken · · Score: 3, Informative

    Not "periodic", continuous. Own a website? Check your logs for the user-agent "ia_archiver".
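
    One rough way to run that check, sketched in Python. The sample lines are hypothetical and only imitate a common access-log layout, and the function name is made up for illustration. The substring `ia_archive` is used so that it matches the crawler token however it is spelled in your logs:

```python
def count_crawler_hits(log_lines, token="ia_archive"):
    """Count log lines that mention the crawler token (case-insensitive)."""
    token = token.lower()
    return sum(1 for line in log_lines if token in line.lower())

# Hypothetical sample lines; not real log data.
sample = [
    '1.2.3.4 - - [01/Jul/2005:10:00:00 +0000] "GET / HTTP/1.0" 200 512 "-" "ia_archiver"',
    '5.6.7.8 - - [01/Jul/2005:10:00:05 +0000] "GET /a HTTP/1.1" 200 99 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [01/Jul/2005:10:01:00 +0000] "GET /robots.txt HTTP/1.0" 200 20 "-" "ia_archiver"',
]
print(count_crawler_hits(sample))
```

    For a real site you would pass the open log file instead of `sample`, since iterating over a file yields its lines.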

    --
    Free of Flash! Free of Flash!
  18. And if Linux had a working GFS... by Anonymous Coward · · Score: 0

    this would actually be useful!

    Sorry, I'm just bitter after almost a decade of Sistina's promises to get their global file system working 100%. We were one of their victims, err, customers.

  19. Modded case to come by icecow · · Score: 1

    I give 72 hours tops before one of those fetish case modders makes a 'peta' case. Oh shit, I was thinking chia.

    --
    Stop invalid scientific research. Ask your local scientists to feed their lab rats with a phytoestrogen-free chow.
  20. 'small box' by MonoSynth · · Score: 5, Funny

    So the inventor of the microprocessor dies and suddenly the definition of 'small box' for computer components is again reduced to 'fits in a big room'....

  21. Puppies by Sinner · · Score: 3, Funny
    An anonymous reader writes "LinuxDevices.com is ... according to the article. Now to strap one of those puppies to my iPod!"
    I'm sorry, baby dogs? That's so last week. I've got an arctic seal pup strapped to my iPod. You should see the looks I get on the subway. Bling, baby, Bling.
    --
    fish and pipes
    1. Re:Puppies by Leroy_Brown242 · · Score: 1

      bastard. I just coughed up some rice.

      hehehe

    2. Re:Puppies by Sinner · · Score: 2, Funny

      You gonna eat that?

      --
      fish and pipes
    3. Re:Puppies by OneArmedMan · · Score: 1

      Baby Seal walks into a Club...

      Turns into a Fur Coat.

  22. maybe i'll be quoted in 15 years.. by qda · · Score: 4, Funny

    "nobody needs more than a perabyte of storage"

    1. Re:maybe i'll be quoted in 15 years.. by Anonymous Coward · · Score: 2, Funny

      Well, I'd hope somewhere along the line somebody will fix that typo for you. Otherwise, you'll forever be quoted as "nobody needs more than a perabyte [sic] of storage."

  23. No RAID?! by kf6auf · · Score: 1

    I am more than slightly concerned about the lack of RAID in the system. They said that they had some sort of painful experience with RAID 5 not scaling to petabyte-size storage and therefore recommend JBOD. I wouldn't expect RAID 5 to scale to petabyte-size storage, because the parity is all computed at once and in the same place, but there has to be a way around this that still allows for redundancy. Take a RAID 50, with a lot of RAID 5 arrays in the hundred-terabyte range and a RAID 0 array striping over them: it would still provide redundancy with only slightly greater inefficiency, while dividing the parity work among the smaller RAID 5 arrays. Also, $2/GB seems kind of high to me, given that hard drive prices are down to $0.33/GB and you're putting 4 in each mass-produced box.

    1. Re:No RAID?! by iamplasma · · Score: 3, Insightful

      Yeah, but the thing is that the storage is spread out between lots of different 1U units, each with either 1 or 1.6TB. So to make a RAID 5 over 1.6TB in size, you'd have to cross over multiple machines, adding a serious overhead, especially when you have to calculate parity for the parity drive. On the other hand, if you only did RAID 5 in the individual units, it'd be pretty pointless, because with that many units you'd be crazy to count on never having an entire machine fail.

      So, while yes, if it really was just one giant supercomputer with a bajillion hard drives in it, RAID 50 would be an ideal solution (as long as the stripes were large enough to prevent too many accesses crossing too many drives, the one big advantage of JBOD here), but that's not what's really in use here.
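
      For anyone wondering why parity gets expensive once a stripe spans machines: single-parity schemes such as RAID 5 rest on XOR, and rebuilding one lost block means reading every other block in the stripe. A toy Python sketch of that idea follows. The block contents and the three-data-plus-one-parity layout are made up for illustration, and real RAID 5 also rotates parity across drives, which is ignored here:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 4-byte blocks held on three data drives.
data = [b"node", b"disk", b"peta"]
parity = xor_blocks(data)            # what the parity drive would store

# Lose drive 1, then rebuild it from every surviving block plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

      Every survivor has to be read to rebuild one block, so a stripe spread over several 1U boxes turns each rebuild into network traffic.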

    2. Re:No RAID?! by Anonymous Coward · · Score: 1, Insightful

      Bingo. Every distributed file system, whether RAID5, GFS, or the other fascinating software variants, pays significant overhead for all that striping. And managing a RAID set that big, striped across all those machines, is kind of tough. Then when several machines fail at once, as is inevitable across arrays that large, or when a controller or two fail, you have to rebuild these wildly scattered RAID arrays.

      It basically triples the price without getting you much for such a large setup, where point replacement of lost systems without imperiling your other systems is much, much easier.

    3. Re:No RAID?! by Anonymous Coward · · Score: 0

      You don't need RAID if your application is aware of its bits and you do a little application-layer parity and such.

      Also, with a petabyte, I imagine that there are at least 1 or 2 disk failures per day... I'd much rather just swap in new drives and have the application deal with filling in the data than have to deal with a broken RAID5 array every time a disk goes.

      Regarding cost: does a 400GB Hitachi (read: reliable) drive cost $0.33/GB? I don't think so. The idea, I think, is to have 64 TB using only a couple of 20 Amp circuits, which is pretty good.

      Also, I believe the cost includes everything. All 40 boxes, ethernet cables, rack, gigabit switch, assembly, etc...

      If you replaced the VIA board with something cheaper, you would gain on immediate cost, but think about the electricity cost over a period of 3 years... A petabyte is probably costing $10-$20K/month just in power bills...

  24. what about redundancy? by Anonymous Coward · · Score: 0

    So if the storage is JBOD then what about redundancy when a drive fails?

    1. Re:what about redundancy? by Anonymous Coward · · Score: 0

      You just buy 2 of these and use md to create a RAID1 between them. ;)

  25. Electricity $$$ ? by kasnol · · Score: 3, Funny

    Wow - have they calculated how much the running cost per day is? I might just stay with my iPod instead for the time being~
    Haha~

    1. Re:Electricity $$$ ? by TheFlyingGoat · · Score: 2, Informative

      50kW at 10 cents per kilowatt hour = $120/day.

      I doubt it draws at a constant 50kW, though. It's probably an average (was given in TFA).

      My math might be completely wrong, given I don't have a clue how to calculate kilowatt hours. Is it just kW * hours_used_daily? :)

      --
      You have enemies? Good. That means you've stood up for something, sometime in your life. --Winston Churchill
    2. Re:Electricity $$$ ? by masklinn · · Score: 1
      I doubt it draws at a constant 50kW, though. It's probably an average (was given in TFA).
      I think you meant "peak", because there isn't much difference as far as price goes between a constant 50kW and an average 50kW

      And yes, to compute energy consumption (in kWh) you merely multiply the power drawn from the grid (in kW) by the consumption timeframe (in hours).

      Therefore if a unit uses 50kW, it consumes 50KWh worth of energy.
      --
      "The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
    3. Re:Electricity $$$ ? by catacow · · Score: 1
      Therefore if a unit uses 50kW, it consumes 50KWh worth of energy.
      ..in an hour
    4. Re:Electricity $$$ ? by Phreakiture · · Score: 1

      My math might be completely wrong, given I don't have a clue how to calculate kilowatt hours. Is it just kW * hours_used_daily? :)

      Close. It is kW * hours_used. The "daily" part is only valid if (as in your case) you are talking about the amount of energy used over the course of a day.

      Electricity here is $.15/kWh, which would put this box's operation at $180/day. In some places, electricity is as low as $.04/kWh, which would put the energy cost of these boxes at only $48/day.
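
      The same calculation as a small Python sketch, assuming a constant 50kW draw around the clock (the thread disputes whether that figure is peak or average) and using the three rates quoted in this thread. The function name is made up for illustration:

```python
def daily_cost(power_kw, rate_per_kwh, hours=24):
    """Energy cost in dollars for a constant draw of power_kw over `hours`."""
    energy_kwh = power_kw * hours       # kW * h = kWh
    return energy_kwh * rate_per_kwh

for rate in (0.04, 0.10, 0.15):
    print(f"${rate:.2f}/kWh -> ${daily_cost(50, rate):.2f}/day")
```

      If 50kW turns out to be the peak rather than the average, these numbers are an upper bound.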

      --
      www.wavefront-av.com
    5. Re:Electricity $$$ ? by swelke · · Score: 1

      Just to be anal about it, it's 50kW * 24hours/day * $0.1/kW*hour = $120. As another reply stated, however, I'm pretty sure the power stated in the article was peak (ie when most of the drives are spinning up, I'd guess), not constant or average. The peak power draw is what you have to build the electrical systems to handle.

      --
      Have you ever wondered How to Take Over
    6. Re:Electricity $$$ ? by Frank+T.+Lofaro+Jr. · · Score: 1

      Spin the drives up one at a time.

      --
      Just because it CAN be done, doesn't mean it should!
  26. 1.5 Petabytes? by TheFlyingGoat · · Score: 3, Interesting

    Where can you purchase 600GB drives these days? (1.5PB / 2500 drives)

    The math doesn't work when you multiply the number of systems out either: 600 systems * 1.6TB/system = 960TB. That's just under a petabyte, or am I missing something?

    Also, if you've got those in a RAID5 setup, you're 'only' talking about approx 800TB of usable space. That's far less than the 1.5 petabytes claimed.

    800TB is a lot of space, but there must be a cheaper/easier way than purchasing 600 systems to do it.
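
    The mismatch is easy to reproduce. A minimal Python sketch using only the figures quoted in the summary and above (decimal units assumed, whole gigabytes used to avoid rounding noise):

```python
SYSTEMS = 600
DRIVES_PER_SYSTEM = 4
GB_PER_SYSTEM = 1600          # 1.6TB per system
CLAIMED_GB = 1_500_000        # "roughly 1.5 petabytes"
CLAIMED_DRIVES = 2500

total_gb = SYSTEMS * GB_PER_SYSTEM                   # capacity of the 600 boxes
drive_count = SYSTEMS * DRIVES_PER_SYSTEM            # drives actually in them
implied_drive_gb = CLAIMED_GB // CLAIMED_DRIVES      # size needed to reach 1.5PB
actual_drive_gb = GB_PER_SYSTEM // DRIVES_PER_SYSTEM

print(f"{total_gb / 1000:.0f}TB across {drive_count} drives")
print(f"implied {implied_drive_gb}GB per drive vs {actual_drive_gb}GB per drive")
```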

    --
    You have enemies? Good. That means you've stood up for something, sometime in your life. --Winston Churchill
    1. Re:1.5 Petabytes? by AaronLawrence · · Score: 1

      You're missing something: 4 drives in each system. ->150GB.

      --
      For every expert, there is an equal and opposite expert. - Arthur C. Clarke
    2. Re:1.5 Petabytes? by TheFlyingGoat · · Score: 2, Informative

      No. They say 2500 drives (actually 2400 since it's 4 per system in 600 systems), which comes out to 600GB per drive for 1.5PB.

      --
      You have enemies? Good. That means you've stood up for something, sometime in your life. --Winston Churchill
    3. Re:1.5 Petabytes? by Anonymous Coward · · Score: 0

      TFA says Total Capacity. They didn't just throw out the hundreds of TB of storage they already had, you know.

    4. Re:1.5 Petabytes? by Anonymous Coward · · Score: 0

      The article said that they weren't using RAID because of problems that they had scaling it to that many drives. Instead of RAID they are using JBOD (Just a Bunch Of Disks).

    5. Re:1.5 Petabytes? by A+Commentor · · Score: 1
      Where can you purchase 600GB drives these days? (1.5PB / 2500 drives)

      The math doesn't work when you multiply the number of systems out either: 600 systems * 1.6TB/system = 960TB.


      Yes, that didn't look right to me either. It's even worse when you calculate 'proper' disk sizes. (I don't care that the drive companies claim 1GB = 1,000,000,000 Bytes, it really equals 1,073,741,824). Thus if it's 960 TB, it's not .96 PetaBytes, it's really .853 PetaBytes. If it is somehow '1.5 PB', it's really 1.33 PB.
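
      The conversion behind those two figures, as a short Python sketch. It assumes the vendor figures are decimal (1 TB = 10**12 bytes) and that a 'proper' petabyte is 2**50 bytes. The function name is made up for illustration:

```python
def decimal_tb_to_binary_pb(tb):
    """Convert decimal terabytes (10**12 bytes) to binary petabytes (2**50 bytes)."""
    return tb * 10**12 / 2**50

print(f"960 TB  -> {decimal_tb_to_binary_pb(960):.3f} binary PB")
print(f"1500 TB -> {decimal_tb_to_binary_pb(1500):.2f} binary PB")
```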
      --

      Looking for any old 8-bit Heathkit/Zenith software/hardware - http://heathkit.garlanger.com

  27. terrifying by Anonymous Coward · · Score: 0

    1. According to the specs this thing is 600 1.6TB JBOD arrays. They must handle redundancy on top of the storage mechanism, but they don't mention it anywhere.

    2. The blurb says that they have roughly 1.5PB of storage space, but by my calculations it comes out to roughly 1 PetaByte (40 servers per rack * 15 racks of systems = 600 servers; 600 * 4 * 400GB per server = 960TB, about 1 PB).

  28. Slashdotted .... by theoddbot · · Score: 4, Informative
  29. No redundancy? WTF? by melted · · Score: 2, Informative

    I've actually read TFA. They recommend JBOD configurations to their clients. One drive goes titsup and you've lost 400GB of data. Do they at least offer some kind of mirroring/redundancy solution to back the data up to another array?

    1. Re:No redundancy? WTF? by grimJester · · Score: 1

      I'd hate to be the guy who has to burn it all to cd-r when management realizes they need backups.

    2. Re:No redundancy? WTF? by Depili · · Score: 4, Informative

      According to archive.org (http://www.archive.org/web/petabox.php) they do indeed have some redundancy, but not RAID. They are operating each system as a separate node, and mirroring nodes. The above link also sheds light on other questions regarding TFA.

    3. Re:No redundancy? WTF? by puhuri · · Score: 2, Interesting

      Archive.org maintains its archives in several geographically different locations, and files are mirrored between those sites. If one disk or node breaks, you still have two or more copies of that material.

      If you archive serious amounts of data, redundancy within a node is not the best solution; it is better to distribute information between systems. For very important data, you can have as many copies as you have nodes; less important data may have just a single copy. If it gets lost, then ok, shit happens but so what. For example, I have just a single copy (no backups, partly RAID) of 10 TiB data (and that data is not available from P2P shop) because it is not economically viable to make backups. On the other hand, I have some data in 5 geographically diverse copies, both on-line and off-line.

  30. A Great Historical Tool by simrook · · Score: 5, Insightful

    The Internet represents a great historical tool. Case in point is what happened on 9/11. Being able to go back and see the progression, paranoia, patriotism, and early Iraq/Afghanistan/bin Laden/Hussein posts and opinions on various news sites is amazing. CNN, Fox, the NY Times: all are archived several times on 9/11 on archive.org.

    I for one think that archive.org should turn into some UN effort, with a mission to chronicle and store daily/timely snapshots of the internet and the culture at the time, preserving it for future generations. What a tool for future historians!

    The ability to look at a large representation of society at one single critical moment in time, and to have first-hand sources for all that information, is something that can truly change the way history is recorded (and not in the bad newspeak ingsoc way either). In fact, a holistic archive of what happens day-to-day, in an easily accessible format, might well help written history to be more representative of actual history (instead of, say, the history Bush wants us to believe: that the Iraq war was for human rights and not WMDs). I love Foucault.

    The internet archive rocks... really hope this project continues full blast.

    - Peace

    --
    'Truth' is linked in a circular relation with systems of power which produce and sustain it...
    1. Re:A Great Historical Tool by Anonymous Coward · · Score: 0

      Which is why the 21st century will be 'lost in history' thanks to DRM, Patents and CopyRights (or wrongs).

    2. Re:A Great Historical Tool by venicebeach · · Score: 2, Funny


      Yes, otherwise such cultural gems as goatse.cx would be lost into the void forever...

    3. Re:A Great Historical Tool by Anonymous Coward · · Score: 2, Insightful

      The 9/11 targets were chosen in a way everyone would notice. It's not exactly amazing that it's well reported on; it would have been if it had happened 20 years ago. But that was just a single attack. If you look at the much bigger recent events that you mention, like the war on Iraq, you'll see that there really is hardly any detailed reporting. You have a lot of propaganda by the attackers, some propaganda from the Iraqi government, and some reports by angry people getting in the middle. You still have a completely unclear view of what happened.

      We already had people writing diaries and making lots of pictures in WWII. The improvement isn't that great.

    4. Re:A Great Historical Tool by Yjam · · Score: 1
      "by the only real power left in the world, and that is the United States, when it suits our interest, and when we can get others to go along."
      The only real power in the world? Well, maybe you are right, but you should have a look at what/where/who Hitachi is (the HDDs in the PB system are Hitachi ones). Maybe then you'll see that the actual power nowadays is in Asia, no longer in North America or in Old Europe.
    5. Re:A Great Historical Tool by PReDiToR · · Score: 2, Insightful

      (and not in the bad newspeak ingsoc way either)

      Funny you should mention that, but this whole "Internet as history" thing has me wound up tight.

      Books cannot be changed. They can be destroyed, reprinted and banned but the first edition will always exist in a collection.
      The first edition of a website only exists in digital form and there is no way to stop the original from being edited and timestamped back to the expected date.

      The IA is the MiniTruth's dream come true.

      But who cares? History has always been written by the victorious, hasn't it?

      --

      Do not meddle in the affairs of geeks for they are subtle and quick to anger
    6. Re:A Great Historical Tool by Anonymous Coward · · Score: 0

      I agree with you, this internet stuff is all just a bunch of hooey.

    7. Re:A Great Historical Tool by TuringTest · · Score: 1


      The first edition of a website only exists in digital form and there is no way to stop the original from being edited and timestamped back to the expected date.


      ...unless you make a digital signature of the timestamp?

      If you want trust, use trust tools. We already knew that digital media does not leave physical traits behind, but that doesn't mean that other checking processes can't be built.
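
      A minimal Python sketch of the idea, with loud caveats: it uses an HMAC (a shared-secret keyed digest) as a stand-in for a real public-key signature or a trusted timestamping service, the key and page contents are invented, and nothing here describes how the IA actually works:

```python
import hashlib
import hmac

SECRET_KEY = b"archive-demo-key"   # hypothetical key, for illustration only

def seal(content: bytes, timestamp: str) -> str:
    """Return a keyed digest binding the content to its claimed timestamp."""
    message = timestamp.encode("utf-8") + b"\n" + content
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify(content: bytes, timestamp: str, tag: str) -> bool:
    """Check that neither the content nor the timestamp changed since sealing."""
    return hmac.compare_digest(seal(content, timestamp), tag)

page = b"<html>original front page</html>"
tag = seal(page, "2001-09-11T14:00:00Z")

print(verify(page, "2001-09-11T14:00:00Z", tag))                          # untouched
print(verify(b"<html>edited later</html>", "2001-09-11T14:00:00Z", tag))  # content edited
print(verify(page, "2001-09-12T09:00:00Z", tag))                          # date changed
```

      Whoever holds the key can still re-seal an edited page, which is why a real scheme would publish the digests or have an independent party sign them.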


      But who cares? History has always been written by the victorious, hasn't it?


      Actually yes. The originals couldn't be rewritten, but they could be destroyed and replaced by a new official version. Nothing new under the sun.

      --
      Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
    8. Re:A Great Historical Tool by Ph33r+th3+g(O)at · · Score: 1
      The IA is the MiniTruth's dream come true.

      Actually, it's so far been its nightmare come true. Many an effort to redact information or remove something embarrassing from corporate, government, and news websites has been foiled by the IA. For example, a page related to a plagiarism controversy local to me was conveniently pulled from where it was hosted, but remained on the IA--foiling the effort to suppress the ability to compare the infringing text.

      --
      I too have felt the cold finger of injustice.
    9. Re:A Great Historical Tool by ampathee · · Score: 1

      Well without the IA, there would be NO record of the first version of said website - surely MiniTruth would find that much more dreamy..

    10. Re:A Great Historical Tool by dubious9 · · Score: 1

      We already had people writing diaries and making lots of pictures in WWII. The improvement isn't that great.

      Huh? We already know the second Iraqi war as well as WWII. It took *decades* for some of the stuff to come out of WWII. If you talk about biased journalism, propaganda and government interference, it was about, oh, a million times greater back then.

      With the advent of satellite communications and 24-hour news services the general public knows about major events and combat movements hours or days after they happen. Not months or years. Yes, it's not perfect, and no it's not unbiased. However, it *is* leaps and bounds better journalism than what was allowed back during WWII.

      The improvement isn't that great? What the hell are you talking about? In which regard *isn't* reporting better? No detail? Have you even watched the news or gone to independent websites? Like, I don't know, say read Egyptian and Saudi editorials, watch a little Al Jazeera.

      It's not that there is a dearth of information, it's that there's too much information. Is the 'real' truth hard to assemble from the disparate sources? Oh yes. But is there an improvement from WWII?

      You tell me, Mr. AC, how can there not be?

      --
      Why, o why must the sky fall when I've learned to fly?
    11. Re:A Great Historical Tool by Anonymous Coward · · Score: 0

      Huh? We already know the second Iraqi war as well as WWII.

      What? No, we really don't. I'd love to see transcripts of Bush's conversations about it in the White house though.

      It took *decades* for some of the stuff to come out of WWII

      And it will take decades before we'll know what really happened in Iraq.

      You should read a bit about WWII. There are more books written about it than about any other subject. Some of the radio reporting is amazing. Especially if you can listen to Japanese, Dutch German and French, you'll get the most stunning reporting you've ever heard. It's all slanted of course, and misses vital information that's available now. Listen to some 'Radio Oranje' for kicks.

      And get rid of the anger on others, some day you too will realize that things really don't change *that* much.

      ps: About the Al Jazeera things, I stopped reading translations when I figured they are just as stupid as the Fox News flicks.

    12. Re:A Great Historical Tool by ImaLamer · · Score: 1

      I for one think that archive.org should turn into some UN effort, with a mission to chronicle and store daily/timely snapshots of the internet and the culture at the time, preserving it for future generations. What a tool for future historians!

      I agree, but do you think that politicians want to have egg on their faces forever?

      Nations around the globe would stop the UN from doing this. I predict Blair and Bush would be the first to stop it - considering all of the egg on their faces (Downing Street Memo?).

    13. Re:A Great Historical Tool by samdu · · Score: 1

      I for one think that archive.org should turn into some UN effort, with a mission to chronicle and store daily/timely snapshots of the internet and the culture at the time, preserving it for future generations. What a tool for future historians!

      I can see the "Storage For Food Scandal" headlines now. :)

  31. The MPAA and RIAA by PrivateDonut · · Score: 3, Interesting

    are going to make a killing off the IA when they have finished. It isn't like they haven't made enough money off others as it is, so they may let this one slide in the name of conserving data. On that note, is the IA downloading EVERYTHING, or selectively downloading to prevent such issues as copyright infringement?

    1. Re:The MPAA and RIAA by Anonymous Coward · · Score: 0
      From the IA FAQ

      "How can I remove my site's pages from the Wayback Machine? The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled as well as exclude any historical pages from the Wayback Machine. Internet Archive uses the exclusion policy intended for use by both academic and non-academic digital repositories and archivists. See our exclusion policy. You can find exclusion directions at exclude.php. If you cannot place the robots.txt file, opt not to, or have further questions, email us at info at archive dot org."

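      To see what such an exclusion does from a crawler's side, here is a small sketch using Python's standard `urllib.robotparser`. The `ia_archiver` token is an assumption (the FAQ text quoted above does not name the user-agent), and `example.com` is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: exclude one crawler from the whole site.
rules = [
    "User-agent: ia_archiver",   # assumed crawler token, not given in the quoted FAQ
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

archiver_allowed = parser.can_fetch("ia_archiver", "http://example.com/index.html")
other_allowed = parser.can_fetch("SomeOtherBot", "http://example.com/index.html")

print("archiver allowed:", archiver_allowed)
print("other bots allowed:", other_allowed)
```

      A rule aimed at one user-agent leaves every other crawler unaffected, so search engines keep indexing the site.
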
  32. Fedora... by YourMotherCalled · · Score: 0, Redundant
  33. Wayback and Slashdot by mcrbids · · Score: 4, Funny

    Go ahead. Try Slashdot in the wayback machine.

    Slashdot has looked virtually identical since 1998!

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
    1. Re:Wayback and Slashdot by hostyle · · Score: 1

      How strange for a linux based community - usually famed for fixing things that aren't broken.

      --
      Caesar si viveret, ad remum dareris.
    2. Re:Wayback and Slashdot by pcgabe · · Score: 2, Informative
      Linky Goodness:

      http://web.archive.org/web/19981111190256/http://slashdot.org/

      Highlights:
      • Episode 1 teaser sheets
      • Does the world really need a 25 gig drive?
      • Patents: how do we keep software free?
      Oh, how far we've come.
      --
      Don't put advice in your sig.
    3. Re:Wayback and Slashdot by Anonymous Coward · · Score: 0

      /. (fucked up/outdated) html comes up every so often.
      Basically, Slashcode is such a pile of poo, it will not be updated without much much pain.

    4. Re:Wayback and Slashdot by SlashdotMeNow · · Score: 1

      You forgot

      - Bringing Linux to the desktop for normal users

    5. Re:Wayback and Slashdot by atavus · · Score: 1

      With these editors.... it may even have the same articles!

    6. Re:Wayback and Slashdot by hawk · · Score: 4, Funny
      Oh, c'mon. It's not that bad.

      Why, just last year they introduced an entirely new story into the rotation of duplicates . . .

      :)

      hawk

  34. It doesn't matter. by Anonymous Coward · · Score: 0

    Once peak oil arrives there will be a total economic collapse. Companies like Capricorn will go bankrupt as Americans just try to save enough for food.

  35. nothing new... by Anonymous Coward · · Score: 0

    I remember seeing this box at the Univ. of San Francisco Flashmobcomputing event. Brewster Kahle (founder: IA) was showing it off. I saw it a few weeks later running at IA's Presidio office. This was a while ago...

  36. small box by Anonymous Coward · · Score: 0

    Since when can 16 racks be described as small?

    Okay great achievement and all, but the title is simply not pedantic enough....

  37. Just imagine... by M3rk1n_Muffl3y · · Score: 0, Redundant

    ...a Beowulf cluster of these.

    Sorry, it had to be done.

    --
    This is not the sig you are looking for...
    1. Re:Just imagine... by poopdeville · · Score: 1

      ...again, and again, and again.

      --
      After all, I am strangely colored.
  38. article not clear by planckscale · · Score: 1
    So are these machines (individual PCs) not hot swappable? Taking down the entire machine because a node goes down seems extreme. I would think that VIA isn't pushing out enough of these chips and M-10000's to get this thing together. $2/GB is cheap. I wonder what filing system it uses?

    --
    Namaste
    1. Re:article not clear by imsabbel · · Score: 1

      It's a cluster, so of course you can take one node out.
      But it's only the raw meat. In order to really use it, you need a storage solution taking care of things like redundancy, node restore, etc.

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    2. Re:article not clear by guygo · · Score: 1

      It takes me 10 minutes to power down, pull the node, replace a drive, and replace and reboot the node. The rest is up to the software.

  39. What's wrong with hot swap and RAID 5? by fgrieu · · Score: 1
    Quoting http://linuxdevices.com/news/NS2659179152.html

    "We experimented with hot-swap, but found it caused as many problems as it solved. It actually induced failures, so we backed away."
    (we) "tried then backed away from RAID, instead opting to recommend JBOD"
    "We had a painful experience with RAID 5, which does not scale well to petabyte-level storage."

    Why the hell are the reports of these guys so far from what the accepted industry practice is, according to IT magazines?
    1. Re:What's wrong with hot swap and RAID 5? by Lussarn · · Score: 1

      Maybe because, as they say, RAID 5 (or at least the implementation they were using) didn't scale well to petabyte levels. They could of course have done many smaller RAID 5 arrays and still kept redundancy. Don't know why they didn't.

    2. Re:What's wrong with hot swap and RAID 5? by imsabbel · · Score: 2, Interesting

      Because you are comparing apples to oranges.

      They don't use hot swap and RAID5 for the same reason Google doesn't run on mainframes:
      It's just cheaper to let a higher level logic take care of that stuff instead of strapping redundancy on every node...
      Why hot swap if it isn't needed? The rest of the node will be mirrored somewhere else, so for the cost of fitting out everything with HS bays you could get 5 or 10% more nodes...
      Same for RAID5: good high performance RAID5 controllers would increase the system cost by 50% or something. And then it's not less expensive than just mirroring nodes.

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    3. Re:What's wrong with hot swap and RAID 5? by tim_uk · · Score: 2, Interesting
      Why the hell are the reports of these guys so far from what the accepted industry practice is, according to IT magazines?

      GOK, I have 3PB of storage synchronised across two data centres here, all in 7+1 RAID5. Mostly self healing too: if a drive pops, then a spare drive in the same array builds itself into that stripe set, enabling hot replacement of the dead drive.

      I would love to know what their "painful experience" was!

      Using JBOD for this seems a tad courageous, to say the least.

      And then, of course, there's backup...

    4. Re:What's wrong with hot swap and RAID 5? by masklinn · · Score: 1
      Why the hell are the reports of these guys so far from what the accepted industry practice is, according to IT magazines?
      Different needs have different solutions. IA probably doesn't need perfect 24/7 uptime.

      They don't have "industry constraints", therefore don't need "industry practices"
      --
      "The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
    5. Re:What's wrong with hot swap and RAID 5? by Antique+Geekmeister · · Score: 1

      They also don't have the local bandwidth requirements of, say, a banking facility processing stock predictions and transactions that needs to pass many Gigabytes of data among a local cluster. Their bandwidth is more limited by their external access, and usage by the last mile of cable to their users' machines. Good hotswap and RAID5 is expensive, especially if you want to buy the good 3Ware or Adaptec stuff instead of that abominable and undocumented Promise stuff you see in desktop motherboards.

    6. Re:What's wrong with hot swap and RAID 5? by MushMouth · · Score: 1

      Exactly,

      Every 9 of uptime costs 10X the previous one, e.g.:

      90% $10

      99% $100

      99.9% $1000
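
      The rule of thumb above can be put next to the downtime each level actually allows. This is only a rough sketch: the $10 base cost and the 10x multiplier are the poster's illustrative figures, and the function names are made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime per year at `nines` nines (1 -> 90%, 2 -> 99%, ...)."""
    unavailability = 10.0 ** -nines
    return HOURS_PER_YEAR * unavailability

def relative_cost(nines: int, base_cost: float = 10.0) -> float:
    """Cost under the rule of thumb: each extra nine costs 10x the previous one."""
    return base_cost * 10 ** (nines - 1)

for n in range(1, 4):
    availability = 100 * (1 - 10.0 ** -n)
    print(f"{availability:.1f}%  ${relative_cost(n):,.0f}  "
          f"{downtime_hours_per_year(n):.2f} hours/year of downtime")
```

      Each step buys a tenfold cut in allowed downtime for a tenfold rise in cost, which is why an archive with no mission-critical users can reasonably stop early.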

      The archive chooses a level which gives it the reliability it needs for the cost it can cover. Since nobody is using the archive for mission critical work, and it is a free service to you and me, they can do it however they please. I have to say, they probably got the number right, as they are actually providing the data now, and have been for years. Meanwhile, other much bigger organizations (the archive when part of Alexa maxed out at about 80 people) that have been crawling just as long (google, as a stanford project) or longer (altavista) either never tried to save it all or never put any money into serving it to the people. Yet for some reason the archive NEVER won a webby.

    7. Re:What's wrong with hot swap and RAID 5? by Anonymous Coward · · Score: 0

      Well, simple really.

      2500 disks = 2500 IDE cables. They go bad.
      Hot swap means you add a caddy, yet another electrical connection between the drive and host that can fail.

      You can do application layer parity and replication.

      Secondly, it's an archive. With RAID, finding a pile of disks 1,000 years from now is pretty much worthless. Finding non-raided disks can yield a wealth of archeological information.

  40. What's in a name? by MadCow42 · · Score: 1

    A friend of mine used to work for Sony... he swears this is a true story:

    Sony had a petabyte tape backup system they wanted to sell into North America... called the "Peta-file". Thankfully, Sony NA managed to have the name changed prior to its introduction here.

    So, PetaBox is slightly better... slightly. :)

    MadCow.

    --
    I used to have a sig, but I set it free and it never came back.
    1. Re:What's in a name? by steevc · · Score: 0

      I think that must be what they sell as the Petasite

      http://www.cybernetics.com/tape_backup/dtf/petasite.html

      Some TV companies use them to store their video.

      But there's quite a difference between storing that much on disk rather than tape.

      Give it a few years and we'll have that much storage in whatever we use as home PCs then.

    2. Re:What's in a name? by Anonymous Coward · · Score: 0

      In 1999, when the internet was considered newish by some, Toshiba wanted to introduce an internet optimized laptop.

      The initial marketing materials called it "Woody, the internet pecker".

      I kid you not.

  41. tried that... by Anonymous Coward · · Score: 0

    Built a VIA based storage cluster as a test some time back. Surprised that they have decided to go into production. 2 hard drives on an IDE channel is not a good idea, VIA boards are not highly reliable, and 100Mb ethernet is just too slow if you want to copy the contents of one machine.

    Also they haven't really worked on the software side - it's just a bunch of machines you have to rsync to, which really gets to be a pain to manage when you have that many.
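
    To put a rough number on "too slow": the sketch below estimates how long it takes to copy one node's contents over the wire. The 1.6 TB per node is an assumption taken from a figure quoted elsewhere in this thread, and the calculation ignores protocol overhead and disk speed, so real copies would take longer.

```python
def copy_time_hours(capacity_tb: float, link_mbps: float) -> float:
    """Best-case hours to move `capacity_tb` terabytes over a `link_mbps` link."""
    bits = capacity_tb * 1e12 * 8          # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6)     # megabits/s -> bits/s
    return seconds / 3600

NODE_TB = 1.6  # assumed capacity of one node

for name, mbps in [("100 Mb Ethernet", 100), ("Gigabit Ethernet", 1000)]:
    print(f"{name}: {copy_time_hours(NODE_TB, mbps):.1f} hours")
```

    Under these assumptions a full node takes about a day and a half at 100 Mb and a few hours at gigabit speed.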

  42. They don't like RAID by billstewart · · Score: 4, Interesting
    I was a bit puzzled by that also - the article said the things come in racks of 40 or 64TB, and 16 racks times 64TB is about 1PB, not 1.5.

    Also, the article says they don't like RAID, due to bad experiences with RAID5, and the system is configured as JBOD (Just a Bunch Of Disks). It doesn't say why, or what users should do to get equivalent protection. My guess is that depending on RAID within a box means you're still vulnerable if the box's CPU or disk controller decides to scribble the disks, or the power supply decides to catch fire or short out and deliver 240VAC on the +5V line or whatever. So if you want a RAID-like set of redundancy, set up your applications or file system mounting or something to calculate the protection disk in software and hand it off to another 1U box for storage.
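
    As an illustration of "calculate the protection disk in software", here is a minimal XOR-parity sketch. It is not how Capricorn or the Archive do it (the article doesn't say); the block contents and function names are invented for the example, and all blocks are assumed to be the same length.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Parity block to be handed off to another box for storage."""
    return xor_blocks(data_blocks)

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])

data = [b"node-A..", b"node-B..", b"node-C.."]   # one block per box
parity = make_parity(data)

# Pretend the box holding data[1] caught fire; rebuild it from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

    Because the parity lives on a different box, a bad controller or power supply in one machine cannot take out both the data and its protection.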

    The overhead of the motherboards here is not that high - they're about $150-200, and support 4 disks that probably cost $200-300 each, so they're only about 20% of the cost, which is not bad. The article didn't say they're using SATA, and it sounded like it's some IDE variant instead, but if you're only using 100 Mbps Ethernet to connect to the box and not the optional GigE, it's not the bottleneck anyway. If you wanted an alternative design, you could probably do something with a couple of 4-way SATA controllers per CPU, with a lot of disks stacked vertically in a 3-4U box looking like an X-serve or something. But that wouldn't necessarily have much of an advantage.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:They don't like RAID by budgenator · · Score: 2, Informative
      "Although Hitachi does not offer an 'enterprise' or '24x7' SATA drive, our testing found their drives to be as reliable as anything out there, enterprise distinction or not," Saikley said.

      I read that as SATA drives. What I wonder about is
      PetaBoxes are ~$ 2.00/GB per the article
      while
      Coraid, priced at $1,995.00 + (4*$314.99 hard drives) = 3918.94 + 664.00( 15U tabletop rackmount) or ~$0.41/GB per my calculations;
      looks like a price war is brewing here unless PetaBox has some serious KW in BTU out or performance advantages.
      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    2. Re:They don't like RAID by Rich0 · · Score: 1

      The article didn't say they're using SATA, and it sounded like it's some IDE variant instead, but if you're only using 100 Mbps Ethernet to connect to the box and not the optional GigE, it's not the bottleneck anyway.

      Does SATA vs IDE actually make a difference with current drives? I know that SATA is capable of much higher speeds, but are standard drives generally capable of taking advantage of this?

      Back when I shopped for hard drives for my system which had both IDE and SATA I had the choice of an ATA133 or an SATA133 for about $10 more. I just grabbed the IDE since I could find nothing anywhere that showed the SATA would be faster, and $10 is $10.

      I see that now you can get SATA150, which is likely to be superior to IDE ATA133. I agree that if you just have 100MB ethernet it makes no difference at all in the end...

    3. Re:They don't like RAID by Anonymous Coward · · Score: 0

      I did some deep reading at their site a few months ago. These guys are running under different assumptions than the normal datacenter.

      One of the big ones was data recoverability over the long haul. If you pull a single drive out of a RAID array (other than RAID1), the data is not recoverable from that drive alone. Also one needs to realize that RAID cards aren't a commodity. If you standardize on RaidCardX, and you need another one of those 10 years from now but the card has been discontinued, what do you do? Can't just throw in RaidCardZ and be assured the data will be readable.

      So they are building a least-common-denominator low-low-cost system which removes as many dependencies on proprietary technology as possible. They want to be able to build a completely compatible system 10, 20, 30 years from now. They hope that even if the system is found (without operating manuals) 200 years from now, the finder can recover at least some of the data.

      And they are willing to give up some efficiencies to get there. They duplicate *all* their data to at least one other location, at another datacenter. So this takes a lot of bandwidth, and eats a much higher amount of disk space. In terms of disk space used, it's like comparing RAID1 with RAID5.

      But that's okay, because they're saving a lot of money by using cheap parts and adding just enough redundancy to beat the risk associated with those cheap parts.

      It's a very interesting approach; one we'll be seeing more of in the future I am sure. Check out their site and follow the links; neat stuff there! Like the idea of putting a few petabytes, along with associated cooling equipment in a standard shipping container, and literally replicating the entire datacenter, then shipping it to the final destination! Yow!

    4. Re:They don't like RAID by Anonymous Coward · · Score: 0

      That's Mb, not MB!

  43. NAS or SAN or ??? by joib · · Score: 1

    I read the article, and the website of the company, but I couldn't find out how you're supposed to access all this data? It's hardly practical that every node exports its own NFS, is it? Is it supposed to use some kind of cluster file system such as (Open)GFS?

    Or is the user expected to do some kind of in-house thingy, like google or (presumably) the internet archive?

    1. Re:NAS or SAN or ??? by TTK+Ciar · · Score: 2, Informative

      The Petabox is shipped to a customer running Debian Linux by default (though of course you can install whatever you want), so there are a number of solutions to choose from. OpenAFS and (as you pointed out) GFS are made specifically for this kind of setup, providing fairly good abstraction of the underlying cluster and easy access to random data. Within The Archive, we have experimented with different approaches, the one currently in production using an API based on a UDP locator service and rsync.

      Another approach uses a /net directory under which remote filesystems are NFS-mounted on demand (I'm not sure how it works, our chief sysadmin set it up for testing, but if /net/ia105783/0/foo is not mounted, and then you type 'ls /net/ia105783/0/foo' (or any other command which opens a hypothetical file off /net), the remote filesystem is automagically NFS-mounted so that the command can complete).

      I'm not sure that we'll ever use it in production to access our distributed information, though; NFS has a very, very low error rate, but when you have thousands of NFS mounts going on at once (as we do NFS-mount users' /home directories everywhere), "very, very low" translates to "tripping over errors every few days". I've seen some really weird NFS failures and partial failures at The Archive, and I've written some software to be tolerant of them, but most of our software is not, and realistically speaking never will be. It's written to be tolerant of rsync errors instead. *shrug*, six of one, half a dozen of the other. This is one of those things where you need to just pick a solution and use it, whether it's OpenAFS, GFS, NFS, or some homespun thing. All have their pros and cons, and you'll learn to deal with their problems as you use them.

      -- TTK

  44. Courageous? Try insane. by Otto · · Score: 1

    I can't think of a single reason to use a JBOD setup when you could just as easily use RAID 0.

    If you don't need redundancy, great, fine, you can be redundant elsewhere. I'm down with that. But RAID 0 is so easy to implement as opposed to a JBOD setup and works so much better that there's essentially no reason to ever use JBOD except pure laziness.

    I mean, with either one, if you lose a drive, you lose the array, but at least with RAID 0 you get the benefits of striping in both read and write operations, basically doubling your throughput speed.

    --
    - Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
    1. Re:Courageous? Try insane. by imsabbel · · Score: 1

      Actually, you are a bit wrong with that:

      If properly managed, just using them as single discs would result in lower access times than RAID 0, as you can independently access files on the different discs instead of blocking all heads for retrieving a single file. And as they are only connected via Gbit LAN, STR doesn't matter anyway.

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    2. Re:Courageous? Try insane. by CXI · · Score: 1

      Plus they don't lose the whole array if a disk fails. They lose the data on that disk, which is simply mirrored on another node in another geographical location via software mirroring. It's more redundant AND cheaper than spending more to buy an extra drive and do a RAID setup.

  45. I read and I thought... by manojar · · Score: 0, Offtopic

    I saw the topic and I thought what the hell those animal guys have to do with slashdot...?

  46. excellent plan by edsonmedina · · Score: 0

    1. Buy a petabyte system
    2. Backup the internet
    3. ????
    4. PROFIT!!!

  47. Not a big improvement... by paulatz · · Score: 2, Interesting

    It was 3 or 4 years ago when I saw a 600 terabytes (0.6 petabytes) tape-based storage system at CERN.

    --
    this post contain no useful information, no need to mod it down
    1. Re:Not a big improvement... by GigsVT · · Score: 1

      Tape is way inferior to being able to access TB amounts in real time.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
  48. hardware supplier? by bani · · Score: 1

    looks like generic mini-itx, but who makes the 1u? custom built?

    1. Re:hardware supplier? by guygo · · Score: 1

      Custom made enclosure, assembled by me. 560TB put online by my hands, and counting

  49. The angular momentum must be huge! by MCRocker · · Score: 1

    2,500 spinning drives!!! These folks are located in San Francisco... if there's ever an earthquake the gyroscopic effects could flip the building over! Perhaps they should mount every other drive upside down to cancel out the effect to prevent serious injury ;)

    --
    Signatures are a waste of bandwi (buffering...)
    1. Re:The angular momentum must be huge! by Butterspoon · · Score: 1
      Well... Say a platter is 10g. How many platters per drive? TFA doesn't say; let's be generous and say 4. So that's 100kg of spinning 3.5" (~10cm) discs. 7200 rpm?

      So that'll be about the same angular momentum as a 1kg wheel with a diameter of 10m spinning at a rate of about 1Hz. Not quite enough to tip over a building...
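
      For anyone who wants to check the estimate, here is a quick sketch. It assumes every platter is a solid disc of 5cm radius, that all 2,500 drives spin about parallel axes in the same direction (the worst case), and it treats the comparison wheel both as a solid disc and as a hoop, since the shape wasn't specified.

```python
import math

def disc_angular_momentum(mass_kg, radius_m, rev_per_s):
    """L = I * omega for a solid disc, with I = (1/2) m r^2."""
    inertia = 0.5 * mass_kg * radius_m ** 2
    return inertia * 2 * math.pi * rev_per_s

def hoop_angular_momentum(mass_kg, radius_m, rev_per_s):
    """L = I * omega for a thin hoop, with I = m r^2."""
    inertia = mass_kg * radius_m ** 2
    return inertia * 2 * math.pi * rev_per_s

drives, platters_per_drive, platter_kg = 2500, 4, 0.010
total_mass = drives * platters_per_drive * platter_kg      # about 100 kg

platters_L = disc_angular_momentum(total_mass, 0.05, 7200 / 60)
wheel_as_disc = disc_angular_momentum(1.0, 5.0, 1.0)
wheel_as_hoop = hoop_angular_momentum(1.0, 5.0, 1.0)

print(f"all platters: {platters_L:.0f} kg m^2/s")
print(f"10m wheel:    {wheel_as_disc:.0f} to {wheel_as_hoop:.0f} kg m^2/s")
```

      The platters land between the two wheel models, so "about the same" holds up.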

      (Don't worry, I didn't have a sense of humour failure - I'm sure I'm not the only one here who felt compelled to estimate this!)

      --
      pi = 2*|arg(God)|
  50. RAID 5 is inferior to JBOD by Anonymous Coward · · Score: 1, Insightful


    Depends heavily on your purpose of the system, of course.

    If you need something that is highly available and has good performance, then RAID is wonderful. But archives don't need to be highly available, they just need to be highly redundant and backed up to several places.

    For instance if you have a RAID 5 array, then a single hard drive failing couldn't take it out. But a single controller failing could. If one drive starts spewing out nonsense then that corruption could be replicated automatically between hard drives on an array before anybody notices or hardware monitors shut down everything.

    So in this sense simply having multiple copies on different computers on different disks is actually preferable to a RAID setup. It is simpler and, as long as you have high quality distributed filing systems, it's easier to restore material. It'll be easier to access down the line.

    It just won't have the higher performance or high availability that RAID will provide, but then again it doesn't really need it.

    And remember:
    RAID != backups.

  51. Two points by Salamander · · Score: 4, Interesting

    First off, this isn't quite an example of a company suddenly deciding to donate stuff to the Archive. As can be seen on their own website, Capricorn was spun off from the Archive on July 1, 2004. To a large extent, Capricorn exists for the specific purpose of providing storage to the Archive, and if that same storage can be sold to others so much the better.

    Second, what about interconnects and performance? The product descriptions say nothing about SCSI or FC or other storage-oriented connectivity, so one must assume that the connection to these boxes is through a network. That would mean each node is an NFS server (or similar), serving up 1.6TB using a 1GHz C3 processor, a maximum of 1GB of memory (for caching etc.) and what appears to be a single GigE link. Can you say unbalanced? The Internet Archive might be the only system with an access pattern so sparse that the ratio between capacity and performance wouldn't be crippling. Don't try using one of these with any other kind of application if performance is a concern...and BTW they don't seem to say anything about high availability or other storage functionality (e.g. integrated backup or snapshots) either. Capricorn's big play seems to be power consumption, but there are other players that can beat them on density (e.g. Copan with 224TB per rack) and multitudes who can offer better performance/functionality. I hate to sound negative, but this is a product so specialized as to be uninteresting.

    Disclaimer: I think I met some of the Copan guys once and they seemed cool enough, but there's no other relationship between me and them. That just happened to be the first name I thought of in this space.

    --
    Slashdot - News for Herds. Stuff that Splatters.
    1. Re:Two points by Cheeze · · Score: 1

      Gigabit ethernet is "good enough" when all you're really doing is serving up web pages. When you have 16 racks of 1U servers, spending the extra cash to get SCSI with raid and more bandwidth for each server has little or no extra benefits when those servers aren't necessarily talking to each other. Extra space at a low price is probably many times more important than speed.

      Capricorn's big play is also probably price. Price is mentioned quite a few times in the article.

      Their product kinda sounds like google. Cheap, replaceable hardware.

      --
      Why read the article when I can just make up a snap judgement?
    2. Re:Two points by randalware · · Score: 1

      I work with SAN storage a lot at work.

      And I just did some math.

      600 systems & 2500 drives is only about a 4.17 drive per system average.
      I think the modular midrange storage systems from IBM,EMC,HP etc. would look pretty reasonable.

      And I wasn't able to RTFA yet, but the connections would be a problem for 600 systems.
      Fiber channel, scsi,network, ?

      --
      This is my opinion based on what little I know and understand of the rumors and lies Thanks, Randal
    3. Re:Two points by dildatron · · Score: 1

      I also work in SAN, entirely, for the last several years. The article was very sparse on details, but I didn't see any ports for connections except the built in 10/100/1000 ethernet. Perhaps for archiving this is fine, but it wouldn't fly in most data centers. I doubt you could pull data off it really fast with SATA drives and the overhead of ethernet. Also, how are faults handled? We have thousands of disks spinning in our arrays, and you can bet on one or two failing every week. What happens if a whole rack-mount server fails, or a couple of them? Do they have redundant power supplies in each component? Dual networks to have redundant paths? As usual, you get what you pay for. If that is good enough, then they did just fine.

      --


      If you had nuts on your chin, would they be chin nuts?
  52. Petabox / Internet Archive by elronxenu · · Score: 1
    I'd like to understand more about their filesystem. They say RAID doesn't work for them, so they use JBOD.

    What kind of metastructure do they put on the disks to achieve that kind of large filesystem, and improve reliability?

  53. It was obviously faked by karlandtanya · · Score: 1

    The first version will be called Capricorn One.

    --
    "Reality is that which, when you stop believing in it, it doesn't go away." - Philip K. Dick
  54. tfa is not well explained by Anonymous Coward · · Score: 0

    it mentions that they backed away from raid in favor of jbod? obviously there must be some redundancy, though. a few of your 2500 sata disks will certainly die...maybe there's redundancy between nodes somehow?

  55. MOD PARENT UP by WilliamSChips · · Score: 1

    -notext-

    --
    Please, for the good of Humanity, vote Obama.
    1. Re:MOD PARENT UP by Dolda2000 · · Score: 1
      Oh indeed! What is the world coming to when moderators think that the "In Communist Russia..." jokes are redundant? What next? No more dupes?! Correct grammar?! We've all seen hell freezing over lately anyway, so God knows where the world is heading next!

      We'd better stop it, and the sooner the better. So please, for God's sake, mod GP up!

  56. No, it can't just be JBOD. by Grendel+Drago · · Score: 1

    But that doesn't make any sense. They talk about needing to replace drives, and opting out of the use of hotswap, saying that it caused more problems than it solved... but JBOD means no redundancy at all. (And with that many drives, there will be frequent failures, as TFA also stated.) So how do they deal with data failure? There has to be some solution for redundancy; what is it? Okay, so RAID 5 didn't scale---I'd think they'd use a sort of hierarchical RAID, but JBOD isn't any sort of enterprise-level solution.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
    1. Re:No, it can't just be JBOD. by daikokatana · · Score: 1
      Just like everybody else - they make backups on DVDs, duh.

      It's not like it's a lot of work copying 1 petabyte of data to DVD, you just... ehm... oh, wait :)

      --
      http://jcsnippets.atspace.com/ - a collection of Java & C# snippets
    2. Re:No, it can't just be JBOD. by Anonymous Coward · · Score: 1, Informative

      Check out The PetaBox page at The Internet Archive.
      That page has been around for years, and their forum talks about many of the things they went through. They custom-built the cases, and they couple nodes together, and they are mirrors of each other. If one fails, the other copy is still there. Not to mention the copies in other geographic locations. This also isn't just "one large file system". Each drive is a separate filesystem, and they serve the files up via standard means such as FTP and HTTP. (There is a UDP-based locator protocol they wrote as well, to find data in the massive amount of storage.)
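
      The mirrored-pair idea described above can be sketched in a few lines. This is an in-memory stand-in, not the Archive's actual software: the real locator is UDP-based and the files are served over FTP/HTTP, whereas here a plain dictionary plays the locator and every name is hypothetical.

```python
class Node:
    """A storage node: a name, an up/down flag, and files keyed by item id."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.files = {}

def store(item, data, primary, mirror, locator):
    """Write the item to both nodes of a pair and record where it lives."""
    primary.files[item] = data
    mirror.files[item] = data
    locator[item] = [primary, mirror]

def fetch(item, locator):
    """Return (node name, data) from the first live node holding the item."""
    for node in locator.get(item, []):
        if node.up and item in node.files:
            return node.name, node.files[item]
    raise LookupError(f"no live copy of {item!r}")

a, b = Node("node-a"), Node("node-b")
locator = {}
store("page-1998", b"<html>...</html>", a, b, locator)

a.up = False                      # one node of the pair fails
print(fetch("page-1998", locator))  # served from the mirror instead
```

      Losing a node costs nothing but a redirect, which is the redundancy that JBOD on its own does not provide.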

  57. In terms of bandwidth by ihtagik · · Score: 1

    If they delivered the 1.5 petabytes in one week (7.5 days) that's about .2 petabytes a day or 1.6 petabits a day... which is roughly 333.33 Tb/s

    Impressive!

    1. Re:In terms of bandwidth by shobadobs · · Score: 1

      Not exactly. 1.6 petabits a day = 1.6 petabits per 86400 seconds = 1600000 gigabits per 86400 seconds = 18.518 Gb/s. (Assuming scaling by 1000, not 1024; we are dealing with hard drive space after all.)

      I put my laptop with an 80GB hard drive onto my desk in a quarter second; does that mean I got 256 Gb/s?
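
      The conversion for the delivery figure can be checked with a short sketch, assuming decimal units (1 PB = 10^15 bytes) and the 7.5-day delivery window from the parent post; the function name is just for illustration.

```python
def sustained_rate_gbps(petabytes: float, days: float) -> float:
    """Average rate in gigabits per second for moving `petabytes` in `days`."""
    bits = petabytes * 1e15 * 8      # decimal petabytes -> bits
    seconds = days * 86400           # days -> seconds
    return bits / seconds / 1e9      # bits/s -> Gb/s

print(f"{sustained_rate_gbps(1.5, 7.5):.3f} Gb/s")
```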

    2. Re:In terms of bandwidth by Anonymous Coward · · Score: 0

      "I put my laptop with an 80GB hard drive onto my desk in a quarter second; does that mean I got 256 Gb/s?" ...Yes... Yes it does.
      And if we went from using the internet, to using birds with large hard drives strapped under their bellies, we'd be able to transfer ridiculously large amounts of data much much faster than with the internet. But the packets are too large and aren't sent fast enough to allow for actual near-realtime communication.
      But... oh well! It's still faster!

    3. Re:In terms of bandwidth by Enigma_Man · · Score: 1

      Never underestimate the bandwidth of a station wagon full of hard disks speeding down the highway.

      The bandwidth may be great, but the latency is terrible :)

      -Jesse

      --
      Nothing says "unprofessional job" like wrinkles in your duct tape.
    4. Re:In terms of bandwidth by pboulang · · Score: 1

      nope, it means you got 256GB/s ;)

      --

      This comment is guaranteed*

      *not guaranteed

    5. Re:In terms of bandwidth by shobadobs · · Score: 1

      Ah, but when sending hard drives in motion, you get the added plus of calculating measurements using Gb*m/s (these very special calculations make the latency tolerable :-)

      This is why I prefer downloading from mirrors on the other side of the planet; I get to maximize the total amount of bitmeters, I mean, the total 'net' work. :-)

  58. Oops by QMO · · Score: 1

    Shouldn't have used those backup tapes for streamers, I guess.
    Or was it backup CDs for coasters/frisbees?

    (CDs don't work well for frisbees. In my experience they break after just a few brick walls, and it costs a stroke, and makes it harder to get par.)

    --
    Exam 4/C again. Maybe I'll do better this time.
    1. Re:Oops by Baricom · · Score: 1

      It would be silly to rely on a site like the Internet Archive for backup purposes.

      My goal was to see how the site evolved over the years of its operation. I didn't need anything from the old versions. The backups I had were more than adequate, thank you very much.

  59. Case and point? by lheal · · Score: 1

    That's "case in point". Like "under scrutiny" or "off topic". Which is what I should be modded.

    Sorry.

    --
    Raise your children as if you were teaching them to raise your grandchildren, because you are.
  60. Once upon a time by QMO · · Score: 3, Funny

    I was driving to work. It wasn't a long drive, but more than 5 minutes.

    "Macarena" was on the radio when I started the car. A few minutes later "Macarena" was still on, and I was thinking that the song must be longer than I thought, or something. About then the DJ came on and said "We're playing 'Macarena' until you vomit." Then played the song again.

    After that iteration of the song the DJ came back and played some phone calls of people begging him to change the song, but he just said that it was "Macarena" until you vomit.

    I don't know when the thing started, but by the time I got to work it was the 17th or so "Macarena" in a row.

    --
    Exam 4/C again. Maybe I'll do better this time.
    1. Re:Once upon a time by AKAImBatman · · Score: 1

      I had that happen once with "One Night in Bangkok". For some reason, the DJ kept taking requests for the song during the "Lunch at the 80's" program. Finally, one of the requesters asked why he kept playing "One Night in Bangkok". He finally fessed up to the fact that the CD player was jammed and that he had a technician on the way to get it out so they could play other music. Whoops. :-D

    2. Re:Once upon a time by karnal · · Score: 1

      We had a radio station here in Columbus, OH switch formats/owners.

      So, for 2 straight days, they played "Prison Bitch" over and over and over. Was kind of silly actually, especially since they weren't bleeping out the words.

      This was probably 3-4 years ago, but it still brings a smile to my face. Probably the first time I'd listened to radio in a LONG time.

      --
      Karnal
    3. Re:Once upon a time by swelke · · Score: 1

      I know geeks are lazy, etc. but did you ever consider... I don't know... changing the station?

      --
      Have you ever wondered How to Take Over
    4. Re:Once upon a time by QMO · · Score: 1

      I was curious.

      Besides, the phone calls were hilarious.
      "Pleeeeaaaasssssseeee stop. I can't take it any more."
      "Why are you playing Macarena over and over?"
      "How many times are you going to play this?"

      To all of these kinds of calls the DJ would respond, "Have you vomited yet? This is Macarena 'til you vomit."

      --
      Exam 4/C again. Maybe I'll do better this time.
    5. Re:Once upon a time by MagicMike · · Score: 1

      Had the same thing happen to me in Dallas in the early nineties. "The Eagle" (a rock format) was turning in to some country station or some such.

      They played nothing but Eagles for a day, then just put Hotel California on for two days straight.

      The sad thing was, it was the best thing on most of the time. And I like the Eagles, but not *that* much.

      Must've been fun to do though, as a parting shot, if you were the DJ.

    6. Re:Once upon a time by suitepotato · · Score: 1

      I always thought that if I could develop a high frequency gravity wave generator capable of making the university detectors respond, I'd modulate it to the Macarena and see how long it took for someone to notice that it wasn't a message from ETs.

      --
      If my grammar and spelling are off, I am [distracted/tired/careless] (take your pick)
    7. Re:Once upon a time by NaDrew · · Score: 2, Informative
      About then the DJ came on and said "We're playing 'Macarena' until you vomit." Then played the song again.

      After that iteration of the song the DJ came back and played some phone calls of people begging him to change the song, but he just said that it was "Macarena" until you vomit.

      I don't know when the thing started, but by the time I got to work it was the 17th or so "Macarena" in a row.

      This is called stunting. Radio stations do it to mark a transition between formats, apparently in an attempt to drive off listeners to their previous format.
      --
      Vista:XPSP2::ME:98SE
    8. Re:Once upon a time by JhohannaVH · · Score: 1

      Or to attract new listeners. In 98, this was done here in San Diego when one station cut from Dance to Mexican. I think they played Macarena for an entire weekend. And yes, I vomited repeatedly and they didn't STOP!!!!

      --
      Sorry man... the Internet pooped on me.
    9. Re:Once upon a time by paxmark1 · · Score: 1

      Been there, heard that.

      First time it was done was in Mobile, Alabama, in the early 1990s. Whole bunch of buyouts, so the decent album rock station one day, two days, three days continuously played the Macarena.

      That is to make damn sure that previous people listening to that station will never go there again. That one then went to oldies.

      And a loss of a decent album station that moved down the dial and went to a more shit head crap satellite playlist.

      Fuck Clear Channel.

      Impeach George Bush. To re-elect a college coke dealer to the presidency - shit.

      Much better radio stations in Canada, and I don't pay US taxes anymore.

  61. Libraries Of Congress... by pulse2600 · · Score: 1

    ...so come on, tell us - how many?

  62. Power consumption by Anonymous Coward · · Score: 0

    "Despite its large size, the IA's PetaBox installation draws only about 50kW of power."

    Hell, hydro's included in my apartment. I'll take two.

  63. There are other uses... by goldragon · · Score: 1

    I am a biomedical engineer for Cardiology at a top 25 research university medical center. One of my primary responsibilities is maintaining the cardiac PACS for the medical images we create. We generate about 2TB of data a year, and Radiology does probably ten times that amount. Our data, stored in DICOM format, is static; by law, we cannot change it (the patient demographic information is included in the file header and if a nurse misspells a patient name, etc, we only update the image location database, not the image file itself). Once created, the images are accessed several times a day until the patient goes home, after which they might not be retrieved for weeks or months at a time. However we have a legal obligation to keep the images available for seven years (for kids, it is until they turn 21) so cheap storage is a good thing for us. The current DICOM standard archive media is DVD-R and we use 200-disc rack-mountable changers. We researched going with an EMC Centera NAS unit but our cardiac PACS vendor wouldn't certify it because data flows through and is altered by a gateway server. If we had direct access to cheap storage, we wouldn't be affected by the performance imbalance.

    1. Re:There are other uses... by Anonymous Coward · · Score: 1

      Wow, archiving to DVD-R? I suppose it's OK since the data is only needed for 7 (or 21) years, but DVD-Rs have dye layers that decompose just like the ones in CD-Rs. Perhaps it's fine since the discs are stored in controlled conditions. And you're not exactly getting a huge amount of storage per platter, either.

      On the plus side, versus a more typical solution like high-end magnetic tape, DVD-Rs are write-once, optical, and mass market, so you don't have to worry about magnets changing all the old patient data behind your back, and it's probably cheaper than alternative optical media.

      Still, can't help but think there should be a better solution. Someone should build one, at least, even if there isn't one right now. Can we say market opening?

  64. will they have an index someday? by rubycodez · · Score: 1

    that's the only thing that's missing from the current internet archives, here's hoping they devote some resources to indexing

    1. Re:will they have an index someday? by TTK+Ciar · · Score: 1

      Hi!

      I'd be interested in corresponding about this, if you're willing. Could you please email me at "t t k at archive dot org"?

      Thanks,

      -- TTK

    2. Re:will they have an index someday? by rubycodez · · Score: 1

      Hi TTK, cool web pages you have there. I've made an algorithm that takes advantage of the way websites grow, self-reference, and prune over time, crawling a couple archives now to see if it's worthy or worthless. Anyway, tonight I'm off to SE asia for a few months and will be out of email reach probably.

  65. Naked Americans? by Grendel+Drago · · Score: 1

    Yeah, but how many of them would you want to see naked? Unless you have a chub fetish, you're unlikely to find the US demographic pool particularly attractive.

    On the other hand, you could just go grab a Livejournal account, join the communities "kaizersoze125" and "show_your_boobs", and marvel at the quantity of amateur porn folks throw out there for free.

    Seriously. There's some high quality out there. Some of it's not even members-locked (earningtails, for instance).

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
    1. Re:Naked Americans? by mmkkbb · · Score: 1

      spoken by a non-member, i assume. i had to leave show_your_boobs because of too much low quality and too much wang

      --
      -mkb
  66. Your sig, and Amazon wishlists. by Grendel+Drago · · Score: 1

    Funny thing about your sig---I just noticed that, as your wishlist is on Amazon.co.uk, the items say things like "Usually dispatched within 24 hours". In US English, we say 'shipped' instead of 'dispatched'. I never knew that was a UK-ism.

    Learn something new every day, I suppose.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
  67. KGARTH by hawk · · Score: 1

    About ten years ago, driving through California's central valley, a DJ announced "KGARTH---all Garth, all the time."

    I figured it for a gag, making fun of the overplaying of the singer of the week as he played a couple in a row.

    At about the fourth song, the joke was old, and I found another station (for crying out loud, he only had 2 or 3 albums at the time!).

    An hour or two later I checked (morbid curiosity). They were *still* playing Garth Brooks.

    hawk

  68. Useless garbage. by operagost · · Score: 1
    Most of the people who browse Slashdot could build something better.

    This is simply a large collection of commodity-quality hardware. There is no RAID and no hotswap, so a hardware failure results in large chunks of data being unavailable for extended periods while data is restored. Useless for truly critical data.

    --

    Gamingmuseum.com: Give your 3D accelerator a rest.
    1. Re:Useless garbage. by misterTreellama · · Score: 1

      My microwave is also useless for truly critical data, but that's not its intended use either. The machine in the article is meant for massive amounts of storage, not the backbone of a corporate network. Obviously slashdotters could build something better, like a tasty blueberry pie, but we're talking about mass storage, not pie.

      --
      "Let the Spanish keep it, it's a sh*thole," we said, but you had to have your goddamned orange juice.
  69. "AS been"?... by Anonymous Coward · · Score: 0

    "AS been".. i see... the notion of petabytes must have increased the illiteracy factor of the editor.

  70. Please clarify your point by Anonymous Coward · · Score: 0

    Copying music isn't necessary to listen to it on the radio.

    Yes, it is. You, by playing your radio, create a copy of the sounds that were made in the recording studio.

    The singer doesn't just shout really, really loud, you know.

    Copying a movie isn't required to see it on HBO.

    The television, again, creates a copy of the movie from the signal it reads from its antenna, or over the cable it's connected to.

    You can try this experiment if you don't believe me. Get two TVs. Turn them to the same channel. They'll both work at the same time, and they'll each independently display a copy of the movie. Seriously. Trust me. It really works!

    Making a copy of html from the internet IS part of getting your computer to display it.

    The computer makes a copy from the signal it gets over the phone line or other cable that it's connected to.

    In what material way is translating a signal sent to a computer screen different than translating a signal sent to a television screen or radio speaker? In all three cases, a new copy of the work is made from the signal received.

    As for your last point, books, yes, we do keep a copy of the text in our minds as we read. The law, however, chooses to pretend that the copy in our head doesn't exist, or at least, doesn't apply copyright law to it.

    However, if you read a book out loud, you do violate copyright, at least in my country. There's even a special exemption in the Copyright Act that says that teachers can read "reasonable portions" of a book out loud for purposes of academic study. Yay, copyright!

    So, reading books to a class makes a copy. Listening to the radio makes a copy. Watching TV makes a copy. And so does downloading HTML to a computer.

    In other words, a copy is a copy is a copy. The internet isn't a special case. Google probably is risking prosecution under copyright law, but will probably win if prosecuted, because judges like Google. Judges are ex-lawyers, and as such, are quite good at protecting their own interests: and everyone uses google, even judges.

    Short answer: in this case, the good guys will probably win, because the bad guys find them useful. ;-)
    --
    AC

  71. This should almost be enough by suitepotato · · Score: 1

    to keep a copy of every forked distro with source and commentary on same going back to the beginning and onward for at least the next five... minutes. Too bad most of my floppies are history. Anyone still got a complete set of Yggdrasil files?

    --
    If my grammar and spelling are off, I am [distracted/tired/careless] (take your pick)
  72. Exclusions by bhadreshl · · Score: 1

    I wonder how redundant the archives actually are.
    For example, if they backed up Google cache, that would be absolutely redundant, they would waste space.

    On the same note, does google cache cache its own cache ? (stupid tongue twisters)

  73. Hahaha by dot_borg · · Score: 1

    "Petabyte" just sounds so dirty. :)

  74. How does it work? by sanferrera · · Score: 1

    Does anybody know how this thing works? I mean, I understand you can use LVM to bind together the 4 hard drives in the same computer as a logical volume, but how do you put together devices from many nodes (computers) as a single logical volume?

    1. Re:How does it work? by mattpalmer1086 · · Score: 1

      I have actually seen a Petabox, and it's quite impressive. I work at the UK National Archives, and we have an arrangement with the Internet Archive to archive government web sites. They were kind enough to show us around and explain everything.

      The point of the Petabox is that a single person should be able to maintain that amount of storage cheaply. The genius of the design is its simplicity. All units are identically specced. Their IP addresses are sequential and the units are arranged physically to match, making it extremely simple to locate a malfunctioning rack unit. There is no logical volume management at all. Instead, a home web server locates a requested file and redirects the request to the box on which it lives. All technology is open source and open standards.

      All the custom units are mirrored, and arranged in a double sided rack mount with a central heat vent. They all do some pretty sophisticated power management to reduce the cooling requirements.

      It's not that you can't create a more resilient architecture; you can. But you can't do it as cheaply, or manage it as efficiently and simply as the Petabox.

      I wouldn't use it for critical data with absolute availability requirements. I would use it if I needed to store huge amounts of data cheaply and reasonably securely.

      I've only touched the surface of what I was shown, and the extreme focus on simplicity is carried through every aspect of the Petabox's design. I was extremely impressed with it; its design superbly fulfills its requirements.
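
      To make the "sequential IPs plus a redirecting front end" idea concrete, here is a minimal sketch in Python. It is not the Archive's actual code: the base address, the rack size, the catalog dictionary, and every function name are assumptions made up for illustration. How the real home server looks up a file was not something I can reproduce here.

```python
import ipaddress

# Hypothetical values, chosen only for illustration.
BASE_IP = ipaddress.IPv4Address("10.0.0.1")  # assumed address of node 0
NODES_PER_RACK = 40                            # assumed number of units per rack


def node_ip(node_index):
    """Sequential addressing: node N lives at BASE_IP + N."""
    return str(BASE_IP + node_index)


def node_position(node_index):
    """Physical layout mirrors the numbering: (rack, slot within rack)."""
    return divmod(node_index, NODES_PER_RACK)


# Hypothetical catalog: item name -> (primary node, mirror node).
CATALOG = {"example_item": (3, 43)}


def redirect_url(item, path, alive=lambda node: True):
    """Return the URL the front end would redirect to.

    Tries the primary node first, then its mirror. No volume manager
    is involved; the front end only needs to know which box holds the item.
    """
    for node in CATALOG[item]:
        if alive(node):
            return "http://%s/%s/%s" % (node_ip(node), item, path)
    raise LookupError("no live node holds " + item)
```

      With a scheme like this, a failing address such as 10.0.0.44 translates directly into "rack 1, slot 3", which is the kind of shortcut that lets one person look after the whole installation.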

  75. $2 a gb? by iamhassi · · Score: 1

    Last I checked I can get 200gb drives for about $100. That's me, just some guy buying one drive, paying 50 cents a gb, so I'm guessing they're paying a lot less buying a petabyte at a time, so what's with the 4x pricing? I understand the need for profit, but 4x?
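
    Spelling out the arithmetic as a quick Python sketch, using only the numbers in this post ($100 for a 200GB drive, $2/GB for the PetaBox); the function and variable names are just for illustration:

```python
def cost_per_gb(price_usd, capacity_gb):
    """Dollars per gigabyte for a single purchase."""
    return price_usd / capacity_gb


retail = cost_per_gb(100, 200)   # one retail 200GB drive
petabox = 2.00                   # quoted PetaBox price per GB
markup = petabox / retail        # how many times the bare-drive price
```

    Note that the bare-drive figure leaves out the motherboards, racks, power supplies and networking, so it is a floor rather than a like-for-like comparison.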

    --
    my karma will be here long after I'm gone
  76. Apple XServe by Guspaz · · Score: 1

    Has nobody noticed that this solution isn't really any better than competing solutions?

    Take Apple XServe for example. Whereas the PetaBox is $2/GB, Apple's much more advanced (much more reliable and redundant) solution costs only $2.27/GB... And I bet that if you were buying XServes in groups of 12 to match the 64TB PetaBox offering, Apple would give you a bulk discount taking it down to $2 or under.

    I say more advanced because Apple's solution supports 2gbit fibre channels, hardware RAID, redundant PSUs, cache battery backup, redundant cooling, and the PSUs and hard drives are hotswap.

    Oh, and while the Petabox is 1.6TB per 1U, Apple's solution is ~1.9TB per 1U (5.6 in 3U).

    So, it would seem that in all respects, Apple's XServe is immensely superior. Price is only slightly higher and the bulk discounts that you could probably get in order to match the PetaBox offerings would probably make it no more expensive.

    So what is special about this PetaBox stuff? Why would anybody buy it? How can they claim to be the best when they are clearly not? Why do we CARE about it? Why did archive.org choose it?

    1. Re:Apple XServe by Anonymous Coward · · Score: 0

      I don't think apple's price includes the rack, cabling, and switching backbone. I also think that the $2/GB price is probably conservative.

      So what's so special? Well, try operating 160 Xserves and tell me what your power bill is. :-) That's the end-game, not to mention volume discounts you could get if buying a petabyte.

      If $/GB is your problem, then why not simply put 250GB or 300GB disks in there instead of the 400s? That would reduce your $, but you'd spend more per month operating each of the terabytes.

    2. Re:Apple XServe by Guspaz · · Score: 1

      True on those points, but racks and cabling can be had cheap, though fibre switches might drive up the cost.

      You'd actually need 275 xserves to do the 1.5PB. According to apple it is typical for that many machines to need about 83kW, as opposed to the PetaBox's 50kW. However the xserve uses less space, so what is more expensive, space or power? Usually the answer is space, from what I understand, but I guess it depends. As a totally apples to oranges comparison, at the residential power rates where I live, the difference (33kW) would cost $1425 US per month. That much money barely buys any rack space, so I would say it is highly likely the xserves cost less to run even though they use more power.

      As for cost per gig, with Apple the cost per gig for the 400GB drives is way less than the cost per gig with the 250GB drives. So the bigger drives cost less per gig, and at the same time require less space and power. It is a win/win/win situation. If you are referring to the cost of bare drives, the cost per gig at 400GB doesn't seem to be terribly more than at 250GB, but it depends on so many factors (Brand/store/specs/etc).
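
      For anyone checking the numbers above, a rough sketch in Python. The 5.6TB per unit comes from my earlier post; treating 1.5PB as 1536TB and a month as 30 days are assumptions, and the electricity rate is only what the $1425 figure implies, not a quoted tariff.

```python
import math

TB_PER_UNIT = 5.6            # per 3U unit, from the earlier post
TARGET_TB = 1.5 * 1024       # 1.5PB, assuming 1PB = 1024TB

units = math.ceil(TARGET_TB / TB_PER_UNIT)   # units needed for 1.5PB

extra_kw = 83 - 50           # extra draw versus the PetaBox installation
hours_per_month = 24 * 30    # assumed 30-day month
kwh = extra_kw * hours_per_month

implied_rate = 1425 / kwh    # dollars per kWh implied by $1425 per month
```

      That works out to roughly six cents per kWh, which is a residential rate; a datacenter would be billed differently, so treat it as ballpark only.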

    3. Re:Apple XServe by MushMouth · · Score: 1

      Keep in mind that the archive also gets some processing power on these machines, and it does actually use it.

    4. Re:Apple XServe by Anonymous Coward · · Score: 0

      ... Why would anybody buy it? ...

      You're right, you know their requirements from the summary. They should just listen to you and do what you think. They're dumb.

      For christ sake, if they could use a solution that is immensely superior, they probably would be. Usually when you don't understand why something is happening, it is because you don't have all the information. Either that or you're a zealot and ignoring the information.

  77. The New iPod... by Jozer99 · · Score: 1

    Apple has announced the new 1.5 Petabyte iPod. It holds 250,000,000 songs encoded in 96kbps AAC, or 3 songs in Intel's new High Definition Audio format. Although it is the size of a small delivery truck, Apple has not increased the battery size from previous iPods to "conserve battery power". Even with Apple's newly implemented power conservation schemes, the iPod 1.5p gets approximately .0034 seconds of battery life, before the battery melts in a spray of toxic superheated lithium and acid. Several have already been purchased by Michael Jackson, Sting, and Ben Affleck.

  78. Re:They don't like RAID - SATA vs. IDE by billstewart · · Score: 1

    For this application, the performance differences aren't significant, and any CPU utilization differences don't matter either, but SATA's a lot more convenient mechanically, which is important when you're trying to cram thousands of parts into a manageable space, especially if you want the drives to be removable.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  79. Re:Apple XServe, are you insane! by Anonymous Coward · · Score: 0

    Here is a link to their storage.. and so what, you still need to buy a computer to do anything with it.

    http://store.apple.com/1-800-MY-APPLE/WebObjects/AppleStore?family=XserveRAID

  80. Re:Apple XServe, are you insane! by Guspaz · · Score: 1

    Insane? No... Without considering extras it is not that much more expensive than the PetaBox solution.

    I realize that you still need a computer, however you can hook up multiple XServe RAIDs to a single server via a fibre switch. I guess it comes down to bandwidth; if you load up a rack with 13 XServe RAIDs, 1 fibre switch, and then 1 server, you'll have 72.8TB in the rack, yes, but you'd be limited to whatever the max bandwidth of that one server is. Probably 1 or 2 gigabits, though you could go for something more customized.

    Regardless, the XServes would still be immensely more reliable than the PetaBox nodes. If you want maximum uptime and reliability, then PetaBox is a disaster waiting to happen, unless they use software mirroring. And you'd have higher admin time for replacements, and worse management/monitoring tools.

  81. A petabyte, huh? by ScrewMaster · · Score: 1

    Better hope that that La Femme Nikita chick doesn't have rabies.

    --
    The higher the technology, the sharper that two-edged sword.
  82. AoE by kinema · · Score: 1

    Wouldn't something like Coraid's ATA-over-Ethernet based EtherDrive product make more sense for building a massive storage array like this?

  83. No, really. by Grendel+Drago · · Score: 1

    There's good stuff on there. If you don't like the chub, don't look behind the cut when you see thenewwavechick or whatever; wait for i_like_sharks to post again. Not to mention that they have a policy now about wang-warnings on the cuts. Or, if you're allergic to even the possibility that wang may be lurking behind an unclicked cut, there's always show_your_pussy.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
  84. Makes me wonder by MyLongNickName · · Score: 1

    If you made a document that filled the whole storage unit, would it be a "petafile"?

    --
    See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year