Slashdot Mirror


World's Five Biggest SANs

An anonymous reader writes "ByteandSwitch is searching for the World's Biggest SANs, and has compiled a list of 5 candidates whose networks support 10+ Petabytes of active storage. Leading the list is JPMorgan Chase, which uses a mix of IBM and Sun equipment to deliver 14 Pbytes for 170k employees. Also on the list are the U.S. DoD, which uses 700 Fibre Channel switches, NASA, the San Diego Supercomputer Center (it's got 18 Pbytes of tape! storage), and Lawrence Livermore."

161 comments

  1. ... That we know about by Anonymous Coward · · Score: 5, Insightful

    What about Google, Amazon, Yahoo, Microsoft, etc.?

    1. Re:... That we know about by Anonymous Coward · · Score: 1, Informative

      Google doesn't use conventional SAN architectures, so they probably wouldn't qualify for this list.

    2. Re:... That we know about by Anonymous Coward · · Score: 5, Funny

      And there wasn't even a single Japanese firm listed. You'd think they'd have the biggest SANs of all.

    3. Re:... That we know about by Chapter80 · · Score: 4, Funny

      I was thinking San Diego, San Francisco, San Antonio, San Jose, Santa Claus.

    4. Re:... That we know about by curlynoodle · · Score: 1

      I expect that government and non-profit organizations are more likely to release such info. It makes them appear high-tech and cool. Beats me as to why Chase would comment on their storage; maybe for shareholders.

      As for Google, Microsoft, Amazon, Yahoo, etc., I expect it's important to their business not to share system architecture with the public. However, read http://labs.google.com/papers/disk_failures.html to get an idea of Google's storage systems.

    5. Re:... That we know about by homey+of+my+owney · · Score: 1

      The list seems arbitrary. The mother of them all, the Naval Oceanographic Office, which as I recall is a mix of IBM, EMC and Hitachi, isn't listed.

    6. Re:... That we know about by GoodOmens · · Score: 2, Interesting

      I know it's not a PB but here at the Census we have one 150TB array that's used for one project and not 170K employees.

      It is interesting when you get into storage of this size. I remember sitting in a meeting where we discussed a storage cabinet we were ordering. The RAW size of the cabinet was 150TB but formatted it would be 100TB ... 50TB is a lot of storage to "throw away" for redundancy / formatting! Considering that at this price you're paying about $10k+ a TB (with staff and infrastructure costs factored in)!
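      A rough sketch of that arithmetic, using only the figures quoted above. It assumes the $10k/TB figure applies to raw capacity, which the comment does not actually state; the variable names are illustrative.

```python
# Figures quoted in the comment above; names are illustrative only.
raw_tb = 150           # raw cabinet capacity, TB
formatted_tb = 100     # usable capacity after redundancy/formatting, TB
cost_per_tb = 10_000   # rough all-in cost per TB in USD (the "$10k+" figure)

overhead_tb = raw_tb - formatted_tb
overhead_fraction = overhead_tb / raw_tb

# Assumption: the $10k/TB price is charged on raw capacity.
overhead_cost = overhead_tb * cost_per_tb
cost_per_usable_tb = raw_tb * cost_per_tb / formatted_tb

print(f"overhead: {overhead_tb} TB ({overhead_fraction:.0%} of raw)")
print(f"cost of overhead: ${overhead_cost:,}")
print(f"effective cost per usable TB: ${cost_per_usable_tb:,.0f}")
```

      Under that assumption a third of the raw capacity goes to overhead, and each usable terabyte effectively costs 1.5 times the raw price.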

    7. Re:... That we know about by Anonymous Coward · · Score: 0

      Amazon is not a big SAN shop either.

    8. Re:... That we know about by valen · · Score: 1


        Google don't release stats on the size of their clusters, obviously. But this is what they were doing four years ago;

        http://labs.google.com/papers/gfs.html
        http://labs.google.com/papers/bigtable.html

        Since then, they've doubled their engineering staff every year, and rolled out a lot of new purpose-built datacenters. I've seen public guesstimates as high as 400,000 machines. With SATA, it's not unreasonable to imagine 5 or 10 disks in every machine. That's a lot of storage. Of course, having one 'SAN' makes no sense; you need it distributed, so latency to each user is as low as possible.

        Bigtable in particular is interesting; a database designed to handle many petabytes of data in a low-latency environment.

        Sigh. I really really wish we could be completely candid about all this stuff. Maybe over time...

      John

    9. Re:... That we know about by WuphonsReach · · Score: 2, Informative

      $10k per Terabyte isn't all that bad; maybe about 30-50% higher. Server level storage goes for $4-$8 per GB (so $4k to $8k per Terabyte). It may also depend on when that SAN was put into use. Were they able to use less expensive SATA drives, or did they need the raw performance of SCSI, etc. Plus cost per gigabyte slowly decreases over time (not as fast as it used to, but it's still a gradual decline of maybe 25% per year).

      --
      Wolde you bothe eate your cake, and have your cake?
  2. Not so accurate by cymru_slam · · Score: 4, Informative

    I work for one of the organisations listed and I have to say that what they described sounds NOTHING like our infrastructure :-)

    1. Re:Not so accurate by bigmouth_strikes · · Score: 1

      Not to imply that you don't know what you're talking about, but isn't part of the point of a SAN; to an end-user it practically looks and behaves as if it were a local device ? Of course, you might be in the know of the overall storage infrastructure, but then again... you might not.

      So to save your reputation you'll have to spill your beans, so to speak.

      --
      Oh, I can't help quoting you because everything that you said rings true
    2. Re:Not so accurate by pcsmith811 · · Score: 0

      "Not to imply that you don't know what you're talking about, but isn't part of the point of a SAN; to an end-user it practically looks and behaves as if it were a local device ?"

      No, the point of a SAN is performance, redundancy, flexibility, growth and sharing. End users wouldn't even be seeing local devices because you shouldn't have end users plugged into a SAN. If you meant sysadmins, then yes, it would essentially look like a local device. Anyway, like cymru_slam implied, this article is just FUD.
    3. Re:Not so accurate by Anonymous Coward · · Score: 0

      Well, if you're Welsh, I imagine you're not working for the U.S. DoD, NASA, the San Diego Supercomputer Center or Lawrence Livermore. So, how's it working for JP Morgan Chase then?

    4. Re:Not so accurate by Znork · · Score: 4, Insightful

      "you shouldn't have end users plugged into a SAN."

      Exactly why shouldn't you have end-users plugged into a SAN? I run a SAN, and I find that diskless workstations PXE booting off gigabit iSCSI storage are a huge improvement to having local disk. For more or less exactly those reasons; performance, redundancy, flexibility, growth and sharing. Not to mention data consolidation and savings in less wasted local storage.

      I suspect the idea that SANs are for servers is mostly spread by overcharging SAN vendors who don't want their profit margins eroded by inexpensive consumer devices. In fact, I'd say consumer storage is rapidly progressing beyond the server side and is these days the main driver behind storage expansion; I certainly know my home storage needs expand faster than those of the vast majority of the servers I admin (yes, there are the we-want-to-simulate-the-atoms-in-the-ocean exceptions, but most business application servers use less storage than you can get in an mp3 player).

    5. Re:Not so accurate by Bandman · · Score: 3, Insightful

      I'm not sure it's FUD, since that means "Fear Uncertainty and Doubt"

      It's more inaccurate, or maybe a result of shallow research, or at the very least simplified.

    6. Re:Not so accurate by Anonymous Coward · · Score: 5, Insightful

      Well, it sounds like your environment is PC based. The environment I work in is server based. An end user could leave his/her computer in a taxi and we can have them up and running and productive on a new PC within minutes with little chance of actually losing anything. I say little chance because although we make every attempt to force things to the network through our computer system policies and document management systems, sometimes they still manage to put things in "My Documents", but that is the exception, not the norm. It is more than just a single user though. With that system in place, our entire office in downtime Washington DC could be blown up and the bulk of the office's business operations can be up and running from another one of our offices in another city or our company's DR site in a short period. For our environment, it is much easier to manage a backend and provide adequate remote user tools (Citrix for example) than it is to attempt to manage storage on a thousand or so individual computers. Imagine trying to do disaster recovery or emergency planning for an office that had a bunch of individual personal storage devices and a local PC based file storage system.
      Not everyone needs a SAN for storage but using a SAN is a very sound decision for those that need the capabilities it provides. A SAN is not just a buzz word, although I do not doubt some people bought them without understanding what they were getting and why.

    7. Re:Not so accurate by DarthTaco · · Score: 5, Funny

      How about SHallow and Inaccurate Tripe?

    8. Re:Not so accurate by LWATCDR · · Score: 2, Insightful

      "but most business application servers use less storage than you can get in an mp3 player)."
      Yes they do.
      I am migrating our support call, issue tracking, and RMA database to a new server. We take a good number of calls a year and have almost six years of data on the server. The dump file is only 16 megabytes. Most business data is still text and text just doesn't eat up that much space.

      For home use and workstations, doesn't NAS make more sense than SAN? I am on a small network so we only use NAS for shared drives.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    9. Re:Not so accurate by Philosinfinity · · Score: 4, Insightful

      Go to a law firm and ask them about their document management systems or their litigation support applications. Go to a bank and ask them about their financial records. What about email archives for compliance? Size up the disk space utilization and I think you will see many application servers that are significantly larger in storage than an MP3 player. Point taken, SANs can be used at the desktop level. But I partially wonder why. Wouldn't it be better to synchronize users' data folders with shares on a server that is diskless to the SAN? Why waste all that 'spensive storage just to make workstations diskless? Unless you are using a Compellent SAN or some SAN that is running a deduplication engine on the fly, you're stuck storing an OS install for each workstation.

      Besides this, I've always felt that the big advantage of a SAN is the ability to replicate an entire environment to another site in case of disaster. SANs are really utilized to the max in enterprise environments where these features are necessary for successful business operations.

    10. Re:Not so accurate by Anonymous Coward · · Score: 0

      Mod parent down. No one implied the article is FUD. I've read the article (yes, I must be new here), and have no clue what you're talking about. They're compiling a list, making no argument or editorial whatsoever. There's no fear, no uncertainty (except that they need people to submit SAN info), and no denial. Furthermore, you don't plug anyone into a SAN really. You use a fiber switch or iSCSI client to present disk space in the form of volumes from the SAN to the client (in this case client means any box that needs a drive).

    11. Re:Not so accurate by Anonymous Coward · · Score: 0

      "but most business application servers use less storage than you can get in an mp3 player"

      Uh, what? No matter how I try to rationalize this comment, I can't.

      Exactly what business applications are you administering? Last I checked there wasn't a 400GB MP3 player in the consumer market.

    12. Re:Not so accurate by gnuman99 · · Score: 4, Funny

      our entire office in downtime Washington DC could be blown up


      I sense the little counter at NSA/"homeland security" click up -- Internet chatter about possible attack just increased! Few more like that, and terror alert will go up!! Geez people, watch what you type!

    13. Re:Not so accurate by Znork · · Score: 1

      "For home use and workstations, doesn't NAS make more sense than SAN? I am on a small network so we only use NAS for shared drives."

      Yes and no. Shared storage such as home directories, shared files, etc, are better on a NAS.

      The advantage of SAN connections for the desktop lies in the OS and paging space area, for speed critical applications, and more in the area of maintenance and support than in convenience. Getting rid of the local disk gives you the ability to do things like migrate clients to new operating systems with a simple reboot (and still have the old disk image to fall back on), take snapshots of client disks, save large amounts of space by buying more cost effective disks (smallest disks available today are, like, 80 gb, where the OS needs maybe 10. And larger disks are far more cost effective per gigabyte, so sharing them is cheaper). Etc.

      Basically you get most of the advantages of thin-client hardware, without suffering from as many of the disadvantages.

    14. Re:Not so accurate by Znork · · Score: 1

      "What about email archives for compliance?"

      Good examples of business uses, but still comparatively small (depending on company size). Compare with PVR storage, mpeg files, etc.

      There are, of course, larger storage needs for some applications, particularly in large companies, but I see more and more cases of formerly huge applications where I had a terabyte storage ten years ago and the disk arrays were awe inspiring (and cost a fortune). Now the same systems will have a terabyte and a half storage, and there are almost consumer grade disks that large.

      Enterprise storage simply isn't growing that fast anymore, nor is most of it extreme. The litigation support apps, financial records and email archives grow at a snail's pace compared to things like PVR applications that record many hours' worth of video per day (and that's not even in HD).

      "Why waste all that 'spensive storage just to make workstations diskless?"

      You use the cheapo storage instead. Gigabit ethernet and cheapo (well, cheap-er at least, and actually cheap if you can live with a diy solution) iSCSI servers are perfectly adequate for workstation use.

      "you're stuck storing an OS install for each workstation."

      Depends on how you do it. If you're on linux and for something like classroom use, you can use unionfs to overlay a shared image and save that space too.

      But even if you are stuck using an OS install for each workstation, the fact is you need far less OS diskspace per client than the smallest available disks these days. Add to that the ability to non-destructively change OS images with a simple reboot rather than a reinstall, coupled with easier backups, better data control, etc, and there are several compelling advantages to it.

    15. Re:Not so accurate by Philosinfinity · · Score: 1

      Good call on the overall disk space savings. I wasn't really thinking about the unneeded hard disk space on the client boxes. It is definitely an interesting potential configuration... especially if you could pipe a deduplication engine in front of the disk array. Granted you would take a performance hit, but you could save a great deal of space.

      As far as the overall decrease of disk necessity in the enterprise, I'm not so sure I agree. Our email environment is about 3TB local with another 0.5 - 1.5TB in each of 8 remote offices. Every time we order more shelves for our SAN, the storage is claimed before the hardware comes in. If anything, I think programmers are taking the increase of disk size as an open door to utilize as much of it as possible. Even looking at the consumer end, this holds true. New digital camera shots are easily 10MB each. My old Sony Mavica could store about 10 full res images on a single floppy. Blank Word documents start off around 23K. I think the argument that storage necessity has consistently gone up dramatically over the last 15 years, 10 years, and even 5 years is nothing short of accurate. This is especially true in the enterprise. For instance, our firm received a new case with a bunch of documents to scan into our litigation support database. Between that, PST files, and other information, we utilized over 2TB of disk space. That's 1 case. Hi-res images of case documents, emails, and our financial DB grow exponentially. I've worked on DR plans for several clients throughout the enterprise and they all experience the same thing. Perhaps some areas are slightly decreasing in disk utilization, but the big apps are growing and growing.

    16. Re:Not so accurate by Anonymous Coward · · Score: 0

      Smooth... You obviously didn't read the parent post. :-)

    17. Re:Not so accurate by Anonymous Coward · · Score: 0

      "our entire office in downtime Washington DC"

      Aha! You've just given us a huge clue regarding your choice of server OS!
  3. Finally ... by Anonymous Coward · · Score: 0

    ... somewhere to store all my porn

    1. Re:Finally ... by Zymergy · · Score: 1

      Australian library may catalogue Internet porn: http://news.bbc.co.uk/2/hi/asia-pacific/2221489.stm You may need to give them a call....

  4. What about MIB? by Anonymous Coward · · Score: 0

    How bing is the central medical database repository?

    1. Re:What about MIB? by Anonymous Coward · · Score: 0

      Its fine, and stop calling me bing.

    2. Re:What about MIB? by Anonymous Coward · · Score: 0

      Ha ha ha! Classic.

  5. Very U.S. Centric... by Noryungi · · Score: 5, Insightful

    Yes, I know, US web site and everything but, seriously, have you checked the data storage of CERN (birth place of the web) lately?

    If I remember correctly, these guys will generate petabytes of data per day when that monster particle accelerator goes online in a few months...

    --
    The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
    1. Re:Very U.S. Centric... by barry_the_bogan · · Score: 5, Funny

      They're talking about the "world", as defined by the World Series Baseball people. Lame story.

    2. Re:Very U.S. Centric... by palpatin · · Score: 3, Informative

      Well, don't know about the storage capacity, but the LHC will produce around 15 petabytes per year, when they turn it on.

    3. Re:Very U.S. Centric... by Daniel+Phillips · · Score: 1

      "If I remember correctly, these guys will generate petabytes of data per day"

      Rechecking your facts would be in order. Some 10 to 15 petabytes are expected to be saved per year according to sources I have seen, though only a small fraction of the raw sensor data will be permanently recorded.
      --
      Have you got your LWN subscription yet?
    4. Re:Very U.S. Centric... by JorDan+Clock · · Score: 1

      You would have been real upset before San Francisco was taken off the list.

    5. Re:Very U.S. Centric... by TapeCutter · · Score: 2, Interesting

      IIRC they need a massive cache where the "sampling algorithm" throws a heap of data away. A quick google gives the following precise measure - "The LHC will generate data at a rate equivalent to every person on Earth making twenty phone calls at the same time." - but as you say it only stores a fraction of that.

      Now assuming the phone calls are made over POTS, the bitrate from the sensors should be...20 * 6*10^9 * 1220bps...
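      For anyone who wants the product worked out, a minimal sketch. The three inputs are the ones in the comment (20 calls each, 6*10^9 people, 1220 bps per call); none of them is a measured figure, and the names are made up for illustration.

```python
# Inputs taken from the comment above; none are measured values.
calls_per_person = 20
people = 6 * 10**9
bits_per_second_per_call = 1220   # the per-call bitrate the comment assumes

total_bits_per_second = calls_per_person * people * bits_per_second_per_call
total_bytes_per_second = total_bits_per_second / 8

print(f"{total_bits_per_second:.3e} bit/s")
print(f"{total_bytes_per_second / 10**12:.1f} TB/s")
```

      With those inputs the product comes to roughly 1.5 * 10^14 bit/s, or about 18 TB/s.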

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    6. Re:Very U.S. Centric... by torako · · Score: 2, Informative
      Most of the caching is done using custom hardware that lives right in the detectors with latency in the order of µs. The data output for permanent storage is (for the ATLAS detector, that's the one I know some stuff about) 200 MBytes / s, which is not gigantic. There are some PDF slides of a seminar talk on the triggering mechanism for ATLAS on my homepage in the Seminar section (in English even).

      Still, the data acquisition and storage system is impressive. Most of the storage will be distributed over different sites, so I don't know if there will be a huge central storage system.

    7. Re:Very U.S. Centric... by bockelboy · · Score: 1

      The CMS detector will take data at 8GB/s at turn on (that's gigabytes, not gigabits). This will be filtered and a few percent will be saved.

      CASTOR's (the CERN data store) current stats are here:

      http://castor.web.cern.ch/castor/

      About 8 PB of files. If I recall correctly, there's around 500TB of online disk space and 10-30PB of tape storage (some of it is getting phased out).

      FNAL has a similar setup, except with a storage manager called dCache. There is no use of protocols like iSCSI or Fiber Channel over IP, but rather physics-specific ones (xrootd, rfio, dcap) and grid-specific ones (SRM and gridFTP).

    8. Re:Very U.S. Centric... by perturbed1 · · Score: 4, Informative

      I'll talk about one of the experiments, ATLAS. Yes we "generate" petabytes of data per day. It's rather easy to calculate actually. One collision in the detector can be compressed down to about 2MB raw data-- after lots of zero-suppression and smart-storage of bits from a detector that has ~100 million channels worth of readout information.

      There are ~30 million collisions a second -- the LHC machine runs at 40 MHz but has a "gap" in its beam structure.

      Multiplying: 2 * 10^6 * 30 * 10^6 = 6* 10^13 Bytes per second. So ATLAS "produces" 1 petabyte of information in about 13 seconds!! :)

      But ATLAS is limited to being able to store about ~300 MB per second. This is the limit coming from how fast you can store things. Remember, there are 4 LHC experiments after all, and ATLAS gets its fair share of store capability.

      Which means that out of 30 million collisions per second, ATLAS can only store 150 collisions per second.... which it turns out is just fine!! The *interesting* physics only happens **very** rarely -- due to the nature of *weak* interactions. At the LHC, we are no longer interested in the atom falling apart and spitting its guts (quarks and gluons) out. We are interested in rare processes such as dark-matter candidates or Higgs, or top-top production (which will dominate the 150Hz btw) and interesting and rare things. In most of the 30 million collisions, the protons spit their guts out and much much *rare* things occur. The catch of the trigger of ATLAS (and any other LHC experiment for that matter) is to find those *interesting* 150 events out of 30 million every second -- and do this in real time, and without a glitch. ATLAS uses about 2000 computers to do this real-time data reduction and processing... CMS uses more, I believe.
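      The numbers above can be checked with a few lines. This is only a sketch of the arithmetic in this comment (2 MB per collision, ~30 million collisions per second, ~300 MB/s to storage), using decimal units; the variable names are illustrative.

```python
# Figures quoted in the comment; decimal units (1 MB = 10**6 bytes) assumed.
event_size_bytes = 2 * 10**6             # ~2 MB per compressed collision
collisions_per_second = 30 * 10**6       # ~30 million collisions per second
storage_limit_bytes_per_s = 300 * 10**6  # ~300 MB/s that can be written out

raw_rate = event_size_bytes * collisions_per_second   # bytes/s "produced"
seconds_per_petabyte = 10**15 / raw_rate
stored_events_per_second = storage_limit_bytes_per_s // event_size_bytes
rejection_factor = collisions_per_second // stored_events_per_second

print(f"raw rate: {raw_rate:.0e} bytes/s")
print(f"one petabyte every {seconds_per_petabyte:.1f} s")
print(f"events kept: {stored_events_per_second} per second")
print(f"trigger keeps 1 in {rejection_factor:,} collisions")
```

      At 30 million collisions per second this gives a petabyte roughly every 17 seconds; the "about 13 seconds" quoted above is what the same sum gives at the full 40 MHz rate (12.5 s), which may be where it came from. Either way the trigger has to discard all but about one collision in 200,000.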

      In the end, we get 300 MB/second worth of raw data and that's stored on tape at Tier 0 at CERN permanently -- and until the end of time as far as anyone is concerned. That data will never *ever* be removed. Actually the 5 Tier 1 sites will also have a full-copy of the data among themselves.

      Which brings me to my point that CERN storage is technically not a SAN (Storage Area Network)... (My IT buddies are insisting on this one. ) I am told that CERN storage counts as a NAS (Network Attached Storage). But I am going to alert them to this thread and will let them elaborate on that one!

    9. Re:Very U.S. Centric... by bushki3 · · Score: 4, Insightful

      from TFA

      "We at Byte and Switch are on the trail of the world's biggest SAN, and this article reveals our initial findings."

      and this

      "Again, this list is meant to be a starting place for further endeavor. If you've got a big SAN story to tell, we'd love to hear it."

      oh, and this too

      "we present five of the world's biggest SANs:"

      notice how everything in TFA clearly says this is not THE 5 BIGGEST SAN's in the world but the 5 largest they have found SO FAR.

      I know -- I must be new here, but I'm getting there. I didn't read the whole article, just a few sentences from the first page.

      --
      011100110110100101100111
    10. Re:Very U.S. Centric... by Anonymous Coward · · Score: 0

      only 300MB/s...? that's a bit pokey... i've got pitiful filesystems set up that can do at least a GB/s, our larger filesystems can pull 5-7GB/s... (big B) Is this a single node?...

    11. Re:Very U.S. Centric... by Anonymous Coward · · Score: 0

      CERN generates ca. 15 petabytes of stored data per year. A lot of intermediate data is gathered earlier, but it is discarded.

    12. Re:Very U.S. Centric... by perturbed1 · · Score: 2, Informative

      Can't imagine why you write this as AC, but ok...

      Answer: "That's only 300MB/s 24/7 for more than half a year for writing the raw data to storage. Then there are the other three experiments with the same amount of data, actually one of them does 1.2GB/s of raw data. The data ends up on disk first with an aggregate write speed of ~1.5GB/s (let's not exaggerate). The data is read immediately from disk again to be written to tape (our final storage media), so ~1.5GB/s reads ... Then, all this data is being exported to external computing centers pretty much immediately too (multiple copies etc. etc., so aggregate is much higher than 1.5GB/s), so we get ~3GB/s of reads just from this data export (it can, potentially, be a lot more. we have already a total of 120Gbit/s of network connectivity to those sites). So, we are already at ~6GB/s of I/O and nobody even had a look into the data itself!! If we talk data analysis, we talk about repeated reprocessing runs over the entire collection of raw data in order to "create" the data format that physicists can more easily use for their analysis, we talk about several thousand people accessing all the accumulated data in a perfectly random way ... mind you, we keep all the raw data active, so in 10 years there will be at least 100PB, probably more like 150PB, maybe even 200PB of active storage. The current estimate for the I/O caused by the data analysis is in the order of 50GB/s (big B). "
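      A quick tally of the rates in that answer, as a sketch only. The three I/O figures are the approximate ones quoted above; the yearly volume assumes exactly half a year of continuous running at 300 MB/s, whereas the answer says "more than half a year", so treat it as a lower bound.

```python
# Approximate aggregate rates quoted in the answer, in GB/s.
disk_write_gb_s = 1.5    # raw data landing on disk
tape_copy_read_gb_s = 1.5  # read back from disk to be written to tape
export_read_gb_s = 3.0   # reads for export to external computing centers

aggregate_io_gb_s = disk_write_gb_s + tape_copy_read_gb_s + export_read_gb_s

# Assumption: exactly half a year of 24/7 running at 300 MB/s (decimal units).
seconds_in_half_year = 0.5 * 365 * 24 * 3600
raw_per_year_pb = 300 * 10**6 * seconds_in_half_year / 10**15

print(f"aggregate I/O before any analysis: ~{aggregate_io_gb_s:.0f} GB/s")
print(f"raw data per year for one experiment: ~{raw_per_year_pb:.1f} PB")
```

      That is consistent with the "~6GB/s" figure, and puts a single 300 MB/s experiment at a little under 5 PB of raw data per year.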

    13. Re:Very U.S. Centric... by BoothbyTCD · · Score: 1

      You mean like 'The New York World', a now defunct newspaper that was the original sponsor of the first MLB championship? I know we like to bash US cluelessness here, but, this particular bit of misinformation always irks me.

      --
      snig
  6. The big surprise is by Centurix · · Score: 5, Funny

    that all the disks are formatted FAT32...

    --
    Task Mangler
    1. Re:The big surprise is by ichigo+2.0 · · Score: 2, Funny

      2 TB should be enough for everybody!

    2. Re:The big surprise is by CastrTroy · · Score: 1

      Unless you run Windows, then they only let you create partitions up to 32 GB.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    3. Re:The big surprise is by drseuk · · Score: 1

      Pah, 640k is more than enough for real hackers. P.S., At 14+ PB, the command line to unsplit the 2GB FAT splits is itself going to exceed 2GB. Anyone got a workround? (I'm running Vista SP0 I think).

    4. Re:The big surprise is by Anonymous Coward · · Score: 1, Funny

      Whoosh!

  7. Pronunciation is the key by User+956 · · Score: 4, Funny

    Finally somewhere to store all my porn

    We're talking about Petabytes, not Pedobytes.

    --
    The theory of relativity doesn't work right in Arkansas.
    1. Re:Pronunciation is the key by BosstonesOwn · · Score: 1

      Does that mean he has lots of PedoFiles ;)

      --
      This package Does Not Contain a Winner
    2. Re:Pronunciation is the key by Kjella · · Score: 1

      Not to mention PETAbytes - "No animals were hurt during the production of this storage medium."

      --
      Live today, because you never know what tomorrow brings
    3. Re:Pronunciation is the key by jamstar7 · · Score: 1

      Next time, try harder to injure the l*wy*rs.

      --
      Understanding the scope of the problem is the first step on the path to true panic.
  8. 14Pb for 170k employees... by Joce640k · · Score: 4, Insightful

    14Pb for 170k employees isn't so much - 83 gigabytes per person.

    If you add up the total disk space in an average office you'll get more than that. If I add up all my external disks, etc. I've got more than a terabyte on my desktop.

    (And yes it's true, data does grow to fit the available space)

    --
    No sig today...
    1. Re:14Pb for 170k employees... by Enderandrew · · Score: 2, Informative

      When I generate ghost images for the PCs here at work, the average desktop user goes through about 4 gigs here, if that. 83 gigs per person is quite a bit.

      I'm also curious about Google and the like. Do they not disclose their storage?

      --
      http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
    2. Re:14Pb for 170k employees... by StarfishOne · · Score: 2, Informative
      "I'm also curious about Google and the like. Do they not disclose their storage?"


      To a certain extent they have disclosed some numbers in a paper about their distributed storage system called "BigTable". The title of the paper is "Bigtable: A Distributed Storage System for Structured Data" and it can be found right here.

      Some numbers can be found on page 11:
      Project and Table size in TB:

      Crawl: 800
      Crawl: 50
      Google Analytics: 20
      Google Analytics: 200 (Raw click table)
      Google Base: 2
      Google Earth: 0.5
      Google Earth: 70
      Orkut: 9
      Personalized Search: 4

      Total so far: 1,155.5 TB
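      The total can be re-added from the list above. A minimal sketch; the sizes are the ones listed in this comment, and the labels are only there to keep the rows apart.

```python
# Table sizes in TB, as listed in the comment above.
table_sizes_tb = [
    ("Crawl", 800),
    ("Crawl", 50),
    ("Google Analytics", 20),
    ("Google Analytics (raw click table)", 200),
    ("Google Base", 2),
    ("Google Earth", 0.5),
    ("Google Earth", 70),
    ("Orkut", 9),
    ("Personalized Search", 4),
]

total_tb = sum(size for _, size in table_sizes_tb)
total_pb = total_tb / 1000   # decimal petabytes

print(f"total: {total_tb:,} TB (~{total_pb:.2f} PB)")
```

      So the tables listed add up to a bit over one petabyte.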

      It's a very interesting paper to read. One of the many papers Google has put online:

    3. Re:14Pb for 170k employees... by commlinx · · Score: 3, Funny

      14Pb for 170k employees isn't so much - 83 gigabytes per person. If you add up the total disk space in an average office you'll get more than that. If I add up all my external disks, etc. I've got more than a terabyte on my desktop.

      You'd find a lot of the 83GB on a typical office PC is crap you're not going to put in a SAN, my boot drive without data has 50GB used and other than the pain in the arse of re-installing I couldn't give a toss if I lost all that "data". Yes I've got a TB of storage too but subtract p0rn, DVDs and other contents that would get me sacked if I worked in a corporate environment, subtract the large amount of reference material (that would be shared between users in a corporate environment) and all my original work for the past 10 years amounts to well under 10GB.

      Data use has little to do with the number of employees - it comes down to how much data you are getting from external sources, whether they be a large number of customers or data acquisition (such as digital photographs). If you're talking text, for example, a fast typist is probably hitting something like 10 characters per second, a whopping 280K per working day, 70MB per working year, 3.5GB over their life.
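      A sketch of that estimate. Only the 10 characters per second comes from the comment; the 8-hour day, 250 working days and 50-year career are assumptions chosen here because they land close to the quoted figures, and one byte per character is assumed.

```python
chars_per_second = 10          # from the comment
# The rest are assumptions, picked to roughly match the quoted totals.
hours_per_day = 8
working_days_per_year = 250
working_years = 50
bytes_per_char = 1             # plain single-byte text

bytes_per_day = chars_per_second * bytes_per_char * hours_per_day * 3600
bytes_per_year = bytes_per_day * working_days_per_year
bytes_per_career = bytes_per_year * working_years

print(f"per day:    {bytes_per_day / 1024:.0f} KiB")
print(f"per year:   {bytes_per_year / 10**6:.0f} MB")
print(f"per career: {bytes_per_career / 10**9:.1f} GB")
```

      Non-stop typing under those assumptions gives about 281 KiB a day, 72 MB a year and 3.6 GB over a career, in line with the numbers above.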

    4. Re:14Pb for 170k employees... by WindowsIsForArseWipe · · Score: 1, Insightful

      1Gb of natilie portman and hot grits should be enough for anyone

    5. Re:14Pb for 170k employees... by somersault · · Score: 2, Funny

      I hope they filter all those clicks before they dump them in the landfill. Can you imagine the mess 200 terabytes of raw clicks would make?

      --
      which is totally what she said
    6. Re:14Pb for 170k employees... by asserted · · Score: 1

      > I'm also curious about Google and the like. Do they not disclose their storage?

      there's much more than that, but GFS is no SAN at all. google can do better than that, and does.
      GFS is all about cheap storage, lots of it. and yes, 14 P is basically nothing, in google terms.
      what article really is about is "who wasted more money on over-priced enterprisey SAN crap".

    7. Re:14Pb for 170k employees... by fellip_nectar · · Score: 4, Informative
      --
      Worst. Signature. Ever.
    8. Re:14Pb for 170k employees... by RogerWilco · · Score: 1

      No he did not.

      --
      RogerWilco the Adventurous Janitor
    9. Re:14Pb for 170k employees... by me_mi_mo · · Score: 1

      Mod parent down.

      Grandparent is correct.

    10. Re:14Pb for 170k employees... by Poromenos1 · · Score: 0

      No, you miscalculated by a factor of SUCK!

      (Hah :/)

      --
      Send email from the afterlife! Write your e-will at Dead Man's Switch.
    11. Re:14Pb for 170k employees... by PremiumCarrion · · Score: 1

      My calculations suggest grandparent is only out by a factor of ~1.07

      My assumptions are as follows:
      Pb == 10^15 bytes
      170k people == 170 000 people

      Therefore 14 000 000 000 000 000 / 170 000 = 82 352 941 176 (rounded)

      i.e. under 83GB per user
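For anyone who wants to check the division, here is a minimal sketch using the same assumptions as the comment (decimal petabytes, 170 000 people). The variable names are just for illustration.

```python
# Per-employee share of 14 PB, using decimal (SI) units as assumed above.
PETABYTE = 10**15
GIGABYTE = 10**9

total_bytes = 14 * PETABYTE
employees = 170_000

# Integer division drops the fractional byte, matching the rounded figure.
bytes_per_employee = total_bytes // employees

print(bytes_per_employee)
print(f"{bytes_per_employee / GIGABYTE:.1f} GB per employee")
```

The integer result is 82 352 941 176 bytes, so "under 83GB" holds as long as the petabyte is taken as 10^15 bytes.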

    12. Re:14Pb for 170k employees... by Anonymous Coward · · Score: 0

      Why do people have so much difficulty with this:

      b = bit
      B = byte

      so

      Pb = 10^15 bits
      PB = 10^15 bytes
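To see why the capitalisation matters, here is a small sketch of how the per-employee figure changes if "14Pb" is read literally as petabits. The 170 000 headcount comes from the story summary; the helper name is made up for this example.

```python
# Reading "14 Pb" as bits versus "14 PB" as bytes.
BITS_PER_BYTE = 8
PETA = 10**15
EMPLOYEES = 170_000  # headcount quoted in the summary


def per_employee_gb(total_bytes: float) -> float:
    """Return the share per employee in decimal gigabytes."""
    return total_bytes / EMPLOYEES / 10**9


as_bytes = 14 * PETA                  # 14 PB taken as petabytes
as_bits = 14 * PETA / BITS_PER_BYTE   # 14 Pb taken as petabits, converted to bytes

print(f"14 PB -> {per_employee_gb(as_bytes):.1f} GB per employee")
print(f"14 Pb -> {per_employee_gb(as_bits):.1f} GB per employee")
```

Read as bits, the total shrinks to 1.75 PB and the per-employee share to a little over 10 GB, a factor of eight smaller.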

    13. Re:14Pb for 170k employees... by Anonymous Coward · · Score: 0

      So I have been in the embedded development trade for ~25 years (and it has been great fun!). I know that I have far less than 10GB that I currently care about.

      The entire SVN repository where I work, of which my part is just a small piece, is only 15GB. And that includes version info and branches.

      I know for a fact that the entire source/documentation repository from my first thirteen years no longer exists, even though oddly enough, a couple thousand units of one device that I developed during that time are still out in the world doing their job. It is old tech though (wow, twenty years old this year) and it is amazing that it has been useful for this long. So I know for certain, that the full sum of my current work endeavor, the part I personally care about, is much less than 10GB.

      On the other hand, I have 9GB of family photos that are pretty important, and backed up in two places.

      I also have 46GB of music, though I still own all the source CDs for that collection. It would be a bummer to have to re-rip it all, so it is backed up as well. (offsite storage for both).

    14. Re:14Pb for 170k employees... by gardyloo · · Score: 1

      I'd call it a good start.

    15. Re:14Pb for 170k employees... by Anonymous Coward · · Score: 0

      Probably for the same reason they write "C02" for carbon dioxide, "loose" instead of "lose", and "penquin" instead of "penguin": because they're morons. They "think" out loud while mashing their greasy sausage fingers into whatever keys happen to be convenient and assume the reader will put in as much work as necessary to decode the resulting gibberish. Of course most readers are morons too and have no way to distinguish between good writing and sausage-finger gibberish, so for the most part it doesn't matter anyway.

    16. Re:14Pb for 170k employees... by Daniel+Phillips · · Score: 1

      Missed the "k", my bad. Not that that should justify some of the responses...

      --
      Have you got your LWN subscription yet?
    17. Re:14Pb for 170k employees... by ToasterMonkey · · Score: 1

      Big surprise, NOBODY on slashdot knows what a SAN is. Return your geek cards!

  9. At Last! by Zymergy · · Score: 5, Funny

    Someone can install a FULL install of Windows Vista!

    1. Re:At Last! by kraemate · · Score: 1

      Pfft. I am going to need this to install all the packages in the 12 cds of Debian Sarge.

    2. Re:At Last! by Pingmaster · · Score: 1

      and yet, still they cannot find a machine with enough RAM to run it...

  10. Shouldn't this be written somewhere? by rm999 · · Score: 4, Informative

    SAN = Storage area network

    1. Re:Shouldn't this be written somewhere? by OverlordQ · · Score: 4, Funny

      This is Slashdot; if you don't know what SAN stands for, please turn in your geek card and report to Digg.

      --
      Your hair look like poop, Bob! - Wanker.
    2. Re:Shouldn't this be written somewhere? by Eideewt · · Score: 2, Informative

      Why thank you. I was trying to figure out what a Storage Attached Network might be.

    3. Re:Shouldn't this be written somewhere? by Smurfeur · · Score: 1

      I thought that SAN = "Call of Cthulhu"'s sanity.

      Does it count as geeky ?

    4. Re:Shouldn't this be written somewhere? by __aailrp9629 · · Score: 1

      Sanity, right?

    5. Re:Shouldn't this be written somewhere? by mgblst · · Score: 1

      What is this network you speak of? And storage, I am guessing you mean cupboard space?

    6. Re:Shouldn't this be written somewhere? by ari+wins · · Score: 1

      Crap. :(

      I'm going to miss you fuckers.

      --
      Don't worry if you're a kleptomaniac, you can always take something for it.
  11. ...and why does the article say "Pbytes", "Tbytes" by Joce640k · · Score: 2, Informative

    ...and why does the article say "Pbytes", "Tbytes", etc.

    The abbreviated units are "PB" and "TB".

    See: http://en.wikipedia.org/wiki/Petabyte

    --
    No sig today...
  12. That's nothing... by Jah-Wren+Ryel · · Score: 3, Funny

    "ByteandSwitch is searching the World's Biggest SANs, and has compiled a list of 5 candidate with networks supports 10+ Petabytes of active storage."

    What? That's nothing. I've got 100 petabytes just for my pr0n collection!
    --
    When information is power, privacy is freedom.
    1. Re:That's nothing... by Anonymous Coward · · Score: 1, Funny

      yeah I've had a look. How you managed to find so much midget pr0n is beyond me...

    2. Re:That's nothing... by Anonymous Coward · · Score: 0

      and its all of me you sick freak.

    3. Re:That's nothing... by Lxy · · Score: 1

      Wow... that's a lot of petafiles *rimshot*

      --

      There is no reasonable defense against an idiot with an agenda
      :wq
    4. Re:That's nothing... by Anonymous Coward · · Score: 0

      Wow... that's a lot of pedofiles *rimshot*
      Fixed.

  13. Rubbish by Anonymous Coward · · Score: 0

    Utter crap. CERN, for instance, is building 15 PB worth of tape storage. They currently have 10PB. And there are organizations in the EB range...

  14. I didn't say the disks were full... by Joce640k · · Score: 1

    I didn't say the disks were full, just that the storage available per person in the average office is more than that.

    Does the whole 14Pb go in a single room? That might be impressive.

    --
    No sig today...
    1. Re:I didn't say the disks were full... by Enderandrew · · Score: 1

      For purposes of disaster recovery, I sure hope not.

      --
      http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
  15. Sooo... by Tastecicles · · Score: 2, Funny

    My home entertainment server at 3.3TB RAID6 isn't even in the running then?

    Bugger.

    --
    Operation Guillotine is in effect.
    1. Re:Sooo... by farkus888 · · Score: 1

      haha, I remember it was only a few short years ago I was hot shit because my personal file server had 500 gig of storage in it. most storage in a single computer of anyone I knew. I had a gig of ram and one of the fancy p4 processors with twice the cache of all the wussy ones. man was I cool. of course now I am older and don't have nearly the discretionary computer money I used to, and I am still running that exact same hardware having wet dreams about joining the 64bit real dual core revolution.

      --
      thats right, I rarely use capitals. deal with it. but don't mistake my laziness for stupidity
    2. Re:Sooo... by clarkkent09 · · Score: 1

      You mean your porn server?

      --
      Negative moral value of force outweighs the positive value of good intentions.
    3. Re:Sooo... by EmagGeek · · Score: 1

      I am running way behind! My home video server only has 1920GB of storage via 8x320GB drives in a Raid-5... but I know what you mean. In college, I put four 1.6GB drives in my box, and did a Linux Stripe to 6.4GB, and that was hot shit back then. We stored lots of CDs in that new-fangled MP3 format on there.

    4. Re:Sooo... by Tastecicles · · Score: 1

      'Ave you been peekin' through my window?

      --
      Operation Guillotine is in effect.
    5. Re:Sooo... by Chineseyes · · Score: 3, Funny

      Unfortunately your "entertainment server" is not among the winners for biggest SANs but it IS in the running for "copyright offender of the year". Our lawyers will be contacting you with your prize soon. Sincerely, Steven Marks Executive Vice President and General Counsel, RIAA

      --
      I think the invisible hand of the market has its middle finger extended

      --A wise old fart named SC0RN
    6. Re:Sooo... by Tastecicles · · Score: 1

      Dear Mr. Marks,
      Apologies for the prompt reply, but I feel it falls within my diplomatic obligation to inform you that **AA has no jurisdiction within 2274 miles of the location of my home media storage. I would also like to inform you that said server contains material which I OWN in optical or magnetic tape format which falls within my consumer rights to format shift to view or listen on devices of MY choosing.

      So, fsck off. :p

      --
      Operation Guillotine is in effect.
  16. Let's hope CERN's data can be zipped... by Joce640k · · Score: 4, Funny

    Let's hope CERN's data can be zipped...if not, they'll be in trouble pretty quickly.

    Remember when you got your first copy of Napster and ADSL? That's how serious...!

    --
    No sig today...
  17. Just SANs... so what? by Duncan3 · · Score: 5, Interesting

    Kinda like saying the world's fastest runner who likes Swiss cheese best. This isn't a list of fastest, largest, most used, etc. It's just some PR spin for SANs. Nothing wrong with that, but still.

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
  18. Large Hadron Collider (CERN) will need 15 PB/yr by infolib · · Score: 1

    At least according to this article.

    It's planned to go running next spring, but it's already at 3.5 PB. (The old LEP collider working in the same tunnel produced quite a bit of data too).

    --
    Any sufficiently advanced libertarian utopia is indistinguishable from government.
  19. Details? by clarkkent09 · · Score: 4, Funny

    Ah, go on tell us. We won't tell anybody

    --
    Negative moral value of force outweighs the positive value of good intentions.
  20. They forgot about ... by chris_sawtell · · Score: 2, Interesting

    Google, the WayBack Machine, to say nothing of the 1.5 million machine bot-net we've been hearing about recently.

    1. Re:They forgot about ... by swordgeek · · Score: 1

      You apparently don't have the slightest clue what a SAN is.

      --

      "People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
  21. security, resilience, risk, etc by kefa · · Score: 1

    From my experience most FTSE 100 companies in the UK have multi-petabytes of storage, so I'm assuming that the article is referring to a single consolidated SAN and not disparate SAN islands. Although it is interesting to examine the limits of scalability for such an environment on theoretical grounds, a more interesting question would be to understand the reasons why organisations would want to consolidate such vast quantities of data within a single SAN system.

    Surely there are other important considerations such as security, resilience (yes, most SANs are dual fabric - but do you not need more if you are putting an entire organisational egg in one basket?) and risk, which must be balanced against the need to have consolidated access to the entire organisation's storage through a single interconnected SAN?

    1. Re:security, resilience, risk, etc by statemachine · · Score: 1

      Zoning, Virtual SANs (VSANs), and Inter-VSAN Routing (IVR) solve that problem. And why would you need more than a dual fabric setup? Do you like buying expensive HBAs for your servers? Just what kind of hostile environment are you deploying into?

      Do tell. I'm curious.

    2. Re:security, resilience, risk, etc by kefa · · Score: 1

      These are all logical methods of isolation and do not enable you to escape the impact of physical infrastructure changes. Suppose you need to replace your SAN infrastructure to upgrade - do you really want your entire infrastructure to be dependent on a single fabric while you are carrying out your changes? Likewise, I would never want to be dependent on only two copies of my data. If you are working on one copy, do you really want to be protected by only a single remaining copy? I wouldn't necessarily advocate a SAN system that uses more than 2 fabrics, but when your entire organisation sits on a single SAN surely this poses a significant physical risk.

    3. Re:security, resilience, risk, etc by statemachine · · Score: 2, Informative

      And to that I ask: How paranoid are you, and how much money do you have?

      You also talk about copies of data as if a disk went bad, you'd lose the data. These storage arrays have multiple redundancies (RAIDs of VDisks which are RAIDs themselves) as well as having live replication capability to remote sites -- at which point you likely have a copy (or copies) of an entire datacenter in a different geographic location that is running as a hot spare.

      Within a datacenter, you would not have more than dual fabrics. Your fabrics' switches will also be redundantly connected within themselves. And if you're killing an entire fabric with an upgrade, you're doing it wrong.

      You'll also have service contracts with lockers of disks, switches, linecards, etc., *on site* with field technicians from the vendors on-call 24/7.

      Fibre Channel installations are not like some small company's Ethernet LAN.

    4. Re:security, resilience, risk, etc by Bandman · · Score: 1

      Aside from working in a facility with a pre-existing setup like this, how does one go about educating themselves on how SANs of this magnitude work?

    5. Re:security, resilience, risk, etc by afidel · · Score: 1

      Huh? Unless you have unwisely maxed out your switches before planning the upgrade, you would simply interconnect the new physical switches into the fabric through ISLs and then move hosts over to the new distribution layer switches. You would be vulnerable for the length of time it takes to move a cable from one distribution switch to the new one, and only for that single host, which is most likely a member of a cluster. As far as the two copies comment goes: no, you would likely have very frequent snapshots within the storage pool at each site, which are going to be stored in different cabinets if things are correctly designed. Beyond that, many very large companies will have three or more redundant sites, though it gets terribly expensive beyond three from what I have seen and is largely unnecessary, as you can get great geographical dispersion with three sites.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    6. Re:security, resilience, risk, etc by kefa · · Score: 1

      This assumes that the new switches are of the same manufacturer and generation. Many SAN 'upgrades' involve changing from one manufacturer to the latest flavour of the day (probably as a result of some deal made on the golf course); in this instance you would have to ensure that the SAN is running in 'interoperability' mode, which is not generally recommended and can result in reduced SAN functionality - not the sort of state you want your mission critical business to be in.

      Also, fabric merge and other configuration commands can be momentarily disruptive to a fabric. You would be surprised by the number of organisations that still run single-connect hosts. Obviously here the correct solution is to fix this, but with the best will in the world this often doesn't happen.

      As for the copies of data - again, the vast majority of FTSE 100 companies do not have fully redundant sites. Even a single redundant site is too expensive for most companies - it is cheaper to take the hit on shareholder value (although often the risk that organisations take on a daily basis is not exposed to shareholders) rather than invest in full site redundancy.

      Don't get me wrong: most companies have multiple datacentres (e.g. UK1, UK2, etc) that would appear on the surface to provide redundancy, but the quantity of redundant infrastructure in these confusingly titled datacentres is usually vanishingly small. (the banks might be excepted here...)

    7. Re:security, resilience, risk, etc by kefa · · Score: 1

      It's difficult, aside from a few public whitepapers produced by each of the appropriate vendors. However, these whitepapers are often no more than bragging rights to boast about 'how scalable our solution is' rather than providing any real information.

    8. Re:security, resilience, risk, etc by statemachine · · Score: 1

      Now you're making the argument that these companies need redundancy but are too cheap to spring for it, or the existing topology won't allow it because of the limitations of the design. No one can help that.

      Fabric merges and other configuration commands can have their potential disruptions limited by using zoning and VSANs. But no one can help you if you decide to log on and reboot the switch.

      And yes I do know about those multi-vendor switch fabrics. While no one vendor seems to follow the standards the same way, these differences are mapped out and are well understood by each vendor. If you're just dumping a new vendor's switch on the fabric without research and a plan, then no one can help that.

      It seems the situations you describe are either brought about by unwillingness or being unable to spend the correct amount of money, or just a poorly designed SAN.

    9. Re:security, resilience, risk, etc by Anonymous Coward · · Score: 0

      Data centers can easily have more than dual fabrics. It is not a hard and fast design rule, and if you believe it is, that is a big mistake. SAN design must be dictated by business need, and should never be driven by vendor sales people. Here is a great example for more than two fabrics: say you work for a larger company with multiple departments that, due to business need, will never have overlapping maintenance windows (if you don't believe you will ever need a maintenance window, you have another thing coming). If you just had 2 fabrics you would always be putting a business unit at risk (single point of failure) during any maintenance. For most companies that risk is not an option. If you work at one where that is an option, well, your job is a breeze. Also, SAN switches are five 9's, which, while good, is far from being bug free. That means they can go down, and don't think an entire fabric can't be affected :)

    10. Re:security, resilience, risk, etc by statemachine · · Score: 1

      Of course they can easily have more than 2 fabrics. They can have 3, 7, 12 redundant fabrics, if they were that paranoid. I never said it wasn't possible.

      But you're just picking and choosing and making up strawmen now. If you have three fabrics, and one goes down because you just botched the upgrade, and the other goes down because a circuit breaker blows, well, now you're back to one fabric and vulnerable again. You're also totally flipping insane if you're worried about this, because the power company can have an extended outage or a maintenance person can short circuit the building with a new power line cut-in for a UPS. So, now you're saying you need three separate power companies, with three separate connections, and three separate UPS systems capable of running the whole building? What's the risk/return ratio on this?

      Just as a simplistic example, director switches have dual power supplies. When one fails, it's reduced to running on one. Do you now want three redundant power supplies to guard against the infinitesimal risk (assuming you've designed properly) that the other will fail? You've just upped the cost, which will probably be more than 50%.

      If you had read some of my other posts in a parallel thread, you'd see that I mentioned having redundant datacenters in different geographical locations.

      You say that SAN design should only be directed by business need, and I've never said it shouldn't. However, your argument is a vendor's dream scenario, and not dictated by sound logic.

    11. Re:security, resilience, risk, etc by Anonymous Coward · · Score: 0

      No, I think you are missing the point. First off, vendors really don't gain much by customers having SAN islands; sure, it might sell a few more switches at first, but the reduction in ISLs means there are more ports available for hosts/storage. In the long run, a smaller switch count in a fabric is to the user's advantage, as they have more ports available for hosts/storage, so they don't need as many switches when they expand. The argument has nothing to do with redundant data centers, redundant paths, redundant power supplies, or botched upgrades. I agree that there are plenty of sound reasons to have all data accessible through one fabric with a redundant fabric for failure cases, but my argument is that there are just as good reasons to physically separate different data that are not just due to paranoia and are not extremely more expensive. If you are a multi switch shop it really would be just as easy and cost effective to have physically separate fabrics as it would be to set up hardware partitioning, and you have then contained errors/upgrades to a particular data set fabric. When SLAs dictate your performance, any fault isolation you can get is a plus. The main point is that slamming SAN islands as you did in your first post is somewhat short sighted. There are many great ways to build a SAN: dual fabric, multiple fabric, heck, if all you need is storage port fan-in you could get away with no ISLs and have a bunch of single switches sitting in front of storage ports. I don't see a reason to rip on one as being a poor design. If it is what is needed in your DC and you are comfortable managing it, that is what should be used.

    12. Re:security, resilience, risk, etc by statemachine · · Score: 1

      Huh? I never slammed any SAN islands. You posted about redundancy with more than a dual fabric. If you are talking about SAN islands, you need to get your terminology straight. You also need to understand that if you have a SAN island, and there isn't more than one path (like through a dual fabric setup), like you say, you're taking down your entire BU anyway as you have no redundancy.

      SAN islands can be dual fabric, as well as single fabric. I'm not sure you're grokking my posts.

  22. Bad way of putting it. by Colin+Smith · · Score: 0, Redundant

    "14Pb for 170k employees isn't so much - 83 gigabytes per person."

    Sorry, this is completely naive. It's a misunderstanding of what an average is.

    Each employee is NOT getting 83Gb of space on the SAN. They might get a few Gb for email. That space is used to store accounts, general business stuff, personal information, credit reports, market information, simulations etc primarily for data mining. Then of course it's replicated to several locations.
    --
    Deleted
  23. Internal Revenue Service by Dekortage · · Score: 1

    A few years ago, I remember reading an article about the IRS (the government tax division) that had seven or eight regional data centers around the U.S. -- each with many petabytes of storage to store current and historical tax data on hundreds of millions of Americans, corporations, etc. I can't find the article now but it seems like *it* should make the list... maybe even top the list.

    --
    $nice = $webHosting + $domainNames + $sslCerts
  24. Discount Web host with scalable SAN by rjamestaylor · · Score: 3, Funny
    Recently I found a discount online web hosting company
    with an unlikely name that offers a scalable,
    distributable SAN, called an HDSAN
    (High Density Storage Area Network),
    for its customers:

    SlumLordHosting.com
    --
    -- @rjamestaylor on Ello
    1. Re:Discount Web host with scalable SAN by solakov · · Score: 1

      LOL, you totally got me on this. I was expecting to read something great, and I was surprised by something so great it even defied time itself! Awesome.

    2. Re:Discount Web host with scalable SAN by rjamestaylor · · Score: 1

      one of the best "gotta have been there to get it" sites I've ever seen. It's either pure brilliance or a complete WTF.

      Excellently executed (no, I had no hand in it!).

      --
      -- @rjamestaylor on Ello
  25. Largest DISCLOSED SANs by mihalis · · Score: 1

    I have no idea how much disk space my firm has, but I did hear an apocryphal tale of installing multiple truckfuls of disks every week pretty much indefinitely (now, of course, older smaller disks are also being removed, but even if it's one for one the service life of enterprise disks means the total is continuously growing). But the firm and the total space can't be disclosed. I'm not trying to make any claims - it could well be smaller than the five mentioned, but the point is nobody knows. I'm sure lots of other firms have very big SANs too.

    1. Re:Largest DISCLOSED SANs by Anonymous Coward · · Score: 0

      I know who you are and what firm you're talking about. If you pre-declared your PHP variables or turned off notices in php.ini you'd cut your disk use by 2/3rds.

    2. Re:Largest DISCLOSED SANs by wlandman · · Score: 1

      Is your firm Goldman Sachs as per your resume on your website?

    3. Re:Largest DISCLOSED SANs by mihalis · · Score: 1

      Is your firm Goldman Sachs as per your resume on your website?

      Not since they let me go in 1998

    4. Re:Largest DISCLOSED SANs by mihalis · · Score: 1

      I know who you are and what firm you're talking about. If you pre-declared your PHP variables or turned off notices in php.ini you'd cut your disk use by 2/3rds.

      I've never used PHP personally in my life.

  26. Re:...and why does the article say "Pbytes", "Tbyt by imsabbel · · Score: 4, Insightful

    Because EVERY SINGLE FUCKING story with "TB" and "GB" causes arguments in the way of "this has to be "...bits", the number is too large for bytes" or vice versa even here.
    To avoid misunderstandings, 4 additional bytes (B) don't seem that much of a price.

    --
    HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
  27. 5 biggest sans? by aapold · · Score: 1
    --
    "Waste not one watt!" - CZ
  28. World's Five Biggest NASs by Anonymous Coward · · Score: 0

    What should be more interesting is a listing of the 5 biggest NASs.

  29. Pebibytes by rukkyg · · Score: 0, Troll

    Is it pebibytes or petabytes? They're 12% apart. That's a big deal. http://en.wikipedia.org/wiki/Pebibyte

    1. Re:Pebibytes by guruevi · · Score: 1, Informative

      I've never heard of anything expressed in Pebi, Tebi or Gibi nor Mebi. A Petabyte is still 1024 x 1 Terabyte which is 1024 x 1 Gigabyte which is 1024 x 1 Megabyte which is 1024 x 1 Kilobyte which is 1024 x 1 Byte which is 8 bit. As soon as you have a 10-bit based computer, you can express your stuff in *bibytes

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    2. Re:Pebibytes by rukkyg · · Score: 1

      Try reading the side of a disk drive. 300 GB on a drive is 300,000,000,000 bytes. When you plug it in to your computer you're short quite a few Gibibytes.
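The gap between decimal and binary prefixes discussed in this thread is easy to tabulate. The sketch below is illustrative only; the names are invented for the example, and the 300 GB figure is the drive size mentioned in the comment above.

```python
# Compare decimal (SI) prefixes with their binary (IEC) counterparts.
PREFIXES = ["kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi", "peta/pebi"]

gaps = {}
for power, name in enumerate(PREFIXES, start=1):
    decimal = 1000**power
    binary = 1024**power
    # Percentage by which the binary unit exceeds the decimal one.
    gaps[name] = (binary / decimal - 1) * 100
    print(f"{name:10s} {gaps[name]:5.2f}% apart")

# A drive sold as 300 GB (decimal) expressed in GiB (binary).
drive_bytes = 300 * 1000**3
drive_gib = drive_bytes / 1024**3
print(f"300 GB = {drive_gib:.1f} GiB")
```

The gap widens with each prefix step, which is why it is barely noticeable at the kilobyte level but exceeds 12% by the time you reach petabytes.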

  30. mod parent up by danimrich · · Score: 0

    mod the parent post up if you have mod points...please

    --
    where's all that Karma?
    1. Re:mod parent up by j-cloth · · Score: 2, Funny

      I would, but I'm a grammar nazi and then != than

    2. Re:mod parent up by Anonymous Coward · · Score: 0

      Did you forget something at the end of your sentence?

  31. LDS Church by Anonymous Coward · · Score: 1, Funny

    I swear the LDS church has 20 Petabytes for genealogy information.

    1. Re:LDS Church by Anonymous Coward · · Score: 0

      That sounds like a lot for a genealogy database. Or is the table that relates them almost a Cartesian product?!

  32. What I want to know by teslatug · · Score: 2, Interesting

    How do they do backups (especially online ones) and restores?

    1. Re:What I want to know by demi · · Score: 1

      I suspect in at least some of these cases, the SAN is not continuous, and they're actually doing backups to the SAN. Part of the storage mentioned will be the online portion of a hierarchical storage manager or similar.

      --
      demi
    2. Re:What I want to know by Phishcast · · Score: 1
      There are many ways to do online backups of large amounts of storage. One common way is with storage based snapshots. You quiesce your application (i.e. put your database into hot backup mode), take your logical snapshot, and present that snapshot to a backup server. To the backup server, it looks like you're backing up local disk, but in actuality you're getting a point in time copy of your application/database.

    3. Re:What I want to know by The+Second+Horseman · · Score: 1

      You can also be doing scheduled replication at the disk array level - from array to array, and back up from the secondary storage. Also, you have dedicated network interfaces for any backup to tape that isn't going exclusively over the storage network. Finally, there are some really massive tape libraries with robot loaders (that move quickly enough to kill anyone inside the unit when it comes on - detecting debris / obstacles becomes an important task). There are tape library systems that can support numbers like 400 drives, and 40,000 to 50,000 tapes.

  33. I think we need to coin a few new terms by Teilo · · Score: 2, Funny

    Pr0ntab: A score, equal to the amount of time in tenths of a minute, that elapses from the moment a news article is posted to the first comment relating said article to a person's porn collection or viewing habits.

    Pr0ntible: The statistical likelihood that any given article will have a low Pr0ntab score, where 1.0 is the highest score, and 0 the lowest.

    Pr0ntabulary: A time sensitive, categorical table of subject matter, where each category is assigned a Pr0ntible, and said table is organized in descending order by Pr0ntible.

    Example: On today's Pr0ntabulary, the Storage category ranks near the top.

    --
    Mir tut es leid, Menschen daß Einfältigfehlersuchenbaumfolgendenaffen sind.
  34. This site is unbelievable ... by freaker_TuC · · Score: 1

    ... Thanks for raping my computer, messing with my eyes and giving me 5 extra brain tumors on top from looking at that site!

    heeeeellllooooo 1990! We're back!

    --
    --- I am known for the ones who want to find me on the net. Is that a privacy risk or a privilege? One might wonder..
    1. Re:This site is unbelievable ... by rjamestaylor · · Score: 1

      I thought it was more "Heaven's Garish", myself.

      Mmmmm... Heaven's Garish. I think that's the name of that style web (non-)design I'll use from henceforth.

      --
      -- @rjamestaylor on Ello
  35. Gootube? by hedkandee · · Score: 1

    With all those (crappy bitrate admittedly) videos they must have a fair chunk of data lying around

    --
    Up for it.
  36. Biggest =/= Best. JPMChase used Hyperterminal! by Anonymous Coward · · Score: 0

    I should know. I used to work for JPMChase until they outsourced us to IBM.

    I do not know why they want to keep their infrastructure "under wraps" according to the article. I know from previously working there that it is a hideous mess of MVS, AS/400, UNIX, Solaris, Windows NT, Novell, and who knows what else these days. They might have upgraded NT to 2000 by now, but I suspect NT still has a heavy presence. I'm not saying those are crap systems (MVS and AS/400 are incredibly good), but it's the implementation of how they are put together that is a mystery even to those who knew it from the inside, like myself. They were even using an insecure laptop with an outside line to send transactions to BankOne with ... get this...

    Hyperterminal!

    This was all because, after their shift from VSE to MVS, Bank One would not recognize their mainframe's IP address, so this was their workaround.

    From what my sources there tell me, the outsourcing was a disastrous decision. The place is full of back-stabbing, incompetent managers who care only about the bottom line (much like many corporations that outgrow their usefulness). In retrospect, their decision to lay me off was probably the best thing that could have happened, as they had no clue how to take advantage of in-house talent. Now I am making considerably more in much better places, and would never even consider going back there, even if they offered triple my current salary.

    Don't be misled by articles about size. Often, there's much more going on behind the scenes than what they'd like you to believe.

    1. Re:Biggest =/= Best. JPMChase used Hyperterminal! by Anonymous Coward · · Score: 0

      In those environments I believe it is better to "let stuff happen" and then fix it up quickly, rather than trying to build something good. This way you will be recognized quickly as The Saviour Of Our Systems.

      Not really pleasing conditions though. For those who like it so...

  37. Call of Cthulhu players know... by Jim+in+Buffalo · · Score: 1

    Call of Cthulhu players know that your maximum SAN can never be more than 99 minus your Cthulhu Mythos Knowledge score.

    --
    This sig, aah-ah, is comin' like a ghost-sig...
  38. Re:...and why does the article say "Pbytes", "Tbyt by TheSkyIsPurple · · Score: 1

    > 4 additionals bytes (B) dont seem that much of a price.

    Sorry, had to say it... but are you really sure that "ytes" is being represented on my machine in 4 bytes? My browser might actually be translating it into a Unicode representation (or some other storage) that uses at least 2 bytes per character. =-)
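
    For anyone who wants to check, here is a rough Python sketch of how many bytes the four characters "ytes" take under a few common encodings. The encodings listed are my own picks for illustration; what a given browser actually uses internally is not something this shows.

```python
# Byte counts for the four characters "ytes" under a few common encodings.
# The choice of encodings is an assumption made for illustration only.
text = "ytes"


def encoded_size(s: str, encoding: str) -> int:
    """Return the number of bytes needed to store s in the given encoding."""
    return len(s.encode(encoding))


# The "-le" variants are used so that no byte-order mark is added to the count.
sizes = {
    enc: encoded_size(text, enc)
    for enc in ("ascii", "utf-8", "utf-16-le", "utf-32-le")
}

for enc, n in sizes.items():
    print(f"{enc}: {n} bytes")
```

    So the "4 additional bytes" figure holds for ASCII or UTF-8, while a 2-bytes-per-character representation such as UTF-16 doubles it.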

  39. Re:...and why does the article say "Pbytes", "Tbyt by raddan · · Score: 1

    Because everybody around here thinks that means "peanut butter" and "turkey butter", respectively.

  40. I used to work at the NASA facility mentioned.... by jCaT · · Score: 1

    Back in 1998, the amount of storage they had was pretty impressive too. It took rooms and rooms to do it, but if I'm remembering this correctly they had about 5 terabytes of disk online and close to 5 petabytes of tape robots. It was a pretty slick automated system- everyone had accounts on a main fileserver, and files that had not been accessed in a certain amount of time were written to tape. If you were to try to grab a file that was on tape, it would fire up the robot, transfer it off the tape, and give it to you- all seamlessly. It was a little slow of course but you never had to do anything- it all appeared as though the files were in your home directory the whole time.
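
    The behavior described above (idle files pushed out to tape, then pulled back transparently on access) can be sketched roughly as below. This is only a toy in-memory model, not how the NASA system was actually built; the class names, the `idle_limit` parameter and the explicit `now` timestamps are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    data: bytes
    last_access: float  # timestamp in whatever units the caller uses


@dataclass
class TieredStore:
    """Toy two-tier store: a 'disk' tier plus a 'tape' tier (hypothetical model)."""
    idle_limit: float
    disk: dict = field(default_factory=dict)
    tape: dict = field(default_factory=dict)

    def write(self, name: str, data: bytes, now: float) -> None:
        # New or rewritten files always land on disk.
        self.tape.pop(name, None)
        self.disk[name] = StoredFile(data, now)

    def migrate(self, now: float) -> list:
        """Move files idle for longer than idle_limit from disk to tape."""
        idle = [n for n, f in self.disk.items()
                if now - f.last_access > self.idle_limit]
        for n in idle:
            self.tape[n] = self.disk.pop(n)
        return idle

    def read(self, name: str, now: float) -> bytes:
        # Transparent recall: the caller never has to know the file was on tape.
        if name in self.tape:
            self.disk[name] = self.tape.pop(name)
        f = self.disk[name]
        f.last_access = now
        return f.data


store = TieredStore(idle_limit=30.0)
store.write("results.dat", b"payload", now=0.0)
migrated = store.migrate(now=100.0)           # file has been idle, goes to tape
recalled = store.read("results.dat", now=101.0)  # comes back without user action
```

    In a real system the recall step is where the robot and the wait come in; here it is just a dictionary move.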

  41. this list is quickly going to be irrelevant by xcjohn · · Score: 1

    As the next round of NSF machines comes online, this list will quickly be outdated. The least of these machines will have at least 1PB of *scratch* space. Tape libraries are going to be completely necessary rather quickly. SDSC's 19PB might seem like a lot, but we're constantly expanding that to deal with how quickly users dump data into it.

    --
    ~~~ They call me Little John, but don't let the name fool you...in real life I'm very big.
  42. NSA SAN by shis-ka-bob · · Score: 1

    I guess that the NSA counter would be stored on the NSA SAN. Hmmmm, NSA is an anagram of SAN. A SAN is an anagram of NASA, the government is everywhere in this. Perhaps all SANs have a back door for the NSA, some sort of _NSAKEY. Hey, _NSAKEY is an anagram of SNAKEY, and if you don't think that something that is snakey is evil, you had better review the history of Adam and Eve. Maybe all SANs are controlled by the NSA. I think we all need a personal Gauss box (aka tinfoil hat) to prevent the NSA from monitoring our brainwaves from the now ubiquitous SANs that surround us and that share their data with the NSA's master SAN.

    --
    Think global, act loco
  43. In their pockets by LeeMeador · · Score: 1

    I know ... it's not a SAN. But a PB isn't what it used to be.

    When a large university graduates a class (say 8000 people), you have to figure that in that stadium, auditorium, whatever, each graduate has 20 GB in their pocket between the phone and the music player, plus, say, 3 family members with video recorders or cameras holding between 1 and 10 GB each. Let's say those average 2 GB.

    (20 GB + 2 GB * 3) * 8000 graduates = 208 TB

    I know it's a quick estimate, but that's a lot of storage, and that's just what they are carrying around in their pockets.
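
    If you want to play with the numbers, the same back-of-the-envelope sum works as a few lines of Python. The constant names are mine, and the conversion assumes decimal units (1 TB = 1000 GB), which is what the 208 TB figure implies.

```python
# Back-of-the-envelope estimate of storage carried at a graduation ceremony.
# All inputs are the rough guesses from the comment, not measured values.
GB_PER_GRADUATE = 20        # phone plus music player
GB_PER_FAMILY_DEVICE = 2    # assumed average for a camera or video recorder
FAMILY_MEMBERS = 3
GRADUATES = 8000
GB_PER_TB = 1000            # assumption: decimal units


def pocket_storage_gb(graduates: int) -> int:
    """Total GB carried by the graduates and their family members."""
    per_graduate = GB_PER_GRADUATE + GB_PER_FAMILY_DEVICE * FAMILY_MEMBERS
    return per_graduate * graduates


total_gb = pocket_storage_gb(GRADUATES)
total_tb = total_gb / GB_PER_TB
print(f"{total_gb} GB = {total_tb} TB")
```

    By this estimate it would take about five such ceremonies to reach a petabyte.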

  44. CERN? by mikelang · · Score: 1

    CERN has 8 petabytes on just a CASTOR filesystem... Not to mention all the data on EGEE grid. Poor guys, they only quoted those _inferior_ commercial solutions :-). Forgot that scientists have more guts... And of course: don't you remember that Google has a SAN also?

  45. FFS, what he described isn't even a SAN by ToasterMonkey · · Score: 1

    Stop bitching about the numbers and go look up fibre channel.