IBM Introduces Petabyte-Capacity 'Storage Tank'
statikuz writes "Wired is reporting that IBM's new data storage system, codenamed 'Storage Tank', uses software to link servers in multiple locations over an IP network, creating a sort of mega-server capable of connecting thousands of computers and processing multiple petabytes of data. 'Storage Tank has the potential to become to an organization's data what the Dewey Decimal system is to a library,' said Dan Colby, general manager of storage systems at IBM. 'It reinvents the way information is filed, managed, shared and accessed within an organization.' CERN is currently using a beta version of the system to store data from the Large Hadron Collider particle accelerator, which is being used to recreate the first moments of the Big Bang. IBM expects Storage Tank will eventually be able to handle 10 to 20 terabytes of CERN data. Get your own 'starter configuration' for only $90,000!"
Storage Tank is arriving extremely late: it was first promised for early 2001.
According to this article at The Register, IBM failed to deliver promised Storage Tank features such as the ability to "link servers and storage systems from all vendors, making it possible to view and access a file from any system." Instead, it will support only AIX and Windows platforms starting this November. Support for other Unix versions, including Linux, is expected no earlier than mid-2004.
I always thought a good idea would be RAID-style storage spread across the entire network: all the files are distributed throughout the network, with multiple copies, so that if two or three computers go down the data is not lost. Kind of a cross between SAN and RAID.
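To make the idea concrete, here is a minimal sketch of that kind of network-wide replication. It is not how Storage Tank or any real SAN works; the node names, the round-robin placement, and the helper functions `place_chunks` and `survives` are all made up for illustration. The point it shows: with N copies of each chunk on distinct machines, any N-1 machines can go down without losing data.

```python
from itertools import combinations

def place_chunks(num_chunks, nodes, copies):
    """Assign each chunk to `copies` distinct nodes, rotating round-robin.

    Hypothetical placement policy, chosen only because it is easy to follow.
    """
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    placement = {}
    for chunk in range(num_chunks):
        placement[chunk] = {nodes[(chunk + i) % len(nodes)] for i in range(copies)}
    return placement

def survives(placement, failed):
    """Return True if every chunk still has at least one copy on a live node."""
    failed = set(failed)
    return all(holders - failed for holders in placement.values())

nodes = [f"node{i}" for i in range(8)]
placement = place_chunks(num_chunks=16, nodes=nodes, copies=4)

# With 4 copies per chunk, every possible set of 3 failed nodes is tolerated.
all_triples_ok = all(survives(placement, f) for f in combinations(nodes, 3))

# Losing the 4 nodes that hold chunk 0 does lose data.
worst_case_ok = survives(placement, nodes[0:4])

print(all_triples_ok, worst_case_ok)
```

The cost is obvious: four full copies means four times the disk, which is where RAID-style parity across nodes would do better than plain mirroring.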
There is an open source solution called LUSTRE that already stores 100s of terabytes... LUSTRE is already deployed in a few live applications run by the NCSE (hope I remembered that right)....
At the symposium this year, the fellow mentioned they were working on scaling to petabyte storage for next year.
That "10-20" terabytes line has to be a typo.
I spoke w/ some people from CERN regarding their CASTOR HSM, and a few years ago they were already up in the petabyte range. By now, they're probably sitting on at least a few hundred TB online, and probably 5 PB offline, as a conservative guess.
IBM's been doing GPFS filesystems in the > 50 TB range, w/ > 1 GB/sec. throughput, for years. That, and even IBM's mid-tier FAStT products can comfortably carry 12 TB on one dual-controller storage head.
Still, further abstracting the issue of locality is very exciting stuff. I'd be interested to see exactly how they go about doing it, and if it's anything that you can't get w/ Lustre when it's ready.
I know you were only joking, but seriously it bothers me that in this day and age we still need a defrag command.
There have been "grown up" filesystems on UNIX and Linux for years -- I believe even extfs managed defragmentation on the fly.
That NTFS on Windows still just leaves fragmented files lying around until you manually ask a program to fix them is frankly outrageous.
10 to 20 terabytes of data is what the LHC collider is going to generate each second while it is running. CERN is expecting to generate at least 5 petabytes of data per year.
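A quick back-of-the-envelope check puts those two figures side by side. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and, for the average, spreads the yearly figure over a full calendar year, which the accelerator would not actually run.

```python
TB = 10**12                        # decimal terabyte, an assumption
PB = 10**15                        # decimal petabyte, an assumption
SECONDS_PER_YEAR = 365 * 24 * 3600

yearly_total = 5 * PB              # "at least 5 petabytes of data per year"

# How long the per-second rate would take to produce a whole year's total.
seconds_at_10_tb = yearly_total / (10 * TB)
seconds_at_20_tb = yearly_total / (20 * TB)

# Average rate if the yearly total were spread evenly over a calendar year.
avg_mb_per_sec = yearly_total / SECONDS_PER_YEAR / 10**6

print(f"5 PB at 10 TB/s: {seconds_at_10_tb:.0f} s")
print(f"5 PB at 20 TB/s: {seconds_at_20_tb:.0f} s")
print(f"5 PB/year averaged: {avg_mb_per_sec:.1f} MB/s")
```

At 10 to 20 TB per second the yearly total would be reached in a matter of minutes, which suggests the 5 PB/year number is what is left after most of the raw data has been thrown away.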
It should also be noted that CERN is a large user of lower-cost large storage arrays based on 3ware cards, but those won't scale to what the LHC will require.