Sun Unveils Thumper Data Storage
zdzichu writes "At today's press conference, Sun Microsystems is showing off a few new systems. One of them is the Sun Fire x4500, known previously under the 'Thumper' codename. It's a compact dual Opteron rack server, 4U high, packed with 48 SATA-II drives. Yes, when the standard for a 4U server is four to eight hard disks, Thumper delivers forty-eight HDDs with 24 TB of raw storage. And it will double within the year, when 1TB drives go on sale. More information is also available at Jonathan Schwartz's blog."
24TB... that's almost enough to hold all my pr0n!
*''I can't believe it's not a hyperlink.''
This is perfect for the space constraints applied to many server rooms nowadays. I wonder how they managed to control the heat output. My laptop only has one HDD and it gets pretty warm. I am very impressed that it costs (according to Sun) $2 per gig! As always, I hope it works as promised.
Information wants a fueled airplane waiting at the hangar and no one gets hurt.
I've been talking to the wife about getting a NAS for the house - but now a 1 to 2 terabyte system seems so...puny.
;).
Hey, honey - remember how I said I wanted to store *all* the movies on the server? Get a load of this
52 Weeks, 52 Religions with John Hummel
That's the Bambi Cooling Add-On system.
...but how good is it at repelling the antlions?
Anagram("United States of America") == "Dine out, taste a Mac, fries"
Thumper? I hope the sand worms stay away...
Heat output from all those drives is a concern, but if you look at the photo on the ponytailed hippie's blog, you can see that the box has 20 fans in the front and probably more in the back. Makes you wonder what the thrust-to-weight ratio is. This box is going to make a screaming database server. 2GB/sec throughput to the internal disk beats anything out there, -and- the customer doesn't need to invest in SAN hardware to do it.
Orly Owl: Why, don't you know? He's twitterpated.
Thumper: Twitterpated?
Orly Owl: Yes. Nearly everybody gets twitterpated in the Thumper room. For example: You're walking along, minding your own business. You're looking neither to the left, nor to the right, when all of a sudden you run smack into a pretty rack holding 24 TBs of pretty racks! Woo-woo!
Could you please put the link to your stupid website in your sig, so those of us who are uninterested don't need to read it a dozen times in every story? KTHX...
If my math is right... that's 50,331,648MB / 295,734,134 (US Population) = 174.27683 kilobytes for every man woman and child in the US. In one box!
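The per-capita arithmetic above can be checked with a short sketch. It assumes binary units (1 TB = 1024 * 1024 MB) and uses the population figure from the post; the function name is illustrative. Note that 50,331,648 MB corresponds to 48 TB (the doubled capacity with 1TB drives), not the 24 TB the box ships with today.

```python
# Sketch of the storage-per-person arithmetic from the post above.
# Assumption: binary units, i.e. 1 TB = 1024 GB = 1024 * 1024 MB.
US_POPULATION = 295_734_134  # figure quoted in the post


def kb_per_person(total_tb: float, population: int = US_POPULATION) -> float:
    """Return kilobytes of storage available per person."""
    total_kb = total_tb * 1024 * 1024 * 1024  # TB -> GB -> MB -> KB
    return total_kb / population


if __name__ == "__main__":
    for tb in (24, 48):
        print(f"{tb} TB -> {kb_per_person(tb):.5f} KB per person")
```

With today's 24 TB configuration the same calculation gives roughly half the quoted figure.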
Always be polite.
Is it? I recognize that the system itself is impressive. But to buy 48 750GB SATA-II 3.5" drives costs around $24,000, and gives you ~36TB. If you notice the pricing, it becomes obvious Sun is drastically over-pricing the drives. The only difference I noticed at first glance between the $40k and the $90k option was the size of the drives. Perhaps I missed something...
If I didn't, only a fool would buy the more expensive version. Just go in for the cheap array, and purchase 750GB drives yourself, re-sell the original 48x250GB ones, and you'll save yourself a rather large sum of money.
It would be nice if the system had a setting where you could transparently specify a redundancy factor at the expense of capacity. For example, I could set a ratio of 1:3 where each bit is stored on three separate disks. This ratio could increase to the number of disks in the system. And of course, little red lights appear on failed disks, at which point you simply swap it out and everything operates as if nothing happened (duh). Sure, we have a degree of this already, but managing redundant arrays is still a very manual process and when we start talking about tens or soon hundreds of terabytes, increased automation becomes a necessity.
Join Tor today!
Why does everybody here get so worked up about "The HEAT!!111"?
It's 48 HDs in a 4U case. 48 HDs is about 600W under full load.
If you compare this to the fact that there are dual-socket, dual-core servers out there that push 300W through a 1U case, that's nothing.
Also, a 4U case allows the use of nice fat 12cm fans in the front, while the horizontal backplane allows for free airflow (in contrast to the vertical ones used before).
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
Actually, software RAID is an advantage, performance-wise.
The old-time "big-ticket" was checksum calculation, but that is now an "also-ran". Distributing the i/o? Software can do it as well as hardware.
Both hardware and software have to be familiar with the blocking factor.
Where software wins is that it can know when a block has never been used (or is not PRESENTLY in use) and skip the read that would otherwise be needed to fill it, which hardware RAID controllers cannot avoid doing.
The idea is to tie the RAID more tightly into the filesystem.
As to lower speed drives -- did you count the heads? Each is active at the same time. Yes, an individual i/o would complete faster with 10k or 15k spin, but the total throughput is based on the number of heads. For RAID5, reading multiple blocks will give you pretty much all the read performance you can stomach.
Write performance for an individual write operation would be improved, but generally application buffering deals with it. The tradeoff is number of heads, spin rate, and heat. The right balance? For you, write performance up, and, keeping heat constant, number of heads down (I presume that you are dealing with transactional loads, with commits). For me? It tends to go the other way (my workload is general storage, with a bit of database).
As always, YMMV
Ratboy
Just another "Cubible(sic) Joe" 2 17 3061
The (redundant) power supply is rated at 1800 Watts, which implies about 6100 BTU/hr of heat out of the box. For 24TB and a server that is remarkably low.
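A quick check of the watts-to-BTU/hr conversion in the post above, as a minimal sketch. The conversion factor (1 W is about 3.412 BTU/hr) is the standard one; treating the full power supply rating as heat output is the post's worst-case assumption, and the function name is illustrative.

```python
# Worst-case heat output, assuming every watt the PSU is rated for ends up as heat.
BTU_PER_HOUR_PER_WATT = 3.412142  # 1 W is approximately 3.412 BTU/hr


def watts_to_btu_per_hour(watts: float) -> float:
    """Convert electrical power in watts to heat output in BTU/hr."""
    return watts * BTU_PER_HOUR_PER_WATT


if __name__ == "__main__":
    rating_w = 1800  # PSU rating quoted in the post
    print(f"{rating_w} W -> {watts_to_btu_per_hour(rating_w):.0f} BTU/hr")
```

Actual draw will usually sit below the supply rating, so this is an upper bound for sizing cooling rather than a measurement.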
This box is designed from the ground up to be used with ZFS, to the advantage of both. Thumper is what you would call a modern RAID array, as ZFS in this case blurs the distinction between hardware and software RAID. The CPU and memory horsepower is there for RAID-Z.
From this box, one can serve out file systems with NFS and/or SMB/CIFS (aka a traditional NAS), and in future releases of Solaris 10, also serve out LUNs over iSCSI and FCP while having all that data backed by the performance, reliability, and features of ZFS. The only thing it's missing is a consolidated, centralized CLI for manipulating storage, a la NetApp and ONTAP... but all the requisite pieces are there to turn Solaris, and especially Solaris-on-Thumper, into a NetApp killer at less cost.
We were waiting anxiously for this item to be announced, because we have about 100TB of storage (now) and add about 8TB per month. Perfect customer for these.
d =2348
But, unfortunately, they're not quite as cheap as I had thought. (Friend on the inside thought Sun was going to price them at $1.25 per GB, not $2 per GB)
Instead, we've been using these. Very good cooling:
http://www.rackmountpro.com/productpage.php?prodi
32 SATA-II 750GB drives = 24TB, same as the Sun X4500, but for only $16,000 for the entire system (chassis, mobo, RAM, drives) instead of $70,000 for the Sun Thumper. Huge difference, especially if you're ordering many of them.
This fits nicely with Sun's new ZFS file system.
ZFS blurs the traditional boundaries between volume management, RAID and file systems. All disks are added into one big pool that can be carved out into either the native ZFS filesystem format or virtual volumes that can be formatted as other filesystem formats. It has many other interesting features like instantaneous snapshots and copy-on-write clones.
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
Or the complete text content of the Library of Congress, coupled with 6 Academic Research Libraries, with the capacity to dump the equivalent of 2 pickup trucks worth of books every second. In a 4U rack. For the price of several cars. Now that's my type of bookshelf system!
Yeh dern kids today are gawdamn spoiled. Back in mah day, we didn't have these FANcy tahrabyte arrays! My TRS-80 had 128K -- that's right, KAY-uh -- on a floppy! And the operating system took about 40K of that, leavin' me about 85K left! And I was happy to have it! I had tuh use a paper hole-puncher and cut a write-protect tab so I could flip the floppy over tuh get more space! Damn kids these days... -mumble- -grumble-
Sometimes it's best to just let stupid people be stupid.
I'm glad that they are at least offering a server in this class with 3.5" disks. The 2.5" 10K RPM SAS disks that are on the x4100 and x4200 are just junk, pure and simple.
Contrary to popular belief, life is not a bitch. It is far far worse.
You can also buy commodity 3U server chassis that hold 16 drives. We built a number of these as ROCKS cluster head nodes for Los Alamos National Labs. Two 3ware SATA RAID cards running 8-drive RAID 5 arrays, bonded together in software as a RAID 0 array. Decent performance, relatively inexpensively. Which is after all what the I in RAID is supposed to stand for. If you do this, get the SATA backplane that uses 4 Infiniband cables instead of 16 SATA cables and the cards that support that. I've done it both ways, and trust me, your knuckles will thank you for the four-fold reduction in cables. As an interesting aside, the chassis we used has a space up top for a 2.5" laptop hard drive to use as the system disk. It is the only way to fit a system disk in that chassis.
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
It might have allowed for 12cm fans, but if you had looked, you would see that they are using 10 much smaller fans. Ick.
Meanwhile, the x4600 (8 dual core Opteron system) does apparently use 2 12cm fans.
With all those disks, I suppose it might not make much difference, but I would have rather seen them using 12cm fans on the x4500 as well.
I'm a loser baby, so why don't you kill me.
Of course it is software RAID. Every single last RAID is software. OK, you might think there is such a thing as hardware RAID, but if you look at the controller card you will find some kind of computer and some RAID software running on it. The only difference is that the software is burned into ROM on the card. If you buy this RAID system from Sun you will never see the dual Opteron or have need to know what software runs on it. You should think of these two Opterons as a very, very powerful controller card.
Sun typically worries more about redundancy than noise. The 10 small fans are hot-swappable and run at ridiculous speeds (and yes, sound like an A320 revving up for takeoff), but I bet the thermal budget allows four of them to be dead at any given time.
-30-
If you liked the concept of the e450, you'll like this box.
If you are interested in storage consolidation and increasing utilization while reducing storage islands, this isn't for you.
With 48 disks, you'll want protection... all implemented in software RAID. So you do RAID-5, probably create RAID groups of 12 disks? 8 disks? As the number of disks in the RAID group goes down, the amount of disk you waste on parity, and the amount of CPU cycles done on calculating parity, goes up.
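The capacity side of the trade-off above can be sketched as follows. This is a simplified model that assumes one parity disk's worth of capacity per RAID-5 group and that the 48 disks divide evenly into groups; it does not model CPU cost, hot spares, or RAID-Z specifics, and the function name is illustrative.

```python
# Simplified RAID-5 capacity model: each group gives up one disk's worth of
# capacity to parity, so smaller groups mean more total parity overhead.
def raid5_layout(total_disks: int, group_size: int) -> dict:
    """Return group count, parity disks, data disks, and parity fraction."""
    if group_size < 3:
        raise ValueError("RAID-5 needs at least 3 disks per group")
    if total_disks % group_size != 0:
        raise ValueError("this sketch assumes the disks divide evenly into groups")
    groups = total_disks // group_size
    parity_disks = groups  # one disk's worth of parity per group
    return {
        "groups": groups,
        "parity_disks": parity_disks,
        "data_disks": total_disks - parity_disks,
        "parity_fraction": parity_disks / total_disks,
    }


if __name__ == "__main__":
    for size in (12, 8, 6):
        print(size, raid5_layout(48, size))
```

Going from 12-disk to 8-disk groups on 48 drives raises the parity share from one disk in twelve to one disk in eight.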
As the industry moves to FC boot and iSCSI boot to alleviate the need to stock disk drives from 15 different vendors, this is an interesting idea for those who don't want to have a raid array. But in most shops, huge internal storage is sooooo '90s.
How do you replicate this beast? Veritas Volume Replicator. Serverless backup? Nope.
So you're keeping those disks in a bucket and cooling them with slave-girl driven fans or something?
I think you're missing one thing. Where would all the drives go? On the floor? Suspended in mid-air? I'd like to see you get a Chassis+PSU+rails for $1000 that holds not only your Opteron motherboard, but all 48 disks as well. Plus, with that many drives, cooling, a *real* power supply is required (at 15W per drive, that's 720W right there, plus the Opterons, memory, fans, etc. and you're talking about 1100W - not your average power supply).
Another problem is vibration. If you don't have a good mounting scheme for all these disks, cross-drive vibrational issues will adversely affect not only performance, but MTBF as well.
Lastly, what about performance? I've seen this machine sustain raw access to the disks at 3GB/s.
That's *bytes*. Through the filesystem (ZFS), you get close to 2GB/s if you're careful. The machine has 10 fully-independent PCI busses inside - not a bottleneck in sight. Let's see the PCI bridge of your $500 mobo take that.
Once you do all of this, you're not $1/GB anymore, you don't fit in 4RU anymore, and you certainly won't get the same performance. So I think that to build a similar box, there's no way you can significantly beat the price. Plus, you have to remember that almost nobody pays Sun's list price. Most VARs that sell Sun gear will give you a good discount. Comparing Sun list price to We-won't-be-here-next-week computers is not a valid comparison, either.
Just curious how you are going to hook up those 48 SATA drives to your 6 8-port SATA HBAs? Where are you going to find a $1000 chassis that fits 48 drives? In one MAINTAINABLE configuration? As far as I know (and could be wrong) SATA is not an external bus. The SATA cards you mentioned would have to run outside of the box to another unless you find that 48 drive chassis I mentioned. Even if you ran a long enough wire to get to the other box mounted above your standard Opteron 1U box, there's a lot of slack that has to be on that connection unless you want to disconnect everything just to pull the server out of the rack for maintenance. There are limitations to the SATA cabling you're not taking into account. Also add a couple more power supplies on here for each of the boxes that hold your drives. Cooling is also an issue that tier 1 vendors model very seriously before they put together a kit. Most home baked kits have either dangerous hotspots that effect reliability or are overcooled which wastes money. You should also keep your drives mounted with dampening to avoid vibrations from each other which can cause early drive failure.
There's more to this than simply buying parts. This appears to be another viable option in the storage arena for apps that need very large local storage. The problem with using it for NAS storage is that Solaris has historically been pretty slow compared to NetApp. ZFS could improve the score here with simplified administration if anyone actually understood how it worked.
My $.02
_damnit_
It's my job to freeze you. -- Logan's Run
As to backup and replication, think zfs: http://www.sun.com/2004-0914/feature/ Lots of folks are seeing this as simply a 2 socket server with lots of disk. With zfs it's more like a huge disk farm with an open, hackable interface and nice manners at the back end.
Organization? You must be joking..
The other hidden advantage here is storage density. If for some reason you needed 1PB of data storage in as small a space as possible, this is a big win for you. You would need about 45 of these servers to get 1PB of capacity. That would fit nicely into less than 5 racks of space, with room to spare for your networking and monitoring gear. A 1PB EMC Symmetrix is going to be a _LOT_ bigger.
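A rough sketch of the density estimate above. The 24 TB per server and 4U height come from the story; the 42U rack height is an assumption (a common full-height rack), and the helper names are illustrative. The post's figure of about 45 servers leaves some headroom over the raw minimum.

```python
import math

# Rack-density estimate. Assumption: 42U racks; the 4U height and 24 TB raw
# capacity per server are the figures given for the x4500.
def servers_needed(target_tb: float, per_server_tb: float = 24) -> int:
    """Minimum number of servers whose raw capacity reaches target_tb."""
    return math.ceil(target_tb / per_server_tb)


def rack_units(servers: int, units_per_server: int = 4) -> int:
    """Total rack units occupied by the servers."""
    return servers * units_per_server


def racks_required(total_units: int, rack_height_u: int = 42) -> float:
    """Fractional number of racks filled, ignoring network and monitoring gear."""
    return total_units / rack_height_u


if __name__ == "__main__":
    print("raw minimum for 1 PB (1024 TB):", servers_needed(1024))
    units = rack_units(45)  # the post's estimate of about 45 servers
    print(f"45 servers -> {units}U -> {racks_required(units):.2f} racks")
```

Under these assumptions 45 servers occupy 180U, which leaves part of a fifth rack free for the networking and monitoring gear the post mentions.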
No other storage platform has higher density (that I am aware of). Power use is good but not amazing (look at Petabox) and price is excellent for the size, but loses out as you scale.
Overall, I am stoked on them and want to try using them as backup servers. Attach one or two LTO3s and a couple of 10Gb/s Ethernet cards and you have everything you need! You can spew data over the network from the clients and then spend the whole day making very good use of your tape drive resources.
I can give you a few reasons they might. Having been through some hardware RAID nightmares I have first hand experience with a few of them.
HW RAID makes you dependent upon the manufacturer of the card both for RAID implementation and for drivers. We once had a couple of hardware RAID cards managing a large (at the time) RAID0+1 array that would occasionally glitch and fail a drive or two (or occasionally every drive on the controller). The driver and monitoring daemon wouldn't report anything until a second drive failed. Despite battery backup on the card cache, a single drive failure would often corrupt the data on the mirrored drive. The manufacturer was nowhere to be found when requesting updates or bug fixes.
We eventually switched to software RAID and found that in addition to making the array reliable it improved our performance. This was in part because the 6 CPUs on the machine were significantly faster than the 25MHz i960 managing the RAID cards. We could also mirror across controllers on the 4 separate PCI busses, which gets rid of a major bottleneck (the I/O on a PCI bus can be easily saturated by a few drives).
There are other benefits to being able to RAID across controllers. A RAID controller is a single point of failure. If a controller fails on a HW raid system, your array goes down. On SW RAID (done properly) a single controller can go away without a problem.
The most reliable storage system we have (a Network Appliance rack) is entirely software RAID. (RAID 4, a number you don't hear often).
Support SETI@home
Japan's TSUBAME (see the system at http://www.gsic.titech.ac.jp/) is made up of both x4500 and x4600 systems. I've been in the Thumper room - it's loud as a jet engine in there and cooling is an issue, but only because the room is old. It's an impressive set-up, and made to be upgraded. They've got 1.1 Petabytes of storage now.
Some guy claims even a DVD can hold 50TB :
Max.
A 3U chassis that handles 11 drives is currently $140 on PriceWatch. Do the math.
/. It's a discussion forum not a whitepaper or thesis. Thank you for correcting me and providing useful information to everyone.
OK. Yours works out to be 3x larger than the 4U Sun's comes in at. Did you have some other math for me to do?
As far as External SATA, I was wrong. I'm OK with that sometimes in a congenial conversation. It's little different to say "I may be wrong" than posting with IANAL. This is
Yeah, cabling arbitrary lengths of drives together has been easy since SCSI2 Fast Ultrawide. What you do is use separate cables every few drives. Magic.
Yes, that'll look nice out the back of the rack. How many cables?
put silicone glue in for the inner rails
You have a lot of spare time.
Er, speed is one of Solaris' big selling points, if you'd actually look before announcing.
Well, I worked at Sun for 7 years and know a few things about Solaris. If you think Solaris is fast at NFS compared to NetApp, well I beg to differ.
Yeah, uh, ZFS takes like five minutes to set up. It's trivially simple. Why would you pretend otherwise? Have you even touched it?
Yes, I spent two years explaining and demonstrating it to customers. It's a huge change and takes a big mindset change for those who are used to Vxvm or other LVMs.
Also, if you're going to nitpick the use of effect/affect:
1) Affect, not effect.
2) Cooling a 48-drive box is going to cost less electricity than running a single CPU. These people put down a thousand bucks a month just for the privelege of being in a controlled room. Let's have a sense of scale for things, please: fans just aren't that much power.
At least spell "privilege" right in your nit-picking.
Also, if we're going to have a "sense of scale" here, there is a huge cost difference between 12 RU and 4 RU in datacenter costs.
OpenSolaris iSCSI target support is underway.
I use Friend/Foe + mod-point modifiers as a karma/reputation system.
The disks would go in the chassis (see my itemized list). You may not know it but Sun is not the 1st company to use a chassis with vertical bays. Here is one example among many. The price would more likely be around 2 or 3 grand by the way, instead of 1 grand. But anyway this doesn't change the fact that this Sun box is way overpriced, even with a good 40% discount.
Regarding the mobo, just pick one with two AMD 8131 or 8132 PCI-X bridges. This will give you 4 independent PCI-X busses. The two PCI-X bridges would have to be on 2 different HT links in order to not dangerously approach the theoretical one-way data throughput limit of 3.2 GB/s of one 1600 MT/s 16 bits HT link. The two PCI-X bridges could be either connected to different CPUs or to the same CPU because the Opteron XBAR _can_ easily handle the ~3 GB/s you speak about, it has been designed to support 19.2 GB/s of HT traffic and even more with the recent upgrade to 2000 MT/s ccHT links. Now with the 4 independent PCI-X busses, you could put 4 SATA HBAs on the 1st and 2nd busses, and 2 HBAs on the 3rd and 4th busses. This way the first 2 busses will run at 100 MHz and the 2 others will run at 133 MHz, giving a practical throughput of 3.4 GB/s (2 * (100 MHz * 64 bits / 8) + 2 * (133 MHz * 64 bits / 8), and assuming a 90% efficiency as found on most PCI/PCI-X busses), this is enough to handle the 3 GB/s you are speaking about. There are plenty of single AMD 8131 mobo on the market right now starting at $250. I am sure you can find one with two AMD 813x for $500 max.
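The bus arithmetic above can be reproduced with a small sketch. It uses the post's own figures (64-bit busses, two at 100 MHz and two at 133 MHz, 90% efficiency); the efficiency value is the poster's rule of thumb rather than a measured number, and the function names are illustrative.

```python
# PCI-X throughput estimate: peak MB/s = clock (MHz) * bus width (bits) / 8.
def pcix_peak_mb_per_s(clock_mhz: float, width_bits: int = 64) -> float:
    """Theoretical peak throughput of one PCI-X bus in MB/s."""
    return clock_mhz * width_bits / 8


def aggregate_mb_per_s(bus_clocks_mhz, efficiency: float = 0.9) -> float:
    """Practical combined throughput across busses, scaled by an assumed efficiency."""
    return efficiency * sum(pcix_peak_mb_per_s(c) for c in bus_clocks_mhz)


if __name__ == "__main__":
    busses = [100, 100, 133, 133]  # configuration described in the post
    total = aggregate_mb_per_s(busses)
    print(f"practical throughput: {total:.1f} MB/s (~{total / 1000:.1f} GB/s)")
```

The result lands at the roughly 3.4 GB/s quoted in the post, a little above the 3 GB/s raw figure claimed for the x4500.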
Now when I think about it you could even use SATA port multipliers in order to reduce the number of HBAs, allowing all busses to run at 133 MHz. I am aware of 12-port and 24-port SATA HBAs (Areca comes to mind) but those are outrageously expensive and are not necessary to handle all that throughput. My experience and those of my friends playing with high-end enterprise gear prove that _very_ simple and inexpensive PCI-X SATA chips such as the SII3124 or Marvell 88SXxxxx are way sufficient to handle the max combined read throughput of any number of disks attached to their SATA ports. The reason being that the designers of such chips have come up with a simple and performant hardware interface optimized to reduce the CPU load. I know for a fact that the SII3124 design is somewhat close to the AHCI spec which is the best example of a performant SATA hw interface.
So I _do_ believe that it is possible to build a $13-14k server with 48 SATA disks in 4U offering ~3 GB/s of raw read throughput. I don't understand why so many people refuse to believe that, especially since other posters in this thread have pointed out that some vendors are already selling similarly priced servers!
Bullpucky. Maybe on your planet. A PC 4U NAS box in my world holds 24 SATA HDDs. Oh, you mean a standard 4U Server... Which usually means a quad-CPU box with 4GB of RAM and a couple fugly FC controllers. See, your problem is that thumper is for Storage in which the 4U form-factor is for drives, and the standard is more like 12 to 24.
</flame>You don't.
You run ZFS, which does not require defragging.