> Write-mostly workloads to a bunch of consumer grade disks will have errors that you may never detect.
At Backblaze, we try to pass over the data about once every two weeks. We re-read it from disk and recalculate a SHA-1 checksum to make sure no bits were flipped or lost. It is my (informed) opinion that *ALL* hard drives and *ALL* configurations will have errors you may never detect unless you do this. You can't ever trust any file system.
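To make the re-read-and-checksum pass concrete, here is a minimal sketch in Python. It is not Backblaze's actual code: the manifest format (a dict of relative path to recorded SHA-1) and the function names `sha1_of_file` and `scrub` are invented for illustration. It only assumes that a digest was recorded when each file was first written.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Re-read a file from disk in chunks and return its SHA-1 hex digest."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scrub(expected: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths that are missing or no longer match.

    `expected` maps a relative path to the SHA-1 recorded at write time
    (a hypothetical manifest format, assumed for this sketch).
    """
    damaged = []
    for rel_path, recorded in sorted(expected.items()):
        target = root / rel_path
        if not target.is_file() or sha1_of_file(target) != recorded:
            damaged.append(rel_path)
    return damaged
```

Run something like `scrub(manifest, Path("/data"))` on a schedule (every two weeks in our case) and anything it returns needs to be restored from a good copy.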
I think many people assume RAID does this checksumming. As far as I know, RAID handles entire drives failing, but it doesn't really do anything about a drive that has begun to fail and is flipping a few bits here and there while still being mostly responsive.
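A toy illustration of that difference, using single XOR parity (RAID5-style, which is simpler than the RAID6 math but shows the same idea). This is a simplified sketch with made-up block contents, not how any real RAID implementation is written.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks striped across three drives, plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Case 1: drive 1 dies outright. The array knows WHICH block is missing,
# so it can rebuild it from the surviving blocks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])

# Case 2: drive 1 stays online but silently returns one flipped bit.
corrupted = bytes([data[1][0] ^ 0x01]) + data[1][1:]

# An ordinary read typically just hands back the block as-is. Even an
# explicit parity check only says the stripe is inconsistent; with single
# parity it cannot say which of the blocks is the wrong one.
stripe_is_consistent = xor_blocks([data[0], corrupted, data[2]]) == parity
```

Here `rebuilt` equals the lost block, while `stripe_is_consistent` is false without identifying the culprit. That gap is why a separate per-file checksum is worth having.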
Personally, I'd really recommend RAID6 with at least 2 parity drives. But always remember, RAID is *NOT* backup. RAID doesn't protect against user stupidity like backup does. RAID does not protect against theft. You don't have to use Backblaze for backups, but for goodness sake USE SOMETHING.
> After all this research, Backblaze still pick the highest failing drive.
Disclaimer: I work at Backblaze. Every month we ask a list of about 20 suppliers for their best price on a variety of drives. There is a little spreadsheet we have that kicks out which drive to purchase based on those prices and drive failure rates. Even if Hitachi is the very highest reliability in our application, it only justifies a SMALL price premium because when one drive dies, we don't lose any customer data. It saves our datacenter IT team 15 minutes to *NOT* swap a drive, so that's worth 15 minutes of salary to us, but not more.
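The spreadsheet itself isn't public, so the following is only a guess at the kind of calculation involved: add the expected labor cost of swapping failed drives to the purchase price and pick the lowest total. The `Quote` type, the formula, and every number in the usage note are hypothetical, and the sketch deliberately ignores the cost of the replacement drive itself.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    model: str
    price_usd: float
    annual_failure_rate: float  # fraction of drives expected to fail per year


def effective_cost(quote: Quote, years: float, swap_minutes: float,
                   hourly_wage_usd: float) -> float:
    """Purchase price plus the expected labor cost of swapping failed drives."""
    labor_per_swap = hourly_wage_usd * swap_minutes / 60
    expected_swaps = quote.annual_failure_rate * years
    return quote.price_usd + expected_swaps * labor_per_swap


def pick_drive(quotes: list[Quote], years: float = 3, swap_minutes: float = 15,
               hourly_wage_usd: float = 40) -> Quote:
    """Return the quote with the lowest effective cost (assumed defaults)."""
    return min(quotes, key=lambda q: effective_cost(q, years, swap_minutes, hourly_wage_usd))
```

With made-up inputs such as `Quote("reliable", 120.0, 0.01)` and `Quote("cheap", 110.0, 0.10)`, the cheaper drive still wins under these assumptions, because a 15 minute swap costs so little compared with a $10 price gap.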
Disclaimer: I work at Backblaze. I object to the marketing term "Enterprise grade". It is confusing, and I'm not even sure those drives have the attributes you think they have. There is a completely different blog post Backblaze did about "Enterprise vs Consumer Drives" which comes to the conclusion that Enterprise isn't better: http://blog.backblaze.com/2013...
> CFLs that die faster than incandescents
I know of one use case that tripped me up. It turns out CFLs cannot handle being outside in the cold. I killed several in an outside porch light, and some failed within a month. It drove me nuts until I figured it out.
I've used many CFLs in the past, and some are still here in my living room, so I'm not biased. Recently I've become a much bigger fan of LEDs, which kick ass outside. LEDs last longer in harsh climates than incandescents by a long shot. Can't we all agree we need to ban CFLs and skip directly to LEDs and be done with it?
I'm not for or against Obamacare. As a software engineer, it simply does not affect me. (I kept the same health insurance I've always had.)
> there has been no law that forced citizens to sign up for.... Jail time
Nonsense. Medicare/Medicaid/Social Security/FICA/Unemployment Insurance - all these are itemized on my pay stub. Again, I am not for or against Obamacare, but there are lots of other government programs I am forced to participate in.
I don't know why you were modded down. I believe most banks are legally required to retain every customer transaction for 7 years. What exactly does it mean to "delete your Wells Fargo Online Account" when they are legally required to maintain your records?
If at any point your relationship involves a financial transaction, that company might have a valid interest in holding onto the receipts through at least the next year's taxes, and may have a responsibility to hold the records for longer.
> I've never given correct information to any website to start.
My electric power bill, my garbage service, and other services are all paperless and handled through websites, so I give websites correct info in some situations.
I also buy things online all the time from places like Amazon. You have to give them your name and address or the stuff won't come to you.
The thing that bugs me is when they mail catalogs to me ENDLESSLY. Paper catalogs. I mean, I browsed their website and bought their product, so I know the web exists, why are they killing trees for goodness sake?! I belong to a service that helps me unsubscribe from those, but some of these catalogs are dang hard to stop.
> Nobody at the C-level takes responsibility for anything.
I'm not so sure. I think it depends on which C-level position you are talking about; some are hot seats...
In small to mid-size businesses (1,000 employees or fewer) I think it's super common to fire your VP of sales after 2-3 bad quarters and fire your CEO after 4-5 bad quarters, regardless of what situation is to blame. The CTO is almost immune from taking any responsibility, and unless there is embezzling I'm pretty sure the CFO is a cushy job with great security and awesome salary where your underlings do all the real work.
Regardless of whether I like them as human beings, I have been impressed by the risk taken by the VPs of sales at the high tech startups I've worked at. These men and women are compensated 50% by commission, so early on in a startup (in the era of low sales) their salaries are shockingly low, and if sales don't pick up they are personally blamed, even if the product is young, buggy, and has better competitors in the market. VP of sales is a hot seat: we went through 4 in 4 years at one of my previous companies.
The internet existed in 1984. Some of us old timers still remember when AOL opened a gate and let their users into the readnews internet community. Everything started going downhill about then. :-)
Now you kids get off my lawn!
The HP Veer smartphone (Palm Pre line) had a MagSafe-style magnetic connector that also handled data transfer: http://www.all4cellular.com/product/hp-veer-4g-usb-cable.html
You can still buy this phone and connector. The phone software is TERRIBLE, but the hardware was innovative and well designed.
In another example, my HP Veer smartphone (it's in the Palm Pre line) has a magnetic charging cable that can ALSO carry data and audio!
Seriously, the HP Veer hardware was nicely designed, but the software is a train wreck. I still can't understand how the iPhone doesn't have a MagSafe recharge option, but my HP Veer does?
This is 1780 lumens (and $53): http://www.amazon.com/Philips-423525-White-Light-Dimmable/dp/B00B2KUA3Y
I own two, and although they look goofy when turned off, I've been happy with the amount of light it puts out and the color.
BTW, I was a holdout for a long time. I stockpiled "dorm burner" halogen stand-up lamps for years with bootleg 600 watt bulbs (now banned), and I still miss the incredible light those things generated. I hate CFLs: I've broken a couple, and the mercury cannot be good to inhale. I'm going LED, even if it costs hundreds of times more than CFL. I just wish the manufacturers would increase the lumen output.
Isn't posting on my Facebook wall the same as actually doing something?
> very common for multiple drives in an array to fail within a short time window, due to shared environmental problems
Exactly. We had one interesting incident where, in the middle of the night, 3 pods right next to each other in a rack all went berserk and all their RAID arrays fell apart. That's 135 drives all at once (3 pods each with 45 hard drives). We reassembled them all, and the VERY NEXT NIGHT at the same time it happened again. We moved all three servers to different ends of the datacenter and finally figured out which server was causing the problems. The bearings on one of its fans were going bad, and when the fan came on it vibrated the entire cabinet. We have "nightly cleanup" jobs that run to verify data integrity and delete files we no longer want, and this was enough load to heat up the CPU and trigger the bad fan.
I'm not sure what you mean by "turnover"? If you are asking how many customers we have, I apologize but I'm not allowed to release that number (not my fault, I would post it on our homepage with a live number if they let me!)
But I was mostly joking. I think by "major" UnknowingFool meant the largest 4 or 5 companies on earth like Google, Facebook, Apple, Microsoft and maybe Yahoo. I assure you that Backblaze is in no danger of displacing any of the members of that list. :-)
Engineers know how to tunnel through limestone, and they created dedicated machines for it. Here is an article: http://science.howstuffworks.com/engineering/structural/tunnel4.htm and here is an example where it was used: http://midwest.construction.com/midwest_construction_projects/2013/0729-deep-below-indianapolis-a-race-to-control-waste.asp
This is a great point. Five years ago Backblaze started with 1 TByte hard drives. Now we are deploying 4 TByte hard drives. The power consumption per drive is about equal. So there is a moment in time when it is worth buying new 4 TByte drives, migrating the data off the 1 TByte drives, and throwing away the 1 TByte drives JUST TO SAVE MONEY ON ELECTRICITY.
Our electrical bill is about $45,000 / month right now. There is a reason Google and Yahoo built those massive datacenters up along the hydro electric 3 cent/kWh Oregon/Washington border. And it's all about total cost of ownership, and EVERYTHING is on the table.
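Here is the back-of-the-envelope arithmetic as a small Python sketch. The only input taken from the numbers above is that a 1 TByte and a 4 TByte drive draw about the same power; the wattage, electricity price, and drive price in the usage note are placeholders, and the sketch ignores cooling, chassis, and migration labor.

```python
HOURS_PER_YEAR = 24 * 365


def annual_power_cost_per_tb(watts_per_drive: float, tb_per_drive: float,
                             usd_per_kwh: float) -> float:
    """Yearly electricity cost to keep one TByte spinning."""
    kwh_per_year = watts_per_drive * HOURS_PER_YEAR / 1000
    return kwh_per_year * usd_per_kwh / tb_per_drive


def break_even_years(new_drive_price_usd: float, new_tb: float, old_tb: float,
                     watts_per_drive: float, usd_per_kwh: float) -> float:
    """Years until the power savings repay one new drive.

    Assumes old and new drives draw the same wattage, so one new drive
    replaces new_tb / old_tb old drives.
    """
    old_cost = annual_power_cost_per_tb(watts_per_drive, old_tb, usd_per_kwh) * new_tb
    new_cost = annual_power_cost_per_tb(watts_per_drive, new_tb, usd_per_kwh) * new_tb
    return new_drive_price_usd / (old_cost - new_cost)
```

For example, with an assumed 10 W per drive, $0.10/kWh, and a $150 4 TByte drive, `break_even_years(150, 4, 1, 10, 0.10)` comes out to roughly 5.7 years. Cheaper drives or more expensive power pull that moment closer.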
> Trying to read a damaged sector is less reliable than reading the undamaged redundant copy.
You're thinking about it wrong. You always want the maximum amount of information from every drive, and you can choose to use that information however you like. I don't want "Enterprise" drives that won't try hard to get every last bit.
Here is an example: We have had problems reassembling / resyncing RAID arrays because one stubborn drive pops out and fails too easily (we run two parity drives - so if you are already down 2 drives a 3rd stubborn drive is a bummer). If the drive would just stay in and try harder, we could get through that particular operation. Backblaze then adds its own end-to-end SHA-1 on every file - trust us, we'll absolutely know for certain whether or not we recovered the file accurately from that particular RAID array. But until we reassemble the RAID array and get the file system back online, we can't even check what we are holding. Fighting with it costs us IT time. Again -> we have no performance problems at all. I know this is hard for some organizations to grasp when you never seem to have enough IOPS. But the nature of online backup is not like the nature of your billing or account info database.
> I'd happily pay 2x or 3x the money to get 20x the write endurance.
That only makes sense if you are hitting the write limits. If the drive dies because the bearings wear out after 5 years of spinning regardless of the number of writes, you have just paid 3x the money and gotten exactly zero benefit.
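The argument boils down to a `min()`: the drive is done when the first limit is reached. A tiny sketch, with a hypothetical function name and made-up numbers in the usage note:

```python
def drive_lifetime_years(write_limit_tb: float, tb_written_per_year: float,
                         mechanical_life_years: float) -> float:
    """A drive lasts until whichever limit it hits first."""
    years_until_write_limit = write_limit_tb / tb_written_per_year
    return min(years_until_write_limit, mechanical_life_years)
```

Suppose the bearings give out at 5 years and you write 10 TBytes a year. A drive rated for 100 TBytes of writes lasts 5 years, and so does one rated for 20x that, so the premium bought nothing. Only a workload that actually hits the write limit first (say 100 TBytes a year against that same 100 TByte rating) gets any benefit from the extra endurance.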
> Enterprise drives typically range from 18000rpm at the very high end...10K rpm probably the most common for bulk storage
Backblaze pays something like $45,000 / month for electricity. We vastly prefer "green" drives that spin slower and use less electricity. There are many, many "Enterprise" applications in the world that are not bottlenecked on spindle speed (like backup and Shutterfly-type big-data-rarely-accessed), and those enterprises deserve slower drives. I guess I object to using the word "Enterprise" to describe "Fast" - why not just mark your drive as 15,000 RPM or 7,200 RPM and be done with it? No need to add the pointless label "Enterprise Drive".
> SMART reporting is much more consistent for enterprise drives
No way. All hard drives do SMART reporting. Sometimes the "bridge" between the processor and the hard drives won't pass the information, so a cheap USB enclosure might be hiding the hard drive SMART stuff from you, but that isn't the hard drive's fault. In fact, we have an expensive Dell drive shelf with an LSI (?) controller that hides our enterprise drive SMART stats from us, very annoying. There is no correlation between "Enterprise" and "SMART reporting".
> some manufacturers are intentionally disabling typical enterprise firmware features on the consumer models, drive commands that are helpful for hardware raid
The whole concept of RAID is that it is a software layer on top of all the cheap drives. RAID doesn't require any interesting instructions. It pretty much just needs to write data to an individual drive and read it back later.
> I wouldn't be surprised if usage patterns over 5-10yrs resulted in a significant divergence.
Time will prove you right or wrong, we plan on updating and releasing these numbers every few years. Stay tuned....
Backblaze happens to use software RAID6 on standard Debian Linux, with the built-in mdadm tool. Our current pods have 8 GBytes of RAM, so I guess they could theoretically use all of that (and swap) instead of using "crummy RAID controllers with no memory to speak of".
It all depends on what you value - reliability or performance. EITHER ONE is valid for companies. You can't say every "Enterprise" wants drives that error out faster and successfully get the data back less often. Backblaze is a company, and we value reliability way way WAAAAAAAY over performance. We want the hard drive to take 90 seconds and give us the data - heck, take a full 3 days to get the data back, we'll wait, and so will our customers. We have no performance problems at all - customers are extremely happy getting a successful restore FedEx'ed to them in 48 hours (one of the restore options is a $189 3 TByte hard drive sent to you anywhere in the world, and you keep the hard drive).
> The only major company I know that uses consumer grade HDs in volume is probably Google
What qualifies as "major"? :-) This article is about Backblaze, we have 25,000 consumer hard drives, are we "major"?
I totally agree that "bureaucracy affects IT decisions". In a previous company we sold spam blocking software (we were the good guys) but our customers asked us to provide the software and hardware in a bundle because they had a hard time convincing their management to purchase stand alone computer hardware. So we pre-bought a PC clone, marked it up by a FACTOR OF 4 (for our trouble), put a sticker on the front with our company name and the IT guys happily passed the price on to their managers who happily signed the P.O.