Backblaze's 6 TB Hard Drive Face-Off
Esra Erimez writes: Backblaze is transitioning from using 4 TB hard drives to 6 TB hard drives in the Storage Pods they will be deploying over the coming months. With over 10,000 hard drives, the choice of which 6 TB hard drive to use is critical. They deployed and tested 45 Western Digital (WD60EFRX) and 45 Seagate (STBD6000100) hard drives in two pods that were identical in design and configuration except for the hard drives used.
I don't know... I find it odd that the WD drives, at the 5400rpm speed, were able to write data faster than the 7200rpm Seagate drives. That seems counter-intuitive.
It's also nice to see all of the drives go through that sort of "punishment" without a single failure, out of the box. NewEgg reviews aren't terribly helpful, since most people only leave reviews when they have issues, and few customers ever bother to leave good reviews unless they are overwhelmed by the quality of a product.
- Initial reliability (how many drives failed) – No failures.
- Running reliability (3 months) – No failures.
- SMART Stats (3 months) – No error conditions recorded for the 5 stats that we utilize.
- Hard Drive Cost – about the same.
- Energy Use – The Seagate drives were 7200 rpm and used slightly more electricity than the Western Digital drives which were 5400 rpm. This small difference adds up when you place 45 drives in a Storage Pod and then stack 10 Storage Pods in a cabinet.
- Loading speed – Edge to Western Digital, by a little over 1 TB per day on average.
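To make "adds up" concrete: the summary gives no wattage figures, so the per-drive power gap below is a made-up input, not Backblaze data. The sketch only multiplies it out over 45 drives per pod and 10 pods per cabinet, running around the clock.

```python
# Rough sketch: scale a per-drive power difference up to one cabinet.
# DRIVES_PER_POD and PODS_PER_CABINET come from the summary; the wattage gap
# passed in is a hypothetical input, not a measured Backblaze figure.
DRIVES_PER_POD = 45
PODS_PER_CABINET = 10
HOURS_PER_YEAR = 24 * 365


def cabinet_power_gap(watts_per_drive_gap: float) -> tuple[float, float]:
    """Return (extra watts per cabinet, extra kWh per cabinet per year)."""
    drives = DRIVES_PER_POD * PODS_PER_CABINET
    extra_watts = drives * watts_per_drive_gap
    extra_kwh_per_year = extra_watts * HOURS_PER_YEAR / 1000
    return extra_watts, extra_kwh_per_year


# Hypothetical: assume the 7200 rpm drive draws 1 W more than the 5400 rpm one.
watts, kwh = cabinet_power_gap(1.0)
print(f"{watts:.0f} W extra per cabinet, about {kwh:.0f} kWh per year")
# 450 W extra per cabinet, about 3942 kWh per year
```

Plug in real measured numbers for the gap; the point is only that a single watt per drive becomes hundreds of watts per cabinet.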
Slashdot, fix the reply notifications... You won't get away with it...
That was about the most useless set of HDD statistics I've ever seen. You don't need more than one drive each to compare power consumption and performance.
So you think there's 0 variance?
NOTHING was said about reliability and who cares how much data was stored on them vs how long it was in service. Those two numbers are completely arbitrary.
45 drives each, no initial failures, no failures in the first 3 months. Right there that tells me the WD Red 6TB drives are hugely better than the 4TB drives I used.
I remember punching the side of 360K floppies to get another 360K on the other side.
Now you can buy a couple of gigs of USB drive next to the gum in the express lane at Wal Mart.
This stuff is awesome and all, but sometimes it's hard to really wrap my head around that pretty much everything about computers (except for physical size) is a billion times bigger than when I started using computers.
It really is hard to explain to people that at one point your entire digital life was about 20 floppy disks in a plastic case, and that what was once a completely hypothetical amount of storage is commonplace.
Lost at C:>. Found at C.
Could the Seagate drives being slower have to do with the SMR they are using, since each write has an amplification effect on the surrounding bits?
I think you missed the point. Several points, in fact...
Backblaze doesn't care about one drive. Power consumption is a complicated matter, and they have a very simple plan, so it's best for them to build a full pod for testing, and compare the power and performance at the pod level. They can extrapolate that out to their planned expansion considering pods as the units of measure, rather than having to consider drives, controllers, fans, and power supplies as extra variables. That simplification is partly why they're using a pod architecture in the first place.
Reliability doesn't matter much to Backblaze, either. They store redundant copies of data, so their risk of loss is mitigated, just as it should be for any enterprise use of such drives.
When you ask "who cares how much data was stored on them vs how long it was in service", clearly the answer is Backblaze, because they cared enough to study that particular metric.
Now, all of this is really only obviously useful to Backblaze. They're running tests in their environment, with their design, for their criteria. Realistically, the vast majority of Slashdotters won't ever handle anything like Backblaze's system, so they have different priorities. Backblaze still released their test results, just in case anyone cares. That's why they've gathered such a following among nerds. They've repeatedly published their research openly, contributing to the public knowledge base for system engineers. Maybe somebody finds it useful, and maybe not, but it's still a noble principle they practice.
You do not have a moral or legal right to do absolutely anything you want.
This is an article driven by the marketing team to drive sales. Take it for what it is....
Disclaimer: I work at Backblaze.
> They've repeatedly published their research openly... just in case anyone cares.
"Research" sounds too official, more like "observations in our environment", but THANK YOU for the kind words. What baffles me is why nobody else publishes these sorts of drive statistics. Why is Amazon silent? Why doesn't Google name drive names and failure rates? And if the answer is: "Google gets a great price on drives in exchange for their silence" then why hasn't Backblaze been offered a deal to keep quiet yet?! I'm serious, how big do you have to get before you get the better prices on drives? We essentially pay "retail".
Yeah, I remember when 1 MHz was fast.
Well, there was a Google paper about drive failures a few years back, but I don't think they named names...
Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
No, really, not enough said.
Do you have a problem with giving vendors your private key? What problem would that be?
They're a US company -- does that engender trust or suspicion?
Remember kids, if you're not paying for the service, YOU ARE THE PRODUCT THAT IS BEING SOLD.
Caching isn't a hard concept.
There are two types of people in the world: Those who crave closure
Well, retail at the 10,000 drive order level :-)
Well, if your sample N is 40,000 drives as theirs has been in the past, and you're operating with reasonably rigorous methodology to track problems, then you've got a good case. Write up your experience, and note N. (For 6TB drives, their N is pretty small, and even moving forward they're only adding 230 WD drives.)
I don't think you've got a good case to argue that a sample of 40,000 drives is "noise", but you could well be right about the much smaller samples for 6TB drives. Assuming you've got tens of thousands of Seagates being heavily used, if your results differ from their past ones, that would be very interesting. Publish.
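How much does "0 failures out of N" actually say? A quick sketch using the standard exact binomial bound for zero observed failures and its "rule of three" shortcut (about 3/N at 95% confidence). It assumes each drive fails independently with the same probability during the observation window (here roughly three months), and the result is a per-window bound, not an annual rate.

```python
def zero_failure_upper_bound(n_drives: int, confidence: float = 0.95) -> float:
    """Largest per-drive failure probability still consistent with 0 failures.

    Solves (1 - p) ** n == 1 - confidence for p (exact one-sided binomial bound).
    """
    return 1 - (1 - confidence) ** (1 / n_drives)


def rule_of_three(n_drives: int) -> float:
    """Common approximation of the 95% bound above."""
    return 3 / n_drives


# 45 drives per model in the 6 TB test versus a 40,000-drive fleet.
for n in (45, 40_000):
    exact = zero_failure_upper_bound(n)
    print(f"N={n:>6}: exact {exact:.3%}, rule of three {rule_of_three(n):.3%}")
```

For 45 drives the bound comes out around 6.4%, so "no failures" is still consistent with a fairly high early failure rate; for 40,000 drives it drops well under 0.01%.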
About the only takeaway there is that WD loads faster (about a TB/day, an unexpected result) and uses slightly less electricity.
Sorry, punching the tab out on the other side so that you could flip the disk over only worked on single-sided drives.
Single-sided, single-density: 90K
Single-sided, double-density: 180K
Double-sided, double-density: 360K
So if you were already at 360K, you were already double-sided.
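The sizes in that list fall out of simple geometry: sides x tracks x sectors per track x bytes per sector. A small sketch, assuming 40-track 5.25" disks with nine 512-byte sectors per track for the double-density formats (the common IBM PC layout) and eighteen 128-byte sectors for single density (one layout that gives 90K; single-density formats varied between machines).

```python
# Floppy capacity from disk geometry; the layouts below are assumptions
# about common 40-track 5.25" formats, not the only ones that existed.
def capacity_kib(sides: int, tracks: int, sectors_per_track: int, sector_bytes: int) -> int:
    return sides * tracks * sectors_per_track * sector_bytes // 1024


formats = {
    "Single-sided, single-density": (1, 40, 18, 128),
    "Single-sided, double-density": (1, 40, 9, 512),
    "Double-sided, double-density": (2, 40, 9, 512),
}
for name, geometry in formats.items():
    print(f"{name}: {capacity_kib(*geometry)}K")
```

This prints 90K, 180K and 360K, and shows why the only difference between the last two is the second side.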
> Their backup scheme require them to have access to your private key (the one you encrypted your backup with).
Disclaimer: I'm a Backblaze engineer who wrote a lot of that code.
Your statement is a bit misleading; there are two levels of security in Backblaze. Data is always encrypted, and the "private key" is a totally standard OpenSSL PEM file that, yes, we store for you. By default, this PEM file is secured by a passphrase that Backblaze knows, so your data is essentially only secured by your email address and password, and you can recover your password by email. This is pretty light security (if somebody has access to your email they can recover your password), so it's best for backups of stuff you wouldn't mind too much if somebody got ahold of, like say pictures of your cat. Don't laugh, I back up my public website on Backblaze servers: there is valuable data in the world that does not need encryption, info you don't want to lose but that is ALSO publicly readable.
So if you are concerned at all about security, you can set your own personal "passphrase" on that PEM file that Backblaze absolutely never writes to disk - we don't store it. But if you do this you MUST remember that passphrase or your data is GONE. Without that passphrase, nobody will ever retrieve your data, not you, not the US government, not the NSA, NOBODY. You cannot "recover" that passphrase, and we don't know it. This is a good mode of security if you would be arrested on the spot for the contents of your files if the NSA got ahold of your data, because we really don't think it is breakable.
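A minimal sketch of why a forgotten passphrase means the data is gone: the key that unlocks the private key is derived from the passphrase, and only a random salt is stored. PBKDF2-HMAC-SHA256 from Python's standard library stands in here for whatever derivation the real OpenSSL PEM encryption uses, which may differ; the iteration count is illustrative.

```python
import hashlib
import os

# Illustrative work factor; not a recommendation and not Backblaze's setting.
ITERATIONS = 100_000


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte key-encryption key from a passphrase (PBKDF2 is an assumption)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, ITERATIONS)


salt = os.urandom(16)  # stored with the encrypted key file; it is not secret
key = derive_key("correct horse battery staple", salt)

# The passphrase itself is never written anywhere: the same passphrase and salt
# re-derive the same key, and anything else yields an unrelated key.
assert key == derive_key("correct horse battery staple", salt)
assert key != derive_key("wrong guess", salt)
print(len(key))  # 32
```

Nothing stored on the server lets anyone run the derivation backwards, which is the "you cannot recover that passphrase" property described above.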
Seagate isn't using SMR on the 6TB drives, at least not yet as far as I know. That's rolling out with the 8TB models.
In Google's big paper on drive reliability, they claimed "we do not show a breakdown of drives per manufacturer, model, or vintage due to the proprietary nature of these data". I'm not sure exactly what that means. Might be part of their purchasing contract, to reduce liability for naming bad vendors, or it might be considering that information a competitive advantage.
I'm surprised Backblaze has published so much without getting into lawsuit trouble already. If you wonder why you haven't been offered a better deal on drives...have you considered that it's because you're not playing the big commercial buyer secrecy game? The best deal isn't necessarily the one you get if people are worried you're going to rat them out as a bad vendor. It's often the buddy who watches out for them that companies want to do large amounts of business with.
> retail at the 10,000 drive order level
You might be surprised how little discount we get. On our last purchase of 4 TByte Hitachi drives (960 drives in one purchase), we paid $135 each before tax and shipping. "B&H Photo" sometimes wins the bid (I don't know how or why), but you can basically get that same price within a couple bucks in units of 1 or 2 from their website. Note: we have no affiliation with B&H other than being satisfied customers, and B&H does not win the bid every time.
With that said, if anybody knows how to get more than $2 off "retail" please PLEASE let us know!!
Are you saying Seagate doesn't have caching?
> I'm surprised Backblaze has published so much without getting into lawsuit trouble already.
:-) Plus I think the drive companies are aware of the "Streisand effect" https://en.wikipedia.org/wiki/... and don't want to call even more attention to the fact that every hard drive is fully expected to fail eventually.
Hopefully "the truth" is a valid defense?
No no no! Any random Internet user's personal experience trumps any data you have!
Exactly. If the data is decrypted within Backblaze before being transmitted out...fail. Whether or not they store that private key only impacts how they can act when the person requesting the data isn't accessing it. Someone who sniffs the whole operation at the right place in the network while you're accessing your data will still get it. The only hope of real security you have is if the data is encrypted all the way to your computer, and then only decrypted there. Anything less is kidding yourself.
I'd love to be able to publish these statistics for our organization (I'd estimate we have close to a quarter million drives in the field), but there is a big hurdle in the way: legal liability. If I were to say something negative about Western-Sea-Tachi drives, their lawyers might call our lawyers, and we could easily spend a million in court fees.
The thing I think would be interesting is that we have a completely arbitrary mix of drives, based on drive availability over the last 6 years or so. We also have a mix of different service companies who replace the drives in our workstations. Our contract is such that we don't control the brands, or even the sizes, as long as they meet or exceed our specs. As a service organization, they're responsible for picking the cheapest option for themselves. If our spec says "40 GB minimum", and they can't get anything smaller than 500GB, they'll buy those. If 1TB drives are cheaper than 500GB drives, they'll buy those. And if we're paying them $X/machine/year for service, they can do the reliability decisions on their own, so if they think some premium drives will last two years longer than stock drives, they might be able to avoid an extra service call on each machine if they spend $Y extra per drive. I expect these service organizations all have their preferred drives, but that's not data they're likely to share with their competitors on the service-contract circuit.
John
I work at Backblaze.
Then you boys should make an app that every computer enthusiast can use that tracks SMART stats and drive failures and collects them on your servers. It'd be great to monitor drives across the internet with an application you could just leave minimized to the taskbar; maybe you could Kickstart the funds for one? Many of us would gladly pitch in to get reliable drive data on a massive scale. Many of us are on the net anyway, so it would be great to report drive usage and characteristics in real time across the internet.
I've been using tools like the one below to "wing" whether a drive needs to be replaced or not, but usually drives start clicking before they go.
http://panterasoft.com/hdd-hea...
Well, don't keep the lawyers hungry. They have families to feed too.
I have a 125% failure rate for Seagate drives (i.e. 100% of the 12 I bought for my home server and 25% of the warranty replacements have failed). Model number ST3000DM001
Hopefully "the truth" is a valid defense?
Libel and slander claims against an individual generally fail if you're making a truthful and factual statement. There are exceptions, like when there is intention of malice. And the minute you layer any opinion onto what are straight facts, you're in fuzzy territory.
And statements published by a company about another company are not necessarily protected by the sort of free speech guidelines that guide individual interaction. I don't claim to know those rules. No larger company would publish this sort of information without passing it through legal counsel first to figure it out. And that overhead influences why those companies just don't bother.
The most common reasonable criticism of Backblaze's reports I've seen is that the drives are not being used in their intended environment. I would not want to be part of a legal defense where I had to legally prove the data originating from that use case is strictly factual commentary about the product.
"while you're accessing your data"
That's really the critical part, isn't it? If you're using this for backup you should never need to decrypt it. The only time you need it is if you have a local failure. Then you have to make a choice: give up the data or take a chance that they are at the server siphoning off your data as you request it.
For 99.999999999% of data, I'm going to say that the US government doesn't give a fuck and the chance that they're monitoring your account when your local copy fails and you are getting your data is going to be pretty darned near zero *unless* you happen to be the target of an investigation. If you are, I would suggest that you pay the extra money for something like SpiderOak, where all the encrypt/decrypt is done locally. Though, to be honest, if you're going to be watched by the Feds, a USB drive and a good fire safe is probably a better solution for backing up your "sensitive" data.
Is it just my observation, or are there way too many stupid people in the world?
> Private keys (stored on their owner's PC where they should be) are still encrypted with passphrases in case the PC is hacked. That's how important keeping the private key completely private is.
The flaw in your design is that when the PC dies, you can no longer decrypt the backup because you just lost the private key.
Some online backup companies in the past have solved this by having you store your private key in yet a 3rd party "escrow" location, so you don't have the only copy and yet the company with your backup data does not have the private key either. In essence that is what Backblaze does, just in an "easy to use" way. We store the private encryption keys on one particular server, completely separate from your data. The data is all on "pods". Is it as secure? I don't think anybody can claim 100% security, we do the very very best job we can.
I leave you with the following thought: if you use encryption (like TrueCrypt) on your most sensitive data, *THEN* back up the TrueCrypt image to Backblaze, even if Backblaze wanted to read your data or the NSA put their processing power on it and cracked your passphrase, they would have nothing, because you encrypted it BEFORE it was encrypted by Backblaze and sent through HTTPS to our servers. Maybe that would allow you to sleep soundly at night?
This is really not a good approach to using public key crypto. The private key shouldn't be on the servers, it should be on the client. I know it's a pain to handle per-file backups and especially deltas when everything is encrypted, but that's the tradeoff for proper security. In fact there's really no need for expensive public key crypto here at all. Just have the client use a cheapish symmetric key (AES256 perhaps) and send only encrypted data to the servers. There's no need at all for the servers to ever have the data in the clear.
I read the internet for the articles.
"Research" sounds too official, more like "observations in our environment"
Step #1 of real science.
You might be shocked (shocked I tell you ) at how capricious a lot of those decisions are.
My Heart Is A Flower
Since we have a Backblaze staff member here, can I ask why did you guys not test Hitachi's 6TB drives?
I'm saying it has half as much. Which gives worse results.
Seeing that the specific Seagate model they used has 128MB of cache and the WD model has 64MB, I'm not sure how more cache makes it slower.
My bank now offers a storage space that is supposed to automatically receive bills and similar crap (for now .pdf bank statements land there, which is pretty cool if I somehow need to find that old stuff); files can be stored as well, uploaded to the web interface, no other means available.
That seems to be a good place to store keys. Else I'd be thinking of paper notes in a bank safe (and/or the kind of attorney that does things on your behalf when you're dead or incapacitated, in growing order of cost)
I hate to say it, but this is probably the correct answer. Every failed write to a sector on the 7200's requires the drives to relocate the data, making the 5400's "faster" to write the complete data if they get fewer write errors.
"we do not show a breakdown of drives per manufacturer, model, or vintagedue to the proprietary nature of these data". I'm not sure exactly what that means.
Perhaps part of their discount is tied to a deal to provide exclusive data of failure rates to the manufacturers? Same effect as buying silence, but seemingly more legit.
I find it odd that the WD drives, at the 5400rpm speed, were able to write data faster than the 7200rpm Seagate drives.
Maybe the Seagates are more sensitive to vibration, either from making more of it when you shove 45 into a cheap metal box, or by being less tolerant to it because they're pushed harder.
> Just have the client use a cheapish symmetric key (AES256 perhaps)
We do use AES to encrypt the files. We used a well known design where we use the public key to encrypt the AES256 key and FEK, then we use the AES key to symmetrically encrypt the file. Then we can use the passphrase to encrypt the private key. So it's kind of an onion: you use the passphrase to decrypt the private key, which is then used to decrypt the AES key and FEK, which are then used to decrypt the file. (We didn't invent this flow; it is used in several encrypted filesystems because it's a great design.) This way it is FAST (symmetric AES) plus has the total awesomeness of public/private keys and all they imply (the idea that you can encrypt data with the public key that nobody listening can decrypt because they don't have the private key is really quite powerful).
We then use HTTPS to post this data from your laptop to our datacenter. From time to time this "double encryption" of both encrypting on the client and sending the already encrypted data through HTTPS anyway has helped keep our customers safe when HTTPS has been broken for a little while.
Well, if what I read on one of the forums (sorry, I can't remember which, may have been Tom's) by a person claiming to be a former Seagate employee is true, Seagate suckage makes sense, and moreover we now know WHY it happened all of a sudden.
Here is the skinny, according to the insider: when Seagate bought Maxtor, instead of Seagate making Maxtor better, it brought Seagate down to Maxtor levels. You see, Maxtor had these ARM controllers that were dirt cheap to crank out; the catch was you had to keep 'em in 5400 RPM drives (and even then they'd better be well ventilated), because if they got hot 1+1 could equal anything from 1-5, and so the controller would lose its little mind and forget where the end of the drive was and the drive geometry. Of course all the Seagate execs saw was $$$ on how much they'd save on the cost of manufacture, so they started using them on ALL Seagate drives and...you know the rest. The reason why you can see a dozen shitty and one good in every batch is that when they run low on the shitty Maxtor chips they will occasionally use some of the more expensive Seagate chips, hence the pearl among the poop.
But after reading that it all made sense, the sudden plummet in quality, why drives below 750GB are good (those chips are based on the old Seagate chips and code) and why we see a good one among the crap, you get lucky on the ARM lotto. It all sadly makes sense, just more short sighted corporate douchebaggery.
ACs don't waste your time replying, your posts are never seen by me.
I find it odd that the WD drives, at the 5400rpm speed, were able to write data faster than the 7200rpm Seagate drives. That seems counter-intuitive.
If there are fewer platters in the WD, then the higher density will mean a speed boost even at a lower spin speed.
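The intuition can be written down: sequential throughput is roughly bytes per track times revolutions per second, so a slower-spinning drive with denser tracks can keep up with or beat a faster one. The bytes-per-track figures below are hypothetical; the thread does not say how dense either drive's platters are.

```python
# Sketch: sequential throughput ~ data passing under the head per revolution
# times revolutions per second. Bytes-per-track values are hypothetical.
def sequential_mb_per_s(rpm: int, bytes_per_track: int) -> float:
    revolutions_per_s = rpm / 60
    return revolutions_per_s * bytes_per_track / 1e6


# Hypothetical: the 5400 rpm drive packs 40% more data onto each track.
slow_spin = sequential_mb_per_s(5400, 2_100_000)
fast_spin = sequential_mb_per_s(7200, 1_500_000)
print(f"5400 rpm: {slow_spin:.0f} MB/s, 7200 rpm: {fast_spin:.0f} MB/s")
# 5400 rpm: 189 MB/s, 7200 rpm: 180 MB/s
```

This ignores seeks, zoning and caching, which matter a lot for real workloads; it only shows that rpm alone doesn't settle which drive writes faster.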
The one with slower drive speed has more cache (Seagate), making it actually faster.
http://www.bhphotovideo.com/c/...
You seem to have trouble understanding statistics or reading plain text. All the information you are missing was actually there.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I have actually considered the TrueCrypt container backup, but I was unsure whether Backblaze supported incremental backups. I was under the impression that each time a file is backed up the entire file is re-uploaded. Is a diff process used instead, rather than re-uploading 1 GB every time a text file within the TrueCrypt container is changed? How do you handle the issue that many TrueCrypt volumes do not update their access and modified timestamps, so the backup client can't tell that changes have been made?
Thanks for all the responses to this post. While your drive observations are perhaps unique to your setup they still have information to offer the rest of us.
As much as I like to bash Seagate due to their crappy reliability in my personal experience, TFA states that there were no issues identified in the SMART data between the 6TB drives. One of the metrics they use to determine reliability is SMART 5 Reallocated Sector Count.
Either Seagate's stats are lying or the drive isn't having a problem with failed writes.
But I'm overthinking this. Maybe they are slow because they are just crap.
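For anyone who wants to run the same kind of check at home, here's a sketch that pulls raw values out of a smartctl -A style attribute table and flags any non-zero count. The table is a made-up sample laid out the way smartctl prints ATA attributes, and the watched IDs (5, 187, 188, 197, 198) are assumed to be the five stats Backblaze has described tracking; treating any non-zero raw value as a warning is a simplification.

```python
# Assumed set of watched SMART attribute IDs (see lead-in).
WATCHED = {5, 187, 188, 197, 198}

# Made-up sample in smartctl's ATA attribute-table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       2184
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def watched_raw_values(table: str) -> dict[int, int]:
    """Map attribute ID -> RAW_VALUE for the watched attributes present."""
    values = {}
    for line in table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the header row does not.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED:
                values[attr_id] = int(fields[9])  # RAW_VALUE is the 10th column
    return values


def has_error_condition(values: dict[int, int]) -> bool:
    """Simplification: any non-zero raw value on a watched attribute is a warning."""
    return any(v > 0 for v in values.values())


values = watched_raw_values(SAMPLE)
print(values)                       # {5: 0, 197: 0, 198: 0}
print(has_error_condition(values))  # False
```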
I would personally like to see Western Digital sue Backblaze claiming that the WD Red drives specifically designed for NAS are not being used in their intended environment.
As for the criticism, I don't think there's such a thing as an intended environment for a HDD other than ruggedly mobile or stationary. I'm typing this from a laptop right now. Who is a hard drive vendor to dictate the level of vibration, temperature or movement my laptop experiences? At the same time, I want those vendors to come out and tell me how their hard drives are not sitting in their "intended environment" when they are in a fixed rack serving up data, being kept in stable environmental conditions.
Personally I think the criticism is bullshit.
The intended environment for WD's drives includes a description of how many drives should be in the array. They are numbers like "NAS with 1 to 5 disks". They state that the lower tier models will not work well inside of massive arrays, where things like vibration need to be better controlled. Their more expensive models have specific technology (at an extra cost) aimed at keeping vibration related issues under better control.
Backblaze ignores those guidelines, putting drives that were not designed for the vibration of a dense drive array into one. When Backblaze drives fail, it's completely appropriate to ask "would they have failed if they were used only as specified?", which means putting them into smaller arrays. There's a very real possibility that the failure rate heavily reflects that unusual setup, and that it is not representative of reliability for the disks in other environments.
Which is why it is interesting the WD drives transfer 1TB more than the Seagates.
I always wondered if Google and other huge outfits get prototype drives well ahead of the market. Those 20 TB drives have to be tested somewhere before they are introduced next year, right?
Non-Linux Penguins ?
Seems like it should be possible for Backblaze to store whatever encrypted data (including the PEM file) without the passkey that can open the PEM file ever entering their system. Of course, then my client has to do all the encryption/decryption, but then again, if I care whether the passkey leaves my system, I'm probably willing to pay that price.
Of course, it's entirely possible that that's exactly how it's handled when using a non-web client... in which case I would just avoid the website.
First, let me say thank you for publishing this information.
Second, the reason nobody else shares information is: Information is power. Power shared is power lost.
I forget who did that quote originally. Some claim it is from the Art of War.
Regardless, any power "lost" is surely gained back in spades with good will. You guys rock. :)
"Someone needs to talk to the tree of liberty about its ghoulish drinking problem." by ohnocitizen
"NAS with 1 to 5 disks" is not an environmental spec.
The number of discs does not relate to the vibration or heat or any other factors. Those can only be measured directly. Now if WD specified that drives should not be placed in an environment where they will be subjected to x µm vibration measured to some ISO standard, then I would be right there with you.
How do 1-5 disks compare to a computer with 5 poorly balanced fans?
How do 1-5 disks compare to a single metal enclosure direct mounted, vs disks mounted via rubber grommets?
finally:
How do 1-5 disks placed horizontally next to each other or double stacked compared to drives mounted vertically and held in place with an anti-vibration sleeve such as the one used by Backblaze which they posted gave them a measurable performance improvement?
Even some braindead lawyer could point out the difference between a direct measurable specification and the completely subjective "NAS with 1-5 disks"
And as a side note Backblaze see no reliability differences between their consumer and enterprise grade drives, of which they have several thousand.
The number of discs does not relate to the vibration or heat or any other factors.
They are correlated. More discs guarantees more vibration and heat, all other things being equal. Yes, there are other sources too, and all the other things are not equal. So what?
That you are calling "NAS with 1-5 disks" a subjective specification means you're not actually using words in a way I can respond to there. Whether Backblaze's custom modifications net better or worse levels of vibration is a complicated discussion that could use some direct measurements; agreed. But what's extremely clear is that they are not using the consumer drives in anything like a consumer environment. That means using their results as a commentary on what people will see in the broader consumer system market is extrapolation, with the obvious risks that come along with it.
For example, "Backblaze sees no reliability differences between their consumer and enterprise grade drives" is a fact. Saying "there are no reliability differences between consumer and enterprise grade drives" is an invalid extrapolation of that data.
Using your example, what if one of the consumer drive models has a serious vibration issue, and Backblaze's anti-vibration sleeve makes it wildly more reliable than it would otherwise be? That would make their statistics pretty worthless for consumers who don't have one of those sleeves. Home users might actually see better reliability with one of the enterprise drives that include anti-vibration technology in that case. That's all I was saying here--that you can't just assume their numbers will translate into other environments.
How does caching help bulk write performance?
CLI paste? paste.pr0.tips!
> Then you boys should make an app that every computer enthusiast can use that tracks smart stats/drive failures and collects them at your servers.
I think this would be really awesome. Here's where it gets neat: we already have an app running in hundreds of thousands of desktop and laptop computers! (Our "online backup application" involves a tiny service that runs to send your files to the datacenter through HTTPS.) So if we just updated the client with a small amount of statistics tracking (and maybe a nice checkbox to opt in or out), then we could immediately start collecting info.
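A sketch of what the server side of such an opt-in survey might compute: pool drive-days and failures per model and turn them into an annualized failure rate (failures per drive-year, as a percentage). The report format and model names are hypothetical.

```python
from collections import defaultdict


def annualized_failure_rates(reports):
    """reports: iterable of hypothetical (model, drive_days, failures) tuples.

    Returns {model: AFR in percent}, where AFR = failures / drive-years * 100.
    """
    days = defaultdict(int)
    fails = defaultdict(int)
    for model, drive_days, failures in reports:
        days[model] += drive_days
        fails[model] += failures
    return {m: 100 * fails[m] / (days[m] / 365) for m in days if days[m] > 0}


reports = [
    ("MODEL-A", 365 * 100, 2),  # 100 drive-years, 2 failures
    ("MODEL-A", 365 * 100, 4),  # another 100 drive-years, 4 failures
    ("MODEL-B", 365 * 50, 1),   # 50 drive-years, 1 failure
]
print(annualized_failure_rates(reports))  # {'MODEL-A': 3.0, 'MODEL-B': 2.0}
```

Counting drive-days rather than drives lets reports from machines that joined at different times be pooled fairly.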
Sort of related: A few years ago I played an online 3D video game (can't remember which one, might have been Quake?) that you could both report your graphics card and RAM configuration to the server, and the server would list the aggregate statistics. So there is some precedent for this kind of data collection and publication.
> unclear that Backblaze supported incremental backups
Backblaze does support incremental backups, but it is a fairly simplistic incremental. For any file less than 30 MBytes there are no partial files, we just push a whole new copy to a whole new location in our datacenter. For any file more than 30 MBytes, we break the file into 10 MByte "chunks" and push each individual chunk if that chunk has changed. So the WORST thing you can do is prepend a single byte to the large file - this essentially causes every single 10 MByte chunk to change (slide to the right?) and so we have to retransmit the entire thing.
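Here's a toy version of that scheme, scaled down (30 bytes standing in for 30 MBytes, 10-byte chunks for 10 MByte ones) so the prepend-versus-append effect is easy to see. Comparing chunks by SHA-256 hash and by position is an assumption about how "that chunk has changed" is detected.

```python
import hashlib

# Scaled-down stand-ins for the real thresholds (30 MBytes and 10 MBytes).
WHOLE_FILE_LIMIT = 30
CHUNK_SIZE = 10


def chunk_hashes(data: bytes) -> list[str]:
    """One hash for small files; one hash per fixed-size chunk for large ones."""
    if len(data) < WHOLE_FILE_LIMIT:
        return [hashlib.sha256(data).hexdigest()]
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]


def chunks_to_upload(old: bytes, new: bytes) -> list[int]:
    """Indexes of chunks in `new` that differ from the chunk at the same position in `old`."""
    old_h, new_h = chunk_hashes(old), chunk_hashes(new)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


original = bytes(range(100))       # a "100 MByte" file: 10 chunks
appended = original + b"\x00"      # one byte added at the end
prepended = b"\x00" + original     # one byte added at the start

print(chunks_to_upload(original, appended))        # [10]
print(len(chunks_to_upload(original, prepended)))  # 11
```

Appending touches only the new tail chunk, while prepending a single byte shifts every chunk boundary, so all 11 chunks go up again.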
For a lot of programs dealing with large files, they tend to append bytes to the end of their file formats, which works great. If it is an entire bootable computer image, a lot of stuff will probably not move around (like huge swaths of binaries sitting in that computer image) and a lot of stuff WILL move around that will "accidentally" be backed up.
One final hint: by default TrueCrypt specifically thinks changing the modification time is "leaking information". Make sure you check the checkbox that when TrueCrypt changes the image, it needs to also update the last modified time. Backblaze uses that as a hint to go examine every byte in the file to see if it should be retransmitted.
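A sketch of that timestamp hint: the client remembers the last mtime it saw and only re-examines a file when it changes. The helper name and bookkeeping dict are hypothetical. A container that rewrites its contents but restores the old timestamp looks exactly like the "unchanged" case here, which is why the TrueCrypt setting matters.

```python
import os
import tempfile


def needs_rescan(path: str, last_seen: dict[str, int]) -> bool:
    """Return True if the file's mtime differs from the one recorded last time."""
    mtime = os.stat(path).st_mtime_ns
    changed = last_seen.get(path) != mtime
    last_seen[path] = mtime  # remember for the next pass
    return changed


observed = []
with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "container.img")  # hypothetical container file
    with open(image, "wb") as f:
        f.write(b"\x00" * 4096)
    seen: dict[str, int] = {}
    observed.append(needs_rescan(image, seen))  # first pass: never seen
    observed.append(needs_rescan(image, seen))  # unchanged mtime: skipped
    st = os.stat(image)
    # Simulate the container being written with "update modification time" on.
    os.utime(image, ns=(st.st_atime_ns, st.st_mtime_ns + 2_000_000_000))
    observed.append(needs_rescan(image, seen))  # mtime moved: re-examine

print(observed)  # [True, False, True]
```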
> counter Linux-unfriendly Backblaze's propaganda
Backblaze employee here. By the way, we're not "Linux-unfriendly", every single last datacenter machine is running Debian, that's like 950 machines! Most laptop customers use Windows or Mac so we did those versions first, and we're trying to get the Linux client finished up, it just got pushed down in priority a few times, but we don't mean it as a slight against Linux.
About CrashPlan: I have ALWAYS liked CrashPlan, I think they are great, and people should certainly consider CrashPlan if it fits their needs. You might also consider Carbonite and Mozy; I think these are all good products with a few tradeoffs here and there. Backblaze isn't perfect for all customers; for example, we don't yet have a Linux client. I also believe Mozy has a better small-business administration portal than Backblaze does, if that's what you need.
> BackBlaze could find a way to get more bandwidth so their shitty service backed up a rate faster than 300KB/sec per client
You should absolutely be getting more bandwidth than that; you might contact our support to see what's up. We have university students hitting 100 Mbits/sec upload rates, and I suspect a few engineers in datacenters are getting even higher. We do not inherently throttle, although we use RAID6 with groups of 15 drives, so you are probably rate limited to 1 Gbit/sec by either the 1 Gbit/sec network card in the pod or by ??, which is the disk drive transfer rate.
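For a sense of scale, here is a quick Python calculation comparing the rates mentioned in this exchange. It assumes decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes), ignores protocol overhead, and uses a hypothetical 100 GB backup; the function names are illustrative.

```python
def to_bits_per_s(value: float, unit: str) -> float:
    """Convert a quoted rate to bits per second (decimal prefixes assumed)."""
    factors = {"KB/s": 8e3, "Mbit/s": 1e6, "Gbit/s": 1e9}
    return value * factors[unit]


def upload_hours(gigabytes: float, bits_per_s: float) -> float:
    """Hours to push `gigabytes` (decimal GB) at a sustained rate, ignoring overhead."""
    return gigabytes * 8e9 / bits_per_s / 3600


if __name__ == "__main__":
    backup_gb = 100  # hypothetical backup size
    for value, unit in [(300, "KB/s"), (100, "Mbit/s"), (1, "Gbit/s")]:
        hours = upload_hours(backup_gb, to_bits_per_s(value, unit))
        print(f"{value} {unit}: {hours:.1f} h")
```

300 KB/s works out to only 2.4 Mbit/s, so a 100 GB backup takes roughly 93 hours at that rate versus about 2.2 hours at 100 Mbit/s.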
There's also Steam:
http://store.steampowered.com/...
The flaw in your design is that when the PC dies, you can no longer decrypt the backup because you just lost the private key.
I see it as a requirement rather than a flaw. If my data can be decrypted after I have lost my key, then other people had copies of my key. It is a well-known and documented fact that we can't trust everyone who has access to those other copies of my key.
You never see my requirements, feature requests, or responses on user surveys, or those of the people who ask me for help, because your product doesn't meet my needs and gets discounted in the first round (along with almost all of your competitors).
Some online backup companies have solved this in the past by having you store your private key in a third-party "escrow" location, so you don't have the only copy, and yet the company with your backup data does not have the private key either. In essence that is what Backblaze does, just in an "easy to use" way: we store the private encryption keys on one particular server, completely separate from your data, which all lives on "pods". Is it as secure? I don't think anybody can claim 100% security; we do the very, very best job we can.
Yes, the escrow solution has exactly the same flaw as Backblaze's model. Security is fundamentally flawed as soon as users lose control of their key. All that effort ensuring keys are never written to disk provides some protection against hackers, but it can be completely bypassed by the authorities. The list of people and organizations that can gain, or already have, such authority is always surprisingly large. You are doing the very, very best job you can for the model you have chosen to implement.
Fixing key loss problems requires guiding users to keep copies of their key, or ensuring that they do. Maybe you could even offer to keep a copy for naive users, or make some pocket money selling keyfobs, but if you start from the position of compromisable keys you can't support people with a healthy dose of paranoia, and that is more and more of us. We are stuck with encrypting *before* we use your service, which makes your service less usable and less attractive.
I always find it sad when people advocate blacklists to protect their sensitive data. 'Encrypt your most sensitive data first'. It doesn't work, as it assumes you know what your most sensitive data actually is and don't make mistakes. You need to protect *all* your data by default, and open up data you determine to be not sensitive when necessary ('Share this photo with friends', 'Sync with Contacts').
My bank now offers a storage space that is supposed to automatically receive bills and similar crap (for now, PDF bank statements land there, which is pretty cool if I ever need to find that old stuff); files can be stored there as well, but only by uploading them through the web interface.
That seems like a good place to store keys. Otherwise I'd be thinking of paper notes in a bank safe (and/or the kind of attorney who acts on your behalf when you're dead or incapacitated, in increasing order of cost).
If the keys are encrypted, maybe. The bank is using this to store bills and bank statements. This storage doesn't need to be secure, it just needs to be more secure than your letter box. The bank doesn't need to keep the storage private from its employees, as its employees already have access to your bank statements and bills. About the worst thing you could upload there is your internet or phone banking password in cleartext, as it would be visible to exactly the people who know how to best exploit it.
They are correlated. More discs guarantees more vibration and heat, all other things being equal.
No, they aren't. It's an indirect association combined with a lot of subjective assumptions. Are the vibrations in phase or out of phase? Are they coherent? Having 2 vibrating sources does not guarantee an increase in environmental vibration. You can get anything from a doubling, to complete cancellation, to changing frequencies with no change in magnitude. Regardless of what you say, measuring vibration in "number of disks in a NAS" is not any kind of engineering specification. It's like specifying an elevator's load capacity in persons without defining what a person actually weighs, which is also why it's not legal to state the load capacity as just a number of persons.
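The in-phase versus out-of-phase point can be checked with a toy model. This Python sketch assumes an idealized case that real drive chassis won't match: two sinusoidal sources at the same frequency whose displacements add linearly at one measurement point. The amplitudes, phases, and function names are hypothetical, and the "changing frequencies" case (sources at different frequencies) is not modeled.

```python
import math


def peak_of_sum(a1: float, a2: float, phase_rad: float, samples: int = 10_000) -> float:
    """Sample one period of a1*sin(wt) + a2*sin(wt + phase) and return the peak magnitude."""
    peak = 0.0
    for n in range(samples):
        theta = 2 * math.pi * n / samples      # w*t over exactly one period
        x = a1 * math.sin(theta) + a2 * math.sin(theta + phase_rad)
        peak = max(peak, abs(x))
    return peak


def closed_form(a1: float, a2: float, phase_rad: float) -> float:
    """Amplitude of the sum of two equal-frequency sinusoids."""
    return math.sqrt(max(0.0, a1**2 + a2**2 + 2 * a1 * a2 * math.cos(phase_rad)))


if __name__ == "__main__":
    for label, phase in [("in phase", 0.0),
                         ("quarter cycle", math.pi / 2),
                         ("anti-phase", math.pi)]:
        print(label, round(peak_of_sum(1.0, 1.0, phase), 3),
              round(closed_form(1.0, 1.0, phase), 3))
```

Two identical sources give a combined amplitude of 2 when in phase, about 1.414 a quarter cycle apart, and essentially 0 in anti-phase, so the number of sources alone does not determine the result.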
The reality is that their pods will have wildly different vibration characteristics, and I say this as a person who has spent the better part of the last 4 years working in industrial vibration monitoring. No two machines vibrate alike, regardless of what environment you put them in. That's why predictive maintenance is not based on absolute measurements but on relative ones. It is very possible that one of Backblaze's pods will vibrate itself to all crap, while another will experience even less vibration than a single-drive PC.
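A minimal Python sketch of what "relative rather than absolute" means in practice: each machine is compared against its own baseline RMS level instead of one fixed limit. The 2x ratio, the sample values, and the function names are purely illustrative assumptions, not an industry standard or any particular monitoring product.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a vibration signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def relative_alarm(baseline_rms: float, samples: Sequence[float], ratio: float = 2.0) -> bool:
    """Flag a machine when its level reaches `ratio` times its own baseline.
    The 2x default is a hypothetical threshold chosen for illustration."""
    return rms(samples) >= ratio * baseline_rms


if __name__ == "__main__":
    # Machine A is naturally rough; machine B is naturally quiet.
    print(relative_alarm(1.0, [1.5, -1.5, 1.5, -1.5]))  # 1.5x its own baseline
    print(relative_alarm(0.2, [0.5, -0.5, 0.5, -0.5]))  # 2.5x its own baseline
```

Here the quieter machine is flagged and the rougher one is not; a single absolute limit of, say, 1.0 would do the opposite.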
But what's extremely clear is that they are not using the consumer drives in anything like a consumer environment.
Really? Because I keep my harddrives in a metal box with fans and sources of heat. Don't you?
For example, "Backblaze sees no reliability differences between their consumer and enterprise grade drives" is a fact. Saying "there are no reliability differences between consumer and enterprise grade drives" is an invalid extrapolation of that data.
You're right. If you extrapolated the data properly you're saying that consumer drives are far more reliable than enterprise drives given that Backblaze sees no difference while running the enterprise drives within spec and the consumer drives outside of your mythical spec. I never thought about it that way.
Using your example, what if one of the consumer drive models has a serious vibration issue, and Backblaze's anti-vibration sleeve makes it wildly more reliable than it would otherwise be? That would make their statistics pretty worthless for consumers who don't have one of those sleeves.
This is true, but in any case you're stretching here and moving the goalposts of the argument. The original argument was that Backblaze are using drives outside of "spec", and what this mythical "spec" actually means.
I'd be happy to agree with you on the random distribution of environmental variables in other circumstances, but let's define those variables first. I propose we define the environment as the number of vehicles with a GVM of 18 t driving within 80 m of the computer. It's about as useful as the number of drives in the array.
Anyway, I'm more than happy to follow Backblaze's numbers and translate them into other environments. I'd be even happier if you gave me some other data to work with, because if we ignore Backblaze's numbers, all we are left with is a 110-year MTBF for use with 1-5 drives in a NAS.
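To put a "110 years MTBF" figure on the same footing as an observed failure rate, here is a small Python conversion to an annualized failure rate (AFR). It assumes a constant failure rate (exponential model), which real drives do not strictly follow because of infant mortality and wear-out; the function name is illustrative, and the 45 drives per pod comes from the story above.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate under a constant-failure-rate (exponential) model."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


if __name__ == "__main__":
    mtbf_hours = 110 * HOURS_PER_YEAR          # the "110 years" figure from the thread
    afr = afr_from_mtbf(mtbf_hours)
    print(f"AFR: {afr:.2%}")
    print(f"Expected failures per 45-drive pod per year: {45 * afr:.2f}")
```

Under that model a 110-year MTBF is an AFR of roughly 0.9%, or about 0.4 expected failures per 45-drive pod per year, which says nothing about whether the drives were run in the environment the figure assumes.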
Actually maybe we should correlate it to solar activity. It would probably be more accurate than the manufacturer's useless numbers.
It is advertised as a secure place to store files (a "digital safe") and I'm pretty sure the bank is unable to access the files.
The password is weak, though (but at least minimally protected against keyloggers: you click on numbers whose order is scrambled). That makes it fail Slashdotian standards.
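A scrambled on-screen keypad like the one described can be sketched in a few lines of Python. This is a guess at the general mechanism, not the bank's actual implementation, and `scrambled_layout` and `decode_clicks` are hypothetical names. A keystroke logger records no key presses at all, and a click logger records only screen positions, which map to different digits in each session.

```python
import random
from typing import List, Sequence


def scrambled_layout(rng: random.Random) -> List[str]:
    """Return the digits 0-9 in a fresh random order for one login session."""
    digits = list("0123456789")
    rng.shuffle(digits)
    return digits


def decode_clicks(layout: Sequence[str], clicked_positions: Sequence[int]) -> str:
    """Server side: map the on-screen positions the user clicked back to digits."""
    return "".join(layout[pos] for pos in clicked_positions)


if __name__ == "__main__":
    layout = scrambled_layout(random.SystemRandom())  # OS entropy source
    pin = "2580"                                      # hypothetical password
    clicks = [layout.index(d) for d in pin]           # what the user's mouse does
    print(layout, clicks, decode_clicks(layout, clicks) == pin)
```

This only defeats loggers that record keystrokes or bare click coordinates; anything that captures the screen sees the layout and the clicks together.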
Steam does this. http://store.steampowered.com/...