Yes, modification time and file size are the two things that rsync checks by default.
Now, while reflecting on the funny mod someone gave you, what operations need to be done to retrieve that information for each file? And furthermore, how does that number of operations scale with regard to the number of files in your working set?
Not that it matters in this case anyhow. Moving the problem to the desktops would multiply the original problem, not resolve it.
Have you ever actually used rsync on a decent sized file set? Determining the changed file set requires significant disk activity.
It's a certain win when compared to just blindly transferring everything. But if you think that rsyncing 20 changed files in a 100 file working set is the same as rsyncing 20 changed files out of a 2,000,000 file working set you are very very wrong.
Completely aside from the absolute insanity of suggesting that you replicate the full contents of the fileserver to every desktop, which has been covered by others.
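To make the scaling point concrete, here is a rough sketch of a size-and-mtime comparison between two directory trees. It is not rsync's actual code, just a simplified stand-in for the default check described above; the function name and the stat counter are made up for illustration.

```python
import os

def quick_check_changed(src_root, dst_root):
    """Return (changed relative paths, number of stat attempts).

    Simplified stand-in for a size + modification-time comparison.
    This is an illustration, not rsync's real implementation.
    """
    changed = []
    stat_calls = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)

            # One metadata lookup on each side, for every file,
            # whether or not the file has changed.
            src_stat = os.stat(src)
            stat_calls += 2
            try:
                dst_stat = os.stat(dst)
            except FileNotFoundError:
                changed.append(rel)
                continue

            if (src_stat.st_size != dst_stat.st_size
                    or int(src_stat.st_mtime) != int(dst_stat.st_mtime)):
                changed.append(rel)
    return changed, stat_calls
```

The number of changed files never enters the loop bounds. The stat count grows with the size of the working set, so 20 changed files out of 2,000,000 still costs on the order of 4,000,000 metadata lookups in this sketch.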
Somehow, I think you misunderstood what this article is about.
Given the very frequent mention of 'disk based storage', and how flash is so much better, I'm not sure that I did.
It's not about SSD.
No, it's not about SSD, and that is the problem. It reads like they have never heard of them.
Memcached prevents Facebook's disk-based databases from being overwhelmed by a fire hose of millions of simultaneous requests for small chunks of information.
flash memory has much faster random access than disk-based storage
Each FAWN node performs 364 queries per second per watt, which is a hundred times better than can be accomplished by a traditional disk-based system
Swanson's goal is to exploit the unique qualities of flash memory to handle problems that are currently impossible to address with anything other than the most powerful and expensive supercomputers on earth
Swanson's own high-performance, flash-memory-based server, called Gordon, which currently exists only as a simulation...
I'm not saying that a wide array of low-power nodes is a bad idea. But unless they address the current state of technology, rather than a conveniently quaint world in which using flash as your primary storage makes you some sort of innovator, it's hard to take them seriously.
"you could very easily run a small website on one of these servers, and it would draw 10 watts," says Andersen--a tenth of what a typical Web server draws.
And how does that per-website energy usage compare to a normal server, using SSDs, and running enough virtualized instances (or just virtual domains) to match the per-website performance offered by a single FAWN node?
You need to address the actual state of things, and not the strawman of what computing was 6 years ago (or however long) when the project was started. While they've been working, the world hasn't been standing still, and you cannot pretend that spinning disks are the only thing going.
Perhaps I'm being too harsh and it's a failing of TFA and not the original researchers. Given that a dual core Atom330 takes like 8 watts, it is entirely reasonable that you could build a very efficient cluster out of a whole mess of them and a few SSDs, and produce something like you insist that the article was about. That would be interesting, provided that it compared favourably against similarly state of the art systems of course.
Intel X25-E, 2.6 watts, 3300 Write IOPS, 35000 read IOPS*. So only one or two orders of magnitude more efficient...
And though no prices are given in the article for the FAWN, at $800 for the X25-E it's probably less expensive too. Particularly if you include setup and administration costs.
Not a bad idea in general, and not a bad idea in specific for 5 years ago, but pathetically outclassed in every area by a high end modern SSD.
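For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above. It assumes one IOPS is roughly comparable to one FAWN query and counts the drive's power alone, ignoring the host system, so treat it as a ballpark and not a benchmark.

```python
# Figures quoted in the thread; nothing here is measured.
FAWN_QUERIES_PER_SEC_PER_WATT = 364

X25E_WATTS = 2.6
X25E_WRITE_IOPS = 3300
X25E_READ_IOPS = 35000

def ops_per_watt(iops, watts):
    """Operations per second per watt for a single device."""
    return iops / watts

read_eff = ops_per_watt(X25E_READ_IOPS, X25E_WATTS)
write_eff = ops_per_watt(X25E_WRITE_IOPS, X25E_WATTS)

# Ratio against the per-node FAWN figure (assumes IOPS ~ queries).
read_ratio = read_eff / FAWN_QUERIES_PER_SEC_PER_WATT
write_ratio = write_eff / FAWN_QUERIES_PER_SEC_PER_WATT

print(f"reads:  {read_eff:.0f} ops/s/W, {read_ratio:.1f}x the FAWN figure")
print(f"writes: {write_eff:.0f} ops/s/W, {write_ratio:.1f}x the FAWN figure")
```

On these numbers the drive alone comes out at roughly 3.5x the FAWN figure for writes and roughly 37x for reads. A real comparison would have to add the power of whatever the SSD is plugged into.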
I mean, those numbers sound small, but even I have no clue how many IO requests I am making right now... is ten cents per million a good price or a bad price? Dunno! Is a penny per 10,000 GET's a good price? Probably--that is ten bucks for 10 million requests, right?
It can add up fast.
My company provides an offsite backup solution, and we've been using S3 as our primary storage backend since a few months after S3 went live. It is not unusual for us that the per-op costs are greater than the actual data storage or transfer costs. It is worth noting however that our use of S3 is quite non-standard. We do some pretty extensive verification to catch bitrot should it ever occur, as well as some fairly convoluted data processing to minimize actual transfer overhead for updated files.
So the answer is that it really depends. If you just throw data up there all at once and hope it sticks, those additional costs won't matter. However, if you want to build something more complicated that doesn't just blindly trust S3, or that does efficient data updating, then yes, those costs do matter quite a lot.
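A rough sketch of how the per-request charges add up, using the "penny per 10,000 GETs" price from the question above. The verification workload (object count and check frequency) is invented purely for illustration and is not a description of our actual system.

```python
# Price quoted above: a penny per 10,000 GET requests.
PRICE_PER_10K_GETS = 0.01

def get_request_cost(num_requests, price_per_10k=PRICE_PER_10K_GETS):
    """Dollar cost of num_requests GET operations."""
    return num_requests / 10_000 * price_per_10k

# The example from the quoted question: 10 million GETs.
ten_million = get_request_cost(10_000_000)
print(f"10M GETs: ${ten_million:.2f}")

# Hypothetical verification workload (numbers made up for illustration):
# re-check every stored object once a day over a 30-day month.
OBJECTS = 2_000_000
CHECKS_PER_MONTH = 30
monthly = get_request_cost(OBJECTS * CHECKS_PER_MONTH)
print(f"Monthly verification GETs: ${monthly:.2f}")
```

Ten million requests is indeed ten dollars. The hypothetical daily verification pass works out to 60 million requests, or about sixty dollars a month, before a single byte of storage or transfer is billed.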
Those that work at companies that are entirely family or employee owned, do you feel that your company is in better shape than those public stock corporations?
I am co-founder of a small company that's been around for a few years now. While we have certainly noticed the recession, we continue to grow and our monthly revenue is the highest it has ever been.
We are debt free, and actually appreciating the slowdown a little bit as it gives us time to step back and take a longer view of our product development.
You might want to go look up the differences between MLC and SLC Flash.
It's just bit packing. For example (and ignoring many low level details*) your 512-byte sector would be stored in 2048 hardware bit buckets instead of 4096 individual storage quanta.
* For purposes of illustration and ignoring the smart little tricks of hardware reality.
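The same arithmetic as a few lines of code, under the same simplification (no ECC, spare area, or wear-leveling overhead). The function name is just for illustration.

```python
def cells_needed(sector_bytes, bits_per_cell):
    """Storage cells needed for one sector, ignoring ECC and spare area."""
    bits = sector_bytes * 8
    # Ceiling division, in case the bit count is not an exact multiple.
    return -(-bits // bits_per_cell)

slc_cells = cells_needed(512, bits_per_cell=1)  # one bit per cell
mlc_cells = cells_needed(512, bits_per_cell=2)  # two bits per cell

print(f"SLC: {slc_cells} cells, MLC: {mlc_cells} cells")
```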
10 months, in a low humidity location (max 30% according to the paper), and with no dust filters. How... rigorous.
Let's see how those servers are doing in another 14 months with 90 degree intake air and two years worth of dust blocking the intake channels, covering the heatsinks, and increasing the mechanical wear on the fans.
In my experience that is one of the most useful aspects of climate controlled datacenters. Predictability. It takes more than 10 months for those sorts of issues to begin presenting themselves. Unfortunately the article does not give the rate of failures over time, which seems like a rather critical piece of data.
If the increased rate of failures all manifested at the start it might be reasonable to conclude that the higher temperatures were just culling the herd, so to speak, but if they were weighted toward the end of the 10 month trial period it would be a very different story indeed.
Sounds like a frustrating time. Completely out of my area of expertise, but good luck with future work. Reminds me of why I decided to get out with a MS and start a business...
and I think it took 18-24months to break even on hardware
Which is a bit further than you can hope to stretch things when you're just getting started. That's a long time to not be making any money. Additionally unless you have piles of money laying about to cover the unexpected, it will make your service a lot more fragile because you simply won't have left yourself the resources to deal with problems. For instance, most of our current service offerings break even within 12 months, because anything longer would seriously compromise our ability to provide support and hardware replacement to our customers.
cloud storage
It's a very interesting problem, and I agree that it will be an inseparable part of future data storage. This gets back to where the discussion started, on Padlock. I've yet to see a cloud storage or clustered storage implementation that isn't shit. Want to make yet another shitty cloud storage system? Grab FUSE, convert file paths + (offset % 8192) to a hash, and stuff it into a DHT (Chord, CAN, Tapestry, take your pick). Congratulations, you're the newest crappy cloud storage provider!
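For the curious, the recipe above fits in about a dozen lines. This sketch reads the intent as hashing the path together with a block number (offset // 8192, my interpretation of the throwaway "offset % 8192"), uses SHA-1 because that is what Chord uses for identifiers, and lets a plain dict stand in for the DHT. All the names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 8192

def block_key(path, offset):
    """Map (file path, byte offset) to a 160-bit key, as hex.

    Every offset inside the same 8192-byte block yields the same key.
    """
    block_index = offset // BLOCK_SIZE
    material = f"{path}:{block_index}".encode("utf-8")
    return hashlib.sha1(material).hexdigest()

# A dict standing in for the DHT; a real one would route each key
# to whichever node is responsible for it.
fake_dht = {}

def write_block(path, offset, data):
    fake_dht[block_key(path, offset)] = data

def read_block(path, offset):
    return fake_dht.get(block_key(path, offset))
```

Note everything this leaves out: metadata, consistency between writers, partial block updates, replication, and what happens when a node disappears. That is rather the point.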
The reason my company didn't do that is because we didn't think that anyone would seriously be interested in something so shabby, though the market seems to be proving us wrong. Guess that's what we get for all having come from technical backgrounds.
However, there is also the problem that when all you have is a crappy product that is basically on par with what everyone else has, it comes down to a battle of marketing. Instead I'd rather spend my time working on something that is far enough ahead of the curve that even the combined marketing suck of a bunch of CS people can't drag it down.
I figure padlock works about as well as everything else on a VIA platform, so my expectations are pretty low. You seem to be in a much better position to judge just how bad it actually is. My experience has been that with increased knowledge in any specific field comes increased scorn for just how bad most people are doing things.
Sounds like an interesting paper, did you end up with a working test implementation that gave good results? I'm guessing that you may have run into annoying latency, particularly when reading data back from the GPU?
My business, of which I am a founder but not the entirety, is pretty interesting (Then again I would say that...). Our middle tier of service uses VIA boards and takes advantage of padlock to do full encryption of the system and storage disks, which is why I've an interest in padlock. (And a bit of worry about the recently released cold boot attacks on encryption keys.)
Our top tier hardware for example does not use full disk encryption, because it would impact performance. We get ~350MB/sec internally from the disk array and the rate at which we currently can process data for backups is limited by how fast we can SHA-256 chunks of an updated file, which is just a bit over the 350MB/sec the disk array can do. Adding more processor power is easier said than done as the thermal load of 12 storage disks, 2 SAS database disks, and two dual-core CPUs is really pushing the cooling limits of the amount of airflow you can squeeze past all of the hardware in a 2U chassis. Which is why something that can do AES-256 at 600MB/sec in ~12 watts would be very appealing.
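If you want to see where your own hardware lands on the hashing side, something like the following gives a rough number. The chunk size and buffer size are arbitrary choices for illustration, not what we use, and an in-memory buffer of zeros says nothing about disk speed.

```python
import hashlib
import time

def hash_chunks(data, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of each fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def measure_mb_per_sec(total_mb=64, chunk_size=1 << 20):
    """Time chunked SHA-256 over an in-memory buffer of zeros."""
    data = bytes(total_mb << 20)
    start = time.perf_counter()
    hash_chunks(data, chunk_size)
    elapsed = time.perf_counter() - start
    return total_mb / elapsed

if __name__ == "__main__":
    print(f"SHA-256: {measure_mb_per_sec():.0f} MB/sec on one core")
```

The result is per core and will vary a lot with the CPU and the hashlib backend, so compare it only against the throughput of your own array.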
As for 2-3TB of offsite data, what with 1TB disks being as inexpensive as they are, the biggest problem is really what you mean by "robust and highly available". What that says to me is two sets of offsite hardware, both of high quality, and two sets of colo bills. In addition to a replication scheme between the offsite storage locations. All of which starts to get kind of incompatible with the "don't want to pay the earth for it" constraint. Even so $50/250GB doesn't leave a lot to work with.
To price it out using some hardware prices that are fresh in my mind: 10TB servers (12 disks, RAID6) with quality components such as redundant PSUs will set you back about $6.5K each, more if you don't build them yourself. Slowish 10mbit colo costs $100-200/Month.
For a redundant setup that's $13k in hardware, and say $150 * 2 * 12 = $3600 in colo costs per year.
$50/Year for a 250GB slice would get you $50 * 40 = $2000/Year. Not enough to even cover colo costs if you want replication, and only $200/Year more otherwise. Unless I'm missing something it doesn't seem like $50/Year for 250GB of offsite storage is doable unless you already own your own datacenter/colo facility. And even ignoring the colo costs you would have to cut a lot of corners on the hardware to make a profit at $50/year/250GB.
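The same back-of-the-envelope numbers as a short script, in case anyone wants to plug in their own prices. It uses the figures above ($6.5K per 10TB server, $150/month colo as the midpoint of $100-200, 10TB treated as 10,000GB) and ignores power, bandwidth overages, labor, and hardware replacement.

```python
SERVER_COST = 6500            # dollars per 10TB server
SERVER_CAPACITY_GB = 10_000   # 10TB, counted as 10,000GB
COLO_PER_MONTH = 150          # midpoint of the $100-200 range
SLICE_GB = 250
SLICE_PRICE_PER_YEAR = 50

def hardware_outlay(replicated):
    """Up-front hardware cost for one site or two."""
    return SERVER_COST * (2 if replicated else 1)

def yearly_margin(replicated):
    """Yearly slice revenue minus yearly colo cost, hardware excluded."""
    sites = 2 if replicated else 1
    slices = SERVER_CAPACITY_GB // SLICE_GB
    revenue = slices * SLICE_PRICE_PER_YEAR
    colo = COLO_PER_MONTH * 12 * sites
    return revenue - colo

for replicated in (False, True):
    label = "replicated" if replicated else "single copy"
    print(f"{label}: hardware ${hardware_outlay(replicated)}, "
          f"yearly margin ${yearly_margin(replicated)}")
```

A single copy clears $200 a year against $6,500 of hardware, and the replicated setup loses $1,600 a year before the hardware is even counted.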
The best we could do would be about $0.80/GB/Year for a single offsite copy. We have plans for something vaguely like this, and by the time we get anything ready to release, that price will probably have dropped a bit. But $0.20/GB/Year is just not doable without seriously large volume. $0.20/GB is in the range of what current commodity storage such as S3 charges per month.
If you looked at our website you probably saw that what we're doing is pretty constrained. Basically a NAS box with some extra features that keeps itself completely backed up to replicated storage, and is replaced at no additional cost and with a full copy of the backed up data if it ever fails. Our prices, though better than anyone else in the industry that we've found for high capacity, are still probably quite a bit higher than you're looking for in the 2-3TB range.
That being said we do have some really interesting and novel stuff in development for how distributed data storage is done. Unfortunately I'd have to kick my ass really hard if I discussed it on an open forum before we finish and release it.
I'm getting the figures I have from openssl 0.9.8g running in Linux on either a 1GHz Nehemiah CPU, or a 1.2GHz Esther.
You can find similar numbers by searching for "openssl speed padlock aes".
Maybe I'm missing something in your figures but how is 511MB/sec a 6-12x speedup over Core2? You can't just multiply out the clock-speed because the length of the longest path in the AES circuit won't allow the circuit to scale to 3Ghz.
I said "the performance adjusted for clock speed", and yes, it's a pretty meaningless number. The hardware implementation certainly won't effortlessly scale to 3GHz, for the reasons you mentioned and others. It was a poor attempt at normalization.
I should have either given the encryption per watt, or noted that I'm seeing linear scaling between the 1GHz and 1.2GHz parts I have, so perhaps one could expect the 1.8GHz part they ship to continue that trend. However as I don't have one on hand, I can't verify that.
Anyway, the point I'm trying to get to is not that the VIA CPUs are better than a Core2. But that I would like to see hardware AES, or at least something that matched or bettered the performance of Padlock, on a Core2.
Specifically I want to be able to read off an encrypted aggregate block device, and stream that out a gigabit connection over encrypted IPv6/IPsec connections. Without completely monopolizing the CPU.
The point is that if core2 had a padlock equivalent running at 3GHz*, you could do the above on a 10GigE connection. Or more practically, you could encrypt everything with no meaningful processing overhead. Yes this is a minority consideration, but it is one that's useful to me. Check my signature and you can probably guess why I care.
* I don't think it's too much of a stretch that between correcting the "shit" padlock implementation, and the compromises needed to make it run at 3GHz, it might be about even.
Look, this is not a Core2 vs. VIA argument. There is no contest there, the Core2 is on the order of 10x faster at everything (except AES/SHA), as it ought to be. The VIA CPUs are not as good. It's a smaller design team, with a smaller budget, running on really old process tech. Of course they are crappy, that they are slow and shitty is given, it does not need to be argued.
I don't particularly care if the CPU comes in a blue box or a green one, or if it's extra special hardware or a new software technique using existing hardware. All I want is to see a fast general purpose CPU that can do AES-256 at 1GB/Sec or better per core.
So no, it is not pathetic that a 3GHz general purpose processor can match the special purpose extensions on the C7. Given that the achievable speedup is much larger than the ratio in clock speeds (let alone the extra the Core2 is doing) it shows that the VIA performance is shit.
If it were true that the Core2 could match the speed of the Padlock AES, adjusted for clock speed, then padlock would be disappointing. Not worthless, as a complete VIA based system including a pair of disks will run on ~30 watts, but disappointing.
However that's simply not the case. The best numbers I can find are around 125MB/sec for a 3GHz core2, you list 250MB/sec and reference "unpublished results". For the sake of argument I'll accept your numbers.
AES-256-CBC on a 1GHz Via part, an older board than my other post, does 511MB/sec with all of the overhead of openssl.
So the performance adjusted for clock speed is 6x the Core2 using your numbers, and 12x using the best numbers I can find. If you think a 6-12x speedup is "shit" then that is your prerogative, however I think you will have a hard time finding people who agree with you.
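Here is where the 6x and 12x come from, using only the throughput figures already quoted. Dividing by clock speed is a crude normalization that assumes throughput scales linearly with frequency, which is not something either of us has shown.

```python
def mb_per_sec_per_ghz(mb_per_sec, ghz):
    """Throughput normalized by clock speed (assumes linear scaling)."""
    return mb_per_sec / ghz

# VIA part with Padlock: 511MB/sec at 1GHz.
padlock = mb_per_sec_per_ghz(511, 1.0)

# Core2 at 3GHz: your unpublished figure, and the best one I can find.
core2_claimed = mb_per_sec_per_ghz(250, 3.0)
core2_published = mb_per_sec_per_ghz(125, 3.0)

ratio_claimed = padlock / core2_claimed
ratio_published = padlock / core2_published

print(f"vs 250MB/sec Core2: {ratio_claimed:.1f}x per GHz")
print(f"vs 125MB/sec Core2: {ratio_published:.1f}x per GHz")
```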
There are a lot of things about the VIA boards that are shit. The general purpose speed for example. Without padlock the same board that got 511MB/Sec gets 11MB/sec, that is shitty performance. But what do you expect out of a CPU that doesn't need a cooling fan and has a heatsink that will fit in a 1U chassis?
A single core on a 3Ghz Core2 can match the performance of Padlock. I can't provide a link as the figures are unpublished but it's not particularly hard to work out how.
I don't suppose you could provide any numbers along with that claim? Because a non-padlock CPU matching the performance for AES-256 would be really useful sometimes.
For reference here are padlock numbers on a moderate Padlock equipped CPU:
cpu family : 6
model : 10
model name : VIA Esther processor 1200MHz
stepping : 9
cpu MHz : 1197.115
cache size : 128 KB
My company (http://www.zettabytestorage.com/) makes a managed NAS device which would have completely prevented their problems. Better still, our "Professional" line of products includes local disk encryption, meaning that the thieves get nothing but a fancy NAS device on which they need to reformat the drives before they can use it.
There are a nearly innumerable number of other companies providing some sort of offsite backup at varying mixes of ease of use, capacity, and price. Some of them, like ours, are extremely easy to setup, and require no further active participation from the user. They pass the "mom" test, the "CPA who doesn't like computers" test, and almost certainly would also pass the "rich old man with 15 years worth of work who was able to setup an external USB drive" test.
Our home line starts at 30GB for $34/Month and our Professional line at 140GB for $139/Month. It's not free, but it's a whole lot less expensive than losing 15 years worth of work. That price includes geographically remote replication, hardware replacement in the event of loss or failure, all shipping charges, and any other applicable costs.
Additionally, there are a huge number of DIY solutions out there for remote data backup. They are not as easy, but they are less expensive.
If you put months or years of work into creating your data, but then don't either take the time to learn how to do it yourself, or pay the pittance required for a professional backup solution, you should probably spend some time thinking about your priorities.
A textbook could contain a work covered by copyright and licensed under the GNU GPL without requiring the entire book to be released under the GPL. The relevant portion of the GPL is at the end of Section 2:
In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
Mere Aggregation is, in my opinion, the most overlooked aspect of the GPL. It gives developers the freedom to work with GPL code, without requiring that everything they touch then be GPL. This allows GPL code to work its way into a closed system, one component at a time, and may the best code win. In this manner GPL projects can then attract contributors and maintainers from a much larger developer pool, thereby increasing the overall robustness of the community.
This is the "real" viral nature of the GPL, not that FUDish crud about 'if anything is GPL then everything is required to be GPL'.
(Please note, I am not talking source code level components, but rather independent pieces of software working in concert.)
Backups for the home or small business user do not need to be tricky, difficult, inconvenient or time consuming. But you do need to have the right equipment and software for the job.
Which is why I founded a company to do just that (I did say shameless promotion). Backups should occur at LAN speeds, be strongly encrypted, stored offsite, and not require any great effort by the user. Further, they should not charge an absurd fee and have an annoying interface, as most online storage providers do.
Thus I give you http://www.zettabytestorage.com/, secure backups in at least two geographically diverse locations, with local LAN access speeds, for less than $0.50 per GB and up to 700GB. You put your data on our NAS box and it gets backed up, that's it. You don't have to worry about failing hardware (we replace it free), local disaster (fire, flood, etc.), or really anything this side of the collapse of civilization as we know it. Your data is safe, both local and remote.
I do work for Zettabyte Storage, and if you know of an easier way to backup your data, I'd like to hear about it.
The problem is not the people doing the tracking, but the funding they don't get.
There are some efforts being made such as http://neat.jpl.nasa.gov/ but they get next to no funding.
How many people are you going to be able to convince when all you can say is that "It's likely one will hit a populated area sometime in the future". The general reaction that I've witnessed is "If it was going to happen, why hasn't it yet?" and "That's just science fiction".
It's far too abstract a threat for the vast majority of people to care about. . .
You're not very good at this Physics thing are you?
A resistor will convert electricity to heat with near 100% efficiency (a little is lost to unabsorbed EM radiation). To change the temperature of a body of water, you need to add energy to it. Thus unless you can convert with better than 100% efficiency. . . . . the resistor wins the energy game.
Note that this is not the only consideration in making a device of this nature, but to say that microwaves do it with higher efficiency is stupidity of the highest order.
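To put numbers on it: the energy needed is mass times specific heat times temperature change, and the time to deliver it is that energy divided by the power that actually reaches the water. The sketch below uses the standard specific heat of water, about 4186 J/(kg K). The 1000W rating, the 80 degree rise, and the 60% microwave efficiency are illustrative assumptions, not measurements.

```python
SPECIFIC_HEAT_WATER = 4186.0  # J/(kg*K), standard textbook value

def energy_to_heat_water(mass_kg, delta_t_kelvin):
    """Joules required to raise mass_kg of water by delta_t_kelvin."""
    return mass_kg * SPECIFIC_HEAT_WATER * delta_t_kelvin

def seconds_to_heat(mass_kg, delta_t_kelvin, input_watts, efficiency):
    """Heating time when only `efficiency` of the input power reaches the water."""
    return energy_to_heat_water(mass_kg, delta_t_kelvin) / (input_watts * efficiency)

# 1 kg of water raised by 80 K with 1000 W drawn from the wall.
resistor_s = seconds_to_heat(1.0, 80.0, 1000.0, efficiency=1.0)   # near-ideal resistor
microwave_s = seconds_to_heat(1.0, 80.0, 1000.0, efficiency=0.6)  # assumed 60%

print(f"resistor:  {resistor_s:.0f} s")
print(f"microwave: {microwave_s:.0f} s")
```

Whatever efficiency you plug in for the microwave, it cannot exceed 1.0, so it can tie the resistor at best.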
I've got to disagree with you with regard to CRT vs. LCD. I limit the discussion to those as they are what are available on the market now.
Which display to get should be decided by its primary task. If you care most about gaming, get a nice CRT. If you care most about anything else (with the exception, I suppose, of very accurate color work) get an LCD.
I'm sitting in front of a 21" Trinitron CRT, and a 20" Dell 2000fp LCD. I run my IDE across the full expanse of both, and the LCD is far easier to look at than the CRT. However, I do all of my gaming on the CRT, as in that case the LCD is clearly inferior.
Get what is appropriate for the tasks you will use it for.
By "lose their jobs" do you mean "become available for employment in a less brain-dead position"?
If the job can be automated, and that automation is as or more effective, then it is a waste of effort to have a person do it. It is a foolish squandering of human resources that should be better spent in another area.
Good point, why don't you just hop on over there and give us an update on the current situation? Don't forget to pack an extra sandwich for when you get hungry . . .
Accepting any scientific theory as the absolute, unchanging, eternally correct TRUTH is as silly and wrong minded as not accepting the value of scientific theory at all.
Science is an asymptotic (but not monotonic) approach to "the truth". Any claim to the contrary is almost certainly the result of some misunderstanding about the nature of 'science'.
Science is a process, 'scientific fact' is the best guess with regard to any given subject at the current time. This is subject to change.
That ability to change is not a weakness of science, and "scientists are always saying they are wrong" is not a valid argument against science. The fact that science changes its opinion on matters is simply a result of the lack of omniscience of those practicing it.
The sad result that the 'general public' accepts what 'scientists say' as the absolute truth is an unfortunate miscarriage of intellect. A regrettable result of many people merely replacing one system of faith for another.
Innocent until proven guilty. "The magic box says so" is not proof.
The entire point of a government split into three branches (Judicial, Legislative, Executive) is so that each branch provides a measure of resistance to instability in the others.
However that is beside the point, as you seem to have confused "legislating from the bench" (which is part of the job) with following the procedures laid out by the law.
If their egos grow because they know that they are doing their best to uphold the law then that is merely the consequence of high job satisfaction. . .
Finally, with regard to this not applying to "items of revenue enhancement" (great wording by the way) have you checked the fines for a DUI? They are generally not light. . .
File modification date.
* http://download.intel.com/design/flash/nand/extreme/extreme-sata-ssd-datasheet.pdf
That makes about as much sense as saying that you shouldn't bother to wear a seatbelt, because in a subset of car accidents you will die anyway.
Sounds like a frustrating time. Completely out of my area of expertise, but good luck with future work. Reminds me of why I decided to get out with a MS and start a business...
and I think it took 18-24months to break even on hardware
Which is a bit further than you can hope to stretch things when you're just getting started. That's a long time to not be making any money. Additionally, unless you have piles of money lying about to cover the unexpected, it will make your service a lot more fragile because you simply won't have left yourself the resources to deal with problems. For instance, most of our current service offerings break even within 12 months, because anything longer would seriously compromise our ability to provide support and hardware replacement to our customers.
cloud storage
It's a very interesting problem, and I agree that it will be an inseparable part of future data storage. This gets back to where the discussion started, on Padlock. I've yet to see a cloud storage or clustered storage implementation that isn't shit. Want to make yet another shitty cloud storage system? Grab FUSE, convert file paths + (offset % 8192) to a hash, and stuff it into a DHT (Chord, CAN, Tapestry, take your pick). Congratulations, you're the newest crappy cloud storage provider!
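For the curious, the recipe above boils down to something like the sketch below. It is only an illustration: I'm reading the offset term as a block index (`offset // 8192`), SHA-1 is picked only because Chord-style rings commonly use it, and `block_key` and `responsible_node` are made-up names, not anyone's real API. The FUSE glue and everything that makes a storage system actually reliable is left out.

```python
import hashlib
from bisect import bisect_left

BLOCK_SIZE = 8192  # block size taken from the recipe above

def block_key(path: str, offset: int) -> int:
    """Map (file path, byte offset) to a DHT key for the block containing it."""
    block_index = offset // BLOCK_SIZE  # assumption: the key is per block
    digest = hashlib.sha1(f"{path}:{block_index}".encode()).digest()
    return int.from_bytes(digest, "big")

def responsible_node(key: int, node_ids: list) -> int:
    """Chord-style successor: first node id >= key, wrapping around the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key)
    return ring[i % len(ring)]
```

Every offset inside the same 8192-byte block hashes to the same key, and the key's successor on the ring is the node that stores it.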
The reason my company didn't do that is because we didn't think that anyone would seriously be interested in something so shabby, though the market seems to be proving us wrong. Guess that's what we get for all having come from technical backgrounds.
However, there is also the problem that when all you have is a crappy product that is basically on par with what everyone else has, it comes down to a battle of marketing. Instead I'd rather spend my time working on something that is far enough ahead of the curve that even the combined marketing suck of a bunch of CS people can't drag it down.
I figure padlock works about as well as everything else on a VIA platform, so my expectations are pretty low. You seem to be in a much better position to judge just how bad it actually is. My experience has been that with increased knowledge in any specific field comes increased scorn for just how bad most people are doing things.
Sounds like an interesting paper, did you end up with a working test implementation that gave good results? I'm guessing that you may have run into annoying latency, particularly when reading data back from the GPU?
My business, of which I am a founder but not the entirety, is pretty interesting (Then again I would say that...). Our middle tier of service uses VIA boards and takes advantage of padlock to do full encryption of the system and storage disks, which is why I've an interest in padlock. (And a bit of worry about the recently released cold boot attacks on encryption keys.)
Our top tier hardware for example does not use full disk encryption, because it would impact performance. We get ~350MB/sec internally from the disk array and the rate at which we currently can process data for backups is limited by how fast we can SHA-256 chunks of an updated file, which is just a bit over the 350MB/sec the disk array can do. Adding more processor power is easier said than done as the thermal load of 12 storage disks, 2 SAS database disks, and two dual-core CPUs is really pushing the cooling limits of the amount of airflow you can squeeze past all of the hardware in a 2U chassis. Which is why something that can do AES-256 at 600MB/sec in ~12 watts would be very appealing.
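To show what "SHA-256 chunks of an updated file" means in general terms, here is a minimal sketch. It is not our actual code: the 64KB chunk size and the function names are assumptions picked for illustration.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, for illustration only

def chunk_digests(stream, chunk_size=CHUNK_SIZE):
    """Return the SHA-256 hex digest of each fixed-size chunk read from stream."""
    digests = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digests.append(hashlib.sha256(chunk).hexdigest())
    return digests

def changed_chunks(old_digests, new_digests):
    """Indices of chunks in the new version that are new or differ from the old one."""
    return [i for i, d in enumerate(new_digests)
            if i >= len(old_digests) or d != old_digests[i]]

old = io.BytesIO(b"a" * CHUNK_SIZE + b"b" * CHUNK_SIZE)
new = io.BytesIO(b"a" * CHUNK_SIZE + b"c" * CHUNK_SIZE + b"d" * 10)
print(changed_chunks(chunk_digests(old), chunk_digests(new)))
```

Every byte of an updated file has to pass through the hash, which is why hashing throughput ends up being the limit rather than the disks.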
As for 2-3TB of offsite data, what with 1TB disks being as inexpensive as they are, the biggest problem is really what you mean by "robust and highly available". What that says to me is two sets of offsite hardware, both of high quality, and two sets of colo bills. In addition to a replication scheme between the offsite storage locations. All of which starts to get kind of incompatible with the "don't want to pay the earth for it" constraint. Even so $50/250GB doesn't leave a lot to work with.
To price it out using some hardware prices that are fresh in my mind: 10TB servers (12 disks, RAID6) with quality components such as redundant PSUs will set you back about $6.5K each, more if you don't build them yourself. Slowish 10Mbit colo costs $100-200/Month. For a redundant setup that's $13k in hardware, and say $150 * 2 * 12 = $3600 in colo costs per year.
$50/Year for a 250GB slice would get you $50 * 40 = $2000/Year. Not enough to even cover colo costs if you want replication, and only $200/Year more otherwise. Unless I'm missing something it doesn't seem like $50/Year for 250GB of offsite storage is doable unless you already own your own datacenter/colo facility. And even ignoring the colo costs you would have to cut a lot of corners on the hardware to make a profit at $50/year/250GB.
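The arithmetic above, written out as a small sketch so the inputs can be swapped. The figures are the ones quoted in this post; treating 10TB as 10,000GB, assuming every slice is sold, and using $150/Month as the colo price are my simplifications, and `yearly_numbers` is a made-up name.

```python
SERVER_COST = 6500           # USD per 10TB server, from the post
SERVER_CAPACITY_GB = 10_000  # assumption: 10TB counted as 10,000GB
SLICE_GB = 250
SLICE_PRICE_PER_YEAR = 50    # USD
COLO_PER_MONTH = 150         # USD, midpoint of the $100-200 range

def yearly_numbers(replicas: int) -> dict:
    """Revenue and costs for one fully sold server, replicated `replicas` times."""
    slices = SERVER_CAPACITY_GB // SLICE_GB
    revenue = slices * SLICE_PRICE_PER_YEAR
    colo = COLO_PER_MONTH * 12 * replicas
    return {
        "revenue": revenue,
        "colo": colo,
        "margin_before_hardware": revenue - colo,
        "hardware": SERVER_COST * replicas,  # one-off cost, not yearly
    }

for replicas in (1, 2):
    print(replicas, yearly_numbers(replicas))
```

With one copy the margin before hardware is $200/Year; with two copies it is negative before a single disk has been paid for.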
The best we could do would be about $0.80/GB/Year for a single offsite copy. We have plans for something vaguely like this, and by the time we get anything ready to release, that price will probably have dropped a bit. But $0.20/GB/Year is just not doable without seriously large volume. $0.20/GB is in the range of what current commodity storage such as S3 charges per month.
If you looked at our website you probably saw that what we're doing is pretty constrained. Basically a NAS box with some extra features that keeps itself completely backed up to replicated storage, and is replaced at no additional cost and with a full copy of the backed up data if it ever fails. Our prices, though better than anyone else in the industry that we've found for high capacity, are still probably quite a bit higher than you're looking for in the 2-3TB range.
That being said we do have some really interesting and novel stuff in development for how distributed data storage is done. Unfortunately I'd have to kick my ass really hard if I discussed it on an open forum before we finish and release it.
I'm getting the figures I have from openssl 0.9.8g running in Linux on either a 1GHz Nehemiah CPU, or a 1.2GHz Esther. You can find similar numbers by searching for "openssl speed padlock aes".
Maybe I'm missing something in your figures but how is 511MB/sec a 6-12x speedup over Core2? You can't just multiply out the clock-speed because the length of the longest path in the AES circuit won't allow the circuit to scale to 3GHz.
I said "the performance adjusted for clock speed", yes it's a pretty meaningless number. The hardware implementation certainly won't effortlessly scale to 3GHz for the reasons you mentioned and others. It was a poor attempt at normalization.
I should have either given the encryption per watt, or noted that I'm seeing linear scaling between the 1GHz and 1.2GHz parts I have, so perhaps one could expect the 1.8GHz part they ship to continue that trend. However as I don't have one on hand, I can't verify that.
Anyway, the point I'm trying to get to is not that the VIA CPUs are better than a Core2. But that I would like to see hardware AES, or at least something that matched or bettered the performance of Padlock, on a Core2.
Specifically I want to be able to read off an encrypted aggregate block device, and stream that out a gigabit connection over encrypted IPv6/IPsec connections. Without completely monopolizing the CPU.
The point is that if core2 had a padlock equivalent running at 3GHz*, you could do the above on a 10GigE connection. Or more practically, you could encrypt everything with no meaningful processing overhead. Yes this is a minority consideration, but it is one that's useful to me. Check my signature and you can probably guess why I care.
* I don't think it's too much of a stretch that between correcting the "shit" padlock implementation, and the compromises needed to make it run at 3GHz, it might be about even.
Look, this is not a Core2 vs. VIA argument. There is no contest there, the Core2 is on the order of 10x faster at everything (except AES/SHA), as it ought to be. The VIA CPUs are not as good. It's a smaller design team, with a smaller budget, running on really old process tech. Of course they are crappy, that they are slow and shitty is given, it does not need to be argued.
I don't particularly care if the CPU comes in a blue box or a green one, or if it's extra special hardware or a new software technique using existing hardware. All I want is to see a fast general purpose CPU that can do AES-256 at 1GB/Sec or better per core.
So no, it is not pathetic that a 3GHz general purpose processor can match the special purpose extensions on the C7. Given that the achievable speedup is much larger than the ratio in clock speeds (let alone the extra the Core2 is doing) it shows that the VIA performance is shit.
If it were true that the Core2 could match the speed of the Padlock AES, adjusted for clock speed, then padlock would be disappointing. Not worthless, as a complete VIA based system including a pair of disks will run on ~30 watts, but disappointing.
However that's simply not the case. The best numbers I can find are around 125MB/sec for a 3GHz core2, you list 250MB/sec and reference "unpublished results". For the sake of argument I'll accept your numbers.
AES-256-CBC on a 1GHz Via part, an older board than my other post, does 511MB/sec with all of the overhead of openssl.
So the performance adjusted for clock speed is 6x the Core2 using your numbers, and 12x using the best numbers I can find. If you think a 6-12x speedup is "shit" then that is your prerogative, however I think you will have a hard time finding people who agree with you.
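Spelled out, the clock-speed adjustment is just throughput divided by GHz. A quick sketch using only the figures quoted above; it is a crude normalization that assumes throughput scales linearly with clock, and `mb_per_sec_per_ghz` is a made-up helper name.

```python
def mb_per_sec_per_ghz(mb_per_sec: float, ghz: float) -> float:
    """Throughput normalised by clock speed (assumes linear scaling with clock)."""
    return mb_per_sec / ghz

via = mb_per_sec_per_ghz(511, 1.0)            # 1GHz VIA part with padlock
core2_claimed = mb_per_sec_per_ghz(250, 3.0)  # 3GHz Core2, the other poster's figure
core2_found = mb_per_sec_per_ghz(125, 3.0)    # 3GHz Core2, best figure I could find

ratio_claimed = via / core2_claimed
ratio_found = via / core2_found
print(f"{ratio_claimed:.1f}x {ratio_found:.1f}x")
```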
There are a lot of things about the VIA boards that are shit. The general purpose speed for example. Without padlock the same board that got 511MB/Sec gets 11MB/sec, that is shitty performance. But what do you expect out of a CPU that doesn't need a cooling fan and has a heatsink that will fit in a 1U chassis?
A single core on a 3GHz Core2 can match the performance of Padlock. I can't provide a link as the figures are unpublished but it's not particularly hard to work out how.
I don't suppose you could provide any numbers along with that claim? Because a non-padlock CPU matching the performance for AES-256 would be really useful sometimes.
For reference here are padlock numbers on a moderate Padlock equipped CPU:
cpu family : 6
model : 10
model name : VIA Esther processor 1200MHz
stepping : 9
cpu MHz : 1197.115
cache size : 128 KB
Using "openssl speed":
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     47592.92k   155506.46k   359193.00k   531778.27k   621832.68k
aes-256-ecb     58605.48k   213317.70k   578567.91k  1008950.60k  1287371.44k
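For anyone who wants those figures in MB/sec: `openssl speed` reports thousands of bytes processed per second, so dividing by 1000 gives MB/sec. A small sketch that parses the output pasted above; `parse_speed` is a made-up helper, and it assumes the five standard block-size columns shown here.

```python
SPEED_OUTPUT = """\
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 47592.92k 155506.46k 359193.00k 531778.27k 621832.68k
aes-256-ecb 58605.48k 213317.70k 578567.91k 1008950.60k 1287371.44k
"""
BLOCK_SIZES = (16, 64, 256, 1024, 8192)  # column order in the header line

def parse_speed(text: str) -> dict:
    """Map cipher name -> {block size in bytes: throughput in MB/sec}."""
    results = {}
    for line in text.splitlines()[1:]:  # skip the header line
        name, *cols = line.split()
        results[name] = {
            size: float(col.rstrip("k")) / 1000.0  # 'k' = 1000s of bytes/sec
            for size, col in zip(BLOCK_SIZES, cols)
        }
    return results

speeds = parse_speed(SPEED_OUTPUT)
print(round(speeds["aes-256-cbc"][8192]), round(speeds["aes-256-ecb"][8192]))
```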
And of course, as has already been mentioned, watts matter.
My company (http://www.zettabytestorage.com/) makes a managed NAS device which would have completely prevented their problems. Better still, our "Professional" line of products includes local disk encryption, meaning that the thieves get nothing but a fancy NAS device on which they need to reformat the drives before they can use it.
There are countless other companies providing some sort of offsite backup at varying mixes of ease of use, capacity, and price. Some of them, like ours, are extremely easy to set up, and require no further active participation from the user. They pass the "mom" test, the "CPA who doesn't like computers" test, and almost certainly would also pass the "rich old man with 15 years worth of work who was able to set up an external USB drive" test.
Our home line starts at 30GB for $34/Month and our Professional at 140GB for $139/Month. It's not free, but it's a whole lot less expensive than losing 15 years worth of work, and it includes geographically remote replication, hardware replacement in the event of loss or failure, all shipping charges, and any other applicable costs.
Additionally, there are a huge number of DIY solutions out there for remote data backup. They are not as easy, but they are less expensive.
If you put months or years of work into creating your data, but then don't either take the time to learn how to do it yourself, or pay the pittance required for a professional backup solution, you should probably spend some time thinking about your priorities.
Mere Aggregation is, in my opinion, the most overlooked aspect of the GPL. It gives developers the freedom to work with GPL code, without requiring that everything they touch then be GPL. Thus allowing GPL code to work its way into a closed system, one component at a time, and may the best code win. In this manner GPL projects can then attract contributors and maintainers from a much larger developer pool, thereby increasing the overall robustness of the community.
This is the "real" viral nature of the GPL, not that FUDish crud about 'if anything is GPL then everything is required to be GPL'.
(Please note, I am not talking source code level components, but rather independent pieces of software working in concert.)
Which is why I founded a company to do just that (I did say shameless promotion). Backups should occur at LAN speeds, be strongly encrypted, stored offsite, and not require any great effort by the user. Further, they should not charge an absurd fee and have an annoying interface, as most online storage providers do.
Thus I give you http://www.zettabytestorage.com/, secure backups in at least two geographically diverse locations, with local LAN access speeds, for less than $0.50 per GB and up to 700GB. You put your data on our NAS box and it gets backed up, that's it. You don't have to worry about failing hardware (we replace it free), local disaster (fire, flood, etc.), or really anything this side of the collapse of civilization as we know it. Your data is safe, both local and remote.
I do work for Zettabyte Storage, and if you know of an easier way to back up your data, I'd like to hear about it.
The problem is not the people doing the tracking, but the funding they don't get.
There are some efforts being made such as http://neat.jpl.nasa.gov/ but they get next to no funding.
How many people are you going to be able to convince when all you can say is that "It's likely one will hit a populated area sometime in the future". The general reaction that I've witnessed is "If it was going to happen, why hasn't it yet?" and "That's just science fiction".
It's far too abstract a threat for the vast majority of people to care about. . .
Physical Disk Sector Sizes Supported
512 bytes through to 32 kilobytes (in powers of 2), with the caveat that the sector size must be less than or equal to the filesystem blocksize.
http://oss.sgi.com/projects/xfs/
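The rule quoted above is easy to express as a check. This is only a sketch of the stated constraint, not anything taken from the XFS tools; `valid_sector_size` is a made-up name.

```python
def valid_sector_size(sector_bytes: int, fs_block_bytes: int) -> bool:
    """True if the sector size is a power of 2 from 512 bytes to 32KB
    and no larger than the filesystem block size."""
    is_power_of_two = sector_bytes > 0 and (sector_bytes & (sector_bytes - 1)) == 0
    in_range = 512 <= sector_bytes <= 32 * 1024
    return is_power_of_two and in_range and sector_bytes <= fs_block_bytes

print(valid_sector_size(512, 4096), valid_sector_size(8192, 4096))
```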
You're not very good at this Physics thing are you?
A resistor will convert electricity to heat with near 100% efficiency (a little is lost to unabsorbed EM radiation). To change the temperature of a body of water, you need to add energy to it. Thus unless you can convert with better than 100% efficiency. . . . . the resistor wins the energy game.
Note that this is not the only consideration in making a device of this nature, but to say that microwaves do it with higher efficiency is stupidity of the highest order.
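For anyone who wants the numbers, a minimal sketch of the energy bookkeeping. The specific heat of water is the usual approximate value; the 0.5 efficiency is a made-up placeholder to show the effect of any conversion loss, not a measured figure for any real device.

```python
SPECIFIC_HEAT_WATER = 4186.0  # J/(kg*K), approximate value for liquid water

def heat_needed_joules(mass_kg: float, delta_t_kelvin: float) -> float:
    """Energy that must end up in the water: Q = m * c * dT."""
    return mass_kg * SPECIFIC_HEAT_WATER * delta_t_kelvin

def input_energy_joules(mass_kg: float, delta_t_kelvin: float, efficiency: float) -> float:
    """Electrical energy drawn to deliver that heat at a given conversion efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return heat_needed_joules(mass_kg, delta_t_kelvin) / efficiency

ideal = input_energy_joules(1.0, 80.0, 1.0)  # near-ideal resistor
lossy = input_energy_joules(1.0, 80.0, 0.5)  # placeholder efficiency, illustration only
print(ideal, lossy)
```

Whatever the heating method, the input can never be less than the `ideal` figure.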
I've got to disagree with you with regard to CRT vs. LCD. I limit the discussion to those as they are what are available on the market now.
Which display to get should be decided on its primary task. If you care most about gaming, get a nice CRT. If you care most about anything else (with the exception I suppose of very accurate color work) get an LCD.
I'm sitting in front of a 21" Trinitron CRT, and a 20" Dell 2000fp LCD. I run my IDE across the full expanse of both, the LCD is far easier to look at than the CRT is. However I do all of my gaming on the CRT as in that case the LCD is clearly inferior.
Get what is appropriate for the tasks you will use it for.
By "lose their jobs" do you mean "become available for employment in a less brain-dead position"?
If the job can be automated, and that automation is as or more effective, then it is a waste of effort to have a person do it. It is a foolish squandering of human resources that should be better spent in another area.
Good point, why don't you just hop on over there and give us an update on the current situation? Don't forget to pack an extra sandwich for when you get hungry . . .
Accepting any scientific theory as the absolute, unchanging, eternally correct TRUTH is as silly and wrong minded as not accepting the value of scientific theory at all.
Science is an asymptotic (but not monotonic) approach to "the truth". Any claim to the contrary is almost certainly the result of some misunderstanding about the nature of 'science'.
Science is a process, 'scientific fact' is the best guess with regard to any given subject at the current time. This is subject to change.
That ability to change is not a weakness of science, "scientists are always saying they are wrong" is not a valid argument against science. The fact that science changes its opinion on matters is simply a result of the lack of omniscience of those practicing it.
The sad result that the 'general public' accepts what 'scientists say' as the absolute truth is an unfortunate miscarriage of intellect. A regrettable result of many people merely replacing one system of faith for another.
Innocent until proven guilty. "The magic box says so" is not proof.
The entire point of a government split into three branches (Judicial, Legislative, Executive) is so that each branch provides a measure of resistance to instability in the others.
However that is beside the point, as you seem to have confused "legislating from the bench" (which is part of the job) with following the procedures laid out by the law.
If their egos grow because they know that they are doing their best to uphold the law then that is merely the consequence of high job satisfaction. . .
Finally, with regard to this not applying to "items of revenue enhancement" (great wording by the way), have you checked the fines for a DUI? They are generally not light. . .