Also, some big brick and mortar retailers that do their own loans will match Apple's student price if you simply say that's the price they have to match or you're going somewhere else. No actual studenting necessary.
They are especially likely to offer you the discount if you buy AppleCare, which is the only extended warranty I've ever recommended but nonetheless has a very large retailer markup.
I love that you're not smart enough to notice that the OP started with 3.4 10^10 - which should clearly have been 10^9. So all of his math is off by a factor of 10.
But you're still troll enough to nitpick over the difference between 3 and 2 (a factor of 0.5) even though he clearly stated the units he was using.
I probably shouldn't feed the troll any more than that, but:
To focus more on the original post:
a) 10^18 is clearly the important part of his math. Changing the 2 to a 1 is really not that important to math that demonstrated going from 10^10 to 10^18.
b) he probably didn't want to go look up how many GHz his Athlon was.
c) OP stated, VERY clearly, that he knew those weren't "real" cycles. Since it was not the important part of his post and it was a reasonable approximation, he left it at that.
the OP said:
"about 3.4 GHz worth of cycles each second"
He said "worth of cycles" not "cycles" which you've chosen to simply ignore. He did not say "worth of P4 cycles" which would have been more specific but was still clearly what he meant - because it's what AMD compares to.
I stipulate, with the support of my gp post that "a cycle" is (between CPU classes) a useless unit of measurement of "worth". Furthermore, in your parent post, you apparently agree with all of my supporting points.
I further stipulate that "a P4 cycle worth of work" is an imprecise unit of measure but much LESS useless than "cycle".
Just in case you think it's appropriate to disregard what his text said and only look at his figure, I'd like to also point out that:
Yeah, except that theory is wrong. You run out of universe, you don't have enough material, your rod collapses under its own gravitational force and you probably can't get to the other end to measure it in your lifetime. For 18 characters. (math below)
Put another way: There are many, MANY more efficient ways to machine information onto a rod. If you could machine and measure multiple etchings with 1/9 of the precision I mention below (1nm) you'd get 1 Gb/m using the simplest of binary bar coding. You could instead use pits in two dimensions to get more density, and then you could move them really fast to read them quickly, and then you could make them round so you could move them in a circular motion. Then you could make them a convenient hand-held size. And then you'd have a metal CD/DVD. In fact, many CD/DVDs are metal in the data area, they're just covered in cheap plastic for durability. Funny about that, we just reinvented the CD. Well, I did, you got it wrong.
When fine enough, a line in a metal rod has a discrete number of positions.
Given incredibly ideal circumstances and a clean crystalline structure you could at least theoretically measure the distance to the precision of the space between molecules of that metal.
Given a known unclean crystalline structure or a guarantee of it being at a convenient angle - and even MORE ideal circumstances - you could theoretically "machine" a microscopically jagged edge and measure the average of those positions to get the length. At absolutely best, though, even given stable positions and completely perfect measurement and machining, you're limited to a precision of one molecular space / the number of molecules in the edge.
Of course, this assumes you machine and measure with something with at least the precision of a tunneling electron microscope..
It's simple to experiment with this at home - take the tube from a paper towel roll (or similar), fill it with marbles and try to see how many unique lengths you can get out of it.
Furthermore, your encoding system isn't bit-efficient. As a simple example, each decimal digit is about 3.3 bits. It should only take 4.7 bits (not almost 7) to encode 26 values.
Or put in a decimal-only way - the number of possible 2-letter combinations is 676. You can fit that in 3 digits, so each letter should only take 1.5 digits, not 2. So each block of 3 digits represents 2 letters. You can actually fit 7 letters in 10 digits (1.43 digits per letter). (The ideal case is log10(26) = 1.415.)
Even using another post's parsec rod, this slight compression, and a 125 pm precision (the covalent radius of iron)... You still only have 3x10^26 positions, so you still only fit 18 (18.7) characters in that parsec rod. (or 18 characters in a 3x10^15 rod.)
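If anyone wants to check the arithmetic, here is a rough sketch of it. The parsec length constant and the function names are my own additions for illustration; the 125 pm precision and the 26-letter alphabet are the figures used above. It only counts distinguishable lengths and takes a logarithm, nothing more.

```python
import math

PARSEC_M = 3.0857e16    # metres in one parsec (assumed constant, not from the post)
PRECISION_M = 125e-12   # 125 pm, the measurement precision assumed in the post

def distinguishable_positions(rod_length_m, precision_m=PRECISION_M):
    """Number of distinct lengths you could tell apart on the rod."""
    return rod_length_m / precision_m

def letters_encodable(positions, alphabet=26):
    """How many letters a single choice among `positions` values can encode."""
    return math.log(positions, alphabet)

positions = distinguishable_positions(PARSEC_M)
letters = letters_encodable(positions)

bits_per_letter = math.log2(26)     # ideal bits needed per letter
digits_per_letter = math.log10(26)  # ideal decimal digits needed per letter

print(f"positions on a parsec rod: {positions:.2e}")
print(f"letters that fit:          {letters:.1f}")
print(f"bits per letter:           {bits_per_letter:.2f}")
print(f"digits per letter:         {digits_per_letter:.3f}")
```

The point of the logarithm is that every extra letter multiplies the required precision by 26, so even astronomically long rods only buy you a handful of characters.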
First, I completely agree with you that it does depend a lot on what you're doing. For instance, last I heard _cycle for cycle_ the Pentium is still the king of integer - such as chess. But the crown is different for flops... Raw clock is meaningless and you are highly misguided. Furthermore, MANY operations are multicycle, and I guarantee you they are used on anything mathematically intense enough to be worth sending out over the network.
Interestingly, you don't have to leave Intel to see this: the Celeron, "vanilla" P4, Xeon and Pentium M have a lot of good differences just within the current Intel x86 line. The Pentium M is awesome, for instance.
But I'll provide just a few examples of how a cycle is NOT a cycle.
Many of these only help you if you compile for that architecture or it does something fancy in the background to compensate - but you could certainly distribute a mixed exe that ran the appropriate binary for the platform.
- First, bitwidth: 64 bit addition requires 1 cycle on a 64 bit cpu, but at least about 3 on a 32 bit. 64 bit multiplication is MUCH worse on a 32bit machine. Similarly, 128 bit vector math is much cheaper on a G4 ("altivec") than on a CPU limited to 64 or 32 bits in that arena.
- registers: A CPU can only actually DO operations on values in registers. If you have more registers you can do much more complicated (longer-chained) operations without having to go to RAM or cache. This is intensely true on highly serial but complicated math and amazingly significant if the operation data actually fits in registers in one CPU and not in another.
- branch prediction and shorter pipeline depth. All other things being equal you want the shortest pipeline possible because it means you have the lowest branch prediction penalty. Coupled with the quality of your branch predictor, this makes a big difference. (Of course, things _aren't_ equal, and longer pipelines make it easier to physically build faster CPUs) Even if branch prediction is meaningless, the pipeline depth is still important.
- parallelization: _All_ modern computers let you run some multiple commands in parallel using multiple CPUs, cores, hyperthreading and/or multiple processing units. Many computers come with two CPUs. Some newer CPUs come with two cores. Hyperthreading decreases the process switching penalty. Modern CPUs have separate integer and flop units, often more than 1. Clearly the quantity and efficiency of these multiple units would make a big difference.
At an absolute minimum, all of these things help you run the OS without interfering too much with your actual work. But since we're talking about stuff that's already being distributed over a wide network to multiple computers, on some level this work is clearly parallelizable. Even if your second core can't help on your first 'chunk' you could likely be executing two chunks at nearly the same speed (barring other constraints listed here).
- cache(L1/L2/L3), cache prediction, RAM, bandwidth, chipsets. I'm not going to go into all the details, but suffice to say that the cores need data and code to function and unless your entire process fits in registers, they have to get it from somewhere. The arrangement of memory has a big impact on 1) how much work the CPU has to do to get information and 2) how much the CPU has to wait for that information.
- I/O - I know this is outside our case, and the CPU efficiency of IDE has increased dramatically, but there is still some variance from system to system and driver to driver. Furthermore, different network cards/drivers use significantly different amounts of CPU time to send large amounts of data. This is true even if the speed of execution is not I/O bound - it still takes some main processor clocks and the quantity varies.
Furthermore, this arbitrary driver code and any OS code - for instance - is definitely susceptible to traditional branch prediction, cache hits, etc - even if your main crunching loop did fit in registers.
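To make the bitwidth point from the list above concrete, here is a minimal sketch of what a 64 bit addition looks like when you only have 32 bit operations. It is an illustration in Python rather than real machine code, and the function name and the three-step breakdown are my own; actual instruction counts depend on the CPU and compiler.

```python
MASK32 = 0xFFFFFFFF  # keeps a value within 32 bits, like a 32 bit register would

def add64_with_32bit_ops(a, b):
    """Add two 64 bit values using only 32 bit wide pieces (wraps on overflow)."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = a_lo + b_lo                     # step 1: add the low halves
    carry = lo >> 32                     # step 2: extract the carry out of the low half
    lo &= MASK32
    hi = (a_hi + b_hi + carry) & MASK32  # step 3: add the high halves plus the carry

    return (hi << 32) | lo

print(hex(add64_with_32bit_ops(0xFFFFFFFF, 1)))
```

A 64 bit CPU does the same job as one add, which is the whole point: the same "cycle" buys different amounts of work.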
Your example IS the example of why sigfigs are inherently tricky. Your answer is right - and in addition, decimal places will always get you a reasonable answer. But you've actually increased the number of significant figures (from 3 to 4) - AND that's actually the right answer.
You get into really, really big problems when you mix flop and integer math, and the calculator couldn't know which one you're doing. The basic problem comes from the fact that "integer" precision is commonly notated the same way as "no precision at all"
Here's some interesting examples: If I divide 1 by 2, the answer should be .5 - so your "dp" thing doesn't work at all - we added a dp of precision.
If I multiply 5 x .5 as decimals 3 is probably the right answer IF you can guarantee there are no additional sigfigs. But if I entered .5 as a shorthand for the integer 1/2, then 2.5 is absolutely the right answer.
If I entered 5 x .5 in a calculator and got 3, I would return it immediately.
Furthermore, 2 1/2 is _probably_ the right answer, because 1/2 is only a small fraction of a significant figure. But your calculator only knows how to display "1/2" as .5, it has no way of displaying or expressing fractional significant figures.
Going the other way is even worse - unless you're going to make everyone enter everything in SI - which will never happen - there's DEFINITELY no way to differentiate between estimated and real values. What's the right answer to 80 x 90? 7200? or 7000 ? It depends on whether those zeros were significant zeros or placeholders...
Finally, some really, really crazy things start to go on when you have exponents and the like - very commonly you get cases where you probably meant the base of the exponent to be an integer even if some part of the exponent itself is a decimal - because 2.0 ^ 32.0 has NO significant digits unless the 2 is actually an integer, even if the 32 DOES have infinite precision. (For instance: 2^32 ~ 4 bil; 1.96^32 ~ 2 bil; 2.04^32 ~ 8 bil.)
But nonetheless not EVERY exponent is supposed to be an integer - especially when you're simply squaring something (Pythagorean theorem on an arbitrary length, anyone?)
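The 2^32 example is easy to reproduce. This is a small sketch, with a helper name and an uncertainty of 0.04 that I picked to match the 1.96 and 2.04 figures; it just evaluates the power at both ends of the uncertainty range of the base.

```python
def power_range(base, half_width, exponent):
    """Return (low, nominal, high) for base**exponent when base is only known to +/- half_width."""
    low = (base - half_width) ** exponent
    nominal = base ** exponent
    high = (base + half_width) ** exponent
    return low, nominal, high

low, nominal, high = power_range(2.0, 0.04, 32)

print(f"1.96^32 = {low:.3g}")
print(f"2.00^32 = {nominal:.3g}")
print(f"2.04^32 = {high:.3g}")
```

A 2% uncertainty in the base spreads the result over roughly a factor of four, which is why the leading digit of the answer can't be trusted unless the 2 is exact.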
You really need a calculator that is very advanced - not to do the math, but to have input and display that can reasonably interact with how poorly the PEOPLE using them know sigfigs - and how poor an idea the PEOPLE usually have about their input method.
I've never seen a calculator with an _interface_ that could handle it. I actually think it might be easiest to do in a software calculator (even if my hardware calculator was better at some of the actual math)
-1 idiot/-1 wrong. For a post that was _factually inaccurate_ - not necessarily flamebait, or overrated... just wrong. Like the opposite of "informative"
That's awesome, and I should know by now to expect Linux to have already done stuff in a good way.
except - can you find some part of the LVM doc that says that? Although I apparently was unusually incompetent about it, I really went and googled LVM before the gp post and the doc "Why you'd want to use LVM snapshots" said basically "so you can make a complete consistent backup of the FS somewhere else including any busy files"
I suspect that you're right, but I still want to know what super-docs you're reading : )
I read your link and I still believe I'm more correct than the previous poster. Your link has a better ancient history of duct tape, mine only gave the most recent 2/3rds. Here's a revised history:
duck tape, originally made for the military by Johnson & Johnson. Apparently not trademarked.
Their "all weather" and "professional" tape meets UL standards. Although they also have a "Home and Shop" version that I haven't yet used which doesn't meet the standards.
Duck Tape, which is usually quite inferior in my experience, is made by Henkel. As _your link_ says, the original term fell out of usage and was only reintroduced recently by this trademark. The Duck Tape was not common until about 1995, IIRC.
Dude! Card Sharks. I forgot that existed. Dumbest game show ever. "higher... " "lower... "
To weigh in on the thread - I'm not familiar with "card sharp" - but the bottom dealer explanation seems most appropriate and has the most documentation.
I'm definitely familiar with shark - and it definitely means someone who uses deception to have other players play against them with large amount of money so they can take it. The simplest deception is just to appear to be a very bad player, but I wouldn't say it is the only one.
Oh, and in response to a child post - you'd absolutely tape a duct, unless you like wasting energy. You want to make the best seal possible. Start with a solid mechanical joint then add tape (regular duct tape or, better yet, aluminum tape) then for bonus points blow some stuff in it to seal microholes. The stuff resembles fix-a-flat.
It's worse than you think. Duck Tape ISN'T Duct Tape. And I'm tempted to think somebody ought to make a false advertising lawsuit about it...
Duct Tape meets particular HVAC standards - essentially you have to be able to tape ducts with it, which may be at fairly high temp.
(At least when I worked at Ace and sold it) Duck Tape is a brand of tape. The most common Duck Tape is a silver tape similar in appearance to Duct Tape but with a much less durable adhesive, especially if you warm it up a bit. It does not meet the standards to be Duct Tape - they can't and don't call it that. (In its defense, it was also significantly cheaper than "real" Duct Tape)
So Duck Tape is useful when you need a cheaper, lower quality tape than Duct Tape - and is mostly used by people who just don't know any better and who thought it was the same thing. Some of whom then complain to me that it doesn't hold as well as I described on a project. I also occasionally use it for color, because it is commonly available in a variety of colors.
There may be some Duck Tape that is also Duct Tape, but I haven't seen it.
Apparently I suck at explaining things recently. What I mean is this:
Normally drives expect a lot of wear in the "spinning" and "eccentric motion in the directions of spinning" ways. So they MUST be heavily engineered to resist these kinds of motions. Whereas they don't induce a lot of "shock perpendicular to spinning" wear.
Your specific explanation may well be spot on; I expect that bearing failure is exactly it. To better explain where I was coming from:
Usually if something is reasonably well engineered you want to store it with the direction of stresses most closely matching how you engineered it to work. But it never occurred to me to think about what orientation most closely replicated "spinning around really fast."
So, I bought into the idea that I wanted to consider setting up snapshots on a new server we're about to set up.
But then I realized that my idea of a snapshot and the practice are very different - I think I heard "versioning filesystem" when you said "snapshot"
First, for anyone not following my myriad posts on this thread, I like something I'm calling an "Incremental Versioning" or IV backup - it's an additive incremental backup leaving both versions available in the case of any change. To me this is key functionality, because I want to be able to run a small backup often and catch small file changes without using enough disk space for a full backup.
So, I like the idea of a versioning filesystem over RAID. Ideally the filesystem would manage a COW (copy on write) layer, so that it would keep both versions of changed files but only need one copy of unchanged files. So, used with RAID, you get the complete versioning of backups at a perhaps 1 + .1n multiplier (where n is the number of nights of backups) depending on the situation, or possibly much less.
The Wayback filesystem does this, but it seems a little fringe for the filesystem in a production server.
Alternatively you can set up a cron job to rsync the data to a different, protected directory. Full backups use 1 + 1n space. But a full backup along with IV backups uses 1 + 1 + .1n or so for each cycle.
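Here are those three space multipliers side by side as a quick sketch. The function names are mine, and the 0.1 default is just the "perhaps .1 per night" guess from above, not a measured change rate.

```python
def versioning_fs_space(nights, change_rate=0.1):
    """Live data plus only the changed files for each night (COW versioning)."""
    return 1 + change_rate * nights

def nightly_full_copies_space(nights):
    """Live data plus one complete copy per night."""
    return 1 + 1 * nights

def full_plus_iv_space(nights, change_rate=0.1):
    """Live data, one full backup, then only the changed files each night."""
    return 1 + 1 + change_rate * nights

for nights in (7, 30):
    print(nights,
          round(versioning_fs_space(nights), 1),
          nightly_full_copies_space(nights),
          round(full_plus_iv_space(nights), 1))
```

For a week of nightly backups that works out to roughly 1.7x, 8x and 2.7x the size of the live data, which is why I care so much about only storing what changed.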
So, based on a little googling, I'm guessing that by snapshots you meant LVM snapshots. LVM does not seem to support versioning, and only seems to support "full backup" snapshots.
I understand that in cases of very high disk usage locking the drive momentarily to create a snapshot may be valuable. But I see no other general advantages to using snapshotting over rsync on the same machine for the purposes of backup.
I think that if I were to use LVM snapshots I'd feel compelled to do rsync IV updates more frequently anyway, so I can fit more versions in less space.
Incidentally, this seems to justify my original idea that RAID as backup is bad, because this only works by giving up more than 50% of your available space without any disk-level redundancy.
So in the fringe case where you have two big disks and you only need less than 1/4 the space you have, that's great.
But given at least 3 disks and needing 1/4 the space you have I'd MUCH rather do a round robin backup and have no more than 1 complete copy per HD. I think having multiple _full_ copies of most of your data on the same drive is generally wasteful; they should be spread around.
In the general case: A. It is much more efficient to cool/heat only exactly what you need cool/hot.
B. It is usually more efficient to use one big cooler than many small ones - but it can depend massively on the design/cost/age of the coolers/heaters. Generally, every reversible process generates waste heat. So, unlike ACs, _typical_ electric heaters can be 100% efficient and a heatpump is technically more than 100% efficient at generating heat in your room (based on "thermal energy changed in your room / power energy used " )
So if you wrap all the servers in a small space with the inside end of the ACs, you save a lot of power from "A". Since you only have accidental AC left for the rest of the room, it can get quite hot - but only if something else makes it hot (poorly insulated exterior walls, people, sun-facing windows...)
Hmm. As a Mechanical Engineer by training, your explanation doesn't seem right. On the other hand, I've _definitely_ seen weirder things that were true...
For "evenness" you'd want them to be flat, I'd think. But "even" isn't always better... My guess is more like the below, but it's just a guess and takes your statement as a precondition...
"In order to compensate for tiny bits of drive dissymmetry magnified by high rotational speeds a lot of cost goes into the resistance to side-to-side motion on the spindle.
On the other hand, I suspect you can't effectively touch the outside of the spindle while it's spinning so it has to be held up only by the force of spinning.
So if you put them perpendicular to gravity the platters will sag at the edges, and may especially move/grind if subjected to shocks (which are most likely in the gravity-direction). But if you put them parallel they can only press against the spindle, and it already expects forces much greater than that."
"I'm sure you understand that. I'm just clarifying for anyone else: RAID+snapshots is nearly as good as backups. Either one alone is useful, but inadequate."
Exactly.
SuSE (Novell) has had a RAID installer like you describe - a very nice one, actually - for years; I believe I first used SuSE in 7.3 and it was there.
I believe that some of the newest FS versions automatically support snapshots in the FS driver. So theoretically installing RAID under a FS with the snapshots on should do what we want. But I still definitely want the installer to go a step further and say:
"YO! this is a big pretty button that's one step to reasonable data safety. Buy two harddrives, then click here! "
It was my post, and I didn't think it was funny. Except possibly the fact that I don't get many Funny mods, and then I got one, and then you took it away by pointing out it shouldn't have been funny. That's a painful kind of funny...
My fault, I wasn't clear. I agree with your reply completely. Let me rephrase:
I'm OK with snapshots. (snapshots are != RAID, of course)
I have the following tiny issues... (rest of post)
In short, I think that snapshots are basically fine but personally I'd rather have my backup on a machine that wasn't a server in the "accepts incoming filesharing connections" sense.
I welcome the day when my mainstream Linux distribution comes with "install snapshot filesystem over RAID1" as a standard simple installation option.
I didn't mean it quite as simplistically as "accidental deletion" I basically mean that there are too many points of failure that I don't like: users operating the machine. Applications writing to the file. bad RAM. OS. FS drivers. drive controllers. viruses or malicious hacking. If any of these things go wrong, your data is toast, and it's likely toast on ALL of your mirrors.
Normally your RAID array is a read-write server, so a virus on a _client_ machine can wipe out big (enough to be important) sections of data. And it's more vulnerable to hacking because it's providing "public" services (at least on your intranet)
So I'm going to give two examples of where I think you're reasonably right: 1) A very well-secured RAID fileserver that doesn't actually give client machines permission to change/delete files. Using snapshots is a reasonable example - but it isn't the RAID that makes it a backup, it's the snapshots. And some FS have snapshots without requiring RAID...
2) Increasing redundancy by adding 2 drives to a backup server that is already operating behind the kind of protections I discussed originally.
In both of these cases I still think it's better to use a different backup machine because it's more redundant at a pretty marginal cost. [If you scale this up enough it's at NO additional cost, because you "fill" every backup server's HDs with one copy of as much data as possible. If you don't scale it up then it's a couple of low-end machines.]
I've definitely heard reports of multiple same-type HDs going bad at the same time... which makes sense if they're made at the same time and subjected to the same environment. Reducing the likelihood of more than one of my drives failing at the same time seems like common sense to me.
timestamps are usually reliable with regard to whether the file has changed on the same machine - the chances of the clock being off such that the changed file has the identical timestamp are fairly low. But I certainly agree that also MD5 checking is even more safe.
With a normal incremental backup I would agree with you. But the "IV" system I discussed doesn't need to be reconstructed like that - which is pretty much why I set it up like that.
1) It's already writing to a filesystem. So it is already "reconstructed" on a system level.
2) It's not saving the diffs of individual files, it's saving full backups of the individual files - if they actually change.
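For anyone who wants to see what I mean by "IV", here is a minimal sketch of one backup pass. It is not the script I actually run; the function name, the `label` argument and the rename-with-suffix scheme are made up for illustration. It compares timestamp and size only (no MD5), and when a file has changed it keeps the old backup copy under a new name and then copies the whole new file.

```python
import shutil
from pathlib import Path

def iv_backup(source, backup, label):
    """One incremental-versioning pass: full copies of changed files only.

    `label` (hypothetical, e.g. a date string) is appended to the name of any
    older backup copy that is about to be superseded, so both versions remain.
    """
    source, backup = Path(source), Path(backup)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Unchanged if whole-second timestamp and size both match.
            if int(s.st_mtime) == int(d.st_mtime) and s.st_size == d.st_size:
                continue
            # Keep the old version next to the new one instead of overwriting it.
            dst.rename(dst.with_name(f"{dst.name}.{label}"))
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 preserves the modification time
        copied.append(rel)
    return copied
```

Because every stored version is a complete ordinary file on a filesystem, restoring is just copying it back; there is no chain of diffs to replay. Reusing the same label twice for the same file would clobber (or on some platforms fail on) the earlier saved version, so a real setup would want unique labels per run.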
If you have big disks and small data, I wouldn't object to writing everything to 2 parts of the same disk in addition to everything else. But really I'd rather put that effort towards more backup machines.
1. As I just mentioned in response to another post, I very much encourage your backups to actually be on another machine - if your server is own3d, or your OS/RAM/MB freaks out you have no idea what it'll do to your backup drive. If you're posting this on /., I figure you can get a machine out of the garbage and put this together... Again, it's my opinion that having more than one copy of the data per computer is a waste of HD. (other than for high availability, a la RAID)
2. I'm all about using your friend's internet connection to do this. Furthermore, in response to someone else - if you think your friend is spilling Cheerios on it.. a) get better friends and b) get MORE friends/backups. I'll take redundancy over perfection any day.
3. RAID is great in those situations where your intra-backup loss (ie, from a day) is very great. I agree with you that a lot of people recommend RAID 10, but I think they are quite wrong OR they are using crap systems - my lengthy explanation follows.
Good RAID controllers use battery-backed write cache - that means they "accept" your write immediately and use a battery to actually put it on the harddrive LATER, even if the power goes out. This is a HUGE speed improvement for multiple small write situations, even with just ONE disk. I ignore this effect in the below discussion.
I'm going to assume a system where you have two similar drives on different buses on the same machine. I'm also going to assume that you're HD I/O bound (ie, the harddrive platters/heads are what's causing the slowdown, not your CPU) I'm also going to assume you do more reading than writing - at least more files if not more bits (which is pretty typical)
--- First, why RAID 0 is stupid (unless you're using very large files AND not using them at the same time) I'm going to compare RAID0 to just putting different stuff on different drives (for instance, OS/swap/apps on drive 0 and data on drive 1.) I'm calling this setup "noRAID"
RAID 0 is straight striping - it writes half of every file to each disk. This means that the _write_ time (time from the time it starts to the time it finished writing) is twice as fast, but the _seek_ time (time to get the head to the right place to write) is exactly the same as a single disk. For writing very large files this is almost twice as fast. For writing smaller files it is not faster at all because the seek time (time to find where to write it) totally overwhelms the time to actually do the writing. For reading the same thing is true. The "bulk" of reading a file is exactly twice as fast but the seek is not changed at all. So most of the time it really isn't faster except for really big files.
The short answer is that RAID0 is stupid because it has no benefits when seeking.
Compare this to just using 2 drives: if you try to read or write simultaneous small files that are on different disks, noRAID is absolutely _twice as fast_ If you try to write a single very large file RAID0 approaches being twice as fast as the write time becomes much larger than the seek time. Of course, the weak point in this argument is that sometimes you want two things on the same disk - then noRAID is only the same speed as RAID0 for small files. So noRAID doesn't average being actually twice as fast.
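A toy model shows the seek-versus-transfer argument in numbers. The 9 ms seek and 50 MB/s transfer rate are assumptions I made up for illustration, not measurements of any real drive, and the model ignores caching and controller overhead entirely.

```python
def single_disk_ms(size_mb, seek_ms=9.0, rate_mb_s=50.0):
    """Time to read one file from one disk: one seek plus the transfer."""
    return seek_ms + size_mb / rate_mb_s * 1000.0

def raid0_ms(size_mb, seek_ms=9.0, rate_mb_s=50.0):
    """Two-disk RAID0: both disks still seek, each transfers half the file."""
    return seek_ms + (size_mb / 2) / rate_mb_s * 1000.0

def raid0_speedup(size_mb):
    """How many times faster RAID0 is than a single disk for one file."""
    return single_disk_ms(size_mb) / raid0_ms(size_mb)

for size_mb in (0.004, 1, 100, 1000):
    print(f"{size_mb:>8} MB: RAID0 is {raid0_speedup(size_mb):.2f}x a single disk")
```

With these numbers a 4 KB file gets essentially no benefit because the seek dominates, while a 1 GB file gets very nearly the full 2x. Two independent disks serving two small files at once each pay only their own seek, which is where noRAID wins.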
In addition, RAID0 is half as redundant because either disk failing destroys everything.
--- Second, why RAID1 is good.
RAID 1 is straight mirroring. On a modern RAID system (like Linux's SoftRAID) this gives you write performance that is exactly identical to a single disk (assuming your CPU can always keep up). For multiple file reading, though, it performs better than any other setup, even _better than noRAID_ because it only needs to read from 1 disk and it reads from whichever disk has a head in a convenient spot to do THAT read.
It doesn't have the disadvantage of noRAID, because it ALWAYS has a copy of the data it needs on the
Lifespan is an important HD consideration. And furthermore, it's one that I forgot to mention. I have a fair number of 7 year old HDs running and at least 1 at 9. But I agree you can't count on that kind of lifespan.
At some intervals you should definitely add new backup servers with new harddrives and let them sync up. A really paranoid person might have 3 or so backup servers and might add a new one each year...
The great advantage of HD based backups is that adding an entirely new setup to the mix is very easy because you don't need to swap tapes to get _all_ of the data.
You missed my point fairly entirely. I have a hard time believing you could have read my post and not figured that out, so perhaps you're trolling - but in case you've managed to confuse anybody...
_hard drives_ are a perfectly acceptable backup medium. I went into great detail about that.
RAID is NOT a backup medium. Backup to harddrive != RAID. !!! RAID is explicitly about consistency - so if you (or a hacker) delete a file from a functioning RAID it immediately gets deleted from everything. This is not a backup.
Also, the linux-HA guys say you should NOT buy the same kind of disks because it increases the chance they fail at the same time.
Finally, you don't need to diff the files; you only need to gather the timestamps. (Your way is more space efficient, but I think the gains are marginal)
Become a student. Apple does loans. voila.
Also, some big brick and mortar retailers that do their own loans will match Apple's student price if you simply say that's the price they have to match or you're going somewhere else. No actual studenting necessary.
They are especially likely to offer you the discount if you buy AppleCare, which is the only extended warranty I've ever recommended but nonetheless has a very large retailer markup.
I love that you're not smart enough to notice that the OP started with 3.4 10^10 - which should clearly have been 10^9. So all of his math is off by a factor of 10.
But you're still troll enough to nitpick over the difference between 3 and 2 (a factor of 0.5) even though he clearly stated the units he was using.
I probably shouldn't feed the troll any more than that, but:
To focus more on the original post:
a) 10^18 is clearly the important part of his math. Changing the 2 to a 1 is really not that important to math that demonstrated going from 10^10 to 10^18.
b) he probably didn't want to go look up how many Ghz his Athlon was.
c) OP stated, VERY clearly, that he knew those weren't "real" cycles. Since it was not the important part of his post and it was a reasonable approximation, he left it at that.
the OP said:
"about 3.4 GHz worth of cycles each second"
He said "worth of cycles" not "cycles" which you've chosen to simply ignore. He did not say "worth of P4 cycles" which would have been more specific but was still clearly what he meant - because it's what AMD compares to.
I stipulate, with the support of my gp post that "a cycle" is (between CPU classes) a useless unit of measurement of "worth". Furthermore, in your parent post, you apparently agree with all of my supporting points.
I further stipulate that "a P4 cycle worth of work" is an imprecise unit of measure but much LESS useless than "cycle"
Just in case you think it's appropriate to disregard what his text said and only look at his figure, I'd like to also point out that:
Yeah, except that theory is wrong. You run out of universe, you don't have enough material, your rod collapses under it's own gravitational force and you probably can't get to the other end to measure it in your lifetime. For 18 characters. (math below)
... You still only have 3x10^26, you still only fit 18 (18.7) characters in that parsec rod. (or 18 character in a 3x10^15 rod.)
Put another way: There are many, MANY more efficient ways to machine information onto a rod. If you could machine and measure multiple etchings with 1/9 of the precision I mention below (1nm) you'd get 1 Gb/m using the simplest of binary bar coding. You could instead use pits in two dimensions to get more density, and then you could move them really fast to read them quickly, and then you could make them round so you could move them in a circular motion. Then you could make them a convenient hand-held size. And then you'd have a metal CD/DVD. In fact, many CD/DVDs are metal in the data area, they're just covered in cheap plastic for durability. Funny about that, we just reinvented the CD. Well, I did, you got it wrong.
When fine enough, a line in a metal rod has a discrete number of positions.
Given incredibly ideal circumstances and a clean crystalline structure you could at least theoretically measure the distance to the precision of the space between molecules of that metal.
Given a known unclean crystalline structure or a guarantee of it being at a convenient angle - and even MORE ideal circumstances - you could theoretically "machine" a microscopically jagged edge and measure the average of those positions to get the length. At absolutely best, though, even given stable positions and completely perfect measurement and machining, you're limited to a precision of one molecular space / the number of molecules in the edge.
Of course, this assumes you machine and measure with something with at least the precision of a tunneling electron microscope.
It's simple to experiment with this at home - take the tube from a paper towel roll (or similar), fill it with marbles and try to see how many unique lengths you can get out of it.
Furthermore, your encoding system isn't bit-efficient. As a simple example, each decimal digit is about 3.3 bits. It should only take 4.7 bits (not almost 7) to encode 26 values.
Or put in a decimal-only way - there are 676 possible 2-letter combinations. You can fit that in 3 digits, so each letter should only take 1.5 digits, not 2. So each block of 3 digits represents 2 letters. You can actually fit 7 letters in 10 digits (1.43 digits per letter). (The ideal case is log10(26) = 1.415.)
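If you want to check the digit counts yourself, here is a minimal sketch. It assumes an uppercase A-Z alphabet and treats a word as a base-26 number; the function names are made up for the example.

```python
import math

ALPHABET = 26

bits_per_letter = math.log2(ALPHABET)      # information in one letter, in bits
digits_per_letter = math.log10(ALPHABET)   # the same thing in decimal digits

def letters_that_fit(digits: int) -> int:
    """Largest k such that all 26**k strings fit in `digits` decimal digits."""
    k = 0
    while ALPHABET ** (k + 1) <= 10 ** digits:
        k += 1
    return k

def pack(word: str) -> int:
    """Treat an A-Z word as a base-26 number."""
    n = 0
    for ch in word:
        n = n * ALPHABET + (ord(ch) - ord("A"))
    return n

def unpack(n: int, length: int) -> str:
    """Inverse of pack() for a word of known length."""
    chars = []
    for _ in range(length):
        n, r = divmod(n, ALPHABET)
        chars.append(chr(ord("A") + r))
    return "".join(reversed(chars))

print(f"{bits_per_letter:.2f} bits, {digits_per_letter:.3f} digits per letter")
print(letters_that_fit(3), letters_that_fit(10))
```

Packing blocks of 7 letters into 10 digits is about as good as fixed-size blocks get here; bigger blocks creep toward the log10(26) limit but never beat it.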
Even using another post's parsec rod, this slight compression, and a 125 pm precision (the covalent radius of iron)
First, I completely agree with you that it does depend a lot on what you're doing. For instance, last I heard _cycle for cycle_ the Pentium is still the king of integer - such as chess. But the crown is different for flops... Raw clock is meaningless and you are highly misguided. Furthermore, MANY operations are multicycle, and I guarantee you they are used on anything mathematically intense enough to be worth sending out over the network.
Interestingly, you don't have to leave Intel to see this: the Celeron, "vanilla" P4, Xeon and Pentium M have a lot of good differences just within the current Intel x86 line. The Pentium M is awesome, for instance.
But I'll provide just a few examples of how a cycle is NOT a cycle.
Many of these only help you if you compile for that architecture or it does something fancy in the background to compensate - but you could certainly distribute a mixed exe that ran the appropriate binary for the platform.
- First, bitwidth:
64 bit addition requires 1 cycle on a 64 bit cpu, but at least about 3 on a 32 bit. 64 bit multiplication is MUCH worse on a 32bit machine. Similarly, 128 bit vector math is much cheaper on a G4 ("altivec") than on a CPU limited to 64 or 32 bits in that arena.
- registers: A CPU can only actually DO operations on values in registers. If you have more registers you can do much more complicated (longer-chained) operations without having to go to RAM or cache. This is intensely true on highly serial but complicated math and amazingly significant if the operation data actually fits in registers in one CPU and not in another.
- branch prediction and shorter pipeline depth. All other things being equal you want the shortest pipeline possible because it means you have the lowest branch prediction penalty. Coupled with the quality of your branch predictor, this makes a big difference. (Of course, things _aren't_ equal, and longer pipelines make it easier to physically build faster CPUs) Even if branch prediction is meaningless, the pipeline depth is still important.
- parallelization: _All_ modern computers let you run some multiple commands in parallel using multiple CPUs, cores, hyperthreading and/or multiple processing units. Many computers come with two CPUs. Some newer CPUs come with two cores. Hyperthreading decreases the process switching penalty. Modern CPUs have separate integer and flop units, often more than 1. Clearly the quantity and efficiency of these multiple units would make a big difference.
At an absolute minimum, all of these things help you run the OS without interfering too much with your actual work. But since we're talking about stuff that's already being distributed over a wide network to multiple computers, on some level this work is clearly parallelizable. Even if your second core can't help on your first 'chunk' you could likely be executing two chunks at nearly the same speed (barring other constraints listed here)
- cache(L1/L2/L3), cache prediction, RAM, bandwidth, chipsets. I'm not going to go into all the details, but suffice to say that the cores need data and code to function and unless your entire process fits in registers, they have to get it from somewhere. The arrangement of memory has a big impact on 1) how much work the CPU has to do to get information and 2) how much the CPU has to wait for that information.
- I/O - I know this is out of our case, but the CPU efficiency of IDE has increased dramatically, but there is still some variance from system to system and driver to driver. Furthermore, different network cards/drivers use significantly different amounts of CPU time to send large amounts of data. This is true even if the speed of execution is not I/O bound - it still takes some main processor clocks and the quantity varies.
Furthermore, this arbitrary driver code and any OS code - for instance - is definitely susceptible to traditional branch prediction, cache hits, etc - even if your main crunching loop did fit in registers.
I'm sure there's more, but I'm done for now.
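To make the bitwidth point above concrete, here is a sketch of what a 32 bit machine has to do for one 64 bit addition. It is plain Python pretending to have only 32-bit-wide words, not real machine code, and the function name is invented for the example. Real CPUs usually have an add-with-carry instruction that folds some of this together, so take the step count as illustrative.

```python
MASK32 = 0xFFFFFFFF

def add64_on_32bit(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit-wide steps."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = (a_lo + b_lo) & MASK32          # step 1: add the low words
    carry = 1 if lo < a_lo else 0        # step 2: did the low word wrap around?
    hi = (a_hi + b_hi + carry) & MASK32  # step 3: add the high words plus carry

    return (hi << 32) | lo               # wraps at 2**64 like real hardware

print(hex(add64_on_32bit(0xFFFFFFFF, 1)))
```

On a 64 bit CPU the whole function body is a single add, which is the "cycle is NOT a cycle" point.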
Your example IS the example of why sigfigs are inherently tricky. Your answer is right - and in addition, decimal places will always get you a reasonable answer. But you've actually increased the number of significant figures (from 3 to 4) - AND that's actually the right answer.
You get into really, really big problems when you mix flop and integer math, and the calculator couldn't know which one you're doing. The basic problem comes from the fact that "integer" precision is commonly notated the same way as "no precision at all"
Here's some interesting examples:
If I divide 1 by 2, the answer should be .5 - so your "dp" thing doesn't work at all - we added a dp of precision.
If I multiply 5 x .5 as decimals 3 is probably the right answer IF you can guarantee there are no additional sigfigs. But if I entered .5 as a shorthand for the integer 1/2, then 2.5 is absolutely the right answer.
If I entered 5 x .5 in a calculator and got 3, I would return it immediately.
Furthermore, 2 1/2 is _probably_ the right answer, because 1/2 is only a small fraction of a significant figure. But your calculator only knows how to display "1/2" as .5, it has no way of displaying or expressing fractional significant figures.
Going the other way is even worse - unless you're going to make everyone enter everything in SI - which will never happen - there's DEFINITELY no way to differentiate between estimated and real values. What's the right answer to 80 x 90? 7200? or 7000 ? It depends on whether those zeros were significant zeros or placeholders...
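The 80 x 90 case is easy to play with. This is only a sketch: `round_sig` is a helper I'm making up here, and the number of significant figures has to be supplied by the person, which is exactly the thing the calculator can't know.

```python
import math

def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # position of the leading digit
    return round(x, sig - 1 - exponent)

product = 80 * 90

# If the zeros in 80 and 90 were placeholders, each input has 1 sigfig.
as_estimates = round_sig(product, 1)

# If the zeros were significant, each input has 2 sigfigs.
as_measured = round_sig(product, 2)

print(as_estimates, as_measured)
```

Same keystrokes, two defensible answers, and nothing in "80" tells the machine which one you meant.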
Finally, some really, really crazy things start to go on when you have exponents and the like - very commonly you get cases where you probably meant the base of the exponent to be an integer even if some part of the exponent itself is a decimal - because 2.0 ^ 32.0 has NO significant digits unless the 2 is actually an integer, even if the 32 DOES have infinite precision. (For instance: 2^32 ~ 4 bil; 1.96^32 ~ 2 bil; 2.04^32 ~ 8 bil.)
But nonetheless not EVERY exponent is supposed to be an integer - especially when you're simply squaring something (Pythagorean theorem on an arbitrary length, anyone? )
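You can see how badly a "2.0" with only three sigfigs behaves under an exponent with a few lines of Python. This is just a sketch; the bases 1.96 and 2.04 are the same bounds I used above.

```python
EXPONENT = 32

# The same nominal "2", read as exact and as a value good to about +/- 2%.
results = {base: base ** EXPONENT for base in (1.96, 2.0, 2.04)}

for base, value in results.items():
    print(f"{base:.2f} ^ {EXPONENT} = {value:.2e}")

# Ratio between the high and low readings of the "same" input.
spread = results[2.04] / results[1.96]
print(f"spread: {spread:.1f}x")
```

A few percent of slop in the base turns into a factor of several in the answer, so the result has no trustworthy leading digit unless the base is an exact integer.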
You really need a calculator that is very advanced - not to do the math, but to have input and display that can reasonably interact with how poorly the PEOPLE using them know sigfigs - and how poor an idea the PEOPLE usually have about their input method.
I've never seen a calculator with an _interface_ that could handle it. I actually think it might be easiest to do in a software calculator (even if my hardware calculator was better at some of the actual math)
"about 3.4 GHz worth of _P4_ cycles each second"
there, happy? It is clearly what they meant...
And it's not a bad way to think about it, because the "3400+" numbers and the P4 numbers are the only numbers that ARE comparable.
-1 idiot/-1 wrong. For a post that was _factually inaccurate_ - not necessarily flamebait, or overrated... just wrong. Like the opposite of "informative"
That's awesome, and I should know by now to expect Linux to have already done stuff in a good way.
except - can you find some part of the LVM doc that says that? Although I apparently was unusually incompetent about it, I really went and googled LVM before the gp post and the doc "Why you'd want to use LVM snapshots" said basically "so you can make a complete consistent backup of the FS somewhere else including any busy files"
I suspect that you're right, but I still want to know what super-docs you're reading : )
I read your link and I still believe I'm more correct than the previous poster. Your link has a better ancient history of duct tape, mine only gave the most recent 2/3rds.
i sure/duct_tape/index.html
Here's a revised history:
duck tape, originally made for the military by Johnson & Johnson. Apparently not trademarked.
duct tape, came into common usage. Is made by many manufacturers. 3M, for instance, makes some really nice stuff and has been for 75 years. http://www.3m.com/intl/CA/english/centres/home_le
Their "all weather" and "professional" tape meets UL standards. Although they also have a "Home and Shop" version that I haven't yet used which doesn't meet the standards.
Duck Tape, which is usually quite inferior in my experience, is made by Henkel. As _your link_ says, the original term fell out of usage and was only reintroduced recently by this trademark. The Duck Tape was not common until about 1995, IIRC.
Dude! Card Sharks. I forgot that existed. Dumbest game show ever. "higher... " "lower... "
To weigh in on the thread -
I'm not familiar with "card sharp" - but the bottom dealer explanation seems most appropriate and has the most documentation.
I'm definitely familiar with shark - and it definitely means someone who uses deception to get other players to play against them for large amounts of money so they can take it. The simplest deception is just to appear to be a very bad player, but I wouldn't say it is the only one.
Oh, and in response to a child post - you'd absolutely tape a duct, unless you like wasting energy. You want to make the best seal possible. Start with a solid mechanical joint then add tape (regular duct tape or, better yet, aluminum tape) then for bonus points blow some stuff in it to seal microholes. The stuff resembles fix-a-flat.
It's worse than you think. Duck Tape ISN'T Duct Tape. And I'm tempted to think somebody ought to make a false advertising lawsuit about it...
Duct Tape meets particular HVAC standards - essentially you have to be able to tape ducts with it, which may be at fairly high temp.
(At least when I worked at Ace and sold it) Duck Tape is a brand of tape. The most common Duck Tape is a silver tape similar in appearance to Duct Tape but with a much less durable adhesive, especially if you warm it up a bit. It does not meet the standards to be Duct Tape - they can't and don't call it that. (In its defense, it was also significantly cheaper than "real" Duct Tape)
So Duck Tape is useful when you need a cheaper, lower quality tape than Duct Tape - and is mostly used by people who just don't know any better and who thought it was the same thing. Some of whom then complain to me that it doesn't hold as well as I described on a project. I also occasionally use it for color, because it is commonly available in a variety of colors.
There may be some Duck Tape that is also Duct Tape, but I haven't seen it.
Apparently I suck at explaining things recently. What I mean is this:
Normally drives expect a lot of wear in the "spinning" and "eccentric motion in the directions of spinning" ways. So they MUST be heavily engineered to resist these kinds of motions. Whereas they don't induce a lot of "shock perpendicular to spinning" wear.
Your specific explanation may well be spot on; I expect that bearing failure is exactly it. To better explain where I was coming from:
Usually if something is reasonably well engineered you want to store it with the direction of stresses most closely matching how you engineered it to work. But it never occurred to me to think about what orientation most closely replicated "spinning around really fast."
So, I bought into the idea that I wanted to consider setting up snapshots on a new server we're about to set up.
But then I realized that my idea of a snapshot and the practice are very different - I think I heard "versioning filesystem" when you said "snapshot"
First, for anyone not following my myriad posts on this thread, I like something I'm calling an "Incremental Versioning" or IV backup - it's an additive incremental backup leaving both versions available in the case of any change. To me this is key functionality, because I want to be able to run a small backup often and catch small file changes without using enough disk space for a full backup.
So, I like the idea of a versioning filesystem over RAID. Ideally the filesystem would manage a COW (copy on write) layer, so that it would keep both versions of changed files but only need one copy of unchanged files. So, used with RAID, you get the complete versioning of backups at a perhaps 1 + .1n (where n is the "number of nights of backups") multiplier depending on the situation, or possibly much less.
The Wayback filesystem does this, but it seems a little fringe for the filesystem in a production server.
Alternatively you can set up a cron job to rsync the data to a different, protected directory. Full backups use 1 + 1n space. But a full backup along with IV backups uses 1 + 1 + .1n or so for each cycle.
So, based on a little googling, I'm guessing that by snapshots you meant LVM snapshots. LVM does not seem to support versioning, and only seems to support "full backup" snapshots.
I understand that in cases of very high disk usage locking the drive momentary to create a snapshot may be valuable. But I see no other general advantages to using snapshotting over rsync on the same machine for the purposes of backup.
I think that if I were to use LVM snapshots I'd feel compelled to do rsync IV updates more frequently anyway, so I can fit more versions in less space.
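To put rough numbers on the space comparison in this post (1 + 1n for nightly full copies versus 1 + 1 + .1n for a full copy plus IV updates), here is a small sketch. Sizes are in multiples of the live data, and the 10% nightly change rate is just the ".1" guess from above, not a measurement.

```python
def full_backup_space(nights: int) -> float:
    """Live data plus one complete copy per night: 1 + 1n."""
    return 1 + 1 * nights

def iv_backup_space(nights: int, change_rate: float = 0.1) -> float:
    """Live data, one full copy, then only the changed files each night."""
    return 1 + 1 + change_rate * nights

for nights in (1, 7, 30):
    print(
        f"{nights:>2} nights: full = {full_backup_space(nights):.1f}x, "
        f"IV = {iv_backup_space(nights):.1f}x"
    )
```

The gap widens with every extra night you keep, which is why I'd rather spend the disk on more versions than on more full copies.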
Incidentally, this seems to justify my original idea that RAID as backup is bad, because this only works by giving up more than 50% of your available space without any disk-level redundancy.
So in the fringe case where you have two big disks and you only need less than 1/4 the space you have, that's great.
But given at least 3 disks and needing 1/4 the space you have I'd MUCH rather do a round robin backup and have no more than 1 complete copy per HD. I think having multiple _full_ copies on the same drive of most of your data is generally wasteful; they should be spread around.
In the general case:
A. It is much more efficient to cool/heat only exactly what you need cool/hot.
B. It is usually more efficient to use one big cooler than many small ones - but it can depend massively on the design/cost/age of the coolers/heaters. Generally, every real (irreversible) process generates waste heat. So, unlike ACs, _typical_ electric heaters can be 100% efficient and a heatpump is technically more than 100% efficient at generating heat in your room (based on "thermal energy changed in your room / power energy used " )
So if you wrap all the servers in a small space with the inside end of the ACs, you save a lot of power from "A". Since you only have accidental AC left for the rest of the room, it can get quite hot - but only if something else makes it hot (poorly insulated exterior walls, people, sun-facing windows...)
Hmm. As a Mechanical Engineer by training, your explanation doesn't seem right. On the other hand, I've _definitely_ seen weirder things that were true...
For "evenness" you'd want them to be flat, I'd think. But "even" isn't always better... My guess is more like the below, but it's just a guess and takes your statement as a precondition...
"In order to compensate for tiny bits of drive dissymmetry magnified by high rotational speeds a lot of cost goes into the resistance to side-to-side motion on the spindle.
On the other hand, I suspect you can't effectively touch the outside of the spindle while it's spinning so it has to be held up only by the force of spinning.
So if you put them perpendicular to gravity (flat) the platters will sag at the edges, and may especially move/grind if subjected to shocks (which are most likely in the gravity-direction). But if you put them parallel to gravity (on edge) they can only press against the spindle, and it already expects forces much greater than that."
"I'm sure you understand that. I'm just clarifying for anyone else: RAID+snapshots is nearly as good as backups. Either one alone is useful, but inadequate."
Exactly.
SuSE (Novell) has had a RAID installer like you describe - a very nice one, actually - for years; I believe I first used SuSE in 7.3 and it was there.
I believe that some of the newest FS versions automatically support snapshots in the FS driver. So theoretically installing RAID under a FS with the snapshots on should do what we want. But I still definitely want the installer to go a step further and say:
"YO! this is a big pretty button that's one step to reasonable data safety. Buy two harddrives, then click here! "
It was my post, and I didn't think it was funny. Except possibly the fact that I don't get many Funny mods, and then I got one, and then you took it away by pointing out it shouldn't have been funny. That's a painful kind of funny...
:)
But definitely the funniest part of that post.
I hadn't heard this before - which edge are they supposed to be stored on?
Are they better to _run_ "on edge" or flat?
If different, does this mean I should turn my machines sideways when I turn them off?
My fault, I wasn't clear. I agree with your reply completely. Let me rephrase:
I'm OK with snapshots. (snapshots are != RAID, of course)
I have the following tiny issues...
(rest of post)
In short, I think that snapshots are basically fine but personally I'd rather have my backup on a machine that wasn't a server in the "accepts incoming filesharing connections" sense.
I welcome the day when my mainstream Linux distribution comes with "install snapshot filesystem over RAID1" as a standard simple installation option.
I didn't mean it quite as simplistically as "accidental deletion" I basically mean that there are too many points of failure that I don't like: users operating the machine. Applications writing to the file. bad RAM. OS. FS drivers. drive controllers. viruses or malicious hacking. If any of these things go wrong, your data is toast, and it's likely toast on ALL of your mirrors.
Normally your RAID array is a read-write server, so a virus on a _client_ machine can wipe out big (enough to be important) sections of data. And it's more vulnerable to hacking because it's providing "public" services (at least on your intranet)
So I'm going to give two examples of where I think you're reasonably right:
1) A very well-secured RAID fileserver that doesn't actually give client machines permission to change/delete files. Using snapshots is a reasonable example - but it isn't the RAID that makes it a backup, it's the snapshots. And some FS have snapshots without requiring RAID...
2) Increasing redundancy by adding 2 drives to a backup server that is already operating behind the kind of protections I discussed originally.
In both of these cases I still think it's better to use a different backup machine because it's more redundant at a pretty marginal cost. [If you scale this up enough it's at NO additional cost, because you "fill" every backup server's HDs with one copy of as much data as possible. If you don't scale it up then it's a couple of lowend machines.]
I've definitely heard reports of multiple same-type HDs going bad at the same time... which makes sense if they're made at the same time and subjected to the same environment. Reducing the likelihood of more than one of my drives failing at the same time seems like common sense to me.
Timestamps are usually reliable with regard to whether the file has changed on the same machine - the chances of the clock being off such that the changed file has the identical timestamp are fairly low. But I certainly agree that adding MD5 checking is even safer.
With a normal incremental backup I would agree with you. But the "IV" system I discussed doesn't need to be reconstructed like that - which is pretty much why I set it up like that.
1) It's already writing to a filesystem. So it is already "reconstructed" on a system level.
2) It's not saving the diffs of individual files, it's saving full backups of the individual files - if they actually change.
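For anyone who wants to see the shape of it, here is a minimal sketch of the "IV" idea in Python. It is not my actual setup and it is not rsync: `iv_backup` and the timestamp-suffix naming are invented for the example, and it decides "changed" from size and modification time only (an MD5 comparison could be added for the paranoid).

```python
import shutil
import time
from pathlib import Path

def iv_backup(source: Path, dest: Path) -> list:
    """Copy new or changed files from source to dest, keeping old versions.

    A file counts as changed when its size or modification time differs
    from the copy already in dest. Returns the list of files copied.
    """
    copied = []
    stamp = time.strftime("%Y%m%d-%H%M%S")
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        target = dest / src.relative_to(source)
        if target.exists():
            s, t = src.stat(), target.stat()
            if s.st_size == t.st_size and int(s.st_mtime) == int(t.st_mtime):
                continue  # unchanged: the single existing copy is enough
            # Changed: set the old version aside instead of overwriting it.
            target.rename(target.with_name(f"{target.name}.{stamp}"))
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 preserves the modification time
        copied.append(target)
    return copied
```

Because every version is an ordinary whole file in an ordinary directory, a restore is just a copy; there is no chain of diffs to replay.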
If you have big disks and small data, I wouldn't object to writing everything to 2 parts of the same disk in addition to everything else. But really I'd rather put that effort towards more backup machines.
1. As I just mentioned in response to another post, I very much encourage your backups to actually be on another machine - if your server is own3d, or your OS/RAM/MB freaks out you have no idea what it'll do to your backup drive. If you're posting this on /., I figure you can get a machine out of the garbage and put this together... Again, it's my opinion that having more than one copy of the data per computer is a waste of HD. (other than for high availability, a la RAID)
2. I'm all about using your friend's internet connection to do this. Furthermore, in response to someone else - if you think your friend is spilling Cheerios on it.. a) get better friends and b) get MORE friends/backups. I'll take redundancy over perfection any day.
3. RAID is great in those situations where your intra-backup loss (ie, from a day) is very great.
I agree with you that a lot of people recommend RAID 10, but I think they are quite wrong OR they are using crap systems - my lengthy explanation follows.
Good RAID controllers use battery-backed write cache - that means they "accept" your write immediately and use a battery to actually put it on the harddrive LATER, even if the power goes out. This is a HUGE speed improvement for multiple small write situations, even with just ONE disk. I ignore this effect in the below discussion.
I'm going to assume a system where you have two similar drives on different buses on the same machine. I'm also going to assume that you're HD I/O bound (ie, the harddrive platters/heads are what's causing the slowdown, not your CPU) I'm also going to assume you do more reading than writing - at least more files if not more bits (which is pretty typical)
--- First, why RAID 0 is stupid (unless you're using very large files AND not using them at the same time) I'm going to compare RAID0 to just putting different stuff on different drives (for instance, OS/swap/apps on drive 0 and data on drive 1.) I'm calling this setup "noRAID"
RAID 0 is straight striping - it writes half of every file to each disk. This means that the _write_ time (from when it starts writing to when it finishes) is halved, but the _seek_ time (time to get the head to the right place to write) is exactly the same as a single disk. For writing very large files this is almost twice as fast. For writing smaller files it is not faster at all because the seek time totally overwhelms the time to actually do the writing. For reading the same thing is true: the "bulk" of reading a file is exactly twice as fast but the seek is not changed at all. So most of the time it really isn't faster except for really big files.
The short answer is that RAID0 is stupid because it has no benefits when seeking.
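Here is a toy model of that argument, as a sketch only. The 8 ms seek and 50 MB/s transfer rate are numbers I'm assuming for illustration, not measurements of any particular drive, and the model ignores caching and controller overhead.

```python
SEEK_S = 0.008            # assumed average seek time: 8 ms
RATE_BYTES_S = 50e6       # assumed sustained transfer rate per disk: 50 MB/s

def single_disk_time(size_bytes: float) -> float:
    """One seek, then the whole transfer on one disk."""
    return SEEK_S + size_bytes / RATE_BYTES_S

def raid0_time(size_bytes: float, disks: int = 2) -> float:
    """Every disk still has to seek; only the transfer is split."""
    return SEEK_S + size_bytes / (RATE_BYTES_S * disks)

def speedup(size_bytes: float) -> float:
    return single_disk_time(size_bytes) / raid0_time(size_bytes)

for size in (4e3, 1e6, 1e9):
    print(f"{size:>13,.0f} bytes: RAID0 is {speedup(size):.2f}x a single disk")
```

For small files the seek term dominates and the speedup stays near 1; it only approaches 2 once the transfer time dwarfs the seek.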
Compare this to just using 2 drives: if you try to read or write simultaneous small files that are on different disks, noRAID is absolutely _twice as fast_. If you try to write a single very large file RAID0 approaches being twice as fast as the write time becomes much larger than the seek time. Of course, the weak point in this argument is that sometimes you want two things on the same disk - then noRAID is only the same speed as RAID0 for small files. So noRAID doesn't average being actually twice as fast.
In addition, RAID0 is half as redundant because either disk failing destroys everything.
--- Second, why RAID1 is good.
RAID 1 is straight mirroring. On a modern RAID system (like Linux's SoftRAID) this gives you write performance identical to a single disk (assuming your CPU can always keep up). For multiple file reading, though, it performs better than any other setup, even _better than noRAID_, because it only needs to read from 1 disk and it reads from whichever disk has a head in a convenient spot to do THAT read.
It doesn't have the disadvantage of noRAID, because it ALWAYS has a copy of the data it needs on the
Lifespan is an important HD consideration. And furthermore, it's one that I forgot to mention. I have a fair number of 7 year old HDs running and at least 1 at 9. But I agree you can't count on that kind of lifespan.
At some intervals you should definitely add new backup servers with new harddrives and let them sync up. A really paranoid person might have 3 or so backup servers and might add a new one each year...
The great advantage of HD based backups is that adding an entirely new setup to the mix is very easy because you don't need to swap tapes to get _all_ of the data.
I'm ok with snapshotted volumes (on or off RAID) except for:
1. Single point of failure at the PS/MB/controller level.
2. If that machine is owned or the OS goes corrupt it may delete your snapshotted data also.
You missed my point fairly entirely. I have a hard time believing you could have read my post and not figured that out, so perhaps you're trolling - but in case you've managed to confuse anybody...
_hard drives_ are a perfectly acceptable backup medium. I went into great detail about that.
RAID is NOT a backup medium. Backup to harddrive != RAID. !!! RAID is explicitly about consistency - so if you (or a hacker) delete a file from a functioning RAID it immediately gets deleted from everything. This is not a backup.
Also, the linux-HA guys say you should NOT buy the same kind of disks because it increases the chance they fail at the same time.
Finally, you don't need to diff the files; you only need to gather the timestamps. (Your way is more space efficient, but I think the gains are marginal)