With a sufficiently parallel read load, command queueing buys you about 2x the IOPS. So if the drive was doing ~200 IOPS without it, it will be doing ~400 IOPS with it. But this is still nothing compared to the 20,000 IOPS random read an SSD can do. And, of course, NCQ has virtually no effect on write performance since the drive is draining the writes from its internal cache and the write command(s) themselves are effectively asynchronous.
Also, the keyword is 'parallel read load'. If the work load isn't parallel enough NCQ isn't going to help much.
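To put those figures side by side, here is a minimal sketch that turns the IOPS numbers quoted above into wall-clock time for a batch of random reads. The batch size of 100,000 reads is an arbitrary assumption for illustration, and the steady-rate model ignores everything else a real drive does.

```python
def random_read_seconds(num_reads, iops):
    """Time to finish num_reads random reads at a steady IOPS rate."""
    return num_reads / iops

# Figures quoted in the comment above; real drives vary.
HDD_IOPS = 200
HDD_NCQ_IOPS = 2 * HDD_IOPS   # ~2x, and only with a sufficiently parallel read load
SSD_IOPS = 20_000

NUM_READS = 100_000           # hypothetical batch size, chosen for illustration

for label, iops in [("HDD", HDD_IOPS), ("HDD + NCQ", HDD_NCQ_IOPS), ("SSD", SSD_IOPS)]:
    seconds = random_read_seconds(NUM_READS, iops)
    print(f"{label:10s} {seconds:8.1f} s for {NUM_READS} random reads")
```

Under these assumptions NCQ halves the time, while the SSD is still faster by a factor of 50 over the NCQ case.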
As with all storage systems, tuning can make a big difference. An SSD doesn't hold much over a HD if all you are doing are large linear reads. The moment you have to seek, though, the game's over.
Similarly, when writing to an SSD there is a big difference between doing random 512 byte writes and doing random, aligned 128K writes, and a big difference between doing unaligned 512 byte writes and doing aligned 8K writes (which is more typical of a filesystem). Larger aligned writes reduce the complexity of the SSD's lookup tables and reduce the need for complex write combining, leading to better performance in the long term.
This is hard to benchmark because most SSDs (particularly obvious with OCZs) will clean up write combining table complexity using idle time, and benchmarks usually don't give the SSDs enough idle time to do the work.
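The aligned-versus-unaligned point above can be sketched with a little arithmetic. This is only an illustration: the 4096-byte mapping unit is an assumption, since real SSDs use whatever internal granularity their firmware picks, and the function names are made up for the example.

```python
def pages_touched(offset, length, page_size=4096):
    """Count the page_size-sized units that a write of `length` bytes at `offset` overlaps.

    page_size is a hypothetical mapping granularity, not a property of any
    particular SSD.
    """
    if length <= 0:
        return 0
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1

def is_aligned(offset, length, page_size=4096):
    """True when the write starts and ends exactly on page boundaries."""
    return offset % page_size == 0 and length % page_size == 0

# An aligned 8K write covers exactly two whole units.
print(pages_touched(8192, 8192), is_aligned(8192, 8192))
# The same 8K shifted by 512 bytes straddles three units, two of them partially.
print(pages_touched(8192 + 512, 8192), is_aligned(8192 + 512, 8192))
# A 512 byte write only ever covers part of one unit.
print(pages_touched(512, 512), is_aligned(512, 512))
```

Every partially covered unit is one the drive has to track or merge separately, which is roughly where the extra table complexity and write combining come from.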
The general tradeoff between Intel and AMD is that AMD optimizes its instructions to run fast from cache and poorly when cache misses occur. Intel optimizes its instructions to run fast (as is possible) when cache misses occur and to run modestly otherwise. That's the best way I can describe it.
For a long time AMD was able to compensate by placing larger caches on the cpu die, and Intel didn't care and generally had smaller caches. That changed with core 2 duo and later chips (particularly SandyBridge). Now Intel has just as much cache as AMD on the cpu die AND it has a memory subsystem that is at least twice as fast or faster, AND a shorter pipeline (faster pipeline stall recovery on cache misses)... and better cache coherency handling... well, it all adds up to about 30%.
So for the SandyBridge vs phenom comparison, in a 100% L1 cache case AMD is somewhat faster (e.g. doing a syscall overhead test... a thousand instructions at most), but the moment you put a real workload on the sandybridge is considerably faster (e.g. gcc compile, with any amount of concurrency from 1 to N).
A lot of PC benchmarks are designed specifically to blow out AMD's caches and skew the numbers (or just outright not use AMD's higher-end floating point instructions) but Intel doesn't need to pull those shenanigans any more with SandyBridge. It's just faster hands down.
DragonFly's concurrency bottlenecks are basically down to the vm_page_t and pmap level now. vm_objects were just de-globalized a few months ago. It's still an issue in the vm_page allocation path and in pmap operations (particularly pmap_protect and anything else that has to run a pv_entry scan). And findpid() I guess too, and (like FreeBSD) we just don't care about non-critical-path drivers. That's pretty much it. If there are other global locks still present in the critical path they're there only because we haven't done stability testing with them removed yet.
A buildworld on the 48-core monster these days is more limited by serialization within the Makefiles themselves and not so much by concurrency issues. e.g. the 'ld' line for a utility is still just one process, and with the cores only running at 2GHz that causes the whole build to be slower than on faster single-chip multi-core cpus.
Intel core-i5-2310 Sandy Bridge - $190. AMD Phenom II X6 1090T Black - $170.
That's a $20 difference and BTW the i5 blows away the Phenom (any Phenom). You don't even need an i7.
Intel is able to price their cpus at a bit of a premium over AMD, which is why Intel is rolling in money and AMD is not. But there's a good reason why Intel has that pricing power and it's one word: "SandyBridge".
It is also true that the absolute highest-end unlocked Intel cpu is priced at a very serious premium... but if you are trying to compare roughly similar cpus there's nothing to compare that against.
The AMD Phenom (AM3 socket) series has one advantage over Intel for consumer cpus, and that is they all support ECC while Intel's consumer SandyBridge does not (caveat: you have to find an AMD mobo which supports ECC, not all of them do even though the socket format does). You have to move on to Intel's Xeon SandyBridge to get ECC, and there the pricing premium becomes significant. Very few people want the added cost of ECC (I seem to be the only one who really cares:-( ) for a consumer cpu. Intel clearly has pricing power here too.
I think this was true ~8 months ago but Intel mobos are priced about the same as AMD mobos these days... really ever since SandyBridge came out. There is so much chip integration now that the only real differentiation between mobos is added features and BIOS software.
Most of the costs involved in building a gaming system are unrelated to the cpu. I don't count built-in graphics as being decent, though you might, and I've fried enough systems with cheap PSUs that I don't buy cheap PSUs any more. So my concept of a decent gaming system is probably ~$100 to ~$150 more than yours.
At best whatever price advantage AMD might have goes away in $$ savings from the lower power consumption Intel systems have.
Basically you are right. AMD has nothing even remotely close to SandyBridge and Bulldozer won't get them there either. I've been a long-time AMD fan, and over the years AMD has saved me bundles of money with their socket compatibility.
But AMD has to make a socket switch now and there are way too few AM3+ mobos available. Not only that but the mobos that are available are wired for compatibility.. they will work with AM3+ cpus but they won't be able to make use of all the new performance capabilities. So right now jumping to whatever AMD comes out with next is going to require a mobo replacement, and there's no point buying any current AM3+ mobo to get it.
SandyBridge is 30% faster than AMD's fastest cpu (either the x4 running all cpus accelerated or the x6). In addition, SandyBridge uses 30% less power at similar load levels (whole systems are running around ~40W at idle without having to sleep). Think about it. It's a HUGE advantage for Intel.
This isn't a benchmark... this is running DragonFlyBSD (basically a BSD), and linux will have similar results, doing things like parallel gcc compiles and such. No benchmark fakery here. These are real loads. I have many high-end AMD systems and I also have an Intel i7-2600K system and it runs rings around both my Phenom x 6 black and my newer x 4 with all four cores running at top speed (which is actually faster than the x6 in most cases).
And whatever lead AMD had with overclockers before is gone now. People have been overclocking i7's to almost 5GHz with water cooling.
SandyBridge completely blows AMD away on raw memory bandwidth too. The performance is across the board.
So Intel definitely doesn't have to rush to come out with their next architecture. They have AMD by the throat.
I'm not sure why people think ARM will blow away Intel. ARM is a slow cpu. It doesn't come close to AMD or Intel in performance. It's a cpu for portable devices. ARM does have a major advantage in low power use and 'enough' cpu suds to run devices, and they are certainly taking market share away from desktops, but you won't be finding ARMs in high-end servers any time soon (or even ever). Intel has the best fabs in the world and regardless of what happens with their tit-for-tat with Apple they will be diving into the low power arena over the next few years anyway. They'll lose some share now, but they'll get it all back in a few years.
Right now though it isn't a big deal because Intel can charge a $100-$200 premium for their cpus over AMD, while AMD is forced to sell their cpus at firesale prices just to keep the pipeline going. SandyBridge is that good. For a server that premium takes less than 2 years in reduced power consumption to zero out AMD's price advantage. Intel is a major cash machine because of this. AMD is not. Big difference.
So AMD has lost the high-end cpu war. AMD still has a fighting chance in the integrated graphics arena for lower-end machines but remember Intel has a 2 year Fab advantage. Intel can destroy AMD in this arena too if they feel AMD is getting too much good press.
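The "premium pays for itself in power savings" argument above is simple arithmetic, sketched below. All of the inputs in the example call (a $100 premium, 60W saved, $0.12 per kWh, 24x7 operation) are assumptions picked for illustration, not measurements from the comment.

```python
def payback_years(price_premium_usd, watts_saved, usd_per_kwh, hours_per_day=24.0):
    """Years of operation needed for power savings to cover a purchase price premium."""
    kwh_saved_per_year = watts_saved / 1000.0 * hours_per_day * 365
    savings_per_year = kwh_saved_per_year * usd_per_kwh
    if savings_per_year <= 0:
        raise ValueError("no power savings, so the premium is never paid back")
    return price_premium_usd / savings_per_year

# Hypothetical inputs: $100 premium, 60W lower draw, $0.12/kWh, always on.
print(round(payback_years(100, 60, 0.12), 2))
```

With those made-up inputs the premium is recovered in roughly 1.6 years; plug in your own wattage and electricity price to see where your break-even lands.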
I'm sure there is but it's pretty simple. Both AMD and Intel make the complete chipset these days, instead of relying on third-party vendors like they used to. And they run in a fairly straightforward progression. Motherboard manufacturers may add additional discrete chips (a RAID controller is quite common) but the differentiation between mobo vendors is far, far less now than it was 4 years ago.
These core Intel or AMD chipsets essentially determine the major features of the mobo. On Intel mobos there's a little room to wiggle, e.g. more USB ports or more SATA ports. Choose your poison (I prefer more SATA ports myself).
I'm not going to list them all but it only takes about ~20 minutes with google to get a breakdown of chipsets and what they support. Then go from that base when selecting a motherboard.
Beyond that it comes down to how good the BIOS is. ASUS seems to be at the top of the pack and they charge a premium for the privilege (at least compared to e.g. Gigabyte, biostar, MSI, and other bulk mobo makers).
The system PSU is just as important as the mobo now. On newer systems I've started buying more expensive, better constructed PSUs because the cheap ones seem to go bad much more quickly than they used to. The main reason is that a modern day PSU has to pump out a huge amount of current at low voltages and if it isn't made right the safety mechanisms (which have to deal with the huge amount of current) also don't work properly. From a design standpoint PSUs that pump out higher voltages at lower currents are easier to construct than PSUs that pump out lower voltages at higher currents. There is more room for error. Since all modern mobos need lots of current at lower voltages... well.
And when a cheap PSU goes bad it can pump out a lot of voltage and destroy the mobo, hard drive(s), cpus... everything. Also cooling has become important enough that you can't really afford to have unused PSU wiring hanging around in the case any more, so having a modular PSU makes a big difference too. There are numerous quality PSU vendors, and hundreds of poor-quality vendors. Just google it.
I don't buy cases which include the PSU any more. They pretty much universally include a poor PSU. I've had too many burn out on me and I'm getting tired of it.
No more small fans, either. 120mm or better or I don't buy it (PSU or case). The bigger fans spin slower, are quieter, and last a lot longer. Even cpu fans, though most of my boxes still have smaller cpu fans in them.
Basically for SATA-III you really just want to use the Intel (or AMD) native SATA ports and you want to configure the BIOS to put them in AHCI mode and use a relatively modern mobo.
For Intel you want something based on SandyBridge, LGA 1155 socket, H67 chipset for a consumer workstation or server. Unlike AMD, Intel basically does not support ECC on their consumer cpus.
If you want to stuff your box full of memory, even with only four slots in a micro-atx form factor, I think you can fit ~16G or so which is going to give you the best bang for the buck in an Intel consumer mobo.
Higher-end server mobos with more memory slots tend to eat a ton more power, even with lower power memory installed. Most people don't need it unless they have no choice but to run VMs.
For AMD you need an AM3 mobo and chipset (AM3+ mobos exist but are typically wired for AM3 compatibility and don't really have any AM3+ features, and there are so few of them you wind up paying a premium for very little gain). You'll want to get the hex-cores with cpu acceleration and even with that AMD is far behind Intel's SandyBridge now. I have both types of systems and the best AMD has to offer is 30% slower and uses 30% more power than Intel, and only about $100 or so cheaper. You can get ECC in a consumer box but the power savings on the Intel chipsets alone easily make up for the $100 increase in price (without ECC on the Intel, of course).
--
For server systems you go with Intel Xeon or AMD opteron. AMD has 12-core opteron chipsets now but they only run at 2 GHz and the Intel Xeons eat a lot less power and have faster cores. The Intels will tend to be a better fit unless you are going for full-on VT with partitioned memory on a multi-socket opteron.
Intel Xeons are priced at a significant premium over Intel consumer cpus, despite being basically the same thing (just with ECC enabled). As always, Intel differentiates product lines by cpu socket but with AMD having to transition to a new socket now there's no longer an upgrade path for AMD beyond AM3 that doesn't also require a new mobo. To get ECC on an Intel you basically need a Xeon (Sandybridge based or better) as well as a mobo that supports ECC. The combination will cost you $200-$300 over a non-ECC consumer setup with the same cpu performance and mobo capabilities.
Personally speaking with the amount of ram one can stuff in even consumer mobos these days, I prefer ECC for any serious server work. You can get away without it if you only have a few systems but.... ah well. Intel's basically won and their pricing shows that they know it.
In terms of form factors, Micro-ATX is the way to go. Mini-ITX is smaller but you have fewer mobo and case choices and the better combinations tend to be priced at a premium. So typically you would want to go with a Micro-ATX form factor and focus on cases which take fewer larger fans and (typically) front-back airflow rather than smaller fans and front+top/side airflow.
Full-ATX gives you a lot more slots but you don't actually get all that much more in the way of PCIe lanes... it isn't usually worth it unless you really really need a dual-SLI graphics setup (and I've never seen the point considering the cost in power supply suds, noise, power use, and meltdown potential).
--
Pricing wise you can get a decent computer sans operating system built from parts for around $700. If you add a SSD in addition to your primary HD storage it will come to around $800. An SSD with the right setup generally boosts performance by a huge amount as it effectively allows your system to page memory to 40GB to 200GB of 'swap', or to otherwise use the SSD to cache data. But I'd still put the primary OS and file storage on the HD. Use the SSD strictly for caching.
Generally speaking if you want SATA-III to operate satisfactorily you need to use the AHCI controller built into the cpu chipset bundle. That is, the one that Intel and AMD bundle. That will get you a reliable 32-tag-per-port controller. You definitely do not want to use an external controller or a third-party chipset controller (aka Marvell), at least not if you can help it. You won't have a choice if you want hardware RAID, AMD and Intel's controllers don't do RAID (BIOS-based fakeraid doesn't count).
All chipsets have bugs, even AMD and Intel chipsets. Intel AHCI controllers have problems probing Intel SSDs (go figure) and require a driver workaround to unbrick the port when the problem occurs during probe. AMD chipsets don't mask phy errors during initial training, which creates a lot of superfluous interrupts. Both controllers play fast and loose with the AHCI spec and the AHCI spec itself is pretty badly designed, with tons of issues (though not as badly designed as the immensely idiotic USB HCIs).
Another big problem is that the firmware controller that runs the chipset side of the AHCI is typically responsible for ALL the SATA ports, which means that hotplug on one port can actually interfere with operations on another. It pisses me off, but there's no avoiding it.
The external chipsets are even worse. Marvell is a joke. Silicon Image chipsets are full of HARDWARE bugs (not just firmware bugs) which require a lot of workarounds in driver code (for example, you can't abort a soft-reset sequence reliably on a SIL chipset and you can't access the on-chip shared memory while commands are in progress without corrupting any DMA that happens to be occurring).
The stuff is getting better, slowly. The manufacturers of these chipsets have traditionally not really cared about these sorts of bugs because 99.9% of their users are consumers who don't care. The remaining 0.1% professionals who do care aren't a big enough crowd to make the manufacturers actually fix their firmware.
SATA at least has the AHCI spec, too bad more chip manufacturers don't use it. If you want to talk wireless and ethernet chipsets matters are far, far worse.
-Matt (who wrote and maintains DragonFly's AHCI driver)
That's my take. And unlike a hard drive, firmware is something which can be continuously improved. SSD manufacturers are starting to understand and deal with the failure modes.
One thing they don't mention is off-line storage. If I take a hard drive out of service and store it on a shelf for a year, it's virtually guaranteed to fail when I power it up. That is, every single HDD I've taken off the shelf will tend to work for a short while, long enough for me to get the data off of it usually, but every single one has failed within a month of being repowered.
I expect over the next few years the combination of firmware improvements and flash improvements (that is, improvements in being able to predict a flash cell failure) will result in the SSDs running away in the reliability department. Hard drives have been around for a long time and yet they still fail at a horribly high rate... too high for the higher capacities they now have. Intel has certainly already seen the light.
Several vendors are now putting high-capacity caps in their SSDs to remove one common failure mode... exploded meta-data tables due to unexpected power downs, which is a particular problem for SSDs which use idle time for wear leveling activities.
One thing for sure, we are going to get some excellent statistics over the next decade.
It's really quite simple. Without regulation the automakers will wind up in another race to the bottom (mpg wise) for the heavier SUV-style vehicles consumers still love. The problem with that is that the 2008 crash showed that doing that is suicide and the gasoline price point that puts you into suicide territory is $4-5 (which we are blasted close to already). Only a regulation-imposed bottom can actually allow the automakers to compete with each other in a more comfortable mpg zone.
For lighter vehicles the automakers are terrified not only about a possible double-dip recession but also any spike in fuel prices ripping the bottom out of their other markets. They know they need a viable high mpg product for consumers to shift to when those spikes occur. Without regulation these products will simply not have good enough profit margins due to competition against lower-mpg products during periods where gasoline prices are lower.
Basically, they've seen the light, but nobody should be fooled into thinking that the automakers have suddenly become environmentally conscious. There's a reason why Japanese vehicles almost destroyed American-made vehicles in the U.S. market post-crash-2008. Japanese automakers already had to contend with a large non-US consumer base desiring fuel efficiency so they had the products ready to go when the American market for gas guzzlers cratered. The American auto makers can't compete with the Japanese without regulation! It may sound ass-backwards but this is a case where regulation will actually improve margins for the automakers. It's that simple.
Well, I certainly was not expecting you to use RCS as a comparison point. RCS is utterly horrible when dealing with large data sets. Any modification to a file requires rewriting the entire rcs file and doing something like, oh, tagging, requires rewriting every single file in the repo. Every single one.
RCS is a very filesystem-heavy repo management system. Updates, checkouts, pretty much everything you do *except* single-file log displays are expensive. Such operations have to scan or access nearly every file in the repo and at least stat every file in the checked out tree. For large repos with hundreds of thousands of files RCS/CVS is nasty as hell.
Nor can you reliably mirror or replicate an RCS or CVS repo. Neither rsync nor cvsup are capable of reliably replicating a live, heavily used RCS/CVS repo. I've tried many times... I have to mirror the NetBSD CVS repo to get their pkgsrc into a git mirror and it takes a complex script to try to detect a point where the entire CVS repo is quiescent. Even with the quiescence check my script *still* has to do a full cvs checkout and an actual diff -r between the checked out CVS repo and the checked out git repo to catch occasional failures.
In short RCS/CVS is a mess. GIT is not a mess. With git you just use git-daemon and git:// URLs and you can get massive, reliable replication of the repo.
The only other issue involved here seems to be one of machine resources. But in today's world machine resources are cheap. Even a large 50G+ repo trivially fits on a sub-$100 2TB hard drive, and it takes only a moderately-sized SSD caching layer (~100G) to make the repo operations efficient. That's cheap enough that every developer can keep multiple full repos on their workstations.
In many respects the GIT concept has grown into its own by virtue of the greatly improved storage resources available on today's machines. In the 80's and the early 90's a centralized repo would have been far more important simply by virtue of the relative disk space required. In 2011 the relative disk space required for even a large repo is tiny.
I'm not surprised that Google wants people to use actual account names. It still doesn't have to be your real name, you can always create another google account after all! So the title is misleading. What Google is doing is not allowing people to trivially create dozens or hundreds of pseudonyms from one convenient account.
The pseudonym mechanic completely destroyed Yahoo's message boards. And I mean completely. The abuse is so high that the value of the boards is gone. They're worthless now. Google is taking that lesson to heart, hopefully.
-Matt
Re:It's because (on "The Rise of Git", Score: 3, Informative)
Yes, but at the same time I only recall a few minor instances where I ever wanted to extract just a portion of a CVS archive, and the only reason was because, at the time, the system I was running on wasn't all that fast.
These days extracting a repo, even a large one, doesn't take all that much time, nor is disk space that big an issue. I just extract the whole thing (git, cvs, whatever) and then pick out what I want.
It only takes ~3 seconds or so to switch branches on a checked out repo of around ~100,000 files, and certainly less than ~10 seconds to do an initial checkout of such a repo. Not to mention the fact that 2TB hard drives are $100 these days so there's no real excuse to be tight on disk space.
When I first started using git I did worry somewhat about disk space. I quickly came to the conclusion that a few extra gigabytes didn't matter in today's world of cheap multi-terabyte hard drives. I typically have 4-5 copies of the DragonFly source base broken out, each with its own copy of the .git repo. A simple git pull is all I need to synchronize whatever directory I've decided to work in (since I'm often reviewing other developers' branches I have multiple independent copies). That's how little I care these days.
That said, it *is* possible to tell git to use hard links or otherwise share repo files in order to reduce the size of the .git/ subdirectory in the checkout directories. We do this on our developer box (where each account is given its own private repo which syncs against the DragonFly master repo). I don't bother optimizing my own personal copies though.
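For anyone wondering what that looks like in practice, here is a minimal sketch that builds the relevant git clone command lines. The `--local` and `--shared` options are real git clone flags; the `clone_command` helper, the mode names, and the paths are made up for this example, and this is not a description of how the DragonFly developer box is actually configured.

```python
def clone_command(source_repo, dest_dir, mode="hardlink"):
    """Build a git clone argv that shares object storage with a local source repo."""
    if mode == "hardlink":
        # --local: for a source repo on the same filesystem, git hard-links the
        # files under .git/objects instead of copying them when it can.
        return ["git", "clone", "--local", source_repo, dest_dir]
    if mode == "shared":
        # --shared: no links or copies; the clone borrows objects from the source
        # repo through .git/objects/info/alternates.
        return ["git", "clone", "--shared", source_repo, dest_dir]
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical paths, for illustration only.
print(" ".join(clone_command("/repos/master.git", "/home/dev/src")))
print(" ".join(clone_command("/repos/master.git", "/home/dev/src2", mode="shared")))
```

To actually run one, pass the list to subprocess.run(..., check=True). Note that a --shared clone depends on the source repo keeping its objects around, so the hard-link form is the safer default if the source gets pruned or repacked aggressively.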
And one final thing to note... if the filesystem can de-duplicate data, having a lot of copies lying around is even less of an issue. I've never had to depend on de-dup... it's kinda hard to actually run a 2TB drive that isn't being used to archive media files out of space... but it does work particularly well on backup machines.
Well, you should post with your real name if you are going to make such an encompassing statement, instead of anonymously. I'm kinda wondering what repo management tools you are using that can handle 50GB+ data sets that you are trying to compare against something like git?
From my experience handling large data sets is less a function of the repo and more a function of disk bandwidth and memory. Putting, say, a million files into a repo (any repo) is not a big deal but managing it will definitely be dependent on the operations you are trying to do, memory, and storage.
I've found, in general, for the DragonFly project which manages ~500MB and ~1GB repos, that regardless of the repo if you want operations to be efficient you need to have a high speed caching layer helping out the filesystem. For DragonFly, of course, that means having a SSD in the system and using the swapcache feature to cache filesystem data and meta-data on the SSD. Then repo operations run fast regardless of the repo system used.
A nominally priced SSD can cache 100G of data fairly cheaply, so handling a large repo isn't a big issue. In our case we have two machines which keep about a dozen repos from different projects synchronized, in order to make them available to our developers, as well as perform incremental translations from CVS to GIT for pkgsrc (which is an ultra-nasty script). The scripts run twice a day and have a run-time of around an hour with the SSD caching layer. Without the SSD caching layer those scripts will take 6+ hours of time to run (12 hours a day of run time if I run them twice a day). The SSD makes a huge difference in manageability.
In terms of trying to manage fewer larger files, such as images... large numbers of binary files are best managed outside the repo infrastructure. Sure, a few here and there (such as a web site's icons) can easily be managed inside a repo, but trying to manage large amounts of bulk data in a repo generally just results in a lot of unnecessary pain. It's better to manage bulk data in a filesystem capable of performing snapshots.
Similarly for backups... repos aren't good mechanisms for making backups. You want something more closely integrated with the filesystem (and the filesystem's snapshot capabilities presuming you are using a filesystem with snapshot capabilities) to do LAN and off-site backups. Not a repo.
So the question here is: Are you complaining about the amount of time it takes to do an operation due to being disk-bound, or are you talking about bugs in the repo system causing the program(s) to crash or eat too much memory? I haven't had any significant memory issues with git myself though I can definitely see needing a 64-bit VM space if the repo becomes large enough for certain operations.
-Matt
Re:Git could use revision numbers (on "The Rise of Git", Score: 2)
Yes, for a git user the sha key is effectively the commit id / revision number, and it works incredibly well. I don't miss the crazy multi-dotted revision numbers from e.g. CVS, or even the simplified version numbers from svn, or anything else. The sha commit id works so well in git that our kernels include the first few digits of it in their version string printed out in the dmesg, which makes figuring out the basis for a bug report very easy.
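As a rough illustration of stamping a build with the commit id, here is a small sketch. The `-g<prefix>` layout, the `version_with_commit` helper, and the example version and sha are all assumptions for the example; it is not the exact format DragonFly prints in dmesg.

```python
import re

def version_with_commit(base_version, commit_sha, digits=7):
    """Append the first few hex digits of a full commit sha to a version string.

    The "-g<prefix>" layout is a hypothetical format chosen for this sketch.
    """
    if not re.fullmatch(r"[0-9a-f]{40}", commit_sha):
        raise ValueError("expected a full 40-digit lowercase hex sha")
    return f"{base_version}-g{commit_sha[:digits]}"

# Made-up version and sha, for illustration only.
print(version_with_commit("v1.0", "0123456789abcdef0123456789abcdef01234567"))
```

A bug reporter only has to paste the version line, and the prefix is normally enough for git to resolve the exact commit the kernel was built from.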
Our use of git effectively has a master repo as well, and it is kept very clean relative to developers' local repos which have all of their local development branches.
But I think the most important feature is the utterly trivial incremental replication git supports. When we ship a new release we just include the current git repo in the disk image. If someone installs from that image they can then update their on-disk repo incrementally using the shipped repo as a base and only pull down a small amount of data over the network versus having to download the entire repo.
The incremental replication is also extremely reliable when using the git server feature (git://...), night and day compared to trying to distribute a CVS repo. My CVS repo synchronization scripts are ridiculously complex: rsync, find the most recent change, rsync again, over and over until the repo is found to be stable, and even then there is no guarantee that you have a stable copy (and no, cvsup doesn't work too well either).
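The "rsync until it stops changing" loop described above looks roughly like the sketch below. This is a simplified stand-in, not the actual DragonFly script: `sync_until_stable`, `newest_mtime`, and the round limit are names and choices invented for the example, and the sync step is passed in as a callable so the loop itself stays generic.

```python
import os
import time

def newest_mtime(root):
    """Most recent modification time of any file under root (0.0 if there are none)."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                newest = max(newest, os.stat(os.path.join(dirpath, name)).st_mtime)
            except FileNotFoundError:
                pass  # file vanished between the walk and the stat
    return newest

def sync_until_stable(sync_once, fingerprint, max_rounds=10, settle_seconds=0.0):
    """Repeat sync_once() until fingerprint() is unchanged across two consecutive rounds.

    Returns the number of rounds used, or None if the copy never settled.
    Even a successful return is only a heuristic, not a consistency guarantee.
    """
    previous = None
    for round_no in range(1, max_rounds + 1):
        sync_once()
        current = fingerprint()
        if previous is not None and current == previous:
            return round_no
        previous = current
        time.sleep(settle_seconds)
    return None
```

In a real mirror, sync_once would shell out to something like rsync -a from the upstream repo into a local directory and fingerprint would be newest_mtime on that directory (rsync -a preserves modification times). As the comment says, the mirror can still be caught mid-commit, which is why a follow-up checkout and diff is needed on top of this.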
Being able to have a chain of git repos from a single master which are incrementally updated in a reliable fashion makes distributing code bases utterly trivial, and being able to ship a git repo and then incrementally update it over the net to bring it up to current is priceless. It's impossible to replicate with server-only repos.
I have no regrets switching the DragonFly project over to git.
Even dealing with pkgsrc is a lot easier in git than it is with CVS. We want to make pkgsrc available to our users but the master repo (which is in NetBSD's CVS repo) is impossible to keep synchronized using CVS, let alone be able to distribute to our user base in an efficient fashion. So what we do instead is track the CVS with a cron job and dump it into a git repo which we distribute to our userbase instead. The scripts are complex, but they work quite well and we can use the same trick of shipping out the current pkgsrc set as a git repo and have users simply do a small, simple, short, low-bandwidth incremental update to synchronize it with the latest available data.
I'm not sure that repo-managed commit meta-data is necessarily a good thing, other than having the (obvious) log entry. Meta-data is so project-dependent that it is probably better to implement it as a wrapper and store it as a file within the repo instead of trying to integrate it with the repo.
Among other things, anyone who's tried to manage meta-data in repos for any length of time knows how nasty things can get when you need to edit previously committed meta-data. In other words, meta-data management can have far more unintended side effects than one might otherwise expect, and the result is that the meta-data management winds up locking you into a particular repo product (which is very bad).
The simpler the data management, the less tied you are to the repo.
More horse-tripe, unfortunately. 'Unfunded liabilities' is just another way of projecting a deficit into the future and assuming nothing will ever be done to deal with it. It's a fun way to come up with a big-assed number but worthless in any real discussion.
The reality is that no deficit survives long enough to come remotely close to the numbers the talking heads spew out. Changes are made (or forced to be made) long before then.
Just like the 'hundreds of trillions' of dollars of face value people were screaming about for CDSs, these numbers are meaningless. You are taking the scare-mongers' statements hook, line, and sinker without bothering to understand what the numbers actually mean.
Here's a little hint: Everything congress mandates comes out of somebody's pocket. 'Funded' vs 'Unfunded' is a feel-good gimmick used by politicians. It doesn't matter how it is designated. You know, our two wars have been unfunded as well... congress simply tacked their costs onto the debt and didn't count them at all in their budgets. For ten fracking years.
Similarly if you were to talk about, say, Medicare, throwing around big numbers won't help you actually solve the problem.
p.s. this is also why people generally have no clue as to how healthy or non-healthy the banking system is. From an investor's standpoint the banking system is still in trouble, but from a fundamental solvency standpoint the banking system is not. Anyone looking for a large U.S. bank to go under due to mortgages or European debt or whatever is going to be mightily confused when it doesn't materialize.
We got the same tripe with the FDIC too. People were screaming about the FDIC's balance sheet going negative and having to dig into their line of credit with the Fed. It didn't happen, and anyone with a clue knew it wouldn't happen. But the screaming masses listening to the talking heads thought it might and some people (obviously) STILL think it might. Sigh.
In any case, if you want to talk meaningful numbers then stick to the actual debt, which is $14T. Stick to current costs (Medicare is good fodder here, but Social Security isn't). There's plenty to talk about there without having to throw out meaningless synthesized numbers.
You shouldn't just believe every bit of sensational news you hear on T.V. This isn't even remotely true.
You know, the same sort of crap screwed up the municipal bond market for almost 6 months. One talking head on T.V. and suddenly the mob thinks munis are going to collapse when, in fact, the default rate was actually going down to historic lows.
This isn't really news. These weren't even real loans, they were just 28-day backstops during the money market meltdown. Just like the inflated values reporters loved to throw around about CDSs, it's more of the same here. They're just adding them all together sequentially (and conveniently forgetting to report the short durations and senior debt status).
And, really, only a complete fool hopes and prays for the banking system to fail.
Go looking somewhere else, this was one thing the Fed actually did right. And like TARP, the government didn't lose any money doing it either.
If you want to complain about something complain about the use of the AIG bailout as an indirect method of bailing out the (mostly bank) counterparties. That was real money that didn't have to be paid back to the government.
I think all they mean is that DRAM isn't really all that cost effective as a data cache for data that one intends to export out the network. Storing that data on a SSD, assuming it's a relatively static data set (which most is), uses far less power and costs less than purchasing an equivalent amount of DRAM (and the much larger mobo required to hold that DRAM). The access times are plenty fast enough to still saturate the network. That's all. Not rocket science.
This has been known for several years. A small server with 8-16G of ram + a 160G SSD + a 2TB HDD sits right on the sweet spot. In fact, even 4G of ram would probably be fine. The idea is not to replace your hard drive but instead to insert another layer of cheap caching to avoid having to maintain a complex, expensive, power hungry HDD storage system just to get better throughput.
Yah, that's me (DragonFly). You mean for the default password encryption? We switched the password encryption to sha-256 by default as one of the google code-in projects. I wasn't particularly involved but I'm sure you are correct. All I can say is that it was better than what we had before:-)
I guess SHA is turning out to not be quite as secure for its key-length as RSA was/is.
My personal position on password files is that it doesn't matter how good the encryption is. Social, physical, and psychological algorithms radically reduce the number of attempts needed to figure out someone's badly thought-out password and things like requiring a number, a capital letter, and a minimum of 8 or 10 chars just isn't good enough. Businesses can't really require ultra-secure passwords without pissing off their user base, so it's a losing proposition no matter how you twist it.
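To put a rough number on "just isn't good enough", here is a minimal sketch of the best-case keyspace for a truly random password. The guessing rate is a hypothetical figure picked for illustration, not a measurement, and real human-chosen passwords sit far below this upper bound, which is the whole point.

```python
import math

def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy for a uniformly random string of the given length."""
    return length * math.log2(alphabet_size)

def worst_case_days(bits: float, guesses_per_second: float) -> float:
    """Days needed to try every candidate at a fixed offline guessing rate."""
    return (2.0 ** bits) / guesses_per_second / 86400.0

if __name__ == "__main__":
    RATE = 1e9  # hypothetical offline guesses per second (assumption)
    for length in (8, 10):
        bits = keyspace_bits(62, length)  # alphabet: a-z, A-Z, 0-9
        days = worst_case_days(bits, RATE)
        print(f"{length} chars: {bits:.1f} bits, {days:.1f} days to exhaust")
```

An attacker with a stolen password file gets to run this loop offline for as long as they like, and the social and psychological shortcuts shrink the search space well below the numbers printed here.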
So we basically do not allow passwords at all on any of our development machines or servers, with exception to console access (a weak point to be sure but there isn't a whole lot I can do about it). The console server itself uses ssh. The master password file is *'d out.
Developers have to login via ssh or not at all, and are not allowed to use private keys on shared shell machines (they have to key-forward or not fan-out from the shared shell machine at all, which is preferable). Then a security breach can't fan-out into multiple vectors as it does when an (encrypted) master password file is stolen from a computer and the attacker has an infinite amount of time to crack multiple passwords offline, or if the private key is stolen.
The biggest problem businesses have when their sites get hacked is not the closing of the security hole, but instead preventing the attacker from re-entering the system via an infinite number of broken passwords. Having to force millions of users to change their passwords is becoming a non-starter (I think Sony found that out the hard way).
Random passwords might be more workable but since random passwords cannot be remembered anyway one might as well go with a more secure and much longer public key.
Very cool. Any chance of using Intel's AES-NI instructions on the i5 and i7 to shortcut some of the work for the cpu version? Or are they too specific to AES? AES-NI can run AES (I forget exactly which one) at something like 3 GBytes/sec on an i7.
(And, of course, programming it directly into a FPGA, but that's a separate topic).
I really doubt that forced repurchase clause is even remotely legal. The whole point is for the vesting period to be the carrot, and not anything else. If shares had vested and he elected to exercise the options (within typically 90 days of employment if terminated, for any vested options), then he owns the shares free and clear and Skype can't steal them back. Unvested shares are typically lost, and that is standard.
Contracts often have a right of first refusal, that is if an employee owns stock on a company which has not gone public yet and that employee wishes to sell those shares to another person in a private transaction, the company has a right to purchase those shares at that same price first.
But I've never heard of a company being allowed to force a shareholder to sell shares back to the company at a price determined by the company. I really doubt that would stand up in court, because prior to going public it is the company itself that sets the fair market value for the shares (not the public market). It would be ripe for fraud otherwise.
I think this person has a real case if they decide to go to court. Skype should never have put such a clause in their employment contract. I don't know what the hell they were thinking.
Sometimes it's hard to back off the excess greed. Kudos to them. That's a lot of money to be able to retire on, easily $15M each after taxes. About $750K in income every year if maintaining their basis at-par (inflation adjusted) with a conservative portfolio.
It's not even worth writing a long essay about. Oooooh, it's 'the cloud', by which they mean it's just a client/server model like most of the internet already resembles. Sheesh.
With enough of a parallel read load command queueing buys you about 2x the IOPS. So if the drive was doing ~200 IOPS without it it will be doing ~400 IOPS with it. But this is still nothing compared to the 20,000 IOPS random read a SSD can do. And, of course, NCQ has virtually no effect on write performance since the drive is draining the writes from its internal cache and the write command(s) themselves are effectively asynchronous.
Also, the keyword is 'parallel read load'. If the work load isn't parallel enough NCQ isn't going to help much.
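The arithmetic behind those figures, as a quick sketch. The 2x factor and the IOPS numbers are the round figures quoted above; the million-read workload is just a made-up example size.

```python
def iops_with_ncq(base_iops: float, speedup: float = 2.0) -> float:
    """Best-case IOPS with command queueing under a parallel read load."""
    return base_iops * speedup

def seconds_for_reads(num_reads: int, iops: float) -> float:
    """Wall time to service num_reads random reads at a steady IOPS rate."""
    return num_reads / iops

if __name__ == "__main__":
    hdd = 200.0                    # HD without NCQ (figure from the text)
    hdd_ncq = iops_with_ncq(hdd)   # 400.0 with NCQ
    ssd = 20000.0                  # SSD random read (figure from the text)
    reads = 1_000_000              # hypothetical workload size
    for name, rate in (("HD", hdd), ("HD+NCQ", hdd_ncq), ("SSD", ssd)):
        print(f"{name:7s} {seconds_for_reads(reads, rate):8.0f} s")
    print(f"SSD advantage over HD+NCQ: {ssd / hdd_ncq:.0f}x")
```

Even after NCQ doubles the hard drive's rate, the SSD is still 50x ahead on random reads.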
As with all storage systems, tuning can make a big difference. A SSD doesn't hold much over a HD if all you are doing are large linear reads. The moment you have to seek, though, the game's over.
Similarly when writing to a SSD there is a big difference between doing random 512 byte writes and doing random, aligned 128K writes, and a big difference between doing unaligned 512 byte writes and doing aligned 8K writes (which is more typical of a filesystem). Larger aligned writes reduce the complexity of the SSD's lookup tables and reduce the need for complex write combining, leading to better performance in the long term.
This is hard to benchmark because most SSDs (particularly obvious with OCZs) will clean up write combining table complexity using idle time, and benchmarks usually don't give the SSDs enough idle time to do the work.
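A minimal sketch of the aligned vs unaligned distinction above. The 8K block size is only an assumption standing in for whatever granularity the drive tracks internally (which varies by SSD and usually isn't published), and the helper names are made up for illustration.

```python
def is_aligned(offset: int, length: int, block_size: int) -> bool:
    """True if the write starts and ends exactly on block boundaries."""
    return length > 0 and offset % block_size == 0 and length % block_size == 0

def blocks_touched(offset: int, length: int, block_size: int) -> int:
    """Number of distinct blocks a write of `length` bytes at `offset` overlaps."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1

if __name__ == "__main__":
    BLOCK = 8192  # assumed internal granularity, not a vendor figure
    # An unaligned 512-byte write straddling a boundary dirties two blocks
    # and forces the drive to merge old and new data in both.
    print(is_aligned(8000, 512, BLOCK), blocks_touched(8000, 512, BLOCK))
    # An aligned 8K write replaces exactly one block outright.
    print(is_aligned(16384, 8192, BLOCK), blocks_touched(16384, 8192, BLOCK))
```

Partial-block writes are the ones that feed the write combining tables; whole-block aligned writes give the drive nothing to combine.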
-Matt
The general tradeoff between Intel and AMD is that AMD optimizes its instructions to run fast from cache and poorly when cache misses occur. Intel optimizes its instructions to run fast (as is possible) when cache misses occur and to run modestly otherwise. That's the best way I can describe it.
For a long time AMD was able to compensate by placing larger caches on the cpu die, and Intel didn't care and generally had smaller caches. That changed with Core 2 Duo and later chips (particularly SandyBridge). Now Intel has just as much cache as AMD on the cpu die AND it has a memory subsystem that is at least twice as fast, AND a shorter pipeline (faster pipeline stall recovery on cache misses)... and better cache coherency handling... well, it all adds up to 30%.
So for the SandyBridge vs phenom comparison, in a 100% L1 cache case AMD is somewhat faster (e.g. doing a syscall overhead test... a thousand instructions at most), but the moment you put a real workload on the sandybridge is considerably faster (e.g. gcc compile, with any amount of concurrency from 1 to N).
A lot of PC benchmarks are designed specifically to blow out AMD's caches and skew the numbers (or just outright not use AMD's higher-end floating point instructions) but Intel doesn't need to pull those shenanigans any more with SandyBridge. It's just faster hands down.
DragonFly's concurrency bottlenecks are basically down to the vm_page_t and pmap level now. vm_object's were just de-globalized a few months ago. It's still an issue in the vm_page allocation path and in pmap operations (particularly pmap_protect and anything else that has to run a pv_entry scan). And findpid() I guess too, and (like FreeBSD) we just don't care about non-critical-path drivers. That's pretty much it. If there are other global locks still present in the critical path they're there only because we haven't done stability testing with them removed yet.
A buildworld on the 48-core monster these days is limited more by serialization within the Makefiles themselves than by concurrency issues. e.g. the 'ld' line for a utility is still just one process, and with the cores only running at 2GHz that causes the whole build to be slower than on faster single-chip multi-core cpus.
-Matt
Intel core-i5-2310 Sandy Bridge - $190.
AMD Phenom II X6 1090T Black - $170.
That's a $20 difference and BTW the i5 blows away the Phenom (any Phenom). You don't even need an i7.
Intel is able to price their cpus at a bit of a premium over AMD, which is why Intel is rolling in money and AMD is not. But there's a good reason why Intel has that pricing power and it's one word: "SandyBridge".
It is also true that the absolute highest-end unlocked Intel cpu is priced at a very serious premium... but if you are trying to compare roughly similar cpus there's nothing to compare that against.
The AMD Phenom (AM3 socket) series has one advantage over Intel for consumer cpus, and that is they all support ECC while Intel's consumer SandyBridge does not (caveat: you have to find an AMD mobo which supports ECC, not all of them do even though the socket format does). You have to move on to Intel's Xeon SandyBridge to get ECC, and there the pricing premium becomes significant. Very few people want the added cost of ECC (I seem to be the only one who really cares :-( ) for a consumer cpu. Intel clearly has pricing power here too.
-Matt
I think this was true ~8 months ago but Intel mobos are priced about the same as AMD mobos these days... really ever since SandyBridge came out. There is so much chip integration now that the only real differentiation between mobos is added features and BIOS software.
Most of the costs involved in building a gaming system are unrelated to the cpu. I don't count built-in graphics as being decent, though you might, and I've fried enough systems with cheap PSUs that I don't buy cheap PSUs any more. So my concept of a decent gaming system is probably ~$100 to ~$150 more than yours.
At best whatever price advantage AMD might have goes away in $$ savings from the lower power consumption Intel systems have.
-Matt
Basically you are right. AMD has nothing even remotely close to SandyBridge and Bulldozer won't get them there either. I've been a long-time AMD fan, and over the years AMD has saved me bundles of money with their socket compatibility.
But AMD has to make a socket switch now and there are way too few AM3+ mobos available. Not only that but the mobos that are available are wired for compatibility.. they will work with AM3+ cpus but they won't be able to make use of all the new performance capabilities. So right now jumping to whatever AMD comes out with next is going to require a mobo replacement, and there's no point buying any current AM3+ mobo to get it.
SandyBridge is 30% faster than AMD's fastest cpu (either the x4 running all cpus accelerated or the x6). In addition, SandyBridge uses 30% less power at similar load levels (whole systems are running around ~40W at idle without having to sleep). Think about it. It's a HUGE advantage for Intel.
This isn't a benchmark... this is running DragonFlyBSD (basically a BSD), and linux will have similar results, doing things like parallel gcc compiles and such. No benchmark fakery here. These are real loads. I have many high-end AMD systems and I also have an Intel i7-2600K system and it runs rings around both my Phenom x 6 black and my newer x 4 with all four cores running at top speed (which is actually faster than the x6 in most cases).
And whatever lead AMD had with overclockers before is gone now. People have been overclocking i7's to almost 5GHz with water cooling.
SandyBridge completely blows AMD away on raw memory bandwidth too. The performance is across the board.
So Intel definitely doesn't have to rush to come out with their next architecture. They have AMD by the throat.
I'm not sure why people think ARM will blow away Intel. ARM is a slow cpu. It doesn't come close to AMD or Intel in performance. It's a cpu for portable devices. ARM does have a major advantage in low power use and 'enough' cpu suds to run devices, and they are certainly taking market share away from desktops, but you won't be finding ARMs in high-end servers any time soon (or even ever). Intel has the best fabs in the world and regardless of what happens with their tit-for-tat with Apple they will be diving into the low power arena over the next few years anyway. They'll lose some share now, but they'll get it all back in a few years.
Right now though it isn't a big deal because Intel can charge a $100-$200 premium for their cpus over AMD, while AMD is forced to sell their cpus at firesale prices just to keep the pipeline going. SandyBridge is that good. For a server that premium takes less than 2 years in reduced power consumption to zero out AMD's price advantage. Intel is a major cash machine because of this. AMD is not. Big difference.
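For anyone who wants to plug in their own numbers, a rough payback sketch. The wattage difference and electricity price below are purely illustrative assumptions (I'm not quoting measurements), and it ignores cooling overhead, which would shorten the payback further in a machine room.

```python
HOURS_PER_YEAR = 24 * 365  # server assumed to run around the clock

def payback_years(price_premium: float, watts_saved: float,
                  dollars_per_kwh: float) -> float:
    """Years of continuous operation before power savings cover the premium."""
    kwh_per_year = watts_saved * HOURS_PER_YEAR / 1000.0
    return price_premium / (kwh_per_year * dollars_per_kwh)

if __name__ == "__main__":
    # All three inputs are hypothetical placeholders.
    premium = 150.0      # dollars, middle of the $100-$200 range
    watts_saved = 80.0   # assumed average whole-system difference under load
    price = 0.12         # assumed dollars per kWh
    print(f"payback: {payback_years(premium, watts_saved, price):.2f} years")
```

The result is very sensitive to the wattage difference and the local power price, so treat any single output as a ballpark.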
So AMD has lost the high-end cpu war. AMD still has a fighting chance in the integrated graphics arena for lower-end machines but remember Intel has a 2 year Fab advantage. Intel can destroy AMD in this arena too if they feel AMD is getting too much good press.
-Matt
I'm sure there is but it's pretty simple. Both AMD and Intel make the complete chipset these days, instead of relying on third-party vendors like they used to. And they run in a fairly straightforward progression. Motherboard manufacturers may add additional discrete chips (a RAID controller is quite common) but the differentiation between mobo vendors is far, far less now than it was 4 years ago.
These core Intel or AMD chipsets essentially determine the major features of the mobo. On Intel mobos there's a little room to wiggle, e.g. more USB ports or more SATA ports. Choose your poison (I prefer more SATA ports myself).
I'm not going to list them all but it only takes about ~20 minutes with google to get a breakdown of chipsets and what they support. Then go from that base when selecting a motherboard.
Beyond that it comes down to how good the BIOS is. ASUS seems to be at the top of the pack and they charge a premium for the privilege (at least compared to e.g. Gigabyte, biostar, MSI, and other bulk mobo makers).
The system PSU is just as important as the mobo now. On newer systems I've started buying more expensive, better constructed PSUs because the cheap ones seem to go bad much more quickly than they used to. The main reason is that a modern day PSU has to pump out a huge amount of current at low voltages and if it isn't made right the safety mechanisms (which have to deal with the huge amount of current) also don't work properly. From a design standpoint PSUs that pump out higher voltages at lower currents are easier to construct than PSUs that pump out lower voltages at higher currents. There is more room for error. Since all modern mobos need lots of current at lower voltages... well.
And when a cheap PSU goes bad it can pump out a lot of voltage and destroy the mobo, hard drive(s), cpus... everything. Also cooling has become important enough that you can't really afford to have unused PSU wiring hanging around in the case any more, so having a modular PSU makes a big difference too. There are numerous quality PSU vendors, and hundreds of poor-quality vendors. Just google it.
I don't buy cases which include the PSU any more. They pretty much universally include a poor PSU. I've had too many burn out on me and I'm getting tired of it.
No more small fans, either. 120mm or better or I don't buy it (PSU or case). The bigger fans spin slower, are quieter, and last a lot longer. Even cpu fans, though most of my boxes still have smaller cpu fans in them.
-Matt
Basically for SATA-III you really just want to use the Intel (or AMD) native SATA ports and you want to configure the BIOS to put them in AHCI mode and use a relatively modern mobo.
For Intel you want something based on SandyBridge, LGA 1155 socket, H67 chipset for a consumer workstation or server. Unlike AMD, Intel basically does not support ECC on their consumer cpus.
If you want to stuff your box full of memory, even with only four slots in a micro-atx form factor, I think you can fit ~16G or so which is going to give you the best bang for the buck in an Intel consumer mobo.
Higher-end server mobos with more memory slots tend to eat a ton more power, even with lower power memory installed. Most people don't need it unless they have no choice but to run VMs.
For AMD you need an AM3 mobo and chipset (AM3+ mobos exist but are typically wired for AM3 compatibility and don't really have any AM3+ features, and there are so few of them you wind up paying a premium for very little gain). You'll want to get the hex-cores with cpu acceleration and even with that AMD is far behind Intel's SandyBridge now. I have both types of systems and the best AMD has to offer is 30% slower and uses 30% more power than Intel, and only about $100 or so cheaper. You can get ECC in a consumer box but the power savings on the Intel chipsets alone easily make up for the $100 increase in price (without ECC on the Intel, of course).
--
For server systems you go with Intel Xeon or AMD opteron. AMD has 12-core opteron chips now but they only run at 2 GHz and the Intel Xeons eat a lot less power and have faster cores. The Intels will tend to be a better fit unless you are going for full-on VT with partitioned memory on a multi-socket opteron.
Intel Xeons are priced at a significant premium over Intel consumer cpus, despite being basically the same thing (just with ECC enabled). As always, Intel differentiates product lines by cpu socket but with AMD having to transition to a new socket now there's no longer an upgrade path for AMD beyond AM3 that doesn't also require a new mobo. To get ECC on an Intel you basically need a Xeon (Sandybridge based or better) as well as a mobo that supports ECC. The combination will cost you $200-$300 over a non-ECC consumer setup with the same cpu performance and mobo capabilities.
Personally speaking with the amount of ram one can stuff in even consumer mobos these days, I prefer ECC for any serious server work. You can get away without it if you only have a few systems but.... ah well. Intel's basically won and their pricing shows that they know it.
In terms of form factors, Micro-ATX is the way to go. Mini-ITX is smaller but you have fewer mobo and case choices and the better combinations tend to be priced at a premium. So typically you would want to go with a Micro-ATX form factor and focus on cases which take fewer larger fans and (typically) front-back airflow rather than smaller fans and front+top/side airflow.
Full-ATX gives you a lot more slots but you don't actually get all that much more in the way of PCIe lanes... it isn't usually worth it unless you really really need a dual-SLI graphics setup (and I've never seen the point considering the cost in power supply suds, noise, power use, and meltdown potential).
--
Pricing wise you can get a decent computer sans operating system built from parts for around $700. If you add a SSD in addition to your primary HD storage it will come to around $800. An SSD with the right setup generally boosts performance by a huge amount as it effectively allows your system to page memory to 40GB to 200GB of 'swap', or to otherwise use the SSD to cache data. But I'd still put the primary OS and file storage on the HD. Use the SSD strictly for caching.
-Matt
Generally speaking if you want SATA-III to operate satisfactorily you need to use the AHCI controller built into the cpu chipset bundle. That is, the one that Intel and AMD bundle. That will get you a reliable 32-tag-per-port controller. You definitely do not want to use an external controller or a third-party chipset controller (e.g. Marvell), at least not if you can help it. You won't have a choice if you want hardware RAID, since AMD and Intel's controllers don't do RAID (BIOS-based fakeraid doesn't count).
All chipsets have bugs, even AMD and Intel chipsets. Intel AHCI controllers have problems probing Intel SSDs (go figure) and require a driver workaround to unbrick the port when the problem occurs during probe. AMD chipsets don't mask phy errors during initial training, which creates a lot of superfluous interrupts. Both controllers play fast and loose with the AHCI spec and the AHCI spec itself is pretty badly designed, with tons of issues (though not as badly designed as the immensely idiotic USB HCIs).
Another big problem is that the firmware controller that runs the chipset side of the AHCI is typically responsible for ALL the SATA ports, which means that hotplug on one port can actually interfere with operations on another. It pisses me off, but there's no avoiding it.
The external chipsets are even worse. Marvell is a joke. Silicon Image chipsets are full of HARDWARE bugs (not just firmware bugs) which require a lot of workarounds in driver code (for example, you can't abort a soft-reset sequence reliably on a SIL chipset and you can't access the on-chip shared memory while commands are in progress without corrupting any DMA that happens to be occurring).
The stuff is getting better, slowly. The manufacturers of these chipsets have traditionally not really cared about these sorts of bugs because 99.9% of their users are consumers who don't care. The remaining 0.1% professionals who do care aren't a big enough crowd to make the manufacturers actually fix their firmware.
SATA at least has the AHCI spec, too bad more chip manufacturers don't use it. If you want to talk wireless and ethernet chipsets matters are far, far worse.
-Matt (who wrote and maintains DragonFly's AHCI driver)
That's my take. And unlike a hard drive, firmware is something which can be continuously improved. SSD manufacturers are starting to understand and deal with the failure modes.
One thing they don't mention is off-line storage. If I take a hard drive out of service and store it on a shelf for a year, it's virtually guaranteed to fail when I power it up. That is, every single HDD I've taken off the shelf will tend to work for a short while, long enough for me to get the data off of it usually, but every single one has failed within a month of being repowered.
I expect over the next few years the combination of firmware improvements and flash improvements (that is, improvements in being able to predict a flash cell failure) will result in the SSDs running away in the reliability department. Hard drives have been around for a long time and yet they still fail at a horribly high rate... too high for the higher capacities they now have. Intel has certainly already seen the light.
Several vendors are now putting high-capacity caps in their SSDs to remove one common failure mode... exploded meta-data tables due to unexpected power downs, which is a particular problem for SSDs which use idle time for wear leveling activities.
One thing for sure, we are going to get some excellent statistics over the next decade.
-Matt
It's really quite simple. Without regulation the automakers will wind up in another race to the bottom (mpg wise) for the heavier SUV-style vehicles consumers still love. The problem with that is that the 2008 crash showed that doing that is suicide and the gasoline price point that puts you into suicide territory is $4-5 (which we are blasted close to already). Only a regulation-imposed bottom can actually allow the automakers to compete with each other in a more comfortable mpg zone.
For lighter vehicles the automakers are terrified not only about a possible double-dip recession but also any spike in fuel prices ripping the bottom out of their other markets. They know they need a viable high mpg product for consumers to shift to when those spikes occur. Without regulation these products will simply not have good enough profit margins due to competition against lower-mpg products during periods where gasoline prices are lower.
Basically, they've seen the light, but nobody should be fooled into thinking that the automakers have suddenly become environmentally conscious. There's a reason why Japanese vehicles almost destroyed American-made vehicles in the U.S. market post-crash-2008. Japanese automakers already had to contend with a large non-US consumer base desiring fuel efficiency so they had the products ready to go when the American market for gas guzzlers cratered. The American auto makers can't compete with the Japanese without regulation! It may sound ass-backwards but this is a case where regulation will actually improve margins for the automakers. It's that simple.
-Matt
Well, I certainly was not expecting you to use RCS as a comparison point. RCS is utterly horrible when dealing with large data sets. Any modification to a file requires rewriting the entire rcs file and doing something like, oh, tagging, requires rewriting every single file in the repo. Every single one.
RCS is a very filesystem-heavy repo management system. Updates, checkouts, pretty much everything you do *except* single-file log displays are expensive. Such operations have to scan or access nearly every file in the repo and at least stat every file in the checked out tree. For large repos with hundreds of thousands of files RCS/CVS is nasty as hell.
Nor can you reliably mirror or replicate a RCS or CVS repo. Neither rsync nor cvsup are capable of reliably replicating a live, heavily used RCS/CVS repo. I've tried many times... I have to mirror the NetBSD CVS repo to get their pkgsrc into a git mirror and it takes a complex script to try to detect a point where the entire CVS repo is quiescent. Even with the quiescence check my script *still* has to do a full cvs checkout and an actual diff -r between the checked out CVS repo and the checked out git repo to catch occasional failures.
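To give a flavor of what a quiescence check involves, here is a heavily simplified sketch. This is not the actual script; the function names, the settle interval, and the retry count are all made up for illustration. It just polls the tree until two consecutive scans agree.

```python
import os
import time

def snapshot(root: str) -> dict:
    """Map every file under root to its (mtime in ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file vanished mid-scan (e.g. a temporary file)
            state[path] = (st.st_mtime_ns, st.st_size)
    return state

def wait_until_quiescent(root: str, settle_seconds: float = 60.0,
                         max_tries: int = 30) -> bool:
    """Return True once two scans settle_seconds apart are identical."""
    before = snapshot(root)
    for _ in range(max_tries):
        time.sleep(settle_seconds)
        after = snapshot(root)
        if after == before:
            return True
        before = after
    return False
```

Note this is only a heuristic. A commit can start the instant after the second scan finishes, which is exactly why the full checkout and diff -r afterwards is still needed.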
In short RCS/CVS is a mess. GIT is not a mess. With git you just use git-daemon and git:// URLs and you can get massive, reliable replication of the repo.
The only other issue involved here seems to be one of machine resources. But in today's world machine resources are cheap. Even a large 50G+ repo trivially fits on a sub-$100 2TB hard drive, and it takes only a moderately-sized SSD caching layer (~100G) to make the repo operations efficient. That's cheap enough that every developer can keep multiple full repos on their workstations.
In many respects the GIT concept has grown into its own by virtue of the greatly improved storage resources available on today's machines. In the 80's and the early 90's a centralized repo would have been far more important simply by virtue of the relative disk space required. In 2011 the relative disk space required for even a large repo is tiny.
-Matt
I'm not surprised that Google wants people to use actual account names. It still doesn't have to be your real name, you can always create another google account after all! So the title is misleading. What Google is doing is not allowing people to trivially create dozens or hundreds of pseudonyms from one convenient account.
The pseudonym mechanic completely destroyed Yahoo's message boards. And I mean completely. The abuse is so high that the value of the boards is gone. They're worthless now. Google is taking that lesson to heart, hopefully.
-Matt
Yes, but at the same time I only recall a few minor instances where I ever wanted to extract just a portion of a CVS archive, and the only reason was because, at the time, the system I was running on wasn't all that fast.
These days extracting a repo, even a large one, doesn't take all that much time, nor is disk space that big an issue. I just extract the whole thing (git, cvs, whatever) and then pick out what I want.
It only takes ~3 seconds or so to switch branches on a checked out repo of around ~100,000 files, and certainly less than ~10 seconds to do an initial checkout of such a repo. Not to mention the fact that 2TB hard drives are $100 these days so there's no real excuse to be tight on disk space.
When I first started using git I did worry somewhat about disk space. I quickly came to the conclusion that a few extra gigabytes didn't matter in today's world of cheap multi-terabyte hard drives. I typically have 4-5 copies of the DragonFly source base broken out, each with its own copy of the .git repo. A simple git pull is all I need to synchronize whatever directory I've decided to work in (since I'm often reviewing other developers' branches I have multiple independent copies). That's how little I care these days.
That said, it *is* possible to tell git to hard-link or otherwise share repo files in order to reduce the size of the .git/ subdirectory in the checkout directories. We do this on our developer box (where each account is given its own private repo which syncs against the DragonFly master repo). I don't bother optimizing my own personal copies though.
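For reference, git clone has two stock options for this: --local hard-links the object files when cloning from a path on the same filesystem, and --shared points the new repo at the source's object store instead of copying it. A minimal wrapper sketch follows; the function names and paths are hypothetical, and this isn't necessarily the exact setup used on the developer box.

```python
import subprocess

def clone_command(source: str, dest: str, mode: str = "hardlink") -> list:
    """Build the git argv for a space-saving clone of a local repo."""
    if mode == "hardlink":
        # Objects under .git/objects are hard-linked when possible.
        return ["git", "clone", "--local", source, dest]
    if mode == "shared":
        # New repo borrows objects from the source via an alternates file.
        return ["git", "clone", "--shared", source, dest]
    raise ValueError(f"unknown mode: {mode}")

def clone(source: str, dest: str, mode: str = "hardlink") -> None:
    """Run the clone, raising CalledProcessError if git fails."""
    subprocess.run(clone_command(source, dest, mode), check=True)

if __name__ == "__main__":
    # Hypothetical paths, shown without running anything.
    print(" ".join(clone_command("/repos/master.git", "/home/dev/src")))
```

The shared mode saves the most space but ties the clone to the source repo: if objects are pruned from the source, the clone can break. Hard links don't have that problem.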
And one final thing to note... if the filesystem can de-duplicate data, having a lot of copies lying around is even less of an issue. I've never had to depend on de-dup... it's kinda hard to actually run a 2TB drive that isn't being used to archive media files out of space... but it does work particularly well on backup machines.
-Matt
Well, you should post with your real name if you are going to make such an encompassing statement, instead of anonymously. I'm kinda wondering what repo management tools you are using that can handle 50GB+ data sets that you are trying to compare against something like git?
From my experience handling large data sets is less a function of the repo and more a function of disk bandwidth and memory. Putting, say, a million files into a repo (any repo) is not a big deal but managing it will definitely be dependent on the operations you are trying to do, memory, and storage.
I've found, in general, for the DragonFly project which manages ~500MB and ~1GB repos, that regardless of the repo if you want operations to be efficient you need to have a high speed caching layer helping out the filesystem. For DragonFly, of course, that means having a SSD in the system and using the swapcache feature to cache filesystem data and meta-data on the SSD. Then repo operations run fast regardless of the repo system used.
A nominally priced SSD can cache 100G of data fairly cheaply, so handling a large repo isn't a big issue. In our case we have two machines which keep about a dozen repos from different projects synchronized, in order to make them available to our developers, as well as perform incremental translations from CVS to GIT for pkgsrc (which is an ultra-nasty script). The scripts run twice a day and have a run-time of around an hour with the SSD caching layer. Without the SSD caching layer those scripts will take 6+ hours of time to run (12 hours a day of run time if I run them twice a day). The SSD makes a huge difference in manageability.
In terms of trying to manage fewer larger files, such as images... large numbers of binary files are best managed outside the repo infrastructure. Sure, a few here and there (such as a web site's icons) can easily be managed inside a repo, but trying to manage large amounts of bulk data in a repo generally just results in a lot of unnecessary pain. It's better to manage bulk data in a filesystem capable of performing snapshots.
Similarly for backups... repos aren't good mechanisms for making backups. You want something more closely integrated with the filesystem (and the filesystem's snapshot capabilities presuming you are using a filesystem with snapshot capabilities) to do LAN and off-site backups. Not a repo.
So the question here is: Are you complaining about the amount of time it takes to do an operation due to being disk-bound, or are you talking about bugs in the repo system causing the program(s) to crash or eat too much memory? I haven't had any significant memory issues with git myself though I can definitely see needing a 64-bit VM space if the repo becomes large enough for certain operations.
-Matt
Yes, for a git user the sha key is effectively the commit id / revision number, and it works incredibly well. I don't miss the crazy multi-dotted revision numbers from e.g. CVS, or even the simplified version numbers from svn, or anything else. The sha commit id works so well in git that our kernels include the first few digits of it in their version string printed out in the dmesg, which makes figuring out the basis for a bug report very easy.
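As a rough illustration of where those ids come from: git names every object by the SHA-1 of a short header plus the object's content, and a commit id is simply that hash taken over the commit object (tree, parents, author, message). The Python sketch below computes such an id and abbreviates it into a version string. The `version_string` format and the release name are hypothetical, not the format the DragonFly kernel actually prints.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> str:
    """Hash an object the way git does: '<kind> <length>' + NUL + payload."""
    header = f"{kind} {len(payload)}\0".encode("ascii")
    return hashlib.sha1(header + payload).hexdigest()


def version_string(release: str, commit_id: str, digits: int = 7) -> str:
    """Hypothetical format: the release name plus the first few hex digits."""
    return f"{release}-g{commit_id[:digits]}"


if __name__ == "__main__":
    # An empty blob is the simplest object to hash.
    blob_id = git_object_id("blob", b"")
    print(blob_id)
    print(version_string("v2.10.1", blob_id))  # release name is made up
```

Because the id is derived from content, the same few digits in a bug report identify the same source tree on everyone's machine.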
Our use of git effectively has a master repo as well, and it is kept very clean relative to developer's local repos which have all of their local development branches.
But I think the most important feature is the utterly trivial incremental replication git supports. When we ship a new release we just include the current git repo in the disk image. If someone installs from that image they can then update their on-disk repo incrementally using the shipped repo as a base and only pull down a small amount of data over the network versus having to download the entire repo.
The incremental replication is also extremely reliable when using the git server feature (git://...), night and day compared to trying to distribute a CVS repo. My CVS repo synchronization scripts are ridiculously complex: rsync, find the most recent change, then rsync again, over and over, until the repo is found to be stable, and even then there is no guarantee that you have a stable copy (and no, cvsup doesn't work too well either).
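To give a feel for the CVS side, here is a minimal Python sketch of that "sync until it stops changing" loop. It is not my actual script: `sync` is a hypothetical callable standing in for one rsync pass, and the stability test (two consecutive passes with identical sizes and mtimes) is an assumption made for the example.

```python
import os
from typing import Callable, Dict, Optional, Tuple

Fingerprint = Dict[str, Tuple[int, int]]


def fingerprint(root: str) -> Fingerprint:
    """Map each file under root to its (size, mtime in nanoseconds)."""
    result: Fingerprint = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            result[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return result


def sync_until_stable(sync: Callable[[], None], root: str,
                      max_passes: int = 10) -> int:
    """Run sync() until two consecutive passes leave root unchanged.

    Returns the number of passes used. Even a "stable" result only means
    nothing changed between the last two passes, not that the copy is
    consistent with the master.
    """
    previous: Optional[Fingerprint] = None
    for attempt in range(1, max_passes + 1):
        sync()
        current = fingerprint(root)
        if current == previous:
            return attempt
        previous = current
    raise RuntimeError("repo never settled; upstream kept changing")
```

The need for a loop like this at all is the point: the copy is only probably consistent, whereas a git fetch either completes or leaves the repo as it was.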
Being able to have a chain of git repos from a single master which are incrementally updated in a reliable fashion makes distributing code bases utterly trivial, and being able to ship a git repo and then incrementally update it over the net to bring it up to current is priceless. It's impossible to replicate with server-only repos.
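A toy model of why the shipped repo helps, in Python. This is a deliberate simplification: real git negotiates over the commit graph and sends delta-compressed packs, whereas the sketch just treats each repo as a map from object id to size and counts what the shipped copy is missing. The ids and sizes are invented for the example.

```python
from typing import Dict, Tuple


def incremental_transfer(shipped: Dict[str, int],
                         upstream: Dict[str, int]) -> Tuple[int, int]:
    """Return (bytes to fetch incrementally, bytes for a full download)."""
    # Objects are named by content hash, so anything already present in the
    # shipped repo never needs to cross the network again.
    missing = sum(size for oid, size in upstream.items() if oid not in shipped)
    full = sum(upstream.values())
    return missing, full


if __name__ == "__main__":
    shipped = {"aaa111": 400, "bbb222": 600}   # made-up ids and sizes
    upstream = dict(shipped, ccc333=25)        # one new object since release
    print(incremental_transfer(shipped, upstream))
```

The older the shipped image, the more there is to fetch, but it is still only the difference.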
I have no regrets switching the DragonFly project over to git.
Even dealing with pkgsrc is a lot easier in git than it is with CVS. We want to make pkgsrc available to our users but the master repo (which is in NetBSD's CVS repo) is impossible to keep synchronized using CVS, let alone be able to distribute to our user base in an efficient fashion. So what we do instead is track the CVS with a cron job and dump it into a git repo which we distribute to our userbase instead. The scripts are complex, but they work quite well and we can use the same trick of shipping out the current pkgsrc set as a git repo and have users simply do a small, simple, short, low-bandwidth incremental update to synchronize it with the latest available data.
-Matt
I'm not sure that repo-managed commit meta-data is necessarily a good thing, other than having the (obvious) log entry. Meta-data is so project-dependent that it is probably better to implement it as a wrapper and store it as a file within the repo instead of trying to integrate it with the repo.
Among other things, anyone who's tried to manage meta-data in repos for any length of time knows how nasty things can get when you need to edit previously committed meta-data. In other words, meta-data management can have far more unintended side effects than one might otherwise expect, and the result is that the meta-data management winds up locking you into a particular repo product (which is very bad).
The simpler the data management, the less tied you are to the repo.
-Matt
More horse-tripe, unfortunately. 'Unfunded liabilities' is just another way of projecting a deficit into the future and assuming nothing will ever be done to deal with it. It's a fun way to come up with a big-assed number but worthless in any real discussion.
The reality is that no deficit survives long enough to come remotely close to the numbers the talking heads spew out. Changes are made (or forced to be made) long before then.
Just like the 'hundreds of trillions' of dollars of face value people were screaming about for CDSs, these numbers are meaningless. You are taking the scare-mongers' statements hook, line, and sinker without bothering to understand what the numbers actually mean.
Here's a little hint: Everything congress mandates comes out of somebody's pocket. 'Funded' vs 'Unfunded' is a feel-good gimmick used by politicians. It doesn't matter how it is designated. You know, our two wars have been unfunded as well... congress simply tacked their costs onto the debt and didn't count them at all in their budgets. For ten fracking years.
Similarly if you were to talk about, say, Medicare, throwing around big numbers won't help you actually solve the problem.
p.s. this is also why people generally have no clue as to how healthy or non-healthy the banking system is. From an investor's standpoint the banking system is still in trouble, but from a fundamental solvency standpoint the banking system is not. Anyone looking for a large U.S. bank to go under due to mortgages or European debt or whatever is going to be mightily confused when it doesn't materialize.
We got the same tripe with the FDIC too. People were screaming about the FDIC's balance sheet going negative and having to dig into their line of credit with the Fed. It didn't happen, and anyone with a clue knew it wouldn't happen. But the screaming masses listening to the talking heads thought it might and some people (obviously) STILL think it might. Sigh.
In any case, if you want to talk meaningful numbers then stick to the actual debt, which is $14T. Stick to current costs (Medicare is good fodder here, but Social Security isn't). There's plenty to talk about there without having to throw out meaningless synthesized numbers.
-Matt
You shouldn't just believe every sensational bit of news you hear on T.V. This isn't even remotely true.
You know, the same sort of crap screwed up the municipal bond market for almost 6 months. One talking head on T.V. and suddenly the mob thinks munis are going to collapse when, in fact, the default rate was actually going down to historic lows.
-Matt
But I guess it isn't so :-(
This isn't really news. These weren't even real loans, they were just 28-day backstops during the money market meltdown. Just like the inflated values reporters loved to throw around about CDSs, it's more of the same here. They're just adding them all together sequentially (and conveniently forgetting to report the short durations and senior debt status).
And, really, only a complete fool hopes and prays for the banking system to fail.
Go looking somewhere else, this was one thing the Fed actually did right. And like TARP, the government didn't lose any money doing it either.
If you want to complain about something complain about the use of the AIG bailout as an indirect method of bailing out the (mostly bank) counterparties. That was real money that didn't have to be paid back to the government.
-Matt
I think all they mean is that DRAM isn't really all that cost effective as a data cache for data that one intends to export out the network. Storing that data on a SSD, assuming it's a relatively static data set (which most is), uses far less power and costs less than purchasing an equivalent amount of DRAM (and the much larger mobo required to hold that DRAM). The access times are plenty fast enough to still saturate the network. That's all. Not rocket science.
This has been known for several years. A small server with 8-16G of ram + a 160G SSD + a 2TB HDD sits right on the sweet spot. In fact, even 4G of ram would probably be fine. The idea is not to replace your hard drive but instead to insert another layer of cheap caching to avoid having to maintain a complex, expensive, power hungry HDD storage system just to get better throughput.
-Matt
Yah, that's me (DragonFly). You mean for the default password encryption? We switched the password encryption to sha-256 by default as one of the google code-in projects. I wasn't particularly involved but I'm sure you are correct. All I can say is that it was better than what we had before :-)
I guess SHA is turning out to not be quite as secure for its key-length as RSA was/is.
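For context on why the choice of hash matters at all: a single pass of SHA-256 is very cheap to compute, which is exactly what makes offline guessing against a stolen password file cheap. The usual mitigation is a per-user salt plus many iterations. The Python sketch below uses PBKDF2-HMAC-SHA256 from the standard library as a stand-in; it is not what DragonFly's crypt() actually does, and the iteration count is an assumed, purely illustrative work factor.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def hash_password(password: str,
                  salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Stretching only raises the cost per guess. It does nothing for a password that falls in the first few thousand guesses.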
My personal position on password files is that it doesn't matter how good the encryption is. Social, physical, and psychological algorithms radically reduce the number of attempts needed to figure out someone's badly thought-out password and things like requiring a number, a capital letter, and a minimum of 8 or 10 chars just isn't good enough. Businesses can't really require ultra-secure passwords without pissing off their user base, so it's a losing proposition no matter how you twist it.
So we basically do not allow passwords at all on any of our development machines or servers, with the exception of console access (a weak point to be sure but there isn't a whole lot I can do about it). The console server itself uses ssh. The master password file is *'d out.
Developers have to login via ssh or not at all, and are not allowed to use private keys on shared shell machines (they have to key-forward or not fan-out from the shared shell machine at all, which is preferable). Then a security breach can't fan-out into multiple vectors as it does when an (encrypted) master password file is stolen from a computer and the attacker has an infinite amount of time to crack multiple passwords offline, or if the private key is stolen.
The biggest problem businesses have when their sites get hacked is not the closing of the security hole, but instead preventing the attacker from re-entering the system via an infinite number of broken passwords. Having to force millions of users to change their passwords is becoming a non-starter (I think Sony found that out the hard way).
Random passwords might be more workable but since random passwords cannot be remembered anyway one might as well go with a more secure and much longer public key.
That's my opinion anyhow :-)
-Matt
Very cool. Any chance of using Intel's AES-NI instructions on the i5 and i7 to shortcut some of the work for the cpu version? Or are they too specific to AES? AES-NI can run AES (I forget exactly which one) at something like 3 GBytes/sec on an i7.
(And, of course, programming it directly into a FPGA, but that's a separate topic).
-Matt
I really doubt that forced repurchase clause is even remotely legal. The whole point is for the vesting period to be the carrot, and not anything else. If shares had vested and he elected to exercise the options (for any vested options, typically within 90 days of the end of employment if terminated), then he owns the shares free and clear and Skype can't steal them back. Unvested shares are typically lost, and that is standard.
Contracts often have a right of first refusal. That is, if an employee owns stock in a company which has not gone public yet and that employee wishes to sell those shares to another person in a private transaction, the company has a right to purchase those shares at that same price first.
But I've never heard of a company being allowed to force a shareholder to sell shares back to the company at a price determined by the company. I really doubt that would stand up in court, because prior to going public it is the company itself that sets the fair market value for the shares (not the public market). It would be ripe for fraud otherwise.
I think this person has a real case if they decide to go to court. Skype should never have put such a clause in their employment contract. I don't know what the hell they were thinking.
-Matt
Sometimes it's hard to back off the excess greed. Kudos to them. That's a lot of money to be able to retire on, easily $15M each after taxes. About $750K in income every year if maintaining their basis at-par (inflation adjusted) with a conservative portfolio.
-Matt
It's not even worth writing a long essay about. Oooooh, it's 'the cloud', by which they mean it's just a client/server model like most of the internet already resembles. Sheesh.
-Matt