m.dillon · Slashdot Mirror

Re:Price per gigabyte isn't really the issue on Are Consumer Hard Drives Headed Into History? · 2010-10-24 05:43 · Score: 1

I certainly would not have expected the X25-E's to be less reliable than the X25-M's. That sounds like it might be more of a firmware issue or manufacturing defect. How long ago did it happen? SSDs have gone through major firmware changes nearly every 6 months for the last several years. What was the average data rate being written to the drives? SSDs need idle time for the wear leveling to really work well, but even so SLC has 10x the durability that MLC has so you should not have seen an actual failure in the flash itself, even with severe write amplification.

-Matt

Re:Price per gigabyte isn't really the issue on Are Consumer Hard Drives Headed Into History? · 2010-10-24 05:33 · Score: 1

Yes, as long as you never touch the unused space, that is you partition it off on a factory-fresh drive, the SSD will happily use that space as part of its wear leveling. If you do touch it you have to reformat the drive to a factory-fresh state (or use TRIM to effectively reformat the drive), then partition the space off to get the same effect. I'm not sure what the SATA command sequence is for that but it should be possible to run it from any linux or bsd box. e.g. if you purchase a used SSD (is that an oxymoron?) then be sure to format/TRIM it before you use it for real to clean out whatever cruft the previous owner built up on it. And use SMART to check the wear of course.

Right now on the 40G Intels I am testing I create a 32G partition and leave 8G unused. After 8 months SMART is telling me the wear leveling is sitting on 200TB which is very good (400TB being the theoretical maximum for a 40G MLC flash drive). These particular drives are being used for meta-data caching via DragonFly's swapcache and are running 4-8 gigabytes a day worth of writes (limited by swapcache parameters) every day for the last 8+ months.

I recommend doing this for every SSD.
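
The 400TB ceiling mentioned above is consistent with multiplying the drive's capacity by a per-cell erase-cycle rating. Below is a minimal sketch of that arithmetic. The figure of roughly 10,000 program/erase cycles for MLC of this generation is an assumption chosen to match the 400TB result, not a number stated here, and the function names are purely illustrative.

```python
def theoretical_max_writes_tb(capacity_gb: float, pe_cycles: int) -> float:
    """Upper bound on total writes if every cell wears perfectly evenly."""
    return capacity_gb * pe_cycles / 1000.0  # GB -> TB, decimal units


def implied_write_amplification(theoretical_tb: float, observed_tb: float) -> float:
    """How far the observed endurance falls short of the ideal ceiling."""
    return theoretical_tb / observed_tb


# pe_cycles=10_000 is an assumed rating, not a vendor figure quoted in the post.
ceiling = theoretical_max_writes_tb(40, 10_000)
print(ceiling)                                    # 400.0
print(implied_write_amplification(ceiling, 200))  # 2.0
```

Under that assumption, a 200TB projection corresponds to the firmware writing about two bytes of flash for every byte the host writes.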

-Matt

Re:Price per gigabyte isn't really the issue on Are Consumer Hard Drives Headed Into History? · 2010-10-23 19:36 · Score: 1

I'm not sure what you are basing your statements on. Computer hardware goes through evolutionary cycles and consumers buy into those cycles on a very regular basis. Virtually nobody keeps a computer more than a few years anyway before upgrading. I think the average life span of a PC is somewhere on the order of 2-3 years. Consumers do not upgrade because they have to, they upgrade because they want to.

A good example of this in recent times would be the shift to PCIe. Is your computer old enough to still have an AGP bus? Good luck finding a video card for it. But, more to the point, even a low-end PCIe video card is going to have 10 times the performance of an old high-end AGP card. That is what consumers buy into. Video card makers don't even make AGP cards any more.

Similarly one can look at the shift in the laptop market. The NetBooks filled a niche... laptops had gotten too heavy, too hot, and too power hungry. The entire industry shifted downward. Now it is on its way to shifting back upward but the interesting point here is that laptops have also gotten considerably more powerful even in their smaller, lighter, longer-battery-life form factors. That's enough to get quite a large number of people to cycle into new machines. Again.

Smart phones... same thing.

SSDs... same thing. The SSD cycle is being helped along by full time internet access no matter where you are (Android hot spot anyone?). You don't actually have to store terabytes of data in your laptop in order to access terabyte-sized libraries. SSDs work very well in that sort of environment. Therefore the price point has nothing whatsoever to do with cost per gigabyte. It's strictly an absolute cost based on nominal storage needs which are no longer bound in isolation. The network is your drive, indeed! For a laptop anyhow. My laptop does quite well with a 40G SSD. When I'm at home it also NFS-mounts about 4TB worth of filesystems over my home wifi. When I'm not home I still have access to the same data at WAN or 3G speeds. I don't need 4TB of disk in my laptop. 120G is plenty.

-Matt

Price per gigabyte isn't really the issue on Are Consumer Hard Drives Headed Into History? · 2010-10-23 11:43 · Score: 2, Insightful

It's simply absolute price for a reasonable amount of storage, which these days is around 250GB. Sure I can pop in multi-TB drives for less money, and I do on the machines that need that kind of storage. But the vast majority of machines out in the world don't really need terabytes of storage. If you don't actually need the storage then it doesn't really matter whether the drive you have installed is 250G or 2TB.

The comments regarding a SSD's ability to extend the life of older computer hardware, and even brand spanking new computer hardware, are right on the mark. How meaningful is one or two hundred extra dollars if your laptop is nice and responsive with the latest memory-hogging software for another year or two because you popped in that SSD? Not very meaningful at all.

So when will SSDs really start to take off in the consumer world as more than just a niche item? It will be when the price point for that 250G SSD drops to something reasonable, like $100 or so. That price point is not actually that far off.

In terms of durability I gotta laugh at anyone who thinks a hard drive is more durable than a SSD. Hard drives last maybe 5 years. I don't think any of my HDs have lasted more than 7 or so years without accumulating serious enough errors to warrant replacement. There is one key difference... it is possible to recover critical data off a HD many years later whereas data stored in flash is gone once it goes bad (and even that might not be true any more with HD densities getting so high). But those sorts of recovery services (where the HD cannot even be powered up any more without destroying it) cost a lot of $$ and I don't think your average consumer would ever use something like that.

Even a little Intel 40G SSD has a 35TB vendor-specified durability. When configured properly along with the OS that durability rises in excess of 200TB, and that's for the cheaper MLC flash. I have around 10 of the 40G SSDs installed and their durability is riding the 200TB mark based on the wear values returned from SMART over the last 8 months or so. The higher capacity SSDs have higher durabilities. With nominal use (which is 99% of the use cases) we are still talking 10 years plus for a small SSD.
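
To put the "10 years plus" claim in numbers, here is a rough sketch that converts an endurance figure into calendar time. The 8 GB/day rate is the upper end of the swapcache write load described elsewhere in this thread, which is already far heavier than nominal desktop use. The function name is hypothetical.

```python
def years_until_worn(endurance_tb: float, writes_gb_per_day: float) -> float:
    """Years of service before the endurance budget is used up."""
    days = endurance_tb * 1000.0 / writes_gb_per_day  # TB -> GB, decimal units
    return days / 365.0


for label, endurance_tb in (("vendor spec", 35), ("observed", 200)):
    print(label, round(years_until_worn(endurance_tb, 8), 1))
# vendor spec 12.0
# observed 68.5
```

Even the conservative 35TB vendor number works out to about twelve years at a steady 8 GB/day.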

I'm not sure who these people are complaining about SSDs failing on them... maybe they should post the vendors they bought them from along with the actual model. I haven't had a single one of my Intels fail and I'm hitting some of them pretty damn hard. I have not seen any performance drop-off with my SSDs either and, besides, a thrashing HD can only do 2MB/sec or so. Even a SSD with a moderate performance drop-off is still going to do an order of magnitude better than a HD with a fragmented filesystem. When it comes right down to it, if a performance drop-off is a problem for you, just copy the raw storage off the SSD and then back onto it. Poof, problem solved for another year or three.

TRIM is not really needed. In fact, it can be a liability performance-wise since it isn't a NCQ-capable command. All you really need to do is partition a fresh drive a bit smaller than its rated capacity and you get 95% of the benefit of TRIM without having to deal with it. If you have 120G SSD then create a 110G partition. Congratulations, you now have 95% of what TRIM would get you. It's funny how the rabble keeps screaming the TRIM mantra but it isn't that spectacular a feature.

-Matt

Colo vs Home Server vs Virtual Machine, and backup on Cryptome Hacked; All Files Deleted · 2010-10-05 10:38 · Score: 1

Well, it just goes to show you get what you pay for. From the point of view of security Colo is probably the best, but running a server on a static IP from home is likely the most cost effective. Virtual hosting is dirt cheap but worthless for any serious operation. VMs tend to be configured minimally and ISPs mash them all together using shared resources so performance is all over the place. It's pretty easy to brick an OS running in a VM due to the minimal memory configuration it is typically given.

And backups... well, there are lots of choices there. There is no need to lose more than the most recent 60 seconds worth of modifications if you run a near-real-time streaming backup off the site. Something like DragonFly + HAMMER can do just that (and here is my unashamed advertising of DFly :-)).

Also... only 8G of data? That's it?

-Matt

Why would I have to start over? on Bittorrent To Replace Standard Downloads? · 2010-10-03 16:24 · Score: 3, Insightful

Both FTP and HTTP can fetch at offsets other than 0 and ftp at least has been able to do that for well over two decades. I haven't had to start a download over in a long, long time.
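
For HTTP the offset fetch is done with the Range request header. The following is a minimal sketch of building a resume request from whatever is already on disk. The function name, URL and file path are placeholders, and nothing here touches the network.

```python
import os
import urllib.request


def build_resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Return a request that asks only for the bytes not yet on disk."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if have > 0:
        # "bytes=N-" means "from offset N to the end of the resource".
        request.add_header("Range", f"bytes={have}-")
    return request


# Placeholder URL and path, for illustration only.
req = build_resume_request("http://example.com/big.iso", "big.iso.part")
print(req.get_header("Range"))
```

A server that honors the range answers 206 Partial Content and the body is appended to the partial file. A plain 200 means the range was ignored and the whole file is being sent again. The FTP counterpart is the REST command, which Python's ftplib exposes through the `rest` argument of `retrbinary`.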

-Matt

Density isn't really the story here on Is SSD Density About To Hit a Wall? · 2010-09-19 05:52 · Score: 1

Density isn't really the story here. HD technology will always have higher density, at least for the next 20 years or so until someone comes up with a cheap way to produce nanoscale chips without using expensive lithography processes.

The real story here is performance. SSDs can scale performance far better than HDDs can. We may be hitting a wall on density but there is plenty of room to boost chip interconnect speeds and make the performance of a SSD scale with the number of individual chips making up the storage. We already have this to some degree. It is going to get a lot better. 6GBit SATA is going to be maxed out in less than a year.

The performance equation is going to radically change how systems scale. Right now computer hardware, sans the storage, tends to be limited by available RAM (compute power is no longer the defining issue in a modern system). As SSDs become faster a new mid-level cache tier becomes more and more viable, so instead of having to stuff the machine full of expensive ram one can instead just put a moderate amount of ram in (say, 8-16G) and add a SSD as an 80G mid-level caching tier.

-Matt

Re:Try another vendor on Intel Wants To Charge $50 To Unlock Your CPU's Full Capabilities · 2010-09-19 05:30 · Score: 1

Personally speaking I try to support AMD as much as possible. Their price/performance is almost universally better, particularly when I factor in the power bill and the cost of the MB and power supply. They are considerably less political than Intel.

AMD has fallen behind a bit on PCIe and AHCI support but their new chipsets (e.g. 880G) are finally catching up, though they are certainly rougher around the edges. It took a bit of massaging to get DragonFly's AHCI driver to probe them properly due to firmware breakage in the newer AMD chipsets (not handling IFS and PCS interrupts properly). I suspect it is partially due to the longer training/negotiation times required on a 6GBit SATA port, even when the device is only 3GBit. AMD's AHCI/SATA firmware also still doesn't do FBS (FIS-based switching) for devices behind a PM and that is annoying to say the least.

All modern chipset and MB configurations these days are measured by how many concurrent PCIe lanes they can support. AMD is doing quite well on that front.

In terms of performance Intel has the edge on the high-end, but those Intel MB/chip configurations are monsters in terms of power consumption, heat, and noise when the cpu is being run full-out, meaning one has to spend more money on cooling. I can run the AMD systems full out with cheaper case internals. The costs add up. I do like the fact that the high-end AMD consumer cpus are unlocked, and people regularly OC them past 4GHz. So far I haven't seen a need to do that to mine.

That all said, we have to ask ourselves whether the minor difference in performance even matters any more. I stuffed the new PhenomII x 6 along with the cheapest PCIe dual-port video card I could find (HD 4650) and even without real hardware acceleration in the X driver my X display is ten times faster than the one on the machine I bought just two years ago. Performance has far outstripped my needs even when I'm doing bulk package source builds that utilize all available cpu horse-power. The new MBs can take up to 16G of ram (I have 8G stuffed for now)... it's difficult to find a use for all of that ram.

In fact, the issue for me now in terms of getting the most out of my systems has devolved down to just storage bandwidth. I've taken to adding a small SSD as an intermediate meta-data cache which greatly improves find/ls/file-lookup performance on my multi-TB filesystems. Even a system with lots of ram can use a 40G mid-level SSD meta-data cache. The SSD cache has been far more effective than adding more spindles (RAID).

-Matt

Re:Mark me redundant, mark you redundant on Intel Wants To Charge $50 To Unlock Your CPU's Full Capabilities · 2010-09-19 04:56 · Score: 1

They do both. They disable functions that don't meet test but they also disable functions to match production to market demand. Unlocking those functions is a hit-or-miss proposition.

-Matt

Fascinating on Market Data Firm Spots the Tracks of Bizarre Robot Trading · 2010-08-04 08:48 · Score: 3, Insightful

It looks to me like the orders are trying to match against dark pool bids/asks, and/or all-or-nothing bids/asks. Another possibility is that they are trying to extract non public information from the trading system by purposefully loading the system down and timing responses.

High frequency trading bleeds money away from institutional investors (by sussing out dark pool bid/ask levels) and from market makers (by stealing ETF rebates for volume). Also, most brokerages use fairly simple algorithms to handle market orders which can be sussed out by the more sophisticated algorithms used by the HF traders.

None of this will really affect the retail investor; it amounts to a penny or less on some transactions. Frankly, people have it easy these days where the bid/ask spread is a single penny. When I began trading in my late teens the bid/ask spread was in fractions and was considerably more than a penny. Retail investors get much better pricing these days.

-Matt

Re:Blacked out Canon logos on Microsoft Tech Can Deblur Images Automatically · 2010-07-31 12:33 · Score: 0

Well, the algorithm they are using is real enough, but that is a high-end Canon DSLR. The ultrasonic logo on the lens is clearly visible. Which means these guys have a hell of a lot of low-noise pixels to work with, and it also means they have very fine control over the number of pixels the blur can cross.

How to remove camera shake with a DSLR, 4-step plan:

* Use a high-end DSLR which can take pictures at ISO 3200 with the same noise content of point-and-shoots at ISO 400. 3 stops.

* Use a fast L series prime lens (like, say, a 50mm F1.2L), or use an IS lens. That's another 3 stops.

* Use a camera with 20+ low-noise mega pixels. Then reduce the pixel count to 0.5x on each axis. Hell, this is a high-end Canon, you might as well reduce the pixel count to 0.25x. 2 more stops.

Uh.. how many stops so far? 8 stops so far. That isn't enough? WAIT, THERE'S MORE!

The single best way to reduce Camera blur with a high end Canon or Nikon DSLR ... (drum roll) ....
HOLD DOWN THE SHUTTER BUTTON AND TAKE 5-7 SHOTS. Then pick out the best one in post-production. Tada!

Camera shake is one thing. Blur from Subject movement is quite another. When taking photos in low light there is a point where camera shake becomes irrelevant.

-Matt

Re:Traditional denial on The Curious Case of SSD Performance In OS X · 2010-07-04 18:52 · Score: 1

All cell phone vendors goose the signal strength meter. All that happened was that Apple goosed it so much they got caught red-handed and were forced to admit it. It certainly was NOT a software bug or a mistake. There is no way it could have been anything but intentional (before they got caught) IMHO.

-Matt

Re:OS X has nothing to do with it on The Curious Case of SSD Performance In OS X · 2010-07-04 18:46 · Score: 3, Informative

Strange reasoning. In any case, the Intel SSDs appear to use a combination of static and dynamic wear leveling and it seems to do a really good job. A really, really good job. I have over a half dozen of the 40G drives and have not noticed any reduction in read or write performance.

There seem to be dozens of different write combining and wear leveling implementations across vendors. Dozens and dozens. Variations between vendors are significant and even variations between revisions from the same vendor can be significant. Drives sold just two years ago are likely to have primitive wear leveling and write combining versus drives sold today. Vendor technology can be years apart.

I guess you can thank MLC flash for the radical improvement in wear leveling and write combining algorithms over the last few years. Vendors can't really cheat when they use MLC flash... the algorithms have to work properly or the device has an early death due to the limited cell durability.

Personally speaking, I am very confident about Intel's technology. OCZ seems to be pretty good too but it is also full of hacks and does not properly support SATA NCQ. I'm sure there are some other good vendor technologies out there but there are also definitely some very bad ones that are years behind Intel. In the SSD space, the quality of the software matters a lot.

-Matt

TRIM equivalent on The Curious Case of SSD Performance In OS X · 2010-07-04 18:22 · Score: 3, Interesting

All SSDs have a bit more storage than their rating. Partitioning a little less space on a vendor-fresh drive can double or triple the extra storage available to the SSD's internal wear leveling algorithms. For all intents and purposes this gives you the equivalent of TRIM without having to rely on the OS and filesystem supporting it. In fact, it could conceivably give you better performance than TRIM because you don't really know how efficient the TRIM implementation is in either the OS or the SSD. And because TRIM is a serialized command and cannot be run concurrently with read or write IOs. There are a lot of moving parts when it comes to using TRIM properly. Systems are probably better off not using TRIM at all, frankly.

In case people haven't figured it out, this is one reason why Intel chose multiples of 40G for their low-end SSDs. Their 40G SSD competes against 32G SSDs from other vendors. Their 80G SSD competes against 64G SSDs from other vendors. You can choose nominal performance by utilizing 100% of the advertised space or you can choose to improve upon the already excellent Intel wear leveling algorithms simply by partitioning it for (e.g.) 32G instead of 40G.
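
The trade-off is easy to quantify. This sketch computes how much of the advertised capacity stays untouched for the firmware when a drive is partitioned short. It counts only the space you leave free; the factory reserve built into every SSD comes on top and is not known here. The function name is illustrative.

```python
def spare_fraction(advertised_gb: float, partitioned_gb: float) -> float:
    """Fraction of the advertised capacity left untouched for the firmware."""
    return (advertised_gb - partitioned_gb) / advertised_gb


print(spare_fraction(40, 32))  # 0.2
print(spare_fraction(80, 64))  # 0.2
```

Partitioning a 40G drive as 32G, or an 80G drive as 64G, hands the wear leveling an extra fifth of the advertised space to work with.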

We're already seeing well over 200TB in endurance from Intel's 40G drives partitioned for 32G. Intel lists the endurance for their 40G drives at 35TB. I'm afraid I don't have comparative numbers for when all 40G is used but I am already very impressed when 32G is partitioned for use out of the 40G available.

Unfortunately it is nearly impossible to stress test a SSD and get results that are even remotely related to the real world, since saturated write bandwidth eventually causes erase stalls when the firmware can no longer catch up. In real-world operation write bandwidth is not pegged 100% of the time and the drive can pre-erase space. Testing this stuff takes months and months.

Also, please nobody try to compare USB sticks against real (SATA) SSDs. SSDs have real wear leveling algorithms and enough ram cache to do fairly efficient write combining. USB sticks have minimal wear leveling and basically no ram cache to deal with write combining.

-Matt

Hulu has pretty much won the fight on Subscription-Based 'Hulu Plus' Is Now Official · 2010-06-29 09:39 · Score: 1

As much as I sympathize with people who would like to have completely ad-free content, Hulu has pretty much won the battle. People just don't seem to mind three or four 15-30 second breaks during a show, or the occasional 60 second ad. The ads which used to be low-budget fillers have been steadily getting better and better, an indication of real support from the industry. Hulu has also been able to slowly get more up-to-date content on a wider variety of shows, and has completely beaten out (as in destroyed) organization-specific content sites which only offer content specific to that organization.

All that is left to be seen now is whether there is room for another player or two in the space. Ultimately nearly all the content is going to wind up on some sort of Hulu-like service. Verizon, AT&T, COMCAST, and other providers are quickly pricing themselves out of the market. It will eventually all be internet-only.

-Matt

Re:Price is the biggest issue on Israeli Startup Claims SSD Breakthrough · 2010-06-16 05:03 · Score: 1

One thing you need to be careful about with the Intel SSDs is that they have some serious firmware bugs with their SMART implementation. Issuing a SMART command while the controller is busy with other non-SMART commands can brick the SSD and require a full reset or power cycle to fix.

If you are getting bus errors on your controllers and not issuing SMART commands then it probably isn't the SSD's fault.

In any case, SSDs have plenty going for them to warrant the significantly increased cost per GB of storage; you just aren't thinking about them in the right context. Try comparing their cost to the cost of RAM instead of the cost of bulk storage and it should become clear. There is a great deal of infrastructure today that requires costly, power hungry machines with tons of ram for which a SSD retrofit is extremely cost effective. Instead of a big honking server with 32G of ram you can often get the equivalent using a tiny server with 4G of ram and a small 40G SSD. Not in all cases, of course, but certainly a large chunk. The requirement for this sort of conversion is, of course, that the ram is mostly used to store a static or slowly-changing dynamic data set.

Another example would be the storage and management of meta-data. Meta-data uses several orders of magnitude less storage than the data it manages, but often uses more storage than you can conveniently pack into server ram. The perfect solution is a couple of SSDs. Abstractly, if you needed to index 100 million different files, using a SSD would be the perfect solution.

You wind up spending $100 to replace $1000 worth of hardware and god knows how much energy. Or, just as good, you spend $100 to retrofit existing hardware instead of having to buy new hardware.

A really good example of this would be the active session data for web servers. You know, when you login to something like Amazon and it keeps track of your session for the next hour. This data set is essentially kept in ram full time now but could easily be spooled off to SSD-based storage after 5 minutes of idle without wearing the SSD out. The ram requirements for storing session data are then reduced from needing a 30-60 minute session data footprint to only needing a 5 minute session data footprint. That is a big deal. The reason SSDs can fix this problem while HDs cannot is due to the random-access nature of the spool-in/spool-out of the session data. A HD is severely limited by seek time.
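
The spool-out/spool-in idea can be sketched in a few lines. This is only an illustration under stated assumptions: the class and method names are hypothetical, the 300 second limit mirrors the 5 minute figure above, and a plain dict stands in for the SSD-backed tier.

```python
class TieredSessionStore:
    """Keep recently used sessions in RAM; spool idle ones to a slower tier."""

    def __init__(self, idle_limit_s: float = 300.0):
        self.idle_limit_s = idle_limit_s
        self.ram = {}  # session_id -> (last_used, data)
        self.ssd = {}  # stand-in for SSD-backed storage: session_id -> data

    def put(self, session_id, data, now):
        self.ssd.pop(session_id, None)
        self.ram[session_id] = (now, data)

    def get(self, session_id, now):
        if session_id in self.ram:
            _, data = self.ram[session_id]
        elif session_id in self.ssd:
            data = self.ssd.pop(session_id)  # spool back in: one random read
        else:
            return None
        self.ram[session_id] = (now, data)   # refresh the idle timer
        return data

    def sweep(self, now):
        """Spool out every session idle for longer than the limit."""
        idle = [sid for sid, (last, _) in self.ram.items()
                if now - last > self.idle_limit_s]
        for sid in idle:
            _, data = self.ram.pop(sid)
            self.ssd[sid] = data
        return len(idle)


store = TieredSessionStore()
store.put("alice", {"cart": 3}, now=0)
store.put("bob", {"cart": 1}, now=200)
print(store.sweep(now=400))  # 1 (alice has been idle 400s, bob only 200s)
```

The spool-in path is a single small read at an arbitrary location, which is exactly the access pattern where a seek-bound HD falls over and a SSD does not.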

Thus SSD technology has the ability to reduce overall power consumption and physical footprint in a manner which makes it very cost effective.

Write bandwidth is another interesting issue, but I'm not sure how applicable it is to the use of SSDs in an enterprise environment. Writing to a hard disk is certainly more cost effective if the writes and future reads are linear. If either the writes or the reads are NOT linear the 1-2 orders of magnitude improved iops for random access that you get with a SSD kinda trumps the cost issue. The smaller Intel SSDs do have fairly inconsistent IOPS / bulk random access and bandwidth but on the other hand even the worst-case is still ten times better than a hard drive. The OCZ colossus, on the other hand, is optimized for write bandwidth and its write IOPS is far more consistent, but it sucks rocks on random reads due to the lack of NCQ support (presumably they will fix that).

The only time a SSD can be said to be clearly NOT cost effective is in the linear reading and writing case.

-Matt

Re:Signal to noise ratio in FLASH MEMORY? on Israeli Startup Claims SSD Breakthrough · 2010-06-16 03:57 · Score: 1

I'm pretty sure flash chips use analog voltage comparators internally, not A/D's. Though, theoretically, it would be possible to mess with the thresholds for the comparators so if a block had excessive bit errors the thresholds could be manipulated and the block re-read to determine which bits are the most likely culprits. With that information in-hand further error correction could be done.

That is, normally ECC is calculated without any knowledge about which of the N bits of data might be erroneous. If you can gain this additional information your existing ECC code can actually correct more bits, and you can also develop other ECC codes. For example, a simple burst ECC code is just XORing each 8 bit byte in the data stream to produce a single 8 bit burst ECC code. This code would only be useful if you had the additional 'likely culprit' information in hand and if the likely culprit had no overlaps in the burst ECC code. All very easily calculated. These other ECC methodologies only work if you can figure out which bits of data are the most likely to be erroneous.
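
A small sketch of the XOR check byte described above. It assumes the errors are confined to the bits flagged as likely culprits, and it refuses to run if two suspect bits land in the same bit position of the check byte (the "no overlaps" condition). The function names and the sample data are illustrative only.

```python
from functools import reduce


def burst_ecc(data: bytes) -> int:
    """XOR every byte together, giving a single 8-bit check byte."""
    return reduce(lambda a, b: a ^ b, data, 0)


def correct(data: bytes, ecc: int, suspects: list[tuple[int, int]]) -> bytes:
    """Fix flipped bits given (byte_index, bit_index) likely-culprit pairs."""
    columns = [bit for _, bit in suspects]
    if len(set(columns)) != len(columns):
        raise ValueError("suspect bits overlap in the check byte")
    # Bits set in the syndrome mark the bit positions that no longer add up.
    syndrome = burst_ecc(data) ^ ecc
    fixed = bytearray(data)
    for byte_index, bit in suspects:
        if syndrome & (1 << bit):
            fixed[byte_index] ^= 1 << bit
    return bytes(fixed)


original = b"flash page"
ecc = burst_ecc(original)

damaged = bytearray(original)
damaged[2] ^= 0x01  # flip bit 0 of byte 2
damaged[7] ^= 0x10  # flip bit 4 of byte 7

# Three suspects are flagged, but only two of them are actually wrong.
repaired = correct(bytes(damaged), ecc, [(2, 0), (7, 4), (5, 6)])
print(repaired == original)  # True
```

Without the suspect list the check byte only says which bit positions are off, not which bytes hold them, which is why the code is useless on its own.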

-Matt

Re:If anything on Israeli Startup Claims SSD Breakthrough · 2010-06-16 03:35 · Score: 1

This is called write amplification and it depends on many factors: Linearity of writes by the computer, how often the computer tells the SSD to flush dirty data to media, the size of the SSD's ram cache, the ability of the SSD to write-combine or scatter/gather sectors, the wear leveling algorithm used by the SSD, and a few other factors.

MLC flash uses 128K blocks. If a database or log is flushing every 1K you wind up with a 128:1 write amplification effect, for example. With some tuning (for example flushing the logs for several database transactions at once instead of one at a time) write amplification effects can be minimized.
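
The 128:1 figure and the benefit of batching can be worked out with a short sketch. It models the worst case only: every flush rewrites whole 128K blocks and the drive does no write combining of its own. The function name is illustrative.

```python
import math


def write_amplification(flush_bytes: int, erase_block_bytes: int = 128 * 1024) -> float:
    """Worst case: each flush rewrites whole erase blocks, no write combining."""
    blocks = math.ceil(flush_bytes / erase_block_bytes)
    return blocks * erase_block_bytes / flush_bytes


print(write_amplification(1024))        # 128.0  one 1K log record per flush
print(write_amplification(16 * 1024))   # 8.0    sixteen records per flush
print(write_amplification(128 * 1024))  # 1.0    a full block per flush
```

Grouping sixteen 1K transactions into one flush cuts the worst-case amplification from 128:1 to 8:1.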

Something I noticed while testing the Intel parts is that algorithms to cache clean data or meta-data in a SSD which is mastered elsewhere... that is, a cache which does not have to survive a host reboot, can be optimized to the point where write amplification effects are reduced to 2:1 or better. Formatting a filesystem on a SSD directly tends to have more severe write amplification effects. We are already seeing write durabilities in the 200TB range on Intel's 40G MLC SSDs when used as a data/meta-data cache instead of with a filesystem, which is very good.

-Matt

Re:Why Can't It Just Act As Write-Back Cache? on Hybrid Seagate Hard Drive Has Performance Issues · 2010-06-02 06:02 · Score: 1

This doesn't work well in practice. About the only thing the HDD can actually cache is unrequested data that passes under the head while it is going after requested data. For example, it can cache data ahead of linear read requests, even if several programs are doing linear reads from different parts of the disk. This is what the HD's zone cache does. Usually around 16 'zones' can be tracked in this manner.

Unfortunately this is data which is already readily accessible, so once the HD caches enough to buffer the requests from the OS there is no point caching any more (performance will not improve any further). Adding more cache ram to the HD itself will have little effect once that point is passed.

It would be far, far better to spend the extra money on ram for the system and not ram for the HD. The hybrid SSD model for the HD also has serious problems... the HD has no way to determine what data should be preferentially cached whereas the OS does. So it is far better to attach a SSD directly to the OS as a separate entity and have the OS do the data/meta-data caching.

-Matt

Re:This is the wrong place for this optimization on Seagate Launches Hybrid SSD Hard Drive · 2010-05-24 04:36 · Score: 1

I would go as far as to say that Seagate is chasing an impossible dream here. I've done extensive tests with HAMMER on DragonFly with a SSD caching HD content.

First of all, 4G of flash will not do diddly to improve the performance of a high capacity hard drive in the real world. The minimum you would need is 20G to have any chance of being able to cache filesystem meta-data. The 40G or 80G Intels fit the bill very nicely.

Secondly, it absolutely matters what data the system decides to throw onto the SSD, and there is no way a hard drive can figure that out from short-term activity logs. The HD would have to analyze several days worth of fine-grained activity to get a clue and it is clearly not going to be doing that. Operating systems and filesystems have a much better clue in this matter.

-Matt

More capacity is certainly useful on Seagate Confirms 3TB Hard Drive · 2010-05-17 04:54 · Score: 1

Nobody should be surprised, I expect we will be hitting 5-7TB in the same form factor in another year or two. The real question is how to best make use of the space since access overheads are not going to change much. Linear read speeds will be able to make use of the higher densities but the game is over the moment you have to seek.

There are lots of uses, even for home users. Mid-range digital cameras now pack well in excess of 16MBits and high-end cameras generate raw files in the 30MB per picture range. The newer Canons (actually they've been around for almost two years now) can shoot video in full HD resolution at 30 frames a second and generate a 35+ MBit/s H.264 video stream. Just 3 minutes of video is a gigabyte of output. Terabytes get eaten up very quickly under these conditions.
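
As a quick check on the video figure, this sketch converts a stream bitrate and duration into storage, using decimal units. The function name is illustrative.

```python
def stream_size_gb(mbit_per_s: float, seconds: float) -> float:
    """Storage consumed by a constant-bitrate stream, in decimal gigabytes."""
    return mbit_per_s * 1e6 * seconds / 8 / 1e9


print(round(stream_size_gb(35, 180), 1))  # 0.8
```

At exactly 35 MBit/s three minutes comes to about 0.8 GB, so a "35+" stream lands right around a gigabyte.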

It gets even better for people serious about making backups and snapshots. I already use a 2TB drive for backups and it can hold about one year's worth of (efficiently stored) daily snapshots from around half a terabyte of base use. Backup requirements WITH snapshots increase non-linearly, so as the base storage needs go up the backup requirements go up even more.

The biggest problem facing people using these larger capacity drives is being able to manage the filesystem meta-data efficiently. You would be just fine if all the files were huge but there are many situations where the number of directory entries can multiply (snapshots being one good example) and once you get enough that the meta-data cannot be reasonably held in ram any significant manipulation of the filesystem can simply take too long to be practical. I once had 40 million inodes on a Reiser filesystem and wanted to remove half of them. Two weeks later (yes, the rm -rf was running for two weeks straight!) it had only gotten rid of a few million and there was no end in sight. I wound up reformatting the filesystem. Ext is only marginally better but, really, any filesystem is going to have serious problems past a certain number of directory entries when the storage medium is a single hard drive.

This is where having a small SSD in the system to help cache the meta-data really helps. We are able to use fewer, larger drives only because we have a little 40G SSD on each system caching all the filesystem meta-data. So on DragonFly, a HAMMER filesystem + a small SSD does the job quite nicely.

-Matt

NTP issues -- really only one major issue on Robust Timing Over the Internet · 2010-05-02 09:07 · Score: 3, Insightful

The only really major issue with NTP is figuring out the time offset due to network latency in asymmetric network environments. The NTP protocol itself can frequency lock down to a few ppm over the internet without any problem at all. Offset errors are another matter. Over the internet 10 ms is about the best you can do. Over a LAN with a local GPS signal driving the protocol you can reduce the offset error to less than a microsecond and can probably get it down to less than 100ns without too much trouble (and without needing any hardware time stamping or netif queue prioritization), and you can frequency lock down to a few nanoseconds.

Other issues include the fact that most motherboards do not have temperature compensated timebases, and they tend to float around a few ppm (for the better MBs) to a few hundred ppm (for the stupid MBs that put crystals next to heat sources), as well as bugs in the operating systems themselves which don't get noticed until you actually try to frequency lock your timebase. This creates a multiplicative effect if you are not connected to a stratum 1 or 2 server because each server in the chain is trying to do a frequency lock against a drifting MB timebase. The result is the leaf nodes in the chain can get frequency locks but the locks are to a drifted frequency instead of to the correct frequency because the server itself can't instantly correct for its drifting timebase. This is why you see the frequency lock jump around by +/- 50ppm in a seemingly uncontrollable fashion once you get past stratum 2.

NTPD was famous for not being able to frequency lock while doing an offset correction at the same time. The NTPD program has also had many issues over the years, such as not using a proper dual-staggered linear regression to control offset and frequency corrections, which is why I wrote dntpd for DragonFly. I wonder how many of these old issues with NTPD have been fixed since then. Maybe they have.

Network jitter is largely irrelevant. It isn't a problem if you use a proper linear regression over a long enough period of time. The linear regression can be used to calculate what the jitter is and thus form a good knowledge of the baseline accuracy of the protocol. A medium sized run over a few minutes will get you down to 10-15 ppm on your frequency lock and you can get down to 1-5 ppm within about 10 minutes (assuming the motherboard has an accurate timebase of its own).
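As a rough illustration of the regression idea, the sketch below fits a straight line to (time, measured offset) samples with ordinary least squares. This is a plain single regression, not the dual-staggered scheme dntpd uses, and the function name and sample data are invented for the example. The slope of the fit is the frequency error and the scatter of the residuals around the line is an estimate of the jitter.

```python
import math


def fit_offset_trend(times, offsets):
    """Least-squares line through offset samples.

    times and offsets are in seconds. Returns (frequency error in ppm,
    intercept in seconds, residual jitter in seconds). Needs at least
    three samples.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    slope = sxy / sxx                       # seconds of drift per second
    intercept = mean_o - slope * mean_t
    residuals = [o - (intercept + slope * t) for t, o in zip(times, offsets)]
    jitter = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope * 1e6, intercept, jitter


if __name__ == "__main__":
    # Made-up data: 10 minutes of samples, a 20 ppm drift, and +/- 2 ms of
    # alternating network jitter on top of it.
    times = [10.0 * i for i in range(61)]
    offsets = [0.001 + 20e-6 * t + (0.002 if i % 2 == 0 else -0.002)
               for i, t in enumerate(times)]
    ppm, intercept, jitter = fit_offset_trend(times, offsets)
    print(f"frequency error {ppm:.2f} ppm, jitter {jitter * 1e3:.2f} ms")
```

The jitter is a hundred times larger than the drift accumulated between two samples, yet the slope still comes out, because the noise averages away over the length of the run.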

The sad thing about all of this is that it takes just a single resistor to temperature compensate a crystal, a resistor most MB manufacturers don't put in. Some don't even use crystals anymore for their PLL input (and the crystal on the RTC doesn't help much because the RTC doesn't have a fine enough grain timer to poll easily). Sigh.

-Matt

Re:Wrong. Swap often acts as a cache. on Software SSD Cache Implementation For Linux? · 2010-04-22 15:28 · Score: 3, Informative

OS's have traditionally discarded clean cache data when memory pressure forces the pages out. Swap traditionally applied only to dirty anonymous memory (The OS needs to write dirty data somewhere, after all, and if it isn't backed by a file then that is what swap is for).

However in the last decade traditional paging to swap has fallen by the wayside as memory capacities have increased. Most of the data in ram on systems today is clean data, not dirty data, and most of the dirty data is backed by a file (e.g. write()s to a database or something like that). On most systems today if you look at swap space use you find it near zero.

But the concept of swap can trivially be expanded to cover more areas of interest. Memory-backed filesystems (tmpfs, md, mfs, etc) are a good example. For that matter anonymous memory for VMs can be backed by swap. It is very desirable to back the memory for a VM with either a tmpfs-based file or just straight anonymous memory instead of a file in a normal filesystem. That is a good use for swap too.

It isn't that big a leap to expand swap coverage to also cache clean data. It took about two weeks to implement the basics on DragonFly. Those operating systems which don't have this capability will probably get it as time goes on simply because it is an extremely useful mechanic for interfacing an SSD-based cache into a system. It is also probably the cleanest and simplest way to implement this sort of cache, and it pairs up well with the strengths of the SSD storage mechanic. Since you can reallocate swap space when something is rewritten there are virtually no write amplification effects and the storage on the SSD is cycled very nicely. You get much better wear leveling than you would if you tried to map a normal filesystem (or mirror the blocks associated with a normal filesystem) on top of the SSD.
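A toy model of the "reallocate on rewrite" point, purely for illustration: the class below is hypothetical and is not how DragonFly's swap allocator is written. It just shows that when a rewrite is free to land on a different block (here chosen by a rotating cursor, which is my assumption), rewriting the same object over and over spreads the writes across the whole device instead of hammering one fixed location the way a filesystem block mapping would.

```python
class RotatingSwapAllocator:
    """Hypothetical allocator: every write of an object gets a fresh block."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.owner = [None] * nblocks       # object currently stored in each block
        self.location = {}                  # object -> block it currently occupies
        self.write_counts = [0] * nblocks   # how often each block has been written
        self.cursor = 0

    def write(self, obj):
        # Release the old copy first; swap has no obligation to keep the
        # object at a fixed address.
        old = self.location.pop(obj, None)
        if old is not None:
            self.owner[old] = None
        # Take the next free block at or after the cursor, wrapping around.
        for _ in range(self.nblocks):
            blk = self.cursor
            self.cursor = (self.cursor + 1) % self.nblocks
            if self.owner[blk] is None:
                self.owner[blk] = obj
                self.location[obj] = blk
                self.write_counts[blk] += 1
                return blk
        raise RuntimeError("swap space exhausted")


if __name__ == "__main__":
    swap = RotatingSwapAllocator(4)
    for _ in range(8):
        swap.write("page-a")                # same object rewritten 8 times
    print(swap.write_counts)                # writes are spread over all 4 blocks
```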

-Matt

Re:Wrong. Swap often acts as a cache. on Software SSD Cache Implementation For Linux? · 2010-04-22 13:20 · Score: 4, Informative

The way DragonFly's swapcache works is that VM pages (cached in ram) go from the active queue to the inactive queue to the cache (almost free) queue to the free queue. VM pages sitting in the inactive queue are subject to being written out to the swapcache. VM pages in the active queue (or cache or free queues) are not considered.

In other words, simply accessing cacheable data or meta-data from the hard drive does not itself trigger writing to the SSD swapcache. It's only when the cached VM pages are pushed out of the active queue due to memory pressure and are clearly heading out the door that DragonFly decides to write them to the SSD.
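The queue behaviour described above can be sketched as a small model. This is an illustration only, not DragonFly kernel code: the names (Page, age, touch) are invented, and real page aging is driven by the pageout daemon rather than by explicit calls.

```python
from enum import Enum


class Queue(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    CACHE = "cache"
    FREE = "free"


# Order in which a page ages under memory pressure.
AGING_ORDER = [Queue.ACTIVE, Queue.INACTIVE, Queue.CACHE, Queue.FREE]


class Page:
    def __init__(self, name):
        self.name = name
        self.queue = Queue.ACTIVE
        self.on_ssd = False


def age(page, ssd_writes):
    """Apply one round of memory pressure to a page.

    Only a page found sitting in the inactive queue is written to the
    swapcache (recorded in ssd_writes); then it moves one queue onward.
    """
    if page.queue is Queue.INACTIVE and not page.on_ssd:
        ssd_writes.append(page.name)
        page.on_ssd = True
    idx = AGING_ORDER.index(page.queue)
    if idx < len(AGING_ORDER) - 1:
        page.queue = AGING_ORDER[idx + 1]


def touch(page):
    """An access pulls the page back to the active queue."""
    page.queue = Queue.ACTIVE


if __name__ == "__main__":
    writes = []
    hot, cold = Page("hot"), Page("cold")
    for _ in range(3):
        age(hot, writes)
        touch(hot)          # constantly used, so it never sits in inactive
        age(cold, writes)   # never touched again, so it drifts toward free
    print(writes, hot.queue.value, cold.queue.value)
```

A page that keeps getting touched goes back to the active queue before it is ever considered, so it generates no SSD writes at all.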

This prevents SSD write activity from interfering with the operation of the production system and also tends to do a good job selecting what data to write to the SSD when and what data not to. A file which is in constant use by the system just stays in ram, there's no point writing it out to the SSD.

With respect to deciding what data to cache and what data not to, with meta-data it's simple. You cache as much meta-data as you can because every piece of meta-data gives you a multiplicative performance improvement. With file data it is harder since you don't want to try to cycle e.g. a terabyte of data through a 40G swapcache. The production system's working data set at any given moment needs to either fit in the swapcache or you need to carefully select which directory topologies you want to cache.

-Matt

Re:Buffers? on Software SSD Cache Implementation For Linux? · 2010-04-22 13:06 · Score: 3, Informative

The single largest problem addressed by e.g. DragonFly's swapcache is meta-data caching to make scans and other operations on large filesystems with potentially millions or tens of millions of files a fast operation. Secondarily, for something like DragonFly's HAMMER filesystem, which can store a virtually unlimited number of live-accessible snapshots of the filesystem, you can wind up with not just tens of millions of inodes, but hundreds of millions of inodes. Being able to efficiently operate on such large filesystems requires very low latency access to meta-data. Swapcache does a very good job providing the low latency necessary.

System main memory just isn't big enough to cache all those inodes in a cost-effective manner. 14 million inodes take around 6G of storage to cache. Well, you can do the math. Do you spend tens of thousands of dollars on a big whopping server with 60G of ram or do you spend a mere $200 on an 80G SSD?
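Doing the math explicitly, as a back-of-the-envelope sketch: the only inputs are the 14 million inodes / 6G figure above, and treating "G" as GiB is my assumption. Linear scaling with inode count is also an assumption.

```python
GIB = 1024 ** 3

# Figures from the comment above: ~14 million inodes need ~6G of cache.
INODES_MEASURED = 14_000_000
CACHE_BYTES_MEASURED = 6 * GIB

# Works out to roughly 460 bytes of cached meta-data per inode.
BYTES_PER_INODE = CACHE_BYTES_MEASURED / INODES_MEASURED


def cache_size_gib(inode_count):
    """Estimated cache needed for inode_count inodes, assuming linear scaling."""
    return inode_count * BYTES_PER_INODE / GIB


if __name__ == "__main__":
    print(f"{BYTES_PER_INODE:.0f} bytes per inode")
    for count in (14_000_000, 40_000_000, 140_000_000):
        print(f"{count:>11,} inodes -> {cache_size_gib(count):5.1f} GiB")
```

At ten times the inode count (140 million, which is where a snapshot-heavy HAMMER filesystem can end up) the estimate lands on the 60G figure, which is why it is an SSD question rather than a ram question.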

-Matt
