Slashdot Mirror


Software SSD Cache Implementation For Linux?

Annirak writes "With the bottom dropping out of the magnetic disk market and SSD prices still over $3/GB, I want to know if there is a way to get the best of both worlds. Ideally, a caching algorithm would store frequently used sectors, or sectors used during boot or application launches (hot sectors), to the SSD. Adaptec has a firmware implementation of this concept, called MaxIQ, but this is only for use on their RAID controllers and only works with their special, even more expensive, SSD. Silverstone recently released a device which does this for a single disk, but it is limited: it caches the first part of the magnetic disk, up to the size of the SSD, rather than caching frequently used sectors. The FS-Cache implementation in recent Linux kernels seems to be primarily intended for use in NFS and AFS, without much provision for speeding up local filesystems. Is there a way to use an SSD to act as a hot sector cache for a magnetic disk under Linux?"

297 comments

  1. I don't get it by microbee · · Score: 1

    Linux caches data from any disks all the same, SSD or not.

    1. Re:I don't get it by Wesley+Felter · · Score: 1

      Linux caches disk data in memory. The author wants to cache disk data in an SSD.

    2. Re:I don't get it by owlstead · · Score: 1

      This is about doing double caching: cache to fast but limited RAM (L1) first, and then have a much larger but slower cache, the SSD, behind it (L2). The difference from other caching systems is that the SSD of course holds its state when the power is down (so often-used sectors may never be written to disk).

    3. Re:I don't get it by Anonymous Coward · · Score: 0

      I think the submission is talking about inserting the SSD as a (huge) caching layer between the buffers in RAM and the hard drives, something that's much better than normal RAID, but not quite as good as using a metric fuck-ton of RAM and memcached.

    4. Re:I don't get it by JessGras · · Score: 1

      Linux caches disk accesses to RAM. The OP asks about caching disk accesses to SSD. SSDs are much more expensive than magnetic disk, but in turn RAM is much more expensive than SSDs. So at a fixed price point you could cache a great deal more of your HD w/ SSD than w/ RAM.

      Plus, RAM is wiped on reboot. With an SSD cache in front of your HD, you would benefit from the SSD performance, say, on your next reboot - something a cache in RAM could not offer. Or perhaps you launch Gimp several times a day: it would be nice to see it fire up 100x faster!

    5. Re:I don't get it by Korin43 · · Score: 1

      Would using an SSD as a swap device have the effect they want?

    6. Re:I don't get it by Anonymous Coward · · Score: 5, Informative

      The idea is to use the SSD as a second-level disk cache. So instead of simply discarding cached data under memory pressure, it's written to the SSD. It's still way slower than RAM, but it's got much better random-access performance characteristics than spinning rust and it's large compared to RAM.
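A minimal sketch of the second-level idea described above, assuming a toy model in which each tier is a plain LRU list and a dict stands in for the magnetic disk. The class and attribute names (`TwoTierCache`, `ram`, `ssd`, `disk`) are made up for illustration, and this is not how ZFS L2ARC or any Linux subsystem is actually implemented; it only shows blocks evicted from RAM being kept on a larger, slower tier instead of being discarded.

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy model: a small RAM LRU whose evictions spill into a larger SSD LRU."""

    def __init__(self, ram_blocks, ssd_blocks, backing_store):
        self.ram = OrderedDict()   # first-level cache (fast, small)
        self.ssd = OrderedDict()   # second-level cache (slower, larger)
        self.ram_blocks = ram_blocks
        self.ssd_blocks = ssd_blocks
        self.disk = backing_store  # dict standing in for the magnetic disk
        self.hits = {"ram": 0, "ssd": 0, "disk": 0}

    def _insert_ram(self, block, data):
        self.ram[block] = data
        self.ram.move_to_end(block)
        if len(self.ram) > self.ram_blocks:
            victim, victim_data = self.ram.popitem(last=False)
            # Instead of discarding the evicted block, keep it on the SSD tier.
            self.ssd[victim] = victim_data
            self.ssd.move_to_end(victim)
            if len(self.ssd) > self.ssd_blocks:
                self.ssd.popitem(last=False)

    def read(self, block):
        if block in self.ram:
            self.hits["ram"] += 1
            self.ram.move_to_end(block)
            return self.ram[block]
        if block in self.ssd:
            self.hits["ssd"] += 1
            # Simplification: promote the block back to RAM and drop the SSD copy.
            data = self.ssd.pop(block)
        else:
            self.hits["disk"] += 1
            data = self.disk[block]
        self._insert_ram(block, data)
        return data
```

Reading more distinct blocks than fit in RAM and then re-reading an early one should count as an SSD hit rather than a disk hit, which is the whole point of the extra tier.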

      As for how to do it in Linux, I'm not aware of a way. If you are open to the possibility of using other operating systems, this functionality is part of OpenSolaris (google for "zfs l2arc" for more information).

      Cache Devices
                Devices can be added to a storage pool as "cache devices."
                These devices provide an additional layer of caching between
                main memory and disk. For read-heavy workloads, where the
                working set size is much larger than what can be cached in
                main memory, using cache devices allow much more of this
                working set to be served from low latency media. Using cache
                devices provides the greatest performance improvement for
                random read-workloads of mostly static content.

                To create a pool with cache devices, specify a "cache" vdev
                with any number of devices. For example:

                    # zpool create pool c0d0 c1d0 cache c2d0 c3d0

                The content of the cache devices is considered volatile, as
                is the case with other system caches.

      You can also use it as an intent log, which can dramatically improve write performance:

      Intent Log
                The ZFS Intent Log (ZIL) satisfies POSIX requirements for
                synchronous transactions. For instance, databases often
                require their transactions to be on stable storage devices
                when returning from a system call. NFS and other applica-
                tions can also use fsync() to ensure data stability. By
                default, the intent log is allocated from blocks within the
                main pool. However, it might be possible to get better per-
                formance using separate intent log devices such as NVRAM or
                a dedicated disk. For example:

                    # zpool create pool c0d0 c1d0 log c2d0

                Multiple log devices can also be specified, and they can be
                mirrored. See the EXAMPLES section for an example of mirror-
                ing multiple log devices.

                Log devices can be added, replaced, attached, detached, and
                imported and exported as part of the larger pool. Mirrored
                log devices can be removed by specifying the top-level mir-
                ror for the log.

    7. Re:I don't get it by Unit3 · · Score: 1, Informative

      No. Swap is not a cache. Swap holds things that don't fit in RAM. I/O cache will never hit swap, it limits itself to physical RAM.

      --
      -- sudo.ca
    8. Re:I don't get it by Colin+Smith · · Score: 4, Insightful

      so

      CPU L1
      CPU L2
      CPU L3
      RAM
      SSD
      DISK
      NETWORK
      Internet

      I estimate SSDs would be closer to Level 5 cache.

       

      --
      Deleted
    9. Re:I don't get it by Penguinisto · · Score: 1, Offtopic

      ...only if you want to blow out the SSD wear-limits.

      What the author wants (I believe) is to have Linux figure down which sectors are read most frequently, and have those mapped/linked/whatever to the SSD for speed reasons.

      If that's indeed the case, then why not simply put the MBR, /boot, /bin, and /usr on the SSD, then mount stuff like /home, /tmp, swap, and the like onto a spindle disk? No algorithm needed, thus no overhead needed to run it, etc.

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
    10. Re:I don't get it by Anonymous Coward · · Score: 1, Informative

      http://leaf.dragonflybsd.org/cgi/web-man?command=swapcache&section=ANY

    11. Re:I don't get it by jibjibjib · · Score: 1

      Yes, Linux caches data from disks in RAM. But what we're talking about here is not caching in RAM, but using a fast disk (SSD) as cache for a slow disk.

    12. Re:I don't get it by Annirak · · Score: 1

      The issue is that there are lots of frequently read data blocks in /home, /tmp, /swap, etc. and there are lots of infrequently accessed data blocks in /boot, /bin, and /usr. Storing infrequently read data on a fast device and frequently read data on a slow device is inefficient. I want a system which puts the frequently read data on a fast device, and the infrequently read data on a slow device.
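To make the "frequently read data on the fast device" policy concrete, here is a rough sketch that assumes you already have a trace of block numbers in the order they were read. The function names and the idea of a precollected trace are assumptions for illustration only; a real implementation would have to track this online and cheaply.

```python
from collections import Counter


def pick_hot_blocks(access_trace, ssd_capacity_blocks):
    """Return the set of block numbers read most often, capped at the SSD size."""
    counts = Counter(access_trace)
    return {block for block, _ in counts.most_common(ssd_capacity_blocks)}


def ssd_hit_rate(access_trace, hot_blocks):
    """Fraction of reads in the trace that the SSD would have served."""
    if not access_trace:
        return 0.0
    served = sum(1 for block in access_trace if block in hot_blocks)
    return served / len(access_trace)
```

The selection ignores which directory a block lives in, which is the point being argued: placement follows observed access frequency, not the filesystem layout.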

    13. Re:I don't get it by pak9rabid · · Score: 1

      Correct me if I'm wrong, but isn't /tmp usually mapped to a ramdisk?

    14. Re:I don't get it by dissy · · Score: 1

      If that's indeed the case, then why not simply put the MBR, /boot, /bin, and /usr on the SSD, then mount stuff like /home, /tmp, swap, and the like onto a spindle disk? No algorithm needed, thus no overhead needed to run it, etc.

      Unfortunately, it usually works out that some of the most volatile places on disk (/home /tmp swap) are the very places one would see the best result in speeding up.

      Also unfortunately, those are currently the worst uses for an SSD.

      Then again, for anyone who really wants to speed up things like swap and /tmp, the best way is to simply quadruple your ram and get rid of swap, and use tmpfs in ram for /tmp.

      The usual reason for not doing that is ram is expensive, and on top of that motherboards to handle a ton of ram are also expensive (At least compared to their max 4-8gb 4 slot lower end versions)
      However since SSDs are in the equation here, I'm guessing penny pinching is probably not that much of a concern.

    15. Re:I don't get it by Jezza · · Score: 1, Informative

      Assuming the SSD was faster at both read and write - it should speed things up. Hell just moving the swap onto a different physical disk helps. But don't. SSD have a limited life, in a different sense to spinning disks. SSD wear with writing, so if you constantly write to the same "sectors" they will fail. If you think about what's happening when the system is swapping - that's exactly what's going on. So yes, it'll help (a bit) but it's really expensive given what will happen to the SSD. Better is add RAM, so the system won't need to swap (with enough RAM you don't need swap at all).

    16. Re:I don't get it by InlawBiker · · Score: 1

      What about this: the SSD Ram Disk (SSDRD). It's exactly like a normal RAM disk, but it simulates an SSD. It would be supremely faster to write to an imaginary SSD rather than an imaginary HD.

      Patent!

    17. Re:I don't get it by countach · · Score: 1

      Because all of /bin is hardly going to be your most used stuff, and there's probably a ton of stuff frequently used that isn't in /bin, /usr.

      Sure, you can try and mount your most used stuff on SSD, but that's (a) a pain in the neck to fiddle around with (b) something ideally better left to an algorithm. (c) doesn't actually work that well, since you have to divide all your most used stuff into separate file systems.

    18. Re:I don't get it by Znork · · Score: 1

      I'd argue it's better to implement as HSM (Hierarchical Storage Management), with least recently used things getting delegated to more archival storage. It would be nice with a device-mapper-hsm layer that would let you simply stack one device upon the other and obtain the best distribution of desirable characteristics they could offer.

      IIRC, there was an intern at IBM who did a project like that some years ago, but I don't think much became of it.

    19. Re:I don't get it by TheRaven64 · · Score: 4, Informative

      The submitter wants something like ZFS's L2ARC, which uses the flash as an intermediate cache between the RAM cache and the disk. This works very well for a lot of workloads. Since Linux users appear to be allowed to say 'switch to Linux' as an answer to questions about Windows, it only seems fair that 'switch to Solaris or FreeBSD' would be a valid solution to this problem.

      --
      I am TheRaven on Soylent News
    20. Re:I don't get it by GuruBuckaroo · · Score: 0

      What the author wants is what Windows 7 calls ReadyBoost - except using SSDs instead of USB Flash drives. I'd love to see it too.

      --
      Poor means hoping the toothache goes away.
    21. Re:I don't get it by the_one(2) · · Score: 1

      Probably not. That would be bad if for example you wanted to burn a DVD and the burner program put a lot of stuff in /tmp. I'm not a linux pro or anything so I don't know how different distros do it but I don't think that's the default.

    22. Re:I don't get it by Jezza · · Score: 1

      Yeah, especially if those sectors don't change much - an SSD isn't suitable for data that's rapidly changing.

    23. Re:I don't get it by rwa2 · · Score: 1

      Correct me if I'm wrong, but isn't /tmp usually mapped to a ramdisk?

      Depending on the distribution, but sometimes.

      On servers /tmp can get pretty big with random crap, though, so generally you want to be able to put it on a disk or allow it to swap out and use your RAM for something more useful.

      But on thin clients, netbooks, etc. without too much going on it might be better to put it on tmpfs to reduce SSD wear.

    24. Re:I don't get it by MobyDisk · · Score: 1

      <facepalm
      If you want to play Mr. Pedantic, you skipped registers. And CPU cache may not necessarily cache disk data, so those don't count for the same reason registers don't count. Networks don't cache the internet. And don't forget newspapers - the Internet caches those. And newspapers cache events, which cache time. :-)

      The point is that most of the layers you listed are implementation dependent or not relevant to the discussion. For this purpose, the CPU is a black box - it could have different # of levels, and they might not always be used for that purpose. So RAM and SSD are caching disk. It stops there.

    25. Re:I don't get it by raynet · · Score: 1

      Well, if you use tmpfs and not ramdisk for /tmp, then pages will be swapped to disk if needed, thus you can burn your DVD as long as you have enough swap available, and daemons like swapd or swapspace allow you to have a reasonable size swap partition and then will create swapfiles on demand.

      --
      - Raynet --> .
    26. Re:I don't get it by Anonymous Coward · · Score: 2, Informative

      SSD wear with writing, so if you constantly write to the same "sectors" they will fail.

      2006 called, they want their FUD back. While it's true that erase blocks in flash memory wear out with use, the whole battle between SSD manufacturers for the last couple years has been in mapping algorithms that ensure you don't hit the same erase block very often. By now, SSDs have longer lifetimes than HDDs. Of course that applies to real SSDs, not makeshift IDE-to-CompactFlash adapters.
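A toy illustration of the remapping idea mentioned above, assuming the simplest possible policy: every rewrite of a logical block goes to the least-worn physical block other than the one currently in use. Real controllers are proprietary and far more involved, so the function name and the policy here are assumptions for illustration (and the leveled case needs at least two physical blocks).

```python
def simulate_rewrites(num_physical_blocks, rewrites, wear_leveling=True):
    """Rewrite one logical block repeatedly; return erase counts per physical block."""
    erase_counts = [0] * num_physical_blocks
    current = 0  # physical block currently holding the logical block
    for _ in range(rewrites):
        if wear_leveling:
            # Remap to the least-worn physical block other than the current one.
            candidates = [b for b in range(num_physical_blocks) if b != current]
            target = min(candidates, key=lambda b: erase_counts[b])
        else:
            target = current  # naive: erase and rewrite in place
        erase_counts[target] += 1
        current = target
    return erase_counts
```

Comparing `simulate_rewrites(4, 100)` with `simulate_rewrites(4, 100, wear_leveling=False)` shows the same number of writes either spread across all blocks or piled onto one.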

    27. Re:I don't get it by vikingpower · · Score: 1

      That is how I understood it.

      --
      Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
    28. Re:I don't get it by owlstead · · Score: 1

      OK, but CPU L1/2/3 is a data cache. Of course it will help but it's just not configured as a disk cache. Multi-processor systems for instance would not benefit from the CPU caches.

      Furthermore, I don't know about you, but my disk is certainly not used for network or internet cache. For network resources there is no such thing as configuring life-time and the browser disk cache is the first thing I disable (using a spinning system disk for internet cache is stupidity IMHO).

      Besides all that, it's just an example to provide an explanation to microbee, nitpicking on my response does not make for a better answer. Of course, this being Slashdot, I could have expected it to happen :)

    29. Re:I don't get it by EvanED · · Score: 2, Informative

      So yes, it'll help (a bit) but it's really expensive given what will happen to the SSD. Better is add RAM, so the system won't need to swap (with enough RAM you don't need swap at all).

      A RAM buffer cache and SSD cache address far different issues. The buffer cache is far faster when it hits, but the SSD cache is far larger. It's pretty easy to find workloads where getting enough RAM so that your working set will fit into your buffer cache (alongside the memory use of whatever you're doing) would be more expensive than getting at least a cheap, small SSD. (You can get a cheap 30 GB OCZ drive for about the price of 4 GB of RAM.) Your buffer cache can't survive between boots, while an SSD cache would (though an SSD swap partition wouldn't; not really).

      Finally, SSD wear is, I think, overstated. Even with quite heavy write activity, current SSDs will last years, and I suspect adding a 30 GB SSD cache would be a bigger help in 5 years than adding 4 GB of RAM now would, at least in many cases.

      Saying "better is to add RAM" is way too simplistic an answer.

    30. Re:I don't get it by Plekto · · Score: 1

      The best solution then is a physical ramdisk/ramdrive. The capacity isn't huge(8-16GB as a rule), but the speed is easily equal to any SSD and you can beat on it forever without worrying about it running out of write cycles.

    31. Re:I don't get it by EvanED · · Score: 1

      but not quite as good as using a metric fuck-ton of RAM and memcached

      Of course, an SSD will cost less than your metric fuck-ton of RAM, and has the benefit of lasting through boots.

    32. Re:I don't get it by rwa2 · · Score: 1

      Submitter's probably looking at this backwards; just put the entire system on the SSD, and create symlinks to large directories hosted on conventional storage instead.

      Even very small 32GB SSDs are large enough to fit your entire OS on; then you can use the hard disk for large file storage. So I'd say it's probably not worth the effort to try to collect detailed traces using SystemTap or whatever to figure out which files should go on SSD and which should be relegated to the spindle drive; just put it all on SSD, and maybe put /home/ftp/pub on a conventional drive for all your archives / photos / videos / pr0n / etc. Then maybe move and symlink other directories to a place in the conventional drive as necessary after studying the output of "find / | sort -n" .. things like package repositories or seldom used data - /var/cache/apt , /usr/share/doc etc.

      It's really not all that different from the way you might treat different RAID levels, here's what I do on my current system:

      / - RAID10 -p f2 to maximize read performance
      /home , /usr/local/games - RAID5 to maximize storage space and still have decent performance as long as no drives have failed.
      /tmp , /usr/src - RAID0 lots of disposable space for fast writes, compiling stuff (personal projects are under SCC in /home), etc.

      There are also ways to use "readahead" tricks to optimize conventional drives during bootup or to preload applications (now outdated thanks to SSDs), but I don't reboot all that often so I don't really have much use for them :-/

    33. Re:I don't get it by Anonymous Coward · · Score: 0

      Does Linux cache actual data (content of files) or just the block addresses?
      As far as I know, only the second.

    34. Re:I don't get it by Anonymous Coward · · Score: 0

      network should be faster than spinning disk and
      likely even ssd. a fast network connection these
      days is 10gbps. a fast ssd is 1gbps (under ideal
      conditions).

    35. Re:I don't get it by hairyfeet · · Score: 1

      Correct me if I'm wrong, but from the sounds of it he wants Readyboost for Linux, which I can't say as I blame him as I have Windows 7 and Readyboost is nice. Now I don't know how well it works, since I am not a Linux guy, but Lifehacker has a DIY Readyboost for Linux. And for those still on XP there is a Readyboost for XP but it costs $40.

      Anyway from TFA it sounds like he wants Linux Readyboost. If he tries it he should probably come back here and give us a little review of how it went. After all if you can get a speed boost using a cheap flash stick like you can in Windows 7, hell why not do it?

      --
      ACs don't waste your time replying, your posts are never seen by me.
    36. Re:I don't get it by h4rr4r · · Score: 1

      Just make a tmpfs, ram is cheap as hell.

    37. Re:I don't get it by Jezza · · Score: 1

      Sure, adding RAM isn't a panacea, but running "san swap" can really speed (some) things greatly. The question I was addressing was all about swap - nothing else.

      I do think running swap in SSD is **probably** a bad idea, especially if you can put enough RAM in to not need swap. But sure that is a pretty glib statement...

    38. Re:I don't get it by h4rr4r · · Score: 1

      RAM is cheap, the OS already caches disk to that. Adding ram is more useful and easier.

    39. Re:I don't get it by dazby · · Score: 1

      Swap to flash does work. It allows you to increase swappiness, and push unused stuff to flash, but use RAM for file cache. I've done some experiments with USB flash on my laptop, and it's heaps better in terms of responsiveness now. No HDD thrashing when ram gets tight. http://dazsbraindump.blogspot.com/2010/03/why-put-your-linux-swap-on-usb-stick.html

    40. Re:I don't get it by h4rr4r · · Score: 1

      Indeed, this is the 1 killer feature of ZFS that btrfs seems not to have yet.

    41. Re:I don't get it by owlstead · · Score: 3, Interesting

      A fast SSD is not 1 Gb/s under ideal conditions. A fast SSD is up to 2 Gb/s (about 250 MB/s) under real life conditions (while reading). Anyway, it still makes sense to cache network content to disk if the other side of the connection is slow or not reliable.
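The numbers in this sub-thread mix bits and bytes, so a one-line conversion helps when comparing them. This sketch assumes decimal units (1 Gb = 1000 Mb, 8 bits per byte) and ignores protocol overhead; the function name is made up for illustration.

```python
def gbit_to_mbyte_per_s(gbit_per_s):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbit_per_s * 1000 / 8
```

With these units, 2 Gb/s works out to 250 MB/s, which matches the figure quoted above.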

    42. Re:I don't get it by Anonymous Coward · · Score: 0

      You are missing

      Internet
      Carrier pigeon
      Truckload of CDs at 70mph
      Carved in stone and sent by mule
      Smoke Signals

    43. Re:I don't get it by gfody · · Score: 1

      Why are people hung up on SSD lifespan? Unless you're talking about USB flash thumbdrives any SSD you buy is not going to "wear out" any time soon - 5 years is the absolute worst case scenario assuming you write constantly as much as the drive will take. Tracking against intel's media wearout indicator suggests even heavily used drives will last around 15 years. How much computer equipment do you have around that's even 5 years old?
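For anyone who wants to plug in their own numbers, a back-of-envelope endurance estimate might look like the sketch below. It assumes perfect wear leveling and a constant write rate, and every input (capacity, rated program/erase cycles, daily writes, write amplification) is something you would have to supply yourself; none of the parameter names or the formula come from a vendor specification.

```python
def lifetime_years(capacity_gb, pe_cycles, gb_written_per_day, write_amplification=1.0):
    """Rough years until the rated program/erase cycles are used up."""
    total_writable_gb = capacity_gb * pe_cycles
    flash_writes_per_day = gb_written_per_day * write_amplification
    return total_writable_gb / flash_writes_per_day / 365
```

The estimate scales linearly with capacity and rated cycles, so a larger drive under the same workload lasts proportionally longer in this model.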

      --

      bite my glorious golden ass.
    44. Re:I don't get it by AHuxley · · Score: 1

      "SSD have a limited life, in a different sense to spinning disks. SSD wear with writing, so if you constantly write to the same "sectors" they will fail."
      The early versions did, but now you have real developers with real support entering consumer space.
      eg. http://eshop.macsales.com/shop/internal_storage/Mercury_Extreme_SSD_Sandforce
      10,000,000 Mean Time Before Failure (MTBF) and a 5 year warranty.

      --
      Domestic spying is now "Benign Information Gathering"
    45. Re:I don't get it by Jezza · · Score: 1

      I'm not going to answer that - but to give you an idea I do have a NeXT Dimension ...

      But I take your point.

    46. Re:I don't get it by rwa2 · · Score: 1

      Hmm, swap to flash sounds good, but you pretty much lose it after rebooting.... I think we'd want something a bit more persistent so it can continue to help the system load fast after reboots or after frequently used but infrequently accessed stuff expires from SSD cache.

      I guess maybe if you set your computer to hibernate instead of powering down :-/

    47. Re:I don't get it by Jezza · · Score: 1

      That looks very kewl. Thanks for the tip.

    48. Re:I don't get it by asamad · · Score: 1

      Why not unionfs, ssd & spindle. make most used stuff to ssd and default stuff to spindle

    49. Re:I don't get it by dazby · · Score: 1

      Yeah, that's true - no help until you've got some memory pressure. I used to get in a situation where if I'd left my machine for a while, with a lot of apps/RAM used, it would all get swapped out by some IO activity. Swapin was horribly slow, but with USB flash swap, swapin is independent of other HDD IO, and not impacted by random access and is pretty quick. Big doses of pgout are still nasty, but once it's happened and you've got your working set in RAM, lots of stuff on standby in swap, and big file cache, things work well.

    50. Re:I don't get it by owlstead · · Score: 1

      SSD is much cheaper than RAM, many computers have serious limitations on the amount of RAM that can be added, and finally you don't want oodles of write cache in case your power fails.

      That said, adding a lot of RAM to a system for cheap is certainly a good thing people should consider before going the SSD way. I'm using 77% of my 8 GB RAM for disk cache at the moment, and my system certainly is faster than my laptop once it has started up and applications have been loaded. Of course, it being a desktop, its continuous power makes it easy to simply suspend to RAM the whole time (and with the new motherboard it actually manages not just to sleep but to wake up as well), so it basically never reboots unless required for an update.

    51. Re:I don't get it by Anonymous Coward · · Score: 0

      Trick with FBSD - it doesn't believe in removing L2ARC devices yet.
      So it's a one-way path if you add one to a zpool.

    52. Re:I don't get it by Anonymous Coward · · Score: 0

      Whoa! That cache goes up to 12!

    53. Re:I don't get it by afidel · · Score: 1

      No, RAM is by far the most expensive resource in a modern virtualized environment followed quickly by IOPS. SSD's can help address both issues. I just wish my SAN had the same kind of auto-tiering that ZFS offers with L2ARC.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    54. Re:I don't get it by BobPaul · · Score: 1

      This won't improve boot times at all. If the sectors used during bootup are on the SSD, boot times can be reduced. If you try to do this with a tmpfs, all you do is extend the boot time by copying that info from the slow HD into the tmpfs before trying to boot.

      A real ramdisk/ramdrive (a battery-backed device attached via SATA or SCSI and filled with RAM) would do this, but really not any better than putting /boot, /bin, /usr, etc. on an SSD.

      An automated solution, like the author wants, would speed up files all over the system and not just user defined partitions. This could be done using either SSD or battery-backed RAM disks.

    55. Re:I don't get it by Just+Some+Guy · · Score: 1

      For network resources there is no such thing as configuring life-time and the browser disk cache is the first thing I disable (using a spinning system disk for internet cache is stupidity IMHO).

      For the love of God, why? You like refetching every little icon or JavaScript snippet on every page you visit?

      --
      Dewey, what part of this looks like authorities should be involved?
    56. Re:I don't get it by Daengbo · · Score: 1

      Just put the OS on the SSD, with /tmp, /var, and /home on the HD.

    57. Re:I don't get it by Just+Some+Guy · · Score: 2, Informative

      Trick with FBSD - it doesn't believe in removing L2ARC devices yet.

      You're wrong:

      $ sudo zpool add tank cache ada1
      $ sudo zpool status
      [...]
      cache
      ada1 ONLINE 0 0 0
      $ sudo zpool remove tank ada1
      $ sudo zpool status
      [nothing about a cache device]

      You're probably thinking of ZIL devices. You can't remove them in FreeBSD, but the version of ZFS in Solaris (that's being ported to FreeBSD right now) supports removing them.

      --
      Dewey, what part of this looks like authorities should be involved?
    58. Re:I don't get it by steveb3210 · · Score: 1

      Suppose you took your main drive and the SSD drive and created a md device out of them. Then the only thing left to explain to the OS would be to favor one space over the other based on frequency of use.

    59. Re:I don't get it by Galactic+Dominator · · Score: 1

      ...only if you want to blow out the SSD wear-limits.

      The early part of last decade called and they'd like their SSD stereotypes and generalizations returned.

      --
      brandelf -t FreeBSD /brain
    60. Re:I don't get it by Jurily · · Score: 1

      Adding ram is more useful and easier.

      Except for the cases where adding RAM == buy new computer.

    61. Re:I don't get it by Score+Whore · · Score: 1

      Really? I'm curious, let's say I have an SSD. I write block 0. The SSD controller, being the affable chap that it is, happily puts that in some cell somewhere on the device. I write block 0 again. The SSD controller, still being quite affable, knows that if it keeps rewriting the same cell the cell will fail in time. So it remaps block 0 to some other cell. I turn off the power. Turn it back on. Then I ask my SSD controller to read me back block 0. Where does it get the data from? The first cell I wrote to or the second cell? For correct functioning that's going to be the second cell. How does it know? Perhaps it writes a table of block->cell mappings. And that table is stored where? In flash memory perhaps. Now that bit of flash memory is going to get the hell written out of it. In fact that bit will get a write for every remapping that the controller does. And what is the life time of that bit of memory?

      I'm actually serious about this. I've never met anyone who could explain to me why that doesn't happen.

    62. Re:I don't get it by Score+Whore · · Score: 1

      That's interesting, but it is a bit humorous that someone who isn't a name in the enterprise storage market wants to tell us that their storage device is 100x more reliable than the current offerings available. Also "10,000,000 Mean Time Before Failure"? Ten million what? If they don't know the language, I don't know that I believe they know the actual needs of the market.

    63. Re:I don't get it by xtracto · · Score: 1

      Howdy cow, every time someone mentions a feature of ZFS (mostly every time someone on /. asks "how do I... in Linux") I get more and more impressed.

      I am seriously thinking of trying PC-BSD as my next operating system; with a rock-solid, full-featured file system like ZFS, I am sure I should not let it pass!

      Thanks AC!

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    64. Re:I don't get it by Anonymous Coward · · Score: 0

      Actually spinning disks also wear out in the same way that SSDs do. Failing sectors on spinning disks used to be a concern.
      Fortunately the spinning disk developers got their act together and implemented wear leveling and extra hidden sectors they could substitute for bad blocks.

      The SSD developers (who used to be memory developers) are just catching up to technology that has been known by others for decades.
      Expect SSDs to have all issues worked out in a generation or two. (Well, I have not tested the current gen, but neither have I had any problems with the old one, and I run a Windows system just fine on it without even bothering about wear.)

      Now if all SSD's could just get rid of that slow sata-link and put a pci-e on them. That would be nice. :)

    65. Re:I don't get it by Galactic+Dominator · · Score: 1

      Next Gen Smoke Signals have developed spread spectrum technology making throughput equivalent to a flock of pigeons and you don't have the latency. Only usable during daylight hours weather permitting.

      --
      brandelf -t FreeBSD /brain
    66. Re:I don't get it by beelsebob · · Score: 1

      A fast network in someone's house by comparison is 1Gb/s, and more commonly only 100Mb/s. Meanwhile, a hard disk will read at 1Gb/s, and an SSD at 2Gb/s.

      Your numbers, fix them.

    67. Re:I don't get it by badran · · Score: 1

      SSD's are cheap.

      1. mount / on SSD. In most cases you would not need more than 5GB
      2. mount /home on HDD.

      problem solved.

    68. Re:I don't get it by Moxon · · Score: 1

      Parent: +1 insightful

      Unfortunately, the ReadyBoost concept seems to be patented in the US. Don't know if this patent still applies when the SSD is actually faster in every way than the moving platter disk, however. The clever trick of ReadyBoost was realizing that a USB drive could deliver a small file faster than a hard disk's seek time, so you save time even if the disk has a better sustained data rate over time for big reads.

    69. Re:I don't get it by Anonymous Coward · · Score: 0

      How commercial SSDs do wear-leveling is a relatively closely guarded secret by each controller manufacturer. I doubt you'll get an honest answer. That said, if you look at some of the open source flash file systems or flash translation layers, you can get a pretty good guess. Ditto if you look at the old idea of a log structured file system. My guess is many of them work by writing out the metadata in chunks as they write other things, and then scanning it to reassemble it in memory at startup.

      See also http://www.eecs.berkeley.edu/~brewer/cs262/LFS.pdf

    70. Re:I don't get it by Anonymous Coward · · Score: 0

      No. It may be on some systems, but this will be a huge problem for any program that uses /tmp when data is too large to hold in RAM.

    71. Re:I don't get it by obarthelemy · · Score: 1

      you're assuming
      1- all of the OS is accessed often enough to justify being "cached" on the SSD
      2- no other code/data (apps...) is used more often than that.
      3- users know how to do that, or Linux allows to do that in an intuitive way

      My guess is, you're wrong on all counts. Hence the OP's question. Nice opportunity for Linux to show how it can integrate innovations faster than Windows :-p

      --
      The Cloud - because you don't care if your apps and data are up in the air.
    72. Re:I don't get it by SkunkPussy · · Score: 1

      I'm pretty sure my disk latency is lower than the response time of most websites I come across

      --
      SURELY NOT!!!!!
    73. Re:I don't get it by owlstead · · Score: 1

      No, of course not, good gods. I use the memory cache instead. CacheViewer shows that .js and images are nicely stored in RAM. Of course, they are gone when I restart FF, but I can live with that. Why do *you* think there is a memory and a disk cache?

    74. Re:I don't get it by owlstead · · Score: 1

      Depends if I am multi-tasking really.

      For instance, I like to keep my browser open while gaming. This can be a catastrophe waiting to happen if disk caching and - of course - flash is not disabled. JavaScript seems to be less of an issue.

      Most of the time the disk cache is supported by my RAM, but there is still disk latency to worry about. Disks simply suck at multi-tasking.

    75. Re:I don't get it by Just+Some+Guy · · Score: 1

      Because your hard drive is far larger than your RAM, and it's still a lot quicker than the 'net. But out of curiosity, why don't you like disk caching?

      --
      Dewey, what part of this looks like authorities should be involved?
    76. Re:I don't get it by drinkypoo · · Score: 1

      The "DIY Readyboost for Linux" is instructions on swapping to a flash drive. I'm having a very hard time imagining why you would want to do this. AFAIK the Linux kernel will not automagically detect the type of media the swap you're adding is located upon and use it intelligently and accordingly.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    77. Re:I don't get it by owlstead · · Score: 1

      Well, I have got a lot of RAM (8 GB cost almost nothing, I've got a way bigger RAM cache installed than the default 50 MB disk cache) and I don't like disk because it sucks when multi-tasking. Unless you can configure FF not to cache to disk when I'm doing other things to it, I would rather have it leave my disk alone. If my 8 GB RAM is not enough, it may always revert back to using the disk to cache unused RAM pages.

      That, and I've got a loud HDD (a raptor from before the early SSD age).

    78. Re:I don't get it by Daengbo · · Score: 1

      I was adding to the GP and P posts.

      1 - an 8-16GB SSD would be more than enough to hold the whole OS plus tons of apps. There's no need for them to be cached.

      2 - These parts (the main OS) are the most commonly accessed during boot and use, unless you run a DB server.

      3 - This is easy to set up on the partitioning screen of most distros.

      FreeBSD has this feature already, which means that if there was a real call for it, the feature would have been ported over to Linux already. I doubt there's a lot, though, since DB servers generally just cache everything in memory, and swap seems to be good enough for everything else.

      Nice snark, though.

    79. Re:I don't get it by hey · · Score: 1

      On top of that you need:
      - Your brain's long term memory
      - Your brain's short term memory

    80. Re:I don't get it by azmodean+1 · · Score: 1

      I agree with you in general, but just one nit. "readahead tricks" can still give a boost to loading large amounts of data from an SSD. While the "small read" performance of a good SSD is orders of magnitude faster than the same scenario for rotating media, "large read" performance is still significantly faster than small reads on either, so if you can queue up lots of contiguous reads your operation will complete faster no matter what kind of media you have.

    81. Re:I don't get it by azmodean+1 · · Score: 1

      A few problems with this approach, one is that the overhead of unionfs is not negligible. While I don't believe it's huge, it's still somewhat counterproductive to add another layer when you're talking about trying to improve performance. The main problem though is that unionfs just doesn't support that use case. The "lower" device is generally required to be read-only, which means you can't write through to the rotating disk. It's just a bad match.

    82. Re:I don't get it by azmodean+1 · · Score: 1

      you're assuming
      1- all of the OS is accessed often enough to justify being "cached" on the SSD
      2- no other code/data (apps...) is used more often than that.
      3- users know how to do that, or Linux allows to do that in an intuitive way

      My guess is, you're wrong on all counts. Hence the OP's question. Nice opportunity for Linux to show how it can integrate innovations faster than Windows :-p

      And *you're* assuming that
      1 - the OS is big enough to worry about
      Just checked, my / partition has 4.7GB on it, that's everything that's not /home or /boot in my case. To be safe I allocate 10GB for /. Just to be clear, that is every app I use, and I have tons of apps on there that I could get rid of if it were worth my time, but it's not, because that's a trivial amount of storage.
      2 - The proposed scheme leaves any apps out, and that it's not flexible enough for you to trivially put data on the SSD.
      Obviously you aren't familiar with linux, this layout puts ALL your apps on the SSD by default. The only exception is if you build from source on your /home partition and don't bother to install the resulting app to the usual location.
      If you find yourself needing data cached on the SSD as well, you can just set up a partition for your special data on the SSD as well, it's trivial.
      3 - users don't know how to do it, and Linux doesn't allow it to be done in an intuitive way.
      Every recent graphical installer I've seen has a section where you set up the layout for your partitions, and also an automatic setup that could trivially be adjusted to run some quick speed tests on the drives and use that to determine where to put the various partitions. In other words this approach has been available to those in the know for longer than SSDs have been around, so you do a very poor job of making your point.

    83. Re:I don't get it by pak9rabid · · Score: 1

      Isn't that also why you put a limit on size of the tmpfs ramdisk?

    84. Re:I don't get it by Plekto · · Score: 1

      The advantage of physical ram, though, is that it almost never wears out. I've booted up 30 year old computers and the ram still works. SSDs if used for this purpose will burn out in weeks or months, because they're not capable of withstanding that sort of (ab)use for very long. SSDs also have long write times compared to their read times. Much faster than a hard drive, of course, but nothing like a ramdisk. When you need to beat on it a hundred thousand times a day, nothing beats good old ram.

      I suggest he take a look at the ANS-9010. This is going to work great for his uses. I've seen them for around $350 new, and then you just drop in a couple of hundred dollars worth of memory. It also has built-in raid so he can get a bit more bandwidth if he needs to. For this sort of swap file/temporary memory space application, it's fantastic.

    85. Re:I don't get it by On+Lawn · · Score: 1

      I'll agree, but ultimately the problems that arise with cache consistency, and the network latency that inhibits that consistency, make such caching infeasible.

    86. Re:I don't get it by rwa2 · · Score: 1

      I was under the impression that most of the improvement from using the RedHat "background readahead" optimizer comes from reordering lots of small reads by their physical position on disk, so that the drive head can just do one sweep to load the data, rather than jumping back and forth (adding seek latency). Also, it might have a small chance of improving hits to the drive's internal readahead cache. But I don't think either of those would really apply to an SSD.

      True, block reads would be faster than lots of small reads, but the only way to do that to lots of small files would be to group them together into a special package... like maybe through judicious use of squashfs or something similar.

      But yeah, it would be awesome if someone managed to work out how to do the latter... ideally during bootup the hard disk should be pegged at full read speed until everything it needs from disk is loaded. It would call for a lot of stripping out of files that weren't accessed, though.
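
      A rough sketch of the reordering idea above, in Python. It assumes each pending read is just a (name, physical block) pair and that seek cost is proportional to how far the head travels. Both are simplifications, the block numbers are invented, and none of this is the actual readahead code.

```python
def head_travel(start, blocks):
    """Total distance the head moves when visiting blocks in the given order."""
    travel, pos = 0, start
    for b in blocks:
        travel += abs(b - pos)
        pos = b
    return travel


def one_sweep_order(requests):
    """Sort (name, physical_block) requests by block so the head sweeps once."""
    return sorted(requests, key=lambda r: r[1])


# Hypothetical boot-time reads, in the order the programs asked for them.
requests = [("libc", 9000), ("init", 100), ("fstab", 8500), ("bash", 300)]

unsorted_travel = head_travel(0, [b for _, b in requests])
sorted_travel = head_travel(0, [b for _, b in one_sweep_order(requests)])
print(unsorted_travel, sorted_travel)
```

      On an SSD the travel distance is meaningless, which is the point made above: the sort only helps media with a moving head.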

    87. Re:I don't get it by BobPaul · · Score: 1

      SSDs if used for this purpose will burn out in weeks or months,

      Probably not. He's asking for a cache of most used sectors, not a cache of currently used sectors. The information on the SSD/RAMDisk used for this cache probably wouldn't change much on a daily basis. Based on his description, I wouldn't expect this to be taking much abuse.

      Additionally, modern SSDs have much higher block erase values than users give them credit for, and with wear level routines in the firmwares, they often have life expectancies greater than HDs, which wear out as oil leaks from the motor housing. Add in the cost benefit of both... $350 + 8 * $170 = $1710 for a 32GB RAMDrive using the ANS-9010. I'll take less than $200 for a decently spec'd 32GB SSD. Even if you're right and I have to replace it frequently, I'd have to go through 9 before I match the price of the RAMDrive. Heck, the most expensive 32GB SSD is still only $360, with 250MB/s read and 170MB/s write. Maybe not as fast as ram, but way faster than a HD and with a 3 year warranty I'd have a lot of trouble justifying the $1350 price difference.

      It's all moot until someone writes an algorithm to match his request, though.

    88. Re:I don't get it by Plekto · · Score: 1

      Obviously he would use cheaper 2GB memory modules, as 16GB is more than enough for his use. I mentioned the 9010 vs the 9010B as it would allow for RAID. If he's not interested and just wants a big "drive", use the 6 slot B model which is just over $200. Drop some cheap RAM in it and go.

      With 2GB modules, he's looking at: $350 + $220 × 2 = $790.
      Or with the 6 slot model, about $600. More than a SSD, but not hugely so.

      And you'd be amazed. My computer shows an average of nearly a million I/O hits to my memory and hard drive a day. Unless he's very careful, he can end up bricking that SSD in a few months by getting one setting wrong.

    89. Re:I don't get it by BobPaul · · Score: 1

      Unless he's very careful, he can end up bricking that SSD in a few months by getting one setting wrong.

      You're drastically underestimating SSD write limitations. I have an original EEE 900 with a 16GB SSD that I've used daily without a second thought for most of 2 years now. I even have a swap partition. It'll eventually wear out, but it hasn't yet.

    90. Re:I don't get it by owlstead · · Score: 1

      Absolutely, unless there is some well defined expiration date, e.g. those used by web servers and HTTP connections.

      As for Java development, my game, there are things like Maven that can cache "artifacts" locally and check if they are up to date when required.

      In other words, network caching is only feasible on application level. Caching NFS or SMB file shares is an entirely other matter.

      Non-maven release builds can take quite a bit of time at my company, each and every file has to be transmitted over 100 Mbit/s network, each and every time they are needed.

    91. Re:I don't get it by BobPaul · · Score: 1

      Overestimating, sorry...

      Here's a look. TS16GSSD25S-S, 16GB, $160 on newegg. 96MB/s max write throughput.

      AMD was advertising 1 million cycle flash chips back in the late 90s. Typical SSDs are estimated at between 1 and 3 million cycles, but manufacturers generally don't give this data for individual products. Let's assume 1 million.

      16GB * 1 million / 96MB/s = 5.4 years if all you did was continually write data to the SSD.

      Intel's SSDSA2SH032G1, the 32GB SSD for $270 I referenced earlier? Again, estimating low with 1 million cycles. 32GB * 1 million / 170MB/s = 6.1 years.

      So knock it off with this "matter of weeks" or "matter of months" crap.
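
      For anyone who wants to redo the arithmetic above with their own numbers, here it is as a small Python function. It assumes perfect wear leveling, nonstop writing at the quoted throughput, 1 GB = 1024 MB and a 365-day year; the cycle count is an input, not a measured value.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_wear_out(capacity_gb, cycles, write_mb_per_s):
    """Years of nonstop writing before every cell has used up `cycles` erases.

    Assumes perfect wear leveling and 1 GB = 1024 MB.
    """
    total_mb = capacity_gb * 1024 * cycles
    return total_mb / write_mb_per_s / SECONDS_PER_YEAR


print(round(years_to_wear_out(16, 1_000_000, 96), 1))   # 16GB drive at 96MB/s
print(round(years_to_wear_out(32, 1_000_000, 170), 1))  # 32GB drive at 170MB/s
```

      The result scales linearly with the cycle count, so the answer is only as good as that one guess.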

    92. Re:I don't get it by lyml · · Score: 1

      Well an example to solve that problem would be to use several pieces for that single block and simply cycle through them, using the one with the most recent stamp.

      Though I doubt that is how the SSD manufacturers actually solve the problem it is a solution to the problem.

    93. Re:I don't get it by w0mprat · · Score: 1

      100MB/s maximum write would take *decades* to kill an SSD. Importantly, block write-erase cycles outlive the data retention time for un-refreshed blocks (typically 10 years), so a long-lived system may face data corruption should there somehow be seldom-written blocks lurking. I'm not so confident that the SSD controller circuitry would live that long, especially now that all PCBs are manufactured with lead-free RoHS compliant crap. (Indeed I've had a number of recent mechanical hard drives that have had the controller board fail while the moving parts are working sweetly, yet I have a HDD somewhere from 1998 that still works perfectly).

      Thus the following is somewhat of a myth nowadays:

      SSDs if used for this purpose will burn out in weeks or months

      Others here will point out the write-limitation ceiling is so high, it'll never be a concern, even in the worst case usage scenario.

      In the case of Windows, I have had ReadyBoost and eboostr on USB disks for years, and I have an SSD with Linux swap and Windows swap files on it. I have not had a problem - and these are bargain bin USB flash drives that I was fully ready to consider expendable.

      --
      After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
    94. Re:I don't get it by w0mprat · · Score: 1

      Yes but what about linux?

      --
      After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
    95. Re:I don't get it by Plekto · · Score: 1

      The issue of wear is only properly addressed IF the load-leveling is working properly and the same few blocks aren't getting hammered all day long.

      http://www.pcper.com/article.php?aid=669&type=expert&pid=1

      There's also this issue, which is a real problem for MLC drives.

      The average time before it starts to experience problems, when used the way the original poster wanted, is very very short. Reads are fine, but writing crawls to very slow speeds. Some tests have shown that for swap space usage, this can happen in days.

      **an excerpt**
      Until Intel tweaks their write combining algorithms and revises their released firmware, there are ways to minimize your chances of falling into the fragmentation black hole. Here are some things to avoid:

              * Disk partitions not properly aligned with flash block boundaries (to be covered in another article).
              * Heavy temporary file activity (think temporary internet files).
              * Heavy page / swap activity.
              * Applications that write random small chunks, even within a larger file (i.e. BitTorrent / Steam).
              * Running *any* disk defragment utility (DON'T DO IT!).

      ****
      In short, SSDs are not recommended at all with current OSs for swap and similar tasks if you care about long-term speed and performance. There's a reason ramdisks exist, and technically for his intended use, he probably only needs 4GB or so, if that. He might be able with a 64 bit OS to actually just buy more RAM and partition off a chunk as a ramdisk for less money. (say 12GB primary and 4GB temp/swap space) Back up the data space every day to a flash drive. As a bonus, he'll not have any issue either with i/o bandwidth. Even SATA2 is horrendously slow compared to on-board RAM.

    96. Re:I don't get it by Hal_Porter · · Score: 1

      Those AMD flashes with 1 million write cycles were probably NOR flash. NAND flash is typically 100,000 cycles for SLC and 10,000 cycles for MLC. Most consumer SSDs use MLC because it is cheaper.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    97. Re:I don't get it by Hal_Porter · · Score: 1

      Really? I'm curious, let's say I have an SSD. I write block 0. The SSD controller, being the affable chap that it is, happily puts that in some cell somewhere on the device. I write block 0 again. The SSD controller, still being quite affable, knows that if it keeps rewriting the same cell the cell will fail in time. So it remaps block 0 to some other cell. I turn off the power. Turn it back on. Then I ask my SSD controller to read me back block 0. Where does it get the data from? The first cell I wrote to or the second cell? For correct functioning that's going to be the second cell. How does it know? Perhaps it writes a table of block->cell mappings.

      It has a table. It maps an LBA, i.e. a sector number to a physical address.

      And that table is stored where? In flash memory perhaps. Now that bit of flash memory is going to get the hell written out of it. In fact that bit will get a write for every remapping that the controller does. And what is the life time of that bit of memory?

      I'm actually serious about this. I've never met anyone who could explain to me why that doesn't happen.

      If you invert the table, you can store it in flash. So when you write LBA 0 to block 342 you mark that as containing block 0. When block 0 is rewritten you write it to block 567 and mark block 342 as obsolete. This has the handy property that you only need to update the table when you write a sector. You need to garbage collect the obsolete sectors.

      So you effectively have a table such that table[physical address]=LBA, which is obviously non optimal for data access - you'd need to do a linear search for every write or read.

      Some drives will scan at startup and build a table in RAM so that ram_table[LBA]=physical address but this will need a lot of RAM
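
      To make the inverted-table idea concrete, here is a minimal Python sketch. The class and method names are made up for illustration, a per-page sequence number is assumed as the way to tell the newest copy from obsolete ones (the description above only says old copies get marked obsolete), and garbage collection is left out entirely.

```python
class TinyFlash:
    """Toy flash: each physical page stores (lba, seq, data) or None if erased."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages
        self.next_free = 0   # next erased page to program
        self.seq = 0         # monotonically increasing write counter
        self.ram_table = {}  # lba -> physical page, lives only in RAM

    def write(self, lba, data):
        # Never overwrite in place: program a fresh page tagged with its LBA.
        if self.next_free >= len(self.pages):
            raise RuntimeError("no erased pages left (garbage collection omitted)")
        self.seq += 1
        self.pages[self.next_free] = (lba, self.seq, data)
        self.ram_table[lba] = self.next_free  # the old page is now obsolete
        self.next_free += 1

    def read(self, lba):
        return self.pages[self.ram_table[lba]][2]

    def power_cycle(self):
        """Lose all RAM state, then rebuild it by scanning every page."""
        self.ram_table = {}
        newest = {}
        for phys, page in enumerate(self.pages):
            if page is None:
                continue
            lba, seq, _ = page
            if seq > newest.get(lba, 0):
                newest[lba] = seq
                self.ram_table[lba] = phys
        self.seq = max(newest.values(), default=0)
        self.next_free = next(
            (i for i, p in enumerate(self.pages) if p is None), len(self.pages)
        )


flash = TinyFlash(8)
flash.write(0, "first")
flash.write(0, "second")   # LBA 0 rewritten to a different physical page
flash.power_cycle()
print(flash.read(0))
```

      The scan touches every page, which is why this approach gets slow and RAM-hungry as the device grows.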

      Others will write it to flash. But the ones that write it into flash will need to keep moving it around. E.g. one possibility is to use unused LBAs (I've seen negative LBAs used for this) for the mapping table. If you do that you can keep a list of the physical blocks containing the mapping table in RAM. This is likely to be fairly compact. A 4 byte integer in RAM can tell you where a page of the mapping table is, and that page gives you the addresses of say 2048/4 sectors, or 1MB of data. You could easily have enough RAM in the controller to keep track of the list of pages in the mapping table. Of course the drawback to this scheme is that you need to rewrite two pages in flash for every write - one for the data and one for the mapping table.

      Now you have a scheme like this. When reading a sector you look in ram to find the relevant page of the mapping table, read the mapping table (of course you can have a RAM cache for the most frequently used pages - which interestingly means that SSDs do have an analogue of seek delays - the first access to a new part of the disk will require an extra page read) and then read the data.

      Now of course I've glossed over a bit of subtlety here - the page size of the flash is bigger than 512 bytes, typically 2048 bytes. You can emulate 512 bytes by doing read/modify/write cycles, or (better) batch up multiple 512 byte writes into one or more pages. To make it fast you need to aim to be writing to multiple chips in parallel - probably at least 8. So maximum write speed will only happen when writing in big chunks.

      Most likely SSDs will run better with a bigger cluster size. There's even an argument for increasing the sector size.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    98. Re:I don't get it by bodan · · Score: 1

      The problem with your explanation is that, to be persistent between reboots, the table must be stored in flash memory (as you mentioned).

      But the method you describe causes one write to the relocation table for every write of a “data” block. One supposes that the table is significantly smaller than the data area. So the blocks of (hidden) memory used for the table are actually written even more often than the data blocks. Thus, one would expect them to fail even faster.

      (Even worse: suppose I want to implement wear-leveling for the relocation table, and use a single block to remember where the relocation table was last written. This would be a two-level wear leveling. This means that that single block is written to for every write on the device. Thus, the wear is even more concentrated on the single block, while it is leveled on the rest of the device.)

      I would assume that inventors of wear-leveling are not complete idiots, and that there’s a trick to avoid that effect. That’s the part of WL that isn’t usually explained.

      Hmm, points to me for not being completely lazy: Wikipedia indicates that flash devices often (usually? always?) have a special block that supports many more write cycles than usual (by a factor of 100 or more), which is used for wear leveling. So, apparently what I suggested in the parentheses above is actually true: they focus writes on a special block in order to level writes over the rest, and make the special block more resistant. (Which is obviously cheaper than making all blocks more resistant.)

      --
      "I think I am a fallen star. I should wish on myself."
    99. Re:I don't get it by Hal_Porter · · Score: 1

      The mapping table isn't at a fixed location - like data blocks, mapping table blocks can be written anywhere in flash.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    100. Re:I don't get it by Andy+Dodd · · Score: 1

      ionice() your Firefox process then...

      Interesting that what you do is the opposite of what quite a few others do (add an extra layer of cache with Squid...)

      --
      retrorocket.o not found, launch anyway?
    101. Re:I don't get it by bodan · · Score: 1

      Well, yeah, but you still need to find the mapping table. Just as you use the mapping table to find “normal” data blocks (because they keep moving), you need to use something else to find the mapping table (because, as you say, it can be written anywhere).

      That means either a fixed location for the mapping table, or a fixed location for a pointer to the mapping table. Hence the “special block”.

      (You might also just search for it, but that’s not really going to work for anything larger than a few MB.)

      --
      "I think I am a fallen star. I should wish on myself."
    102. Re:I don't get it by Hal_Porter · · Score: 1

      A table in RAM keeping the addresses of the blocks of the mapping table is feasibly small, as I mentioned.

      E.g. a 4 byte integer could tell you the address of a block that contains 2048/4=512 mapping table entries, each of which contains the address of one 2048 byte block. So each 4 byte integer lets you find 1MB of data. A 4KByte table (1024 entries) lets you track a GB of flash data.
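
      Working the numbers through in Python, under the same assumptions as above (2048-byte pages, 4-byte addresses). The function name is just for illustration.

```python
PAGE_SIZE = 2048   # bytes per flash page
ENTRY_SIZE = 4     # bytes per address

entries_per_page = PAGE_SIZE // ENTRY_SIZE           # 512
data_per_ram_entry = entries_per_page * PAGE_SIZE    # bytes covered by one RAM entry


def ram_table_bytes(flash_bytes):
    """RAM needed for the top-level table that locates mapping-table pages."""
    ram_entries = -(-flash_bytes // data_per_ram_entry)  # ceiling division
    return ram_entries * ENTRY_SIZE


print(data_per_ram_entry)        # bytes of data per 4-byte RAM entry
print(ram_table_bytes(2**30))    # RAM table size in bytes for 1 GB of flash
```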

      You'd need to scan on startup to find this - or maybe you'd write the table to flash on a controlled shutdown to make the startup faster.

      Incidentally the only special block I know of is the first one. That's guaranteed to be good and usually to have more write resilience. It's meant for boot code though - typically NAND controllers read that block into internal RAM at startup and code in the block is used to boot the system. If the first block were bad, many NAND based systems would fail to boot.

      You don't need specially resistant blocks for the wear levelling scheme I discussed though - every structure can be anywhere in flash. In fact the systems I've seen usually have a write pointer that cycles through the flash - i.e. start at the beginning (well you'd skip the boot blocks) and wrap back to the beginning at the end. This guarantees good wear levelling.
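
      A toy Python version of that cycling write pointer, to show why the wear comes out even. The class name is invented, and it assumes any live data in a block has already been moved elsewhere before the pointer reuses it, which a real controller would have to handle.

```python
class WritePointer:
    """Hands out physical blocks in a ring, skipping reserved boot blocks."""

    def __init__(self, num_blocks, boot_blocks=1):
        self.num_blocks = num_blocks
        self.boot_blocks = boot_blocks
        self.pos = boot_blocks
        self.erase_counts = [0] * num_blocks

    def next_block(self):
        block = self.pos
        self.erase_counts[block] += 1    # block is erased before reuse
        self.pos += 1
        if self.pos == self.num_blocks:
            self.pos = self.boot_blocks  # wrap, skipping the boot blocks
        return block


wp = WritePointer(num_blocks=5, boot_blocks=1)
# 1000 writes that, logically, all target the same LBA.
for _ in range(1000):
    wp.next_block()
print(wp.erase_counts)
```

      Even though every write is aimed at the same logical sector, each of the four usable blocks ends up with the same erase count and the boot block is never touched.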

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
  2. I'd like to do this in Windoze by Anonymous Coward · · Score: 0

    I can't find a commercial SSD / Platter Hybrid anywhere...

    I wouldn't mind doing some of the work myself but I can't even find those mythical ssd / hard drive kits.

    1. Re:I'd like to do this in Windoze by GuruBuckaroo · · Score: 1

      Throw the SSD in a USB enclosure and try ReadyBoost if you're using Windows Vista SP1, 7, or 2008 Server. Not sure if that will work or not, but you can try it. Works with Flash drives.

      --
      Poor means hoping the toothache goes away.
  3. isn't 40 GB enough for applications? by owlstead · · Score: 3, Interesting

    Is there really a need for this? Intel 40 GB SSD still has a read speed of 170 MB/s and costs about 100 euro here in NL. Why have some kind of experimental configuration while prices are like that? OK, 35 MB/s write speed is not that high, but with the high IOPS and seek times you still have most of the benefits.

    I can see why you would want something like this, but I doubt the benefits are that large over a normal SSD + HDD configuration.

    1. Re:isn't 40 GB enough for applications? by Unit3 · · Score: 4, Informative

      They are huge for larger applications. Database servers, for instance, can see performance increases on the order of 10-20x in transactions per second when using a scheme like this for datasets that are too large to fit in RAM.

      --
      -- sudo.ca
    2. Re:isn't 40 GB enough for applications? by Annirak · · Score: 1

      The idea is that I want to automate the management of which data are stored on the SSD. Entire files need not be cached, only their hot sectors. By using a SSD directly, you lose the benefits of keeping infrequently accessed bits of data out of the SSD, reserving it only for the most commonly accessed data.

      Doing it this way makes the majority of the filesystem perform better, whereas using a normal SSD+HDD configuration requires you to actively manage where your most frequently used data are.
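
      A minimal Python sketch of what such a hot-sector cache might look like. Everything here is hypothetical: the HDD and SSD are plain dicts, "hot" simply means "read most often", and writes are not handled at all. It is meant to show the policy, not any existing Linux implementation.

```python
from collections import Counter


class HotSectorCache:
    """Keeps the most frequently read sectors on a (simulated) SSD."""

    def __init__(self, hdd, ssd_capacity):
        self.hdd = hdd                  # dict: sector number -> data
        self.capacity = ssd_capacity    # number of sectors the SSD can hold
        self.ssd = {}
        self.hits = Counter()           # read count per sector
        self.ssd_reads = 0
        self.hdd_reads = 0

    def read(self, sector):
        self.hits[sector] += 1
        if sector in self.ssd:
            self.ssd_reads += 1
            return self.ssd[sector]
        self.hdd_reads += 1
        data = self.hdd[sector]
        self._maybe_promote(sector, data)
        return data

    def _maybe_promote(self, sector, data):
        if len(self.ssd) < self.capacity:
            self.ssd[sector] = data
            return
        # Evict the coldest cached sector only if the newcomer is hotter.
        coldest = min(self.ssd, key=lambda s: self.hits[s])
        if self.hits[sector] > self.hits[coldest]:
            del self.ssd[coldest]
            self.ssd[sector] = data


hdd = {n: f"data{n}" for n in range(100)}
cache = HotSectorCache(hdd, ssd_capacity=2)
for sector in [1, 2, 1, 2, 50, 1, 2, 51, 1, 2]:
    cache.read(sector)
print(sorted(cache.ssd), cache.ssd_reads, cache.hdd_reads)
```

      Sectors 50 and 51 are read once each and never displace the two hot sectors, which is the difference from a plain most-recently-used policy.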

    3. Re:isn't 40 GB enough for applications? by MobyDisk · · Score: 1

      I doubt the benefits are that large over a normal SSD + HDD configuration.

      Which doesn't work for laptops. :-(

      Most laptops can only fit a single drive. I would love to have an SSD for faster build times, but a 40GB SSD is useless in my laptop since the second drive would have to be an external. But a 300GB drive with 16GB of integrated flash might give me a single drive with the performance boost that I am looking for.

    4. Re:isn't 40 GB enough for applications? by mickwd · · Score: 1

      I'd agree with this. Get that Intel SSD and stick /usr on it, together with any other read-mainly filesystems (maybe the root filesystem too, if you have stuff like /var on separate partitions).

      As well as faster reads, the biggest gains are in seek times, so it'd be helpful to have your home directory and all its "dot" config files on there too (especially when starting up something like Gnome or KDE). However, if you're gonna fill your home directory with tons of stuff, then stick your home directory itself on the SSD, but mount a hard-drive-based partition into it and keep your space-using stuff on there. For example:

      /            SSD
      /var         Magnetic
        :
      [Paging]     Magnetic
      /usr         SSD
      /home        SSD
      /homestorage Magnetic

      For user X, create /homestorage/X, then create a symlink /home/X/stuff -> /homestorage/X.

      Yes it's a slight pain to keep most of your personal stuff in a subdirectory below your home directory, but it's hardly going to kill you.
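
      The per-user step can be scripted. Here is a small Python sketch; the function name is made up, and the demo runs in a scratch directory rather than the real /home and /homestorage so it can be tried without touching anything.

```python
import tempfile
from pathlib import Path


def link_bulk_storage(home_root, storage_root, user, link_name="stuff"):
    """Create storage_root/user and symlink home_root/user/link_name to it."""
    target = Path(storage_root) / user
    target.mkdir(parents=True, exist_ok=True)
    link = Path(home_root) / user / link_name
    link.parent.mkdir(parents=True, exist_ok=True)
    if not link.is_symlink():
        link.symlink_to(target, target_is_directory=True)
    return link


# Demo in a scratch directory so nothing on the real system is touched.
scratch = Path(tempfile.mkdtemp())
link = link_bulk_storage(scratch / "home", scratch / "homestorage", "X")
(link / "movie.avi").write_text("big file")
print((scratch / "homestorage" / "X" / "movie.avi").exists())
```

      Anything written under the "stuff" link lands on the magnetic side, while the rest of the home directory stays on the SSD.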

    5. Re:isn't 40 GB enough for applications? by kgo · · Score: 2, Insightful

      Yeah, but if you've got some 'enterprise-level database' with those sort of transaction requirements, you can probably justify the purchase of SSDs. It's not exactly like you're building that system from craigslist parts...

      --
      Can you construct some sort of rudimentary lathe?
    6. Re:isn't 40 GB enough for applications? by Amouth · · Score: 1

      Speak for yourself.. some companies do not want to spend the money required to do it right.. but rather would have you spend more time than the equipment cost putting something crazy together to make it work.

      --
      '...if only "Jumping to a Conclusion" was an event in the Olympics.'
    7. Re:isn't 40 GB enough for applications? by h4rr4r · · Score: 1

      Then just use ZFS on Solaris or BSD.
      You can add the SSDs as cache devices.

    8. Re:isn't 40 GB enough for applications? by Just+Some+Guy · · Score: 1

      Thank you! People say that this is "wasting time", then advocate manually partitioning your data and moving files around to their most appropriate locations. No thanks; I'd much rather let the computer (which knows what data I'm actually accessing) handle it for me.

      --
      Dewey, what part of this looks like authorities should be involved?
  4. Holy fuck, just use partitions. by Anonymous Coward · · Score: 0

    Get one drive of each type. Stick them in your computer. Create partitions for the main directory hierarchies. Put /, /boot, /bin, /etc, /usr, and other relatively-static hierarchies on the SSD drive. Put /home, /var, and other frequently-modified directories on the magnetic disk drive. There, you've got the caching you want.

    1. Re:Holy fuck, just use partitions. by Unit3 · · Score: 1

      Err, no you don't. That's not caching at all, and doesn't help with datasets that don't fit on the SSD.

      This is a shortsighted kludge with limited uses, and not at all the elegant solution the poster was asking for.

      --
      -- sudo.ca
    2. Re:Holy fuck, just use partitions. by dow · · Score: 1

      Seems rather obvious to me... I've always had my / on the first part of the fastest disk, and had /usr, /home and /var elsewhere. It makes it far easier to upgrade or mess around with multiple distributions when it's partitioned like this, but Linux distributions often don't encourage it as much as they should.

      You probably just have one or two directories that are huge, such as your Southpark collection and maybe some porn. Just put everything on the SSD, and have the few large directories of rarely used, sequentially accessed files on a cheap green drive. If you don't already have all these sorts of files grouped together, changing your way of working won't kill you.

      I'm currently using Windows7 off an SSD, and won't ever be without one in the future. There is no way that 200 quid spent on a processor or ram or graphics card can make a system so responsive as only 100 quid spent on an SSD. My other drive in this system is a Velociraptor, but in hindsight I'm thinking perhaps I should have bought a slower larger drive. Then again, I guess that is what NAS is for :)

    3. Re:Holy fuck, just use partitions. by obarthelemy · · Score: 1

      really depends.
      1- responsiveness is not speed. Some users care about boot and app launch times, others about how quickly their dataset / movie conversion... finishes.
      2- SSDs are still expensive. I'm fairly sure intelligent caching would let you get as much speed boost from a 16GB SSD as you currently get from a 64GB one that gets filled with rarely-used cruft that happens to be in the same dir as frequently-used files, and misses frequently-used files because it's full already.

      --
      The Cloud - because you don't care if your apps and data are up in the air.
  5. ZFS by Anonymous Coward · · Score: 5, Informative

    ZFS can do this (http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Cache_Devices) but I don't know about zfs-fuse

  6. Buffers? by DoofusOfDeath · · Score: 1

    I hate to sound dumb, but isn't what you're describing basically file system buffering that OS's have been doing for many decades now?

    1. Re:Buffers? by Anonymous Coward · · Score: 0

      Yes, but I guess this is supposed to be persistent across power cycles.

      I.e. recently/often accessed files/sectors/blocks are kept on the fast SSD, whereas data that is accessed more seldom is stored on an ordinary HD. Assuming the caching system does the right thing (it probably needs to be somewhat smarter than just a LRU), you would benefit from the SSD-cache mainly during boot and directly afterward when data is not yet cached in RAM.

    2. Re:Buffers? by Anonymous Coward · · Score: 0

      Almost. Typically files are cached from the spinning disk straight to RAM. The author wants to cache the spinning disk to the SSD, then to RAM. In other words, think of the SSD as a very large cache for the spinning disk. The advantage is that the author wouldn't have to decide which files to write to the small SSD or the large disk, the caching algorithm would do that automatically.
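      For anyone who wants the layering spelled out, here is a minimal Python sketch of that read path: look in RAM, then the SSD tier, then the spinning disk. Everything in it (the class names, the dict standing in for the disk, plain LRU for both tiers) is an assumption made for illustration, not how any real kernel cache works.

```python
from collections import OrderedDict


class LRU:
    """Tiny fixed-capacity LRU map, used here as a stand-in for one cache tier."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


class TieredReader:
    """Read path: RAM tier first, then the SSD tier, then the backing disk."""

    def __init__(self, disk, ram_blocks, ssd_blocks):
        self.disk = disk  # hypothetical backing store: block number -> bytes
        self.ram = LRU(ram_blocks)
        self.ssd = LRU(ssd_blocks)
        self.hits = {"ram": 0, "ssd": 0, "disk": 0}

    def read(self, block):
        data = self.ram.get(block)
        if data is not None:
            self.hits["ram"] += 1
            return data
        data = self.ssd.get(block)
        if data is not None:
            self.hits["ssd"] += 1
        else:
            data = self.disk[block]
            self.hits["disk"] += 1
            self.ssd.put(block, data)  # populate the SSD tier on a disk read
        self.ram.put(block, data)
        return data

    def reboot(self):
        """RAM contents are lost on a power cycle; the SSD tier survives."""
        self.ram = LRU(self.ram.capacity)
```

      After `reboot()` the first read of a previously used block is served from the SSD tier rather than the disk, which is the persistence benefit other replies in this thread point out.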

    3. Re:Buffers? by MobyDisk · · Score: 2, Informative

      No.

      You would buffer on an SSD differently than you would do it in memory. Memory is volatile, so you write-back to disk as fast as possible. And whenever you cache something, you trade valuable physical memory for cache memory. With an SSD, you could cache 10 times as much data (Flash is much cheaper than DRAM), you would not have to write it back immediately (since it is not volatile), and the cache would survive a reboot so it could also speed the boot time.

    4. Re:Buffers? by ras · · Score: 1

      Not really. There are a couple of applications for SSD's. One is to speed up boot times. Obviously a RAM cache is useless in that application.

      Another is if you want to speed up a transaction server (one that is writing as much as it is reading), then the answer is again no. Think of the battery backed up RAM cache RAID arrays have. Those caches are there for a reason. RAM can do read caching, but it can't do write caching and still be secure across power failure.

      My interest is in the second application, and like the guy who wrote this article I have been patiently waiting for someone to produce a nice local file system cache for Linux. So far I haven't seen anything that comes close.

    5. Re:Buffers? by m.dillon · · Score: 3, Informative

      The single largest problem addressed by e.g. DragonFly's swapcache is meta-data caching to make scans and other operations on large filesystems with potentially millions or tens of millions of files a fast operation. Secondarily, for something like DragonFly's HAMMER filesystem, which can store a virtually unlimited number of live-accessible snapshots of the filesystem, you can wind up with not just tens of millions of inodes, but hundreds of millions of inodes. Being able to efficiently operate on such large filesystems requires very low latency access to meta-data. Swapcache does a very good job providing the low latency necessary.

      System main memory just isn't big enough to cache all those inodes in a cost-effective manner. 14 million inodes takes around 6G of storage to cache. Well, you can do the math. Do you spend tens of thousands of dollars on a big whopping server with 60G of ram or do you spend a mere $200 on a 80G SSD?

      -Matt
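      The math being alluded to can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes "6G" means 6 GiB and that the cost per cached inode stays constant as the count grows, neither of which the post states.

```python
GIB = 1024 ** 3

# Figures quoted in the post: 14 million inodes need about 6G of cache.
inodes_cached = 14_000_000
cache_bytes = 6 * GIB  # assumption: "6G" is read as 6 GiB

bytes_per_inode = cache_bytes / inodes_cached  # works out to roughly 460 bytes


def cache_size_gib(inode_count):
    """Cache needed for inode_count inodes, assuming a constant per-inode cost."""
    return inode_count * bytes_per_inode / GIB


for count in (14_000_000, 140_000_000, 200_000_000):
    print(f"{count:>11,} inodes -> {cache_size_gib(count):6.1f} GiB")
```

      Under those assumptions, ten times the inode count (140 million) needs ten times the cache, which is the 60G figure used in the RAM-versus-SSD comparison.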

    6. Re:Buffers? by JoelKatz · · Score: 1

      Yes, exactly. And that's why it's so surprising that there's no good way to do what he wants.

      The only quirk is that he wants to keep frequently-read, rarely-modified sectors on the SSD. Not frequently-used data, and not recently-modified data. He needs the software to identify frequently-read, rarely modified data and store *that* on the SSD.
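      One way to picture that selection step is the Python sketch below: count reads and writes per block, then keep only blocks that are read often and written rarely. The thresholds (`min_reads`, `max_write_ratio`) and the class name are made up for illustration; a real implementation would need to age the counters too.

```python
from collections import Counter


class ReadMostlyTracker:
    """Tracks per-block reads and writes to find frequently-read, rarely-modified blocks."""

    def __init__(self, min_reads=3, max_write_ratio=0.1):
        self.reads = Counter()
        self.writes = Counter()
        self.min_reads = min_reads              # hypothetical threshold
        self.max_write_ratio = max_write_ratio  # hypothetical threshold

    def record_read(self, block):
        self.reads[block] += 1

    def record_write(self, block):
        self.writes[block] += 1

    def candidates(self, limit):
        """Return up to `limit` blocks, most-read first, that are rarely written."""
        picked = []
        for block, reads in self.reads.most_common():
            if reads < self.min_reads:
                break  # everything after this is read even less often
            if self.writes[block] <= reads * self.max_write_ratio:
                picked.append(block)
            if len(picked) == limit:
                break
        return picked
```

      A block that is read 20 times but written 15 times is skipped even though it is "hot", which is the distinction drawn above between frequently-used and frequently-read, rarely-modified data.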

  7. ZFS L2ARC by jdong · · Score: 5, Informative

    Not Linux per se, but the same idea is implemented nicely on ZFS through its L2ARC: http://blogs.sun.com/brendan/entry/test

    1. Re:ZFS L2ARC by Anonymous Coward · · Score: 2, Interesting

      Not Linux per se, but the same idea is implemented nicely on ZFS through its L2ARC: http://blogs.sun.com/brendan/entry/test

      Swapcache on DragonFly BSD 2.6.x was implemented for this very reason IIRC.

      http://leaf.dragonflybsd.org/cgi/web-man?command=swapcache&section=ANY

    2. Re:ZFS L2ARC by bill_mcgonigle · · Score: 1

      Not Linux per se, but the same idea is implemented nicely on ZFS through its L2ARC: http://blogs.sun.com/brendan/entry/test

      And, of course, you can run Linux on it via iSCSI or NFS. I even have a machine that runs Xen, a CentOS 5 Dom0, a Nexenta DomU which gets physical disks for ZFS, and shares them to Fedora DomU's, all on one piece of physical hardware.

      Really the only weak link is CentOS's iSCSI tools, which go out to lunch on occasion.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  8. You don't need it by Anonymous Coward · · Score: 0

    Install the operating system and key applications to the SSD and use your standard harddrive for all the data storage

    1. Re:You don't need it by MobyDisk · · Score: 1

      Laptops can only fit a single drive.

    2. Re:You don't need it by countach · · Score: 1

      A lot of people replace the DVD drive with a 2nd drive. There are kits available.

    3. Re:You don't need it by MobyDisk · · Score: 1

      For some laptops, yes, that hack would work. But that isn't a general purpose solution like the SSD + HDD combo that the submitter proposed.

    4. Re:You don't need it by owlstead · · Score: 1

      Yeah, but this is about a *software* implementation using an SSD as cache for one or more HDD. So there are two drives by definition.

    5. Re:You don't need it by MobyDisk · · Score: 1

      Oops, yeah. I was reading comment threads about SSD + HDD in one and I forgot that the submitter hadn't proposed that.

    6. Re:You don't need it by burisch_research · · Score: 1

      Fail. The OP stated SSD as *CACHE* for a standard magnetic spinny thing.

      --
      char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
    7. Re:You don't need it by Anonymous Coward · · Score: 0

      and usually one really fast sd card.

    8. Re:You don't need it by Anonymous Coward · · Score: 0

      *Some* laptops will only fit a single drive.

  9. Counter-Productive by Bucaro · · Score: 1

    By having the SSD act this way, you will lower the lifespan of the drive by unnecessarily depleting the write-cycles with continued optimization.

    1. Re:Counter-Productive by sourcerror · · Score: 1

      If you want to use it to boost booting, you will mostly read it.

    2. Re:Counter-Productive by JessGras · · Score: 1

      But what is the point of accelerating an access which happens only rarely?

    3. Re:Counter-Productive by Anonymous Coward · · Score: 0

      So what if he wants to blast his drive... The idea is sound. It should be generic though and be able to use any drive.

      So you could have say a cheap usb drive and a slow HD...

      MS did this with their 'readyboost' and USB thumb drives. Doesn't work worth a damn because they reset the cache every time you reboot. So you spend 10 mins waiting for it to fill 4 gig with junk.

    4. Re:Counter-Productive by Unit3 · · Score: 2, Insightful

      Define "unnecessarily". Given current SSD costs and depletion rates, it's probably completely acceptable to replace an SSD used as an intermediary cache in front of a large spindle-based array every couple of years.

      Just because it's not useful to you, doesn't mean it's not useful.

      --
      -- sudo.ca
    5. Re:Counter-Productive by Anonymous Coward · · Score: 0

      To you, but maybe there are people who boot often. Maybe people who wish to save energy and reboot the computer,
      people doing development that brings the system down every now and then, and lots of other people for various reasons.
      https://help.ubuntu.com/community/WifiDocs/Driver/bcm43xx/Feisty_No-Fluff

    6. Re:Counter-Productive by MobyDisk · · Score: 1

      I don't understand what you mean. How does caching data on an SSD lower the lifespan on the magnetic drive? Or did you mean it lowers the lifespan of the SSD? Which it would do, but it shouldn't be any more than any other use of an SSD. SSD lifespan is really not an issue any longer. (Intel claims over 10 years on their drives. Other manufacturers are claiming similar timespans).

      Could you clarify?

    7. Re:Counter-Productive by pwnies · · Score: 3, Informative

      Sadly, nowadays this is a myth. Current MLC and SLC SSD's have (on average) 10,000 and 100,000 writes (respectively) before any bitwear will occur. While this number is small, remember that all modern mainstream SSD's have wear leveling algorithms built into the controller. Intel rates their drives' minimum useful life at 5 years [pdf link - page 10], with an estimated life of 20 years. Note that this number is based on 20GB of writes per day, every day. SSD's nowadays will have no problems with acting as a cache for the system.
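      To see why wear leveling makes those cycle counts less scary than they sound, here is a rough Python estimate. It is an idealized sketch, not Intel's rating method: it assumes perfect wear leveling, an 80GB drive (the capacity is my assumption, it is not given above), and a `write_amplification` factor that I have picked arbitrarily for the second figure.

```python
def ideal_lifetime_years(capacity_gb, cycles_per_cell, writes_gb_per_day,
                         write_amplification=1.0):
    """Years until every cell has used its rated cycles, assuming perfect wear leveling."""
    total_host_writes_gb = capacity_gb * cycles_per_cell / write_amplification
    return total_host_writes_gb / writes_gb_per_day / 365


# MLC figure from above (10,000 cycles) at 20GB of writes per day.
mlc_years = ideal_lifetime_years(80, 10_000, 20)

# Same drive with a hypothetical write amplification of 10.
mlc_years_wa10 = ideal_lifetime_years(80, 10_000, 20, write_amplification=10)

print(f"ideal: {mlc_years:.1f} years, with 10x write amplification: {mlc_years_wa10:.1f} years")
```

      Under these assumptions the ideal case comes out at roughly 110 years and the pessimistic one at about 11, so the exact answer depends heavily on how well the controller behaves, but neither case runs out in months.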

    8. Re:Counter-Productive by TheRaven64 · · Score: 1

      And if you're only using it for cache, who cares if it wears out in a few years? Your data is all safely on the other disk(s) and in a few years the SSD won't be worth much anyway.

      --
      I am TheRaven on Soylent News
    9. Re:Counter-Productive by obarthelemy · · Score: 1

      I guess the alternative is to put your SSD under a nice glass protection, and display it for your friends to fawn over. Actually using it might damage it !

      --
      The Cloud - because you don't care if your apps and data are up in the air.
    10. Re:Counter-Productive by FreakyGreenLeaky · · Score: 1

      Exactly. Another use is plain old swap space. I find it works extremely well on heavily loaded machines where lots of memory is required.

  10. Dear Slashdot : by Anonymous Coward · · Score: 0

    How much was Slashdot paid for this plug for Silverstone?

    Hackers want to know.

    Thanks in advance.

    Nick Haflinger.

    1. Re:Dear Slashdot : by Anonymous Coward · · Score: 0

      Actually $0.0000. If you want to post your own plug, please visit http://slashdot.org/submission

      And thanks for asking.

      Slashdot

  11. I get it by Anonymous Coward · · Score: 0

    He wants the OS to intelligently (and automatically) use an SSD to store the frequently used files from his larger spinning hard disk. It's a great idea and surely Windows will do it soon enough (as much as I hate to say it).

    1. Re:I get it by drsmithy · · Score: 1

      He wants the OS to intelligently (and automatically) use an SSD to store the frequently used files from his larger spinning hard disk. It's a great idea and surely Windows will do it soon enough (as much as I hate to say it).

      It basically does already (ReadyBoost). I can't imagine there would be much work involved in modifying it to use arbitrary disks like SSDs instead of just thumbdrives. Indeed, I wouldn't be at all surprised if there's a simple Registry setting that decides what devices can and can't be used for ReadyBoost.

    2. Re:I get it by Rockoon · · Score: 1

      I do not believe that ReadyBoost caches frequently used files because that's not the problem that it is solving. I believe that you are thinking of SuperFetch (which itself does cooperate with ReadyBoost.)

      ReadyBoost is an attempt to reduce latency by leveraging devices that have very fast seek times, such as USB thumb drives and SD cards.

      These same devices often have *horrible* throughput. Typically only a few megabytes per second, pretty much never anywhere near as fast as even the slowest hard drives. The bad throughput means that they cannot be used to effectively cache frequently used files, but instead only small parts of files. The first few sectors of sequentially read blocks get cached here, but for bulk moving of data into memory the HD must still be relied upon.

      So readyboost drives mostly cache just the first few sectors of the files you use, along with a minority of hot sectors.

      --
      "His name was James Damore."
    3. Re:I get it by drsmithy · · Score: 1

      I do not believe that ReadyBoost caches frequently used files because that's not the problem that it is solving. I believe that you are thinking of SuperFetch (which itself does cooperate with ReadyBoost.)

      Well, it doesn't cache files because that would be silly. It caches blocks. The _result_ is the same as what the OP is after, which is that frequently accessed _data_ is cached on the higher-speed device.

      ReadyBoost is an attempt to reduce latency by leveraging devices that have very fast seek times, such as USB thumb drives and SD cards.

      Yes. Otherwise known as a cache.

      These same devices often have *horrible* throughput.

      But that's not really important, because if throughput is a limiting factor then you're looking at long, sequential reads or writes - which typically aren't cached by most caching algorithms anyway.

      Also, a few MBs a second is pretty pitiful for any modern thumbdrive. A semi-decent drive should be able to hit 15-20MB/sec reads and ~10MB/sec writes.

      The bad throughput means that they cannot be used to effectively cache frequently used files, but instead only small parts of files.

      Yes, that's because that's where the biggest benefits are derived - small accesses where seek time dominates. If throughput is the limiting factor, you're almost always in a long sequential read, where the data is unlikely to be cached even in RAM.

    4. Re:I get it by Rockoon · · Score: 1

      Well, it doesn't cache files because that would be silly. It caches blocks. The _result_ is the same as what the OP is after, which is that frequently accessed _data_ is cached on the higher-speed device.

      NO! You don't seem to get it.

      Only the first few sectors of frequently accessed sequential data benefit from being cached here.

      But that's not really important, because if throughput is a limiting factor then you're looking at long, sequential reads or writes - which typically aren't cached by most caching algorithms anyway.

      Oh yes they are. Don't make things up.

      The typical cache algorithm doesn't know anything about the sequentialness of the operations, the latency of the device, the throughput of the device, nor the overall size of the read request. Its level of operation is below that, and for good reason. The cache memory is superior in every way to what it's caching. That is, specifically, that cache memory has better latency AND better throughput. In that typical cache algorithm that you are misinformed about, the two things it considers in its caching policy are frequency and age. Everything else is meaninglessly static in favor of cache memory.

      It is pretty much only with readyboost that the back end is not uniformly worse, that the flash drive is better than the HD in some ways but not others. It considers size, sequentialness, frequency, latency, throughput, and age.

      Yes, that's because that's where the biggest benefits are derived - small accesses where seek time dominates.

      It does it even on large sequential accesses, but only for up-to the first 512KB (yes, since I posted last I read the damn specifics. The white paper is available.)

      The 512KB limit was derived from the amount of data that can be read from the top performing thumb drives (around 20MB/sec) within the seek time of the slowest HD's (around 24ms).
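      The arithmetic behind that figure can be checked with a short Python sketch. The 20MB/sec and 24ms numbers are the ones quoted above; rounding up to the next power of two is my own guess at how one gets from the raw product to 512KB, not something the white paper is quoted as saying.

```python
def bytes_readable_during_seek(flash_mb_per_s, hdd_seek_ms):
    """Bytes the flash device can deliver while the hard drive is still seeking."""
    return int(flash_mb_per_s * 1_000_000 * hdd_seek_ms / 1000)


def round_up_to_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


raw = bytes_readable_during_seek(20, 24)   # 20MB/sec for 24ms
limit = round_up_to_power_of_two(raw)      # assumption: rounded up to a power of two

print(f"raw: {raw} bytes, rounded: {limit // 1024}KB")
```

      Past that point the hard drive has finished seeking and its better sequential throughput takes over, which is why caching more than the head of a sequential run on the flash device buys nothing.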

      --
      "His name was James Damore."
    5. Re:I get it by drsmithy · · Score: 1

      Oh yes they are. Don't make things up.

      I can guarantee you the typical SAN will identify and (typically) not cache large sequential disk operations. If you want that sort of behaviour you need to tune for it. OS caching is a little different, but by its nature sequential data will typically be evicted quicker.

      It does it even on large sequential accesses, but only for up-to the first 512KB (yes, since I posted last I read the damn specifics. The white paper is available.)

      Can you point me to this whitepaper ? I searched but couldn't find anything with technical details about how ReadyBoost is implemented.

    6. Re:I get it by Rockoon · · Score: 1

      OS caching is a little different, but by its nature sequential data will typically be evicted quicker

      So what you are thinking is that the OS cache intentionally purges high frequency data in favor of low frequency data, for a reason that arbitrarily hurts performance.

      Got it. You have no idea what you are talking about.

      "Even though I am fielding requests for that big block of data all the god damned time.. like every five seconds and shit... I'm going to go ahead and dump most of it as soon as the system asks for that random burst of rarely used data that is only read once per session at most"

      Leave the cache algorithm discussions for the programmers, OK? Running a SAN does not make you informed about cache algorithms, and SANs aren't even fucking typical.

      --
      "His name was James Damore."
    7. Re:I get it by drsmithy · · Score: 1

      So what you are thinking is that the OS cache intentionally purges high frequency data in favor of low frequency data, for a reason that arbitrarily hurts performance.

      No. Nearly the exact opposite, in fact - that the OS will favour smaller, more frequently accessed data over larger, less frequently accessed data.

      Dumping most of your cache contents because someone decided to rip a 4GB DVD to an ISO (to use a fairly obvious example) is nearly always going to be a dumb idea.

      "Even though I am fielding requests for that big block of data all the god damned time.. like every five seconds and shit... I'm going to go ahead and dump most of it as soon as the system asks for that random burst of rarely used data that is only read once per session at most"

      That's not an access pattern even remotely typical of a large sequential read or write. Large sequential reads or writes to a given dataset are - relatively - uncommon events.
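      To make the disputed behaviour concrete, here is a small Python sketch of a scan-resistant LRU: it caches the first few blocks of a sequential run and bypasses the cache for the rest, so one long read cannot flush the hot set. This is just one way such a policy could be written; the `max_cached_run` threshold and the run-detection rule are assumptions, and it does not claim to be what any particular OS, SAN, or ReadyBoost does.

```python
from collections import OrderedDict


class ScanResistantCache:
    """LRU cache that stops inserting blocks once a sequential run gets long."""

    def __init__(self, capacity, max_cached_run=4):
        self.capacity = capacity
        self.max_cached_run = max_cached_run  # hypothetical threshold, in blocks
        self.blocks = OrderedDict()
        self.last_block = None
        self.run_length = 0

    def access(self, block):
        """Return True on a cache hit, False on a miss."""
        # Track how long the current run of consecutive block numbers is.
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 1
        self.last_block = block

        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True

        # Only the head of a sequential run is admitted to the cache.
        if self.run_length <= self.max_cached_run:
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)
        return False
```

      With a capacity of 8 and three hot blocks already cached, reading 1000 consecutive blocks (the DVD rip example) adds only the first four of them, and the hot blocks still hit afterwards.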

  12. Hybrid drives were the way to go by engrstephens · · Score: 1

    I think SSD will always be behind Magnetic. I've dreamed of building an SSD controller that had a magnetic backend. Pop in SD cards to expand the cache. OS doesn't notice that your drive is really 2 drives. I'm going to get flamed but I do think M$ had it right when they backed hybrid drives!

    1. Re:Hybrid drives were the way to go by MobyDisk · · Score: 1

      I agree, and I've been thinking along these same lines. Existing caching algorithms are not sufficient for this purpose. In-memory caches attempt to write-back the data as soon as possible since the memory is volatile. You would want an algorithm specifically made for non-volatile caching.

      I imagine a 500GB hard drive with 32GB of SSD. The caching algorithm would be smart enough to keep 2 cache areas: 16GB reserved for long-term read-only things like OS files. No matter what disk thrashing goes on, don't purge these. This keeps boot times fast. The second 16GB would be a normal write-back cache. Browser cache, temp directory, pagefile, etc. So if I edit a video, compile some code, or play a game - I get SSD performance. The drive might decide to write that back to the magnetic disk immediately, or in 5 minutes, or tomorrow afternoon. Yes - I could even power-down the computer and it could finish the write-back another day. There is no need to write-back unless the drive is idle, or if it is out of cache space.

      The good thing is that this could be done in software. I had hoped that Windows ReadyBoost would do this. But from what I've read, even though it is improved in Windows 7 it still is only useful if you have low memory. I am not sure if that is because USB is slow, or because ReadyBoost still operates like an in-memory cache (since the USB flash drive could be removed). I would love to have a check-box that says "I promise I won't remove this drive - treat it as non-volatile" to get it to do that. I bet someone could write a driver to do it.
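      The two-area split described above is simple enough to sketch in Python. This is a toy model under stated assumptions: the disk is a plain dict, the pinned set is chosen by hand rather than learned, and `idle()` stands in for whatever signal a real driver would use to decide the drive is idle.

```python
from collections import OrderedDict


class HybridCache:
    """Pinned read area that is never purged, plus a lazy write-back area."""

    def __init__(self, pinned_blocks, writeback_capacity, disk):
        self.disk = disk  # hypothetical magnetic disk: block number -> data
        self.pinned = {b: disk[b] for b in pinned_blocks}  # e.g. OS files
        self.dirty = OrderedDict()  # written blocks not yet on the disk
        self.capacity = writeback_capacity

    def read(self, block):
        if block in self.dirty:
            return self.dirty[block]
        if block in self.pinned:
            return self.pinned[block]
        return self.disk[block]

    def write(self, block, data):
        self.dirty[block] = data
        self.dirty.move_to_end(block)
        # Out of cache space: the only time a write-back is forced.
        while len(self.dirty) > self.capacity:
            self._flush_oldest()

    def _flush_oldest(self):
        block, data = self.dirty.popitem(last=False)
        self.disk[block] = data
        if block in self.pinned:
            self.pinned[block] = data  # keep the pinned copy current

    def idle(self):
        """Drive is idle: drain the write-back area to the magnetic disk."""
        while self.dirty:
            self._flush_oldest()
```

      Because the dirty area lives on non-volatile flash in this scheme, nothing forces `idle()` to run before power-off; the write-back can finish after the next boot.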

  13. OSDI by jameson · · Score: 1

    The OSDI deadline is in August; plenty of time to implement this, write it up, and get a publication at a top research conference out of it!

  14. bcache by Wesley+Felter · · Score: 5, Informative

    http://lkml.org/lkml/2010/4/5/41

    I'm a little surprised at the lack of response on linux-kernel.

    Solaris and DragonFly have already implemented this feature; I'm surprised that Linux is so far behind.

    1. Re:bcache by Kento · · Score: 5, Informative

      Hey, at least someone noticed :)

      That version was pretty raw. The current one is a lot farther along than that, but it's still got a ways to go - I'm hoping to have it ready for inclusion in a few months, if I can keep working on it full time. Anyone want to fund me? :D

    2. Re:bcache by pydev · · Score: 1

      You shouldn't be surprised that "Linux is so far behind"; we like it that way. If we thought that what the Solaris or DragonFly engineers are doing was important, we'd be using their systems instead.

    3. Re:bcache by CyprusBlue113 · · Score: 1

      If it were a block device wrapper along the lines of md, I'd be interested.

      Have a project at the moment where I'd *love* to be able to specify tiers of storage (say md volumes), and have writes go to the highest priority, and blocks trickle down to the lowest based on usage.

      Sort of like a specialized CoW.

      --
      a handful of selfish greedy people are no match for millions of selfish, greedy people -u4ya
    4. Re:bcache by rayvd · · Score: 1

      Any plans to add in write cache support? I'm thinking along the lines of putting ZFS's ZIL on SSD's. Really makes NFS in sync mode much quicker.

    5. Re:bcache by Kento · · Score: 1

      This should do what you want - with the caveat that I haven't thought about multiple tiers, but I can't imagine that being that hard to add.

      It's currently written so there's a systemwide pool of block devices that are used for cache, and all cached data is spread around them, regardless of where it came from. It wouldn't take much work at all to change that though, if there was something that'd benefit.

    6. Re:bcache by TheRaven64 · · Score: 1

      I'm surprised that Linux is so far behind

      Obviously you are either unfamiliar with Linux, or unfamiliar with all non-Linux operating systems except perhaps Windows and maybe Darwin.

      --
      I am TheRaven on Soylent News
    7. Re:bcache by Kento · · Score: 1

      Yep, that's on the list.

    8. Re:bcache by CyprusBlue113 · · Score: 1

      I'm not looking for a limited use pool of cache only disks so much as hierarchical storage management at the block level.

      For example you could have a set of 7.2k sata drives, a set of 15k SAS drives, a set of FC drives, and a set of SSDs. Each group would be a tier, which all together would act as a storage pool (each tier probably raided together using MD, as there is no hybrid md/lvm at the moment to subdivide them gracefully at the lv level instead of pv level), with writes all going to the highest speed tier allowed for the volume, and blocks being moved around constantly based on access stats (demoted / promoted to different tiers).
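      A rough Python sketch of that promote/demote idea, to show what I have in mind. It only models placement, not data movement; the class and method names are invented, tier capacities are counted in blocks and assumed to be at least 1, and ranking by a raw access count is the simplest possible stand-in for real access stats.

```python
class TieredPool:
    """Block placement across tiers; tier 0 is the fastest."""

    def __init__(self, tier_capacities):
        self.capacities = list(tier_capacities)
        self.hits = {}      # block -> access count
        self.tier_of = {}   # block -> tier index

    def write(self, block):
        if block not in self.hits and len(self.hits) >= sum(self.capacities):
            raise RuntimeError("pool is full")
        self.hits[block] = self.hits.get(block, 0) + 1
        self._place(block, 0)  # writes always land on the fastest tier

    def read(self, block):
        self.hits[block] += 1

    def _place(self, block, tier):
        """Put block on tier, demoting that tier's coldest block if it overflows."""
        self.tier_of[block] = tier
        members = [b for b, t in self.tier_of.items() if t == tier]
        if len(members) > self.capacities[tier]:
            coldest = min((b for b in members if b != block), key=self.hits.get)
            self._place(coldest, tier + 1)

    def rebalance(self):
        """Periodic pass: hottest blocks to the fastest tier, coldest to the slowest."""
        ranked = sorted(self.hits, key=self.hits.get, reverse=True)
        start = 0
        for tier, capacity in enumerate(self.capacities):
            for block in ranked[start:start + capacity]:
                self.tier_of[block] = tier
            start += capacity
```

      For example, with `TieredPool([1, 2, 4])` the fourth new write pushes the oldest cold block down to the slowest tier, and a later `rebalance()` promotes it again if it has since become the most-read block.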

      --
      a handful of selfish greedy people are no match for millions of selfish, greedy people -u4ya
    9. Re:bcache by GuruBuckaroo · · Score: 1

      Summer of Code, dude. This sounds like something Google would get behind.

      --
      Poor means hoping the toothache goes away.
    10. Re:bcache by Jah-Wren+Ryel · · Score: 1

      Isn't summer of code for kids?
      With a UID like that, I don't think he's a kid.

      --
      When information is power, privacy is freedom.
    11. Re:bcache by jdb2 · · Score: 1

      Here Here!

      I've been looking for something like this for Linux for ages. Unfortunately I don't know much about kernel level programming but I'd certainly donate to your project and I bet *many* others would as well. I have my $50 waiting -- time for you to set up the necessary PayPal account. :)

      Cheers,

      jdb2

    12. Re:bcache by jdb2 · · Score: 1

      Here Here!

      Change that to "hear hear." Damn fingers.

      jdb2

    13. Re:bcache by SQL+Error · · Score: 1

      Thanks, now I feel old. No, wait, that's just the arthritis acting up. Never mind!

    14. Re:bcache by Kento · · Score: 1

      Heck, that would be friggin' amazing if people really wanted to do that :) I've been working full time and then some for near a month and a half now, and I'm certainly not getting paid for it yet... Email address in the profile is my paypal account :)

    15. Re:bcache by Anonymous Coward · · Score: 0

      As long as you enjoy your lack of a cohesive operating system bundle...

    16. Re:bcache by Anonymous Coward · · Score: 0

      As a long suffering user of Solaris, I enjoy my lack of "cohesive operating system bundle" immensely. I just hope that Sun engineers will stay contained at Oracle, instead of spreading across the industry again.

    17. Re:bcache by dpilot · · Score: 1

      I read your post, too. So someone else noticed. By reading the post, it was clear that your patch wasn't nearly ready for someone like me, who only occasionally goes off of distro kernels to vanilla. But it's Gentoo, so at least I compile my own kernels, distro or vanilla.

      As a general thought on the whole subject, I recently started using FS-Cache for my nfsv4 at home, which I've been using for several years. Reading through the documentation, it appears that FS-Cache is a generic middle-layer which can have any number of front-end filesystems and can accommodate multiple types of back-ends. He describes nfs and afs as network filesystems in front, but also talks of using iso9660 as an example of a "slow" front-end filesystem that could use caching, but is not networked. On the back-end he describes 2 cache systems, one partition-based and one filesystem-based.

      It seems to me that another way this thing could be viewed is with your cache as a high-performance back-end to FS-Cache, which would make it advantageous for traditional disk (not network) based filesystems to hook into FS-Cache for the performance boost. I've no idea how far this deviates from what you've already done, or how hard it would be to patch existing filesystems to hook into FS-Cache. It just seems like perhaps a more generic implementation. It also suggests further work for FS-Cache, in that it would be nice to have multiple back-ends simultaneously active, perhaps in an L4/L5 cache type of way, where L5 would only back network filesystems while L4 would back most-active files, local or network.

      Nice of me to suggest - I know, I need to show some code.

      --
      The living have better things to do than to continue hating the dead.
  15. Waste of time by onefriedrice · · Score: 5, Informative

    What a waste of time. Just put /home on a magnetic disk and everything else on the SSD. This way, you can get away with a small (very affordable) SSD for your binaries, libraries, config files, and app data, and use tried and true magnetic for your important files. Your own personal files don't need to be on a super fast disk anyway because they don't get as much access as you would think, but your binaries and config files get accessed a lot (unless you have a lot of RAM to cache that, which I also recommend). I've been doing this for over a year and enjoying 10 second boots, and instant program access coldstarts (including openoffice and firefox).

    I personally fit all my partitions except /home in only 12.7GB (the SSD is 30GB). Seriously, best upgrade ever. I will never put my root partition on a magnetic drive ever again.

    --
    This author takes full ownership and responsibility for the unpopular opinions outlined above.
    1. Re:Waste of time by MobyDisk · · Score: 1

      Just put /home on a magnetic disk and everything else on the SSD

      Try jamming two hard drives into a laptop. :-(

    2. Re:Waste of time by Logic · · Score: 1

      My primary Linux laptop is an Inspiron 1721, with two mirrored drives.

      --
      -Ed Felix qui potuit rerum cognoscere causas.
    3. Re:Waste of time by MobyDisk · · Score: 1

      Can I have it? :-)

      I think my next laptop will have space for 2 drives. I might even be willing to have my optical drive be external. As a developer, I need the disk space and a single SSD just can't cut it unless I want to spend close to $1000 on the drive.

    4. Re:Waste of time by Anonymous Coward · · Score: 0

      take out the CD/DVD drive and you can fit a hard drive in there instead.

    5. Re:Waste of time by Anonymous Coward · · Score: 0

      A nice idea (as far as caching). I've often thought that using two drives for my system is a pretty big waste of one of the drives. I used two drives because when installing a new version of Linux, I didn't want to hose all of my user preferences, email, configuration, etc. So what I would do is keep /home on a second drive, then when I install a new version of Linux, I blow away the drive with the root partition, install a new version. After installation, I toast the /home partition on the / partition, then ln -s /seconddrive/home /home and one link brings all data back (really it never moved), yet it's there. You never have to move email, background pictures, fonts or other user configuration again. Extra binaries and system data files are also kept on the second drive, while system files (libraries, updates, graphics, etc) are all kept on the root partition. For a 'fully loaded' Linux setup, 20GB is lots (since extra binaries and data files are kept on drive 2). Go ahead and apt-get to your heart's content, and you will have a hard time filling 20GB. Very slick. You always get a new version whenever you upgrade your system (which on Ubuntu for me is every 6 months) all new software, and yet all of your stock configuration is fast and easy (about 4 key presses for everything), and you never lose any email, even decades later.

    6. Re:Waste of time by vikingpower · · Score: 1

      Just spent $ 4100 on an Alienware M17x with dual SSD. Awesome perfs. Never saw anything like it.

      --
      Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
    7. Re:Waste of time by MobyDisk · · Score: 1

      I know I could do that, but there are downsides to it and that isn't a general purpose solution. The submitter's proposed SSD + HDD in one would be much nicer. :-)

    8. Re:Waste of time by MobyDisk · · Score: 1

      How much does it weigh? I almost bought an Alienware a few years ago, but they didn't list weight in the specs.

    9. Re:Waste of time by owlstead · · Score: 1

      Actually, my SL300 Thinkpad does have a second SATA connector - it's used by the rather useless DVD writer. I've looked everywhere but I cannot find anybody that sells a simple slim-DVD drive bracket. I cannot even find a cable that is suited. There is a site that builds their own cable, but the cable that they build it out of is hard to get and I'm not that great at soldering electronics either.

      The problem is of course that there are a few different connectors out there (3 to 4 is my current estimation). Furthermore, people would whine about the looks of the drive bay (not in the right color or form). And for high end laptops vendors already sell so called multi-bays that fit hard disk drives. Still, since a bracket would basically consist of a well ventilated bracket with two cables, I cannot fathom why there is nobody in the entire world manufacturing these - especially if you have a look at the oodles of components and cables available that fill other niches.

      If you really want two disks you'll have to opt for a desktop replacement with 2 bays or a high end with multi-bay (and pay about 50 dollars for the plastic bracket). Or go for one of many external options of course.

    10. Re:Waste of time by sylvandb · · Score: 1

      I've looked everywhere but I cannot find anybody that sells a simple slim-DVD drive bracket.

      http://www.idotpc.com/thestore/pc/viewPrd.asp?idcategory=74&idproduct=780

      And there are many more turned up by google. Also look on the Mac sites (e.g. for putting two hard disks into a Mac Mini).

      sdb

    11. Re:Waste of time by MikeFM · · Score: 1

      Not much good for servers. Although I opt for just putting craploads of RAM into the system for the most part. Unfortunately HDDboost isn't appropriate for servers as it doesn't cache the most read data but just the start of the disk and requires a reboot to refresh its data. Supposedly they are working on a server-friendly version.

      I have a custom FS that caches better to RAM for my use - especially writes. I'd rather see a hardware-based battery-backed RAM hdd cache though. Would be sweet if you could plug in massive amounts of RAM as a cache to these multi-terabyte drives. Caching writes safely would be extremely awesome for databases.

      I have 128GB Corsair Nova SSDs in my laptops right now. Would like bigger but 128GB covers what I need available all the time. The rest is all on my NAS.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    12. Re:Waste of time by Anonymous Coward · · Score: 0

      Your solution is too rigid and something akin to hard partitioning.

      When you have 5+TB of arbitrary data and want to improve performance (principle of locality) then it is an excellent idea to put another tier in the hierarchy when the technology exists.

    13. Re:Waste of time by owlstead · · Score: 1

      That link provides no solution for a DVD drive as in my SL300. That's just a physical place holder for a slimline desktop case. I need to have a connector as well. One that is not using a 30 cm cable, if anywhere possible.

      The Mac sites clearly describe soldering cables. And even the Mac Mini is easier for storing cables than in a standard DVD drive bay.

      So, care to back up your claim instead of wasting my time with another fruitless search?

    14. Re:Waste of time by sylvandb · · Score: 1

      Here, I'll hold your hand a little bit longer, but I'll leave you to pick which of the 10-20 variants they list will work best in your machine.

      http://lmgtfy.com/?q=notebook+hard+drive+optical+bay&l=1

    15. Re:Waste of time by jeff_at_betaredex · · Score: 1

      but /home is where I do all my compiling from! Just goes to show you, one size does not fit all.

  16. Configurability by Anonymous Coward · · Score: 0

    Set the root and swap partitions on the SSD, set up the kernel so the hibernation image is stored on the SSD, and set the /home partition on the HDD.

    Not as elegant as what you're asking for, and your solution could easily be implemented, but the solution outlined above should get most people the same results. Any directories you don't want on the SSD you can just assign to the HDD.

    Ahh, configurability - don't you love Linux? :D

  17. Go for Hardware implemented Caching by iammani · · Score: 1, Informative

    Hardware implemented caching is the way to go. There are 'hybrid drives' available now, which automatically cache disk access to SSD. These are very specific for the task and way more efficient than any software implementation.

    1. Re:Go for Hardware implemented Caching by Wesley+Felter · · Score: 1

      I see that you haven't actually used hybrid hard drives, because they're nothing like what you describe. AFAIK only Samsung ever made them, and they've now been discontinued. The HHD itself didn't perform caching; it relied on Vista to manage the flash, which didn't really work out when no one bought Vista. HHDs also included laughably small (256MB) and slow flash that would get totally owned by the smallest slowest SSD today.

    2. Re:Go for Hardware implemented Caching by iammani · · Score: 1

      Mmmm, interesting. I guess my understanding about how these 'hybrid drives' work is wrong. That makes me wonder why a completely hardware-implemented SSD cache isn't available. Is there a technical reason why this is not possible? Wouldn't this be faster than a software-implemented one?

    3. Re:Go for Hardware implemented Caching by Wesley+Felter · · Score: 1

      Adaptec MaxIQ is a hardware SSD cache; just don't ask what it costs. To do SSD caching you need some DRAM to hold the metadata and a CPU to manage the cache; would you prefer to buy additional CPU/RAM or just use what's already in your computer?

    4. Re:Go for Hardware implemented Caching by iammani · · Score: 1

      Modern harddisks already have DRAM for buffering (also referred to as disk cache) and a dedicated "chip"/embedded processor exclusively for cache management should be cheap.

    5. Re:Go for Hardware implemented Caching by owlstead · · Score: 1

      Somebody forgot that you cannot create speed by just using a slow flash chip or two. You need speedy chips, and a lot of them, to create a fast SSD. Besides a good controller and software for that controller of course. Besides that, 256 MB is so low that I wonder if RAM would not already perform most of the caching, even for writes.

  18. Working on this by pmjordan · · Score: 1

    I've actually been working on this off-and-on for a while, I'm hoping we can release some beta code soon. Currently developing it on Linux, but planning to release OSX and Windows versions, too. We're caching reads and writes, and only the blocks that are most frequently used, plus various other SSD-relevant optimisations. The block allocation logic is pretty complex (and I'm too busy with work), which is why it's been taking so long.
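
    The "only the blocks that are most frequently used" idea can be illustrated with a toy frequency-based cache in Python. This is a minimal sketch and not the poster's code: the class name HotBlockCache, the read_backing callback, and the admission rule (admit a block only once it has been read more often than the coldest cached block) are all assumptions made for illustration.

```python
from collections import Counter


class HotBlockCache:
    """Toy frequency-based block cache (hypothetical, for illustration only)."""

    def __init__(self, capacity, read_backing):
        self.capacity = capacity          # number of blocks the "SSD" can hold
        self.read_backing = read_backing  # function: block number -> data (the "HDD")
        self.hits = Counter()             # access count per block number
        self.cached = {}                  # block number -> data held on the "SSD"

    def read(self, block):
        self.hits[block] += 1
        if block in self.cached:
            return self.cached[block]     # served from the fast device
        data = self.read_backing(block)   # slow path: go to the magnetic disk
        self._maybe_admit(block, data)
        return data

    def _maybe_admit(self, block, data):
        if len(self.cached) < self.capacity:
            self.cached[block] = data
            return
        # Evict the coldest cached block only if the new block is hotter.
        coldest = min(self.cached, key=lambda b: self.hits[b])
        if self.hits[block] > self.hits[coldest]:
            del self.cached[coldest]
            self.cached[block] = data
```

    A block that is read once does not push out a block that is read often, which is the property that separates this from simply caching the first part of the disk. A real implementation would also have to handle writes, persistence across reboots, and ageing of the counters.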

  19. But why? by Eudial · · Score: 1

    The oldest and simplest solution is to mount partitions from a small fast disk where you want fast read/write speeds, and partitions from slower disks everywhere else. Works quite well, too.

    --
    GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
  20. Do it yourself. by Anonymous Coward · · Score: 0

    This wouldn't be hard to do with a little smart configuring.
    When you install Linux, make sure that /home, /boot and /bin end up on the SSD (probably / would be fine), and mount the spinning HDD somewhere else. I call mine /data.
    When you rip your DVDs, or whatever, put them on /data.
    When you 'sudo apt-get' a new application, it goes onto the SSD, without you having to do anything special.

    No cache implementation rewrite necessary.

    1. Re:Do it yourself. by SkunkPussy · · Score: 1

      and /usr

      --
      SURELY NOT!!!!!
  21. Sure it's caching. And it's not a "kludge" at all. by Anonymous Coward · · Score: 0

    Sure it's caching. You're storing frequently-used data on a faster medium.

    It's actually better than caching, because in this case you don't have to write back the data to the spinning-disk drive when there are changes, since the SSD drive itself is non-volatile. An added benefit of that is that there's no need to keep the SSD-stored directories on the spinning media, freeing up more space there.

    Call it a "shortsighted kludge" all you want, but this general technique has been used very effectively for decades by people and organizations working with absolutely huge data sets. It's a proven technique that dates back to mainframes and minicomputers.

    Hell, that's why this is so easy to do with UNIX-like systems. Disk drives then had a smaller capacity but faster access times, and were used to store frequently-accessed data (like system files and applications). User data was stored on tape, since it was comparatively plentiful, even if somewhat slower. The very earliest UNIX implementations split their filesystem hierarchy over multiple storage devices with differing capabilities.

  22. People forgot the low-level Linux stuff quickly. by guruevi · · Score: 2, Informative

    First of all, you can do this with ZFS, which is newer tech and works quite well but is not (ever going to be) implemented in the Linux kernel.

    For lower tech, you can do it the same way we used to do back when hard drives were small. In order to prevent people from filling up the whole hard drive we used to have partitions (now we just pop in more/larger drives in the array). /boot and /var would be in the first parts of the hard drive where the drive was fastest. /home could even be on another drive.

    You could do the same: put /boot and /usr on your SSD (or whatever you want to be fastest). If you have an X25-E or another fast-writing SSD, you could put /var on there (for log, tmp, etc. if you have a server), or if you are short of RAM, make it a swap drive. If you have small home folders, you could even put /home on there and leave your mp3s in /opt or so.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  23. Some ideas, from "The world of Windows" though by Anonymous Coward · · Score: 0

    I stash these items on my SSD (4gb GIGABYTE IRAM SSD #1 - this can actually BOOT AN OS though, so others know):

    ====

    1.) WebBrowser caches

    2.) %Temp% + %Tmp% ops

    3.) Event Logs

    4.) Print Spooler

    5.) cmd.exe %Comspec%

    ----

    And, on my 3gb CENATEK RocketDrive:

    1.) Pagefile.sys (for nearly the ENTIRE SSD in size of it)

    ====

    Seems to all work out well, for better performance... how so?

    Well - not just because those items benefit by mostly being smallish files inside folders, which tend to "increase speed" (less latency in seeks mostly & NO head movements either as in mechanical disks) but, also because I am "offloading" my main C: drive (bootdrive in Windows for those "not in the know" on Windows, & yes, there are those folks out there @ times, albeit rarely), making it do LESS WORK also!

    APK

    P.S.=> Your ideas aren't 1/2 bad either for Linux though... good job! apk

  24. dm-cache by Gyver_lb · · Score: 3, Informative

    google dm-cache. Not updated since 2.6.29 though.

  25. I was just wondering the same thing by RelliK · · Score: 1

    Windows 7 (and I think XP) has ReadyBoost. I haven't been able to find anything similar for Linux. It is also not clear how much difference ReadyBoost makes. The only benchmark I was able to find uses a crappy USB flash drive. I was wondering how much difference something like the 80GB x-25m would make. There is clearly potential for huge gains as MaxIQ benchmarks show.

    This would be an awesome speedup if it was supported: just add a 40-80GB SSD for swap & file cache, and gain a massive performance boost over a standard cheap 7200RPM drive. Given that the price per GB for SSD is likely to stay very high for the foreseeable future, this seems like the best way to go. If only there was OS support for it...

    The alternative (that I'm waiting for) is for SSD prices to drop.

    --
    ___
    If you think big enough, you'll never have to do it.
    1. Re:I was just wondering the same thing by MobyDisk · · Score: 1

      It is also not clear how much difference ReadyBoost makes

      Keep searching. There are lots of other benchmarks, and they all say the same thing. It helps if you have an old machine with insufficient memory.

      ReadyBoost doesn't really do what the author wants. Windows treats ReadyBoost as a write-through cache like it treats memory. It assumes you might unplug the drive at any moment. It won't speed up boot time, and it won't speed up writes. It won't place the swap file on there either. I'm not sure if you could tell Windows to use a regular SATA drive for ReadyBoost - that might be an interesting experiment, but it still won't use it effectively.

    2. Re:I was just wondering the same thing by GuruBuckaroo · · Score: 1

      Not quite correct. The ReadyBoost cache is pre-filled with usable data by the SuperFetch caching system, so it's not just write-through. It will also (theoretically) "learn" your loading patterns, and if you, say, start up the same application at 9am every day, it will start putting that application into the ReadyBoost cache just before. Also, the article linked in the GP specified a max cache of 4GB - that was true in Vista, but 7 and 2008 can use larger flash drives provided they're formatted NTFS or exFAT. I have an 8GB ReadyBoost cache on my workstation at the office, and it makes a noticeable difference - so much so that I immediately noticed it when some cleaning jerk swiped my USB key and ReadyBoost wasn't functioning anymore. And this is with a system with 4GB of memory (granted, 32-bit Win7, so not all of it was being used). Your mileage, as always, may vary.

      --
      Poor means hoping the toothache goes away.
    3. Re:I was just wondering the same thing by robot256 · · Score: 1

      What you really want to be talking about is a different Windows 7 feature called ReadyDrive, which actually does what the author is talking about. Basically, the system heuristically determines what files are used most often/during boot and the BIOS read- and write-caches them to the flash in the ReadyDrive. I bought a Thinkpad with a "4GB Intel TurboBoost Memory" chip and it made a noticeable difference in boot time when I enabled it as a ReadyDrive.

      I also found that when Windows (rarely) crashed, it would take a lot longer to start up, possibly because it had to rebuild the flash cache.

      I have since disabled the turbo memory after getting a real SSD--and still faster boot times. A 60GB drive holds Win7, programs and critical data, and I swapped out the optical drive for the old mechanical disk, where it stays 90% of the time and holds my video and crap, reducing write cycles on the SSD. Sometimes I transfer videos to the SSD when I'm watching them in the car so the motion doesn't interfere with playback--it's a very convenient combination.

    4. Re:I was just wondering the same thing by MobyDisk · · Score: 1

      You are right about Superfetch - so it will help boot times. That's half way there. As far as I can tell though, it is still write-through. Meaning, if I compile something and it writes 500MB to disk, Windows will immediately try to commit that to the HDD. It won't delay that for 5 minutes until the HDD is idle.
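
      The two write policies being argued about can be sketched in Python. The usual terminology is assumed here: a write-through cache commits every write to the backing disk immediately, while a write-back cache holds the write and commits it later. The class names and the dicts standing in for the flash device and the HDD are illustrative, and nothing below reflects how ReadyBoost is actually implemented.

```python
class WriteThroughCache:
    """Every write goes to the cache and to the backing disk immediately."""

    def __init__(self, disk):
        self.disk = disk    # dict standing in for the HDD
        self.cache = {}     # dict standing in for the flash device

    def write(self, block, data):
        self.cache[block] = data
        self.disk[block] = data   # committed straight away


class WriteBackCache:
    """Writes land in the cache only; the disk is updated on flush()."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self.dirty = set()  # blocks that are newer in the cache than on disk

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)

    def flush(self):
        # Would be called later, for example when the HDD is idle.
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()
```

      With the write-through class, pulling the flash device at any moment loses nothing, which matches the "you might unplug the drive" constraint mentioned earlier in the thread. With the write-back class, anything still in dirty would be lost, which is why it only makes sense on a device that stays attached.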

  26. Thread summary by Goaway · · Score: 2, Funny

    "If Linux doesn't already do it, you don't need it anyway!"

    1. Re:Thread summary by EvanED · · Score: 2, Insightful

      No kidding. It's threads like this (where I think the question is entirely reasonable and a good thing to support) that really sour my opinion of Linux. There are a few other things -- better file-system-supported metadata, transactional filesystems, etc. -- that have come up in the past too where it seems I just flat out disagree with most hardcore Linux users.

      (Don't worry, I hate Windows too, but for mostly different reasons. I don't use OS X very often and don't have an opinion on it, but I'd probably hate it too.)

  27. Re:Sure it's caching. And it's not a "kludge" at a by TheRaven64 · · Score: 1

    Sure it's caching. You're storing frequently-used data on a faster medium.

    That's not what caching means. It comes from the French word for 'hidden' and the fact that it is not directly addressable is the important part of the definition. A cache is not just a faster medium, it's a faster medium that is hidden from the user / programmer and is used to accelerate access to the slower medium.

    --
    I am TheRaven on Soylent News
  28. Good idea, lousy implementation. by SanityInAnarchy · · Score: 1

    Kind of like the current idea of pushing the wear-leveling back to the drives. This is something the OS can do, and it's a case where flexibility matters -- it's not something I want in a black box inside a drive controller.

    --
    Don't thank God, thank a doctor!
  29. Re:Sure it's caching. And it's not a "kludge" at a by EvanED · · Score: 1

    Sure it's caching. You're storing frequently-used data on a faster medium.

    You're also storing a lot of infrequently-used data on a faster medium, and a lot of frequently-used data on the slower medium. How is that better than using the faster medium only for frequently used data?

    As it happens, for many of my workloads anyway, I'd expect to see bigger benefits by storing some of my data directories on SSDs than storing /usr. Which directories? Well, they're sort of scattered all around. Why should I have to go through and figure out what things I use all the time? How can I even determine that (I don't know what files programs are opening behind my back). Figuring out that sort of thing is exactly what computers are good at.

    It's actually better than caching, because in this case you don't have to write back the data to the spinning-disk drive when there are changes...

    That's not a problem. Writes are already delayed because they can be buffered; if you cache in a SSD, they can be delayed even further before you write them back to the magnetic drive.

    An added benefit of that is that there's no need to keep the SSD-stored directories on the spinning media, freeing up more space there.

    Whoopde do. Even if you got a 100 GB SSD, duplicating that space on, say, a 1 TB hard drive would cost under $10. Considering that the SSD would be in the vicinity of $400, that extra $10 in lost space isn't exactly something to cry about.

  30. Puppy Linux by technosaurus · · Score: 1

    Puppy Linux implements periodic syncing to its save file (ext2, 3, & 4) and the times can be adjusted. This was initially implemented when flash drives were less reliable, but is still useful if you want to reuse old equipment or if you need to do a lot of read/write intensive operations.

  31. cashefs by zash.se · · Score: 1

    I would like to see something like this, except as a file system layer similar to unionfs that does copy-on-read from some other place (network, slow USB HDD, etc.), and purges or keeps files (based on popularity) when the place it caches to gets full.

  32. Wrong. Swap often acts as a cache. by Anonymous Coward · · Score: 1, Informative

    I've seen several other blatantly wrong comments from you for this story, and you clearly don't understand what caches are, and how they work. It's quite easy for swap to be a cache.

    An example of this in nearly every personal computer is information read from spinning plastic disc media, like CD-ROMs or data DVDs. Typically, the OS will read data from the plastic disc, and store it in memory. If the memory usage becomes tight, that data from the spinning plastic disc will be swapped out to a magnetic disk drive.

    This is caching, because you're storing the data from a comparatively slower medium (the CD-ROM or DVD disc) on a comparatively faster medium (the magnetic hard drive). If the data is needed, it's retrieved from the faster magnetic disk drive, rather than the slower spinning plastic disc drive. Thus we have caching.

    1. Re:Wrong. Swap often acts as a cache. by drsmithy · · Score: 1

      An example of this in nearly every personal computer is information read from spinning plastic disc media, like CD-ROMs or data DVDs. Typically, the OS will read data from the plastic disc, and store it in memory. If the memory usage becomes tight, that data from the spinning plastic disc will be swapped out to a magnetic disk drive.

      No, it won't, it will discard it to shrink the overall size of the cache and free up RAM. I'm not aware of any system that behaves in the way you describe.

    2. Re:Wrong. Swap often acts as a cache. by Anonymous Coward · · Score: 0

      What? Linux, Solaris, FreeBSD, OpenBSD, NetBSD, and DragonFlyBSD all swap out buffered data from disc drives to cache before discarding it outright. If you don't believe me, go peruse their source code.

    3. Re:Wrong. Swap often acts as a cache. by m.dillon · · Score: 4, Informative

      The way DragonFly's swapcache works is that VM pages (cached in ram) go from the active queue to the inactive queue to the cache (almost free) queue to the free queue. VM pages sitting in the inactive queue are subject to being written out to the swapcache. VM pages in the active queue (or cache or free queues) are not considered.

      In other words, simply accessing cacheable data or meta-data from the hard drive does not itself trigger writing to the SSD swapcache. It's only when the cached VM pages are pushed out of the active queue due to memory pressure and are clearly heading out the door that DragonFly decides to write them to the SSD.

      This prevents SSD write activity from interfering with the operation of the production system and also tends to do a good job selecting what data to write to the SSD when and what data not to. A file which is in constant use by the system just stays in ram, there's no point writing it out to the SSD.

      With respect to deciding what data to cache and what data not to, with meta-data it's simple. You cache as much meta-data as you can because every piece of meta-data gives you a multiplicative performance improvement. With file data it is harder since you don't want to try to cycle e.g. a terabyte of data through a 40G swapcache. The production system's working data set at any given moment needs to either fit in the swapcache or you need to carefully select which directory topologies you want to cache.

      -Matt
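
      The selection rule described above (only pages that have been pushed from the active queue to the inactive queue are candidates for the SSD) can be modelled with a toy simulation in Python. This is a sketch of the idea only, not DragonFly code: the class and method names are invented, the cache and free queues are left out, and "memory pressure" is reduced to a fixed limit on the active queue.

```python
from collections import deque


class ToyPageQueues:
    """Toy model of the page life cycle described above (not DragonFly code)."""

    def __init__(self, active_limit):
        self.active_limit = active_limit
        self.active = deque()    # most recently used page at the right
        self.inactive = deque()  # pages pushed out of active by memory pressure
        self.swapcache = set()   # pages that have been written to the "SSD"

    def touch(self, page):
        """Access a page; it becomes (or stays) active."""
        if page in self.active:
            self.active.remove(page)
        elif page in self.inactive:
            self.inactive.remove(page)
        self.active.append(page)
        # Memory pressure: the least recently used active pages are demoted.
        while len(self.active) > self.active_limit:
            self.inactive.append(self.active.popleft())

    def pageout_pass(self):
        """Only inactive pages are considered for writing to the swapcache."""
        for page in self.inactive:
            self.swapcache.add(page)
```

      A page that is touched constantly never leaves the active queue in this model, so it is never written to the SSD, which mirrors the point that a file in constant use just stays in RAM.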

    4. Re:Wrong. Swap often acts as a cache. by Anonymous Coward · · Score: 0

      I know some dipshits here are going to disagree with Matt, and tell him he's wrong. Let me save you some grief by informing you that Matt is the creator of DragonFly. He knows it inside and out, upwards and downwards, backwards and forwards. You basically can't question him; what he says is how it is.

    5. Re:Wrong. Swap often acts as a cache. by drsmithy · · Score: 1

      What? Linux, Solaris, FreeBSD, OpenBSD, NetBSD, and DragonFlyBSD all swap out buffered data from disc drives to cache before discarding it outright. If you don't believe me, go peruse their source code.

      I'm not going to go crawling through source code, but I find it difficult to believe any OS is going to swap out buffer cache when the corner cases that could actually be useful are vanishingly small.

    6. Re:Wrong. Swap often acts as a cache. by drsmithy · · Score: 1

      The way DragonFly's swapcache works is that VM pages [...]

      I don't think it's really fair to put forth swapcache as a general case, when it's specific to only a single OS.

      I'm willing to be corrected, but I struggle to believe any OS is going to swap out buffer cache as a general rule when the use cases for that actually being beneficial are vanishingly small (and better solutions - like swapcache, ZFS's ZIL and L2ARC, or Windows ReadyBoost - exist).

    7. Re:Wrong. Swap often acts as a cache. by Daengbo · · Score: 1

      Thank you. You (and people like you) are the reason I still read Slashdot comments. It seems like there used to be a lot more of you guys. Where'd the others go? ;)

    8. Re:Wrong. Swap often acts as a cache. by m.dillon · · Score: 3, Informative

      OS's have traditionally discarded clean cache data when memory pressure forces the pages out. Swap traditionally applied only to dirty anonymous memory (The OS needs to write dirty data somewhere, after all, and if it isn't backed by a file then that is what swap is for).

      However in the last decade traditional paging to swap has fallen by the wayside as memory capacities have increased. Most of the data in ram on systems today is clean data, not dirty data, and most of the dirty data is backed by a file (e.g. write()s to a database or something like that). On most systems today if you look at swap space use you find it near zero.

      But the concept of swap can trivially be expanded to cover more areas of interest. tmpfs (tmpfs, md, mfs, etc) is a good example. For that matter anonymous memory for VMs can be backed by swap. It is very desirable to back the memory for a VM with either a tmpfs-based file or just straight anonymous memory instead of a file in a normal filesystem. That is a good use for swap too.

      It isn't that big a leap to expand swap coverage to also cache clean data. It took about two weeks to implement the basics on DragonFly. Those operating systems which don't have this capability will probably get it as time goes on simply because it is an extremely useful mechanic for interfacing a SSD-based cache into a system. It is also probably the cleanest and simplest way to implement this sort of cache, and it pairs up well with the strengths of the SSD storage mechanic. Since you can reallocate swap space when something is rewritten there are virtually no write amplification effects and the storage on the SSD is cycled very nicely. You get much better wear leveling than you would if you tried to map a normal filesystem (or mirror the blocks associated with a normal filesystem) on top of the SSD.

      -Matt

    9. Re:Wrong. Swap often acts as a cache. by Score+Whore · · Score: 5, Informative

      Solaris certainly doesn't. What developer would ever code this kind of behavior? Non-dirty filesystem data in the cache is already on disk, so what would be the rationale for writing it out to another part of the disk? That's just stupid. Non-dirty pages are thrown away when RAM is in demand. Dirty filesystem data is just written to disk. Then the pages become non-dirty and can be freed at any time. Possibly immediately if there is demand.

      Scenario A:

      1. File is read and data is copied into system memory where it is buffered. Time passes.
      2. Memory usage skyrockets.
      3. Kernel writes data to swap space and frees the memory for use by other processes.
      4. Later an application wants that data. Kernel reads data from swap space.

      Scenario B:

      1. File is read and data is copied into system memory where it is buffered. Time passes.
      2. Memory usage skyrockets.
      3. Kernel locates non-dirty cached data and frees that page for use by other processes.
      4. Later an application wants that data. Kernel reads data from original file on disk.

      Differences between scenario A & B:

      Scenario A has two disk IOs (steps 3&4) during memory pressure. Scenario B has one (step 4).
      Scenario A uses limited swap space to store duplicate data. Scenario B doesn't.
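
      The I/O count for the two scenarios can be written out as a small Python function. This is just the arithmetic from the steps above for a single clean cached page; the function and parameter names are made up, and the reread_later flag is an added assumption to show what happens when step 4 never occurs.

```python
def disk_ios_under_pressure(swap_out_clean_pages, reread_later=True):
    """Count disk I/Os for one clean cached page, from memory pressure onwards."""
    ios = 0
    if swap_out_clean_pages:
        ios += 1  # Scenario A, step 3: write the page to swap
    # Scenario B, step 3: the clean page is simply dropped, so no I/O here
    if reread_later:
        ios += 1  # step 4: read from swap (A) or from the original file (B)
    return ios
```

      If the page is never wanted again, Scenario A has still paid for one write during memory pressure, while Scenario B has cost nothing.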

      And no, Solaris doesn't cache slow devices (tape, dvd-rom, etc.) either. If you choose to access those types of devices, that is your choice. The OS isn't going to save your ass. If you want it cached, make your application do the caching.

      Also, I'm not considering special purpose systems such as ZFS's l2arc or other similar/more generalized systems that utilize SSD as a midway point between RAM and HDD. We're talking generic swap space and filesystem caches.

    10. Re:Wrong. Swap often acts as a cache. by leuk_he · · Score: 1

      The important part in your post is "Solaris doesn't cache slow devices (tape, dvd-rom, etc.) either."

      There you can have a big win.
      -Consider the speed of devices in the caching algorithm.
      -Consider seek times in the caching/readahead algorithm (SSD has almost no seek, HDD has ~10 ms seek, CD has 200 ms seek, tape has LONG seek)
      -Traditional devices (HD/CD/NFS) have equal write/read times. SSD has considerably slower write than read. Expelling read-only swap in SSD would be more logical since a dirty write is much more expensive than a re-read of a read-only page.
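
      One way to take device speed into account is to score each cached page by how expensive it would be to fetch again, and evict the cheapest one. The Python sketch below is only an illustration of that idea: the HDD and CD seek times are the rough figures given in the list above, the SSD value is an assumed stand-in for "almost no seek", and the scoring rule (access count times seek time) is a made-up heuristic rather than anything an existing kernel is known to use.

```python
# Approximate seek times in milliseconds. The HDD and CD values are the
# figures quoted above; the SSD value is an assumption for "almost no seek".
SEEK_MS = {"ssd": 0.1, "hdd": 10.0, "cd": 200.0}


def pick_victim(pages):
    """Return the name of the cached page that is cheapest to lose.

    pages maps a page name to a (device, accesses) tuple. The score is the
    expected cost of re-fetching the page: access count times seek time.
    """
    def refetch_cost(name):
        device, accesses = pages[name]
        return accesses * SEEK_MS[device]

    return min(pages, key=refetch_cost)
```

      Under this rule a page that was read once from a CD outlives a page that was read several times from a hard disk, because a single CD seek costs more than several HDD seeks.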

      But recently (last years) I did all my optimisations in user land; using an index in a DB has a much larger win than making a 10% faster system by tuning the swap.

    11. Re:Wrong. Swap often acts as a cache. by RichiH · · Score: 1

      Swap is thrown away between reboots. Thus your scheme would not decrease start-up times.

    12. Re:Wrong. Swap often acts as a cache. by drsmithy · · Score: 1

      There you can have a big win.

      Not really. Access patterns to such devices tend to be a) infrequent and b) sequential.

    13. Re:Wrong. Swap often acts as a cache. by leuk_he · · Score: 1

      Still not a reason not to optimize.

      And CD is used in livecd environment A LOT, you see that even strange access patterns could use optimizing.

    14. Re:Wrong. Swap often acts as a cache. by drsmithy · · Score: 1

      Still not a reason not to optimize.

      Actually it's an excellent reason not to optimise, because you'd be optimising for an uncommon case at the expense of a common one.

      And CD is used in livecd environment A LOT, you see that even strange access patterns could use optimizing.

      LiveCDs are not even remotely close to a common use case, which is why the vastly superior option is to manually optimise by using things like RAM drives.

  33. ZFS' example by Anonymous Coward · · Score: 1, Interesting

    What a waste of time. Just put /home on a magnetic disk and everything else on the SSD. This way, you can get away with a small (very affordable) SSD for your binaries, libraries, config files, and app data, and use tried and true magnetic for your important files.

    Solaris' experience with using SSDs as a read cache shows a 5x to 40x increase in read IOps:

                    http://blogs.sun.com/brendan/entry/l2arc_screenshots

    while still getting the advantages of bulk storage with SATA drives (in various forms of RAID configuration).

    This may not be a big deal for home stuff, but if you're serving homedirs, VMware VMDKs, or databases over NFS for work, it could save a lot of money in equipment and power/cooling just by adding a few SSDs.

    Similarly using write-optimized SLC SSDs can help synchronous write operations (12x more IOps, 20x reduction in latency in some benchmarks):

                    http://blogs.sun.com/brendan/entry/slog_screenshots

    More on the general concept of "hybrid storage pools":

                    http://blogs.sun.com/brendan/entry/hybrid_storage_pool_top_speeds

    I believe parts of this functionality have been ported to FreeBSD as well (they're a few ZFS revs behind Solaris).

  34. Re:People forgot the low-level Linux stuff quickly by BitButcher · · Score: 1

    You can run ZFS on Linux via FUSE. This would probably achieve exactly what the OP is looking for. See http://zfs-fuse.net/

  35. preload by Anonymous Coward · · Score: 1, Interesting

    I'm surprised no one mentioned "preload":
    "preload is an adaptive readahead daemon. It monitors applications that users run, and by analyzing this data, predicts what applications users might run, and fetches those binaries and their dependencies into memory for faster startup times."
    http://sourceforge.net/projects/preload/

    Development seems to have stalled, but I think the idea is there. It tackles the problem of putting unused RAM to work, but it could probably be adapted to use an SSD partition.

    Sebastien Giguere

  36. Not necessarily a good metric by StikyPad · · Score: 1

    Caching is only worthwhile if the data can benefit from higher bandwidth. I don't want, for example, my porn or SETI@home data using valuable cache space regardless of how frequently it's accessed, because it can't be processed at anything approaching the bandwidth of magnetic storage, let alone a good SSD. I'd much prefer to have my app/games stored on the SSD, because regardless of how infrequently I use any one of them, the performance gains would be far more dramatic.

    1. Re:Not necessarily a good metric by Thundersnatch · · Score: 1

      Caching is only worthwhile if the data can benefit from higher bandwidth.

      No, caching is only worthwhile if the application/system can benefit from lower latency. Increased bandwidth is generally something of a side effect of that, but any sustained high-bandwidth operation will quickly overrun or saturate your cache.

  37. Replace the optical drive. by kf6auf · · Score: 1

    Replace the optical drive. I've been keeping a log of how many times I've ever used my DVD drive while away from home. So far I'm at 1; I ripped a CD I got for Christmas before I brought it back home with me. It could have waited.

    Yes, I know it doesn't work for everyone, but I think it works for most people, assuming you get a USB powered optical drive or enclosure.

    1. Re:Replace the optical drive. by MikeFM · · Score: 1

      That's a good idea. I wonder if anyone sells an SSD designed to fit that form factor, or an adapter.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
  38. Duh by FranTaylor · · Score: 1

    Try jamming two hard drives into a laptop.

    Re-read the problem as stated:

    "Is there a way to use an SSD to act as a hot sector cache for a magnetic disk under Linux?"

    Ya think maybe it's assumed that there are two drives? Just Maybe?

    1. Re:Duh by MobyDisk · · Score: 1

      You are right. I was thinking about an SSD + HDD in one unit, because some of the comments were going in that direction.

  39. Waste of time ? by scromp · · Score: 1

    This kind of thing is fairly big in some circles.

  40. FSCache would work except... by Jah-Wren+Ryel · · Score: 4, Interesting

    I have a similar problem and I tried the FSCache approach:

    I've got two raids.
    One is optimized for big ass files read contiguously and has raid6 redundancy.
    The other is a much smaller JBOD that I can reconfigure via mdraid to anything that linux supports in software.

    The problem is that 5% of the big ass files need read-only random access and that kills throughput for anything else going on. It takes me down from ~400MB/s to 15MB/s.

    So, I thought I'd use the FSCache approach and use the JBOD as the cache.
    I did an NFS mount over loopback and pointed the fscache to the JBOD.
    It worked great, with practically full throughput for contiguous access, for about 10 hours, and then crashed the system.

    Apparently NFS over loopback is well known to be broken in Linux and has been since, essentially, forever.
    I was stunned; it had never even occurred to me that NFS over loopback would be broken. It's freaking 2010. That something I had been using on SunOS 3 a bazillion years ago didn't work on Linux today had not even entered my mind.

    I've also tried replicating the files from the raid6 to the JBOD, but that quickly turned into a hassle: keeping everything synchronized between the files on disk, the applications that create the files on the raid6, and the apps that use the files on the JBOD. Plus, it doesn't scale out past the size of the JBOD, which I also ran into.

    So now I'm looking at putting the apps that need random access reads to the data in a VM and NFS mounting it with cache to the VM, hoping to avoid the NFS-broken-over-loopback problem. I haven't had time to implement it yet, and personally I'm leery of doing so, since I have to wonder what new "known-broken" problems will bite me in the ass.

    So, if there is a better way, I am dying to hear it, unfortunately solaris/freebsd is not an option...

    --
    When information is power, privacy is freedom.
    1. Re:FSCache would work except... by Gekke+Eekhoorn · · Score: 1

      Argh, I replied to this post but the useless iPhone interface made me actually reply to the topic. Can you please explain what is broken about loopback nfs? I can't find a recent reference anywhere...

    2. Re:FSCache would work except... by Jah-Wren+Ryel · · Score: 1

      http://kbase.redhat.com/faq/docs/DOC-22461

      There may be other problems too, my cursory examination of the kernel backtraces I got didn't necessarily indicate a deadlock, but I'm not overly familiar with linux backtraces either. My impression is that nfs over loopback is pretty rare - I think the automounter is smart enough to use bind mounts for local filesystems rather than nfs, which is pretty much the only commonly used form.

      --
      When information is power, privacy is freedom.
    3. Re:FSCache would work except... by Anonymous Coward · · Score: 0

      The problem is that 5% of the big ass files need read-only random access and that kills throughput for anything else going on. It takes me down from ~400MB/s to 15MB/s.

      So, I thought I'd use the FSCache approach and use the JBOD as the cache.
      I did an NFS mount over loopback and pointed the fscache to the JBOD.
      It worked great got practically full throughput for contiguous access, for about 10 hours and then crashed the system.

      Near as I can tell, this shows just how significant the seek advantage is on SSD. For most of us using Linux on the desktop the seek speed difference should be the real big performance win since anyone can afford 8GB+ of RAM.

    4. Re:FSCache would work except... by Anonymous Coward · · Score: 0

      FSCache will crash your system unless you are using the version that is part of Kernel >= 2.6.30 .

      Accessing local files over an NFS loopback is nuts though. You'd be far better off mounting your FS journal on an SSD.
      http://insights.oetiker.ch/linux/external-journal-on-ssd/

    5. Re:FSCache would work except... by Jah-Wren+Ryel · · Score: 1

      FSCache will crash your system unless you are using the version that is part of Kernel >= 2.6.30 .

      Way past that kernel rev.

      Accessing local files over an NFS loopback is nuts though. You'd be far better off mounting your FS journal on an SSD.

      Nope, I wouldn't, because it's not journal updates that are the problem. The performance goes in the toilet because of read-only accesses, which don't cause journal updates at all.

      --
      When information is power, privacy is freedom.
    6. Re:FSCache would work except... by Thundersnatch · · Score: 1

      So, if there is a better way, I am dying to hear it, unfortunately solaris/freebsd is not an option...

      Why not? Given the amount of pain you've endured under Linux, the costs of a transition to a new platform are small. Your time isn't free. Use the right tool for the job, man.

    7. Re:FSCache would work except... by Jah-Wren+Ryel · · Score: 1

      Why not? Given the amount of pain you've endured under Linux, the costs of a transition to a new platform are small. Your time isn't free. Use the right tool for the job, man.

      Because the cost to migrate the data to a new OS is prohibitive.

      --
      When information is power, privacy is freedom.
    8. Re:FSCache would work except... by Thundersnatch · · Score: 1

      Add up the value of the man-hours spent on the issue so far, plus the costs associated with your downtime and performance issues, and the transition may look a lot more attractive. You probably cost your company at least US$100/hr including overhead and benefits.

    9. Re:FSCache would work except... by Jah-Wren+Ryel · · Score: 1

      Add up the value of the man-hours spent on the issue so far, plus the costs associated with your downtime and performance issues, and the transition may look a lot more attractive. You probably cost your company at least US$100/hr including overhead and benefits.

      Gee, I never thought of that! That's really insightful of you.

      --
      When information is power, privacy is freedom.
  41. Benchmarking file system performance by rwa2 · · Score: 1

    Does Linux cache actual data (content of files) or just the block addresses?
    As far as I know, only the second.

    It caches both.

    Some fun things to try by way of benchmarks:

    sync; sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" # drop page cache plus dentries and inodes (2 alone drops only dentries/inodes)
    time find /usr/share/doc -type f > files # uncached file system (metadata) performance
    time cat `cat files` > /dev/null # uncached data read performance
    time find /usr/share/doc -type f > files # cached file system performance
    time cat `cat files` > /dev/null # cached file system + data performance
    sync; sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" # clear out the caches again
    time cat `cat files` > /dev/null # uncached performance for both data and inodes

    As long as you don't overrun your available cache memory, you should get at least 100x faster performance on cached accesses.

  42. Atrato does this... by Anonymous Coward · · Score: 0

    Check out Atrato's AppSmart software...
    http://www.atrato.com/products/app-smart.asp

  43. Re:Sure it's caching. And it's not a "kludge" at a by Anonymous Coward · · Score: 0

    Thanks for your display of ignorance.

    In Alaska, the Yukon and British Columbia, the word "cache" also refers to a platform used to store food away from animals. It's quite accessible to whoever puts the food there. Given that the pioneers (Runciman and Booth) in the study of caching, both at the processor level and for IO, hailed from Juneau, it's likely that's what they were thinking of when they coined the term.

    A cache doesn't have to be hidden or inaccessible. In fact, most caches at the software level explicitly allow the users to specify the size of the cache, the replacement and deletion policies, among other factors.

    Besides, what you're saying doesn't even apply here. A given application won't know where its data is stored. It won't know that /etc/hosts is on an SSD, while /home/raven/penises.jpeg is on a traditional magnetic disk drive.

  44. A better use for SSD's than regular storage? by Anonymous Coward · · Score: 0

    The algorithm seems simple. Wouldn't you just have to track every sector (or other more appropriate unit) loaded to the SSD and keep a score for each one (+1 every time a given sector is loaded from HDD or SSD)? The SSD remains full all the time. Users don't 'access' it as a drive, i.e. it's 100% managed. If a load request gives a sector a higher score than the lowest-scoring sector on the SSD, overwrite that one with the new one. Keep the score list in RAM and store it to the HDD when shutting down. Then whatever you load most will always be right there, including the OS. It would be like a selective mirror for the most requested items, the whole SSD becoming like a giant page file.
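
    The scoring idea above can be sketched offline with standard tools. This is only an illustration, not a real block-layer cache: it assumes a hypothetical access log with one sector number per line, counts accesses per sector, and prints the N highest-scoring sectors that would be kept on the SSD. The function name hot_sectors is made up for this example.

```shell
# hot_sectors N: read sector numbers (one per line) on stdin and print
# the N most frequently accessed ones, hottest first.
# Ties are broken by the lower sector number so the output is stable.
hot_sectors() {
    sort | uniq -c | sort -k1,1nr -k2,2n | head -n "$1" | awk '{ print $2 }'
}

# Example: sector 5 is read three times, 7 twice, 9 once,
# so with room for two sectors the SSD would hold 5 and 7.
printf '5\n7\n5\n9\n7\n5\n' | hot_sectors 2
```

    A live cache would update the scores incrementally and evict the lowest-scoring sector on each miss, as the post describes; the batch version here only shows which sectors win.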

  45. small error by Weezul · · Score: 1

    In fact, your own files might be accessed frequently, but they're very likely small enough for the RAM cache. All this changes if you commonly manipulate huge files, of course.

    --
    The Christian religion has been and still is the principal enemy of moral progress in the world. -- Bertrand Russell
  46. Wear leveling math by Weaselmancer · · Score: 1

    If that's indeed the case, then why not simply put the MBR, /boot, /bin, and /usr on the SSD, then mount stuff like /home, /tmp, swap, and the like onto a spindle disk? No algorithm needed, thus no overhead needed to run it, etc.

    This sounds like a good way to go.

    ...only if you want to blow out the SSD wear-limits.

    Thanks to wear leveling, your HDD would probably wear out first. This guy did the math.

    Read "Flash SSD Application from Hell - the Rogue Data Recorder". And keep in mind it was written in 07 - things are better than that now. You might die of old age before your SSD does, depending on your setup.
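
    For a feel of the arithmetic, here is a back-of-the-envelope version. The numbers are illustrative assumptions of mine, not figures from the article and not measurements of any real drive, and the estimate assumes ideal wear leveling with no write amplification.

```shell
# Illustrative assumptions only:
capacity_gb=64        # usable SSD capacity
erase_cycles=10000    # assumed program/erase cycles per cell
writes_gb_per_day=20  # assumed sustained daily write volume

# With ideal wear leveling every cell takes an equal share of the writes,
# so the total write budget is capacity * cycles.
ssd_lifetime_years() {
    echo $(( $1 * $2 / $3 / 365 ))
}

ssd_lifetime_years "$capacity_gb" "$erase_cycles" "$writes_gb_per_day"   # prints 87
```

    Real drives will do worse than this ideal figure, but the order of magnitude is why ordinary /bin and /usr traffic is unlikely to be what kills the SSD.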

    --
    Weaselmancer
    rediculous.
  47. Re:People forgot the low-level Linux stuff quickly by greg1104 · · Score: 1

    I already do something similar on regular hard drives, based on the fact that the logical start of most hard drives is now almost twice as fast as the end. Create a small root partition, which will be on the fast part of the disk, for things that tend to be randomly accessed all the time, and then put all the bigger files on the slow part. There is no reason you need tiered storage at the hardware or OS level for this.

  48. Microsoft invented this! No kidding! by Anonymous Coward · · Score: 0

    Microsoft has a technology called "ReadyBoost" that does this! It works great. No kidding! Google it.

  49. Can this be done in LVM? by jafo · · Score: 1

    One idea that I've had but haven't had an opportunity to try is doing this in LVM. "pvmove" lets you move a physical extent from one device to another, so if you can track which extents are hot you can move them to the SSD. But pvmove works by mirroring the extent from one device to another, so it would seem that you could keep it on the spinning disc and mirror it for reads. But if it is write heavy you probably need it to stay on the SSD primary.
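
    A minimal sketch of the extent-moving step, assuming the spinning disk and the SSD are both physical volumes in the same volume group. The device names are hypothetical, and working out which extents are hot is left out entirely. Note that pvmove relocates the extents; it does not leave a readable mirror behind once it finishes.

```shell
# Hypothetical device names; adjust to your own volume group.
HDD_PV=/dev/sdb1
SSD_PV=/dev/sdc1

# Print the pvmove command that would relocate physical extents
# FIRST..LAST from the spinning disk to the SSD.
# Remove the leading "echo" to actually run it (needs root).
move_hot_extents() {
    echo pvmove "${HDD_PV}:$1-$2" "${SSD_PV}"
}

move_hot_extents 1000 1099
```

    The example prints "pvmove /dev/sdb1:1000-1099 /dev/sdc1", which is the form pvmove accepts for a range of physical extents on a source PV.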

    Of course, what we really want is more than just one level, we want hierarchical storage: Tape for bulk storage of infrequently needed stuff, maybe optical. 5400RPM big drives, 15K for faster IO on that, MLC for faster, and SLC for fastest. That's totally what we need, right? :-)

    I could totally see 4GB of SLC, 32GB of MLC, and then a 500GB hard drive in my laptop.

    1. Re:Can this be done in LVM? by Anonymous Coward · · Score: 0

      Possibly. For the best results you'd probably want filesystem integration, though. The LVM layer has some appeal, but it doesn't seem like you have enough context there to make it really good.

      I think in filesystem parlance this is called a repacker: as you see accesses to blocks and files you mark them, and then you figure out a way to lay them out on the disk to maximize performance. Keep files and blocks that are frequently accessed together, together on the disk. You probably also want to store them near the journal and keep the head over the journal in case a write comes, and maybe locate the journal in the middle of the platter too. Essentially you try to keep the head in the middle of the platter so you minimize full-platter seeks, then you keep the most frequently touched files nearest the journal, and ideally together. Likewise you could put hints in the metadata: when a certain file is accessed, maybe you want to pre-read the whole thing, or maybe you want to read it only as it's demanded (for large and rarely read files), or maybe you could identify 3 or 4 other heuristics on a file-by-file basis. If you wanted to get fancier still you could use some storage heuristics: USB, FireWire or optical storage is really slow, so pre-reading is more advantageous there. Then with discs and SSD, you could apply the repacker logic and put the most heavily used files, and the files with the most random access, onto the SSD.

      There are some flip sides. RAID has never taken off seriously on small systems. It just costs too much. If you run JBOD, then you greatly increase the likelihood that you'll have a failure. You really need some mirroring and striping. It's late in the evening for me, so I'm not going to do the math justice, but you'd need something like 3x the SSD storage and 3x the disk storage to make a combined filesystem distributed across it all with the same reliability as just a single disk. That's a lot of cost. The reality is that SSD prices are dropping faster than we expected, and that's going to continue. You'll just buy a system with 1TB of SSD and then maybe some giant disks for "offline" storage, rather than trying to make the filesystem more intelligent and complicated to distribute data between the two.

  50. Put /swap on SSD? by lbates_35476 · · Score: 1

    While I think that purchasing more RAM should ALWAYS be your first choice, couldn't you put swap on the SSD? If it gets used a lot, the SSD will get worn out (write cycle limit on the SSD), but hey, you are the one who wants to use the SSD for swap.

    1. Re:Put /swap on SSD? by MikeFM · · Score: 1

      Just disable swap. I always do. It just slows your computer down. Who really needs it these days anyway? You can buy a normal laptop with 8GB RAM these days which is plenty for most people.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
  51. Do you control the app? how 'bout the system? by IBitOBear · · Score: 1, Informative

    Here are the things you probably didn't look into...

    1) If you control the app source code, adding fadvise() calls after open to tell the application that you are going to use random access will turn off the read-ahead on those files selectively. The real reason that you are seeing the random access kill you is because you are probably using the defalut read ahead of 128k for the parition. That means that if you are reading 1k records, for example, and the total size of your active file set is greater than your dchace memory, your effective raw read lenght is 129k and you are wastind nearly 100% of that read.

    2) For the reasons set forth above, go into the various /sys/block/(whatever)/queue/ directories and set the kb_read_ahead values to zero, or maybe 8k for the partitions where you are storing these large random access files. Remember that in your case there are layers (the raw block device, and the raid, etc) and tuning the layers may be in order.

    3) The optimal stacking order is LVM on top of dmcrypt on top of raid_5_or_6 on top of media. No I don't say to do all of that every time, but that's the optimal order. If, for instance you put the raid over the encrypted volume then you will pay about 3 times the encryption cost than having the dmcrypt over the raid. (this is because saving and computing the xors of data that is already encrypted is much cheaper than decrypting all the sectors, computing the xors, then separately encrypting all those sectors as they return to the media.

    4) for large storage soulutions _always_ have an LVM on top. The overhead cost of a volume in a volume group is basically a single add and an extra function call. In exchange you can persistently set the readahead profile of each partition. You can also migrate your elements between physical devices safely and on the fly. All the reasons why aren't terribly obvious until the day it saves your life, or your weekend. (I have, for instance, plugged six USB hard drives into a system with a failing raid, built a raid on those USB drives, added the new raid to the existing volume group, and then migrated the active partition from the failing raid to the plugable raid with the system up the whole time then Dropped the failing raid from the volume group. Meanwhile I built a second computer to replace the first. Shut down the old computer. Moved the plugables over to the new one. Booted up into production. then built the new perminant home for the partition. Added that raid into the volume group. Migrated the partiton, then dropped the plugables. Total down time was the cutover between the two computers. I also then got to grow the file system onto the new raid in the new computer (the drives were bigger) during the next maintence window. LVM is very much your friend.

    5) it is easy and even desireable to build a volume group that is over heterogeneous storage. In particular lets say I have a raid6 built over five drives /dev/sd[a-e]1 called /dev/mapper/main. I will make a volume group "system" over /dev/mapper/main and /dev/sdf1 and then build my runtime file system in an LVM that is only on the /dev/mapper/main part of system. Now, when I have to make backups etc I can create the snapshot in the /dev/sdf1 part of system without seriously impacting my operations by compeeting for raid computation time/space.

    6) make sure you tune the stripe cache for any raid to a big number if you are doing writes, particularly random writes, to it, the default stripe cache is tiny. This is also important for keeping your read rates up if you end up in a degraded state.

    7) take a serious look at your choice of scheduler.

    So anyway, the degradation you describe is identical to having an improperly (or naively) tuned storage stack. In particular it sounds like read-ahead waste.

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
  52. Caching is not your problem... useless readahead by IBitOBear · · Score: 1

    Here are the things you probably didn't look into...

    1) If you control the app source code, adding posix_fadvise() calls (with POSIX_FADVISE_RANDOM) after open() to tell the kernel that you are going to use random access will turn off the read-ahead on those files selectively. The real reason that you are seeing the random access kill you is probably that you are using the default read-ahead of 128k for the partition. That means that if you are reading 1k records, for example, and the total size of your active file set is greater than your page cache memory, your effective raw read length is 129k and you are wasting nearly 100% of that read.

    2) For the reasons set forth above, go into the various /sys/block/(whatever)/queue/ directories and set the read_ahead_kb values to zero, or maybe 8k, for the devices holding the partitions where you are storing these large random access files. Remember that in your case there are layers (the raw block device, and the raid, etc.) and tuning each of the layers may be in order.

    3) The optimal stacking order is LVM on top of dmcrypt on top of raid_5_or_6 on top of media. No, I don't say to do all of that every time, but that's the optimal order. If, for instance, you put the raid over the encrypted volume, then you will pay about 3 times the encryption cost compared with having the dmcrypt over the raid. (This is because saving and computing the XORs of data that is already encrypted is much cheaper than decrypting all the sectors, computing the XORs, then separately encrypting all those sectors as they return to the media.)

    4) For large storage solutions _always_ have an LVM on top. The overhead cost of a volume in a volume group is basically a single add and an extra function call. In exchange you can persistently set the readahead profile of each logical volume. You can also migrate your elements between physical devices safely and on the fly. All the reasons why aren't terribly obvious until the day it saves your life, or your weekend. (I have, for instance, plugged six USB hard drives into a system with a failing raid, built a raid on those USB drives, added the new raid to the existing volume group, and then migrated the active partition from the failing raid to the pluggable raid with the system up the whole time, then dropped the failing raid from the volume group. Meanwhile I built a second computer to replace the first. Shut down the old computer. Moved the pluggables over to the new one. Booted up into production. Then built the new permanent home for the partition. Added that raid into the volume group. Migrated the partition, then dropped the pluggables. Total down time was the cut-over between the two computers. I also then got to grow the file system onto the new raid in the new computer (the drives were bigger) during the next maintenance window.) LVM is very much your friend.

    5) It is easy and even desirable to build a volume group that is over heterogeneous storage. In particular, let's say I have a raid6 built over five drives /dev/sd[a-e]1 called /dev/mapper/main. I will make a volume group "system" over /dev/mapper/main and /dev/sdf1 and then build my runtime file system in a logical volume that is only on the /dev/mapper/main part of system. Now, when I have to make backups etc., I can create the snapshot in the /dev/sdf1 part of system without seriously impacting my operations by competing for raid computation time/space.

    6) Make sure you tune the stripe cache for any raid to a big number if you are doing writes, particularly random writes, to it; the default stripe cache is tiny. This is also important for keeping your read rates up if you end up in a degraded state.

    7) Take a serious look at your choice of I/O scheduler.

    So anyway, the degradation you describe is identical to having an improperly (or naively) tuned storage stack. In particular it sounds like read-ahead waste.
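
    Points 2, 6 and 7 can be sketched as follows. The device names (sda, md0) and the values are placeholders, and stripe_cache_size only exists for Linux software raid (md) raid4/5/6 arrays, so it will not apply to a hardware controller. The commands are printed rather than run, since they need root and the right numbers depend on the workload. The second helper just redoes the read-ahead waste arithmetic from point 1.

```shell
# show_tuning DISK MD_ARRAY READAHEAD_KB STRIPE_CACHE
# Print (do not execute) the sysfs writes for read-ahead and stripe cache,
# plus the command that lists the available I/O schedulers.
show_tuning() {
    echo "echo $3 > /sys/block/$1/queue/read_ahead_kb"
    echo "echo $4 > /sys/block/$2/md/stripe_cache_size"
    echo "cat /sys/block/$1/queue/scheduler"
}

# wasted_percent RECORD_KB READAHEAD_KB
# Share of each physical read that is thrown away when a small record
# drags in a full read-ahead window: readahead / (record + readahead).
wasted_percent() {
    echo $(( $2 * 100 / ($1 + $2) ))
}

show_tuning sda md0 8 8192
wasted_percent 1 128    # prints 99
```

    Settings written through sysfs do not survive a reboot, so anything that helps would need to go into a boot script or udev rule.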

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
  53. evil double post.... the 2nd one is spelled better by IBitOBear · · Score: 1

    sorry, it gave me an error the first time... the second copy is even spell checked... 8-)

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
  54. External journal by TheLink · · Score: 1

    Something like this might also help:

    http://insights.oetiker.ch/linux/external-journal-on-ssd/

    --
  55. HyperDrive5? by Anonymous Coward · · Score: 0

    I would think for SSD caching purposes a HyperDrive 5 RAM SSD in RAID0 mode would be the better choice compared to a flash SSD, if you could afford it. The long term abuse of a flash SSD due to the write patterns of a cache would make it potentially as slow as a spinning hard disk due to the tricks modern flash SSD drives use to improve write performance (trying to write into unoccupied large contiguous blocks).

  56. Re:Caching is not your problem... useless readahea by Jah-Wren+Ryel · · Score: 1

    Well yeah, caching is not the problem; it's a possible solution.

    The raid6 is on an areca, and it was deliberately tuned for streaming reads and writes over random access - in retrospect separate raid volumes might have been better as the need for random access was not part of the initial requirements. I'm also using jfs and it appears to have a problem with fadvise, although that's been with POSIX_FADVISE_DONTNEED on streaming reads rather than POSIX_FADVISE_RANDOM.

    I will look into playing with the read_ahead_kb setting, although I have my doubts, because what I see with sar suggests the problem is in the areca: when random accesses are occurring, sar shows vastly reduced tps and rd_sec/s with 100% utilization. If it were buffer cache readaheads, I would expect those values to equal, or at least be in the neighborhood of, the values during pure streaming access. I've already played with areca's own firmware settings for read-ahead aggressiveness without much benefit. Plus, readahead is good for all the streaming work; it's just that a small amount of random access has a disproportionate impact on the streaming access.

    --
    When information is power, privacy is freedom.
  57. And /boot by JSBiff · · Score: 1

    The original Ask Slashdot also mentioned speeding up boot times. I suppose that could be accomplished by copying the contents of /boot to a small filesystem on the SSD, then modifying /etc/fstab to mount /boot from the SSD, then running grub or whatever bootloader you are using to rewrite the MBR.

    However, that still isn't *quite* what the original poster was asking for. I do think he has a good idea (I've wondered about something along those lines too).

    The thing about swap is, the files will *always* be initially read from the magnetic disk into the swap when you first boot and first access files. Also, if a file has been closed by all processes that had requested it, I don't think it will be considered by the kernel to still be loaded into swap, will it (even if the sectors it was written to still physically have the data)? I think the original poster wanted something where the kernel will look *first* on the SSD to see if it can fulfill the request, and only after that will try to fetch from the magnetic drive, even right after a reboot.

    Putting /swap on that won't really do that. Still, putting /swap on the SSD, I suppose, would probably still help quite a bit.

    Actually, I'm thinking about this. Doesn't the Linux kernel try to keep most open or recently accessed files in an in-RAM cache (as long as there is enough RAM), and access them directly in RAM instead of from either swap or disk, if it can fulfill the request from cache? If that is the case, doesn't it make the most sense to just get more RAM? 8 or 12 Gigs of RAM would, I should think, provide plenty of space to keep a lot of files in cache (unless you are dealing with some really massive files like full feature-length movies).

    I guess my point is, it is my understanding that once loaded from the magnetic disk initially, a lot of file requests can be serviced from RAM, thus negating a lot of the benefits of putting /swap on an SSD (I think), so the main place where the SSD can provide a benefit is during the initial loading, but swap doesn't get hit during the initial loading. So, only doing something like the OP suggested, I believe, would provide any benefit?

  58. Loopback NFS broken??? by Gekke+Eekhoorn · · Score: 0, Redundant

    Can you elaborate on that broken loopback NFS in Linux? I couldn't find anything about that, last mention of it being broken was in 2002.

    You know, a lot of people use loopback nfs for crypto homedirs and I think fuse. I'd like to think that it isn't broken...

  59. Cleancache by zdzichu · · Score: 1

    Cleancache was just recently posted to LKML: Cleancache [PATCH 0/7] (was Transcendent Memory): overview.

    --
    :wq
  60. HSM IS THE ANSWER! by Anonymous Coward · · Score: 0

    Hierarchical Storage Management (HSM) is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.

    Full article on HSM: http://en.wikipedia.org/wiki/Hierarchical_storage_management

  61. Mount your FS journal on the SSD by Anonymous Coward · · Score: 0

    http://insights.oetiker.ch/linux/external-journal-on-ssd/

  62. PCIE memory controller by Anonymous Coward · · Score: 0

    We should have flash memory devices with just a PCIE memory controller (with external PCIE for external devices).

    The wear leveling code should be in the kernel.

    That would allow wear leveling algorithms tailored to particular services to be implemented easily.

  63. inotify + aufs by luxifr · · Score: 1

    I think such software could very easily be implemented using inotify and aufs...

  64. Re:Sure it's caching. And it's not a "kludge" at a by obarthelemy · · Score: 1

    Do you have any other example of a directly accessible cache, like what you're calling your SSD? I don't... And being able to specify size, policies... does not cut it. Cache is transparent and system-managed, so that neither the user nor the apps have to care/know much about it.

    Manually installing certain files on certain media does not feel at all like cache to me.

    --
    The Cloud - because you don't care if your apps and data are up in the air.
  65. Re:Sure it's caching. And it's not a "kludge" at a by SkunkPussy · · Score: 1

    It's not caching because you do not have two copies of your data.

    --
    SURELY NOT!!!!!
  66. Something similar by Any+Web+Loco · · Score: 1

    I keep /home on my SSD but symlink everything big (e.g. the Music or Videos subdirectories) out to a magnetic disk. Having /home on the SSD helps speed things up for me - all those KDE4 dot files that get loaded...
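
    A rough sketch of that move-and-symlink step, for anyone who wants to script it. The helper name `relocate_and_link` is made up for illustration, and the demo runs in a throwaway temp directory rather than on a real /home; the real paths would be something like ~/Music and a mount point on the magnetic disk.

```python
import shutil
import tempfile
from pathlib import Path


def relocate_and_link(src: Path, dest_root: Path) -> Path:
    """Move src under dest_root and leave a symlink at the old location."""
    dest = dest_root / src.name
    if dest.exists():
        raise FileExistsError(dest)  # refuse to clobber existing data
    shutil.move(str(src), str(dest))
    src.symlink_to(dest, target_is_directory=True)
    return dest


# Demo: "home" stands in for the SSD, "hdd" for the magnetic disk.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home"
    hdd = Path(tmp) / "hdd"
    (home / "Music").mkdir(parents=True)
    hdd.mkdir()
    (home / "Music" / "song.txt").write_text("la la")

    relocate_and_link(home / "Music", hdd)

    is_link = (home / "Music").is_symlink()
    content = (home / "Music" / "song.txt").read_text()
```

    After the call, the old path still works for applications, but the data itself lives on the other disk.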

  67. A battery backed ram drive is the answer I use by Anonymous Coward · · Score: 0

    I have the OS and most files I use on an ACARD 9010b battery backed ram drive that I boot off of, and long-term storage for less accessed files on a regular hard drive.
    If power goes out, everything on the ram drive is copied to a compact flash card built into the unit.

    This way you have the same speed of an SSD, but no wear-leveling problems, and your write speed is the same as the read speed (unlike an SSD).

    It uses SATA, as well as dual-cable SAS interface to double the transfer rate, making it a tiny bit (OK, insignificantly) faster than an SSD for some operations.

    Cheap? No, but not bad either.
    Capacity is up to 64 GB of RAM.
    Most OSes take maybe 4-6 GB, so that's plenty.

    Google acard 9010 to see the specs!

  68. have you seen DM-Cache? by Anonymous Coward · · Score: 0

    DM-cache appears to do just what you are talking about. Perhaps you could further that project?

    http://users.cis.fiu.edu/~zhaom/dmcache/index.html

  69. To answer the original question... by On+Lawn · · Score: 1

    It seems I'm a bit late to the party.

    The only possibilities I've seen in Linux for this (aside from the ZFS FUSE port already mentioned) are...

    OHSM

    CacheFS

    If you want to do this in hardware, you can use MaxIQ from Adaptec which, IIRC, uses Linux drivers to get this function from their storage controllers. There are also a few others, one of which only mirrors the first part of the hard drive.

  70. UnionFS? by snadrus · · Score: 1

    UnionFS / AUFS is currently used to make Live CDs look writable (via ram). Using this would prefer the SSD more than the hard-drive it overlays.
    Changes Needed:
    - add write through
    - copy to ssd on read
    - if SSD full, delete lesser-used blocks
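
    The three changes above can be sketched as a toy block cache. This is only an illustration of the idea, not UnionFS/AUFS code: the class name is made up, plain dicts stand in for the SSD and the hard drive, and "lesser-used" is approximated with least-recently-used eviction.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Toy block cache: `fast` stands in for the SSD, `slow` for the hard drive."""

    def __init__(self, slow, capacity):
        self.slow = slow              # dict-like backing store (hypothetical)
        self.capacity = capacity      # max number of blocks kept on the fast tier
        self.fast = OrderedDict()     # block number -> data, least recent first

    def _remember(self, block, data):
        self.fast[block] = data
        self.fast.move_to_end(block)           # mark as most recently used
        while len(self.fast) > self.capacity:
            self.fast.popitem(last=False)      # evict the least recently used block

    def read(self, block):
        if block in self.fast:
            self.fast.move_to_end(block)       # hit: refresh recency
            return self.fast[block]
        data = self.slow[block]                # miss: fetch from the slow tier
        self._remember(block, data)            # copy to the fast tier on read
        return data

    def write(self, block, data):
        self.slow[block] = data                # write through: slow tier stays current
        self._remember(block, data)


disk = {n: f"block-{n}" for n in range(4)}
cache = WriteThroughCache(disk, capacity=2)
cache.read(0)
cache.read(1)
cache.read(0)          # block 0 is now more recently used than block 1
cache.read(2)          # cache is full, so block 1 is evicted
cache.write(3, "new")  # goes to both tiers; block 0 is evicted
```

    Because every write also lands on the slow tier, losing the fast tier loses no data, only speed.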

    --
    Science & open-source build trust from peer review. Learn systems you can trust.
  71. Some raid controllers support SSD caches by rhvarona · · Score: 1

    For example, Adaptec MaxIQ on quite a few of their controllers, which lets you add an Intel X25-E to your existing RAID array. It transparently uses the SSD as a cache for the most frequently used data.

    1. Re:Some raid controllers support SSD caches by rhvarona · · Score: 1

      LSI has something similar with their FastPath and CacheCade features, which create hybrid storage systems that are transparent to the operating system.

  72. Re:People forgot the low-level Linux stuff quickly by Terrasque · · Score: 1

    Only thing: it's sloooooooooooooow. I did some tests a while ago (v0.6.0) and the speed was about an order of magnitude below a RAID5 setup on the same machine.

    --
    It's The Golden Rule: "He who has the gold makes the rules."
  73. Flashcache for Linux - new project by Anonymous Coward · · Score: 0

    Courtesy of Facebook kernel hackers:

    http://github.com/facebook/flashcache

  74. Access time and latency... could be improved by niftymitch · · Score: 1

    Access time and latency... could be improved, but it would be a PITA to get it right.

    As booting from a LiveUSB key demonstrates, it is possible to boot and run from a USB key 2GB or so in size, and an 8GB one would be roomy.

    Thus a modestly priced SSD on SATA could be used to boot Linux and also contain symbolic links to directories or files on much larger rotating media or a network resource.

    Sun and others did a bit of work to move files and junk off the boot disk and onto a shared NFS resource....

    One difficult-to-address IO problem on a demand-paged VM OS is the latency and the lack of streaming that can be obtained, i.e. each page fault generates a single IO request to a disk. This applies to pages of text, data, or swap IO.

    As many folk know, IO is often measured with largish memory buffers and largish system calls to the OS. Demand-paged IO has a granularity that is page size... and is about the worst IO increment that the system sees.

    Latency of rotating media, whether a 5400 RPM or a hot 10,000 RPM disk, is very slow when compared to an SSD, and this alone could tip the balance, giving SSD a strong place in a system design.
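
    To put rough numbers on that: the average rotational latency of a disk is the time for half a revolution, which follows directly from the RPM. The function name below is just for illustration, and this ignores seek time, which comes on top.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the wanted sector to come around: half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


for rpm in (5400, 10_000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
# 5400 RPM: 5.56 ms
# 10000 RPM: 3.00 ms
```

    With one page-sized request per page fault, that wait is paid over and over, which is why a device with no rotating platter looks so attractive here.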

    For this to gain traction, laptops and systems must have two disk interfaces. An SSD form factor could be established that was about the size of a book of matches or smaller, to permit a pair of devices to fit in the box. This does not make sense for desk-size systems because LARGE DRAM is perhaps cheaper, and tricks like readahead daemons and a revisit of the sticky bit would make it moot.

    --
    Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
  75. Facebook just released one by charlesnw · · Score: 1

    http://perspectives.mvdirona.com/2010/04/29/FacebookFlashcache.aspx Maybe this ask /. article prompted them to release it?

    --
    Charles Wyble System Engineer