Anatomy of Linux Kernel Shared Memory

Posted by kdawson on Saturday April 17, 2010 @08:51AM from the culture-by-another-name dept.

An anonymous reader sends in an IBM DeveloperWorks backgrounder on Kernel Shared Memory in the 2.6.32 Linux kernel. KSM allows the hypervisor to increase the number of concurrent virtual machines by consolidating identical memory pages. The article covers the ideas behind KSM (such as storage de-duplication), its implementation, and how you manage it.
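For reference when managing KSM: the kernel exposes its controls and counters as files under /sys/kernel/mm/ksm (for example run, pages_shared and pages_sharing), and it only scans memory that an application has registered with madvise(MADV_MERGEABLE). Below is a minimal Python sketch that reads those counters, assuming a kernel built with KSM and readable sysfs files. The helper names are hypothetical, and the 4 KB page size is an assumption. pages_sharing counts the extra mappings that point at merged pages, so pages_sharing / pages_shared gives a rough sharing ratio, and pages_sharing times the page size approximates the memory saved.

```python
from pathlib import Path

# Standard sysfs location on kernels built with KSM support.
KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm_counters(ksm_dir=KSM_DIR):
    """Read a few integer counters KSM exposes in sysfs (hypothetical helper)."""
    names = ("run", "pages_shared", "pages_sharing", "pages_unshared")
    return {name: int((Path(ksm_dir) / name).read_text().strip()) for name in names}


def sharing_summary(counters, page_size=4096):
    """Return (sharing ratio, approximate bytes saved); page_size is assumed."""
    shared = counters["pages_shared"]    # merged pages actually kept in RAM
    sharing = counters["pages_sharing"]  # extra mappings onto them, i.e. pages saved
    ratio = sharing / shared if shared else 0.0
    return ratio, sharing * page_size
```

On a KVM host, calling read_ksm_counters() with no argument reads the live values; the directory parameter only exists so the arithmetic can be tried against a copy of the files.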

14 of 93 comments

This Is Just One Reason ... by WrongSizeGlass · 2010-04-17 09:09 · Score: 4, Interesting

... why OSS is the way things should be. You'll never see this type of documentation, and this type of detail, available to anyone and everyone from closed source software. I love my Mac, and supporting Windows pays my bills, but OSS is unlike any other animal out there.
1. Re:This Is Just One Reason ... by abigor · 2010-04-17 09:41 · Score: 4, Informative
  
  OS X's kernel is open source (BSD license) and very well documented.
2. Re:This Is Just One Reason ... by mswhippingboy · 2010-04-17 12:18 · Score: 3, Insightful
  
  far better documentation from MS and Apple than Linux has ever had.
  Have you ever looked at Amazon or InformIT/Safari or any technical documentation vendor or website? There are enough books and articles on MS and Linux to keep you reading for many lifetimes (Apple not so much, but still plenty by my estimate). It's just a FACT that there are certain things that closed source vendors do not disclose as a matter of trade secret or intellectual property, which is what I believe WrongSizeGlass was referring to. OSS does not have this limitation.
  
  And no, the source doesn't count if no one knows what you intended to do
  It absolutely does count, if you know how to read code. You can read documentation all day long, but if it doesn't match the code you'll be lost.
  When I get involved in a rewrite/upgrade/modification of an existing application (and I've worked on many that are much larger than most OSS apps, and I've been doing it for over 30 years, so I think I have a clue), I read the docs for a quick overview, but I read the code to find out how it really works.
  I've spent plenty of time with both msdn.microsoft.com (and developer.apple.com to a lesser degree) over the years and while there are loads of information there, I know "everything" is not there. There are API documentation discrepancies, quirks, or various undocumented oddities that can be frustrating to work around. These issues can only be resolved one of two ways - trial and error or by reading the code.
  With closed source there is only one way - pray the docs are correct. With open source there are usually two ways and if the documentation is shitty, you can be sure the code is correct. That makes my life as a developer easier.
  
  Get a clue fanboy.
  Wow. I'm sure you won him over with that jab. First an outpouring of pomposity and narcissism, then a put-down to finish it off.
  Good job dude.
  
  --
  Sometimes the light at the end of the tunnel is the headlight of an oncoming train.
3. Re:This Is Just One Reason ... by Saint+Stephen · 2010-04-17 15:35 · Score: 2, Insightful
  
  Take it from a guy who's seen the NT source code: Inside Windows 2000, the Windows kernel debuggers, and a FireWire cable gave you MORE than enough detail; there's not much of importance that isn't publicly known.
  It just doesn't make Slashdot or the sites you frequent. How do you think Windows Device Driver writers do their job?
4. Re:This Is Just One Reason ... by Anonymous Coward · 2010-04-17 15:54 · Score: 2, Insightful
  
  How do you think Windows Device Driver writers do their job?
  Very badly if my experience is anything to go on.
Re:First Post by Anonymous Coward · 2010-04-17 09:15 · Score: 2, Informative

VMware? Fuck, this has been around for decades, going back to OS/360.
Re:First Post by Anonymous Coward · 2010-04-17 09:27 · Score: 5, Informative

From the article:
"Going further
Linux is not alone in using page sharing to improve memory efficiency, but it is unique in its implementation as an operating system feature. VMware's ESX server hypervisor provides this feature under the name Transparent Page Sharing (TPS), while XEN calls it Memory CoW. But whatever the name or implementation, the feature provides better memory utilization, allowing the operating system (or hypervisor, in the case of KVM) to over-commit memory to support greater numbers of applications or VMs. You can find KSM—and many other interesting features—in the latest 2.6.32 Linux kernel."
Re:First Post by Abcd1234 · 2010-04-17 09:34 · Score: 5, Insightful

If your OS isn't sharing duplicate memory blocks already, you're using a shitty OS. (Linux already shares dup read only blocks for many things, like most modern OSes).
Umm, no.
Most modern OSes share memory for executable images and shared libraries. In addition, some OSes, such as Linux, support copy-on-write semantics for memory pages in child processes created with fork (note: Solaris is an example of an OS that *doesn't* do this).
Aside from that, there is no automated sharing of memory between processes. Frankly, I have no idea where you got the idea there was.
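Copy-on-write after fork() is easiest to see in a toy model. The Python sketch below is a hypothetical illustration, not kernel code: fork() shares every page with the child and bumps its reference count, and a write to a page that is still shared first gives the writer a private copy.

```python
class Page:
    def __init__(self, data):
        self.data = bytearray(data)  # bytearray(...) always makes a fresh copy
        self.refcount = 1


class Process:
    def __init__(self, pages):
        self.pages = pages  # virtual page number -> Page

    def fork(self):
        # Share every page with the child instead of copying its contents.
        for page in self.pages.values():
            page.refcount += 1
        return Process(dict(self.pages))

    def read(self, vpn):
        return bytes(self.pages[vpn].data)

    def write(self, vpn, offset, value):
        page = self.pages[vpn]
        if page.refcount > 1:
            # Copy-on-write "fault": detach from the shared page before writing.
            page.refcount -= 1
            page = Page(page.data)
            self.pages[vpn] = page
        page.data[offset] = value
```

Only the page that is actually written gets duplicated; every untouched page stays shared between parent and child.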
Re:First Post by Anpheus · 2010-04-17 10:24 · Score: 3, Interesting

For now, at least. VMware doesn't share pages of 2 MB or larger because the overhead is too high (the hit rate on finding duplicates versus the cost of searching for them), and I suspect other hypervisors will do the same. Additionally, Intel and AMD are both moving to support 1 GB pages. What are the odds that you'll start up two VMs and their first 1 GB of memory will remain identical for very long?
The only way I see page sharing working in the future is if the hypervisor inspects the nested pages down to the VM level, which will typically be the 4KB pages we know and love. Either that, or paravirtualization support needs to exist for loading common code and objects into a shared pool.
Even so, there's a lot of overhead from inspecting (hashing and then comparing) pages which will only grow as memory sizes grow. If we increase page sizes, the hit rate decreases and the overhead of copy-on-write increases. It's not a good situation.
Sources: Performance Best Practices for vSphere 4, which references Large Page Performance, which states:

In ESX Server 3.5 and ESX Server 3i v3.5, large pages cannot be shared as copy-on-write pages. This means, the ESX Server page sharing technique might share less memory when large pages are used instead of small pages. In order to recover from nonsharable large pages, ESX Server uses a “share-before-swap” technique. When free machine memory is low and before swapping happens, the ESX Server kernel attempts to share identical small pages even if they are parts of large pages. As a result, the candidate large pages on the host machine are broken into small pages. In rare cases, you might experience performance issues with large pages. If this happens, you can disable large page support for the entire ESX Server host or for the individual virtual machine.
That is, page sharing involves breaking up large pages, which negates their performance benefit, and it is only used as a last-ditch measure when you've overcommitted memory and are close to having to hit the disk. And VMware overcommit is great until you hit the disk; then it's a nightmare.
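The claim that larger pages lower the hit rate can be checked by counting: a large page can only be merged if every small page inside it matches another large page exactly. The hypothetical Python sketch below builds two guest "memory images" from 4 KB pages, where most pages are identical across guests (think shared OS code) and one page in every 16 is private, then counts the bytes that merging would free at 4 KB granularity versus 64 KB granularity. The layout and the 64 KB size are illustrative assumptions; a real 2 MB page spans 512 small pages, which only makes the effect stronger.

```python
SMALL = 4096  # 4 KB base page


def savings(memory, page_size):
    """Bytes freed if identical page_size chunks were merged into one copy."""
    chunks = [memory[i:i + page_size] for i in range(0, len(memory), page_size)]
    return (len(chunks) - len(set(chunks))) * page_size


def guest_image(seed, pages=64):
    """Hypothetical guest memory: page n is shared content, except every
    16th page, which holds data private to this guest."""
    out = bytearray()
    for n in range(pages):
        fill = (seed * 1000 + n) if n % 16 == 15 else n
        out += fill.to_bytes(4, "little") * (SMALL // 4)
    return bytes(out)


two_guests = guest_image(1) + guest_image(2)
small_saved = savings(two_guests, SMALL)       # 60 shared 4 KB pages merge
large_saved = savings(two_guests, 16 * SMALL)  # every 64 KB chunk holds a private page
```

With these assumptions the 4 KB scan frees 60 pages (240 KB) while the 64 KB scan frees nothing, because a single private 4 KB page is enough to make the whole large page unique. This ignores scanning cost, which is the other half of the argument above.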
Re:Linux is a joke by icebraining · 2010-04-17 11:15 · Score: 2, Funny

I start daemons myself, you insensitive clod!

--
Dilbert RSS feed
Re:First Post by Trepidity · 2010-04-17 14:04 · Score: 2, Insightful

This article claims that on Solaris, "fork() has been improved over the years to use the COW (copy-on-write) semantics". It's sort of an in-passing comment, though, and I can't find a definitive statement in docs anywhere (the Solaris fork() manpage doesn't give any details).

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Re:First Post by bzipitidoo · 2010-04-17 14:09 · Score: 2, Interesting

Personally, I find memory compression more interesting than plain deduplication, which could be considered a subset of compression. The idea has been around for years. There used to be a commercial product for Windows 95 that claimed to compress the contents of RAM, but it had serious problems: the software didn't work, didn't even attempt compression, and was basically "fraudware". The main problem with the idea was that it proved extremely difficult for a compression algorithm to beat the speed of simply doing everything without compression. That gave memory compression a bad reputation.
Now we have LZO, an algorithm with relatively poor compression that is very, very fast: fast enough to beat the speed of a straight memcpy. Fifteen years ago, there wasn't any compression algorithm fast enough for this application. I also suspect memory is slower relative to processors than it was 15 years ago; that would explain the incentive to increase the size and sophistication of CPU caches, which has happened (a 100 MHz, 32-bit Pentium with 33 MHz RAM then, vs. 3 GHz, 64-bit multi-core CPUs with 800 MHz DDR2 RAM today). Now CPU caches are plenty large enough to hold many 4K pages.
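A compressed-memory scheme only wins if it keeps a compressed copy when that actually saves space. A minimal Python sketch of that per-page decision, using zlib at its fastest level purely as a stand-in for LZO (which is not in Python's standard library); the function names are hypothetical.

```python
import os
import zlib

PAGE = 4096


def store_page(page):
    """Return (is_compressed, payload) for one page."""
    packed = zlib.compress(page, 1)  # level 1: zlib's fastest setting
    if len(packed) < len(page):
        return True, packed
    return False, page  # incompressible data (e.g. random bytes) is kept as-is


def load_page(is_compressed, payload):
    return zlib.decompress(payload) if is_compressed else payload


zero_page = bytes(PAGE)          # compresses to a few dozen bytes
random_page = os.urandom(PAGE)   # will not shrink, so it is stored raw
```

The fallback matters: without it, random or already-compressed data would cost more memory after "compression" than before.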
Still, deduplication could have many other cool uses. A friend of mine once hacked up some disk deduplication software for the Apple II. All it did was crosslink identical sectors it found on a disk and then mark the duplicates as free. There was no provision for counting the number of links or anything like that, in case you wanted to change the contents of the disk; you just had to remember that this had been done to the disk. It ultimately proved more trouble than it was worth, but it was a nice thought for teenagers desperate for more disk space.
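The weakness of crosslinking without link counts is easy to show in a toy model: once two files point at the same sector, a write through one file silently changes the other. A hypothetical Python sketch of such a disk (a list of sectors plus per-file sector lists), not the actual Apple II software:

```python
def crosslink(disk, files):
    """Point every file at the first copy of each sector's contents and
    return the duplicate sector numbers that can be marked free.
    Deliberately keeps no link counts, like the hack described above."""
    first_seen = {}  # sector contents -> sector number holding them
    freed = set()
    for sectors in files.values():
        for i, s in enumerate(sectors):
            data = disk[s]
            if data in first_seen and first_seen[data] != s:
                sectors[i] = first_seen[data]
                freed.add(s)
            else:
                first_seen.setdefault(data, s)
    return sorted(freed)


def write_sector(disk, files, name, index, data):
    # Writes in place: with no link counts there is no way to copy-on-write.
    disk[files[name][index]] = data


disk = [b"HELLO", b"WORLD", b"HELLO", b"OTHER"]
files = {"a": [0, 1], "b": [2, 3]}
freed = crosslink(disk, files)                  # sector 2 becomes free
write_sector(disk, files, "a", 0, b"CHANGED")   # file "b" now reads CHANGED too
```

A reference count per sector is exactly what would let the writer detect that a sector is shared and copy it first.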

--
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
Re:First Post by Abcd1234 · 2010-04-17 17:33 · Score: 4, Insightful

Aside from all the places that memory is shared between processes, there's no sharing between processes ... yea, I totally get you ...
That's exactly right. I pointed out all the places. *All of them*. And there's *two*: shared, read-only executable pages, and the heaps of children created by COW-enabled forks. That's it. That's all.
So any new technology for memory de-duping is impressive because, traditionally, it just ain't done. Which directly contradicts the content of your original post.
Perhaps now you understand?
Re:First Post by nabsltd · 2010-04-18 02:00 · Score: 2, Informative

So any new technology for memory de-duping is impressive because, traditionally, it just ain't done. Which directly contradicts the content of your original post.
For those who are still confused, the big difference between the various shared-library-type schemes and memory de-duplication is passive vs. active.
Shared libraries (or executables) take advantage of the fact that when you load a program multiple times, the same bits are obviously being loaded each time, so it's just a reference count increment.
For memory de-duplication, during idle times, the hypervisor creates hashes of all the used memory pages and if any duplicates are found they are replaced with the same sort of reference count as in the shared library approach. This would allow me to do something like copy the "vi" command to my home directory and rename it "edit", but the system will figure out that it's the same pages as the real "vi", so those pages will only be in memory once for all users.
As far as I know, only hypervisors use the active memory de-duplication approach...no regular OS does what I suggested in my "vi" example.
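The active approach described above can be sketched as a scan that groups page frames by a hash, confirms candidates byte for byte (a hash match alone is not proof of equality), and then remaps duplicates onto one frame with a reference count. This is a toy Python model with hypothetical names; real implementations such as KSM use different data structures, and merged pages would also need copy-on-write protection, which is omitted here.

```python
import hashlib


def dedupe(page_table, frames):
    """Merge identical frames in place.

    page_table: virtual page -> frame id; frames: frame id -> bytes.
    Returns frame id -> reference count after merging.
    """
    by_hash = {}    # digest -> canonical frame ids with that digest
    canonical = {}  # frame id -> frame id it is merged into
    for fid, data in frames.items():
        bucket = by_hash.setdefault(hashlib.sha256(data).digest(), [])
        for cand in bucket:
            if frames[cand] == data:  # verify: never trust the hash alone
                canonical[fid] = cand
                break
        else:
            bucket.append(fid)
            canonical[fid] = fid
    refcounts = {}
    for vpage, fid in page_table.items():
        target = canonical[fid]
        page_table[vpage] = target  # remap onto the surviving copy
        refcounts[target] = refcounts.get(target, 0) + 1
    for fid in [f for f in frames if f not in refcounts]:
        del frames[fid]  # frames no longer referenced are freed
    return refcounts
```

In the "vi" example, Alice's copy and Bob's renamed copy would end up as two page table entries pointing at one frame with a count of 2.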