Understanding Memory Usage On Linux
Percy_Blakeney writes "Have you ever wondered why a simple text editor on Linux can use dozens of megabytes of memory? A recent blog posting explains how the output of the ps tool is misleading and how you can get a better idea of how much memory a process really uses."
Nice article. /dev. For example, the X server, on a system with a 256MB graphics adapter, will map all that memory into its address space, making X look huge, even though it's not using all that much system RAM. This will show up as a device-backed mapping in the maps file.
It could also have mentioned mappings on
On a related note, X also looks big because it's holding pixmaps belonging to various applications (Firefox comes to mind).
How can they diminish EMACS like that ?
Try statically linking a program that uses just a few glibc calls and it's pushing 800k. Now add in libc++, Qt/gtk, Xlib, kde, boost, xml, etc and you're talking a lot of memory. This is what gets me about people who say "well Java performs okay now, but it uses so much memory".
A typical C/C++ based app uses just as much memory, it's just shared between processes. And for that matter, startup time of the first thing using kde/gnome isn't all that great either. Isn't it about time some effort was put into making Java or Mono part of the system, so it can be shared like C apps do?
...no process is using more than 10% of the available resources...
But hey, 10 processes are using 10%...
Hey! That's my sig you're smoking there!
How about going one step further than just blogging about it and actually submitting a documentation update to the ps man page. That way future confusion of the ps output could be avoided. Of course I guess people have to actually read the man page (In honor of slashdot, I didn't read it before posting this comment ;-)
His next story will cover that.
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
Firefox does indeed suffer from some very serious memory management-related issues. For anyone who doesn't have an ideological connection to the project, it's obvious why that is.
About 8 months back I attempted to embed Gecko within an existing graphical user interface toolkit. Having heard so much from the open source community about how easy it was to do, I thought it would go rather quickly. Of course, it did not. The lack of up-to-date documentation (if such documentation there at all) and solid examples were some of the big problems.
But the overall architecture struck me as the worst part of Mozilla. Like it or not, it's overly complicated and convoluted in many areas. I admit that it's not easy to build well-designed software, but they so completely missed the boat it's unbelievable. However, it does make it obvious as to why many people complain about Firefox and Seamonkey running so slowly, in addition to suffering from huge memory consumption.
As for the embedding of Gecko, I said to hell with it. I took a page from Apple, and used KHTML instead. The loss in portability by not going with Gecko was well worth the far quicker development time, the lower memory consumption, the increased responsiveness, and the higher degree of stability of KHTML.
Cyric Zndovzny at your service.
Devin's blog also has an excellent posting on Apache performance. "Tuning Apache, part 1" (and the comments) is the sort of succinct empirical advice it is always nice to find.
Typical Slashdot response, blame the users for the browser's bloat. 99% of the users of Firefox are not programmers and wouldn't have the slightest clue what is going on. They just want to look at porn without popups or getting infected with spyware via IE's ActiveX vulnerabilities. Asking them to download some script, set environment variables, and then file bug reports is unrealistic since most of them can't even tell the difference between a web browser and a web site. That's what beta testers are supposed to be doing but we all know that 90% of the beta testers never bother to file any bug reports, even when the browser crashes.
Apparently, some people don't know that modern NT-based Windows versions also behave in exactly the same manner.
Indeed. You're completely correct. If the Mozilla crew want to take on Internet Explorer, then they can't have the general public debugging their software for them. That just won't fly.
Part of the problem is that it's far too easy for bugs to creep into Mozilla. The code is a small step above horrible, and the architecture isn't much better. A lack of up-to-date documentation leads to programmers not knowing which XPCOM interfaces are deprecated, and which aren't.
You can look at browsers like Konqueror and Opera, which offer a very comparable feature set to Firefox, yet do no suffer from the drawbacks. Not only that, but Konqueror and Opera are often described as feeling far more responsive, while being extremely stable. It's things like that which really impress the average Jill and Joe. Excessive memory usage will just perplex them, and likely result in them going back to Internet Explorer.
Cyric Zndovzny at your service.
Top will show you the same as ps does, ps calls /proc//statm and asks whats going on. The problem on linux is the copy on write principle wich saves heaps of memmory, but makes it virtually impossible to figure out what belongs to what. The thing is, when you fork it maps the memmory and marks everything as copy on write, when something needs to write to part of the memmory, then it will make the copy for each process.
However asking the process how much memory it has allocated will show all memory including stuff that is marked copy on write - that is, I could have 100 processes showing they each use 1.4MB of memory, because they all share the same libray, but in fact, its the same copy they are all using so I'm only using 1.4 MB instead of 140MB (+PCB et. al)
I'm kinda curious who you heard that from. Embedding Mozilla when you've got an already existing binding (such as for Gtk) is trivial, but writing the binding from scratch is no easy task. Gecko is a beast and the need to integrate its own drawing layer with yours makes it hard to integrate as an embedded browser. In its defense, it was never designed or intended for such a purpose. KHTML is only easier if you're using Qt (and you *did* obey the license, right?), otherwise you need to provide mappings from all the Qt primitives used by KHTML to your own. Easier than embedding Gecko, but still not trivial.
What I really wanna understand is the memory usage in Windows.
w00t
A nice article, been looking for more information on this. So often you read items in program FAQs or such giving a disclaimer on how ps memory usage is misleading, but they offer no better way. Okay, so ps memory usage information is pratically useless; now what am I supposed to use?
I was hoping for a bit more, though; like, say, a small program that lets you see both the aggregate virtual memory total as well as the memory used specifically by the program. Add a few options for how to handle the only-one-app-using-a-library situation. Doesn't seem like it'd be that hard, and very useful.
Is not that there's not a perfect tool, the problem is that it's a problem which is impossible to solve properly as I see it
Take a shared library. For whatever reason, process 1 uses only the first half of the library. Thanks to demand-loading, only that half is loaded in mem, and that's what accounts as RSS for that process, say 10 MB.
Now a process 2 is launched and it uses the other half of the library. Now, all the library is loading in memory, and even if the first process is not using and has not requested to use the second half, its RSS will grown because somebody else use other parts of the library.
I don't think it's something you can or want to "solve": That's a consequence of the design ideas behind shared libraries. Deal with it.
Actually, modern runtime linkers use a table of offsets rather than embedding the relocated symbol addresses directly into the executable code, and the relocations themselves are handled by mapping the file contents into virtual memory at the necessary addresses. With those two techniques combined, it is almost never necessary for the in-memory version of the executable to differ from its on-disk representation where the code and constant-data sections are concerned. When a typical application begins execution, nearly all of its virtual memory will be mapped directly onto the executable file and its shared libraries, and loaded on demand. The initialized-data section must be copied into virtual memory, and the uninitialized-data section and the stack are typically allocated as they are accessed on a page-by-page basis. Aside from a handful of housekeeping data for the linker and the C libraries, the rest of the virtual memory consists of read-only memory-mapped files.
"The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
Please do yourself a favor and educate yourself before making any future bogus claims.
/ a der/
The following two articles respectively deal with executable and libary loading in Windows:
http://msdn.microsoft.com/msdnmag/issues/02/02/PE
http://msdn.microsoft.com/msdnmag/issues/02/03/Lo
So, people don't know how to interpret the output of ps? And that's a Slashdot frontpage story?
Slashdot isn't only about breaking tech news; it's about keeping geeks generally informed. Many Linux geeks (including myself) probably learned something from the article that they didn't know. It's a well-written, informative article, and I'm glad Slashdot posted it because otherwise I probably would have never seen it. Not every Slashdotter already knows everything there is to know about Linux like you apparently do, and I imagine this isn't quite "common knowledge," so it's helpful for some of us.
What have I done wrong in my settings to deserve such trivial items?
No one forced you to click on "Read More." Sorry that you wasted a couple seconds reading the summary and realizing you already knew all about ps, but you didn't need to waste even more of your time trolling.
I have discovered a truly remarkable proof of this theorem that this sig is too small to contain.
On a related note, if anyone is curious how memory management library calls such as "malloc" work, you might check out my article on the subject.
Engineering and the Ultimate
Run top. Check out the column that says SHR. Subtract it from VIRT if you want to know the virtual memory usage of a process excluding shared libraries, or subtract it from RES if you want to know the physical memory usage of a process excluding shared libraries. Problem solved.
I don't like how he phrases that what ps reports is "wrong." It's not wrong, or even "wrong." It reports exactly what Linux tells it (through the proc filesystem). It's just might not be what you expect it to be, which means you don't understand the tools and the system. When ps reports that a process' virtual memory usage is xKb, that is correct. In the address space for the process, xKb have been allocated. Shared or not, they're still in the address space.
The first person to use a system might load 128Mb worth of libraries and applications. The second and all subsequent users may only use 15-30Mb worth of RAM for each additional user. e.g. A 1Gb RAM system could handle 30 concurrent users rather than 8.
Deleted
Load and CPU usage are different things. Load is a very tricky topic. The gist of it is that it is the average number or processes that were waiting to do some amount of processing. It is then scaled based on a logarithmic algorithm to give you a rough picture of what is happening. So lets say you have an SMTP server with a dozen processes all trying to disk access and the disk is also busy updating its locate database. Your disk is hammered. Your processor is not. But you have so many processes competing for IO that it bogs down the process scheduling eventually, which can make everything sluggish. Your CPU usage might not be heavy, but that doesn't mean the system isn't bogged down trying to do other things. CPU usage is an important part of system load, but not the only thing going into it.
Jeremy
The thing is, when you fork it maps the memmory and marks everything as copy on write, when something needs to write to part of the memmory, then it will make the copy for each process.
A couple other tips:
* Each thread in a process shows up as consuming the same amount of memory (either this only happens under Linuxthreads or I don't have any threaded applications running on my system).
* Device mappings show up as consumed memory (which generates plenty of XFree86/xorg complaints). If you want to find out how much memory xorg/X11 is actually using (bytes in cached pixmaps on behalf of each process and sans device mappings), try this program (contains a tiny program that lists how much memory X is using for other programs by caching pixmaps and a perl script that lists how much memory X is using sans device mappings).
* The article mentions the fact that shared libraries show up in every application's memory usage. So, for example, glibc alone adds 1.5MB to the memory usage of every process. But Win folks may not realize how significant this is. Most Windows applications ship with their own copies of almost all shared libraries used, which means that there is a huge amount of wasted memory under Windows that *actually affects you*. Under Linux, instead of shipping shared libraries with applications, folks have built tools to automatically download the latest shared libraries and use those across multiple applications. Result -- only one copy of the library need be in memory at a time. This means that it's actually reasonable to run a box with 128MB of memory and three remote users using the thing. You simply can't pull that under Windows and expect usability.
* This may not sound significant, but Linux's VM is (anecdotal evidence, of course) really solid. When I run out of memory under Windows, performance rapidly degrades -- bring an application to the foreground, and the system just starts churning. Under Linux, you can push a ways into VM and things generally keep functioning pretty well (this is one of the causes of people talking about "applications loading faster under WINE than Windows" when they're trying to prove that WINE is 'faster' than Windows -- good disk I/O and VM code).
Any program relying on (nontrivial) preemptive multithreading will be buggy.
When you can remember running 60 users on a mainframe with about 1MB RAM and a processor no faster than a 386.
--
Given enough personal experience, all stereotypes are shallow.
The closest I've come to dealing with it was writing exmap.
This is a (moderatly ugly) gtk+ tool which uses a loadable kernel module to work out which pages are used by more than one process. If a page is used by N processes, each process is credited with PAGE_SIZE/N bytes.
I believe it "solves" the problem you describe above. The biggest problem is that it provides a little too much information, so perhaps I should simplify it a bit.
(Known problems with current 0.8 version: some of the tests fail intermittently and some systems with pre-linked elf binaries can cause errors. Should fix up both with the next release).