Understanding Memory Usage On Linux

Not only shared libraries by pontus · 2006-02-06 01:09 · Score: 5, Informative

Nice article.
It could also have mentioned mappings on /dev. For example, the X server, on a system with a 256MB graphics adapter, will map all that memory into its address space, making X look huge, even though it's not using all that much system RAM. This will show up as a device-backed mapping in the maps file.
On a related note, X also looks big because it's holding pixmaps belonging to various applications (Firefox comes to mind).
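
The device-backed mappings mentioned above can be picked out of the maps file with a few lines of script. This is only a sketch: the sample text is invented for illustration (it is not output from a real X server), the function name is hypothetical, and it treats any path under /dev/ as a device mapping, which would also count /dev/zero mappings that can consume real RAM once written.

```python
# Sketch: total up device-backed mappings from /proc/<pid>/maps text.
# SAMPLE_MAPS is invented illustration data, not output from a real X server.
SAMPLE_MAPS = """\
08048000-08200000 r-xp 00000000 03:01 12345      /usr/bin/Xorg
a0000000-b0000000 rw-s e0000000 00:0e 678        /dev/mem
b7e00000-b7f30000 r-xp 00000000 03:01 23456      /lib/libc-2.3.6.so
b7f30000-b7f40000 rw-p 00000000 00:00 0
bff00000-c0000000 rw-p 00000000 00:00 0          [stack]
"""


def device_mapping_bytes(maps_text):
    """Return the bytes of address space mapped from paths under /dev/."""
    total = 0
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # no pathname, so this is an anonymous mapping
        start, end = (int(part, 16) for part in fields[0].split("-"))
        if fields[5].startswith("/dev/"):
            total += end - start
    return total


if __name__ == "__main__":
    megabytes = device_mapping_bytes(SAMPLE_MAPS) // (1024 * 1024)
    print(megabytes, "MB of address space is device-backed")
```

To try it on a live process, pass in the contents of /proc/<pid>/maps instead of the sample text.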

Re:Not only shared libraries by ratboy666 · 2006-02-06 04:52 · Score: 5, Informative

The "problem" is the concept of a COW (copy-on-write) page, coupled with the semantics of mmap().

In a nutshell: I can use mmap() to map /dev/zero into memory, for (pretty much) as big as I want. 200MB? It's now mine.

I can have a pointer to this memory.

The problem? The memory doesn't exist. What I have is a pointer, and a guarantee that enough backing store exists to satisfy it.

If I read through that pointer, I will see zeros. It *is* /dev/zero after all. However, I can write into the memory. If I write something, the page that is changed is copied and replaced; taking memory AT THAT TIME. Sparsely.

The mmap() call can map a file (backing store) and allow data to be shared. Memory does not need to be used until the data is read (or written). And this time, the backing store doesn't even need swap (because the file is the backing store).
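
A minimal sketch of the idea, assuming a Unix system and Python's standard mmap module. It uses a private anonymous mapping, which behaves much like a private mapping of /dev/zero; the function name and the 200MB default are illustrative only.

```python
import mmap

SIZE = 200 * 1024 * 1024  # 200MB of address space, as in the post


def sparse_mapping_demo(size=SIZE):
    """Map `size` bytes, read zeros, write a few bytes, and return what was seen."""
    # Private anonymous mapping: zero-filled, much like a private map of /dev/zero.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        before = region[:16]          # reading yields zeros
        region[0:5] = b"hello"        # the first write touches a single page
        after = region[:5]
        far_end = region[size - 16:]  # pages never written still read as zeros
        return before, after, far_end
    finally:
        region.close()


if __name__ == "__main__":
    before, after, far_end = sparse_mapping_demo()
    print(before, after, far_end)
```

Watching the process in top while this runs should show VIRT jump by the full mapping size while RES barely moves, since only the touched pages need real memory.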

All of which means non-changeable may be altered. Changeable may be non-existent or shared. Try to teach that to your DG tools.

A page of code that is shared may become a page of code that is private. A page of data that is unwritten doesn't have to exist. Even if it is read! A page of data that is written may STILL be shared.

"ps" and the other tools could walk through typical process maps, counting up pages, and figuring out what each was for, but that may be a bit too intensive. The pages aren't "cross referenced" for that purpose. Besides, the page could be COWd, and then swapped. Should THAT count against the memory of the application? Maybe, maybe not.

So "ps" by default gives you an idea of the "big picture" for each process.

Ratboy

--
Just another "Cubible(sic) Joe" 2 17 3061

Re:The only thing running by chris+macura · 2006-02-06 01:22 · Score: 2, Informative

A typical C/C++ based app uses just as much memory, it's just shared between processes...

That's the point. Nobody cares about how much actual memory a C/C++ app touches.

Making Java "part of the system" won't help much either because the libraries aren't the same. You could argue that at the bottom of the pyramid it's still libc that's being used, but we still have and need all the wrappers on top of the library to make it compatible with Java code.

So until people find it normal to run more than one or two Java applications at once, Java will be deemed a memory hog. It's sort of a rut that Java is in right now, because nobody would really run more than two Java applications at once. My computer (granted, a 5-year-old 1.7GHz P4 with 386MB of RAM) can barely handle Eclipse at any reasonable speed. God forbid I also run something else.

Linux file & memory management shines by carribeiro · 2006-02-06 01:24 · Score: 2, Informative

Linux (and, to be fair, Unix-like systems in general) shines at file & memory management. Many people don't know, but executable files are not 'loaded' in the Windows sense - they're just mapped into memory. This design improves performance in general and keeps the system responsive under swapping (not thrashing, mind you). Things like memory-mapped files are integral to the way the system is designed and implemented. That's one of the very reasons why a Linux machine usually runs faster and more reliably than an equivalent Windows machine... even if it has less memory. The Apache tuning example is great, and it shows how much performance you can squeeze out of a good design.

Re:Linux file & memory management shines by Anonymous Coward · 2006-02-06 01:37 · Score: 5, Informative

Many people don't know, but executable files are not 'loaded' in the Windows sense - they're just mapped into memory.

Apparently, some people don't know that modern NT-based Windows versions also behave in exactly the same manner.

Re:Linux file & memory management shines by JesseMcDonald · 2006-02-06 02:25 · Score: 4, Informative

Actually, modern runtime linkers use a table of offsets rather than embedding the relocated symbol addresses directly into the executable code, and the relocations themselves are handled by mapping the file contents into virtual memory at the necessary addresses. With those two techniques combined, it is almost never necessary for the in-memory version of the executable to differ from its on-disk representation where the code and constant-data sections are concerned. When a typical application begins execution, nearly all of its virtual memory will be mapped directly onto the executable file and its shared libraries, and loaded on demand. The initialized-data section must be copied into virtual memory, and the uninitialized-data section and the stack are typically allocated as they are accessed on a page-by-page basis. Aside from a handful of housekeeping data for the linker and the C libraries, the rest of the virtual memory consists of read-only memory-mapped files.

--
"The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat

Re:Linux file & memory management shines by Anonymous Coward · 2006-02-06 02:33 · Score: 3, Informative

Please do yourself a favor and educate yourself before making any future bogus claims.

The following two articles respectively deal with executable and library loading in Windows:
http://msdn.microsoft.com/msdnmag/issues/02/02/PE/
http://msdn.microsoft.com/msdnmag/issues/02/03/Loader/

"Tuning Apache" is also excellent by volts · 2006-02-06 01:27 · Score: 4, Informative

Devin's blog also has an excellent posting on Apache performance. "Tuning Apache, part 1" (and the comments) is the sort of succinct empirical advice it is always nice to find.

Re:Before you start bitch about Firefox memory lea by arkanes · 2006-02-06 01:57 · Score: 4, Informative

About 8 months back I attempted to embed Gecko within an existing graphical user interface toolkit. Having heard so much from the open source community about how easy it was to do, I thought it would go rather quickly.

I'm kinda curious who you heard that from. Embedding Mozilla when you've got an already existing binding (such as for Gtk) is trivial, but writing the binding from scratch is no easy task. Gecko is a beast and the need to integrate its own drawing layer with yours makes it hard to integrate as an embedded browser. In its defense, it was never designed or intended for such a purpose. KHTML is only easier if you're using Qt (and you *did* obey the license, right?), otherwise you need to provide mappings from all the Qt primitives used by KHTML to your own. Easier than embedding Gecko, but still not trivial.

Re:The only thing running by jsight · 2006-02-06 02:08 · Score: 3, Informative

Update your knowledge.

Java has concurrent GCs now that do not freeze the entire VM while being run. And I've never seen the GC go "out of whack" and hang permanently (though I've seen many apps do this due to poor thread/resource management).

--
Throw the bums out!

Re:Extra, extra, read all about it by tgv · 2006-02-06 02:40 · Score: 2, Informative

Because there is not just one moderator. Everybody can moderate. So there are always a few people who think that's funny. But by not being an Anonymous Coward, but logging in instead, you can set a threshold to all posts, which will exclude most of them...

Re:A practical measure and perspective. by CastrTroy · 2006-02-06 02:46 · Score: 2, Informative

I run a P2 266 at home with 256 MB of RAM. KDE 3.4 runs pretty slow. I've turned off a lot of the eye candy, but the response time is still quite slow. Windows 2000, on the other hand, is quite speedy. I can't speak for Windows XP, because I don't run it. The problem is that this isn't really a fair comparison, as the Windows 2000 UI is more comparable to something like Sawfish. Well, the look is similar, but even straight X Windows has a better feature set. So I could use Sawfish, but if I start up a KDE program, it takes forever just to start.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.

Memory Management by johnnyb · 2006-02-06 03:14 · Score: 3, Informative

On a related note, if anyone is curious how memory management library calls such as "malloc" work, you might check out my article on the subject.

--
Engineering and the Ultimate

top by Kupek · 2006-02-06 03:18 · Score: 5, Informative

Run top. Check out the column that says SHR. Subtract it from VIRT if you want to know the virtual memory usage of a process excluding shared libraries, or subtract it from RES if you want to know the physical memory usage of a process excluding shared libraries. Problem solved.

I don't like how he phrases that what ps reports is "wrong." It's not wrong, or even "wrong." It reports exactly what Linux tells it (through the proc filesystem). It just might not be what you expect it to be, which means you don't understand the tools and the system. When ps reports that a process' virtual memory usage is xKb, that is correct. In the address space for the process, xKb have been allocated. Shared or not, they're still in the address space.
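
The subtraction described above can be sketched against the proc filesystem directly. This assumes the first three fields of /proc/<pid>/statm (total size, resident, shared, all counted in pages) correspond to top's VIRT, RES and SHR, and it assumes 4kB pages; the function names are hypothetical.

```python
def non_shared_kb(statm_line, page_kb=4):
    """Return (VIRT - SHR, RES - SHR) in kB from one /proc/<pid>/statm line.

    page_kb=4 is an assumption; check the real page size on your system.
    """
    size, resident, shared = (int(field) for field in statm_line.split()[:3])
    return (size - shared) * page_kb, (resident - shared) * page_kb


def non_shared_kb_for_pid(pid):
    """Read the statm line of a live process (Linux only)."""
    with open(f"/proc/{pid}/statm") as handle:
        return non_shared_kb(handle.read())


if __name__ == "__main__":
    # Made-up numbers: 25000 pages virtual, 3000 resident, 1200 shared.
    print(non_shared_kb("25000 3000 1200 50 0 2000 0"))
```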

Re:My own favorite is 'top'. by jallen02 · 2006-02-06 03:21 · Score: 4, Informative

Load and CPU usage are different things. Load is a very tricky topic. The gist of it is that it is the average number of processes that were waiting to do some amount of processing. It is then scaled based on a logarithmic algorithm to give you a rough picture of what is happening. So let's say you have an SMTP server with a dozen processes all trying to access the disk, and the disk is also busy updating its locate database. Your disk is hammered. Your processor is not. But you have so many processes competing for IO that it eventually bogs down the process scheduling, which can make everything sluggish. Your CPU usage might not be heavy, but that doesn't mean the system isn't bogged down trying to do other things. CPU usage is an important part of system load, but not the only thing going into it.

Jeremy

More tips by typical · 2006-02-06 03:27 · Score: 5, Informative

The thing is, when you fork, it maps the memory and marks everything as copy-on-write; when something needs to write to part of the memory, it then makes the copy for each process.

A couple other tips:

* Each thread in a process shows up as consuming the same amount of memory (either this only happens under Linuxthreads or I don't have any threaded applications running on my system).

* Device mappings show up as consumed memory (which generates plenty of XFree86/xorg complaints). If you want to find out how much memory xorg/X11 is actually using (bytes in cached pixmaps on behalf of each process and sans device mappings), try this program (contains a tiny program that lists how much memory X is using for other programs by caching pixmaps and a perl script that lists how much memory X is using sans device mappings).

* The article mentions the fact that shared libraries show up in every application's memory usage. So, for example, glibc alone adds 1.5MB to the memory usage of every process. But Win folks may not realize how significant this is. Most Windows applications ship with their own copies of almost all shared libraries used, which means that there is a huge amount of wasted memory under Windows that *actually affects you*. Under Linux, instead of shipping shared libraries with applications, folks have built tools to automatically download the latest shared libraries and use those across multiple applications. Result -- only one copy of the library need be in memory at a time. This means that it's actually reasonable to run a box with 128MB of memory and three remote users using the thing. You simply can't pull that under Windows and expect usability.

* This may not sound significant, but Linux's VM is (anecdotal evidence, of course) really solid. When I run out of memory under Windows, performance rapidly degrades -- bring an application to the foreground, and the system just starts churning. Under Linux, you can push a ways into VM and things generally keep functioning pretty well (this is one of the causes of people talking about "applications loading faster under WINE than Windows" when they're trying to prove that WINE is 'faster' than Windows -- good disk I/O and VM code).

--
Any program relying on (nontrivial) preemptive multithreading will be buggy.

Re:More tips by runderwo · 2006-02-06 06:52 · Score: 2, Informative

The TLB is nothing more than a page table cache. On IA-32, a program has no control over the TLB and no ability to view its contents, other than flushing it via CR3. Saying you can look into it is like claiming you can look into L1 or L2 cache via your program (notwithstanding exceptions such as cache-as-RAM during firmware initialization). The only way you can know the contents of such caches is to know what memory accesses are performed in what order, so if you have that knowledge, then yes, you could "look" into the cache using it. But I don't see how that is useful for a mechanism such as you claimed the Windows VMM implements.

--
LRC, the best-read libertarian site on the web

Re:More tips by Kupek · 2006-02-06 07:13 · Score: 2, Informative

Each thread in a process shows up as consuming the same amount of memory (either this only happens under Linuxthreads or I don't have any threaded applications running on my system).

Under LinuxThreads, each thread had its own PID. Under NPTL (Native POSIX Thread Library) all threads from the same process share the same PID, but each thread has a unique TID (which you can get with the Linux-specific call gettid()). Calling getconf GNU_LIBPTHREAD_VERSION from a prompt should tell you what library and version you're running for pthread support.

Anyway, this is a roundabout way of saying you're right. Since LinuxThreads uses a unique PID for each thread, if you queried the kernel for memory info, it would tell you that each process (thread) was individually consuming xKb. That's non-intuitive behavior, but I think the blame belonged to LinuxThreads, not the kernel; LinuxThreads was abusing the concept of a PID. Thankfully this has been changed in NPTL.
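
A quick way to see the shared-PID, unique-TID behavior is sketched below. It assumes Python 3.8 or later, where threading.get_native_id() reports the kernel thread ID, and it assumes a glibc system for the confstr lookup, which is meant to return the same string as the getconf command above. The function names are made up for the example.

```python
import os
import threading


def thread_ids(n_threads=3):
    """Return one (PID, TID) pair per thread, with all threads alive at once."""
    seen = []
    lock = threading.Lock()
    all_recorded = threading.Barrier(n_threads)

    def worker():
        with lock:
            seen.append((os.getpid(), threading.get_native_id()))
        # Stay alive until every thread has recorded its ID, so that the
        # kernel cannot hand a finished thread's TID to a later one.
        all_recorded.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen


def pthread_library():
    """Return the pthread library name and version, or None if unavailable."""
    try:
        return os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (AttributeError, ValueError, OSError):
        return None  # not a glibc system, or the name is not supported


if __name__ == "__main__":
    print(pthread_library())
    for pid, tid in thread_ids():
        print("PID", pid, "TID", tid)
```

On an NPTL system every line should show the same PID with a different TID.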

Loading unnecessary libraries by arth1 · 2006-02-06 03:27 · Score: 2, Informative

What gets me is how some distro builders see a security warning about setuid/setgid binaries using lazy so loading, and decide that using -Wl,-z,now is a good thing to add. Excuse me, but that will pull in EVERY library at link time, whether used or not, often leading to some MAJOR bloat.
Yes, it "fixes" the "problem", but so would using rpath to DSOs not writable by users or ensuring that LD_LIBRARY_PATH doesn't point to user writable directories. Without the load time bloat.

Regards,
--
*Art

Re:A practical measure and perspective. by Trelane · 2006-02-06 04:13 · Score: 3, Informative

Linux is somewhere in between the two. It doesn't go to swap quite as early as Solaris, and also not as late as FreeBSD.

It's also quite informative to note that the swappiness of the Linux kernel may be changed dynamically, via /proc/sys/vm/swappiness.
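
For anyone who wants to script this, here is a small sketch of reading and changing the value through that file. The helper names are invented for the example, and writing needs root; as far as I know a value written this way takes effect immediately but is not kept across a reboot.

```python
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def parse_swappiness(text):
    """Turn the file contents (a number plus a newline) into an int."""
    return int(text.strip())


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current swappiness value (Linux only)."""
    with open(path) as handle:
        return parse_swappiness(handle.read())


def write_swappiness(value, path=SWAPPINESS_PATH):
    """Set a new swappiness value; this needs root privileges."""
    with open(path, "w") as handle:
        handle.write(f"{int(value)}\n")


if __name__ == "__main__":
    print("current swappiness:", read_swappiness())
```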

--
Given enough personal experience, all stereotypes are shallow.

Re:man page update by TallMatthew · 2006-02-06 04:28 · Score: 3, Informative

How about going one step further than just blogging about it and actually submitting a documentation update to the ps man page. That way future confusion of the ps output could be avoided.

Because what ps reports is the truth, from a certain point of view.

Re:My own favorite is 'top'. by Corgha · 2006-02-06 04:41 · Score: 2, Informative

The "feature" that I find annoying about top, though it's really rather necessary for a CLI program, is that only the most CPU-intensive programs at a given instant get to the top. [...] I find that KSysGuard works pretty well for this, since the processes all stay in the same place

This has nothing to do with CLI vs GUI programs, and everything to do with what you're choosing to sort by. You can change the sort order in top.

If you sort by PID or process name or something else less volatile than CPU percentages, the processes all stay in the same place in top, too. However, if you're looking for programs that are using a lot of CPU over time, it's probably worth sorting by cumulative CPU time instead.

Read the man page or the interactive help (hit "?").

200 instances and 170 megs by gini_ · 2006-02-06 05:05 · Score: 2, Informative

That is how it should be read, I think. To start 200 instances of your Java proggie you pretty much did the same thing as starting 200 threads in a single virtual machine. These threads show in ps output as operating system processes and they map the entire address space of the virtual machine, which is why their sizes are identical.

Memory usage of Java actually scales very nicely with a silly number of threads. A couple of months ago I created a small server which opened lots of listener sockets in their own threads.

With one thread the size of the virtual machine was about 40 megs, which is pretty much for a simple application, but when I created more server threads the amount of added memory was very small. With 100 listener threads it was like 60 megs, with 400 it was 80 megs and finally with 3000 server threads the amount of used memory was only 290 megs!
It is true that these threads were not actually doing anything except listening on their sockets, but I think it is very impressive nevertheless.

mod parent Overrated by Darkforge · 2006-02-06 08:27 · Score: 2, Informative

That's just not true, as swillden points out in this comment to the current story. Nobody should follow your suggestion.

Based on your over-simplified claim (which I'll call "wrong") the 43 java threads on my Tomcat box are using 3.0GB of RAM total, minus 426MB shared, which is impossible on a box with 256MB of RAM and 512MB swap.

More generally, the problem with ps (and top) is that they fail to highlight the most important piece of information: the amount of unshared memory each process is using, or, as TFA calls it, the "marginal cost" of each process.

Instead, they give you the total memory available to each process. That number is irrelevant to a user of that process. It won't tell you, for example, how much memory you'd save if you killed off any given process. It won't even tell you how much total memory (shared+unshared) that process is using... as others have pointed out, ps's number includes unused copy-on-write device-mapped memory.

ps is at best deceptive, if not actually wrong.
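
One rough way to get at the unshared figure is to add up the private pages listed in /proc/<pid>/smaps, on kernels that provide that file. The sketch below assumes the Private_Clean and Private_Dirty fields are a fair stand-in for the "marginal cost"; the sample text is invented and the function name is hypothetical.

```python
# Sketch: sum the private (unshared) resident memory from smaps text.
# SAMPLE_SMAPS is invented illustration data in the /proc/<pid>/smaps layout.
SAMPLE_SMAPS = """\
08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
Size:               464 kB
Rss:                424 kB
Shared_Clean:       424 kB
Shared_Dirty:         0 kB
Private_Clean:        0 kB
Private_Dirty:        0 kB
080bc000-080c2000 rw-p 00074000 03:02 13130      /bin/bash
Size:                24 kB
Rss:                 24 kB
Shared_Clean:         0 kB
Shared_Dirty:         0 kB
Private_Clean:        4 kB
Private_Dirty:       20 kB
"""


def private_kb(smaps_text):
    """Return the total of Private_Clean and Private_Dirty, in kB."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith(("Private_Clean:", "Private_Dirty:")):
            total += int(line.split()[1])  # second field is the size in kB
    return total


if __name__ == "__main__":
    print(private_kb(SAMPLE_SMAPS), "kB private")
```

Pages that have been swapped out are not included in these fields, so treat the result as an estimate of what killing the process would free.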

--

When I moderate, I only use "-1, Overrated". That way, I never get meta-moderated!
