Iconization is a fairly fundamental and common
window manager thing. I don't know how easy it is
to configure something like sawfish to support this
mode of operation, but it is generally the default
for window managers like fvwm, twm, or even
mwm. I would not be surprised if many modern
alternate window managers supported powerful ways
of controlling icons and iconification behavior;
I know fvwm does.
Unfortunately, the best I could manage with
a current Wine version was a very limping
QuickTime player with major speed problems.
VMware managed better but still could only play
the smallest size of QuickTime trailers with
decent results (larger ones had delayed video
or audio that broke up). And this was on fairly
good hardware.
Since the CodeWeavers plugin demonstrates
that the basic Wine technology can do the job,
presumably it is possible to do it by hand.
Somehow. An energetic person could probably
get themselves a certain amount of fame by
writing up a HOWTO on this; there's likely
a fair amount of
pent-up demand for good QuickTime playback
on Linux.
Time to come up after a crash (either power
failure or software) is only relevant if that
happens to the filesystem very much. Certainly
our filesystems on our file servers won't
experience that more than once in a blue moon
(we have UPSes and run stable kernels); I
suspect more than a few people are in that
sort of environment. If crashes and powerfails
are at all common, you have larger problems than
filesystem performance.
One can certainly argue that in most
environments the performance drop (if any)
is worth paying for the crash recovery
(neglecting that several journalling
filesystems have some uncomfortable crash recovery
issues because they only journal metadata).
But that is a very different argument from
a claim that fast crash recovery makes
performance differences irrelevant.
This general issue goes under the name of
(network) path characterization, and is a
reasonably active research area. Usefully you
can get several programs that will do their
best to characterize the bandwidth and other
attributes of each step in your routing.
Nothing runs as fast as traceroute, but the
numbers are likely to be interesting.
The best starting point I have offhand for
available programs is the home page for pchar,
one of the programs that does this. As well as
pchar, the page has a fairly large collection of
links to other similar and related programs.
(Various programs use somewhat different
approaches and math, and operate somewhat
differently, so you may want to use several to
cross-check the results.)
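As a rough illustration of the kind of math involved, here is a
minimal Python sketch of one approach (the pathchar-style one, as I
understand it): probe each hop with packets of several sizes, take
the minimum round-trip time seen for each size, fit a line, and treat
the growth in slope from one hop to the next as the inverse bandwidth
of the link between them. The function names and any numbers you feed
it are mine for illustration, not taken from pchar itself; real tools
do a lot more filtering and error estimation.

```python
def fit_slope(sizes, rtts):
    """Least-squares slope of RTT (seconds) against packet size (bytes)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(rtts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, rtts))
    den = sum((x - mean_x) ** 2 for x in sizes)
    return num / den


def link_bandwidths(sizes, min_rtts_per_hop):
    """Estimate bytes/second for each link from per-hop minimum RTTs.

    min_rtts_per_hop[i][j] is the smallest RTT seen to hop i for a
    probe of sizes[j] bytes. A link whose slope does not grow (noise,
    or a very fast link) is reported as None.
    """
    bandwidths = []
    prev_slope = 0.0
    for rtts in min_rtts_per_hop:
        slope = fit_slope(sizes, rtts)
        extra = slope - prev_slope  # seconds per byte added by this link
        bandwidths.append(1.0 / extra if extra > 0 else None)
        prev_slope = slope
    return bandwidths
```

Noisy real-world measurements will not fit a line this cleanly, which
is part of why cross-checking with several tools is worthwhile.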
The moral equivalent of -turbo for
X-based versions of Mozilla is the remote control
of a running Mozilla offered by the -remote
argument. (This is actually a feature inherited from
Netscape, from a long time back.)
Typical usage is mozilla -noraise -remote
'openURL(about:blank,new-window)' (disclaimer:
I use a standalone program to feed this to Mozilla,
not Mozilla itself, so Mozilla's syntax for this
may be slightly different). With some auxiliary
programs and some shell scripting you can
construct quite useful systems out of this;
I can highlight a URL in an xterm (or anywhere),
pick a menu entry from my root menu, and be
browsing that URL in a new Mozilla window in
moments.
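The sort of glue script I mean can be sketched in a few lines of
Python. This is a minimal sketch, assuming the -remote syntax shown
above works for your Mozilla as written; the function names are made
up for illustration, and the percent-encoding of commas and
parentheses is my own precaution, since those characters look like
they would confuse the openURL() argument.

```python
import subprocess


def remote_open_command(url, browser="mozilla"):
    """Build the argv list for opening url in a new browser window."""
    # Commas and parentheses could be mistaken for openURL(...) syntax,
    # so percent-encode them (an assumption, not documented behavior).
    safe = url.replace(",", "%2C").replace("(", "%28").replace(")", "%29")
    return [browser, "-noraise", "-remote", "openURL(%s,new-window)" % safe]


def open_in_new_window(url):
    """Ask an already running Mozilla to open url; returns the exit status."""
    return subprocess.call(remote_open_command(url))
```

Hooking open_in_new_window() up to the current X selection and a root
menu entry is left to whatever tools you have on hand.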
Similar tricks can be played in Netscape.
Netscape's documentation on this can be found
here, along with the small standalone program
to do the remote control. There are some
differences between the documentation and current
Netscapes (and Mozilla), but nothing too hard to
figure out.
I am pretty sure that NeWS post-dates the
public availability of X. It may predate X11, but
X version 10 was out there quite early on and at
the time Sun had its own windowing system (as did
all vendors at the time). It appears that NeWS
came out around 1987; X goes back to at least 1986
and I believe the original Project Athena work is
probably from even earlier (and thus known, if
not yet distributed).
The core causes of NeWS's failure are probably
highly debatable, but I think a good part of it
is that Sun wanted to make as much money from NeWS
as possible, which meant they charged per-seat
license costs early on. Given there was a free
alternative, it's not hard to see how various
vendors made the choices that they did.
Other failures were partly technical; for
example, apparently NeWS did not have a fast,
functional terminal emulator (a la xterm) for
quite a long time. Since a terminal emulator is,
most of the time, an important part of early use
of any windowing system, this had some knock-on
effects on people's interest in using NeWS.
The major workstation vendors (DEC, HP, IBM,
etc) funded various university projects to build
large scale environments for workstations because
it was pretty likely that they would get quite
useful technology out of it. (Although
in the end it didn't
quite work out that way.) MIT's Project Athena
is only one example; Project Andrew at CMU gave
us AFS, for example. Direct rivalry with Sun's
general products (NFS, for example) was not
really a priority, although it was probably
a factor.
The ice is thin, and the zero-copy patch is
slightly misnamed. In this case 'zero-copy' is
not 'zero copy from user space', but 'zero copy
in kernel space'; if there is a user space to
kernel space transition there is still one copy
involved. Zero copy in kernel space eliminates
all the copies in two cases:
for kernel level servers (NFS, for example), or
if the application is using sendfile().
Why not zero copy from user space too? Because
it turns out to be quite expensive in non-obvious
ways, especially on SMP systems, because in order
to do it right you must play games with the page
permissions and associated things. On SMP systems,
this apparently requires cross-CPU synchronization
to ensure that all CPUs pick up the correct new
page permissions -- which, as you can imagine,
gets expensive.
In theory, glibc already transparently supports
this mechanism via something called versioned
symbols. Versioned symbols basically let the linker
say (when the executable is built) 'this requires
not just strcpy(), but a glibc 2.0 compatible
strcpy()'. (You can see this relatively vividly
by running 'nm' on a shared library like libdb;
versioned symbols will have names like
'unlink@@GLIBC_2.0'.)
The theory is that when the glibc people
change the interface for something they provide
both the old routine and the new one with
different version names; one will be
'unlink@GLIBC_2.0' and the other will be
'unlink@@GLIBC_2.1' or the like (in the library
itself, the doubled '@@' marks the default
version).
When an application is newly linked, it
presumably normally links to the highest
version; however, when an old binary is run
it gets the old GLIBC_2.0 version of unlink
that it expects.
This fails if either the glibc people
don't realize that an interface has changed,
or the routine (or the interface detail)
is one they consider an 'internal' one, subject
to change without notice or backwards
compatibility because no user program is
supposed to be using it (or depending on the
routine's undocumented behavior staying the
same). And of course they have to get the
compatibility code right.
This also only works for executables
already fully linked; if you relink from
object files et al, you normally
get the highest version of the symbol defined,
even if you don't necessarily want that. (It
appears that there is no way to control this
behavior in ld, unfortunately.)
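To make the naming concrete, here is a small Python sketch that picks
apart versioned symbol names of the kind nm prints. It assumes the
usual GNU convention that 'name@@VERSION' marks the default version
of a symbol and 'name@VERSION' a non-default (usually older) one; the
function names and sample symbols are just for illustration.

```python
def split_versioned_symbol(symbol):
    """Return (name, version, is_default) for an nm-style symbol name.

    Unversioned symbols come back as (symbol, None, False).
    """
    if "@@" in symbol:
        name, version = symbol.split("@@", 1)
        return name, version, True
    if "@" in symbol:
        name, version = symbol.split("@", 1)
        return name, version, False
    return symbol, None, False


def versions_by_name(symbols):
    """Group nm-style names into {name: [(version, is_default), ...]}."""
    table = {}
    for sym in symbols:
        name, version, is_default = split_versioned_symbol(sym)
        if version is not None:
            table.setdefault(name, []).append((version, is_default))
    return table
```

Feeding it the last column of nm's output for a library gives a quick
view of which routines exist in more than one version.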
Direct IO and DMA to or from user space
gets discussed periodically on the Linux kernel
mailing list. The general commentary I've read
is that this is almost never a win until you are
dealing with huge amounts of memory. Since
doing this with huge amounts of memory is
fairly rare (especially doing this with huge
amounts of memory the driver can't supply to
the user, instead of the user supplying it to
the driver), code for this stays out of the
general kernel.
Why is it so expensive? Because to
do it properly you must do page table
manipulations (including TLB flushes et
al) and these are surprisingly expensive,
especially on multi-CPU machines (where you
must ensure all CPUs have done the page table
and TLB updates before continuing). This makes
it faster to do the copying for most if not
almost all of the user-level write() and
read() operations.
This is one of the less obvious wins for
sendfile(), and why zero-copy TCP sendfile()
shows such a significant win: with sendfile()
you can avoid all those page table manipulations
because the data never becomes visible at the
user level.
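For the curious, this is roughly what using sendfile() looks like
from Python. It is a minimal sketch: socket.sendfile() uses the
sendfile() system call where the platform supports it and quietly
falls back to ordinary send() calls otherwise, so it shows the shape
of the interface rather than proving that no copies happen. The
helper names and the socketpair() stand-in for a real TCP connection
are my own choices for the example.

```python
import os
import socket
import tempfile


def send_whole_file(sock, path):
    """Send the file at path over a connected stream socket.

    Returns the number of bytes sent. The file's data is never read
    into a buffer owned by this program.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo():
    """Push a small temporary file through a local socket pair."""
    payload = b"hello, sendfile\n" * 64
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    a, b = socket.socketpair()
    try:
        sent = send_whole_file(a, tmp.name)
        a.close()  # signal end of data to the reader
        received = b""
        while len(received) < sent:
            chunk = b.recv(4096)
            if not chunk:
                break
            received += chunk
    finally:
        a.close()
        b.close()
        os.unlink(tmp.name)
    return sent, received == payload
```

Note that the program only ever handles file descriptors and a byte
count; the file contents stay on the kernel side of the fence.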
I believe the Linus-approved solution for
things like video drivers is that the driver
should provide an address space that the user
can mmap() into his own process's memory space,
and which he can then read, write, and write()
as the driver scribbles information into it.
Other people consider this a less than entirely
satisfactory solution for various reasons.
There are three issues involved here:
'raw' context switch times, kernel context
switch times versus a purely user-mode context
switch, and the problems the Linux scheduler has
with lots of simultaneously runnable processes
and threads. There is also an advanced debate
here about cache effects that makes some
people argue that these differences are often
mostly moot.
Linux is asserted to have one of the fastest
context switch times going; I have seen reputable
linux-kernel people say that Linux switches its
processes faster than most systems can switch
their allegedly lightweight 'threads'. (I have
not done measurements myself, and it would be
somewhat challenging to level out other factors.
Would-be measurers could do worse than to
benchmark Linux against Solaris x86 and Windows
(possibly Windows NT) on the same machine.
Also, possibly Larry McVoy's lmbench
microbenchmarking suite already has suitable test
programs.)
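If you want a crude number of your own, the usual trick is a 'ping
pong' test: two processes bounce a byte back and forth over a pair of
pipes, so each round trip forces at least two switches when both
processes share one CPU. Below is a minimal Python sketch of that
idea. It illustrates the technique rather than being a serious
benchmark: interpreter and system call overhead are included in the
time, on a multi-CPU machine the two processes may simply run on
different CPUs, and the function name is made up. Something written
in C, like lmbench, is the better tool.

```python
import os
import time


def pingpong_round_trip_seconds(rounds=2000):
    """Average time for one byte to go parent -> child -> parent."""
    p2c_r, p2c_w = os.pipe()  # parent writes, child reads
    c2p_r, c2p_w = os.pipe()  # child writes, parent reads
    pid = os.fork()
    if pid == 0:
        # Child: echo each byte straight back, then exit without
        # running any of the parent's cleanup code.
        try:
            os.close(p2c_w)
            os.close(c2p_r)
            for _ in range(rounds):
                byte = os.read(p2c_r, 1)
                if not byte:
                    break
                os.write(c2p_w, byte)
        finally:
            os._exit(0)
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start
    os.close(p2c_w)
    os.close(c2p_r)
    os.waitpid(pid, 0)
    return elapsed / rounds
```

Comparing the result across systems on the same hardware is where it
gets interesting, and also where levelling out the other factors gets
hard.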
In any N:M thread environment you may be
able to get away with a certain amount of
thread context switching that takes place
entirely at user level. Such context switches
can obviously take place very fast, faster
than ones that involve the kernel if both are
coded equally well. However, I don't know how much
of this actually happens in typical use of
Solaris's N:M threading. N:M threading
does have the advantage that it needs less
resources on the kernel side (eg, no need for
a kernel stack and context information for each
user-level thread), and may be
amenable to very clever implementations on
the user side to also reduce the resource
load.
Independent of those two issues is the Linux
kernel scheduler's problem with lots of processes
that are all runnable at the same time. This
definitely causes problems that do not appear in
an N:M threading model, and definitely causes
problems compared to a more 'efficient' scheduler.
This problem has the net effect of slowing down
context switch times when there are a lot of
processes. I don't know how much the slowdown
is when compared to systems with slower context
switching.
And finally: cache reload
effects. There is an argument that
the amount of time it takes to reload cold
CPU caches in a new thread (because each
thread is presumably working on different
data, and possibly in different code) will
completely dwarf the context switching overhead
itself, kernel or otherwise, for threads that
do real work (instead of benchmarking ping pong
context switching times).
There are some people who argue that this means
that huge numbers of threads are very bad: as the
number of threads rises and as the amount of
context switching rises, you spend an ever
increasing amount of the CPU merely reloading
caches instead of getting any useful work done.
I am not entirely convinced that this is a
real effect. Presumably a well written threaded
program only uses threads when it needs to, to
isolate work on different things. In that case,
however the program is written (threaded vs state
machine or whatever) it is going to have to switch
back and forth between different chunks of data
with the associated cache switching; all that
threading should add is a small amount of extra
cache reloads for the thread-specific data that's
also needed (eg, different stacks for each
different thread). Unless the amount of data each
thread works with is very small, one would assume
that the extra thread data is small compared to
the data the program has to touch anyways.
Whether Linus is wrong about all this depends
in part on what sort of use he sees Linux mostly
being put to. There are tradeoffs involved: a
scheduler that is very good for 500 runnable
processes is likely to have a noticeable chunk
of unnecessary overhead for only 2 runnable
processes. If most of the environments you are
aiming at only have 2 runnable processes, well,
you may want to make the tradeoff in favour of
the smaller, simpler, faster for the common case
scheduler. I believe that
Linus has in the past been reluctant
to put in code that only helps 'big iron' while
hurting ordinary machines because most of the
machines that run Linux are ordinary machines.
(Also, 'big iron' people are competent enough
to make their own kernel variants if they want
to; look at SGI's efforts.)
As a number of people have already said, Linux
has kernel-supported threads (multiple threads of
execution in a single memory space, where one thread
performing blocking IO does not block other
threads). The Linux kernel does not have a special
implementation of 'threads' as distinct from
normal processes; instead threads are processes
that share (among other things) their virtual
memory mappings. Linux also has a nearly
complete and correct implementation of POSIX
threads, in glibc.
However there are a few things Linux does not
have in its thread model:
Full POSIX thread semantics. Various people
(including Linus) think that some of the pickier
requirements of POSIX threads are either
braindamaged or very hard to implement in the
kernel without performance hits or both.
Some of this is worked around by the POSIX
thread library that is part of glibc, but at a
performance cost.
A system for using many user-level threads
and fewer kernel-level threads. I don't know if
there are any obstacles preventing this from
being implemented entirely in user space, though.
The N:M model is apparently common in various
other Unix systems' implementations of kernel
supported threads.
Efficient kernel scheduling for huge numbers
of simultaneously runnable processes or threads.
The Linux scheduler does not cope well when you
have hundreds or thousands of threads all ready
to run.
This is a problem for the obvious
implementation of Java runtimes, as Java programs
often spawn very large numbers of threads for
various reasons. IBM, among others, have proposed
scheduler patches to deal with this; however,
all of the patches have hurt performance for the
usual/common case of only one or two runnable
processes on the system. This is getting to be
a hot issue for active several-way SMP machines
too (which may have a process or two per
processor, adding up to eight to sixteen runnable
at once),
so patches may get accepted at some point or
may start appearing in 'enterprise edition'
kernels from various distributions.
Also as people have said,
Linux does not need any special minimal 'thread'
kernel object: Linux process objects are
already as fast (or faster) and as lightweight as
thread objects in other systems. Only on some
old legacy Unix systems do 'processes' necessarily
equate to heavy-weight, slowly scheduled and
heavily resource-consuming objects. Both Linux
and Plan 9
show that full-scale regular processes can also
be lightweight and fast, and easily be used to
represent threads too.
I have used dirdiff for this in the past. It has a good interface for dealing with two directory hierarchies, but I found its interface for the actual diffs to be a bit clunky (although it may have improved since I tried it). And it has a decent collection of features.
First off, this is a reasonably well studied (or at least written about) area of system administration. LISA has papers on this sort of thing fairly often, and everyone can get at most of them for free at Usenix's web site. It's well worth your time to look over LISA proceedings for the last five years or so.
A prefacing note: much of the advice here assumes a fully networked environment, where the machines are not isolated and can contact some central point routinely. Without this it will be very hard to handle updates, and you will probably need to think quite hard about the administrative structure: for example, who will create accounts for a local group of users? At this point user-friendly GUI tools may become an important consideration.
The basic thing to do is to automate as much as possible. You want a system where you never touch individual machines by hand; you touch a master machine or place, and machines update themselves. With 2,500 machines you also want this to be a pull model, not a push model; the machines pulling changes deals much better with machines being down than a central point pushing out updates. One concrete suggestion: make sure that your update-applying system can run arbitrary shell scripts or programs, not just install updated distribution packages; you will need to do this sooner or later.
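The client side of such a pull system can be quite small. Here is a minimal Python sketch, assuming each machine has already fetched a local copy of the master's update directory (with rsync or whatever you like) and that updates are shell scripts whose names sort into the order they should run in. The directory layout, the state file, and the function names are all hypothetical; the point is only that the machine itself works out what it has not yet applied and runs it.

```python
import os
import subprocess


def load_applied(state_file):
    """Read the set of already applied update names (empty if none yet)."""
    try:
        with open(state_file) as f:
            return set(line.strip() for line in f if line.strip())
    except FileNotFoundError:
        return set()


def pending_updates(update_dir, applied):
    """Return update script names not yet applied, in sorted order."""
    names = sorted(n for n in os.listdir(update_dir) if n.endswith(".sh"))
    return [n for n in names if n not in applied]


def apply_updates(update_dir, state_file):
    """Run each pending script with sh, stopping at the first failure.

    Each success is recorded immediately, so a machine that dies
    halfway through picks up where it left off on its next run.
    """
    applied = load_applied(state_file)
    done = []
    for name in pending_updates(update_dir, applied):
        status = subprocess.call(["sh", os.path.join(update_dir, name)])
        if status != 0:
            break
        with open(state_file, "a") as f:
            f.write(name + "\n")
        done.append(name)
    return done
```

Run from cron on every workstation, something like this copes naturally with machines that were down when an update was published; they simply catch up the next time they run.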
In order to automate as much as possible, machines need to be as similar as possible. Where they're not similar, you need to build automated tools to detect the differences and deal with them. With 2,500 machines you probably can't keep machines 100% homogeneous over their entire lifetime, so planning up front (and having the mechanisms in place in the initial rollout) will save you time later. Unfortunately, I'm not aware of any good tools to determine hardware configuration information in good, scriptable form for Linux (hopefully other people will be).
I don't think that the choice of distribution will make a huge difference. You will have to customize whatever distribution you pick in some way, either by making new install media or by creating a master machine that is then cloned. You are almost certainly going to be getting intimately involved in the details of your chosen distribution's package management system; pick one which you're knowledgeable about and comfortable with.
If you go with creating a master machine that is then cloned, you should still automate as much of the creation of the master as possible. Among other good things, this will significantly help when it comes time to upgrade the base distribution, and it helps ensure that upgrades are more easily automated (by simply supplying a new version of one of your customization packages). And it is good to know you can easily recreate the customized setup from scratch and the bare metal.
I will echo other people's comments on keeping user data off the 2,500 machines. If you allow important user data to live on the local machines, you will have to back it up somehow, and it is a lot harder to back up 2,500 machines than a few data servers that everyone talks to. If you have to back up data on each machine, strongly consider a push model where the workstations push the data to be backed up to a central server or servers for the actual backing up to tape. Also, if no user data lives on machines, returning a broken machine to service is a relatively trivial thing; you just drop in another generic clone, which should be a fast operation.
To distribute the load on your update and other servers, you probably want to cluster the workstations into groups (probably based on the network topology). Group servers pull updates and other things from the central server; the workstations pull updates from their local group server. This way all of the servers involved can be relatively modest, because none of them ever have to deal with large numbers of clients.
I personally don't like NIS for password distribution. Locally, we use something called track (available at ftp.cs.utoronto.ca in /pub) to have the clients pull new password files from servers on a regular basis, but one can use something like rsync as well. People change their passwords by using a script that ssh's off to a central password server to run the real password command. Similar things can be done for other files that need to be distributed frequently.
In terms of security, you should first identify what threats you're guarding against. Outside crackers call for very different precautions than untrusted employees. You'll want to take all the usual steps: limit setuid programs and running daemons, filter and screen what you still have to run, use encrypted connections for as much as possible, and so on. It's hard to give more specific advice without knowing more details.
I personally think that NFS is the best way to go provided that you can trust the workstations not to be subverted; it's the most solid, proven, and well-developed technology at the moment. If you cannot trust this, then you will need to look at alternatives: either something like Coda, or having central application servers that you can control and having the workstations only used to display things from them.
This has been said before in comments, but there really, truly is no way around actually sending email to see if an email address is truly valid. You cannot reliably tell invalid addresses in any other way; however, you may be able to quickly tell that some addresses are invalid with VRFY or RCPT TO:.
The major fly in the ointment for all SMTP level verifications is the presence of backup MX entries. The machine you are able to deliver email to may have nothing to do with actually delivering the email to the end user, and as such is going to be completely unable to know what addresses are good and bad there. This is very common with less than reliable network links or less than reliable mailer software.
There are also many mailers that will accept any user name in a RCPT TO: command, and only bounce invalid usernames later. Often this is done as a performance enhancement, so you only have to do the necessary and perhaps complex lookups once.
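For what the quick check is worth, here is a minimal Python sketch of it using the standard smtplib module. The function names and result labels are made up for illustration, and mail_host is assumed to be a mail exchanger you have already looked up for the address's domain (the standard library will not do the MX lookup for you). Given everything above, the only firm answer it can give is 'rejected'; a 2xx reply just means this particular machine did not object.

```python
import smtplib


def classify_rcpt_reply(code):
    """Interpret an SMTP reply code to RCPT TO (or VRFY)."""
    if 500 <= code < 600:
        return "rejected"      # this server says the address is bad
    if 400 <= code < 500:
        return "try-later"     # temporary failure, tells you nothing
    if 200 <= code < 300:
        return "not-rejected"  # NOT the same thing as 'valid'
    return "unknown"


def probe_address(mail_host, sender, recipient, timeout=30):
    """Open an SMTP session to mail_host and see how it answers RCPT TO.

    No message is sent; the transaction is reset after RCPT TO.
    """
    server = smtplib.SMTP(mail_host, 25, timeout=timeout)
    try:
        server.ehlo_or_helo_if_needed()
        server.mail(sender)
        code, _message = server.rcpt(recipient)
        server.rset()
    finally:
        server.quit()
    return classify_rcpt_reply(code)
```

Be aware that doing this in bulk against other people's mail servers is likely to be viewed dimly by their administrators.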
This might have been something that could have been done when Unix had few users. By now there is almost certainly too much water under the bridge to make it worthwhile to switch; the cost in pain and breakage would far exceed the gains.
Why's it so painful? Because switching the meaning of existing command-line switches means more than existing Unix users having to change their habits. It means that all existing uses of these commands have to change: from shell scripts to things embedded in Makefiles. That's a lot of work, especially since you pretty much have to look at everything, every script and Makefile, to make sure that it's still OK.
Worse yet, you're highly unlikely to get a majority of people to switch at the same time. This means that portable scripts, Makefiles, and users have to cope with having it both ways, which just increases the pain even more.
I think that the real question is: what do you want to achieve by labelling your software in some way? When you label your free software project as being in alpha, when mozilla.org labels Mozilla as being in alpha, what do they want to happen? Once you, or they, know that, the various code goals can be set.
DANGER: if Internet users can cause bad things to happen to your site (databases getting out of sync, etc) by submitting manual 'get' calls or the like, you need to do all that painful verification. Or sooner or later some joker who doesn't like you will come along and bring the house down.
Even if it's only an internal intranet application, you might want to do at least some verification just in case. People and programs can do all sorts of wacky things.
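As an illustration of what 'some verification' might look like, here is a minimal Python sketch that checks a raw query string on the server side before anything touches the backend. The parameter names (id and action), the allowed actions, and the length limit are all invented for the example; the point is to accept only exactly what you expect and refuse everything else.

```python
from urllib.parse import parse_qs


def validate_query(query_string, allowed_actions=("view", "update")):
    """Check a raw query string before acting on it.

    Returns (True, cleaned_dict) or (False, reason).
    """
    params = parse_qs(query_string, keep_blank_values=True)
    if set(params) != {"id", "action"}:
        return False, "unexpected or missing parameters"
    if any(len(values) != 1 for values in params.values()):
        return False, "repeated parameter"
    raw_id = params["id"][0]
    action = params["action"][0]
    # Only short runs of plain ASCII digits count as an id.
    if not (raw_id.isascii() and raw_id.isdigit()) or len(raw_id) > 9:
        return False, "bad id"
    if action not in allowed_actions:
        return False, "bad action"
    return True, {"id": int(raw_id), "action": action}
```

Whether the id actually refers to something this user is allowed to touch is a separate check that still has to happen afterwards.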
'Hiding' the backend in this way is a form of security and reliability through obscurity. Insert standard discussion of the relative merits and problems of that here, if desired.
One of the better summaries is Why Frames Suck (Most of the Time), one of Jakob Nielsen's Alertbox columns. He's revised his opinions a couple of times since the original (it was written in December of 1996), but still holds to them; check out his "Top Ten Mistakes" Revisited column, for example.
I strongly recommend his entire site, which is full of advice on various web design and usability issues. You may not agree with all of them (I'm not sure I agree with him about scrolling web pages), but I've found the issues he raises all worth thinking about.
I think that a big part of what confuses people is that so much functionality overlaps between the window manager people and the desktop environment people these days. The days when window managers did very little and what they did was obvious and easy to describe are long over.
For example, compare the set of modules that fvwm2 comes with to the set of functionality that KDE or Gnome provides. Fvwm2 has a pager to flip among virtual screens, something to keep lists of windows, and even something akin to the actual panel itself. (Possibly everything except the panel is actually done by kwm or Enlightenment, which sort of goes to prove that it's hard to tell.) My impression is that other window managers have about as many features (if not more) as fvwm2 does, and so come with as many add-on modules.
The standard help system format for GUI's is heading towards HTML, time to dump man. Even GNU dumped man years ago.
And GNU made a bad mistake in that: the typical info document is a crummy replacement for a decent manual page, much less a decent manual page in a good manual page reader such as TkMan.
Man pages are designed to be concise reference summaries. Info documents and most HTML documentation I've seen make excellent tutorials, and sometimes good in-depth references, but they almost invariably suck at being a concise reference for anything. When they don't, it's because the people writing that section followed the style of manual page writing, just in something besides roff -man macros.
And doing format conversion on texinfo, SGML, or HTML documentation isn't the answer. The formatting doesn't matter (TkMan and the xntp3 documentation demonstrate that), what matters is whether the semantic content is there in some extractable form. And for most documents, it's just not. And unfortunately most writers of documentation either don't realize this or don't care.
I'm pretty sure that POSIX compliance is a goal for the kernel, for GNU Libc, and for most if not all of the user-level utilities that POSIX specifies. Most of them are quite close, too. Non-POSIX-compliance tends, I believe, to only happen when the POSIX way is held to be vastly stupid.
| 1) man is standard for documentation only on Unix Systems
| 1b) Linux != Unix, its merely Unixlike
Arguing that Linux isn't Unix is IMHO using a marketing definition of Unix that plays into the hands of foes of Unix, who would love to exploit the resulting market fragmentation.
Unix is many things, from a trademark to a culture and a way. In the ways that matter I maintain that Linux is Unix.
Although Linux (really, a Linux distribution) doesn't currently have the right to use the Unix trademark, and although the Linux kernel is not descended from a kernel written in Bell Labs, it does have the Unix culture and the Unix way. As a long-time Unix user, I can say that Linux and Linux distributions are Unix in all the ways that really matter in practice, from either a user's or a system administrator's perspective (and more like Unix than some, AIX being the popular target).
Linux is a Unix. It is no more strange, no more different, no more counterintuitive than any of the various other Unixes I've used. And it's a lot better than some of them.
I think that there are definite uses for KDE and GNOME, especially in student lab environments. In part, I think it comes down to ease of use for novices and casual users versus experienced, constant users.
From my casual look at the KDE and GNOME environments, a lot of the ornateness is there to create obvious things to manipulate. This in turn made it easy for me to start manipulating KDE and GNOME, and presumably helps novices to do so too. Certainly one reason we chose the Redhat 5.1 AnotherLevel environment as a starting point for our current workstations was its similarities to Windows, which we could assume was familiar to the users from elsewhere.
Whether I think it's a good idea or not, I'm pretty certain that there are a fair number of students here who consider the lab computers as just complex tools. They don't want to have to set up a carefully customized environment just to use them. They're willing to accept clutter and flashiness in exchange for ease of use.
I don't like the clutter of KDE and GNOME, but I'm not a typical user of our labs. I've spent years slowly tuning my (now) fvwm2-based minimalistic environment (originally twm based, then tvtwm, then I got tired of tvtwm's problems) until it works just as I want it to right now and only has the features and decorations I actually need in practice. Most users just aren't going to use the computers that intensely or care that much about it.
Possibly there is a great, non-chromed, minimalistic X11 environment that is still easy and obvious for novices and casual users to use. If there is, I would deeply love a pointer to it. Until then, it's likely that soon our lab workstations will run either KDE or GNOME, because our major target population is casual novices and either environment seems easy for them to use.
Because the XDM way of logging into systems is a fundamental violation of the Unix way of initializing your environment, and it shows. XDM discards the entire concept of your login shell, and with it the entire concept that you can initialize your environment once and use it thereafter. (You should see the hacks that some vendors, like SGI, use to try to weasel their way around this failing (for SGI, scope out the userenv manpage sometime).)
You can kludge around this. But all the approaches are kludges, with all that implies: they are not general, and they are not necessarily supported in the latest fancy magic thing to come along. And you will go to extra work if you want to have things work seamlessly on non-XDM logins too. There is nothing with the simple and straightforward elegance of the login shell concept.
In my local experience, this is quite to be expected with a 2.2.* series kernel (used in a number of recent distributions, eg Redhat 6.0 onwards). 2.2.* seems to aggressively swap things out to make room for various sorts of disk cache; the same machines under 2.0.* didn't go into the swap until they actually mostly ran out of memory for real, non-cache/non-buffer pages.
This aggressiveness appears to be basically harmless; I haven't noticed any particular performance problems due to it, although it can be somewhat disconcerting to see a machine with lots of real memory and no programs applying memory pressure be 10 or 20 megabytes into swap.
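If you want to watch this on your own machines, the numbers are all in /proc/meminfo. Here is a minimal Python sketch that pulls out how far into swap a machine is and how much memory is going to buffers and page cache. It assumes the 'Name: value kB' lines that /proc/meminfo provides (MemTotal, Buffers, Cached, SwapTotal, SwapFree); the function names are made up, and lines in any other format are simply skipped.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style 'Name:   12345 kB' lines into {name: kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def swap_and_cache_kb(info):
    """Return (swap in use, memory used for buffers and page cache) in kB."""
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return swap_used, cache
```

Calling swap_and_cache_kb(parse_meminfo(open("/proc/meminfo").read())) every so often makes it fairly easy to see swap use creeping up while the cache figure stays large.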
Iconization is a fairly fundamental and common window manager thing. I don't know how easy it is to configure something like sawfish to support this mode of operation, but it is generally the default for window managers like fvwm, twm, or even mwm. I would not be surprised if many modern alternate window managers supported powerful ways of controlling icons and iconification behavior; I know fvwm does.
Unfortunately, the best I could manage with a current Wine version was a very limping QuickTime player with major speed problems. VMWare managed better but still could only play the smallest size of Quicktime trailers with decent results (larger ones had delayed video or breaking up audio). And this was on fairly good hardware.
Since the CodeWeavers plugin demonstrates that the basic Wine technology can do the job, presumably it is possible to do it by hand. Somehow. An energetic person could probably get themselves a certain amount of fame by writing up a HOWTO on this; there's a likely a fair amount of pent-up demand for good QuickTime playback on Linux.
Time to come up after a crash (either power failure or software) is only relevant if that happens to the filesystem very much. Certainly our filesystems on our file servers won't experience that more than once in a blue moon (we have UPSes and run stable kernels); I suspect more than a few people are in that sort of environment. If crashes and powerfails are at all common, you have larger problems than filesystem performance.
One can certainly argue that in most environments the performance drop (if any) is worth paying for the crash recovery (neglecting that several journalling filesystems have some uncomfortable crash recovery issues because they only journal metadata). But that is a very different argument from a claim that fast crash recovery makes performance differences irrelevant.
This general issue goes under the name of (network) path characterization, and is a reasonably active research area. Usefully you can get several programs that will do their best to characterize the bandwidth and other attributes of each step in your routing. Nothing runs as fast as traceroute, but the numbers are likely to be interesting.
The best starting point for available programs that I have offhand is the home page for pchar, one of the programs that does this. As well as pchar, the page has a fairly large collection of links to other similar and related programs. (Various programs use somewhat different approaches and math, and operate somewhat differently, so you may want to use several to cross check the results.)
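To give a feel for the sort of math involved, here is a minimal sketch of one approach as I understand it (the pathchar-style one; this is my assumption, not something taken from pchar's documentation). You probe each hop with packets of several sizes, keep the minimum round-trip time seen for each size, and fit a straight line. The slope is seconds per byte for the path up to that hop, so the difference in slope between two successive hops estimates the bandwidth of the link between them. All the function names here are made up for illustration.

```python
def slope(sizes, min_rtts):
    """Least-squares slope of minimum RTT (seconds) against probe size (bytes)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(min_rtts) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, min_rtts))
    return sxy / sxx


def link_bandwidth_bps(sizes, rtts_prev_hop, rtts_this_hop):
    """Estimate the bandwidth (bits/second) of the link between two hops.

    Returns None if the measurements are too noisy to give a positive answer.
    """
    extra = slope(sizes, rtts_this_hop) - slope(sizes, rtts_prev_hop)
    if extra <= 0:
        return None
    return 8.0 / extra  # seconds per byte -> bits per second


def synthetic_rtts(sizes, base_latency, seconds_per_byte):
    """Fake, noise-free measurements for trying the estimator out."""
    return [base_latency + seconds_per_byte * s for s in sizes]
```

Real measurements are noisy, which is a good part of why these programs take a long time to run: they need many probes per size to find a trustworthy minimum.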
The moral equivalent of -turbo for X-based versions of Mozilla is the remote control of a running Mozilla offered by the -remote argument. (This is actually a feature inherited from Netscape, from a long time back.)
Typical usage is mozilla -noraise -remote 'openURL(about:blank,new-window)' (disclaimer: I use a standalone program to feed this to Mozilla, not Mozilla itself, so Mozilla's syntax for this may be slightly different). With some auxiliary programs and some shell scripting you can construct quite useful systems out of this; I can highlight a URL in an xterm (or anywhere), pick a menu entry from my root menu, and be browsing that URL in a new Mozilla window in moments.
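As a minimal sketch of the sort of glue this involves, the following builds the command line quoted above for an arbitrary URL. The function names are made up, and percent-encoding commas and parentheses is my own precaution, on the assumption that they would otherwise confuse the openURL(...) argument parsing.

```python
import subprocess


def remote_open_argv(url, browser="mozilla"):
    """Build the argv that asks a running browser to open url in a new window."""
    # Assumption: commas and parentheses inside the URL need escaping so
    # they are not mistaken for openURL()'s own punctuation.
    safe = url.replace(",", "%2C").replace("(", "%28").replace(")", "%29")
    return [browser, "-noraise", "-remote", "openURL(%s,new-window)" % safe]


def remote_open(url, browser="mozilla"):
    """Run the command; returns the browser's exit status."""
    return subprocess.call(remote_open_argv(url, browser))
```

A root menu entry then only needs to fetch the current selection and hand it to something like remote_open().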
Similar tricks can be played in Netscape. Netscape's documentation on this can be found here, along with the small standalone program to do the remote control. There are some differences between the documentation and current Netscapes (and Mozilla), but nothing too hard to figure out.
I am pretty sure that NeWS post-dates the public availability of X. It may predate X11, but X version 10 was out there quite early on and at the time Sun had its own windowing system (as did all vendors at the time). It appears that NeWS came out around 1987; X goes back to at least 1986 and I believe the original Project Athena work is probably from even earlier (and thus known, if not yet distributed).
The core causes of NeWS's failure are probably highly debatable, but I think a good part of it is that Sun wanted to make as much money from NeWS as possible, which meant they charged per-seat license costs early on. Given there was a free alternative, it's not hard to see how various vendors made the choices that they did.
Other failures were partly technical; for example, apparently NeWS did not have a fast, functional terminal emulator (a la xterm) for quite a long time. Since a terminal emulator is an important part of early use of any windowing system on Unix, most of the time, there were some knock-on effects on people's interest in using NeWS.
The major workstation vendors (DEC, HP, IBM, etc) funded various university projects to build large scale environments for workstations because it was pretty likely that they would get quite useful technology out of it. (Although in the end it didn't quite work out that way.) MIT's Project Athena is only one example; Project Andrew at CMU gave us AFS, for example. Direct rivalry with Sun's general products (NFS, for example) was not really a priority, although it was probably a factor.
The ice is thin, and the zero-copy patch is slightly misnamed. In this case 'zero-copy' is not 'zero copy from user space', but 'zero copy in kernel space'; if there is a user space to kernel space transition there is still one copy involved. Zero copy in kernel space eliminates all the copies in two cases: for kernel level servers (NFS, for example), or if the application is using sendfile().
Why not zero copy from user space too? Because it turns out to be quite expensive in non-obvious ways, especially on SMP systems, because in order to do it right you must play games with the page permissions and associated things. On SMP systems, this apparently requires cross-CPU synchronization to ensure that all CPUs pick up the correct new page permissions -- which, as you can imagine, gets expensive.
In theory, glibc already transparently supports this mechanism via something called versioned symbols. Versioned symbols basically let the linker say (when the executable is built) 'this requires not just strcpy(), but a GLibc 2.0 compatible strcpy()'. (You can see this relatively vividly by running 'nm' on a shared library like libdb; versioned symbols will have names like 'unlink@@GLIBC_2.0'.)
The theory is that when the GLibc people change the interface for something they provide both the old routine and the new one with different version names; one will be 'unlink@@GLIBC_2.0' and the other will be 'unlink@@GLIBC_2.1' or the like. When an application is newly linked, it presumably normally links to the highest version; however, when an old binary is run it gets the old 'unlink@@GLIBC_2.0' that it expects.
This fails if either the GLibc people don't realize that an interface has changed, or the routine (or the interface detail) is one they consider an 'internal' one, subject to change without notice and without backwards compatibility because no user program is supposed to be using it (or depending on the routine's undocumented behavior staying the same). And of course they have to get the compatibility code right. This also only works for executables already fully linked; if you relink from object files et al, you normally get the highest version of the symbol defined, even if you don't necessarily want that. (It appears that there is no way to control this behavior in ld, unfortunately.)
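To make the naming scheme concrete, here is a small sketch that picks versioned symbols out of nm-style output. As far as I know, GNU tools mark the default version of a symbol with '@@' and any older, non-default versions with a single '@'. The sample text is invented for illustration and is not real output from any library.

```python
def split_versioned(symbol):
    """Split 'name@@VER' or 'name@VER' into (name, version, is_default)."""
    if "@@" in symbol:
        name, version = symbol.split("@@", 1)
        return name, version, True
    if "@" in symbol:
        name, version = symbol.split("@", 1)
        return name, version, False
    return symbol, None, False


def versions_by_symbol(nm_output):
    """Map each versioned symbol name to its list of (version, is_default)."""
    table = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The symbol name is the last field on an nm output line.
        name, version, is_default = split_versioned(fields[-1])
        if version is not None:
            table.setdefault(name, []).append((version, is_default))
    return table


# Invented sample in the general shape of nm output.
SAMPLE = """0000000000012340 T unlink@@GLIBC_2.1
0000000000012300 T unlink@GLIBC_2.0
0000000000004000 T local_helper
"""
```

A symbol that shows up with more than one version is one where the library is carrying a compatibility implementation alongside the current one.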
Direct IO and DMA to or from user space gets discussed periodically on the Linux kernel mailing list. The general commentary I've read is that this is almost never a win until you are dealing with huge amounts of memory. Since doing this with huge amounts of memory is fairly rare (especially doing this with huge amounts of memory the driver can't supply to the user, instead of the user supplying it to the driver), code for this stays out of the general kernel.
Why is it so expensive? Because to do it properly you must do page table manipulations (including TLB flushes et al) and these are surprisingly expensive, especially on multi-CPU machines (where you must ensure all CPUs have done the page table and TLB updates before continuing). This makes it faster to do the copying for most if not almost all of the user-level write() and read() operations.
This is one of the less obvious wins for sendfile(), and why zero-copy TCP sendfile() shows such a significant win: with sendfile() you can avoid all those page table manipulations because the data never becomes visible at the user level.
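For the curious, here is a minimal sketch of what using sendfile() looks like from user space, via Python's os.sendfile() wrapper. It assumes Linux; the helper names and the socket-pair demo are mine, purely for illustration. Whether the kernel manages a true zero-copy transmit underneath depends on the kernel and the hardware, but either way the file data never passes through the program's own buffers.

```python
import os
import socket
import tempfile


def send_file(sock, path):
    """Send the whole file at path over sock with sendfile(); return bytes sent."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The kernel moves data from the file to the socket directly;
            # nothing is read into this process's memory.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break  # file shrank underneath us
            sent += n
    return sent


def demo(payload=b"hello, sendfile\n" * 64):
    """Round-trip a small payload through a local socket pair."""
    fd, path = tempfile.mkstemp()
    left, right = socket.socketpair()
    try:
        os.write(fd, payload)
        sent = send_file(left, path)
        left.close()  # lets the reader see end-of-stream
        received = b""
        while True:
            chunk = right.recv(4096)
            if not chunk:
                break
            received += chunk
        return sent, received
    finally:
        right.close()
        left.close()
        os.close(fd)
        os.unlink(path)
```

Compare this with the usual read()/write() loop, where every chunk is copied into user space and then back out again.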
I believe the Linus-approved solution for things like video drivers is that the driver should provide an address space that the user can mmap() into his own process's memory space, and which he can then read, write, and write() as the driver scribbles information into it. Other people consider this a less than entirely satisfactory solution for various reasons.
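A rough sketch of the mmap() idea follows. An ordinary temporary file stands in for the driver's buffer (a real driver would export its own device node, which this does not attempt to model), and it assumes Linux, where a shared mapping and the file share the same pages. Writes made 'behind' the mapping show up in it without any read() call, and writes through the mapping are visible on the other side.

```python
import mmap
import os
import tempfile


def demo_shared_mapping():
    """Show data appearing in a shared mapping without read() calls."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * mmap.PAGESIZE)
        view = mmap.mmap(fd, mmap.PAGESIZE,
                         flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
        # Stand-in for the driver scribbling new data into its buffer.
        os.pwrite(fd, b"frame-0001", 0)
        seen = bytes(view[:10])       # visible through the mapping
        # The user process can also write through the mapping.
        view[0:5] = b"FRAME"
        back = os.pread(fd, 10, 0)    # and that is visible on the other side
        view.close()
        return seen, back
    finally:
        os.close(fd)
        os.unlink(path)
```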
There are three issues involved here: 'raw' context switch times, kernel context switch times versus a purely user-mode context switch, and the problems the Linux scheduler has with lots of simultaneously runnable processes and threads. There is also an advanced debate about cache effects that makes some people argue that these differences are often mostly moot.
Linux is asserted to have one of the fastest context switch times going; I have seen reputable linux-kernel people say that Linux switches its processes faster than most systems can switch their allegedly lightweight 'threads'. (I have not done measurements myself, and it would be somewhat challenging to level out other factors. Would-be measurers could do worse than to benchmark Linux against Solaris x86 and Windows (possibly Windows NT) on the same machine. Also, possibly Larry McVoy's lmbench micro benchmarking suite already has suitable test programs.)
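For would-be measurers, here is a very rough sketch of the classic ping-pong approach: two processes bounce a byte back and forth over a pair of pipes, so each round trip forces at least two switches when both are on one CPU. The number it produces includes pipe system call overhead and Python's own overhead, and on an SMP machine the two processes may simply run on different CPUs, so treat it as an upper bound and a starting point rather than a real benchmark.

```python
import os
import time


def pingpong_seconds_per_switch(rounds=2000):
    """Rough upper bound on process switch cost from a pipe ping-pong."""
    p2c_r, p2c_w = os.pipe()   # parent -> child
    c2p_r, c2p_w = os.pipe()   # child -> parent
    pid = os.fork()
    if pid == 0:
        # Child: echo each byte straight back, then exit without cleanup.
        try:
            for _ in range(rounds):
                os.write(c2p_w, os.read(p2c_r, 1))
        finally:
            os._exit(0)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)      # blocks until the child has run
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    for fd in (p2c_r, p2c_w, c2p_r, c2p_w):
        os.close(fd)
    return elapsed / (2 * rounds)


if __name__ == "__main__":
    print("about %.1f microseconds per switch"
          % (pingpong_seconds_per_switch() * 1e6))
```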
In any N:M thread environment you may be able to get away with a certain amount of thread context switching that takes place entirely at user level. Such context switches can obviously take place very fast, faster than ones that involve the kernel if both are coded equally well. However, I don't know how much of this actually happens in typical use of Solaris's N:M threading. N:M threading does have the advantage that it needs fewer resources on the kernel side (eg, no need for a kernel stack and context information for each user-level thread), and may be amenable to very clever implementations on the user side to also reduce the resource load.
Independent of those two issues is the Linux kernel scheduler's problem with lots of processes that are all runnable at the same time. This definitely causes problems that do not appear in an N:M threading model, and definitely hurts when compared against a more 'efficient' scheduler. This problem has the net effect of slowing down context switch times when there are a lot of processes. I don't know how large the slowdown is when compared to systems with slower context switching.
And finally: cache reload effects. There is an argument that the amount of time it takes to reload cold CPU caches in a new thread (because each thread is presumably working on different data, and possibly in different code) will completely dwarf the context switching overhead itself, kernel or otherwise, for threads that do real work (instead of benchmarking ping pong context switching times). There are some people who argue that this means that huge numbers of threads are very bad: as the number of threads rises and as the amount of context switching rises, you spend an ever increasing amount of the CPU merely reloading caches instead of getting any useful work done.
I am not entirely convinced that this is a real effect. Presumably a well written threaded program only uses threads when it needs to, to isolate work on different things. In that case, however the program is written (threaded vs state machine or whatever) it is going to have to switch back and forth between different chunks of data with the associated cache switching; all that threading should add is a small amount of extra cache reloads for the thread-specific data that's also needed (eg, different stacks for each different thread). Unless the amount of data each thread works with is very small, one would assume that the extra thread data is small compared to the data the program has to touch anyways.
Whether Linus is wrong about all this depends in part on what sort of use he sees Linux mostly being put to. There are tradeoffs involved: a scheduler that is very good for 500 runnable processes is likely to have a noticeable chunk of unnecessary overhead for only 2 runnable processes. If most of the environments you are aiming at only have 2 runnable processes, well, you may want to make the tradeoff in favour of the smaller, simpler, faster for the common case scheduler. I believe that Linus has in the past been reluctant to put in code that only helps 'big iron' while hurting ordinary machines because most of the machines that run Linux are ordinary machines. (Also, 'big iron' people are competent enough to make their own kernel variants if they want to; look at SGI's efforts.)
As a number of people have already said, Linux has kernel-supported threads (multiple threads of execution in a single memory space, where one thread performing blocking IO does not block other threads). The Linux kernel does not have a special implementation of 'threads' as distinct from normal processes; instead threads are processes that share (among other things) their virtual memory mappings. Linux also has a nearly complete and correct implementation of POSIX threads, in glibc.
However there are a few things Linux does not have in its thread model:
This is a problem for the obvious implementation of Java runtimes, as Java programs often spawn very large numbers of threads for various reasons. IBM, among others, have proposed scheduler patches to deal with this; however, all of the patches have hurt performance for the usual/common case of only one or two runnable processes on the system. This is getting to be a hot issue for active several-way SMP machines too (which may have a process or two per processor, adding up to eight to sixteen runnable at once), so patches may get accepted at some point or may start appearing in 'enterprise edition' kernels from various distributions.
Also as people have said, Linux does not need any special minimal 'thread' kernel object: Linux process objects are already as fast (or faster) and as lightweight as thread objects in other systems. Only on some old legacy Unix systems do 'processes' necessarily equate to heavy-weight, slowly scheduled and heavily resource-consuming objects. Both Linux and Plan 9 show that full-scale regular processes can also be lightweight and fast, and easily be used to represent threads too.
I have used dirdiff for this in the past. It has a good interface for dealing with two directory hierarchies, but I found its interface for the actual diffs to be a bit clunky (although it may have improved since I tried it). And it has a decent collection of features.
First off, this is a reasonably well studied (or at least written about) area of system administration. LISA has papers on this sort of thing fairly often, and everyone can get at most of them for free at Usenix's web site. It's well worth your time to look over LISA proceedings for the last five years or so.
A prefacing note: much of the advice here assumes a fully networked environment, where the machines are not isolated and can contact some central point routinely. Without this it will be very hard to handle updates, and you will probably need to think quite hard about the administrative structure: for example, who will create accounts for a local group of users? At this point user-friendly GUI tools may become an important consideration.
The basic thing to do is to automate as much as possible. You want a system where you never touch individual machines by hand; you touch a master machine or place, and machines update themselves. With 2,500 machines you also want this to be a pull model, not a push model; the machines pulling changes deals much better with machines being down than a central point pushing out updates. One concrete suggestion: make sure that your update-applying system can run arbitrary shell scripts or programs, not just install updated distribution packages; you will need to do this sooner or later.
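As a minimal sketch of the client side of such a pull system: each machine fetches a manifest of updates from the master, works out which ones it has not yet applied, and runs them in order. The manifest format (one 'id script' pair per line) and all of the names here are hypothetical; a real system also has to fetch the scripts, record what was applied somewhere durable, and report failures.

```python
import subprocess


def pending_updates(manifest_lines, applied_ids):
    """Return [(update_id, script)] not yet applied, in manifest order."""
    pending = []
    for line in manifest_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        update_id, script = line.split(None, 1)
        if update_id not in applied_ids:
            pending.append((update_id, script))
    return pending


def apply_updates(pending, applied_ids, run=subprocess.call):
    """Run each pending script with /bin/sh; stop at the first failure."""
    for update_id, script in pending:
        if run(["/bin/sh", script]) != 0:
            break  # later updates may depend on this one
        applied_ids.add(update_id)
    return applied_ids
```

Because the update is just a shell script, it can install a package, edit a config file, or do anything else that needs doing; and a machine that was down simply catches up the next time it pulls.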
In order to automate as much as possible, machines need to be as similar as possible. Where they're not similar, you need to build automated tools to detect the differences and deal with them. With 2,500 machines you probably can't keep machines 100% homogeneous over their entire lifetime, so planning up front (and having the mechanisms in place in the initial rollout) will save you time later. Unfortunately, I'm not aware of any good tools to determine hardware configuration information in good, scriptable form for Linux (hopefully other people are).
I don't think that the choice of distribution will make a huge difference. You will have to customize whatever distribution you pick in some way, either by making new install media or by creating a master machine that is then cloned. You are almost certainly going to be getting intimately involved in the details of your chosen distribution's package management system; pick one which you're knowledgeable about and comfortable with.
If you go with creating a master machine that is then cloned, you should still automate as much of the creation of the master as possible. Among other good things, this will significantly help when it comes time to upgrade the base distribution, and it helps ensure that upgrades are more easily automated (by simply supplying a new version of one of your customization packages). And it is good to know you can easily recreate the customized setup from scratch and the bare metal.
I will echo other people's comments on keeping user data off the 2500 machines. If you allow important user data to live on the local machines, you will have to back it up somehow, and it is a lot harder to back up 2500 machines than a few data servers that everyone talks to. If you have to back up data on each machine, strongly consider a push model where the workstations push the data to be backed up to a central server or servers for the actual backing up to tape. Also, if no user data lives on machines, returning a broken machine to service is a relatively trivial thing; you just drop in another generic clone, which should be a fast operation.
To distribute the load on your update and other servers, you probably want to cluster the workstations into groups (probably based on the network topology). Group servers pull updates and other things from the central server; the workstations pull updates from their local group server. This way all of the servers involved can be relatively modest, because none of them ever have to deal with large numbers of clients.
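The arithmetic behind this is simple enough to sketch. The group size of 50 in the usage note is an arbitrary figure picked for illustration, not a recommendation.

```python
import math


def fan_out(total_machines, group_size):
    """Client counts seen by each server in a two-level pull hierarchy."""
    groups = math.ceil(total_machines / group_size)
    return {
        "group_servers": groups,
        # Workstations pulling from any one group server.
        "clients_per_group_server": min(group_size, total_machines),
        # Only the group servers talk to the central server.
        "clients_of_central_server": groups,
    }
```

With 2,500 workstations in groups of 50, fan_out(2500, 50) gives 50 group servers, each serving 50 workstations, and a central server that only ever sees 50 clients instead of 2,500.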
I personally don't like NIS for password distribution. Locally, we use something called track (available at ftp.cs.utoronto.ca in /pub) to have the clients pull new password files from servers on a regular basis, but one can use something like rsync or the like as well. People change their passwords by using a script that ssh's off to a central password server to run the real password command. Similar things can be done for other files that need to be distributed frequently.
In terms of security, you should first identify what threats you're guarding against. Outside crackers call for very different precautions than untrusted employees. You'll want to take all the usual steps: limit setuid programs and running daemons, filter and screen what you still have to run, use encrypted connections for as much as possible, and so on. It's hard to give more specific advice without knowing more details.
I personally think that NFS is the best way to go provided that you can trust the workstations not to be subverted; it's the most solid, proven, and well-developed technology at the moment. If you cannot trust this, then you will need to look at alternatives: either something like Coda, or having central application servers that you can control and having the workstations only used to display things from them.
This has been said before in comments, but there really, truly is no way around actually sending email to see if an email address is truly valid. You cannot reliably tell invalid addresses in any other way; however, you may be able to quickly tell that some addresses are invalid with VRFY or RCPT TO:.
The major fly in the ointment for all SMTP level verifications is the presence of backup MX entries. The machine you are able to deliver email to may have nothing to do with actually delivering the email to the end user, and as such is going to be completely unable to know what addresses are good and bad there. This is very common with less than reliable network links or less than reliable mailer software.
There are also many mailers that will accept any user name in a RCPT TO: command, and only bounce invalid usernames later. Often this is done as a performance enhancement, so you only have to do the necessary and perhaps complex lookups once.
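A minimal sketch of an RCPT TO: probe using Python's smtplib follows; the function names and the category labels are my own. Note how little the answer tells you: only an outright rejection is conclusive, while an acceptance (from a backup MX, or from a mailer that checks usernames later) proves nothing.

```python
import smtplib


def classify_reply(code):
    """Interpret an SMTP reply code to RCPT TO: or VRFY."""
    if code in (250, 251):
        return "accepted"    # may still bounce later
    if code == 252:
        return "unknown"     # server declines to verify
    if 400 <= code < 500:
        return "try-later"   # temporary failure
    if 500 <= code < 600:
        return "rejected"    # the only conclusive answer
    return "unknown"


def probe_rcpt(mx_host, address, sender="", timeout=30):
    """Ask mx_host whether it would accept mail for address.

    An empty sender produces the null reverse-path, MAIL FROM:<>.
    No message is actually sent; the session ends after RCPT TO:.
    """
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as server:
        server.helo()
        server.mail(sender)
        code, _message = server.rcpt(address)
    return classify_reply(code)
```

Be aware that some sites treat this sort of probing as abuse, which is one more reason to just send the email.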
This might have been something that could have been done when Unix had few users. By now there is almost certainly too much water under the bridge to make it worthwhile to switch; the cost in pain and breakage would far exceed the gains.
Why's it so painful? Because switching the meaning of existing command-line switches means more than existing Unix users having to change their habits. It means that all existing uses of these commands have to change: from shell scripts to things embedded in Makefiles. That's a lot of work, especially since you pretty much have to look at everything, every script and Makefile, to make sure that it's still OK.
Worse yet, you're highly unlikely to get a majority of people to switch at the same time. This means that portable scripts, Makefiles, and users have to cope with having it both ways, which just increases the pain even more.
I think that the real question is: what do you want to achieve by labelling your software in some way? When you label your free software project as being in alpha, when mozilla.org labels Mozilla as being in alpha, what do you, or they, want to happen? Once you, or they, know that, the various code goals can be set.
DANGER: if Internet users can cause bad things to happen to your site (databases getting out of sync, etc) by submitting manual 'get' calls or the like, you need to do all that painful verification. Or sooner or later some joker who doesn't like you will come along and bring the house down.
Even if it's only an internal intranet application, you might want to do at least some verification just in case. People and programs can do all sorts of whacky things.
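As a minimal sketch of what 'that painful verification' looks like for a GET request: check every parameter against what the backend expects, and refuse anything else, instead of trusting that requests only ever come from your own pages. The parameter names ('action' and 'item') and the allowed values are hypothetical.

```python
import re
from urllib.parse import parse_qs

ALLOWED_ACTIONS = {"view", "update"}   # hypothetical set of actions
EXPECTED = {"action", "item"}


def validate_query(query_string):
    """Return (clean_params, errors); clean_params is None if anything is off."""
    params = parse_qs(query_string, keep_blank_values=True)
    errors = []

    action = params.get("action", [""])
    if len(action) != 1 or action[0] not in ALLOWED_ACTIONS:
        errors.append("action")

    item = params.get("item", [""])
    # Exactly one value, plain ASCII digits, and a sane length.
    if len(item) != 1 or not re.fullmatch(r"[0-9]{1,9}", item[0]):
        errors.append("item")

    unexpected = sorted(set(params) - EXPECTED)
    if unexpected:
        errors.append("unexpected: " + ", ".join(unexpected))

    if errors:
        return None, errors
    return {"action": action[0], "item": int(item[0])}, []
```

The point of rejecting unexpected and repeated parameters, not just bad values, is that the joker gets to send whatever he likes, not just what your forms generate.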
'Hiding' the backend in this way is a form of security and reliability through obscurity. Insert standard discussion of the relative merits and problems of that here, if desired.
One of the better summaries is Why Frames Suck (Most of the Time), one of Jakob Nielsen's Alertbox columns. He's revised his opinions a couple of times since the original (it was written in December of 1996), but still holds to them; check out his "Top Ten Mistakes" Revisited column, for example.
I strongly recommend his entire site, which is full of advice on various web design and usability issues. You may not agree with all of them (I'm not sure I agree with him about scrolling web pages), but I've found the issues he raises all worth thinking about.
I think that a big part of what confuses people is that so much functionality overlaps between the window manager people and the desktop environment people these days. The days when window managers did very little and what they did was obvious and easy to describe are long over.
For example, compare the set of modules that fvwm2 comes with with the set of functionality that KDE or Gnome provides. Fvwm2 has a pager to flip among virtual screens, something to keep lists of windows, and even something akin to the actual panel itself. (Possibly everything except the panel is actually done by kwm or Enlightenment, which sort of goes to prove that it's hard to tell.) My impression is that other window managers have about as many features (if not more) as fvwm2 does, and so come with as many add-on modules.
And GNU made a bad mistake in that: the typical info document is a crummy replacement for a decent manual page, much less a decent manual page in a good manual page reader such as TkMan.
Man pages are designed to be concise reference summaries. Info documents and most HTML documentation I've seen make an excellent tutorial, and sometimes a good in-depth reference, but they almost invariably suck at being a concise reference for anything. When they don't, it's because the people writing that section followed the style of manual page writing, just in something besides roff -man macros.
And doing format conversion on texinfo, SGML, or HTML documentation isn't the answer. The formatting doesn't matter (TkMan and the xntp3 documentation demonstrate that), what matters is whether the semantic content is there in some extractable form. And for most documents, it's just not. And unfortunately most writers of documentation either don't realize this or don't care.
| Gee Linux is not posix compliant, [...]
I'm pretty sure that POSIX compliance is a goal for the kernel, for GNU Libc, and for most if not all of the user-level utilities that POSIX specifies. Most of them are quite close, too. Non-POSIX-compliance tends, I believe, to only happen when the POSIX way is held to be vastly stupid.
| 1) man is standard for documentation only on Unix Systems
| 1b) Linux != Unix, its merely Unixlike
Arguing that Linux isn't Unix is IMHO using a marketing definition of Unix that plays into the hands of foes of Unix, who would love to exploit the resulting market fragmentation.
Unix is many things, from a trademark to a culture and a way. In the ways that matter I maintain that Linux is Unix.
Although Linux (really, a Linux distribution) doesn't currently have the right to use the Unix trademark, and although the Linux kernel is not descended from a kernel written in Bell Labs, it does have the Unix culture and the Unix way. As a long-time Unix user, I can say that Linux and Linux distributions are Unix in all the ways that really matter in practice, from either a user's or a system administrator's perspective (and more like Unix than some, AIX being the popular target).
Linux is a Unix. It is no more strange, no more different, no more counterintuitive than any of the various other Unixes I've used. And it's a lot better than some of them.
I think that there are definite uses for KDE and GNOME, especially in student lab environments. In part, I think it comes down to ease of use for novices and casual users versus experienced, constant users.
From my casual look at the KDE and GNOME environments, a lot of the ornateness is there to create obvious things to manipulate. This in turn made it easy for me to start manipulating KDE and GNOME, and presumably helps novices to do so too. Certainly one reason we chose the Redhat 5.1 AnotherLevel environment as a starting point for our current workstations was its similarities to Windows, which we could assume was familiar to the users from elsewhere.
Whether I think it's a good idea or not, I'm pretty certain that there are a fair number of students here who consider the lab computers as just complex tools. They don't want to have to set up a carefully customized environment just to use them. They're willing to accept clutter and flashiness in exchange for ease of use.
I don't like the clutter of KDE and GNOME, but I'm not a typical user of our labs. I've spent years slowly tuning my (now) fvwm2-based minimalistic environment (originally twm based, then tvtwm, then I got tired of tvtwm's problems) until it works just as I want it to right now and only has the features and decorations I actually need in practice. Most users just aren't going to use the computers that intensely or care that much about it.
Possibly there is a great, non-chromed, minimalistic X11 environment that is still easy and obvious for novices and casual users to use. If there is, I would deeply love a pointer to it. Until then, it's likely that soon our lab workstations will run either KDE or GNOME, because our major target population is casual novices and either environment seems easy for them to use.
| I wonder why people still use the console too.
Because the XDM way of logging into systems is a fundamental violation of the Unix way of initializing your environment, and it shows. XDM discards the entire concept of your login shell, and with it the entire concept that you can initialize your environment once and use it thereafter. (You should see the hacks that some vendors, like SGI, use to try to weasel their way around this failing (for SGI, scope out the userenv manpage sometime).)
You can kludge around this. But all the approaches are kludges, with all that implies: they are not general, and they are not necessarily supported in the latest fancy magic thing to come along. And you will go to extra work if you want to have things work seamlessly on non-XDM logins too. There is nothing with the simple and straightforward elegance of the login shell concept.