How can you run UNIX for ~150,000 users?
OldBen asks:
"How do you handle email (or other services for that matter)
for hundreds of thousands of users? Obviously, you can't
just use a standard UNIX password file (or can you)?"
Actually, you COULD use a password file...but that would be
painful. There are ways to make unix use a database as a
replacement for the password file. However I'm sure there
are other ways of doing this. Are there any methods that would
allow you to split user resources over a pool of boxes
rather than just one huge monster box?
What say you all?
I've yet to set anything like that up on my Linux box (still learning), but one thing I'd probably do is split things between multiple servers. With that many users you'll have people wasting half an hour just trying to find a username that's not already in use. Say you're setting up a system for a big university: I'd probably set up math.college.edu, english.college.edu, cs.college.edu, business.college.edu, etc., both to spread the load and to make it simpler to get in touch with someone. A similar system let me easily track down people's email addresses in school. When my CS TA had a common name, I just checked for her name on the CS computers, and luckily there was only one of her in that department.
Don't know if that helps, but that's probably where I'd start. If it's not a good system I'd probably find out soon enough.
Posted by shuvam:
We are creating a system which we call MANUS
(Manager for Network of Unix-like Systems)
which should help manage such situations. The
basic issue of 16-bit UID's won't be addressed,
but at least MANUS will offer you a unified
database of users across a network of servers.
Using MANUS, you don't create a user on a Unix
server. You create a user for a network of
servers. And then you identify a location as
the home location. The list of locations and
users is kept in an RDBMS. Against each location
is defined a set of SMTP server, POP3 server,
LDAP server, NNTP server, Web proxy server, and
so on. All email meant for that user automatically
gets redirected to the POP3 server of that
location. Thus, the sysadm doesn't have to
juggle aliases files, or separate passwd files,
and so on. Moreover, all users get emailaddrs
of the form user@yourcompany.com, irrespective
of the location or server on which his mailbox
resides.
This system is currently handling a network of
four locations with about 1300 email addresses.
It is written in Perl5. We hope to release it
under GPL "soon". There is no documentation
available for it on the Net yet.
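A rough sketch of the lookup described above, with Python dicts standing in for the RDBMS tables. This is not MANUS code (which is Perl5 and unreleased); the location names, hostnames, users and the function name are all made up for illustration.

```python
# Hypothetical stand-ins for the RDBMS tables: one row per location,
# and one home location per user. None of these names come from MANUS.
LOCATIONS = {
    "site_a": {"smtp": "smtp.site-a.example.com", "pop3": "pop.site-a.example.com"},
    "site_b": {"smtp": "smtp.site-b.example.com", "pop3": "pop.site-b.example.com"},
}
USERS = {"alice": "site_a", "bob": "site_b"}
DOMAIN = "yourcompany.com"


def pop3_server_for(address):
    """Return the POP3 host of the user's home location, or None if unknown."""
    local, _, domain = address.partition("@")
    if domain.lower() != DOMAIN:
        return None
    home = USERS.get(local.lower())
    if home is None:
        return None
    return LOCATIONS[home]["pop3"]


if __name__ == "__main__":
    print(pop3_server_for("alice@yourcompany.com"))
```

The point is that the address stays user@yourcompany.com wherever the mailbox lives; only the table entry changes when a user moves.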
You'll run into a problem with UIDs on most unices (including Linux). On most unices the UID field is 16 bits wide. This means about 64k users. However, it might be signed so it would be 32k (can't remember off-hand). There was a discussion about this in linux-kernel a month or so ago. 2.3 will supposedly feature much larger UIDs. However, this doesn't help you. Look into something like AIX (I think they have a larger UID) or Digital Unix (they might also). You could probably then utilize something like NIS+ or similar for password distribution. Or Kerberos (that's what BU uses).
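The arithmetic behind those numbers, as a small Python sketch. The function name and the 150,000 target (taken from the thread title) are just for illustration.

```python
def max_ids(bits, signed=False):
    """Count of distinct non-negative IDs that fit in a field of `bits` bits."""
    return 2 ** (bits - 1) if signed else 2 ** bits


TARGET_USERS = 150_000  # the figure from the thread title

if __name__ == "__main__":
    for bits, signed in [(16, True), (16, False), (32, False)]:
        n = max_ids(bits, signed)
        kind = "signed" if signed else "unsigned"
        verdict = "enough" if n >= TARGET_USERS else "too small"
        print(f"{bits}-bit {kind}: {n} IDs, {verdict} for {TARGET_USERS} users")
```

Either way, a 16-bit field falls short of 150,000 distinct UIDs, while a 32-bit field has room to spare.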
Here we have around 30,000 email addresses and approximately 70 sites nationwide, each with a Sun Ultra I, plus an Enterprise 10,000 in New York. The LDAP database runs on the Enterprise server, and we use the Netscape Mail servers. For the most part, there have been very few problems. I imagine this would easily scale up to 150k+ users.
I by NO means am an LDAP guru. But here's my basic understanding of how it works. You have one server on a site. blahsv1.sitename.com. Everyone has me@sitename.com as their email address. LDAP gets the incoming mail and knows that it goes to blahsv1.sitename.com. So really, there is no reason that 150k users would be an issue.
Now...if you needed real logins on machines, we may have an issue...
The NSS modules have been 'rpm'ed and the system is designed around PAM, although I'm not sure of the necessity of this for basic authentication. I think it's to do with password modification.
There has been a little discussion on the openldap and rage.net lists.
I hope to have a go testing some of this soon... (albeit on a v. small scale!).
There is an RFC specifically on dealing with passwords (among many other things) in LDAP, and there are utils to convert passwd files to LDAP.
- And this is what I do for fun??
Hong Kong Linux Center home of squidblock, and other cool stuff
You might be able to use LDAP, along with a GDBM database for your authentication. I know there are client patches/modules for Apache, QPopper and other programs that need authentication. This way, you don't use the password file for your email services. The drawback is, users won't be able to telnet in (unless you can patch ``login'' to use LDAP as well).
You'll need to apply a patch to Ext2fs in order to get 32-bit UID/GID's on Linux. With the patch, the extra UID/GID bits are stored in unused/reserved parts of the inode. A patch like I describe was posted recently on linux-kernel.
After applying the patch, you may need to recompile glibc in order for chown, chgrp, etc to work with the bigger UID's. Not sure. Contact the author for details. At any rate, 32-bit UID's will be "official" in 2.3 and 2.4 kernels.
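To illustrate the idea of keeping the extra UID bits in a separate field, here is a Python sketch of the bit arithmetic only. It assumes the original 16-bit field holds the low half and a spare 16-bit field holds the high half; the actual layout used by the patch is not shown in this thread, so treat the split as an assumption.

```python
def split_uid(uid):
    """Split a 32-bit UID into (low 16 bits, high 16 bits)."""
    if not 0 <= uid < 2 ** 32:
        raise ValueError("uid does not fit in 32 bits")
    return uid & 0xFFFF, uid >> 16


def join_uid(low, high):
    """Rebuild the 32-bit UID from its two 16-bit halves."""
    return (high << 16) | low


if __name__ == "__main__":
    low, high = split_uid(150000)
    print(low, high, join_uid(low, high))
```

With the high half at zero, a UID below 65536 is stored exactly as it would be in the old 16-bit field.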
Um, doesn't Red Hat support the development of PAM on Linux? Doesn't PAM allow you to have an SQL-based password structure with one of the modules?
:)
Just a fleeting thought
-dieman
-- dieman - Scott Dier
[root@defiant /etc]# uname -rs
Linux 2.0.36
[root@defiant /etc]# useradd -u 150000 biguid
[root@defiant /etc]# grep biguid /etc/passwd
biguid:!!:150000:150000::/home/biguid:/bin/bash
I don't think it has any restrictions on anything, and it's easier to setup. It's also very reliable. My network has been running for over a month now.
Gosh, no restrictions on anything? That's amazing. Service Pack 4 must have been a real humdinger, to change it from all those limitations Microsoft teaches in their official course materials.
If you're going to lie, at least be creative enough to make it plausible, dumbass.
On many of the modern Unix variants, /etc/passwd is only a textual representation of a database file which holds the real user information. getpw*(3) uses this database file to access passwd data. This makes things way faster than they used to be, for example, on SunOS4, where ls(1) was written so stupidly that it scanned the (sequential) passwd file for every single uid lookup it needed to make. Typing "ls -l /home" on a SunOS system with like a thousand registered users was an invitation to get ahold of some (some!) coffee.
Speaking of today, FreeBSD uses a DB database to store passwd information (in fact, it has two databases, one with and one without passwords, for "security"). This speeds up lookups quite a lot, but beware: The DB files are still generated from text files, so adding users with such huge user databases is a real pain.
The question is whether you actually want to create that many Unix user accounts. For mail servers, you can often get away better with creating mail accounts only. This requires some hackery with your friendly MTA (postfix, qmail, sendmail, exim or even smail), but it is quite doable and also has positive security side-effects.
Look into Cyrus imapd if you need a message store implementation which is able to handle mailboxes for users who don't have a unix login. Beware, Cyrus comes with a pretty tcl-based administration interface which you almost certainly want to replace with a bunch of home-grown perl scripts to automate administration.
On many of the modern Unix variants, /etc/passwd is only a textual representation of a database file which holds the real user information. The getpw*(3) routines use this database file to access passwd data. This makes things way faster than they used to be, for example, on SunOS4, where ls(1) was written so stupidly that it scanned the (sequential) passwd file for every single uid lookup it needed to make. Type "ls -l /home" on a SunOS system with like a thousand registered users, sit back and relax.
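The difference between the two lookup styles can be sketched in a few lines of Python. The sample passwd text and both function names are invented for illustration; a dict stands in for the on-disk database, and this is not how any particular libc implements getpw*(3).

```python
# Made-up passwd entries in the usual colon-separated format.
SAMPLE_PASSWD = """\
root:*:0:0:Charlie Root:/root:/bin/sh
alice:*:1001:1001:Alice:/home/alice:/bin/sh
bob:*:1002:1002:Bob:/home/bob:/bin/sh
"""


def scan_for_uid(passwd_text, uid):
    """Sequential lookup: walk the text from the top for every query."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if int(fields[2]) == uid:
            return fields[0]
    return None


def build_uid_index(passwd_text):
    """Parse the text once; later lookups are single hash-table probes."""
    index = {}
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # Keep the first entry for a uid, matching the scan's behaviour.
        index.setdefault(int(fields[2]), fields[0])
    return index


if __name__ == "__main__":
    index = build_uid_index(SAMPLE_PASSWD)
    print(scan_for_uid(SAMPLE_PASSWD, 1002), index.get(1002))
```

Listing a directory with n files against a passwd file of m users costs on the order of n times m line comparisons with the scan, versus one pass over the file plus n probes with the index.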
Speaking of today: FreeBSD, for example, uses a Berkeley DB database to store passwd information. In fact, it uses two databases, one with and one without passwords, for "security". This speeds up lookups quite a lot, but beware: The DB files are still generated from text files, so adding users with huge user databases is a lengthy process.
The question is whether you actually want to create that many Unix user accounts. For mail servers, you can often get away better with creating mail accounts only. This requires some hackery with your friendly MTA (postfix or qmail), but it is quite doable and also has positive security side-effects.
Look into Cyrus imapd if you need a message store implementation which is able to handle mailboxes for users who don't have a unix login. Beware, Cyrus comes with a ugly^H^H^H^Hpretty tcl-based administration interface which you can replace with a bunch of home-grown perl scripts to automate administration. Cyrus makes it fairly easy to integrate your own authentication mechanisms through a separate process, although the performance of such a mechanism would have to be determined.
In a nutshell: Unix in itself is not prepared to handle very large user populations. If you need to serve a lot of users with shell accounts, look into NIS+ or Kerberos and distribute the load onto a bunch of machines served by central (and well-hardened) user-database-servers. If you need to support only mail, you might be well off with one fast machine and a special purpose mailer configuration.
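One simple way to spread accounts over a pool of machines is to hash the username and use the result to pick a host. Nobody in this thread describes doing it this way; it is only a sketch of one option, and the hostnames and function name are hypothetical.

```python
import hashlib

# Hypothetical pool of mail hosts.
MAIL_HOSTS = ["mail1.example.com", "mail2.example.com", "mail3.example.com"]


def host_for(username, hosts=MAIL_HOSTS):
    """Map a username to one host in the pool, the same way on every call."""
    digest = hashlib.sha1(username.lower().encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(hosts)
    return hosts[bucket]


if __name__ == "__main__":
    for name in ["alice", "bob", "carol"]:
        print(name, "->", host_for(name))
```

The catch with a plain modulo is that adding or removing a host reassigns most users, so a central table of user-to-host assignments is easier to live with if the pool changes often.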
leprechaun 339_> uname -rs
FreeBSD 4.0-CURRENT
leprechaun 340_> grep uid_t /usr/include/sys/types.h
typedef u_int32_t uid_t; /* user id */
HP-UX supports large uids, clustering, passwd db's.
Linux is free. Ya takes your choice and writes the check.
Peter.