How can you run UNIX for ~150,000 users?
OldBen asks:
"How do you handle email (or other services for that matter)
for hundreds of thousands of users? Obviously, you can't
just use a standard UNIX password file (or can you)?"
Actually, you COULD use a password file...but that would be
painful. There are ways to make UNIX use a database as a
replacement for the password file. However, I'm sure there
are other ways of doing this. Are there any methods that would
allow you to split user resources over a pool of boxes
rather than just one huge monster box?
What say you all?
Look into using Hesiod (I think that was the name).
We used it at school for login services along with Kerberos for authentication in an AFS environment.
There were about 30,000 users there.
Your best bet is to contact the IS department of MIT and Carnegie Mellon University. Both have very reasonable (and very similar) ways of dealing with large numbers of users. Both use a combination of AFS/Kerberos/Moira. MIT's Athena has over 15,000 people right now, and easily scales much higher. CMU's is supposed to be even better (MIT developed Athena, and scaled back funding to IS; CMU is still developing stuff, so they're probably more up-to-date).
You should also read through USENIX LISA proceedings; they deal precisely with this problem.
- pmitros at nospam mit.edu
I've admin'd a system with between 5,000 and 8,000 users logged into shell accounts at a time, and right around 80k users total. This was a 24-processor E10K domain, and it didn't seem to have any trouble with it.
So long as you have the hardware to handle stuff, Solaris shouldn't have any trouble on the software side.
OpenBSD runs ~13,142 shell accounts here. It runs stably for us and nobody has broken in yet 8-D ... ;)
The OpenBSD project depends on donations, donated hardware and CD sales. The project developers can expect some money and hardware from my company in the next couple of years. I even bought a CD for myself.
Or try NetBSD.
One possibility may be to spread your user accounts across multiple servers and have your front-end mail transfer agent hold just an alias for each user. Then, when mail comes in for (say) Joe.Bloggs@your.domain, it would be forwarded to bloggsj@machine10.your.domain. This could be fun to administer, though.
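A minimal sketch of that alias idea in Python, assuming one flat table on the front-end MTA that maps each public local part to an internal mailbox host. The table contents, hostnames and the `route` function are made up for illustration; with sendmail the same mapping would normally live in the aliases file as lines like `joe.bloggs: bloggsj@machine10.your.domain`.

```python
# Hypothetical front-end alias table: public local part -> internal mailbox.
ALIASES = {
    "joe.bloggs": "bloggsj@machine10.your.domain",
    "jane.doe": "doej@machine03.your.domain",
}
PUBLIC_DOMAIN = "your.domain"


def route(recipient: str) -> str:
    """Return the internal address that mail for `recipient` is forwarded to."""
    local, _, domain = recipient.partition("@")
    if domain.lower() != PUBLIC_DOMAIN:
        raise ValueError(f"not our domain: {recipient}")
    # Case-insensitive match on the local part (a simplification).
    target = ALIASES.get(local.lower())
    if target is None:
        raise LookupError(f"no such user: {recipient}")
    return target


if __name__ == "__main__":
    print(route("Joe.Bloggs@your.domain"))
```

Moving a user to another box then means changing one table entry (plus copying the mailbox over), which is where the administration fun comes in.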
The uni I went to allowed all of its users a UNIX login account to read their mail with. I don't know whether or not it had 150,000 users but it must have had quite a few thousand. Incoming telnet sessions were load balanced onto a few machines to handle the load. Presumably the mail spool and home directories were made available over something like NFS.
Does Linux use a database for passwd entries rather than the plain text file? Could something be done with the pluggable authentication modules (PAM)?
- Richard.
Well, you will need to set up a pool of machines, based on the number N of people who will be using the services simultaneously...
:)
This works fine for logging in, and you can use NFS, autofs, or (if you have money) AFS for files... Unfortunately, the IMAP and POP protocols don't have a "discover-which-server-I-should-contact" feature built in.
Regarding Linux and UIDs, I am working on 32-bit UIDs for Linux, check out my page at
http://www-personal.umich.edu/~wingc/uid32
I have a working patch for 2.0.35 (that will be updated to 2.0.36 soon) which does indeed let you store 32-bit uids on an ext2 file system, and use >65536 accounts. It doesn't require that you recompile any programs (provided you are using glibc).
Basically, what needs some work are the user-mode utilities in Linux; not everything can handle a 110,000-user password file quite right yet.
Oh by the way, I wouldn't run a production system with these patches yet... they haven't crashed my machine but they really are "in development"...
-chris wing
wingc@umich.edu
We had a similar desire in a company I contracted for. We wanted to support many email accounts for smart phones.
User registration was done in an Informix database. Rather than duplicating that information into password files, we changed the sendmail configuration and the qpopper code to authenticate against the database. This also let us run virtual domains without worrying about unique UIDs/usernames across them. For sendmail we wrote a new "delivery agent".
I am sure this approach could be adapted to use MySQL or PostgreSQL.
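Roughly what that looks like, as a sketch in Python with the standard sqlite3 module standing in for Informix, MySQL or PostgreSQL. The table layout, the PBKDF2 password hashing and the function names are assumptions for illustration, not the poster's actual schema. Keying the table on (domain, username) is what lets the same username exist in several virtual domains without any Unix UID behind it.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # PBKDF2 work factor (an assumption for this sketch)


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def create_schema(db: sqlite3.Connection) -> None:
    # Hypothetical table: one row per mailbox, unique per (domain, username).
    db.execute(
        """CREATE TABLE IF NOT EXISTS mail_users (
               domain   TEXT NOT NULL,
               username TEXT NOT NULL,
               salt     BLOB NOT NULL,
               pwhash   BLOB NOT NULL,
               PRIMARY KEY (domain, username))"""
    )


def add_user(db: sqlite3.Connection, domain: str, username: str, password: str) -> None:
    salt = os.urandom(16)
    db.execute(
        "INSERT INTO mail_users VALUES (?, ?, ?, ?)",
        (domain, username, salt, hash_password(password, salt)),
    )


def authenticate(db: sqlite3.Connection, domain: str, username: str, password: str) -> bool:
    """What a patched POP daemon would call instead of reading /etc/passwd."""
    row = db.execute(
        "SELECT salt, pwhash FROM mail_users WHERE domain = ? AND username = ?",
        (domain, username),
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    # Constant-time comparison of the stored and freshly computed hashes.
    return hmac.compare_digest(hash_password(password, salt), stored)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    add_user(db, "example.com", "joe", "s3cret")
    add_user(db, "example.org", "joe", "different")  # same name, other domain
    print(authenticate(db, "example.com", "joe", "s3cret"))
    print(authenticate(db, "example.org", "joe", "s3cret"))
```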
You (the original poster) may email me if you would like more details: mossc@mnsinc.xxx.nospam.com
(Remove the xs and nospam for the real address.)
cmoss
I don't think it has any restrictions on anything, and it's easier to setup. It's also very reliable. My network has been running for over a month now.
Well, the first thing that comes to mind is the number of zeros you'd have to spend on 150,000 Exchange license fees.
;)
Like $10,000,000,000??
And Win NT DOES have a limit on the number of users in a standalone server or a domain.
M$ recommends 40,000 users tops per domain, and the recommended hardware is a P166 with 256 MB RAM (there's a formula that I can't remember right now... eheheh). And yes, this "recommendation" is... hmmmm.... not a machine that can handle it.
And Exchange supports "sites" that can even share the address space (great for admins: a single MX can forward email to all the other machines).
Anyway, I don't think you can overcome the 40,000 limit (why do you think hotmail.com is _still_ running FreeBSD + Solaris + Apache + qmail?)
RaTao von J
This authentication protocol: you can store users in a text file and build a database out of it, and there are modules for /bin/login or whatever to verify against it instead of /etc/passwd. Not sure about mail or anything, though, but it wouldn't be hard to hack support into it.
I recall some info on NT that has to do with this problem... Apparently, there are MAJOR limitations on the amount of information you are able to store in the NT registry. For one, there is a limit on the maximum size of the registry file itself, along with several other MS guidelines, such as the maximum amount of data contained in either the tree itself or its children. I'm sure having 100,000 users (or so) would exceed all of these. =) Oh, and as for NT being reliable enough to handle the job, you'd better not put it on the 'net. It's got more security holes than Swiss cheese.
Offhand, the only place I think you'd need to run a server with 150,000 users is an ISP. And as an admin for an ISP with just over 150,000 users, I must say that if you're thinking of running a Linux box for this, you're insane.
We have a cluster of 8 Alpha 4100s, and they handle just our email side of things... They end up handling nearly a million POP connections and delivering around 800,000 messages through SMTP per day. And no matter what database/password authentication style you use, it can get pretty ugly authenticating people for all those POP logins. Don't flame me when I tell you that the cluster is running VMS. It handles everything quite nicely.
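To put those daily figures in perspective, here is a quick back-of-the-envelope conversion to per-second rates. The daily totals are the ones quoted above; the peak factor is a purely hypothetical assumption, since real traffic bunches up at certain hours of the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average rate per second for a daily total."""
    return per_day / SECONDS_PER_DAY


POP_LOGINS_PER_DAY = 1_000_000  # "nearly a million POP connections"
SMTP_MSGS_PER_DAY = 800_000     # "around 800,000 messages"
PEAK_FACTOR = 3                 # hypothetical: busiest hour vs. daily average

if __name__ == "__main__":
    avg_pop = per_second(POP_LOGINS_PER_DAY)
    print(f"POP authentications: {avg_pop:.1f}/s average, "
          f"~{avg_pop * PEAK_FACTOR:.0f}/s at an assumed {PEAK_FACTOR}x peak")
    print(f"SMTP deliveries: {per_second(SMTP_MSGS_PER_DAY):.1f}/s average")
```

That is roughly 11.6 password checks per second around the clock before any peaks, so each lookup has to be cheap; rescanning a 150,000-line flat file for every login would not be.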
We do use linux for some of our smaller boxes, but I personally know of no reliable configuration of linux which could even come close to handling the kind of load 150,000 users require. Sure, you could go with a linux cluster, but do you really think that they are reliable enough to handle a production environment like ours? Maybe in another three or four years... but not now.
Perhaps for different applications Linux could handle it, but in the ISP business, it just couldn't cut it IMHO.
Sun has a book on tuning Sun systems, and it mentions 150,000 users, and even 300,000 users, accessing a database. I am sure that Sun can handle this many users, and also deal with using multiple Sun boxes. Here is why:
HOTMAIL is probably one of the largest email systems (other than AOL). Both AOL and Hotmail use Unix for their servers. AOL uses its own modified Unix, and I am told some Linux boxes (don't quote me on that). Hotmail tried to convert its servers to NT and it failed, as NT could not handle the 10,000,000 users (I think that is the right number of zeros).
An 8-character userid field gives you 99,999,999+ possible ids even if you use only digits, which is well over your 150,000, and a 16-character userid gives you 9,999,999,999,999,999+ variations. I think you would need a database to do this, and Oracle and Sun work pretty well with each other from what I am told (I have seen them in action, and they performed really quite well when tuned properly).
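Checking that arithmetic: the number of distinct ids is the alphabet size raised to the id length. The sketch below assumes fixed-length ids over either digits only or lowercase letters plus digits (both assumptions for illustration). Keep in mind this counts login names; the numeric UID stored by the kernel and filesystem is a separate and, on many systems, much smaller limit.

```python
def id_space(length: int, alphabet_size: int) -> int:
    """Number of distinct fixed-length ids over an alphabet of the given size."""
    return alphabet_size ** length


if __name__ == "__main__":
    cases = [(8, 10, "digits"), (8, 36, "a-z0-9"), (16, 10, "digits")]
    for length, alphabet, label in cases:
        print(f"{length} chars, {label}: {id_space(length, alphabet):,}")
```

Eight digits give 100,000,000 ids (00000000 through 99,999,999), which matches the figure above; allowing letters as well pushes that to about 2.8 trillion.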
So to answer your question, I'd refer you to Sun tech/customer support; they may be able to tell you how, as I am sure they have handled that volume before.
If you buy Sun you get pretty good tech support, but Oracle is lacking in that area: you really must pay for support if you need help with Oracle, and that is not just on Linux, but on any platform.
I've yet to set anything like that up on my linux box (still learning), but one thing I'd probably do is split things between multiple servers. With that many users you'll have to deal with people wasting half an hour just trying to find a username that's not already in use. Let's say you're setting up a system for a big university (as an example), I'd probably set up a math.college.edu, english.college.edu, cs.college.edu, business.college.edu, etc. to both spread the load and to simplify getting in touch with someone. I know that a similar system allowed me to easily track down people's email addresses in school. When my CS TA had a common name, I just checked for her name on the CS computers and there was luckily only one of her in that department.
Don't know if that helps, but that's probably where I'd start. If it's not a good system I'd probably find out soon enough.
Posted by shuvam:
We are creating a system which we call MANUS
(Manager for Network of Unix-like Systems)
which should help manage such situations. The
basic issue of 16-bit UIDs won't be addressed,
but at least MANUS will offer you a unified
database of users across a network of servers.
Using MANUS, you don't create a user on a Unix
server. You create a user for a network of
servers. And then you identify a location as
the home location. The list of locations and
users is kept in an RDBMS. Against each location
is defined a set of SMTP server, POP3 server,
LDAP server, NNTP server, Web proxy server, and
so on. All email meant for that user automatically
gets redirected to the POP3 server of that
location. Thus, the sysadmin doesn't have to
juggle aliases files, or separate passwd files,
and so on. Moreover, all users get email addresses
of the form user@yourcompany.com, irrespective
of the location or server on which his mailbox
resides.
This system is currently handling a network of
four locations with about 1300 email addresses.
It is written in Perl5. We hope to release it
under GPL "soon". There is no documentation
available for it on the Net yet.
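The core lookup described above (user, then home location, then that location's POP3 server) can be sketched with an SQL join. This is not MANUS code, which isn't published; the table and column names, the `pop3_server_for` function and the hostnames are hypothetical, and Python's sqlite3 stands in for the RDBMS.

```python
import sqlite3

# Hypothetical schema: every user has a home location, and every location
# has its own set of servers.
SCHEMA = """
CREATE TABLE locations (
    name        TEXT PRIMARY KEY,
    smtp_server TEXT NOT NULL,
    pop3_server TEXT NOT NULL
);
CREATE TABLE users (
    username      TEXT PRIMARY KEY,
    home_location TEXT NOT NULL REFERENCES locations(name)
);
"""


def pop3_server_for(db: sqlite3.Connection, address: str, company_domain: str) -> str:
    """Map user@company_domain to the POP3 server at the user's home location."""
    user, _, domain = address.partition("@")
    if domain != company_domain:
        raise ValueError("not a company address")
    row = db.execute(
        """SELECT l.pop3_server FROM users u
           JOIN locations l ON l.name = u.home_location
           WHERE u.username = ?""",
        (user,),
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown user {user!r}")
    return row[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO locations VALUES (?, ?, ?)", [
        ("site-a", "smtp.site-a.yourcompany.com", "pop.site-a.yourcompany.com"),
        ("site-b", "smtp.site-b.yourcompany.com", "pop.site-b.yourcompany.com"),
    ])
    db.execute("INSERT INTO users VALUES ('asha', 'site-b')")
    print(pop3_server_for(db, "asha@yourcompany.com", "yourcompany.com"))
```

Moving a user to another location is then a single UPDATE of `home_location`, and the public user@yourcompany.com address does not change.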
You'll run into a problem with UIDs on most unices (including Linux). On most unices the UID field is 16 bits wide. This means about 64k users. However, it might be signed so it would be 32k (can't remember off-hand). There was a discussion about this in linux-kernel a month or so ago. 2.3 will supposedly feature much larger UIDs. However, this doesn't help you. Look into something like AIX (I think they have a larger UID) or Digital Unix (they might also). You could probably then utilize something like NIS+ or similar for password distribution. Or Kerberos (that's what BU uses).
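The numbers behind that: an n-bit unsigned field holds 2^n distinct values, a signed one only 2^(n-1) non-negative ones. The sketch below also shows what a UID like 150,000 turns into if something simply keeps its low 16 bits, which is one way oversized UIDs can silently collide with existing users (this assumes plain truncation; what a given system actually does with an out-of-range UID varies).

```python
def uid_count(bits: int, signed: bool = False) -> int:
    """How many distinct non-negative UIDs fit in a field of `bits` bits."""
    return 2 ** (bits - 1) if signed else 2 ** bits


def truncate_to_16_bits(uid: int) -> int:
    """What is left of `uid` if only the low 16 bits are kept."""
    return uid & 0xFFFF


if __name__ == "__main__":
    print(uid_count(16), uid_count(16, signed=True), uid_count(32))
    print(truncate_to_16_bits(150_000))
```

That prints 65536, 32768 and 4294967296, then 18928: with plain 16-bit truncation, user 150,000 would end up sharing UID 18928 with whoever already owns it.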
Here we have around 30,000 email addresses and approximately 70 sites nationwide, each with a Sun Ultra 1, plus an Enterprise 10000 in New York. The LDAP database runs on the Enterprise server, and we use the Netscape mail servers. For the most part, there have been very few problems. I imagine this would easily scale up to 150k+ users.
I am by NO means an LDAP guru, but here's my basic understanding of how it works. You have one server per site, e.g. blahsv1.sitename.com. Everyone has me@sitename.com as their email address. When mail comes in, the mail server looks the user up in LDAP and knows that it goes to blahsv1.sitename.com. So really, there is no reason that 150k users would be an issue.
Now...if you needed real logins on machines, we may have an issue...
The NSS modules have been packaged as RPMs, and the system is designed around PAM (although I'm not sure of the necessity of this for basic authentication; I think it has to do with password modification).
There has been a little discussion on the openldap and rage.net lists.
I hope to have a go at testing some of this soon... (albeit on a very small scale!).
There is an RFC specifically dealing with passwords (among many other things) in LDAP, and there are utilities to convert passwd files to LDAP.
- And this is what I do for fun??
Hong Kong Linux Center home of squidblock, and other cool stuff
You might be able to use LDAP, along with a GDBM database, for your authentication. I know there are client patches/modules for Apache, QPopper and other programs that need authentication. This way, you don't use the password file for your email services. The drawback is that users won't be able to telnet in (unless you can patch "login" to use LDAP as well).
You'll need to apply a patch to ext2fs in order to get 32-bit UIDs/GIDs on Linux. With the patch, the extra UID/GID bits are stored in unused/reserved parts of the inode. A patch like the one I describe was posted recently on linux-kernel.
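The bit arithmetic behind storing a 32-bit UID in two 16-bit slots is just a split into low and high halves. This is a sketch of the idea only: exactly which reserved inode fields the patch uses is up to the patch, and the function names here are made up.

```python
from typing import Tuple


def split_uid(uid: int) -> Tuple[int, int]:
    """Split a 32-bit UID into (low 16 bits, high 16 bits)."""
    if not 0 <= uid <= 0xFFFFFFFF:
        raise ValueError("UID does not fit in 32 bits")
    return uid & 0xFFFF, uid >> 16


def join_uid(low: int, high: int) -> int:
    """Reassemble a UID from its two 16-bit halves."""
    return (high << 16) | low


if __name__ == "__main__":
    low, high = split_uid(150_000)
    print(low, high)
    print(join_uid(low, high))
```

For UID 150,000 the halves are 18928 and 2. Code that only reads the old 16-bit field would see 18928, which is one reason to be careful about mixing patched and unpatched kernels or tools on the same filesystem.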
After applying the patch, you may need to recompile glibc in order for chown, chgrp, etc. to work with the bigger UIDs. Not sure; contact the author for details. At any rate, 32-bit UIDs will be "official" in the 2.3 and 2.4 kernels.
Um, doesn't Red Hat support the development of PAM in Linux? Doesn't PAM allow you to have an SQL-based password structure with one of the modules?
:)
Just a fleeting thought
-dieman
-- dieman - Scott Dier
[root@defiant /etc]# uname -rs
Linux 2.0.36
[root@defiant /etc]# useradd -u 150000 biguid
[root@defiant /etc]# grep biguid /etc/passwd
biguid:!!:150000:150000::/home/biguid:/bin/bash
I don't think it has any restrictions on anything, and it's easier to setup. It's also very reliable. My network has been running for over a month now.
Gosh, no restrictions on anything? That's amazing. Service Pack 4 must have been a real humdinger, to change it from all those limitations Microsoft teaches in their official course materials.
If you're going to lie, at least be creative enough to make it plausible, dumbass.
On many of the modern Unix variants, /etc/passwd is only a textual representation of a database file which holds the real user information. The getpw*(3) routines use this database file to access passwd data. This makes things way faster than they used to be, for example, on SunOS4, where ls(1) was written so stupidly that it scanned the (sequential) passwd file for every single uid lookup it needed to make. Type "ls -l /home" on a SunOS system with like a thousand registered users, sit back and relax.
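The difference between the SunOS4 behaviour and a database-backed getpw*(3) can be sketched in a few lines of Python: one function rescans the passwd text for every lookup, the other builds a dictionary once and then answers each lookup with a hash probe. For `ls -l` on a directory with n entries and a passwd file with N users, that is roughly n * N line comparisons versus N + n operations. The sample file and function names are illustrative.

```python
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/sh
joe:x:1001:100:Joe Bloggs:/home/joe:/bin/sh
ann:x:1002:100:Ann Example:/home/ann:/bin/sh
"""


def _entries(passwd_text):
    """Yield (name, uid) for each well-formed passwd line."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            yield fields[0], int(fields[2])


def scan_for_uid(passwd_text, uid):
    """SunOS4-style: read the file from the top on every lookup, O(N)."""
    for name, entry_uid in _entries(passwd_text):
        if entry_uid == uid:
            return name
    return None


def build_uid_index(passwd_text):
    """Database-style: parse once; first entry for a UID wins, as with the scan."""
    index = {}
    for name, uid in _entries(passwd_text):
        index.setdefault(uid, name)
    return index


if __name__ == "__main__":
    index = build_uid_index(SAMPLE_PASSWD)
    print(scan_for_uid(SAMPLE_PASSWD, 1002), index[1002])
```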
Speaking of today: FreeBSD, for example, uses a Berkeley DB database to store passwd information. In fact, it uses two databases, one with and one without passwords, for "security". This speeds up lookups quite a lot, but beware: The DB files are still generated from text files, so adding users with huge user databases is a lengthy process.
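A rough sketch of what such a generation step does, with plain Python dicts standing in for the Berkeley DB files (on FreeBSD the real tool is pwd_mkdb(8), which builds /etc/pwd.db and /etc/spwd.db from /etc/master.passwd). The record layout follows the master.passwd field order; blanking the password in the public copy (shown here as "*"), the fake hashes and the function name are simplifications for illustration. Note that it rebuilds everything from the full text file, which is why adding one user to a huge file takes a while.

```python
# master.passwd fields: name:password:uid:gid:class:change:expire:gecos:home:shell
# The password hashes below are fake placeholders.
SAMPLE_MASTER = """\
root:$1$salt$hashroot:0:0::0:0:Charlie &:/root:/bin/csh
joe:$1$salt$hashjoe:1001:100::0:0:Joe Bloggs:/home/joe:/bin/sh
"""


def build_databases(master_text):
    """Return (public_db, secure_db), each indexed by name and by UID."""
    public, secure = {}, {}
    for line in master_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        name, uid = fields[0], int(fields[2])
        secure_entry = tuple(fields)
        # Public copy: same record with the password hash blanked out.
        public_entry = (fields[0], "*") + tuple(fields[2:])
        for db, entry in ((secure, secure_entry), (public, public_entry)):
            db[("name", name)] = entry
            db[("uid", uid)] = entry
    return public, secure


if __name__ == "__main__":
    public, secure = build_databases(SAMPLE_MASTER)
    print(public[("uid", 1001)][:3])
    print(secure[("name", "joe")][1])
```

Only the secure copy needs restrictive permissions; everything else (ls, finger and friends) can read the public one.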
The question is whether you actually want to create that many Unix user accounts. For mail servers, you can often get away better with creating mail accounts only. This requires some hackery with your friendly MTA (postfix or qmail), but it is quite doable and also has positive security side-effects.
Look into Cyrus imapd if you need a message store implementation which is able to handle mailboxes for users who don't have a Unix login. Beware, Cyrus comes with an ugly^H^H^H^Hpretty Tcl-based administration interface which you can replace with a bunch of home-grown Perl scripts to automate administration. Cyrus makes it fairly easy to integrate your own authentication mechanisms through a separate process, although the performance of such a mechanism would have to be determined.
In a nutshell: Unix in itself is not prepared to handle very large user populations. If you need to serve a lot of users with shell accounts, look into NIS+ or Kerberos and distribute the load onto a bunch of machines served by central (and well-hardened) user-database-servers. If you need to support only mail, you might be well off with one fast machine and a special purpose mailer configuration.
leprechaun 339_> uname -rs
FreeBSD 4.0-CURRENT
leprechaun 340_> grep uid_t /usr/include/sys/types.h
typedef u_int32_t uid_t; /* user id */
HP-UX supports large UIDs, clustering, and passwd DBs.
Linux is free. Ya takes your choice and ya writes the check.
Peter.