Domain: netapp.com
Stories and comments across the archive that link to netapp.com.
Comments · 137
-
Have you looked at Network Appliance's products?
My company is currently using one of the NetApp filers for a data warehousing solution, and large file storage. Very scalable, very cool.
Check 'em out here. -
Re:And one to play on...
I totally agree... I needed experience with HPUX so badly that I bought my own HPUX server. (I already have a Sun and am looking at getting an RS/6000.)
If you can't give them an onsite loaner then get them remote access to one (i.e. a separate lab hanging off of a DSL line or something with secure shell access).
Network Appliance has a "walk in" lab here in Boulder that they let us "check out".
Nothing beats hands on experience, and you will build goodwill with the admin community.
It would also be advantageous to include a *cheap* training course with the product. With a bad market no one is spending extra money on training.
A deeply discounted training course would get you brownie points with the admins too.
My .02
-
Score +4?
Yes indeed, there are absolutely no NAS solutions out there that don't lock you into a Microsoft-centric solution.
How'd this get +4?
- A.P. -
Re:Good.
Nope; NetApp implements snapshots using copy-on-write, so they consume less disk space, take effectively no time to create, and are atomic with respect to filesystem operations (so there won't be any problems if you're accessing the filesystem while the snapshot is in progress). Check out their File System Design for an NFS File Server Appliance white paper for the technical details if you're interested.
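To make the copy-on-write point concrete, here is a minimal Python sketch. Everything in it (the `ToyVolume` class, its block map, the method names) is hypothetical and is not how WAFL is actually implemented; it only illustrates why a snapshot that copies pointers rather than data is cheap to take and is not disturbed by later writes.

```python
class ToyVolume:
    """Toy copy-on-write volume: a hypothetical illustration, not WAFL."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes (never overwritten in place)
        self.active = {}      # file name -> block id for the live filesystem
        self.snapshots = {}   # snapshot name -> frozen copy of the name->block map
        self._next_id = 0

    def write(self, name, data):
        # Always write to a fresh block; old blocks stay valid for snapshots.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.active[name] = block_id

    def snapshot(self, snap_name):
        # Copies only the map of pointers, so the cost does not grow with
        # the amount of file data stored.
        self.snapshots[snap_name] = dict(self.active)

    def read(self, name, snap_name=None):
        table = self.active if snap_name is None else self.snapshots[snap_name]
        return self.blocks[table[name]]


vol = ToyVolume()
vol.write("report.txt", b"version 1")
vol.snapshot("hourly.0")
vol.write("report.txt", b"version 2")
```

After the second write, the live view sees the new data while the snapshot still points at the old block, and only blocks that actually changed take extra space.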
-
transparent proxy dodging micro-HOWTO
I use this all the time to get around the proxy at my large, sketchy employer, which blocks "tasteless" and "subversive" sites like Salon; and also occasionally to get around the severely broken transparent caches used by my cable modem provider. Note that this requires a shell account outside the proxy.
$OBSCURE_PORT_1 = obscure port # on your local machine
$OBSCURE_PORT_2 = obscure port # on machine outside firewall
On the machine where you have the shell account, download and compile the ucspi-tcp package, and micro_proxy. Put the tcpserver and micro_proxy binaries in your $PATH; throw everything else away.
To run the proxy:
From your local machine,
ssh -C -L $OBSCURE_PORT_1:127.0.0.1:$OBSCURE_PORT_2 -l [username] machine.where.you.have.shell.account.co.va
(or if you use some fancy Windoze SSH client, forward $OBSCURE_PORT_1 on your local machine to $OBSCURE_PORT_2 on the remote machine)
Once logged in, run
tcpserver -DHlR 127.0.0.1 $OBSCURE_PORT_2 micro_proxy &
on the remote machine.
On your local machine, set your browser to use HTTP and HTTPS (IE)/SSL (Mozilla) proxies on host 127.0.0.1, port $OBSCURE_PORT_1
Surf to your heart's content.
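The steps above can be sketched as a small Python helper that assembles the two command lines from the port numbers. The function name, the port values, the user name, and the host `shell.example.org` are all made up for illustration; the flags are copied from the post as written, not independently checked against the ssh or tcpserver documentation.

```python
import shlex


def tunnel_commands(local_port, remote_port, user, host):
    """Return (ssh argv, tcpserver argv) using the flags quoted in the post."""
    ssh = ["ssh", "-C", "-L", f"{local_port}:127.0.0.1:{remote_port}", "-l", user, host]
    # Run on the remote side once logged in; the proxy listens on loopback only,
    # so it is reachable solely through the forwarded port.
    proxy = ["tcpserver", "-DHlR", "127.0.0.1", str(remote_port), "micro_proxy"]
    return ssh, proxy


# Hypothetical values for illustration only.
ssh_cmd, proxy_cmd = tunnel_commands(18080, 18081, "alice", "shell.example.org")
print(shlex.join(ssh_cmd))
print(shlex.join(proxy_cmd))
```

The browser then points at 127.0.0.1 on the local port (18080 in this made-up example), exactly as in the last step of the HOWTO.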
-
Re:snapshotting..
> NetApp has had this functionality available in their Filer appliances for a number of years - you can cd into a 'magic' .snapshot directory where hourly, daily, weekly, and monthly snapshots are kept.
In fact, we've had that since we first shipped our machines. There's a paper on our Web site that discusses how this works, File System Design for an NFS File Server Appliance.
However, although snapshot directories let you dredge up copies of files from snapshots in case you (or a program) screw up and trash them, that's not a convenient way to roll back the state of the entire file system.
We did implement that later (atop the same mechanism); see SnapMirror and SnapRestore: Advances in Snapshot Technology - SnapRestore(TM)(R)(LSMFT) is the "roll back an entire file system to a snapshot" feature. (At times, all this SnapStuff makes me want to SnapTheNeckOfMarketing, but so it goes....) That paper doesn't discuss technical details to the extent that the other paper does, but it should be possible from the earlier paper to figure out at least some of how you'd do it.
-
snapshotting..
Snapshotting is what you really want for something like this. NetApp has had this functionality available in their Filer appliances for a number of years - you can cd into a 'magic' .snapshot directory where hourly, daily, weekly, and monthly snapshots are kept.
FreeBSD 5.0-CURRENT includes preliminary snapshot support for ffs.
The Linux options aren't quite as good. The most promising new filesystem that could provide this functionality is tux2, where data is structured in a way that would make implementing this functionality fairly easy. There was a post explaining how it would work in the mail archives, but they seem to have disappeared.
There is a commercial option: MVD Snap. Their fileserver is Linux based, and the code for their snapfs filesystem was once available during beta testing. -
Re:San VS. Local Raid.
> Nobody ever got fired for buying EMC
Buy NetApp and get promoted instead =) -
Largest Oracle implementation on Linux
VA Linux provided a PDF last year while we were buying hardware showing the largest Oracle implementation in the world was on Linux.
I believe it was netapp
If I find it I shall post it. -
The Simple Solution
We have multiple instances (dev & prod) running on the same server using a Network Appliance through NFS. Two nice things about this are:
1) We can also mount the netapp on Windows.
2) It automatically takes disk snapshots, which are very easy to access (just cd .snapshot/hourly.0 or something). It keeps a few hourly, daily, weekly, etc.
I'm not trying to be a salesman, but we love our netapp. -
bastards were bound to do it eventually
Bastards were bound to do it eventually.
What are the chances that some sort of driver could be devised that would let MSWin machines share according to something that Samba would be compatible with and still be able to authenticate against AD or the domain AND still be fast? Knowing MS, that would be listed as a bad driver and excluded from being loaded into the machine.
Of course, for those of us using NetApps, Snap Appliances, and Maxtor MaxAttach units, the technology is updated with any patches the companies release. If they do this, they're going to alienate a lot of companies that do Samba-like products. But life will find a way around this, too.
-
Re:Bah...
That's what he meant. But Real Men use RAID 5. Real Men's Heroes just NFS mount a NetApp Filer over GigaBit Fibre.
Carl G. Jung
-- -
Re:patent on networks
I suspect that statement of the patent was more a function of the writer's inability to explain technical things to a lay reader than an indication that the patent was lame.
What struck me as odd was the blurb at the end of the article, where they talk about adding SAN and wireless 'soon'. How the hell are they accessing and storing 6TB of data without a SAN or NAS in place? Considering that data availability is worth $136,000 an hour to them, one would think they'd have already put a premium on high-availability storage systems.
-
Re:How about directory lookups?
-
How about directory lookups?
One advantage that ReiserFS and XFS are supposed to hold over ext2fs and other ufs based filesystems is the directory lookup time on directories with moderate to moderately large numbers of files (1 million to 10 million or so). Does anybody know of any benchmarks available on the net that can back up this claim? If you want to test it yourself, you can look into Postmark, which is easy to compile and simulates a heavily loaded mail or news server.
Unfortunately the primary site appears to be down (I just downloaded the file a couple of days ago!), but if it comes back the primary distribution site is: http://www.netapp.com/ftp/postmark-1_13.c
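If Postmark is unavailable, a crude stand-in for the directory-lookup part of the question can be sketched in Python. This is not Postmark and does not simulate a mail or news workload; the function name and the file-naming scheme are made up, and the result reflects whatever filesystem backs the temporary directory, caches included.

```python
import os
import tempfile
import time


def time_lookups(num_files, num_lookups=1000):
    """Create num_files empty files in one directory and time stat() calls.

    Returns the average seconds per lookup on the filesystem that backs
    the temporary directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(num_files):
            # Zero-length files: we care about directory entries, not data.
            open(os.path.join(tmp, f"f{i:08d}"), "w").close()

        # Sample names spread evenly across the directory.
        step = max(1, num_files // num_lookups)
        names = [os.path.join(tmp, f"f{i:08d}") for i in range(0, num_files, step)]

        start = time.perf_counter()
        for path in names:
            os.stat(path)
        elapsed = time.perf_counter() - start
        return elapsed / len(names)


avg = time_lookups(2000, num_lookups=200)
print(f"average lookup: {avg * 1e6:.1f} microseconds")
```

To compare filesystems, run it with the temporary directory on each one (for instance by setting TMPDIR) and raise num_files toward the sizes mentioned above; expect the file-creation phase to dominate the run time at a million entries.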
Down that path lies madness. On the other hand, the road to hell is paved with melting snowballs. -
Re:Brittish Boston Party?
You do realise that we pay through the nose for telephone calls over here already?
Thinking about the cost per megabyte and past proposals for an "Information Tax"... I wouldn't say that it's all that far fetched...
300Gigs is a small disk array in business terms.
Here's a product sheet on a 6-Terabyte Filer well within the capacity of being bought by the Government and being installed in every local telephone exchange.
Data communications can be compressed and stored on these; analogue (voice) calls could be parsed through voice recognition systems and also compressed. Hell, when they run out of space they'll start dumping the old stuff to tape. (Ahem... Sun's 11-Terabyte solution). If these types of solutions are available commercially just think what the governments of the developed world will have available to them. The two products I just specced out would fit in a rather small datacentre.
Put this configuration in each telephone exchange and keeping records of all calls is just a matter of buying the tapes!!!
On another note, we're not, in general, as concerned with privacy here in the UK as much as you guys are in the US. We've had thousands of Closed Circuit cameras installed throughout our streets since the '80s (What with IRA bombing campaigns etc...) and for many people, especially women, it has instilled security for the general public as opposed to fear. Are we mis-guided? I'm not saying that I agree that my telephone conversations can be recorded, but if they're just going to be archived to tape then it doesn't bother me extremely. Hell, I would think that they are just as likely to protect me as they might incriminate me.
GC -
Snapshots on Network Appliance filers
I work there, so I'm biased, but I think the best alternative to floppy backups is the snapshot feature built into the filesystem on the filers we make at Network Appliance. The filesystem periodically saves its state so that you can retrieve old versions of your files simply by peering into the special ".snapshot" directory (called "~snapshot" if you're using Windows CIFS network drives instead of Unix NFS.) So if you accidentally mung a file, you can just fetch it out of one of the hourly or nightly snapshots. You wouldn't believe how many times this has saved our asses in engineering.
:-)
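Fetching a file back out of a snapshot can be sketched in Python as below. The layout is an assumption taken from the posts in this thread (each directory exposing `.snapshot/<name>/<file>`, with names like `hourly.0`); the function names are made up, and the demo runs against a fake layout in a temporary directory rather than a real filer.

```python
import os
import shutil
import tempfile


def snapshot_copies(path, snap_dir=".snapshot"):
    """Return {snapshot name: path of the saved copy} for one file.

    Assumes each directory exposes <dir>/.snapshot/<name>/<file>; the
    exact layout on a real filer may differ.
    """
    directory, filename = os.path.split(os.path.abspath(path))
    root = os.path.join(directory, snap_dir)
    copies = {}
    if os.path.isdir(root):
        for snap in sorted(os.listdir(root)):
            candidate = os.path.join(root, snap, filename)
            if os.path.isfile(candidate):
                copies[snap] = candidate
    return copies


def restore(path, snap_name, snap_dir=".snapshot"):
    """Copy the version saved in snap_name back over the live file."""
    shutil.copy2(snapshot_copies(path, snap_dir)[snap_name], path)


# Demo against a fake layout in a temporary directory (no filer needed).
demo = tempfile.mkdtemp()
live = os.path.join(demo, "notes.txt")
os.makedirs(os.path.join(demo, ".snapshot", "hourly.0"))
with open(os.path.join(demo, ".snapshot", "hourly.0", "notes.txt"), "w") as f:
    f.write("good copy")
with open(live, "w") as f:
    f.write("munged")
restore(live, "hourly.0")
with open(live) as f:
    restored_text = f.read()
shutil.rmtree(demo)
```

For CIFS clients the same idea would apply with `snap_dir="~snapshot"`, going by the description above.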
Read the filesystem design paper to find out how it works. -
Stable NFS, go specialized
For industrial grade performance like that, take a look at NetApp. I've seen them in both a small and large ISP and have passing experience with dozens of these. Rock solid and fast as can be. They are designed, ground up, for NFS. Hotswappable spare drives, writes go into battery backed NVRAM for speed, and you can configure regular backups through the snapshot feature. Every hour, every day, every week, etc. They're there until they rotate out. The .snapshot directory has saved me hours of work in just the past week alone when I blew away a directory I ought not to have. -
NFS Boxes
If you're just offloading the data to a box whose sole purpose is to be a large hard drive, you really want to be using RAID, in a level that gives reliability, such as 1 or 5.
However, why are you using a Linux box at all? If you want a box that just holds tons of data for an NFS share, you probably want to look into solutions that are designed to do that. I do consulting for a major web hosting provider that has a 150GB array hanging off a NetApp 760. I'm not going to say use that particular box, but there are quite a few storage + ethernet interface solutions out there that are designed for high availability.
Just make sure you can keep some hot spares in the array, and that whichever method you do choose has backup solutions that work for you.
-
NVRAM, cache, and speeding disk access
Perhaps someone can do a hardware workaround using an intermediate NVRAM between the SDRAM HD and the hard disk, using principles borrowed from both cache technology and High reliability file systems. But it'll take a bit of work.
You can get pretty good performance with NVRAM, by tightly integrating the OS, the filesystem, and the RAID subsystem. Hop on over to NetApp's Technical Library to read how they did it. In particular, check out File System Design for an NFS File Server Appliance which discusses how the Write Anywhere File Layout (WAFL filesystem) knows (and can take advantage of) how the RAID subsystem works. The NVRAM is mainly used to speed up write performance. They've also got some interesting bits on how the RAM cache works. It gives decent performance, even though NetApp caches tend to be small by modern standards.
James
-
Cross platform SAN not here yet
We've looked at NAS (Network Attached Storage) vs SAN (Storage Area Network) lately. I'm certainly not well versed in these technologies but it seems to me that to go with a pure SAN, that is, fibre channel from the disk array to all hosts, with all hosts writing to the same filesystems, there are no good cross platform solutions. Megadrive has CDNA that will be cross platform, I've heard they have linux support now, beta NT, with Solaris due in the summer. Achieving a cross platform SAN solution in this fashion is quite challenging given that in a pure SAN locking mechanisms have to be implemented on the device rather than mediated by a host's OS. Thus, you need a new filesystem and weird VM layer things, that's probably why linux was done first, its VM layer is pretty clean so that it can support other filesystems. So most solutions tend to be proprietary, ergo, single platform. CDNA buys you nothing for the number of platforms you want to support anyway, you'd still have to gateway to them, you're back to doing file service from a host.
If you don't need all hosts writing to the same filesystems, then of course you can partition the storage up. A SAN wouldn't be very cost effective for 100GB, given that you have to get the fibre hubs and adapters, this would be a significant fraction of the cost of the actual storage on a SAN of that size.
With NAS you just use your existing infrastructure. The NAS idea is kinda nice: simple to maintain, plugs right into your network, good performance. We looked at Network Appliance; their stuff can do NFS and CIFS/SMB. It's not all that cheap either, but more cost effective than a 100GB SAN. -
This would be *very* cool.
Although I don't actually know whether anyone's working on this for Linux, I'd just like to chip in and say that it would be an extremely useful feature.
The Snapshot feature on the NetApp filer boxes (which are highly recommended, btw) is described here - for a simple idea, it's extraordinarily useful, and it's saved my hide a couple of times.
-
Re:NLFUG
There's also going to be a port to PowerPC (in part, to support the MacOS X/Darwin effort, I would assume), they're looking at doing lots more "appliances" (e.g., Whistle/IBM InterJet II, Stallion ePipe, NetWolves FoxBox, and even perhaps going after the higher-end market like the NFS/CFS and web proxy cache servers from Network Appliance).
They're also going to be pushing partnerships & co-marketing a lot harder, as well as a branding and pre-installation program so that you can make sure that the machine you buy is 100% compatible with FreeBSD, or you can even get FreeBSD pre-installed on the machine.
There's a lot more that they're going to be doing. I'm just waiting for the updated plans to be posted to the web pages of Jordan K. Hubbard, CTO of Walnut Creek and soon to be CTO of the merged BSD, Inc.
--
Brad Knowles -
Architecture of Caching to large-scale sites
For those of you interested in caching and how it can help large scale sites, I've helped co-author a technical report with Network Appliance describing our experiences accelerating the Mars Polar Lander website. That site used NetCache boxes, simple HTTP/1.1 cache-control headers, and a bit of cunningness to allow user-level tracking without letting the track requests filter through. As is traditional, the site had a couple of problems, which we've also included in the appendix after we fixed them, to hopefully save other people the same hassles in the future.
The technical report can be found at http://www.netapp.com/tech_library/3071.html
We would all save a scary amount of bandwidth if more sites were designed with public caches such as (the awesome) squid in mind, and it's a really simple use of headers that make it possible.
For those who use Apache and are interested in making your own sites more cache-friendly, I recommend you look at mod_expires, which is part of the default distribution of Apache, although not compiled in by default. If you have large, static images that rarely change, then go ahead and put week-, month-, or year-long expiry headers on them, and watch the hits for those redundant images drop right down on your web server. And if you suddenly need to change them, then it's no real problem, as all you have to do is change the image's URL and it will become a "new" entity for purposes of caching.
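As a rough illustration of the two tricks in the paragraph above (long expiry headers, and renaming an image to force a refresh), here is a small Python sketch. It is not mod_expires configuration; the function names and the `.vN` naming scheme are invented for the example, and it only shows the kind of headers a cache-friendly server would send.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(lifetime, now=None):
    """Build Expires and Cache-Control headers for a static asset."""
    now = now or datetime.now(timezone.utc)
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(now + lifetime, usegmt=True),
        "Cache-Control": f"public, max-age={int(lifetime.total_seconds())}",
    }


def versioned_url(base, version):
    """Change the URL when the image changes so caches see a new entity."""
    stem, dot, ext = base.rpartition(".")
    if not dot:
        # No file extension: just append the version marker.
        return f"{base}.v{version}"
    return f"{stem}.v{version}{dot}{ext}"


fixed_now = datetime(2000, 1, 1, tzinfo=timezone.utc)
headers = expiry_headers(timedelta(days=7), now=fixed_now)
print(headers)
print(versioned_url("/img/logo.png", 2))
```

With a one-week lifetime starting at midnight UTC on 1 January 2000, the Expires value comes out as "Sat, 08 Jan 2000 00:00:00 GMT" and max-age as 604800 seconds.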
Yeah, granted, bandwidth is getting cheaper now, but for us poor Europeans, it's still a scarce commodity and we need to worry about these things :-)
-anil- -
Re:Umm. Remove head from defilade position?
Perhaps I should provide a few more details about our network, so I can justify the design decisions that went into it.
We have about 250GB - in about 200,000 files - available on four servers. At this point, having a cleverly designed directory structure is no longer a viable solution for organizing our documents across the enterprise, and an additional layer of abstraction is needed so users can locate the files they need.
Enterprise Document Management Systems are applications that usually sit on top of an RDBMS and allow search, check-in check-out, version control, and other features beyond what a file server can do. Since we already have an Oracle ERP solution installed here, we went with their EDMS product, but there are certainly others available (most notably from Xerox and Eastman-Kodak, although recently I've wondered how suitable CVS would be for such a task).
By the way, I think the biggest CAD file I found on our servers was ~400MB, and it was a Pro/Engineer drawing of a component that could fit in the palm of your hand. After a six-month campaign of trying to ration disk usage by our employees, only to be voted down by higher-ups, I have resigned myself to the fact that I will be adding disk storage to these servers forever. Network Appliance is starting to look real good... -
Re:works fine for me
Nothing wrong with my connection. I can post on Slashdot (the definitive statement for being well-connected in this modern world) and can traceroute quite happily to the Mars Lander site. I suspect there's something unusual happening with the funky load balancing/Caching/DNS magic described in the technical document. I'm sure I won't be the only person experiencing problems reaching the site -- and yes, I did try several times.
:-)
--
Paul Gillingwater -
Pricey but attractive
The Network Appliance Filers are really sexy.
The beautiful thing is they use the WAFL filesystem so you can expand your array when you need to without adding big sets of drives.
Granted, I don't have one but I've submitted the proposals and am waiting on financing. The F720 scales to 464GB, is network attached, has journaling (rad), and can benefit your WHOLE network.
Of course, you have to use NFS or SMB though. I've heard they start as low as $17k but usually run $30-40k with a bunch of drives; it's difficult to find general prices without hearing the sales pitch.
This paper discusses the testing that the Stanford Linear Accelerator Center performed while evaluating the NetApp filers. It's geared toward Usenet news but if it can handle that, it can surely handle your mail situation.
Does anyone here have first hand experience good or bad with NetApp Filers? And some word on the pricing?
-
Re:Startups
They're ALL software... just depends on where the software is running: on your SCSI card, or within the OS.
Most common hardware (controller) RAID systems have the same power-down problem as the software (OS) systems. As this is for a "startup", I don't think a 25k$ RAID array is needed (yet.) -
LDAP patches for Qmail & other ideas
You might want to consider using the LDAP patches for Qmail on nrg4u.com. They patch qmail to do user lookups via an LDAP database. qmail-pop3d will also do user password lookups against the same LDAP database. Run OpenLDAP on a dedicated machine that runs a web server and some CGIs to allow updates to the LDAP database. Mirror the contents of the LDAP database on each of your mail servers (see below) with slurpd. That way, if the LDAP master goes away, mail delivery can still take place.
Machine-wise, PC hardware should handle this nicely. Take a few PCs and put them in 2U rackmount chassis with a hardware RAID adapter mirroring (RAID 1) the system disk. Put a layer 4 switch, such as a Foundry ServerIron or Alteon AceDirector in front of these machines. Need to take a machine down because of upgrades or hardware failure? Want to add more machines to the cluster to improve performance? No problem. Take the machine down and the switch automatically removes the downed machine from the available pool of machines.
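The pool behaviour described above (a downed machine drops out of rotation automatically) can be sketched in Python. This is only the idea, not how a ServerIron or AceDirector works internally; the function names and the addresses are made up, and the demo injects a fake health check so that no network is needed.

```python
import socket


def is_up(host, port, timeout=1.0):
    """Health check: can we open a TCP connection to the service port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def available_pool(servers, check=is_up):
    """Return only the servers that currently pass the health check."""
    return [(host, port) for host, port in servers if check(host, port)]


# Demo with a fake check so no network is needed; addresses are made up.
servers = [("10.0.0.1", 110), ("10.0.0.2", 110), ("10.0.0.3", 110)]
down = {"10.0.0.2"}
pool = available_pool(servers, check=lambda host, port: host not in down)
print(pool)
```

Adding capacity is the mirror image: append a machine to `servers` and it joins the pool as soon as its check passes.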
Mount mail spools from a Netapp Filer. Put a few hot spares in the Filer and now you've got redundancy and fault-tolerance for your mail spools, too. Plus, it'll be fun if you give anyone tours of your facilities. Imagine their reaction when you nonchalantly yank a disk out of a Filer taking that kind of load, and then watch the Filer automatically rebuild the drive on the hot spares you have in it.
:-)
You can also cluster Netapp Filers (more info), which would allow you to have two Netapps that would automatically sync their contents. If one fails, the other takes over transparently.
Lastly, if you're going to be having all of this NFS activity with that size a user base, I would highly recommend putting a second NIC in each of your server PCs. Link these second NICs in each of the PCs into a physically separate network from the one the users will be using to retrieve their mail. Gigabit Ethernet may also be an option here depending on the traffic demands of NFS in your situation. There are two advantages to this separate network. 1) It separates your NFS traffic from your user requests and data transfers, thus preventing the network from reaching its saturation point as rapidly and 2) you can secure the NFS network and allow only NFS requests and other management processes to use this network. If your Filers are only homed to this NFS network, it would take a break-in to one of the PCs just to gain a chance at administrative access to the Filer holding all of your user data.
The only downside to all of this is that Qmail doesn't have a daemon to serve IMAP. I don't have any experience with it, but I've seen Cyrus recommended a lot for IMAP service. There are patches on qmail.org that patch Cyrus to authenticate against a CDB, the file format that qmail can use for authentication and other lookups. You might be able to do something along the lines of creating a cron job that checks for a timestamp on the LDAP entries, and updates the CDB entry for a user if the LDAP info has changed since the last invocation. Maildir support might be dicier; I only spent a few minutes on it, but I couldn't find any info on getting Cyrus to deliver to a maildir.
-
Re:Maybe a Network Appliance Netfiler instead?
I have some experience with high end NetApps, and I think you should give them serious consideration for this kind of application.
My recommendation is to go with a cluster / failover pair of F760s, which can have 1TB+ of storage between the two of them.
In normal operation, each filer of the pair serves 1/2 the files. If one dies, the other does file serving for both by taking possession of the other's disks. This pretty much happens seamlessly.
As a test, we started copying a large directory tree to one filer, and then turned it off mid-way through. After about a minute, the other filer of the pair assumed control, and the copy continued without interruption. After the copy finished, I did a file-by-file comparison of the original and the copy (GNU diff rocks) and they were exactly the same. Very cool.
In addition to cluster/failover, the NetApps have quite a number of other high-availability features: multiple fans, good RAID subsystem, snapshots and checkpoints, journaling filesystem, etc.
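The file-by-file check after such a failover test could also be scripted. The post used GNU diff; the sketch below swaps in Python's standard `filecmp` module instead, and the function names and demo trees are made up for illustration.

```python
import filecmp
import os
import tempfile


def trees_identical(left, right):
    """Recursively compare two directory trees byte for byte."""
    cmp = filecmp.dircmp(left, right)
    if cmp.left_only or cmp.right_only or cmp.common_funny:
        return False
    # dircmp's own file comparison is shallow (based on os.stat), so
    # re-check the common files with shallow=False to compare contents.
    _, mismatch, errors = filecmp.cmpfiles(left, right, cmp.common_files, shallow=False)
    if mismatch or errors:
        return False
    return all(
        trees_identical(os.path.join(left, d), os.path.join(right, d))
        for d in cmp.common_dirs
    )


def make_tree(root, payload):
    """Build a tiny demo tree: <root>/sub/data.bin containing payload."""
    os.makedirs(os.path.join(root, "sub"))
    with open(os.path.join(root, "sub", "data.bin"), "wb") as f:
        f.write(payload)


with tempfile.TemporaryDirectory() as src, \
        tempfile.TemporaryDirectory() as good, \
        tempfile.TemporaryDirectory() as bad:
    make_tree(src, b"payload")
    make_tree(good, b"payload")
    make_tree(bad, b"pay1oad")  # same length, one byte differs
    same_copy = trees_identical(src, good)
    bad_copy = trees_identical(src, bad)
```

The content-level comparison matters here: a copy interrupted by a failover could plausibly leave a file with the right size but wrong bytes, which a size-and-timestamp check alone might miss.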
I love Linux (set up the 4th box for my home this weekend), but it's not ready for this kind of task. It's much better to go with an established vendor, instead of spending a lot of time trying to build it yourself.
Please feel free to contact me if someone has more questions about Netapps. There are also a lot of good technology papers on their web site http://www.netapp.com.
James Graves