Ask Slashdot: Best File System For Web Hosting?
An anonymous reader writes "I'm hoping for a discussion about the best file system for a web hosting server. The server would serve as mail, database, and web hosting. Running CPanel. Likely CentOS. I was thinking that most hosts use ext3, but with all of the reading/writing to log files, a constant flow of email in and out, not to mention all of the DB reads/writes, I'm wondering if there is a more effective FS. What do you fine folks think?"
Or maybe XFS.
Check it out here http://en.wikipedia.org/wiki/XFS
if you have to ask you should stick with ext3
The best file system would be one not running: mail, database, web hosting, and CPanel.
The obvious argument for ext4, the current ext version, is that it's been around a long time and is very solid. I'd only use something else if I knew the performance of ext4 would be an issue.
It will kill your innocent files to save some space....
The inefficiencies and handicaps introduced by that bloated turd of a platform will far outweigh the sub-percentage point gains you might see from using ReiserFS or any other alternative filesystem.
Typically, that is the default file system. That is how you will get the best support when there is an issue. It will also be the most stable with your OS because the developers focus on that FS. So personally, I would use whatever is the default FS for whatever OS you decide to use. To get off topic a bit, IMHO that OS should be Debian because it is just too awesome and Debian based OS's have the largest community. Also, it should be running on Linode.com ;)
Put the data on the best type of filesystem required for it whether it be ext3, ext4, some NAS box with tons of memory, Ramdisks. If you have a complex web site, have multiple filesystem types. If you decide that you want a one size fits all then you obviously aren't that serious about the question.
It will be released someday
Especially if you decide to use an SSD. Even if there's not a lot of data writing going on, the constant rewriting of the directory entries to update the last accessed time stamp would wear an SSD and slow a regular hard drive.
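The access-time updates described above are governed by mount options such as noatime and relatime. Here is a minimal sketch, in Python, of how one might check which mounts still use the default behaviour. The sample text only imitates the /proc/mounts column layout, and the devices and mount points in it are made up for illustration.

```python
# Sketch: classify mounts by their access-time policy.
# SAMPLE imitates the /proc/mounts layout (device, mount point, type,
# options, dump, pass); its devices and paths are hypothetical.

def atime_policy(options):
    """Classify a comma-separated mount option string."""
    opts = set(options.split(","))
    if "noatime" in opts:
        return "noatime"
    if "relatime" in opts:
        return "relatime"
    return "default"  # neither option listed; kernel default applies


def parse_mounts(text):
    """Return {mount_point: atime policy} for /proc/mounts-style text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        result[mount_point] = atime_policy(options)
    return result


SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var/mail xfs rw,noatime 0 0
/dev/sdc1 /home ext3 rw 0 0
"""

if __name__ == "__main__":
    for mount_point, policy in parse_mounts(SAMPLE).items():
        print(mount_point, policy)
```

On a real box you would feed it the contents of /proc/mounts instead of SAMPLE.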
NFS export hosted on ZFS but what do I know.
From memory (I've been out of that business for 6 months) CPanel stores mail as maildirs. If you have gazillions of small files (that's a lot of email) then XFS handles it a lot better than ext3 - I've never benchmarked XFS against ext4. Back in the day, it also dealt with quotas more efficiently than ext2/3, but I really doubt that is a problem nowadays.
If you aren't handling gazillions of files, I'd be tempted to stick to ext3 or ext4 - just because it's more common and well known, not because it is necessarily the most efficient. When your server goes down, you'll quickly find advice on how to restore ext3 filesystems because gazillions of people have done it before. You will find less info about xfs (although it may be higher quality), just because it isn't as common.
You're not going to be there forever, and all using a non-standard filesystem is going to accomplish is to cause headaches down the road for whoever is unfortunate enough to follow you. Use whatever comes with the OS you've decided to run - that'll make it a lot more likely the server will be kept patched and up to date.
Trust me - I've been the person who's had to follow a guy that decided he was going to do the sort of thing you're considering. Not just with filesystems - kernels too. It was quite annoying to run across grsec kernels that were two years out of date on some of our servers, because apparently he got bored with having to constantly do manual updates on the servers and so just stopped doing it...
Even with an SSD you still need a file system format for it to be usable.
I'm all for ZFS, very reliable over long periods of time.
If you need large filesystems, then go with XFS. RHEL only supports up to 16TB filesystems with ext4 and up to 100TB with XFS. I'm not sure at this point where the limitation comes from, as it is limited even on x86-64.
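One plausible source of the 16TB figure, offered as an assumption rather than as a statement of Red Hat's support policy: ext4 without its 64bit feature addresses blocks with 32-bit numbers, and with the usual 4 KiB block size that works out to exactly 16 TiB. The arithmetic, as a small Python sketch (the function name is made up for illustration):

```python
# Sketch: largest filesystem addressable with fixed-width block numbers.
# Assumptions: 32-bit block numbers and 4 KiB blocks.

def max_filesystem_tib(block_address_bits, block_size_bytes):
    """Return the addressable size in TiB."""
    max_bytes = (2 ** block_address_bits) * block_size_bytes
    return max_bytes / 2 ** 40


if __name__ == "__main__":
    print(max_filesystem_tib(32, 4096))  # 16.0
```

The 100TB XFS number is a vendor support limit and is not derived from this calculation.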
yeah - it's especially good for your log files, after all, SSD is just like a big RAM drive.....
You're going to be better off forgetting SSDs and going with lots more RAM in most cases. If you have enough RAM to cache all your static files, then you have the best solution. If you're running a dynamic site that generates stuff from a DB and that DB is continually written to, then generally putting your DB on an SSD is going to kill its performance just as quickly as if you had put /var/log on it.
RAID drives are the fastest: striping data across 2 drives basically doubles your access speed, so stripe across an array of 4! The disadvantage is that 1 drive failure kills all data - so mirror the lot. 8 drives in a stripe+mirror (mirror each pair, then put the stripe across the pairs - not the other way round) will give you fabulous performance without the worry that your SSD will start garbage collecting all the time when it starts to fill up.
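To make the "mirror each pair, then stripe across the pairs" layout concrete, here is a rough Python model of it. The disk numbering and function names are hypothetical, and it ignores real controller details such as chunk size and rebuild behaviour.

```python
# Sketch: a toy model of stripe-over-mirrors (RAID 10).
from itertools import combinations


def make_raid10(n_disks):
    """Group disks 0..n-1 into mirrored pairs: (0, 1), (2, 3), ..."""
    if n_disks < 4 or n_disks % 2:
        raise ValueError("need an even number of disks, at least 4")
    return [(d, d + 1) for d in range(0, n_disks, 2)]


def disks_for_chunk(pairs, chunk_index):
    """Chunks are striped round-robin across pairs; both mirrors hold a copy."""
    return pairs[chunk_index % len(pairs)]


def survives(pairs, failed):
    """The array survives unless both disks of some pair have failed."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in pairs)


def survival_fraction(pairs, n_failures):
    """Fraction of all n_failures-disk failure combinations the array survives."""
    disks = [d for pair in pairs for d in pair]
    combos = list(combinations(disks, n_failures))
    return sum(survives(pairs, c) for c in combos) / len(combos)


if __name__ == "__main__":
    pairs = make_raid10(8)
    print(disks_for_chunk(pairs, 5))     # (2, 3)
    print(survival_fraction(pairs, 1))   # 1.0
    print(survival_fraction(pairs, 2))   # 24 of 28 combinations survive
```

With 8 disks, any single failure is survivable, and only the 4 double failures that hit both halves of one pair lose data.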
Unless you want the special features of other file systems (say ZFS), the default (ext3 or ext4) should be fine. They are capable of handling high I/O loads.
If you want even more I/O performance, then use SSDs.
In Soviet Russia, articles before post read *you*!
Due to the amount of read writes & the life span of SSD's they are some of the worst drives you can get for a high availability web server. ext3 should work fine for you, especially if you're not too familiar with the different types of file systems. Two things I would recommend. First, if you're looking at really high traffic, you need to separate out your database, email, & web server into 3 different entities. If not... again, the file system is not really a concern for you. Second, redundancy is what will save you a lot of time and headache: make sure you have some sort of mirroring going on, or if your server is at a datacenter, they probably take care of it for you.
This isn't 1999. You have no reason to host your web server, email server, and database server on the same operating system.
You would be well advised to run your web server on one machine, your email server on another machine, and your database server on a third machine. In fact, this is pretty much mandatory. Many standards, such as PCI compliance, require that you separate all of your units.
Take advantage of the technology that has been created over the past 15 years and use a virtualized server environment. Run CentOS with Apache on one instance - and nothing else. Keep it completely pure, clean, and separate from all other features. Do not EVER be tempted to install any software on any server that is not directly required by its primary function.
Keep the database server similarly clean. Keep the email server similarly clean. Six months from now, when the email server dies, and you have to take the server offline to recover things, or when you decide to test an upgrade, you will suddenly be glad that you can tinker with your email server as much as you want without harming your web server.
If you are concerned about performance and expect constant email stream you should host mail, database and web-servers on separate computers. There is a reason any reputable host does it this way. Plus increased load on one component doesn't affect others.
I think picking a file system should be the least of your worries.
If you're going with rackspace, it'll be EXT3. If you're going with Amazon, well, there are more choices. But unless you have a really good reason to deviate from the standard (and it sounds like you don't), why would you make yourself a bunch of unnecessary trouble?
Based on the topology you have described, the last thing you need to worry about is what file system to choose, since you have decided to host ALL tasks on a single server. If performance was an issue, you would separate them all into dedicated "farms", and if security is a factor (which it should be), none of them would be in the DMZ; only your proxy(s) would live there.
Whether your focus is on performance, reliability or both, you have other areas that require much more attention than the FS.
Everyone switched to ext4 years ago. Before that, I used ReiserFS, but then, you know....
I think you're not a very well trained sysadmin.
There is no reason to not have various parts of the filesystem mounted from different disks or partitions on the same disk. If you do this, you can run part of the system on one filesystem, other parts on others as appropriate for their intended usage. This is commonly done on large servers for performance reasons, quite like the one you are asking about. It's also why SCSI ruled in the server world for so long since it made it easy to have multiple discs in a system.
So run most of your system on something stable, reliable and with good read performance, and the portions that are going to take a read/write beating on a separate partition/disc with the filesystem which has better read or write, whichever is needed, performance. If you segregate your filesystem like this correctly, an added benefit is that you can mount security critical portions of the filesystem readonly, making it more difficult for an attacker.
Red
Contrary to the majority of the people replying to this post, I emphatically DO NOT recommend ext3. ext3 by default wants to fsck every 60 or 90 days; you can disable this, but if you forget to, in a hosting environment it can be pure hell if one of your servers reboots. Usually shared hosting web servers are not redundant, for cost reasons; if one of your shared hosting boxes reboots, you thus get to enjoy up to an hour of customers on the phone screaming at you while the fsck completes.
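The periodic check mentioned above is driven by two settings that `tune2fs -l` reports, a maximum mount count and a check interval; `tune2fs -c` and `-i` change them. Below is a hedged Python sketch that reads those two fields from captured output. The sample text is an abridged illustration of that output format with made-up values, not a capture from a real system.

```python
# Sketch: decide whether periodic fsck is enabled, given text in the style
# of `tune2fs -l` output. SAMPLE_TUNE2FS_OUTPUT is illustrative only.
import re

SAMPLE_TUNE2FS_OUTPUT = """\
Filesystem volume name:   <none>
Mount count:              12
Maximum mount count:      30
Check interval:           15552000 (6 months)
"""


def periodic_fsck_settings(text):
    """Return (max_mount_count, check_interval_seconds) parsed from the text."""
    max_mounts = re.search(r"^Maximum mount count:\s*(-?\d+)", text, re.MULTILINE)
    interval = re.search(r"^Check interval:\s*(\d+)", text, re.MULTILINE)
    if not max_mounts or not interval:
        raise ValueError("expected fields not found")
    return int(max_mounts.group(1)), int(interval.group(1))


def periodic_fsck_enabled(text):
    """A positive mount-count limit or a positive interval triggers checks."""
    max_mounts, interval = periodic_fsck_settings(text)
    return max_mounts > 0 or interval > 0


if __name__ == "__main__":
    print(periodic_fsck_settings(SAMPLE_TUNE2FS_OUTPUT))  # (30, 15552000)
    print(periodic_fsck_enabled(SAMPLE_TUNE2FS_OUTPUT))   # True
```

A check like this could run from monitoring so that a forgotten default is noticed before the next reboot rather than during it.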
XFS is a very good filesystem for hosting operations. It has superior performance to ext3, which really helps, as it means your XFS-running server can host more websites and respond to higher volumes of requests than an ext3-running equivalent. It also has a feature called Project Quotas, which allows you to define quotas not linked to a specific user or group account; this can be extremely useful for hosting environments, both for single-homed customers and for multi-homed systems where individual customer websites are not tied to UNIX user accounts. The oft-circulated myth that XFS is prone to data loss is just that; there was a bug in its very early Linux releases that was fixed ages ago, and now it's no worse than ext4 in this respect.
Ext4 is also a good option, and a better option than ext3; it is faster and more modern than ext3 and is being more actively developed. Ext4 is also more widely used than XFS, and is less likely to get you into trouble in the unlikely event that you get bit by an unusual bug with either filesystem.
Btrfs will be a great option when it is officially declared stable, but that hasn't happened yet. The main advantages for btrfs will be for hosting virtual machines and VPSes, as Btrfs's excellent copy on write capabilities will facilitate rapid cloning of VMs.
This is already a reality in the world of FreeBSD, Solaris and the various Illumos/OpenSolaris clones, thanks to ZFS. ZFS is stable and reliable, and if you are on a platform that features it, you should avail yourself of it. I would advise you steer clear of ZFS on Linux.
Finally, for clustered applications, i.e. if you want to buck the trend and implement a high availability system with multiple redundant webservers, the only Linux clustering filesystem I've found to be worth the trouble is Oracle's open source OCFS2 filesystem (avoid OCFS1; it's deprecated and non-POSIX compliant). OCFS2 lets you have multiple Linux boxes share the same filesystem; if one of them goes offline, the others still have access to it. You can easily implement a redundant iSCSI backend for it using mpio. It's somewhat easier to do this than to set up a high availability NFS cluster without buying a proprietary filer such as a NetApp.
Reiserfs was at one time popular for mail servers, in particular for maildirs, due to its competence at handling large numbers of small files and small I/O transactions, but in the wake of Hans Reiser's murder conviction, it is no longer being actively developed and should be avoided. JFS likewise is a very good filesystem, on a par with ext4 in terms of featureset, but for various reasons the Linux version of it has failed to become popular, and you should avoid it on a hosting box for that reason (unless your box is running AIX).
Speaking of older proprietary UNIX systems; on these you should have no qualms about using the standard UFS, which is a tried and true filesystem analogous to ext2 in terms of functionality. This is the standard on OpenBSD. NetBSD features a variant with journaling called WAPBL, developed by the now defunct Wasabi Systems. DragonFlyBSD features an innovative clustering FS called HammerFS, which has received some favorable reviews, but I haven't seen anyone using that platform in hosting yet. The main headache with hosting is the extreme cruelty you will experience in response to downtime, even when that downtime is short, scheduled or inevitable. Thus, it pays to avoid using unconventional systems that customers will use as a vector for claiming incomp
> Due to the amount of read writes & the life span of SSD's they are some of the worst drives you can get for a high availability web server.
Only if you're completely ignorant about the difference between consumer and enterprise SSDs. The official rated endurance of a 200GB Intel 710 with random 4K writes (the worst case scenario) with no over-provisioning is 1.0 TB. In order to wear this drive out in a high-load scenario, you could write 100GB of data in 4k chunks to this drive every day for nearly 30 years before you approached even the official endurance.
If you use a consumer SSD in a high-load enterprise scenario, you're going to get bit. If you use an enterprise SSD in a high-load enterprise scenario, you'll have no problems whatsoever with endurance, regardless of what people spreading FUD like you would have you believe.
Don't know the budget, but 250GB of "RAM" for $500 looks like a good deal. And you just suggested an array of 4 drives to someone who wants the classic webserver with CPanel, all stuffed in one system; that would be like $3-4k just for the disks. SSD is the way to go in these cases, mainly because of the money you save. And the lifespan? I have replaced way more HDDs than SSDs in the 3 years since I started using them, and they are in the same ratio right now, and the SSDs get way more I/O.
... in the year 2012, people are seriously suggesting others use filesystems that can (and eventually will) lose data on an unclean shutdown. C'mon people, this isn't stone age anymore.
Came here to say this. Unfortunately I have no mod points. Enterprise drives are more expensive, but if you need the performance, are an excellent option.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
2? 4? fuq dat, use 12. use another 12 if you need redundancy. and scsi is still a better performer than sata...
Go for PostgreSQL-backed services whenever feasible. For example, there is a quite competent IMAP server called Archiveopteryx, you can run Mediawiki on PostgreSQL, as well as Zope and whatnot.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
If only they had some kind of way of allowing drives to fail while still retaining data integrity. It's probably because I just dropped Acid, but I'd call the system RAID.
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
I have a few websites solely in S3 and CloudFront. It works. Similarly RDS - it's a pretty uncomplicated MySQL service. Not sure about hosting mail on AWS - You can certainly send mail (SES) but I don't know about receiving it. But in general your presumed point is valid - if you can get away with cloudsourcing some of your infrastructure needs, it can be cost-effective and useful.
I spent some time late last year and earlier this year working very closely with the developers of BetterLinux, and in the work I did, I did stress testing (on a limited scale) to see how the product performed. It has some OSS components and some closed-source components, but the I/O leveling they do is pretty amazing.
http://www.betterlinux.com/
Insanity is a gradual process; don't rush it.
You're still going to want redundancy. At the very least 2 identical drives mirrored with software RAID.
If redundancy is important, 500GB/1TB "Enterprise" drives are cheap. 4 drives in RAID10 would give the best cost:redundancy:performance ratio. You can probably get 4 HDD's for the cost of the one $500 240GB SSD you mentioned.
There are arguments to be made in favor of FAT16 or even FAT32, but I think I'd go with FAT12, just because it's simpler. You don't need LFNs for web hosting, do you?
Cut that out, or I will ship you to Norilsk in a box.
BTRFS has not fully stabilized yet, making it a poor choice for a production system. And ZFS is only a viable option if you're running Solaris (Sure, you can use the 2009 OpenSolaris version of ZFS in BSD or FUSE... but again, not good production choices).
Ext3/4 and XFS are good choices depending on your needs and distribution. But for a small standalone server, you will probably never notice the difference - use the default.
You forget the reason why adding RAM makes things faster. Linux caches a tonne of stuff in RAM so constantly reading from disks won't occur.
Why not btrfs and backups?
People still run their own email servers?
Oh no! My static images in RAM are no more!!! Let's revert to backups! Must .... rememeber... don't ... feed ... teh trollz
100% agree. I'm hosting on FreeBSD too with ZFS. Backups are such a breeze! Only one client left to convert over from Linux.
I hear reiserfs is killer.
(too soon?)
Whatever... I really did love reiser3 back in the day, if only because rm -rf on large dirs was blazingly swift compared to ext2
Yep, agreed... agonizing over the FS choice isn't going to provide many gains compared to spending time optimizing the physical disk configuration and partitioning.
FS performance is only going to really matter if you're going to have directories with thousands of nodes in them. But then hopefully you have better ways to prevent that from happening.
But you do want to spend a good deal of time benchmarking different RAID and partitioning setups, where you can see some gains in the 100-200% range rather than 5-10%, especially under concurrent loads. Spend some quality time with bonnie++ and making some pretty comparison graphs. Configure jmeter to run some load tests on different parts of your system, and then all together to see how well it deals with concurrent accesses. Figure out which processes you want to dedicate resources to, and which can be well-behaved and share with other processes. Set everything up in a way to make it easier to scale out to other servers when you're ready to grow.
The FS choice is probably the least interesting aspect of the system (until you start looking at clustered FSs, like OCFS2 or Lustre)
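In the spirit of the benchmarking advice above, here is a very small Python sketch that times creating and deleting many small files in a directory of your choosing, roughly the maildir-style pattern discussed elsewhere in this thread. It is no substitute for bonnie++ or a real load test; the function name, file naming scheme, and default sizes are made up for illustration.

```python
# Sketch: time small-file creation and deletion on whatever filesystem
# holds `directory`. Each write is fsynced so the page cache hides less.
import os
import tempfile
import time


def time_small_files(directory, count=1000, size=2048):
    """Create `count` files of `size` bytes, then delete them; return timings."""
    payload = b"x" * size
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(workdir, f"msg{i:06d}")
            with open(path, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # force the data to the device
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        listed = os.listdir(workdir)
        for name in listed:
            os.remove(os.path.join(workdir, name))
        delete_seconds = time.perf_counter() - start
    return {"files": len(listed), "write_s": write_seconds, "delete_s": delete_seconds}


if __name__ == "__main__":
    print(time_small_files(tempfile.gettempdir(), count=200))
```

Run it against a directory on each candidate filesystem or RAID layout, several times, and compare; single runs are noisy.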
> The official rated endurance of a 200GB Intel 710 with random 4K writes (the worst case scenario) with no over-provisioning is 1.0 TB. In order to wear this drive out in a high-load scenario, you could write 100GB of data in 4k chunks to this drive every day for nearly 30 years before you approached even the official endurance.
I believe you must have meant 1.0 PB, since that's closer to how much data would be generated in 30 years at that rate. 1.0 TB would expire in 10 days.
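The arithmetic behind that correction, as a short Python sketch using decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and the 100GB per day write rate from the quoted post:

```python
# Sketch: days until a rated write endurance is consumed at a constant
# daily write volume. Decimal units are an assumption.
GB = 10 ** 9
TB = 10 ** 12
PB = 10 ** 15


def days_until_worn(endurance_bytes, bytes_per_day):
    """Return how many days of writing the rated endurance allows."""
    return endurance_bytes / bytes_per_day


if __name__ == "__main__":
    print(days_until_worn(1 * TB, 100 * GB))           # 10.0 days
    print(days_until_worn(1 * PB, 100 * GB))           # 10000.0 days
    print(days_until_worn(1 * PB, 100 * GB) / 365.25)  # a little over 27 years
```

So 1.0 TB would be gone in 10 days, while 1.0 PB lasts about 27 years, which matches the "nearly 30 years" claim.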
Each of those application types tends to exercise the host, and therefore the filesystem, in different ways. It would be better to focus on the type of RAID and partitioning you are using than on the base filesystem. But even the base file system probably has different performance profiles for the application types mentioned. In general I'm going to assume you have several disks in this system for the sake of performance and redundancy (and if you don't, then you probably should not be running your own server). Now, if this server is just for you, you should have mirrored disks at minimum, or a higher-level RAID depending on how much storage you need.
Mail: for SMTP-type traffic it's more straight writes and reads (queuing mail and receiving it). For the POP/IMAP part there would be more seeks and lookups. For the latter, what you'd really want to do is try to spread the various users' reads and writes across multiple disks (RAID). But you could probably research the best layout for this kind of thing.
For the database, it depends on the database type. MySQL-type DBs tend to "enjoy" striped RAID etc. (depending on table size and usage patterns). Often NoSQL-type databases like RAID 10 or some other mirrored setup (because of the way they do data redundancy). You can also use caching on DBs to avoid going to disk. It depends a lot on your usage pattern and table layouts etc.
For a web server really you should be focused on caching repeatedly used content in memory and worry less about the file system.
SSDs are a mixed bag, although the enterprise ones are getting more reliable. But in-memory caching is much cheaper and more effective if your usage patterns cooperate.
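As a rough illustration of caching repeatedly used content in memory at the application level, here is a Python sketch built on the standard library's `functools.lru_cache`. The function name `read_static` and the cache size are made up, and a real web server or the kernel's own page cache would normally do this job for you.

```python
# Sketch: keep recently requested static files in memory so repeat
# requests never touch the disk. Caveat: cached bytes go stale if the
# file changes on disk; call read_static.cache_clear() after a deploy.
import os
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=256)  # hypothetical limit: at most 256 distinct files
def read_static(path):
    """Return the file's bytes, reading from disk only on a cache miss."""
    return Path(path).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        page = os.path.join(docroot, "index.html")
        Path(page).write_bytes(b"<h1>hello</h1>")
        read_static(page)                # miss: reads the file
        read_static(page)                # hit: served from memory
        print(read_static.cache_info())
```

Whether this beats simply giving the box more RAM and letting the OS cache files depends on the usage pattern, as noted above.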
Shoot, sorry, yes. I meant 1.0 PB.
You'll want a RAID controller that supports TRIM.
Who ordered that?
I review my web log files on a regular basis and look for exploit attempts to update my firewall and make sure I am not exposed. I use log files to prevent a SHTF scenario as much as possible. Now get off my lawn kid.
Tomorrow is another day...
Create a Solaris ZFS NAS. Use individual VMs for each application. Do I need to say backups?
I'm planning to race a Yugo kitted out with cast iron spoilers and wooden tires.
Which type of decals will make me go fastest?
Ontopic; the choice of filesystem will have far less impact than the choice of programming language, database, webserver application and how you use those. The choice to go with CPanel (or any *Panel) means the impact of the filesystem will be unnoticeable. Nothing wrong with those panels; they drive down human cost, but if you need the absolute best performance, panels won't let you get there if only because it depends on so many case-specific factors.
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
I used JFS on all my machines from around 2007-2011, including laptops. I had many unclean shutdowns (especially on laptops) and JFS rarely had any problems, except that one time briefly in 2009 where I did actually lose a bunch of data, but then so did my ext4 reinstall a few weeks later (bad hardware).
JFS was much, much better than ext3. Especially in low-CPU situations/hardware.
I can't remember why I went back to ext4, I guess I wanted to see if it still sucked compared to JFS. With noatime I decided I couldn't tell the difference except perhaps for some really big git checkouts, but I didn't do any proper timing.
Go with tmpfs. It has the highest performance of any of the "standard kernel" filesystems, and if you use it for your personal webserver/blogserver/mailserver/etc, it will never lose any valuable data if the server reboots unexpectedly.
--Joe
As I mentioned in another reply, CentOS or another RHEL-based distro are your only choices if you want to run cPanel. ext4 is the default file system and that is what you should stick with. I've been in the hosting business for 10 years now and I can say from experience that you will only run into headaches if you try to be clever and run different file systems.
Best Storage System for Web Hosting?
Here, I'm using Storage System to refer to a design rather than a product.
While filesystems are a good point to look at, I'd be much more interested in the one thing almost all concurrent systems contend over: spindles (or more correctly, drive heads). Partitioning workloads onto separate spindles or SSDs makes a lot more sense than twiddling over the finer points of a filesystem. Serial read/write is well-suited to even slow SATA drives though YMMV, while high-concurrency OLTP DBs benefit from SSD. I can't think of a benchmark that shows any significant performance difference between the headline filesystems when you're not talking about SSD, and if you have the cash to go SSD for all your storage perhaps you should get a professional to advise you better?
You make no case for how a consumer grade SSD would not last at least 10 years before normal replacement and upgrades compared to an enterprise SSD. Sounds like you're advertising for the enterprise guys to take money out of people's pockets. I have not had any issues running raided consumer SSD's.
Intel rates the endurance of the 710 at 1.0 PB and the 330 at 60 TB, so yeah, there's a pretty big difference there.
In Intel's case, specifically, the difference is between using MLC flash and MLC-HET flash. The difference is largely from binning, but it's the difference between 3k to 5k p/e cycles on typical MLC, and 90k p/e cycles on MLC-HET. SLC produces similar improvements. I could explain how they achieve this, but Anandtech and Tom's Hardware have both done pretty good write-ups explaining the difference.
It depends entirely on your workload. If you've got an enterprise workload where you don't do many writes, then a consumer drive will work just fine. And since most drives report their current wear levels, it's actually pretty safe to use a consumer drive if you monitor that.
Anandtech gave one example, when they were short on capacity and were facing a delay in getting some new enterprise SSDs; they walked out to the store, bought a bunch of consumer Intel SSDs, and slapped those into their servers. They were facing a write-heavy workload, so they wouldn't have lasted long, but they only needed them for a few months and kept an eye on the media wear indicator values, so they were fine.
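Keeping an eye on a wear indicator, as in the example above, can be turned into a crude projection. The Python sketch below assumes a normalized indicator that counts down linearly from 100 (new) toward 0 (rated wear-out) and that the workload stays constant; both are simplifying assumptions, and the function name and numbers are hypothetical.

```python
# Sketch: linear projection of remaining drive life from two readings of
# a normalized wear indicator. Real drives and workloads are not this tidy.

def projected_days_remaining(start_value, current_value, days_elapsed):
    """Estimate days until the indicator reaches 0 at the observed rate."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    consumed = start_value - current_value
    if consumed <= 0:
        return float("inf")  # no measurable wear yet
    return current_value * days_elapsed / consumed


if __name__ == "__main__":
    # Hypothetical readings: indicator fell from 100 to 90 in 60 days.
    print(projected_days_remaining(100, 90, 60))  # 540.0
```

A drive projected to last longer than you intend to keep it is probably fine for that workload; one projected to wear out in months needs either a replacement plan or a more durable model.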
My point overall is that you can't look at SSDs the same way if you're a consumer versus an enterprise user, and if you're an enterprise user, you need to pick an SSD appropriate for your workload.
One thing people don't consider is upgrade cycles. Hanging on to an SSD for ten years doesn't really make sense, because it only takes a few years for them to be replaced by drives enormously cheaper, larger, and faster. They're improving by Moore's Law, unlike HDDs. I paid $700 for a 160GB Intel G1, and three years later, I paid $135 for a much faster 180GB Intel 330. If you're going to replace an SSD in three to five years, does it matter if the lifespan is 10 or 30?
I would keep a close eye on BtrFS, which is currently supported by SUSE and Oracle Linux (based on RHEL), and stick with whatever you have until it's ready (if you have nothing, go with the default). I don't know about SUSE, but Oracle is already calling BtrFS "production-ready" (if their DB is any indication, keep lots and lots of backups). I suspect a lot of the harder to track bugs revolve around things like power loss, that aren't common with production servers, so Oracle's claim might not be too far off.
It has a lot of nice features (lvm type features, data mirroring, subvolumes, compression -- zlib and LZO, dynamic inodes, data and metadata crc32c checksums, SSD support, snapshots, seed devices, efficient incremental backup, automatic background repair of mirrored files), and growing (background defragmentation, RAID 5/6 on files or objects, more checksum options, more compression options -- zippy and lzo, probably fewer compression penalties, automatically move hot data to faster devices, online file system check). The lzo compression can be quite fast depending on usage patterns, and with a little work, can be turned on or off for each folder (e.g. /var or /home). You can hop over to phoronix.com to find some benchmarks on file systems under different loads.
If you're running mail, db and web on a single box it doesn't sound like performance is a huge concern anyway. I think there are lots of other things you can concentrate on (like server s/w and configuration, or storage h/w) before the FS even starts to make a difference.