FTP: Better Than HTTP, Or Obsolete?
An anonymous reader asks "Looking to serve files for downloading (typically 1MB-6MB), I'm confused about whether I should provide an FTP server instead of / as well as HTTP. According to a rapid Google search, the experts say 1) HTTP is slower and less reliable than FTP and 2) HTTP is amateur and will make you look a wimp. But a) FTP is full of security holes. and b) FTP is a crumbling legacy protocol and will make you look a dinosaur. Surely some contradiction... Should I make the effort to implement FTP or take desperate steps to avoid it?"
rsync is a great protocol, fairly robust, can be wrappered in ssh (or not), supports resuming transmission, and operates over one socket.
seems like the best of both worlds to me.
the real question is - do you control the clients that are going to access you? or is it something like a browser (which doesn't support rsync).
Do it for da shorties
The question is, "what do you want to do?" I run an FTP server (incidentally affiliated with etree.org, lossless live music!) and I need what it can give me. Namely I need multiple classes of login, each with a different
1) number of available slots
2) speed limit
3) premission set
Some people can only read files at 60KB/s, some can read and write (to the upload dir) at the same speed, come can only browse, etc. etc. For this kind of a setup, FTP is great _IF_ you keep your software up to date; subscribe to bugtraq or your distro's security bulletin or both.
On the other hand, HTTP is great when you want to give lots of people unlimited ANONYMOUS access to something. I'm sure there is a way to throttle bandwidth, but can you do it on a class by class basis? In proftpd it's a simple "RateReadBPS xxx" and I'm set.
As always, choose the tool that fits _your_ purpose, not the one that everyone says is "best"; they both have good and bad qualities. And http can be just as secure/insecure as any other protocol.
The time of day is 29:33.
[SNIP]
does not (by default) allow directory listings
[SNIP]
That is a dangerous and very incorrect assumption which has nothing to do with http and everything to do with your http server.
Furthermore, FTP allows for features such as resume, etc...
So does HTTP. With the 'Range' header, you can retrieve only a portion of a resource.
I agree that it really depends on the application, but for most practical "view directory, download file" purposes, there's no significant difference.
If you wanted to interact with a directory structure, change ownerships, create directories, remove files, etc., it's generally easier to do this with FTP.
using the right client, ie wget, you can resume from http streams provided the server supports it (and i think most modern ones do)
Need a Catering Connection
Plain wrong. RFC2068 section 10.2.7.
The data connection is most assuredly NOT UDP. It is a TCP connection just like the control connection. But yes, the latency required to initiate a transfer (due to more handshakes) generally makes FTP slower in general.
P2P?
I've written a tutorial on how you can use P2P on your website to save bandwidth, space etc. An obvious way to do this would be to run a P2P client and share the file on a simple PC & Cable Modem. This works, but it is a bit generic and un-professional. A better way to do this may be to run a P2P client such as Shareaza on a webserver. You could then control the client using some type of remote service (Terminal Services, for example).
P2P has it's advantages. Such as:
- Users who download the file also share it. This is especially useful if the client/network supports Partial File Sharing.
- When you release the file using the P2P client, you only need to upload to only a few users. Those users can then share the file using Partial File Sharing etc.
- Unlike FTP and HTTP, they aren't connecting to your webserver. Thus, it saves bandwidth for you and allows people to browse your website for actual content, not media. (Though, media is content). In addition, there is ussually "Max # of Connections" allowed to a server or FTP. Not so on P2P.
- P2P Clients have good queuing tools. At least, Shareaza does. It has a "Small Queue" and a "Large Queue". This basically allows you to have, say, 4 Upload slots for Large Files (Files that are above 10MB, for example) and one for Small Files (Under 10MB). Users who are waiting to download from you can wait in "Queue", instead of "Max users connected" on FTP.
Though, at it's core, all of the P2P I know of uses HTTP to send files etc. But the network layer helps file distribution tremendously.
FTP supports a single connection (Passive, or PASV in the actual protocol), which is what most web browsers use by default.
No, no, no. Jesus. Everyone always gets this wrong. FTP in any mode uses two TCP connections. Passive or not, there is a channel for data and a separate channel for commands.
The difference is that passive-mode means that the client initiates the data connection. The default FTP behavior is for the client to connect to port 21 on the server, and then the server initiates a data connection to the client.
Non-passive FTP clients are very hard for firewalls to keep track of, especially when NAT is involved. Passive is a little better because both connections are outgoing.
But at the same time, passive mode makes the server firewall's job tougher, because it requires an large range of incoming ports for the data connections.
No matter what the mode, FTP is not very firewall-friendly.
I also assume the following:
I would say that you should go with HTTP for sure. Of course, you can provide both, but there are some key reasons for using HTTP.
Easier Configuration Perhaps I'm just not that swift, but I've found that web servers (including Apache) are easier to configure. This is especially true if you have any previous web server experience. Of course, the FTP server is more complex due to its additional features that HTTP doesn't have, but assuming that (c) is true, then you won't need to mess with group access control rights and file uploads.
Speed This whole "FTP is faster" stuff is not true. HTTP does not have a lot more overhead than FTP; it may even have less overhead than FTP in certain cases. Even when it does have more overhead, it is in the order of 100-200 bytes, which is too small to care about. HTTP always uses binary transfers and just spits out the whole file on the same connection as the request. FTP needs to build a data connection for every single data transfer, which can slow things down and even occasionally introduce problems.
Easier for Users Given assumption (d), your users will be much more familiar with HTTP URLs than FTP addresses. You could just use FTP URLs and let their web browsers download the files, but then you lose the benefit of resuming partial downloads.
Simple Access Controls Though some people need to have complex user access rules, you may very well just need simple access controls. HTTP provides this (look at Apache's .htaccess file), and you can even integrate Apache's authentication routines into PAM, if you are really hard core.
There are a few main areas where FTP currently holds sway:
Partial Downloads Web browsers typically don't support partial downloads, but the fact of the matter is that the HTTP protocol does support it (see the Range header.) The next generation of web browsers may very well include this feature.
User Controls Addressed above.
File Uploads Again, HTTP does support this feature but most browsers don't support it well. Look to WebDAV in the future to provide better support.
In summary, just use HTTP unless you need complex access rules, resumption of partial download, or file uploading. It will be easier both on you and your users.
Being in IT for a large Fortune 500 company that sells an operating system among other things (no, not Microsoft), I can share some of my expereinces with you. So take it for what it is worth.
Our FTP servers run both HTTP and FTP providing the same content in the same directory structure. There are five servers that transfer an average of 1-2 TB (terabyte) per month each, so they are fairly busy. On a busy month each server can go as high as 7 TB of data transferred. File sizes range from 1 KB to to whole CD-ROM and DVD-ROM images. I think the single largest file is 3 GB.
The logs show a trend of HTTP becoming more popular for the last several years and not stopping. It is currently at 70% of all downloads from the "FTP" servers via HTTP. While the remaining 30% is via FTP. Six years ago (I lost the logs from before this time, they are on a backup tape but I am way too lazy to get that data), it was completely reversed. 75% of downloads were via FTP and 25% were via HTTP. 90% of all transfers are done with a web browser as opposed to an FTP client or wget or something.
One thing we learned was that many system administrators will download via FTP from the command line directly from the FTP server, especially during a crisis they are trying to resolve. They do this from the system itself and not a workstation. The reasons for this are a bit of a mystery. Feedback has shown that we should never get rid of this or we might be assassinated by our customers. We thought about it once and put out feelers.
I would say if you don't need to deal with incoming files and you file size is not too large then stick with HTTP. Anything over about 10 MB should go to the FTP server. An FTP server can be more complicated. It seems like the vulnerabilities in FTP daemons has died down in the past year or so. Also, fronting an FTP server with a Layer 4 switch was a lot more tricky because of all the ports involved. If you want people to mirror you then go with FTP or rsync for private mirroring. In reading the feedback, most power users seem to prefer FTP, perhaps because that is what they are used to. Also, depending on the amount of traffic you might need to consider gigabit ethernet.
The core dumps being uploaded are getting to be huge. Some of those systems have a lot of memory!
Starting multiple TCP connections for a single file download can be advantageous, because of congested network paths.
If there are 500 TCP downloads ocurring, each download will theoretically get 1/500th the bandwidth.
Therefore, by opening multiple TCP connections, you will increase the amount of bandwidth for your transfer, at a cost to everyone else using the connection. This is because you've effectively doubled the size of your receive window (one for each connection), causing the host you are downloading from to stuff that many more packets down the pipe.
The problem is, when everyone does it, it completely negates any advantage to using this method. It also leads to packet loss, since you have that many more TCP connections (each with its own receive window) fighting for pieces of the pie.
I provide a mirror for a couple of largish open-source sites, and several of them specifically request that sites provide FTP service as preferred over HTTP. A couple of reasons:
1. Scripts which need to get a list of files before choosing which ones to download - automated installers and the like - are easier to implement with FTP.
2. FTP generally seems to chew up less CPU on the host. I can serve 12mb/s of traffic all day long on a P-II 450 box with only 256mb of memory.
3. "download recovery" (after losing connection, etc.) seems to work better in FTP than HTTP.
Lynx lets you browse, but you can't do globbing, so you see lots of irrelevant crap, and you have to select files to download one at a time.
For getting (possibly multiple) files whose location you don't know in advance, FTP is more flexible and efficient.
The ocean parts and the meteors come down
Laid out in amber, baby.
- Do you have bandwidth issues? If you are serving files to many people, FTP servers allow maximum concurrent users, which can be useful. I know you can do this with HTTP, but it's difficult to segment the downloading >1Mb files traffic from the normal site traffic. A separate service also allows you to use all the Quality of Service stuff in the 2.4 kernel nicely.
- Do you have a large array of files that the user might want to download, such that using an FTP client to ctrl+select multiple files is the right answer compared to having your users click on twenty links and have to cope with twenty dialog boxes?
- Do your users need to be able to upload files to you? This can be done with HTTP, but you'll need some PHP processing or similar on the server, it doesn't support resuming, and it won't work through many company firewalls, and therefore isn't a good option. HTTP uploading it particularly hopeless for large files, as it provides no user-feedback.
However, you should NOT use FTP if you answer no to either of these:- Are you running some flavour of unix? There just aren't any robust Windows FTP servers. Yes, I'm prepared for the flame war about this.
:) - Can you be bothered to keep your FTPd patched? ProFTPd and WU-FTPd are both frequent appearers on bugtraq. You need to stay on top of the patches, or you will be 0wn3d.
Simple, see?FTP implementations frequently use a fixed, small window size. HTTP on the other hand will honor the system limit, almost always larger even without tuning.
t mls _tcp tune.html/ faq.html
Dramatically simplified, it means that the connection can send a lot more packets without hearing back from the far end, enabling the connection to reach higher speeds (imagine a phone call where you had to say 'okay' after every word the other person said. Now imagine only having to say it after every sentence. Much faster.)
The tiny window size of (most crappy legacy implementations of) FTP starts to affect download speed at just 25ms latency, and has a huge effect over 50ms.
A properly tuned system with HTTP can make a single high-latency transfer hundreds or even thousands of times faster than FTP.
Relevant links:
http://www.psc.edu/networking/perf_tune.h
http://www.nlanr.net/NLANRPackets/v1.3/window
http://dast.nlanr.net/Projects/Autobuf