For people who administer firewalls, FTP is one of the most significant pains in the rear end. It's a weird, old protocol from back in the day when everyone on the internet knew each other and nobody worried too much about security. Back when it was invented, the term packet filter didn't actually mean anything, or at least didn't refer to any real software. So, what's so bad about it? Let's find out.
Active FTP
What can be so bad about FTP, you ask? I mean, it's just TCP port 21, right? Well, almost. When an FTP client connects to an FTP server, it does make a connection on TCP port 21. So far this isn't any different from what HTTP or SMTP or even SSH do. Next, it sends your username and password in the clear to authenticate (unless you're using Kerberos, that is, but we won't go into that here since Kerberos is mostly used on LANs). This means any J. Random Spammer with a packet sniffer can grab, and reuse, your password. This is bad, but of relatively little importance to firewall administrators. No, the real ugliness happens when you attempt to retrieve a file. What happens then is that the client sends a PORT command to the server, telling the server to connect back to the client on a randomly selected port above 1023. Yes, you read that right: the server initiates an inbound connection to the client! Now, this inbound connection should, according to the RFC, come from TCP port 20 - but lazy programmers, combined with the fact that binding to a privileged port like that requires root access, mean that very few FTP servers actually do so. But it gets worse. In that PORT command, the client can also tell the server to connect to a different IP address than the one the control connection is coming from. This misfeature, called an FTP third-party transfer, might as well have a giant green blinking ABUSE ME sign hung from it. There's almost no legitimate use for it, and almost no way to successfully pass it across a firewall. Not only is all this ugly, and a security problem just waiting to explode in the hapless admin's face, but it causes major headaches when NAT is involved.
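To make that concrete, here's a little Python sketch (mine, not part of any FTP implementation) of how RFC 959 encodes the PORT argument: four comma-separated address octets, followed by the port split into a high byte and a low byte. The addresses are just documentation examples:

    # RFC 959 PORT argument: h1,h2,h3,h4,p1,p2 - the port number is p1 * 256 + p2

    def build_port_command(ip, port):
        """Encode an address and port the way an active-mode client does."""
        return "PORT {},{},{}".format(ip.replace(".", ","), port // 256, port % 256)

    def parse_port_command(line):
        """Recover (ip, port) from a PORT command, as a server would."""
        fields = line.split(None, 1)[1].split(",")
        return ".".join(fields[:4]), int(fields[4]) * 256 + int(fields[5])

    print(build_port_command("192.0.2.10", 50000))
    # PORT 192,0,2,10,195,80 - and now the server connects *in* to 192.0.2.10:50000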
When the FTP client is behind a NAT gateway, things get even stickier. First, the client - let's say its IP address is 172.16.30.25 - initiates its connection on TCP port 21. The NAT gateway grabs that connection, translates the source IP to that of the NAT gateway itself, let's say 1.2.3.4, and passes it on to the server. Cool, everything seems to work normally at this point, until the client sends that fateful PORT command. When it does, one of two things happens. Either the PORT command says, 'hey server! Connect to port 30000 on 172.16.30.25!', in which case the server tries, and promptly finds its packets to that private address routed down a black hole long before they reach the internet backbone. Or the client says, 'hey server! Connect to port 30000 on me!', leaving the server to figure out who 'me' is from the IP that initiated the control connection. This seems to work better at first - at least the connection attempt reaches the NAT gateway now - but when it gets there, the NAT gateway looks at that packet and says, 'hey, I don't know anything about an incoming connection on TCP/30000 - you're not in my state table! Moreover, you're not in my rule set either. Denied!' All the while, the user sees none of this, and just wonders why FTP isn't working.
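A quick worked example of that first failure mode, using the hypothetical addresses above: the port splits as 30000 = 117 * 256 + 48, so the client cheerfully announces its private address to the whole world:

    ip, port = "172.16.30.25", 30000
    print("PORT " + ip.replace(".", ",") + ",{},{}".format(port // 256, port % 256))
    # -> PORT 172,16,30,25,117,48
    # The server now calls connect() toward 172.16.30.25:30000 - an RFC 1918
    # address its upstream routers will drop - instead of the gateway's 1.2.3.4.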
Enter passive FTP
Well, clearly all that is unsatisfactory. So along comes passive FTP, the first whack at cleaning up the awful active FTP mess. How passive FTP works is that when the client requests a file, it sends a PASV command instead of a PORT command, and the server replies with a randomly selected high port for the client to connect to. This is saner - at least now all the connections come from the client, and those godawful third-party transfers are no more. This makes things ever so much easier, as you no longer have to allow random servers to connect to random high ports on your FTP clients. It also eliminates the problem with NAT, since now the data connection is just another outgoing connection, which gets happily NATted and added to the gateway's state table, and everything's just peachy, right? Well, if all you're worried about is NAT, yeah. But what if you've got a restrictive packet filter that limits outbound traffic, too? That's pretty common on corporate networks, after all. Now we have a problem. Since passive FTP data connections still go to a randomly chosen port, we have to allow every machine that wants to do FTP to connect to any server on any port above 1023. So much for our egress filter, eh? Now people can use VNC, Microsoft RDP, Dameware and all kinds of other stuff, and more importantly can bounce off open proxies on the internet to evade filtering, or to do all kinds of other nefarious stuff.
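For reference, forcing passive mode from the client side is usually a one-liner. A minimal sketch with Python's standard ftplib (the host is made up):

    import ftplib

    ftp = ftplib.FTP("ftp.example.com")   # control connection, TCP/21
    ftp.login()                           # anonymous login, password in the clear
    ftp.set_pasv(True)                    # send PASV, so data connections are outbound too
    with open("README", "wb") as f:
        ftp.retrbinary("RETR README", f.write)
    ftp.quit()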
But that's not the only problem with passive FTP. Some particularly brain-damaged clients and servers just don't know about it, or refuse to use it. What's more, both the client and the server have to support it, so even if you can weed out all the braindead clients behind your firewall - and not every admin has that luxury - you're still boned if someone connects to a braindead server. If you think this problem has sorted itself out by now, in 2008, you'd be sadly mistaken, unfortunately. Microsoft is a particularly notorious offender: the FTP client in Internet Explorer and the Windows command-line FTP client both default to active mode, and in IE's case there's no apparent way to tell it to use passive mode unless the server forces it. A lot of automatic update programs want to use active FTP, too (though nowadays HTTP or HTTPS is becoming more common), and in a lot of cases there's no way at all to force these to use passive mode.
So just how does one solve such a nettlesome problem? The crux of it is that the firewall needs information it doesn't have in order to permit the ancillary connection: it needs to see that PORT command, regardless of which direction it's going. There are two ways to do this. One way, and the simplest conceptually, is to just redirect all the FTP connections to an FTP proxy running on the firewall itself. This is the approach taken by IPFilter, the TIS Firewall Toolkit, OpenBSD's PF and the commercial Gauntlet and Sidewinder G2 firewalls. This approach has a lot of merit, since an FTP proxy can do a lot to improve the security of FTP. It can enforce download-only or upload-only restrictions on a per-site basis, it can force the use of anonymous FTP for sites where users don't need to log in, and it can block third-party transfer requests (which, 99% of the time, are nothing but trouble). It can cache frequently downloaded files, and even keep open connections to often-accessed servers. Also, importantly, an FTP proxy can translate between passive and active FTP, allowing a smart client to talk to a dumb server or vice versa. By doing this at the border firewall, you can force all FTP that flows through the LAN to be passive, which greatly simplifies the configuration of router access lists, intermediary firewalls and host firewalls. Further, the FTP proxy can ensure that all traffic flowing over port 21 is, in fact, FTP, preventing some clever cracker from using it to smuggle games through the firewall or something similar.
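To give a feel for what such a proxy does, here's a stripped-down Python sketch of just the control-channel side: it relays traffic in both directions and refuses third-party PORT commands, one of the policies mentioned above. This is an illustration only - the listener and upstream addresses are hypothetical, and a real proxy such as OpenBSD's ftp-proxy also handles the data channel, NAT rewriting, passive translation and much more:

    import socket
    import threading

    LISTEN = ("0.0.0.0", 2121)          # where the firewall redirects FTP clients
    UPSTREAM = ("ftp.example.com", 21)  # hypothetical destination server

    def client_to_server(client, server, client_ip):
        for line in client.makefile("rb"):
            if line.upper().startswith(b"PORT "):
                addr = b".".join(line[5:].strip().split(b",")[:4]).decode()
                if addr != client_ip:
                    # PORT names someone other than the client: a third-party transfer.
                    client.sendall(b"500 Third-party PORT command refused.\r\n")
                    continue
            server.sendall(line)

    def server_to_client(server, client):
        for line in server.makefile("rb"):
            client.sendall(line)

    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(LISTEN)
    listener.listen(5)
    while True:
        client, (client_ip, _) = listener.accept()
        server = socket.create_connection(UPSTREAM)
        threading.Thread(target=client_to_server,
                         args=(client, server, client_ip), daemon=True).start()
        threading.Thread(target=server_to_client,
                         args=(server, client), daemon=True).start()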
FTP proxies, though, for all their benefits, have some shortcomings. Most notably, an FTP proxy requires that the firewall have a public IP address. For most firewalls this isn't a problem, since most firewalls are routers, and NAT gateways most certainly are. But what if you want to build a firewall that's a transparent bridge? If you want that, you'll either need some serious NAT trickery to redirect connections to the firewall's loopback IP address and then change the source address back to the original one before it leaves, or you'll need something other than a proxy. This is where deep packet inspection comes in. Deep packet inspection is basically a packet filter that operates at layer 7 rather than layer 4, where most packet filters live. It has the intelligence to disassemble an FTP packet and read the PORT command as it flies by, inserting entries into its state table accordingly. This is the technique adopted by Linux's Netfilter firewall, by Cisco's PIX, and by the CBAC feature on some Cisco routers. Some commercial firewalls, like NetScreen, SonicWall, Check Point and CyberGuard, also take this approach. It can actually do most of the things an FTP proxy can do, though it can't cache, and most implementations can't translate between active and passive. What it can do, though, is allow both active and passive FTP to work by adding appropriate entries to the state table, and it can do this without changing the source address if you don't need to NAT. That lets you build a transparent bridge firewall that knows about FTP and can be inserted into the network with no infrastructure changes.
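The heart of that trick is the 'expectation': the inspector parses the PORT argument out of the control stream and tells the state table to expect exactly one matching data connection. A toy Python rendition of the logic (Linux's real helper, nf_conntrack_ftp, lives in the kernel and copes with far more than this; the addresses below are made up):

    import re

    PORT_RE = re.compile(rb"^PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")

    expectations = set()  # (src_ip, dst_ip, dst_port) flows to wave through

    def inspect_control(payload, server_ip):
        """Called for each client-to-server segment on TCP/21."""
        m = PORT_RE.match(payload)
        if m:
            h1, h2, h3, h4, p1, p2 = (int(g) for g in m.groups())
            client_ip = "{}.{}.{}.{}".format(h1, h2, h3, h4)
            expectations.add((server_ip, client_ip, p1 * 256 + p2))

    def permit_new_flow(src_ip, dst_ip, dst_port):
        """Consulted when an unknown inbound SYN arrives."""
        key = (src_ip, dst_ip, dst_port)
        if key in expectations:
            expectations.remove(key)  # one-shot, like a real conntrack expectation
            return True
        return False

    inspect_control(b"PORT 172,16,30,25,117,48\r\n", server_ip="192.0.2.7")
    print(permit_new_flow("192.0.2.7", "172.16.30.25", 30000))  # True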
Incidentally, there's also SFTP, the SSH File Transfer Protocol, often loosely called Secure FTP. Other than performing the same function, this isn't related to FTP at all. Rather, it's a file transfer protocol that runs over SSH, and like SSH, it uses a single connection on TCP port 22. It incorporates encryption and compression, and several forms of authentication, including passwords, public keys, Kerberos and X.509 certificates. Fortunately for firewall administrators, it's very sane and easy to work with.
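By way of contrast, here's a scripted SFTP download using the third-party paramiko library (host, user and paths are made up). Everything - authentication, commands and the file itself - travels over the one encrypted connection to port 22:

    import paramiko

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # demo only; verify host keys for real
    ssh.connect("sftp.example.com", username="user", password="secret")
    sftp = ssh.open_sftp()
    sftp.get("/pub/README", "README")   # single TCP session, no surprise connections
    sftp.close()
    ssh.close()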
SFTP should not be confused with FTPS, which is regular old FTP running over TLS or SSL. The latter is more secure than classic FTP, since the password is encrypted and both client and server can be authenticated, but it still uses separate data connections. Also, since the control channel is encrypted, it's impossible to read the PORT command with deep packet inspection. Even proxies are troublesome here, since they have to terminate the encrypted connection and initiate another. This breaks SSL authentication, and some paranoid clients will (justifiably) drop the link when that happens. If firewall admins hate FTP, they find FTPS even more evil and rude. Use SFTP instead, with X.509 authentication if you really need it.
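If you're stuck talking to an FTPS server anyway, the client side looks like this with Python's standard ftplib (hypothetical host and credentials). Note that prot_p() merely encrypts the separate data connection; it's exactly that encryption that blinds the firewall:

    import ftplib

    ftps = ftplib.FTP_TLS("ftps.example.com")
    ftps.login("user", "secret")   # AUTH TLS happens first, so the password is protected
    ftps.prot_p()                  # encrypt the data channel too
    ftps.retrlines("LIST")         # still a second, separate connection
    ftps.quit()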