Everything2

HTTP compression

created by rdude

(thing) by rdude, Wed Jul 17 2002 at 16:41:06

DON'T BE A TORTOISE

You're surfing along, and, unlike all your friends, you're still using a dial-up modem. This means that pages, such as the E2 home page, take a while longer for you to download. And sometimes you're downloading something so large, like the list of every known organism from the National Center for Biotechnology Information's site, that even DSL or cable doesn't save you from a wait:

Everything2.com             NCBI's Huge List
------------------         ------------------
Speed  | Load Time         Speed  | Load Time
-------|----------         -------|----------
14.4 K |  14 sec.          14.4 K |  98 sec.
28.8 K |   7 sec.          28.8 K |  49 sec.
56.0 K |   3 sec.          56.0 K |  25 sec.
128+ K |  <1 sec.          128+ K |  10 sec.
This is where HTTP compression comes in. Taking a file and making it smaller is known as compression, and HTTP is the protocol by which content is transferred over the web. HTTP compression, then, is the compression of content transferred over the web.

HTTP compression software is installed by the owner of a webserver, directly onto the server. Browsers that can handle compressed content say so in the request header Accept-Encoding: gzip, deflate, and most browsers can. The HTTP compression software on the webserver sends compressed, and therefore smaller, content to the browsers that can accept it. The browser then decompresses the content and shows it to you. This process takes barely any time. Say E2 and NCBI both had HTTP compression software installed. Then the new download times would be as follows:
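
The exchange described above can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the function names encode_response and browser_decode are made up for this example, the page body is invented, and q-values in the header are ignored for simplicity.

```python
import gzip
from typing import Dict, Tuple

def encode_response(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Server side: gzip the body only if the client listed gzip."""
    # Split the header value on commas and drop any ";q=..." parameters.
    accepted = {token.split(";")[0].strip().lower()
                for token in accept_encoding.split(",") if token.strip()}
    if "gzip" in accepted:
        payload = gzip.compress(body)
        return payload, {"Content-Encoding": "gzip",
                         "Content-Length": str(len(payload))}
    # Client did not ask for gzip: send the body untouched.
    return body, {"Content-Length": str(len(body))}

def browser_decode(payload: bytes, headers: Dict[str, str]) -> bytes:
    """Client side: undo the compression if the server says it applied one."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(payload)
    return payload

# Invented page body standing in for a real web page.
page = b"<html><body>" + b"<p>Everything2 node text</p>" * 200 + b"</body></html>"
payload, headers = encode_response(page, "gzip, deflate")
print(len(page), "bytes before,", len(payload), "bytes on the wire")
print(browser_decode(payload, headers) == page)
```

A browser that sends no Accept-Encoding header at all simply gets the uncompressed page back, which is why the scheme is safe for older clients.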

Everything2.com             NCBI's Huge List
------------------         ------------------
Speed  | Load Time         Speed  | Load Time
-------|----------         -------|----------
14.4 K |   3 sec.          14.4 K |  11 sec.
28.8 K |   1 sec.          28.8 K |   5 sec.
56.0 K |  <1 sec.          56.0 K |   2 sec.
128+ K |  <1 sec.          128+ K |  <1 sec.
Comparing these figures to the original charts, you can see that there's a huge improvement in download time. E2 is about four times faster and the NCBI list is about ten times faster. You might ask why E2 isn't ten times faster, too.

First of all, there's the fact that the NCBI list was larger to begin with. Larger files tend to have higher compression ratios than smaller files. In fact, most HTTP compression software does not compress the smallest of files, since compression would actually make these files larger. But there's another major difference: the NCBI list is static content, while E2 is dynamic content. Static content is content that is the same for everybody. Dynamic content is content that the webserver changes depending on who's looking at it. For example, if you look at a text file on the Net, and I look at the same file, we will see the exact same thing. This means that a text file is static. Now, look at E2's homepage. You might see your username in the Epicenter nodelet, but I see mine. This means that E2's homepage is dynamic (as is the rest of the site).
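
To see the point about file size for yourself, here is a small sketch using Python's gzip module. The two inputs are invented stand-ins (a two-byte reply and a long, repetitive list of organism names), so the ratios it prints only illustrate the trend and say nothing about the real E2 or NCBI pages.

```python
import gzip

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(gzip.compress(data)) / len(data)

# Hypothetical inputs: a tiny response and a large, repetitive listing.
tiny = b"OK"
large = b"Homo sapiens; Mus musculus; Drosophila melanogaster\n" * 2000

for label, data in (("tiny", tiny), ("large", large)):
    compressed = gzip.compress(data)
    print(f"{label}: {len(data)} -> {len(compressed)} bytes "
          f"(ratio {compression_ratio(data):.3f})")
```

The tiny input comes out larger than it went in, because the gzip header and trailer alone cost more than the two bytes being sent. Real pages are less repetitive than the large input here, so expect less dramatic savings in practice.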

Compression methods for static and dynamic files differ. Static files aren't going to be changing at all, so they can be precompressed and the compressed versions can be stored in a directory known as the compression cache. This is called caching. On some webservers, you will not be able to get a compressed version of static content unless a compressed version already exists in the compression cache. When a static file is compressed for a user because no compressed version is cached yet, it is known as on-demand compression. The compressed file that results from on-demand compression is usually deposited into the compression cache.
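
A compression cache with on-demand compression might look roughly like this. It is a sketch under assumptions: the function name get_compressed, the ".gz" naming scheme, and the rule of recompressing when the source file is newer than the cached copy are all choices made for this example, not taken from any of the products discussed here.

```python
import gzip
import tempfile
from pathlib import Path

def get_compressed(static_file: Path, cache_dir: Path) -> bytes:
    """Return gzip bytes for a static file, using the compression cache if possible."""
    cached = cache_dir / (static_file.name + ".gz")
    # Cache hit: a compressed copy exists and is at least as new as the source.
    if cached.exists() and cached.stat().st_mtime >= static_file.stat().st_mtime:
        return cached.read_bytes()
    # Cache miss: compress on demand, then deposit the result in the cache.
    data = gzip.compress(static_file.read_bytes())
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    page = root / "list.txt"
    page.write_bytes(b"static content\n" * 500)
    cache = root / "compression-cache"
    first = get_compressed(page, cache)    # on-demand compression
    second = get_compressed(page, cache)   # served from the cache
    print((cache / "list.txt.gz").exists(), first == second)
```

The second request never touches the compressor; it only reads the file that the first request left behind.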

Compression of dynamic files works differently, since the webserver must first make changes to the file before the HTTP compression software can compress and send it. All requests for compressed dynamic files are on-demand requests (though that's not what they're called), since the webserver first does its processing, and then the HTTP compression software compresses the result. A new file is generated, and therefore a new file must be compressed, for every request. Some websites have only static or only dynamic compression enabled, depending on what type of content most of the site is composed of. It is usually possible to treat dynamic content as static content, since most HTTP compression software allows you to change which file extensions are treated as which type of file.
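
Here is a rough sketch of both ideas: deciding by file extension whether something counts as static or dynamic, and compressing a dynamic page fresh on every request. The extension lists, the function names, and the toy render_homepage function are all hypothetical; real products have their own configuration formats.

```python
import gzip
import os

# Hypothetical configuration: which extensions count as which type.
STATIC_EXTENSIONS = {".html", ".txt", ".css"}
DYNAMIC_EXTENSIONS = {".php", ".pl", ".asp"}

def content_type_for(path: str) -> str:
    """Classify a requested path as 'static', 'dynamic', or 'uncompressed'."""
    ext = os.path.splitext(path)[1].lower()
    if ext in STATIC_EXTENSIONS:
        return "static"
    if ext in DYNAMIC_EXTENSIONS:
        return "dynamic"
    return "uncompressed"

def render_homepage(username: str) -> bytes:
    """Toy dynamic page: the output depends on who is looking at it."""
    html = f"<html><body>Epicenter: logged in as {username}</body></html>"
    return html.encode("utf-8")

def serve_dynamic(username: str) -> bytes:
    """Generate the page first, then compress it; nothing is cached."""
    return gzip.compress(render_homepage(username))

print(content_type_for("index.php"), content_type_for("list.txt"))
print(gzip.decompress(serve_dynamic("rdude")))
```

Moving an extension from one set to the other is all it takes to treat dynamic content as static in this sketch, which is only sensible if the output really is the same for everybody.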

However, HTTP compression packages vary in this, as they do in many other things. Some common ones are PipeBoost (http://pipeboost.com/), FuzzyCompress (http://fuzzelfish.com/fc/), XCompress (http://xcompress.com/), and the compression built into PHP 4+. Most of these utilities operate in the same way. Some (such as PipeBoost) offer an online service that lets you estimate how much compression that product will achieve on a certain webpage. Fortunately, since these products are all similar, a single such service will let you estimate compression for the other products, too.

POSSIBLE PROBLEMS

This gets a bit more technical now. The browser identifies itself as accepting compression through the HTTP request header Accept-Encoding: gzip, deflate. Netscape and others may support only gzip, not both gzip and deflate. Problems occur when the HTTP compression software sends compressed content to incompatible browsers. This may happen if the software does not check for compatibility or, much more likely, if the browser identifies itself incorrectly. Browsers that kids hack together in a few hours can have this problem. Also, sometimes a proxy server identifies itself as compression-compatible when the browser behind it is not. The HTTP/1.1 specification defines the gzip and deflate content codings, so all HTTP/1.1-compatible browsers should support HTTP compression, but not all do. Checking the protocol version alone is not enough either, since some HTTP/1.0 browsers support HTTP compression too.
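
A cautious server-side check might look something like the sketch below. It only compresses when the browser explicitly lists a coding the server supports, and it honours q=0, which is how a client says it does not want a coding. The function names are made up for this example, and the "*" wildcard is deliberately ignored to keep things short.

```python
from typing import Dict, Optional, Sequence

def accepted_codings(header_value: str) -> Dict[str, float]:
    """Parse an Accept-Encoding value into a {coding: q-value} mapping."""
    codings: Dict[str, float] = {}
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a coding listed without a weight is fully acceptable
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as a refusal
        codings[name.strip().lower()] = q
    return codings

def choose_coding(header_value: Optional[str],
                  server_supports: Sequence[str] = ("gzip", "deflate")) -> str:
    """Pick the first supported coding the client accepts, else 'identity'."""
    if not header_value:
        return "identity"  # no header: send uncompressed content
    codings = accepted_codings(header_value)
    for coding in server_supports:
        if codings.get(coding, 0.0) > 0:
            return coding
    return "identity"

print(choose_coding("gzip, deflate"))     # a browser that takes both
print(choose_coding("gzip;q=0, deflate")) # a browser that refuses gzip
print(choose_coding(None))                # a browser that sends no header
```

This does nothing about browsers or proxies that claim support they don't have; for those, the server can only fall back on a list of known offenders.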

Overall, HTTP compression is a very useful technology. It's only implemented on a few websites, but I think we will see that number increase over time.

And that's all, folks...

