Everything2

How to replicate a dynamic website quickly without the source code or database

created by salimfadhley

(idea) by salimfadhley, Thu Aug 15 2002 at 0:20:07

This is a procedure that you might need if you have to back up a moderately complex website in a hurry. It is not foolproof: for example, it will not adequately handle JavaScript rollovers, nor will it deal with websites that use the POST method or forms for navigation. But since most of the time pages link to other pages with simple '<a href=>' links, this method can be quite good. By my rough estimate, it works perfectly on about 80% of websites.

I had to do this earlier today for a client whose previous web programmer decided to take a round-the-world vacation without first handing over the system or providing any documentation. This technique allowed me to back up the site and create an identical-looking site in about 20 minutes.

This is also a technique for publishing scripted pages on free hosts. Suppose your original site makes use of LDAP servers, database queries and other kinds of server-side complexity: the copy site gives a similar appearance without any of the internal complexity.

What we are going to do:

We are going to use wget to traverse every linkable page of the source website and have it copy the HTML and graphical content of each page into static files. Next, we are going to use Apache's mod_rewrite and a tiny PHP (or Perl) script to serve up those pages and create the illusion that they are still being generated dynamically.

What you need:

  1. A functioning Apache 1.3 web server (with mod_rewrite enabled) to run the copy site.
  2. Enough disk space to hold static versions of every possible page and image that can appear on the site.
  3. The Unix command-line utility 'wget', for copying the web pages over.
  4. A basic text editor; I like jEdit.
  5. PHP (or similar) installed as an Apache module.

Step 1:

Make a new folder to store the new web pages you are about to copy. On a Linux computer this will usually be somewhere within /var/www/html. Change to that directory.

Step 2:

Use wget to copy the website over to your computer.

wget -r -t5 http://foo.net/ -o download.log

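This means: recursively download everything you can find on foo.net, try each file up to 5 times before giving up, and record all progress in the file called "download.log". wget puts the copy in a subfolder named after the host (here foo.net). Note that -r only follows links five levels deep by default; add -l inf if the site is deeper than that.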
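When a site passes parameters in its URLs, wget on Unix saves each variant as a separate file whose name still contains the query string, for example index.php?id=3. Those literal names are what the rest of this method relies on. The sketch below lists them so you can check which dynamic pages were actually captured; list_query_pages is a hypothetical helper name, and it assumes you pass it the folder wget created.

```shell
#!/bin/sh
# list_query_pages DIR (hypothetical helper): print, sorted, every file under
# DIR whose name contains a literal '?'. On Unix, wget keeps the query string
# in the saved file name, so these are the pages fetched from URLs such as
# http://foo.net/index.php?id=3.
list_query_pages() {
    # '[?]' in a find pattern matches a literal question mark.
    find "$1" -type f -name '*[?]*' | sort
}
```

For example, list_query_pages foo.net | wc -l gives a quick count of the parameterised pages in the copy.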

Step 3:

Set up your Apache virtual hosts file and your local DNS server (or /etc/hosts file) so that you can view the website you have just copied at a convenient URL on your computer. This makes it easy to find your copied website and test the next step.
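As a rough sketch of this step, the helpers below print a name-based virtual host stanza and a matching /etc/hosts line. The host name foo.local and the document root are hypothetical placeholders, and the stanza assumes name-based virtual hosting is already switched on (Apache 1.3 to 2.2 need a "NameVirtualHost *" line for that). The AllowOverride All line matters: without it Apache does not read the .htaccess file used in the next step.

```shell
#!/bin/sh
# make_vhost HOST DOCROOT (hypothetical helper): print a name-based
# <VirtualHost> stanza. Assumes "NameVirtualHost *" is already configured.
make_vhost() {
    host=$1
    docroot=$2
    cat <<EOF
<VirtualHost *>
    ServerName $host
    DocumentRoot $docroot
    <Directory "$docroot">
        # Allow the .htaccess from step 4 to enable mod_rewrite and set Options.
        AllowOverride All
    </Directory>
</VirtualHost>
EOF
}

# hosts_line HOST (hypothetical helper): print an /etc/hosts entry that
# points HOST at this machine.
hosts_line() {
    printf '127.0.0.1\t%s\n' "$1"
}
```

As root, you could append make_vhost foo.local /var/www/html/mirror/foo.net to your Apache configuration file and hosts_line foo.local to /etc/hosts, then restart Apache as in step 5.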

Step 4:

In any folder where you can find HTML pages, create a .htaccess file that looks something like this:

#Beginning
RewriteEngine on
Options +FollowSymlinks

RewriteRule (.+) page.php
#End

This says: if any page other than "/" (the default page) is requested from this directory, do not try to display the requested file; run the script page.php in that directory instead.
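Because the substitution page.php is a relative path, mod_rewrite resolves it against the directory that holds the .htaccess file, so every folder that gets the .htaccess also needs a page.php. The script itself can be identical everywhere, since it builds the file path from the full request URI. A minimal sketch that copies both files into every folder of the copy (a superset of "every folder with HTML pages"); install_rewrite and the template file names are hypothetical, and it overwrites any .htaccess or page.php the copied site already contained.

```shell
#!/bin/sh
# install_rewrite ROOT HTACCESS_TEMPLATE PAGE_TEMPLATE (hypothetical helper):
# copy the .htaccess and page.php templates into ROOT and every folder below
# it. Existing .htaccess or page.php files in the copy are overwritten.
install_rewrite() {
    root=$1
    htaccess=$2
    page=$3
    find "$root" -type d | while IFS= read -r dir; do
        cp "$htaccess" "$dir/.htaccess"
        cp "$page" "$dir/page.php"
    done
}
```

Keep the templates outside the copied folder, e.g. install_rewrite ./foo.net ~/htaccess.txt ~/page.php, so cp never tries to copy a file onto itself.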

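page.php needs to look something like this:
<?php

// This is the folder from which all relative URLs are derived
// (the folder wget created for the site, named after the host).
$base_path = "/where/to/find/your/page/";

// This is the filename that I shall retrieve. REQUEST_URI keeps the query
// string, which matches how wget names pages such as "index.php?id=3".
if (isset($_SERVER["REQUEST_URI"])) {
    $file = rtrim($base_path, "/") . $_SERVER["REQUEST_URI"];
} else {
    die("No file to get");
}

// Send a proper 404 (before any other output) if the page was never copied.
if (!is_file($file)) {
    header("HTTP/1.0 404 Not Found");
    die("This page was not copied.");
}

// Comment out this next line once you have it working, it's a security risk.
echo "<!-- This content was read from: " . $file . " -->";

// Send the saved page as-is, however long it is.
readfile($file);
?>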
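To see which file page.php will try to open for a given URL, and why query-string pages come out right, the same path arithmetic can be done from the shell. This is a hypothetical debugging helper, not part of the method itself: it joins the base path and the raw request URI, dropping one trailing slash from the base so the result has no doubled slash, and reports whether that file exists in the copy.

```shell
#!/bin/sh
# resolve_request BASE URI (hypothetical helper): join the copy's base path
# and the raw request URI (query string included), dropping one trailing
# slash from BASE so the result has no "//".
resolve_request() {
    base=${1%/}
    printf '%s%s\n' "$base" "$2"
}

# check_request BASE URI (hypothetical helper): report whether a saved file
# exists at the path page.php would read for URI.
check_request() {
    f=$(resolve_request "$1" "$2")
    if [ -f "$f" ]; then
        printf 'OK %s\n' "$f"
    else
        printf 'MISSING %s\n' "$f"
    fi
}
```

Quote the URI when calling it, e.g. check_request /var/www/html/mirror/foo.net '/index.php?id=3', so the shell does not treat the '?' as a wildcard.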

Step 5: Restart your Apache server.

On a Red Hat Linux box, run:

/etc/init.d/httpd restart

Job done!


Update: I've received a number of messages outlining more elegant solutions than the one I propose. Perl programmers will appreciate w3mir, a package that does almost exactly the same thing but with many extra features.

