Rewriting the directory structure of a website with dynamic pages

by Arthur
Posted on Monday, January 23rd, 2006 at 8:11 pm CET
If you are hosting your website on an Apache server and it has dynamic PHP or ASP pages with long, unattractive URLs full of parameters, you should really look into using mod_rewrite in .htaccess files to transparently rewrite your directory structure. It is very easy to do, and your website will be that much nicer for your visitors.
Why rewrite your directory structure?
Let’s say we have a site that lists books and on one particular page we want to show all the books of the author Stephen King. In our database that author has ID number 35, so the URL of the page could be something like this:
http://www.site.com/index.php?cat=authors&id=35
Of course this URL works, and there are many sites that work like this. However, there are some disadvantages. First of all, this URL exposes the underlying technology of the website, in this case PHP, along with the names of its parameters. This invites hackers to try out other combinations of variables, which is a risk if your script does not check its input. For instance, they could try &id=0 and see what happens.
Secondly, it is just an awkward URL. Imagine a customer who wants to send the URL in an email to a colleague, or a visitor who forgot to bookmark the URL and wants to type it in from memory.
“cat=authors&id=35” just doesn’t mean anything, and all the punctuation like the ? and & is just plain annoying.
Fortunately, with the help of a simple .htaccess file on your server it is extremely easy to turn the above URL into something like this, without the visitor of your site ever knowing the original URL:
http://www.site.com/authors/stephen-king/
What are .htaccess files?
.htaccess files are ‘distributed configuration files’ that provide a way to make configuration changes on a per-directory basis. They are simple text files that, for our purposes, contain your rewrite rules. The rules you put in an .htaccess file apply to the directory that contains the file and to all subdirectories underneath. In case of nested .htaccess files, the rules defined in the .htaccess file in the subdirectory take precedence over those defined in an .htaccess file in the parent directory.
Enabling the RewriteEngine
As long as the mod_rewrite module is installed on your server and your host allows .htaccess files to override the configuration, you can enable it by simply creating a text file named .htaccess and adding the following line to it:
RewriteEngine on
Now put this file in the root of your site and voila … rewriting is enabled in all subdirectories throughout your site.
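If you are not sure whether mod_rewrite is loaded on your server, one common precaution is to wrap the directives in an IfModule block, so that Apache skips them instead of answering every request with a server error. A minimal sketch, assuming the module is registered under its usual name mod_rewrite.c:

```apache
# Only apply the rewrite directives when mod_rewrite is actually loaded.
<IfModule mod_rewrite.c>
    RewriteEngine On
</IfModule>
```

The trade-off is that a missing module then fails silently: the nice URLs just return a page not found error instead of an obvious configuration error.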
Rewriting directories
Let’s get right to business. The only two lines required in the .htaccess file to get the result in the example above are the following:
RewriteEngine On
RewriteRule ^authors/(.*)/$ /index.php?cat=authors&name=$1 [L]
The RewriteRule line in this example creates the nice author URL that we’ve seen above. You can have as many rewrite rules in the .htaccess file as you like, but in this example you only need one rule for all the authors.
This is how it works: the RewriteRule is made up of the url to match and the original url to rewrite it to, followed by optional flags:
RewriteRule ^url to match$ original url [L]
^ (caret) signifies the start of the URL under the directory that the .htaccess file is in.
$ (dollar sign) signifies the end of the string to be matched.
[L] stands for “last”: if this rule matches, Apache stops processing the remaining rules for this pass. It is put at the end of the rule.
(.*) means literally “everything”. Whatever it matches in the url to match will be put in the place of $1 in the original URL. So if someone types in the url /authors/michael-crichton/ then what will be served is /index.php?cat=authors&name=michael-crichton, while the address in the browser stays the same. It is then up to your PHP script to look up the string “michael-crichton” in the database and display the correct information.
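One thing the rule above does not cover is a URL typed without the trailing slash, or one that carries its own query string. A possible variation is sketched here; treat it as a starting point, not a drop-in replacement. The /? makes the final slash optional, [^/]+ matches any run of characters that is not a slash, and the QSA flag asks Apache to append the query string of the request to the rewritten URL. The page parameter in the comments is a made-up example.

```apache
RewriteEngine On

# /authors/michael-crichton   -> /index.php?cat=authors&name=michael-crichton
# /authors/michael-crichton/  -> same result
# /authors/michael-crichton/?page=2   (hypothetical parameter)
#                             -> /index.php?cat=authors&name=michael-crichton&page=2
RewriteRule ^authors/([^/]+)/?$ /index.php?cat=authors&name=$1 [L,QSA]
```

The capture is [^/]+ rather than (.*) because (.*) is greedy and would swallow the optional trailing slash into $1.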
You can have multiple variables in your rewrite URL; the second (.*) in the url to match will be $2, and the next one $3, etc.
For instance:
RewriteRule ^authors/(.*)/(.*)/$ /index.php?cat=authors&name=$1&year=$2 [L]
In this example we have two variables (e.g. /authors/stephen-king/2006/), where “stephen-king” will be $1 and “2006” will be $2.
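If you keep both the one-variable and the two-variable rule in the same file, be aware that (.*) also matches slashes. A one-variable rule placed first would therefore catch /authors/stephen-king/2006/ as well and pass name=stephen-king/2006 to the script. One way around this, sketched below, is to use [^/]+ (anything except a slash) so that each capture covers exactly one directory level:

```apache
RewriteEngine On

# One directory level: /authors/stephen-king/
RewriteRule ^authors/([^/]+)/$ /index.php?cat=authors&name=$1 [L]

# Two directory levels: /authors/stephen-king/2006/
RewriteRule ^authors/([^/]+)/([^/]+)/$ /index.php?cat=authors&name=$1&year=$2 [L]
```

With these patterns a URL can only match one of the two rules, so their order in the file no longer matters.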
Matching criteria
(.*) is the simplest way to match a string. There are other ways, for instance:
([a-z]+) will allow only lowercase letters a to z
([0-9]+) will allow only digits
([^/]+) will allow anything except / (forward slash)
([^&]+) will allow anything except & (ampersand)
Without the + (one or more), a class such as [0-9] matches exactly one character.
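To make these classes concrete, here is a sketch of how they might be used in complete rules. The books category and its id parameter are invented for illustration and are not part of the example site above.

```apache
RewriteEngine On

# [a-z-]+ : one or more lowercase letters or hyphens, e.g. /authors/stephen-king/
RewriteRule ^authors/([a-z-]+)/$ /index.php?cat=authors&name=$1 [L]

# [0-9]+ : one or more digits, e.g. /books/35/ (hypothetical numeric ID)
RewriteRule ^books/([0-9]+)/$ /index.php?cat=books&id=$1 [L]
```

A stricter pattern means that obviously wrong URLs never reach your script, but the script should still check its input.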
Knowing this we could match the year in the example above in a better way. For instance, we could do it as follows:
RewriteRule ^authors/(.*)/([0-9][0-9][0-9][0-9])/$ /index.php?cat=authors&name=$1&year=$2 [L]
In this example the URL /authors/stephen-king/2005/ would show the correct page with all 2005 books by Stephen King. But if you were to type in /authors/stephen-king/200/ then you get a page not found error, because the last directory must have exactly four digits. It only has three, so the rule is not matched.
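The four repeated [0-9] classes can also be written with a repetition count. A sketch of the same rule in that form, where only the year part is changed:

```apache
# {4} means "exactly four of the preceding class", so this matches
# /authors/stephen-king/2005/ but not /authors/stephen-king/200/
RewriteRule ^authors/(.*)/([0-9]{4})/$ /index.php?cat=authors&name=$1&year=$2 [L]
```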
Where to find more information
This is just the tip of the iceberg. With this information you can probably get started, but there are 1001 possibilities with the mod_rewrite module.
Here you can find more information: