David Steven-Jennings
Linux, Coding, Webmastery

Implementing Permalinks - a Basic Overview

March 10th, 2008 by David

Ever wondered how WordPress and other CMSes manage to take URLs with querystrings and make them readable (for users and search engines)? Luckily, there’s no mystical voodoo involved, just a few basic commands that are actually pretty straightforward. I won’t be going into great depth, mind you: the server side of things is simple, but the programming side is pretty open-ended. What I will do is give you the lowdown on how the system works, so that you can implement it your own way if you want :) As the standard implementation uses Apache and PHP, I will use them for all the example code.

What You Will Need

  1. A machine with Apache installed
  2. Apache’s mod_rewrite module enabled
  3. .htaccess files allowed (optional; only needed if you put the rules in an .htaccess file, which requires AllowOverride FileInfo or All)
  4. PHP

How Does It All Work?

The basic theory is straightforward. The webserver quietly rewrites the requested URI (the /directory/page/etc bit of the URL) to a different URI, usually a script, that is able to display the content. Confused? Here’s an example:

A visitor hits your site, specifically the URL http://www.site.com/widgets/blue-widgets/. Behind the scenes, the webserver (eg, Apache) takes the URI part, /widgets/blue-widgets/, and checks its rewrite rules to see whether anything needs to be done with it. With Apache, these rules live in the site’s configuration file or in an .htaccess file*. If the URI matches a rule, the rewrite is applied internally (the visitor never sees it, and the address bar still shows the readable URL) and the resulting page is returned.

A very common implementation can be seen on e-commerce sites. They are often plagued by URIs such as index.php?cat=345&prodid=22&maker=112, which are hard to remember and can trip up search engine spiders. Using rewrite rules, a visitor could instead ask for www.site.com/345/22/112/ and be given the same page as www.site.com/index.php?cat=345&prodid=22&maker=112.
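To make that mapping concrete, here is a small PHP sketch of what such a rewrite does. It assumes an .htaccess rule along the lines of the one quoted in the comment; the function name numeric_path_to_query is made up for illustration, and in practice Apache would do this step for you before PHP ever runs.

```php
<?php
// Hypothetical helper: mimics what an .htaccess rule such as
//   RewriteRule ^(\d+)/(\d+)/(\d+)/?$ index.php?cat=$1&prodid=$2&maker=$3
// does, by turning /345/22/112/ back into the original query parameters.
function numeric_path_to_query(string $path): ?array
{
    // Exactly three numeric segments, with an optional trailing slash.
    if (!preg_match('#^/(\d+)/(\d+)/(\d+)/?$#', $path, $m)) {
        return null; // Not one of our product URLs.
    }
    return [
        'cat'    => (int) $m[1],
        'prodid' => (int) $m[2],
        'maker'  => (int) $m[3],
    ];
}

// Rebuild the "ugly" URL the rewrite would have produced.
$params = numeric_path_to_query('/345/22/112/');
echo 'index.php?' . http_build_query($params), "\n";
// prints index.php?cat=345&prodid=22&maker=112
```

Note how little the pattern knows about the page: it only sees three numbers, which is exactly the weakness discussed next.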

While this is a good start, it is a bit of a ‘hack’: a ‘bolt-on’ stop-gap for an indexing problem that doesn’t help the user as much as it could. Can you tell what that page is about just by seeing /345/22/112/? Wouldn’t it be nicer to see /widgets/blue-widgets/ (especially if the visitor is a search engine)?

How to Create ‘Permanent’ Links

The original idea behind a ‘permanent link’ was to give a stable, bookmarkable address to blog posts and other items that had dropped off the front page. However, people soon realised that the links would be more useful (especially from an SEO point of view) if they were always used as the primary links. The nice part is that creating them uses the same rewriting method as above, just with an extra level of sophistication in the code. First, let’s have a look at the Apache configuration rules, as this is where it all begins:

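RewriteEngine on
# Leave requests for real directories alone...
RewriteCond %{REQUEST_FILENAME} !-d
# ...and requests for real files (images, CSS, etc.)
RewriteCond %{REQUEST_FILENAME} !-f
# Send everything else to index.php; [L] stops any further rules in this pass
RewriteRule .* index.php [L]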

The first line is very, very important: it turns on the mod_rewrite engine :) mod_rewrite, as the name suggests, is the Apache module that deals with rewriting URIs. You only need this directive once in the file, but it must appear before any other Rewrite lines.
The two RewriteCond lines are merely checks, as we don’t want requests for physical files to be caught by the rule. If they were, links to images, CSS files, etc would stop working (and probably break other things too). The first condition says “if the requested thing is not a directory” and the second says “if the requested thing is not a file.” Only when both are true does the RewriteRule line run, and it simply says “take whatever was asked for and rewrite it to index.php.”
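If it helps to see that logic spelled out, the sketch below reproduces the decision the rules above make, in PHP. It is only an illustration: rewrite_target and $docRoot (standing in for the site’s DocumentRoot) are hypothetical names, and Apache’s real REQUEST_FILENAME handling has more subtleties than a simple string join.

```php
<?php
// Hypothetical sketch of the RewriteCond/RewriteRule decision.
// $docRoot stands in for the site's DocumentRoot (an assumption).
function rewrite_target(string $docRoot, string $path): string
{
    // Roughly what %{REQUEST_FILENAME} would be for this request.
    $filename = rtrim($docRoot, '/') . $path;

    // RewriteCond !-d and !-f: real directories and files are served untouched.
    if (is_dir($filename) || is_file($filename)) {
        return $path;
    }

    // RewriteRule: everything else is handed to index.php.
    return '/index.php';
}
```

For example, if css/style.css really exists under the document root, rewrite_target($root, '/css/style.css') returns the path unchanged, while rewrite_target($root, '/widgets/blue-widgets/') returns /index.php.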

At this stage the rules have done their job, and it’s now up to the PHP code to do the rest. Please note that the method below is just one way of doing it; it really depends on what you’re comfortable with and how the URI-processing code interacts with any databases, etc that you may have.

First of all, we’ll need to get the requested URI. To do this, we use the following command:

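$uri = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

This stores the path part of whatever was requested (eg, /widgets/blue-widgets/) in the variable $uri. Passing REQUEST_URI through parse_url() strips off any query string, so a request such as /widgets/?page=2 doesn’t end up with ?page=2 glued onto the last segment. From here, we want to break the path up so that its parts can be used. To do this, we can use PHP’s explode function: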

$exploded = explode('/', $uri);

Now, here’s an important thing to note: the first element of the array will be empty, because $uri starts with a slash and explode() returns whatever comes before the first delimiter, which is nothing. If there is also a trailing slash (eg, /dir1/dir2/), the last element will be empty too. You will need to add validation code to handle this, which luckily is pretty straightforward.
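For example, explode('/', '/dir1/dir2/') gives ['', 'dir1', 'dir2', '']. One way to handle that is to throw away the empty strings before using the segments. The sketch below does that; path_segments is a hypothetical name, and decoding the segments with rawurldecode() is an extra step you may or may not want.

```php
<?php
// A minimal sketch: split a request URI into its non-empty path segments,
// so the empty first element (leading slash) and the empty last element
// (trailing slash) never reach the rest of the code.
function path_segments(string $uri): array
{
    // Keep only the path, eg /widgets/?page=2 -> /widgets/
    $path = (string) parse_url($uri, PHP_URL_PATH);

    $parts = explode('/', $path);

    // Remove the empty strings left by leading, trailing or doubled slashes.
    $parts = array_filter($parts, fn ($p) => $p !== '');

    // Decode things like %20, then re-index the array from 0.
    return array_values(array_map('rawurldecode', $parts));
}
```

With this, /dir1/dir2/ and /dir1/dir2 both give ['dir1', 'dir2'], and the home page (/) gives an empty array, which is a handy signal to show the front page.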

Anyway, the final part has come: getting the info! The standard approach is to fetch it from a database like MySQL using ID numbers, which are normally an auto-incrementing primary key. However, as the URI contains names rather than ID numbers, we need another way to find the page content. The simplest answer is to add another column to the table containing the ‘permanent link’ names, as WordPress does, and look pages up by that.
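Those names have to come from somewhere, and a common approach is to generate them from the page title when the page is saved. Below is a minimal sketch, assuming plain ASCII titles; make_slug is a hypothetical name, and a real site would also need to make sure the result is unique within its parent category.

```php
<?php
// Hypothetical helper for filling the extra "permanent link" column:
// turns a page title into a lowercase, hyphen-separated name.
// Assumes ASCII titles: any other character is treated as a separator.
function make_slug(string $title): string
{
    $slug = strtolower($title);

    // Replace every run of characters other than a-z and 0-9 with one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);

    // No leading or trailing hyphens.
    return trim($slug, '-');
}

echo make_slug('Blue Widgets!'), "\n"; // blue-widgets
```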

The final step is simply a matter of building the database query. This depends heavily on how you structured your database, but in all cases you’ll logically want to start with the page (the last part of the URI) and work up through the category names to get the correct content. For example, if the URI was /directory1/directory2/directory3/pagename/, you’ll want to start by looking for the ‘pagename’ that is in ‘directory3’, which is a sub-directory of ‘directory2’, and so on.
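Here is one way that “start at the page and work up” idea can be turned into a single query. It is a sketch under an assumed schema, not the only layout: one hypothetical table called pages with columns id, parent_id (NULL for top-level entries) and slug, and a hypothetical function build_lookup_query that takes the path segments in URL order.

```php
<?php
// Hypothetical sketch: build a query that starts from the page (the last
// segment) and works up through its parents, one self-join per level.
// Assumed table: pages(id, parent_id NULL at the top level, slug).
function build_lookup_query(array $segments): array
{
    // Page first, top-level directory last.
    $segments = array_reverse(array_values($segments));
    $n = count($segments);
    if ($n === 0) {
        throw new InvalidArgumentException('Empty path: show the home page instead.');
    }

    $sql = 'SELECT p0.* FROM pages p0';
    $where = ['p0.slug = ?'];
    for ($i = 1; $i < $n; $i++) {
        // Each join climbs one level: p{i} is the parent of p{i-1}.
        $sql .= sprintf(' JOIN pages p%d ON p%d.parent_id = p%d.id', $i, $i - 1, $i);
        $where[] = "p$i.slug = ?";
    }
    // The outermost directory must sit at the top level.
    $where[] = 'p' . ($n - 1) . '.parent_id IS NULL';

    // Return the SQL and its placeholder values, in matching order.
    return [$sql . ' WHERE ' . implode(' AND ', $where), $segments];
}
```

The two return values are meant to go straight into PDO, eg $stmt = $pdo->prepare($sql); $stmt->execute($params);, so the URI segments, which come straight from the visitor, are never pasted into the SQL string itself.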

And that’s basically it. Despite what many people believe, it’s neither difficult nor ‘new’: like Ajax, it merely uses features of established technologies that have been around for quite a while.

--------
* Everyone calls them ‘.htaccess’ files because, by default, that’s what the filename must be for Apache to pick them up. In Linux, a leading ‘.’ marks a file or directory as hidden; strictly speaking, though, the dot is still part of the filename, which is why it can’t be left off.

Posted in How To's - Coding
