Customized 404 messages, especially on Apache
(and a few notes about other error codes)

Why?

A Web author may wish to "customize" error messages that are sent to the browser (or other client) when a requested resource is not found. Normally the server in such cases just sends a general HTTP error code 404 (conventionally called and displayed as "Not Found"), and the browser then takes a general action that it applies in all such cases. So there is nothing site-specific. And there might be a reason why an author wishes to make something site-specific happen.

Perhaps there have been lots of URLs that referred to the site but have now become non-functional due to a site rearrangement. That would be a gross mistake; cool URLs don't change. If you've made such a mistake, you should probably check what other mistakes you've made; see the alertboxes The Top Ten Mistakes of Web Design and The Top Ten New Mistakes of Web Design to avoid the worst problems in the future. But let's assume that the mistake has been made, and it's impossible to use redirection or other mechanisms to fix things nicely. Or let's assume that you have some other reason, such as directing people using wrong URLs to a search page of yours so that they might find what they are looking for. Or you might have read Nielsen's alertbox Improving the Dreaded 404 Error Message, which presents some good arguments in favor of doing something about the problem of poor default error messages.

Make it better than the default!

There's little point in creating an error page that is less informative than the default error page. In fact, your error page should be better than or at least as good as the default error page, for all users in all situations.

In particular, include an explanation in English, even if you also have an explanation in some other language. You might think that if all your pages are in, say, Estonian, only people who can read Estonian will try to access them. But on the Internet, virtually anything can happen. For example, someone might follow a casual link here or just mistype an address he read in a newspaper, turning the real address into something that refers to your server. If you had no customized error page, the user would probably see a default error page (sent by your server or shown by his browser) in English, or in some language he knows. So if you prevent that, make sure that there's something in English too on your error page.

However, putting long explanations in two or more languages into one page makes it rather big, and potentially confusing. Consider using language negotiation, so that each user gets a monolingual page, in his preferred language, if his browser settings correspond to his preferences. More information: Techniques for multilingual Web sites.

Make it very evident that an error page is an error page, not an odd-looking content page. The very first words should say that explicitly. Note that e.g. blind people will experience the page sequentially, so the sooner they hear that there's an error situation, the better. If you include something funny, put it after the simple explanation. It is not necessary to repeat the usual jargon "404 Not Found", if you can make things clear by other means, but it hardly hurts to use that expression, since so many people are familiar with it.

Nielsen's alertbox about error messages in general, Error Message Guidelines, is very useful when designing error pages, too.

The technical basics

At the general protocol level, the idea behind 404 customization is that when a server sends a 404 Not Found error code, it may, and indeed should, also send a document (normally, an HTML document) which explains the situation: an error document. Browsers and other user agents are not expected to treat that document as corresponding to the URL used but as explaining why there is no document corresponding to it. Typically a server sends by default a very generic error document like the following:

Not Found

The requested URL /foo was not found on this server.

Apache/1.3.9 Server at www.hut.fi Port 80

But server software and its settings may let an author affect what is sent, for example so that an author-specified error document is sent for all URLs that refer to his directory. (In principle, we should refer to URLs with a specific prefix, not directories; but typically servers map the path part of a URL to a path name in a file system.)

To see such a setting in action, try using a URL like http://www.cs.tut.fi/~jkorpela/asdfg which does not refer to any resource. Instead, the server sends an error code and an error document to the browser.
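The exchange described above can be observed without any particular public server. The following is only a minimal sketch in Python: it starts a throwaway local server that answers every request with status 404 plus a small error document, and then fetches a nonexistent URL from it. The names ERROR_BODY, NotFoundHandler, fetch_status_and_body and demo are made up for this illustration, and the server is a stand-in for demonstration purposes, not a model of how Apache works internally.

```python
import http.server
import threading
import urllib.error
import urllib.request

# Hypothetical error document, used only for this demonstration.
ERROR_BODY = (
    b"<title>Error: page not found</title>"
    b"<p>Error: there is no such page on this site.</p>"
)


class NotFoundHandler(http.server.BaseHTTPRequestHandler):
    """Answers every GET request with status 404 and an error document."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(ERROR_BODY)))
        self.end_headers()
        self.wfile.write(ERROR_BODY)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_status_and_body(url):
    """Return (status code, body bytes), also for error responses."""
    # An empty ProxyHandler makes sure the request goes directly to the server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx, but the error object still carries
        # the error document that the server sent along with the code.
        return error.code, error.read()


def demo():
    """Start the local server, request a nonexistent URL, and stop the server."""
    server = http.server.HTTPServer(("127.0.0.1", 0), NotFoundHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_status_and_body(f"http://127.0.0.1:{port}/asdfg")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns both the status code and the document, which shows that the two travel together in one response: the code tells software what happened, and the document tells the human reader.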

In Microsoft Internet Explorer 5, there is a feature which causes a customized error message to be suppressed in favor of the browser's default message, if the customized error document is "too short". It has been reported that the limit appears to be 512 bytes. Although a value of 1024 bytes has also been mentioned, a test on IE 7 shows that "too short" means "512 bytes or less". The limit might be changeable by the user via registry settings, and the entire feature can be disabled, but it is unrealistic to expect most users to know such things. Thus, it is advisable to make sure that your error document is at least 513 characters long (counted by the number of characters in the HTML source).
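A quick way to check an error document against that threshold is to measure it before uploading. This is a minimal sketch; the constant and function names are invented here, and it assumes that the limit applies to the encoded size of the document as sent, which is exactly 512 bytes according to the report above. Counting bytes and counting characters give the same result for pure ASCII documents.

```python
# Threshold reported above: documents of this many bytes or fewer
# may be replaced by the browser's own message.
IE_SHORT_ERROR_LIMIT = 512


def encoded_size(document: str, encoding: str = "utf-8") -> int:
    """Size of the document in bytes when sent in the given encoding."""
    return len(document.encode(encoding))


def is_long_enough(document: str, encoding: str = "utf-8") -> bool:
    """True if the document exceeds the reported 512-byte threshold."""
    return encoded_size(document, encoding) > IE_SHORT_ERROR_LIMIT
```

If the check fails, a few more sentences of genuinely helpful text (a link to the main page, a search form) are a better way to grow the document than padding it with comments.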

Whether and how "customized error messages" are possible depends on the server software and its settings. It's a server issue, and HTML markup is not involved (except that the customized error document is usually an HTML document, of course).

After finding out what software your server runs (you might use e.g. Delorie's HTTP Header Viewer for the purpose; give it some URL referring to something on your server), see the list of links to documentation of different servers by WebServer Compare to find documentation of server software, and contact your local server administration (webmaster@server) to ask about applicable settings if needed.


Doing it on Apache

In the Apache server software, which is rather widely used (and often imitated by other software), you can use the ErrorDocument directive. This means the following:

  1. You would first create an error document (a normal HTML document containing whatever you wish to put there) under a name of your choice, say notfound.html, in the directory where your Web documents are.
  2. Then you would create a plain text file named .htaccess (note the leading period) and put the following into it:
    ErrorDocument 404 /~jkorpela/notfound.html
    Naturally you would replace /~jkorpela/notfound.html by the address of your own customized error document. The address consists of whatever follows the server name (like www.cs.tut.fi) in the full URL of the error document when referred to directly (in this case, http://www.cs.tut.fi/~jkorpela/notfound.html).

Note that the address to be used is not relative to your own directory (a plain notfound.html wouldn't do) but to the server root. Technically you could also use a full absolute URL like (in my case) http://www.cs.tut.fi/~jkorpela/notfound.html. But don't use absolute URLs here, since they make Apache send a wrong status code, namely 302 (Found), together with a redirection to the address specified in the ErrorDocument directive. This is all wrong, since it would indicate that the original URL works and the requested resource exists but temporarily resides under a different URL! This would cause quite a lot of confusion among users, search engines, etc. (The information related to 404 issues on MSN TV (previously WebTV) pages even recommends such a method; a previous version of WebTV's page about it called it "gentle deceit", but it is far from gentle!)
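The distinction between the two kinds of address can be checked mechanically before the directive goes into .htaccess. The following is a minimal sketch with an invented function name; it only looks at the shape of the address as described above (a leading slash versus a full URL with a scheme) and does not reproduce Apache's own parsing.

```python
def classify_error_document_target(target: str) -> str:
    """Classify the address given in an ErrorDocument directive.

    Returns "local" for a server-relative address (the form recommended
    above), "external" for a full URL (which makes the server answer with
    a redirect instead of the error code), and "unrecognized" otherwise.
    """
    if target.startswith("/"):
        return "local"
    if "://" in target:
        return "external"
    # E.g. a plain "notfound.html", which the text above warns won't do.
    return "unrecognized"
```

For instance, the address /~jkorpela/notfound.html is classified as "local", whereas http://www.cs.tut.fi/~jkorpela/notfound.html is "external" and a plain notfound.html is "unrecognized".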

For example, suppose you have created a Web page and you wish to use a link checker (such as the W3C Link Checker) to verify that all of your links work in some technical sense at least. If you have actually mistyped a URL in a link, the link checker will note a status code of 404 as an error, so that you can fix the problem. But if the status code is 302, there is no error to be reported; the checker could at most issue an informative message about redirection.
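The difference this makes to a link checker can be sketched in a few lines. This is a deliberately simplified illustration with an invented function name; it is not the logic of the W3C Link Checker or of any other real tool, and it only sorts status codes into the broad classes that HTTP defines.

```python
def classify_link_status(status: int) -> str:
    """Roughly how a link checker might treat an HTTP status code."""
    if 200 <= status < 300:
        return "ok"        # the link works
    if 300 <= status < 400:
        return "redirect"  # at most an informative note, not an error
    if 400 <= status < 600:
        return "broken"    # reported as an error to be fixed
    return "other"
```

With this classification a mistyped URL that yields 404 is reported as "broken", but the same mistyped URL on a server that answers 302 is merely a "redirect", so the mistake goes unnoticed.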

If you create your Web documents e.g. on a PC running Windows and separately upload them onto a server, you may find it difficult to create a file named .htaccess on Windows. Well, you could name it, say, access.txt and rename it when uploading; for example, typical FTP programs let you specify a different name for the destination (e.g. put access.txt .htaccess would work in a simple command-based FTP program like DOS FTP).
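The same rename-on-upload trick can be scripted. The following is a minimal sketch using Python's standard ftplib module; the function name is invented, and it assumes that you already have a logged-in connection whose current remote directory is the one where .htaccess should go.

```python
import ftplib


def upload_as_htaccess(ftp: ftplib.FTP,
                       local_name: str = "access.txt",
                       remote_name: str = ".htaccess") -> str:
    """Upload a local file under a different name on the server.

    Corresponds to "put access.txt .htaccess" in a command-based FTP
    program. Returns the server's reply to the transfer.
    """
    with open(local_name, "rb") as source:
        return ftp.storbinary(f"STOR {remote_name}", source)
```

A connection would typically be opened with ftplib.FTP(host), followed by login(user, password) and cwd(directory), with the host, account and directory of your own server.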

The customization applies to subdirectories as well. On the other hand, you can also create per-subdirectory customization, which overrides the customization for the parent directory. For example, I have made such customization for my test directory due to its special nature, so an incorrect URL like http://www.cs.tut.fi/~jkorpela/test/asdfg will cause a different error document to be sent.
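The inheritance rule can be pictured as a search for the nearest directory that has its own setting. The following is only a model of the behavior described above, not Apache's implementation; the function name is invented, and the settings are given as a plain dictionary from directory paths to error document addresses.

```python
import posixpath


def effective_error_document(url_path, settings):
    """Find the error document that applies to a URL path.

    Walks from the directory part of url_path towards the root and
    returns the setting of the nearest directory that has one, or
    None if no directory on the way has a setting.
    """
    directory = posixpath.dirname(url_path)
    while True:
        if directory in settings:
            return settings[directory]
        parent = posixpath.dirname(directory)
        if parent == directory:  # reached the root without a match
            return None
        directory = parent
```

Given a setting for /~jkorpela and another for /~jkorpela/test, the path /~jkorpela/test/asdfg gets the error document of the test directory, while /~jkorpela/asdfg and paths in subdirectories without their own setting get the one of the parent.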

What about other errors, like 401?

Other error conditions can be handled in similar or analogous ways. As regards Apache, the documentation of the ErrorDocument directive contains a few examples like
ErrorDocument 401 /subscription_info.html
which might be a good idea for a paid-access directory: if you don't give a correct username/password combination, you'll see a customized error message page which gives you subscription information and hopefully also information that tells you what you should do if you are a paying customer who has forgotten the password. The following URL is for a trivial demo:

http://www.cs.tut.fi/~jkorpela/hidden/

It depends on the browser whether and how it prompts for a username and password anew automatically when it has got the 401 response from the server, instead of displaying the message, but the user hopefully knows how to abort that process and make the browser show the message.

A few ideas:

Generally, when considering customization of error messages, read carefully the HTTP status code definitions. You might expect that some error occurs in certain conditions but it might arise otherwise too, and a customized message could then be misleading, or plain wrong.