The reverse solidus (backslash, \) in Web authoring

In the HTML language itself, there is nothing special about the reverse solidus character: it is a normal data character with no markup significance. But in Web authoring, various other languages and notations (e.g. JavaScript and URLs) are used too, and they are often combined with HTML and with each other. This may cause confusion, since these languages and notations use the reverse solidus for different purposes. This document tries to clarify concepts and to suggest practical measures to avoid confusion.

  1. The reverse solidus character "\" is a normal data character as far as HTML itself is concerned: there is nothing special about it, and no need to "escape" it in any manner.

  2. You can however use the numeric character reference &#92; to denote "\" in an HTML document (e.g. if your key for reverse solidus is broken, or you can't find a key or other way to enter the reverse solidus on a strange keyboard).

  3. Relatively often the reverse solidus as a character is confused with the solidus (slash) character "/". They are similar in shape, just slanted differently. But they are quite distinct characters and have different uses. In particular...

  4. It is a common error to use "\" instead of "/" in URLs, as in href="..\index.html" instead of the correct href="../index.html". Further confusion is caused by the fact that some browsers accept the incorrect format too, so people complain that "my links work on IE but not on Netscape". This isn't about HTML itself, however; it's about misunderstanding the relationship between URLs and filenames and their syntax; see my attempt to describe the URL syntax and semantics.

  5. Rather often people write script elements that contain JavaScript code like
    document.write('</table>');
    In HTML terms, the problem is that HTML parsers should recognize </table> as an end tag when processing the <script> element, even though it is syntactically not allowed in that context. After all, a parser must look for </script>, and for certain reasons it should look for any end tag. This has practical impact mainly in validation; see section B.3.2 Specifying non-HTML data in the HTML 4 specification. The use of "\" to solve the problem, as in
    document.write('<\/table>');
    is not really an HTML issue, though the problem was in a sense caused by HTML parsing rules.

    It's a matter of JavaScript rules for string literals that the character pair \/ is a valid notation for the "/" character. So what happens is that the HTML parser (typically, a routine in a browser) takes document.write('<\/table>'); as such. Note that it does not see any end tag there: the essential point is that "<" and "/" are not consecutive here. The parser passes the code to a JavaScript interpreter (typically, a set of routines in a browser), which will, by virtue of JavaScript rules, take it as equivalent to document.write('</table>'); and so what gets written is just </table>.

    Confusing, isn't it? So to avoid the confusion, consider putting JavaScript code in an external file and referring to it via <script src="URL"></script>. Naturally, in such an external file, "</" in a string causes no problems, since the file is not processed by an HTML parser, only by a JavaScript interpreter. (The use of external JavaScript files caused some problems on very old browsers like IE 3, but it is in many ways better, particularly for bulky code.)

  6. When an HTML document is generated by a Perl (or C or some other language) program, the reverse solidus is frequently used, since it has special meanings in Perl (and in other programming languages). For example, the program might contain
    print "<h1>Hello</h1>\n";
    There's nothing special about it. The notation \n is the Perl (and C) way of specifying an end-of-line character; what gets written into the HTML document is the string <h1>Hello</h1> followed by an end of line. But some confusion can be caused when people present their problems with HTML by quoting the Perl code for generating it, rather than the actual HTML document!

  7. Some Web browsers may incorrectly split lines between characters. In particular, IE may even split "a-b" as "a-" and "b". Such problems include a possible line split between the solidus and the reverse solidus in "/\"; the workaround is to use nonstandard nobr markup around them: <nobr>/\</nobr>. It's a rare combination of characters, of course. But I suspect that IE might split a line before or after the reverse solidus in some other contexts too.
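
As a rough illustration of items 4 and 5, the following JavaScript sketch looks at the same characters from three points of view: the JavaScript interpreter, the HTML parser, and URL syntax. The names written, rawSource, containsEndTagOpen and toUrlPath are made up for this example and appear nowhere else. The check for "</" is only a crude stand-in for what a real HTML parser does, so treat this as a sketch under those assumptions, not as a description of any browser's actual behaviour.

```javascript
// Hypothetical names throughout; this is an illustration, not a parser.

// Item 5, JavaScript layer: in a string literal, \/ simply means "/".
const written = '<\/table>';

// Item 5, HTML layer: the parser sees the source text with the backslash in it.
// String.raw keeps the backslash as an ordinary character.
const rawSource = String.raw`document.write('<\/table>');`;

// Returns true if the text contains "<" immediately followed by "/",
// which is roughly what makes an HTML parser suspect an end tag.
function containsEndTagOpen(text) {
  return text.includes("</");
}

// Item 4, URL layer: turn a Windows-style relative path into URL syntax
// by replacing every reverse solidus with a solidus.
function toUrlPath(path) {
  return path.replace(/\\/g, "/");
}

console.log(written === "</table>");         // true
console.log(containsEndTagOpen(rawSource));  // false
console.log(containsEndTagOpen(written));    // true
console.log(toUrlPath("..\\index.html"));    // ../index.html
```

Note that the call to toUrlPath has to spell the reverse solidus as "\\", because a single "\" inside a JavaScript string literal starts an escape notation. That is one more layer of the same confusion.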

Date of creation: 2000-08-22. Last modification: 2006-04-18.

Jukka Korpela