Unicode spaces

This little document lists the various space characters in Unicode. For a description, consult chapter 6, Writing Systems and Punctuation, and the block description General Punctuation in the Unicode standard.

The third column of the following table shows the appearance of the space character, in the sense that the cell contains the words “foo” and “bar” in bordered boxes separated by that character. It is possible that your browser does not present all the space characters properly. This depends on the font used, on the browser, and on the fonts available in the system.

Space characters in Unicode
Code Name of the character Sample Width of the character
U+0020 SPACE foo bar Depends on font, typically 1/4 em, often adjusted
U+00A0 NO-BREAK SPACE foo bar As a space, but often not adjusted
U+1680 OGHAM SPACE MARK foobar Unspecified; usually not really a space but a dash
U+180E MONGOLIAN VOWEL SEPARATOR foobar No width
U+2000 EN QUAD foo bar 1 en (= 1/2 em)
U+2001 EM QUAD foobar 1 em (nominally, the height of the font)
U+2002 EN SPACE foobar 1 en (= 1/2 em)
U+2003 EM SPACE foobar 1 em
U+2004 THREE-PER-EM SPACE foobar 1/3 em
U+2005 FOUR-PER-EM SPACE foobar 1/4 em
U+2006 SIX-PER-EM SPACE foobar 1/6 em
U+2007 FIGURE SPACE foobar “Tabular width”, the width of digits
U+2008 PUNCTUATION SPACE foobar The width of a period “.”
U+2009 THIN SPACE foobar 1/5 em (or sometimes 1/6 em)
U+200A HAIR SPACE foobar Narrower than THIN SPACE
U+200B ZERO WIDTH SPACE foobar Nominally no width, but may expand
U+202F NARROW NO-BREAK SPACE foobar Narrower than NO-BREAK SPACE (or SPACE)
U+205F MEDIUM MATHEMATICAL SPACE foobar 4/18 em
U+3000 IDEOGRAPHIC SPACE foo bar The width of ideographic (CJK) characters.
U+FEFF ZERO WIDTH NO-BREAK SPACE foobar No width (the character is invisible)

In the Unicode standard, U+200B and U+FEFF are not included in the table of space characters, as they have no width and are not supposed to have any visible glyph.
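The classification can be checked programmatically. The following is a minimal sketch in Python that uses the standard unicodedata module. The list of code points is copied from the table above, and the function name describe_spaces is made up for this illustration. The categories reported depend on the version of the Unicode data bundled with the Python interpreter, so older interpreters may give different results.

```python
import unicodedata

# Code points copied from the table above (20 rows).
TABLE_CODE_POINTS = [
    0x0020, 0x00A0, 0x1680, 0x180E,
    *range(0x2000, 0x200C),  # U+2000..U+200B
    0x202F, 0x205F, 0x3000, 0xFEFF,
]

def describe_spaces(code_points=TABLE_CODE_POINTS):
    """Return (code, name, general category) for each code point."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{cp:04X}", name, unicodedata.category(ch)))
    return rows

if __name__ == "__main__":
    for code, name, category in describe_spaces():
        # "Zs" is the general category "Separator, space".
        kind = "space separator" if category == "Zs" else "other"
        print(f"{code}  {category}  {kind:15}  {name}")
```

With current Unicode data, U+200B and U+FEFF are reported as format characters (category Cf) rather than as space separators (Zs), which matches their absence from the standard's table of space characters.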

Notes on browser support

In general, web browsers and other programs should not be expected to render all space characters according to their definitions or descriptions. If the font used for the text does not contain a glyph for a space character, a missing-glyph symbol may appear instead.

In my original tests, on Windows XP with IE 6, using Times New Roman or Arial as the font, the page showed only the first two (space, no-break space) and the last two (ideographic space, zero-width no-break space) correctly, presenting the other space characters as small rectangles that indicate unrepresentable glyphs or as spaces of incorrect width. When Arial Unicode MS is used, all but the narrow no-break space (which was added to Unicode in version 3.0) are shown correctly. (Newer versions of Arial, but not Arial Unicode MS, contain the narrow no-break space.) Code2000 shows all the space characters correctly, but it has not been installed on most computers.

More modern browsers can usually find a glyph for a character if any font in the system contains it. This does not always happen, however, especially on IE. See Guide to using special characters in HTML.

The use of various space characters of specific width, such as THIN SPACE, is generally risky. Consider using other methods, such as the features of a text processing program or (on Web pages) CSS properties like padding, margin, word-spacing, and letter-spacing.

Width adjustments

In text processing, Web page display, and other contexts, space characters are often “adjustable” in the sense that they are presented in different widths, especially to satisfy justification requirements.

No-break spaces, on the other hand, are defined in Unicode as having the same width as spaces, and this corresponds to common practice. But they are often treated as having fixed width (in each font), which means that in adjusted text, spaces and no-break spaces have different effects.

On web browsers, no-break spaces tended to be non-adjustable, but some modern browsers stretch them on justification. Within justified text on web pages, authors have used no-break spaces instead of normal spaces to prevent stretching (e.g., 5&nbsp;m instead of 5 m). Due to changes in browser behavior, it is better to use fixed-width spaces instead. Among them, the four-per-em space (e.g., 5&#x2005;m) usually best corresponds to the width of a normal unstretched space. However, the fixed-width spaces act as normal spaces in line breaking, so you may wish to use some technique to prevent undesired line breaks (e.g., <nobr>5&#x2005;m</nobr>).
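When such markup is generated by a program, the pattern can be wrapped in a small helper. The following Python sketch is only an illustration: the function name quantity_html is hypothetical, and it uses the nonstandard nobr element simply because that is the technique shown above.

```python
from html import escape

# Character reference for U+2005 FOUR-PER-EM SPACE.
FOUR_PER_EM_SPACE = "&#x2005;"

def quantity_html(value, unit):
    """Join a value and a unit with a fixed-width space and
    wrap the result so that no line break occurs inside it."""
    return f"<nobr>{escape(str(value))}{FOUR_PER_EM_SPACE}{escape(unit)}</nobr>"

print(quantity_html(5, "m"))  # <nobr>5&#x2005;m</nobr>
```

The value and the unit are passed through html.escape so that characters such as < or & in the input cannot break the generated markup.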

The Unicode standard describes the adjustment process and the intended role of specific-width space characters as follows:

The fixed-width space characters (U+2000..U+200A) are derived from conventional (hot lead) typography. Algorithmic kerning and justification in computerized typography do not use these characters. However, where they are used, as, for example, in typesetting mathematical formulae, their width is generally font-specified, and they typically do not expand during justification. The exception is U+2009 THIN SPACE, which sometimes gets adjusted.

The ZERO WIDTH SPACE character nominally has no width, but it too may be expanded during justification.

The EM QUAD character is canonically equivalent to EM SPACE. The intended difference seems to be in the code chart note for the latter: “may scale by the condensation factor of a font”. However, there is no such note for EN SPACE to make it any different from EN QUAD.
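One practical consequence of canonical equivalence is that Unicode normalization replaces the quads with the corresponding spaces. The following Python sketch shows this with the standard unicodedata module; the helper name normalized_name is made up for this example.

```python
import unicodedata

EN_QUAD, EM_QUAD = "\u2000", "\u2001"
EN_SPACE, EM_SPACE = "\u2002", "\u2003"

def normalized_name(ch, form="NFC"):
    """Return the Unicode name of a single character after normalization."""
    return unicodedata.name(unicodedata.normalize(form, ch))

print(normalized_name(EM_QUAD))  # EM SPACE
print(normalized_name(EN_QUAD))  # EN SPACE
```

Text that passes through a normalizing process therefore cannot be relied on to keep EN QUAD or EM QUAD as such.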

Other notes

The MEDIUM MATHEMATICAL SPACE character was added in Unicode version 3.2.

Regarding the non-breaking property of no-break space and other characters, see Unicode line breaking rules: explanations and criticism.

See also Microsoft’s Space Characters Design Standards. It explicitly says: “In digital fonts there are only two kinds of space characters supported by most computers, the space and the no-break space.”

Alan Wood’s excellent Unicode resources contain a page on the General Punctuation block, with widths of space characters illustrated graphically.

See also: Styling spaces in CSS.

Demonstration

This paragraph is here for demonstration purposes only, and it contains normal SPACE characters between words.

This paragraph is here for demonstration purposes only, and it contains SIX-PER-EM SPACE characters instead of normal SPACE characters between words.

Visible spaces

There are some graphic characters that can be used as symbols for a space. Though sometimes called visible spaces, they are not spaces at all but visible notations used to indicate the appearance of spaces in instruction manuals and descriptions of texts.

The following table lists some symbols, in decreasing order of practical usefulness. Their shapes vary by font; the last one in particular varies a lot.

U+2423 OPEN BOX ␣
U+2422 BLANK SYMBOL ␢
U+2420 SYMBOL FOR SPACE ␠
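As an illustration of how such a symbol might be used when describing text, the following Python sketch replaces each normal SPACE with OPEN BOX. The function name show_spaces is made up for this example, and only U+0020 is replaced; other space characters are left as they are.

```python
OPEN_BOX = "\u2423"  # U+2423 OPEN BOX

def show_spaces(text, symbol=OPEN_BOX):
    """Replace each U+0020 SPACE with a visible symbol."""
    return text.replace(" ", symbol)

print(show_spaces("foo bar"))
```

Another symbol from the table can be passed as the second argument, for example show_spaces("foo bar", "\u2422") for BLANK SYMBOL.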