Image captions on Web pages
(HTML and CSS techniques)


This document suggests three ways of presenting an image with a caption in HTML. Styling in CSS is also discussed.

Summary: three methods

Sadly enough, there is no markup for image captions in HTML, unless you count the figcaption element in HTML5 proposals. What comes closest to semantically associating some text content with some image is putting them into a table so that the image is in one cell and the text is either in another cell or in a caption element. Then there’s the “semantically empty” approach, which is better than semantically wrong (such as suggestions to use definition list markup).

There are two basic ways to use a table for an image and its caption, so as a whole, we have three alternative methods:

The two-cell approach

(A picture of a Dalmatian)
A Dalmatian
<table class="image">
<tr><td><img ...></td></tr>
<tr><td class="caption">caption text</td></tr>
</table>

This approach generates by default (i.e. if you don’t use style sheets or additional attributes to affect the rendering) a presentation that is illustrated on the right. The image and the caption text are in two cells of a one-column table. The markup above assigns a class, caption, for the caption text cell, but it’s there just to make styling easier. The same applies to class image assigned to the table. There is nothing magic in class names in HTML and CSS; they are just names chosen by an author as he finds convenient and hopefully descriptive to anyone who reads the code.

By default, text in a table cell (td element) is left-aligned, but you can change this by using e.g. align="center" in the td tag, or in CSS, e.g. td.caption { text-align: center; }.

A table is normally left-aligned by default and appears with no other content on either side of it. You can affect this using an align attribute in the table tag or, more flexibly, using CSS. It might be a good idea to set just some left margin for the table, using e.g. the CSS code table.image { margin-left: 2em }.
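
Putting the pieces of this approach together, a minimal sketch might look like the following. The file name dalmatian.gif and the alt text are placeholders invented for illustration; the class names are the ones used above. The style element would normally go in the head of the page.

```html
<style>
  /* Indent the table a little instead of leaving it flush left */
  table.image { margin-left: 2em; }
  /* Center the caption text within its cell */
  td.caption { text-align: center; }
</style>

<table class="image">
  <!-- dalmatian.gif is a hypothetical file name -->
  <tr><td><img src="dalmatian.gif" alt="(A picture of a Dalmatian)"></td></tr>
  <tr><td class="caption">A Dalmatian</td></tr>
</table>
```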

A single-cell table with caption element

A Dalmatian
(A picture of a Dalmatian)
<table class="image">
<caption align="bottom">caption text</caption>
<tr><td><img ...></td></tr>
</table>

This approach is similar to the first one, but instead of putting the caption text into a cell, you put it inside a caption element. It is by definition a caption for the entire table, but in this case, the table has but one cell, containing the image.

By default, the caption would appear above the image, but the attribute align="bottom" puts it below the image. You could do the same in CSS using table.image caption { caption-side: bottom; }, but this is poorly supported: no support in Internet Explorer.

If you wish to affect the horizontal alignment in a caption element, use the text-align property in CSS. For example, <caption align="bottom" style="text-align: left">.
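
As a sketch of how the HTML attribute and the CSS properties could be combined, the following keeps align="bottom" as a fallback and repeats the same intent in CSS. The file name dalmatian.gif is a made-up placeholder, and whether you want both mechanisms at once is a judgment call rather than a requirement.

```html
<style>
  /* CSS counterpart of align="bottom"; as noted above, not supported by all browsers */
  table.image caption { caption-side: bottom; }
  /* Horizontal alignment of the caption text */
  table.image caption { text-align: left; }
</style>

<table class="image">
  <caption align="bottom">A Dalmatian</caption>
  <!-- dalmatian.gif is a hypothetical file name -->
  <tr><td><img src="dalmatian.gif" alt="(A picture of a Dalmatian)"></td></tr>
</table>
```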

Using div elements

(A picture of a Dalmatian)
A Dalmatian
<div class="image">
<img ...>
<div>caption text</div>
</div>

This is the simplest method, using just div markup. The inner div element is used for two reasons: to make the caption text appear on a line of its own, and to make it an element, so that it can be referred to in CSS (using a selector like div.image div).

It might be argued that it is even simpler to omit the inner div markup and use just <br> to create a line break between the image and the caption. Even the outer div markup could be omitted on similar grounds. However, the markup presented here is the simplest reasonable alternative. The use of div makes it possible to treat the caption text and the combination of the image and caption as styleable elements.

A div element has no top or bottom margin by default. You can change this in CSS. For example, div.image { margin: 1em 0; } would set a top and bottom margin of 1em. On the other hand, the construct is often preceded by an element that has a bottom margin, or followed by an element that has a top margin, such as a paragraph or a heading, so it does not need margins of its own.

The caption text is left-aligned by default. This can be changed in different ways, but note that if you use align="center" for the inner div, the text will be horizontally centered within the available space, not with respect to the image.
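
For illustration, the complete construct with optional vertical margins might be sketched as follows. The file name dalmatian.gif and the 90% font size are arbitrary placeholders, there only to show that the inner div can be reached with the selector div.image div.

```html
<style>
  /* Optional vertical space around the image-plus-caption block */
  div.image { margin: 1em 0; }
  /* The inner div is the caption; the value here is just an example */
  div.image div { font-size: 90%; }
</style>

<div class="image">
  <!-- dalmatian.gif is a hypothetical file name -->
  <img src="dalmatian.gif" alt="(A picture of a Dalmatian)">
  <div>A Dalmatian</div>
</div>
```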

Notes on styling

These three approaches give a tolerable rendering in non-CSS situations (showing the caption under the image), and they are each a relatively good starting point for styling. When using a table, you need to consider cell spacing and cell padding, which are by default nonzero. But there wouldn’t be strange browser idiosyncrasies to worry about. The rest really depends on the desired appearance as well as the properties of the image and the text.

The font in caption texts

(A picture of a Dalmatian)
A Dalmatian

Typically we’d probably want to set caption text size to a bit smaller than copy text, and maybe the font face to something different too, and we might wish to center the text (though this may depend on its length). In the first approach you could use the following:

.image .caption { font-size: 80%;
                  font-family: Verdana, Arial, sans-serif;
                  text-align: center; }

In the two other approaches, you would replace .image .caption by .image caption or .image div, respectively.
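
If a site happens to mix the three kinds of markup, the same declarations could be shared by grouping the selectors. This is only a sketch; that one rule should cover all three cases is an assumption about the page, not something the approaches require.

```html
<style>
  /* First approach: caption in a table cell with class="caption" */
  .image .caption,
  /* Second approach: caption element inside the table */
  .image caption,
  /* Third approach: inner div inside the outer div */
  .image div {
    font-size: 80%;
    font-family: Verdana, Arial, sans-serif;
    text-align: center;
  }
</style>
```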

Wrapping long captions

For long caption texts, you need to decide whether they should wrap according to the width of the image or be set to some other width. It’s probably best to make the width the same as that of the image or (for narrow images) just a little wider.

By default, browsers handle the second approach (using a table and a caption element) so that the text is wrapped to the same width as the image. This is because they determine the width of the table according to the cell containing the image. If you wish to make sure of this, you could explicitly set the table width to the same as the image width. In the first approach, you would need to be explicit about the table width, either in CSS or in HTML.

A Dalmatian dog. Drawing by Liisa Sarakontu.
(A picture of a Dalmatian)

In the above example, the caption element has grey background to illustrate that it extends a bit to the left and to the right of the image width. This is usually not serious when the text there is centered. The phenomenon is caused by default cell padding and cell spacing that browsers apply when rendering a table. If it becomes a problem, you can fix it in HTML by setting cellspacing="0" cellpadding="0" in the table element or in CSS by setting table { border-collapse: collapse; } td { padding: 0; }.
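
A sketch of the CSS fix, limited here to tables with class image so that other tables on the page keep their default spacing (that restriction is a choice made for this example). The file name dalmatian.gif is hypothetical, and the grey background is there only to make the caption box visible.

```html
<style>
  /* With collapsed borders, no cell spacing is applied */
  table.image { border-collapse: collapse; }
  /* Remove the default padding inside the cell */
  table.image td { padding: 0; }
  /* Grey background just to show how wide the caption box is */
  table.image caption { background: #ccc; text-align: center; }
</style>

<table class="image">
  <caption align="bottom">A Dalmatian dog. Drawing by Liisa Sarakontu.</caption>
  <!-- dalmatian.gif is a hypothetical file name -->
  <tr><td><img src="dalmatian.gif" alt="(A picture of a Dalmatian)"></td></tr>
</table>
```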

In the third approach, the caption text by default uses the available width. The reason is that the width of a div element by default extends across the available width.

(A picture of a Dalmatian)
A Dalmatian dog. Drawing by Liisa Sarakontu.

You could change the appearance by explicitly setting the width of the outer div element, e.g. <div class="image" style="width:200px">. Using a style attribute is a practical choice here, since the width needs to depend on the specific image that appears inside the element.
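
A sketch of this, assuming an image that is 200 pixels wide (the file name and the dimensions are invented for the example). With the outer div limited to the image width, centering the caption text also centers it with respect to the image.

```html
<style>
  /* Centered within the outer div, which is as wide as the image */
  .image div { text-align: center; }
</style>

<!-- The 200px width is assumed to match the width of this particular image -->
<div class="image" style="width: 200px">
  <img src="dalmatian.gif" alt="(A picture of a Dalmatian)" width="200" height="150">
  <div>A Dalmatian dog. Drawing by Liisa Sarakontu.</div>
</div>
```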

Of course, in many cases you could meaningfully use explicit line breaks (with <br>) markup inside the caption text, especially if the text has fairly separate parts. For example, you could write <div>A Dalmatian dog.<br><small>Drawing by Liisa Sarakontu.</small></div>.

(A picture of a Dalmatian)
A Dalmatian dog.
Drawing by Liisa Sarakontu.


As described above, the caption text can be centered relative to the image by setting a width for the text and using align="center" (HTML) or text-align: center (CSS) for it.

On the other hand, if you wish to center the image and its caption as a whole horizontally, then you can simply use align="center" in the table tag, if you are using one of the table approaches. In the div approach, you would use CSS. You could use CSS in the table approach too, of course. Note that centering tables and other blocks is surprisingly problematic. Many constructs that might be expected to center a block will actually center each line instead, depending on browser. Please refer to the excellent treatises by Nick Theodorakis: Centering tables and Centering blocks with CSS.

The following example shows an image as centered so that the caption under it is left-aligned to the left edge of the image. A simple way to achieve this is to use the two-cell table approach, with align="center" for the table element and with the alignment of cells (td) defaulted to align="left".
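
The markup for such a presentation might be sketched as follows; the file name dalmatian.gif is a placeholder. The usual CSS counterpart of align="center" for a table is setting both horizontal margins to auto, shown here in a comment only.

```html
<!-- align="center" centers the table as a whole;
     the cells keep their default left alignment.
     CSS alternative: table.image { margin-left: auto; margin-right: auto; } -->
<table class="image" align="center">
  <!-- dalmatian.gif is a hypothetical file name -->
  <tr><td><img src="dalmatian.gif" alt="(A picture of a Dalmatian)"></td></tr>
  <tr><td class="caption">A Dalmatian dog.<br>
      <small>Drawing by Liisa Sarakontu.</small></td></tr>
</table>
```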

(A picture of a Dalmatian)
A Dalmatian dog.
Drawing by Liisa Sarakontu.

Floating the image and the caption

(A picture of a Dalmatian) Using the align attribute in an img element, you can float an image so that it appears on the right or on the left of some text, with the text flowing on the other side of the image. You can use a more modern approach as well, the float property in CSS. It’s more logically named as well, since this is really not about alignment but about floating. Moreover, you should usually set some left margin for an image floated on the right (and right margin for an image floated on the left), and CSS is the only way to do this reasonably. Thus, a simple way to float an image would be to use the attribute style="float: right; margin-left: 0.5em" in an img tag.

(A picture of a Dalmatian)
A Dalmatian

It is almost as easy to do the same when the image has a caption. Actually, such techniques were already used previously on this page. In the table-based approaches, you can just use align="right" in the table, or float: right for it in CSS. In the third approach, it is clearly best to use the CSS method, since there is no direct way to float a div in HTML. Here, too, CSS is the way to set a margin so that text does not come too close to the image.

To end floating, you can either use <br clear="all"> in HTML or clear: both in CSS (for the first element that should appear with no floating elements on either side).
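
A sketch for the div approach follows. The class name after-float, the file name, and the 200px width (assumed to match the image) are all invented for this example.

```html
<style>
  /* Float the whole image-plus-caption block to the right,
     with a gap between it and the text on its left */
  div.image { float: right; margin-left: 0.5em; width: 200px; }
  /* Applied to the first element that should start below the float */
  .after-float { clear: both; }
</style>

<div class="image">
  <!-- dalmatian.gif is a hypothetical file name -->
  <img src="dalmatian.gif" alt="(A picture of a Dalmatian)" width="200" height="150">
  <div>A Dalmatian</div>
</div>
<p>Text that flows on the left side of the floated block.</p>
<p class="after-float">Text that starts below the floated block.</p>
```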

Fluid galleries

If you have a set of images and you would like to present them as a collection on one page so that there are several images side by side, there are several approaches.

A common approach is to use a table, with images in one row, captions in another, then more images in a third row, etc. This approach does not linearize well, since when processed rowwise, the connection between images and captions is lost. But more importantly, it requires a fixed layout, with a fixed number of images in one row. This means that the page requires a minimum width to be viewed without horizontal scrolling, and on the other hand it does not utilize the full available width in a wide window.

The goal here is to make an image gallery adapt to the available width. For simplicity, let’s assume that the images are of equal size.

In the simplest case, you could just write img elements in succession. A browser will then present the images so that it puts as many images side by side as fit in the available width. In effect, a browser treats img elements as big letters and processes a string of images as if it were text consisting of such letters. The following string of identical images illustrates this.

(A row of eight identical ornament images)

I use a space between the img elements in HTML source. This tends to cause some spacing between the images on common browsers. Whether this is correct is debatable. In any case, if you don’t want any spacing, don’t leave those spaces or line breaks between img elements. Instead, you can put line breaks e.g. after the element name img before the attributes, where they cause no effect. And if exact spacing is important, do the same and use CSS properties to suggest specific margin or padding.
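
As a sketch of the last point, the line breaks below are placed inside the tags, so there is no white space between the img elements themselves. The file name ornament.gif is a placeholder.

```html
<!-- The only line breaks are between the element name and the attributes,
     where they have no effect on rendering -->
<img
  src="ornament.gif" alt="ornament"><img
  src="ornament.gif" alt="ornament"><img
  src="ornament.gif" alt="ornament">
```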

If we wish to put captions under the images, things become more complicated, but not much. We can float the elements that contain an image and its caption. We would use the methods discussed above, except that we float to the left, using float: left in CSS or align="left" in HTML for a table.

We probably want to have some spacing in the gallery. A simple way is to put some margin on the right and below each image. For this, we can wrap the elements inside a div element with some class, say class="gallery", and use CSS code like the following:

.gallery table { float: left;
                 margin: 0 5px 20px 0; }

This leaves a 5 pixel space on the right of each image and a 20 pixel space below each image.

Remember to stop floating after the gallery, using the techniques mentioned above.
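
Taken together, a small gallery using the two-cell table approach might be sketched like this. The file names, alt texts, captions, and the class name gallery-end are made up for the example.

```html
<style>
  /* Float each image-plus-caption table; leave space on the right and below */
  .gallery table { float: left; margin: 0 5px 20px 0; }
  /* Stop floating for whatever follows the gallery */
  .gallery-end { clear: both; }
</style>

<div class="gallery">
  <table class="image">
    <tr><td><img src="photo1.jpg" alt="(Photo 1)"></td></tr>
    <tr><td class="caption">First caption</td></tr>
  </table>
  <table class="image">
    <tr><td><img src="photo2.jpg" alt="(Photo 2)"></td></tr>
    <tr><td class="caption">Second caption</td></tr>
  </table>
  <table class="image">
    <tr><td><img src="photo3.jpg" alt="(Photo 3)"></td></tr>
    <tr><td class="caption">Third caption</td></tr>
  </table>
</div>
<p class="gallery-end">Text after the gallery.</p>
```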

If the caption texts vary essentially in length, you need to consider how to make their boxes equal in size in rendering. This usually requires you to guess a reasonable height for the boxes. Moreover, to make the texts vertically aligned to the top (that is, the bottom of the image), it is simplest to use the two-cell table approach. In that case, you can simply use valign="top" (in HTML) or vertical-align: top (in CSS) for the cells. In the next example, the height of caption cells has been set to 4em.
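
The rules involved might look roughly as follows. The 150px table width is an assumption (it is meant to equal the common image width so that long captions wrap), and the file names and captions are placeholders.

```html
<style>
  .gallery table { float: left; margin: 0 5px 20px 0; width: 150px; }
  /* Same height for every caption cell; text starts right below the image */
  .gallery td.caption { height: 4em; vertical-align: top; }
</style>

<div class="gallery">
  <table class="image">
    <tr><td><img src="photo1.jpg" alt="(Photo 1)" width="150" height="100"></td></tr>
    <tr><td class="caption">A short caption</td></tr>
  </table>
  <table class="image">
    <tr><td><img src="photo2.jpg" alt="(Photo 2)" width="150" height="100"></td></tr>
    <tr><td class="caption">A considerably longer caption that wraps onto several lines</td></tr>
  </table>
</div>
```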

Captions and accessibility

A caption should not be confused with an alt attribute, which specifies the textual alternative to be presented in place of the image, when the image itself is not presented (e.g., on a text-only browser). Neither of these should be confused with the title attribute, which specifies an “advisory title” for an element, typically implemented as a tooltip that is displayed when the pointer is moved over the element.

If an image is purely decorative or just visualizes something that has been said in the text, it is appropriate to use an empty alternate text, alt="". In that case, when accessing the page without images, the page would appear as if the image were not there at all. This however creates problems if the image has a caption. The caption text would appear on its own, leaving the user in confusion: what does this relate to? Thus, in such cases, it might be suitable to include the caption text into the image itself, using image processing software.

Normally, on the other hand, if an image has a caption, it is probably a content image and the caption text just describes what the image is about, instead of conveying its full message. Then the odds are that it would be better to have the caption read first, giving those users who have some way of accessing images (maybe the user is just surfing with images disabled?) a basis for deciding whether to try to access this particular image. The easiest way to achieve this (and still make the caption appear below the image in visual rendering) is to use the method of a single-cell table and a caption element with align="bottom".

Unfortunately, there’s no way to suppress a caption in non-visual rendering except by making the caption part of the image. For example, if your page contains some article that tells about some meeting and is illustrated by a photo of the meeting, with a caption, then both the photo and the caption should probably be omitted in non-visual rendering. In that case it’s probably the least of evils to use a short alt text like "(photo of the meeting)". Putting the caption into the image itself might not be practical enough, and besides, it might be relevant to the user to know that an image is available even if they cannot (for now) see the image.

For some additional notes, see section When an image says more than a thousand words in Guidelines on alt texts in img elements.

Why not dl markup?

For some odd reason, the suggestion to use dl (Definition List) markup pops up fairly often. Logically, it makes no sense; such markup should be reserved for genuine definitions of terms, as discussed in Definition: a definition and an analysis. Presentationally, it creates a rendering that is rather poor, as shown below. Although it might be possible to tune the rendering using CSS, this would be more difficult and less reliable than styling simple div elements.

The default rendering of
<dl> <dt> <img ...> </dt> <dd>caption text</dd> </dl>
on your current browser is the following:

(A picture of a Dalmatian)
caption text

The reason why browsers render the construct that way has nothing to do with images or captions. They render a dl element so that the dt elements are indented somewhat and the dd elements are indented even more, and each of those elements starts on a new line:

    term
        a word or expression that has a precise meaning in some uses or is peculiar to a science, art, profession, or subject
    terminology
        terms of a particular subject area; (study of) proper ways of creating and using terms

So this is why the caption text gets indented relative to the image. Such indentation is generally not suitable, since normally captions should be either left-aligned or centered with respect to the image. But if desired, the indentation can be achieved very simply, and with a controlled amount of indentation, in the approaches described above, e.g. by setting a left margin for the caption.

If a speech-based browser implemented a dl element according to its defined semantics (ignoring any examples in the specification that contradict that), it would be natural to read <dl><dt>xxx</dt><dd>yyy</dd></dl> as follows: “Definition list. Term: xxx. Definition data: yyy. End of definition list.” Current browsers probably don’t do that, but would you really like to fear that some browsers start behaving by the specs? (Maybe there is no fear, because the HTML5 drafts effectively turn dl to a list of paired items with no real semantics.)

Using a definition list with a single dt element and a single dd element inside would be semantically odd. A list can have just one element, though it’s a rather pathetic list and makes sense in special case only. But this is not the main point. The point is that neither an image nor its caption is a term being defined. Well, except in a very special example like the following, which illustrates the absurdity of using dl markup for normal combinations of an image and its caption:

<dl><dt><img alt="mass" src="mass.gif"></dt>
    <dd>a fundamental property of matter</dd></dl>
Here mass.gif would refer to an image that consists of the word “mass” in some appearance.

The dl element is in practice just a visual layout trick, and a coarse and unreliable trick at that. Quite often the layout would not even be suitable but needs tedious styling. Besides, the dl is more difficult to style than most elements, since its default rendering is complicated and hard to describe, and there are quirks in CSS implementations that make the styling even harder.

The HTML5 figure and figcaption markup

According to HTML5 drafts, figure markup can be used as a container for an illustration (such as one or more images), with a figcaption element inside it giving a caption for the image or images. This means markup like the following:

<figure>
  <img src="..." alt="...">
  <figcaption>caption text</figcaption>
</figure>

This would solve the problem of semantic association between captions and images, if supported by relevant software. It remains to be seen whether search engines will recognize such associations.

For minimally acceptable rendering, you currently need at least the following (the script is needed for IE up to and including IE 8 to make it recognize the markup at all):

figure, figcaption { display: block; }
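
The script mentioned above is not shown in this text. A commonly used technique, given here as an assumption rather than as the original code, is to call document.createElement once for each new element name in the head of the page, before the elements are used. This makes IE 8 and older recognize the elements and apply CSS to them.

```html
<script>
  // Assumed workaround for IE up to and including IE 8: creating one element
  // of each unknown type makes the browser recognize that type in the markup.
  document.createElement('figure');
  document.createElement('figcaption');
</script>
<style>
  figure, figcaption { display: block; }
</style>
```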

You should probably also add some top and bottom margin for figure and also some left margin. HTML5 drafts suggest a left margin of 40px, but this is currently not what browsers usually do. So you should explicitly specify the left margin you want.

In order to have the caption rendered e.g. below the image in a box that is as wide as the image, it is probably best to use a small script on the page. The script can traverse the figure elements on the page and set the width of such an element equal to the img element contained in it, if there is just one img element there. Similar techniques can, of course, be also applied when some other markup is used for image captions.
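
Such a script might be sketched as follows. It is an illustration, not a tested solution for all browsers: it waits for the load event so that image widths are known, and it leaves alone any figure element that contains no img element or more than one. The file name and image dimensions are placeholders.

```html
<figure>
  <!-- dalmatian.gif is a hypothetical file name -->
  <img src="dalmatian.gif" alt="(A picture of a Dalmatian)" width="200" height="150">
  <figcaption>This is caption text for the image, to be rendered inside
    a rectangle as wide as the image.</figcaption>
</figure>

<script>
  // After the page and its images have loaded, set the width of each
  // figure that contains exactly one img to the rendered width of that img.
  window.onload = function () {
    var figures = document.getElementsByTagName('figure');
    for (var i = 0; i < figures.length; i++) {
      var images = figures[i].getElementsByTagName('img');
      if (images.length === 1) {
        figures[i].style.width = images[0].offsetWidth + 'px';
      }
    }
  };
</script>
```

Assigning to window.onload replaces any handler set earlier in the same way, so on a page with other scripts you would need to combine the handlers.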

(A picture of a Dalmatian)
This is caption text for the image, to be rendered inside a rectangle as wide as the image.

For comparison

For a different view on image captions, see CSS: figures & captions by Bert Bos. I don’t see any reason to use paragraph (p) markup in a simple structure consisting of an image and its caption. But if you use it, note that paragraphs typically have default rendering that involves top and bottom margins, though they might be suppressed if the paragraph is inside a table cell.

See also Scalable Figures and Captions with CSS and HTML by Robert J. O’Hara. It discusses, among other things, the distinction between a legend (extended prose) and a caption (a descriptive word or phrase only). Both are treated as captions in my document, but it is useful to note that there can be different “captions” that should be styled differently.

Technically, it is possible to include a caption into the image itself using a suitable graphics program. Although that’s a simple approach and although many programs generate such images automatically, it has essential drawbacks. Text that has been “burned” into an image is not directly accessible as text to programs, and its font cannot be changed the same way as normal text font can. If you need to change the text, you need to manipulate the image instead of simple text editing. Thus, if you wish to use the image in documents in different languages, things get awkward. Moreover, e.g. Google image search is based on searching for images using keywords, and Google associates words with images by their appearance near to each other (and in some other ways). A caption text embedded into an image itself is of course not accessible to Google, but if the caption text appears as real text right after the image, Google may find the image when someone searches with words that appear in the caption.

Yet another approach is to wrap an image and its caption in a container and declare it as an inline block, using display: inline-block as described in the CSS 2.1 draft. This approach would have some rather nice features especially in fluid galleries, but unfortunately browser support is still too small. In particular, IE has some bugs (e.g., the default width is 100% if the container is block-level in the HTML sense) and Firefox 2 lacks support. In some years, though, this might become a feasible alternative.
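
For browsers that do support it, the inline-block variant of a fluid gallery might be sketched like this. The file names, captions, and the 150px width (assumed to equal the image width) are invented for the example.

```html
<style>
  /* Each image-plus-caption box flows like a big letter and wraps with the line */
  .gallery .image {
    display: inline-block;
    vertical-align: top;   /* align the boxes by their top edges */
    width: 150px;
    margin: 0 5px 20px 0;
  }
</style>

<div class="gallery">
  <div class="image">
    <img src="photo1.jpg" alt="(Photo 1)" width="150" height="100">
    <div>First caption</div>
  </div>
  <div class="image">
    <img src="photo2.jpg" alt="(Photo 2)" width="150" height="100">
    <div>Second caption, somewhat longer than the first</div>
  </div>
</div>
```

Unlike the float-based gallery, this needs no clearing after the gallery, since the boxes take part in normal line layout.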