28 Sep 03

Abbreviations, Acronyms, Initialisms

For some time now, HTML 4 and XHTML 1 have allowed authors to mark up abbreviations and acronyms using the <abbr> and <acronym> elements.

More recently, the practical task of converting abbreviations and acronyms to semantically correct markup has been considerably eased by the introduction of Acrobot.

Despite this, most web developers (or at least the minority of them who even know or care that these elements exist) use <acronym> to mark up both acronyms and abbreviations.

They do this mainly because IE, the world's dominant browser, doesn't support <abbr>.

However, ignoring <abbr> in favor of <acronym> is wrong for two reasons: first, there is a difference between the two elements; second, that difference is an important one.

The difference

Let's look at some definitions gathered via Dictionary.com:

Abbreviation
The form to which a word or phrase is reduced by contraction and omission; a letter or letters, standing for a word or phrase of which they are a part; as, Gen. for Genesis; U.S.A. for United States of America.
Acronym
A word formed from the initial letters of a name, such as WAC for Women's Army Corps, or by combining initial letters or parts of a series of words, such as radar for radio detecting and ranging.
Initialism
An abbreviation consisting of the first letter or letters of words in a phrase (for example, IRS for Internal Revenue Service), syllables or components of a word (TNT for trinitrotoluene), or a combination of words and syllables (ESP for extrasensory perception) and pronounced by spelling out the letters one by one rather than as a solid word.

While these definitions should leave little room for confusion, they are not totally unambiguous.

The following statements may add some clarity:

All acronyms are abbreviations, but not all abbreviations are acronyms.

Some abbreviations are initialisms.

In other words, acronyms and initialisms are both subsets of abbreviations.

To further clarify things, it can be helpful to consider the differences in pronunciation. An abbreviation that is formed from the first letter or letters of words in a phrase and pronounced letter by letter is an initialism. An abbreviation that is formed from the first letter or letters of words in a phrase and pronounced as a word is an acronym. Everything else is just an abbreviation. See Table 1 for a summary of this classification.

Table 1: Abbreviations, acronyms, and initialisms.

Type | Formed | Pronounced | Examples
Abbreviation | By contraction and omission of any given number of letters of a word or phrase | As a word or as the full phrase | etc, inc, info, Mac
Acronym | From the first letter or letters of words in a phrase | As a word | NATO, UNESCO, Benelux, radar
Initialism | From the first letter or letters of words in a phrase | Letter by letter | HTML, IMF, TV, UN
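The pronunciation-based rule of thumb above can be written as a small decision function. This is a minimal Python sketch, not part of any tool mentioned in this article; the function name classify and its two boolean inputs are hypothetical, and they stand for judgments the author still has to make by hand.

```python
def classify(from_initials: bool, pronounced_as_word: bool) -> str:
    """Classify an abbreviated form as in Table 1 (hypothetical helper).

    from_initials: formed from the first letter(s) of words in a phrase?
    pronounced_as_word: spoken as a word rather than letter by letter?
    """
    if not from_initials:
        # Formed by contraction and omission (etc, info, Mac).
        return "abbreviation"
    if pronounced_as_word:
        # NATO, UNESCO, radar
        return "acronym"
    # HTML, IMF, TV, UN: spelled out letter by letter.
    return "initialism"


for word, initials, as_word in [("NATO", True, True), ("HTML", True, False), ("info", False, True)]:
    print(word, classify(initials, as_word))
```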

While there are additional categories, such as letter acronyms, syllable acronyms and hybrid acronyms, the three-way distinction between abbreviation, acronym and initialism is enough for the purposes of marking up hypertext.

Why it matters

Using the correct markup is important for semantics and accessibility reasons. Most importantly, the only way to ensure full accessibility for blind and visually impaired users is to have markup that distinguishes between abbreviations, acronyms and initialisms.

This relates to the previously mentioned differences in pronunciation. As noted by the W3C:

...abbreviations and acronyms often have idiosyncratic pronunciations. For example, while "IRS" and "BBC" are typically pronounced letter by letter, "NATO" and "UNESCO" are pronounced phonetically. Still other abbreviated forms (e.g., "URI" and "SQL") are spelled out by some people and pronounced as words by other people. When necessary, authors should use style sheets to specify the pronunciation of an abbreviated form.

HTML 4.01 Specification: Structured text

To counter this argument, some authors point to the differences in interpretation and the lack of standards support in screen readers and visual browsers. This is true. However, as is often the case when arguing for semantics and accessibility features, it is important to consider not just what is, but what will be. While the current situation may seem hopeless, it is relatively safe to assume that future versions of screen readers and visual browsers will have better standards support.

Suffice it to say that if accessibility and semantics matter, there is little room for compromise.

XHTML Examples and Comments

Initialism, first occurrence on a page:

<abbr title="HyperText Markup Language">HTML</abbr>

Initialism, subsequent occurrence on a page:

<abbr>HTML</abbr>

Acronym, first occurrence on a page:

<acronym title="Graphics Interchange Format">GIF</acronym>

Acronym, subsequent occurrence on a page:

<acronym>GIF</acronym>

Note that initialisms are treated as the "default" type of abbreviation, while abbreviations that are neither acronyms nor initialisms are not marked up at all.

It would perhaps seem more logical to apply a class "initialism" to initialisms, as suggested by Meadowcroft. However, the approach suggested here is preferable because initialisms (e.g. HTML) are normally used more frequently than "truncated" abbreviations (e.g. info). If a class is needed, applying it to the less frequently used type will reduce file size (and labor for hand coders).
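The conventions above can be summarized for a single occurrence of a term. The Python sketch below is illustrative only: mark_up_term and its kind labels ('initialism', 'acronym', 'other') are hypothetical names, and 'other' stands for the abbreviations that are neither acronyms nor initialisms, which are left unmarked.

```python
from html import escape


def mark_up_term(term: str, kind: str, expansion: str, first: bool) -> str:
    """Return markup for one occurrence of an abbreviated form.

    kind is a hypothetical label: 'initialism', 'acronym' or 'other'.
    """
    if kind == "other":
        # Neither acronym nor initialism: no markup at all.
        return escape(term)
    # Initialisms are the "default": a plain <abbr> with no class.
    # Anything not explicitly an acronym also falls back to <abbr> ("when in doubt").
    tag = "acronym" if kind == "acronym" else "abbr"
    # The expansion goes in a title attribute on the first occurrence only.
    title = f' title="{escape(expansion)}"' if first else ""
    return f"<{tag}{title}>{escape(term)}</{tag}>"


print(mark_up_term("HTML", "initialism", "HyperText Markup Language", True))
print(mark_up_term("GIF", "acronym", "Graphics Interchange Format", False))
print(mark_up_term("info", "other", "information", True))
```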

To understand why abbreviations that are neither acronyms nor initialisms are not marked up, we need to look at these in more detail. There are basically three commonly used types of such abbreviations:

  • Type I - Abbreviations that function as words by themselves or that have become actual words. Examples: info (information), Mac (Macintosh), fan (fanatic).
  • Type II - Abbreviations of phrases that are written in abbreviated form from one language but spoken in another. Examples: "i.e.", which is abbreviated from the Latin phrase id est but read as "that is" in English, or "e.g.", short for the Latin phrase exempli gratia but read as "for example".
  • Type III - Abbreviations of phrases that are abbreviated in print but spoken in full. Example: Mass. (read as Massachusetts).

Now let's look at the reasons for not marking these up, with a set of relevant questions:

Does the reader need more information?

For most Type I abbreviations, the answer is no. They are understandable to visual and screen readers without their full word or phrase. For this reason, no markup is needed.

Type II abbreviations are distinctly different. They do not work as standalone words, and their meaning cannot be logically inferred from their abbreviation. Here, the reader clearly needs more information. The next question then is:

Can this information be provided in other ways?

It depends. For visual readers the answer is no: no markup, no information. However, this is one case where screen readers may be at an advantage: screen reader software could, in theory at least, supply the missing information simply by storing a list of the most common abbreviations of this kind. But visual readers would still be stuck. On the other hand, most people do know what most of these abbreviations stand for. Ultimately, the easiest path wins here: no markup.

Type III is similar to Type II, but, arguably, somewhat trickier to recognize for both visual and screen readers. To guide the developer's work, a third question needs to be asked:

What are the trade-offs?

If it is absolutely necessary that visual and screen readers recognize all type III abbreviations, then they need to be marked up. For the same reason, the title should always be included. Screen readers, in particular, would have to recognize type III abbreviations to be able to read out their full title, so a class would also need to be applied as follows:

Abbreviation (type III), all occurrences on a page:

<abbr class="trunc" title="et cetera">etc</abbr>

However, as the next section will detail, the required CSS declaration has very limited browser support. Another factor to weigh is that abbreviations of this type seem to be rare. So the benefits of marking up type III abbreviations seem negligible; in other words, we leave out the markup because the trade-offs are acceptable.

As for the difference between first and subsequent instances, some authors recommend that all instances of an abbreviation or acronym be marked up with the title attribute, while others prefer not to mark up subsequent instances at all.

The former seems unnecessary, since the full phrase is normally only needed the first time an abbreviation is encountered; this is also how abbreviations are treated in print. The latter takes away semantic meaning and, by extension, aural information, which is contrary to the idea of marking up abbreviations in the first place.
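Applied to a whole page, the first/subsequent rule only requires remembering which terms have already been expanded. The following Python sketch assumes a hand-maintained glossary (a hypothetical dict mapping each term to its element name and expansion), terms that begin and end with word characters, and input text that needs no further escaping. It is roughly the kind of job a tool such as Acrobot automates, but it makes no claim about how Acrobot itself works.

```python
import re
from html import escape


def mark_up_text(text: str, glossary: dict[str, tuple[str, str]]) -> str:
    """Wrap each glossary term in text; only the first occurrence gets a title.

    glossary maps a term to (element, expansion), for example
    {"HTML": ("abbr", "HyperText Markup Language")}.
    """
    if not glossary:
        return text
    seen: set[str] = set()
    # Longer terms first, so the longer of two overlapping alternatives wins.
    alternatives = sorted(glossary, key=len, reverse=True)
    # \b boundaries assume terms start and end with word characters.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")

    def wrap(match: re.Match) -> str:
        term = match.group(1)
        element, expansion = glossary[term]
        # Later occurrences keep the element (semantics, aural styling) but drop the title.
        title = "" if term in seen else f' title="{escape(expansion)}"'
        seen.add(term)
        return f"<{element}{title}>{term}</{element}>"

    return pattern.sub(wrap, text)


glossary = {
    "HTML": ("abbr", "HyperText Markup Language"),
    "GIF": ("acronym", "Graphics Interchange Format"),
}
print(mark_up_text("HTML pages may embed GIF images. HTML is text.", glossary))
```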

However, because the method suggested here will render first and subsequent abbreviations in the same way, visual readers who place the cursor over the abbreviation may be confused when they don't see the full phrase displayed as a tooltip. This could be especially true for readers who browse a text rather than read it from start to end. The next section details a fix for browsers with better web standards support than IE currently offers.

CSS Examples and Comments

Provided that you use the markup suggested in the previous section, and that you have made a sensible browser choice, the following CSS will help distinguish between first and subsequent instances of an abbreviation on the screen:

abbr, acronym {
   border: none;
} 

abbr[title], acronym[title] {
   border-bottom: 1px dotted #aaa; 
   cursor: help;
}

The first rule applies to all <abbr> and <acronym> elements. Because the default visual rendering is a dotted bottom border, that border is removed.

The second rule applies only to those <abbr> and <acronym> elements that have a title attribute. These are "restyled" to mimic the default rendering. A dotted bottom border is chosen here because it has become the de facto standard way to indicate abbreviations.

Finally, the default cursor is changed to "help", usually rendered as a question mark, so as to provide further indication that more information is available.

Looking at aural styling, the following declarations should do the trick for most screen readers:

abbr {
   speak: spell-out;
}

acronym {
   speak: normal;
}
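To make the difference between the two aural rules concrete, here is a deliberately crude Python model of what spell-out and normal ask a speech renderer to do. It is an assumption-laden toy (the names spoken_form and SPEAK_FOR are hypothetical) and says nothing about how any particular screen reader actually behaves.

```python
# Hypothetical mapping mirroring the two aural rules above.
SPEAK_FOR = {"abbr": "spell-out", "acronym": "normal"}


def spoken_form(text: str, speak: str) -> str:
    """Crude model of the CSS2 'speak' property applied to a single term."""
    if speak == "spell-out":
        # Read one letter at a time: "HTML" becomes "H T M L".
        return " ".join(ch for ch in text if not ch.isspace())
    if speak == "normal":
        # Left to the speech engine's usual rules, i.e. read as a word.
        return text
    if speak == "none":
        # Not spoken at all.
        return ""
    raise ValueError(f"unsupported speak value: {speak!r}")


for element, term in [("abbr", "HTML"), ("acronym", "NATO")]:
    print(element, "->", spoken_form(term, SPEAK_FOR[element]))
```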

The previous section advised against marking up "type III" abbreviations (such as Mass. for Massachusetts). For those who still wish to do so, the following CSS should do the trick, provided that the class name is "trunc" (thanks to Craig Saila for this):

@media aural { 
   abbr[class="trunc"] { content: attr(title); } 
 } 

The @media rule allows the author to include style sheet rules for various media in the same style sheet. If a separate aural style sheet is used instead, the above rule can be simplified to the following:

abbr[class="trunc"] { 
   content: attr(title);
 } 

This will replace the contents of all <abbr> elements of class "trunc" with their title value. However, at the time of writing, applying content directly to an element like this is supported only by certain Norwegian browsers.
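The effect of that rule can be imitated outside the browser. The Python sketch below is only a simulation under strong assumptions: its regular expression expects exactly the attribute order and quoting shown (class first, then title, double quotes), and the function name aural_text is hypothetical.

```python
import html
import re

# Assumes markup of exactly this shape: <abbr class="trunc" title="...">...</abbr>
TRUNC = re.compile(r'<abbr class="trunc" title="([^"]*)">.*?</abbr>', re.DOTALL)


def aural_text(markup: str) -> str:
    """Replace each abbr element of class trunc with its (unescaped) title text."""
    return TRUNC.sub(lambda m: html.unescape(m.group(1)), markup)


print(aural_text('Boston, <abbr class="trunc" title="Massachusetts">Mass.</abbr>'))
```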

Note: content: attr(title); applied to the element itself, rather than to :before or :after, is CSS3 (thanks to Anne van Kesteren for pointing this out).

Discussion

Apart from inconsistent browser support, there seems to be some disagreement and confusion regarding the definitions of abbreviation and acronym. Part of this confusion stems from differences in how reliable sources, such as dictionaries and academic work, define or explain these terms. This is unfortunate, but not something that should be taken as an excuse not to use semantically correct markup. As the above definitions should show, there is enough clarity to use proper markup.

As for when to use it, there is less clarity. Even W3C's Web Content Accessibility Guidelines are of little help:

Specify the expansion of each abbreviation or acronym in a document where it first occurs. [Priority 3]

Note that "Priority 3" is the lowest priority level that the W3C assigns to its checkpoints. Moreover, the checkpoint only covers the first occurrence, so it is up to the author to decide what to do with subsequent occurrences of abbreviations and acronyms.
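Whatever one decides about later occurrences, the checkpoint itself is easy to check mechanically. The following Python sketch uses the standard library's html.parser to list abbreviated forms whose first <abbr> or <acronym> occurrence lacks a title. The class and function names are hypothetical, nested elements are not handled, and terms are compared by their exact text.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstOccurrenceChecker(HTMLParser):
    """Collect terms whose first <abbr>/<acronym> occurrence has no title."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: set[str] = set()
        self.missing: list[str] = []
        self._open: Optional[tuple[str, bool]] = None  # (tag, has_title)
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names.
        if tag in ("abbr", "acronym"):
            has_title = any(name == "title" and value for name, value in attrs)
            self._open = (tag, has_title)
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is None or tag != self._open[0]:
            return
        term = "".join(self._text).strip()
        if term not in self.seen:
            self.seen.add(term)
            if not self._open[1]:
                # First occurrence without an expansion.
                self.missing.append(term)
        self._open = None


def missing_expansions(markup: str) -> list[str]:
    checker = FirstOccurrenceChecker()
    checker.feed(markup)
    checker.close()
    return checker.missing


print(missing_expansions('<abbr>HTML</abbr> and <abbr title="HyperText Markup Language">HTML</abbr>'))
```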

It is also unfortunate that the W3C have failed to provide a dedicated element for initialisms.

For further explanations of the differences between acronyms, abbreviations and initialisms, including guidelines and examples of how to style these tags with CSS, as well as what to do with IE, see Lloyd (2003), Korpela (2003), and Saila (2002).

The Future

The HTML 4.01 and XHTML 1.X specifications will not change.

Most modern browsers support HTML 4.01 well, and XHTML 1.X support is likely to improve further. It is therefore unlikely that browser support for the <abbr> and <acronym> elements will be adversely affected in the near future.

As for the next version of IE, Microsoft has made it clear that it will only be available with its next operating system, codenamed Longhorn. When that will be released, and what it will support, nobody knows.

Until then, at least, the <abbr> element will continue to be unsupported by Internet Explorer.

XHTML 2 is a different story altogether. Although the specification is still a working draft, it would seem that the <acronym> element will be dropped from the document type.

It should be noted, however, that XHTML 2 is light years away from achieving widespread support and use. It has even been debated whether it ever will, or should, replace XHTML 1; see e.g. Zeldman (2003) and Pilgrim (2003). In other words, any adverse effects that XHTML 2 may have on the <acronym> element can safely be ignored for now.

Conclusions

To ensure proper markup of abbreviations in hypertext documents, follow these guidelines:

  1. Understand the difference between abbreviations, acronyms, and initialisms.
  2. Use the <acronym> element for acronyms only.
  3. Use the <abbr> element for initialisms, or when in doubt.

These guidelines should be used in conjunction with proper visual and aural styling with CSS.

References

Articles and tools:

W3C specifications and guidelines:

5 Comments

  1. Anne van Kesteren

    I love it! Great article and so complete.

    Though I have one tiny suggestion. Add the keyword CSS3 by the rule: abbr{content:attr(title);}. Because some people don't like CSS errors.

  2. Lars Holst

    Thanks Anne.

    Yes, the @media rule is CSS3. I have added a clarification, and also included a CSS2 alternative for those using a separate aural style sheet.

    Thanks for pointing this out!

  3. Karl Dubost

    It's funny to see how much my discussion with Jacques Distler has travelled on many Weblogs:

    I have made another point to explain some of the internationalization problems and meaning:
    abbr versus acronym : la suite.

  4. Lars Holst

    Thanks for the input Karl. Actually, I hadn't seen Jacques Distler's post. I'm not sure if it's a case of "great minds think alike", or just bad research on my part. But thanks for pointing me to it.

    While I'm at it, I need to correct my previous comment:

    @media was introduced in CSS2. The issue pertaining to CSS3 is that applying attr() on the element itself, and not on :before or :after, is only possible in CSS3, see sections 4.2 and 11 of the relevant CSS3 module, and section 12.2 of the CSS2 specification.

    The post has been edited to reflect this. Thanks to Anne who pointed this out to me in an email, and provided me with the links.

  5. Anne van Kesteren

    Karl,

    My French ain't good, but I think I can understand what point you are trying to make. You are 'trying to create' a third group. Why don't you mix the 'sigle' within abbr and acronym?

    From my point of view it doesn't matter that some abbreviations are in the US for example marked up with abbr and in France with acronym.
