RFC 822 has been superseded by RFC 2822. The changes are probably small, but I haven't yet checked whether they affect the content of this document.
The RFC which defines the Internet E-mail ("electronic mail") address, RFC 822, titled Standard for the format of ARPA Internet text messages, is one of the oldest and most fundamental Internet standards (registered as STD 11). This document explains the address format defined in section 6 Addressing, as officially modified by clauses 5.2.15 and 5.2.16 of RFC 1123.
address = mailbox ; one addressee
/ group ; named list
Legend: The solidus / indicates alternative (or). In definitions like this,
taken verbatim from RFC 822, the semicolon (;) opens
an explanatory statement (comment) which is not part of the formal definition.
Normally an address is a mailbox, or a simple address. It could also be a group specification, though this possibility is rarely used. Note that the word mailbox is very often used (outside RFC 822) to refer to a file to which a system's E-mail software appends any incoming E-mail sent to an address (normally, a user). So there's a connection, but in RFC 822, mailbox is a syntactic and logical term which identifies a recipient rather than a store or a set of messages.
group = phrase ":" [#mailbox] ";"
Legend: The brackets [] indicate optionality, i.e. the parts enclosed in them
can be present, or can be omitted. Quotation marks surround literals which
must appear exactly as written (without the quotation marks of course).
The number sign # is a prefix that indicates that the construct following
it may be repeated any number of times,
using commas as delimiters; thus, #mailbox means any number (zero or more) of mailboxes
separated by commas.
This would allow an address like
foo:a@b.example,c@d.example,e@f.example;
But I don't think I've ever seen it used - perhaps I just didn't notice.
Mailing lists are more commonly used, and a mailing list address could appear
as syntactically just one address (mailbox).
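To make the group syntax concrete, here is a minimal Python sketch that takes a group specification apart. The function name split_group is made up for illustration, and the sketch assumes that neither the phrase nor the mailboxes contain quoted strings or comments, which a real parser would have to handle.

```python
def split_group(text):
    """Split 'phrase:mailbox,mailbox;' into (phrase, [mailboxes]).

    Simplifying assumption: no quoted strings or comments occur, so the
    first ":" ends the phrase and every "," separates two mailboxes.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a group must end with ';'")
    phrase, colon, rest = text[:-1].partition(":")
    if not colon or not phrase.strip():
        raise ValueError("a group must start with a phrase followed by ':'")
    # [#mailbox] is optional, so an empty mailbox list is acceptable.
    mailboxes = [m.strip() for m in rest.split(",") if m.strip()]
    return phrase.strip(), mailboxes


print(split_group("foo:a@b.example,c@d.example,e@f.example;"))
# ('foo', ['a@b.example', 'c@d.example', 'e@f.example'])
```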
mailbox = addr-spec ; simple address
/ [phrase] route-addr ; name & addr-spec
route-addr = "<" [route] addr-spec ">"
A mailbox can be just an address specification (addr-spec), but it could
also be such a specification enclosed between
"<" and ">",
in which case it can be preceded by a comment-like phrase,
such as the user's real name. The syntax also allows a route specification
in the latter case, but this is rarely used nowadays.
Note that there are two ways to add information like a real name
(say Jukka Korpela)
to an address (say jkorpela@cc.hut.fi):
Jukka Korpela <jkorpela@cc.hut.fi>
Here the address proper is between "<" and ">", and all the rest is like a comment,
to be ignored by the E-mail handling software (though it usually displays it,
for human readers to see).
jkorpela@cc.hut.fi (Jukka Korpela)
route = 1#("@" domain) ":" ; path-relative
Legend: When the number sign # prefix is preceded by a number, it indicates that the construct following it may be repeated any number of times, using commas as delimiters, but must occur at least once. Parentheses indicate just grouping here and must not occur in the actual data.
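Returning to the two ways of attaching a real name to an address: the following sketch uses parseaddr from Python's standard email.utils module to show that both forms yield the same name and the same addr-spec. Note that this module follows the successors of RFC 822 and is used here only as an illustration, not as a reference implementation of RFC 822 itself.

```python
from email.utils import parseaddr

# Both notations carry the same real name and the same addr-spec.
for text in ("Jukka Korpela <jkorpela@cc.hut.fi>",
             "jkorpela@cc.hut.fi (Jukka Korpela)"):
    name, addr_spec = parseaddr(text)
    print(repr(name), repr(addr_spec))
# 'Jukka Korpela' 'jkorpela@cc.hut.fi'
# 'Jukka Korpela' 'jkorpela@cc.hut.fi'
```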
addr-spec = local-part "@" domain ; global address
This is what "Internet E-mail address" normally means. If you are asked to tell your E-mail address, this is what people want you to tell; they may add some comment-like stuff to it when they use it e.g. to send E-mail to you.
local-part = word *("." word) ; uninterpreted
; case-preserved
Legend: The asterisk * prefix indicates that the construct following
it may occur any number of times (but need not occur at all).
Thus, local-part is a sequence of one or more words separated with
full stops (dots, periods), such as jkoo
or "Jukka Korpela"
or Jukka.Korpela
or just.an.example.you.know. (As explained below,
the use of quotation marks turns anything into a word, in the syntactic sense
that is relevant here.)
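As a rough illustration of how a local-part divides into words, the sketch below splits at full stops that lie outside quoted strings. The function name split_local_part is hypothetical, and only the lexical splitting is shown; the sketch does not check that each word is a valid atom or quoted string.

```python
def split_local_part(local_part):
    """Split a local-part into its words at dots outside quoted strings."""
    words, current = [], []
    in_quotes = False   # inside "..."?
    escaped = False     # previous character was a backslash inside quotes?
    for ch in local_part:
        if escaped:
            current.append(ch)          # escaped character is taken literally
            escaped = False
        elif in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            words.append("".join(current))   # a dot ends the current word
            current = []
        else:
            current.append(ch)
    words.append("".join(current))
    if in_quotes or escaped or "" in words:
        raise ValueError("malformed local-part")
    return words


print(split_local_part("just.an.example.you.know"))
# ['just', 'an', 'example', 'you', 'know']
print(split_local_part('"Jukka Korpela"'))
# ['"Jukka Korpela"']
```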
RFC 822 discusses the meaning of a local-part as follows:
The local-part of an addr-spec in a mailbox specification (i.e., the host's name for the mailbox) is understood to be whatever the receiving mail protocol server allows. For example, some systems do not understand mailbox references of the form "P. D. Q. Bach", but others do.
This specification treats periods (".") as lexical separators. Hence, their presence in local-parts which are not quoted-strings, is detected. However, such occurrences carry no semantics. That is, if a local-part has periods within it, an address parser will divide the local-part into several tokens, but the sequence of tokens will be treated as one uninterpreted unit. The sequence will be re-assembled, when the address is passed outside of the system such as to a mail protocol service.
Within a domain, local-parts with periods are often used and processed in a uniform
way, e.g. using firstname.lastname structure.
The point in the text quoted above is that all such conventions depend on the
local arrangements, and E-mail processing software just passes the local-part as such
to the recipient system. The sender's software has no way of knowing what the
recipient system will do with the local-part.
And an official amendment to RFC 822 clarifies:
A host that is forwarding the message but is not the destination host implied by the right-hand side "domain" must not interpret or modify the "local-part" of the address.
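The rule quoted above can be sketched in code: software that handles an address may separate the local-part from the domain, but it passes the local-part along exactly as it received it. The function name split_addr_spec is made up for this example, and the sketch assumes that the domain is not a domain literal, so that the last "@" is the separator.

```python
def split_addr_spec(addr_spec):
    """Return (local_part, domain) for 'local-part@domain'.

    Assumption: the domain is not a domain literal, so it contains no "@"
    and the last "@" separates the two parts. A quoted local-part may
    itself contain "@".
    """
    local_part, at, domain = addr_spec.rpartition("@")
    if not at or not local_part or not domain:
        raise ValueError("expected local-part@domain")
    # The local-part is returned unchanged: no case folding, no re-quoting.
    return local_part, domain


print(split_addr_spec("Jukka.Korpela@cc.hut.fi"))
# ('Jukka.Korpela', 'cc.hut.fi')
```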
domain = sub-domain *("." sub-domain)
sub-domain = domain-ref / domain-literal
domain-ref = atom ; symbolic reference
Thus, syntactically, domain is a sequence of one or more words separated with
full stops (dots, periods), such as foo.bar.zap.example
or cc.hut.fi or hut.fi.
Domain-literals will not be discussed here. They allow
a domain to be specified by its numeric (IP) address, e.g.
[10.0.3.19].
Syntactically, a domain literal consists of a bracketed string of characters,
with some limitations on the character repertoire.
RFC 822 strongly discourages the use of domain literals.
Basically, a domain part in an E-mail address is the hierarchical Internet domain
name, with the top-level domain on the right. To make things work, the top-level
domain names must be registered in a centralized manner and publicly; names directly under
each domain (subdomains) must be registered by the authority for that domain; etc.
Note that a domain name might refer to a particular computer, and often does,
but it need not.
Quite often domain names reflect some administrative hierarchy; for example,
cs.hut.fi is the domain of the Computer Science laboratory of
the Helsinki University of Technology, Finland.
See the discussion of domain semantics in RFC 822 for more information.
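The hierarchical reading of a domain, with the top-level domain on the right, can be illustrated with a small sketch. The function name domain_hierarchy is hypothetical, and the sketch assumes a domain made of symbolic names only, with no domain literals.

```python
def domain_hierarchy(domain):
    """List the enclosing domains of a domain name, most general first."""
    labels = domain.split(".")
    if any(label == "" for label in labels):
        raise ValueError("empty sub-domain")
    # The top-level domain is the rightmost label, so the suffixes are
    # built starting from the right.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(domain_hierarchy("cs.hut.fi"))
# ['fi', 'hut.fi', 'cs.hut.fi']
```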
The terms "word", "atom", and "phrase" have been used above, but not syntactically defined here yet. The syntax as presented in section 3.3 Lexical tokens in RFC 822 is a bit complicated, so here we give a plain English description.
A phrase is a word or a sequence of words.
A word is either an atom or a quoted string.
An atom is a sequence of printable ASCII characters except space
or any of the following:
()<>@,;:\".[]
Positively speaking, this means that the valid constituents
of an atom are the following:
!#$%&'*+-/0123456789=?
ABCDEFGHIJKLMNOPQRSTUVWXYZ^_
`abcdefghijklmnopqrstuvwxyz{|}~
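The set of atom characters can also be derived mechanically from the definition: take the printable ASCII characters other than space and remove the excluded ones. The names SPECIALS, ATOM_CHARS and is_atom below are chosen for this sketch only.

```python
# The characters that may not occur in an atom, as listed above.
SPECIALS = set('()<>@,;:\\".[]')

# Printable ASCII runs from "!" (33) to "~" (126); space (32) is excluded.
ATOM_CHARS = "".join(chr(c) for c in range(33, 127) if chr(c) not in SPECIALS)


def is_atom(text):
    """True if text is a non-empty string of atom characters only."""
    return bool(text) and all(ch in ATOM_CHARS for ch in text)


print(ATOM_CHARS)
print(is_atom("jkorpela"), is_atom("Jukka Korpela"), is_atom("a@b"))
# True False False
```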
A quoted string is formed by
using normal ASCII quotation marks (") around a string,
and it's a way of turning almost any string syntactically into a word. This means
for example that a string containing a space (say, Jukka Korpela)
becomes acceptable when quoted, in a context where the syntax requires a word.
A quoted string may contain any ASCII character, but
a quotation mark (") or
carriage return (CR control code) must be preceded by a reverse solidus
(backslash, \), and the reverse solidus itself as a character
must be written doubled (\\).
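The quoting rules can be sketched as a small function that turns an arbitrary ASCII string into a quoted string. The name quote_string is made up for this example; the function simply puts a reverse solidus before the three characters that need one.

```python
def quote_string(text):
    """Wrap text in quotation marks, escaping backslash, quote and CR."""
    out = []
    for ch in text:
        if ord(ch) > 127:
            raise ValueError("only ASCII characters are allowed")
        if ch in ("\\", '"', "\r"):
            out.append("\\")    # these must be preceded by a reverse solidus
        out.append(ch)
    return '"' + "".join(out) + '"'


print(quote_string("Jukka Korpela"))
# "Jukka Korpela"
print(quote_string('say "hi"'))
# "say \"hi\""
```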
Note that RFC 822 limits the character repertoire to ASCII. In practice, other characters (such as ä or é) usually work inside quoted strings used for commenting purposes (and comments), but they must not be used in addresses proper.
Note to Finnish readers: I have written a Finnish-language summary of RFC 822 and its amendments.
Date of last update: 2001-05-02. A minor technical correction 2014-04-25.
Jukka Korpela