Internet mail: formats

This is a historical document. The last change in content was dated 4 October 2000. The core content still applies, but a lot has changed since then, not all for the better.

The appearance of a mail message is the first thing that people notice, and it's one of the most important. It has a strong influence on the impression that you make on people you have never met. In this section, we'll look at the following topics:

Message layout

A lot of badly formatted messages come from bad mailers or badly configured mailers. The following mailers are known to send out badly formatted messages without you finding out about them:


cc:Mail
Eudora
exmh
Microsoft Exchange
Microsoft Internet Mail
Microsoft “Outlook”
Netscape

As you can see, the mailers in the Microsoft world are frequent offenders. If at all possible, use a UNIX mailer. If you can't, the following links make suggestions about how to set up Microsoft “Outlook” or Netscape Communicator.

Since Microsoft discovered the Internet, it has introduced a number of changes to mail formatting which contravene the standards. Apart from the fact that Microsoft mailers are non-standard, the changes are also bad. Let's look at some of the more common problems:

This may seem a funny requirement. To understand the reasoning, consider the following message:

FreeBSD versus Linux

If the reader is allowed to wrap the message, it could end up looking like this:

FreeBSD versus Linux, wrapped

Clearly, there must be some way of specifying that the message text should not be wrapped. That is what the MIME type text/plain means. There are special MIME attachment types which allow wrapping, although I still think that this is a bad idea. If you specify that your message may be wrapped, you're making an assumption about what the receiver's screen looks like. Even if you're right some of the time, you can't be right all of the time. For example, one person may have a screen 200 characters wide in order to be able to display long log file entries, but he won't want to read ordinary text in lines that wide.

When sending conventional mail, Microsoft mailers often send text that doesn't look like what they display on the screen. Without telling you, they either transform paragraphs into one long line, or they break lines into two, one long and one short. The resulting message looks like:

FreeBSD versus Linux, single line

This example shows the elm mail reader. This kind of message comes from a mailer which expects the receiving end to do paragraph wrapping. This is an invalid assumption and violates the spirit of MIME. In addition, RFC 2822 limits line lengths to 998 characters. It's easy to exceed this limit in a paragraph. The results may be truncated messages or gratuitous line breaks after 998 characters.
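If you want to check a message body before it goes out, a few lines of code are enough. The following is a minimal sketch, not part of any mailer: the function name long_lines is made up for this example, and the 78 character figure is the recommended limit from RFC 2822, which sits alongside the hard limit of 998 mentioned above. Both limits exclude the trailing CRLF.

```python
# Limits from RFC 2822, both excluding the trailing CRLF.
HARD_LIMIT = 998   # lines must not be longer than this
SOFT_LIMIT = 78    # lines should not be longer than this


def long_lines(body, limit=SOFT_LIMIT):
    """Return (line_number, length) for every line of body longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(body.splitlines(), start=1)
        if len(line) > limit
    ]


# A whole paragraph sent as one line, as in the example above.
paragraph = "word " * 250          # 1250 characters, no line break
body = "Short line.\n" + paragraph + "\nAnother short line."

print(long_lines(body))              # lines over the recommended limit
print(long_lines(body, HARD_LIMIT))  # lines over the hard limit
```

Here the second line is reported in both cases, which is exactly the kind of line that may be truncated or broken in transit.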

Microsoft themselves recently supplied an excellent example of this breakage. Since they explicitly allow verbatim reproduction, here it is.

Alternate long and short lines:

FreeBSD versus Linux, zigzag

This kind of message comes from a mailer which wants to know better than you where the line should end. Typically, it could be the same mailer as before (Microsoft “Outlook” is a good example), but configured to limit the line length. For example, you might have specified a line length of 65 characters, but you write a line of 70 characters before hitting the “Enter” key. The mailer remembers the end of your line, but it decides it's too long, so it splits it into two lines, the first (long) with as many words as will fit in 65 characters, and the second with the rest.
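The effect is easy to reproduce. The following sketch assumes the behaviour described above: the function zigzag wraps each typed line on its own, while refill treats the typed lines as one paragraph. Both names and the sample text are invented for this illustration, and real mailers may differ in the details.

```python
import textwrap

# Two lines as typed by the user, each a little longer than 65 characters.
typed = [
    "The quick brown fox jumps over the lazy dog and then runs back home,",
    "where it finds that the dog has woken up and eaten all of its food.",
]


def zigzag(lines, width):
    """Wrap every typed line separately, keeping the user's line ends."""
    out = []
    for line in lines:
        out.extend(textwrap.wrap(line, width))
    return out


def refill(lines, width):
    """Join the typed lines into one paragraph and wrap it as a whole."""
    return textwrap.wrap(" ".join(lines), width)


print("\n".join(zigzag(typed, 65)))
print()
print("\n".join(refill(typed, 65)))
```

The first version prints alternate long and short lines. The second prints lines of roughly even length, at the price of ignoring where the user pressed “Enter”.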

The insidious thing about these conversions is that you may not be aware of them. If you get messages from other people which appear to be garbled, your mailer may be reformatting them on arrival, in which case it may well be reformatting your own messages before transmission too.

As if that weren't enough, consider what happens with “quoted text”. As we will see in Message layout issues, when you reply to a mail message, it's good practice to include relevant parts of the text to which you reply, so that your partner can remember what the discussion is about. This is especially important for people who go through a lot of mail--200 messages a day are not uncommon.

The standard way to quote text is to include a > character at the start of each line of quoted text. The problem is that this makes the line longer, especially if you include a space after the > for legibility's sake. The results can look like this:

Irishman joke

Here, the text has been quoted multiple times. The mailer (Microsoft Internet Mail Service (5.0.1458.49)) has tried to wrap the lines, but it has not understood that it is wrapping text with four levels of quotes, and only puts a single quote on the continuation lines. After a couple of cycles of this kind of mutilation, the message is completely illegible.
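Wrapping quoted text correctly is not difficult: the mailer only has to recognize the quote prefix and repeat it on every continuation line. The following is a minimal sketch under that assumption. The function name wrap_quoted and the sample line are invented for this example, and it handles only > quotes, with or without a following space.

```python
import re
import textwrap

# One or more ">" characters, each optionally followed by whitespace.
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")


def wrap_quoted(line, width=72):
    """Wrap one quoted line, repeating its full quote prefix on every line."""
    match = QUOTE_PREFIX.match(line)
    prefix = match.group(0) if match else ""
    text = line[len(prefix):]
    wrapped = textwrap.wrap(
        text, width, initial_indent=prefix, subsequent_indent=prefix
    )
    # A line that consists only of quote characters stays as it is.
    return wrapped or [prefix.rstrip()]


line = ("> > > > this line has been quoted four times and is now much "
        "too long to fit on a normal screen")
print("\n".join(wrap_quoted(line, width=50)))
```

Every output line starts with all four quote marks, so the reader can still see how deeply the text is quoted.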

This example shows the more normal case with single character quotes. Microsoft offers other alternatives with “Outlook” as well. In particular, you can quote with a tab. After a couple of times, this gives results that look like this:

Irishman, no joke

I don't think there's anything more to be said about such mutilation. I personally don't use Microsoft products, but I know a lot of clever people who do. None has been able to tell me how to configure “Outlook” to avoid these problems. If you know, please contact me and tell me how. I'll then include it in my next update.

Using MIME attachments

MIME allows you to attach all sorts of data to a mail message, including images and sound clips. It's a great advantage, but unfortunately many people refuse to use it, perhaps because the UNIX community haven't got their act together. Credit where credit's due, this is one area where Microsoft is ahead of the UNIX crowd.

Nevertheless, you can do a lot of things wrong with MIME attachments. Here are some of the more common ones:

Correct time and time zone

When did you send your message? The answer is in the Date: header. Here are some typical examples:

Date: Wed, 11 Mar 98 11:29:05 +1030
Date: Tue, 14 Apr 1998 18:56:18 -0400

The exact layout of this header is defined in RFC 2822, but most of the fields are relatively easy to understand. The time uses the 24-hour clock. The field after the time specifies the time zone. Together with the time, it defines the real time at which you sent the message. This is more important than you may realize: if you're following a complicated exchange on a subject, it's very convenient to have the messages sorted in the order in which they were written, and most mail readers allow you to sort messages this way. If your time or your time zone is wrong, your messages will be sorted in the wrong place. This annoys many experienced mail users, and causes them to look down on you.

The first two digits of the offset specify the number of hours, the last two the number of minutes, so -0400 means “four hours behind UTC” and +1030 means “ten and a half hours ahead of UTC”. UTC is very close to, but not identical to, Greenwich Mean Time (GMT), the time in England in the winter. Just to make this more confusing, this is exactly the other way around from the convention that UNIX System V uses: System V considers EDT, the time zone on the East Coast of the USA in the summer, to be 4 hours ahead of UTC.
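To see how the offset turns a local time into a real time, you can let a program do the arithmetic. The following sketch uses the Python standard library and the two Date: headers shown above. The helper name to_utc and the third header are made up for this example.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_utc(date_header):
    """Parse the value of a Date: header and convert it to UTC."""
    return parsedate_to_datetime(date_header).astimezone(timezone.utc)


print(to_utc("Wed, 11 Mar 98 11:29:05 +1030").isoformat())
# 1998-03-11T00:59:05+00:00
print(to_utc("Tue, 14 Apr 1998 18:56:18 -0400").isoformat())
# 1998-04-14T22:56:18+00:00

# Sorting by real time: the second header (hypothetical) carries a later
# local date, but the message was in fact written earlier.
headers = [
    "Tue, 14 Apr 1998 18:56:18 -0400",
    "Wed, 15 Apr 1998 08:00:00 +1030",
]
in_order = sorted(headers, key=to_utc)
print(in_order)
```

This is what a mail reader does when it sorts by date, and it is why a wrong time zone puts your message in the wrong place.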

The older RFC 822 standard also allowed abbreviations like CST or EDT, which represent US-specific time zones only. RFC 2822 requires mail software to understand these formats, but says that it shouldn't generate them.
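The Python standard library happens to follow this rule, which makes it a convenient illustration: its parser accepts the old abbreviations, but it writes numeric offsets only. The helper name normalize_date is invented for this sketch.

```python
from email.utils import format_datetime, parsedate_to_datetime


def normalize_date(date_header):
    """Read a Date: value in any accepted form and rewrite it with a numeric offset."""
    return format_datetime(parsedate_to_datetime(date_header))


print(normalize_date("Tue, 14 Apr 1998 18:56:18 EDT"))
# Tue, 14 Apr 1998 18:56:18 -0400
```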

Where do you set your time zone? That depends on your operating system.



$Id: email-format.php,v 1.13 2020/01/08 00:16:16 grog Exp $