Dr. Dobb's Journal February 2004
Among the reasons that Eudora (http://www.eudora.com/) is one of the most popular e-mail clients is its support of features such as filters, stationery, personalities, and plug-ins, not to mention that it runs on Windows, Macintosh, and PalmOS platforms. But as nice as these features are, the way Eudora handles message attachments and archiving can be a nuisance. Messages are stored in text format and attachments are stored in a directory common to all mailboxes. When messages are moved to another mailbox, the attachment stays in the attachment directory. This is straightforward, but causes problems when archiving mailboxes.
According to Eudora tech support, the recommended way to archive/backup messages is to copy the TOC files, MBX files, and the entire attachment directory to a new location (http://www.eudora.com/techsupport/kb/1602hq.html). This is simple, but doesn't let you organize the messages and attachments. Archiving messages for a project that was just completed means going through all messages, manually moving the attachments to a separate directory, and moving the mailbox files and attachments to backup media. Restoring mailboxes is a matter of copying the files and attachments back into the Eudora directories.
An alternative might be to write a script that scans the MBX file for X-Attachment lines, then moves the referenced files to another location. This helps sort attachments, but the message text still points to the original location of the attachment. Updating the text changes the message length, which in turn requires a change to the message headers contained in the TOC file.
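The scanning half of such a script could look roughly like the following sketch. It is written in portable C++ rather than MFC, the function name findAttachmentLines is hypothetical, and it assumes each reference sits on a single line of the form "X-Attachment: <path>"; the exact layout Eudora writes may differ.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collects the value of every line that begins with the given header name.
// Assumption: one attachment reference per line, "X-Attachment: <path>".
std::vector<std::string> findAttachmentLines(const std::string& mbxText,
                                             const std::string& header = "X-Attachment")
{
    std::vector<std::string> values;
    std::istringstream in(mbxText);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();                       // tolerate CRLF line endings
        if (line.compare(0, header.size(), header) != 0)
            continue;                              // line does not start with the header
        std::string::size_type colon = line.find(':', header.size());
        if (colon == std::string::npos)
            continue;                              // no value on this line
        std::string::size_type start = line.find_first_not_of(" \t", colon + 1);
        if (start != std::string::npos)
            values.push_back(line.substr(start));  // text after "header:"
    }
    return values;
}
```

Because the sketch only looks at the start of each line, a body line that happens to begin with the same text would also be reported, and it does nothing about the TOC update problem described above.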
Yet another approach is to write a program that reads the MBX and TOC files and manages the messages and attachments in them. This gives you greater control over the placement of the attachments and provides a way of moving messages directly into a mailbox that is not in the Eudora directory. X-Attachment header information can be changed, and the TOC file is also updated. To write such an application, however, you need to understand the file format. In this article, I briefly examine the Eudora file formats and present a class library (available electronically; see "Resource Center," page 5) based on this information. The library centers on a DLL that is the main access point to the Eudora mailboxes. You can use this DLL to create applications where you simply drag messages into a mailbox archive and copy any attachments out of the attachment directory into a directory associated with the mailbox.
Eudora uses two files for each mailbox. The MBX file is an ASCII text file, which can be read by any text editor. The standard e-mail header information is contained as part of the ASCII text of the message. The TOC file, on the other hand, is a binary file that contains the index into the MBX file for each message as well as the message length and other information for the message.
The CEudoraMBXFile class (also available electronically) is derived from CFile and is the smallest class of the library. There are two ways to use CEudoraMBXFile. One way is to create an object using the default constructor with no parameters, then use the Open() member function to gain access to the data in the file. Another way is to use the constructor that takes the filename and open-flags parameters. The open flags are the same flags used by the CFile::Open() member function.
There are two file-access functions: GetMessage(), which reads from the file given the offset and the number of bytes from the message header and returns a CString; and Write(), which seeks to the end of the MBX file, saves the offset, and appends the text of the CString to the end of the file. The starting offset is then returned to the caller. This class is not exported from the DLL, so only those classes that are contained in the DLL can access it.
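The two operations can be illustrated without MFC. The sketch below is not the CEudoraMBXFile code; it uses standard C++ streams in place of CFile and CString, and the free functions readMessage and appendMessage are hypothetical stand-ins for GetMessage() and Write().

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Reads `length` bytes starting at `offset`. Returns fewer bytes (possibly
// none) if the file is shorter than requested or cannot be opened.
std::string readMessage(const std::string& path, long offset, long length)
{
    if (offset < 0 || length <= 0)
        return std::string();
    std::ifstream in(path, std::ios::binary);
    std::string text(static_cast<std::size_t>(length), '\0');
    in.seekg(offset);
    in.read(&text[0], length);
    text.resize(static_cast<std::size_t>(in.gcount()));
    return text;
}

// Appends `text` to the end of the file and returns the offset at which it
// starts, or -1 if the file could not be opened.
long appendMessage(const std::string& path, const std::string& text)
{
    std::ofstream out(path, std::ios::binary | std::ios::app);
    out.seekp(0, std::ios::end);                    // position at end of file
    long offset = static_cast<long>(out.tellp());   // remember where the text begins
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    return offset;
}
```

The offset returned by appendMessage is the value a caller would store in the corresponding TOC message header, together with the length of the text.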
CEudoraTOCFile wraps the TOC file, a binary file that contains a file-header record followed by a message-header record for each message (see Listing One). The file header is read when the file is opened. Most of the members of this structure define how Eudora displays the mailbox window and the elements of that window. These members are initialized to default values when a new mailbox is created.
When Eudora creates a mailbox, it handles the Version field in one of two ways: If the mailbox is a system mailbox (In, Out, or Trash), the Version field is filled in. If the mailbox is a user-defined mailbox, the Version field is left blank. I have not determined what the binary data in this field denotes for a Eudora 5.2 mailbox.
The Title field is the mailbox name. When a new mailbox is created, this field is used to create the MBX and TOC filenames. For each mailbox, the TOC file and its corresponding MBX file share the same filename, up to 32 characters long, derived from the full mailbox name. Given a mailbox name, use these steps to determine the filename for the TOC/MBX pair:
1. Remove any ampersands or commas from the name.
2. If the remaining string is longer than 32 characters, truncate it to 32 characters; if it is 32 characters or fewer, use it as is.
For instance, "My New Mailbox has a very long name" becomes "MY NEW MAILBOX HAS A VERY LONG N.TOC," whereas "In" is simply "IN.TOC." The MBX files take the same base names: "MY NEW MAILBOX HAS A VERY LONG N.MBX" and "IN.MBX."
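The two steps can be sketched as a small function. The name mailboxFileName is hypothetical, and the conversion to uppercase is an assumption inferred from the example filenames rather than a documented rule.

```cpp
#include <cctype>
#include <string>

// Builds the TOC or MBX filename for a mailbox name.
// `extension` is expected to be ".TOC" or ".MBX".
std::string mailboxFileName(const std::string& mailboxName,
                            const std::string& extension)
{
    std::string base;
    for (char c : mailboxName) {
        if (c == '&' || c == ',')
            continue;                    // step 1: drop ampersands and commas
        // Assumption: names are uppercased, as the example filenames suggest.
        base.push_back(static_cast<char>(
            std::toupper(static_cast<unsigned char>(c))));
    }
    if (base.size() > 32)
        base.resize(32);                 // step 2: keep at most 32 characters
    return base + extension;
}
```

Calling mailboxFileName("My New Mailbox has a very long name", ".TOC") yields "MY NEW MAILBOX HAS A VERY LONG N.TOC", and mailboxFileName("In", ".MBX") yields "IN.MBX".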
The Type field denotes the mailbox type (In, Out, Trash, or User). The Mbxclass field can be ignored for now because I have not found a definite purpose or explanation for it. The MsgCount field is the number of message headers that are contained in the TOC file. Like CEudoraMBXFile, CEudoraTOCFile is not exported from the DLL, so only those classes contained in the DLL can access it.
The CEudoraMsg class (Listing Two) is used to encapsulate the message header read from the TOC file and the message text read from the MBX file. The message-header structure is packed on a 2-byte boundary by the #pragma pack(2) directive. Leaving the structure packed at the default setting causes extra bytes to be read from the TOC file and subsequent reads to start at the wrong offset in the file.
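The effect of the packing directive can be checked at compile time. The sketch below repeats the message-header fields twice, once packed on 2-byte boundaries and once with default alignment. The struct and function names are hypothetical, and fixed-width integer types stand in for the 32-bit long and 16-bit short that the listing assumes.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 2)
struct PackedDetail {                  // 2-byte packing, as in the listing
    std::int32_t MBXOffset, MSGSize, DateTime;
    std::int16_t Status;
    char         Switch1, Switch2;
    std::int16_t Priority;
    char         DateTimeSent[32], Name[64], Subject[64];
    std::int16_t x, y, w, h, Unknown1;
    std::int32_t Unknown2;
    char         Unknown3[26];
};
#pragma pack(pop)

struct DefaultDetail {                 // same fields, compiler's default alignment
    std::int32_t MBXOffset, MSGSize, DateTime;
    std::int16_t Status;
    char         Switch1, Switch2;
    std::int16_t Priority;
    char         DateTimeSent[32], Name[64], Subject[64];
    std::int16_t x, y, w, h, Unknown1;
    std::int32_t Unknown2;
    char         Unknown3[26];
};

// The packed layout matches the 218-byte record stored in the TOC file.
static_assert(sizeof(PackedDetail) == 218, "record must be 218 bytes");

// Number of padding bytes the default layout adds to each record.
std::size_t paddingWithoutPacking()
{
    return sizeof(DefaultDetail) - sizeof(PackedDetail);
}
```

On compilers that align 32-bit integers to 4 bytes, the unpacked structure is typically rounded up to 220 bytes, so each read would consume two bytes that belong to the next record.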
After a CEudoraMsg object is created, the Read() function is called to populate the data members. The Read() function takes two CFile pointers as parameters. These point to the CFile base class of the CEudoraMBXFile and CEudoraTOCFile objects, respectively. The VERIFY macro ensures that the CFile pointers passed into the Read() function actually refer to a CEudoraMBXFile and a CEudoraTOCFile. The message object reads the message header from the TOC file, then reads the message text from the MBX file.
Just like the TOC header, the message header has a lot of information used by the Eudora program to define sizes and positions of interface objects. The structure members needed to read the message from the MBX file are MBXOffset and MSGSize. The Read() function uses MBXOffset to seek to the position in CEudoraMBXFile, then reads MSGSize bytes. This text contains several e-mail headers and the message itself, all in ASCII format.
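The lookup can be sketched on in-memory data. This is not the CEudoraMsg::Read() code; the names MessageLocation, locateMessage, and extractMessage are hypothetical. The sketch assumes that MBXOffset and MSGSize are the first two 32-bit fields of the record, as in the message-header listing, and that they are stored least significant byte first, as on the Windows x86 machines Eudora runs on. It also treats both values as unsigned, whereas the listing declares them as long.

```cpp
#include <cstdint>
#include <string>

struct MessageLocation {
    std::uint32_t offset;   // MBXOffset: where the message starts in the MBX data
    std::uint32_t size;     // MSGSize: number of bytes in the message
};

// Assembles a 32-bit value from four bytes, least significant byte first.
static std::uint32_t readLE32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// `record` must point at one raw message-header record (at least 8 bytes).
MessageLocation locateMessage(const unsigned char* record)
{
    return MessageLocation{ readLE32(record), readLE32(record + 4) };
}

// Returns the message text, or an empty string if the range lies outside the data.
std::string extractMessage(const std::string& mbxData, const MessageLocation& loc)
{
    if (loc.offset > mbxData.size() || loc.size > mbxData.size() - loc.offset)
        return std::string();
    return mbxData.substr(loc.offset, loc.size);
}
```

Decoding the fields byte by byte sidesteps the structure-packing question altogether, at the cost of hard-coding the field offsets.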
CEudoraMailbox, the main interface for the CEudoraTOCFile and CEudoraMBXFile classes, handles the opening and reading of both files. It also keeps an array of CEudoraMsgs in memory.
The DLL and classes presented here are not a complete solution to working with Eudora mailboxes and messages, but they are a start. Exception handling and error recovery are two of the areas that need more work. Euarchive.exe (also available electronically) is a test application that displays the TOC or message header. Most of the information displayed can be verified through Eudora. Any information about unknown fields in the structures is welcome.
DDJ
Listing One
// Raw TOC file header
typedef struct TOCHeaderStruct
{
    char Version[8];
    char Title[32];
    short Type;
    short Unknown;
    short Mbxclass;
    short x;
    short y;
    short w;
    short h;
    short SCol;
    short PCol;
    short ACol;
    short LabelCol;
    short WhoCol;
    short DateCol;
    short KCol;
    short VCol;
    short Unknown2;
    char Unknown3[30];
    short MsgCount;
} TOCHeader; // 104 bytes (sum of the fields above)
Listing Two
// Raw TOC message-header record, packed on 2-byte boundaries
#pragma pack(2)
typedef struct TOCDetailStruct
{
    long MBXOffset;
    long MSGSize;
    long DateTime;
    short Status;
    char Switch1;
    char Switch2;
    short Priority;
    char DateTimeSent[32];
    char Name[64];
    char Subject[64];
    short x;
    short y;
    short w;
    short h;
    short Unknown1;
    long Unknown2;
    char Unknown3[26];
} TOCDetail; // 218 Bytes
#pragma pack()