December 1995/Doing "32-bit" DMA on a PC

Memory Management

Doing "32-bit" DMA on a PC

Gregor Owen

Gregor Owen has been working with embedded hardware and software since 1979. He can be reached at 1-516-421-1807, or via CompuServe at 71121,625.
Direct Memory Access (DMA) is a hardware feature found on every IBM-compatible PC ever made; it supports transfers between a device and PC memory without CPU intervention. Many modern systems use only a single DMA channel, to support a diskette drive. Nevertheless, most desktop PCs are manufactured with spare hardware for up to six DMA channels. Therefore, if you are developing an add-on card that needs to access lots of memory rapidly (but not too rapidly), you can occasionally save money in your design by using the built-in DMA feature.
This indeed is how a PC-based product I'm involved with works. And it worked fine — until we installed a few of these things in a PC, and ran out of memory for the DMA buffers. Even my non-technical client knows that PCs frequently come with 4 MB; why didn't I just write software that would use that memory?
That is the kind of annoyingly simple question clients love to ask, never realizing the complexities involved; in other words, I spent a while figuring out a useful answer. In this article, I discuss the true capabilities and maddening limitations of the PC DMA channel, and present some methods to work around the limitations.

Describing DMA Memory Space
Like many programmers, I've become used to the PC C compilers' large model. The large model's memory addresses are typically represented in segment:offset notation, reflecting the segment and register architecture of the original 8088 and its successors. While this architecture addresses a megabyte of "conventional memory" [see note A], I've been conditioned not to think about addresses larger than 64 KB, because the segmentation scheme is implemented entirely with two 16-bit, 64 KB quantities. I've always known a megabyte is bigger than 64 KB, but I've spent so much time thinking about addresses that fit in 16-bit quantities that I have had a hard time believing it [H].
Segment:offset notation won't do when discussing DMA (or practically anything) in the new 32-bit environments. The addresses I prefer now are linear, or "flat," ones: simple hex numbers, often sizable. Memory pointers allocated in the large model, at least by Borland compilers, usually look something like 421C:0008, with a small offset made up mostly of leading zeros, so they are easy to convert to linear form: shift the segment left one hex digit (that is, multiply it by 16) and add the offset, giving 421C8. To improve readability, I have taken to inserting a comma before the last four hex digits of such numbers, e.g., 4,21C8.
Using the above notation, here are some example PC linear addresses: The address of the first byte after the first megabyte of memory is 10,0000 (not 1,0000, which is the address of the second 64 KB). The address of a common expanded-memory frame is E,0000. The address of video memory is often B,8000. The physical address of the last byte in a 4 MB PC might be 3F,FFFF (not necessarily, however, because the RAM map is sometimes discontinuous).
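The conversion and the comma notation are mechanical enough to write down. The following C sketch uses illustrative function names of my choosing, not code from the product described here. It applies the real-mode rule (segment times 16, plus offset) without wrapping at 1 MB, which matches a 286 or later with the A20 line enabled, and prints the result with a comma before the low four hex digits.

```c
#include <stdint.h>
#include <stdio.h>

/* Real-mode translation: linear = segment * 16 + offset.  The result is
   not wrapped at 1 MB, so FFFF:0010 gives 10,0000 as on a 286+ with the
   A20 line enabled (an 8088 would wrap it around to 0,0000). */
uint32_t linear_from_segoff(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Writes addr in this article's notation: uppercase hex with a comma
   before the low four digits (4,21C8; 10,0000; 3F,FFFF).  Values below
   1,0000 get no comma.  Returns the snprintf() result. */
int format_linear(uint32_t addr, char *buf, size_t len)
{
    unsigned long hi = (unsigned long)(addr >> 16);
    unsigned long lo = (unsigned long)(addr & 0xFFFFu);
    if (hi != 0)
        return snprintf(buf, len, "%lX,%04lX", hi, lo);
    return snprintf(buf, len, "%lX", lo);
}
```

For example, formatting linear_from_segoff(0x5C24, 0x0008) yields "5,C248", and linear_from_segoff(0xFFFF, 0xFFFF) is 10,FFEF, the top of the high memory region described in note [A].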

Using "Normal" DMA
You use DMA by allocating a memory buffer and programming the DMA hardware with the buffer's address and size. But getting a working buffer is tricky (see note [B], and references [1], [2]), particularly if you want a large one. The free DMA hardware in the PC is a little too inexpensive: it cannot transfer across a 64 KB physical boundary. So, if I write malloc(0xFFE0) and get the far pointer 5C24:0008, it won't work.
The physical version of the address shows why: 5,C248 plus roughly 64 KB (1,0000) is about 6,C248, so the DMA operation would cross the 64 KB boundary at 6,0000. In that event, after processing the byte at 5,FFFF, the DMA would simply wrap around to 5,0000 (or to 4,0000 if you're using 16-bit DMA [C]). This single aspect of DMA hardware, that it cannot cross 64 KB boundaries, can enormously complicate software dealing with DMA.
On the other hand, the DMA hardware seems to have adjusted easily to 286+ machines, and current DMA hardware can usually address 16 MB [D] (a limitation that introduced its own merry chaos when people started installing 32 MB of memory), but still only in 64 KB chunks.
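The boundary rule can be checked before a buffer is ever handed to the controller. This C sketch is an illustration with hypothetical names, not the article's code. It takes a physical address and a length in bytes, uses a 64 KB block for the 8-bit channels and a 128 KB block for the 16-bit channels (as described in note [C]), and also reports where the address counter would wrap to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the block the DMA address counter wraps within:
   64 KB for 8-bit channels, 128 KB for 16-bit channels. */
static uint32_t dma_block_size(bool is16)
{
    return is16 ? 0x20000u : 0x10000u;
}

/* True if a transfer of len bytes (len >= 1) starting at physical
   address phys would run past the end of its DMA block. */
bool dma_crosses_boundary(uint32_t phys, uint32_t len, bool is16)
{
    uint32_t mask = ~(dma_block_size(is16) - 1u);
    return (phys & mask) != ((phys + len - 1u) & mask);
}

/* Where the controller goes after the last byte of the block: it wraps
   back to the start of the same block instead of carrying into the
   page register. */
uint32_t dma_wrap_address(uint32_t phys, bool is16)
{
    return phys & ~(dma_block_size(is16) - 1u);
}
```

With the numbers above, dma_crosses_boundary(0x5C248, 0xFFE0, false) is true, and dma_wrap_address(0x5C248, ...) gives 5,0000 for 8-bit DMA and 4,0000 for 16-bit DMA. A buffer at 4,C248 of the same size crosses a boundary for 8-bit DMA but not for 16-bit DMA.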

The Solution
Those Who Pay the Rent want to use the extra memory available in most modern PCs; and for various reasons chiefly involving money, we didn't want to go the protected mode[E] DOS-extender/32-bit Windows route.
Our answer: use a combination of VCPI, VDS, and LIMEMS 4.1 memory-management calls (see sidebar, "PC/MS-DOS Memory Standards") to get 64 KB buffers that the program reaches through the EMS frame in conventional memory, while the DMA hardware reaches them directly at their physical addresses in extended memory. Specifically:
1. Set up the 386-or-better PC system with a memory manager like HIMEM/EMM386 (but see the sidebar "A Problem with HIMEM/EMM386").
2. Use the VDS Disable DMA Translation call to make sure that EMM386 or something like it doesn't interfere with your programming of the DMA hardware.
3. Use LIMEMS calls in the standard way to locate an expanded memory frame, typically in upper conventional memory near E,0000, and allocate lots of 16 KB EMS pages.
4. Use LIMEMS calls to map in all of the 16 KB EMS pages at the EMS frame address, one after another. For each page, use the VCPI Get Physical Address of 4 KB Page in First Megabyte call, passing it the location of the EMS frame, to find the actual physical address of the EMS page. Store this information, so that when you're through, you know the physical address of every EMS page you allocated.
5. Sort the EMS pages by physical address, and then pick out 4-page DMA blocks: pages whose physical addresses are the same when ANDed with FFFF,3FFF. For example, 12,1000, 12,5000, 12,9000, and 12,D000 are such a set (each yields 12,1000). Store the result in a table of DMA buffers. Each entry in the table holds four EMS page numbers and a 32-bit physical address. Each buffer is always 64 KB long, but only the part that doesn't cross the next 64 KB boundary can be used for DMA, and that part will be between 48 KB and 64 KB.
6. Do the program's actual work. When the program wants to manipulate one of these DMA buffers, it uses standard EMS calls to map the buffer's pages into the EMS frame in conventional memory. When it programs the DMA hardware to use the buffer, however, it programs it with the buffer's physical address.
7. When the program exits, it might clean up: use the VDS Enable DMA Translation call on all channels, and release the allocated EMS pages.
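Step 5 is the one part of the procedure that is pure computation, so it can be sketched in portable C. The structure and function names below are hypothetical, and the sketch assumes step 4 has already recorded a physical address for every allocated EMS page. After sorting by physical address, it treats each page whose address has bits 14 and 15 clear as a possible first page and looks up the pages 4000, 8000, and C000 (hex) above it; a set found this way is exactly a group whose addresses agree after ANDing with FFFF,3FFF.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One allocated 16 KB EMS page and the physical address found for it. */
struct ems_page {
    uint16_t handle_page;   /* logical page number within the EMS handle */
    uint32_t phys;          /* physical address of the page's first byte */
};

/* One entry in the table of DMA buffers. */
struct dma_buffer {
    uint16_t ems_pages[4];  /* logical pages, in physical-address order */
    uint32_t phys;          /* physical address of the first page */
    uint32_t dma_len;       /* bytes usable before the next 64 KB boundary */
};

static int by_phys(const void *a, const void *b)
{
    uint32_t x = ((const struct ems_page *)a)->phys;
    uint32_t y = ((const struct ems_page *)b)->phys;
    return (x > y) - (x < y);
}

/* Sorts pages[] by physical address, then stores every 4-page set in
   out[] (at most max_out entries).  Returns the number stored. */
size_t find_dma_buffers(struct ems_page *pages, size_t n,
                        struct dma_buffer *out, size_t max_out)
{
    size_t count = 0;
    qsort(pages, n, sizeof pages[0], by_phys);
    for (size_t i = 0; i < n && count < max_out; i++) {
        uint32_t base = pages[i].phys;
        if ((base & 0xC000u) != 0)
            continue;               /* bits 14-15 set: not a first page */
        struct dma_buffer buf;
        buf.ems_pages[0] = pages[i].handle_page;
        int found = 1;
        for (int q = 1; q < 4 && found; q++) {
            struct ems_page key = {0, base + (uint32_t)q * 0x4000u};
            const struct ems_page *hit =
                bsearch(&key, pages, n, sizeof pages[0], by_phys);
            if (hit == NULL)
                found = 0;          /* a quarter of the 64 KB is missing */
            else
                buf.ems_pages[q] = hit->handle_page;
        }
        if (!found)
            continue;
        buf.phys = base;
        /* base & FFFF is below 4000, so this is always more than 48 KB. */
        buf.dma_len = 0x10000u - (base & 0xFFFFu);
        out[count++] = buf;
    }
    return count;
}
```

Fed the example pages 12,5000, 12,1000, 12,D000, and 12,9000 (in any order), this returns one buffer at 12,1000 with F000 hex (60 KB) usable for DMA. A series with a gap in it, such as 12,0000, 12,5000, 12,B000, 12,F000, 13,3000, yields none.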

Restrictions and Assumptions
It might seem that my software is not entirely "well-behaved." Actually, it is; it will run under MS-DOS without incident. What it won't do is run under any multi-tasking operating system, including, of course, Windows.
Steps 4 and 5 assume that the LIMEMS software will build each EMS page from contiguous physical memory. 386 paging and the associated memory standards tend to deal in 4 KB blocks, so a Limulator, as LIMEMS software is sometimes called, could map different areas of physical memory into a single 16 KB EMS page. My software checks for this case, but hasn't found one in a few weeks of testing with QEMM 6 and HIMEM/EMM386.
Perverse LIMEMS software could defeat steps 4 and 5 by carefully setting up non-DMA-able EMS pages. For instance, a series with physical addresses like 12,0000, 12,5000, 12,B000, 12,F000, 13,3000 ... (a "missing" 4 KB buffer at every boundary) would fail.
Consequently, the described strategy is appropriate for the dedicated kind of application I was dealing with; that is, I can control the system environment, select the memory manager, and check the result before release. For a product to be distributed in less-controlled environments (e.g., shrink-wrapped software), this approach would require more testing, and could be impossible.
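A check of this kind might look like the following sketch. It assumes the frame segment comes from EMS function 41h (Get Page Frame Address) and that the VCPI call, function DE06h of INT 67h (4 KB page number in CX, physical address returned in EDX), is issued separately with whatever real-mode interrupt support the compiler provides. The helper names are hypothetical. The sketch computes the four 4 KB page numbers that cover the first 16 KB of the frame, then tests whether the physical addresses VCPI reported for them are consecutive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linear address of the EMS page frame (typically E000 -> E,0000). */
uint32_t frame_linear(uint16_t frame_seg)
{
    return (uint32_t)frame_seg << 4;
}

/* 4 KB page number to pass to the VCPI call for the i-th (0-3) 4 KB
   piece of the 16 KB EMS page mapped at the start of the frame. */
uint16_t frame_4k_page_number(uint16_t frame_seg, unsigned i)
{
    return (uint16_t)((frame_linear(frame_seg) >> 12) + i);
}

/* Given the physical addresses reported for the four 4 KB pieces,
   decide whether the memory manager built this 16 KB EMS page from one
   contiguous run of physical memory. */
bool ems_page_is_contiguous(const uint32_t phys4k[4])
{
    for (unsigned i = 1; i < 4; i++)
        if (phys4k[i] != phys4k[0] + i * 0x1000u)
            return false;
    return true;
}
```

With the frame at E000, the page numbers to query are E0 through E3 hex. An EMS page that fails this test cannot be used as part of a DMA buffer.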

Virtual Everything
Now is the time to explain why everything you do in modern PC programming actually may not happen at all. You may think you're extracting a byte from a particular memory address; you may think you're programming a particular I/O (Input/Output) port. But if HIMEM/EMM386 or something like it is active in a 386 or better system, your port programming can be converted to anything; your memory address can actually be anywhere.
These shenanigans are necessary so that Windows and other multi-tasking environments can create multiple virtual machines running multiple programs, each of which thinks it has the whole machine to itself.
Normally, when you do DMA in such a memory-managed system without the amazing steps I've outlined above, something like this may happen (even when you're not running under Windows):
1. Your program writes the desired address and count, and other programming details, to the DMA hardware.
2. The memory manager traps these I/O writes and doesn't let them execute. Instead, it will probably translate the operation, conducting its own DMA on some buffer it has tucked away for the purpose in extended memory. If you ask it to DMA too much (e.g., a DMA count that's too big), it will stop the entire system with a cryptic message like "EMM386 DMA buffer is too small. Add d=128 parameter and reboot."
3. If that error doesn't occur, the memory manager will conduct the DMA on this other buffer. It will move the stuff in your buffer, as appropriate (i.e., to its buffer before the DMA, if you're DMAing from memory, and from its buffer after the DMA operation, if you're DMAing to memory).
4. I have no idea how the memory manager fiddles your code into waiting for all this to happen, but it probably has to intercept port reads so your software won't think it's over until it's actually over.
The memory manager does these things because:
1. The DMA buffer you concoct may not be at the physical address you think it is; the memory manager is free to map any physical memory anywhere.
2. The memory manager might want to "swap" you out while you're DMAing and run an entirely different program, and that program would probably object to your DMA bytes popping up in its data or code.
So, the crux of DMA-versus-EMM386 is this simple problem: the only memory the DMA hardware can see is real honest-to-goodness physical memory actually attached to bits of metal in the computer; the memory software sees is anything the memory manager wants it to see.
Happily, when you think you know what you're doing, you can use the VDS Disable DMA Translation call to stop all this from happening. I suspect that memory managers honor the call (HIMEM/EMM386 obviously does) not by disabling port trapping, but by trapping anyway and then checking a flag to see whether to complete the I/O as requested instead of translating it.
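The register setup for the two VDS translation calls is small enough to sketch. This sketch follows the conventions of the VDS specification [7] as commonly documented, which should be checked against that document: VDS is present when bit 5 of the BIOS data byte at 0040:007B is set; Disable DMA Translation is INT 4Bh with AX=810Bh and Enable DMA Translation is AX=810Ch; BX holds the DMA channel number; DX is zero; and carry clear on return means success. Actually issuing INT 4Bh is compiler-specific (Borland's int86() is one option), so the sketch only builds the register values, and all names in it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VDS_DISABLE_TRANSLATION 0x810Bu
#define VDS_ENABLE_TRANSLATION  0x810Cu

/* Register values for one INT 4Bh request. */
struct vds_request {
    uint16_t ax, bx, dx;
};

/* VDS advertises itself by setting bit 5 of the byte at 0040:007B. */
bool vds_present(uint8_t bios_byte_0040_007b)
{
    return (bios_byte_0040_007b & 0x20u) != 0;
}

struct vds_request vds_translation_request(unsigned channel, bool disable)
{
    struct vds_request r;
    r.ax = disable ? VDS_DISABLE_TRANSLATION : VDS_ENABLE_TRANSLATION;
    r.bx = (uint16_t)channel;   /* DMA channel number, 0-7 */
    r.dx = 0;                   /* flags: zero */
    return r;
}

/* Cleanup requests for "all channels".  Channel 4 cascades the 8-bit
   controller into the 16-bit one and never carries transfers, so this
   sketch skips it.  Returns the number of requests stored (7). */
size_t vds_enable_all(struct vds_request out[7])
{
    size_t n = 0;
    for (unsigned ch = 0; ch < 8; ch++)
        if (ch != 4)
            out[n++] = vds_translation_request(ch, false);
    return n;
}
```

Skipping channel 4 is a design choice of this sketch, not something the text specifies.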

Conclusion
The whole experience has been very much like one of those adventure games where you wander around some huge tunneled cave system, picking up odd things that may be useful later. The dark cave is the PC's memory management system, and the weapons and baubles are the seemingly endless array of memory standards and calls. I had almost reached the end of my road without success when I found my ultimate standard, VCPI, with its Get Physical Address of 4 KB Page in First Megabyte call.
If you are planning on working with PC memory management, I can offer some simple advice: obtain the standards documentation. Some sources are listed at the end of this article. Programming books are nice, and some of them are even amusing [8]; the standards texts can certainly be bleak. But you will need them.

Notes
[A] PC memory jargon: Conventional memory is the lower 640 KB in which normal DOS programs execute. I use the term loosely here to refer to the entire first megabyte that the original 8088 addressed, because it's too cumbersome to keep writing "conventional and reserved memory," the latter being the correct way to refer to the top of the first megabyte (A,0000 and above), according to The Novell Dictionary of Networking, Peter Dyson, 1994, a usually reliable source. "Upper" also refers to this region. Expanded memory is extra memory provided, usually in a 64 KB "frame" somewhere around E,0000, which doesn't exist in any other address space (this, by the way, is the identical scheme used on many old CP/M S-100 machines to expand memory). Expanded memory was originally implemented as special hardware on a plug-in card; recent implementations use 386 memory management features to emulate such hardware, so that the framed memory actually does exist in an alternate address space, namely somewhere in extended memory. Extended memory is the memory above 1 MB, first available on the 286, and then really available on 386-and-better PC systems.
A linear memory address is one expressed simply as a hex number, i.e. not in the colonated segment:offset form. However, a physical memory address is a linear address that refers to real memory, i.e. the kind that DMA hardware can see — as opposed to the kind that memory managers like HIMEM/EMM386 "dream up," which is known as virtual memory. A linear address of a virtual memory byte is commonly not the same as the physical address of the exact same byte. Sometimes it doesn't even exist (e.g. in paging schemes where "memory" is retrieved from disk).
High memory is the 64 KB-or-so between 10,0000 and 10,FFEF, accessible in some contexts with "jiggered" segment:offset pairs like FFFF:0010 and FFFF:FFFF. The region has the advantage of being an extra 64 KB of "found" memory beyond the standard 8088 megabyte range, but which still can be accessed from real mode (see note [E]). Part of MS-DOS is often located there, using the CONFIG.SYS "DOS=HIGH" invocation.
[B] If you are going to do much DMA programming, get a description of the DMA controller part, the Intel 8237A, in an Intel book or from a clone supplier. No one actually uses this part these days, but all modern PC hardware emulates it. You may also want to read Interfacing to the IBM Personal Computer, by Eggebrecht [1], and the article "DMA Controller Programming in C," by Robert Watson [2].
[C] AT-class PCs actually contain two kinds of DMA hardware: the old, PC-style 8-bit DMA, and the more recent, AT-style 16-bit DMA. Neither can cross 64 KB boundaries, but in the latter case those are 64 KB of 16-bit words, which means 128 KB boundaries when counted in bytes. On an AT-class machine, 8-bit DMA channels 0, 1, and 3, and 16-bit channels 5, 6, and 7, are often available.
[D] The 16-MB limit to DMA addressing is a reflection of the add-on eight-bit page register where the high bits of a DMA address are stored. These eight bits, plus the usual 16 bits in the essentially-64 KB DMA controller, give you 24 bits, or an address range from 00,0000 to FF,FFFF — which is 16 MB. To maintain a sort-of compatibility (or really, who knows why) the 16-bit version of the DMA hardware simply discards the least significant bit of the eight-bit page register, and uses the 16 bits of the DMA hardware to drive bus address lines A1 to A16, thus achieving the 128 KB range, but still limited to a maximum of 16 MB.
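To make this concrete, the following sketch computes the three values programmed for one transfer: the page register, the controller's 16-bit address register, and its count register (number of transfers minus 1). It is an illustration with hypothetical names, not a driver: it does no port I/O, it assumes the caller has already checked that the transfer stays inside one 64 KB or 128 KB block, and for 16-bit channels it requires an even address and length.

```c
#include <stdbool.h>
#include <stdint.h>

struct dma_regs {
    uint8_t  page;     /* page register: A16-A23 (8-bit) or A17-A23 (16-bit) */
    uint16_t address;  /* 8237 address register */
    uint16_t count;    /* 8237 count register: transfers minus 1 */
};

/* Splits a physical address and byte count into register values.
   Returns false for requests the hardware cannot express at all. */
bool dma_regs_for(uint32_t phys, uint32_t len, bool is16, struct dma_regs *r)
{
    uint32_t block = is16 ? 0x20000u : 0x10000u;
    if (len == 0 || len > block || phys >= 0x1000000u ||
        len > 0x1000000u - phys)
        return false;                 /* empty, too long, or past 16 MB */
    if (is16) {
        if ((phys & 1u) || (len & 1u))
            return false;             /* 16-bit DMA moves whole words */
        r->page    = (uint8_t)((phys >> 16) & 0xFEu); /* bit 0 discarded */
        r->address = (uint16_t)(phys >> 1);           /* drives A1-A16 */
        r->count   = (uint16_t)(len / 2u - 1u);       /* in words */
    } else {
        r->page    = (uint8_t)(phys >> 16);           /* drives A16-A23 */
        r->address = (uint16_t)phys;                  /* drives A0-A15 */
        r->count   = (uint16_t)(len - 1u);            /* in bytes */
    }
    return true;
}
```

For a 60 KB buffer at 12,1000, an 8-bit channel gets page 12, address 1000, and count EFFF; a 16-bit channel gets page 12, address 0800, and count 77FF. For 13,1000 on a 16-bit channel, the page register still reads 12, and bit 16 of the address travels in the top bit of the address register (8800).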
[E] I've assumed the reader is familiar with the multi-mode nature of 386 and later Intel processors, but briefly:
1. "Real" mode is the old-fashioned, 1-MB memory, 8088-compatible mode in which MS-DOS and even Pentiums start up.
2. "Protected" mode means loosely any other mode than real, where vast quantities of extended memory can be available. It specifically refers to the ability of system software to intercept and control memory and hardware access so that multiple tasks may co-exist without injuring each other or the system.
3. "Virtual 8086" mode is a protected-mode variant. A task running in this mode perceives itself as running in the simple real mode, but is actually being supervised by system software. This is the mode MS-DOS runs in when a memory-manager like HIMEM/EMM386 is active.
[F] This behavior is apparently not related to the EMS-frame mapping scheme outlined here. To test this assumption, I turned off all my EMS code, and allocated some 64 KB DMA buffers in conventional memory — but still used the VDS calls to disable DMA translation. Allocating the DMA buffers in conventional memory works, evidently, because there's some kind of gentlemen's agreement that memory managers will start out by mapping conventional memory below A,0000 (640 KB) to the corresponding physical memory. However, I observed the same reverse DMA failure mode; and the failure disappeared when I replaced HIMEM/EMM386 with QEMM 6.
[G] Listing 1 shows the code I used to fix the HIMEM/EMM386 bug. Programs should call it only if they know that HIMEM/EMM386 is present and reverse DMA is being used. is16 is true when 16-bit DMA is contemplated. page, offset, and count are the values I am just about to program into the DMA controller hardware.
I use count+1 because both I and the HIMEM/EMM386 bug understand that the DMA controller is programmed with the desired count minus 1.
The rest of the gyrations are related to details of 16-bit (versus eight-bit) DMA hardware.
To reiterate, the basic problem is that HIMEM/EMM386 thinks software will program reverse DMA with the buffer's beginning address, while unfortunately the actual hardware requires the buffer's ending address.
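Listing 1 is not reproduced here, but the address arithmetic behind the bug can be sketched. In address-decrement ("reverse") mode the 8237 starts at the highest address of the buffer and counts down, so the buffer's lowest physical address has to be recovered from the programmed page, address, and count values. The helper below is hypothetical and is not Listing 1; it assumes the buffer stays within one 64 KB (8-bit) or 128 KB (16-bit) block and that count is the register value, i.e., transfers minus 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the lowest physical address of a buffer about to be
   transferred in address-decrement mode, given the values programmed
   into the page, address and count registers. */
uint32_t reverse_dma_buffer_start(uint8_t page, uint16_t address,
                                  uint16_t count, bool is16)
{
    uint32_t last;
    if (is16) {
        /* Page bits 1-7 give A17-A23; the address register gives A1-A16. */
        last = ((uint32_t)(page & 0xFEu) << 16) | ((uint32_t)address << 1);
        return last - 2u * (uint32_t)count;   /* count+1 words, 2 bytes each */
    }
    last = ((uint32_t)page << 16) | address;
    return last - (uint32_t)count;            /* count+1 bytes */
}
```

With page 12, address FFFF, and count EFFF on an 8-bit channel, the buffer begins at 12,1000; with page 12, address 7FFF, and count 77FF on a 16-bit channel, it begins at the same place.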
[H] Intel's 8088 and derived products were based on a segment + offset scheme, which supported large address spaces without requiring greater-than-16-bit addresses. Before an address is emitted, the segment portion is multiplied by 16 and added to the offset. (Later enhancements added various levels of indirection to the segments in support of ever more flexible memory management.) Each segment register is automatically associated with particular offset registers, and override instructions are provided so that almost all offset registers can be used with arbitrary segment registers. Programmers have been annoyed at this arrangement ever since its inception, and recent Intel offerings, while retaining backward compatibility, now support 32-bit address spaces, which essentially do away with segment requirements.

Information Sources
[1] Lewis C. Eggebrecht. Interfacing to the IBM Personal Computer, Second Edition (Sams, 1990).
[2] Robert Watson. "DMA Controller Programming in C," The C Users Journal, November 1993, p. 35.
[3] File LIMEMS41.ZIP. Available on the Simtel MS-DOS CD-ROM. Price: $39.95. ISBN: 1-57176-039-3. Published by Walnut Creek CD-ROM, 4041 Pike Lane, Ste. D-692, Concord, CA 94520. +1-510-674-0783. e-mail: orders@cdrom.com. Also check Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124. This file may also be available on various BBSs.
[4] File XMS_200.LZH. I uploaded this to the Courts of Chaos BBS, 1-501-985-0059 (Arkansas). (This is shareware from Fernando M.I. Carreiro, Transvaal, R.S.A., and he appreciates donations; see README.1ST).
[5] VCPI, Virtual Control Program Interface, v1.0. Available from Phar Lap Software, 60 Aberdeen Ave., Cambridge, MA 02138. Also available from Quarterdeck Software, 150 Pico Boulevard, Santa Monica, CA 90405. +1-310-392-9851.
[6] DPMI, DOS Protected Mode Interface, v1.0. Part #240977, Intel Literature JP-26. Available from Intel Literature Center, 1000 Business Center Drive, Mount Prospect, IL 60056. 800-548-4725.
[7] VDS, Virtual DMA Services, revision date 9/92. Available via ftp from ftp.microsoft.com, path \SOFTLIB\MSLFILES\PW0519.
[8] Geoff Chappell. DOS Internals (Addison-Wesley, 1994). I particularly enjoyed this book. I'm not sure how "useful" it is, but Mr. Chappell pokes and probes with wild abandon into the most obscure corners of various MS-DOS issues, with a sort of feverish enthusiasm which is alternately exhilarating and puzzling.