November 1996/High Speed Parallel Port Transfers

Features

High-Speed Transfers on a PC Parallel Port

Kyle A. York

The simplest way to control a device with a PC is often through the parallel port, if you know how.

The parallel port is probably the most common and least utilized I/O port device on IBM PCs and compatibles. Although the parallel port has always been available for use in high speed I/O, most people simply connect a printer to it and ignore it.

With the advent of laptops, space for internal devices has become severely limited, so more and more users have employed the parallel port in everything from tape backup to network communications. The concept of parallel port I/O is not new. The original Apple Lisa came with a parallel hard drive option in the early 80s.

In this article I give a very basic overview of the original parallel port hardware. I show how it can be used for both input and output. The advantage of parallel output over the more commonly used serial is speed. The maximum throughput of a standard serial port is 115.2K bits per second (bps), while the maximum throughput of a standard parallel port is closer to 512Kbps.

Parallel Port Hardware

The standard IBM parallel adapter card includes a female 25-pin connector. A potential point of confusion is that most parallel printers have a 36-pin connector (Centronics), so the cable that connects computer to printer must adapt from 25 pins to 36. But the wiring of this cable is of no interest here; in this article we will be dealing only with the 25 pins coming out of the adapter card. (If you're wondering where the Centronics connector gets all those extra pins, note that half of its pins are tied to ground.) The pins of the adapter card are mapped to three consecutive ports: data, status, and control (see Table 1) .

Writing to the data port directly drives the data pins (2..9). Reading from this port returns the current state of the data pins. The hardware of the original PC specification uses totem-pole output, which consists of two transistors per pin, of which only one is active at a time. When a '1' bit is sent to the pin, the "high'' transistor is energized; that is, the "high'' transistor connects the pin to a voltage that represents a logic '1'. When a '0' bit is sent, the "low'' transistor becomes active and shunts the pin to the logic '0' voltage.

Pins that have the older totem-pole outputs cannot be connected to other outputs, so they cannot read data from external devices. However, the data port can read what it is putting on these pins. Reading the data port output is useful for diagnostic purposes, such as checking for stuck bits. If the value read does not match the last value written, there is an error.

The newer parallel ports can disable both transistors of the totem pole, allowing the pin to "float'' and receive data from external devices. These pins (sometimes referred to as tri-state) can do bidirectional I/O.

The status port is a read-only port. The status port returns information in bits 3..7 of the byte read (where 0 is the least significant bit). Note that bit 7 is inverted -- when the pin is high, the bit is 0.

The control port is a read/write port, with bits 0..3 mapped to I/O pins. Unlike the data port, the control port drives each of its pins with only one transistor, which is attached to the logic '0' voltage. When a program sends a '0' to one of the control port's bits, this transistor forces the corresponding pin low, by shunting it to the logic '0' voltage. When the bit is set to '1', the transistor becomes inactive and essentially behaves as if it weren't in the circuit. As in the tri-state example above, this pin is said to be "floating.'' This configuration is known as open collector output. Like the tri-state configuration, open collector output allows bidirectional I/O, but has some different characteristics which are especially useful for control applications.

The control port has a few functions other than just driving or reading pins. Bit 4 is used to enable Interrupt Requests (IRQ) from an external device. If this bit is set, when pin S6 goes high a hardware interrupt will occur. Bit 5 is not used for standard parallel ports, but becomes the direction control bit for the newer bidirectional data ports. When this bit is set, both transistors on the data lines are disabled, allowing an external source to control the pin.

I/O Driver

The Interlink/Laplink cables are diagrammed in Table 2. Depending upon the parallel port being used, either five or nine bits are available for synchronous I/O. My transfer method uses a simple strobe bit that inverts whenever new data is present. This eliminates any timing issues, but since each datum must be acknowledged, it entails a performance hit.

I use D0..D2, D4 and C0..C3 for data write, and S3..S5, S7 and C0..C3 for data read. I use D3 for the data write strobe, and S6 for data read strobe. The choice of S6 for the strobe was not arbitrary: as mentioned earlier, when the IRQ latch is enabled (C4 set high) and S6 goes from low to high an interrupt occurs, providing a "wakeup'' for the receiving machine.

Before a transfer, both machines are initialized by writing 0x00 to the data port. This forces the sync bit on each side to zero. The pseudocode for the transfer is as follows:
Transmit:
  set D3 high (wakeup)
  wait for S6 to go high
    the transfer mode is set by
    the receiver and returned
    in the data
  do
    write datum
    wait for S6 to invert
  repeat until on more data
  set D3 low (reset sync)

Receive:
  wait for S6 high
  set C2 high, C0..C1 and C3 low
  acknowledge with receive mode
  do
    read datum
    invert S6
  repeat until no more data
  wait for sync low
  set D3 low
For this to work, the transfer software must be able to tell when data transfer is complete. There are several ways to implement this: the data could contain some indication of how many bytes are to be transferred; I could assign a predefined end-of-data code; or I could arrange for the receiver to timeout. I chose a simple protocol: I send a two-byte length followed by a two-byte one's complement length. Sending the length in two different formats makes it unlikely that random data will be interpreted as a length.

Note above where I mentioned that some parallel ports can transmit and receive nine bits, while others are limited to five. It seems some ports do not disable output on the control pins, so they cannot read data correctly through those pins. The transfer software is responsible for setting the correct mode (nibble or byte transfer) when opening the port, though it can be reset later. The receiving machine determines the transfer mode to be used when acknowledging the wakeup. It is not uncommon for both machines to receive in a different mode.

As mentioned earlier, the parallel port can be set to interrupt whenever the S6 (Ack) goes high, though this has rarely been used. Had the original BIOS of the 4MHz PC tried to do so, the interrupts would have come in so quickly that there would have been no time to process them. To get around this problem, my driver uses the interrupt only as a wakeup. Once activated, the driver disables the port's interrupt and leaves it disabled until the packet is fully received. One interesting thing I found out about this process is that the act of disabling the hardware interrupt causes an interrupt to occur, so the semaphore must be set before doing so.

Software Interface

I've made the software interface as simple as possible. The steps needed to read and write are similar to those for files: open, read/write, close. Reads are asynchronous at the driver level and simply fill a queue. Writes are synchronous to allow the necessary handshaking. Since a block must be less than 64Kb in size, the maximum delay from a full write is one to two seconds.

The following functions constitute the software interface. To see their implementation, refer to Listing 1:
PP *PP_Open(WORD queueSize, int port, int base, int IRQ,
            PPMODE mode)
queueSize is the size of the receive queue. Note that sends are always synchronous, which should not be much of a problem since a 64Kb block would require only two seconds to transmit. If queueSize is zero, then the interrupt driver is not loaded and all reads are synchronous.
void PP_Close(PP *pp)
This function frees the buffer space allocated for the transfer.
int  PP_Write(PP *p, void *buf, int length)
This function writes length bytes from buf to p. It returns the total number of bytes sent. If total bytes sent does not equal length, a timeout error occurred.
int  PP_Read(PP *p, void *buf, int length)
This function reads length bytes from p into buf. It returns the number of bytes retrieved. This function waits up to 1/2 second for the first byte, then does not delay for each additional byte, allowing full block transfers.
BOOL PP_RecievePending(PP *p)
This function returns zero if no receives are pending, else non-zero.
void PP_ClearQueue(PP *p)
This function clears the receive queue

Sample Application

The sample application appears in Listing 2. (Full source is available on the code disk and online sources -- see p. 3 for details.) It is a simple command-driven interface that allows two commands: GET, to copy a file from the remote machine to the local machine; and PUT, to copy a file from the local machine to the remote machine. The syntax for the commands is
GET [remote] [local] and
               
 PUT [local] [remote]
Progress information is noted on the screen whenever a transfer occurs.

This software currently incorporates no error checking other than timeouts. Even though I have yet to lose or corrupt any data, I would strongly recommend adding some kind of integrity check into the sofware.

Other obvious extensions include grabbing entire directory trees, or -- for those who need to keep their laptops and desktops in sync -- a program that synchronizes directory trees.

Platform and Performance Considerations

I wrote and compiled with Borland's Turbo C v2.0. It should be easy to port to Microsoft or other compilers. For timeout I use the low word of the BIOS timer which has a resolution of roughly 55ms.

As written, this code executes only under MS-DOS. I have tested it on MS-DOS 3.3 through 7.0. Since all versions of MS-Windows (3.1, 95, NT) virtualize port access (i.e., allow no direct access to the ports), the program does not run in these environments. However, under these environments it would be possible to implement the driver with few changes as a true, loadable device driver. Unfortunately, I have not studied the gory details necessary to implement drivers on all of these systems.

Finally, I have noticed that every machine seems to transfer at different rates, even when using the same mode. My test machines for example both transfer in byte mode, but one direction has a throughput of 60KBps while the other direction has only 20KBs. Your mileage may vary.o

Kyle A. York is a programmer at TGV in Santa Cruz, CA. He can be reached at noesis@ucscb.ucsc.edu.