Sharing Peripherals Intelligently

Matching client/server needs to device exigencies

Ian Hirschsohn

Ian holds a BS in mechanical engineering and an MS in aerospace engineering. He is the principal author of DISSPLA and cofounder of ISSCO. He can be reached at Integral Research, 249 S. Highway 101, Suite 270, Solana Beach, CA 92075.

The problem of coping with mountains of data will get worse before it gets better. Part of the solution is being able to share available data at reasonable transfer rates. In this article, I'll describe how a standard PC can be used to provide a pool of high-performance tapes, disks, image printers, and other peripherals that are made available to client workstations at sustained rates ranging from 4--7 Mbytes/sec each. The design uses SCSI-2, but allows for IEEE 488 or other networks; the clients can be an arbitrary mix of workstations, Macs and PCs typically connected via SCSI-2 to a 486-based "peripherals manager." The result is the (commercially available) "STAR Peripherals Manager," a collaborative effort between Texaco (Briarpark, Texas) and Integral Research (my company).

Sustained throughput is clearly paramount for gigabyte files. At 50 Kbytes per second, it takes Ethernet or Token Ring more than five hours to move one gigabyte. Inexpensive QIC drives can hold several gigabytes, but it takes them about two and a half hours to move one gigabyte through the printer port. At a typical 250 Kbytes per second, even the SCSI Exabyte 8200 8mm requires more than an hour to transfer one gigabyte. High-performance drives such as the IBM 3490 or the $35,000 Metrum 2150 T120 will transfer the same amount of data at 2 Mbytes per second in about 9 minutes, and SCSI-2 disks capable of 4--5 Mbytes/sec take only about four minutes to do the same.
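The arithmetic behind these figures is easy to check. The following is a minimal sketch, assuming decimal units (one gigabyte taken as 1,000,000 Kbytes) and using only the sustained rates quoted above; the function and table names are illustrative and not part of STAR.

```python
def transfer_time_seconds(size_kbytes, rate_kbytes_per_sec):
    """Seconds needed to move size_kbytes at a sustained rate."""
    if rate_kbytes_per_sec <= 0:
        raise ValueError("rate must be positive")
    return size_kbytes / rate_kbytes_per_sec

GIGABYTE_KBYTES = 1_000_000  # assumption: decimal units

# Sustained rates quoted in the text, in Kbytes/sec.
RATES = {
    "Ethernet or Token Ring": 50,
    "Exabyte 8200 8mm": 250,
    "IBM 3490 / Metrum T120": 2_000,
    "SCSI-2 disk (low end)": 4_000,
    "SCSI-2 disk (high end)": 5_000,
}

for name, rate in RATES.items():
    minutes = transfer_time_seconds(GIGABYTE_KBYTES, rate) / 60
    print(f"{name:24s} {minutes:7.1f} min")
```

The same function can be used to estimate how long any candidate device would hold a client for a file of a given size.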

Though client transfer performance is important, sustained device throughput is critical. If a tape is streaming past the heads at 125 inches per second (ips), the host processor must be able to accept the data at that rate. Otherwise, the drive motors brake, reverse the tape, and reposition with throughput going from Mbytes/sec to Kbytes/sec in "washing-machine mode." Furthermore, the oscillating tape is susceptible to errors, increased head and motor wear, and degradation of the tape itself. In short, sustained data transfer is critical to all mechanical peripherals. Multi-megabyte device cache is not a panacea: If you're reading one tape record just to find the location of the next, streaming 20 Mbytes into cache is counterproductive.

While future fiber networks promise to solve the performance issue, the problem remains that every workstation platform has its own custom device drivers and each operating system has its own API. Each variant of the operating system often needs its own API, plus different SCSI drivers for each adapter. The resulting 3-D matrix of platforms, APIs, and drivers (or lack of them) often limits the acceptance of a device more than its physical characteristics. The optimum solution clearly involves an intelligent peripheral manager dedicated to matching client needs to device exigencies.

SCSI-2

Theoretically, SCSI-2 is the perfect solution--its 10-Mbyte/sec data-phase rate (synchronous mode) for 8-bit and 20-Mbyte/sec rate for 16-bit (Wide SCSI-2) are still beyond today's network capabilities. SCSI-2 is inexpensive--almost every platform sports the 50-pin connector at the back. You should be able to string up to eight machines together with cheap cables and transmit data at rates that leave Ethernet standing still. But it isn't that simple, even though it is perfectly feasible under the SCSI-2 specification. Although SCSI-2 is a standard, device nuances and vendor-specific options enter in. For example, SCSI-2-compatible tape drives are supposed to support variable-length records. However, most QIC drives can only handle fixed-length blocks, which are often only 512 bytes long. Applications expecting a QIC tape to behave like a 9-track will be disappointed, regardless of the SCSI-2 specs. Some drives support fast positioning; others don't. In practice, different APIs and even drivers are needed for 9-track, 8mm, 3490, QIC, DAT, and D2. Likewise, many platforms view SCSI-2 disks differently.

A practical stumbling block to making SCSI-2 universal is the number of applications and systems that key to a specific vendor's device. For example, an application may expect an Exabyte 8200 and refuse any device that does not respond as such--even an Exabyte 8500. Many applications install their own drivers, and some customers customize their operating systems, so even a complete suite of APIs and drivers may be inadequate. Another issue involves artificial system limitations. Most Sun SPARC systems, for example, are limited to 64-Kbyte contiguous tape blocks. If a program writes a 256-Kbyte record, four separate blocks are written without the program knowing it. This can be a surprise when you try to read the tape on another platform.

Even when devices are fully SCSI-2 compliant, there may be more efficient ways for them to operate. Most high-performance cassette drives, for example, sustain peak throughput with maximum-length fixed blocks. Small records substantially degrade throughput, so it is far more efficient to pack the small records into large blocks. However, this requires customized unpacking APIs for each platform. High-performance disks would come closer to delivering rated throughput if systems used blocks longer than the usual 512 bytes or took advantage of SCSI-2 queued requests. Published benchmarks show that disks with a theoretical 4 Mbytes/sec only yield 300--500 Kbytes/sec on a PC.

There are also devices with features not addressed by the SCSI-2 spec. Robotic tape drives need to be instructed to select a specific cartridge, typically using a separate RS-232 connection. This presents a challenge when you need to provide APIs and drivers on different platforms. Finally, many devices don't support SCSI-2 at all, but they have unique, highly desirable features.

STAR Peripherals Manager

The STAR Peripherals Manager is a standard 486/Pentium-based PC with up to eight EISA SCSI-2 adapters (a passive backplane can support up to 18 adapters). Each adapter card can interface up to seven clients or devices, although client workstations generally utilize a dedicated adapter. Figure 1 shows the STAR configuration. The 486 PC has a VGA monitor, keyboard, 200-Mbyte IDE drive, and from 16 to 256 Mbytes of memory. The IDE drive is for STAR-system use only: The client disk farm (if any) uses multi-gigabyte SCSI-2 drives connected to the SCSI-2 adapters. This isolates the STAR system from client disk use. The STAR system uses 8 Mbytes of RAM; the remaining 8--248 Mbytes are dedicated to a cache pool for client-device I/O.

Although STAR is capable of handling up to 24 active client-device queues concurrently, two to four queues are typical. Generally, a client workstation will copy a massive file to its own disk at maximum speed, then disconnect. A 486/66 EISA STAR PC easily keeps a 2-Mbyte/sec Metrum T120 tape running at maximum capability while sustaining 4.5 Mbytes/sec from a SCSI-2 disk. Measurements using other PCs to emulate tapes and disks (to measure rates beyond physical devices) show that the STAR Peripherals Manager (PM) can sustain 7 Mbytes/sec to a single client and about 5 Mbytes/sec to each of two clients. A limiting factor is the 33-Mbyte/sec DMA limit of the EISA bus, coupled with the fact that each client-device queue requires concurrent I/O to both the client and the device. Therefore, 7 Mbytes/sec corresponds to a 14-Mbyte/sec EISA load. These figures are for 8-bit (Fast) SCSI-2, which has a theoretical 10-Mbyte/sec synchronous bandwidth (closer to 8 Mbytes/sec when SCSI-2 protocol overhead is included). 16-bit (Fast Wide) SCSI-2 has a theoretical 20-Mbyte/sec limit, so the 7 Mbytes/sec could be exceeded with 16-bit SCSI-2, but few devices use Wide SCSI-2 at this time.
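The bus-load reasoning can be written out as a short calculation. This is only a sketch under the stated assumption that every byte crosses the EISA bus twice (once from the device into memory, once from memory to the client); the function names are hypothetical.

```python
EISA_DMA_LIMIT = 33.0  # Mbytes/sec, the DMA limit quoted in the text

def eisa_load(client_rates):
    """EISA load in Mbytes/sec for a list of per-client sustained rates.

    Each client-device queue moves every byte over the bus twice,
    so the load is double the sum of the client rates.
    """
    return 2 * sum(client_rates)

def bus_utilization(client_rates, limit=EISA_DMA_LIMIT):
    """Fraction of the EISA DMA limit consumed by the given queues."""
    return eisa_load(client_rates) / limit

for rates in ([7.0], [5.0, 5.0]):
    print(rates, eisa_load(rates), f"{bus_utilization(rates):.0%}")
```

One client at 7 Mbytes/sec gives the 14-Mbyte/sec load mentioned above; two clients at 5 Mbytes/sec each give 20 Mbytes/sec against the 33-Mbyte/sec limit.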

The STAR PM enables a mixed group of workstations, Macs, and/or PCs to share a common pool of disks, tapes, and other peripherals. Since the devices are connected to the clients via software, the peripherals are not restricted to SCSI-2. Devices can be interfaced via any protocol, but clients typically see the devices as generic SCSI-2 tape drives. Under SunOS, for example, a C program can communicate with a SCSI-2 tape via standard C reads and writes to /dev/mt0. STAR software then directs the data to or from the actual peripheral; see Figure 2.
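From the client's side, nothing more than ordinary reads is involved. The sketch below uses Python's os.read, a thin wrapper over the same read call a C program would use. It assumes the usual UNIX tape convention that each read returns at most one record and that a zero-length read marks a filemark; the 64-Kbyte buffer size and the function name are assumptions for illustration.

```python
import os

def read_tape_records(path, max_record=65536):
    """Yield records from a tape device (or any file) until a zero-length read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            record = os.read(fd, max_record)
            if not record:  # zero-length read: filemark on tape, EOF on a file
                break
            yield record
    finally:
        os.close(fd)
```

A client would iterate over read_tape_records("/dev/mt0") exactly as it would for a locally attached drive; whether the bytes come from a 9-track, a T120, or a disk is decided on the STAR side.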

For disks, STAR responds via standard SCSI-2 disk protocol so that STAR disks are indistinguishable from client system disks. But since STAR-based disks are connected via software, STAR is able to concatenate multiple disks into, say, a single 100-gigabyte virtual disk common to all clients. Alternatively, STAR can partition a large disk or mirror multiple disks as an inexpensive RAID. Another feature is that STAR caches client disk queues to its 486 extended memory so that up to 240 Mbytes can wait in cache, thereby improving disk I/O. An interesting possibility of this software connection is that STAR could emulate different disk strategies, such as FAT clusters for PCs and UNIX tables for workstations, on the same disk, enabling mixed-client platforms to share a common disk.
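Concatenation amounts to translating a block number on the virtual disk into a physical disk and a block number on that disk. The following is a minimal sketch of one way to do that lookup; the class and its layout are hypothetical and are not taken from the STAR software.

```python
from bisect import bisect_right
from itertools import accumulate

class ConcatenatedDisk:
    """Present several disks, given by their sizes in blocks, as one virtual disk."""

    def __init__(self, disk_blocks):
        if not disk_blocks or any(n <= 0 for n in disk_blocks):
            raise ValueError("need at least one disk, each with a positive size")
        # starts[i] is the first virtual block that lives on disk i.
        self.starts = [0] + list(accumulate(disk_blocks))[:-1]
        self.total_blocks = sum(disk_blocks)

    def locate(self, virtual_block):
        """Return (disk index, block number on that disk)."""
        if not 0 <= virtual_block < self.total_blocks:
            raise IndexError("block outside the virtual disk")
        disk = bisect_right(self.starts, virtual_block) - 1
        return disk, virtual_block - self.starts[disk]
```

Partitioning is the inverse mapping (one physical disk, several virtual ones), and mirroring simply sends each write to the same block on more than one disk.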

A second STAR function is to mimic whatever actual device a client expects to see. For each client, STAR is initialized to respond as a specific vendor's SCSI-2 device. If the client expects an Exabyte 8200, that is what the client sees--right down to the vendor's name and model number on the SCSI-2 Inquiry command. STAR emulates all nuances of the vendor's device. The actual peripheral may be a 9-track, 3480/90, T120, DAT, or QIC. STAR can even emulate tape with disk, enabling multiple workstations to retouch frames of the same movie or share a similar data set. Thus, standard software such as Landmark ITA, Adobe Photoshop, Advanced Geophysical Promax, and the like can access any vendor's device without platform-specific drivers or APIs. Additionally, since only one client usually uses a given STAR adapter at a time, STAR can mimic the device on multiple IDs and accommodate disparate apps.
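To make the Inquiry point concrete, here is a sketch of the 36-byte standard Inquiry data layout defined by SCSI-2, with the vendor and product fields filled in by the emulator. The function is hypothetical, and the identification strings and revision in the usage note are illustrative placeholders rather than values captured from a real drive.

```python
def build_inquiry_data(vendor, product, revision,
                       device_type=0x01, removable=True):
    """Build 36 bytes of standard SCSI-2 INQUIRY data."""
    def field(text, width):
        raw = text.encode("ascii")
        if len(raw) > width:
            raise ValueError(f"{text!r} is longer than {width} bytes")
        return raw.ljust(width, b" ")  # fields are space-padded ASCII

    header = bytes([
        device_type & 0x1F,           # byte 0: device type (0x01 = sequential access)
        0x80 if removable else 0x00,  # byte 1: removable-medium bit
        0x02,                         # byte 2: ANSI version (SCSI-2)
        0x02,                         # byte 3: response data format
        31,                           # byte 4: additional length (36 - 5)
        0, 0, 0,                      # bytes 5-7: reserved and capability flags
    ])
    return header + field(vendor, 8) + field(product, 16) + field(revision, 4)
```

For example, build_inquiry_data("EXABYTE", "EXB-8200", "0000") yields a response whose bytes 8-15 read "EXABYTE " and whose bytes 16-31 begin "EXB-8200". An application that checks these fields would then accept the emulated device.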

An interesting aspect of mimicking widely used devices is that STAR enables new technologies to be pressed into immediate service. For example, robotic tape drives can place terabytes of data at your fingertips, but there are no standard SCSI-2 commands to select a specific cassette. STAR can readily issue the vendor-specific RS-232 commands for the robotic functions without custom modifications to the client system. This makes the device available to the whole mix of client platforms, appearing to each as a 3480, 9-track, or whatever else it is comfortable with. Although a driver has to be developed for STAR, it is only one driver.

A third and critical STAR function is to apply transformations or filters to the data on the fly. Even minor differences such as blocking factors and record headers require such a conversion. Since a 486 separates the client from the actual device, STAR is able to apply arbitrary transformations to the data as it passes from the device to the client. These transformations are loaded as 32-bit, protected-mode overlays so that users can program whatever algorithms they wish. Generally, these transforms are no more than a few hundred lines of code. To sustain maximum throughput, transforms are written in 32-bit assembly language, but they can be coded in any language that will link as a standard DOS overlay.

Data transforms also enable optimum utilization of a specific device. As previously mentioned, a Metrum T120 tape can sustain about 2 Mbytes/sec provided the tape blocks are its maximum 256 Kbytes, but most apps use tape records of a few Kbytes. The smaller records are packed/unpacked on the fly via a transform subroutine, thereby improving performance. Transforms are also key to emulating a specific vendor's peripheral, RGB to CMYK conversion, and so on.
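A packing transform of this kind can be sketched in a few lines. The block format used here (a 4-byte big-endian length before each record, zero padding to fill out the fixed-length block) is an assumption made for illustration; the article does not describe the format STAR uses, and a production transform would be written in assembly rather than Python.

```python
import struct

HEADER = struct.Struct(">I")  # hypothetical 4-byte big-endian length prefix

def pack_records(records, block_size=256 * 1024):
    """Pack small records into fixed-length blocks of block_size bytes."""
    blocks, current = [], bytearray()
    for rec in records:
        need = HEADER.size + len(rec)
        if len(rec) == 0 or need > block_size:
            raise ValueError("record must be non-empty and fit in one block")
        if len(current) + need > block_size:
            # Block is full: pad it out and start a new one.
            blocks.append(bytes(current.ljust(block_size, b"\0")))
            current = bytearray()
        current += HEADER.pack(len(rec)) + rec
    if current:
        blocks.append(bytes(current.ljust(block_size, b"\0")))
    return blocks

def unpack_block(block):
    """Recover the original records from one packed block."""
    records, pos = [], 0
    while pos + HEADER.size <= len(block):
        (length,) = HEADER.unpack_from(block, pos)
        if length == 0:  # a zero length marks the start of padding
            break
        pos += HEADER.size
        records.append(bytes(block[pos:pos + length]))
        pos += length
    return records
```

With 2-Kbyte records, each 256-Kbyte block carries 127 records, so the drive sees one long write where the application issued 127 short ones.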

PORT

Although simple in concept, implementing a peripherals manager presents some practical obstacles. Because it is more than a simple controller, STAR needs an operating system. STAR uses the PORT system, which was expressly designed for multiprocessors (see my article series "Personal Supercomputing," DDJ, June--August, 1992). The PORT system is only used for startup, screen messages, contingency handling, and other functions that are not time critical. Client-device queues, transformation handling, and all other time-critical functions are handled by a 32-bit assembly kernel. The kernel programs the SCSI-2 adapter processors directly, even bypassing ROM-BIOS. PORT remains in suspended animation until the transfer is complete, an error occurs, a new client signs on, or some other event takes place.

A key factor in selecting PORT is that it does not use 486 virtual memory (vm). It is the 486 vm feature that limits the use of OS/2 or Windows NT with STAR. The feature (common in workstations, minis, and some mainframes) translates the contiguous virtual memory seen by a program into 4-Kbyte pages via an on-chip Translation Lookaside Buffer (TLB). However, these 4-Kbyte pages could be scattered all over real RAM. Virtual memory enables the operating system to pack real memory full of tasks and not run out as tasks are spawned and deleted. The feature is used heavily by all 32-bit protected-mode operating systems, including the Phar Lap DOS extender. The SCSI-2 adapters, however, have on-board processors of their own. For reasonable throughput, data needs to be shipped via DMA direct from 486 memory. The problem is that if you requested a DMA transfer of 64 Kbytes from address 123ABC, the actual memory location of the first page may be at 403DEF, the second at 23AB55, and so on. Although multitasking OSs circumvent this via dedicated blocks of contiguous memory, the data has to be copied back and forth from these limited buffers. All 486 memory should be directly addressable by the SCSI-2 microprocessors.
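The DMA problem can be illustrated by walking a page table. The sketch below splits a virtually contiguous buffer into the physically contiguous runs an adapter would actually have to be programmed with. The page table here is a hypothetical dictionary from virtual page number to physical page frame number, not a model of the real 486 structures.

```python
PAGE_SIZE = 4096  # 4-Kbyte pages, as on the 486

def physical_segments(page_table, virt_addr, length):
    """Return a list of (physical address, byte count) runs for a buffer.

    page_table maps virtual page number -> physical page frame number.
    """
    segments = []
    while length > 0:
        page, offset = divmod(virt_addr, PAGE_SIZE)
        chunk = min(PAGE_SIZE - offset, length)
        phys = page_table[page] * PAGE_SIZE + offset
        if segments and segments[-1][0] + segments[-1][1] == phys:
            # Physically adjacent to the previous run: extend it.
            segments[-1] = (segments[-1][0], segments[-1][1] + chunk)
        else:
            segments.append((phys, chunk))
        virt_addr += chunk
        length -= chunk
    return segments
```

When real memory is contiguous, a 64-Kbyte buffer comes back as a single run that the adapter can transfer in one DMA operation. When the pages are scattered, the same buffer starting at 123ABC breaks into 17 separate runs, each needing its own transfer or a copy into a contiguous staging buffer.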

PORT is itself a virtual-memory system. The difference is that it uses software virtual memory, not the 486 TLB, so real memory is contiguous for buffers, cache, multiprocessing, and all kernel functions. PORT uses 8 Mbytes of extended memory, and the remaining 8--248 Mbytes are used by the STAR kernel for its cache queues, buffers, and so on. The STAR Manager software, resident in PORT, is written in PORT's 64-bit Fortran_C, simplifying development of the extensive management program. As a virtual-memory system, PORT enjoys the benefits of a paged system and can also utilize multiple large buffers for tape directories, device-content analysis, and so on. Although software virtual memory (essentially demand overlay) is slower than an on-chip TLB, the difference is insignificant for the functions the STAR Manager performs. PORT buffers that overflow its 8-Mbyte space are staged to the STAR IDE disk transparently.

An interesting aspect of PORT is its ability to support multiple RISC processors (such as the Intel i860) on PC plug-in cards. Running on its own i860, the STAR Manager imposes almost no overhead on the 486. Alternatively, it can farm out transforms to DSP or RISC processors, yielding performance well beyond the 486 for the particular algorithm; see Figure 3. A multiprocessor STAR dedicates the 486 entirely to the client-device I/O. PORT's orientation to asymmetrical multiprocessing (task specific, dissimilar, multiple processors) is a perfect fit for STAR. Although not required, multiple processors provide STAR with a growth path beyond the limitations of the 486 family.

Next Time

A sophisticated operating system should allow STAR to implement a universal networking and client communication protocol via SCSI-2 tape reads and writes. Although this protocol isn't mandatory, it enables a client to query the devices available on the STAR, select a transform, position to a specified file, and perform other functions that may not be supported by the application itself. All this is transparent to the client's host operating system. This protocol, which I'll cover in a future article, is important for device features not covered by the SCSI-2 command set.

Figure 1: STAR Peripherals Manager uses a 486-family PC to interface client workstations/Macs/PCs to a peripherals pool. Clients generally connect via SCSI-2, but devices can use a variety of protocols.

Figure 2: Adapter cards typically service a single client or device, but up to seven clients/devices can chain to a single 8-bit SCSI-2 adapter. Clients and devices do not share the same adapter.

Figure 3: PORT's Cray multiprocessor model utilizes shared memory for maximum throughput. The system is designed to manage task-specific processors (asymmetrical multiprocessing), which is ideal for STAR.