Sydney S. Weinstein, CDP, CCP is a consultant, columnist, lecturer, author, professor, and President of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites, and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail on the Internet/USENET mailbox syd@DSI.COM (dsinc!syd for those who cannot do Internet addressing).
Each January I like to provide an overview of networks, both to answer questions I am commonly asked throughout the year and to benefit readers who are completely new to networks. This is my fourth anniversary of writing this column, and it's time to update the prior January columns. In prior years I have written about the following subjects: the internet, the Internet, USENET, Network News, obtaining a news feed, and obtaining sources from archive sites. This year, I cover much the same ground: some basic definitions and information about networks, a description of USENET Network News, and finally, instructions for obtaining software from the networks.
First, a quick overview of my column. "On the Networks" covers articles posted to several of the source groups on USENET Network News: comp.sources.games, comp.sources.misc, comp.sources.reviewed, comp.sources.unix, comp.sources.x, and alt.sources. Each of these groups is like a section of a large electronic magazine called USENET Network News. I call USENET Network News a magazine, and not a bulletin board, partly because of the way it distributes its news. Unlike a bulletin board, where each reader accesses a central machine to read the messages, USENET delivers Network News on a subscription basis to each computer, and subscribers read the articles locally. In "On the Networks," I let you know about some of the new postings to USENET Network News.
Some Definitions and Basic Information
OK, I bandy about the terms USENET, Internet, and "the net," among others, in this column. It's time to update the definitions of these terms.
USENET, internets, and the Internet
USENET, often referred to as "the net," is a loose collection of cooperating computers. In the past, all of USENET ran UNIX, but now that other computers and operating systems support UUCP or similar transfer protocols, USENET computers could be running anything from MS-DOS to VAX/VMS. A computer is considered to be on USENET if it exchanges electronic mail with other computers on USENET (I realize I've provided a slightly circular definition). USENET consists of electronic mail, file transfers, and Network News. Most of the programs you read about in this column are distributed via Network News.
If your computer talks to USENET or another computer via some forwarding gateway using a protocol other than UUCP, you are on an internet, short for inter-network. Being "on an internet" just means that you are using some network other than USENET. Note that this internet is spelled with a lowercase i, and includes the Internet and several other networks such as CSNET and BITNET. When you're on an internet, your actual connection to USENET is via a gateway computer that talks to both the network you use and USENET.
The Internet (capital I) is the set of computer networks that are interconnected by the TCP/IP protocol and listed in the routing tables maintained by the InterNIC. This giant network of networks grew out of the Defense Department's ARPANET (Advanced Research Projects Agency Network). While USENET sites make phone calls to other computers, sending information in a store-and-forward fashion, the Internet is mostly a set of machines with permanent or on-demand connections that allow direct, real-time communication between any two computers on the network. In addition, Internet lines usually run faster than the dial-up lines used by UUCP. Most inter-city traffic and the vast majority of Network News are now transferred via the Internet.
The Internet is undergoing rapid change, and any column that attempts to describe it is chasing a moving target. In fact, as of the last months of 1993, as this column is being written, the Internet is on the verge of a major change: it is moving from the public to the private sector, with NSFNET becoming the National Information Superhighway. However, to give a quick description of its current structure, the Internet is a set of nationwide backbone links connecting areas of the country at 45Mb/s (million bits per second). Connected to those backbone links are many regional networks running at speeds between 1.544Mb/s and 45Mb/s. Connected to the regionals are individual networks, such as the networks at Datacomp Systems, Inc. These networks connect to the regionals at speeds ranging from 9.6kb/s (on-demand dial-up links) to 1.544Mb/s or 45Mb/s (leased-line connections).
Whereas a UUCP connection to USENET usually provides only mail and news, the Internet runs the TCP/IP protocol suite and thus supports news (NNTP, Network News Transfer Protocol), mail (SMTP, Simple Mail Transfer Protocol), remote logins to any computer on the network on which you have an account (telnet), remote file transfer (FTP, File Transfer Protocol), and many real-time, on-line search engines, including Archie, Gopher, and the World-Wide Web. All of these services coexist and work in real time.
Internet and USENET Addressing
The Internet performs much of the bulk transfer work for USENET; problems often occur because the Internet and USENET use two different addressing methods. Since much of the software mentioned in this column comes from USENET or the Internet, it's worthwhile to understand how to format the two types of addresses. A UUCP or USENET address is made up of site names separated by ! characters, as in uunet!dsinc!syd. If a site wants to mention more than one "well known site" to use as a route, it usually lists them in { } characters, as in {uunet, decwrl}!dsinc!syd. In this case, you can use either uunet!dsinc!syd or decwrl!dsinc!syd. USENET addressing presents a problem: to use it, you must know the complete path from your site to the destination site. Some systems run programs to help with this routing, and USENET's UUCP Mapping Project publishes maps to automate this process. However, not all sites have registered to be listed in these maps. Registration is free, recommended, and accomplished by sending your entry to rutgers!uucpmap. The mapping project continuously updates the maps and distributes them via the USENET newsgroup comp.mail.maps.
On the Internet, every site has a unique "Fully Qualified Domain Name," which is administered by the NIC. The computers at my site are named node.DSI.COM, where node is the name of the individual computer. Thus my full current address is syd@dsinc.DSI.COM, but our mailer, like the mailers at many Internet sites, is smart and knows how to forward mail to me even if you send it to syd@DSI.COM. This feature allows me to move around within the DSI.COM domain without having to tell everyone a new address. The Internet does not require you to know the path to the site; you only need to know the domain name, which is the complete address of that site.
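To make the bang-path notation concrete, here is a small sketch that expands the {site, site}!rest shorthand into the individual routes and splits a path into its hops. It is an illustration only, assuming a single leading brace group as in the example above; the function names are my own, not part of any mail package.

```python
import re


def expand_routes(address):
    """Expand '{a, b}!rest' into one plain bang path per alternative site.

    Only a single leading brace group is handled, the form shown in the text.
    Anything else is returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"\{([^}]*)\}!(.+)", address.strip())
    if match is None:
        return [address.strip()]
    alternatives = [site.strip() for site in match.group(1).split(",")]
    return [f"{site}!{match.group(2)}" for site in alternatives if site]


def hops(bang_path):
    """Split a bang path into the ordered sites it passes through, ending with the user."""
    return bang_path.split("!")


print(expand_routes("{uunet, decwrl}!dsinc!syd"))
# ['uunet!dsinc!syd', 'decwrl!dsinc!syd']
print(hops("uunet!dsinc!syd"))
# ['uunet', 'dsinc', 'syd']
```

Note that every hop must be known in advance, which is exactly the burden the domain form (syd@DSI.COM) removes.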
Now, a word of warning. Mixing both @ and ! in the same address leads to trouble, because not everyone follows the standard and processes such addresses correctly. Converting sitea!user@DSI.COM to a UUCP address should properly result in dsinc!sitea!user, because the @ has higher precedence than the !: the mail goes first to DSI.COM, which then delivers it to sitea!user. Many sites get this rule wrong and will cause your mail to bounce (be returned to you as undeliverable). Some sites, ours included, allow UUCP mail to have addresses that include domain names in the ! path, as in dsinc!host.domain.type!user. Where allowed, this form of addressing is usually more reliable than mixing !'s and @'s.
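The precedence rule can be sketched in a few lines: split at the @ first, route to the domain's UUCP gateway, and hand the remaining local part (itself possibly a bang path) to that gateway. The domain-to-UUCP lookup table here is hypothetical; a real mailer would consult its routing database.

```python
def mixed_to_uucp(address, domain_to_uucp):
    """Rewrite 'local@DOMAIN' as 'gateway!local', honoring @ over ! precedence.

    domain_to_uucp is a hypothetical table mapping domain names (upper case)
    to the UUCP name of the machine that serves that domain.
    """
    local_part, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no '@': already a pure bang path
    try:
        gateway = domain_to_uucp[domain.upper()]
    except KeyError:
        raise ValueError(f"no UUCP name known for domain {domain!r}") from None
    # The '@' is resolved first: deliver to the domain, then let that host
    # process the local part, which may itself contain '!' hops.
    return f"{gateway}!{local_part}"


routes = {"DSI.COM": "dsinc"}  # hypothetical lookup table
print(mixed_to_uucp("sitea!user@DSI.COM", routes))  # dsinc!sitea!user
```

A mailer that instead split at the ! first would send the message to sitea, which is the misrouting that causes bounces.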
Public Domain vs. Freely Distributable Software
Lastly, what is Public Domain Software and what is Freely Distributable Software? Much of the software described in this column is Freely Distributable, in that you pay no licensing fee if you are acquiring it for personal use. Some distributors even allow business use of Freely Distributable software for no fee. While most software in this column is Freely Distributable, almost none of it is in the Public Domain. If software is in the Public Domain, either its copyright has run out and was not renewed, or its authors have specifically renounced copyright protection and placed the software in the Public Domain. For most software mentioned in this column, the copyright is held either by the author or by some group, which then gives users the right to use and distribute the software at no charge. This practice does not place the software in the Public Domain. You still cannot sell this software, nor pretend that you wrote it. Many of the licensing agreements restrict how the software can be used for business purposes.
Freely Distributable software is also different from Shareware: Shareware developers expect the user to pay a fee if he or she intends to continue using the program. Freely Distributable software developers do not.
USENET Network News
This column refers to items posted to the source newsgroups of USENET Network News. How do USENET and USENET Network News differ? USENET Network News is a subset of the computers on USENET and the internet that agree to exchange one or more of the categories of Network News. Currently there are about 8,000 different news categories, called newsgroups, which are organized into several hierarchies. These include the traditional major hierarchies (news, comp, rec, sci, soc, and talk); regional hierarchies such as na, usa, ba, pa, and nj (and others); and specialized hierarchies such as bionet, biz, bit, unix-pc, and u3b (and others). The major hierarchies are the most widely distributed, reaching over a million computers worldwide. Regional hierarchies distribute messages of interest only over a particular region, such as North America (na), the United States (usa), the San Francisco Bay Area (ba), the state of Pennsylvania (pa), or New Jersey (nj), just to name a few. The specialized hierarchies serve communities with special interests.
Each hierarchy has its own set of rules, which are enforced by consensus. USENET itself has no governing body, just a set of guidelines that individual computer owners or administrators follow as they see fit. This scheme seems to work most of the time, as the net runs without too much chaos. There is even a hierarchy that runs without rules, called alt.
Your computer is considered to receive (or pass) Network News if it subscribes to one or more of the newsgroups in any of the hierarchies. Some sites receive only a handful of the groups, some receive most, and some receive all. However, we are talking about a large amount of information: over 60 megabytes of new postings every day, a volume that is growing at about 7% per month. At 60 megabytes per day, each site can keep only a small portion of the feed on line at a time, and even then, only a few days' worth.
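A growth rate of 7% per month compounds quickly. As a rough check, assuming the rate stays constant and compounds monthly (my assumption, not a measured trend), the daily volume roughly doubles in under a year:

```python
import math


def volume_after(months, start_mb=60.0, monthly_growth=0.07):
    """Daily news volume in megabytes after `months` of compound monthly growth."""
    return start_mb * (1 + monthly_growth) ** months


def months_to_double(monthly_growth=0.07):
    """Months until the volume doubles at a constant compound monthly rate."""
    return math.log(2) / math.log(1 + monthly_growth)


print(round(months_to_double(), 1))  # 10.2
print(round(volume_after(12)))       # 135
```

In other words, a site that can just hold a few days of today's 60 megabytes would need more than twice the disk space a year from now to hold the same number of days.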
With so much information coming in each day, it might seem like a lot of work just to maintain it, or to find something worthwhile to read. It's not that bad. The news software runs almost automatically: it includes facilities to send a neighbor only those groups to which it subscribes, and to expire old articles to recover space. However, being a major site in the USENET Network News distribution system does require a large amount of disk space and considerable modem time.
Several of the groups are of interest to readers of this column: the source distribution groups. Generally, these groups are gathered under the comp hierarchy in a collection called sources, hence names such as comp.sources.unix and comp.sources.misc. Originally, comp.sources.unix carried Freely Distributable sources designed to run on UNIX systems. Many of the sources posted there now also run on personal computers and other operating systems, but all can run on UNIX. Some of the other sources groups currently in the comp hierarchy are: amiga, for software specific to Amiga systems; atari.st, for Atari ST-specific software; games, restricted to game software (and game software is also restricted to this group); mac, for Apple Macintosh software; misc, for general software, not necessarily for UNIX systems; reviewed, a peer-reviewed source code group; sun, for software specifically for Sun Microsystems workstations; and x, for software for the X windowing system.
Authors submit their sources to a moderator, who is the only person allowed to post to the group. The moderator bundles the sources for distribution, checks that they are complete, and posts them. The moderator also assigns Volume and Issue numbers to each of the postings. A submission might require several issues, because an issue is limited to 60K to 100K bytes. The moderator also posts periodic indices of the sources posted to his group. Unlike some groups, moderated groups post no discussions of software. Moderated groups post only software. This restriction gives moderated groups what is called a high "signal-to-noise ratio."
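Because an issue is limited to roughly 60K to 100K bytes, a large submission has to be spread across several issues. The sketch below shows one simple way such grouping could work, packing files in order until the next file would exceed the limit. The greedy in-order packing, the 60,000-byte default, and the file names and sizes are my own illustrative assumptions, not a description of any moderator's actual tools.

```python
def split_into_issues(files, limit=60_000):
    """Group (name, size_in_bytes) pairs into issues whose total stays within limit.

    Files are packed in the order given. A single file larger than the limit
    gets an issue to itself; a real moderator would split such a file further.
    """
    issues, current, current_size = [], [], 0
    for name, size in files:
        # Close the current issue if this file would push it over the limit.
        if current and current_size + size > limit:
            issues.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        issues.append(current)
    return issues


submission = [("README", 4000), ("main.c", 40000), ("util.c", 30000), ("Makefile", 2000)]
print(split_into_issues(submission))
# [['README', 'main.c'], ['util.c', 'Makefile']]
```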
Because of the high "signal to noise ratio" in these groups, some computer sites around the world save the sources for future access. These sites are called archive sites. Each archive site decides on its own what groups to archive and for how long to keep the archives. It's to these archive sites I refer you to obtain the sources mentioned in the column. Why to the archive sites? Because the individual members of USENET, unless they archive these groups, will have deleted the sources to make room for the newer postings, usually within a week of the original posting.
60M a Day: How Can This Work?
A little simple math applied to the Network News volumes yields some impressive numbers. If your computer exchanges Network News with two neighbors (a small site), you are receiving 60Mb a day for a full feed from one neighbor and sending that 60Mb each day to the other. As a participating site in Network News broadcasts, you send any article you get from one site to all other sites you are connected to that have not already received the article. Transferring those 120Mb per day (60Mb in, 60Mb out) on a 9600bps phone line (960 characters per second maximum speed) requires approximately 36.4 hours on the phone per day. That's not possible; you would fall behind very quickly. The solution is to send articles in compressed batches. Compression reduces the batch sizes by about 50-70%, cutting those 36.4 hours to roughly 11-18 hours. That is still a big phone bill. How do sites cut it down even further? Most big sites run special modems, such as V.32bis/V.42bis (at 19,200bps) or Telebit WorldBlazers (at 2,250cps), which cut transmission time by roughly another factor of two, to about 6-9 hours per day.
A major site might exchange news with 20 or more neighbors. How can it do that? There are several ways. One is a whole bank of modems. Another is to exchange only partial feeds, selected from the list of 8,000 newsgroups. And the last is the high speed offered by the Internet.
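The phone-time figures can be reproduced with a few lines of arithmetic. This sketch assumes a megabyte of 2^20 bytes (which is what makes the 36.4-hour figure come out) and treats compression as removing a fixed fraction of the data; 1,920 characters per second stands in for a 19,200bps modem, by the same 10-bits-per-character rule that turns 9600bps into 960 characters per second.

```python
MEGABYTE = 1024 * 1024  # 2**20 bytes; matches the 36.4-hour figure


def hours_on_phone(megabytes, chars_per_second, compression=0.0):
    """Hours to move `megabytes` at `chars_per_second`, after compression
    removes the given fraction of the data (0.5 means 50% smaller)."""
    bytes_sent = megabytes * MEGABYTE * (1 - compression)
    return bytes_sent / chars_per_second / 3600


daily = 2 * 60  # 60Mb received plus 60Mb sent on to the second neighbor
print(f"{hours_on_phone(daily, 960):.1f}")        # 36.4
print(f"{hours_on_phone(daily, 960, 0.5):.1f}")   # 18.2
print(f"{hours_on_phone(daily, 960, 0.7):.1f}")   # 10.9
print(f"{hours_on_phone(daily, 1920, 0.5):.1f}")  # 9.1
```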
Getting Software
Network News Software
There are now two current Network News transport software suites: C News and INN.
The current version of the traditional Network News transport software is named C News, not because it is written in C, but because it follows A News and B News as the third rewrite of the transport software. C News handles the transfer of news articles (the individual messages) among the members of the USENET network. C News works best for smaller sites that mostly have UUCP feeds and that feed a limited set of neighbors.
Larger sites, especially those with Internet connections, generally run InterNetNews (INN) from Rich Salz. INN is largely responsible for cutting down the time it takes for an article to propagate through the backbone networks. Whereas C News uses batching to distribute articles in bulk, INN transfers each article immediately to its NNTP (the TCP/IP-based Network News Transfer Protocol) neighbors. Thus an article on the Internet now reaches most of the backbone and regional network sites in only one to five minutes. (Just three years ago this delay was close to a day.)
Getting Software Mentioned in This Column
Since most sites keep news articles online for only a short period of time (usually less than two weeks), by the time a piece of software appears in this column, it will long since have been expired and deleted. Thus you must access a news archive site. Many sites around the country have agreed to archive specific newsgroups. These sites are listed in the comp.archives newsgroup. Many of the archive sites also identify themselves in their USENET Mapping Project map entries. I have even listed some in this column. How you access the archives depends on where they are and how each site has set up access. Most archives allow either FTP or UUCP access, and a few allow both.
If a site supports FTP access, you need to be on the Internet to access it. FTP allows you to open a direct connection to the FTP server on a remote system and transfer the files directly to your system. FTP will prompt for a user name and optionally a password. Most FTP archive sites allow you to enter a user name of "anonymous." If such a site then prompts for a password, any password will work, but convention and courtesy dictate that you use your name and site address (that is, your electronic mail address) for the password.
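For readers who script their retrievals, the anonymous login described above looks roughly like this with Python's standard ftplib module. It is a sketch, not a tested tool: the server name, directory, and file name you pass in are up to you (any I show would be placeholders), and it does no error handling or retrying.

```python
from ftplib import FTP


def anonymous_password(user, host):
    """The conventional anonymous-FTP password: your electronic mail address."""
    return f"{user}@{host}"


def fetch_anonymous(server, remote_dir, filename, password):
    """Log in as 'anonymous', change to remote_dir, and copy filename locally.

    Binary mode (RETR via retrbinary) is used so compressed files arrive intact.
    """
    with FTP(server) as ftp:
        ftp.login(user="anonymous", passwd=password)
        ftp.cwd(remote_dir)
        with open(filename, "wb") as local:
            ftp.retrbinary(f"RETR {filename}", local.write)


print(anonymous_password("syd", "dsinc.DSI.COM"))  # syd@dsinc.DSI.COM
```

A call such as fetch_anonymous(archive_host, "comp.sources.unix/volume22/elm2.3", part_name, anonymous_password(you, your_host)) would then fetch one part; the archive host and part name are whatever your archive site lists.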
If a site supports UUCP access, anyone with UUCP can access the archives. Most sites of this type publish a sample entry for the Systems file (L.sys) showing the system name, phone number of their modems, the connection speeds supported, and the login sequence. Using the uucp command you can poll the system directly and retrieve the software. Many sites post times-of-day restrictions on when you should access the modems. Courtesy dictates that you follow their requests, and some sites enforce the limit with programs. Be sure to call far enough before the end of the period to complete your transfer in time.
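The published Systems entries follow a fixed field order. The sketch below splits an entry into its parts, assuming the common HoneyDanBer-style layout (system name, call schedule, device type, speed, phone number, then the login chat script); older L.sys files use a similar order, but check your own uucp documentation. The sample entry is entirely hypothetical.

```python
from typing import NamedTuple


class SystemsEntry(NamedTuple):
    name: str       # remote system name
    schedule: str   # when calls are allowed, e.g. "Any" or "Wk1800-0700"
    device: str     # device type, e.g. "ACU" for an automatic call unit
    speed: str      # line speed
    phone: str      # number to dial
    chat: list      # expect/send strings of the login sequence


def parse_systems_line(line):
    """Split one Systems (L.sys) entry into its fields (layout assumed, see above)."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("a Systems entry needs at least five fields")
    name, schedule, device, speed, phone, *chat = fields
    return SystemsEntry(name, schedule, device, speed, phone, chat)


entry = parse_systems_line("archsite Wk1800-0700 ACU 9600 5551212 ogin: nuucp")
print(entry.schedule, entry.chat)  # Wk1800-0700 ['ogin:', 'nuucp']
```

The schedule field is where a site's time-of-day restriction ends up, so copying it exactly keeps your uucp from calling outside the permitted hours.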
A third transfer method, used for smaller files, is through access to an electronic-mail-based archive server. In this method, you send an electronic mail message to the archive server's mailbox name specifying the files you wish. The server will return the files to you via electronic mail. Remember that many sites limit the size of a single mail message, so don't ask for too much at once. Also remember that the archive server is a program, so phrase your request exactly as specified in the instructions for that archive server, and limit your message to exactly that request. Other comments in the message could confuse the program and make it fail to honor your request.
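Because the server is a program, a request should contain nothing but the commands, and large requests are better split across several messages. This sketch builds such messages with Python's standard email package. The "send" command, the Subject line, and the two-files-per-message limit are hypothetical; every archive server documents its own command syntax, and that syntax is what you should use.

```python
from email.message import EmailMessage


def build_requests(server, requester, paths, per_message=2):
    """Build request messages asking for at most per_message files each.

    The body holds only the request lines, with no greetings or signature,
    so the server program has nothing extra to misinterpret.
    """
    messages = []
    for start in range(0, len(paths), per_message):
        msg = EmailMessage()
        msg["To"] = server
        msg["From"] = requester
        msg["Subject"] = "request"  # hypothetical; some servers ignore it
        lines = [f"send {path}" for path in paths[start:start + per_message]]  # hypothetical command
        msg.set_content("\n".join(lines))
        messages.append(msg)
    return messages
```

Each message can then be handed to your local mailer; sending it is left out here because how you reach your mail transport varies from site to site.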
Lastly, if your site is not connected to any network, some archive sites will copy the software onto media for you, if you send them a disk or tape along with return postage and a mailer. Other sites sell media with the software already copied onto it. This practice is especially useful for the largest distributions, such as the X windowing system, which spans multiple tapes.
If you don't have Internet access, but subscribe to UUNET, UUNET will retrieve the files via FTP for you and make them available for UUCP access.
What to Retrieve
When I list a package from the newsgroups, I provide five pieces of information for each package: the volume number, the issue number(s), the archive name, the contributor's name, and the contributor's electronic mail address. The volume and issue are specifically named in the listing. The archive name is in italics, and the contributor's name is followed by his or her electronic mail address, enclosed in <>'s.
To locate a package via WAIS or Archie, use the archive name, the short, one-word name in italics given with each listing. To find the file at an archive site, use the group name (from the section of the column you are reading; I place all listings for each group together in the column), the volume number, and the archive name. Most archive sites store the postings as group/volume/archive-name. The issue numbers tell you how many parts the package was split into when posted; use them to make sure you get all of the parts.
In addition, I report on patches to prior postings. These patches also include the volume numbers, issue(s) numbers, archive name, the contributor's name, and the contributor's electronic mail address. Patches are stored differently by different archive sites. Some sites store patches along with the original volume/archive name of the master posting. Some sites store them by the volume/archive name of the patch itself. The archive name listed is the same for both the patch and the original posting.
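The group/volume/archive-name layout described above translates directly into a path, and because a patch shares its archive name with the original posting, there are two candidate places to look for it. The helper names below are my own, and the patch's volume number in the example is hypothetical.

```python
def archive_path(group, volume, archive_name):
    """group/volumeNN/archive-name, the layout most archive sites use."""
    return f"{group}/volume{volume}/{archive_name}"


def patch_locations(group, original_volume, patch_volume, archive_name):
    """Both places a patch may be stored, depending on the archive site's habits."""
    candidates = [
        archive_path(group, original_volume, archive_name),  # with the original posting
        archive_path(group, patch_volume, archive_name),     # under the patch's own volume
    ]
    return list(dict.fromkeys(candidates))  # drop the duplicate if the volumes match


def missing_parts(expected_issues, retrieved_issues):
    """Issue numbers listed for the package that you have not yet retrieved."""
    return sorted(set(expected_issues) - set(retrieved_issues))


print(archive_path("comp.sources.unix", 22, "elm2.3"))
# comp.sources.unix/volume22/elm2.3
print(missing_parts(range(1, 6), [1, 2, 4]))
# [3, 5]
```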
Alt.sources, being unmoderated, does not have volume and issue numbers, so I report the date in the "Date:" header of the posting and the number of parts in which it appeared. If the contributor assigned the posting an archive name, I also report that archive name. Archive sites for alt.sources are harder to find, but they usually store postings by archive name.
Where to Retrieve Listings
The problem, then, is finding out which sites archive which groups, and how to access these archives. I again refer you to the articles by Jonathan I. Kamens of the Massachusetts Institute of Technology, posted to comp.sources.wanted and news.answers. These articles appear weekly and explain how to find sources.
As a quick review, here are the steps:
I. Figure out in what group, Volume, and Issue(s) the posting appeared. Also try to determine its archive name. If you know these items, it's usually easy to find an archive site that archives that group. Most archive sites keep their information in a hierarchy, ordered first on the group, then on the volume number, and last on the archive name. These specifications together usually make up a directory path, as in
comp.sources.unix/volume22/elm2.3
In that directory you will find all of the articles that made up the 2.3 release of the Elm Mail User Agent, which was posted in Volume 22 of the comp.sources.unix newsgroup. If you do not know the archive name but do know the volume, each volume also has an Index file that you can retrieve and read to determine the archive name. UUNET is one common publicly accessible archive site for each of the moderated groups mentioned in this article.
II. If you do not know which sites archive the groups, or even whether any site is archiving a particular item (because sites are not archiving the entire group), consult Archie. (See "On the Networks," CUJ August 1991, Vol. 9, No. 8.) Archie is a mail response program that tries to keep track of sites reachable via FTP that have sources available for distribution. Even if you cannot access the archive site directly via FTP, it is worth knowing that the archive site exists, because there are other ways of retrieving sources available only via FTP. Archie can help you find out whether the archive site exists, and where.
III. If you know the name of the program, but do not know what group it was posted in, try using Archie to search for the program by name. Since most sites store the archives by group and volume, the information returned will tell you what newsgroup and volume it was posted in. Then you can retrieve the item from any archive site for that newsgroup.
IV. If you do not even know the name, but know you are looking for source code that performs some function, retrieve the indexes for each of the newsgroups and see if any of the entries (usually listed as the archive name and a short description of the function) look reasonable. If so, try those. Or, make a query to Archie based on some keywords from the function of the software, and perhaps it can find items that match.
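Steps III and IV amount to two small text-processing jobs: reading the group and volume out of a path returned by a search, and scanning an index for keywords. The sketch below assumes paths that contain the usual group/volumeNN/archive-name components (the leading /usenet directory in the example is hypothetical), and index lines that start with the archive name followed by a description, which is a simplification since index formats vary by group and site.

```python
import re


def locate(path):
    """Return (group, volume, archive_name) from an archive path, or None.

    Looks for a 'volumeNN' component with the group before it and the
    archive name after it; any leading directories are ignored.
    """
    parts = [p for p in path.split("/") if p]
    for i, part in enumerate(parts):
        match = re.fullmatch(r"volume(\d+)", part)
        if match and 0 < i < len(parts) - 1:
            return parts[i - 1], int(match.group(1)), parts[i + 1]
    return None


def search_index(index_text, keywords):
    """Archive names of index lines containing every keyword (case-insensitive).

    Assumes each non-blank line begins with the archive name.
    """
    hits = []
    for line in index_text.splitlines():
        if not line.strip():
            continue
        name = line.split(maxsplit=1)[0]
        if all(k.lower() in line.lower() for k in keywords):
            hits.append(name)
    return hits


print(locate("/usenet/comp.sources.unix/volume22/elm2.3/part01.Z"))
# ('comp.sources.unix', 22, 'elm2.3')
```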
CD-ROM Archives
CD-ROMs containing USENET-posted sources, as well as other sources, are also available. Two of the larger publishers are Walnut Creek CD-ROM and Prime Time Freeware.
Walnut Creek CD-ROM, 1547 Palos Verdes Mall, Suite 260, Walnut Creek, CA, (800) 786-9907 or (510) 947-5996, publishes several CD-ROMs each year. Published software includes the Simtel20 MS-DOS Archive, the X and GNU archives, MS-Windows sources, and other collections of sources and binaries. Disks run from $25 to $60 each (varying by title) plus shipping. In addition, Walnut Creek offers those hard-to-find CD caddies at reasonable prices.
Prime Time Freeware, 370 Altair Way, Suite 150, Sunnyvale, CA 94086, (408) 738-4832, <ptf@cfcl.com>, publishes a collection of Freely Distributable source code twice a year, including the complete USENET archives. Prime Time's disks run about $60 per set plus shipping. The latest issue, 1993, has over 3Gb of source code spread over two disks. Prime Time also offers a standing subscription plan at a discount.
Conclusion
I hope this special edition of my column has given you a hint as to how to read my column and track down the sources. Note: I have been asked many times if I can make floppies or tapes containing the software mentioned in my column. I cannot spare the time to do this. I also have to work (and teach) for a living, and if I started doing this, I could easily spend all my time trying to fulfill the requests and never get any of my work done.
However, what I have offered to do in the past, and am still willing to do, is provide a list of USENET sites in your area code. Send me a self-addressed, stamped envelope (my address is in the bio squib attached to this column). Those living in major metropolitan areas, please include two stamps on your letter. Note: I can only offer this service for US area codes. If you have net access, but need a news neighbor, I will also reply to electronic mail asking for nearby news sites.