Columns


On The Networks

Where To Get The Sources

Sydney S. Weinstein


Sydney S. Weinstein, CDP, CCP is a consultant, columnist, lecturer, author, professor, and president of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites, and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail on the Internet/Usenet mailbox syd@DSI.COM (dsinc!syd for those who cannot do Internet addressing).

Another year has gone by, and once again it's time to update my general information column. In January 1990 (CUJ Vol. 8, No. 1), I wrote about the internet, the Internet, USENET, and Network News. In January 1991 (CUJ Vol. 9, No. 2), I wrote about obtaining USENET Network News. Both of those columns are still valid, and I refer you to your libraries of back issues. (Maybe next year it will be time to update them once again.)

However, as a quick review of what was covered: this column normally reports on freely distributable software that has recently been released via USENET Network News. Freely distributable means that you can freely (and for free) make copies of the software, use it as you see fit, and give it away as you desire. It does not mean the software is in the public domain. Most of the software is copyrighted. This means you cannot pretend you wrote it, or include code from it in a product you are selling. However, the authors have allowed you to use and distribute it for free. If you make changes, most authors do not let you call the changed version by the name of the original. This avoids confusion as to what is and is not part of the product and reduces the authors' support headaches.

The sources mentioned in this column are released via several groups distributed as part of USENET Network News. USENET Network News performs two roles. It is a method of distributing information among a very large group of computers, and it is also a somewhat organized collection of that same information into news groups that are distributed via the news software.

First, the software. The current version of the software is named C News, because it follows A News and B News as the third rewrite of the transport software (not because it was written in C). It supports transfer of the news articles (the individual messages) between every member of the USENET network. At present there are about 40,000 computers exchanging network news, and over two million readers. News works by transmitting all articles that a site wishes to read (its subscription list) to that site. Unlike CompuServe or a BBS system, users read the articles directly on their own computer. Thus you do not need to dial into anywhere to read News; if your computer is a member of the network, you read news directly on your own computer. (For large sites this is a simplification; they generally designate a single computer in their local network as the news host and then use news reader software that transparently accesses that news host as if it were local.)

To achieve this, News uses a flood algorithm. Each member of the network only communicates with a few neighbors (not all 40,000 members). When the member site receives an article from one of its neighbors, it then sends it to all of its other neighbors that have not already seen that article. This method allows the article to be posted (created) at any individual site, and appear at all sites in a very short time. Major backbone sites exchange news via NNTP, the Network News Transfer Protocol, running on top of TCP/IP over the Internet. This allows for very fast exchange of the messages and most backbone sites have received a new posting within minutes of its entry onto the network. Smaller sites usually use the UNIX UUCP protocol to exchange the information, although there is no requirement that the UUCP protocol be used. At present over 25MB of News are exchanged daily.
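The flood algorithm is simple enough to sketch in a few lines. The following Python fragment (the site names and topology are invented for illustration) simulates one article propagating from its posting site through a small network, with each site forwarding only to neighbors that have not already seen the article:

```python
# Flood-fill propagation of one news article through a small,
# invented network topology. Each site knows only its neighbors,
# and forwards the article to any neighbor that lacks it.
neighbors = {
    "alpha":   ["beta", "gamma"],
    "beta":    ["alpha", "delta"],
    "gamma":   ["alpha", "delta"],
    "delta":   ["beta", "gamma", "epsilon"],
    "epsilon": ["delta"],
}

def flood(origin):
    seen = {origin}       # sites that already have the article
    queue = [origin]      # sites that still need to forward it
    while queue:
        site = queue.pop(0)
        for peer in neighbors[site]:
            if peer not in seen:   # don't re-send to a site that has it
                seen.add(peer)
                queue.append(peer)
    return seen

# An article posted at any single site reaches all five sites.
print(sorted(flood("epsilon")))
# ['alpha', 'beta', 'delta', 'epsilon', 'gamma']
```

Note that no site ever needed a connection to more than three neighbors, yet the article reached everyone; that is the whole trick behind scaling to 40,000 machines.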

Twenty-five megabytes per day is too much for all but the major backbone sites to handle, so most sites no longer get a "full feed." Instead, they only receive a limited selection of the newsgroups. This brings us to the second item, the organization of the information in News.

News articles are separated into divisions called newsgroups. Each division is supposed to limit itself to a single topic, and the name of the group is supposed to give you some idea as to its content. These groups are then organized into hierarchies of related topics. USENET Network News started out with just two hierarchies, mod and net. The mod hierarchy held those groups that had a person as the moderator to edit and control the information; the net hierarchy handled all other groups. With the release of B News and its ability to have any single group be moderated or open, the great renaming was undertaken. So, for the last several years, News has consisted of seven main or official hierarchies: comp (related to computers or computing), misc (anything that didn't belong elsewhere), news (related to network news itself: the software, distribution, groups, or its administration), rec (recreational topics), sci (scientific topics), soc (social groups), and talk (discussion of controversial topics). In addition, there are several "private" hierarchies, including: bionet (biological science, including the Human Genome project), bit (BITNET sites and mailing lists; BITNET is another academic network that is not the Internet), biz (business, a commercial hierarchy), clari (paid subscription news including UPI-style information), ddn (Defense Data Network reports), gnu (reports from the GNU's Not Unix project), ieee (Institute of Electrical and Electronics Engineers), u3b (the AT&T 3B computer line), and vmsnet (items of interest to DEC VAX/VMS users). There are also unofficial hierarchies, including the alt (alternate) hierarchy and a whole host of regional hierarchies that specialize in regional news. At present there are over 1800 different newsgroups to choose from.

This column normally reports on articles in the comp.sources and alt.sources sub-hierarchies. These include: comp.sources.games, comp.sources.misc, comp.sources.reviewed, comp.sources.unix, and alt.sources. Each of these is an individual news group. All but alt.sources are moderated. For the moderated groups, the authors of the software submit their packages to the moderator for posting. Each group has its own rules for acceptance. Alt.sources is unmoderated and a free-for-all. Sources in that group are posted directly by the author.

What Happens Next?

With that much information flowing into each site every day, most sites cannot keep the information on their local disks for very long, usually only a couple of days. So, by the time you read my articles, the sources have been deleted from the machines in the network to make room for newer articles. So, what good does it do to report on sources that are no longer available?

Most of the information posted on News is worth deleting in even less time than it remains on the local disk. Source code, however, is not one of those low-value items. Many sites want to keep it around so they and others can make use of it. What those sites do is archive the packages so others can access them. These sites are called archive sites (surprise!). Since the moderated source groups are purely source postings, with no other traffic, the archiving task can be pretty automatic, and many sites around the world have agreed to do it.

Alt.sources is a different problem. Since it's a free-for-all, items other than source code often end up in the group, and it takes more work to archive it. Most sites do not archive this group. However, since the better postings in alt.sources are previews of items to be posted later in one of the mainstream moderated groups, there is hope that the source will be available in those groups' archives.

The problem, then, is finding out which sites archive which groups, and how to access these archives. I would like to give credit to Jonathan I. Kamens of the Massachusetts Institute of Technology for gathering much of the data on how to find the sources and posting it as a "Frequently Asked Questions" article to the comp.sources.wanted newsgroup and the new moderated newsgroup news.answers.

How To Find Sources

First you have to decide what it is you are trying to find. If it's one of the source postings mentioned in my column, I always list the Volume and Issue numbers of the articles in the moderated groups, or just the posting date for alt.sources. I also list a name in italics; this is the archive name of the posting. It's a big help in finding the files once you find a site that archives the information.

OK, Here Are The Steps:

1. Figure out in what group, Volume, and Issue the posting appeared. Also try to determine its archive name. If you know these items, it's usually easy to find an archive site that keeps that group. Most archive sites keep their information in a hierarchy ordered first on the group, then on the volume, and last on the archive name. Together these usually make up a directory path, as in comp.sources.unix/volume22/elm2.3. In that directory you will find all of the articles that made up the 2.3 release of the ELM Mail User Agent, which was posted in Volume 22 of the comp.sources.unix newsgroup. If you do not know the archive name, but do know the volume, each volume also has an Index file that can be retrieved and read to determine the archive name. One common publicly accessible archive site for each of the moderated groups in this article is UUNET.

2. If you do not know which sites archive the groups, or whether any site is archiving that item even though it is not archiving the entire group, consult Archie (CUJ August 1991, Vol. 9, No. 8). Archie is a mail response program that tries to keep track of sites reachable via FTP (File Transfer Protocol, a TCP/IP protocol used by internet sites) that have sources available for distribution. Even if you cannot access the archive site directly via FTP, it is worth knowing that the archive site exists, because there are other ways of retrieving sources only available via FTP.

3. If you know the name of the program, but do not know what group it was posted in, try using Archie and search based on the name. Since most sites store the archives by group and volume, the information returned will tell you what newsgroup and volume it was posted in. Then you can retrieve the item from any archive site for that newsgroup.

4. If you do not know the name, but know you are looking for source code that performs a particular task, retrieve the indexes for each of the newsgroups and see if any of the entries (usually listed as the archive name and a short description of the function) look reasonable. If so, try those. Or, make a query to Archie based on some keywords from the function of the software, and perhaps it can find items that match. If the task is a mathematical or algorithmic problem commonly solved by computer, check out the netlib archive at AT&T (mentioned later in this column), available via both FTP and electronic mail.
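The directory convention from step 1 is regular enough to express as a one-line helper. The function below is purely illustrative (no archive site is obliged to use exactly this layout), but it matches the comp.sources.unix/volume22/elm2.3 example:

```python
# Build the directory path an archive site would typically use for a
# posting, given its group, volume number, and archive name. The
# group/volumeNN/name layout follows the convention described in
# step 1; the helper itself is only an illustration.
def archive_path(group, volume, name):
    return "%s/volume%d/%s" % (group, volume, name)

# The ELM 2.3 release from comp.sources.unix Volume 22:
print(archive_path("comp.sources.unix", 22, "elm2.3"))
# comp.sources.unix/volume22/elm2.3
```

Once you know a site that archives the group, gluing these three pieces together is usually all the navigation you need.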
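If you can only reach Archie by electronic mail, a query is just an ordinary message with commands in the body. The sketch below merely composes such a message as text; the server address and the "prog" search command reflect the Archie mail interface as documented at the time, but treat both as assumptions and send a "help" request first if in doubt:

```python
# Compose a mail-based Archie query as plain message text. The server
# address and the "prog" command are assumptions based on published
# Archie documentation; send "help" in the body first to confirm the
# current command set before relying on this.
def archie_query(program):
    return "\n".join([
        "To: archie@archie.mcgill.ca",
        "Subject: archie request",
        "",
        "prog %s" % program,   # search archive listings by program name
        "quit",                # end the command list
    ])

print(archie_query("elm"))
```

The reply lists the sites, directories, and file names that match, which is exactly the group-and-volume information steps 2 and 3 ask for.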

How To Transfer The File

OK, you have now found the machine that has the software you need. How do you actually get it back to your machine?

First you have to determine what access methods the archive machine allows for retrieving software. Most archive sites are internet-based and support the FTP service. If you have access to FTP on the internet, this is the easiest and fastest way of retrieving the sources. If you don't, perhaps a local college is a member of the internet and can assist you in using FTP to retrieve the sources you need.
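For the curious, an anonymous FTP retrieval boils down to a short, fixed dialogue with the server. The fragment below only composes the client's protocol commands as strings, so you can see the shape of the session; it does not open a network connection, and the directory and file name are borrowed and invented for illustration:

```python
# The sequence of FTP protocol commands a client sends for an
# anonymous retrieval. This only builds the command strings; it does
# not talk to any server. The directory and file name are
# illustrative, not a real listing.
def anonymous_ftp_commands(mailbox, directory, filename):
    return [
        "USER anonymous",          # anonymous login
        "PASS %s" % mailbox,       # courtesy: give your mailbox as the password
        "CWD %s" % directory,      # change to the archive directory
        "TYPE I",                  # binary (image) transfer mode
        "RETR %s" % filename,      # retrieve the file
        "QUIT",                    # close the session
    ]

for cmd in anonymous_ftp_commands(
        "syd@DSI.COM", "comp.sources.unix/volume22/elm2.3", "part01.Z"):
    print(cmd)
```

Any FTP client program issues essentially this sequence for you; knowing it helps when a transfer fails partway and the server's reply tells you which step went wrong.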

Other sites support anonymous UUCP access. This is access via the UUCP protocol where you don't need to register in advance to call in. UUNET Communications Services supplies this using the (900) GOT-SRCS number at $.40/minute for non-subscribers. Many other archive sites provide it for just the cost of your long distance telephone call. If you cannot use FTP, this is the next best method to use.

Many anonymous UUCP archive sites list what they carry in the Nixpub listing of public access Unix sites maintained by Phil Eschallier. The Nixpub listing is posted in comp.misc and alt.bbs periodically. If you don't get News and need a copy, it can be retrieved via electronic mail using the "periodic posting mail based archive server." This is run by MIT on the system pit-manager.mit.edu. To use the server, send an electronic mail message to the address mail-server@pit-manager.mit.edu with the subject "help" and it will reply with instructions on how to use the server.
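A help request to the MIT server is about as small as a mail message gets. This fragment composes one as text (your own mailbox goes in the From: line); note that for this particular server the word "help" belongs in the Subject line, per the instructions above:

```python
# Compose the help request for the MIT periodic-posting mail server.
# For this server the word "help" goes in the Subject line; the body
# can be empty. The sender address is whatever your own mailbox is.
def help_request(server, sender):
    return "\n".join([
        "From: %s" % sender,
        "To: %s" % server,
        "Subject: help",
        "",            # empty body
    ])

print(help_request("mail-server@pit-manager.mit.edu", "syd@DSI.COM"))
```

Other servers listed later in this column instead want "help" in the message body, so read each server's conventions before mailing.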

If you are not on USENET, sending electronic mail to sites on the internet is also possible via the commercial mail services. CompuServe, MCI-Mail, ATT-Mail and Sprint-Mail all support sending messages to Internet addresses. Contact the support personnel at the commercial mail service you use for details on how to send messages to Internet addresses.

The last way to access the sources is via electronic mail. Several sites also make their archives available via automated mail-response servers. Note: this can be a very expensive way of accessing the information, and due to the load it places on the networks, most archive servers heavily restrict the amount of information they will send each day. This can lead to long waits for the source you are trying to retrieve. The following are some of the sites that maintain mail-based archives:

hrc!archives: This site requires that you issue two commands to get the help message. Place these lines in the body of your electronic mail message:

send path <address>
send help
The <address> should be a UUCP path from a well known site to your mailbox.

netlib@uunet.uu.net: E-mail access is provided to most of the sources archived by UUNET. Place the word "help" in the body of your message to receive information on using the server.

ftpmail@decwrl.dec.com: Digital Equipment Corporation runs a mail based archive server that will retrieve sources via FTP and then mail them to you. To find out how to use this service, send it a message with the word "help" in the body.

netlib@research.att.com: All of the algorithmic and mathematical software available via ftp in the netlib archives at AT&T is also available via electronic mail. Again, send a message with the word "help" in the body for further instructions on using the service.

There are also other servers that specialize in particular sources. Send a message to these with the word "help" in the body for further information. Selected entries from Jonathan's posting in comp.sources.wanted are:

archive-server@ames.arc.nasa.gov: Space archives (also accessible via anonymous ftp to ames.arc.nasa.gov)

archive-server@athena-dist.mit.edu: MIT Project Athena papers and source code (also accessible via anonymous ftp to athena-dist.mit.edu)

archive-server@bcm.tmc.edu: UUCP maps, source-code for BCM WHOIS database, NFS and PC-NFS information and source-code, Unisys U-series information and source code, other stuff

archive-server@cc.purdue.edu: NeXT stuff (also accessible via anonymous ftp to sonata.cc.purdue.edu or nova.cc.purdue.edu)

archive-server@chsun1.uchicago.edu: Computer Underground Digest and references

archive-server@cs.leidenuniv.nl: IPX, patch for MS-DOS, sps diffs for SunOS 4.1

archive-server@dsi.com: elm, patch

archive-server@eclectic.com: Mac-security digest, information about Eclectic, other stuff

archive-server@ncsa.uiuc.edu: NCSA stuff, especially telnet and tcp for Mac and PC compatibles

archive-server@rice.edu: Sun-spots, sun-source and sunicons, plus other software written or influenced by people at Rice (also accessible via anonymous ftp to titan.rice.edu)

archive-server@sun.soe.clarkson.edu: IBM and other good stuff (also accessible via anonymous ftp to sun.soe.clarkson.edu)

info-server@cl.cam.ac.uk: Various random stuff, including bmx, btoa, c-nrs, gdb, soft-gen, spad, top, unix-niftp, ups (Unix PostScript interpreter)

info-server@doc.ic.ac.uk: USENET source newsgroups, GNU, X11, news software, other stuff

info-server@hp4nl.nluug.nl: Macintosh, Sun, IBM-PC, Unix sources, some documents, GNU, graphics, USENET archives (or lots of newsgroups), X window system, TeX, programming languages (LISP, ICON, ABC, others), news sources, network sources, other stuff

mail-server@cs.ruu.nl: GIFs, Atari ST software, random documentation, ELM sources, USENET FAQ postings, GNU software, HP-UX software, NN sources, SGI software, TeX software and TeXhax and TeXmag archives, random UNIX software, X11 software, other stuff (also accessible via anonymous ftp to praxis.cs.ruu.nl)

mail-server@rusmv1.rus.uni-stuttgart.de: German TeX archives; benchmarks, journal indices, RFCs, network info, UNIX info; X, mac, pc, sun, aix, vax, and other software (also accessible via anonymous ftp to rusmv1.rus.uni-stuttgart.de)

mailserv@garbo.uwasa.fi: Frequently asked questions in various areas, some USENET source archives, some PC software archives

netlib@draci.cs.uow.edu.au: Australian Netlib (also accessible via anonymous ftp to draci.cs.uow.edu.au)

netlib@ornl.gov: Similar to the AT&T netlib archive

netlib@ukc.ac.uk: UK netlib server (mostly same contents as AT&T's netlib) (some files also accessible via anonymous ftp to harrier.ukc.ac.uk {username guest})

Very Important

It is considered very poor form to use a mail-based archive server if any other method is available to you. In addition, accessing any service on a different continent from your own via electronic mail is also frowned upon. It is very expensive to transmit information across the oceans, and if you use electronic mail via one of these archive servers, you are forcing someone else to pay those charges. Abuse will cause the sites to remove their archive servers, which will hurt everyone. So please: those in the United States and Canada, restrict your accesses to servers in the US and Canada; those in Europe, to servers in Europe; and those in Australia, to servers in Oz.

But how do I tell where the site is located? Internet addresses do give you a clue. You can tell the country an Internet-named site (those with @ in their addresses) is located in by the abbreviation at the end of the address string. Thus .ca is Canada, .de is Germany, and .au is Australia. Please don't spoil access for everyone else (and your own future access) by tying up the transoceanic links with mail-based archive server traffic.
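The check itself is mechanical: peel the last dot-separated component off the host part of the address and look it up. The table below is deliberately tiny and only covers countries mentioned in this column; addresses ending in .edu, .com, and the like carry no country code and are usually in the US:

```python
# Pull the country code off the end of an Internet address and look
# it up in a small, deliberately incomplete table of country codes
# (only those relevant to this column).
COUNTRY = {
    "ca": "Canada",
    "de": "Germany",
    "au": "Australia",
    "fi": "Finland",
    "uk": "United Kingdom",
    "nl": "Netherlands",
}

def country_of(address):
    # The country code, if any, is the last dot-separated component
    # of the host part (everything after the @).
    host = address.split("@")[-1]
    return COUNTRY.get(host.split(".")[-1], "no country code (often US)")

print(country_of("netlib@draci.cs.uow.edu.au"))   # Australia
print(country_of("mailserv@garbo.uwasa.fi"))      # Finland
print(country_of("netlib@uunet.uu.net"))          # no country code (often US)
```

A glance at the suffix before you mail a request is all it takes to keep your traffic off the transoceanic links.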

Hopefully, this special edition of my column has given you a hint as to how to track down the sources. Note: I have been asked many times if I can make floppies or tapes containing the software mentioned in my column. I cannot spare the time to do this. I have to work for a living, and if I started doing so, I could easily spend all my time trying to fulfill the requests and never get any of my work done.