Sydney S. Weinstein, CDP, CCP, is a consultant, columnist, author, and president of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites, and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail on the Internet/Usenet mailbox syd@DSI.COM (dsinc!syd for those who cannot do Internet addressing).
It now has been a year that I have been writing this column for The C Users Journal. I think it's time for an update on what the networks are and how you can get on the ones that this column references. In last January's issue, I discussed the internet vs. the Internet, USENET, Network News, and freely distributed software. I will review and update this information before crossing into some new ground. Again, this column is my personal view of both the history and the current state of the networks. It is not intended to be a proper chronological history of USENET or Network News.
Network News: A Review
In this column, I refer to several USENET Network News groups, including comp.sources.unix, comp.sources.misc, comp.sources.games, and alt.sources. I mention software, usually in C, that has been posted to these groups that might interest CUJ readers. This leads to several questions: What is USENET Network News? How do I get to the software? What is the phone number to download the software?
USENET is a moniker given to the UNIX Users Network. It's an outdated name, but it still survives. USENET started as a small collection of UNIX computers connected with UUCP. Today, USENET is an amorphous mass of computers connected by a variety of protocols. The computers run different operating systems, including many of the popular personal computer operating systems. What makes a computer a member of USENET (or at least connected to USENET)? Generally a computer is considered connected to USENET if it can exchange electronic mail with other computers that are connected to USENET. This is a rather broad definition, and one that has blurred lately as other networks exchange mail with USENET computers via gateways. Perhaps it should be restricted to computers where you can exchange electronic mail with other computers on USENET without going through a gateway. This is a revised version of the older definition of a collection of computers that communicate using UUCP.
Joining USENET is easy. All you do is find a member who is willing to connect you via his or her computer. It is sort of a Catch-22: you need to know who is already a member to become one. A partial list of connected sites is published only via USENET Network News, which you cannot receive until you are connected. Read on, anyway, and I'll present the solution to this Catch-22.
If USENET is an electronic mail network, what is Network News? Most readers might be familiar with the concept of a Bulletin Board System. A BBS is a computer that holds a database of messages posted by other members of the system. Your computer dials the BBS computer, and you read and post messages. Generally these are local systems, and usually small. Message bases range in the megabytes of information. One collection of BBSs has formed a worldwide organization called Fidonet that exchanges messages between systems, but most are small computers run by individuals. Network News is similar and yet different from a BBS. With Network News, users post messages, but that is where the similarity ends. On a BBS when you post a message, that message sits on the local computer to be read by others until it expires and is deleted. With USENET Network News the entire message base exists on every computer in the network. Thus the message you post is sent to every other computer. You read messages on your own computer, even though they were posted by others elsewhere.
How do USENET and USENET Network News differ? USENET is any computer connected for electronic mail to other USENET computers. A subset of these computers also agree to exchange one or more of the categories of Network News. Currently there are about 1,000 different news categories, called newsgroups. The newsgroups are broken down into several hierarchies. These include the traditional major hierarchies of news, comp, rec, sci, soc and talk; regional hierarchies such as na, usa, ba, pa, nj, and specialized hierarchies such as bionet, biz, bit, unix-pc, and u3b. The major hierarchies are the widest distributed, accessing over several hundred thousand computers worldwide. The regional hierarchies are used for messages only of regional interest, such as North America (na), the United States (usa), the San Francisco Bay Area (ba), the state of Pennsylvania (pa), or New Jersey (nj), just to name a few. The specialized hierarchies serve communities with special interests.
Each hierarchy has its own set of rules, which are enforced by consensus. USENET has no governing body, just a set of guidelines that individual computer owners or administrators follow as they see fit. It seems to work most of the time, as the net runs without too much chaos. There is even a hierarchy that runs without rules, called alt.
You are a recipient of network news if you subscribe to one or more of the newsgroups in any of the hierarchies. Some sites only receive a handful of the groups, and some receive most or all of the groups. A full feed involves a large amount of information: more than 20 megabytes of new postings every day. This is up from about two megabytes of postings per day only two years ago. At 20 megabytes per day, each site can only keep a small portion of the feed on line at a time, and at that, only a few days' worth.
With so much information coming in each day, it would seem like a lot of work just maintaining it, or finding something worthwhile to read. It's not that bad. The software to run the news system controls itself almost automatically. It includes facilities to send only those groups to which you subscribe, as well as expiring old articles to recover space. However, being a major site in the USENET Network News distribution system requires a large amount of disk space and modem time.
The source distribution groups may interest readers of this column. These groups are generally grouped under the comp hierarchy in a collection called sources, thus the names comp.sources.unix and comp.sources.misc. Originally comp.sources.unix was for freely distributable sources designed to run on UNIX systems. Many of the sources posted there now are also able to run on personal computers and other operating systems, but all can run on UNIX. The other sources groups currently in the comp hierarchy are: amiga, for software specific to Amiga systems; atari.st, for software specific to those computers; games, for game software, which is in turn restricted to this group; mac, for software specifically for Apple Macs; misc, for general software, not necessarily for UNIX systems; sun, for Sun Microsystems workstations; and x, for software for the X windowing system. This column generally restricts itself to the unix, misc, and games groups.
Authors submit their sources to a moderator, who is the only person allowed to post to the group. The moderator bundles the sources for distribution, checks that they are complete, and posts them. The moderator also assigns volume and issue numbers to each of the postings. A submission might take several issues, because an issue is limited to about 60,000 bytes. The moderator also posts periodic indices of the sources posted to his group. No discussion is allowed in these groups, and only postings approved by the moderator are accepted. Because of the high "signal to noise ratio" in these groups, some computer sites around the world save the sources for future access. These sites are called archive sites. Each archive site decides what groups to archive and how long to keep them. Contact these archive sites to obtain the sources mentioned in the column. You must contact the archive sites because the individual members of USENET, unless they archive these groups, will have deleted the sources to make room for the newer postings, usually within a week of the original posting.
20Mb A Day
A bit of simple math yields some impressive numbers. If your computer exchanges Network News with two neighbors (a small site), you are receiving 20Mb a day for a full feed, and sending that 20Mb each day to the second site. Network News broadcasts all articles to all sites that have not yet received them. Any article you get from one site, you send to all others that you are connected to and that have not received the article. Now, 20Mb per day on a phone line at 2,400 bps (240 characters per second maximum speed) yields approximately 24.25 hours on the phone per day. This isn't possible as there are only 24 hours in a day. In addition, UUCP doesn't achieve the 240 cps maximum for 2,400 bps due to delays and overhead. This problem is solved by sending articles in compressed batches. The compression is usually about 50-70 percent effective, cutting that 24.25 hours to eight to twelve hours. Still a big phone bill. How do sites cut that down even further? Most big sites run special modems, such as V.32 (at 960 cps) or Telebit Trailblazers (at 1,200-1,400 cps), which cuts it down to about two to three hours per day. If you talk to two sites, expect that to be four to six hours total.
A major site might exchange news with 20 or more neighbors. They can accomplish this in several ways: using a bank of modems, exchanging partial feeds of selections from the list of 1,000 newsgroups, or using the Internet.
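The arithmetic above is easy to check with a few lines of C. This is only a sketch: the function name feed_hours is invented for illustration, a megabyte is assumed to be 1,024 x 1,024 bytes, and the link is assumed to sustain the stated characters-per-second rate with no protocol overhead.

```c
#include <assert.h>

/* Hours needed to move `megabytes` of news over a link that sustains
   `cps` characters per second, after compression removes the fraction
   `saved` of the data (0.0 = no compression, 0.5 = batches shrink by
   half). Assumes 1 megabyte = 1024 * 1024 bytes and ignores UUCP
   delays and overhead. */
double feed_hours(double megabytes, double cps, double saved)
{
    double bytes;

    assert(cps > 0.0 && saved >= 0.0 && saved < 1.0);
    bytes = megabytes * 1024.0 * 1024.0 * (1.0 - saved);
    return bytes / cps / 3600.0;
}
```

Under those assumptions, feed_hours(20.0, 240.0, 0.0) comes to a little over 24 hours, and feed_hours(20.0, 960.0, 0.5) to about three hours.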
The Internet
Modems and serial protocols are not the only way to connect computers. Many larger computer sites, including corporations, universities, and government installations, have banded together to form a large network called the Internet. This network uses the TCP/IP protocol, designed by the Department of Defense and its contractors, to communicate over very high-speed links. Usual links for Internet sites range from 9,600 bps (about 1,000 cps), to 56 kbps (about 5,400 cps), to 1.544 Mbps. A full news feed at 1.544 Mbps doesn't take very long at all: just a couple of minutes if it were sent as a big batched file transfer. The newest network connections that are the backbone of the Internet are switching from 1.544 Mbps to 45 Mbps, allowing for even greater capacity.
Network News on the Internet uses a special protocol on top of TCP/IP for more efficient distribution. This is NNTP, or the Network News Transfer Protocol. Switching from a backbone network of mostly UUCP connections to using NNTP over the Internet has improved the flow of network news and allowed for its dramatic growth over the past few years. Just two years ago, an article took several days to propagate to all the USENET computers (at that time tens of thousands of computers). Now it takes less than a day and usually only several hours to reach most of the hundreds of thousands of computers.
The Catch-22 Revisited
I promised to explain how you too can join the systems on USENET. It isn't that hard, and you can use several approaches.
First, you can try to find some site near you that is already connected, and then connect to them. You might begin with local user groups. One of the newsgroups is comp.mail.maps, a collection of lists of sites on USENET. As I have offered in the past, if you send me a self-addressed envelope with sufficient postage, I will run a program I have that will search the maps for sites in your area code. Those who live in major cities should provide two stamps; other areas need only one stamp. This list will include the site contact person for each of the sites in your area code. (Please provide your area code for me.) Many of them may not be willing to allow you to connect, for various reasons, but usually you can find one who will cooperate. These maps only list sites that use the UUCP protocol, so your computer must be able to communicate using UUCP. If you have access to electronic mail, it is easier for me to respond to a request sent by electronic mail. Note that you can reach me via AT&T Mail, Compuserve, Easynet, MCI Mail, or Sprintmail. Ask your network administrator how to send mail to an Internet address.
A second approach is to connect via a major site that sells reconnect services. This would include sites like mine (Datacomp Systems, Inc, 215-947-9900), UUNET Communications Services (703-876-5050), or Portal Communications Company (408-973-9111). Each offers UUCP connections for a fee. These fees vary by the amount of information (time connected) transferred and can range from about $20 a month to several hundred dollars per month.
A third method is to read network news on another computer that is already connected. Several "public access" sites exist. They vary from free, to just a couple of dollars a month, to several dollars per hour. Portal is an example of one of these public access sites. Another is the Whole Earth 'Lectronic Link (415-332-4335). Both charge for their connect time. Public access sites also have their own hierarchy of network news and list many almost free connections. Generally the number of modems available for customers goes up with the price of the connection.
Back to USENET
Okay, enough on News. After all, USENET started with electronic mail. But how do you send electronic mail via USENET? If you've read enough of my columns, you see in the author's reference section an address called an electronic mail address. This is the key to sending mail via USENET. My address is syd@DSI.COM as listed in that section. How does that relate to USENET?
UNIX computers have included mail facilities from almost the beginning. This mail allowed for sending messages from one user to another using their login names as the address. When UUCP came along, the address was extended to include the site name and the login name separated by an '!', as in my UUCP address dsinc!syd. However, to send to dsinc!syd, you had to be directly connected to dsinc. So dsinc had to be an entry in your UUCP node table in order for your computer to call my computer and exchange the mail.
As the number of computers grew, this method became impractical. So the concept of routing was introduced. If you talked (had the site in your UUCP site file) to abc, and I talked to abc, then you could send me mail as abc!dsinc!syd. Extend this and you end up with the long routes associated with UUCP such as abc!def!ghi!jkl!mno!pqr!dsinc!syd. Each is a hop on the message's journey from your machine to mine. But how do you determine, in advance, the routing that the message needs to take to efficiently travel from your computer to mine? Some sites might only exchange mail once a week. If you choose one of those as an intermediate hop, the message could be delayed for a long time.
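To make the idea of hops concrete, here is a minimal C sketch that takes a bang path apart. The function name split_bang_path and the limit MAX_HOPS are invented for illustration; real mail software does considerably more checking than this.

```c
#include <assert.h>
#include <string.h>

#define MAX_HOPS 16   /* arbitrary limit chosen for this sketch */

/* Split a UUCP bang path such as "abc!dsinc!syd" in place.
   Each '!' is overwritten with '\0', hops[] receives a pointer to
   each site name in order, and *user receives the final component
   (the login name). Returns the number of hops, or -1 if the path
   names more than MAX_HOPS sites. */
int split_bang_path(char *path, char *hops[MAX_HOPS], char **user)
{
    int n = 0;
    char *bang;

    assert(path != NULL && hops != NULL && user != NULL);
    while ((bang = strchr(path, '!')) != NULL) {
        if (n == MAX_HOPS)
            return -1;
        *bang = '\0';       /* terminate this site name */
        hops[n++] = path;
        path = bang + 1;    /* continue just past the '!' */
    }
    *user = path;           /* whatever remains is the login name */
    return n;
}
```

Given a writable copy of abc!def!ghi!jkl!mno!pqr!dsinc!syd, the function reports seven hops, the first being abc and the last dsinc, with syd as the login name.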
USENET Network News comes to the rescue. The newsgroup comp.mail.maps contains periodic postings of information on the connections of each site registered on USENET. (Unfortunately, not all sites bother to register, although the process is painless and free.) A freely distributable computer program called pathalias takes this data and makes a file listing how to get from your site to each of the other sites in the maps. It produces a rather large file. With this paths file, the mail transport agent (the program that delivers the mail) on your computer can route the message for you, speeding it on its way.
If you only have a couple of connections, there is an easier way. You can always send the message to one of your neighbors, who can route it for you. This is known as hop routing. If your neighbor cooperates, hop routing can be a very effective method. Each neighbor then sends the message to a more intelligent neighbor until someone knows how to route it. This method is recommended for small sites, as it gives the larger sites a chance to optimize the delivery of the messages. The mail software supports hop routing via the concept of a smart-host. Your smart-host is the site to which you send any mail that you don't know how to route.
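The choice between a known route and the smart-host can be sketched in a few lines of C. Everything here is an assumption made for illustration: the struct route table stands in for whatever a paths file would provide (it does not reproduce the actual pathalias file format), and route_address is not a function from any real mail transport agent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory routing entry: a site name and the bang
   path that reaches it, for example { "dsinc", "abc!dsinc" }. */
struct route {
    const char *site;
    const char *path;
};

/* Build a bang-path address for `user` at `site` into `out`.
   If `site` appears in the table, its known route is used.
   Otherwise the message is addressed through `smart_host`, which
   is left to work out the rest of the route.
   Returns 0 on success, or -1 if `out` is too small. */
int route_address(const struct route *table, int ntable,
                  const char *smart_host,
                  const char *site, const char *user,
                  char *out, size_t outlen)
{
    const char *known = NULL;
    int i, len;

    assert(smart_host != NULL && site != NULL && user != NULL);
    for (i = 0; i < ntable; i++) {
        if (strcmp(table[i].site, site) == 0) {
            known = table[i].path;
            break;
        }
    }
    if (known != NULL)
        len = snprintf(out, outlen, "%s!%s", known, user);
    else
        len = snprintf(out, outlen, "%s!%s!%s", smart_host, site, user);
    return (len < 0 || (size_t)len >= outlen) ? -1 : 0;
}
```

A small site might keep only a handful of entries in the table, or none at all, and let everything else fall through to the smart-host.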
So far so good, this handles dsinc!syd. What do I do with syd@DSI.COM, which is not a UUCP address? UUCP addresses are not guaranteed to be unique. Two sites could easily choose the same name, which can and does lead to problems. Secondly, not all sites use UUCP for delivery.
Over the years, the Internet has developed a mail addressing scheme that uses the notation user@site, where user is either their login name, or some nickname that the site will translate into their login name. Site is a dot-separated list of items that make up a fully qualified domain name (FQDN).
FQDNs are advantageous because they are guaranteed to be unique. Also, they don't need to be routed since each site can deliver them using hop routing. Some UUCP sites can't handle FQDNs. The maps give a list of sites that can. You just forward them to your smart-host or to one of those sites.
Thanks to the interconnection and speed of the Internet backbone, using FQDNs has several advantages. Mail is now delivered with very few hops, making for much quicker delivery. The Internet doesn't need maps since it can now distribute the lookup over the entire network. A small explanation will show why the UUCP domain doesn't always work. My base computer is dsinc.dsi.com. If you are somewhere on the Internet, (anywhere in the world) and want to contact me at that machine, your computer will do a Domain Name Service query, asking who is dsinc.dsi.com. It does this by sending a request to its local (on site) program for nameservice queries asking that program who is dsinc.dsi.com. That program looks in its in-memory cached information for that name. If it finds a match, it returns my address. If it doesn't find a match, it looks for dsi.com to see if it finds a match. Again, if no match, it looks for com (one of the root domains). There is a fixed list of servers for the com domain, each of which is given the registration information for every "second-level domain" such as dsi. The nameservice query program asks that server for information on dsi.com and gets told which machine to ask to find out about machines in the dsi.com domain. It then makes one more request to that machine for dsinc.dsi.com's address. In two queries, it has my address. It then connects directly to my address, very much the way you dial a phone to connect directly to someone else's phone. Thus, every machine on the Internet is a single hop from every other.
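The cache search described above, trying the full name first and then each shorter suffix, can be sketched in C as follows. This shows only the lookup order and is not a resolver: struct cache_entry, lookup_suffix, and the strings stored in the cache are all invented for illustration, and no network queries are made.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical cache entry: a domain name and whatever the
   nameservice program has remembered about it. */
struct cache_entry {
    const char *name;
    const char *info;
};

/* Compare two domain names without regard to letter case. */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both ended together */
}

/* Look for fqdn in the cache, then for each shorter suffix in turn:
   dsinc.dsi.com, then dsi.com, then com. Returns the first entry
   that matches, or NULL if not even the last component is cached. */
const struct cache_entry *lookup_suffix(const struct cache_entry *cache,
                                        int n, const char *fqdn)
{
    const char *name = fqdn;
    const char *dot;
    int i;

    assert(fqdn != NULL);
    while (*name != '\0') {
        for (i = 0; i < n; i++)
            if (same_name(cache[i].name, name))
                return &cache[i];
        dot = strchr(name, '.');
        if (dot == NULL)
            break;          /* no shorter suffix left to try */
        name = dot + 1;     /* drop the leftmost component */
    }
    return NULL;
}
```

In a real resolver, the entry that matches determines which server is asked next; here it is simply returned to the caller.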
So far, so good, but some FQDNs are not on the Internet, including the popular compuserve.com. Domains not directly on the Internet list a mail forwarder for themselves, which is on the Internet. Their mail is delivered to the mail forwarder who handles the final delivery.
If It Were That Easy
Looks easy. All you need to do is send the mail up the chain to a smart site and bingo, it's delivered, right? Wrong, unfortunately. The UUCP world and the Internet world have a small problem talking to each other. This problem is routing. In the UUCP world, each message is source routed, as in abc!dsinc!syd. In the Internet world, each message is directly routed, as in user@site. What if you want to send a message to a site off of mine, called abc? Would that be abc!user@dsi.com, or abc@dsi.com!user, or dsi.com!abc!user, or what? There is no standard because there is no consensus of software. Some software evolved doing it one way, some another. There is a recommendation but no way to enforce that recommendation. On the Internet, if you don't follow the rules, you can be disconnected. Sort of like having your phone removed for misuse. UUCP hops are cooperative, not by edict of a ruling body. There is no standard answer. The recommendation is to use abc!user@dsi.com, but many sites will try to send that to the site abc and then from there to user@dsi.com. Some sites support the notion of dsi.com!abc!user, and some don't. Others pretend to be user@abc.UUCP and let other sites figure it out. Sometimes that works, sometimes it doesn't. If the site doing the routing understands that .UUCP is a fake domain, governed by the USENET maps and not by the normal Domain Name Service lookup servers, it works. Others just give up and throw the message away.
If you ask your neighbors what they can handle, and try to coordinate things, you can usually work around these problems. Just don't expect the solution that works for you and your neighbors to work everywhere.
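The disagreement over a mixed address comes down to which separator a mailer honors first. The C sketch below makes that choice an explicit parameter; the names next_hop, AT_FIRST, and BANG_FIRST are invented for illustration, and the sketch does not try to make sense of forms such as abc@dsi.com!user.

```c
#include <assert.h>
#include <string.h>

enum precedence { AT_FIRST, BANG_FIRST };   /* hypothetical policy flag */

/* Copy into `hop` the site a mailer would forward `addr` to next.
   AT_FIRST honors the '@' first, so abc!user@dsi.com goes to dsi.com.
   BANG_FIRST honors the leading '!' first, so the same address goes
   to abc. If the preferred separator is absent, the other is used.
   Returns 0 on success, or -1 if `hop` is too small or `addr`
   contains neither separator. */
int next_hop(const char *addr, enum precedence rule,
             char *hop, size_t hoplen)
{
    const char *at, *bang, *start;
    size_t len;

    assert(addr != NULL && hop != NULL);
    at = strrchr(addr, '@');
    bang = strchr(addr, '!');

    if (at != NULL && (rule == AT_FIRST || bang == NULL)) {
        start = at + 1;                 /* site follows the '@' */
        len = strlen(start);
    } else if (bang != NULL) {
        start = addr;                   /* site precedes the first '!' */
        len = (size_t)(bang - addr);
    } else {
        return -1;
    }
    if (len + 1 > hoplen)
        return -1;
    memcpy(hop, start, len);
    hop[len] = '\0';
    return 0;
}
```

Two sites running this same code with different settings of the flag would send abc!user@dsi.com in different directions, which is exactly the problem.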
The Commonly Asked Questions
It's time to give some quick answers to the most common questions I am asked.
1. What's the phone number for USENET? Oh boy, my favorite! Please see the paragraphs above about how to find a site to connect with. There is no single phone number, but there are many options for how you can connect. Some are very inexpensive or free, some costly.
2. How do I join the Internet? Most sites don't have to. Usually Network News and USENET access is sufficient. But if you really need direct access to many other computers for sharing information, and have the money to support the connection (varies by region, but usually not less than $1,000/month with installation charges upwards of $10,000), there are regional networks in each part of the country that you would connect to. Send me mail if you are interested and I can try to find out the regional contact in your area.
3. Where is the nearest archive site? This is a hard question, partially because near is relative, and partially because it varies based on what newsgroup is desired. Three popular archive sites are osu-cis (a UUCP reachable archive site listed in the maps), Portal, and UUNET. Ohio State is a free archive; Portal and UUNET charge for access.
4. Can you send me a posting you described in a prior column? Usually I can't. My site is not an archive site. After I write about a posting, it gets deleted (unless I keep it to use it on one or more of our computers). By the time the column appears, the posting is long deleted. So even if I could make floppies or tapes for you, the original file is gone. All I can do is refer you to an archive site.
5. How can I find the electronic mail address for a user at some site? The best way is to call that person and ask for his or her electronic mail address. If you post a request to the network, it gets sent to each of several hundred thousand computers, costing many thousands of dollars of everyone's money. Make a simple phone call and ask.
Lastly, remember that Network News is not a BBS. If you post something to a newsgroup, it is broadcast to hundreds of thousands of computers all over the world. Some of the readers will not have English as their native language, even though most postings are in English. Therefore, think twice before you write something. Check your facts, try not to inflame others, and remember that the comment that you took personally may only be the author's problem in phrasing a reply in a short, written English message. USENET Network News is a cooperative venture, and we all need to cooperate for it to work.
There will always be those that abuse available resources, and those that are quick to reply, insulting others. However, if we try to ignore them, the bandwidth saved may let USENET and Network News survive another year. Good luck finding a connection to USENET. When you do, say hello. Most of us are more than willing to help.