Dear DDJ,
I am reading with great interest the series of articles about the 386BSD port. After reading the 386BSD article in your June 1991 issue, I read Bill Gallmeister's "Reconciling Unix, Ada, and Realtime Processing." Thinking about the benefits of multithreaded processes, I formed a simple idea of how to implement them in BSD-Unix. The base of my idea is something between fork and vfork which I called sfork. It behaves like vfork for the code and data segments, but unlike vfork it copies the stack segment and doesn't block the parent. So, after a call to sfork, both processes (threads) share the same code and data segments but have a different stack segment.
Now the problem of the integrity of the shared data segment arises. This could be solved by implementing another kernel function, atom, that has two parameters, a pointer to a function, and an argument for that function. atom has the job of calling the function with its argument, with the guarantee that no other thread can interrupt the function if it is not blocking due to a call to sigpause. With the aid of atom, all mechanisms for synchronizing like semaphores, mutexes, monitors, and rendezvous could be implemented. All functions using common or static data (including some C-library functions such as printf, malloc, and others) must be synchronized via atom. The data segment must remain shared even through a call to sbrk. exit should not be called because of open file structures, exit must be used instead as an alternative to atom. Dijkstra's P and V primitives could be implemented in the kernel. Listing One, below, should highlight my idea.
I know there may be many problems with my idea which I have not figured out yet, but because I have no access to a BSD source, I am not able to implement my idea and test it.
Finally, thanks DDJ, for a great journal.
Gunter Jung
Neurenberg, Germany
Bill and Lynne respond: Your idea does have merit. In fact, the article in the September issue speaks to this area, and brings up vfork as an example. The only problem we see with your suggestion is the higher cost of the P( ) and V( ) primitives, and that signals would not be reliable (see Section 4.7 of the 4.3BSD book by Leffler, et al, on the sigvec( ) mechanism.) FasterP( ) and V( ) can be built using "test and set" atomic operations. See Maurice Bach's book on Unix for more information in this vein.
You might be interested in knowing that some groups have implemented something similar to what you suggest, called the "shopping" fork. It allows you to have variable weight processes ("I'd like a stack segment, a separate process state, hold the data and text segments.") However, a problem faced by all multithreaded/LWP implementations is to reimplement the library functions in a reentrant form (no small task). Another bone of contention is how to synchronize the threads: If we go through the kernel (yours does slightly, using sigpause), we lose speed but gain robustness. On the other hand, if we stay in user mode, we lose robustness and gain speed.
The standards people are trying to wrangle out all the hits here, but we think that this area will be ripe for experimentation for years to come. Too many experts in the field have differing opinions.
Now that the networking release, which incorporates most of 386BSD sources, is out, you may have a chance to actually build your sfork( ) into the kernel and experiment with it yourself. You can get access. The reason we did 386BSD was so more people could experiment with ideas such as yours. It may not be immediately obvious how to do so, since working in the kernel still requires much training and experience, but you can try. We hope our series helps clarify the internals of 386BSD, so that more may have a chance to gain such abilities.
As to the parts of 386BSD that may eventually handle LWPs, we won't forget you, stay tuned to coming issues, as we can only type so fast!
Dear DDJ,
A propos Kas Thomas's article on entropy (February 1991), Issue 12 of Glottometrika, (a German computational linguistics journal) contains the description (in English) of an algorithm for computing character entropy to any order. The publishers of Glottometrika distribute a fairly snazzy, and it seems, bug-free PC implementation of this algorithm that lets you compute the character or word (your choice) entropy of ASCII text files right up to order-120, as well as generate and retrieve text, without resorting to extended memory or swapping to disk.
If you want to know more, contact Prof Dr. Reinhard Kohler (Universitat Trier, FB II - Linguistische Datenverarbeitung, Postfach 38 25, D5500 Trier, Germany) who handles software distribution or Dr. Rolf Hammerl, Editor, Glottometrika (Ruhr-Universitat Bochum, Sprachwissenschaftliches Institut, Postfach 10 21 48, D4630 Bochum 1, Germany).
J.B.M. Guy
Clayton, Australia
Dear DDJ,
In connection with Victor Duvanenko's interesting article (DDJ, June 1991), it is worth mentioning that a matrix, M, of order N raised to a high integer power, K, can be expressed as a matrix polynomial in M of no higher degree than N - 1. This important result follows from the famous Hamilton-Caley Theorem which states that a matrix must satisfy its own characteristic equation.
By using this result, Duvanenko's illustrative calculation of Fibonacci numbers can be made even more efficient, by about two to one, by taking advantage of the structure and symmetry of all the matrices being squared and all the product matrices being evaluated. Rather than general-purpose matrix multiplications and subsequent moves of these matrices, more efficient specialized calculations can be done and array elements overlayed in the process.
The characteristic equation of any square matrix M may be obtained by evaluating the determinant of the matrix expression, |M - xI| = 0, where I is the identity matrix of order N and x is a scalar variable which defines the polynomial. The Theorem substitutes M for x in this resulting polynomial. M{N} may then be expressed by transposition as the resulting polynomial of degree (N - 1). The expression for any higher power may be obtained by repeated multiplication of both sides by M and substituting for M{N} where it occurs.
A matrix, M, of order 2 is used by the author in an illustration of calculating Fibonacci numbers of higher order, where M is given by Example 1.
(11)
M = (10) and
(F[n]) (F[(N-1)]) (F[1])
(F[(N-1)]) = M * (F[(N-2)]) = M{N-1}*(F[0])
This matrix to any higher integral power, K, can be expressed in a simple first order polynomial in M, for example, M{2} = M + I. It may be verified and proved by mathematical induction that M{K} = F[K] * M + F[(K-1)] * I where F[0] = 0, F[1] = 1 and F[K]= F[(K-1)] + F[(K-2)].
This implies that the matrix MK can be expressed as the symmetric matrix in Example 2, and that Example 3 is also a symmetric matrix with terms given in preferred calculation sequence by F[2K] = F[K] * (F[(K + 1)] + F[(K-1)]), F[(2K-1)] = F[K]{2} + F[(K-1)]{2}, F[(2K + 1)] = F[2K] + F[(2K-1)].
M{K} = (F[(K+1)] F[K])
(F[K] F[(K-1)])
(M{K}){2} = (F[(2K+1)] F[2K])
(F[2K] F[(2K-1)])
Thus, starting with the matrix M in row element order M11, M12, M21, and M22, the matrices M{2}, M{4}, M{8}, M{16}, etc. can each be successively evaluated, as needed, by overlaying these elements as follows: M21 := M21 * (M11 + M22), M22 := M12 * M12 + M22 * M22, M11 := M12 + M22, M21 := M12, where ":+" means "is replaced by."
The product matrix, P, is a selected product of these successively squared matrices, but at every stage of development it, too, is a power of the matrix M, and hence it, too, is symmetric and can be evaluated similarly.
It is also worth noting that although the development assumed that F[0] and F[1] were equal to 0 and 1 respectively, the matrix equation in Example 4 holds for arbitrary values of F[0] and F[1], without changing M even though this changes the values of the sequence.
(F[N]) (F[1])
(F[(N-1)]) = M{N-1} * (F[0])
John T. Godfrey
East Jordan, Michigan
Dear DDJ,
Use outside experts to check out patent applications/peer reviews, like scientific journals? Let the experts handle it? Right. Here are a couple of alternate suggestions. If you still fundamentally believe what your teachers taught you -- that the world should be centrally controlled by experts -- don't bother to read on.
Keep patent applications secret for a while. (I understand that Japan does not.) If the patent office gets two patents for the same thing, they cancel each other out. And if someone shows prior art for a patent, they cancel each other out. Remember, we citizens, as a communal body, are paying the inventor by giving him a patent. How often do you pay for something that you get for free or that you have already paid for?
Well, the patent office people and patent lawyers won't fall in love with that suggestion. Its object is to cheaply and naturally simplify many patent cases. But here's a real beauty that the bureaucrats will love: Put a tax on patents. Yeah, that's right. They have value; so tax 'em. OK, boys, before you call me a "liberal" and toss me out of the tavern, hear me out. We have property taxes on "our" land. Of course, that it's "our" land is just a little joke between us and the public (the government). But we do have more control over our land than someone in, say, Rumania. With control comes responsibility. And it comes in two forms: 1. We can get fined, jailed, sued, or shot if we do bad things with our land; 2. We must prove that we are not wasting the land. We prove it by coughing up property taxes every so often. If we waste the land, then it's harder to pay.
So let's make a patent "owner" demonstrate that he's putting the patent to good use -- not just wasting everybody's time. Make the application as cheap and easy as is possible. The quarterly tax and paperwork will keep it real.
Would either of these things discourage patents? You bet. Would they slow down the constitutional I.8.8, "progress of science and useful arts?" I don't think so. I believe that we could all quit mining for get-rich-quick gold and get back to building, selling, and using things instead.
B. Alex Robinson
Maple Valley, Washington
Dear DDJ,
The article by W.L. Maier on the r250 pseudo random number generator (May 1991) has proven to be extremely useful. I used it, after modifying the initiator as described below, in a program for generating sets of bridge hands for tournament play (see Example 5). Since the probability of all possible hand patterns has been calculated from probability theory (Borel and Cheron, The Mathematical Theory of Bridge), this application provides a good test of r250's randomness. Results from 16 sets of 36 deals, to produce 2304 hands after 29376 r250( ) calls (51 per deal) gave results, shown in Table 1, which agree closely with the theoretical probabilities. In addition, r250( ) was noticeably faster than Microsoft's rand( ). Dealing a set of 36 hands with r250( ) took about three seconds on my 16-MHz 80386, compared with about five seconds using rand( ).
4333 227 9.85 10.54
4432 518 22.48 21.55
4441 75 3.25 2.99
5332 349 15.15 15.52
5422 235 10.20 10.58
5431 301 13.06 12.93
* * * *
* * * *
* * * *
8311 7 0.31 0.12
8320 4 0.17 0.11
8410 1 0.04 0.05
8500 0 0.00 0.003
9999* 0 0.00 0.04
Total 2304 100.01 100.013
* All patterns containing 9-, 10-, 11-, 12-, and 13-card suits.
As published, r250_init( ) required 500 calls to rand( ) in order to initialize the r250_buffer array. I modified r250_init( ) to eliminate those 500 calls for two reasons: 1. to make r250( ) a free-standing pseudo random number generator; and 2. to maintain r250( )'s speed advantage vs. linear congruent generators when only a few random numbers are needed.
Kenneth Lindsay
Laie, Hawaii
Dear DDJ,
I was pleased to read Ken Skier's article "Assembly Language Macros" in the March 1991 issue.
I presented basically the same concept several years ago, except I did it by making procedural instructions part of the assembly language instruction set.
In 1983 I introduced a universal cross assembler which through the use of user-defined tables could generate the machine code for any microprocessor. At the same time I defined a universal set of mnemonics and addressing mode syntax which was applicable to any processor. The instruction set I defined included procedural instructions similar to the macros Ken described.
With my universal assembler, an individual was able to change the instruction set definition table to convert a single assembly language instruction into any desired sequence of machine code instructions and/or data. I chose this approach rather than macros to speed up the assembly process and eliminate the need for having a macro definition file reassembled with every program each time the program was assembled.
Also an individual could write relocatable subroutines which defined procedural instructions. These external subroutine instructions had the same format as normal assembly language instructions except instead of generating machine code to perform the procedure, a call to the subroutine was generated, followed by the data to be used by the procedure. The code for the subroutine was appended to the end of the program's executable machine code as part of the assembly process, without the need for an external linking loader.
In addition, my cross assembler allowed for words to be inserted within an assembly language instruction to help clarify the meaning of each parameter and thus make the source code more readable. There were also many other special features I included to assist in learning and using assembly language.
My universal cross assembler was called MOPI. I tried from 1983 to 1988 to distribute the cross assembler myself, under the name VOCS, without much success. In 1986 I wrote a book titled Universal Assembly Language (Tab Books) describing my proposed universal instruction set and assembler. Although still available from Tab Books, the book never made the best seller list.
Unfortunately, at the time my assembler and book were introduced, C was gaining popularity and nobody was interested in hearing about assembly language. I couldn't get any support for the concept and eventually gave up trying.
I always thought it ironical that while everyone complained how hard it was to read and understand an assembly language program, they embraced C with open arms. Yet C, with all of its special symbols, is equally if not harder to read and understand. Just think, they never had a yearly contest in assembly language to see "who can write the most unreadable code," as they do with C. Although now I do a lot more programming in C than assembler, I still like assembly language.
Glad to see that someone else thinks that assembly language, when used properly, is equally worthy of being used to write complete systems like any other so-called high-level language.
Robert M. Fitz
Plymouth, Minnesota
Copyright © 1991, Dr. Dobb's Journal