September 2001/We Have Mail

Departments

We Have Mail

Letters to the editor may be sent via email to cujed@cmp.com, or via the postal service to Letters to the Editor, C/C++ Users Journal, 1601 W. 23rd St., Ste 200, Lawrence, KS 66046-2700.

Dear CUJ,

I’ve belatedly got my hands on your August 2000 issue. In it Craig Hicks offers his C/C++ Tip on creating an index table in STL. Not being qualified to judge on efficiency, I can’t say whether his solution will outperform the STL, but I can say that there is a way to do it in STL without writing as much code. It’s not a function; it’s a container. The associative containers (set, multiset, map, multimap) all sort their contents according to a defined value; for sets, it’s whatever is used to compare two contained objects by less-than (skating over some details, see below), while for maps, it’s the index. For ease of use, let’s do it with a multimap.
class Thing;
vector<Thing> lotsofthings;
multimap<float,
  unsigned int> indexedthings;

for(unsigned int n=0;
  n<lotsofthings.size(); ++n)
{
  float value = lotsofthings[n].value();
  indexedthings.insert
    (make_pair(value,n));
}
indexedthings is now sorted by Thing::value and iterating through it will give indices to lotsfothings in value-sorted order.

That’s a lot less code. Whether it’s a lot less work depends on your STL implementation. The above will, at my last look, fail with Solaris C++ because multimap insert seems to be broken. It will work with GNU (g++) for this case, because float and unsigned int are nice and simple. I was trying to do it with map<float, map<float,void*>::iterator> and compilation failed after successful parsing because the generated lines were too long. I had to derive a class with a nice simple name from the iterator to get it to work. SGI C++ is fine; I haven’t looked further yet. The underlying problem for the GNU failure is that map and its iterators are themselves derived from (or simply are) red-black tree templated containers, so the full names can be lengthy template names. Even for SGI, trying to work out the errors is a pain, because the messages contain several layers of nested templates on a single wrapped line. If you’re writing a compiler this week, please think about writing a few extra << endls to break the lines up. Doing it in the editor by hand is very tedious.

The complex return types for map and multimap for member functions like find (often pairs of pairs) have brought repeated trips to the STL reference books and raised a related question. Heap didn’t make it into STL. But there is an STL priority_queue doing a similar thing: insert things into it and they come out the end in order. Having hacked away at associative containers a bit over this project now, it seemed logical to me that this would be done as a multiset. Insert the items (say, Things) with priority order defined by class less<Thing> whose operator< compares Thing::value, and the top priority item is always available at set location rbegin. But when I read the books, priority queues are implemented with vector and deque. Why? Extraction at the back will be fine, but insertion will be slow. Sets should do both in reasonable (i.e., logarithmic) time. Does anyone know a good reason for this? And would heap (out at the top of the tree instead of the end) be any more efficient than set?

Many thanks.

Chris Marooney

Hi Chris,

Two things I like about your solution are:

(1) The code is short and all in one place, i.e., no scrolling required.

(2) You use unsigned int as the index table type instead of an iterator.

Concerning (1): To have all the information in one place is great. When that strategy is an option, I think it is best. Unfortunately, there are some times when such a strategy is not an option.

Consider
class Thing { int a, b, c;  };
where you want to sort on a first, b second, and c third. (Or suppose thing is a string.) Then we can’t use lotsofthings[n].value as in your example.

That’s where a functor comes in really handy. You can do almost anything with a functor except play the violin. How I would modify your code in such a case is as follows:
namspace SomeFunc_ns
{
struct disposable_ftr
{
  Thing* p;
  bool operator()(int x, int y)

    return p[x].a<p[y].a
    || (p[x].a==p[y].a &&
      p[x].b<p[y].b)
    || (p[x].a==p[y].a &&
      p[x].b==p[y].b) &&
      p[x].c<p[y].c));
  }
}
}
void SomeFunc(...)
{
  using SomeFunc_ns;

  vector<Thing> lotsofthings;
  ............

  disposable_ftr f;
  f.p = &*lotsofthings.begin();
  set<unsigned int,
    > indexedthings(f);
  for(unsigned int n=0;
    n<lotsofthings.size(); ++n)
    indexedthings.insert(n));

  .................
}
You see I have taken the one level of indirection (an integer reference to an array) and the particular sorting requirement and bunged them both into disposable_ftr (which has no chance of ever being reused again), achieving a measure of economy.

This is disposable coding, the alter ego of reusable coding.

A reusable function:
template <class T, class Comp=less<T>>
struct reusable_ftr {
  T* p; bool operator()(int x, int y) {
    return Comp(p[x],p[y]);
  }
};
is of course also an option, but then the comparator must be supplied, and if the comparator is only going to be used once then nothing is gained.

Concerning (2): In the original suggestion, which appeared in August 2000, I gave the following reasons for using an index table:

Oftentimes when sorting an array, you do not want to actually rearrange the elements of the array, but rather to obtain an index table describing which number element is the first, which the second, etc. Two good reasons are: 1) The elements may be large and you don’t want to copy that much memory. 2) You have others distinct arrays in one-to-one correspondence with the array whose order you want to query.

As you can see, reason (2) makes the use of an integer indexing preferable. Somehow this reason got lost in the ensuing discussion, and there was much criticism for using an integer instead of an iterator table.

While there is no heap container in STL, the necessary functions, make_heap, pop_heap, and push_heap, are there. Perhaps this is a good thing, giving us the utility without the policy about how it must be used (although you have to remember to adjust the container size yourself).

The best reason I can think of for using a random access container is that it allows O(1) access into the list of elements during the sorting process. This makes sense when a priority queue is constantly being added and deleted from (as opposed to being the object of a one-time sort), and one wishes to peek into the list during operation. Of course it is better not to use a priority queue for a one time sort because it will take time O(n2).

As for heap, there is no way to access the elements in order during operation. Incidentally, my STL documentation gives insertion time (push_heap) as O(ln n), but I think that it is only O(1) for the average case. Remember, most of the heap is at the bottom rung or two, so getting up to mid-level should only take one or two compares.

Respectfully Yours,

Craig Hicks

[As mentioned above, a priority queue is really just a heap data structure. The traditional way to implement heap operations efficiently is to use arrays (see any of the fine classic books by Aho, Hopcroft, and/or Ullman). Since we want to allow dynamic sizing in most cases, however, a vector or deque in place of a fixed-size array is a logical choice. The worst-case complexity remains logarithmic. — cda]

Re: C++0x

What is it and how can I find more about it?

Karl Beem

C++0x represents a future version of the C++ Standard that at the moment is taking shape in the minds of a very small number of people. Whenever a standard is accepted, as the C++ Standard was officially in 1998, the associated standards committee is at will to reopen work on the language after five years. A few eager visionaries are starting to look forward to that, but for the moment I wouldn’t be too concerned. We’re still waiting for compilers to completely implement C++98. If there is any news worth sharing about C++0x, you’ll hear about it promptly in the pages of CUJ. —cda

Dear CUJ,

I had a question on the article “Tip #7: A remove_if for vector<T *>” by Harald Nowak (July 2001) that might be of general interest.

My group has had trouble using ATL smart pointers in a vector (e.g., CComPtr and CComBstr). We think it is because they override the & operator. It seems vector uses & in its internal implementation, and when it does, it gets what the smart pointer is trying to wrap (e.g., a COM interface pointer or a BSTR).

We found an ATL CAdapt class that doesn’t do much more than overload &. The purpose, we guess, is to a wrap a smart pointer when putting it in a vector. Now, when vector invokes the &, it will get the smart pointer itself instead of what the smart pointer wraps.

Regards,

Gary Choncholas

Hence the dangers of using such a class in this context. Thanks for the heads up. —cda

Dear CUJ,

A letter to the editor in the July 2001 edition of the C/C++ User’s Journal hits a very raw nerve. Pfalzgraff applauds criticism of OO (object-oriented) development, describing a nightmare project that was “made OO” and “The whole project had been wrapped in one BIG class”. The project sounds deserving of criticism, but the ire toward OO is very misplaced in this case.

OO development is a process that begins with careful analysis of a problem and proceeds through the many stages of design, implementation, and testing. Projects are not “made OO.” Using C++ or Java does not make a project OO. Creating objects does not make a project OO. Understanding the syntax of C++ or Java does not make a programmer an OO programmer.

If the cited project had been subjected to a careful analysis, redesign, and re-implementation by people who truly understood OO development, only then would the project have been “made OO,” and I can guarantee that the outcome would have been successful. Unfortunately, the misconception that writing classes equates to OO development is widespread. And, like many other processes or tools, bad implementation is more likely the fault of those doing the implementation. Bad software developers will develop bad software, no matter what the language or development paradigm.

Jill Courte

Dear CUJ,

Commendations to Matt Austern’s article “The Standard Librarian: Bitsets and Bit Vectors” (May 2001) showing how STL’s vector<bool> is a useful tool.

However, I have misgivings about having vector<bool> be a specialization of vector. In doing so, STL denies the possibility of using the general version of vector for the real “bool” type. The recent decision of the C++ committee to make vectors “officially” compatible with arrays recognizes the utility of being able to code with STL and then quickly switch to non-STL for interfacing.

vector<bool> could have been left unspecialized, and a separate class “bitvector” created instead. No utility would have been lost, no extra coding required on the part of STL library writers, and the “real” vector<bool> would be available to behave as expected.

Craig Hicks

Craig,

Thank you for your kind words!

For what it’s worth, many members of the C++ standardization committee agree with you that vector<bool> should have been a distinct class and that making it a specialization of vector<T> was a mistake. You can see some of the committee’s thoughts about this matter on <http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/lwg-active.html#96>.

In practice, regardless of whether the vector<bool> specialization was a mistake, it’s probably here to stay. Backward compatibility is a strong constraint, both for compilers and for standards. —Matt

[Ed. note: In the “beginning” (aka before STL), there were two stand-alone classes: bitset and bitstring. With all the STL excitement the latter was (reluctantly) shoe-horned into that paradigm as vector<bool>. A mistake, as Matt said. —cda]

Hello,

I’ve been trying to compile the fast class compiler and CommandLine class discussed in your May 2001 issue (“Building a Professional Software Toolkit” by John Hubbard), available in the downloads within hubbard.zip. The makefile supplied is for Solaris, and try as I have done, I’ve been completely unable to get it to compile for gcc in Linux (Mandrake 7.1) — it complains about too many things to note here.

So I was wondering if the author or yourselves have such a makefile — I note from his article that the code compiled okay on Linux. (I’d like to know how!)

I enjoy reading your magazine every month — the recent self-contained utility type classes (e.g., Debug) that you’ve printed have proven very nice, please print more!

Thanks for any assistance you can provide,

Steve Parry

Dear Steve,

You reported problems building Class Creator (fcc) on Linux. Here are some observations and solutions on that topic.

Observation # 1: Class Creator exercises the Standard C++ library heavily, and as a result, stock gcc will not work, because the C++ library that ships with gcc is not fully compliant.

Observation # 2: fcc is very easy to build if you have a fully standards-compliant C++ environment.

Observation # 3: The makefile that I provided does not make this obvious nor particularly easy.

Now for the answers. The first step is to create a standards-compliant C++ environment on Linux. (This is a useful thing to do even if you don’t need to build fcc.) The best way to do this is to go to <www.stlport.org> (free) or <www.dinkumware.com> (a small fee is required), download and build the C++ library, and set up your compiler to use this library.

The next step is simply to compile and link the following files: ClassCreator.cpp, WriteSentry.cpp, HeaderFileGen.cpp, CodeGenerator.cpp, ImplFileGen.cpp, InlineFileGen.cpp, UnitTestFileGen.cpp, MakefileGen.cpp, DigestedCommands.cpp, CommandLine.cpp, PackedString.cpp, and main.cpp. The result will be the fcc program.

To ease that last step, I am attaching a reworked makefile that should be a little more flexible (available for download from the CUJ website in letters.zip). Just change the TARGET, CPP_LINK_LIBRARY, and CPP_INCLUDE_DIRECTORY variables to match your environment, and it will work just fine.

I have had success with gcc 2.95.2 and Dinkumware, on RedHat 6.x. I also had success on RedHat 7.1, using gcc 2.95.3 (rather than their built-in, unauthorized “2.96” version, see <www.gnu.org/software/gcc/gcc.html> for more information) and STL-Port 4.0.

Furthermore, sometime in late June, my company will be posting both source code and binaries (Linux, Solaris, Windows, and more) for Class Creator and other tools. These are free of charge, as a service to the software community, and they will be located somewhere on the <www.azadtech.com> website.

And one last thing: I’m attaching a Linux binary for fcc, as fcc2.35b.linux.zip (available for download from the CUJ website in letters.zip). So you can actually skip the whole process, if all you want is the compiled program!

Thanks,

John Hubbard