Dr. Dobb's Sourcebook July/August 1997
You'd be amazed at the psychology that goes into running a grocery store. Have you ever noticed that, no matter what store you are in, the milk is in the back corner (usually opposite from the entrance). This is because nearly everyone buys highly perishable milk every few days. Instead of putting it up front, they put it in the back so you'll have to pass every other item in the store. Sort of reverse ergonomics.
There are many other trade secrets, like what items go on eye-level shelves and even which vendors get the coveted end displays in between the aisles. Of course, the vendors get into these mind games, too.
I suspect that's why Coca-Cola has so many different brands of soft drinks. Sure there's Coke. There's also Diet Coke, Cherry Coke, Caffeine-free Coke, Sprite, Fanta, Fresca, and even more. If you investigate it, nothing sells very well compared to Coke. However, the more shelf space you can consume with Coke products, the less space the store has for Pepsi (or other) products. Not to put PepsiCo in a bad light; I'm sure they do the same thing.
Don't forget: I don't have any proof that this is the case. Sometimes I wonder if Microsoft is up to the same trick. There are several versions each of Visual Basic and Visual C++. Then there's Visual J++, FoxPro, and Visual InterDev. Not to mention the essential (and pricey) Microsoft Developer's Network CD. That's a lot of shelf space. Just recently, Microsoft introduced something called Visual Studio. This is basically all of the above products in one box for about $1000 on the street.
I'm not sure if everyone will agree, but I think this is a great idea. Except for FoxPro, I write about all of these things and use them for real-world consulting projects. Now I can get them in one box and four CD-ROMs. The question is: How many other people really use more than two of these items? The price is such that if you even use three pieces of Visual Studio, it is probably cheaper to buy the whole thing. Which leads to another question: If you aren't using multiple development platforms, why not?
I've always been something of a polyglot, that is, for computer languages (my track record with human languages is quite bad). So it seems natural to me to use whatever language I think will best handle the job. In this column, I frequently switch between Visual Basic, Delphi, and C++Builder. In other places, I write about Visual C++. However, when I teach programming classes, I notice that most people are quite dogmatic about their language choice. There's always someone around who wants to convince me that VB is a "toy" language or that C++ is obsolete and too difficult.
As I've said many times before in this column, no one wants to build a house with just a hammer (or a saw, or whatever). With the level of integration possible now, why not use a mix of languages? The trick is to use each language in the place where it has a decided advantage.
ActiveX is one way you can write code in many languages and use it all together. DLLs are another. Sometimes, as I'll show you in this issue, both of them can work together.
C++ is very good at doing byte and bit manipulation, right? However, ActiveX control creation is still somewhat difficult (granted, not as difficult as it used to be). On the other hand, VB5 makes writing ActiveX controls practically trivial. But it isn't very good at doing bit-wise manipulation of data (yes, I know that it can be done).
What if you could write a VB5 ActiveX control and use it in conjunction with a C++ DLL to do processing? That would be the best of both worlds. That's what I'll show you this issue -- a compressed database control that uses VB for the ActiveX and database management, while relying on C++ for data compression.
Several years ago, I designed a system to do source control and configuration management for the International Space Station (the system was SafetyNet). They let me design the project largely on my own, and it worked out quite well. In a classic application of the Peter Principle, they decided that, since I had done so well by myself, think of what I could do as part of a big team! So they stuck me on a design team for the Core Systems Data Repository (CSDR), a system to store and retrieve data sent from the station. There were several reasons that I wasn't happy to be on the team (and several reasons the team probably wasn't happy to have me there, either). It was just as well, as NASA canceled CSDR about a month or two after I joined the team.
In fact, everything I ever did for the station got clobbered during congressionally mandated redesigns. Almost everything I even knew about went away in favor of smaller, cheaper designs.
I did learn something about CSDR during my brief tenure as a CSDR designer. No one had really thought about what it would be like to store telemetry coming in 24 hours a day, 7 days a week, for 30 years! That's more than 6 billion seconds of data (and a lot of data per second, too). It was clear to me that you'd need to compress the data. Even a 10 percent savings would add up to huge amounts of storage over 30 years.
One obscure compression method you don't hear much about is base-40 compression, a technique applied on mainframes that process giant mailing lists. The idea is simple. First, you limit the characters you want to store to 40 characters and number them from 0 to 39. Then you can store three characters in two bytes, like this: Word = Character2 × 402 + Character1 × 401 + Character0 × 400.
You reverse the process to recover the data. This gives you an instant compression of 33 percent. The downside is that you might waste one byte at the end of each string (not a big deal if the strings are long). Using only 40 characters might seem like a problem at first, but if you think about it, you can use 26 uppercase characters, 10 digits, and still have 4 punctuation marks or 3 punctuation marks and an end of string marker, if you prefer. Of course, you don't have to use the same 40 characters for every compression operation, either.
C++ is very good at stripping strings into bytes, performing math on them, and putting them back together as strings. Although you can do this in VB, it isn't nearly as convenient. Besides, you might already have a C++ DLL lying around that does base-40 compression (I did, as a matter of fact).
Accessing databases with VB5 is trivial. So is making an ActiveX control. So why not make a control that accesses the database but uses a C++ DLL to perform the base-40 decoding and encoding?
That's just what Listings One and Two do. This control is designed specifically for the sample database (available electronically; see "Programmer's Services," page 3). It stores all the data normally, except for the Abstract field. This field is compressed with the base-40 algorithm. You can find a program that uses it in the electronic listings (see Figure 1). There are several interesting things to note: First, you can easily use this control in any ActiveX-aware program. If you want to write code in VB, or VC++, or Delphi, or PowerBuilder, that's not a problem. Second, the programmer who uses the control doesn't know or care what languages the control uses -- in this case, it uses two languages.
Is there a downside? Well, the most obvious one is that the DLL is one more file to distribute with your control. Of course, you also have to maintain two separate development projects. This is offset if you already have the DLL or if you need it anyway. Also, if you have existing code, you can reuse it in the DLL more easily than rewriting it in VB.
The control (Listing One) is a simple application of the VB data control. This control manages all the details of reading a database, accessing a table, and distributing the fields to controls on the form (or, in this case, the UserControl object).
The only difficulty with the data control in this application is that the Abstract text box isn't a bound control. The data in the database is compressed and not suitable for placing directly in the text box. Therefore, the ActiveX control has to manually manage the Recordset object within the Data control to synchronize the compressed data with the display in the Abstract text box.
Other than that bit of bookkeeping, there isn't anything remarkable about the control. Notice that all the transfer of abstract data occurs through two private procedures: GetAbstract and SetAbstract. These isolate the remaining code from the compression algorithm. At first, I wrote these functions to store the data without compressing. Then, when I had the control working, I wrote the compression-aware versions of the procedures. I'll show you what's inside these two important functions after you look at the C++ DLL that does the compression.
The DLL is simple. I deliberately didn't create an MFC DLL, so I could avoid MFC's overhead. In fact, you could probably compile the code as a straight C program. To start the project, I selected New from the File menu and picked Win32 Dynamic Link Library from the project gallery dialog. Then I had to use the project menu to add an empty file: B40DLL.CPP (Listing Two). I also created B40DLL.H (available electronically) to define the functions that reside in the DLL. Truthfully, this simple DLL doesn't even need B40DLL.H, but another C programmer using the DLL would need it.
The code is straightforward. The Str2B40 function compresses a string in place based on the length and key string passed to it. You can perform the compression in place since the new string is sure to be shorter than the old string (that's the whole point, after all).
The companion function, B402Str, does the opposite conversion. Since the new string will now be longer than the old string, it is better to perform the decompression to a different string. The function also requires the length of the uncompressed string and the compression key. The calling program has to make sure the input string's length is evenly divisible by three. Note, however, you can pass a length that isn't a multiple of three, and the code will still encode the extra bytes at the end. Be sure to pad the end of the string with blanks or other innocuous characters.
C++ pointers, and the congruence between characters and integers, make this code very simple. However, calling convention problems can be tricky. VB (and many other non-C++ languages) assumes that the DLLs called will be part of the Windows API. Therefore DLL functions are expected to act like old-style Pascal functions. The 32-bit versions of Visual C++, however, lack the Pascal keyword. To get similar results, you have to use the __stdcall modifier. The problem is that __stdcall changes the procedure name by adding an underscore to the front of the name. It also adds an "at" sign and the number of bytes in the argument list.
All of this means you'll have to pull some VB stunts to make these calls. The key to calling DLLs from VB is the Declare statement. To declare the Str2B40 function, you need a Declare like Example 1(a).
What does this mean? It means that VB will find the function named Str2B40 in the DLL B40DLL.DLL (somewhere on the usual path). The Alias phrase indicates that the true name of the function in the DLL is _Str2B40@12 (the __stdcall name). Finally, there is the usual argument list and return type.
Except for strings, defining the argument list is easy. If the C declaration defines a simple type (like an int), just use a ByVal argument of the corresponding type. (Note that VB's Integer type is only 16-bits long; use Long for 32-bit integer types.) If the DLL's declarations call for a pointer, use a ByRef argument, instead. Be careful: VB's default argument type is ByRef. Don't leave off the type since that means VB will pass a Variant and that's usually not what you want.
Strings are different. Regular C strings (char * types) are always passed by value. VB arranges to pass these strings as the usual C NULL-terminated strings. This means the DLL can always modify these strings. It also means that the VB program has to make the string as long as the DLL expects. For example, in the B402Str function, the DLL expects the destination buffer to be a certain length. You can either declare the variable as a fixed-size string, or make a string of the proper length. In Example 1(b), for example, both arg1 and arg2 have 256 characters in them.
If you have a DLL argument that isn't always the same (for example, a void * or an LPARAM that might be different types of data at different times), you can use As Any in the Declare statement. This isn't a good idea, however, since VB will then pass any data type even if it might crash the DLL (and your program).
One final word of caution: Don't forget the return type on your Declare Function statements. If you do, VB will assume the function returns a Variant. So what? Functions that return Variants do so via an invisible reference argument. This will destroy your argument list and will very likely crash your program. If you have a function that returns void, Declare it as a Sub instead of a Function.
Since this control is just a simple demo, I didn't give it any external properties. Thus, the control is wired to a particular database, it always compresses the abstract, and it uses the same character set for compression every time. You could easily add properties to control these attributes.
Another possible enhancement would be to add an event for compression errors. That way, the end program could respond (and perhaps correct) errors. You might also add a property to allow the control to automatically shift characters to uppercase and map certain characters on input to match the encoding set.
Another enhancement would be creating an ActiveX control that performs base-40 compression. The question is: Why bother? The C++ DLL works fine. The packing algorithm is not pleasant to code in Basic and writing ActiveX controls isn't any fun in C++. Most languages can call DLLs like B40DLL, so why not take advantage of the simplicity of a C++ DLL?
There is, by the way, an alternate approach to this database project that you might like to try. Why not create an ActiveX control that is a bound text box, but knows to transfer data to the database in compressed format. While this is more general purpose, it will take some work to interact properly with the data control.
Of course, you don't need Visual Studio to work with the code in this month's column. All you really need is Visual Basic 5.0 and a C++ compiler. The only real advantage of Visual Studio is that everything is in one box. You still install each product separately. Of course, Visual C++, InterDev, and J++ all use the same development environment, so it only installs once. You can see all the appropriate Wizards and tools in the single environment. Although VB5's new interface looks more like the others, it is still a separate product and doesn't share anything with the others.
Here's a quick run down of what you get with the Professional Edition:
All of these tools are old standards except for Visual InterDev. It is little more than a set of extensions that facilitate creating HTML in the Developer's Studio. Personally, I'm not crazy about InterDev. It does syntax coloring, but very little visual editing is possible (see Figure 2). It does handle Active Server Pages (ASP files). These are the server-side scripts that can use VBScript or JavaScript to generate HTML on the fly. Of course, unless you are running IIS 3.0, you can't use ASP.
The only useful Wizard inserts a database into an ASP file so that you can display the data live. Again, this is only useful if you are using ASP. I also had trouble making InterDev work without crashing frequently.
Still, with the exception of InterDev, the Visual Studio is a good deal if you plan to buy any two of these products anyway. I really like VB5. VC++ 5 has many enhancements to the user interface. Some of them are good and others just annoy you if you are used to the old interface. Visual J++ is about the same as it was when introduced, and everyone should have the Microsoft Developer's Network CD. I don't use FoxPro, so I can't really comment on it.
In the technical department, there isn't too much new in Visual C++ 5.0's compiler or the MFC tools. One thing that I was ecstatic about: The MFC tools now generate #if lines around your header files to protect you from including files multiple times. Finally! If you don't know what this means, you won't care, but if you are a C programmer, you know what I mean, and you'll appreciate it. The other changes center around improved compiler performance, MFC support for asynchronous monikers, active documents, and WinInet. There is a new version of ATL (Microsoft's template-based ActiveX control library) and the compiler now generates COM-compatible classes directly. The compiler also has some new ANSI keywords (bool, explicit, mutable, and typename, for example). All good stuff, but probably not things you are dying to get right away.
Another example of the Coke Syndrome is the introduction of Microsoft Office Developer's Edition. This is little more than Microsoft Office Professional with some extra documentation and tools aimed at developers. That doesn't sound very exciting, but it is a good thing if you develop around Office. If you've ever wanted a complete reference to the Office object model, you'll find it here, for example.
You can only wonder what Microsoft will come out with next. Perhaps Solitaire Developer's Edition? Or perhaps when CD densities increase, they'll produce a disk that contains everything they make on it and sell it for $99.95. Of course, that won't include the annual updates.
One good thing that might come from Visual Studio is that some developers will have a chance to explore a new language they might otherwise miss. Even if you don't program real code using that language, you might find some of its unique ideas useful to you. I haven't programmed in PL/I, APL, or SNOBOL for a long time, but I still find certain things creeping into my code from those languages. So take the time to look at a language you don't use already. I might even check out FoxPro!
DDJ
VERSION 5.00Begin VB.UserControl AbsDB
ClientHeight = 4290
ClientLeft = 0
ClientTop = 0
ClientWidth = 4695
ScaleHeight = 4290
ScaleWidth = 4695
Begin VB.CommandButton Delete
Caption = "Delete"
Height = 495
Left = 3120
TabIndex = 10
Top = 3240
Width = 1215
End
Begin VB.CommandButton Update
Caption = "Cancel Update"
Height = 495
Left = 1680
TabIndex = 9
Top = 3240
Width = 1215
End
Begin VB.CommandButton New
Caption = "New Record"
Height = 495
Left = 240
TabIndex = 8
Top = 3240
Width = 1215
End
Begin VB.TextBox Abstract
Height = 1455
Left = 360
MultiLine = -1 'True
TabIndex = 7
Top = 1440
Width = 4215
End
Begin VB.TextBox Mag
DataField = "Magazine"
DataSource = "Data1"
Height = 285
Left = 1920
TabIndex = 6
Text = "Text1"
Top = 840
Width = 2655
End
Begin VB.TextBox DateField
DataField = "Date"
DataSource = "Data1"
Height = 285
Left = 840
TabIndex = 5
Text = "Text1"
Top = 840
Width = 855
End
Begin VB.TextBox Author
DataField = "Author"
DataSource = "Data1"
Height = 285
Left = 840
TabIndex = 3
Text = "Text1"
Top = 480
Width = 3735
End
Begin VB.TextBox Title
DataField = "Title"
DataSource = "Data1"
Height = 285
Left = 840
TabIndex = 0
Text = "Text1"
Top = 120
Width = 3735
End
Begin VB.Data Data1
Align = 2 'Align Bottom
Caption = "Use these controls to navigate the data"
Connect = "Access"
DatabaseName = "D:\20-20\COLUMNS\col10\abstracts.mdb"
DefaultCursorType= 0 'DefaultCursor
DefaultType = 2 'UseODBC
EOFAction = 2 'Add New
Exclusive = 0 'False
Height = 345
Left = 0
Options = 0
ReadOnly = 0 'False
RecordsetType = 1 'Dynaset
RecordSource = "Abstracts"
Top = 3945
Width = 4695
End
Begin VB.Label Label3
Caption = "in:"
Height = 255
Left = 360
TabIndex = 4
Top = 840
Width = 255
End
Begin VB.Label Label2
Caption = "by:"
Height = 255
Left = 360
TabIndex = 2
Top = 480
Width = 255
End
Begin VB.Label Label1
Caption = "Title:"
Height = 255
Left = 240
TabIndex = 1
Top = 120
Width = 495
End
End
Attribute VB_Name = "AbsDB"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = True
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = True
Option Explicit
Private Key As String
' DLL Functions
Private Declare Function Str2B40 Lib "B40DLL" Alias "_Str2B40@12" _
(ByVal s As String, ByVal Length As Long, ByVal Key As String) As Long
Private Declare Function B402Str Lib "B40DLL" Alias "_B402Str@16" _
(ByVal src As String, ByVal dst As String, _
ByVal dstLen As Long, ByVal Key As String) As Boolean
' Access Abstract "virtual" compressed field
Private Function GetAbstract() As String
Dim l As Long
Dim temps As String * 1024
Dim inputstr As String
If IsNull(Data1.Recordset("Abstract")) Or Data1.Recordset.EOF _
Or Data1.Recordset.BOF Then
GetAbstract = ""
Else
l = Data1.Recordset("AbsLen")
inputstr = Data1.Recordset("Abstract")
B402Str inputstr, temps, l, Key
GetAbstract = Left(temps, l)
End If
End Function
Private Sub SetAbstract(ByVal s As String)
Dim l As Long
Dim origlen As Long
origlen = Len(s)
s = UCase(s)
s = s & String(origlen Mod 3, " ") ' pad with spaces for even multiple of 3
l = Str2B40(s, origlen, Key)
If l = 0 Then
MsgBox "Error encoding database record"
Data1.Recordset("Abstract") = ""
Data1.Recordset("AbsLen") = 0
Else
s = Left(s, 2 * l)
Data1.Recordset("Abstract") = s
Data1.Recordset("AbsLen") = origlen
End If
End Sub
' Test for empty database
Private Function IsData() As Boolean
IsData = Not (Data1.Recordset.BOF And Data1.Recordset.EOF)
End Function
' Decompress the data into the text box
Private Sub Data1_Reposition()
Abstract.Text = GetAbstract
Abstract.DataChanged = False
End Sub
' Sync the compressed field with the text box
Private Sub Data1_Validate(Action As Integer, Save As Integer)
Debug.Print "Action=" & Action
Select Case Action
Case vbDataActionMovePrevious, vbDataActionMoveFirst, _
vbDataActionMoveNext, vbDataActionMoveLast, vbDataActionAddNew
If Abstract.DataChanged Then
On Error Resume Next
If Data1.Recordset.EditMode = dbEditNone Then Data1.Recordset.Edit
SetAbstract Abstract.Text
Data1.UpdateRecord
Abstract.DataChanged = False
Save = False
End If
End Select
End Sub
Private Sub Delete_Click()
Abstract.DataChanged = False ' Don't confuse validate
On Error Resume Next
Data1.Recordset.Delete
Data1.Recordset.MoveNext
If Not IsData Then Data1.Recordset.AddNew
End Sub
Private Sub New_Click()
Data1.Recordset.AddNew
Title.SetFocus
End Sub
Private Sub Update_Click()
Data1.Recordset.Edit
Data1.Recordset.CancelUpdate
Abstract.Text = GetAbstract
End Sub
Private Sub UserControl_Initialize()
' Set up key and an empty record
Key = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,?"
If Not IsData Then Data1.Recordset.AddNew
End Sub
#include <windows.h>#include "b40dll.h"
// Convert String to compressed string
__declspec(dllexport) unsigned int __stdcall Str2B40(unsigned char *str, _
unsigned len, const char *key)
{
unsigned short *b40=(unsigned short *)str;
if (strlen(key)>40) return 0;
for (unsigned int i=0,j=0;i<len;i+=3,j++)
{
// convert to 0-39
for (int k=0;k<3;k++)
{
if (i+k>len) break;
char *p=strchr(key,str[i+k]);
if (p) str[i+k]=p-key;
else return 0;
}
// do the math
b40[j]=str[i]*1600+str[i+1]*40+str[i+2];
}
return j;
}
__declspec(dllexport) BOOL __stdcall B402Str(unsigned char *src, _
unsigned char *dst, unsigned dstlen, const char *key)
{
unsigned short *b40=(unsigned short *)src;
if (strlen(key)>40) return FALSE;
for (unsigned int i=0,j=0;i<dstlen;i+=3,j++)
{
// unpack 0-39
dst[i]=b40[j]/1600;
dst[i+1]=(b40[j]-dst[i]*1600)/40;
dst[i+2]=(b40[j]-(dst[i]*1600+dst[i+1]*40));
// convert to ASCII code
for (unsigned int k=0;k<3;k++)
if (i+k<dstlen)
dst[i+k]=key[dst[i+k]];
}
return TRUE;
}