Dr. Dobb's Journal May 1998
Windows CE programming is a lot like Win32 programming, since the CE API is a subset of Win32, with some CE-specific extensions. One of those extensions is an API for object databases. The object database API is reasonably rich, including indexed access to records in databases, with up to four indexes supported for each database. Indexes are only on single integer or string properties, in ascending or descending order. There is a seek function in the API that uses the current index for rapid searches. Like its bigger siblings, Windows CE has a version of the Microsoft Foundation Class (MFC) library, which I consider the highway -- or at least the paved road -- for developing CE applications. By comparison, using Win32/C is more like a gravel road. But because MFC in Version 1.0 of the CE SDK did not provide classes for dealing with databases, I hit the gravel road pretty often.
Just when I was about to break down and write my own wrapper classes, the CE SDK Version 2.0 beta was released. Thankfully, the SDK 2.0 adds several classes to MFC that wrap the database API and data structures into a more usable form. While this API is sufficient to accomplish most database tasks, it still lacks a general-purpose searching class. Consequently, in this article, I'll present a set of CE database classes and subclasses of the new 2.0 MFC classes that provide an object-oriented wrapper to the basic database search API. The complete wrapper is available electronically; see "Resource Center," page 3.
The main database class is CCeDBDatabase. Objects of this class encapsulate an open database, and support several operations. The operations I'm most interested in for searching are SeekFirst(), SeekFirstEqual(), SeekNext(), SeekNextEqual(), SeekValueSmaller(), and SeekValueGreater(). The SeekFirstEqual() and last two seek methods take a CCeDBProp object as their only parameter, and use the current index to locate the first record equal to, smaller, or larger than the value encapsulated within. CCeDBProp is a wrapper class for both record properties and index properties. It contains a property identifier and value. The CCeDBRecord class encapsulates a record in the database, and consists of a set of CCeDBProp objects. It provides a couple of methods for extracting a property from the record.
While there is a lot more to these classes than I've covered here, it's about all I need to extend the classes to add a more robust search capability.
What I'd like to be able to do with MFC is encapsulate a query on an open database in an object, hand that object to a method of a database object, and get back an "iterater." I could then retrieve matching records by simply calling the first() and next() methods of iterater. The query would support the intrinsic queries supported by the database, that is less than, greater than, or equal to tests. Unlike the basic index seek operations, I'd like to be able to use more than one property in the query, as well as supply multiple matching tests for a property. Finally, I'd like to lift the restriction on using nonindexed properties in a query. To top it all off, I'd like the query to be as optimal as possible, including making full use of the correct index to retrieve all the matching records while accessing as few records as possible in the database.
This sounds wonderful, but without a few simplifying assumptions I'd end up coding it for the rest of my life. In addition to the good things I've decided on, I'll make one huge concession -- I'll assume that any query that contains multiple properties, or even multiple tests against a single property, is an implicit AND of all tests. Sure, I could support OR, but then I'd need a more complex representation of the query syntax as well as a query parser. I'll leave that as an exercise for the reader. Also not implemented but equally desirable are more general tests. It would be nice to be able to do regular expression matches or lots of other tests, and it's obvious exactly where those tests could be added, but I'm not doing them now.
To implement extended searching, I'll first need to create the CEQuery class to encapsulate a query. That will require subclassing the CCeDBProp class so I can store test criteria with a property. Then I'll need to extend the existing CCeDBDatabase class to add the method to create an iterater used to search. Finally, I'll have to do the hard work of actually implementing the iterater class.
A query object consists of a CCeDBDatabase object, a number of CCeDBQueryProp objects representing the properties to test, a test value (and the test operation), and a CCeDBQueryProp object representing the desired sort order (if any). Listing One presents the CEQuery class declaration.
The addField() method adds a new CCeDBQueryProp object to the query. It's expected that CEQuery deletes this object in its destructor. CCeDBQueryProp objects are stored in an array named _fields that is expanded as needed, so any number of tests can be performed in a query.
The setSortField() method takes a CCeDBQueryProp object (which is also deleted in the destructor) that specifies the desired sort order for the query. The numFields() method returns the current number of test fields in the query.
The testOn() method is called during searching. It determines which CCeDBQueryProp value (if any) to use to begin the query when querying the database. Each CCeDBQueryProp is tested against the supplied property ID, which is also saved in the _index member variable. If a matching CCeDBQueryProp uses an Equal test, then that object is selected to begin the query. If it matches and uses a Less Than (on a property with a descending index) or Greater Than (on a property with an ascending index) then it is used if none are found with an Equal test. At this time if a range search is detected (both a Greater Than and Less Than test on the indexed property, for example) then the _range member variable is set to True. The goal of this method is to start the search off using a query value against the selected index that eliminates as many nonmatching records as possible. It also determines (by setting _range) if it is possible to stop the query after a record fails a test on the indexed property.
The match() method is called during searching. It tests every property in the supplied record against the CCeDBQueryProp objects in the query. If a range test is being done, the index property is tested first to see if it falls within the range. If it does not, then False is returned, but also an EOF flag is set indicating that the search is complete. The EOF flag is not set the first time the range test fails, to work around a bug in the MFC SeekValueSmaller() method. When used with a descending index, it actually returns the record larger than the supplied value, so the first time, the range test will fail. It's not too expensive to do a couple of extra tests just to be sure, and it works around this bug just fine. The match() method calls matchField() to do the actual test for each property. Assuming the range test succeeds or is not performed, then each property in the record is tested against its corresponding CCeDBQueryProp objects using matchField(). When doing these tests the index field tests are skipped if range testing was previously done. If all tests pass, match() returns True.
The matchField() method tests the supplied property from a potentially matching record against the specified test object, and returns True or False. It consists of a large case statement that does the appropriate comparison based on the data type and test operation for a property.
The CCeDBQueryProp class is a subclass of CCeDBProp (see Listing Two for its declaration). It adds an enumerated type that describes the three types of test possible on a property, as well as a member variable to hold the specific test for the property. To create a query, I allocate a new CCeDBQueryProp object for each desired test, set its property ID, type, value, and operation, and then add it to the query object using addField().
I'd like to be able to hand my query object to a method of a CCeDBDatabase object to begin a query. That's accomplished by subclassing CCeDBDatabase and adding a Find() method; see Listing Three.
The Find() method accepts a query object and allocates a new CCeFindDBDatabaseIterater object. This object should be deleted when I'm done with a query. To create a new iterater, I call the FindKey() method to determine which key I should use, and then I pass that key and the CEQuery object to the CCeFindDBDatabaseIterater constructor.
The TestFind() method is useful to determine if a particular query will use an index or require a sequential search. It also calls FindKey(), and if no key is found it returns False. This might be handy if I want to warn the user that a query might take some time, since it will be a sequential search.
The FindKey() method examines the CEQuery object against the current indexes defined for the database using the IsIndexed() method. It determines which index (if any) to use when querying the database. If the query includes a sort then that index must be used. If not, then the CCeDBQueryProp objects that use properties that are indexed and weighted to determine the best index to use. If any of them are Equal tests, then that index is heavily weighted. If any are Less Than (on a property with a descending index) or Greater Than (on a property with an ascending index) then they are weighted at half of properties using Equal. The highest weighted index is then chosen. This ends up favoring the index that will require the fewest number of tests to complete the query.
The IsIndexed() method compares the supplied property ID against all property IDs in the database that have an index. If the property ID is indexed, it returns True, and also returns the sort order for the index.
When I query a database, I want a single object to encapsulate everything having to do with the search in progress. The CCeFindDBDatabaseIterater class holds everything needed to perform a search, including copies of the current query object and the database object. Listing Four is its declaration.
The constructor stores copies of the CEQuery (query) and CCeDBDatabase (database) objects in the _myQuery and _myDb member variables. It then opens the database object in _myDB using the selected index. It sets _start to True, _eof to False, and zeros out the _hits and _tests member variables. It sets the _primaryQuery member variable to the query property to begin the search by calling the testOn() method of the CEQuery object. The iterater is now ready to retrieve matching records.
The first() method returns the first matching record for the query. It can be used to restart a search in progress, or just to begin a search. It just sets _start to True, clears the search statistics variables, and calls the next() method.
The next() method finds the first (if _start is True) or the next matching record in the database. If _start is True, it begins the query by calling SeekFirst() if there is no appropriate test on an indexed field. If there is an Equal test in _primaryQuery it calls SeekFirstEqual(). If there is a Greater or Less Than test in _primaryQuery it calls SeekValueGreater() or SeekValueSmaller(). If _start is False, then it calls SeekNextEqual() if there is an Equal test in _primaryQuery, or SeekNext() otherwise. In other words, it tries to find the first (or next) value using the index and the test value supplied in the query object. If the seek operation found a record, it's still not ready to return. Instead, the method enters a while loop calling match() with the record. If match() returns False but does not indicate that EOF has been reached, it gets the next record (again using SeekNextEqual(), or SeekNext()) and tries again. This continues until a record matches or EOF is detected. It then return the matching record or NULL if EOF was reached.
The stats() method returns the number of records tested and the number of matching records for the query. It can be called at any time, even during a query, to find out how many records have been examined and matched by the query.
To test the classes, I wrote a simple MFC application that creates a database with 1000 records containing random string, long, and date (FILETIME) properties and then allows searches on it. The string property has an ascending index, and the long property has a descending index. The application has a simple form (see Figure 1) that accepts one or two strings, longs, and dates with search criteria, as well as a desired sort of none, string, or integer.
Selecting Initialize creates a new test database with 1000 records. The initialize routine uses the CE2.0 database classes to create a database with indexes on a string and integer field, and populate it with records containing random data; see Listing Five.
The database is created and if the Create() method fails, the existing database is deleted and then recreated. It's created with the two indexes (on property 1 -- an ascending sort on strings, and property 2 -- a descending sort on longs) I use to test the query classes. Finally, 1000 records are added with random strings, longs, and dates that fall between 1/1/1900 and 12/28/2100.
To perform a search, users fill in the desired criteria and select Search. The OnSearch() method extracts the current criteria, builds a CEQuery object, opens the database using a CCeFindDBDatabase object, and then uses the Find() method to get an iterater for the results. Listing Six is the portion of the search routine that creates one of the query properties in the query object and then performs the query.
For each query operation and test value in the form, it tests to see if the user entered a value. If the user did enter a value, it extracts the desired test and value from the dialog and creates a CCeDBQueryProp object. It then adds this object to the CEQuery object using addField(). Once all the properties are added, it opens the database and calls Find() to get the iterater. A simple for/next loop retrieves the matching records from the iterater. It uses the MFC TRACE() macro to display the properties of the matching records and search statistics in the debugger output window.
To test the classes, I performed a series of queries. The tests consist of doing single and then range searches on the string, long, and date fields, and then equality tests on each. Finally, a query using all three properties is done. For all these tests I tried selecting no sort order, a sort order that did not correspond to the query properties, and finally one that did. The results are encouraging. In cases where I did not select an index, but queried using the string (Greater Than, Equal tests) or integer (Less Than, Equal tests) properties the correct index was selected and the search completed without examining all records. In the case of range tests the fewest possible records were examined. Adding tests on the date property properly excluded nonmatching records. If I did a test using the long property but selected the string sort order, records were returned in the correct order but an exhaustive search was (correctly) performed. In other words, the classes meet all the desired objectives. Figure 2 is output from a query including range tests in all three properties.
The classes I've presented here add a powerful search capability to the basic database classes found in Windows CE 2.0. They're easy to use, provide fairly optimized searches, and allow for testing on every property in a database -- not just strings and integers as is supported by the base class. The memory overhead of using the classes is reasonably low -- a typical query object and iterater require only one to two KB of memory over the base classes.
There are some obvious enhancements that are possible: I could implement a class that would allow for Boolean operations for combining query properties (AND, OR, NOT) rather than the implicit AND I use now. That would require a more complex match() method. I could add more test operations including substring and regular expression matching. That would expand the matchField() method. There is one additional query optimization possible as well: The case where the query specifies a Less Than (or Greater Than) test on an index that is ascending (or descending) could use the index, and terminate the query as soon as a key value fails. This is basically half the range test I've already implemented, and it would be easy to add to the testOn() and match() methods. Even without these enhancements, the query classes are a powerful addition to my programming toolchest.
DDJ
class CEQuery{
friend class CCeFindDBDatabaseIterater;
public:
CEQuery();
virtual ~CEQuery();
void addField(CCeDBQueryProp *);
void setSortField(CCeDBQueryProp *);
int numFields(void);
CCeDBQueryProp * testOn(WORD, WORD);
bool match(CCeDBRecord*, bool &, int &);
CCeDBQueryProp * sortField();
CCeDBQueryProp * operator[](int fieldNo);
private:
bool matchField(CCeDBProp*, int);
CCeDBQueryProp ** _fields;
CCeDBQueryProp * _sortField;
int _numFields;
int _maxFields;
WORD _index;
bool _range;
}
class CCeDBQueryProp : public CCeDBProp{
public:
enum FieldOperation {
Equal, LE, GE
};
void setOperation(FieldOperation);
FieldOperation getOperation();
private:
FieldOperation _operation;
}
class CCeFindDBDatabase : public CCeDBDatabase{
public:
CCeFindDBDatabaseIterater * Find (CEQuery & );
bool TestFind (CEQuery & );
private:
CCeDBQueryProp * FindKey(CEQuery &);
bool IsIndexed(CEPROPID, WORD *);
};
class CCeFindDBDatabaseIterater{
public:
CCeFindDBDatabaseIterater(CEQuery *, CCeDBQueryProp *, CCeDBDatabase *);
~CCeFindDBDatabaseIterater();
CCeDBRecord * current();
CCeDBRecord * first(); // resets search
CCeDBRecord * next();
void stats(int &, int &, bool &);
private:
bool _start;
bool _eof;
int _hits;
int _tests;
CCeDBQueryProp * _primaryQuery;
CCeDBDatabase * _myDb;
CEQuery * _myQuery;
WORD _keyOrder;
}
void CExampleDlg::OnInit() {
CCeDBDatabase newDB;
CCeDBProp sorts[2];
sorts[0].SetType(CCeDBProp::Type_String);
sorts[0].SetIdent(1);
sorts[0].SetSortFlags(CCeDBProp::Sort_Ascending );
sorts[1].SetType(CCeDBProp::Type_Long);
sorts[1].SetIdent(2);
sorts[1].SetSortFlags(CCeDBProp::Sort_Descending );
if (newDB.Create(TEXT("JCSExample"), 123, 2, sorts) == NULL)
{
newDB.Open(TEXT("JCSExample"));
newDB.Delete();
newDB.Create(TEXT("JCSExample"), 123, 2, sorts);
}
newDB.Open(TEXT("JCSExample"));
CCeDBRecord r;
r.AddProp(&sorts[0]);
r.AddProp(&sorts[1]);
r.AddProp(new CCeDBProp(CCeDBProp::Type_Filetime, 3));
for (int i = 0; i < 1000; i++)
{
TCHAR string_field[256];
FILETIME filetime_field;
SYSTEMTIME systemTime;
int j;
// generate three random values for fields
CCeDBProp* p = r.GetPropFromIdent(1);
string_field[0] = (TCHAR)('A'+Random()%26);
for (j=1; j < (int)(5 + (Random()%10)); j++)
string_field[j] = (TCHAR)('a'+Random()%26);
string_field[j] = 0;
p->SetString(string_field);
p = r.GetPropFromIdent(2);
p->SetLong(Random());
p = r.GetPropFromIdent(3);
memset(&systemTime, 0, sizeof(systemTime));
systemTime.wYear = (unsigned short) (1900 + Random()%200);
systemTime.wMonth = (unsigned short) (1 + Random()%12);
systemTime.wDay = (unsigned short) (1 + Random()%28);
SystemTimeToFileTime(&systemTime, &filetime_field);
p->SetFiletime(filetime_field);
newDB.AddRecord(&r);
}
}
CCeDBQueryProp::FieldOperation _ops[] = { CCeDBQueryProp::LE, CCeDBQueryProp::GE, CCeDBQueryProp::Equal};
CCeDBQueryProp * qp;
CEQuery q;
if (m_s11val != -1 && !m_sf11.IsEmpty())
{
qp = new CCeDBQueryProp();
qp->SetType(CCeDBProp::Type_String);
qp->SetIdent(1);
qp->SetString((TCHAR *)(LPCTSTR)m_sf11);
qp->setOperation(_ops[m_s11val]);
q.addField(qp);
}
// NOTE: other query properties omitted, see listing
CCeFindDBDatabase theDB;
theDB.Open(TEXT("JCSExample"));
CCeFindDBDatabaseIterater *results = theDB.Find(q);
TRACE0("\nResults:\n");
for (CCeDBRecord *r = results->first(); r; r = results->next())
{
CCeDBProp* strfield = r->GetPropFromIdent(1);
CCeDBProp* intfield = r->GetPropFromIdent(2);
CCeDBProp* datefield = r->GetPropFromIdent(3);
FILETIME f = (datefield->GetFiletime());
SYSTEMTIME t;
FileTimeToSystemTime(&f, &t);
TCHAR timeStr[256];
GetDateFormat(LOCALE_SYSTEM_DEFAULT,DATE_LONGDATE,&t,0,timeStr,256);
TRACE3("\t%s\t%d\t%s\n", strfield->GetString(),
intfield->GetLong(), timeStr);
delete r;
}
int hits, tests;
bool range;
results->stats(hits, tests, range);
TRACE3("Stats\n\tmatches: %d\n\tmisses: %d\n\tTotal: %d\n",
hits, tests, hits+tests);
TRACE1("Range search %s used\n", range ? TEXT("was") : TEXT("was not"));