Michael Kelly is a free-lance Writer/Programmer who uses the languages C, C++, Pascal, 80x86 Assembler, and English. He can be reached via M&T Online as mkelly.
The introductory texts on abstract data types that I have seen contain examples of trees that tend to fall into two categories. The first, a "bare-bones" recursive implementation, is easily understood but practically useless. The second, the "balanced" variety, promises quick look-ups but can be very confusing to follow, because it contains left and right, single and double rotations, node splitting and joining, and more. I have found it useful, when trying to learn from pointer-based C code, to experiment with the simple implementation, add enhancements incrementally, and repeat the cycle. This article discusses how to apply that process to the binary tree abstract data type.
Binary Tree Test ProgramListing 1 is a minimal binary tree test program useful for debugger single-stepping. The node structure type defines a node in the tree. This minimal tree node consists of a left node pointer, a right node pointer, and a character key. A recursive tree insertion routine, (tree_insert) compares keys and moves left if the new node has a lesser key, right if the new node has a greater key, and returns if the keys are equal (no duplicate keys in this scenario). Note that for this insertion function to work, the node returned from the outermost call must be stored as the tree anchor, and the left and right node pointers relinked all the way back through the calling chain.
The function tree_trace uses recursion to traverse the tree in order and display the sorted character keys. The variable tree_root stores the first insertion to anchor the tree, then after a few insertions are made in a loop, tree_trace is called to display the keys.
Adding EnhancementsListing 2 illustrates some enhancements that provide the binary tree with practical utility. The first modification is to the tree node structure so that data you might need to manipulate can be stored. The keys now consist of strings. To insert nodes into the tree efficiently, the tree_insert function has been replaced with a non-recursive, or iterative, implementation.
On the first call to tree_insert, the root of the tree is NULL, so the program does not enter the while loop. The program simply allocates memory with calloc, so that the left and right node pointers are automatically set to zero (NULL in Turbo C), the root of the tree is stored, and the key string is copied and NULL-terminated. On subsequent calls the program does enter the while loop.
C does not have a string data type so a comparison function must be used instead of the relational operators to compare the key strings. The Standard C library provides the strcmp function, which returns an integer result. The result will be zero if the strings are equal, less than zero if the left string is less than the right string, and greater than zero if the right string is greater. Branching is done based on the key comparisons until either a duplicate key or an empty "leaf" node is found. Once at an empty leaf, the program allocates memory and a jump is made to common code for copying the key.
The code that copies a duplicate key over identical data in the case of equal keys seems redundant. However, this loop configuration lends itself to the next modification in which the data to be copied will contain, but may not be identical to, the key.
The tree traversal function is modified to print a string instead of a character. The tree functions could now be applied (at our peril, for error recovery has yet to be considered) to a string sorting utility. In this test stub, a few keys, one of which is a duplicate, are inserted and displayed, to demonstrate that duplicates all map to one node in this tree.
Expanding and Adding FunctionsWith the program in Listing 3 not only are functions expanded in their generality, but additional routines begin to be introduced to perform operations appropriate to binary trees. Trees are noted more for efficiently finding data that has already been ordered than for doing the ordering.
The tree_find function uses an iterative process to search the tree for an item matching the specified key. tree_find returns NULL on failure and a pointer to the node containing the item on success. Note that tree_find, as well as the new incarnations of routines from previous listings, treat the data as an amorphous blob. To make the routines more generally useful and thus reusable, the task of molding the data into a recognizable form should be left to the function that wishes to use it, so the tree functions will no longer make assumptions about the structure of the data. This requires that the functional interfaces be defined so as to be able to accomodate data of unknown type.
This improved reusability exacts a performance penalty, since the tree functions must now depend on the users of the data, or client modules, to provide the means to compare two data items, determine the size of an item, and display an item. Pointers to functions are used in the tree-function interfaces in a manner similar to qsort in the Standard C library to access the necessary client-supplied code.
The arguments to the tree_insert function are now a pointer to the root node pointer (so the root can be set from -inside the function), an untyped pointer to the data, the size of the data, and a pointer to a function that compares two data items and returns an integer result modeled after the strcmp function.
The tree_trace function uses a global function pointer, report, and a global variable of enumerated type, t_order, to minimize the stack space eaten up by the function arguments when it calls itself. The variable t_order specifies the tree traversal order that tree_trace will use — pre order (root, then left subtree, then right subtree), in order (left subtree, then root, then right subtree), and post order (left subtree, then right subtree, then root). The report function pointer is used to call a reporting or displaying function with an untyped pointer to the data (void *info) as its sole argument.
The function tree_free traverses the tree in post order to free tree memory in a "last allocated, first freed" fashion.
Deleting NodesNow you can build binary trees containing items of unknown type, search for a matching item, display the items, and free the memory for the entire tree. What happens if you need to extract an item from the tree, freeing memory for the node and leaving the remainder of the tree in a usable state? With the current definition of a node you can delete the node, but there is no easy way to link up the node's children. Another modification to the node structure type will simplify the solution.
In listing 4 a parent pointer is added to each node to bridge the gap created by the deleted node. An enumerated type tree_child describes the type of node in the process of being deleted. If the node you want to delete is the root node, and has children, the root of the entire tree will be changed so you must pass the address of the new root back to the caller. A tree_child value of NOT_CHILD is a node that is not a child node (the root of the tree), LEFT_CHILD is a left node and RIGHT_CHILD is a right node. These designations are temporary and only used by the deletion function tree_delete. If none of these condition applies, then an error must have occurred linking the nodes so the error code of LINK_ERROR is stored in the global error indicator t_error. In this case, the abstract data type should be considered corrupted and drastic action should be triggered.
Since C passes parameters by value, changes to this_node made inside the tree_delete function will disappear when the function goes out of scope. However if this_node is a right child and we change it indirectly by assigning this_node to temp and a value to temp->parent->right, this change will be effective external to the function. This method seems less error prone than using node pointer pointers (**this_node) and makes the source a little easier to follow.
The code for deleting a node that has both left and right children may look a little confusing. In English, the program appends the entire left subtree of the node to be deleted to the left-most extremity of its right subtree. The program then redirects the parent pointer of the left subtree to the node from which the left subtree now dangles. Then the program copies the parent pointer for the node to be deleted to its right child, spanning the "hole" that will be made by deleting the node. Now the target node can be deleted without disrupting the tree structure. If the deleted node was the root of the tree, this_node has the new root and is returned, otherwise root is returned to the caller.
Of course, now that the node data type has been modified, the functions that use it should be checked for side-effects. The tree_insert function must now link the parent pointers as it adds items to the tree. When memory for a new left or right child node is allocated, the parent pointer of that child of this_node must be set to this_node before this_node is given that child's address. In the case of a duplicate key, the parent pointer remains intact.
SummaryThis binary tree, though generic, is still fairly simple. No question of balance has yet been posed. If the tree is built from ordered data, it degenerates into a linked list, with every search beginning at the head of the list. In this particular implementation, advantage could be made of the fact that new data having a key already in the tree is handled by assigning the data to the node whose key is already in the tree. For instance, if you wished to build the tree from a disk file of ordered records, you could take a random sample of the records, sort it, use the center record for the root, rewind the file and simply process the input sequentially. In this way you could reduce the likelihood of the tree becoming a linear list. More complex insertion and deletion techniques are required to insure tree balance. One type of binary tree that guarantees near balance (meaning that the left subtree and right subtree of each node in the tree differ in the number of children by at most one node) is the AVL Tree.
Suggested ReadingSedgewick, Robert. 1988. Algorithms. Addison-Wesley: Reading, MA.
Weiskamp, Keith, Namir Shammas, and Ron Pronk. 1989. Turbo Algorithms, A Programmer's Reference. Wiley: New York.