Python data structure and algorithm -- binary search tree

What is a binary search tree

A binary search tree (BST, also called a binary sort tree) is either an empty tree or a binary tree with the following properties: if its left subtree is not empty, the values of all nodes in the left subtree are less than the value of its root node; if its right subtree is not empty, the values of all nodes in the right subtree are greater than the value of its root node; and its left and right subtrees are themselves binary search trees.
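
The definition can be checked mechanically. The following is a minimal sketch, assuming a hypothetical bare-bones Node class and an is_bst helper that are not part of the implementation developed in this article. Each subtree is checked against the bounds inherited from its ancestors.

```python
# Hypothetical minimal node type, used only for this illustration.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def is_bst(node, low=None, high=None):
    """Return True if every key in the subtree lies strictly between low and high."""
    if node is None:  # an empty tree is a valid binary search tree
        return True
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # Keys on the left must stay below node.key, keys on the right above it.
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)


valid = Node(70, Node(31, Node(14)), Node(93))
# 80 is greater than 31, but it sits in the left subtree of 70.
invalid = Node(70, Node(31, None, Node(80)), Node(93))

print(is_bst(valid))    # True
print(is_bst(invalid))  # False
```

Comparing each node only with its direct children would wrongly accept the second tree, which is why the bounds are passed down the recursion.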

Implementation of binary search tree

A binary tree in which every key in the left subtree is smaller than the parent's key, and every key in the right subtree is larger than the parent's key, is called a binary search tree (BST). This ordering, the BST property, is what guides us when we implement the Map interface on top of the tree. Figure 1 illustrates the property, showing only the keys without their associated values. Note that the property holds for every parent and child: all keys in the left subtree are smaller than the key in the root, and all keys in the right subtree are larger.

Figure 1: a simple binary search tree

Now that you know what a binary search tree is, let's see how to construct one. We insert the keys in the order 70, 31, 93, 94, 14, 23, 73, which produces the tree shown in Figure 1. Because 70 is the first key inserted, it becomes the root. Next, 31 is less than 70, so it becomes the left child of 70. Then 93 is greater than 70, so it becomes the right child of 70. Two levels of the tree are now filled, so the next key will be a left or right child of 31 or 93. Because 94 is greater than both 70 and 93, it becomes the right child of 93. Similarly, 14 is less than both 70 and 31, so it becomes the left child of 31. 23 is also less than 31, so it must go into the left subtree of 31; since it is greater than 14, it becomes the right child of 14. Finally, 73 is greater than 70 but less than 93, so it becomes the left child of 93.
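
The walkthrough above can be reproduced with a few lines of code. This is only a sketch: Node, insert and describe are hypothetical helpers written for this illustration, separate from the classes developed later in the article.

```python
# Hypothetical minimal node type for this illustration only.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key below root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def describe(node, parent=None, side="root"):
    """List every node with its position, visiting root, then left, then right."""
    if node is None:
        return []
    label = "root" if parent is None else f"{side} child of {parent.key}"
    return ([f"{node.key}: {label}"]
            + describe(node.left, node, "left")
            + describe(node.right, node, "right"))


root = None
for key in [70, 31, 93, 94, 14, 23, 73]:
    root = insert(root, key)

for line in describe(root):
    print(line)
# 70: root
# 31: left child of 70
# 14: left child of 31
# 23: right child of 14
# 93: right child of 70
# 73: left child of 93
# 94: right child of 93
```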

To implement the binary search tree we will use the nodes-and-references approach, similar to the one used for linked lists and expression trees. Because we must be able to create and work with an empty binary search tree, the implementation uses two classes: BinarySearchTree and TreeNode. The BinarySearchTree class holds a reference to the TreeNode that is the root of the tree. In most cases the public methods of the outer class simply check whether the tree is empty; if the tree does contain nodes, the request is passed on to a private method of BinarySearchTree that takes the root as a parameter. When the tree is empty, or when we want to delete the key at the root, we must take special action.

The code for the constructor of the BinarySearchTree class and a few other miscellaneous methods is as follows:


class BinarySearchTree:
    def __init__(self):
        self.root = None
        self.size = 0

    def length(self):
        return self.size

    def __len__(self):
        return self.length()

    def __iter__(self):
        # An empty tree has no root node, so hand back an empty iterator.
        return iter(self.root) if self.root else iter(())

The TreeNode class provides a number of helper methods that make the methods of the BinarySearchTree class easier to implement. As you can see, many of these helpers classify a node by its position (as the left or right child of its parent) and by the kind of children it has. The TreeNode class also explicitly keeps track of each node's parent as an attribute. You will see why this is important when we discuss the implementation of the delete operation.

Another interesting aspect of the TreeNode implementation is its use of Python's optional parameters. Optional parameters make it easy to create a tree node under several different circumstances. Sometimes we want to construct a new TreeNode that already has both a parent and a child; in that case we pass the existing parent and child as arguments. At other times we just create a TreeNode with a key-value pair and pass no parent or child arguments, so the default values of the optional parameters are used.
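
As a small illustration of the two situations, here is a sketch that uses a cut-down node class with the same constructor signature as the TreeNode class of this article. The keys and values are arbitrary sample data.

```python
# Cut-down stand-in with the same constructor signature as TreeNode;
# the helper methods are omitted for brevity.
class TreeNode:
    def __init__(self, key, value, left=None, right=None, parent=None):
        self.key = key
        self.value = value
        self.left_child = left
        self.right_child = right
        self.parent = parent


# Only a key-value pair: every optional parameter keeps its default of None.
root = TreeNode(70, "seventy")

# A new node that already knows its parent, passed as a keyword argument.
child = TreeNode(31, "thirty-one", parent=root)
root.left_child = child

print(root.parent)         # None
print(child.parent.key)    # 70
print(child.left_child)    # None
```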


class TreeNode:
    def __init__(self, key, value, left=None, right=None, parent=None):
        self.key = key
        self.value = value
        self.left_child = left
        self.right_child = right
        self.parent = parent

    def has_left_child(self):
        return self.left_child

    def has_right_child(self):
        return self.right_child

    def is_left_child(self):
        # True when this node is the left child of its parent.
        return self.parent and self.parent.left_child == self

    def is_right_child(self):
        # True when this node is the right child of its parent.
        return self.parent and self.parent.right_child == self

    def is_leaf(self):
        return not (self.left_child or self.right_child)

    def is_root(self):
        return not self.parent

    def has_any_child(self):
        return self.left_child or self.right_child

    def has_both_child(self):
        return self.left_child and self.right_child

    def replace_data(self, key, value, lc, rc):
        self.key = key
        self.value = value
        self.left_child = lc
        self.right_child = rc

        if self.has_left_child():
            self.left_child.parent = self

        if self.has_right_child():
            self.right_child.parent = self

Insert operation of binary search tree

Now that we have the BinarySearchTree shell and the TreeNode class, it is time to write the put method that lets us build a binary search tree. put is a method of the BinarySearchTree class. It checks whether the tree already has a root. If not, put creates a new TreeNode and installs it as the root of the tree. If a root node is already in place, put calls the private, recursive helper function _put, which searches the tree as follows:

  • Starting at the root of the tree, search the binary tree, comparing the new key with the key in the current node. If the new key is less than the current node's key, search the left subtree. If the new key is greater than the current node's key, search the right subtree.
  • When there is no left (or right) child to search, we have found the position in the tree where the new node should be installed.
  • To add the node to the tree, create a new TreeNode object and insert it at the position found in the previous step.

The _put function below is written recursively following the steps above. Notice that when a new child is inserted into the tree, the current node is passed to the new TreeNode as its parent.


    def put(self, key, value):
        if self.root:
            self._put(key, value, self.root)
        else:
            self.root = TreeNode(key, value)

        self.size = self.size + 1

    def _put(self, key, value, current_node):
        if key < current_node.key:
            if current_node.has_left_child():
                self._put(key, value, current_node.left_child)
            else:
                current_node.left_child = TreeNode(key, value, parent=current_node)
        else:
            if current_node.has_right_child():
                self._put(key, value, current_node.right_child)
            else:
                current_node.right_child = TreeNode(key, value, parent=current_node)
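
For comparison, the same search-then-attach steps can also be written as a loop instead of a recursion. The following is a sketch under stated assumptions: Node and put_iterative are hypothetical names, and the keys are arbitrary sample data.

```python
# Hypothetical minimal node with a parent reference, for illustration only.
class Node:
    def __init__(self, key, value, parent=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.parent = parent


def put_iterative(root, key, value):
    """Insert (key, value) below root and return the root of the tree."""
    if root is None:
        return Node(key, value)
    current = root
    while True:
        if key < current.key:
            if current.left is None:
                # Empty left slot found: attach here with current as parent.
                current.left = Node(key, value, parent=current)
                return root
            current = current.left
        else:
            if current.right is None:
                # Empty right slot found: attach here with current as parent.
                current.right = Node(key, value, parent=current)
                return root
            current = current.right


root = None
for k in [17, 5, 35, 2, 11, 29, 38]:
    root = put_iterative(root, k, str(k))
root = put_iterative(root, 19, "19")

node = root.right.left.left          # path taken: 17 -> 35 -> 29 -> 19
print(node.key, node.parent.key)     # 19 29
```

The loop visits the same nodes as the recursive version; it simply avoids growing the call stack on deep trees.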

With the put method in place, we can easily overload the [] operator for assignment by having the __setitem__ method call put. This allows us to write Python statements like myZipTree['Plymouth'] = 55446, just as with a Python dictionary.


    def __setitem__(self, key, value):
        self.put(key, value)

Figure 2 illustrates the process of inserting a new node into a binary search tree. The gray nodes show the nodes that are visited, in order, during the insertion.

Figure 2: inserting a node with key value = 19

Search operation of binary search tree

Once the tree is constructed, the next task is to retrieve the value for a given key. The get method is even easier than put because it simply searches the tree recursively until it either reaches a leaf without finding a match or finds a matching key. When a matching key is found, the value stored in that node is returned.

The code for get, _get and __getitem__ is shown below. The search code in the _get method uses the same logic for choosing the left or right child as the _put method. Notice that _get returns a TreeNode to get, which allows _get to serve as a flexible helper for other BinarySearchTree methods that may need other data from the TreeNode besides the value.

By implementing the __getitem__ method we can write a Python statement that looks just like accessing a dictionary, when in fact we are using a binary search tree, for example z = myZipTree['Fargo']. As you can see, all __getitem__ does is call get.


    def get(self, key):
        if self.root:
            res = self._get(key, self.root)

            if res:
                return res.value
            else:
                return None

        else:
            return None

    def _get(self, key, current_node):
        if not current_node:
            return None
        elif current_node.key == key:
            return current_node
        elif key < current_node.key:
            return self._get(key, current_node.left_child)
        else:
            return self._get(key, current_node.right_child)

    def __getitem__(self, key):
        return self.get(key)
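
The left-or-right decision made during a lookup can also be expressed as a loop. This is a minimal sketch with a hypothetical Node class and find function, not the article's _get method; like _get, it returns the node rather than the value.

```python
# Hypothetical minimal node type for this illustration only.
class Node:
    def __init__(self, key, value, left=None, right=None):
        self.key = key
        self.value = value
        self.left = left
        self.right = right


def find(root, key):
    """Return the node holding key, or None if the key is not in the tree."""
    current = root
    while current is not None:
        if key == current.key:
            return current
        # Smaller keys live in the left subtree, larger keys in the right.
        current = current.left if key < current.key else current.right
    return None


root = Node(70, "a", Node(31, "b", Node(14, "c")), Node(93, "d"))

print(find(root, 14).value)  # c
print(find(root, 50))        # None
```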

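Using the lookup code, we can implement the in operator by writing a __contains__ method for the BinarySearchTree. The __contains__ method simply looks the key up with _get and returns True if a node is found, or False otherwise. Testing for the node rather than for the value means that keys whose stored value is falsy (such as 0) are still reported as present.


    def __contains__(self, key):
        return self._get(key, self.root) is not None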
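
To see how the three protocol methods work together, here is a stripped-down sketch. TinyTreeMap and its _find helper are hypothetical and much simpler than the BinarySearchTree class of this article; the point is only to show __setitem__, __getitem__ and __contains__ in use. One subtlety it illustrates: a membership test based on the truthiness of the stored value would misreport keys whose value is 0 or an empty string, so the sketch tests whether a node was found.

```python
class TinyTreeMap:
    """Hypothetical, stripped-down map used only to illustrate the protocol methods."""

    class _Node:
        def __init__(self, key, value):
            self.key = key
            self.value = value
            self.left = None
            self.right = None

    def __init__(self):
        self._root = None

    def _find(self, key):
        # Walk down from the root until the key is found or we fall off the tree.
        node = self._root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def __setitem__(self, key, value):
        new = self._Node(key, value)
        if self._root is None:
            self._root = new
            return
        node = self._root
        while True:
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, new)
                return
            node = child

    def __getitem__(self, key):
        node = self._find(key)
        return None if node is None else node.value

    def __contains__(self, key):
        # Test for the node, not for the value, so falsy values still count.
        return self._find(key) is not None


m = TinyTreeMap()
m["Plymouth"] = 55446
m["Fargo"] = 0            # a falsy value

print(m["Plymouth"])      # 55446
print("Fargo" in m)       # True
print("Boston" in m)      # False
```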

Delete operation of binary search tree

Finally, we turn our attention to the most challenging method in the binary search tree: deleting a key. The first task is to find the node to delete by searching the tree. If the tree has more than one node, we use the _get method to find the TreeNode that needs to be removed. If the tree has only a single node, we are removing the root of the tree, but we must still check that the key of the root matches the key to be deleted. In either case, if the key is not found, the delete method raises an error.


    def delete(self, key):
        if self.size > 1:
            # _get returns the TreeNode itself (get would return only its value).
            node_to_remove = self._get(key, self.root)

            if node_to_remove:
                self._remove(node_to_remove)
                self.size = self.size - 1
            else:
                raise KeyError('Error! The key is not on the tree.')
        # elif: the single-node case must not run after a successful removal above.
        elif self.size == 1 and self.root.key == key:
            self.root = None
            self.size = self.size - 1
        else:
            raise KeyError('Error! The key is not on the tree.')

Once we have found the node containing the key we want to delete, there are three cases to consider:

  1. The node to be deleted has no children (see Figure 3)
  2. The node to be deleted has only one child (see Figure 4)
  3. The node to be deleted has two children (see Figure 5)

The first case is the simplest. If the current node has no children, all we need to do is delete the node and remove the reference to it in its parent. The code for this case is shown below.

Figure 3: delete the node whose key value is 16. This node has no children


        if current_node.is_leaf():
            if current_node == current_node.parent.left_child:
                current_node.parent.left_child = None
            else:
                current_node.parent.right_child = None

The second case is only slightly more complicated. If a node has only a single child, we can simply promote the child to take the place of its parent. The code for this case is shown below. As you look at the code you will see that there are six cases to consider. Since the cases are symmetric with respect to having either a left or a right child, we discuss only the case where the current node has a left child. The decision proceeds as follows:

  1. If the current node is a left child, we only need to update the parent reference of its left child to point to the parent of the current node, and then update the left child reference of the parent to point to the current node's left child.
  2. If the current node is a right child, we only need to update the parent reference of its left child to point to the parent of the current node, and then update the right child reference of the parent to point to the current node's left child.
  3. If the current node has no parent, it must be the root. In this case we simply replace the key, value, left_child and right_child data of the root with those of its child by calling the replace_data method.

Figure 4: delete the node with key value 25, which has only one child node


        else:  # the node has exactly one child
            if current_node.has_left_child():
                if current_node.is_left_child():
                    current_node.left_child.parent = current_node.parent
                    current_node.parent.left_child = current_node.left_child
                elif current_node.is_right_child():
                    current_node.left_child.parent = current_node.parent
                    current_node.parent.right_child = current_node.left_child
                else:
                    # Root with only a left child: copy the child's data into the root.
                    current_node.replace_data(current_node.left_child.key, current_node.left_child.value, current_node.left_child.left_child, current_node.left_child.right_child)
            else:
                if current_node.is_left_child():
                    current_node.right_child.parent = current_node.parent
                    current_node.parent.left_child = current_node.right_child
                elif current_node.is_right_child():
                    current_node.right_child.parent = current_node.parent
                    current_node.parent.right_child = current_node.right_child
                else:
                    # Root with only a right child: copy the child's data into the root.
                    current_node.replace_data(current_node.right_child.key, current_node.right_child.value, current_node.right_child.left_child, current_node.right_child.right_child)

The third case is the most difficult to handle. If a node has two children, it is unlikely that we can simply promote one of them to take the node's place. We can, however, search the tree for another node that can replace the one scheduled for deletion. What we need is a node that preserves the binary search tree relationships for both the existing left and right subtrees. The node that does this is the one with the next-largest key in the tree, which we call the successor; we will look at how to find it shortly. The successor is guaranteed to have no more than one child, so we already know how to remove it using the two cases implemented above. Once the successor has been removed, we simply put it in the tree in place of the node to be deleted.

Figure 5: delete the node with key value 5, which has two child nodes

The code to handle the third case is shown below. Notice that we make use of the helper methods find_successor and find_min to find the successor. To remove the successor we use the method splice_out. The reason we use splice_out is that it goes directly to the node we want to remove and makes the right changes.


        elif current_node.has_both_child():
            succ = current_node.find_successor()
            succ.splice_out()
            current_node.key = succ.key
            current_node.value = succ.value
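
All three cases can be seen side by side in a compact recursive formulation. Note that this is a different technique from the one used in this article: the sketch below has no parent references and instead returns the new root of each subtree. Node, delete and inorder are hypothetical names, and the keys are arbitrary sample data.

```python
# Hypothetical minimal node type without parent references.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def delete(node, key):
    """Return the root of the subtree after removing key (if it is present)."""
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        # Cases 1 and 2: zero or one child, so promote the other side (or None).
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Case 3: two children. Copy the successor's key into this node,
        # then remove the successor from the right subtree.
        succ = node.right
        while succ.left is not None:
            succ = succ.left
        node.key = succ.key
        node.right = delete(node.right, succ.key)
    return node


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


root = Node(17,
            Node(5, Node(2), Node(11, Node(9))),
            Node(35, Node(29), Node(38)))

root = delete(root, 11)   # one child: 9 is promoted
root = delete(root, 5)    # two children: replaced by its successor, 9
root = delete(root, 38)   # leaf: simply removed

print(inorder(root))      # [2, 9, 17, 29, 35]
print(root.left.key)      # 9
```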

The code to find the successor is shown below; as you can see, it is a method of the TreeNode class. This code relies on the same property of binary search trees that causes an inorder traversal to print out the nodes in the tree from smallest to largest. There are three cases to consider when looking for the successor:

  • If the node has a right child, then the successor is the node with the smallest key in the right subtree.
  • If the node has no right child and is the left child of its parent, then the parent is the successor.
  • If the node is the right child of its parent and itself has no right child, then the successor of this node is the successor of its parent, excluding this node.

Only the first case matters to us when deleting a node from a binary search tree, because the node being deleted has two children and therefore always has a right child.


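    def find_successor(self):
        succ = None
        if self.has_right_child():
            # The successor is the smallest key in the right subtree.
            succ = self.right_child.find_min()
        else:
            if self.parent:
                if self.is_left_child():
                    succ = self.parent
                else:
                    # Temporarily detach this node so the parent's search skips it.
                    self.parent.right_child = None
                    succ = self.parent.find_successor()
                    self.parent.right_child = self

        return succ

    def find_min(self):
        current = self

        # Follow left children from the current node until there are none left.
        while current.has_left_child():
            current = current.left_child

        return current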

    def splice_out(self):
        if self.is_leaf():
            if self.is_left_child():
                self.parent.left_child = None
            else:
                self.parent.right_child = None
        elif self.has_any_child():
            if self.has_left_child():
                if self.is_left_child():
                    self.parent.left_child = self.left_child
                else:
                    self.parent.right_child = self.left_child

                self.left_child.parent = self.parent
            else:
                # Only a right child: hook it into the slot this node occupied.
                if self.is_left_child():
                    self.parent.left_child = self.right_child
                else:
                    self.parent.right_child = self.right_child

                self.right_child.parent = self.parent
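
The three successor cases can be tried out on a small tree. The sketch below is a hypothetical standalone version (Node, attach, minimum and successor are illustrative names). Instead of the recursive third case, it climbs the parent references in a loop until it arrives at an ancestor from that ancestor's left side, which covers the second and third cases together.

```python
# Hypothetical minimal node with a parent reference, for illustration only.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None


def attach(parent, child, side):
    """Hook child under parent on the given side ('left' or 'right')."""
    setattr(parent, side, child)
    child.parent = parent
    return child


def minimum(node):
    while node.left is not None:
        node = node.left
    return node


def successor(node):
    """Return the node with the next-largest key, or None if there is none."""
    if node.right is not None:
        return minimum(node.right)           # case 1: smallest key on the right
    parent = node.parent
    while parent is not None and node is parent.right:
        node, parent = parent, parent.parent  # case 3: keep climbing
    return parent                             # case 2: arrived from a left child


root = Node(17)
n5 = attach(root, Node(5), "left")
n35 = attach(root, Node(35), "right")
n2 = attach(n5, Node(2), "left")
n11 = attach(n5, Node(11), "right")
n29 = attach(n35, Node(29), "left")

print(successor(root).key)  # 29
print(successor(n2).key)    # 5
print(successor(n11).key)   # 17
print(successor(n35))       # None
```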

Traversal operation of binary search tree

We need to look at one last interface method for the binary search tree. Suppose that we would like to simply iterate over all the keys in the tree in order. This is definitely something we have done with dictionaries, so why not with trees? We already know how to traverse a binary tree in order using the inorder traversal algorithm. However, writing an iterator requires a bit more work, since an iterator should return only one node each time it is called.

Python provides a very powerful feature for creating iterators: the yield keyword. yield is similar to return in that it returns a value to the caller. However, yield also takes the additional step of freezing the state of the function so that the next time a value is requested, the function continues executing from the point where it left off. Functions that create iterable objects in this way are called generator functions.
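
A tiny generator, unrelated to trees, shows this pause-and-resume behaviour. The countdown function is a hypothetical example.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, pausing after each value."""
    while n > 0:
        yield n      # hand one value to the caller and freeze here
        n -= 1       # execution resumes on this line at the next request


gen = countdown(3)
print(next(gen))   # 3
print(next(gen))   # 2
print(list(gen))   # [1]
```

Each call to next runs the function body only until the following yield, so the local variable n keeps its value between requests.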

The code for an inorder iterator of a binary tree is shown below. Look at the code carefully: at first glance you might think that it is not recursive. However, remember that __iter__ overrides the for x in operation for iteration, so it really is recursive! Because it recurses over TreeNode instances, the __iter__ method is defined in the TreeNode class.


    def __iter__(self):
        if self:
            if self.has_left_child():
                for elem in self.left_child:
                    yield elem

            yield self.key

            if self.has_right_child():
                for elem in self.right_child:
                    yield elem
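
The same recursion can be written more compactly with yield from, which delegates to the iterator of the child node. This is a sketch on a hypothetical minimal Node class rather than the TreeNode class above, using the keys from Figure 1.

```python
# Hypothetical minimal node type for this illustration only.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

    def __iter__(self):
        # Inorder traversal: left subtree, then this key, then right subtree.
        if self.left is not None:
            yield from self.left     # implicitly calls self.left.__iter__()
        yield self.key
        if self.right is not None:
            yield from self.right


root = Node(70,
            Node(31, Node(14, None, Node(23))),
            Node(93, Node(73), Node(94)))

print(list(root))   # [14, 23, 31, 70, 73, 93, 94]
```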

Finally, a complete implementation code of binary search tree is provided:


class BinarySearchTree:
    def __init__(self):
        self.root = None
        self.size = 0

    def length(self):
        return self.size

    def __len__(self):
        return self.length()

    def __iter__(self):
        # An empty tree has no root node, so hand back an empty iterator.
        return iter(self.root) if self.root else iter(())

    def put(self, key, value):
        if self.root:
            self._put(key, value, self.root)
        else:
            self.root = TreeNode(key, value)

        self.size = self.size + 1

    def _put(self, key, value, current_node):
        if key < current_node.key:
            if current_node.has_left_child():
                self._put(key, value, current_node.left_child)
            else:
                current_node.left_child = TreeNode(key, value, parent=current_node)
        else:
            if current_node.has_right_child():
                self._put(key, value, current_node.right_child)
            else:
                current_node.right_child = TreeNode(key, value, parent=current_node)

    def __setitem__(self, key, value):
        self.put(key, value)

    def get(self, key):
        if self.root:
            res = self._get(key, self.root)

            if res:
                return res.value
            else:
                return None

        else:
            return None

    def _get(self, key, current_node):
        if not current_node:
            return None
        elif current_node.key == key:
            return current_node
        elif key < current_node.key:
            return self._get(key, current_node.left_child)
        else:
            return self._get(key, current_node.right_child)

    def __getitem__(self, key):
        return self.get(key)

    def __contains__(self, key):
        # Test for the node, not the value, so falsy values still count as present.
        return self._get(key, self.root) is not None

    def delete(self, key):
        if self.size > 1:
            # _get returns the TreeNode itself (get would return only its value).
            node_to_remove = self._get(key, self.root)

            if node_to_remove:
                self._remove(node_to_remove)
                self.size = self.size - 1
            else:
                raise KeyError('Error! The key is not on the tree.')
        # elif: the single-node case must not run after a successful removal above.
        elif self.size == 1 and self.root.key == key:
            self.root = None
            self.size = self.size - 1
        else:
            raise KeyError('Error! The key is not on the tree.')

    def _remove(self, current_node):
        if current_node.is_leaf():
            if current_node == current_node.parent.left_child:
                current_node.parent.left_child = None
            else:
                current_node.parent.right_child = None
        elif current_node.has_both_child():
            succ = current_node.find_successor()
            succ.splice_out()
            current_node.key = succ.key
            current_node.value = succ.value
        else:  # the node has exactly one child
            if current_node.has_left_child():
                if current_node.is_left_child():
                    current_node.left_child.parent = current_node.parent
                    current_node.parent.left_child = current_node.left_child
                elif current_node.is_right_child():
                    current_node.left_child.parent = current_node.parent
                    current_node.parent.right_child = current_node.left_child
                else:
                    # Root with only a left child: copy the child's data into the root.
                    current_node.replace_data(current_node.left_child.key, current_node.left_child.value, current_node.left_child.left_child, current_node.left_child.right_child)
            else:
                if current_node.is_left_child():
                    current_node.right_child.parent = current_node.parent
                    current_node.parent.left_child = current_node.right_child
                elif current_node.is_right_child():
                    current_node.right_child.parent = current_node.parent
                    current_node.parent.right_child = current_node.right_child
                else:
                    # Root with only a right child: copy the child's data into the root.
                    current_node.replace_data(current_node.right_child.key, current_node.right_child.value, current_node.right_child.left_child, current_node.right_child.right_child)


class TreeNode:
    def __init__(self, key, value, left=None, right=None, parent=None):
        self.key = key
        self.value = value
        self.left_child = left
        self.right_child = right
        self.parent = parent

    def has_left_child(self):
        return self.left_child

    def has_right_child(self):
        return self.right_child

    def is_left_child(self):
        # True when this node is the left child of its parent.
        return self.parent and self.parent.left_child == self

    def is_right_child(self):
        # True when this node is the right child of its parent.
        return self.parent and self.parent.right_child == self

    def is_leaf(self):
        return not (self.left_child or self.right_child)

    def is_root(self):
        return not self.parent

    def has_any_child(self):
        return self.left_child or self.right_child

    def has_both_child(self):
        return self.left_child and self.right_child

    def replace_data(self, key, value, lc, rc):
        self.key = key
        self.value = value
        self.left_child = lc
        self.right_child = rc

        if self.has_left_child():
            self.left_child.parent = self

        if self.has_right_child():
            self.right_child.parent = self

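    def find_successor(self):
        succ = None
        if self.has_right_child():
            # The successor is the smallest key in the right subtree.
            succ = self.right_child.find_min()
        else:
            if self.parent:
                if self.is_left_child():
                    succ = self.parent
                else:
                    # Temporarily detach this node so the parent's search skips it.
                    self.parent.right_child = None
                    succ = self.parent.find_successor()
                    self.parent.right_child = self

        return succ

    def find_min(self):
        current = self

        # Follow left children from the current node until there are none left.
        while current.has_left_child():
            current = current.left_child

        return current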

    def splice_out(self):
        if self.is_leaf():
            if self.is_left_child():
                self.parent.left_child = None
            else:
                self.parent.right_child = None
        elif self.has_any_child():
            if self.has_left_child():
                if self.is_left_child():
                    self.parent.left_child = self.left_child
                else:
                    self.parent.right_child = self.left_child

                self.left_child.parent = self.parent
            else:
                # Only a right child: hook it into the slot this node occupied.
                if self.is_left_child():
                    self.parent.left_child = self.right_child
                else:
                    self.parent.right_child = self.right_child

                self.right_child.parent = self.parent

    def __iter__(self):
        if self:
            if self.has_left_child():
                for elem in self.left_child:
                    yield elem

            yield self.key

            if self.has_right_child():
                for elem in self.right_child:
                    yield elem


Posted on Fri, 12 Jun 2020 02:29:35 -0400 by WebbieDave