Java implementation of a red-black tree (a balanced binary search tree)

Preface

Before implementing the red-black tree, let's take a look at symbol tables.

The description of the symbol table follows Algorithms, 4th edition, by Robert Sedgewick and Kevin Wayne. For details, see https://algs4.cs.princeton.edu/home/

A symbol table is sometimes called a dictionary: just as an English dictionary pairs each word with its definition, a symbol table pairs each key with a value. It is also sometimes called an index, like the part of a book that lists terms in alphabetical order so they are easy to look up. In general, a symbol table associates a key with a value, as in Python's dict, Java's HashMap and Hashtable, and the key-value storage of Redis.

In today's big data era, symbol tables are used constantly, and searching and inserting quickly in a symbol table that holds massive amounts of data requires efficient algorithms. Without the invention of these algorithms, the information age would not have been possible.

Since we implement the symbol table with a data structure, we first need to define the symbol table's API, that is, the operations it supports. As mentioned above, the point of a symbol table is to find and insert data quickly among massive amounts of data, so the API provides the four basic operations: add, delete, update and query (put() covers both adding and updating).

/**
 * <p>
 *     Basic API for symbol table
 * </p>
 * @author qzlzzz
 * @version 1.0
 * @since 2021/10/8
 */
public interface SymbolTable<Key extends Comparable<Key>,Value> {

    /**
     * Returns the Value associated with the Key, or null if the Key is not in the symbol table
     * @param key the key
     * @return the value of key
     */
    Value get(Key key);

    /**
     * Inserts the key-value pair. If the Key is already in the symbol table, its Value is replaced with the given Value
     * @param key the key
     * @param value the value
     */
    void put(Key key,Value value);

    /**
     * Removes the Key (and its Value) from the symbol table
     * @param key the key
     */
    void delete(Key key);

}
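As a point of comparison, the JDK already ships an ordered symbol table: java.util.TreeMap, which its documentation describes as a red-black tree based NavigableMap. The sketch below only illustrates the get/put/delete semantics of the interface above on TreeMap (where delete is called remove()); the class name TreeMapSymbolTableDemo is hypothetical.

```java
import java.util.TreeMap;

// Hypothetical demo: the symbol-table operations get/put/delete on java.util.TreeMap,
// which the JDK documents as a red-black tree based map.
class TreeMapSymbolTableDemo {
    static TreeMap<String, Integer> build() {
        TreeMap<String, Integer> st = new TreeMap<>();
        st.put("apple", 1);      // put: insert a new key
        st.put("banana", 2);
        st.put("apple", 3);      // put on an existing key replaces its value
        st.remove("banana");     // delete: TreeMap calls this remove()
        return st;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> st = build();
        System.out.println(st.get("apple"));   // 3
        System.out.println(st.get("banana"));  // null: the key was deleted
    }
}
```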

Because a red-black tree is a balanced binary search tree, it is both balanced and ordered. Since the keys are kept in order, we can look up keys by range or by position, and we can also find the smallest and largest keys in the tree.

We will not go into what balance means just yet.

Therefore, we can add the following ordered operations to the API:

    /**
     * Returns the key at position k, that is, the key with exactly k smaller keys; returns null if there is no such key
     * @param k the index of the key
     * @return the key
     */
    Key select(int k);

    /**
     * Returns the smallest key in the red black tree
     * @return the min key in this tree
     */
    Key min();

    /**
     * Returns the largest key in the red black tree
     * @return the max key in this tree
     */
    Key max();

    /**
     * Returns the number of keys smaller than the given key
     * @param key the key
     * @return the number of keys in the tree smaller than key
     */
    int rank(Key key);
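To make the meaning of rank() and select() concrete, here is a small sketch that computes both on a sorted array of distinct keys instead of a tree (the class RankSelectSemantics and its int keys are hypothetical). rank(key) counts the keys strictly smaller than key, even when key itself is absent; select(k) returns the key with exactly k smaller keys, so min() is select(0) and max() is select(n - 1).

```java
import java.util.Arrays;

// Hypothetical helper: shows what rank() and select() return,
// using a sorted array of distinct keys instead of a tree.
class RankSelectSemantics {
    // rank(key): how many keys are strictly smaller than key
    static int rank(int[] sorted, int key) {
        int lo = 0, hi = sorted.length;          // binary search for the first index with sorted[i] >= key
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // select(k): the key with exactly k smaller keys, or null if k is out of range
    static Integer select(int[] sorted, int k) {
        if (k < 0 || k >= sorted.length) return null;
        return sorted[k];
    }

    public static void main(String[] args) {
        int[] keys = {30, 10, 50, 20, 40};
        Arrays.sort(keys);                                   // {10, 20, 30, 40, 50}
        System.out.println(rank(keys, 30));                  // 2 (10 and 20 are smaller)
        System.out.println(rank(keys, 35));                  // 3 (the key need not be present)
        System.out.println(select(keys, 2));                 // 30
        System.out.println(select(keys, 9));                 // null
        System.out.println(select(keys, rank(keys, 40)));    // 40: select undoes rank
    }
}
```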

Next, let's talk about red-black trees.

Red-black binary search tree

A red-black binary search tree is really a 2-3 tree encoded as a standard binary search tree: every red-black tree corresponds to a 2-3 tree. So before studying red-black binary search trees, we need to understand the principle and structure of 2-3 trees.

2-3 tree

We call a node with one key and two links a 2-node. Every node of a standard binary search tree is a 2-node. When keys arrive in random order, a standard binary search tree usually has a height proportional to the logarithm of the number of keys, so its search and insertion operations take logarithmic time. However, this good performance depends on the keys being inserted in a sufficiently random order, which keeps long paths from forming. We cannot guarantee that insertions are random. If key-value pairs are inserted in sorted order, the following problem occurs:

As the figure shows, if we insert A, B, C, D and E in order, we get a binary search tree whose height is proportional to the number of keys, so insertion and search degrade from logarithmic time to O(N).
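The effect is easy to reproduce with a plain binary search tree that does no balancing at all. In this minimal sketch (the class DegenerateBstDemo and its int keys are hypothetical; 1 to 7 stand in for the letters), inserting keys in sorted order sends every key to the right, so the height equals the number of keys, while a friendlier order of the same keys yields a perfectly balanced tree.

```java
// Hypothetical minimal BST with no balancing, used only to measure tree height.
class DegenerateBstDemo {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    void put(int key) { root = put(root, key); }

    private Node put(Node x, int key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        return x;                                  // duplicate keys are ignored
    }

    // Height counted in nodes: empty tree = 0, single node = 1
    private int height(Node x) {
        return x == null ? 0 : 1 + Math.max(height(x.left), height(x.right));
    }

    static int heightAfterInserting(int... keys) {
        DegenerateBstDemo t = new DegenerateBstDemo();
        for (int k : keys) t.put(k);
        return t.height(t.root);
    }

    public static void main(String[] args) {
        // Sorted order: every key goes to the right, so the tree is a linked list
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6, 7));  // 7
        // The same keys in a friendlier order give a perfectly balanced tree
        System.out.println(heightAfterInserting(4, 2, 6, 1, 3, 5, 7));  // 3
    }
}
```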

Of course, what we want is a data structure whose height stays logarithmic in the total number of keys no matter which key-value pairs are inserted, so that insertion, search and similar operations complete in logarithmic time. In particular, even with sorted insertion we want the tree height to stay ~lg N, so that every search finishes within ~lg N comparisons.

To guarantee that the search tree stays balanced, we need some flexibility, so we allow a node to hold more than one key. We introduce 3-nodes: a 3-node has two keys and three links. Its left link leads to keys smaller than both keys, its middle link to keys between them, and its right link to keys larger than both.

So a 2-3 search tree is either empty or made up of 2-nodes and 3-nodes. Before describing the 2-3 tree operations, we insert A, B, C, D, E, F, G and H in order, as shown in the following figure:

The figure shows the balance and flexibility of the 2-3 tree: whatever the insertion order, the tree height stays logarithmic in the total number of keys.

2-3 tree insertion

Understanding insertion in a 2-3 tree makes the construction of the red-black tree much easier to follow. There are three cases:

  1. Inserting a new key where the bottom node is a 2-node
  2. Inserting a new key where the bottom node is a 3-node and its parent is a 2-node
  3. Inserting a new key where the bottom node is a 3-node and its parent is also a 3-node

The first case

If a new key is inserted and the bottom node is a 2-node, that node simply becomes a 3-node that holds both its original key and the new one.

The second case

If a new key is inserted into a bottom node that is a 3-node, the node first becomes a temporary 4-node (3 keys and 4 links). Its middle key then moves up into the parent, turning the parent from a 2-node into a 3-node, and the keys on either side of the middle key become two separate 2-nodes. The parent's original link to the child is replaced by the two links on either side of the promoted key, which point to the two new 2-nodes.

The third case

If a new key is inserted into a bottom node that is a 3-node and its parent is also a 3-node, the bottom node again becomes a temporary 4-node, passes its middle key up, and splits into two 2-nodes, with the parent's original link replaced by the two links on either side of the promoted key. This time the parent turns from a 3-node into a temporary 4-node, so it in turn passes its middle key up to its own parent. The process repeats up the tree for as long as the parent is a 3-node. If it reaches the root and the root is also a 3-node, the root becomes a temporary 4-node and splits: its middle key becomes the new root and its two outer keys become two 2-nodes. The height of the whole tree increases by 1, but every bottom node stays the same distance from the root.

These three transformations are the heart of how a 2-3 tree changes dynamically. They work from the bottom up and are local: each split touches only a node and its parent, and none of them affects the order or the balance of the 2-3 tree.
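The three insertion cases can be sketched directly in code. The class below is a hypothetical, illustration-only 2-3 tree of int keys (names such as TwoThreeTree, split() and slot() are assumptions, not from the article): a node holds one key (2-node) or two keys (3-node), briefly holds three keys (temporary 4-node) after an insertion, and is then split, with the middle key moving into the parent. Splitting the root is the only step that makes the tree taller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 2-3 tree of int keys, for illustration only.
class TwoThreeTree {
    // 1 key = 2-node, 2 keys = 3-node, 3 keys = temporary 4-node during insertion
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> kids = new ArrayList<>();   // empty in a bottom node, else keys.size() + 1 links
        boolean isLeaf() { return kids.isEmpty(); }
    }

    private Node root;

    // Index of the first key >= key, i.e. which link to follow
    private static int slot(Node x, int key) {
        int i = 0;
        while (i < x.keys.size() && key > x.keys.get(i)) i++;
        return i;
    }

    boolean contains(int key) {
        Node x = root;
        while (x != null) {
            int i = slot(x, key);
            if (i < x.keys.size() && key == x.keys.get(i)) return true;
            x = x.isLeaf() ? null : x.kids.get(i);
        }
        return false;
    }

    void put(int key) {
        if (root == null) { root = new Node(); root.keys.add(key); return; }
        insert(root, key);
        if (root.keys.size() == 3) {      // the root became a temporary 4-node:
            Node newRoot = new Node();    // split it, and the tree grows one level taller
            newRoot.kids.add(root);
            split(newRoot, 0);
            root = newRoot;
        }
    }

    // Inserts key below x; afterwards x may hold 3 keys, which its caller must split
    private void insert(Node x, int key) {
        int i = slot(x, key);
        if (i < x.keys.size() && key == x.keys.get(i)) return;  // key already present
        if (x.isLeaf()) { x.keys.add(i, key); return; }         // 2-node -> 3-node, or 3-node -> temporary 4-node
        Node child = x.kids.get(i);
        insert(child, key);
        if (child.keys.size() == 3) split(x, i);                // pass the middle key up to x
    }

    // Splits the temporary 4-node parent.kids[i]: its middle key moves into parent,
    // and its outer keys become two 2-nodes that take over its four links
    private static void split(Node parent, int i) {
        Node full = parent.kids.get(i);
        Node left = new Node(), right = new Node();
        left.keys.add(full.keys.get(0));
        right.keys.add(full.keys.get(2));
        if (!full.isLeaf()) {
            left.kids.addAll(full.kids.subList(0, 2));
            right.kids.addAll(full.kids.subList(2, 4));
        }
        parent.keys.add(i, full.keys.get(1));
        parent.kids.set(i, left);
        parent.kids.add(i + 1, right);
    }

    // Number of levels if every bottom node is at the same depth, else -1
    int height() { return depth(root); }

    boolean perfectlyBalanced() { return height() >= 0; }

    private static int depth(Node x) {
        if (x == null) return 0;
        if (x.isLeaf()) return 1;
        int d = depth(x.kids.get(0));
        for (Node k : x.kids) if (d < 0 || depth(k) != d) return -1;
        return d + 1;
    }

    public static void main(String[] args) {
        TwoThreeTree t = new TwoThreeTree();
        for (int k = 1; k <= 8; k++) t.put(k);   // like inserting A..H in order
        System.out.println(t.perfectlyBalanced() + " " + t.height());  // true 3
    }
}
```

Even this stripped-down version needs variable-size nodes and explicit list surgery in split(), which hints at why a direct 2-3 tree implementation is considered cumbersome.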

At the same time, implementing a 2-3 tree directly in code is quite troublesome, because there are many cases to handle. We need to maintain two different kinds of nodes, compare the search key against each key in a node, and copy links and other information from one node to another. This takes a lot of code, and the extra overhead could even make it slower than a standard binary search tree. This is why people came up with the red-black tree, a data structure that represents a 2-3 tree with a standard binary search tree.

Implementation of the red-black binary search tree

A red-black tree is built on a standard binary search tree. The key idea for encoding a 2-3 tree is to divide the links of the binary tree into red and black links: two nodes joined by a red link together represent a 3-node, while black links are the ordinary links between 2-3 tree nodes. This means we do not need to rewrite get() for a red-black tree at all; the get() of a standard binary search tree works unchanged. The only difference is in put(), which we modify to obtain a red-black binary search tree. Only a few lines of code change, but the idea behind them is quite subtle. For reasons of space, we will not describe in detail how each of the three 2-3 tree transformations is carried out.

First, define the node:

/**
 * <h3>
 *     Implementation of red black tree, blog: https://www.cnblogs.com/qzlzzz/p/15395010.html
 * </h3>
 * @author qzlzzz
 * @since 2021/10/12
 * @version 1.0
 */
public class RedBlackBST<Key extends Comparable<Key>,Value> {
    
        
    private Node root;//Root node

    //The link from a parent node to its child node is red
    private static final boolean RED = true;

    //The link from a parent node to its child node is black
    private static final boolean BLACK = false;

    /**
     * <p>Node definition of red black tree</p>
     * @author qzlzzz
     */
    private class Node{
        
        private boolean color;//The color of the link from the parent to this node
        
        private Key key;//key
        
        private Value value;//value
        
        private Node left,right;//The links from this node to its left and right children
        
        private int n;//Number of nodes in the subtree rooted at this node

        public Node(Key key,Value value,boolean color,int n){
            this.key = key;
            this.value = value;
            this.color = color;
            this.n = n;
        }
    }
    
}

If a red link leans right, rotate it to the left.

Here we keep every red link as a left link. Keeping red links as right links would also work, but we follow the left-leaning convention throughout.

    /**
     * Returns the total number of nodes in the red-black tree; internally calls {@link RedBlackBST#size(Node)}
     * @return the number of nodes in the tree
     */
    public int size(){
        return size(root);
    }

    //Calculate the total number of nodes of a subtree
    private int size(Node x){
        if (x == null) return 0;
        else return x.n;
    }

    /**
     * Turns the red right link of h into a red left link. The in-order sequence of keys
     * and the number of nodes in the subtree are unchanged
     * @param h the root of a subtree whose right link is red
     * @return the new root of the subtree
     */
    private Node rotateLeft(Node h){
        Node t = h.right;
        h.right = t.left;
        t.left = h;
        t.color = h.color;
        h.color = RED;
        t.n = h.n;//The subtree as a whole has the same number of nodes as before
        h.n = size(h.left) + size(h.right) + 1;
        return t;
    }

The rotation is illustrated in the following figure:

In the figure, 1, 2 and 3 denote the relative order of the keys, not their values. Every path from the root down to the bottom of a red-black tree crosses the same number of black links, which matches the 2-3 tree property that every bottom node is the same distance from the root.

Converting a red left link into a red right link (rotateRight(), which put() below relies on) follows the same idea, and readers can try implementing it themselves.
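The put() method later in this article calls rotateRight() but never defines it. A minimal sketch, assuming it is simply the mirror image of rotateLeft(): swap left and right, and keep the same color and size bookkeeping. To stay self-contained, the hypothetical class Rotations below uses a simplified Node with String keys and no values.

```java
// Hypothetical standalone sketch; Node is a simplified copy of the article's node (no value field).
class Rotations {
    static final boolean RED = true;
    static final boolean BLACK = false;

    static final class Node {
        String key;
        Node left, right;
        boolean color;   // color of the link from the parent to this node
        int n;           // number of nodes in this subtree
        Node(String key, boolean color, int n) { this.key = key; this.color = color; this.n = n; }
    }

    static int size(Node x) { return x == null ? 0 : x.n; }

    // Same as the article's rotateLeft(): a red right link becomes a red left link
    static Node rotateLeft(Node h) {
        Node t = h.right;
        h.right = t.left;
        t.left = h;
        t.color = h.color;
        h.color = RED;
        t.n = h.n;
        h.n = size(h.left) + size(h.right) + 1;
        return t;
    }

    // Mirror image (assumed): a red left link becomes a red right link
    static Node rotateRight(Node h) {
        Node t = h.left;
        h.left = t.right;
        t.right = h;
        t.color = h.color;
        h.color = RED;
        t.n = h.n;
        h.n = size(h.left) + size(h.right) + 1;
        return t;
    }

    public static void main(String[] args) {
        // Subtree rooted at "b" with a red left child "a" and a black right child "c"
        Node a = new Node("a", RED, 1), c = new Node("c", BLACK, 1);
        Node b = new Node("b", BLACK, 3);
        b.left = a;
        b.right = c;

        Node top = rotateRight(b);   // "a" is now on top, "b" hangs off a red right link
        System.out.println(top.key + " " + top.right.key + " " + (top.right.color == RED));  // a b true
        top = rotateLeft(top);       // rotating back restores the original shape
        System.out.println(top.key + " " + top.left.key + " " + top.n);  // b a 3
    }
}
```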

Determine whether a link is red

    //Returns true if the link to x is red; a null link counts as black
    private boolean isRed(Node x){
        if (x == null) return false;
        return x.color == RED;
    }

If both the left and right links of a node are red, recolor both of them black and recolor the link pointing to the node itself red

    /**
     * <p>If both child links of x are red, turn them black and turn the link pointing to x red
     * (this splits a temporary 4-node)</p>
     * @param x a node whose left and right links are both red
     */
    private void changeColor(Node x){
        x.color = RED;
        x.left.color = BLACK;
        x.right.color = BLACK;//Both child links must become black
    }

Why?

  • This follows directly from the second 2-3 tree insertion case above. A node whose left and right links are both red is a temporary 4-node. Splitting it moves the middle key up and turns the two outer keys into separate 2-nodes, with the parent's old link replaced by the links on either side of the middle key. The color flip does exactly this: the two child links turn black, so the outer keys become 2-nodes, and the link into the middle node turns red, so the middle key joins its parent. If the parent was a 2-node, the two now form a 3-node; if the parent was a 3-node, it becomes a temporary 4-node and the same process repeats one level up. Because we keep red links leaning left, any red right link this produces must be converted with rotateLeft().

Next, let's implement the symbol table's get() and put() with the red-black binary search tree

    /**
     * Finds the value associated with key; internally calls {@link RedBlackBST#get(Node, Comparable)}
     * @param key the key to search for
     * @return the value of key, or null if key is not in the tree
     */
    public Value get(Key key){
        if (key == null) throw new IllegalArgumentException("argument to get() is null");
        return get(root,key);
    }
    
    private Value get(Node x,Key key){
        for (;;){
            if (x == null) return null;
            int cmp = key.compareTo(x.key);
            if (cmp == 0) return x.value;
            else if (cmp < 0) x = x.left;
            else x = x.right;
        }
    }
    /**
     * Inserts a key-value pair; internally calls {@link RedBlackBST#put(Node, Comparable, Object)}
     * @param key the key
     * @param value the value
     */
    public void put(Key key,Value value){
        if (key == null) throw new IllegalArgumentException("argument to put() is null");
        root = put(root,key,value);
        root.color = BLACK;//The root never has a red link pointing to it
    }

    private Node put(Node x,Key key,Value value){
        if (x == null) return new Node(key,value,RED,1);
        int cmp = key.compareTo(x.key);
        if (cmp == 0) {x.value = value;}
        else if (cmp < 0) x.left = put(x.left,key,value);
        else x.right = put(x.right,key,value);
        
        if (isRed(x.right) && !isRed(x.left)) x = rotateLeft(x);
        if (isRed(x.left) && isRed(x.left.left)) x = rotateRight(x);
        if (isRed(x.left) && isRed(x.right)) changeColor(x);
        
        x.n = size(x.left) + size(x.right) + 1;
        return x;
    }

In put(), the three if statements after the recursive calls do the following:

  • If the right link of the current node is red and its left link is not, rotateLeft() turns the red right link into a red left link, since we keep red links leaning left.
  • If the left link of the current node and the left link of its left child are both red, there are two red links in a row, which represents a temporary 4-node. rotateRight() turns this into a node whose left and right links are both red.
  • If both the left and right links are red, changeColor() splits the temporary 4-node: the two outer keys become 2-nodes and the middle key moves up into the parent, as in the second and third insertion cases of the 2-3 tree.
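To check that the pieces fit together, here is a self-contained sketch that combines the node definition, size(), isRed(), rotateLeft(), get() and put() from this article with two assumptions: a rotateRight() that mirrors rotateLeft(), and a changeColor() that sets both child links to BLACK. The class name LLRB and the checker isValidLLRB() are hypothetical additions for illustration. Inserting 1000 keys in sorted order, the worst case for a plain binary search tree, should still leave a valid, shallow tree.

```java
// Hypothetical self-contained left-leaning red-black BST (get/put only).
class LLRB<Key extends Comparable<Key>, Value> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private final class Node {
        final Key key;
        Value value;
        Node left, right;
        boolean color;   // color of the link from the parent to this node
        int n;           // number of nodes in this subtree
        Node(Key key, Value value, boolean color, int n) {
            this.key = key; this.value = value; this.color = color; this.n = n;
        }
    }

    private Node root;

    public int size() { return size(root); }
    private int size(Node x) { return x == null ? 0 : x.n; }
    private boolean isRed(Node x) { return x != null && x.color == RED; }

    public Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp == 0) return x.value;
            x = cmp < 0 ? x.left : x.right;
        }
        return null;
    }

    public void put(Key key, Value value) {
        root = put(root, key, value);
        root.color = BLACK;
    }

    private Node put(Node x, Key key, Value value) {
        if (x == null) return new Node(key, value, RED, 1);   // a new key joins its parent via a red link
        int cmp = key.compareTo(x.key);
        if (cmp == 0) x.value = value;
        else if (cmp < 0) x.left = put(x.left, key, value);
        else x.right = put(x.right, key, value);

        if (isRed(x.right) && !isRed(x.left)) x = rotateLeft(x);     // make the red link lean left
        if (isRed(x.left) && isRed(x.left.left)) x = rotateRight(x);  // two reds in a row
        if (isRed(x.left) && isRed(x.right)) changeColor(x);          // split the temporary 4-node
        x.n = size(x.left) + size(x.right) + 1;
        return x;
    }

    private Node rotateLeft(Node h) {
        Node t = h.right;
        h.right = t.left; t.left = h;
        t.color = h.color; h.color = RED;
        t.n = h.n; h.n = size(h.left) + size(h.right) + 1;
        return t;
    }

    private Node rotateRight(Node h) {
        Node t = h.left;
        h.left = t.right; t.right = h;
        t.color = h.color; h.color = RED;
        t.n = h.n; h.n = size(h.left) + size(h.right) + 1;
        return t;
    }

    private void changeColor(Node x) {
        x.color = RED;
        x.left.color = BLACK;
        x.right.color = BLACK;
    }

    // True if no red link leans right, no two red links are consecutive, and every
    // root-to-null path crosses the same number of black links
    public boolean isValidLLRB() { return blackHeight(root) >= 0; }

    private int blackHeight(Node x) {
        if (x == null) return 0;
        if (isRed(x.right) || (isRed(x) && isRed(x.left))) return -1;
        int l = blackHeight(x.left), r = blackHeight(x.right);
        if (l < 0 || l != r) return -1;
        return l + (isRed(x) ? 0 : 1);
    }

    public int height() { return height(root); }
    private int height(Node x) { return x == null ? 0 : 1 + Math.max(height(x.left), height(x.right)); }

    public static void main(String[] args) {
        LLRB<Integer, String> st = new LLRB<>();
        for (int i = 1; i <= 1000; i++) st.put(i, "v" + i);   // sorted order
        System.out.println(st.size() + " " + st.isValidLLRB() + " " + st.get(500));  // 1000 true v500
        System.out.println(st.height());
    }
}
```

The invariant checker is only there to test the sketch; a production version would normally leave it out or keep it behind assertions.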

Finally, let's implement rank() and select() for the symbol table

    /**
     * Returns the key at position k (the key with exactly k smaller keys); internally calls {@link RedBlackBST#select(Node, int)}
     * @param k the position, starting from 0
     * @return the key, or null if k is out of range
     */
    public Key select(int k){
        return select(root,k);
    }

    private Key select(Node x,int k){
        while(x != null){
            int t = size(x.left);//size() treats a missing left child as 0
            if (t > k) x = x.left;
            else if (t < k){
                x = x.right;
                k = k - t - 1;
            }
            else return x.key;
        }
        return null;
    }

    /**
     * Returns the number of keys smaller than key; internally calls {@link RedBlackBST#rank(Node, Comparable)}
     * @param key the key
     * @return the number of keys in the tree smaller than key
     */
    public int rank(Key key){
        return rank(root,key);
    }

    private int rank(Node x,Key key){
        int count = 0;//Keys found so far that are smaller than key
        while (x != null){
            int cmp = key.compareTo(x.key);
            if (cmp < 0) x = x.left;
            else if (cmp > 0){
                count += size(x.left) + 1;//x and its whole left subtree are smaller than key
                x = x.right;
            }
            else return count + size(x.left);
        }
        return count;
    }
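rank() and select() only rely on the subtree sizes n, which put() keeps up to date, so their logic can be checked on a plain size-annotated binary search tree without any balancing. In this hypothetical sketch (class RankSelectBst, int keys), select() walks down using size(x.left), so a missing left child counts as 0, and rank() adds 1 + size(x.left) every time it moves right. The two are inverses: rank(select(i)) == i.

```java
// Hypothetical size-annotated BST (no balancing) used only to exercise rank/select.
class RankSelectBst {
    private static final class Node {
        final int key;
        Node left, right;
        int n = 1;                       // number of nodes in this subtree
        Node(int key) { this.key = key; }
    }

    private Node root;

    private static int size(Node x) { return x == null ? 0 : x.n; }

    void put(int key) { root = put(root, key); }

    private Node put(Node x, int key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        x.n = size(x.left) + size(x.right) + 1;
        return x;
    }

    // Number of keys strictly smaller than key (key need not be in the tree)
    int rank(int key) {
        int r = 0;
        Node x = root;
        while (x != null) {
            if (key < x.key) x = x.left;
            else if (key > x.key) { r += size(x.left) + 1; x = x.right; }
            else return r + size(x.left);
        }
        return r;
    }

    // Key with exactly k smaller keys, or null if k is out of range
    Integer select(int k) {
        Node x = root;
        while (x != null) {
            int t = size(x.left);
            if (t > k) x = x.left;
            else if (t < k) { k = k - t - 1; x = x.right; }
            else return x.key;
        }
        return null;
    }

    public static void main(String[] args) {
        RankSelectBst t = new RankSelectBst();
        for (int k : new int[] {50, 20, 80, 10, 30, 70, 90}) t.put(k);
        System.out.println(t.rank(30));    // 2 (10 and 20)
        System.out.println(t.rank(75));    // 5 (10, 20, 30, 50, 70)
        System.out.println(t.select(3));   // 50
        System.out.println(t.select(7));   // null: only 7 keys, positions 0..6
    }
}
```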

This completes the red-black binary search tree implementation of the symbol table's get(), put(), rank() and select() operations. As an experiment, readers can move the changeColor() check in put() so that it runs right after the statement that checks whether node x is null, that is, before the recursive calls. Interestingly, the tree then becomes a 2-3-4 tree, a tree that also keeps 4-nodes.
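A sketch of that experiment, assuming everything else in put() stays as in a left-leaning red-black tree: rotateLeft() only when the right link is red and the left is not, rotateRight() on two consecutive red left links, and changeColor() (setting both child links BLACK) moved to the way down. The class LLRB234 and its checks isValid() and hasFourNode() are hypothetical names. Because 4-nodes are now split on the way down instead of the way up, a node with two red links can survive between insertions, which is exactly a 4-node.

```java
// Hypothetical experiment: left-leaning put() with changeColor() moved
// right after the null check, so 4-nodes are split on the way down.
class LLRB234<Key extends Comparable<Key>, Value> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private final class Node {
        final Key key;
        Value value;
        Node left, right;
        boolean color = RED;   // new nodes join their parent through a red link
        int n = 1;             // number of nodes in this subtree
        Node(Key key, Value value) { this.key = key; this.value = value; }
    }

    private Node root;

    public int size() { return size(root); }
    private int size(Node x) { return x == null ? 0 : x.n; }
    private boolean isRed(Node x) { return x != null && x.color == RED; }

    public Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp == 0) return x.value;
            x = cmp < 0 ? x.left : x.right;
        }
        return null;
    }

    public void put(Key key, Value value) {
        root = put(root, key, value);
        root.color = BLACK;
    }

    private Node put(Node x, Key key, Value value) {
        if (x == null) return new Node(key, value);
        if (isRed(x.left) && isRed(x.right)) changeColor(x);   // moved here: split 4-nodes going down

        int cmp = key.compareTo(x.key);
        if (cmp == 0) x.value = value;
        else if (cmp < 0) x.left = put(x.left, key, value);
        else x.right = put(x.right, key, value);

        if (isRed(x.right) && !isRed(x.left)) x = rotateLeft(x);
        if (isRed(x.left) && isRed(x.left.left)) x = rotateRight(x);
        // No color flip here, so a node with two red links (a 4-node) may remain
        x.n = size(x.left) + size(x.right) + 1;
        return x;
    }

    private Node rotateLeft(Node h) {
        Node t = h.right;
        h.right = t.left; t.left = h;
        t.color = h.color; h.color = RED;
        t.n = h.n; h.n = size(h.left) + size(h.right) + 1;
        return t;
    }

    private Node rotateRight(Node h) {
        Node t = h.left;
        h.left = t.right; t.right = h;
        t.color = h.color; h.color = RED;
        t.n = h.n; h.n = size(h.left) + size(h.right) + 1;
        return t;
    }

    private void changeColor(Node x) { x.color = RED; x.left.color = BLACK; x.right.color = BLACK; }

    // No two red links in a row, a red right link only inside a 4-node
    // (left link also red), and equal black links on every root-to-null path
    public boolean isValid() { return blackHeight(root) >= 0; }

    private int blackHeight(Node x) {
        if (x == null) return 0;
        if (isRed(x) && (isRed(x.left) || isRed(x.right))) return -1;
        if (isRed(x.right) && !isRed(x.left)) return -1;
        int l = blackHeight(x.left), r = blackHeight(x.right);
        if (l < 0 || l != r) return -1;
        return l + (isRed(x) ? 0 : 1);
    }

    // True if some node currently has two red links, i.e. the tree holds a 4-node
    public boolean hasFourNode() { return hasFourNode(root); }

    private boolean hasFourNode(Node x) {
        return x != null && ((isRed(x.left) && isRed(x.right)) || hasFourNode(x.left) || hasFourNode(x.right));
    }

    public static void main(String[] args) {
        LLRB234<Integer, Integer> st = new LLRB234<>();
        for (int k = 1; k <= 3; k++) st.put(k, k);
        System.out.println(st.hasFourNode());                  // true: 1, 2, 3 form one 4-node
        for (int k = 4; k <= 1000; k++) st.put(k, k);
        System.out.println(st.size() + " " + st.isValid());   // 1000 true
    }
}
```

With the original placement of changeColor() at the end of put(), inserting 1, 2 and 3 instead ends with 2 as a black root with two black children, so no 4-node survives.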

Posted on Wed, 01 Dec 2021 23:28:08 -0500 by MoombaDS