[Data Structures and Algorithms] Chapters 10, 11 and 12: Balanced Trees (2-3 Search Tree, Red-Black Tree), B-Tree and B+ Tree

10. Balanced tree

The binary search tree described earlier usually answers queries much faster than a simple linked list or array. In the worst case, however, its performance is still very poor.
For example, inserting 9, 8, 7, 6, 5, 4, 3, 2, 1 into a binary search tree in that order produces a tree in which every node has only a left child, so the tree is effectively a linked list.

Finding the element 1 in this tree requires visiting all nine nodes, so the search is no faster than scanning a linked list. The cause is that the tree is unbalanced: every branch goes to the left. If the tree could be kept as balanced as a complete binary tree regardless of the insertion order, searches would stay efficient even in the worst case.
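The effect of insertion order can be checked with a small experiment. The sketch below uses a plain, unbalanced binary search tree with int keys; the names DegenerateBstDemo and heightAfterInserting are made up for this illustration and do not come from the chapter's code.

```java
class DegenerateBstDemo {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain (unbalanced) binary search tree insertion; duplicates are ignored.
    static Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        return node;
    }

    // Height counted as the number of nodes on the longest root-to-leaf path.
    static int height(Node node) {
        return node == null ? 0 : 1 + Math.max(height(node.left), height(node.right));
    }

    static int heightAfterInserting(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        // Descending order: every node only gets a left child.
        System.out.println(heightAfterInserting(9, 8, 7, 6, 5, 4, 3, 2, 1)); // 9
        // The same keys in a friendlier order give a much shallower tree.
        System.out.println(heightAfterInserting(5, 3, 8, 2, 4, 7, 9, 1, 6)); // 4
    }
}
```

The same nine keys give a height of 9 or 4 depending only on the order of insertion, which is exactly the dependence a balanced tree is meant to remove.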

10.1 2-3 search tree

To keep a search tree balanced, some flexibility is needed, so a node in the tree is allowed to hold more than one key.

To be exact, the nodes of a standard binary search tree are called 2-nodes (one key and two links).

We now introduce the 3-node, which holds two keys and three links.

Each link of a 2-node or 3-node corresponds to one of the intervals into which the node's keys divide the key range.

1) Definition

A 2-3 search tree is either empty or made up of the following two kinds of nodes:

  • 2-node

    • Contains one key (and its associated value) and two links
    • Every key in the 2-3 tree pointed to by the left link is smaller than the node's key
    • Every key in the 2-3 tree pointed to by the right link is greater than the node's key
  • 3-node

    • Contains two keys (and their associated values) and three links
    • Every key in the 2-3 tree pointed to by the left link is smaller than both of the node's keys
    • Every key in the 2-3 tree pointed to by the middle link lies between the node's two keys
    • Every key in the 2-3 tree pointed to by the right link is greater than both of the node's keys

2) Search

  • The search algorithm is similar to that of a binary search tree
  • To determine whether a key is in the tree, first compare it with the keys in the root node. If it equals any of them, the search is a hit. Otherwise, follow the link for the interval that contains the key and continue the search recursively in the subtree it points to. If that link is empty, the search is a miss
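The search procedure above can be sketched as follows. This is a minimal illustration with int keys and no values; the Node layout (key1, key2, left, middle, right) and the example tree are assumptions chosen for readability, not the chapter's implementation.

```java
class TwoThreeSearchSketch {
    // A node with one key is a 2-node; a node with two keys is a 3-node.
    static final class Node {
        final int key1;
        final Integer key2;          // null for a 2-node
        Node left, middle, right;    // middle is used only by 3-nodes

        Node(int key1) { this.key1 = key1; this.key2 = null; }
        Node(int key1, int key2) { this.key1 = key1; this.key2 = key2; }
        boolean isThreeNode() { return key2 != null; }
    }

    static boolean contains(Node node, int key) {
        if (node == null) return false;                          // empty link: search miss
        if (key == node.key1) return true;                       // search hit
        if (node.isThreeNode() && key == node.key2) return true; // search hit on second key
        if (key < node.key1) return contains(node.left, key);
        if (node.isThreeNode() && key < node.key2) return contains(node.middle, key);
        return contains(node.right, key);
    }

    // Hand-built example tree (hypothetical data):
    //          [10 | 20]
    //         /    |    \
    //      [5]  [12|15]  [30]
    static Node example() {
        Node root = new Node(10, 20);
        root.left = new Node(5);
        root.middle = new Node(12, 15);
        root.right = new Node(30);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(contains(example(), 15)); // true
        System.out.println(contains(example(), 13)); // false
    }
}
```

A 2-node falls through to the left or right link only, while a 3-node makes one extra comparison to choose between the middle and right links.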

3) Insert

  • Insert a new key into a 2-node

    Inserting into a 2-3 tree starts the same way as inserting into a binary search tree.

    First search for the key; the search ends in a miss at some node at the bottom of the tree, and the new key is attached there. The 2-3 tree guarantees worst-case efficiency because it remains perfectly balanced after every insertion.

    If the search ends at a 2-node, the case is easy: put the new key into the 2-node, turning it into a 3-node. If the search ends at a 3-node, more work is needed.

  • Insert a new key into a tree consisting of a single 3-node

    Suppose the 2-3 tree consists of a single 3-node. The node already holds two keys, so there is no room for a third. The most natural approach is to let the node hold three keys temporarily, turning it into a 4-node with four links.

    Then the middle key of this 4-node is promoted: the left key becomes its left child and the right key becomes its right child. After the insertion the tree is again a balanced 2-3 search tree, and its height has grown from 0 to 1.

  • Insert a new key into a 3-node whose parent is a 2-node

    As in the previous case, insert the new key into the 3-node to form a temporary 4-node, then promote the middle key to the parent. The parent, a 2-node, becomes a 3-node, and the two halves of the split node are attached to the appropriate links of that 3-node.

  • Insert a new key into a 3-node whose parent is a 3-node

    Again the 3-node becomes a temporary 4-node, which is split and its middle key promoted to the parent. Because the parent is a 3-node, it becomes a temporary 4-node in turn, so its middle key is promoted as well. This continues up the tree until a 2-node is reached; it absorbs the promoted key, becomes a 3-node, and no further splitting is needed.

  • Split the root node

    When every node on the path from the insertion point to the root is a 3-node, the root itself ends up as a temporary 4-node. The root is then split into two 2-nodes under a new root that holds the middle key, and the height of the tree increases by 1.
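The insertion cases above can be sketched in one routine: insert at the bottom, and on the way back up split any node that has temporarily grown to three keys. This is a minimal sketch with int keys and no values; storing keys and children in lists is an assumption made to keep the code short, and the class name TwoThreeTreeSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TwoThreeTreeSketch {
    private static final class Node {
        final List<Integer> keys = new ArrayList<>();   // 1 or 2 keys (3 only temporarily)
        final List<Node> children = new ArrayList<>();  // empty for a leaf, else keys.size() + 1
        boolean isLeaf() { return children.isEmpty(); }
    }

    private Node root;

    void insert(int key) {
        if (root == null) {
            root = new Node();
            root.keys.add(key);
            return;
        }
        insert(root, key);
        if (root.keys.size() == 3) {        // the root became a temporary 4-node
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);         // the only step that increases the height
            root = newRoot;
        }
    }

    private void insert(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && key > node.keys.get(i)) i++;
        if (i < node.keys.size() && key == node.keys.get(i)) return; // duplicate: ignore
        if (node.isLeaf()) {
            node.keys.add(i, key);          // 2-node -> 3-node, or 3-node -> temporary 4-node
            return;
        }
        insert(node.children.get(i), key);
        if (node.children.get(i).keys.size() == 3) splitChild(node, i);
    }

    // Split the temporary 4-node parent.children[i] and promote its middle key into parent.
    private void splitChild(Node parent, int i) {
        Node full = parent.children.get(i);
        Node left = new Node();
        Node right = new Node();
        left.keys.add(full.keys.get(0));
        right.keys.add(full.keys.get(2));
        if (!full.isLeaf()) {
            left.children.addAll(full.children.subList(0, 2));
            right.children.addAll(full.children.subList(2, 4));
        }
        parent.keys.add(i, full.keys.get(1));
        parent.children.set(i, left);
        parent.children.add(i + 1, right);
    }

    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            int i = 0;
            while (i < n.keys.size() && key > n.keys.get(i)) i++;
            if (i < n.keys.size() && key == n.keys.get(i)) return true;
            if (n.isLeaf()) return false;
            n = n.children.get(i);
        }
        return false;
    }

    // Number of links from the root down to a leaf along the leftmost path.
    int height() {
        int h = 0;
        for (Node n = root; n != null && !n.isLeaf(); n = n.children.get(0)) h++;
        return h;
    }

    // True when every leaf is at the same depth (the balance property of the 2-3 tree).
    boolean allLeavesAtSameDepth() { return leafDepth(root) >= 0; }

    private int leafDepth(Node n) {
        if (n == null || n.isLeaf()) return 0;
        int d = leafDepth(n.children.get(0));
        for (Node c : n.children) if (leafDepth(c) != d) return -1;
        return d < 0 ? -1 : d + 1;
    }
}
```

Inserting the keys 1 to 7 in ascending order, the worst case for a plain binary search tree, leaves this tree with height 2 and all leaves at the same depth, because the height only ever changes when the root is split.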

4) Properties

The analysis of insertion shows that a 2-3 tree maintains its balance through local transformations only.

A perfectly balanced 2-3 tree has the following properties:

  • The path from the root to every empty link has the same length
  • Splitting a temporary 4-node below the root does not change the height of the tree. The height increases by 1 only when the root is a temporary 4-node and is split
  • The biggest difference from an ordinary binary search tree is the direction of growth: an ordinary binary search tree grows from the top down, while a 2-3 tree grows from the bottom up

5) Implementation

  • Implementing the 2-3 tree directly is complicated because:
    • Different node types have to be handled, which is cumbersome
    • Moving down the tree requires multiple comparisons per node
    • Splitting 4-nodes requires moving back up the tree
    • There are many different cases for splitting a 4-node

A direct implementation of the 2-3 search tree is complex, and in some cases the rebalancing after an insertion can reduce efficiency. As a concept, however, the 2-3 search tree is the foundation for understanding red-black trees, B-trees and B+ trees.

10.2 2-3-4 trees and red-black trees

1) Equivalence with the red-black tree

2) Conversion to a red-black tree

One 2-3-4 tree can correspond to several red-black trees, but each red-black tree corresponds to exactly one 2-3-4 tree

3) Insertion

Idea: insert first, then adjust

  • Every new node is red
  • New nodes are always added at the leaf level



Four situations require adjustment:

  • Left-left: /
    • Shaped like a slash (it is best to draw it by hand). The top point is the grandparent, the middle point is the parent, and the bottom point is the inserted node
  • Right-right: \
    • Shaped like a backslash; the points have the same meaning as above
  • Left-right: <
    • Shaped like a less-than sign. The top point is the grandparent, the middle vertex is the parent, and the bottom point is the inserted node
  • Right-left: >
    • Shaped like a greater-than sign; the points have the same meaning as above
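The four shapes can be told apart purely by comparing keys along the path grandparent, parent, inserted node. The sketch below is illustrative only; the enum Shape and the method classify are hypothetical names, and the keys are assumed to be distinct ints.

```java
class InsertionShapeSketch {
    enum Shape { LEFT_LEFT, RIGHT_RIGHT, LEFT_RIGHT, RIGHT_LEFT }

    // Classify the path grandparent -> parent -> inserted node by comparing keys.
    static Shape classify(int grandparent, int parent, int inserted) {
        boolean parentIsLeftChild = parent < grandparent;
        boolean nodeIsLeftChild = inserted < parent;
        if (parentIsLeftChild) {
            return nodeIsLeftChild ? Shape.LEFT_LEFT : Shape.LEFT_RIGHT;
        }
        return nodeIsLeftChild ? Shape.RIGHT_LEFT : Shape.RIGHT_RIGHT;
    }

    public static void main(String[] args) {
        System.out.println(classify(30, 20, 10)); // LEFT_LEFT   (shape "/")
        System.out.println(classify(10, 20, 30)); // RIGHT_RIGHT (shape "\")
        System.out.println(classify(30, 10, 20)); // LEFT_RIGHT  (shape "<")
        System.out.println(classify(10, 30, 20)); // RIGHT_LEFT  (shape ">")
    }
}
```

In the usual red-black tree formulation, the two straight shapes are repaired with a single rotation at the grandparent, while the two bent shapes first need a rotation at the parent to straighten them.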


4) Adjustment after insertion

See the TreeMap source code

Online visualization: https://www.cs.usfca.edu/~galles/visualization/RedBlack.html

5) Deletion

Idea: delete first, then adjust

6) Adjustment after deletion

See another blog post

10.3 Red-black trees

A 2-3 tree stays balanced after every insertion. In its worst case every node is a 2-node and the height of the tree is lg N, whereas an ordinary binary search tree can reach height N in the worst case. The 2-3 tree therefore guarantees good worst-case time complexity, but implementing it directly is too complex. This section introduces a simple way to implement the idea of the 2-3 tree: the red-black tree.

  • A red-black tree is essentially an encoding of a 2-3 tree. The basic idea is:

    • Represent the 2-3 tree with a standard binary search tree (made up entirely of 2-nodes) plus some extra information (to stand in for 3-nodes)
  • The tree has two types of links

    • Red link: joins two 2-nodes to form one 3-node
    • Black link: an ordinary link of the 2-3 tree
  • Specifically, a 3-node is represented as two 2-nodes joined by a left-leaning red link

    Of the two 2-nodes, one is the left child of the other

  • Advantage: the get method of the standard binary search tree can be used without modification

1) Definition

A red-black tree is a binary search tree with red and black links that satisfies the following conditions:

  • Red links are always left links
  • No node is connected to two red links
  • The tree is perfectly black-balanced: every path from the root to an empty link contains the same number of black links
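The three conditions can be turned into a small checker. The sketch below assumes int keys and a Node that stores the color of the link from its parent, as described in this section; the names RedBlackInvariantSketch, blackHeight and isValid are hypothetical. It checks only the color conditions, not the ordering of keys.

```java
class RedBlackInvariantSketch {
    static final boolean RED = true;
    static final boolean BLACK = false;

    static final class Node {
        final int key;
        final boolean color;   // color of the link from the parent to this node
        Node left, right;
        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    static boolean isRed(Node n) { return n != null && n.color == RED; }

    // Returns the number of black links on every path from n's parent link down to
    // an empty link, or -1 if one of the three conditions is violated at or below n.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        if (isRed(n.right)) return -1;                // red links must be left links
        if (isRed(n) && isRed(n.left)) return -1;     // no node touches two red links
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;      // perfect black balance
        return l + (isRed(n) ? 0 : 1);
    }

    // The root has no parent link, so it is treated as black.
    static boolean isValid(Node root) { return !isRed(root) && blackHeight(root) >= 0; }

    public static void main(String[] args) {
        Node ok = new Node(20, BLACK);
        ok.left = new Node(10, RED);                  // encodes the 3-node [10 | 20]
        System.out.println(isValid(ok));              // true

        Node rightLeaning = new Node(10, BLACK);
        rightLeaning.right = new Node(20, RED);       // red right link: not allowed
        System.out.println(isValid(rightLeaning));    // false
    }
}
```

Because a red right link is already rejected, the only way a node can touch two red links is a red link from its parent followed by a red link to its left child, which is the second check.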

Under these conditions a red-black tree corresponds one-to-one to a 2-3 tree: each pair of nodes joined by a red link forms one 3-node.

2) Node API

Every node (except the root) has exactly one link pointing to it (from its parent), so a boolean field color can be added to the earlier Node class to record the color of that link: true if the link is red, false if it is black.


package chapter10;

/**
 * @author Soil flavor
 * Date 2021/9/10
 * @version 1.0
 * Red black tree node class
 */
public class Node<K, V> {
    /**
     * Key
     */
    private K key;
    /**
     * Value
     */
    private V value;
    /**
     * Left child node
     */
    private Node<K, V> left;
    /**
     * Right child node
     */
    private Node<K, V> right;
    /**
     * Color of the link from the parent node to this node
     * Red: true
     * Black: false
     */
    private boolean color;

    /**
     * Constructor
     *
     * @param key   key stored in this node
     * @param value value associated with the key
     * @param left  left child node
     * @param right right child node
     * @param color color of the link from the parent (true = red, false = black)
     */
    public Node(K key, V value, Node<K, V> left, Node<K, V> right, boolean color) {
        this.key = key;
        this.value = value;
        this.left = left;
        this.right = right;
        this.color = color;
    }
}

3) Balancing

After an insertion or deletion, a red-black tree may contain a red right link or two consecutive red links, which violates the definition. These situations are repaired by rotations, which restore the balance of the red-black tree.

1. Left rotation

When a node's left child is black and its right child is red, a left rotation is required.

Mnemonic: left black, right red

Premise: the current node is h and its right child is x

  • Left rotation steps
    • Make the left child of x the right child of h: h.right = x.left
    • Make h the left child of x: x.left = h
    • Give x the color of h: x.color = h.color
    • Set the color of h to RED: h.color = RED

Figures: left rotation in its initial state, in progress, and completed.

2. Right rotation

When a node's left child is red and that child's left child is also red, a right rotation is required.

Mnemonic: left child and left grandchild both red

Premise: the current node is h and its left child is x

  • Right rotation steps
    • Make the right child of x the left child of h: h.left = x.right
    • Make h the right child of x: x.right = h
    • Give x the color of h: x.color = h.color
    • Set the color of h to RED: h.color = RED

After the right rotation, x is still connected to two red links; a subsequent color flip resolves this.

Figures: right rotation in its initial state, in progress, and completed.

4) Insert a new key into a single 2-node

A red-black tree with only one key consists of a single 2-node. Inserting a second key may require a rotation.

  • If the new key is smaller than the key of the current node

    Simply add a red node as the left child

    The new red-black tree is exactly equivalent to a single 3-node

  • If the new key is greater than the key of the current node

    The new red node creates a red right link. A left rotation turns the red right link into a red left link, which completes the insertion

    The new red-black tree is again equivalent to a single 3-node, with two keys and one red link

5) Insert a new key into a 2-node at the bottom of the tree

Inserting a new key into a red-black tree works as in a binary search tree: a new node is added at the bottom of the tree (which preserves order). The only difference is that the new node is connected to its parent by a red link. If the parent is a 2-node, the two cases above apply unchanged.

6) Color flip

When both the left and right children of a node are RED, the node represents a temporary 4-node. In that case, change the color of both children to BLACK and the color of the current node to RED.

Mnemonic: both children red

7) Insert a new key into a tree with two keys

A tree with two keys is a tree consisting of a single 3-node

There are three cases:

  • The new key is larger than both keys in the tree

  • The new key is smaller than both keys in the tree

  • The new key lies between the two keys in the tree

8) The root node is always black

In the Node object, the color field records the color of the link from the parent to the node. Because the root has no parent, its color must be set to black after every insertion.

9) Insert a new key into a 3-node at the bottom of the tree

Suppose a new node is added below a 3-node at the bottom of the tree.

All three of the cases above can occur, depending on which link of the 3-node the new node is attached to:

  • Right link: a color flip is enough
  • Left link: rotate right, then flip colors
  • Middle link: rotate left, then rotate right, then flip colors

The color flip turns the middle node red, which is equivalent to passing it up to the parent. The same procedure is then applied to insert that key into the parent, and so on up the tree until a 2-node or the root is reached.

10) API

11) Implementation

package chapter10;

/**
 * @author Soil flavor
 * Date 2021/9/10
 * @version 1.0
 * Red black tree
 */
public class RedBlackTree<K extends Comparable<K>, V> {
    /**
     * Root node
     */
    private Node root;
    /**
     * Number of elements
     */
    private int n;
    /**
     * Red link
     */
    private static final boolean RED = true;
    /**
     * Black link
     */
    private static final boolean BLACK = false;

    /**
     * constructor 
     */
    public RedBlackTree() {
        //this.root = new Node(null, null, null, null, BLACK);
        this.n = 0;
    }


    /**
     * Check whether the link from the parent to node x is red
     *
     * @param x
     * @return
     */
    private boolean isRed(Node x) {
        if (x != null) {
            return x.color == RED;
        }
        return false;
    }

    /**
     * Left rotation of node h (a double line marks a red link)
     * -------------------------------------
     *        h                        x
     *      /  \\                    // \
     *     a    x    rotate left    h    c
     *        /  \   ---------->  /  \
     *       b    c              a    b
     * -------------------------------------
     *
     * @param h node whose right link is red
     * @return the new root of the rotated subtree
     */
    private Node rotateLeft(Node h) {
        // Nothing to rotate: return the subtree unchanged instead of dropping it
        if (h == null || h.right == null) {
            return h;
        }

        // The current node is h and its right child node is x
        Node x = h.right;

        // Let the left child of X become the right child of H: h.right = x.left
        h.right = x.left;

        // Let H be the left child of X: x.left = h
        x.left = h;

        // Assign the color of h to the color value of X: x.color = h.color
        x.color = h.color;

        // Change the color of h to RED
        h.color = RED;

        return x;
    }

    /**
     * Right rotation of node h (a double line marks a red link)
     * After the rotation, x is still connected to two red links, so a color flip is needed
     * -------------------------------------
     *         h                      x
     *       // \                   // \\
     *      x    c  rotate right   a     h
     *    // \      ----------->       /   \
     *   a    b                       b     c
     * -------------------------------------
     *
     * @param h node whose left link and left-left link are both red
     * @return the new root of the rotated subtree
     */
    private Node rotateRight(Node h) {
        // Nothing to rotate: return the subtree unchanged instead of dropping it
        if (h == null || h.left == null) {
            return h;
        }

        // The current node is h and its left child node is x
        Node x = h.left;

        // Let the right child of x become the left child of h
        h.left = x.right;

        // Let h be the right child of x
        x.right = h;

        // Assign the color of h to the color of X: x.color = h.color
        x.color = h.color;

        // Change the color of h to: RED
        h.color = RED;

        return x;
    }

    /**
     * Flip the colors of node h and its two children
     * Equivalent to splitting a temporary 4-node
     * ------------------------------------
     *        |               ||
     *        H      ===>     H
     *      // \\            / \
     *     a    b           a   b
     * ------------------------------------
     *
     * @param h
     */
    private void flipColors(Node h) {
        if (h == null) {
            return;
        }
        // Change the color of the left and right child nodes of h to black
        h.left.color = BLACK;
        h.right.color = BLACK;
        // Change the color of h to red
        h.color = RED;
    }

    /**
     * Insert / modify element
     *
     * @param key
     * @param value
     */
    public void put(K key, V value) {
        root = put(root, key, value);

        // The root node is always black
        root.color = BLACK;
    }

    /**
     * Insert the element into the subtree rooted at h and return the new subtree root
     *
     * @param h
     * @param key
     * @param value
     * @return
     */
    private Node put(Node h, K key, V value) {
        // If h is null
        if (h == null) {
            // Quantity plus 1
            n++;
            // Create a new node and return
            return new Node(key, value, null, null, RED);
        }

        // Compare the size of the key with that of the h node
        int cmp = key.compareTo(h.key);
        if (cmp < 0) {
            // Less than: add (recursive) to the left child node
            h.left = put(h.left, key, value);
        } else if (cmp > 0) {
            // Greater than: add right child node (recursive)
            h.right = put(h.right, key, value);
        } else {
            // Equal to: value substitution
            h.value = value;
        }

        // Left rotation: left child black, right child red
        if (!isRed(h.left) && isRed(h.right)) {
            h = rotateLeft(h);
        }

        // Right rotation: left child and left grandchild both red
        if (isRed(h.left) && isRed(h.left.left)) {
            h = rotateRight(h);
        }

        // Color flip: both children red
        if (isRed(h.left) && isRed(h.right)) {
            flipColors(h);
        }

        // Return the new subtree root
        return h;
    }

    /**
     * Get the value of key
     *
     * @param key
     * @return
     */
    public V get(K key) {
        return get(root, key);
    }

    /**
     * Get the value of key in h
     *
     * @param h
     * @param key
     * @return
     */
    private V get(Node h, K key) {
        if (h == null) {
            return null;
        }

        // Compare the size of key and h key
        int cmp = key.compareTo(h.key);
        if (cmp < 0) {
            // Less than: recursively find left subtree
            return get(h.left, key);
        } else if (cmp > 0) {
            // Greater than: recursively find right subtree
            return get(h.right, key);
        } else {
            // Equal to: return value
            return h.value;
        }
    }

    /**
     * Number of elements
     *
     * @return
     */
    public int size() {
        return n;
    }

    /**
     * Internal node class
     */
    private class Node {
        /**
         * Key key
         */
        private K key;
        /**
         * Value value
         */
        private V value;
        /**
         * Left child node
         */
        private Node left;
        /**
         * Right child node
         */
        private Node right;
        /**
         * Link color to the parent node of this node
         * Red: true
         * Black: false
         */
        private boolean color;

        /**
         * constructor 
         *
         * @param key
         * @param value
         * @param left
         * @param right
         * @param color
         */
        public Node(K key, V value, Node left, Node right, boolean color) {
            this.key = key;
            this.value = value;
            this.left = left;
            this.right = right;
            this.color = color;
        }
    }
}

package chapter10;

import org.junit.Test;

/**
 * @author Soil flavor
 * Date 2021/9/10
 * @version 1.0
 * Test red black tree
 */
public class RedBlackTreeTest {
    @Test
    public void test(){
        RedBlackTree<String, String> tree = new RedBlackTree<>();
        tree.put("3","Zhang San");
        tree.put("2","Wang Wu");
        tree.put("7","pseudo-ginseng");
        tree.put("4","Li Si");

        System.out.println(tree.size());
        System.out.println(tree.get("7"));

        tree.put("1","boss");
        System.out.println(tree.size());

        tree.put("3","Third brother");
        System.out.println(tree.size());
        System.out.println(tree.get("3"));
    }
}

Output:

4
pseudo-ginseng
5
5
Third brother
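As a follow-up to the test above, it is worth returning to the motivating example of this chapter: inserting 9, 8, ..., 1 in descending order. The sketch below is a compact restatement of the same insertion logic with int keys and no values, plus a height method; the class name RedBlackHeightDemo and the method heightAfterDescendingInserts are hypothetical and exist only for this experiment.

```java
class RedBlackHeightDemo {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private static final class Node {
        final int key;
        Node left, right;
        boolean color;   // color of the link from the parent
        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    private Node root;

    private static boolean isRed(Node x) { return x != null && x.color == RED; }

    private static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    private static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    private static void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    void put(int key) {
        root = put(root, key);
        root.color = BLACK;   // the root is always black
    }

    private static Node put(Node h, int key) {
        if (h == null) return new Node(key, RED);
        if (key < h.key) h.left = put(h.left, key);
        else if (key > h.key) h.right = put(h.right, key);
        if (!isRed(h.left) && isRed(h.right)) h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) flipColors(h);
        return h;
    }

    // Height counted as the number of nodes on the longest root-to-leaf path.
    int height() { return height(root); }

    private static int height(Node x) {
        return x == null ? 0 : 1 + Math.max(height(x.left), height(x.right));
    }

    static int heightAfterDescendingInserts(int n) {
        RedBlackHeightDemo tree = new RedBlackHeightDemo();
        for (int k = n; k >= 1; k--) tree.put(k);
        return tree.height();
    }

    public static void main(String[] args) {
        System.out.println(heightAfterDescendingInserts(9)); // 4
    }
}
```

The plain binary search tree reaches height 9 on this input, while the red-black tree stays at height 4. In general the height of a red-black tree with N nodes is bounded by 2 lg(N + 1).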

10.4 Java classes

java.util.TreeMap is implemented as a red-black tree.
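A short usage sketch of java.util.TreeMap follows. The keys and values are made-up sample data and the class name TreeMapDemo is hypothetical; put, get, firstKey and subMap are standard TreeMap methods.

```java
import java.util.TreeMap;

class TreeMapDemo {
    // Insert the keys 9..1 in descending order, the worst case for a plain BST.
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> map = new TreeMap<>();
        for (int k = 9; k >= 1; k--) {
            map.put(k, "value-" + k);
        }
        return map;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = sample();
        System.out.println(map.firstKey());                         // 1
        System.out.println(map.get(7));                             // value-7
        // Keys are kept in sorted order, so range views are available.
        System.out.println(map.subMap(3, true, 5, true).keySet());  // [3, 4, 5]
    }
}
```

Because the map rebalances itself on insertion, the descending insertion order has no effect on lookup cost.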

11. B-tree

  • A node may hold more than two keys
  • The B-tree is a tree data structure that stores data in sorted order and supports search, sequential access, insertion and deletion in O(log n) time

11.1 Characteristics of the B-tree

  • A B-tree node may hold several keys: 3, 4, 5 or more. The exact number is not fixed and depends on the implementation
  • A B-tree is built for a chosen parameter M and is called a B-tree of order M. Such a tree has the following characteristics:
    • Each node holds at most M-1 keys, arranged in ascending order
    • Each node has at most M child nodes
    • The root node has at least two child nodes (unless it is a leaf)

In practice the order of a B-tree is large (usually greater than 100), so the tree stays shallow even when it stores a large amount of data. This is its main advantage in certain application scenarios.

11.2 Storing data in a B-tree

Example: with M = 5, each node holds at most 4 key-value pairs
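To make the M = 5 case concrete, here is a minimal sketch of searching a hand-built B-tree of order 5. The keys in the example tree are hypothetical sample data, values are omitted, and the names BTreeSearchSketch, Node and contains are made up for this illustration. Insertion and splitting are not shown.

```java
import java.util.Arrays;

class BTreeSearchSketch {
    static final int M = 5;   // order: at most M - 1 keys and M children per node

    static final class Node {
        final int[] keys;        // sorted ascending, at most M - 1 entries
        final Node[] children;   // null for a leaf, otherwise keys.length + 1 entries

        Node(int[] keys, Node... children) {
            if (keys.length > M - 1) throw new IllegalArgumentException("too many keys");
            if (children.length != 0 && children.length != keys.length + 1) {
                throw new IllegalArgumentException("an inner node needs keys.length + 1 children");
            }
            this.keys = keys;
            this.children = children.length == 0 ? null : children;
        }
    }

    static boolean contains(Node node, int key) {
        while (node != null) {
            int i = Arrays.binarySearch(node.keys, key);
            if (i >= 0) return true;                   // key found in this node
            if (node.children == null) return false;   // reached a leaf: search miss
            node = node.children[-(i + 1)];            // descend into the matching interval
        }
        return false;
    }

    // Hand-built example (hypothetical data):
    //            [10 | 20]
    //          /     |      \
    //  [2|5|8]   [12|15]   [25|30|35|40]
    static Node example() {
        Node a = new Node(new int[] {2, 5, 8});
        Node b = new Node(new int[] {12, 15});
        Node c = new Node(new int[] {25, 30, 35, 40});
        return new Node(new int[] {10, 20}, a, b, c);
    }

    public static void main(String[] args) {
        System.out.println(contains(example(), 15)); // true
        System.out.println(contains(example(), 11)); // false
    }
}
```

Arrays.binarySearch returns -(insertion point) - 1 on a miss, and the insertion point is exactly the index of the child whose interval contains the key.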

11.3 Application of the B-tree to disk files

Programs inevitably access files through I/O, and files are stored on disk. The operating system accesses files on disk through the file system, and file systems use the B-tree data structure.

1) Disk

A disk can store large amounts of data, from gigabytes to terabytes, but reading it is slow because mechanical movement is involved; access times are measured in milliseconds.

A disk drive consists of platters. Each platter has two sides, called surfaces. A rotating spindle at the center of the platters spins them at a fixed rate, usually 5400 rpm or 7200 rpm. A drive contains several such platters sealed in a container. Each surface is made up of a set of concentric circles called tracks, and each track is divided into sectors. Every sector holds the same number of data bits, usually 512 bytes. Sectors are separated by gaps in which no data is stored.

2) Disk IO

  • The disk reads and writes the bits stored on a surface with a head attached to a moving arm. The arm moves back and forth along the radius of the platter to position the head over any track; this is called a seek. Once the head is over the track, the platter rotates, and as each bit of the track passes under the head, the head can read or modify its value. Disk access time is therefore made up of seek time, rotational latency and transfer time

  • Because of the storage medium, disk access is much slower than main memory access, and the mechanical movement is especially costly. To improve efficiency, the number of disk I/O operations must be kept to a minimum. For this reason the disk is usually not read strictly on demand; instead each read fetches data ahead. Even if only one byte is requested, the disk starts at that position and reads a certain length of data sequentially into memory. The theoretical basis is the well-known principle of locality in computer science: when a piece of data is used, nearby data is usually used soon afterwards. Because sequential disk reads are very efficient (no seek time and very little rotational latency), read-ahead improves I/O efficiency

  • A page is the logical block in which the computer manages memory. The hardware and the operating system usually divide main memory and disk storage into consecutive blocks of equal size, each called a page (1024 bytes or an integer multiple of that), and the read-ahead length is generally an integer multiple of the page size. Main memory and disk exchange data in units of pages. When the data a program needs is not in main memory, a page fault is raised. The system then sends a read request to the disk, which locates the start of the data, reads one or more consecutive pages into memory, and returns; the program then continues to run

  • File system designers exploit disk read-ahead by making the size of a node equal to one page (1024 bytes or an integer multiple of that), so that each node can be loaded completely with a single I/O operation. With 1024 children per node, a three-level B-tree can then hold 1024 * 1024 * 1024, roughly 1 billion, entries. A binary search tree would need about 30 levels for the same data. Assuming the operating system reads one node at a time and the root stays in memory, finding a target value among 1 billion entries in the B-tree takes fewer than 3 disk reads, whereas a red-black tree would take on the order of 30. The B-tree therefore greatly improves I/O efficiency
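The level counts quoted above can be reproduced with a few lines of arithmetic. The sketch uses a deliberately simplified model in which every node has exactly `fanout` children, so a tree with h levels addresses fanout^h entries; the names TreeLevelsSketch and levelsNeeded are hypothetical.

```java
class TreeLevelsSketch {
    // Smallest number of levels h such that fanout^h >= n.
    static int levelsNeeded(long n, int fanout) {
        int levels = 0;
        long capacity = 1;
        while (capacity < n) {
            capacity *= fanout;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        long entries = 1_000_000_000L;
        System.out.println(levelsNeeded(entries, 1024)); // 3
        System.out.println(levelsNeeded(entries, 2));    // 30
    }
}
```

Since each level costs roughly one disk read, the difference between 3 and 30 levels is the difference in I/O operations per lookup under this model.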

12. B+ tree

The B+ tree is a variant of the B-tree. It differs from the B-tree in the following ways:

  1. Non-leaf nodes serve only as an index: they store keys but no values
  2. All leaf nodes form an ordered linked list, so all data can be traversed in key order

12.1 Storing data in a B+ tree

Example: with M = 5, each node holds at most 4 key-value pairs


12.2 Comparison of the B+ tree and the B-tree

  • Advantages of the B+ tree:
    • Non-leaf nodes hold no actual data and serve only as an index, so the same amount of memory can hold more keys
    • The leaf nodes are linked together, so traversing the whole tree takes a single linear pass over the leaves. Because the data is sorted and linked, range queries are also convenient. A B-tree, by contrast, requires a recursive traversal of every level
  • Advantages of the B-tree:
    • Every node of a B-tree holds both keys and values, so a lookup can return the value as soon as the key is found, wherever it is in the tree. In a B+ tree only the leaf nodes hold data, so every lookup must descend through the index to the full depth of the tree, that is, to a leaf node, before the value is found

12.3 Application of the B+ tree in databases

Queries are the most frequent database operations, so query efficiency must be considered when a database is designed. Many databases use B+ trees for this purpose: to speed up queries, an index can be created on a column of a table, and that index is implemented as a B+ tree.

1) Query without a primary key index


Execute:

select * from user where id = 18

Without an index, the rows are scanned from the first to the sixth, where id = 18 is found. Locating the target row takes six comparisons.

2) Query with a primary key index

3) Range query

Execute:

select * from user where id>=12 and id<=18

With an index, the leaf nodes of the B+ tree form an ordered linked list, so it is enough to find the leaf node that contains id 12 and then follow the linked list from there until id exceeds 18, which is very efficient.
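The range scan over linked leaves can be sketched as follows. Only the leaf level of a B+ tree is modeled; the index lookup that would locate the starting leaf is assumed to have happened already, and the ids in the example chain are hypothetical sample data. The names LeafChainRangeScan, Leaf and range are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LeafChainRangeScan {
    // A B+ tree leaf: sorted keys plus a pointer to the next leaf.
    static final class Leaf {
        final int[] keys;
        Leaf next;
        Leaf(int... keys) { this.keys = keys; }
    }

    // Collect all keys in [lo, hi], walking the leaf chain from `start`.
    static List<Integer> range(Leaf start, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int k : leaf.keys) {
                if (k > hi) return out;   // keys are sorted: nothing further can match
                if (k >= lo) out.add(k);
            }
        }
        return out;
    }

    // Hypothetical leaf chain: [5, 8, 9] -> [10, 12, 15] -> [18, 20, 26]
    static Leaf exampleChain() {
        Leaf a = new Leaf(5, 8, 9);
        Leaf b = new Leaf(10, 12, 15);
        Leaf c = new Leaf(18, 20, 26);
        a.next = b;
        b.next = c;
        return a;
    }

    public static void main(String[] args) {
        // Corresponds to: where id>=12 and id<=18
        System.out.println(range(exampleChain(), 12, 18)); // [12, 15, 18]
    }
}
```

The scan stops as soon as it sees a key above the upper bound, so its cost is proportional to the size of the result plus the leaves it touches, not to the size of the table.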


Posted on Sat, 18 Sep 2021 20:22:35 -0400 by Teddy B.