# HashMap basic principle and underlying source code analysis

# 1. Storage structure of HashMap:

HashMap is composed of an array, linked lists (chains), and red-black trees. The red-black tree structure was added in JDK 1.8. (The storage structure changes dynamically according to the amount of stored data.)

Source code implementation:

```java
/**
 * Basic hash bin node, used for most entries.
 * (See the TreeNode subclass below, and the Entry subclass in LinkedHashMap.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    /** The hash computed from the key's hashCode, cached in the entry to avoid recomputation */
    final int hash;
    /** The key (index) */
    final K key;
    /** The value (data) */
    V value;
    /** The next node in the chain */
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    /**
     * hashCode determines where an object is stored in a hash-based structure.
     * Note: two objects with the same hashCode are not necessarily equal.
     */
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    /** Sets a new value and returns the old one */
    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    /** Two entries are equal when both key and value are equal */
    public final boolean equals(Object o) {
        if (o == this) {
            return true;
        }
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            return Objects.equals(key, e.getKey())
                && Objects.equals(value, e.getValue());
        }
        return false;
    }
}
```

The comments in the original source make an extended point about `hashCode` and `equals` that is worth keeping. Imagine 8 storage slots numbered 0 to 7, and a class with an `id` field that we want to store in one of them. Without a hash code, finding an object means checking each slot in turn, or using something like binary search. With a hash code defined as `id % 8`, the object with `id = 9` goes into slot 1 (9 % 8 = 1) and the one with `id = 13` into slot 5, and a later lookup jumps straight to the right slot by computing `id % 8` again.

But what if two objects have the same hash code (suppose `id` is not unique)? For example, 9 % 8 and 17 % 8 are both 1. Is that legal? Yes. So how do we tell the objects apart? This is where `equals` comes in: `hashCode` decides which bucket to look in, but a bucket may hold many objects, so `equals` finds the one we want within the bucket. This is also why `hashCode()` must be overridden whenever `equals()` is: to find something in a bucket you must first find the bucket, and if `hashCode()` does not locate the right bucket, overriding `equals()` alone is useless.
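The argument above — overriding `equals()` without `hashCode()` breaks hash-based lookup — can be demonstrated against the public HashMap API. In this sketch, `BadId` and `GoodId` are hypothetical classes invented purely for illustration:

```java
import java.util.HashMap;
import java.util.Objects;

// Hypothetical class: equals is overridden, hashCode is NOT,
// so it falls back to the identity hash of each instance.
class BadId {
    final int id;
    BadId(int id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof BadId && ((BadId) o).id == id;
    }
}

// Hypothetical class: both equals and hashCode are overridden consistently.
class GoodId {
    final int id;
    GoodId(int id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof GoodId && ((GoodId) o).id == id;
    }
    @Override public int hashCode() { return Objects.hash(id); }
}

public class HashContractDemo {
    public static void main(String[] args) {
        HashMap<BadId, String> bad = new HashMap<>();
        bad.put(new BadId(9), "nine");
        // Two equal BadId objects almost always land in different buckets,
        // so the lookup fails even though equals() would return true:
        System.out.println(bad.get(new BadId(9))); // usually null

        HashMap<GoodId, String> good = new HashMap<>();
        good.put(new GoodId(9), "nine");
        System.out.println(good.get(new GoodId(9))); // nine
    }
}
```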

Some basic parameters used:

```java
/** Default initial capacity - MUST be a power of two. */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * Maximum capacity, used if a higher value is implicitly specified by
 * either of the constructors with arguments. (1073741824)
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/** The load factor used when none is specified in the constructor. (0.75) */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The bin count threshold for using a tree rather than a list for a bin.
 * When an element is added to a bin holding at least this many nodes, the
 * bin's linked list is converted to a red-black tree, to speed up lookups.
 * Must be greater than 2 and should be at least 8 to mesh with the
 * assumptions in tree removal about converting back to a plain bin on
 * shrinkage. In short: when a chain reaches length 8, it becomes a tree.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying (splitting) a bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at most 6
 * to mesh with shrinkage detection under removal.
 * During a resize, if a tree bin shrinks to at most this many elements,
 * it is converted back into a linked list.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise, if a bin holds too many nodes, the table is resized
 * instead.) Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;
```

Definition of basic structural parameters:

```java
/**
 * The table, initialized on first use and resized as necessary. When
 * allocated, its length is always a power of two. (We also tolerate
 * length zero in some operations to allow bootstrapping mechanics that
 * are currently not needed.)
 * Main role: the array of Node buckets.
 */
transient Node<K,V>[] table;

/**
 * Holds the cached entrySet(). Note that the AbstractMap fields are used
 * for keySet() and values().
 * Main role: the Set view over the Node entries.
 */
transient Set<Map.Entry<K,V>> entrySet;

/** The number of key-value mappings contained in this map. */
transient int size;

/**
 * The number of structural modifications to this HashMap.
 * A structural modification is one that changes the number of mappings
 * or otherwise modifies the internal structure (e.g. a rehash).
 * This field is used to make iterators over the map's collection views
 * fail fast (see ConcurrentModificationException).
 */
transient int modCount;

/**
 * The next size value at which to resize (capacity * load factor): a
 * resize is performed when the map's size exceeds this threshold.
 *
 * Usually: threshold = loadFactor * capacity
 *
 * (Additionally, if the table array has not yet been allocated, this
 * field holds the initial array capacity, or zero, signifying
 * DEFAULT_INITIAL_CAPACITY.)
 * @serial
 */
int threshold;

/**
 * The load factor for the hash table.
 * @serial
 */
final float loadFactor;
```

# 2. Initialize HashMap

HashMap provides four constructors by default.

The initial capacity and load factor of the HashMap can be specified at construction time. JDK 1.7 allocated the table when the constructor was called, but in 1.8 allocation is deferred until the first put operation. The resize() method takes on both initialization and expansion duties (initialization is treated as a form of expansion).

```java
/**
 * Constructs an empty Map with the specified initial capacity and load factor.
 *
 * @param initialCapacity the initial capacity
 * @param loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    // A negative initial capacity is illegal
    if (initialCapacity < 0) {
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    }
    // Cap the initial capacity at the maximum capacity, 1 << 30
    if (initialCapacity > MAXIMUM_CAPACITY) {
        initialCapacity = MAXIMUM_CAPACITY;
    }
    // The load factor must be a positive number
    if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    }
    this.loadFactor = loadFactor;
    // threshold temporarily holds the initial table size
    this.threshold = tableSizeFor(initialCapacity);
}
```

- threshold (the capacity threshold): the size at which the next resize will be triggered. The way the initial table size is derived from the requested capacity is quite interesting:

If the given capacity is 3, the nearest power of two above it is 2^2 = 4. If the given capacity is 5, it is 2^3 = 8. If the given capacity is 13, it is 2^4 = 16. The pattern: the algorithm turns every bit below the highest 1 bit into 1, then adds 1 to the result.

```java
/**
 * Returns a power-of-two capacity for the given target capacity: the
 * smallest power of two >= cap, bounded by MAXIMUM_CAPACITY.
 *
 * How it works, e.g. for n = 5:
 *   5: 0000 0000 0000 0101
 *   7: 0000 0000 0000 0111  step 1: repeatedly shift right and OR with the
 *                           original value, so that starting from the
 *                           highest 1 bit, every lower bit becomes 1
 *   8: 0000 0000 0000 1000  step 2: 7 + 1 -> 8, the first power of two
 *                           greater than 0000 0101
 *
 * The shifts alone would double values that are already powers of two
 * (2, 4, 8, ...), so cap - 1 is taken first; an exact power of two then
 * maps to itself.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```
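The examples above can be checked with a standalone re-implementation (the real method is private to `java.util.HashMap`, so this demo class is an assumption-free copy of its logic):

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of HashMap.tableSizeFor, reproduced here for demonstration
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that exact powers of two map to themselves
        n |= n >>> 1;      // smear the highest 1 bit downwards...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // ...until every lower bit is 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(3));   // 4
        System.out.println(tableSizeFor(5));   // 8
        System.out.println(tableSizeFor(13));  // 16
        System.out.println(tableSizeFor(16));  // 16 (already a power of two)
    }
}
```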

Other construction methods:

```java
/**
 * Constructs an empty HashMap with the specified initial capacity and
 * the default load factor (0.75). When a capacity is specified, the
 * smallest power of two >= that value is used as the actual capacity.
 *
 * @param initialCapacity the initial capacity
 * @throws IllegalArgumentException if the initial capacity is negative
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty HashMap with the default initial capacity (16)
 * and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/**
 * Constructs a new HashMap with the same mappings as the specified Map.
 * It is created with the default load factor (0.75) and an initial
 * capacity sufficient to hold the mappings in the specified Map.
 *
 * @param m the map whose mappings are to be placed in this map
 * @throws NullPointerException if the specified map is null
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
```

# 3. put method of HashMap:

- If the table is empty, resize() is called for the first expansion, i.e. the HashMap is initialized and its initial capacity allocated.
- If the target bucket holds no node, a new Node<K,V> is created and placed there.
- If the first node in the bucket matches (p.hash == hash and the key is == or equals), a hash collision with an equal key has occurred: the new entry replaces the old node's value.
- Chaining ("zipper" method): otherwise the linked list is traversed to find either a node with the same key to update, or the tail at which to append a new node. If, after appending, the list length reaches the treeify threshold of 8, the bin is converted to a tree.
- Expansion mechanism: if the size after insertion exceeds the threshold, resize() is called.

```java
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old value
 * is replaced.
 *
 * @param key   key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with key, or null if there was
 *         no mapping for key. (A null return can also indicate that the
 *         map previously associated null with key.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods.
 *
 * @param hash         hash for key
 * @param key          the key
 * @param value        the value to put
 * @param onlyIfAbsent if true, don't change an existing value
 * @param evict        if false, the table is in creation mode
 * @return the previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab;   // the bucket array
    Node<K,V> p;       // the first node of the target bucket
    int n, i;
    // If the table is empty, initialize it via resize() (default length 16)
    if ((tab = table) == null || (n = tab.length) == 0) {
        n = (tab = resize()).length;
    }
    // (n - 1) & hash locates the bucket; if it is empty, place a new node there
    if ((p = tab[i = (n - 1) & hash]) == null) {
        tab[i] = newNode(hash, key, value, null);
    }
    else {
        Node<K,V> e;   // the node that ends up matching the key, if any
        K k;
        // The first node has the same hash and an equal key:
        // remember it in e so its value can be replaced below
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) {
            e = p;
        }
        // Same hash but different key, and the bin is a tree:
        // insert into the red-black tree
        else if (p instanceof TreeNode) {
            e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value);
        }
        // Otherwise the bin is a chain: walk the linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                // e = p.next advances one node per iteration;
                // reaching null means the tail: append a new node
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // binCount starts at 0 at the second node, so this fires
                    // once the bin holds TREEIFY_THRESHOLD (8) nodes:
                    // convert the chain into a tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) {
                        treeifyBin(tab, hash);
                    }
                    break;
                }
                // Found a node with the same hash and an equal key:
                // exit the loop and replace its value below
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                    break;
                }
                p = e; // advance to the next node
            }
        }
        // An existing mapping for the key was found: replace the value
        if (e != null) {
            V oldValue = e.value;
            // Replace unless onlyIfAbsent forbids overwriting a non-null value
            if (!onlyIfAbsent || oldValue == null) {
                e.value = value;
            }
            afterNodeAccess(e); // callback hook for LinkedHashMap
            return oldValue;
        }
    }
    ++modCount;
    // If the new size exceeds the threshold, expand the table
    if (++size > threshold) {
        resize();
    }
    afterNodeInsertion(evict);
    return null;
}
```
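The return-value contract described above — put returns the previous value, and a null return is ambiguous — can be observed directly through the public API. A minimal standalone example:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // First insert: no previous mapping, so put returns null
        System.out.println(map.put("a", 1));   // null
        // Same key again: the old value is replaced and returned
        System.out.println(map.put("a", 2));   // 1
        System.out.println(map.get("a"));      // 2
        // A null return is ambiguous: it can also mean the key
        // was previously mapped to null
        map.put("b", null);
        System.out.println(map.put("b", 3));   // null, although "b" was present
    }
}
```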

# Core mechanism 1: resize()

```java
/**
 * Initializes or doubles the table size. If the table is null, it is
 * allocated according to the initial capacity target held in field
 * threshold. Otherwise, because we are using power-of-two expansion,
 * the elements from each bin must either stay at the same index or move
 * by a power-of-two offset in the new table.
 *
 * Three cases:
 * 1. First expansion after the default constructor: the table is empty
 *    and threshold is 0, so the capacity becomes
 *    DEFAULT_INITIAL_CAPACITY (16) and
 *    threshold = DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR = 12.
 * 2. First expansion after a constructor that specified an initial
 *    capacity: the new capacity equals threshold (which tableSizeFor
 *    set to a power of two), and the new threshold is then computed as
 *    capacity * loadFactor.
 * 3. Not the first expansion: both capacity and threshold are doubled.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;                          // current table
    int oldCap = (oldTab == null) ? 0 : oldTab.length;   // old capacity
    int oldThr = threshold;                              // old threshold
    int newCap, newThr = 0;
    // 1. The old table was already initialized
    if (oldCap > 0) {
        // Already at the maximum capacity: cannot expand further
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE; // 2^31 - 1
            return oldTab;
        }
        // Double the capacity (left shift by one bit). If the doubled
        // capacity is below the maximum and the old capacity is at least
        // the default (16), double the threshold as well.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY
                 && oldCap >= DEFAULT_INITIAL_CAPACITY) {
            newThr = oldThr << 1; // new threshold = old threshold * 2
        }
    }
    // 2. Not yet initialized, but initialCapacity was specified via a
    //    constructor: threshold holds the initial table size, i.e. the
    //    smallest power of two >= the specified initialCapacity
    else if (oldThr > 0) {
        newCap = oldThr;
    }
    // 3. Not yet initialized and no capacity specified: use the defaults
    //    (array size 16, load factor 0.75)
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // If the branches above determined the capacity but not the
    // threshold, compute it as capacity * load factor; if either value
    // would reach the maximum capacity, fall back to Integer.MAX_VALUE
    if (newThr == 0) {
        float ft = (float) newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY
                  ? (int) ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[]) new Node[newCap];
    // Allocate the new array and assign it to the member variable
    table = newTab;
    // The previous table has data: since HashMap is array + linked list
    // or array + red-black tree, traverse the old array and move the
    // chain or tree found under each index
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e; // head (root) node of bucket j
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null; // detach the bucket from the old table
                // A single node: recompute its index via
                // e.hash & (newCap - 1), as in put, and place it directly
                if (e.next == null) {
                    newTab[e.hash & (newCap - 1)] = e;
                }
                // A red-black tree bin: split it (see split() below)
                else if (e instanceof TreeNode) {
                    ((TreeNode<K,V>) e).split(this, newTab, j, oldCap);
                }
                // A chain: split it into a "lo" list and a "hi" list.
                // "lo" covers indices 0..oldCap-1 of the new array (the
                // element keeps its index); "hi" covers oldCap..newCap-1
                // (the element moves to index + oldCap).
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // The array length is a power of two (e.g. 16,
                        // binary 10000), so e.hash & oldCap isolates the
                        // single new bit that enters the index computation
                        // after doubling: index = hash & (newCap - 1) adds
                        // exactly one bit compared to hash & (oldCap - 1).
                        // If that bit is 0, the index is unchanged:
                        // append to the lo list.
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null) {
                                loHead = e;      // lo list was empty
                            } else {
                                loTail.next = e; // append to lo list tail
                            }
                            loTail = e;
                        }
                        // The new bit is 1: the element belongs in the
                        // high half. E.g. with oldCap 16 and hash 17, the
                        // new index is 17, i.e. old index 1 + oldCap 16;
                        // the low half is [0..15], the high half [16..31].
                        else {
                            if (hiTail == null) {
                                hiHead = e;
                            } else {
                                hiTail.next = e;
                            }
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The lo list stays at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The hi list is offset by exactly the old capacity
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

The most important part of resizing is the rehash of the linked lists and red-black trees:

Since expansion always doubles the length, after a resize each element is either at its original index or at the original index plus the old capacity (a power-of-two offset).

So when expanding the HashMap, we only need to look at the one new bit of the hash value that enters the index calculation: if it is 0, the index is unchanged; if it is 1, the index becomes "original index + oldCap". Since that new bit is effectively random (0 or 1), the resize process evenly disperses the nodes that previously collided into the new buckets.
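The rule can be checked directly with the same bit arithmetic resize() uses. This is a standalone sketch with hypothetical hash values chosen to collide in the old table:

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;           // old table length
        int newCap = oldCap << 1;  // doubled to 32
        int h1 = 1;                // binary ...0 0001: the new bit (16) is 0
        int h2 = 17;               // binary ...1 0001: the new bit (16) is 1

        // Both hashes collide at index 1 in the old table
        System.out.println(h1 & (oldCap - 1)); // 1
        System.out.println(h2 & (oldCap - 1)); // 1

        // The test used in resize(): (hash & oldCap)
        System.out.println((h1 & oldCap) == 0); // true  -> index unchanged
        System.out.println((h2 & oldCap) == 0); // false -> index + oldCap

        // Indices in the new table confirm the rule
        System.out.println(h1 & (newCap - 1)); // 1
        System.out.println(h2 & (newCap - 1)); // 17, i.e. 1 + oldCap
    }
}
```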

# Core mechanism 2: split()

```java
/**
 * Splits the nodes in a tree bin into lower and upper tree bins, or
 * untreeifies if the resulting bin is now too small. Called only from
 * resize; it cuts the tree apart to avoid oversized bins.
 *
 * @param map   the map
 * @param tab   the table for recording bin heads
 * @param index the index of the table being split
 * @param bit   the bit of hash to split on (the old capacity)
 */
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order. TreeNodes also form
    // a doubly linked list, so the list links can be used for the rehash.
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>) e.next;
        e.next = null;
        // Same test as for chains: bit is the old capacity
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null) {
                loHead = e;
            } else {
                loTail.next = e;
            }
            loTail = e;
            ++lc;
        } else {
            if ((e.prev = hiTail) == null) {
                hiHead = e;
            } else {
                hiTail.next = e;
            }
            hiTail = e;
            ++hc;
        }
    }
    // After the rehash, untreeify or treeify depending on list length
    if (loHead != null) {
        if (lc <= UNTREEIFY_THRESHOLD) {
            tab[index] = loHead.untreeify(map);
        } else {
            tab[index] = loHead;
            // If hiHead is null, every node stayed: the tree is intact
            if (hiHead != null) {
                loHead.treeify(tab);
            }
        }
    }
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD) {
            tab[index + bit] = hiHead.untreeify(map);
        } else {
            tab[index + bit] = hiHead;
            if (loHead != null) {
                hiHead.treeify(tab);
            }
        }
    }
}
```

Author: coffee rabbit

Link: https://juejin.cn/post/6952900921772900388