Briefly talk about HashMap

catalogue

Preface
Hash algorithm
Common hash algorithms
Data structure of HashMap
Members of the HashMap collection class
Construction methods of HashMap
Member methods of HashMap
Five ways to traverse a HashMap

Preface

HashMap is a hash-table-based implementation of the Map interface. It stores data as key-value pairs, so it is mainly used for storing mappings. HashMap is not synchronized, which means it is not thread-safe. Both its keys and its values may be null. In addition, the mappings in a HashMap are unordered

Hash algorithm

A hash algorithm, also known as a digest algorithm, is a function that converts input data of any size into a fixed-length, irregular-looking value

  • The length of the output value is fixed (for the same hash function)
  • The same input always produces the same output value
  • Similar inputs can produce very different output values
  • Different inputs may produce the same output value (a hash collision)

In Java, hashCode() is declared in the JDK's Object class, which means every Java class has a hashCode() method

//The hashCode method of the Object class returns a 32-bit value derived by the JVM (historically the object's memory address)
public native int hashCode();

The native keyword indicates that this is a native method: it is implemented in C/C++, compiled into a native library (e.g. a DLL), and called from Java

The String class overrides the hashCode() method

public int hashCode() {
    int h = hash;                         //Defaults to 0
    if (h == 0 && value.length > 0) {     //private final char value[];
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

Since it is a hash algorithm, collisions cannot be completely avoided

Because the collision probability is related to the security of the algorithm, a secure hash algorithm must satisfy:

  • A low collision probability
  • The original input cannot be derived from the hash value

Common hash algorithms

Algorithm      Output length (bytes)   Output length (bits)
MD5            16 bytes                128 bits
SHA-1          20 bytes                160 bits
RipeMD-160     20 bytes                160 bits
SHA-256        32 bytes                256 bits
SHA-512        64 bytes                512 bits

In terms of collision probability, the longer the output of a hash algorithm, the harder it is to produce a collision and the more secure it is
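As a quick check of the output lengths in the table above, here is a minimal sketch using the JDK's java.security.MessageDigest (the sample input string is arbitrary):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class DigestLengthDemo {
    public static void main(String[] args) throws Exception {
        byte[] input = "hello".getBytes(StandardCharsets.UTF_8);
        for (String algorithm : new String[]{"MD5", "SHA-1", "SHA-256", "SHA-512"}) {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(input);
            // Expected: 16, 20, 32 and 64 bytes respectively, matching the table above
            System.out.println(algorithm + " -> " + digest.length + " bytes");
        }
    }
}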

Data structure of HashMap

Before JDK 1.8: array + linked list

Since JDK 1.8: array + linked list + red-black tree

Differences between JDK 1.7 and JDK 1.8

A brief walk-through of the process

We should all know the general flow. In short: the key's hashCode is combined with the array length to get an array index. If that slot is empty, the entry is inserted directly. If the slot already holds elements, the hash values are compared; equal hashes mean a hash collision, and equals() is then used to check whether the keys are the same. If a key matches, the corresponding value is returned (or overwritten on put); if not, the comparison continues down the chain and the new node is appended at the tail. A chain that grows too long hurts efficiency, so by default a bucket is only treeified when the linked list length is greater than 8 and the array length is greater than 64

Why were red-black trees introduced in 1.8?

No matter how good the hash algorithm is, hash collisions cannot be avoided. When collisions happen, lookups have to walk the linked list; with little data early on, the efficiency is acceptable

However, continued collisions eventually make the linked list too long; reads must traverse it, which costs O(n)

After the red-black tree is introduced, lookups take O(log n), which improves efficiency

When will the capacity be expanded?

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

//Expansion condition 1: table is null or its length is 0
if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;

//Expansion condition 2: size has grown past the threshold (boundary value)
if (++size > threshold)
    resize();

There is not much to say about the first condition: if the table is empty, resize() creates it

The second condition requires more attention

  1. size is the actual number of key-value pairs in the HashMap, not the length of the array

  2. Threshold (boundary value) = capacity (DEFAULT_INITIAL_CAPACITY) * load factor (DEFAULT_LOAD_FACTOR)

    Therefore, the default threshold for expansion is 16 * 0.75 = 12 (a small demonstration follows below)
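A minimal sketch that makes the threshold visible by peeking at the internal table array via reflection. The field name table matches the source shown later; this is only a demonstration, and on JDK 9+ it may require --add-opens java.base/java.util=ALL-UNNAMED:

import java.lang.reflect.Field;
import java.util.HashMap;

public class ThresholdDemo {
    // Reads HashMap's package-private "table" field; purely for illustration
    static int tableLength(HashMap<?, ?> map) throws Exception {
        Field f = HashMap.class.getDeclaredField("table");
        f.setAccessible(true);
        Object[] table = (Object[]) f.get(map);
        return table == null ? 0 : table.length;
    }

    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> map = new HashMap<>();
        for (int i = 1; i <= 13; i++) {
            map.put(i, i);
            System.out.println("size=" + map.size() + ", table.length=" + tableLength(map));
        }
        // Expected: the table stays at 16 until size exceeds 12 (16 * 0.75), then doubles to 32
    }
}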

Expansion source code

When HashMap expands, capacity and threshold are doubled: newThr = oldThr << 1; // double threshold

If an initial capacity was specified, it was stored in the old threshold, so the new capacity takes that value: newCap = oldThr;

If the old threshold is 0, the initial defaults are used:

newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);

Expansion details of frequently used collections

Collection class   Initial capacity   Load factor   Expansion increment
ArrayList          10                 1             0.5x (grows by half)
Vector             10                 1             1x (doubles)
HashSet            16                 0.75          1x (doubles)
HashMap            16                 0.75          1x (doubles)

When does a bucket turn into a red-black tree?

/**
 * The value must be greater than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

The linked-list length threshold is 8 and the array length threshold is 64

The treeify threshold must be greater than 2 and should be at least 8, so that the assumption in tree removal about converting back to an ordinary linked list when the tree shrinks still holds

The array threshold should be at least 4 times the linked-list threshold, to avoid conflicts between the resizing and treeification thresholds

//First check whether the linked-list threshold has been reached
if (binCount >= TREEIFY_THRESHOLD - 1)
    treeifyBin(tab, hash);

//Then (inside treeifyBin) check whether the array threshold has been reached
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
    resize();

How is the hash function implemented in HashMap? What are the implementation methods of hash functions?

A: HashMap takes the key's hashCode(), unsigned-right-shifts it by 16 bits, and XORs the result with the original value. Other ways to build a hash function include the mid-square method, the pseudo-random-number method and the division/remainder method, but these are relatively inefficient; the shift-and-XOR approach is the fastest.
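A minimal sketch of that perturbation and of the index calculation that uses it (the sample key and table length are arbitrary):

public class HashDemo {
    // Same shape as HashMap.hash(): spread the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                      // table length, always a power of two
        int h = hash("hello");
        int index = (n - 1) & h;         // equivalent to h % n when n is a power of two
        System.out.println("hash=" + h + ", bucket index=" + index);
    }
}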

What happens when the hashcodes of two objects are equal?

A: a hash collision occurs. If the keys are also equal according to equals(), the old value is replaced; otherwise the new entry is appended to the linked list, and if the list length exceeds the threshold of 8 it is converted to red-black tree storage.

What is hash collision and how to solve it?

A: whenever the keys of two elements compute the same hash value, a hash collision occurs. Before JDK 8, collisions were resolved with linked lists; since JDK 8, with linked lists plus red-black trees.

If the hashcodes of two keys are the same, how to store key value pairs?

A: equals() is used to compare the keys' contents. If they are the same, the new value overwrites the old one; if they differ, the new key-value pair is added to the hash table (see the small illustration below).
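A small illustration: the strings "Aa" and "BB" happen to have the same hashCode (2112), so they collide, but equals() keeps them apart:

import java.util.HashMap;

public class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true, both are 2112

        HashMap<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);   // lands in the same bucket as "BB"
        map.put("BB", 2);   // equals() differs, so both entries are kept
        map.put("Aa", 3);   // equals() matches, so the old value 1 is overwritten
        System.out.println(map.size());                          // 2
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 3 2
    }
}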

Members of the HashMap collection class

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable

Cloneable means the object can be cloned, and Serializable means it can be serialized

Both are marker interfaces that declare no methods

public interface Serializable { }
public interface Cloneable { }

HashMap extends AbstractMap<K,V> and also implements Map<K,V>, even though AbstractMap<K,V> already implements Map<K,V>; the redundant declaration is generally considered a small design slip

1. Serial version number

private static final long serialVersionUID = 362498820763181265L;

Meaning and usage scenarios of serialization

  • Serialization: writing an object to an IO stream
  • Deserialization: restoring an object from an IO stream
  • Meaning: the serialization mechanism turns Java objects into byte sequences that can be saved to disk or sent over the network and later restored to the original objects, letting objects exist independently of the running program (a minimal round-trip sketch follows below)
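A minimal round-trip sketch (the keys and values are arbitrary):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class SerializationDemo {
    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> original = new HashMap<>();
        original.put("a", 1);
        original.put("b", 2);

        // Serialization: object -> byte sequence
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(original);
        }

        // Deserialization: byte sequence -> object
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            HashMap<String, Integer> restored = (HashMap<String, Integer>) in.readObject();
            System.out.println(restored); // {a=1, b=2}
        }
    }
}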

2. Initial capacity of the set

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

Default initial capacity - must be a power of 2. Why must it be a power of 2?

As we all know, for HashMap to be accessed efficiently, hash collisions should be minimized: data should be spread evenly so that the whole array is used and every bucket's linked list stays short

The natural indexing algorithm is the remainder hash % length, but computing a remainder directly is slower than a bit operation

Therefore the source code uses hash & (length - 1) instead; hash % length == hash & (length - 1) holds only when length is a power of 2

&: bitwise and operation. The result is 1 if and only if both are 1
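A quick check of that equivalence, assuming a non-negative hash value (for negative values % and & differ in sign, which is one more reason the index is computed with &):

public class IndexDemo {
    public static void main(String[] args) {
        int hash = 0x7f3a92c5;   // an arbitrary non-negative hash value
        int length = 16;         // power of two

        System.out.println(hash % length);                   // 5
        System.out.println(hash & (length - 1));             // 5: same bucket
        System.out.println(hash % 10 == (hash & (10 - 1)));  // false: 10 is not a power of two
    }
}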

Resizing is expensive and the default capacity is not always appropriate, so it is good practice to specify a suitable initial capacity when creating the map

What if the capacity I give is not 2 to the nth power?

The map rounds up to the next power of two that is at least the requested capacity; for example, 14 becomes 16 and 20 becomes 32

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

3. Load factor (load factor)

static final float DEFAULT_LOAD_FACTOR = 0.75f;

The above has been made very clear and will not be repeated here

4. Maximum capacity of the collection

static final int MAXIMUM_CAPACITY = 1 << 30;

The maximum capacity is 2^30, i.e. 1 shifted left by 30 bits

5. Treeify threshold (red-black tree boundary values)

static final int TREEIFY_THRESHOLD = 8;
static final int MIN_TREEIFY_CAPACITY = 64;

When the length of a bucket's linked list is greater than 8, it is converted to a red-black tree

(provided the array length is also greater than 64, as mentioned above)

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    ......
}

The rule in the source code is: treeify at 8, convert back to a linked list at 6

A red-black tree node takes much more space than a linked-list node, so starting with a linked list is a deliberate space/time trade-off

Why is the treeify threshold (8) different from the untreeify threshold (6)? To avoid frequent conversions back and forth when the length hovers around a single value

The average search length in a red-black tree is log(n). At length 8 that is log(8) = 3, while the average search length of a linked list is n/2, i.e. 8/2 = 4, so converting to a tree is worthwhile. At length 6 the list costs 6/2 = 3 versus log(6) ≈ 2.6 for the tree; the lookup gain is tiny, while building and maintaining the tree structure takes time of its own.

6. Untreeify threshold (tree back to linked list)

static final int UNTREEIFY_THRESHOLD = 6;

Treeify at 8, untreeify at 6

The source comment for TREEIFY_THRESHOLD reads: "The value must be greater than 2 and should be at least 8 to mesh with assumptions in tree removal about conversion back to plain bins upon shrinkage."

The mention of conversion back to plain bins upon shrinkage confirms that HashMap does shrink a red-black tree back into a linked list

"And when they become too small (due to removal or resizing) they are converted back to plain bins."
That is, when buckets become too small (due to removal or resizing), they are converted back to plain linked-list bins.

Construction methods of HashMap

1. Default

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

This is the most commonly used constructor; the default values it picks are not necessarily the best choice

2. Specified capacity

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

Because capacity expansion is very performance consuming, we can customize the capacity

You can see that specifying the capacity is to pass the user-defined capacity and the default load factor to the following method ↓

3. Specify capacity and load factor

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

this.threshold = tableSizeFor(initialCapacity);

The capacity must be a power of 2 (2^n); tableSizeFor rounds a custom capacity up to the next power of two, as discussed above, so it is not repeated here

4. Specify the collection constructor

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        if (table == null) { // pre-size
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

Why does float ft = ((float)s / loadFactor) + 1.0F; add the 1.0F?

Bottom line: to get more capacity, reduce the number of expansions and improve efficiency

s / loadFactor usually produces a fractional value, and the later (int)ft cast truncates it. Adding 1.0F effectively rounds up, ensuring a somewhat larger capacity; a larger capacity reduces the number of resize calls

So + 1.0F is there to obtain more capacity

For example, if the source map has 6 elements, 6 / 0.75 is exactly 8, already a power of 2, so the new table would be sized 8. Storing the 6 entries into a table of 8 leaves no headroom (the threshold is 8 * 0.75 = 6), so any further insertion triggers another resize and performance suffers

With the + 1, ft becomes 9, tableSizeFor rounds it up to 16, and the extra headroom avoids that early resize (see the arithmetic sketch below)
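A minimal sketch of that pre-sizing arithmetic, assuming 6 entries and the default load factor:

public class PreSizeDemo {
    public static void main(String[] args) {
        int s = 6;                                            // entries in the source map
        float loadFactor = 0.75f;

        int withoutPlusOne = (int) (s / loadFactor);          // 8 -> tableSizeFor(8) == 8
        int withPlusOne = (int) ((s / loadFactor) + 1.0f);    // 9 -> tableSizeFor(9) == 16

        System.out.println(withoutPlusOne + " vs " + withPlusOne);
        // A table of 8 has threshold 8 * 0.75 = 6, so the copied map is already full;
        // a table of 16 has threshold 12, leaving room before the next resize.
    }
}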

Member methods of HashMap

1. Adding entries: put

The general steps are as follows

  • First, compute the bucket (array index) the key maps to from its hash value
  • Compare hashes; if there is no collision in that bucket, insert directly
  • If a collision occurs, the conflict must be handled
    • If the bucket already uses a red-black tree, the tree's insert method is called
    • Otherwise the node is appended to the linked list in the traditional way; once the list reaches the critical length, it is converted to a red-black tree
  • Compare keys; if an equal key already exists in the bucket, its value is replaced with the new value
  • If size exceeds the threshold, the table is resized

//HashMap
public V put(K key, V value) {
    //Compute the key's hash and pass it to putVal
    return putVal(hash(key), key, value, false, true);
}

static final int hash(Object key) {
    int h;
    //When the key is null, the hash is 0
    //Otherwise the key's hashCode() is unsigned-right-shifted by 16 bits and XORed with the original value
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

//As mentioned earlier, the hash and the array length are combined bitwise to get the index
//(tab.length - 1) & hash gives the same result as hash % tab.length (when the length is a power of 2)
n = tab.length;
tab[i = (n - 1) & hash]

Interestingly, HashMap maps a null key to hash 0, so it lands in bucket 0

Hashtable, by contrast, performs no such null handling: it calls key.hashCode() directly (and explicitly rejects null values), so a Hashtable key cannot be null (a small demonstration follows after the snippet)

//Hashtable put()
if (value == null) {
    throw new NullPointerException();
}
int hash = key.hashCode();
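A small demonstration of the difference (Hashtable throws because key.hashCode() is invoked on null):

import java.util.HashMap;
import java.util.Hashtable;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "ok");                 // allowed: hash(null) is defined as 0
        System.out.println(hashMap.get(null));   // ok

        Hashtable<String, String> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, "boom");         // NullPointerException
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
    }
}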

2. The core insert method: putVal

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, do not change the existing value
 * @param evict if false, the table is in create mode
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict)

Creation of the table array

Is the HashMap array created by new HashMap()?

It is not: it is created by the empty check in the first putVal call, which invokes the resize (expansion) method

if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;

The array index algorithm

n = tab.length
tab[i = (n - 1) & hash]

As mentioned above; copied here again for convenience

For HashMap to be accessed efficiently, hash collisions should be minimized: data should be spread evenly so that the whole array is used and every bucket's linked list stays short

The natural indexing algorithm is the remainder hash % length, but computing a remainder directly is slower than a bit operation

Therefore the source code uses hash & (length - 1) instead; hash % length == hash & (length - 1) holds only when length is a power of 2

3. Converting to a red-black tree: treeifyBin

Replace all linked nodes in the bin at the given hash index unless the table is too small, in which case resize instead

//In putVal, walk the linked list and treeify once binCount >= TREEIFY_THRESHOLD - 1
for (int binCount = 0; ; ++binCount) {
    if ((e = p.next) == null) {
        p.next = newNode(hash, key, value, null);
        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash);
        break;
    }
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        break;
    p = e;
}

Before explaining the transformation method, let's talk about what are 2-3-4 trees and red black trees


What is a 2-3-4 tree?

A 2-3-4 tree is a B-tree of order 4, a kind of multiway search tree. Its structure has the following constraints:
All leaf nodes have the same depth
Nodes can only be 2-nodes, 3-nodes or 4-nodes

  • 2-node: a node with 1 element and 2 child nodes
  • 3-node: a node with 2 elements and 3 child nodes
  • 4-node: a node with 3 elements and 4 child nodes

All nodes must contain at least 1 element

Elements always remain sorted and the tree as a whole preserves the binary-search-tree property: a parent element is greater than everything in its left child and less than everything in its right child

Moreover, when a node holds multiple elements, each element must be greater than the element to its left and greater than everything in its left subtree

Querying a 2-3-4 tree works like an ordinary binary search tree and is simple. However, because a node holds a variable number of elements, it is inconvenient to implement in some languages, so its equivalent, the red-black tree, is generally used instead

Correspondence between 2-3-4 trees and red-black trees
  • 2-node - a black node
  • 3-node - a black node with one red child (leaning left or right)
  • 4-node - a black node with two red children (the middle element moves up when it splits)

What is a red-black tree?

A red-black tree (RBT) is a self-balancing binary search tree. Every node follows these rules:

  • Each node is either black or red

  • The root node is black

  • Each leaf node is black

    • A node without child nodes (i.e. degree 0) in a tree is called a leaf node

  • The two child nodes of each red node must be black

    • (red nodes may appear to have no children because the null black leaf nodes are hidden by default)
  • The path from any node to each leaf node contains an equal number of black nodes

How does a red-black tree keep itself balanced? Left rotation, right rotation and recoloring

Left rotation: the right child is promoted over its parent; parent and child swap places

Take a node as the pivot (rotation node): its right child becomes the pivot's parent,

and the right child's left subtree becomes the pivot's new right subtree; the pivot's left subtree is unchanged.

Right rotation: the left child is promoted over its parent; parent and child swap places

Take a node as the pivot (rotation node): its left child becomes the pivot's parent,

and the left child's right subtree becomes the pivot's new left subtree; the pivot's right subtree is unchanged.

Recoloring

A node's color flips from black to red or from red to black
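A minimal left-rotation sketch on a simplified, hypothetical node class (not the JDK's TreeNode), just to make the pointer moves above concrete:

class RbNode {
    RbNode parent, left, right;
    boolean red;
    int key;
}

class Rotations {
    // Left rotation around pivot p: p's right child r becomes p's parent,
    // and r's left subtree becomes p's new right subtree. Returns the (possibly new) root.
    static RbNode rotateLeft(RbNode root, RbNode p) {
        RbNode r = p.right;
        p.right = r.left;
        if (r.left != null) r.left.parent = p;
        r.parent = p.parent;
        if (p.parent == null) root = r;              // pivot was the root
        else if (p.parent.left == p) p.parent.left = r;
        else p.parent.right = r;
        r.left = p;
        p.parent = r;
        return root;
    }
}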

Insertion into a red-black tree

In an ordinary binary search tree, smaller keys go to the left and larger keys to the right

The same holds when inserting into a red-black tree, but balancing (rotation and recoloring) may be needed after the insertion

Adjustment of red black tree balance

  • Adding an element to a 2-node of the 2-3-4 tree simply merges it into a 3-node
    • Red-black tree: a red node is added under a black node: no adjustment is required
  • Adding an element to a 3-node of the 2-3-4 tree merges it into a 4-node
    • Red-black tree: there are 6 possible shapes; two of them (already balanced left-middle-right) need no adjustment
    • Some cases need a single rotation (left or right) around the subtree root
    • Others need two rotations (right then left, or left then right)
  • Adding an element to a 4-node of the 2-3-4 tree pushes the middle element up to the parent, and the new element merges with what remains
    • Red-black tree: the new node is red, the grandparent is black, and the parent and uncle are red
    • Adjustment: the grandparent turns red and the parent and uncle turn black; if the grandparent is the root, it is recolored back to black
Deletion from a red-black tree

Let's first look at an ordinary binary tree

Predecessor node: in the in-order traversal of a binary tree, the node that comes immediately before the current node is its predecessor

Successor node: in the in-order traversal, the node that comes immediately after the current node is its successor

For example, the complete binary tree (1,2,3,4,5,6,7) is traversed in order as (4,2,5,1,6,3,7)

That is, the predecessor of node 1 is 5 and its successor is 6

There are three cases of deletion

  1. A leaf node is deleted directly
  2. A node with one child is replaced by that child
  3. A node with two children is replaced by its predecessor or successor, which reduces the problem to case 1 or 2
The conversion method: treeifyBin

//The array tab and the hash are passed in to determine the index
final void treeifyBin(Node<K,V>[] tab, int hash) {
    //n is the array length, index the bucket index, e the node taken from the array by hash
    int n, index; Node<K,V> e;
    //If the array has no capacity yet, or is below the array threshold of 64, resize instead
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    //If the bucket is not empty, convert its nodes to red-black tree nodes in a loop
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        //hd is the head of the chain, tl the tail
        TreeNode<K,V> hd = null, tl = null;
        do {
            //Convert an ordinary node into a tree node
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        //After the list has been turned into TreeNodes, put it back into the bucket
        if ((tab[index] = hd) != null)
            //Balancing this chain of tree nodes is what finally produces the red-black tree
            hd.treeify(tab);
    }
}

Convert node to tree node

TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    return new TreeNode<>(p.hash, p.key, p.value, next);
}

Create red black tree

final void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;                   // the root node of the tree
    for (TreeNode<K,V> x = this, next; x != null; x = next) { // traverse the list; x is the current node
        next = (TreeNode<K,V>)x.next;            // next list node
        x.left = x.right = null;                 // clear the current node's left and right children
        if (root == null) {                      // no root yet
            x.parent = null;                     // the current node has no parent
            x.red = false;                       // color the current node black
            root = x;                            // it becomes the root
        }
        else {                                   // a root already exists
            K k = x.key;                         // key of the current list node
            int h = x.hash;                      // hash of the current list node
            Class<?> kc = null;                  // class of the key, used for Comparable comparisons
            for (TreeNode<K,V> p = root;;) {     // walk down from the root; only an inner break exits
                // GOTO1
                int dir, ph;                     // dir is the direction (left/right), ph the tree node's hash
                K pk = p.key;                    // key of the current tree node
                if ((ph = p.hash) > h)           // tree node's hash is greater than the list node's hash
                    dir = -1;                    // the list node goes to the left of the tree node
                else if (ph < h)
                    dir = 1;                     // right
                /*
                 * If the hashes are equal, the keys must be compared some other way:
                 * if the key implements Comparable and both nodes hold instances of the same class,
                 * compare them with compareTo; if they are still equal, fall back to tieBreakOrder.
                 */
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);

                TreeNode<K,V> xp = p;            // remember the current tree node
                /*
                 * dir <= 0: the list node belongs somewhere to the left of the current tree node
                 *           (not necessarily as its direct left child).
                 * dir > 0:  it belongs somewhere to the right.
                 * If the current tree node has a child in that direction, descend and continue from GOTO1.
                 * If it does not, attach the list node as its left or right child according to dir,
                 * rebalance the tree, and move on to the next list node.
                 */
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    x.parent = xp;               // the list node becomes a child of the tree node
                    if (dir <= 0)
                        xp.left = x;             // as the left child
                    else
                        xp.right = x;            // as the right child
                    root = balanceInsertion(root, x); // rebalance
                    break;
                }
            }
        }
    }
    // After all list nodes have been inserted, the repeated balancing may have left the root
    // pointing at a node other than the first node of the list. Lookups start from tab[n],
    // so that slot must hold the root. TreeNode is both a red-black tree node and a doubly
    // linked list node; moveRootToFront ensures the tree's root is also the first list node.
    moveRootToFront(tab, root);
}

4. Expansion: resize

The expansion mechanism was described above and will not be repeated here

Capacity doubled
final Node<K,V>[] resize() {
    //The table before expansion
    Node<K,V>[] oldTab = table;
    //Its previous length
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        //If the old capacity already reached the maximum, set the threshold to Integer.MAX_VALUE and return
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        //Otherwise the new capacity is oldCap << 1, i.e. oldCap * 2
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    //Reaching the else below means this is the first put, expanding from nothing
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        //Threshold = load factor * capacity; DEFAULT_INITIAL_CAPACITY is 16, DEFAULT_LOAD_FACTOR is 0.75
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
Recalculate and convert the old to the new
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    //Up to this point: on the first expansion there are no elements yet, so the code below does not run
    //If there were elements, move them from the old array into the newly expanded one
    if (oldTab != null) {
        //Loop over the old table and place its entries into the new one
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                //If the bucket holds a single node, recompute its index from its hash and the new capacity
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                //If it is a red-black tree, split/copy the tree
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    /*
                     * This branch handles buckets that collided in the old table. With the larger capacity,
                     * keys that used to collide may now map to different slots.
                     * For example, two colliding keys k1 and k2 might have hashes 01101 and 11101.
                     * With the default capacity 16 (binary 10000), the mask 16 - 1 = 1111 gives 1101 for both,
                     * but with mask 11111 (32 - 1) one becomes 01101 and the other 11101.
                     * So the index must be recomputed. Instead of a full rehash, each node is ANDed with oldCap:
                     * that operation has only two possible results, 0 or oldCap.
                     * Nodes with result 0 go into the "low" list and keep their original index;
                     * nodes with result oldCap go into the "high" list (their hash has at least one extra 1 bit
                     * on the left), and their new index is exactly the original index plus the old capacity.
                     */
                    //Head and tail of the low list
                    Node<K,V> loHead = null, loTail = null;
                    //Head and tail of the high list
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        //Next element after the current node
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            //loTail == null means this is the first element: it becomes loHead
                            if (loTail == null)
                                loHead = e;
                            else
                                //Otherwise append it after the tail
                                loTail.next = e;
                            loTail = e;
                        }
                        //Same processing for the high list
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        //The original index points at the low list
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        //The new index j + oldCap is exactly the original index plus the old capacity
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
How JDK 8 improves on JDK 7 in the resize method

In JDK 7, every element's index is recomputed from scratch during expansion, which is inefficient

As the code below shows, 1.7 relies on the transfer() method (removed in JDK 1.8): during expansion it recalculates the threshold and array length and rehashes every entry, which is slow

//jdk1.7
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    //If the maximum capacity has already been reached, do not expand further
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    //The transfer() method moves the values from the old array into the new one
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    //Point the HashMap at the new, expanded array
    table = newTable;
    //Set the new expansion threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            //Compute the position in the new array from the key's hash and the new array size
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

In JDK 8, resizing does not recompute the hash; it only checks whether the newly significant bit of the original hash is 1 or 0

e.hash & oldCap is the bit operation that makes that decision

  • If it is 0, the index remains unchanged: original position newTab[j] = loHead;
  • If it is 1, the index becomes: original index + oldcap (original location + old capacity) newTab[j + oldCap] = hiHead;

It is precisely this neat rehash trick that saves the time of recomputing hash values: a single AND reveals whether the new high bit is 0 or 1 (a small worked example follows)
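A small worked example with oldCap = 16 (the hash values are arbitrary):

public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int hash1 = 0b01101;   // 13: hash1 & oldCap == 0 -> low list, index stays 13
        int hash2 = 0b11101;   // 29: hash2 & oldCap != 0 -> high list, index becomes 13 + 16

        System.out.println((hash1 & oldCap) == 0);   // true
        System.out.println((hash2 & oldCap) == 0);   // false
        System.out.println(hash1 & (32 - 1));        // 13, same bucket as before
        System.out.println(hash2 & (32 - 1));        // 29 = 13 + oldCap
    }
}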

Because the new bit is effectively random, resizing guarantees that each new bucket receives at most as many nodes as the original bucket held

Collisions therefore do not get worse after the rehash, and previously colliding nodes are spread evenly across the new buckets.

//jdk1.8
if ((e.hash & oldCap) == 0) {
    if (loTail == null)
        loHead = e;
    else
        loTail.next = e;
    loTail = e;
}
else {
    if (hiTail == null)
        hiHead = e;
    else
        hiTail.next = e;
    hiTail = e;
}

//Position after expansion = original position
if (loTail != null) {
    loTail.next = null;
    newTab[j] = loHead;
}
//Position after expansion = original position + old capacity
if (hiTail != null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;
}

From the above source code, it can be analyzed that capacity expansion can be divided into three cases:

The first is the initialization phase, when both newCap and newThr are 0. As the source above shows, the first expansion uses the defaults: threshold = DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR = 16 * 0.75 = 12

The second: if oldCap is greater than 0, both the table capacity and the threshold are doubled on each expansion

The third: if an initial capacity was specified (and stored in threshold), newCap takes that value and a new threshold is recalculated

5. Deletion: remove

Like put, the public method simply delegates to removeNode

The key's hash selects the bucket, and the key itself selects the specific node to delete

If the node exists it is removed and its value returned; otherwise null is returned

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

6. The core delete method: removeNode

/**
 * The method is final and cannot be overridden; subclasses can add their own logic
 * by implementing the afterNodeRemoval hook (mentioned in the analysis).
 *
 * @param hash the hash of the key, obtained via hash(key)
 * @param key the key of the entry to delete
 * @param value the value of the entry to delete; whether it is used as a deletion condition depends on matchValue
 * @param matchValue if true, delete only when the entry's value equals(value) is true; otherwise ignore the value
 * @param movable whether nodes may be moved after deletion; if false they are not moved
 * @return the deleted node, or null if no node was deleted
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index; // node array, current node, array length, index
    /*
     * If the node array tab is not empty, its length n is greater than 0, and the node p located by the
     * hash (the root of a tree or the first node of a list) is not null,
     * traverse down from p to find the node matching the key.
     */
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v; // node to return, plus temporary node, key and value variables
        // If the first node's key equals the key, it is the node to delete; assign it to node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        /*
         * Getting here means the first node did not match, so check whether there is a next node.
         * If there is none, there was no hash collision in this bucket and nothing matched: return null.
         * If there is, the bucket holds either a linked list or a red-black tree.
         */
        else if ((e = p.next) != null) {
            // If the node is a TreeNode the bucket is already a red-black tree: use getTreeNode to search it
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            // Otherwise it is a linked list: compare nodes one by one from head to tail
            else {
                do {
                    // If e's key equals the key, e is the node to delete: remember it and break out
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    // e did not match; point p at e so that p always holds e's predecessor in the next
                    // iteration (if e matches next time, p is the parent of the matched node)
                    p = e;
                } while ((e = e.next) != null); // keep going until a node matches or the list is exhausted
            }
        }
        /*
         * If node is not null, a node matching the key was found.
         * If the value does not need to match, or it does need to match and is equal,
         * the node can be deleted.
         */
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                // The node lives in a red-black tree: removeTreeNode (analysed separately) removes it
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                // node == p means the matched node is the first node of the bucket:
                // point the bucket directly at the second node
                tab[index] = node.next;
            else
                // Otherwise p is node's predecessor: unlink node by pointing p.next past it
                p.next = node.next;
            ++modCount;              // the HashMap's modification count increases
            --size;                  // the number of elements decreases
            afterNodeRemoval(node);  // hook with no logic in HashMap; subclasses may override it as needed
            return node;
        }
    }
    return null;
}

7. get

Like put and remove, the public method simply calls the internal getNode method

The value is returned if present, otherwise null

The key's hash and the key itself are passed to getNode

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

8. The core lookup method: getNode

First, the bucket is located with the bit operation first = tab[(n - 1) & hash]

Then the first node of the bucket is checked: (k = first.key) == key, or key != null && key.equals(k)

That is, the keys are compared, and the node is returned if they match

Otherwise, judge whether it is a red black tree first. If it is, use getTreeNode to obtain and return it

if (first instanceof TreeNode)
    return ((TreeNode<K,V>)first).getTreeNode(hash, key);

If it is not a red black tree, the linked list will be cycled and the key will be compared and judged as before

do {
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        return e;
} while ((e = e.next) != null);

The source code is as follows

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

Five ways to traverse a HashMap

1. Loop through the entry in the entrySet of the map

Map's entrySet() method returns Set<Map.Entry<K, V>>

Looping over that Set yields Map.Entry<K, V> objects

Map.Entry<K, V> provides entry.getKey() and entry.getValue() to read the key and the value

Map<Integer, Integer> map = new HashMap<Integer, Integer>();
map.put(1, 10);
map.put(2, 20);

Set<Map.Entry<Integer, Integer>> entries = map.entrySet();
for (Map.Entry<Integer, Integer> entry : entries) {
    System.out.println("key = " + entry.getKey() + " value = " + entry.getValue());
}

2. foreach iteration over keys and values

map.keySet() and map.values() contain all the keys and values respectively and can be traversed directly

Map<Integer, Integer> map = new HashMap<Integer, Integer>();
map.put(1, 10);
map.put(2, 20);

// Iterating over keys
for (Integer key : map.keySet()) {
    System.out.println("Key = " + key);
}

// Iterating over values
for (Integer value : map.values()) {
    System.out.println("Value = " + value);
}

3. Iterator of entrySet with generic map

The set returned by map.entrySet() can use the iterator() method to get the iterator

entries.next() returns each entry; getKey and getValue are then used just as in the first method

Map<Integer, Integer> map = new HashMap<Integer, Integer>();
map.put(1, 10);
map.put(2, 20);

Iterator<Map.Entry<Integer, Integer>> entries = map.entrySet().iterator();
while (entries.hasNext()) {
    Map.Entry<Integer, Integer> entry = entries.next();
    System.out.println("Key = " + entry.getKey() + ", Value = " + entry.getValue());
}

4. Iterator of entrySet without generic map

Similar to the third method, except that without generics the keys and values have to be cast explicitly

Map map = new HashMap();
map.put(1, 10);
map.put(2, 20);

Iterator<Map.Entry> entries = map.entrySet().iterator();
while (entries.hasNext()) {
    Map.Entry entry = (Map.Entry) entries.next();
    Integer key = (Integer) entry.getKey();
    Integer value = (Integer) entry.getValue();
    System.out.println("Key = " + key + ", Value = " + value);
}

5. Traverse through Java8 Lambda expression

Map<Integer, Integer> map = new HashMap<Integer, Integer>();
map.put(1, 10);
map.put(2, 20);

map.forEach((k, v) -> System.out.println("key: " + k + " value: " + v));
