Java source code analysis: HashMap
1, HashMap source code analysis
1. Data structure of HashMap
-
JDK 7 and earlier: array + linked list
-
JDK 8 and later: array + linked list + red-black tree
public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        ... ...
    }

    transient Node<K,V>[] table;
}
The code above is an excerpt from HashMap in the JDK
HashMap defines a static nested class named Node that implements the Map.Entry interface, so every Node instance is an Entry object
Each Node has a Node<K,V> next field that refers to the next element; this reference is what forms the so-called linked list
HashMap also defines an array, Node<K,V>[] table, that holds the Node objects
From this we can see that HashMap stores its data as individual Entry objects: the array holds the first node of each bucket, and nodes that land in the same bucket are chained through next
Side notes
Serialization: when a class implements Serializable, it tells the JVM that objects of the class can be serialized, that is, written out through a stream
transient keyword: in a serializable class, a field marked transient does not take part in default serialization, that is, it is not written out and saved, so it behaves like temporary data
Deserialization: read the serialized object into the program memory
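To make the side notes concrete, here is a small sketch of a serialization round trip. The classes TransientDemo and Session and their fields are made up for illustration; only the java.io classes are real. It shows that a transient field comes back with its default value after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not part of the JDK.
final class TransientDemo {

    static final class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        final String user;          // written to the stream
        transient int cachedScore;  // skipped by default serialization

        Session(String user, int cachedScore) {
            this.user = user;
            this.cachedScore = cachedScore;
        }
    }

    // Serializes the object into a byte array and reads it back.
    static Session roundTrip(Session in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream oin = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) oin.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Session copy = roundTrip(new Session("alice", 42));
        System.out.println(copy.user);        // alice
        System.out.println(copy.cachedScore); // 0, the default value of int
    }
}
```

Note that HashMap marks table as transient but still serializes its entries, because it supplies its own writeObject and readObject methods.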
The following is the specific source code of the Node internal class
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
2. Constructors of HashMap
public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    transient int modCount; // number of times the HashMap has been structurally modified
    transient int size;     // number of key-value pairs in the HashMap
    int threshold;          // the next resize is triggered once size exceeds this value
}
The HashMap class has three constants that affect the performance of the hash table:
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4: the default length of the Node<K,V>[] table array
static final int MAXIMUM_CAPACITY = 1 << 30: the maximum length of the Node<K,V>[] table array
static final float DEFAULT_LOAD_FACTOR = 0.75f: the default load factor of the hash table
<< is the left-shift operator, which works on the binary representation of a number
1 in binary is 0000 0001
1 << 4 is 0001 0000 in binary, which is 16 in decimal
Load factor: the load factor determines when the array is expanded. When the number of key-value pairs exceeds array length * load factor, the array is expanded
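The arithmetic above can be checked with a few lines. This is only a sketch: CapacityDemo and its threshold helper are hypothetical names, not JDK code.

```java
// Hypothetical demo class for the shift and load-factor arithmetic.
final class CapacityDemo {

    // Number of entries beyond which a table of this capacity is expanded.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(1 << 4);                   // 16
        System.out.println(1 << 30);                  // 1073741824
        System.out.println(threshold(1 << 4, 0.75f)); // 12
    }
}
```

With the default values, the 13th key-value pair is therefore the one that triggers the first expansion.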
-
No-argument constructor public HashMap()
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
When the no-argument constructor is called, the fields of the new object take their default values
The default array length is 16 and the load factor is 0.75
-
Constructor taking an initial capacity and a load factor: public HashMap(int initialCapacity, float loadFactor)
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
This constructor takes the initial capacity and the load factor as parameters and first validates them. If they are valid, it assigns the loadFactor field and sets threshold through this.threshold = tableSizeFor(initialCapacity); otherwise it throws an IllegalArgumentException (an initial capacity above MAXIMUM_CAPACITY is not an error, it is simply clamped to MAXIMUM_CAPACITY)
tableSizeFor returns the smallest power of 2 that is greater than or equal to the given value
Why is the result of tableSizeFor(initialCapacity) assigned to threshold?
Shouldn't it be this.threshold = tableSizeFor(initialCapacity) * loadFactor?
The reason is that the constructor does not create the Node<K,V>[] table array; that is deferred to the put method. Until then, threshold temporarily holds the initial capacity, and the resize method later uses it as the array length and computes the real threshold (see the resize section)
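The source of tableSizeFor is not shown in this article, so the following is only a sketch of how such a rounding can be done. It is modeled on the bit-smearing approach used in JDK 8; other JDK versions may implement it differently, and the class name TableSizeDemo is made up.

```java
// Hypothetical demo class; a sketch of rounding up to a power of two.
final class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two >= cap, clamped to MAXIMUM_CAPACITY.
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of two maps to itself
        n |= n >>> 1;      // copy the highest set bit into every lower bit
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

So new HashMap(10) ends up with a 16-slot array once the first element is stored.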
-
Constructor taking only an initial capacity: public HashMap(int initialCapacity)
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
This simply calls the public HashMap(int initialCapacity, float loadFactor) constructor with the default load factor
3. Storing values in HashMap (put method)
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
The put method computes the hash value of the key and then calls putVal. putVal returns the value previously associated with the key, or null if there was none
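A quick way to see this return value is to call put twice with the same key on a real java.util.HashMap. The class and method names below (PutReturnDemo, putTwice) are made up for the demonstration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing what put returns.
final class PutReturnDemo {

    // Puts two values under the same key and reports what each call returned,
    // followed by the value that is stored at the end.
    static Integer[] putTwice(String key, Integer first, Integer second) {
        Map<String, Integer> map = new HashMap<>();
        Integer r1 = map.put(key, first);   // no previous mapping
        Integer r2 = map.put(key, second);  // replaces the first value
        return new Integer[] {r1, r2, map.get(key)};
    }

    public static void main(String[] args) {
        Integer[] r = putTwice("a", 1, 2);
        System.out.println(r[0]); // null
        System.out.println(r[1]); // 1
        System.out.println(r[2]); // 2
    }
}
```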
-
1. The putVal method first checks whether the array still needs to be created
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
If the array is null or its length is zero, the resize method is called, which in this case creates (initializes) the array
-
2. Then calculate the index of the element and judge whether the element has been stored in this position on the array
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
If the location is empty, it is stored directly in the location
(n - 1) & hash computes the index of the hash value in the array
n is the array length
For example, with an array length of 16 and an arbitrary hash value:
n                 0001 0000    16
n - 1             0000 1111    15
hash              1011 0011    arbitrary value
(n - 1) & hash    0000 0011    3
Because n is a power of 2, (n - 1) & hash simply keeps the low-order bits of the hash value
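The worked example can be reproduced in code. In this sketch the class name IndexDemo is made up, and the helper is called indexFor only because the JDK 7 code quoted later in this article uses that name. For a power-of-two length the mask gives the same result as a non-negative modulo, which the comparison with Math.floorMod illustrates.

```java
// Hypothetical demo class for the index calculation.
final class IndexDemo {

    // Keeps the low-order bits of the hash; tableLength must be a power of two.
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int hash = 0b1011_0011;                        // 179, the value from the example
        System.out.println(indexFor(hash, 16));        // 3
        System.out.println(Math.floorMod(hash, 16));   // 3
        System.out.println(indexFor(-1, 16));          // 15, negative hashes still give a valid index
    }
}
```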
-
3. If the slot at this index is already occupied, call the node stored there p
-
Check whether the key of the new element is the same as the key of p
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
If it is the same, record p in e; otherwise
-
Determine whether p is a tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
If p is a tree node, call the putTreeVal method
-
Otherwise, traverse the linked list stored at this index of the array
-
If a node with the same key as the new element exists
record this node in e for the check that follows
-
If no such node exists, the new element is appended to the end of the linked list; if the list has grown long enough (the TREEIFY_THRESHOLD check), treeifyBin is then called on this bucket
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
-
-
If e is not null after the code above, a node with the same key as the new element already exists
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    } // closes the outer else
The old value is replaced with the new value (unless onlyIfAbsent is true and the old value is not null), and the old value is returned
-
Reaching this point means a new node was inserted. modCount and size are incremented, and if size now exceeds threshold the array is expanded
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
-
This is the whole process of the put method. In addition, putVal calls afterNodeAccess(e) when an existing mapping is found and afterNodeInsertion(evict) at the very end. In HashMap both are empty methods without any implementation; they are hooks that let subclasses such as LinkedHashMap perform some operations after a node is accessed or inserted
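The steps above can be condensed into a much smaller map. The following is a teaching sketch under strong assumptions, not the JDK code: TinyChainedMap is a made-up class with a fixed 16-slot array, no resize, no tree bins, and it uses key.hashCode() directly instead of HashMap's hash(key).

```java
import java.util.Objects;

// Hypothetical teaching class: fixed-size table with chained buckets.
final class TinyChainedMap<K, V> {

    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int size;

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int hash = Objects.hashCode(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {                          // empty slot: store directly
            table[i] = new Node<>(hash, key, value);
            size++;
            return null;
        }
        while (true) {
            if (p.hash == hash && Objects.equals(p.key, key)) {
                V old = p.value;                  // same key: replace the value
                p.value = value;
                return old;
            }
            if (p.next == null) {                 // end of chain: tail insertion
                p.next = new Node<>(hash, key, value);
                size++;
                return null;
            }
            p = p.next;
        }
    }

    // Walks the chain of the key's bucket; returns null if the key is absent.
    V get(K key) {
        int hash = Objects.hashCode(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyChainedMap<Integer, String> map = new TinyChainedMap<>();
        map.put(1, "a");
        map.put(17, "b");                         // 17 & 15 == 1, same bucket as key 1
        System.out.println(map.put(1, "c"));      // a
        System.out.println(map.get(17));          // b
        System.out.println(map.size());           // 2
    }
}
```

Keys 1 and 17 are chosen because they collide in a 16-slot array, so the second put exercises the tail insertion branch.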
4. Retrieving values from HashMap (get method)
The get method returns the value for a key. The source code is as follows
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
The get method finds the corresponding key-value pair object by calling the getNode method. The following is the source code of the getNode method
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
getNode first checks the first node of the bucket; if that is not a match, it searches the tree when the bucket is a TreeNode, and otherwise walks the linked list until it finds the key or reaches the end
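One consequence of get returning null when getNode finds nothing: a null result alone does not tell you whether the key is missing or is mapped to null. The sketch below uses the real java.util.Map methods get and containsKey; the class GetDemo and its describe helper are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class for the two meanings of a null result from get.
final class GetDemo {

    static String describe(Map<String, String> map, String key) {
        if (map.get(key) != null) {
            return "present";
        }
        // get returned null: either no mapping, or a mapping to null
        return map.containsKey(key) ? "present with null value" : "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "x");
        map.put("b", null);                      // HashMap allows null values
        System.out.println(describe(map, "a"));  // present
        System.out.println(describe(map, "b"));  // present with null value
        System.out.println(describe(map, "c"));  // absent
    }
}
```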
5. HashMap array expansion (resize method)
The HashMap constructors do not create the array. Instead, the put method calls resize, which handles both creating the array and expanding it
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
First, some variables are defined in the resize method to represent some information of the array
-
If the array has been initialized, there are two cases
-
The length of the array has already reached the maximum capacity
In this case, threshold is set to Integer.MAX_VALUE, so that the check if (++size > threshold) after inserting an element in the put method is always false (size is an int and can never be greater than Integer.MAX_VALUE), and the array is never expanded again
-
The length of the array is below the maximum capacity
In this case the array is expanded normally: the array length is doubled, and the threshold is doubled as well when the old length is at least DEFAULT_INITIAL_CAPACITY and the new length is still below MAXIMUM_CAPACITY; otherwise newThr stays 0 and is computed further down
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
-
-
If the array is uninitialized and the condition threshold is greater than zero
When will this happen?
When we call one of the HashMap constructors that take arguments, the array is not created in the constructor, but a value is stored in threshold. As a result, the array length is zero while threshold is already set
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
In this case, the value stored in threshold is used directly as the length of the new array
-
If the array is uninitialized and the condition threshold does not exist
Nonexistence means that no value has been assigned. In this case, the default value of threshold is 0
This is the most common case: it happens when the no-argument constructor of HashMap was used
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
In this case, both the array length and the threshold take their default values
-
Why is there an if (newThr == 0) check?
After the branch else if (oldThr > 0) newCap = oldThr; has run, that is, when a constructor with arguments was used, nothing has been assigned to newThr, so newThr still has its initial value of 0 (the same is true when the array was doubled but the threshold was not)
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
This check makes sure that newThr gets a value
Then the new threshold and the new array are stored in the object
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
-
After capacity expansion, transfer the original data to the new array
I will not explain the migration code for now, sorry
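To see how the branches of resize play out, the size calculation can be pulled into a pure function and called with different starting states. This is a sketch that copies only the capacity and threshold logic quoted above; the class ResizePlan and the method plan are made-up names, and no array is allocated or migrated.

```java
// Hypothetical demo class: only the newCap/newThr decision of resize.
final class ResizePlan {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Returns {newCap, newThr} for the given old capacity and old threshold.
    static int[] plan(int oldCap, int oldThr, float loadFactor) {
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                return new int[] {oldCap, Integer.MAX_VALUE}; // never grows again
            } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY
                    && oldCap >= DEFAULT_INITIAL_CAPACITY) {
                newThr = oldThr << 1;                         // double threshold
            }
        } else if (oldThr > 0) {
            newCap = oldThr;           // initial capacity was placed in threshold
        } else {
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float) newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY)
                    ? (int) ft : Integer.MAX_VALUE;
        }
        return new int[] {newCap, newThr};
    }

    public static void main(String[] args) {
        int[] a = plan(0, 0, 0.75f);   // after new HashMap()
        int[] b = plan(0, 16, 0.75f);  // after new HashMap(10): threshold holds 16
        int[] c = plan(16, 12, 0.75f); // normal doubling
        System.out.println(a[0] + " / " + a[1]); // 16 / 12
        System.out.println(b[0] + " / " + b[1]); // 16 / 12
        System.out.println(c[0] + " / " + c[1]); // 32 / 24
    }
}
```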
2, Why HashMap is not thread-safe
In Java 7 and earlier, HashMap has thread-safety problems such as infinite loops, data loss and data overwriting; from Java 8 on, the main thread-safety problem of HashMap is data overwriting
-
Multithreading problem 1 --> data overwriting
Premise: several threads call put at the same time, the index positions they want to write to are the same, and the storage conditions are met (the keys are different)
Thread 1 is suspended right after it finds that the storage conditions are met --> thread 2 runs and stores its value at the position thread 1 was about to use --> thread 1 resumes and stores its own value at the same position
At this point, the value stored by thread 2 has been overwritten
In addition, in the code if (++size > threshold), ++size is clearly not a thread-safe operation
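The problem with ++size can be shown without HashMap at all. The sketch below is an illustration only: LostUpdateDemo is a made-up class, and AtomicInteger appears purely as a correct counter to compare against. Several threads increment a plain int and an AtomicInteger the same number of times. The plain counter may end up lower than expected because ++ is a read, an add and a write; how many updates are lost, if any, varies from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: unsynchronized ++ versus an atomic increment.
final class LostUpdateDemo {

    // Returns {plainCount, atomicCount} after all threads have finished.
    static int[] run(int threads, int perThread) {
        int[] plain = new int[1];                   // unsynchronized, like HashMap.size
        AtomicInteger atomic = new AtomicInteger(); // thread-safe reference counter
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    ++plain[0];                     // not atomic: updates can be lost
                    atomic.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new int[] {plain[0], atomic.get()};
    }

    public static void main(String[] args) {
        int[] counts = run(4, 100_000);
        System.out.println("plain:  " + counts[0]); // at most 400000, often less
        System.out.println("atomic: " + counts[1]); // 400000
    }
}
```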
-
Multithreading problem 2 --> infinite loop
The following is the JDK 7 source code that migrates the data after the HashMap array has been expanded
// Data migration adds elements by head insertion
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    // The for loop walks each linked list, recomputes the index position, and
    // copies the old array's data to the new array (the array holds references,
    // not the actual data, so only references are copied)
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // Point the next of the current entry at the new index position.
            // newTable[i] may be empty or may already be an entry chain; if it
            // is a chain, the entry is inserted at its head.
            // The following three lines are the key to thread insecurity
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
The code that causes the thread-safety problem is mainly these two lines: e.next = newTable[i]; newTable[i] = e;
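What head insertion does to a chain is easy to see in a single thread. The following sketch uses made-up names (HeadInsertionDemo and a stripped-down Entry class) and assumes that every entry of the old chain lands in the same new bucket. It runs the same three assignments as the loop above and shows that the chain comes out in reverse order. A commonly given explanation of the infinite loop is that two threads running this reversal at the same time can leave two entries pointing at each other.

```java
// Hypothetical demo class: head insertion reverses a chain.
final class HeadInsertionDemo {

    static final class Entry {
        final String key;
        Entry next;

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    // Moves every entry of the old chain into a new chain by head insertion.
    static Entry transfer(Entry oldHead) {
        Entry newHead = null;           // plays the role of newTable[i]
        Entry e = oldHead;
        while (e != null) {
            Entry next = e.next;
            e.next = newHead;           // e.next = newTable[i];
            newHead = e;                // newTable[i] = e;
            e = next;                   // e = next;
        }
        return newHead;
    }

    // Renders a chain as text, for example A->B->C.
    static String render(Entry head) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append("->");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry chain = new Entry("A", new Entry("B", new Entry("C", null)));
        System.out.println(render(chain));           // A->B->C
        System.out.println(render(transfer(chain))); // C->B->A
    }
}
```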
-
Multithreading problem 3 --> data loss
The above is only my personal understanding. If anything is wrong, corrections are welcome