HashMap collection of common interview questions summary and source code analysis

1. Introduction to HashMap collection

HashMap is a hash-table-based implementation of the Map interface. It stores data as key-value pairs. The implementation of HashMap is not synchronized, which means it is not thread safe. Its keys and values may be null (at most one key can be null). In addition, the mappings in a HashMap are not ordered.

Before JDK 1.8, HashMap was composed of array + linked list. The array is the main body of the HashMap, and the linked list is used to resolve hash conflicts (two keys whose hashCode values lead to the same array index), i.e. the chaining ("zipper") method. Since JDK 1.8, conflict resolution has changed significantly: when the length of a linked list exceeds the threshold (the treeify boundary value, 8 by default) and the current array length is greater than 64, all data at that index is stored in a red-black tree instead.

Note: a check is made before converting a linked list into a red-black tree. Even if the list length exceeds 8, if the array length is still less than 64, the list will not become a red-black tree; the array is expanded instead.

The purpose is to avoid the red-black tree structure while the array is still relatively small. In that case converting to a red-black tree would actually reduce efficiency, because a red-black tree needs left rotations, right rotations and color changes to stay balanced, while with a small array the lookup time is already short. In summary, to improve performance and reduce lookup time, a linked list is converted into a red-black tree only when its length exceeds 8 and the array length is greater than 64. See the treeifyBin method for details.

Of course, although adding the red-black tree as an underlying data structure makes the implementation more complex, lookups become more efficient once the list length exceeds 8, the array length is greater than 64, and the linked list is converted into a red-black tree.

Summary:

Characteristics:

1. Storage and iteration order is not guaranteed (unordered)

2. Both keys and values may be null, but at most one key can be null

3. Keys are unique; uniqueness is enforced by the underlying data structure

4. Before JDK 1.8 the data structure is array + linked list; since JDK 1.8 it is array + linked list + red-black tree

5. A linked list is converted into a red-black tree only when its length exceeds the threshold (boundary value) of 8 and the array length is greater than 64; the purpose of the conversion is efficient lookup

2. Underlying data structure of HashMap set

2.1 data structure concept

A data structure is the way a computer stores and organizes data: a collection of data elements that have one or more specific relationships with each other. In general, carefully chosen data structures bring higher operating or storage efficiency. Data structures are often related to efficient retrieval algorithms and indexing techniques.

Data structure: it is a way to store data.

Before JDK1.8, HashMap was composed of array + linked list data structure.

After JDK1.8, HashMap is composed of array + linked list + red black tree data structure.

2.2 the process of storing data in the underlying data structure of HashMap

The storage process is as follows:

Code used:

import java.util.HashMap;

public class Demo01 {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("Lau Andy", 53);
        map.put("Liuyan", 35);
        map.put("Xue You Zhang", 55);
        map.put("Guo Fucheng", 52);
        map.put("dawn", 51);
        map.put("Lin Qingxia", 55);
        map.put("Lau Andy", 50);
    }
}
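Note that the key "Lau Andy" is put twice; because keys are unique, the second put overwrites the first value. Continuing the example above (a sketch: the extra lines would go at the end of main, and the print order of entries is not guaranteed since HashMap is unordered):

System.out.println(map.size());          // 6 -- "Lau Andy" is stored only once
System.out.println(map.get("Lau Andy")); // 50 -- the second put replaced 53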

2.3 common HashMap interview questions

1. How is the hash function implemented in HashMap? What are the implementation methods of hash functions?

The hash of the key's hashCode is computed by unsigned-right-shifting the hashCode by 16 bits and XORing it with the original value.
Other hash function techniques exist, such as the mid-square method, the pseudo-random number method and the remainder method, but these are relatively inefficient; the unsigned-right-shift-16-then-XOR approach is the most efficient. How the underlying calculation works is explained when we look at the source code.

2. What happens when the hashcodes of two objects are equal?

A hash collision occurs. If the keys are equal in content, the old value is replaced; otherwise the new node is appended to the linked list, and if the list length then exceeds the threshold of 8 (and the array length reaches 64), the list is converted to red-black tree storage.
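A minimal sketch of this behaviour, using a hypothetical key class whose hashCode is deliberately constant so that every instance collides:

import java.util.HashMap;
import java.util.Objects;

public class CollisionDemo {
    // Hypothetical key type: every instance has the same hashCode, so all keys land in one bucket
    static class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }   // forces collisions
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && Objects.equals(name, ((BadKey) o).name);
        }
    }

    public static void main(String[] args) {
        HashMap<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);   // collides with "b", but equals() differs -> chained in the same bucket
        map.put(new BadKey("b"), 2);
        map.put(new BadKey("a"), 3);   // equal key -> old value 1 is replaced
        System.out.println(map.size());               // 2
        System.out.println(map.get(new BadKey("a"))); // 3
    }
}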

3. When and what is hash collision, and how to solve hash collision?

A hash collision occurs whenever the hash values computed for two keys are the same. Before JDK 8, collisions were resolved with a linked list only; since JDK 8 they are resolved with a linked list plus a red-black tree.

4. If the hashcode s of two keys are the same, how to store key value pairs?

The hashCodes are the same, so equals is used to compare the key contents.
Equal: the new value overwrites the previous value.
Not equal: the new key-value pair is added to the hash table (appended to the bucket).

As data keeps being added, capacity expansion becomes an issue. When the critical value is exceeded (and the slot to be stored in is not empty), the capacity is expanded. The default expansion doubles the capacity and copies the original data over.

As described above, when a linked list holds many elements, i.e. many elements with equal hash values but unequal contents, searching it sequentially by key becomes slow. In JDK 1.8, hash table storage is implemented with array + linked list + red-black tree: when a list's length exceeds the threshold 8 and the current array length is greater than 64, the list is converted into a red-black tree, which greatly reduces lookup time. JDK 8 introduces the red-black tree into the hash table purely to improve lookup efficiency.

In short, the hash table is implemented by array + linked list + red black tree (JDK1.8 adds the red black tree part). As shown in the figure below.

This raises a question: what is the drawback of the traditional HashMap that motivated introducing the red-black tree in 1.8? And if the new structure is no more troublesome, why is the red-black tree only used beyond a threshold of 8?

Before JDK 1.8, HashMap was implemented as array + linked list. Even with a good hash function it is difficult to achieve a 100% uniform distribution of elements. When a large number of elements end up in the same bucket, a long linked list hangs under that bucket and the HashMap degenerates into a singly linked list: with n elements, the traversal time complexity is O(n), and the HashMap completely loses its advantage. To solve this, JDK 1.8 introduced the red-black tree (lookup time complexity O(log n)). When a linked list is very short, traversal is very fast, but as it keeps growing it hurts query performance, so it must be transformed into a tree.

As for why the threshold is 8, I think it should be the most reliable way to find the answer in the source code. Next, we will introduce it when analyzing the source code.

2.4 summary:

Above, we have roughly described how HashMap stores data at the bottom layer. To make it easier to understand, let's go further with a storage flow chart (the JDK 8 storage process):

Explanation:

1. size is the real-time number of key-value pairs in the HashMap; note that this is not the same as the length of the array.

2. threshold = capacity * loadFactor. This is the maximum number of entries the current array should hold before resizing; when size exceeds this critical value, resize() is called and the HashMap capacity after expansion is twice the previous capacity.

3.HashMap inheritance relationship

The inheritance relationship of HashMap is shown in the following figure:

Explanation:

  • Cloneable is an empty interface, indicating that it can be cloned. Create and return a copy of the HashMap object.
  • Serializable is a marker interface for serialization: HashMap objects can be serialized and deserialized.
  • AbstractMap, the parent class, provides a skeletal implementation of the Map interface, to minimize the effort required to implement this interface.

Note: from the inheritance diagram above we notice something odd. HashMap already extends AbstractMap, and AbstractMap itself implements the Map interface, so why does HashMap implement Map again directly? The same pattern also appears in ArrayList and LinkedList.

According to Josh Bloch, the founder of the Java collections framework, writing it this way was simply a mistake. There are many such declarations in the Java collections framework; when he first wrote the framework he thought it might be valuable in some places, until he realized he was wrong. Apparently the later JDK maintainers did not think this small mistake was worth fixing, so it remains.

4. Members of HashMap collection class

4.1 member variables

1. Serial version number

private static final long serialVersionUID = 362498820763181265L;

2. Initial capacity of the collection (must be a power of two)

//The default initial capacity is 16: 1 << 4, i.e. 1 * 2^4 = 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;   

Question: why must it be the n-th power of 2? What happens if the input value is not a power of 2, such as 10?

The HashMap construction method can also specify the initialization capacity of the collection:

HashMap(int initialCapacity): constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).

As explained above, when adding an element to a HashMap, its specific position in the array is determined from the hash value of the key. For efficient access, HashMap should minimize collisions, that is, distribute the data as evenly as possible so that each linked list stays roughly the same length. This relies on the algorithm that decides which linked list (bucket) a piece of data is stored in.

That algorithm is essentially a modulo operation, hash % length. However, taking a remainder directly is not as efficient on a computer as a bitwise operation (as explained above), so the source code is optimized to use hash & (length - 1). hash % length equals hash & (length - 1) only on the premise that length is a power of 2.

Why does this distribute data evenly and reduce collisions? In binary, 2^n is a 1 followed by n zeros, and 2^n - 1 is n ones;

give an example:

Description: bitwise AND operation: when both binary digits are 1, the result is 1, otherwise it is 0.

For example, when the length is 8: 3&(8-1)=3, 2&(8-1)=2 -- different positions, no collision;
For example, when length is 8, 8 is 2 to the 3rd power. Binary: 1000
length-1 binary operation:
	1000
-	   1
---------------------
     111
 As follows:
hash&(length-1)
3   &(8    - 1)=3  
	00000011  3 hash
&   00000111  7 length-1
---------------------
	00000011----->3 array index
	
hash&(length-1)
2 &  (8 -    1) = 2  
	00000010  2 hash
&   00000111  7 length-1
---------------------
	00000010----->2  array index
 Note: the two results above land at different positions; there is no collision;
For example, when the length is 9: 3&(9-1)=0, 2&(9-1)=0 -- both at index 0, a collision;
For example, when length is 9, 9 is not a power of 2. Binary: 00001001
length-1 binary operation:
	1001
-	   1
---------------------
    1000
 As follows:
hash&(length-1)
3   &(9    - 1)=0  
	00000011  3 hash
&   00001000  8 length-1 
---------------------
	00000000----->0  array index
	
hash&(length-1)
2 &  (9 -    1) = 0  
	00000010 2 hash
&   00001000 8 length-1 
---------------------
	00000000----->0  array index
 Note: both results above are 0; a collision occurs;

Note: of course, if efficiency were not a concern, the remainder could be taken directly (and there would be no need for the length to be a power of 2).
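The point can be checked with a few lines of Java (a sketch; the values 2 and 3 stand in for arbitrary hash values):

public class IndexDemo {
    public static void main(String[] args) {
        int[] hashes = {2, 3};
        // length 8 (a power of 2): 8 - 1 = 0b0111, all low bits are usable
        for (int h : hashes) {
            System.out.println(h + " & (8-1) = " + (h & 7) + ", " + h + " % 8 = " + (h % 8)); // identical results
        }
        // length 9 (not a power of 2): 9 - 1 = 0b1000, only bit 3 is kept
        for (int h : hashes) {
            System.out.println(h + " & (9-1) = " + (h & 8)); // both map to 0 -> collision
        }
    }
}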

Summary:

1. As shown above, when determining a key's position in the array from its hash, if n (the array length) is a power of 2 the data can be inserted evenly; if n is not a power of 2, some positions of the array may never receive data, wasting array space and increasing hash conflicts.

2. Alternatively, the position could be determined with the % remainder operator, which works but performs worse than the & operation. And when n is a power of 2: hash & (n - 1) == hash % n.

3. Therefore, the HashMap capacity is a power of 2 in order to distribute data evenly and reduce hash conflicts; after all, the more hash conflicts there are, the longer some chain in the array becomes, which reduces HashMap performance.

4. When a HashMap object is created with an input array length of, say, 10, which is not a power of 2, the HashMap obtains a power of 2 through a series of shift and OR operations: the smallest power of 2 greater than or equal to that number.

The source code is as follows:

//Create the object of the HashMap collection, and specify that the array length is 10, not a power of 2
HashMap hashMap = new HashMap(10);
public HashMap(int initialCapacity) {//initialCapacity=10
   this(initialCapacity, DEFAULT_LOAD_FACTOR);
 }
public HashMap(int initialCapacity, float loadFactor) {//initialCapacity=10
     if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);//initialCapacity=10
}
  /**
   * Returns a power of two size for the given target capacity.
  */
    static final int tableSizeFor(int cap) {//int cap = 10
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

Explanation:

As you can see, when a HashMap is instantiated with a given initialCapacity (assume 10), since the capacity of a HashMap must be a power of 2, this method finds the smallest power of 2 greater than or equal to initialCapacity (if initialCapacity is already a power of 2, that number itself is returned).
The algorithm is analyzed below:
1) First, why subtract 1 from cap: int n = cap - 1;
This guards against cap already being a power of 2. If cap were already a power of 2 and the minus-1 step were skipped, the capacity returned after the unsigned right shifts below would be twice cap. If this is unclear, read through the shift steps below and then come back.
Now let's look at the unsigned right shift operations:
2) If n is 0 at this point (after cap - 1), it stays 0 through all the shifts, and the capacity finally returned is 1 (because of the n + 1 at the end).
Only the case where n is not 0 is discussed below.

3) Note: | (bitwise OR operation): when both binary digits are 0 the result is 0, otherwise it is 1.

First shift right:

int n = cap - 1;//cap=10  n=9
n |= n >>> 1;
	00000000 00000000 00000000 00001001 //9
|	
	00000000 00000000 00000000 00000100 //9 >>> 1 = 4
-------------------------------------------------
	00000000 00000000 00000000 00001101 //Bitwise OR gives 13

Since n is not 0, the binary representation of n always has at least one bit set to 1; consider its highest 1 bit. Shifting right by 1 (unsigned) moves that highest 1 one position to the right; ORing with the original then makes the bit immediately to the right of the highest 1 also 1, for example:

00000000 00000000 00000000 00001101

Second shift right:

 n |= n >>> 2;//n became 13 after the first right shift
	00000000 00000000 00000000 00001101  // 13
|
    00000000 00000000 00000000 00000011  //13 >>> 2 = 3
-------------------------------------------------
	00000000 00000000 00000000 00001111 //Bitwise OR gives 15

Note that n has already gone through n |= n >>> 1. Assuming n is now 00000000 00000000 00000000 00001101, shifting n right by 2 (unsigned) moves the two consecutive leading 1s two positions to the right; ORing with the original n then yields four consecutive 1s at the top of n's significant bits. For example:

00000000 00000000 00000000 00001111 //Bitwise OR gives 15

Third shift right:

n |= n >>> 4;//n became 15 after the first and second right shifts
	00000000 00000000 00000000 00001111  // 15
|
    00000000 00000000 00000000 00000000  //15 >>> 4 = 0
-------------------------------------------------
	00000000 00000000 00000000 00001111 //Bitwise OR gives 15

This step shifts the existing run of (up to) four consecutive high 1s right by 4 bits and ORs again, so that in the general case there are now 8 consecutive 1s in the high part of n's significant bits, e.g. 00001111 1111xxxx.
And so on.
Note that the capacity is limited to a 32-bit positive number, so in theory the final n |= n >>> 16 could produce as many as 32 ones (which would already be a negative number). However, initialCapacity is checked before tableSizeFor executes: if it is greater than MAXIMUM_CAPACITY (2^30) it is clamped to MAXIMUM_CAPACITY, and only then are the shifts performed. Therefore after the shifts n has at most 30 ones, and 30 ones plus 1 gives exactly 2^30 (the return statement also clamps any n >= MAXIMUM_CAPACITY to MAXIMUM_CAPACITY).
Please see a complete example below:
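The complete example for cap = 10 can be traced with a small sketch that repeats the shifts of tableSizeFor and prints the intermediate values:

public class TableSizeForTrace {
    public static void main(String[] args) {
        int cap = 10;
        int n = cap - 1;                       // n = 9  = 0000 1001
        n |= n >>> 1;  System.out.println(n);  // 9  | 4 = 13 = 0000 1101
        n |= n >>> 2;  System.out.println(n);  // 13 | 3 = 15 = 0000 1111
        n |= n >>> 4;  System.out.println(n);  // 15 | 0 = 15 (low bits already all 1)
        n |= n >>> 8;  System.out.println(n);  // 15
        n |= n >>> 16; System.out.println(n);  // 15
        System.out.println(n + 1);             // 16, the smallest power of two >= 10
    }
}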

Note that the obtained capacity is assigned to threshold.

this.threshold = tableSizeFor(initialCapacity);//initialCapacity=10

3. The default load factor is 0.75

static final float DEFAULT_LOAD_FACTOR = 0.75f;

4. Maximum aggregate capacity

//The upper limit of the maximum capacity of the set is: the 30th power of 2
static final int MAXIMUM_CAPACITY = 1 << 30;

5. When the length of the linked list exceeds 8, it is converted to a red-black tree (added in 1.8)

 //When the number of nodes on the bucket is greater than this value, it will turn into a red black tree
 static final int TREEIFY_THRESHOLD = 8;

Question: why does the number of nodes in the Map bucket exceed 8 before turning into a red black tree?

The threshold 8 is defined in HashMap. For this member variable, the source comment only states that 8 is the threshold at which a bin (a bucket) is converted from a linked list to a tree, but it does not explain why it is 8:

There is, however, another comment in HashMap. Let's keep looking:

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins.  In usages with well-distributed user hashCodes, tree bins are rarely used.  Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution
(http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5)*pow(0.5, k)/factorial(k)).
The first values are:

0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: less than 1 in ten million

TreeNodes occupy about twice as much space as ordinary Nodes, so a bin is converted to TreeNodes only when it contains enough nodes, as determined by the value of TREEIFY_THRESHOLD. When the number of nodes in the bin becomes small again, it is converted back to an ordinary bin. Checking the source, we find that a linked list is turned into a red-black tree when its length reaches 8, and turned back into an ordinary linked list when the length drops to 6.

This explains why it is not converted to TreeNodes at the beginning, but requires a certain number of nodes to be converted to TreeNodes. In short, it is a trade-off between space and time.

This passage also says: when hashCode is well distributed, tree bins are rarely used, because the data spreads evenly across the bins and the list length in almost no bin reaches the threshold. Under a poor, effectively random hashCode, however, the dispersion can get worse, and the JDK cannot prevent users from implementing such a bad hash algorithm, so the data may end up unevenly distributed. Ideally, under a random hashCode the frequency of nodes in a bin follows the Poisson distribution, and as shown above the probability that the list in one bin reaches 8 elements is 0.00000006, almost an impossible event. So the choice of 8 was not made casually; it was determined from probability and statistics. It shows that after nearly 30 years of development, changes and optimizations to Java are made rigorously and scientifically.

In other words, 8 fits the Poisson analysis: the probability of a bucket exceeding 8 nodes is vanishingly small, so 8 was chosen.
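These probabilities can be reproduced from the formula quoted in the comment, exp(-0.5) * pow(0.5, k) / factorial(k); a small sketch:

public class PoissonDemo {
    public static void main(String[] args) {
        double lambda = 0.5;                 // average bucket load assumed in the JDK comment
        double factorial = 1;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) factorial *= k;
            double p = Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
            System.out.printf("%d: %.8f%n", k, p);   // matches the table above
        }
    }
}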

Supplement:

1).

The Poisson distribution is a discrete probability distribution used in statistics and probability theory.
The probability function of the Poisson distribution is:

P(X = k) = (λ^k / k!) * e^(-λ),  k = 0, 1, 2, …

The parameter λ of the Poisson distribution is the average number of occurrences of a random event per unit time (or unit area). The Poisson distribution is suitable for describing the number of times a random event occurs per unit time.

2) The following is an explanation I read in some materials while studying this question, for your reference:

The average lookup length of a red-black tree is log(n); with a length of 8, the average lookup length is log(8) = 3. The average lookup length of a linked list is n/2; with a length of 8, that is 8/2 = 4, so converting to a tree is worthwhile. If the list length is 6 or less, 6/2 = 3 while log(6) ≈ 2.6; the list lookup is already fast, and the time spent converting the structure and building the tree would not pay off.

6. When the number of nodes drops to 6, the red-black tree is converted back into a linked list

 //When the number of nodes on the bucket is less than this value, the tree is converted to the linked list
 static final int UNTREEIFY_THRESHOLD = 6;

7. Only when the table capacity exceeds this value may buckets be treeified; otherwise, if a bucket holds too many elements, the table is expanded rather than treeified. To avoid a conflict between the expansion and treeification decisions, this value cannot be less than 4 * TREEIFY_THRESHOLD (i.e. 32)

//The minimum array length required before buckets can be converted into red-black trees 
static final int MIN_TREEIFY_CAPACITY = 64;

8. table, the array of buckets (its length must be a power of two) (important)

//An array of storage elements 
transient Node<K,V>[] table;

In JDK 1.8, as we know, HashMap is composed of an array, linked lists and red-black trees; table is that array. Before JDK 8 the element type of the array was Entry<K,V>; since JDK 1.8 it is Node<K,V>. Only the name changed; it implements the same interface, Map.Entry<K,V>, and is responsible for storing the key-value data.

9. Used to cache the entry set

//A collection of concrete elements
transient Set<Map.Entry<K,V>> entrySet;

10. Number of elements stored in the HashMap (important)

//The number of elements to store. Note that this is not equal to the length of the array.
 transient int size;

size is the real-time quantity of K-V in HashMap, not the length of array table.

11. Records the number of structural modifications made to the HashMap

// Counters for each expansion and change of map structure
 transient int modCount;  

12. The size at which the next resize will be triggered, calculated as (capacity * load factor)

// Critical value: when the actual size exceeds this value (capacity * load factor), the capacity is expanded
int threshold;

13. Load factor of the hash table (important)

// Loading factor
final float loadFactor;

Explanation:

1. loadFactor measures how full the HashMap is allowed to get; it reflects the density of the HashMap and affects the probability of hashing to the same array position. The real-time load factor of a HashMap is size/capacity, not the number of occupied buckets divided by capacity. capacity is the number of buckets, i.e. the length of the table.

A loadFactor that is too large makes element lookup inefficient; one that is too small leads to poor array utilization and very scattered data. The default value of 0.75f is a good critical value given officially.

When the number of elements reaches 75% of the array length, the HashMap is considered too crowded and needs to be expanded. Expansion involves rehashing, copying data and other operations, which is very costly. Therefore, the number of expansions should be minimized in development, which can be achieved by specifying the initial capacity when creating the HashMap collection object.
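For example, a hedged sketch of pre-sizing: if roughly 100 entries are expected (an assumed figure), choosing the initial capacity as expectedSize / 0.75 + 1 keeps the expected size below the resulting threshold, so no resize happens while filling the map:

import java.util.HashMap;

public class PreSizeDemo {
    public static void main(String[] args) {
        int expectedSize = 100;                                   // assumed number of entries
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;   // 134; HashMap rounds it up to the power of 2: 256
        HashMap<String, Integer> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedSize; i++) {
            map.put("key" + i, i);   // stays below the threshold 256 * 0.75 = 192, so no resize occurs
        }
        System.out.println(map.size()); // 100
    }
}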

At the same time, loadFactor can be customized in the constructor of HashMap.

Construction method:
HashMap(int initialCapacity, float loadFactor): constructs an empty HashMap with the specified initial capacity and load factor.

2. Why is the loading factor set to 0.75 and the initialization threshold is 12?

The closer loadFactor is to 1, the more entries the array holds before resizing and the denser the data, i.e. the longer the linked lists become. The closer loadFactor is to 0, the fewer entries the array holds and the sparser the data.

If you want the linked lists to stay as short as possible, expand earlier; some array slots may then never hold data, so the load factor should be as small as possible.

give an example:

For example, with a load factor of 0.4: 16 * 0.4 ---> 6. If the array is expanded when only 6 slots are filled, array utilization is too low.
	 With a load factor of 0.9: 16 * 0.9 ----> 14. This leads to somewhat longer linked lists, and finding elements becomes inefficient.

Therefore neither array utilization nor the length of the linked lists should suffer too much; after a lot of testing, 0.75 is the best compromise.

  • Threshold formula: threshold = capacity (array length, 16 by default) * loadFactor (0.75 by default). This is the maximum number of entries the current array should hold. When size >= threshold, the array should be resized; in other words, it is the standard for deciding whether to expand. After expansion, the HashMap capacity is twice what it was before.

4.2 construction method

Important construction methods in HashMap are as follows:

1. Construct an empty HashMap with default initial capacity (16) and default load factor (0.75).

public HashMap() {
   this.loadFactor = DEFAULT_LOAD_FACTOR; // Assign the default load factor 0.75 to loadFactor; no array is created here
}

2. Construct a HashMap with the specified initial capacity and default load factor (0.75).

 // Specifies the constructor for capacity size
  public HashMap(int initialCapacity) {
      this(initialCapacity, DEFAULT_LOAD_FACTOR);
  }

3. Construct a HashMap with a specified initial capacity and load factor. Let's analyze it.

/*
	 Specifies the constructor for capacity size and load factor
	 initialCapacity: Specified capacity
	 loadFactor:Specified load factor
*/
public HashMap(int initialCapacity, float loadFactor) {
    	//Judge whether the initialization capacity is less than 0
        if (initialCapacity < 0)
            //If it is less than 0, an illegal parameter exception, IllegalArgumentException, is thrown
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
    	//Judge whether initialCapacity is greater than the maximum collection capacity MAXIMUM_CAPACITY (2 to the 30th power)
        if (initialCapacity > MAXIMUM_CAPACITY)
            //If it exceeds MAXIMUM_CAPACITY, assign MAXIMUM_CAPACITY to initialCapacity
            initialCapacity = MAXIMUM_CAPACITY;
    	//Judge whether the load factor loadFactor is less than or equal to 0, or whether it is not a number (NaN)
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            //If one of the above conditions is met, an illegal parameter exception IllegalArgumentException is thrown
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
     	//Assign the specified load factor to the load factor loadFactor of the HashMap member variable
        this.loadFactor = loadFactor;
    	/*
    		tableSizeFor(initialCapacity) judges whether the specified initial capacity is a power of 2; if not, it is replaced
    		by the smallest power of 2 larger than the specified initial capacity. This has been explained above.
    		Note, however, that the value computed by tableSizeFor is assigned directly to threshold, the boundary value.
    		Some people think this is a bug and that it should be written as:
    		this.threshold = tableSizeFor(initialCapacity) * this.loadFactor;
    		Only then would it match the meaning of threshold (the HashMap is expanded when its size reaches the threshold).
    		But note that in the JDK 8 constructors the member variable table is not initialized; its initialization is deferred
    		to the put method, where threshold is recalculated. The specific implementation of put is explained below.
    	*/
        this.threshold = tableSizeFor(initialCapacity);
    }
Finally tableSizeFor is called; let's look at its implementation:
     /**
     * Returns a power of two size for the given target capacity.
       Returns the n-th power of the smallest 2 larger than the specified initialization capacity
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

Explanation:

Regarding this.threshold = tableSizeFor(initialCapacity);, the answer to the question is:

tableSizeFor(initialCapacity) judges whether the specified initial capacity is a power of 2; if not, it is replaced by the smallest power of 2 larger than the specified initial capacity, as explained above.
Note, however, that the value computed by tableSizeFor is assigned directly to threshold, the boundary value. Some people think this is a bug and that it should be written as:
this.threshold = tableSizeFor(initialCapacity) * this.loadFactor;
Only then would it match the meaning of threshold (the HashMap is expanded when its size reaches the threshold).
However, in the JDK 8 constructors the member variable table is not initialized; its initialization is deferred to the put method, where threshold is recalculated. The specific implementation of the put method is explained below.

4. Constructor containing another "Map"

//Construct a new HashMap with the same mapping relationship as the specified Map.
public HashMap(Map<? extends K, ? extends V> m) {
    	//The load factor loadFactor changes to the default load factor of 0.75
         this.loadFactor = DEFAULT_LOAD_FACTOR;
         putMapEntries(m, false);
 }

Finally, we call putMapEntries to see how the method is implemented:

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    //Gets the length of the parameter set
    int s = m.size();
    if (s > 0)
    {
        //Judge whether the length of the parameter set is greater than 0, indicating that it is greater than 0
        if (table == null)  // Determine whether the table has been initialized
        { // pre-size
                // Uninitialized, s is the actual number of elements of m
                float ft = ((float)s / loadFactor) + 1.0F;
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                        (int)ft : MAXIMUM_CAPACITY);
                // If the calculated t is greater than the threshold, the threshold is initialized
                if (t > threshold)
                    threshold = tableSizeFor(t);
        }
        // It has been initialized and the number of m elements is greater than the threshold value. Capacity expansion is required
        else if (s > threshold)
            resize();
        // Add all elements in m to HashMap
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

be careful:

float ft = ((float)s / loadFactor) + 1.0F; Why add 1.0F to this line of code?

The result of s/loadFactor is a decimal, and the cast (int)ft truncates it, so adding 1.0F effectively rounds up, ensuring a larger capacity. A larger capacity reduces the number of calls to resize. So + 1.0F is there to obtain more capacity.

For example, if the original collection has 6 elements, then 6 / 0.75 = 8, which is already a power of 2, so the new array size would be 8 and the data of the original map would be stored in an array of length 8. With a threshold of 8 * 0.75 = 6, the capacity would be insufficient almost immediately and repeated expansion while storing elements would hurt performance. With + 1 the computed value becomes 9 and the array length becomes 16, which reduces the number of expansions.
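A quick check of the arithmetic for s = 6 (a sketch; tableSizeFor behaves as shown in the source earlier):

public class FtDemo {
    public static void main(String[] args) {
        float loadFactor = 0.75f;
        int s = 6;                                           // number of entries in the source map
        float ft = ((float) s / loadFactor) + 1.0F;          // 6 / 0.75 = 8.0, plus 1.0 -> 9.0
        System.out.println((int) ft);                        // 9 -> tableSizeFor(9) = 16: spare room, fewer resizes
        System.out.println((int) ((float) s / loadFactor));  // 8 -> tableSizeFor(8) = 8: threshold 6, full at once
    }
}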

4.3 membership method

4.3.1 adding method

The put method is complex, and the implementation steps are as follows:

1) First, calculate the bucket to which the key is mapped through the hash value;

2) If there is no collision in that bucket, insert directly;

3) If a collision occurs, you need to deal with the conflict:

a: if the bucket uses the red black tree to handle conflicts, call the method of the red black tree to insert data;

b: otherwise, the traditional chain method is adopted for insertion. If the length of the chain reaches the critical value, the chain is transformed into a red black tree;

4) If there is a duplicate key in the bucket, replace the key with the new value value;

5) If the size is greater than the threshold, expand the capacity;

The specific methods are as follows:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

Explanation:

1) HashMap only exposes put for adding elements; putVal is just a method called by put and is not provided to users. So we focus on the putVal method.

2) We can see that putVal() first applies the hash() method to the key. Let's look at how the hash method is implemented.
 static final int hash(Object key) 
 {
        int h;
     	/*
     		1)If key equals null:
     			the key null also has a hash value; the returned value is 0
     		2)If key is not equal to null:
     			first the hashCode of the key is computed and assigned to h, then h is XORed with h unsigned-right-shifted
     			by 16 bits to obtain the final hash value
     	*/
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
 }

It can be seen from the above that HashMap supports a null key, while Hashtable calls hashCode() on the key directly, so it throws an exception when the key is null.
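A minimal sketch of the difference (Hashtable is the legacy synchronized class; it calls key.hashCode() directly):

import java.util.HashMap;
import java.util.Hashtable;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "ok");                   // allowed: hash(null) is defined as 0
        System.out.println(hashMap.get(null));     // ok

        Hashtable<String, String> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, "boom");           // Hashtable calls key.hashCode() -> NullPointerException
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
    }
}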

In fact, the above already explains why the length of a HashMap table must be a power of 2: HashMap uses a clever trick, obtaining an object's slot via hash & (table.length - 1). As mentioned earlier, the length of the underlying array is always a power of 2, which is a speed optimization of HashMap: when length is a power of 2, hash & (length - 1) is equivalent to taking the hash modulo length, i.e. hash % length, but & is more efficient than %. For example, n % 32 == n & (32 - 1).

Interpreting the hash method above:

Let's first study how the key's hash value is calculated; it is computed by the hash method shown above.

This hash method first computes the key's hashCode and assigns it to h, then XORs h with h unsigned-right-shifted by 16 bits to obtain the final hash value.

The hash value calculated by the above hash function is used in the putVal function:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        . . . . . . . . . . . . . . 
        if ((p = tab[i = (n - 1) & hash]) == null)//Here n represents the array length 16
       . . . . . . . . . . . . . . 
  }

The calculation process is as follows:

Description:

​ 1)key.hashCode(); Returns the hash value, that is, hashcode. Suppose you randomly generate a value.

2) n indicates that the length of array initialization is 16

3) & (bitwise and operation): operation rule: when the same binary digit is 1, the result is 1, otherwise it is zero.

4) ^ (bitwise XOR operation): operation rule: on the same binary digit, if the bits are the same the result is 0; if they differ, the result is 1.

In short:

  • The high 16 bits stay unchanged, and the low 16 bits are XORed with the high 16 bits (the hashCode is viewed as 32-bit binary; the first 16 bits are the high half and the last 16 bits the low half, and the two halves are XORed)

    Question: why do you do this?

    If n, the array length, is small, say 16, then n - 1 is 15, i.e. "1111" in binary. ANDing such a value directly with hashCode() uses only the last 4 bits of the hash. If the high bits of the hash vary a lot but the low bits vary little, hash conflicts are likely, so the high and low bits are mixed here to alleviate the problem.

    For example:
    hashCode() value:      1111 1111 1111 1111 1111 0000 1110 1010
    				&
    n-1, i.e. 16-1 --> 15: 0000 0000 0000 0000 0000 0000 0000 1111
    -------------------------------------------------------------------
    				  0000 0000 0000 0000 0000 0000 0000 1010 ----> 10 as the index
     In effect only the low bits of the hashCode are used as the array index: if another hashCode has different high bits but the same low bits, the calculated index is still 10, which creates a hash conflict and reduces performance.
    
  • (n - 1) & hash --> the array index; n is the array length 16, so n - 1 is 15

  • The essence of taking a remainder is repeated division, subtracting what remains; its efficiency is lower than that of bit operations (see the sketch after this list).
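Putting the two steps together, a small sketch of how a key is mapped to an array index (n = 16 is the default capacity; the key is arbitrary):

public class IndexOfKeyDemo {
    public static void main(String[] args) {
        Object key = "Lau Andy";           // any key
        int n = 16;                        // current table length (default capacity)

        int h = key.hashCode();
        int hash = h ^ (h >>> 16);         // the perturbation done by HashMap.hash(): high 16 bits folded into the low 16
        int index = (n - 1) & hash;        // same bucket selection as putVal: tab[(n - 1) & hash]

        System.out.println("hash  = " + Integer.toBinaryString(hash));
        System.out.println("index = " + index);   // always in [0, 15]
    }
}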

Now look at the putVal() method and see what it does.

Main parameters:

  • Hash value of hash key
  • Key original key
  • Value the value to store
  • onlyIfAbsent if true means that the existing value will not be changed
  • If evict is false, it means that the table is in creation status

The source code of putVal() method is as follows:

public V put(K key, V value) 
{
        return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    /*
    	1)transient Node<K,V>[] table; is the array that stores the elements of the Map.
    	2)(tab = table) == null assigns table to tab and then checks whether tab is null; on the first put it is null
    	3)(n = tab.length) == 0 assigns the array length to n and then checks whether n is 0
    	Because the if uses ||, as soon as one condition holds, n = (tab = resize()).length; is executed to initialize the
    	array, and the initialized array length is assigned to n
    	4)After n = (tab = resize()).length executes, every slot of the array tab is null
    */
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    /*
    	1)i = (n - 1) & hash computes the array index and assigns it to i, i.e. determines which bucket the element goes into
    	2)p = tab[i = (n - 1) & hash] reads the data at the computed position and assigns it to node p
    	3)(p = tab[i = (n - 1) & hash]) == null checks whether that slot is null; if it is, the following code runs:
    	  tab[i] = newNode(hash, key, value, null); which creates a new node for the key-value pair and places it in that bucket
        Summary: if there is no hash collision in the current bucket, the key-value pair is inserted directly into that slot
    */ 
    if ((p = tab[i = (n - 1) & hash]) == null)
        //Create a new node and store it in the bucket
        tab[i] = newNode(hash, key, value, null);
    else {
         // Execute else to explain that tab[i] is not equal to null, indicating that this location already has a value.
        Node<K,V> e; K k;
        /*
        	Compare the hash value and key of the first element in the bucket (the node in the array) with those being inserted
        	1)p.hash == hash : p.hash is the hash of the existing data, hash is the hash of the data being added; compare
        	  whether the two hash values are equal
                 Note: p is tab[i], i.e. the Node object returned by the newNode(hash, key, value, null) method.
                    Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) 
                    {
                        return new Node<>(hash, key, value, next);
                    }
                    The Node class has a member variable hash that records the hash value of the stored data
             2)(k = p.key) == key : read the key of the existing data, assign it to k, and compare whether the two keys have
               the same reference (address)
             3)key != null && key.equals(k): reaching here means the two key references differ; check that the key being added
               is not null, and if so call equals to compare the contents of the two keys
        */
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
                /*
                	Reaching here means the hash values of the two elements are equal and the keys are equal
                	Assign the whole existing node to e so it is recorded
                */ 
                e = p;
        // The hash values or the keys are not equal; check whether p is a red-black tree node
        else if (p instanceof TreeNode)
            // Put it in the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Description is a linked list node
        else {
            /*
            	1)If it is a linked list, you need to traverse to the last node and insert it
            	2)Loop traversal is used to judge whether there are duplicate key s in the linked list
            */
            for (int binCount = 0; ; ++binCount) {
                /*
                	1)e = p.next reads the element after p and assigns it to e
                	2)(e = p.next) == null checks whether p.next is null; if so, p is the last element, meaning the end of the
                	  linked list has been reached without finding a duplicate key, so the HashMap does not yet contain this key
                	Insert the key-value pair at the end of the linked list
                */
                if ((e = p.next) == null) {
                    /*
                    	1)Create a new node and insert it at the tail:
                    	 p.next = newNode(hash, key, value, null);
                    	 Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) 
                    	 {
                                return new Node<>(hash, key, value, next);
                         }
                         Note that the fourth parameter next is null, because the current element is inserted at the end of the
                         linked list, so its next node must be null
                         2)This way of adding matches the linked-list data structure: new elements are always appended at the end
                    */
                    p.next = newNode(hash, key, value, null);
                    /*
                    	1)After adding the node, check whether the number of nodes is greater than TREEIFY_THRESHOLD (8);
                    	if so, the linked list is converted into a red-black tree
                    	2)int binCount = 0 : the initial value of the for loop; counting starts from 0 and records the number of
                    	nodes traversed: 0 means the first node, 1 the second node, ... 7 means the eighth node, and with the
                    	element just added the bucket then holds 9 elements
                    	TREEIFY_THRESHOLD - 1 --> 8 - 1 ---> 7
                    	So when binCount reaches 7 (and the element just added makes 9 elements in the bucket),
                    	the linked list is converted into a red-black tree
                    */
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        //Convert to red black tree
                        treeifyBin(tab, hash);
                    // Jump out of loop
                    break;
                }
                 
                /*
                	Reaching here means e = p.next is not null and p is not the last element; continue checking whether the key
                	of this node in the linked list equals the key of the element being inserted
                */
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // Equal, jump out of loop
                    /*
                		If the key of the element being added equals the key of an existing element in the linked list, break out
                		of the for loop; no more comparisons are needed, and the replacement below, if (e != null), is executed
                	*/
                    break;
                /*
                	Indicates that the newly added element is not equal to the current node. Continue to find the next node.
                	Used to traverse the linked list in the bucket. Combined with the previous e = p.next, you can traverse the linked list
                */
                p = e;
            }
        }
        /*
        	Indicates that a node whose key value and hash value are equal to the inserted element is found in the bucket
        	In other words, duplicate keys are found through the above operation, so here is to change the value of the key into a new value and return the old value
        	This completes the modification function of the put method
        */
        if (e != null) { 
            // Record the value of e
            V oldValue = e.value;
            // onlyIfAbsent is false or the old value is null
            if (!onlyIfAbsent || oldValue == null)
                //Replace old value with new value
                //e.value represents the old value and value represents the new value 
                e.value = value;
            // Post access callback
            afterNodeAccess(e);
            // Return old value
            return oldValue;
        }
    }
    //Number of records modified
    ++modCount;
    // Judge whether the actual size is greater than the threshold. If it exceeds the threshold, expand the capacity
    if (++size > threshold)
        resize();
    // Post insert callback
    afterNodeInsertion(evict);
    return null;
} 
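As the source shows, when a duplicate key is found, putVal returns the old value (and put passes it on), while inserting a brand-new key returns null. A small sketch:

import java.util.HashMap;

public class PutReturnDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1));   // null -- no previous mapping
        System.out.println(map.put("k", 2));   // 1    -- the old value is returned, 2 replaces it
        System.out.println(map.get("k"));      // 2
    }
}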

4.3.2 treeifyBin method for converting linked list into red black tree

After a node is added, check whether the number of nodes in the bucket is greater than TREEIFY_THRESHOLD (8); if so, the linked list is converted into a red-black tree. The conversion method treeifyBin is as follows:

if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
   //Convert to red black tree tab represents array name hash represents hash value
   treeifyBin(tab, hash);

The treeifyBin method is as follows:

  /**
   * Replaces all linked nodes in bin at index for given hash unless
   * table is too small, in which case resizes instead.
     Replaces all linked nodes in the bucket at the index for the given hash, unless the table is too small, in which case the table is resized instead.
     Node<K,V>[] tab: the array
     int hash: the hash value
  */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        /*
        	If the current array is null or its length is less than the treeification capacity (MIN_TREEIFY_CAPACITY = 64),
        	just expand the table instead of converting the nodes into a red-black tree.
        	Purpose: while the array is small, converting to a red-black tree and traversing it is less efficient; expanding the
        	table instead recomputes the hash positions, the linked lists may become shorter, and data moves back into the array,
        	which is relatively more efficient.
        */
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            //Capacity expansion method
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            /*
            	1)Reaching here means the array length of the hash table is at least the threshold 64, and treeification begins
            	2)e = tab[index = (n - 1) & hash] takes the element out of the array and assigns it to e; e refers to the linked
            	  list node in the bucket at that position of the hash table, starting from the first node
            */
            //hd: head node of red black tree tl: tail node of red black tree
            TreeNode<K,V> hd = null, tl = null;
            do {
                //Create a new tree node whose content is consistent with the current linked list node e
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    //Make the newly created node p the head node hd of the future red-black tree
                    hd = p;
                else {
                    /*
                    	 p.prev = tl: set the previously created node tl as the predecessor of the current node p
                    	 tl.next = p: set the current node p as the successor of the tail node tl
                    */
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
                /*
                	e = e.next Assign the next node of the current node to e, if the next node is not equal to null
                	Then go back to the above and continue to take out the nodes in the linked list and convert them into red black trees
                */
            } while ((e = e.next) != null);
            /*
            	Let the first element in the bucket, that is, the element in the array, point to the node of the new red black tree. Later, the element in the bucket is the red black tree
            	Instead of a linked list data structure
            */
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

Summary: the above operations have done the following:

1. Decide, based on the array length of the hash table, whether to expand or to treeify

2. If treeifying, traverse the elements in the bucket, create the same number of tree nodes, copy their contents and link them together

3. Then let the first slot of the bucket point to the newly created tree head node, replacing the bucket's linked-list content with tree content

4.3.3 The capacity expansion method resize

4.3.3.1 capacity expansion mechanism

To understand the capacity expansion mechanism of HashMap, you need to have these two questions

  • 1. When is capacity expansion required
  • 2. What is the capacity expansion of HashMap

1. When is capacity expansion required

When the number of elements in the HashMap exceeds array size (array length) * loadFactor, the array is expanded. The default value of loadFactor (DEFAULT_LOAD_FACTOR) is 0.75, which is a compromise. That is, by default the array size is 16, so when the number of elements in the HashMap exceeds 16 × 0.75 = 12 (this value is the threshold, or boundary value), the array is expanded to 2 × 16 = 32, i.e. doubled, and the position of every element in the array is then recalculated, which is a very expensive operation. Therefore, if we can predict the number of elements the HashMap will hold, specifying it in advance can effectively improve the HashMap's performance.

Supplement:

When the number of nodes in one of the linked lists in the HashMap reaches 8, the HashMap is expanded first if the array length has not yet reached 64; if it has reached 64, the linked list becomes a red-black tree and the node type changes from Node to TreeNode. Conversely, if mappings are removed and the number of nodes in a tree has dropped below 6 the next time resize is executed, the tree is converted back into a linked list.
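For reference, these boundary values correspond to the following constants in the JDK 8+ java.util.HashMap source (excerpted here with summarizing comments):

// Excerpt from java.util.HashMap (JDK 8 and later)
static final int TREEIFY_THRESHOLD = 8;      // a bin is treeified once it holds more than this many nodes
static final int UNTREEIFY_THRESHOLD = 6;    // a split tree bin is turned back into a list when it shrinks to this size
static final int MIN_TREEIFY_CAPACITY = 64;  // minimum table length before bins are treeified; below this, resize instead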

2. What is the capacity expansion of HashMap

Capacity expansion is accompanied by a re-hash allocation and traverses all elements in the hash table, which is very time-consuming. When writing programs, try to avoid triggering resize.

The rehash method used by HashMap during capacity expansion is very ingenious, because each capacity expansion is doubled. Compared with the original calculated (n-1) & hash result, there is only one bit more, so the node is either in the original position or assigned to the position of "original position + old capacity".

How can this be understood? Take expanding from 16 to 32 as an example.

After the capacity doubles, n - 1 has one extra high-order bit set, so when an element's index is recalculated against the new capacity the result either stays the same or gains exactly the old capacity. The new index therefore changes as follows:


For example, suppose an element's original index is 5: after expansion it is either still at index 5 or at index 5 + 16 = 21, depending on the value of the new high-order bit. This matches the statement above: after capacity expansion, a node is either at its original position or at the position "original position + old capacity".

Therefore, when the HashMap is expanded, there is no need to recalculate the hash; we only need to check whether the bit of the original hash value that corresponds to the new high-order bit is 1 or 0. If it is 0 the index is unchanged, and if it is 1 the index becomes "original index + oldCap (original position + old capacity)". The small sketch below illustrates the 16 -> 32 case.
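A minimal worked example of this check (the two hash values are made up for illustration): both land in bucket 5 when the capacity is 16, and after doubling, the bit at position oldCap decides whether a node stays or moves by oldCap.

public class RehashIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // Two hypothetical hash values that collide in bucket 5 while the capacity is 16
        int h1 = 0b00101;   // 5
        int h2 = 0b10101;   // 21
        System.out.println((oldCap - 1) & h1);  // 5
        System.out.println((oldCap - 1) & h2);  // 5  -> same bucket before the resize
        // After doubling, (newCap - 1) exposes one extra high-order bit, which equals oldCap
        System.out.println((newCap - 1) & h1);  // 5           (h1 & oldCap == 0 -> index unchanged)
        System.out.println((newCap - 1) & h2);  // 21 = 5 + 16  (h2 & oldCap != 0 -> original index + oldCap)
    }
}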

It is precisely because of this ingenious rehash that the time to recalculate every hash value is saved. Moreover, since the newly considered bit can be regarded as randomly 0 or 1, resizing spreads the previously conflicting nodes evenly across the new buckets, and the number of nodes in each bucket after the rehash is guaranteed to be less than or equal to the number in the original bucket, so the rehash does not make hash conflicts any worse.

4.3.3.2 interpretation of the resize method source code

The following is the specific implementation of the code:

final Node<K,V>[] resize() {
    //Get the current array
    Node<K,V>[] oldTab = table;
    //If the current array is null the old capacity is 0; otherwise it is the array length
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    //The default value of the current threshold point is 12 (16 * 0.75)
    int oldThr = threshold;
    int newCap, newThr = 0;
    //If the length of the old array is greater than 0
    //Start calculating the size after capacity expansion
    if (oldCap > 0) {
        // If the old capacity has already reached the maximum, do not expand any further and simply let collisions happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            //Modify the threshold to the maximum value of int
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        /*
        	If the maximum value is not exceeded, it will be expanded to twice the original value
        	1)(newCap = oldCap << 1) < MAXIMUM_CAPACITY After 2x expansion, the capacity should be less than the maximum capacity
        	2)oldCap >= DEFAULT_INITIAL_CAPACITY The original array length is greater than or equal to the array initialization length 16
        */
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            //Double the threshold
            newThr = oldThr << 1; // double threshold
    }
    //The old capacity is 0 but the old threshold is greater than 0: the initial capacity was stored in threshold
    else if (oldThr > 0) // the old threshold becomes the new array length
        newCap = oldThr;
    else {// Use defaults directly
        newCap = DEFAULT_INITIAL_CAPACITY;//16
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Calculate the new resize maximum upper limit
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    //Store the new threshold (for example, 12 doubled becomes 24)
    threshold = newThr;
    //Create a new hash table
    @SuppressWarnings({"rawtypes","unchecked"})
    //newCap is the new array length (for example, 32)
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    //Determine whether the old array is equal to null
    if (oldTab != null) {
        // Move each bucket to a new bucket
        //Traverse each bucket of the old hash table and recalculate the new position of the elements in the bucket
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                //The original data is assigned null for GC recycling
                oldTab[j] = null;
                //Determine whether the node has a next reference
                if (e.next == null)
                    //There is no next reference, indicating that it is not a linked list. There is only one key value pair on the current bucket, which can be inserted directly
                    newTab[e.hash & (newCap - 1)] = e;
                //Judge whether it is a red black tree
                else if (e instanceof TreeNode)
                    //If the description is a red black tree to handle conflicts, call relevant methods to separate the trees
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // Using linked list to deal with conflicts
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    //Calculate the new position of the node through the principle explained above
                    do {
                        // Original index
                        next = e.next;
                        //If (e.hash & oldCap) == 0, the node keeps its original index after resize
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the original index into the bucket
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the original index + oldCap into the bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

4.3.4 delete method (remove)

After understanding the put method, the remove method is no longer difficult, so the repeated content will not be introduced in detail.

To delete, first find the position of the element. If the bucket holds a linked list, traverse the list to find the element and then delete it; if it holds a red-black tree, search the tree and delete the node after it is found. When the number of nodes in the tree becomes too small (below 6), the tree is converted back into a linked list.

remove method:

//The specific implementation of the remove method is in the removeNode method, so let's focus on the removeNode method
public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
            null : e.value;
    }

removeNode method:

final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
    	//Find the location according to the hash 
    	//If the bucket mapped to the current key is not empty
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;
            //If the first node in the bucket matches the key to be removed, record it in node
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
            else if ((e = p.next) != null) {
                //Indicates that the node has a next node
                if (p instanceof TreeNode)
                    //Note: if the conflict is handled by the red black tree, obtain the node to be deleted by the red black tree
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                else {
                    //Judge whether to handle hash conflicts in a linked list. If yes, traverse the linked list to find the node to be deleted
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key ||
                             (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            //A node was found; remove it if matchValue is false, or if the node's value matches the given value
            if (node != null && (!matchValue || (v = node.value) == value ||
                                 (value != null && value.equals(v)))) {
                //Delete the node by calling the method of red black tree
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                else if (node == p)
                    //Linked list deletion
                    tab[index] = node.next;
                else
                    p.next = node.next;
                //Record modification times
                ++modCount;
                //Decrease the size
                --size;
                afterNodeRemoval(node);
                return node;
            }
        }
        return null;
    }
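A small usage sketch (the class name is made up): the one-argument remove calls removeNode with matchValue = false, while the two-argument Map.remove(key, value) calls it with matchValue = true, so the entry is removed only when the stored value also matches.

import java.util.HashMap;

public class RemoveDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("001", "zhangsan");
        map.put("002", "lisi");

        // remove(key): the value is not compared (matchValue = false)
        System.out.println(map.remove("001"));          // zhangsan
        System.out.println(map.remove("missing"));      // null, nothing to delete

        // remove(key, value): removes the entry only on an exact value match (matchValue = true)
        System.out.println(map.remove("002", "wrong")); // false, entry kept
        System.out.println(map.remove("002", "lisi"));  // true, entry removed
    }
}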

4.3.5 find element method (get)

The get method finds the Value corresponding to the element's Key.

The code is as follows:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

The get method mainly calls the getNode method. The code is as follows:

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    //If the hash table is not empty and the bucket corresponding to the key is not empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        /* 
        	Determine whether the array elements are equal
        	Check the first element according to the position of the index
        	Note: always check the first element
        */
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // If it is not the first element, judge whether there are subsequent nodes
        if ((e = first.next) != null) {
            // Judge whether it is a red black tree. If yes, call the getTreeNode method in the red black tree to obtain the node
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // If it is not a red black tree, it is a linked list structure. Judge whether the key exists in the linked list through the circular method
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

Summary:

1. Implementation steps of get method:

1) obtain the bucket to which the key is mapped through the hash value

2) if the key on the bucket is the key to be found, find it directly and return it

3) if the key on the bucket is not the key to be found, view the subsequent nodes:

A: if the subsequent node is a red black tree node, get the value according to the key by calling the red black tree method

B: if the subsequent node is a linked list node, the value is obtained by looping through the linked list according to the key

2. For red-black tree nodes, the getTreeNode method above delegates to the tree node's find method:

 final TreeNode<K,V> getTreeNode(int h, Object k) {
            return ((parent != null) ? root() : this).find(h, k, null);
 }
 final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
            TreeNode<K,V> p = this;
            do {
                int ph, dir; K pk;
                TreeNode<K,V> pl = p.left, pr = p.right, q;
                if ((ph = p.hash) > h)
                    p = pl;
                else if (ph < h)
                    p = pr;
                else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                    return p;//Return directly after finding
                else if (pl == null)
                    p = pr;
                else if (pr == null)
                    p = pl;
                else if ((kc != null ||
                          (kc = comparableClassFor(k)) != null) &&
                         (dir = compareComparables(kc, k, pk)) != 0)
                    p = (dir < 0) ? pl : pr;
                //recursive lookup 
                else if ((q = pr.find(h, k, kc)) != null)
                    return q;
                else
                    p = pl;
            } while (p != null);
            return null;
        }

3. Searching the red-black tree: since ordering is maintained when nodes are added, the lookup is essentially a binary search and is therefore efficient.

4. As during insertion, if the hash of the compared node equals the hash being searched for, the keys are compared; if they are equal the node is returned directly, otherwise the search continues recursively in the subtree.

5. If the bucket holds a tree, the key is located via key.equals(k) in O(log n); if it holds a linked list, the key is located via key.equals(k) in O(n). A small usage sketch follows.
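As a sketch of this lookup logic (the CollidingKey class is invented for the example), two keys with the same hashCode land in the same bucket, and getNode distinguishes them with equals:

import java.util.HashMap;

public class GetDemo {
    // Hypothetical key whose hashCode always collides, forcing getNode to walk the bucket
    static class CollidingKey {
        final String name;
        CollidingKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }   // every key maps to the same bucket
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).name.equals(this.name);
        }
    }

    public static void main(String[] args) {
        HashMap<CollidingKey, String> map = new HashMap<>();
        map.put(new CollidingKey("a"), "first");
        map.put(new CollidingKey("b"), "second");
        // Same hash, different keys: the bucket's nodes are traversed and equals picks the right one
        System.out.println(map.get(new CollidingKey("b"))); // second
        System.out.println(map.get(new CollidingKey("c"))); // null, no equal key found
    }
}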

4.3.6 several ways to traverse the HashMap set

1. Traverse the keys and the values separately (keySet() and values())

2. Iterate over the entry set using an Iterator

3. Traverse keySet() and call get(key) for each key (not recommended)

Note: according to the Alibaba Java Development Manual, this approach is not recommended because it effectively iterates twice: keySet() is iterated once, and every get(key) performs another lookup, which reduces performance. The first three approaches are sketched below.
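A minimal sketch of the first three approaches (class name invented for the example):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class TraversalDemo {
    public static void main(String[] args) {
        Map<String, String> m1 = new HashMap<>();
        m1.put("001", "zhangsan");
        m1.put("002", "lisi");

        // 1. Traverse keys and values separately
        for (String key : m1.keySet()) {
            System.out.println("key: " + key);
        }
        for (String value : m1.values()) {
            System.out.println("value: " + value);
        }

        // 2. Iterate over the entry set with an Iterator
        Iterator<Map.Entry<String, String>> it = m1.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            System.out.println(entry.getKey() + "---" + entry.getValue());
        }

        // 3. keySet + get (not recommended: each get(key) is an extra lookup)
        for (String key : m1.keySet()) {
            System.out.println(key + "---" + m1.get(key));
        }
    }
}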

4. Since JDK 8, use the default forEach method declared in the Map interface:

default void forEach(BiConsumer<? super K, ? super V> action)

Method declared in the BiConsumer interface:
	void accept(T t, U u) - performs this operation on the given arguments.
		Parameters:
            t - the first input argument
            u - the second input argument

Traversal Code:

public class Demo02 {
    public static void main(String[] args) {
        HashMap<String, String> m1 = new HashMap<>();
        m1.put("001", "zhangsan");
        m1.put("002", "lisi");
        //Using Lambda expressions
        m1.forEach((key,value)->{
            System.out.println(key+"---"+value);
        });
    }
}

5. How should the initial capacity of a HashMap be set when storing a known number of key-value pairs?

5.1 HashMap initialization problem description

If we know exactly how many key value pairs we need to store, we should specify its capacity when initializing HashMap to prevent automatic capacity expansion of HashMap and affect the use efficiency.

By default, the capacity of a HashMap is 16. However, if the user specifies a number as the capacity through the constructor, HashMap selects the first power of 2 greater than or equal to that number as the capacity (3 -> 4, 7 -> 8, 9 -> 16). We have explained this above.

Alibaba Java development manual suggests that we set the initialization capacity of HashMap.


So why is this recommended? Have you thought about it?

Of course, the above suggestions also have theoretical support. As described above, the capacity expansion mechanism of HashMap is to expand the capacity when the capacity expansion conditions are met. The capacity expansion condition of HashMap is that when the number (size) of elements in HashMap exceeds the threshold, it will be expanded automatically. In HashMap, threshold = loadFactor * capacity.

Therefore, if we do not set the initial capacity, HashMap may be expanded many times with the continuous increase of elements. The capacity expansion mechanism in HashMap determines that the hash table needs to be rebuilt for each capacity expansion, which greatly affects the performance.

However, setting the initialization capacity and different values will also affect the performance. When we know the number of KV to be stored in HashMap, how much should we set the capacity?

5.2 initialization of capacity in HashMap

When we use HashMap(int initialCapacity) to initialize the capacity, the JDK calculates a relatively reasonable value as the initial capacity for us. So, is it enough to simply pass the known number of elements to be stored directly as initialCapacity?

For the setting of this value, the Alibaba Java Development Manual suggests setting the initial capacity to (number of elements to be stored / load factor) + 1.

In other words, if we pass 7 directly, the JDK rounds it up to 8. However, this HashMap will be resized once the number of elements reaches 8 * 0.75 = 6, which is obviously not what we want; resizing should be kept to a minimum. The reason has already been analysed.

If instead we compute initialCapacity / 0.75F + 1.0F, i.e. 7 / 0.75 + 1 = 10, the JDK rounds 10 up to 16, which greatly reduces the probability of a resize.

When the hash table maintained inside the HashMap reaches 75% of its capacity (by default), a rehash is triggered, and the rehash process is time-consuming. Therefore, setting the initial capacity to expectedSize / 0.75 + 1 effectively reduces the number of resizes.

Therefore, when we know the number of elements the HashMap will hold, setting the initial capacity to expectedSize / 0.75F + 1.0F is a relatively good choice for performance, although it sacrifices some memory.

When we want to create a HashMap in the code, if we know the number of elements to be stored in the Map, setting the initial capacity of the HashMap can improve the efficiency to a certain extent.

However, the JDK does not take the number passed in by the user directly as the capacity; it performs a calculation and finally arrives at a power of 2. The reason has also been analysed above.

However, to avoid the performance cost of resizing as far as possible, we suggest setting the initial capacity to expectedSize / 0.75F + 1.0F.
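A small sketch of this sizing rule (the helper name newHashMapWithExpectedSize is made up here and merely applies the formula from the manual):

import java.util.HashMap;
import java.util.Map;

public class InitialCapacityDemo {
    // Hypothetical helper applying the recommended formula: expectedSize / 0.75F + 1.0F
    static <K, V> Map<K, V> newHashMapWithExpectedSize(int expectedSize) {
        int initialCapacity = (int) (expectedSize / 0.75F + 1.0F);
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        // Passing 7 directly: the capacity is rounded up to 8, the threshold is 8 * 0.75 = 6,
        // so the 7th put already triggers a resize.
        Map<String, String> tooSmall = new HashMap<>(7);

        // With the formula: 7 / 0.75 + 1 = 10 (truncated), which is rounded up to capacity 16,
        // threshold 12, so storing 7 entries never triggers a resize.
        Map<String, String> sized = newHashMapWithExpectedSize(7);

        System.out.println(tooSmall.isEmpty() && sized.isEmpty()); // true, both ready for use
    }
}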

