Java Collections Series: HashMap

Introduction to HashMap

  • HashMap is a hash table that stores key-value mappings.
  • HashMap is the Map interface implementation based on a hash table; it is one of the most commonly used Java collections and is not thread-safe.
  • HashMap can store null keys and values, but only one null key is allowed, while multiple values may be null.
  • HashMap stores data according to the hash code of the key, so access is fast, but it does not support thread synchronization.
  • HashMap is unordered, that is, the insertion order is not preserved.
  • HashMap inherits from AbstractMap and implements the Map, Cloneable and java.io.Serializable interfaces.
  • Before JDK 1.8, HashMap consisted of an array plus linked lists. The array is the main body of HashMap, and the linked lists exist mainly to resolve hash collisions (the "zipper method", i.e. separate chaining).
  • Since JDK 1.8, HashMap resolves hash collisions quite differently: when the length of a linked list exceeds the threshold (8 by default), the list is converted into a red-black tree to reduce search time. Before converting, HashMap checks the array length; if it is less than 64 it chooses to resize the array first rather than treeify the bucket.
  • The default initial capacity of HashMap is 16, and each resize doubles the capacity. HashMap always uses a power of 2 as the size of the hash table.

Analysis of underlying data structure of HashMap

Before JDK1.8

Before JDK 1.8, the underlying structure of HashMap was an array combined with linked lists, i.e. a chained hash table.

HashMap passes the key's hashCode through the perturbation function to obtain the hash value, and then determines the bucket index of the element with (n - 1) & hash, where n is the length of the array.

If an element already exists at that position, HashMap compares that element's hash value and key with those of the element to be stored.

If both are the same, the value is simply overwritten; if they differ, the conflict is resolved through the zipper method.
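
As a small illustration (my own demo, not JDK code), the bucket index computed with (n - 1) & hash is equivalent to hash % n whenever n is a power of two, only cheaper, since it is a single bitwise AND:

// Minimal sketch: (n - 1) & hash vs. modulo, assuming a power-of-two table length
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;                          // table length, always a power of 2
        int hash = "hello".hashCode();       // any hash value
        int byAnd = (n - 1) & hash;
        int byMod = Math.floorMod(hash, n);  // floorMod also handles negative hashes
        System.out.println(byAnd + " == " + byMod); // same bucket index
    }
}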

The so-called perturbation function is simply HashMap's hash method.

The hash method, i.e. the perturbation function, exists to compensate for poorly implemented hashCode() methods. In other words, applying the perturbation function reduces collisions.

Source code of hash method of JDK 1.8 HashMap:

The hash method of JDK 1.8 is simpler than that of JDK 1.7, but the principle remains the same.

static final int hash(Object key) {
    int h;
    // key.hashCode(): returns the hash code of the key
    // ^: bitwise XOR
    // >>>: unsigned right shift; the sign bit is ignored and vacated bits are filled with 0
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
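
To see what the perturbation buys, here is a small demo of my own (not JDK code): two hash codes that differ only in their high 16 bits map to the same bucket of a 16-slot table without mixing, but to different buckets after hash() folds the high bits into the low bits:

// Demo: effect of the JDK 1.8 perturbation on two hash codes that differ only in high bits
public class PerturbDemo {
    // Same spreading step as HashMap.hash() in JDK 1.8
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x00000001;
        int h2 = 0x7FFF0001;                        // differs from h1 only in the high 16 bits
        System.out.println((n - 1) & h1);           // 1
        System.out.println((n - 1) & h2);           // 1  -> same bucket without perturbation
        System.out.println((n - 1) & spread(h1));   // 1
        System.out.println((n - 1) & spread(h2));   // 14 -> different bucket after perturbation
    }
}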

Source code of hash method of HashMap in JDK1.7:

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Compared with the hash method of JDK 1.8, the hash method of JDK 1.7 performs slightly worse because it perturbs the hash four times (four shift-and-XOR steps) instead of once.

The so-called "zipper method" (separate chaining) combines an array with linked lists: each cell of the array is the head of a linked list, and when a hash collision occurs the colliding entry is appended to that list.
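
To make the idea concrete, here is a toy sketch of separate chaining written purely for illustration (the names ToyChainedMap and Node are mine; real HashMap nodes also cache the hash and use generics):

// Toy sketch of the "zipper method": an array whose cells are heads of linked lists
class ToyChainedMap {
    static class Node {
        String key;
        String value;
        Node next;                            // collisions are chained through this link
        Node(String key, String value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    Node[] table = new Node[16];

    void put(String key, String value) {
        int index = (table.length - 1) & key.hashCode();
        // Walk the chain: overwrite the value if the key is already present
        for (Node n = table[index]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; }
        }
        // Otherwise prepend a new node to the bucket's linked list
        table[index] = new Node(key, value, table[index]);
    }
}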

After JDK1.8

Since JDK 1.8, the way hash conflicts are resolved has changed significantly.

When the length of a linked list exceeds the threshold (8 by default), the treeifyBin() method is called. This method decides, based on the current HashMap array, whether to convert the list into a red-black tree: only when the array length is greater than or equal to 64 is the conversion performed, to reduce search time; otherwise the resize() method is simply executed to grow the array. The key logic lives in the treeifyBin() method, sketched below.
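
As a reference, the decision described above looks roughly like this (a sketch closely following the OpenJDK 8 source of treeifyBin()):

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the table is still smaller than MIN_TREEIFY_CAPACITY (64), just resize instead
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // Otherwise replace the linked-list nodes with tree nodes and treeify the bucket
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}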

  Properties of HashMap:

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    // Serialization version UID
    private static final long serialVersionUID = 362498820763181265L;
    // The default initial capacity is 16 (1 << 4, i.e. 2 to the 4th power)
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // The maximum capacity, 2 to the 30th power
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // When the number of nodes in a bucket exceeds this value, the bucket is converted into a red-black tree
    static final int TREEIFY_THRESHOLD = 8;
    // When the number of nodes in a bucket drops below this value, the tree is converted back into a linked list
    static final int UNTREEIFY_THRESHOLD = 6;
    // The minimum table capacity at which a bucket may be converted into a red-black tree
    static final int MIN_TREEIFY_CAPACITY = 64;
    // The array that stores the elements; its length is always a power of 2
    transient Node<K,V>[] table;
    // A set view holding the concrete entries
    transient Set<Map.Entry<K,V>> entrySet;
    // The number of stored key-value mappings. Note that this is not the length of the array.
    transient int size;
    // Counter of structural modifications, used by fail-fast iterators
    transient int modCount;
    // Threshold (capacity * load factor); when the actual size exceeds it, the table is resized
    int threshold;
    // Load factor
    final float loadFactor;
}
  • loadFactor (the load factor)

The loadFactor controls how densely the array is filled. The closer the loadFactor is to 1, the more entries the array holds and the denser it becomes, which means the linked lists grow longer. The closer the loadFactor is to 0, the fewer entries the array holds and the sparser the data is.

A loadFactor that is too large makes element lookup inefficient; one that is too small wastes array space because the stored data is very scattered. The default value of loadFactor is 0.75f, which is the trade-off value chosen by the JDK designers.

The default capacity of a HashMap is 16 and the load factor is 0.75.

As data is continually added to the map, once the number of entries reaches 16 * 0.75 = 12 the current capacity of 16 must be expanded. This expansion involves rehashing and copying data, so it is quite expensive.
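
A quick illustrative calculation (my own demo, not JDK code): the default threshold works out to 12, and if the expected number of entries is known in advance, pre-sizing the map avoids the expensive rehash:

import java.util.HashMap;

public class CapacityDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        System.out.println((int) (capacity * loadFactor)); // 12 -> the 13th put triggers a resize

        // Pre-size for about 1000 entries: 1000 / 0.75 + 1 = 1334, which HashMap rounds up
        // to the power of two 2048, so no resize occurs while those entries are inserted.
        HashMap<String, Integer> preSized = new HashMap<>((int) (1000 / 0.75f) + 1);
    }
}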

HashMap source code analysis

Construction methods

HashMap provides four constructors:

  • Default constructor

Constructs an empty map with the default initial capacity of 16 and the default load factor of 0.75f.

    /**
     * Default constructor  
     *
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
  • Constructor that takes a Map as its argument

Constructs a new HashMap with the same mappings as the specified Map.

The HashMap is created with the default load factor (0.75) and an initial capacity sufficient to hold the mappings of the specified Map.

    /**
     * Constructor containing additional Map collections 
     *
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

  putMapEntries method:

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        // Determine whether the table has been initialized
        if (table == null) { // pre-size
            // Uninitialized, s is the actual number of elements of m
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                    (int)ft : MAXIMUM_CAPACITY);
            // If the calculated t is greater than the threshold, the threshold is initialized
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        // It has been initialized and the number of m elements is greater than the threshold value. Capacity expansion is required
        else if (s > threshold)
            resize();
        // Add all elements in m to HashMap
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}
  • Constructor that specifies the initial capacity

It internally calls the constructor that takes both a capacity and a load factor, passing the default load factor.

   /**
     * Constructor that specifies the capacity size 
     *
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
  • Constructor that specifies both the initial capacity and the load factor
    /**
     * Constructor that specifies the capacity size and load factor 
     *
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }
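
The tableSizeFor() method used above rounds the requested capacity up to the next power of two, which is how HashMap guarantees a power-of-two table size. For reference, in OpenJDK 8 it looks roughly like this:

// Returns a power-of-two size for the given target capacity (OpenJDK 8 version)
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}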

put method

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

put() delegates to the putVal() method:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

get method

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Array elements are equal
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // More than one node in bucket
        if ((e = first.next) != null) {
            // get in tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // get in linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
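
A short usage demo of my own showing the put/get semantics implemented above (old value returned on overwrite, a single null key allowed, null for missing keys):

import java.util.HashMap;

public class PutGetDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));    // null  (no previous mapping)
        System.out.println(map.put("a", 2));    // 1     (old value returned, then overwritten)
        System.out.println(map.put(null, 3));   // null  (one null key is allowed; hash(null) is 0)
        System.out.println(map.get("a"));       // 2
        System.out.println(map.get("missing")); // null
    }
}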

resize method

Resizing is accompanied by re-hashing: every element in the hash table is traversed and redistributed to its new bucket, which is time-consuming. When writing programs, try to avoid triggering resize (for example by pre-sizing the map).

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the maximum capacity has been reached, stop growing and just let collisions happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // If the maximum value is not exceeded, it will be expanded to twice the original value
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {
        // signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Calculate the new resize upper limit
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move each bucket to a new bucket
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the original index into the bucket
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the original index + oldCap into the bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
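
To illustrate the (e.hash & oldCap) trick in the code above: because the capacity doubles, an entry either keeps its old index j or moves to j + oldCap, depending on exactly one extra hash bit. A small demo of my own (not JDK code):

// Demo: how resize redistributes an entry using one extra bit of the hash
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int hash = 0b10101;                      // decimal 21
        int oldIndex = (oldCap - 1) & hash;      // 5
        int newIndex = (newCap - 1) & hash;      // 21 = 5 + 16
        // The deciding bit is (hash & oldCap): 0 -> stays at oldIndex,
        // non-zero -> moves to oldIndex + oldCap.
        System.out.println(oldIndex + " -> " + newIndex + ", bit = " + (hash & oldCap));
    }
}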
