Deep analysis of Map source code

Preface

A Map is a collection of key-value pairs.

The difference between HashMap and Hashtable

HashMap is not thread-safe. Hashtable guards its read and write methods with synchronized, so it is thread-safe but slower. Hashtable cannot store a null key or a null value. HashMap allows one null key and any number of null values; the null key is stored in the first position of the array (index 0).
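
The null-key difference can be checked directly. The following is a minimal sketch; the class name NullKeyDemo and the helper acceptsNullKey are made up for this illustration, and only the standard java.util classes are used.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    /** Returns true if the map accepts a null key, false if put throws NullPointerException. */
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap accepts null key:   " + acceptsNullKey(new HashMap<>()));   // true
        System.out.println("Hashtable accepts null key: " + acceptsNullKey(new Hashtable<>())); // false
    }
}
```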

The role of hashCode

hashCode is an int value computed for an object. The default Object.hashCode() is typically derived from the object's identity (often described as its address), while classes such as String and Integer override it to compute the value from their contents. Its main use is to let HashMap locate a key quickly.
Difference between hashCode and equals: two objects with the same hashCode are not necessarily equal, but if two objects are equal according to equals, their hashCode values must be the same.
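
A short illustration of that rule, as a sketch: the strings "Aa" and "BB" are a well-known pair that share a hashCode without being equal. The class name HashCodeVsEquals and its helper sameHash are made up for this example.

```java
class HashCodeVsEquals {
    /** True when both objects report the same hashCode. */
    static boolean sameHash(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // Same hashCode (both are 2112), yet the strings are not equal
        System.out.println(sameHash(a, b));  // true
        System.out.println(a.equals(b));     // false

        // Equal objects must always report the same hashCode
        String c = new String("Aa");
        System.out.println(a.equals(c));     // true
        System.out.println(sameHash(a, c));  // true
    }
}
```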

Handwritten HashMap

Custom Map interface
package com.mayikt.ext;

/**
 * @Description: Custom Map interface
 * @Author: ChenYi
 * @Date: 2020/06/20 11:19
 **/
public interface MayiktMap<K, V> {
    /**
     * The size of the collection
     *
     * @return
     */
    int size();

    /**
     * Add a key-value pair
     *
     * @param key
     * @param value
     * @return
     */
    V put(K key, V value);

    /**
     * Get element
     *
     * @param key
     * @return
     */
    V get(K key);

    /**
     * Entry object that holds a key and its value
     *
     * @param <K>
     * @param <V>
     */
    interface Entry<K, V> {

        K getKey();

        V getValue();

        V setValue(V value);
    }
}

Array-style HashMap implementation based on ArrayList
package com.mayikt.ext.impl;

import com.mayikt.ext.MayiktMap;

import java.util.ArrayList;
import java.util.List;

/**
 * @Description: Using ArrayList to implement custom HashMap
 * @Author: ChenYi
 * @Date: 2020/06/20 11:40
 **/

public class MayiktArrayListHashMap<K, V> implements MayiktMap<K, V> {
    List<MayiktEntry<K, V>> mayiktEntryList = new ArrayList<>();

    @Override
    public int size() {
        return mayiktEntryList.size();
    }

    @Override
    public V put(K key, V value) {
        MayiktEntry<K, V> mayiktEntry = new MayiktEntry<>(key, value);
        mayiktEntryList.add(mayiktEntry);
        return value;
    }

    @Override
    public V get(K key) {
        for (MayiktEntry<K, V> mayiktEntry : mayiktEntryList) {
            if (mayiktEntry.getKey().equals(key)) {
                return mayiktEntry.getValue();
            }
        }
        return null;
    }


    class MayiktEntry<K, V> implements MayiktMap.Entry<K, V> {
        private K k;
        private V v;

        public MayiktEntry(K k, V v) {
            this.k = k;
            this.v = v;
        }

        @Override
        public K getKey() {
            return k;
        }

        @Override
        public V getValue() {
            return v;
        }

        @Override
        public V setValue(V value) {
            this.v = value;
            return v;
        }
    }
}

This version does not check for an existing key, so the same key can be stored more than once, and get scans the whole list. It does no hashing at all, so it does not address the hash conflict problem.

Linked-list HashMap implementation based on LinkedList
package com.mayikt.ext.impl;

import com.mayikt.ext.MayiktMap;

import java.util.LinkedList;
import java.util.Objects;

/**
 * @Description: Custom implementation of HashMap based on LinkedList
 * @Author: ChenYi
 * @Date: 2020/06/20 12:41
 **/

public class MayiktLinkListHashMap<K, V> implements MayiktMap<K, V> {
    LinkedList<MayiktLinkListHashMap.MayiktEntry>[] data = new LinkedList[100];
    // Number of key-value pairs actually stored (not the number of buckets)
    private int size;

    @Override
    public int size() {
        return size;
    }

    @Override
    public V put(K key, V value) {
        int index = hash(key);
        LinkedList<MayiktLinkListHashMap.MayiktEntry> linkedList = data[index];
        MayiktEntry<K, V> mayiktEntry = new MayiktEntry<>(key, value);
        // The bucket has no list yet: create one and store the entry
        if (Objects.isNull(linkedList)) {
            linkedList = new LinkedList<>();
            linkedList.add(mayiktEntry);
            data[index] = linkedList;
            size++;
            return value;
        }
        // The bucket already has a list: if the key is already present, overwrite its value
        for (MayiktEntry entry : linkedList) {
            if (entry.getKey().equals(key)) {
                entry.setValue(value);
                return value;
            }
        }
        // Index conflict with a different key: append a new entry to the list
        linkedList.add(mayiktEntry);
        size++;
        return value;
    }

    private int hash(K key) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), data.length);
    }

    @Override
    public V get(K key) {
        if (Objects.isNull(key)) {
            return null;
        }
        int index = hash(key);
        LinkedList<MayiktEntry> linkedList = (LinkedList<MayiktEntry>) data[index];
        if (Objects.isNull(linkedList)) {
            return null;
        }
        for (MayiktEntry mayiktEntry : linkedList) {
            if (mayiktEntry.getKey().equals(key)) {
                return (V) mayiktEntry.getValue();
            }
        }
        return null;
    }

    class MayiktEntry<K, V> implements MayiktMap.Entry<K, V> {
        private K k;
        private V v;

        public MayiktEntry(K k, V v) {
            this.k = k;
            this.v = v;
        }

        @Override
        public K getKey() {
            return k;
        }

        @Override
        public V getValue() {
            return v;
        }

        @Override
        public V setValue(V value) {
            this.v = value;
            return v;
        }
    }
}

HashSet source code analysis

Underlying code
  public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

HashSet is implemented on top of HashMap. Each element added to the set becomes a key in the HashMap, and a single shared object (PRESENT) is used as the value, acting only as a placeholder. Because HashMap does not allow duplicate keys, the elements of a HashSet are unique as well.
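
The same idea can be sketched in a few lines. SimpleHashSet below is a hypothetical class written for this illustration, not the JDK source; it only mirrors the add method shown above by delegating to a HashMap with a shared placeholder value.

```java
import java.util.HashMap;
import java.util.Map;

class SimpleHashSet<E> {
    // One shared dummy object is stored as the value for every key
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    /** Returns true only if the element was not already in the set. */
    boolean add(E e) {
        // put returns the previous value, which is null for a new key
        return map.put(e, PRESENT) == null;
    }

    boolean contains(E e) {
        return map.containsKey(e);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        SimpleHashSet<String> set = new SimpleHashSet<>();
        System.out.println(set.add("a")); // true, first insertion
        System.out.println(set.add("a")); // false, the key already exists
        System.out.println(set.size());   // 1
    }
}
```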

How HashMap works in JDK 1.7

HashMap code of jdk1.7
package com.mayikt.ext.impl;

import com.mayikt.ext.MayiktMap;

import java.util.HashMap;

/**
 * @Description: 1.7HashMap class
 * @Author: ChenYi
 * @Date: 2020/06/21 20:57
 **/

public class MayiktHashMap<K, V> implements MayiktMap<K, V> {
    /**
     * Default initialization capacity
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    /**
     * Default load factor
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    /**
     * Maximum capacity
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;
    /**
     * Actual load factor
     */
    final float loadFactor;
    /**
     * threshold
     */
    int threshold;
    /**
     * Empty array
     */
    final MayiktHashMap.Entry<?, ?>[] EMPTY_TABLE = {};
    /**
     * array
     */
    transient MayiktHashMap.Entry<K, V>[] table = (Entry<K, V>[]) EMPTY_TABLE;
    transient int hashSeed = 0;

    transient int size;

    transient int modCount;

    public MayiktHashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    public MayiktHashMap(int initialCapacity, float loadFactor) {
        //Set initial capacity
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal initial capacity: " +
                    initialCapacity);
        }
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY;
        }
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new IllegalArgumentException("Illegal load factor: " +
                    loadFactor);
        }
        //Set the actual load factor
        this.loadFactor = loadFactor;
        //Until the table is allocated, threshold temporarily holds the initial capacity
        threshold = initialCapacity;
        init();
    }

    protected void init() {
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public V put(K key, V value) {
        //Add element for the first time
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        //If key is null
        if (key == null) {
            //Add key is null
            return putForNullKey(value);
        }
        //hash value
        int hash = hash(key);
        //Evaluate index position in array
        int index = indexFor(hash, table.length);
        //Traverse the Entry to determine whether it is the same key. If it is the same key, modify the value
        for (MayiktHashMap.Entry<K, V> e = table[index]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }
        //Add element
        addEntry(hash, key, value, index);
        size++;
        return null;
    }

    private V putForNullKey(V value) {
        //If a null key already exists, overwrite its value
        for (MayiktHashMap.Entry<K, V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
//                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        //The null key always has hash 0 and is stored at index 0
        addEntry(0, null, value, 0);
        //Count the new entry, otherwise getForNullKey sees size == 0 and returns null
        size++;
        return null;
    }

    void addEntry(int hash, K key, V value, int bucketIndex) {
        //See if expansion is needed
        if ((size >= threshold) && (null != table[bucketIndex])) {
            //Double the capacity
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }
        createEntry(hash, key, value, bucketIndex);
    }

    /**
     * Expansion
     *
     * @param newCapacity
     */
    void resize(int newCapacity) {
        MayiktHashMap.Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        MayiktHashMap.Entry[] newTable = new MayiktHashMap.Entry[newCapacity];
        transfer(newTable, false);
        table = newTable;
        threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    /**
     * Reassign after expansion
     *
     * @param newTable
     * @param rehash
     */
    void transfer(MayiktHashMap.Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (MayiktHashMap.Entry<K, V> e : table) {
            while (null != e) {
                MayiktHashMap.Entry<K, V> next = e.next;
                //Recalculate index
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

    private void createEntry(int hash, K key, V value, int bucketIndex) {
        //If the bucket is empty, next is null. On an index conflict, head insertion is used: the new entry goes to the front of the list and points to the old head
        Entry<K, V> next = table[bucketIndex];
        //Set the entry object of the index in the array
        table[bucketIndex] = new MayiktHashMap.Entry<>(hash, key, value, next);
    }

    /**
     * Calculate the corresponding array index according to the hash value and the array length
     *
     * @param hash
     * @param length
     * @return
     */
    static int indexFor(int hash, int length) {
        //length is a power of 2, so length - 1 is a mask of all 1 bits (e.g. 16 - 1 = 1111 in binary) and the AND keeps the low bits of the hash as the index
        return hash & (length - 1);
    }

    /**
     * Calculate hash
     *
     * @param k
     * @return
     */
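    final int hash(Object k) {
        //hashSeed stays 0 in this simplified version, so the JDK 1.7 branch that called
        //sun.misc.Hashing.stringHash32 for String keys is omitted (it relies on a
        //JDK-internal class that is not available in newer JDKs)
        int h = hashSeed;
        h ^= k.hashCode();

        //Mix the high bits into the low bits so that keys differing only in the
        //high bits do not all land at the same index
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }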

    private void inflateTable(int toSize) {
        //Round the requested capacity up to the nearest power of 2
        int capacity = roundUpToPowerOf2(toSize);
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new MayiktHashMap.Entry[capacity];
//        initHashSeedAsNeeded(capacity);
    }

    private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        int rounded = number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (rounded = Integer.highestOneBit(number)) != 0
                ? (Integer.bitCount(number) > 1) ? rounded << 1 : rounded
                : 1;
        return rounded;
    }

    @Override
    public V get(K key) {
        if (key == null) {
            return getForNullKey();
        }
        MayiktHashMap.Entry<K, V> entry = getEntry(key);
        return null == entry ? null : entry.getValue();
    }

    /**
     * Get the value with null key
     *
     * @return
     */
    private V getForNullKey() {
        if (size == 0) {
            return null;
        }
        for (MayiktHashMap.Entry<K, V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                return e.value;
            }
        }
        return null;
    }

    private Entry<K, V> getEntry(K key) {
        if (size == 0) {
            return null;
        }
        int hash = (key == null) ? 0 : hash(key);
        for (MayiktHashMap.Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                return e;
            }
        }
        return null;
    }

    static class Entry<K, V> implements MayiktMap.Entry<K, V> {

        final K key;
        V value;
        MayiktHashMap.Entry<K, V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, MayiktHashMap.Entry<K, V> next) {
            value = v;
            this.next = next;
            key = k;
            hash = h;
        }

        @Override
        public final K getKey() {
            return key;
        }

        @Override
        public final V getValue() {
            return value;
        }

        @Override
        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }
    }
}

Summary:

  • The underlying structure is an array plus linked lists. Each key and value is stored in an Entry object. Entry forms a singly linked list: it holds a pointer to the next node only, not to the previous one
  • The default initial capacity is 16 and the default load factor is 0.75
  • A null key is supported. It is stored at index 0, the first position of the array
  • The array length is always a power of 2. Even if the initial capacity passed to the constructor is not a power of 2, the first put rounds it up to the smallest power of 2 that is greater than or equal to the requested value. For example, a requested capacity of 3 becomes 4, and 9 becomes 16
  • The array index is computed with the bit operation hash & (length - 1). Because length is a power of 2, length - 1 is a run of 1 bits in binary (for example, 16 - 1 = 15 = 1111), so every low bit of the hash takes part in the result and every slot can be reached, which reduces index conflicts. The result always falls within 0 to length - 1, and for a non-negative hash it equals hash % length while being cheaper to compute
  • To make full use of space, reaching the threshold alone does not trigger a resize: the target bucket must also already be occupied. The condition is (size >= threshold) && (null != table[bucketIndex]). If the bucket is empty, the entry is added without expanding. When a resize happens, the capacity doubles
  • After the array length doubles, transfer is called to recompute each entry's index and move it into the new array
  • Why the load factor is 0.75: with a larger load factor the threshold is high, so the array is nearly full before a resize happens and index conflicts are frequent. With a smaller load factor the threshold is low, so resizes happen early and much of the array stays unused. 0.75 is the default chosen as a balance between conflict probability and space utilization
  • put first computes the hash of the key, then computes the array index with indexFor
  • Index conflict versus hash conflict: in an index conflict, two different keys with different hash values are mapped to the same array index by hash & (length - 1). In a hash conflict, two different keys have the same hashCode
  • Time complexity of a lookup by key
    1. If the bucket holds only that entry, it is read directly from the array by index, and the time complexity is O(1)
    2. If the key shares its bucket with other entries, the linked list has to be traversed, which is O(n) in the length of the list
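
The capacity rounding, threshold and index rules from the summary can be checked with a small standalone sketch. TableSizing is a hypothetical class name; roundUpToPowerOf2 and indexFor are rewritten here to behave like the methods of the same name in the listing above.

```java
class TableSizing {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Smallest power of 2 that is >= number (capped at MAXIMUM_CAPACITY, at least 1). */
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        int highest = Integer.highestOneBit(number);
        if (highest == 0) {
            return 1;
        }
        // More than one bit set means number is not itself a power of 2
        return Integer.bitCount(number) > 1 ? highest << 1 : highest;
    }

    /** Keeps the low bits of the hash; length must be a power of 2. */
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(3));  // 4
        System.out.println(roundUpToPowerOf2(9));  // 16
        System.out.println(roundUpToPowerOf2(16)); // 16

        // Threshold = capacity * load factor
        System.out.println((int) (16 * 0.75f));    // 12

        // The mask gives the same slot as a non-negative modulo, even for a negative hash
        System.out.println(indexFor(-7, 16));      // 9
        System.out.println(Math.floorMod(-7, 16)); // 9
    }
}
```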

Problems in JDK 7

  • It is not thread-safe. If a linked list grows too long, lookups become slow, with time complexity O(n)
  • Concurrent resizing can cause an infinite loop
    1. When several threads resize the same HashMap at the same time, each one allocates a new array, recomputes every entry's index, and moves the entries out of the old table with e.next = newTable[i]. The Entry nodes are shared between the threads. Entries from one bucket can land in the same bucket again after the resize, and head insertion reverses their order. Suppose a bucket holds B -> A. Thread 1 finishes its transfer and the list becomes A -> B. Thread 2 was suspended while still holding its old references (e = B, next = A). When it continues, it relinks the same nodes and ends up with B -> A -> B. This circular reference makes any later traversal of that bucket loop forever.
  • The hash function in JDK 7's HashMap mixes the bits thoroughly, so hashes are spread evenly, which reduces hash conflicts and improves query efficiency
  • The hash function in JDK 8's HashMap is much simpler and hash conflicts are more likely. However, JDK 8 uses a red-black tree to solve the problem of slow queries
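
To compare the two hash functions side by side, here is a sketch. HashSpreading is a made-up class name; jdk7Hash repeats the shifts from the listing above with hashSeed assumed to be 0, and jdk8Hash uses the single XOR-shift step of JDK 8's HashMap.hash.

```java
class HashSpreading {
    /** JDK 7 style mixing, assuming hashSeed == 0. */
    static int jdk7Hash(Object k) {
        int h = k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /** JDK 8 style mixing: fold the high 16 bits into the low 16 bits once. */
    static int jdk8Hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 16 - 1; // index mask for a table of 16 slots
        // These Integer keys differ only in their high bits
        int[] keys = {1 << 16, 2 << 16, 3 << 16};
        for (int key : keys) {
            System.out.println("key=" + key
                    + " raw index=" + (Integer.hashCode(key) & mask)
                    + " jdk7 index=" + (jdk7Hash(key) & mask)
                    + " jdk8 index=" + (jdk8Hash(key) & mask));
        }
    }
}
```

Without mixing, all three keys would map to index 0 in a 16-slot table. Both functions spread them over different indexes, and the JDK 8 version does so with a single step.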

JDK 8 HashMap

Summary:

  • The underlying structure is array + linked list + red-black tree (a self-balancing binary search tree). Lookups within a tree bucket take O(log n)
  • The hash function is relatively simple, because a bucket with many conflicts is stored as a red-black tree, so the list does not grow too long and query efficiency does not degrade
  • When a linked list grows longer than 8 nodes and the array capacity is at least 64, the list is converted to a red-black tree. If the list grows longer than 8 but the array capacity is below 64, the array is only resized, doubling the capacity. During conversion, the singly linked list is first turned into a doubly linked list of tree nodes, and that list is then built into a red-black tree
  • During a resize the index is recomputed with index = (n - 1) & hash. Entries with the same hash value always end up in the same bucket after the resize. Entries that merely shared a bucket are split into at most two lists: depending on the bit hash & oldCapacity, each entry either stays at its old index or moves to old index + old capacity
  • During that split, a tree bucket left with 6 nodes or fewer is converted back to a linked list
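
How an entry's index changes when the table doubles can be sketched as follows. In a JDK 8 style resize, the single bit hash & oldCapacity decides whether an entry keeps its index or moves up by oldCapacity. ResizeSplit and newIndex are names invented for this sketch.

```java
class ResizeSplit {
    /** Index of an entry after the table grows from oldCap to 2 * oldCap. */
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        // The extra bit examined by the larger mask decides: stay, or move up by oldCap
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap * 2;
        // 5 and 21 share bucket 5 in a 16-slot table
        int[] hashes = {5, 21};
        for (int hash : hashes) {
            System.out.println("hash=" + hash
                    + " old index=" + (hash & (oldCap - 1))
                    + " new index=" + newIndex(hash, oldCap)
                    + " recomputed=" + (hash & (newCap - 1)));
        }
        // hash 5 stays at index 5, hash 21 moves to index 21 (5 + 16)
    }
}
```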

ConcurrentHashMap source code

1.7

  • By default the map is divided into 16 segments and uses segment locks. Each Segment has its own independent table. Segment extends ReentrantLock (a reentrant lock) and stores its data in HashEntry nodes
  • The key's hash determines which Segment the entry is stored in, and then the index within that Segment's table
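
The segment-lock idea can be illustrated with a much simplified sketch. SegmentedMap is a hypothetical class, not the JDK 1.7 source: each segment here wraps a plain HashMap instead of a HashEntry array, and reads also take the lock for simplicity, whereas the JDK class is designed to read without locking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class SegmentedMap<K, V> {
    /** Each segment is its own lock and owns an independent table. */
    private static final class Segment<K, V> extends ReentrantLock {
        final Map<K, V> table = new HashMap<>();
    }

    private final Segment<K, V>[] segments;

    @SuppressWarnings("unchecked")
    SegmentedMap(int segmentCount) {
        segments = (Segment<K, V>[]) new Segment[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new Segment<>();
        }
    }

    /** Picks the segment from the key's hash; null keys are not supported. */
    private Segment<K, V> segmentFor(Object key) {
        return segments[Math.floorMod(key.hashCode(), segments.length)];
    }

    V put(K key, V value) {
        Segment<K, V> s = segmentFor(key);
        s.lock(); // only this segment is locked; other segments stay available
        try {
            return s.table.put(key, value);
        } finally {
            s.unlock();
        }
    }

    V get(K key) {
        Segment<K, V> s = segmentFor(key);
        s.lock();
        try {
            return s.table.get(key);
        } finally {
            s.unlock();
        }
    }
}
```

With 16 segments, two threads writing keys that fall into different segments do not block each other.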

1.8

  • Uses CAS (compare-and-swap), a lock-free mechanism
  • Data is stored in Node objects. A new Node is placed into an empty bucket with CAS, an optimistic approach that ensures thread safety
  • If the bucket at the computed index is already occupied (an index conflict), the update is done under a synchronized lock on that bucket's head node
  • Lock granularity is finer than in JDK 1.7
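
The combination of CAS for an empty bucket and synchronized for an occupied one can be sketched like this. CasBinMap is a hypothetical, heavily simplified class with a fixed 16-slot table, no resizing, no removal and no tree bins; it is not the JDK 1.8 source and only shows the locking pattern.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

class CasBinMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        volatile V value;
        volatile Node<K, V> next;

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Fixed-size table; the length is a power of 2 so that masking works
    private final AtomicReferenceArray<Node<K, V>> table = new AtomicReferenceArray<>(16);

    private int indexFor(Object key) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & (table.length() - 1);
    }

    void put(K key, V value) {
        int i = indexFor(key);
        while (true) {
            Node<K, V> head = table.get(i);
            if (head == null) {
                // Empty bucket: try to install the node with CAS, no lock needed
                if (table.compareAndSet(i, null, new Node<>(key, value))) {
                    return;
                }
                // Another thread won the race; loop and take the occupied path
            } else {
                // Occupied bucket: lock only this bucket's head node
                synchronized (head) {
                    if (table.get(i) != head) {
                        continue; // the head changed meanwhile, retry
                    }
                    for (Node<K, V> n = head; ; n = n.next) {
                        if (n.key.equals(key)) {
                            n.value = value; // existing key: overwrite
                            return;
                        }
                        if (n.next == null) {
                            n.next = new Node<>(key, value); // append at the tail
                            return;
                        }
                    }
                }
            }
        }
    }

    V get(K key) {
        // Reads take no lock; the volatile fields make completed writes visible
        for (Node<K, V> n = table.get(indexFor(key)); n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }
}
```

Only threads that hit the same bucket contend for a lock, which is why the granularity is finer than one lock per segment.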

Reference: from ant class

Tags: Java JDK less

Posted on Wed, 24 Jun 2020 22:22:38 -0400 by thomasadam83