[JDK source code] ConcurrentHashMap in 20,000 words

Opening questions

(1) Is the data structure of ConcurrentHashMap the same as that of HashMap?

(2) When does HashMap run into thread-safety problems in a multithreaded environment?

(3) How does ConcurrentHashMap solve the thread-safety problem?

(4) What locks does ConcurrentHashMap use?

(5) How does the capacity expansion of ConcurrentHashMap proceed?

(6) Is ConcurrentHashMap strongly consistent?

Brief introduction

ConcurrentHashMap is a thread-safe version of HashMap. Internally, it uses the same (array + linked list + red-black tree) structure to store elements.

Compared with HashTable, which is also thread-safe, it is greatly improved in efficiency and other respects.

Reference resources
[JDK source code] HashMap
[JDK source code] HashTable

Introduction to various locks

Having covered AQS, let's take a quick look at the various kinds of locks.

(1) synchronized

A keyword in Java, implemented internally as a monitor lock; the lock state is mainly recorded in a field of the object header.

synchronized has been heavily optimized compared with older JDK versions. At runtime it can exist in three states: biased lock, lightweight lock, and heavyweight lock.

A biased lock means that when a piece of synchronized code is always accessed by the same thread, that thread acquires the lock automatically, reducing the cost of lock acquisition.

A lightweight lock means that when a biased lock is accessed by a second thread, it is upgraded to a lightweight lock: the thread tries to acquire it by spinning instead of blocking, which improves performance.

A heavyweight lock means that when a thread spinning on a lightweight lock fails to acquire it after a certain number of spins, it blocks and the lock is upgraded to a heavyweight lock, which blocks other threads and reduces performance.

(2) CAS

CAS (Compare And Swap) is an optimistic locking technique. It assumes that concurrent operations on the same data will usually not conflict, so it simply attempts the update; if the attempt fails, it keeps retrying until it succeeds.
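To make this concrete, here is a minimal CAS retry loop built on java.util.concurrent.atomic.AtomicInteger (an illustrative sketch, not code from ConcurrentHashMap):

import java.util.concurrent.atomic.AtomicInteger;

public class CasIncrement {
    private static final AtomicInteger counter = new AtomicInteger(0);

    static void increment() {
        for (;;) {
            int current = counter.get();   // read the current value
            int next = current + 1;        // compute the new value
            // compareAndSet succeeds only if no other thread changed the value in between;
            // on failure, loop around and try again (the optimistic part)
            if (counter.compareAndSet(current, next)) {
                return;
            }
        }
    }
}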

(3) volatile (not a lock)

A keyword in Java. When multiple threads access the same variable and one thread modifies its value, the other threads can immediately see the modified value. (This involves the Java memory model; see JMM and Volatile for reference.)

volatile only guarantees visibility, not atomicity. For example, for a volatile variable i, the operation i++ is not guaranteed to produce the correct result every time, because i++ is really two steps, equivalent to i = i + 1: read first, then add 1 and write back. volatile cannot make this sequence atomic.
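A small sketch of this pitfall (the VolatileCounter class below is illustrative; the exact output varies from run to run):

public class VolatileCounter {
    private static volatile int i = 0; // visible to all threads, but i++ is still not atomic

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int k = 0; k < 10_000; k++) {
                i++; // read, add 1, write back: three steps that can interleave
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(i); // usually less than 20000 because updates are lost
    }
}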

(4) Spin lock

A spin lock means that a thread trying to acquire the lock does not block but keeps retrying in a loop. The advantage is that it avoids the overhead of thread context switches, improving performance; the disadvantage is that the loop consumes CPU.
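A minimal spin lock sketch built on AtomicBoolean (illustrative only; this is not how the JDK implements its locks):

import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait until the CAS from false to true succeeds:
        // no blocking, no context switch, but the loop consumes CPU
        while (!locked.compareAndSet(false, true)) {
            // spin
        }
    }

    public void unlock() {
        locked.set(false);
    }
}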

(5) Sectional lock

A segmented lock is a lock design idea rather than a concrete lock: it refines the granularity of locking. It is the main idea behind ConcurrentHashMap's efficient concurrent operation: when an operation does not need to update the whole array, only the single array slot involved is locked.
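A toy sketch of the idea (the StripedCounter class below is hypothetical, not from the JDK): one lock per segment instead of one lock for the whole structure:

public class StripedCounter {
    private final Object[] locks = new Object[16]; // one lock object per segment
    private final int[] counts = new int[16];

    public StripedCounter() {
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new Object();
        }
    }

    public void add(int key) {
        int i = key & (counts.length - 1); // pick the segment by hash, as a hash table picks a bucket
        synchronized (locks[i]) {          // lock only this segment; other segments stay concurrent
            counts[i]++;
        }
    }
}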

(6) ReentrantLock

A reentrant lock means that after a thread has acquired a lock, it can acquire the same lock again without blocking. The advantage of reentrancy is that it avoids self-deadlock.

In fact, synchronized is also a reentrant lock.
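A minimal demonstration of reentrancy using ReentrantLock (the same nesting with synchronized would also not deadlock):

import java.util.concurrent.locks.ReentrantLock;

public class ReentrantDemo {
    private static final ReentrantLock lock = new ReentrantLock();

    static void outer() {
        lock.lock();
        try {
            inner(); // acquiring the lock we already hold does not block
        } finally {
            lock.unlock();
        }
    }

    static void inner() {
        lock.lock(); // the hold count rises to 2 instead of deadlocking
        try {
            System.out.println("hold count: " + lock.getHoldCount()); // prints 2
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        outer();
    }
}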

Source code analysis

Member properties

// Maximum capacity value of hash table array
private static final int MAXIMUM_CAPACITY = 1 << 30;

// Hash table default capacity value 16
private static final int DEFAULT_CAPACITY = 16;

// The maximum array size (not a power of 2); needed by toArray and related methods (not a core attribute)
static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

// Attributes left over from jdk1.7 to represent concurrency level
// jdk1.8 is only used during initialization and does not represent the concurrency level. After 1.8, the concurrency level is determined by the length of the hash table
private static final int DEFAULT_CONCURRENCY_LEVEL = 16;

// Load factor: indicates the fill degree of the hash table. In ConcurrentHashMap this attribute is fixed at 0.75 and cannot be modified
private static final float LOAD_FACTOR = 0.75f;

// Treeing threshold: when the length of the linked list in a bucket of the hash table reaches 8, treeing of the linked list may occur
static final int TREEIFY_THRESHOLD = 8;

// Untreeify threshold: when the number of elements in a bucket's red-black tree drops to 6 or fewer, the tree is converted back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;

// Treeification occurs only when the hash table length has reached 64 and a bucket's linked list has reached length 8
static final int MIN_TREEIFY_CAPACITY = 64;

// Controls the minimum stride (span of buckets) each thread migrates during an expansion
private static final int MIN_TRANSFER_STRIDE = 16;

// Fixed value 16, related to expansion: the resize stamp is generated based on this value
private static int RESIZE_STAMP_BITS = 16;

// (1 << (32 - RESIZE_STAMP_BITS)) - 1 = (1 << 16) - 1 = 65535
// Indicates the maximum number of threads that can be accommodated by concurrent expansion
private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1;

// Also an expansion-related attribute; it will come up in the expansion analysis
private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

// When a node's hash value is -1, it is a FWD (ForwardingNode) node, i.e. its bucket has already been migrated
static final int MOVED     = -1;
// When a node's hash value is -2, the bucket has been treeified and the node is a TreeBin object, which acts as a proxy for operations on the red-black tree
static final int TREEBIN   = -2; 
// When a node's hash value is -3, it is a ReservationNode, a placeholder used by computeIfAbsent and related methods
static final int RESERVED  = -3;
// 0x7fffffff in binary: 0111 1111 1111 1111 1111 1111 1111 1111 (a 0 followed by 31 ones)
// Bitwise AND-ing (&) a negative number with this mask clears the sign bit, yielding a non-negative number (note: not the absolute value)
static final int HASH_BITS = 0x7fffffff; 

// Number of CPUs in the current system
static final int NCPU = Runtime.getRuntime().availableProcessors();

// Serialization fields kept so that JDK 1.8 ConcurrentHashMap stays serialization-compatible with JDK 1.7 (not core attributes)
private static final ObjectStreamField[] serialPersistentFields = {
    new ObjectStreamField("segments", Segment[].class),
    new ObjectStreamField("segmentMask", Integer.TYPE),
    new ObjectStreamField("segmentShift", Integer.TYPE)
};

// Hash table
transient volatile Node<K,V>[] table;

// New table reference: during an expansion, the new table is assigned to nextTable (keeping a reference); after the expansion it is set back to null
private transient volatile Node<K,V>[] nextTable;

// Plays the same role as base in LongAdder: when there is no thread contention (or the counter cells are busy), increments are accumulated into baseCount
private transient volatile long baseCount;

// Indicates the status of the hash table:
// When sizeCtl < 0:
// Case 1: sizeCtl = -1: the table is being initialized (some thread is creating the table array) and the current thread must wait
// Case 2: the table is being expanded; the high 16 bits hold the resize stamp and the low 16 bits hold (1 + nThreads), i.e. the number of threads participating in the concurrent expansion plus 1
// sizeCtl = 0: the default initial capacity DEFAULT_CAPACITY = 16 will be used when the table is created
// When sizeCtl > 0:
// Case 1: if the table is not yet initialized, it holds the initial size
// Case 2: if the table is initialized, it holds the threshold that triggers the next expansion
private transient volatile int sizeCtl;

// Records the progress of an expansion: threads claim their bucket ranges from transferIndex and process them
private transient volatile int transferIndex;

// Like cellsBusy in LongAdder, indicates the lock status of the counterCells array:
// 0: currently unlocked
// 1: currently locked
private transient volatile int cellsBusy;

// Like the cells array in LongAdder: created once threads contend on baseCount.
// Each thread computes a hash (probe) to pick its own cell and accumulates its increment into that cell.
// Total = sum(cells) + baseCount
private transient volatile CounterCell[] counterCells;

Construction method

   /**
     * Creates a new, empty map with the default initial table size (16).
     */
    public ConcurrentHashMap() {
    }

    /**
     * Pass in initialCapacity. The real capacity is the smallest power of 2 greater than or equal to initialCapacity * 1.5 + 1; for example, passing in 9 gives an initial capacity of 16
     */
    public ConcurrentHashMap(int initialCapacity) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException();
        int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
                    // MAXIMUM_CAPACITY = 1 << 30
                   MAXIMUM_CAPACITY :
                   // tableSizeFor rounds up to a power of 2: initialCapacity = 9 yields tableSizeFor(14) = 16
                   tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
        this.sizeCtl = cap;
    }

    /**
     * Creates a new map with the same mappings as the given map.
     *
     * @param m the map
     */
    public ConcurrentHashMap(Map<? extends K, ? extends V> m) {
        // Default 16, DEFAULT_CAPACITY=16
        this.sizeCtl = DEFAULT_CAPACITY;
        putAll(m);
    }

    public ConcurrentHashMap(int initialCapacity, float loadFactor) {
        this(initialCapacity, loadFactor, 1);
    }

    /**
     * @param loadFactor the load factor (table density) for
     * establishing the initial table size
     * @param concurrencyLevel the estimated number of concurrently
     * updating threads. The implementation may use this value as
     * a sizing hint.
     */
    public ConcurrentHashMap(int initialCapacity,
                             float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        if (initialCapacity < concurrencyLevel)   // Use at least as many bins
            initialCapacity = concurrencyLevel;   // as estimated threads
        long size = (long)(1.0 + (long)initialCapacity / loadFactor);
        int cap = (size >= (long)MAXIMUM_CAPACITY) ?
            MAXIMUM_CAPACITY : tableSizeFor((int)size);
        this.sizeCtl = cap;
    }

Comparing these constructors with HashMap's, note that instead of HashMap's threshold and loadFactor fields, a single sizeCtl field is used for control, and at construction time it only stores the capacity. So how is it used? The official interpretation of sizeCtl is as follows:

(1) -1 indicates that a thread is initializing the table

(2) -(1 + nThreads) indicates that n threads are expanding together (in fact sizeCtl = -(1 + nThreads) is not accurate here, because the high 16 bits of sizeCtl also store the resize stamp; more on this later)

(3) 0, the default value: the default capacity (16) will be used when initTable actually initializes the table

(4) > 0 indicates the next expansion threshold after initialization or expansion

  • tableSizeFor
private static final int tableSizeFor(int c) {
    int n = c - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    // The idea: take n = c - 1 in binary and propagate its highest 1 bit into all lower bits, e.g. 1010 => 1111; then +1 yields the smallest power of 2 >= c
    // For example, for c = 9: n = 8 (1000) smears to 15 (1111), and +1 gives 16 (10000)
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
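A few sample inputs make the behavior concrete; the demo below carries a standalone copy of the method so it can be run in isolation:

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied verbatim from the source above so it can run standalone
    static int tableSizeFor(int c) {
        int n = c - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(9));  // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of 2)
        System.out.println(tableSizeFor(17)); // 32
    }
}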

Add element

  • Methods used: spread, initTable, tabAt, helpTransfer, putTreeVal, treeifyBin and addCount
public V put(K key, V value) {
    return putVal(key, value, false);
}

// onlyIfAbsent: controls replacement:
// If false, put replaces the existing value when the Map already contains the same key
// If true, an existing mapping for the key is left unchanged and nothing is inserted
final V putVal(K key, V value, boolean onlyIfAbsent) {
    // Neither key nor value can be null
    if (key == null || value == null) throw new NullPointerException();
    // Calculate the hash value. Through the spread method, the high bit can also participate in the addressing operation, making the final hash value more dispersed
    int hash = spread(key.hashCode());
    // binCount: the number of elements in the target bucket; a value of 2 is also used to indicate that the bucket may have been converted to a red-black tree
    int binCount = 0;
    // Infinite loop combined with CAS (if CAS fails, re-read the bucket and retry the whole flow)
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            // If the bucket is not initialized or the number of buckets is 0, the bucket is initialized
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // If there is no element in the bucket where the element to be inserted is located, insert the element into the bucket
            if (casTabAt(tab, i, null,
                    new Node<K,V>(hash, key, value, null)))
                // If CAS fails (another thread inserted an element first), fall through to the next iteration and retry
                // If CAS inserts the element successfully, break out of the loop and the flow ends
                break;                   // no lock when adding to empty bin
        }
        else if ((fh = f.hash) == MOVED)
            // If the hash of the first element in the target bucket is MOVED (another thread is expanding and this bucket has already been migrated), the current thread helps migrate elements
            tab = helpTransfer(tab, f);
        else {
            // If the bucket is not empty and elements are not being migrated, lock the bucket (segment lock)
            // And find out whether the element to be inserted is in this bucket
            // If it exists, replace the value (because onlyIfAbsent=false)
            // If it does not exist, it will be inserted at the end of the linked list or into the tree
            V oldVal = null;// The original value will be used later
            // The lock is a bucket f
            synchronized (f) {
                // Check whether the first element has changed again. If so, enter the next cycle and start over
                if (tabAt(tab, i) == f) {
                    // If the hash value of the first element is greater than or equal to 0 (indicating that it is not migrating or a tree)
                    // That is, the elements in the bucket are stored in a linked list
                    if (fh >= 0) {
                        // The number of elements in the bucket is assigned as 1
                        binCount = 1;
                        // Traverse the whole bucket, and add 1 to binCount at the end of each time
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                            (ek != null && key.equals(ek)))) {
                                // If this element is found, a new value is assigned (onlyIfAbsent=false)
                                // And exit the loop
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                // If no element is found at the end of the linked list
                                // Just insert it at the end of the linked list and exit the loop
                                pred.next = new Node<K,V>(hash, key,
                                        value, null);
                                break;
                            }
                        }
                    }
                    else if (f instanceof TreeBin) {
                        // If the first element is a tree node
                        Node<K,V> p;
                        // The number of elements in the bucket is assigned as 2
                        binCount = 2;
                        // Call the insertion method of the red black tree to insert elements
                        // null if successfully inserted
                        // Otherwise, return the found node
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                value)) != null) {
                            // If this element is found, a new value is assigned (onlyIfAbsent=false)
                            // And exit the loop
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            // If binCount is not 0, the element is successfully inserted or found
            if (binCount != 0) {
                // If the linked list length reaches TREEIFY_THRESHOLD = 8, attempt treeification
                // When inserting into a tree, binCount is only ever assigned 2 (the tree's element count is not computed)
                // so an existing tree will not be treeified again
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                // If the element to be inserted already exists, the old value is returned
                if (oldVal != null)
                    return oldVal;
                // Exit the outer loop and the process ends
                break;
            }
        }
    }
    // Element inserted successfully; add 1 to the element count (this is also where an expansion may be triggered)
    addCount(1L, binCount);
    // A successful insertion returns null, indicating the key did not exist before
    return null;
}

The overall process is similar to that of HashMap, which consists of the following steps:

(1) If the bucket array is not initialized, it will be initialized;

(2) If the bucket where the element to be inserted is empty, try to insert this element directly into the first position of the bucket;

(3) If capacity expansion is in progress, the current thread will be added to the process of capacity expansion;

(4) If the bucket of the element to be inserted is not empty and the element is not being migrated, lock the bucket (segment lock);

(5) If the element in the current bucket is stored in a linked list, look for the element or insert the element in the linked list;

(6) If the element in the current bucket is stored as a red black tree, look for the element or insert the element in the red black tree;

(7) If the element exists, the old value is returned;

(8) If the element does not exist, add 1 to the number of elements in the whole Map, check whether capacity expansion is required, and return null;

The locks used in the operation of adding elements mainly include (spin lock + CAS + synchronized + segmented lock).

Why use synchronized instead of ReentrantLock?

Because synchronized has been heavily optimized (biased, lightweight and heavyweight states), it performs no worse than ReentrantLock in this scenario.

  • spread
static final int spread(int h) {
    // XOR the high 16 bits of h into the low 16 bits, then bitwise AND with HASH_BITS (0x7fffffff, i.e. 2^31 - 1) to keep the result non-negative
    // The purpose is to reduce hash conflicts
    return (h ^ (h >>> 16)) & HASH_BITS;
}
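A worked example, assuming a sample hash code of 0x12345678:

public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff;

    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    public static void main(String[] args) {
        int h = 0x12345678;
        // h >>> 16       = 0x00001234 (high 16 bits shifted down)
        // h ^ (h >>> 16) = 0x1234444c (high bits now influence the low bits)
        // & HASH_BITS clears the sign bit, keeping the hash non-negative
        System.out.println(Integer.toHexString(spread(h))); // 1234444c
    }
}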
  • initTable: initializes the bucket array when placing elements for the first time.

  • tabAt: get the Node node (header Node) of the specified subscript i in the tab(Node []) array

  • helpTransfer: when a thread adding an element finds an expansion in progress and its own bucket already migrated, it helps migrate the elements of the other buckets.

  • putTreeVal: adds data to a tree node

  • treeifyBin: converts the linked list in a bucket to a red-black tree

  • addCount: increments the element count and determines whether expansion is required

Initialize bucket array

initTable initializes the bucket array when placing elements for the first time.

private final Node<K,V>[] initTable() {
    Node<K,V>[] tab; int sc;
    while ((tab = table) == null || tab.length == 0) {
        if ((sc = sizeCtl) < 0)
            // If sizeCtl < 0, initialization or expansion is in progress; yield the CPU
            Thread.yield(); // lost initialization race; just spin
        else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
            // If sizeCtl is atomically updated to -1, the current thread enters initialization
            // If the atomic update fails, another thread got into the initialization step first; go to the next iteration
            // If initialization has not finished by the next iteration, sizeCtl < 0 and the if branch above yields the CPU
            // If it has finished by the next iteration, table.length != 0 and the loop exits
            try {
                // Check again whether the table is empty (double-check), to prevent a thread that lost the race from re-initializing a table that has already been built
                if ((tab = table) == null || tab.length == 0) {
                    // If sc is 0, the default value of 16 is used. Here is the capacity of the initialized array. sc is sizeCtl
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    // New array
                    @SuppressWarnings("unchecked")
                    Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                    // Assign to table bucket array
                    table = tab = nt;
                    // Set sc to 0.75 times the length of the array
                    // n - (n >>> 2) = n - n/4 = 0.75n
                    // It can be seen that the load factor and expansion threshold are effectively hard-coded
                    // This is why there are no threshold and loadFactor attributes
                    sc = n - (n >>> 2);
                }
            } finally {
                // Assign sc to sizeCtl, and the expansion threshold is stored
                sizeCtl = sc;
            }
            break;
        }
    }
    return tab;
}

(1) CAS is used as the lock control so that only one thread initializes the bucket array; when no capacity is passed to the constructor, the default here is 16: int n = (sc > 0) ? sc : DEFAULT_CAPACITY, where sc = sizeCtl;

(2) sizeCtl stores the expansion threshold after initialization;

(3) The expansion threshold is 0.75 times the bucket array size; the bucket array size is the capacity of the map, and the threshold is the element count that triggers the next expansion (see the snippet below).
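The threshold arithmetic from point (3), spelled out as a runnable check (no floating point involved):

public class ThresholdDemo {
    public static void main(String[] args) {
        int n = 16;
        // n - (n >>> 2) = n - n/4 = 0.75n, the hard-coded expansion threshold
        int threshold = n - (n >>> 2);
        System.out.println(threshold); // 12
    }
}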

Determine whether capacity expansion is required

After adding elements each time, add 1 to the number of elements, and judge whether the expansion threshold is reached. If it is reached, expand or assist in capacity expansion.

private final void addCount(long x, int check) {
    CounterCell[] as; long b, s;
    // The idea here is exactly the same as in LongAdder:
    // store the size across different segments for different threads (segmented-lock thinking again),
    // plus a baseCount that is updated first; if that fails, the thread updates its own segment instead
    // This keeps conflicts to a minimum

    // Try to add the quantity to baseCount first. If it fails, add it to the segmented counter cell
    if ((as = counterCells) != null ||
            !U.compareAndSwapLong(this, BASECOUNT, b = baseCount, s = b + x)) {
        CounterCell a; long v; int m;
        boolean uncontended = true;
        // If as is empty
        // Or the length is 0
        // Or the segment of the current thread is null
        // Or adding the number to the segment of the current thread fails
        if (as == null || (m = as.length - 1) < 0 ||
                (a = as[ThreadLocalRandom.getProbe() & m]) == null ||
                !(uncontended =
                        U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))) {
            // Fall back to fullAddCount (the count must be recorded no matter what, not merely spun on)
            // A failed update on a thread's own segment
            // indicates contention, so counterCells is expanded inside fullAddCount
            // to reduce the probability that multiple threads hash to the same segment
            fullAddCount(x, uncontended);
            return;
        }
        if (check <= 1)
            return;
        // Calculate the number of elements
        s = sumCount();
    }
    if (check >= 0) {
        Node<K,V>[] tab, nt; int n, sc;
        // If the number of elements reaches the capacity expansion threshold, capacity expansion is performed
        // Note that under normal circumstances, sizeCtl stores the expansion threshold, that is, 0.75 times the capacity
        while (s >= (long)(sc = sizeCtl) && (tab = table) != null &&
                (n = tab.length) < MAXIMUM_CAPACITY) {
            // rs is the resize stamp for the current capacity
            int rs = resizeStamp(n);
            if (sc < 0) {
                // sc < 0 indicates an expansion is in progress
                if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                        sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
                        transferIndex <= 0)
                    // Expansion has finished; exit the loop
                    // Normally only the nextTable == null condition should trigger here; the sc == rs + 1 and sc == rs + MAX_RESIZERS checks are a known JDK 8 bug (they should compare against (rs << RESIZE_STAMP_SHIFT) + 1; see JDK-8214427)
                    break;

                // If the expansion is not completed, the current thread is added to the migration element
                // And increase the number of expansion threads by 1
                if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))
                    transfer(tab, nt);
            }
            else if (U.compareAndSwapInt(this, SIZECTL, sc,
                    (rs << RESIZE_STAMP_SHIFT) + 2))// RESIZE_STAMP_SHIFT=16
                // This is where the thread that triggers the expansion enters
                // The high 16 bits of sizeCtl store rs, the resize stamp
                // The low 16 bits of sizeCtl store the number of expanding threads plus 1, i.e. (1 + nThreads)
                // So the official saying that sizeCtl is -(1 + nThreads) during expansion is inaccurate

                // Enter migration element
                transfer(tab, null);
            // Recalculate the number of elements
            s = sumCount();
        }
    }
}

(1) The storage method of the number of elements is similar to the LongAdder class, which is stored in different segments to reduce the conflict when different threads update the size at the same time; (refer to [JDK source code] concurrent atomic class LongAdder)

(2) When calculating the number of elements, add the values of these segments and baseCount to calculate the total number of elements;

(3) Under normal conditions, sizeCtl stores the expansion threshold, which is 0.75 times the capacity;

(4) During expansion, the high 16 bits of sizeCtl store the resize stamp (resizeStamp) and the low 16 bits store the number of expanding threads plus 1, i.e. (1 + nThreads) (see the sketch after this list);

(5) When another thread adds an element and finds an expansion in progress, it also joins in to help with the expansion;
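To make the packing in point (4) concrete, the sketch below reproduces resizeStamp as it appears in the JDK 8 source and decodes a sizeCtl value during expansion:

public class ResizeStampDemo {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    // As in the JDK 8 source: the leading-zero count identifies the table size,
    // and bit 15 is set so that (rs << RESIZE_STAMP_SHIFT) is always negative
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    public static void main(String[] args) {
        int n = 16;                                   // old table size
        int rs = resizeStamp(n);
        int sizeCtl = (rs << RESIZE_STAMP_SHIFT) + 2; // first expanding thread: 1 + nThreads = 2
        System.out.println(sizeCtl < 0);              // true: negative marks "expanding"
        System.out.println((sizeCtl >>> RESIZE_STAMP_SHIFT) == rs); // true: high 16 bits = the stamp
        System.out.println((sizeCtl & 0xffff) - 1);   // 1: low 16 bits minus 1 = nThreads
    }
}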

Assist in capacity expansion (migration of elements)

When a thread adding an element finds that an expansion is in progress and the bucket of the current element has already been migrated, it assists in migrating the elements of the other buckets.

final Node<K,V>[] helpTransfer(Node<K,V>[] tab, Node<K,V> f) {
    Node<K,V>[] nextTab; int sc;
    // If the bucket array is not empty, and the first element of the current bucket is of type ForwardingNode, and nextTab is not empty
    // It indicates that the current bucket has been migrated before helping to migrate the elements of other buckets
    // During capacity expansion, the first element of the old bucket will be set as ForwardingNode, and its nextTab will point to the new bucket array
    if (tab != null && (f instanceof ForwardingNode) &&
            (nextTab = ((ForwardingNode<K,V>)f).nextTable) != null) {
        int rs = resizeStamp(tab.length);
        // sizeCtl < 0 indicates an expansion is in progress
        while (nextTab == nextTable && table == tab &&
                (sc = sizeCtl) < 0) {
            if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                    sc == rs + MAX_RESIZERS || transferIndex <= 0)
                break;
            // Number of expansion threads plus 1
            if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1)) {
                // The current thread helps migrate elements
                transfer(tab, nextTab);
                break;
            }
        }
        return nextTab;
    }
    return table;
}

Once the bucket holding the target element has been migrated, the current thread can help migrate the elements of the other buckets.

Migrating elements

When the capacity is expanded, the capacity is doubled and some elements are migrated to other buckets.

private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
    int n = tab.length, stride;
    if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
        stride = MIN_TRANSFER_STRIDE; // subdivide range
    if (nextTab == null) {            // initiating
        // If nextTab is empty, the migration has not started,
        // Just create a new bucket array
        try {
            // The new bucket array is twice as large as the original bucket
            @SuppressWarnings("unchecked")
            Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
            nextTab = nt;
        } catch (Throwable ex) {      // try to cope with OOME
            sizeCtl = Integer.MAX_VALUE;
            return;
        }
        nextTable = nextTab;// nextTable represents a reference to the new array
        transferIndex = n;
    }
    // New bucket array size
    int nextn = nextTab.length;
    // Create a node of ForwardingNode type and store the new bucket array in it
    ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
    boolean advance = true;
    boolean finishing = false; // to ensure sweep before committing nextTab
    for (int i = 0, bound = 0;;) {
        Node<K,V> f; int fh;
        // This while loop claims the next range of buckets for the current thread by CAS-ing transferIndex
        // The value of i then decreases from the high end of the claimed range
        // where n is the old bucket array size; e.g. for n = 16, i walks from 15 down to 0 as elements are migrated
        while (advance) {
            int nextIndex, nextBound;
            if (--i >= bound || finishing)
                advance = false;
            else if ((nextIndex = transferIndex) <= 0) {
                i = -1;
                advance = false;
            }
            else if (U.compareAndSwapInt
                    (this, TRANSFERINDEX, nextIndex,
                            nextBound = (nextIndex > stride ?
                                    nextIndex - stride : 0))) {
                bound = nextBound;
                i = nextIndex - 1;
                advance = false;
            }
        }
        if (i < 0 || i >= n || i + n >= nextn) {
            // If a traversal is completed
            // That is, the elements in all buckets of the entire map have been migrated
            int sc;
            if (finishing) {
                // If all migration is completed, replace the old bucket array and set the global variable nextTable = null;
                // And set the next capacity expansion threshold to 0.75 times the capacity of the new bucket array
                nextTable = null;
                table = nextTab;
                sizeCtl = (n << 1) - (n >>> 1);
                return;
            }
            if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                // The current thread has finished its share of the expansion; decrement the expanding-thread count by 1
                if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                    // If not equal, the current thread is not the last expanding thread, so just return; if equal, it is the last one and rechecks the table below
                    return;
                // Set finishing to true
                // Only when finishing is true can the above if condition be reached
                finishing = advance = true;
                // i is reassigned to n
                // In this way, the bucket array will be traversed again to see if the migration is complete
                // That is, the second traversal will go to the following condition (fh = f.hash) == MOVED
                i = n; // recheck before commit
            }
        }
        else if ((f = tabAt(tab, i)) == null)
            // If there is no data in the bucket, directly place the ForwardingNode in it to mark the bucket as migrated
            advance = casTabAt(tab, i, null, fwd);
        else if ((fh = f.hash) == MOVED)
            // If the hash value of the first element in the bucket is MOVED
            // Indicates that it is a ForwardingNode node
            // That is, the bucket has been migrated
            advance = true; // already processed
        else {
            // Lock the bucket and migrate the elements
            synchronized (f) {
                // Judge again whether the first element of the current bucket has been modified
                // That is, other threads may migrate elements first
                if (tabAt(tab, i) == f) {
                    // Split one linked list into two
                    // Rule: AND each element's hash with the old bucket size n
                    // Elements with (hash & n) == 0 go into the low list, the rest into the high list
                    // The low list keeps the same index in the new table as in the old one
                    // The high list moves to its old index plus n
                    // This is exactly why the capacity doubles on every expansion
                    Node<K,V> ln, hn;
                    if (fh >= 0) {
                        // The hash value of the first element is greater than or equal to 0
                        // It indicates that the elements in the bucket are stored in the form of linked list
                        // This is basically similar to the HashMap migration algorithm
                        // The only difference is the extra step of finding lastRun
                        // lastRun heads the tail sub-list that needs no per-node processing
                        // For example, suppose hash & n for the elements in order is 0 0 4 0 0 0
                        // The elements behind the last three zeros will all land in the same bucket,
                        // so lastRun points to the fourth node, the head of that unchanged tail
                        int runBit = fh & n;
                        Node<K,V> lastRun = f;
                        for (Node<K,V> p = f.next; p != null; p = p.next) {
                            int b = p.hash & n;
                            if (b != runBit) {
                                runBit = b;
                                lastRun = p;
                            }
                        }
                        // See if the last few elements belong to the low-level linked list or the high-level linked list
                        if (runBit == 0) {
                            ln = lastRun;
                            hn = null;
                        }
                        else {
                            hn = lastRun;
                            ln = null;
                        }
                        // Traverse the list up to lastRun, putting elements with (hash & n) == 0 into the low list
                        // and the rest into the high list
                        for (Node<K,V> p = f; p != lastRun; p = p.next) {
                            int ph = p.hash; K pk = p.key; V pv = p.val;
                            if ((ph & n) == 0)
                                ln = new Node<K,V>(ph, pk, pv, ln);
                            else
                                hn = new Node<K,V>(ph, pk, pv, hn);
                        }
                        // The position of the low linked list remains unchanged
                        setTabAt(nextTab, i, ln);
                        // The position of the high-order linked list is the original position plus n
                        setTabAt(nextTab, i + n, hn);
                        // Mark current bucket migrated
                        setTabAt(tab, i, fwd);
                        // If advance is true, return to the above for --i operation
                        advance = true;
                    }
                    else if (f instanceof TreeBin) {
                        // If the first element is a tree node
                        // The same is true. It divides into two trees
                        // It is also placed in the low tree according to the hash & n of 0
                        // Not 0 placed in high-level tree
                        TreeBin<K,V> t = (TreeBin<K,V>)f;
                        TreeNode<K,V> lo = null, loTail = null;
                        TreeNode<K,V> hi = null, hiTail = null;
                        int lc = 0, hc = 0;
                        // Traverse the whole tree and divide it into two trees according to whether hash & n is 0
                        for (Node<K,V> e = t.first; e != null; e = e.next) {
                            int h = e.hash;
                            TreeNode<K,V> p = new TreeNode<K,V>
                                    (h, e.key, e.val, null, null);
                            if ((h & n) == 0) {
                                if ((p.prev = loTail) == null)
                                    lo = p;
                                else
                                    loTail.next = p;
                                loTail = p;
                                ++lc;
                            }
                            else {
                                if ((p.prev = hiTail) == null)
                                    hi = p;
                                else
                                    hiTail.next = p;
                                hiTail = p;
                                ++hc;
                            }
                        }
                        // If the number of elements in the differentiated tree is less than or equal to 6, it will degenerate into a linked list
                        ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                                (hc != 0) ? new TreeBin<K,V>(lo) : t;
                        hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                                (lc != 0) ? new TreeBin<K,V>(hi) : t;
                        // The position of the low tree remains unchanged
                        setTabAt(nextTab, i, ln);
                        // The position of the high tree is the original position plus n
                        setTabAt(nextTab, i + n, hn);
                        // Mark that the bucket has been migrated
                        setTabAt(tab, i, fwd);
                        // If advance is true, return to the above for --i operation
                        advance = true;
                    }
                }
            }
        }
    }
}

(1) The size of the new bucket array is twice that of the old bucket array;

(2) Migration proceeds from the last bucket of the old array toward the front, with each thread claiming a stride of buckets;

(3) A ForwardingNode type element is placed in the bucket after migration to mark the completion of the bucket migration;

(4) During migration, the elements in a bucket are split into two linked lists or trees according to whether hash & n equals 0 (illustrated after this list);

(5) The low linked list (tree) is stored in the original position;

(6) The high-order linked list (tree) is stored in the original position plus n;

(7) When migrating elements, the current bucket will be locked, which is also the idea of segmented locking;
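A worked example of the split rule in point (4), using old capacity n = 16 and two hashes that share the old index 5:

public class TransferSplitDemo {
    public static void main(String[] args) {
        int n = 16;          // old capacity; new capacity is 2n = 32
        int hash1 = 0b00101; // 5:  hash1 & n == 0  -> low list
        int hash2 = 0b10101; // 21: hash2 & n == 16 -> high list

        System.out.println(hash1 & (n - 1));     // 5: old index
        System.out.println(hash1 & (2 * n - 1)); // 5: new index unchanged (low list)

        System.out.println(hash2 & (n - 1));     // 5: same old index
        System.out.println(hash2 & (2 * n - 1)); // 21 = 5 + 16: new index is old index + n (high list)
    }
}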

Delete element

Deleting an element is the same as adding an element. First find the bucket where the element is located, and then lock the whole bucket with the idea of segmented lock.

public V remove(Object key) {
    // Call the replace node method
    return replaceNode(key, null, null);
}

final V replaceNode(Object key, V value, Object cv) {
    // Compute hash
    int hash = spread(key.hashCode());
    // spin
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0 ||
                (f = tabAt(tab, i = (n - 1) & hash)) == null)
            // If the bucket where the target key is located does not exist, jump out of the loop and return null
            break;
        else if ((fh = f.hash) == MOVED)
            // If capacity expansion is in progress, assist in capacity expansion
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
            // Marks whether the bucket was actually processed
            boolean validated = false;
            synchronized (f) {
                // Verify again whether the first element of the current bucket has been modified
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {
                        // fh >= 0 indicates a linked list node
                        validated = true;
                        // Traverse the linked list to find the target node
                        for (Node<K,V> e = f, pred = null;;) {
                            K ek;
                            if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                            (ek != null && key.equals(ek)))) {
                                // Target node found
                                V ev = e.val;
                                // Check whether the old value of the target node is equal to cv
                                if (cv == null || cv == ev ||
                                        (ev != null && cv.equals(ev))) {
                                    oldVal = ev;
                                    if (value != null)
                                        // If value is not empty, replace the old value
                                        e.val = value;
                                    else if (pred != null)
                                        // If the front node is not empty
                                        // Delete current node
                                        pred.next = e.next;
                                    else
                                        // If the front node is empty
                                        // Description is the first element in the bucket. Delete it
                                        setTabAt(tab, i, e.next);
                                }
                                break;
                            }
                            pred = e;
                            // Traverse to the end of the linked list, and no element is found. Jump out of the loop
                            if ((e = e.next) == null)
                                break;
                        }
                    }
                    else if (f instanceof TreeBin) {
                        // If it is a tree node
                        validated = true;
                        TreeBin<K,V> t = (TreeBin<K,V>)f;
                        TreeNode<K,V> r, p;
                        // Traversing the tree found the target node
                        if ((r = t.root) != null &&
                                (p = r.findTreeNode(hash, key, null)) != null) {
                            V pv = p.val;
                            // Check whether the old value of the target node is equal to cv
                            if (cv == null || cv == pv ||
                                    (pv != null && cv.equals(pv))) {
                                oldVal = pv;
                                if (value != null)
                                    // If value is not empty, replace the old value
                                    p.val = value;
                                else if (t.removeTreeNode(p))
                                    // If value is empty, the element is deleted
                                    // If the number of elements in the deleted tree is small, it will degenerate into a linked list
                                    // t.removeTreeNode(p) this method returns true, indicating that the number of elements in the tree is small after the node is deleted
                                    setTabAt(tab, i, untreeify(t.first));
                            }
                        }
                    }
                }
            }
            // If the bucket was processed, decide based on whether the element was found
            if (validated) {
                // If an element is found, its old value is returned
                if (oldVal != null) {
                    // If the value to be replaced is empty, the number of elements is reduced by 1
                    if (value == null)
                        addCount(-1L, -1);
                    return oldVal;
                }
                break;
            }
        }
    }
    // Element not found returned null
    return null;
}

(1) Calculate hash;

(2) If the bucket does not exist, it indicates that the target element is not found and returns;

(3) If an expansion is in progress, assist with it first and then retry the deletion;

(4) If it is stored in the form of a linked list, traverse the entire linked list to find elements, and then delete them after finding them;

(5) If it is stored in the form of a tree, traverse the tree to find elements, and delete them after finding them;

(6) If stored as a tree, and the tree becomes too small after deleting the element, it degenerates back into a linked list;

(7) If the element is indeed deleted, the number of the entire map element is reduced by 1 and the old value is returned;

(8) If the element is not deleted, null is returned;

Get element

Elements are retrieved in different ways depending on the first node of the bucket where the target key lives; the key point is the find() method overridden by each Node subclass.

public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    // Compute hash
    int h = spread(key.hashCode());
    // If the bucket in which the element is located exists and there are elements in it
    if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
        // If the first element is the element to be found, return it directly
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        else if (eh < 0)
            // If the hash is less than 0, it indicates that it is a tree or expanding capacity
            // find() is used to locate the element; each Node subclass provides its own implementation of find()
            return (p = e.find(h, key)) != null ? p.val : null;

        // Traverse the entire linked list to find elements
        while ((e = e.next) != null) {
            if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}

(1) hash to the bucket where the element is located;

(2) If the first element in the bucket is the element to be found, it is returned directly;

(3) If it is a tree or an element is being migrated, call the find() method of each Node subclass to find the element;

(4) If it is a linked list, traverse the whole linked list to find elements;

(5) No lock is taken when getting an element;

Get the number of elements

The storage of the number of elements also adopts the idea of segmentation. When obtaining the number of elements, all segments need to be added up.

public int size() {
    // Call sumCount() to calculate the number of elements
    long n = sumCount();
    return ((n < 0L) ? 0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
                    (int)n);
}

final long sumCount() {
    // Calculate the sum of all counter cell segments and baseCount
    CounterCell[] as = counterCells; CounterCell a;
    long sum = baseCount;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}

(1) The number of elements exists in different segments according to different threads;

(2) Calculate the sum of all counter cell segments and baseCount;

(3) No lock is taken when getting the element count;

Summary

(1) ConcurrentHashMap is a thread safe version of HashMap;

(2) ConcurrentHashMap uses the (array + linked list + red-black tree) structure to store elements;

(3) ConcurrentHashMap is much more efficient than the equally thread-safe HashTable;

(4) The locks used by ConcurrentHashMap include synchronized, CAS, spin locks, segmented locks, volatile, etc.;

(5) ConcurrentHashMap has no threshold or loadFactor fields; sizeCtl is used for control instead;

(6) sizeCtl = -1, indicating that initialization is in progress;

(7) sizeCtl = 0, the default value, which means that the default capacity is used in subsequent real initialization;

(8) sizeCtl > 0: before initialization it stores the passed-in capacity; after initialization or expansion it stores the next expansion threshold;

(9) sizeCtl = (resizeStamp << 16) + (1 + nThreads) indicates an expansion in progress: the high bits store the resize stamp and the low bits store the number of expanding threads plus 1;

(10) If capacity expansion is in progress during the update operation, the current thread assists in capacity expansion;

(11) The update operation will use synchronized to lock the first element of the current bucket, which is the idea of segmented lock;

(12) The entire capacity expansion process is carried out through the CAS control sizeCtl field, which is very critical;

(13) A ForwardingNode node will be placed in the bucket of migrated elements to identify that the bucket has been migrated;

(14) The storage of the number of elements also adopts the segmentation idea, which is similar to the implementation of LongAdder;

(15) Updating the number of elements will hash different threads to different segments to reduce resource contention;

(16) When updating the number of elements, if multiple threads update a segment at the same time, the counter cell will be expanded;

(17) The number of elements is obtained by adding all segments (including baseCount and CounterCell);

(18) Query operations take no locks, so ConcurrentHashMap is not strongly consistent (it is weakly consistent);

(19) Elements with null key or value cannot be stored in ConcurrentHashMap;
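A minimal sketch illustrating points (18) and (19) (the ConsistencyDemo class is illustrative):

import java.util.concurrent.ConcurrentHashMap;

public class ConsistencyDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        map.put("k", "v");

        // (19) null keys and null values are rejected
        try {
            map.put("k2", null);
        } catch (NullPointerException expected) {
            System.out.println("null value rejected");
        }

        // (18) reads take no lock: a get() racing with a put() may observe
        // either the old or the new value, i.e. the map is weakly consistent
        new Thread(() -> map.put("k", "v2")).start();
        System.out.println(map.get("k")); // "v" or "v2": both are legal outcomes
    }
}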

