Java Concurrent Programming Notes: Exploring the Principles of ConcurrentHashMap

In a multithreaded environment, concurrent put operations on a HashMap can lose data. To avoid this class of bug, use ConcurrentHashMap instead of HashMap whenever a map is shared between threads.
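As a minimal sketch of the safe alternative, the following program lets several threads put distinct keys into a shared map and then checks the size. The class name SafePutDemo and the method fill are hypothetical names chosen for this illustration. With ConcurrentHashMap the final size always equals threads * perThread; passing a plain HashMap instead may lose entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SafePutDemo {
    // Starts `threads` writers; each one puts `perThread` distinct keys into the shared map.
    static int fill(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        // No entry is lost with ConcurrentHashMap.
        System.out.println(fill(new ConcurrentHashMap<>(), 4, 10_000)); // 40000
    }
}
```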

Hashtable is a thread-safe class, but it achieves thread safety by using synchronized to lock the entire hash table. Every operation takes the same lock, so all threads compete for a single lock whether they read or write, which is very inefficient. ConcurrentHashMap can read data without locking, and its internal structure keeps the lock granularity as small as possible during writes, so multiple modifications can proceed concurrently. The key is lock separation (lock striping): it uses multiple locks to control modifications to different parts of the hash table. Internally, ConcurrentHashMap uses segments to represent these parts. Each segment is in effect a small hash table with its own lock, and modifications that fall on different segments can proceed concurrently.
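The idea of lock separation can be sketched in a few lines. The class below is not the JDK implementation: StripedMap, stripeFor, and the use of a plain HashMap per stripe are assumptions made for illustration, and unlike the real ConcurrentHashMap this sketch also locks on reads to stay simple.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

final class StripedMap<K, V> {
    private final Map<K, V>[] parts;      // one small table per stripe
    private final ReentrantLock[] locks;  // one lock per stripe

    @SuppressWarnings("unchecked")
    StripedMap(int stripes) {
        parts = (Map<K, V>[]) new Map[stripes];
        locks = new ReentrantLock[stripes];
        for (int i = 0; i < stripes; i++) {
            parts[i] = new HashMap<>();
            locks[i] = new ReentrantLock();
        }
    }

    // Chooses the stripe for a key; keys in different stripes never share a lock.
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % parts.length;
    }

    V put(K key, V value) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            return parts[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    V get(Object key) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            return parts[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }
}
```

Two threads that write keys belonging to different stripes take different locks and therefore do not block each other.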

 

Implementing ConcurrentHashMap in JDK1.7

ConcurrentHashMap uses a structure called Segment internally to improve its concurrency. A Segment is itself a hash-table-like structure that maintains an array of linked lists. Locating an element therefore takes two hash steps: the first selects the Segment, and the second selects the head of the chain in which the element lives. The side effect of this structure is that the lookup is longer than in a normal HashMap. The benefit is that a write only needs to lock the Segment that holds the element and does not affect the other Segments. In the ideal case, where the writes happen to be spread evenly across all Segments, ConcurrentHashMap supports as many concurrent writes as there are Segments, so this structure greatly improves concurrency. The figure below shows the internal structure of ConcurrentHashMap in detail:

As the structure shows, ConcurrentHashMap hashes twice: the first hash maps the key to a segment, and the second maps it to a bucket within that segment.

The main reason for the two-level hash is to make the locks separable, so that a modification does not lock the whole container, which improves concurrency. Nothing is perfect, though: the two-level lookup takes longer than HashMap's single hash, so prefer HashMap over ConcurrentHashMap when the map is not shared between threads.

Up to and including Java 7, ConcurrentHashMap relied mainly on locking: while a thread operated on a Segment, the Segment was locked and other threads could not perform non-query operations on it. From Java 8 on, it uses the lock-free CAS (compare-and-swap) algorithm. This optimistic operation checks the current value before committing and performs the update only if the value is the expected one, which suits concurrent operations well.
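The optimistic check-then-commit pattern of CAS can be shown with AtomicInteger.compareAndSet from the JDK. This is only a sketch of the general technique, not code taken from ConcurrentHashMap; the class name CasCounter is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Optimistic update: read, compute, then commit only if nobody changed the value meanwhile.
    int increment() {
        for (;;) {
            int expected = value.get();
            int updated = expected + 1;
            if (value.compareAndSet(expected, updated)) {
                return updated;
            }
            // CAS failed: another thread won the race, so retry with the fresh value.
        }
    }

    int get() {
        return value.get();
    }
}
```

No thread ever blocks here; a thread that loses the race simply repeats the loop.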

Let's start by analyzing how ConcurrentHashMap works in JDK1.7.

1. JDK1.7 ConcurrentHashMap

As shown above, it is composed of a Segment array and HashEntry nodes and, like HashMap, is still an array plus linked lists.

Let's take a look at the member variables in Segment from the following source code:


static final class Segment<K,V> extends ReentrantLock implements Serializable {
    transient volatile int count;    //Number of elements in a Segment
    transient int modCount;          //Number of operations (such as put or remove) that affect the size of the table
    transient int threshold;        //Threshold value, where the number of elements in a Segment exceeds this value, the Segment will be expanded
    final float loadFactor;         //Load factor, used to determine threshold
    transient volatile HashEntry<K,V>[] table;    //An array of linked lists in which each element represents the head of a linked list
}

Next, take a look at the composition in HashEntry, with the following source code:


    /**
     * ConcurrentHashMap list entry. Note that this is never exported as a user-visible Map.Entry.
     */
    static final class HashEntry<K,V> {
        final int hash;
        final K key;
        volatile V value;
        volatile HashEntry<K,V> next;

        HashEntry(int hash, K key, V value, HashEntry<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        /**
         * Sets the next field with volatile write semantics.
         */
        final void setNext(HashEntry<K,V> n) {
            UNSAFE.putOrderedObject(this, nextOffset, n);
        }

        // Unsafe mechanics
        static final sun.misc.Unsafe UNSAFE;
        // Offset of the next field within HashEntry
        static final long nextOffset;
        static {
            try {
                UNSAFE = sun.misc.Unsafe.getUnsafe();
                Class k = HashEntry.class;
                // Get the memory offset of the next field of HashEntry
                nextOffset = UNSAFE.objectFieldOffset
                    (k.getDeclaredField("next"));
            } catch (Exception e) {
                throw new Error(e);
            }
        }
    }

HashEntry is much like the entry class of HashMap. The difference is that the core fields, value and the next pointer, are declared volatile so that readers always see the latest writes.

In principle, ConcurrentHashMap uses segmented locking, where Segment extends ReentrantLock. Unlike Hashtable, which synchronizes both put and get, ConcurrentHashMap in theory supports as many concurrent writer threads as its concurrencyLevel (the length of the Segment array). When one thread holds the lock of a Segment, the other Segments are unaffected.

Next, let's move on to the member variables and constructors of ConcurrentHashMap in JDK1.7, with the following source code:


// Default initial capacity
static final int DEFAULT_INITIAL_CAPACITY = 16;
// Default Load Factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Default concurrency level
static final int DEFAULT_CONCURRENCY_LEVEL = 16;
// Maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
// Minimum capacity of a segment's table
static final int MIN_SEGMENT_TABLE_CAPACITY = 2;
// Maximum number of segments
static final int MAX_SEGMENTS = 1 << 16;
// Number of retries before lock
static final int RETRIES_BEFORE_LOCK = 2;

public ConcurrentHashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR, DEFAULT_CONCURRENCY_LEVEL);
}

public ConcurrentHashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR, DEFAULT_CONCURRENCY_LEVEL);
}

public ConcurrentHashMap(int initialCapacity, float loadFactor) {
        this(initialCapacity, loadFactor, DEFAULT_CONCURRENCY_LEVEL);
}

public ConcurrentHashMap(int initialCapacity,
                             float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        if (concurrencyLevel > MAX_SEGMENTS)
            concurrencyLevel = MAX_SEGMENTS;
        // Find the power-of-two sizes that best match the arguments
        int sshift = 0;
        // The length of the segments array is derived from concurrencyLevel and is always a power of 2.
        // The default concurrencyLevel is 16, so ssize is 16 and sshift is 4 by default.
        // sshift is the number of left shifts needed to grow ssize from 1 to its final value.
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift; 
            ssize <<= 1;
        }
        // Segment offset, where segmentShift = 28 by default
        this.segmentShift = 32 - sshift;
        // Mask for hash algorithm, segmentMask = 15 by default
        this.segmentMask = ssize - 1;

        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;

        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity)
            ++c;
        int cap = MIN_SEGMENT_TABLE_CAPACITY;
        while (cap < c)
            cap <<= 1;
        // create segments and segments[0]
        Segment<K,V> s0 =
            new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                             (HashEntry<K,V>[])new HashEntry[cap]);
        // Create a Segment array of length ssize
        Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
        UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
        this.segments = ss;
 }

Once concurrencyLevel is specified, it cannot be changed. If the number of elements later grows so that ConcurrentHashMap needs to expand, it does not add Segments; it only enlarges the linked-list array inside the affected Segment. The benefit is that expansion never rehashes the entire ConcurrentHashMap, only the elements within one Segment.

Initialization is quite simple. First, the Segment array is sized from concurrencyLevel: its length is the smallest power of 2 that is not less than concurrencyLevel, so the number of Segments is always a power of 2, which lets the hash step use shift and mask operations and speeds it up. Next, the capacity of each Segment's table is determined from initialCapacity; it is also a power of 2, again to speed up hashing.

Note the two fields segmentShift and segmentMask, which play a big role later. If the constructor settles on 2^n Segments, then segmentShift is 32 minus n and segmentMask is 2^n minus 1.
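The sizing arithmetic can be replayed in isolation. The sketch below copies the loop from the constructor shown above into a hypothetical helper class SegmentMath (the class and its method names are not part of the JDK) and applies the same shift-and-mask expression that put uses to pick a segment.

```java
import java.util.Arrays;

final class SegmentMath {
    // Returns {ssize, sshift, segmentShift, segmentMask} for a given concurrencyLevel.
    static int[] sizing(int concurrencyLevel) {
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        return new int[] { ssize, sshift, 32 - sshift, ssize - 1 };
    }

    // Segment index: the top sshift bits of the hash.
    static int segmentIndex(int hash, int segmentShift, int segmentMask) {
        return (hash >>> segmentShift) & segmentMask;
    }

    public static void main(String[] args) {
        int[] s = sizing(16);
        System.out.println(Arrays.toString(s));                    // [16, 4, 28, 15]
        System.out.println(segmentIndex(0xA0000000, s[2], s[3])); // 10
    }
}
```

With the default concurrencyLevel of 16, the four highest bits of the hash select the segment, which is why the hash 0xA0000000 lands in segment 10.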

Next, let's look at the core methods of ConcurrentHashMap, put and get, in JDK1.7.


public V put(K key, V value) {
        Segment<K,V> s;
        if (value == null)
            throw new NullPointerException();
     //(1)
        int hash = hash(key);
     //(2)
        int j = (hash >>> segmentShift) & segmentMask;
        if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
             (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
            s = ensureSegment(j);
     //(3)
        return s.put(key, hash, value, false);
}

Code (1) computes the hash value of the key.

Code (2) uses the hash value together with segmentShift and segmentMask to locate the Segment.

Code (3) stores the key-value pair in the located Segment s.

As you can see, the Segment is located by key first, and then the put is delegated to that Segment. The source code for put in Segment is as follows:


 final V put(K key, int hash, V value, boolean onlyIfAbsent) {
       //(1)
            HashEntry<K,V> node = tryLock() ? null :
                scanAndLockForPut(key, hash, value);
            V oldValue;
            try {
          //(2)
                HashEntry<K,V>[] tab = table;
          //(3)
                int index = (tab.length - 1) & hash;
          //(4)
                HashEntry<K,V> first = entryAt(tab, index);
          //(5)
                for (HashEntry<K,V> e = first;;) {
                    if (e != null) {
                        K k;
                        if ((k = e.key) == key ||
                            (e.hash == hash && key.equals(k))) {
                            oldValue = e.value;
                            if (!onlyIfAbsent) {
                                e.value = value;
                                ++modCount;
                            }
                            break;
                        }
                        e = e.next;
                    }
            //(6)
                    else {
              
                        if (node != null)
                 //(7)
                            node.setNext(first);
                        else  //(8)
                            node = new HashEntry<K,V>(hash, key, value, first);
                        int c = count + 1;
              //(9)
                        if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                            rehash(node);
                        else   //(10)
                            setEntryAt(tab, index, node);
                        ++modCount;
                        count = c;
                        oldValue = null;
                        break;
                    }
                }
            } finally {
         //(11)
                unlock();
            }
            return oldValue;
   }

Although value in HashEntry is declared volatile, volatile guarantees only visibility, not the atomicity of compound operations, so put still needs to take the lock.
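The same point can be illustrated with a counter. In the sketch below (LockedCounter is a hypothetical class, not JDK code) the field is volatile, yet count++ is a read-modify-write sequence, so the increment is only correct because it runs while the ReentrantLock is held, much as Segment.put runs under the Segment lock.

```java
import java.util.concurrent.locks.ReentrantLock;

final class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile int count; // volatile gives visibility, not atomicity of count++

    void increment() {
        lock.lock();
        try {
            count++; // read-modify-write is safe only because the lock is held
        } finally {
            lock.unlock();
        }
    }

    int get() {
        return count;
    }

    // Runs `threads` workers that each increment the counter `perThread` times.
    static int run(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Removing the lock calls would make lost updates possible even though the field stays volatile.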

Code (1) first tries to acquire the lock with tryLock(). If that fails, another thread is competing for the Segment, so scanAndLockForPut() spins to acquire the lock.

Code (2) Each Segment holds its own HashEntry[] array.

Code (3) computes the index into the HashEntry array. The array length in each Segment is a power of 2, so the mask operation keeps the low-order bits of the hash.

Code (4) locates the HashEntry at that index (the head node of the chain).

Code (5) traverses the chain.

Code (6) is reached when the traversal hits null without finding the key, either because the chain is empty or because its end was reached.

Code (7) If scanAndLockForPut() already created a node, it is linked in front of the current head.

Code (8) Otherwise a new node is created from key and value, with the current head as its next node.

Code (9) checks whether the new element count exceeds the threshold while the array length is still below MAXIMUM_CAPACITY; if so, rehash() expands the table.

Code (10) When no expansion is needed, the node is stored at the computed position of the HashEntry[] array and becomes the new head.

Code (11) Finally, the lock is released.
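The index computation in code (3) works because the table length is a power of 2: masking with length - 1 keeps the low-order bits and gives the same result as a non-negative modulo. A small sketch, with BucketIndex and indexFor as hypothetical names:

```java
final class BucketIndex {
    // Same expression as in Segment.put: (tab.length - 1) & hash.
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(0x1234ABCD, 16)); // 13, the low four bits 0xD
        // For power-of-two lengths the mask equals the non-negative remainder.
        System.out.println(indexFor(-1, 16) == Math.floorMod(-1, 16)); // true
    }
}
```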

Overall, the put process is as follows:

  1. Use the hash of the key to locate the HashEntry chain in the table of the current Segment.
  2. Traverse the chain; if the key of an entry equals the incoming key, overwrite the old value.
  3. If no matching entry is found, create a new HashEntry and add it to the Segment, first checking whether the table needs to be expanded.
  4. Finally, release the lock on the current Segment that was acquired in code (1).

Next, let's look at expansion. The source code of rehash is as follows:


        /**
         * Doubles the capacity of the table and repacks the entries.
         */
        @SuppressWarnings("unchecked")
        private void rehash(HashEntry<K,V> node) {

            HashEntry<K,V>[] oldTable = table;
            int oldCapacity = oldTable.length;
            // Double (move one bit to the left)
            int newCapacity = oldCapacity << 1;
            // Calculate a new threshold
            threshold = (int)(newCapacity * loadFactor);
            // Create a new array
            HashEntry<K,V>[] newTable =
                (HashEntry<K,V>[]) new HashEntry[newCapacity];
            // mask
            int sizeMask = newCapacity - 1;
            // Traversing through old array data
            for (int i = 0; i < oldCapacity ; i++) {
                HashEntry<K,V> e = oldTable[i]; // Header node corresponding to a chain table
                if (e != null) {
                    HashEntry<K,V> next = e.next;
                    // Calculate the subscript of the chain table corresponding to e in the new array
                    int idx = e.hash & sizeMask; 
                    if (next == null)   //  Place directly in (new) array when there is only one node
                        newTable[idx] = e;
                    else { // When the list has more than one node:
                        HashEntry<K,V> lastRun = e; // lastRun will mark the start of the trailing run of nodes that share one new index
                        int lastIdx = idx;
                        for (HashEntry<K,V> last = next;
                             last != null;
                             last = last.next) {
                            // Nodes of one old chain do not necessarily land in the same new chain, so the index is recomputed for each node
                            int k = last.hash & sizeMask;
                            if (k != lastIdx) {
                                lastIdx = k;
                                lastRun = last;
                            }
                        }
                        // The trailing run starting at lastRun is reused as-is in the new array.
                        newTable[lastIdx] = lastRun;
                        // Clone the nodes from the old head up to (but not including) lastRun into their new chains.
                        for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
                            V v = p.value;
                            int h = p.hash;
                            int k = h & sizeMask;
                            HashEntry<K,V> n = newTable[k];
                            newTable[k] = new HashEntry<K,V>(h, p.key, v, n); // Split Chain List
                        }
                    }
                }
            }
            // Only after the old data has been moved is the new node inserted, again at the head of its chain
            int nodeIndex = node.hash & sizeMask; // add the new node
            node.setNext(newTable[nodeIndex]);
            newTable[nodeIndex] = node;
            table = newTable;
      }
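The rehash above relies on a property of power-of-two tables: after doubling, a node at old index i ends up either at index i or at index i + oldCapacity, depending on one extra hash bit. That is why a trailing run of nodes with the same new index can be reused without copying. The helper below is a hypothetical sketch that checks the property; RehashSplit and its method names are not JDK code.

```java
final class RehashSplit {
    // Index of a hash in the doubled table, as computed by e.hash & sizeMask in rehash.
    static int newIndex(int hash, int oldCapacity) {
        return hash & ((oldCapacity << 1) - 1);
    }

    // True if the node keeps its index or moves up by exactly oldCapacity.
    static boolean staysOrMovesByOldCapacity(int hash, int oldCapacity) {
        int oldIdx = hash & (oldCapacity - 1);
        int newIdx = newIndex(hash, oldCapacity);
        return newIdx == oldIdx || newIdx == oldIdx + oldCapacity;
    }

    public static void main(String[] args) {
        // Both hashes share old index 1 in a table of length 4.
        System.out.println(newIndex(5, 4)); // 5, moved by oldCapacity
        System.out.println(newIndex(9, 4)); // 1, stayed in place
    }
}
```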

Next, take a look at scanAndLockForPut(), which spins to acquire the lock. The source code is as follows:


private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
            HashEntry<K,V> first = entryForHash(this, hash);
            HashEntry<K,V> e = first;
            HashEntry<K,V> node = null;
            int retries = -1; // Negative number when locating nodes
       //(1)
            while (!tryLock()) {
                HashEntry<K,V> f; // First recheck below
                if (retries < 0) {
                    if (e == null) {
                        if (node == null) // Create nodes speculatively
                            node = new HashEntry<K,V>(hash, key, value, null);
                        retries = 0;
                    }
                    else if (key.equals(e.key))
                        retries = 0;
                    else
                        e = e.next;
                }
          //(2)
                else if (++retries > MAX_SCAN_RETRIES) {
                    lock();
                    break;
                }
                else if ((retries & 1) == 0 &&
                         (f = entryForHash(this, hash)) != first) {
                    e = first = f; // Re-traverse if the head entry changed
                    retries = -1;
                }
            }
            return node;
        }

This method scans for a node containing the given key while trying to acquire the lock, and creates and returns a new node if none is found.

On return, the lock is guaranteed to be held.

Unlike in most methods, calls to equals are not screened here: since traversal speed does not matter, the scan might as well help warm up the associated code and memory accesses.

Code (1) spins on tryLock() to acquire the lock.

Code (2) Once the number of retries exceeds MAX_SCAN_RETRIES, the method falls back to the blocking lock() so that acquisition is guaranteed to succeed.
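Stripped of the chain scanning, the locking strategy is "spin a bounded number of times, then block". The following sketch shows only that part with a ReentrantLock; SpinThenBlock, acquire, and the maxRetries parameter are hypothetical names standing in for the logic around MAX_SCAN_RETRIES.

```java
import java.util.concurrent.locks.ReentrantLock;

final class SpinThenBlock {
    // Acquires the lock and returns the number of failed tryLock() attempts.
    static int acquire(ReentrantLock lock, int maxRetries) {
        int retries = 0;
        while (!lock.tryLock()) {
            if (++retries > maxRetries) {
                lock.lock(); // give up spinning and block until the lock is free
                break;
            }
        }
        return retries;
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        int failed = acquire(lock, 64);
        try {
            System.out.println(failed + " " + lock.isHeldByCurrentThread()); // 0 true
        } finally {
            lock.unlock();
        }
    }
}
```

Either exit from the loop leaves the lock held by the caller, which matches the guarantee stated for scanAndLockForPut().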

 

Next, let's look at the get method in JDK1.7 from the following source code:


  public V get(Object key) {
        Segment<K,V> s; // manually integrate access methods to reduce overhead
        HashEntry<K,V>[] tab;
        int h = hash(key);
        // First compute the index into the segments array: (h >>> segmentShift) & segmentMask
        long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
        if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
            (tab = s.table) != null) { // Find segment by subscript
            // Then (tab.length - 1) & h gives the index into the HashEntry array
            // Traverse the chain
            for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                     (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
                 e != null; e = e.next) {
                K k;

                if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                    return e.value;
            }
        }
        return null;
    }

As you can see, the get logic is simpler than the previous methods:

Hash the key once to locate the Segment, then hash again to locate the bucket and walk its chain to the element.

Because value in HashEntry is declared volatile, memory visibility is guaranteed and each read returns the latest value.

The get method of ConcurrentHashMap is very efficient because the whole process requires no locking.
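From the caller's side, this means a reader can poll the map while a writer is working and will see the new value once the put has completed. A minimal sketch using only the public API; LockFreeReadDemo and readAfterWrite are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

final class LockFreeReadDemo {
    // Starts a writer thread and polls with get() until its put becomes visible.
    static String readAfterWrite() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        Thread writer = new Thread(() -> map.put("k", "v"));
        writer.start();
        String seen;
        while ((seen = map.get("k")) == null) {
            Thread.yield(); // get() does not block; just try again
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(readAfterWrite()); // v
    }
}
```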

 

Next, look at the remove method. Its source code is as follows:


public V remove(Object key) {
        // Calculate hash value
        int hash = hash(key);
        // Find corresponding segment based on hash value
        Segment<K,V> s = segmentForHash(hash);
        // Call Segment.remove function
        return s == null ? null : s.remove(key, hash, null);
}
public boolean remove(Object key, Object value) {
        int hash = hash(key);
        Segment<K,V> s;
        return value != null && (s = segmentForHash(hash)) != null &&
            s.remove(key, hash, value) != null;
}

The source code for the Segment.remove function is as follows:


 /**
         * Remove; match on key only if value null, else match both.
         */
        final V remove(Object key, int hash, Object value) {
            if (!tryLock())
                scanAndLock(key, hash);
            V oldValue = null;
            try {
                HashEntry<K,V>[] tab = table;
                // Calculate HashEntry array subscripts
                int index = (tab.length - 1) & hash;
                // Find Head Node
                HashEntry<K,V> e = entryAt(tab, index);
                HashEntry<K,V> pred = null;
                while (e != null) {
                    K k;
                    HashEntry<K,V> next = e.next;
                    if ((k = e.key) == key ||
                        (e.hash == hash && key.equals(k))) { // Find corresponding node
                        V v = e.value;
                        if (value == null || value == v || value.equals(v)) {
                            if (pred == null)
                                // When pred is empty, it means that the header node of the list is to be removed and the list is reset
                                setEntryAt(tab, index, next);
                            else
                                pred.setNext(next);
                            ++modCount;
                            --count;
                            // Record old value
                            oldValue = v;
                        }
                        break;
                    }
                    pred = e;
                    e = next;
                }
            } finally {
                unlock();
            }
            return oldValue;
        }
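The two public remove overloads map onto this one Segment method: remove(key) passes null as the value and matches on the key alone, while remove(key, value) removes the entry only if it is currently mapped to that value. A short usage sketch with the public API; RemoveDemo and demo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

final class RemoveDemo {
    // Collects the results of three removals on a map that contains a=1.
    static List<Object> demo() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        List<Object> out = new ArrayList<>();
        out.add(map.remove("a", 2)); // value does not match: nothing removed
        out.add(map.remove("a", 1)); // key and value match: entry removed
        out.add(map.remove("a"));    // key is gone: returns null
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [false, true, null]
    }
}
```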

2. Principle analysis of ConcurrentHashMap in JDK1.8

JDK1.7 solved the concurrency problem and supports as many concurrent writers as there are Segments, but the 1.7 implementation still shares a problem with the 1.7 HashMap. What is it?

The obvious one is that a query has to traverse a linked list, which becomes inefficient when the chain is long.

JDK1.8 therefore adjusted the data structure. Java 8 abandoned the Segment concept and adopted a new implementation based on the CAS algorithm. The underlying layout is still array + linked list, now extended with red-black trees, and many auxiliary inner classes, such as TreeBin and Traverser, were added to support concurrency.

How is the state of an object kept consistently visible to multiple threads? ConcurrentHashMap relies on the happens-before rules. The happens-before rules (excerpted from Java concurrent programming literature) are:

  • Program order rule: Each action A in a thread happens-before every action B in that thread that comes after A in the program order.
  • Monitor lock rule: An unlock on a monitor lock happens-before every subsequent lock on that same monitor.
  • Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.
  • Thread start rule: A call to Thread.start on a thread happens-before every action in the started thread.
  • Thread termination rule: Any action in a thread happens-before any other thread detects that the thread has terminated, either by successfully returning from Thread.join or by Thread.isAlive returning false.
  • Interruption rule: A thread calling interrupt on another thread happens-before the interrupted thread detects the interrupt.
  • Finalizer rule: The end of a constructor for an object happens-before the start of the finalizer for that object.
  • Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.

Suppose the code contains two statements, with statement 1 written before statement 2. As long as there is no dependency between them, swapping their order does not change the result within that thread, so when the CPU actually executes them, statement 2 may run before statement 1. The happens-before rules define the cases in which such reordering must not be visible to other threads.
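A small sketch shows how the rules combine. The class VolatilePublish and its members are hypothetical names for illustration. The ordinary write to data comes before the volatile write to ready in program order, and the volatile read of ready that observes true comes before the read of data, so by the volatile variable rule and transitivity the reader is guaranteed to see 42.

```java
final class VolatilePublish {
    private int data;               // plain field
    private volatile boolean ready; // volatile flag used to publish data

    void writer() {
        data = 42;     // (1) ordinary write
        ready = true;  // (2) volatile write; (1) cannot be reordered after it
    }

    int reader() {
        while (!ready) {   // (3) volatile read, spins until the writer has published
            Thread.yield();
        }
        return data;       // (4) sees the value written in (1)
    }

    // Runs the writer in a second thread and the reader in the calling thread.
    static int run() {
        VolatilePublish p = new VolatilePublish();
        Thread w = new Thread(p::writer);
        w.start();
        return p.reader();
    }
}
```

Without volatile on ready, neither the visibility of the flag nor the ordering of the two writes would be guaranteed.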

First, look at the underlying structure (the image below is from Baidu; I was too lazy to draw one):

You can see that the JDK1.8 ConcurrentHashMap is very similar to the JDK1.8 HashMap. It discards the original Segment lock and uses CAS + synchronized to ensure thread safety.


// Key-value entry. This class is never exported as a user-mutable Map.Entry (i.e., one supporting setValue; see MapEntry below),
// but it can be used for read-only traversals in bulk tasks. Subclasses of Node with a negative hash field are special and contain null keys and values (but are never exported). Otherwise, keys and vals are never null.
 static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;

        Node(int hash, K key, V val, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.val = val;
            this.next = next;
        }

        public final K getKey()       { return key; }
        public final V getValue()     { return val; }
        public final int hashCode()   { return key.hashCode() ^ val.hashCode(); }
        public final String toString(){ return key + "=" + val; }
        public final V setValue(V value) {
            throw new UnsupportedOperationException();
        }

        public final boolean equals(Object o) {
            Object k, v, u; Map.Entry<?,?> e;
            return ((o instanceof Map.Entry) &&
                    (k = (e = (Map.Entry<?,?>)o).getKey()) != null &&
                    (v = e.getValue()) != null &&
                    (k == key || k.equals(key)) &&
                    (v == (u = val) || v.equals(u)));
        }

        /**
         * Virtualized support for map.get(); overridden in subclasses.
         */
        Node<K,V> find(int h, Object k) {
            Node<K,V> e = this;
            if (k != null) {
                do {
                    K ek;
                    if (e.hash == h &&
                        ((ek = e.key) == k || (ek != null && k.equals(ek))))
                        return e;
                } while ((e = e.next) != null);
            }
            return null;
        }
    }

HashEntry, which stored the data in 1.7, is replaced by Node, which serves the same purpose.

val and next are declared volatile to ensure visibility.

Next, look at the source code for the put method, which is as follows:


public V put(K key, V value) {
        return putVal(key, value, false);
   }

    /** Implementation for put and putIfAbsent */
    final V putVal(K key, V value, boolean onlyIfAbsent) {
     //(1)
        if (key == null || value == null) throw new NullPointerException();
     //(2)
        int hash = spread(key.hashCode());
        int binCount = 0;
     //(3)
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
       //(4)
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
       //(5)
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
       //(6)
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
          //(7)
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
              //(8)
                        if (fh >= 0) {
                            binCount = 1;
                 //(9)
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                   //(10)
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                   //(11) If the last node is traversed, it proves that the new node needs to be inserted, then it is inserted at the end of the list.
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
              //(12)
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
            //(13)
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
     //(14)
        addCount(1L, binCount);
        return null;
    }

Code (1) throw an exception if it is empty

Code (2) Calculate hash value

Code (3)

Code (4) determines whether initialization is required.

Code (5) f is the Node at the bin located for the current key. If it is null, the slot is empty, so a CAS attempts to write the new node without locking; if the CAS fails, the enclosing loop spins and retries until the write succeeds.

Code (6) If the hash of the node at the current location is MOVED (-1), the table is being resized, and the current thread helps with the transfer.

Code (7) If none of the above holds, the data is written under a synchronized lock on node f. The node here is the head node of the bin (a linked list or a tree) that the hash maps to.

Code (8) fh >= 0 indicates that this bin is a linked list and not a tree.

Code (9) Traverses through all the nodes in the list here

Code (10) Modify the value of the corresponding node if the hash and key values are the same

Code (11) If you traverse to the last node, it proves that the new node needs to be inserted and then inserts it at the end of the list

Code (12) If this node is a tree node, insert the value as a tree

Code (13) If the number of nodes in the list has reached TREEIFY_THRESHOLD (8), treeifyBin is called to convert the linked list to a red-black tree. (treeifyBin resizes the table instead while the table still has fewer than 64 slots.)

Code (14) Number of elements + 1 for the current ConcurrentHashMap
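The CAS-then-lock idea in steps (5) to (11) can be sketched in a much smaller form. The class below is only an illustration under simplifying assumptions, not the JDK code: BinInsertSketch and its Node are hypothetical names, the table has a fixed size of 16, and resizing, tree bins and element counting are left out.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class BinInsertSketch {
    // Hypothetical simplified node; the real Node also supports resizing and trees.
    static final class Node {
        final int hash;
        final String key;
        volatile String val;
        volatile Node next;
        Node(int hash, String key, String val) {
            this.hash = hash;
            this.key = key;
            this.val = val;
        }
    }

    // Fixed-size table (assumption: no resize in this sketch).
    private final AtomicReferenceArray<Node> table = new AtomicReferenceArray<>(16);

    static int spread(int h) {
        return (h ^ (h >>> 16)) & 0x7fffffff;
    }

    /** Returns the previous value, or null if the key was absent. */
    public String put(String key, String value) {
        int hash = spread(key.hashCode());
        int i = (table.length() - 1) & hash;
        for (;;) {
            Node f = table.get(i);
            if (f == null) {
                // Empty bin: publish the node with a CAS, no lock needed.
                if (table.compareAndSet(i, null, new Node(hash, key, value)))
                    return null;
                // Lost the race: loop again and re-read the bin.
            } else {
                synchronized (f) {
                    if (table.get(i) != f)
                        continue; // head changed while waiting for the lock, retry
                    for (Node e = f;; e = e.next) {
                        if (e.hash == hash && e.key.equals(key)) {
                            String old = e.val;
                            e.val = value; // same key: overwrite
                            return old;
                        }
                        if (e.next == null) {
                            e.next = new Node(hash, key, value); // append at the tail
                            return null;
                        }
                    }
                }
            }
        }
    }

    /** Lock-free read: walks the bin without taking the head lock. */
    public String get(String key) {
        int hash = spread(key.hashCode());
        for (Node e = table.get((table.length() - 1) & hash); e != null; e = e.next)
            if (e.hash == hash && e.key.equals(key))
                return e.val;
        return null;
    }
}
```

Only the head of a non-empty bin is locked, so writers that land in different bins never block each other.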

 

Next, let's look at the get method source for ConcurrentHashMap in JDK1.8, which is as follows:


// GET method (JAVA8)
public V get(Object key) {  
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;  
    //Calculate hash value  
    int h = spread(key.hashCode());  
    //Determine Node Location Based on hash Value  
    if ((tab = table) != null && (n = tab.length) > 0 &&  
        (e = tabAt(tab, (n - 1) & h)) != null) {  
        //If the head node's hash and key match the requested key, return its value directly
        if ((eh = e.hash) == h) {  
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))  
                return e.val;  
        }  
        //If eh < 0, the head is a special node (a tree bin, or a forwarding node during a resize); delegate to its find method
        else if (eh < 0)  
            return (p = e.find(h, key)) != null ? p.val : null;  
         //Otherwise, traverse the list to find the corresponding value and return  
        while ((e = e.next) != null) {  
            if (e.hash == h &&  
                ((ek = e.key) == key || (ek != null && key.equals(ek))))  
                return e.val;  
        }  
    }  
    return null;  
}  
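The first two steps of get, spreading the hash and masking it with (n - 1), can be tried in isolation. This is a small sketch: spread and HASH_BITS mirror the JDK 8 definitions, while SpreadDemo and indexFor are names made up for the example.

```java
public class SpreadDemo {
    // Usable bits of a normal node hash; negative hashes are reserved for special nodes.
    static final int HASH_BITS = 0x7fffffff;

    // XOR the high 16 bits into the low 16 bits, then clear the sign bit.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    // Hypothetical helper: bin index for a key in a table whose length is a power of two.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key.hashCode());
    }

    public static void main(String[] args) {
        System.out.println(indexFor("Aa", 16));
        System.out.println(Integer.toHexString(spread(-1)));
    }
}
```

Because spread always clears the sign bit, a negative hash on a bin head can safely be used as a marker, which is what the eh < 0 branch in get relies on.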

Next, look at the remove method source for ConcurrentHashMap in JDK1.8, which is as follows:


// REMOVE OR REPLACE method (JAVA8)
 final V replaceNode(Object key, V value, Object cv) {
    int hash = spread(key.hashCode());
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        // The table is null or empty, or the bin for this hash is empty: nothing to remove or replace
        if (tab == null || (n = tab.length) == 0 ||
            (f = tabAt(tab, i = (n - 1) & hash)) == null)
            break;
        // The bin head is a ForwardingNode: a resize is in progress, so help with the transfer
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
            boolean validated = false;
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {
                        validated = true;
                        // Circular Search
                        for (Node<K,V> e = f, pred = null;;) {
                            K ek;
                            // Hash and key both match: this is the target node
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                V ev = e.val;
                                 // cv is null (unconditional) or equals the current value
                                if (cv == null || cv == ev ||
                                    (ev != null && cv.equals(ev))) {
                                    oldVal = ev;
                                    if (value != null)
                                        e.val = value;
                                    else if (pred != null)
                                        pred.next = e.next;
                                    else
                                        setTabAt(tab, i, e.next);
                                }
                                break;
                            }
                            pred = e;
                            if ((e = e.next) == null)
                                break;
                        }
                    }
                    // The bin is a red-black tree: find the node in the tree and replace or remove it
                    else if (f instanceof TreeBin) {
                        validated = true;
                        TreeBin<K,V> t = (TreeBin<K,V>)f;
                        TreeNode<K,V> r, p;
                        if ((r = t.root) != null &&
                            (p = r.findTreeNode(hash, key, null)) != null) {
                            V pv = p.val;
                            if (cv == null || cv == pv ||
                                (pv != null && cv.equals(pv))) {
                                oldVal = pv;
                                if (value != null)
                                    p.val = value;
                                else if (t.removeTreeNode(p))
                                    setTabAt(tab, i, untreeify(t.first));
                            }
                        }
                    }
                }
            }
            if (validated) {
                if (oldVal != null) {
                    if (value == null)
                        addCount(-1L, -1);
                    return oldVal;
                }
                break;
            }
        }
    }
    return null;
}
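In the JDK 8 source, the public remove and replace methods all delegate to replaceNode, with value == null meaning "delete" and cv == null meaning "no expected value". The short example below exercises those four public methods; ReplaceNodeDemo and demo are names invented for the illustration.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

public class ReplaceNodeDemo {
    /** Runs the four calls in order and returns their results. */
    static Object[] demo() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        boolean removedWrong = map.remove("a", 2);  // expected value 2 does not match: no change
        boolean replaced = map.replace("a", 1, 10); // expected value 1 matches: value becomes 10
        Integer previous = map.replace("a", 20);    // unconditional replace: returns the old value
        Integer removed = map.remove("a");          // unconditional remove: returns the old value
        return new Object[] { removedWrong, replaced, previous, removed, map.isEmpty() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```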

We can see that the implementation of ConcurrentHashMap changed substantially between JDK1.7 and JDK1.8. I prefer the lock-free CAS mechanism. Reading the code comments above is clearly not enough to fully understand the Java 8 implementation of ConcurrentHashMap; I have only provided ideas for reading the source. Topics such as CAS, volatile and final are left for later notes. If you are really interested, write a small program, set breakpoints, and step through the implementation of this code.

JDK1.8 made major changes to the JDK1.7 data structure: red-black trees keep lookups within a bin at O(log n), and ReentrantLock was replaced by synchronized, which suggests that synchronized is well optimized in newer versions of the JDK.

 

With the material above understood, you should be able to handle the related interview questions. Below are some interview questions collected online:

(1) Do you know how HashMap works? Do you know how the get() method of HashMap works?

HashMap is based on the principle of hashing. We use put(key, value) to store objects in a HashMap and get(key) to retrieve them. When we pass a key and a value to the put() method, it first calls the hashCode() method on the key, and the returned hashCode is used to find the bucket location in which to store the Entry object.

(2) Do you know how ConcurrentHashMap works? Do you know how ConcurrentHashMap differs between Java 8 and Java 7?

In Java 7, ConcurrentHashMap uses a structure called Segment internally to improve its concurrency. A Segment is a small hash-table-like structure that maintains an array of linked lists. Locating an element therefore requires two hash operations: the first locates the Segment, and the second locates the head of the list in which the element is stored. The side effect of this structure is that hashing takes longer than in a normal HashMap. The benefit is that a write only operates on the Segment that holds the element and does not affect the other Segments. Ideally, when writes are distributed evenly across all Segments, ConcurrentHashMap can support as many concurrent writes as there are Segments, so this architecture greatly improves concurrency.

In Java 7 and earlier, ConcurrentHashMap relied mainly on locking: a write locks the Segment it operates on, and other writes to the same Segment must wait. Java 8 uses a lock-free CAS algorithm, together with synchronized on the head node of a non-empty bin. CAS is an optimistic operation: it checks before completing and writes only if the current value matches the expected one, which is a good optimization for concurrent operations.

(3) What happens when two objects have the same hashcode?

Because the hashcodes are the same, the two objects map to the same bucket and a collision occurs. Because HashMap stores the entries of a bucket in a linked list, the new Entry (a Map.Entry object holding the key-value pair) is stored in that list. When a key-value pair is added to a Map, the location where the Entry is stored is determined by the hashCode() return value of its key. When hashCode() returns the same value for the keys of two Entry objects, the keys are compared with equals() to decide whether the new value overwrites the old one (equals() returns true) or a new Entry is chained in the same bucket (equals() returns false). If you can also explain the red-black trees introduced in JDK1.8, the interviewer may be impressed.

(4) If the hashcodes of two keys are the same, how do you get the value object?

When we call the get() method, HashMap uses the hashcode of the key object to find the bucket location. If several entries are stored in the same bucket, the linked list is traversed and key.equals() is called on each entry until the correct node is found, and its value object is returned. (When a program retrieves a value through a key, the system calculates the hashCode() return value of the key, finds the index in the table array from it, takes the Entry at that index, and finally returns the value that corresponds to the key.)
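Questions (3) and (4) can be checked with two strings that happen to share a hashcode: "Aa" and "BB" both hash to 2112. The class name CollisionDemo and the helper build are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> m = new HashMap<>();
        m.put("Aa", 1);
        m.put("BB", 2); // same hashCode as "Aa", equals() is false: chained in the same bucket
        m.put("Aa", 3); // same hashCode, equals() is true: overwrites the value 1
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = build();
        System.out.println("Aa".hashCode() == "BB".hashCode());
        System.out.println(m.size() + " " + m.get("Aa") + " " + m.get("BB"));
    }
}
```

Both keys stay retrievable because get() compares keys with equals() after it has found the bucket.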

(5) What if the HashMap size exceeds the capacity defined by the load factor?

When the number of entries exceeds capacity × load factor (75% of the capacity by default), a new bucket array twice the size of the original is created, similar to how other collection classes such as ArrayList grow, and the existing entries are placed into the new bucket array. This process is called rehashing because the hash is used again to find each entry's new bucket location.
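The arithmetic behind the resize threshold can be simulated without touching HashMap internals. The sketch below assumes the default initial capacity of 16 and load factor of 0.75; ResizeThresholdDemo and capacityAfter are hypothetical names, and the loop only models the rule "double the capacity when the size exceeds capacity × load factor".

```java
public class ResizeThresholdDemo {
    /** Simulated capacity of a default HashMap after inserting `entries` distinct keys. */
    static int capacityAfter(int entries) {
        int capacity = 16;        // default initial capacity
        float loadFactor = 0.75f; // default load factor
        for (int size = 1; size <= entries; size++) {
            int threshold = (int) (capacity * loadFactor);
            if (size > threshold)
                capacity <<= 1;   // double the bucket array
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12)); // still within the threshold of 12
        System.out.println(capacityAfter(13)); // the 13th entry exceeds it
    }
}
```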

(6) Do you know what's wrong with resizing HashMap?

Resizing a HashMap is subject to a race condition: if two threads both find that the HashMap needs to be resized, they try to resize it at the same time. During a resize (in JDK1.7 and earlier), the order of the elements stored in a linked list is reversed, because when entries are moved to the new bucket location HashMap places each one at the head of the list rather than at the tail, to avoid traversing to the tail. If the race occurs, the list can become circular, which causes an infinite loop. At this point, you can ask the interviewer why anyone would use HashMap in a multithreaded environment in the first place.
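A small experiment shows the safe alternative. In this sketch several threads insert disjoint key ranges into a shared map; with ConcurrentHashMap no entry is lost. ConcurrentPutDemo and fill are names invented for the example, and passing a plain HashMap instead is deliberately not shown as a test because its outcome is unpredictable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    /** Starts `threads` writers, each inserting `perThread` distinct keys; returns the final size. */
    static int fill(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // disjoint key range per thread
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    map.put(base + i, i);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(new ConcurrentHashMap<>(), 4, 10_000));
    }
}
```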

(7) What are final and volatile used for on the variables in ConcurrentHashMap? Given that the next field of a list node is final, how is the deletion of an element implemented?

Using final to make state immutable is the simplest way to ensure thread safety: because the field cannot be modified, there is no opportunity for a conflicting change. The immutable-object pattern relies mainly on the final keyword. The final keyword also has special semantics in the JMM (Java Memory Model): final fields guarantee initialization safety, which allows immutable objects to be freely accessed and shared without synchronization.

volatile ensures that a change to a variable is immediately visible to other threads; combined with CAS, it supports concurrent operations without locking.
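The combination of a volatile read and a CAS retry loop looks like this in its smallest form. The sketch uses AtomicInteger from the JDK; CasCounter and its methods are names made up for the illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    /** Lock-free increment: read, compute, then CAS; retry if another thread won. */
    int increment() {
        for (;;) {
            int current = value.get();               // volatile read
            int next = current + 1;
            if (value.compareAndSet(current, next))  // succeeds only if nobody changed it
                return next;
        }
    }

    /** Hypothetical helper: runs several incrementing threads and returns the final count. */
    static int countWithThreads(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    counter.increment();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.value.get();
    }
}
```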

Because next is final, a node cannot simply be unlinked. The remove operation starts by assigning table to a local variable tab, then copies each node that precedes the one being deleted into a new chain, and links that chain to the node that follows the deleted one.
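Deletion from a list whose next field is final can be sketched as follows. This is an illustration of the copy-the-prefix idea only, not the JDK code: FinalNextRemoveSketch, Entry, remove and keys are hypothetical names, and locking is omitted.

```java
public class FinalNextRemoveSketch {
    // Immutable link: next can never be reassigned after construction.
    static final class Entry {
        final String key;
        final String value;
        final Entry next;
        Entry(String key, String value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    /** Returns the head of a list without `key`; the original nodes are never modified. */
    static Entry remove(Entry first, String key) {
        Entry e = first;
        while (e != null && !e.key.equals(key))
            e = e.next;
        if (e == null)
            return first;                 // key not present: keep the list as it is
        Entry newFirst = e.next;          // nodes after the deleted one are reused
        for (Entry p = first; p != e; p = p.next)
            newFirst = new Entry(p.key, p.value, newFirst); // copy each preceding node
        return newFirst;
    }

    /** Helper for inspection: keys of the list joined by commas. */
    static String keys(Entry first) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = first; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.key);
        }
        return sb.toString();
    }
}
```

Note that the copied prefix comes out in reverse order. A reader that is traversing the old list at the same time still sees a consistent chain, because no existing node was changed.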

(8) Describe the remove operation in ConcurrentHashMap. What should be noted?

The following points should be noted. First, when the node to be deleted exists, the last step of the deletion is to decrement the count value. This must be the last step; otherwise a read operation may not see the preceding structural changes to the segment. Second, remove assigns table to a local variable tab at the beginning of execution, because table is a volatile variable, and volatile reads and writes are expensive and cannot be optimized by the compiler. Accessing a non-volatile variable multiple times costs little, and the compiler can optimize those accesses.

(9) What is the difference between Hashtable and ConcurrentHashMap? Describe the lock segmentation technique.

Hashtable is inefficient in a highly contended concurrent environment because all threads that access it must compete for the same lock. If a container has multiple locks, and each lock guards only a portion of the container's data, then threads that access different portions do not contend with each other, which allows efficient concurrent access. This is the lock segmentation technique used by ConcurrentHashMap: the data is first divided into segments, and each segment is assigned its own lock. While one thread holds the lock of one segment, the other segments can still be accessed by other threads.

Some methods span segments, such as size() and containsValue(). They may need to lock the entire table rather than a single segment, which is done by locking all segments in order and, once the operation is complete, releasing the locks on all segments in order. The order matters here; otherwise deadlocks are likely to occur.

Inside ConcurrentHashMap, the segment array is final, and its elements are effectively final as well. Declaring an array final does not by itself make its elements final; the implementation has to guarantee that. This guarantee ensures that no deadlock occurs, because the order in which the locks are acquired is fixed.
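Lock segmentation can be sketched with an array of locks over an array of small maps. This is a simplified illustration under stated assumptions, not the JDK 1.7 Segment code: StripedMapSketch is a hypothetical class, the number of stripes is fixed at construction, and reads also take the stripe lock, whereas the real ConcurrentHashMap reads without locking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class StripedMapSketch<K, V> {
    private final ReentrantLock[] locks;  // final array, filled once, never reassigned
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    public StripedMapSketch(int stripes) {
        locks = new ReentrantLock[stripes];
        segments = (Map<K, V>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
            segments[i] = new HashMap<>();
        }
    }

    private int stripe(Object key) {
        return (key.hashCode() & 0x7fffffff) % locks.length;
    }

    public V put(K key, V value) {
        int s = stripe(key);
        locks[s].lock();                  // only this segment is locked
        try {
            return segments[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    public V get(K key) {
        int s = stripe(key);
        locks[s].lock();
        try {
            return segments[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }

    /** Cross-segment operation: lock every segment in a fixed order to avoid deadlock. */
    public int size() {
        for (ReentrantLock lock : locks)
            lock.lock();
        try {
            int total = 0;
            for (Map<K, V> segment : segments)
                total += segment.size();
            return total;
        } finally {
            for (int i = locks.length - 1; i >= 0; i--)
                locks[i].unlock();
        }
    }
}
```

Two threads that call size() at the same time always request the locks in the same index order, so neither can hold a lock that the other needs while waiting for one that the other holds.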


Tags: Java Attribute Programming JDK

Posted on Tue, 05 May 2020 23:06:45 -0400 by smeagal