Java Concurrency Learning: Source Code Analysis of a CAS-Based Non-Blocking Concurrent Queue

Today, let's talk about ConcurrentLinkedQueue.

The ConcurrentLinkedQueue we are going to study today does not implement the BlockingQueue interface. It is a thread-safe, unbounded, non-blocking queue built entirely on CAS operations.
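Before diving into the source, here is a minimal usage sketch (my own example, not from the original article) showing the non-blocking contract: offer never blocks and, because the queue is unbounded, always returns true; poll returns null instead of blocking when the queue is empty.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class ClqDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> q = new ConcurrentLinkedQueue<>();
        // offer never blocks; for this unbounded queue it always returns true
        System.out.println(q.offer("a")); // true
        System.out.println(q.offer("b")); // true
        System.out.println(q.peek());     // a  (reads the head without removing it)
        System.out.println(q.poll());     // a  (removes the head)
        System.out.println(q.poll());     // b
        System.out.println(q.poll());     // null -- an empty queue returns null instead of blocking
    }
}
```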

Structural composition

public class ConcurrentLinkedQueue<E> extends AbstractQueue<E>
        implements Queue<E>, java.io.Serializable {
    private static final long serialVersionUID = 196745693267521676L;

    /*
     * The fundamental invariants are:
     * - There is exactly one (last) Node with a null next reference,
     *   which is CASed when enqueueing.  This last Node can be
     *   reached in O(1) time from tail, but tail is merely an
     *   optimization - it can always be reached in O(N) time from
     *   head as well.
     * - The elements contained in the queue are the non-null items in
     *   Nodes that are reachable from head.  CASing the item
     *   reference of a Node to null atomically removes it from the
     *   queue.  Reachability of all elements from head must remain
     *   true even in the case of concurrent modifications that cause
     *   head to advance.  A dequeued Node may remain in use
     *   indefinitely due to creation of an Iterator or simply a
     *   poll() that has lost its time slice.
     */
    private static class Node<E> {
        volatile E item;       // element value
        volatile Node<E> next; // successor node

        Node(E item) {
            // Constructs a node; the relaxed write is safe because the
            // node is only published via casNext/lazySetNext
            UNSAFE.putObject(this, itemOffset, item);
        }

        /* ----- CAS operations provided by the Unsafe utility class ----- */

        // If item is currently cmp, atomically change it to val
        boolean casItem(E cmp, E val) {
            return UNSAFE.compareAndSwapObject(this, itemOffset, cmp, val);
        }

        // Ordered (lazy) write of next to val
        void lazySetNext(Node<E> val) {
            UNSAFE.putOrderedObject(this, nextOffset, val);
        }

        // If next is currently cmp, atomically change it to val
        boolean casNext(Node<E> cmp, Node<E> val) {
            return UNSAFE.compareAndSwapObject(this, nextOffset, cmp, val);
        }

        // Unsafe mechanics

        private static final sun.misc.Unsafe UNSAFE;
        private static final long itemOffset;
        private static final long nextOffset;

        static {
            try {
                UNSAFE = sun.misc.Unsafe.getUnsafe();
                Class<?> k = Node.class;
                itemOffset = UNSAFE.objectFieldOffset
                    (k.getDeclaredField("item"));
                nextOffset = UNSAFE.objectFieldOffset
                    (k.getDeclaredField("next"));
            } catch (Exception e) {
                throw new Error(e);
            }
        }
    }
    /**
     * A node from which the first live (non-deleted) node (if any)
     * can be reached in O(1) time.
     * Invariants:
     * - all live nodes are reachable from head via succ()
     * - head != null
     * - (tmp = head).next != tmp || tmp != head
     * Non-invariants:
     * - head.item may or may not be null.
     * - it is permitted for tail to lag behind head, that is, for tail
     *   to not be reachable from head!
     */
    private transient volatile Node<E> head;

    /**
     * A node from which the last node on list (that is, the unique
     * node with node.next == null) can be reached in O(1) time.
     * Invariants:
     * - the last node is always reachable from tail via succ()
     * - tail != null
     * Non-invariants:
     * - tail.item may or may not be null.
     * - it is permitted for tail to lag behind head, that is, for tail
     *   to not be reachable from head!
     * - tail.next may or may not be self-pointing to tail.
     */
    private transient volatile Node<E> tail;
    // No-arg constructor: point head and tail at a sentinel node whose item is null
    public ConcurrentLinkedQueue() {
        head = tail = new Node<E>(null);
    }

    // Build the queue from the elements of the given collection
    public ConcurrentLinkedQueue(Collection<? extends E> c) {
        Node<E> h = null, t = null;
        for (E e : c) {
            checkNotNull(e);
            Node<E> newNode = new Node<E>(e);
            if (h == null)
                h = t = newNode;
            else {
                t.lazySetNext(newNode);
                t = newNode;
            }
        }
        if (h == null)
            h = t = new Node<E>(null);
        head = h;
        tail = t;
    }
In ConcurrentLinkedQueue's non-blocking implementation, head/tail do not always point to the actual first/last node; that is, the queue is allowed to be in an inconsistent state. The benefit is that the two steps which would otherwise have to be performed atomically together during enqueue/dequeue are separated, narrowing the scope of each atomic update to a single variable. This is the key to implementing the non-blocking algorithm.

Because the queue is sometimes in an inconsistent state, ConcurrentLinkedQueue relies on three groups of invariants to keep the non-blocking algorithm correct: the basic invariants, the head invariants, and the tail invariants.

Invariants are the "contract" that all methods of a concurrent object must observe: each method must preserve the invariants both before and after it runs. With invariants, each method can be analyzed in isolation, without considering every possible interaction between methods.

Basic invariant

  1. When a new node is inserted, there is exactly one (last) node in the queue whose next field is null.
  2. Traversing the queue from head visits every node whose item field is non-null.

Invariants and non-invariants of head


  1. All live nodes can be reached from head by repeatedly calling the succ() method.
  2. head cannot be null.
  3. The next field of the head node cannot refer to itself.


  1. The item value of the head node may or may not be null.
  2. tail is allowed to lag behind head; that is, traversing the queue from head may fail to reach tail.

Invariants and non-invariants of tail


  1. By calling the succ() method through tail, the last node is always reachable.
  2. tail cannot be null.


  1. The item field of the tail node may or may not be null.
  2. Allow the tail to lag behind the head, that is, traversing the queue from the head may not reach the tail.
  3. The next field of the tail node can refer to itself.

offer operation

Source code analysis

The offer operation adds the element e (which must be non-null) to the tail of the queue. Because the queue is unbounded, this operation never returns false.

public boolean offer(E e) {
    // Throw NullPointerException if the element is null
    checkNotNull(e);
    // Construct a new node
    final Node<E> newNode = new Node<E>(e);

    // [1] Loop, starting the traversal from tail
    for (Node<E> t = tail, p = t;;) {
        Node<E> q = p.next;
        // [2] q == null means p is the last node
        if (q == null) {
            // [3]
            // CAS p's next from null to newNode; returns true on success.
            // If the CAS fails, another thread has already appended a node,
            // so loop around and retry.
            if (p.casNext(null, newNode)) {
                // [4]
                // tail is not advanced on every insertion; it is only
                // CASed forward once p has drifted away from t
                if (p != t) // hop two nodes at a time
                    casTail(t, newNode);  // Failure is OK.
                return true;
            }
            // Lost CAS race to another thread; re-read next
        }
        else if (p == q)
            // [5]
            // Under concurrency, when the head is removed [e.g. by poll],
            // the dequeued node's next points to itself, so p == q holds
            // and we must restart from a valid node
            p = (t != (t = tail)) ? t : head;
        else
            // [6]
            // tail is not pointing at the last node; advance p,
            // effectively searching for the real last node
            p = (p != t && t != (t = tail)) ? t : q;
    }
}

Graphic offer operation

The above simulates offering an element in the single-threaded case. You can see:

  1. Initially, both head and tail point to the sentinel node whose item is null, and its next points to null.
  2. In the single-threaded case, we can assume the CAS operations succeed. Here q is null, so the first branch [2] is taken and p's next is set to newNode. Since p == t, the casTail at [4] is not executed, and true is returned directly.

In the case of multithreading, things are not so simple:

  1. Thread A wants to insert element A at the tail, and thread B wants to insert element B. Both reach [3] p.casNext(null, newNode) at the same time. Since casNext is atomic, suppose A's CAS succeeds, with p == t, as shown in Figure 1.
  2. Because A succeeded, thread B's casNext fails and the for loop runs again. Now q != null && p != q, so B takes [6] and moves p to q's position, i.e. node A's position, as shown in Figure 2.
  3. Looping again, q == null this time, so [3] attempts the next CAS again. Assume B now succeeds, as shown in Figure 3.
  4. At this point the tail must be reset: since p != t, the condition at [4] holds, casTail(t, newNode) executes, and tail points to the newly inserted B.

After walking through the figures alongside the source, the whole process should start to feel familiar. To summarize briefly:

The offer operation is governed by atomic CAS: at any moment, only one thread can successfully append an element at the tail. A thread whose CAS fails loops and retries until it succeeds. This is the essence of a non-blocking algorithm: it spends CPU cycles spinning on CAS instead of paying the cost of blocking threads. Moreover, tail does not always point to the last node; by design, the last node is either the node tail points to or that node's next, which is why the p pointer is used to locate the true last node.
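The retry pattern described above can be isolated from the queue itself. The following sketch (my own illustration, not ConcurrentLinkedQueue's code) shows the same loop-until-CAS-succeeds idea on a simpler structure, a lock-free stack built on AtomicReference: a failed compareAndSet means another thread won the race, so we re-read the shared state and try again.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of the CAS retry pattern used by non-blocking algorithms,
// shown on a lock-free (Treiber-style) stack for brevity.
public class CasRetryDemo {
    static class Node<E> {
        final E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    static class LockFreeStack<E> {
        private final AtomicReference<Node<E>> top = new AtomicReference<>();

        void push(E item) {
            Node<E> newNode = new Node<>(item);
            for (;;) {                              // spin until our CAS wins
                Node<E> cur = top.get();
                newNode.next = cur;
                if (top.compareAndSet(cur, newNode))
                    return;                         // success: our CAS won the race
                // failure: another thread changed top; re-read and retry
            }
        }

        E pop() {
            for (;;) {
                Node<E> cur = top.get();
                if (cur == null) return null;       // empty: return null, never block
                if (top.compareAndSet(cur, cur.next))
                    return cur.item;
            }
        }
    }

    public static void main(String[] args) {
        LockFreeStack<Integer> s = new LockFreeStack<>();
        s.push(1);
        s.push(2);
        System.out.println(s.pop()); // 2
        System.out.println(s.pop()); // 1
    }
}
```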

By the way, you may notice that branch [5] was never taken in the walkthrough above. In fact, the [5] situation can arise during a poll operation. Here is an example:

Figure 1 shows a state that a poll operation can produce: tail points to a discarded node while an element is being offered to the queue.

  1. Execution reaches [2]; the pointers are positioned as shown in Figure 1.
  2. Since q is not null and p == q, branch [5] is entered and p is reassigned to head, as shown in Figure 2.
  3. Looping again, q points to p.next, which is null, as shown in Figure 3.
  4. Because q is null, branch [2] is taken: [3] sets next as before, and then at [4] p != t, so the new node is CASed in as the new tail, as shown in Figure 4.

Design intent of hops in JDK 1.6

When reading the source comments, I noticed "hop" mentioned in several places. The hops variable does exist in the original JDK 1.6 source, and the design idea is the same: use hops to throttle updates of the tail node and improve enqueue efficiency.

Quoting The Art of Java Concurrency Programming by Fang Tengfei: reducing the number of CAS updates to the tail reference improves enqueue efficiency, so Doug Lea uses the hops variable to control and reduce how often tail is updated. Instead of updating tail to the newly enqueued node on every enqueue, tail is only updated once the distance between tail and the actual last node reaches the constant HOPS (1 by default). The greater the allowed distance, the fewer CAS updates of tail; but the greater the distance, the longer it takes to locate the last node on each enqueue, because the loop must iterate more times. Even so, overall enqueue efficiency improves: in essence, extra volatile reads are traded for fewer volatile writes, and a volatile write is much more expensive than a volatile read.

private static final int HOPS = 1;

    public boolean offer(E e) {
        if (e == null) throw new NullPointerException();
        Node<E> n = new Node<E>(e);
        retry:
        for (;;) {
            Node<E> t = tail;
            Node<E> p = t;
            for (int hops = 0; ; hops++) {
                Node<E> next = succ(p); // 1. Get p's successor (if p's next points to itself, succ returns head)
                if (next != null) { // 2. next is not null
                    if (hops > HOPS && t != tail)
                        continue retry; // 3. If we have spun more than HOPS times and t is no longer the tail, restart the outer loop
                    p = next; // 4. Otherwise (spins <= HOPS, or t is still the tail), advance p to next
                } else if (p.casNext(null, n)) { // 5. next is null: try to CAS p's next to n
                    if (hops >= HOPS)
                        casTail(t, n); // 6. On success, if we spun at least HOPS times, try to CAS the tail to n; failure is fine
                    return true; // 7. Enqueued successfully
                } else {
                    p = succ(p); // 8. The CAS in step 5 failed: advance p to its successor and spin again
                }
            }
        }
    }

    final Node<E> succ(Node<E> p) {
        Node<E> next = p.getNext();
        // If p's next points to p itself, return the head node; otherwise return next
        return (p == next) ? head : next;
    }

poll operation

The poll operation dequeues the element at the head of the queue and returns it; if the queue is empty, it returns null.
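Because poll returns null rather than blocking, a consumer typically spins or backs off when the queue is momentarily empty. The sketch below (my own example, not from the original article) shows a producer/consumer pair built on this contract; the sum of 1..100 is deterministic regardless of thread interleaving.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class PollDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<Integer> q = new ConcurrentLinkedQueue<>();

        // Producer enqueues 1..100
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 100; i++) q.offer(i);
        });

        int[] sum = {0};
        // Consumer spins on poll(); a null return means "empty right now",
        // not "closed", so it yields and retries instead of blocking
        Thread consumer = new Thread(() -> {
            int seen = 0;
            while (seen < 100) {
                Integer v = q.poll();
                if (v == null) {
                    Thread.yield(); // back off briefly while the queue is empty
                    continue;
                }
                sum[0] += v;
                seen++;
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        System.out.println(sum[0]); // 5050
    }
}
```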

Source code analysis

public E poll() {
    // [1] continue restartFromHead jumps back here
    restartFromHead:
    // [2] Infinite loop
    for (;;) {
        for (Node<E> h = head, p = h, q;;) {
            E item = p.item;
            // [3] If the node holds a value, CAS it to null
            if (item != null && p.casItem(item, null)) {
                // Successful CAS is the linearization point
                // for item to be removed from this queue.
                // [4]
                if (p != h) // hop two nodes at a time
                    updateHead(h, ((q = p.next) != null) ? q : p);
                return item;
            }
            // [item == null] or [item != null but the CAS failed]
            // [5] The queue is empty; return null
            else if ((q = p.next) == null) {
                updateHead(h, p);
                return null;
            }
            // [6]
            else if (p == q)
                continue restartFromHead;
            // [7]
            else
                p = q;
        }
    }
}

final void updateHead(Node<E> h, Node<E> p) {
    // If h == p, no update is needed; otherwise CAS head to p and,
    // on success, make the old head h point to itself
    if (h != p && casHead(h, p))
        h.lazySetNext(h);
}

Graphical poll operation

Let's take a look at the simplest case:

Initially, head and tail both point to the sentinel node whose item is null. Suppose a thread performs poll and starts iterating from head: since p.item == null && p.next == null, branch [5] is taken and updateHead is called. Because p == h, no update actually happens, and null is returned directly.

If another thread adds an element to the queue just as we reach branch [5], the situation is as follows:

  1. Pointer q now points to the newly inserted element, so at [5] q != null; and since p != q, branch [6] is not taken either.
  2. Finally, branch [7] is taken and p is moved to node q.
  3. Looping again, branch [3] is reached. Now item is non-null, so CAS tries to set it to null. On success, the condition at [4] holds because p != h: p becomes the new head, h is made to point to itself, and the item value of p is returned.

You will find that the final state is exactly the situation we analyzed earlier for offer: during offer, we may find a node whose next points to itself (the p == q case).

Next, poll contains a similar judgment, the code at [6], with the same p == q check. In the figures, purple denotes thread A and blue denotes thread B.

  1. Suppose thread A performs poll while the queue is in the state shown in Figure 1.
  2. As shown in Figure 2, thread A CASes item A to null.
  3. Now p != h, so updateHead will execute. Just before it does, suppose thread B starts a poll, as shown in Figure 3.
  4. Thread B reaches [6], jumps to restartFromHead, and re-reads the queue's current head, as shown in Figure 4.

When polling an element, CAS first sets the current node's item to null; CAS then advances head and makes the removed node point to itself so that it can be garbage collected. Throughout the loop, concurrent modification is continuously detected: if the head is found to have changed, the loop restarts and re-reads the new head.


Summary

ConcurrentLinkedQueue is a thread-safe, unbounded, non-blocking, CAS-based queue backed by a linked list.

The head and tail references are declared volatile so that enqueue and dequeue operations are safe in a multi-threaded environment: volatile guarantees visibility, while atomicity is provided by the CAS operations.

In terms of design, the non-blocking algorithm allows the queue to be in an inconsistent state: the tail pointer does not always point to the last node; the last node may be tail itself or tail.next. This separates the two steps that would otherwise have to execute together atomically during enqueue/dequeue, effectively reducing the scope of each atomic update to a single variable. The three groups of invariants maintain the correctness of the non-blocking algorithm despite this inconsistency.

Writing a volatile variable is much more expensive than reading one. The design therefore adds traversal work to locate the actual head/tail nodes [more volatile reads] in exchange for not having to CAS head/tail on every operation [fewer volatile writes], which improves enqueue/dequeue efficiency overall.


If you found this article helpful, please like, follow, and support!

Tags: Java Back-end Programmer architecture

Posted on Mon, 06 Dec 2021 16:48:18 -0500 by Stelios