Synchronization tool - Exchanger

This blog series is a record of what I learned while studying concurrent programming. Because there are many articles, written at scattered times, I put together an index post for easy reference:

Concurrent programming series blog portal

1, About Exchanger

Exchanger is a synchronizer introduced in JDK 1.5. As the name suggests, the main job of this class is to exchange data between threads.

Exchanger is somewhat similar to CyclicBarrier. A CyclicBarrier is a fence: a thread that reaches the fence must wait until a set number of other threads have also arrived before any of them can pass.

Exchanger can be seen as a two-way fence:

When thread 1 reaches the fence, it first checks whether another thread has already arrived. If not, it waits. If another thread (thread 2) has arrived, the two threads swap the data they carry. Exchanger is therefore well suited to handing data back and forth between two threads.
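The pairing behaviour described above can be sketched with the public java.util.concurrent.Exchanger API. This is a minimal illustration, not code from the original post; the class name ExchangerPairDemo and the helper swap are made up for the example.

```java
import java.util.concurrent.Exchanger;

public class ExchangerPairDemo {
    // Hypothetical helper: runs one exchange between two threads and
    // returns what each side received (index 0 = thread A, 1 = thread B).
    static String[] swap(String fromA, String fromB) throws InterruptedException {
        Exchanger<String> exchanger = new Exchanger<>();
        String[] received = new String[2];
        Thread a = new Thread(() -> {
            try {
                received[0] = exchanger.exchange(fromA); // waits until B arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread b = new Thread(() -> {
            try {
                received[1] = exchanger.exchange(fromB); // waits until A arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        a.start();
        b.start();
        a.join(); // join() makes the writes to 'received' visible to the caller
        b.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] r = swap("from-A", "from-B");
        System.out.println("A received " + r[0] + ", B received " + r[1]);
    }
}
```

Whichever thread calls exchange first simply waits; the order in which the two threads start does not change the result.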

2, Exchanger example

Let's look at an example to see what Exchanger does:

Example: suppose there is a producer and a consumer. The usual way to implement the producer-consumer pattern is to use a queue as the message channel. The producer keeps producing messages and puts them into the queue; the consumer keeps taking messages from the queue and consuming them. If the queue is full, the producer waits; if the queue is empty, the consumer waits.

Now let's see how to implement the producer-consumer pattern with Exchanger:
Producer:

public class Producer implements Runnable {
    private final Exchanger<Message> exchanger;

    public Producer(Exchanger<Message> exchanger) {
        this.exchanger = exchanger;
    }

    @Override
    public void run() {
        Message message = new Message(null);
        for (int i = 0; i < 3; i++) {
            try {
                Thread.sleep(1000);

                message.setV(String.valueOf(i));
                System.out.println(Thread.currentThread().getName() + ": Data produced[" + i + "]");

                message = exchanger.exchange(message);

                System.out.println(Thread.currentThread().getName() + ": Exchange data[" + String.valueOf(message.getV()) + "]");

            } catch (InterruptedException e) {
                e.printStackTrace();
            }

        }
    }
}

Consumer:

public class Consumer implements Runnable {
    private final Exchanger<Message> exchanger;

    public Consumer(Exchanger<Message> exchanger) {
        this.exchanger = exchanger;
    }

    @Override
    public void run() {
        Message msg = new Message(null);
        while (true) {
            try {
                Thread.sleep(1000);
                msg = exchanger.exchange(msg);
                System.out.println(Thread.currentThread().getName() + ": Data consumed[" + msg.getV() + "]");
                msg.setV(null);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

}

Main:

public class Main {
    public static void main(String[] args) {
        Exchanger<Message> exchanger = new Exchanger<>();
        Thread t1 = new Thread(new Consumer(exchanger), "consumer-t1");
        Thread t2 = new Thread(new Producer(exchanger), "producer-t2");

        t1.start();
        t2.start();
    }
}

Output:

producer-t2: Data produced[0]
producer-t2: Exchange data[null]
consumer-t1: Data consumed[0]
producer-t2: Data produced[1]
consumer-t1: Data consumed[1]
producer-t2: Exchange data[null]
producer-t2: Data produced[2]
consumer-t1: Data consumed[2]
producer-t2: Exchange data[null]

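In the example above, the producer produces three items, 0, 1 and 2, and hands each one to the consumer through the Exchanger. As the output shows, the consumer hands an empty message back to the producer in every exchange. Note that the consumer loops forever, so after the third exchange it blocks in exchange() waiting for a partner that never arrives, and the program does not exit on its own.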
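The example above does not show the Message class. The following self-contained sketch assumes it is a simple holder with getV/setV (a hypothetical reconstruction), drops the sleeps, and bounds the consumer loop so that the program terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

public class ExchangerProducerConsumer {
    // Hypothetical holder: the Message class is not shown in the article.
    static final class Message {
        private String v;
        Message(String v) { this.v = v; }
        String getV() { return v; }
        void setV(String v) { this.v = v; }
    }

    // Runs 'rounds' exchanges and returns the values the consumer saw, in order.
    static List<String> run(int rounds) throws InterruptedException {
        Exchanger<Message> exchanger = new Exchanger<>();
        List<String> consumed = new ArrayList<>(); // touched only by the consumer thread

        Thread producer = new Thread(() -> {
            Message message = new Message(null);
            try {
                for (int i = 0; i < rounds; i++) {
                    message.setV(String.valueOf(i));        // fill the container
                    message = exchanger.exchange(message);  // get an empty one back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "producer");

        Thread consumer = new Thread(() -> {
            Message message = new Message(null);
            try {
                for (int i = 0; i < rounds; i++) {          // bounded, unlike while (true)
                    message = exchanger.exchange(message);  // hand over the empty container
                    consumed.add(message.getV());
                    message.setV(null);                     // empty it for the next round
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer");

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3)); // prints [0, 1, 2]
    }
}
```

Only two Message objects ever exist here: the two threads keep passing the same pair back and forth, one full and one empty, which is the typical buffer-swapping use of Exchanger.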

3, How Exchanger works

Construction of Exchanger

Let's look at how an Exchanger is constructed. It has only one no-argument constructor:

public Exchanger() {
    participant = new Participant();
}

During construction, a Participant object is created internally. Participant is an inner class of Exchanger; it is essentially a ThreadLocal that holds the thread-local Node of each thread:

static final class Participant extends ThreadLocal<Node> {
    public Node initialValue() { return new Node(); }
}

We can think of the Node object as the exchange data that each thread carries with it:

@sun.misc.Contended static final class Node {
    int index;              // Arena index
    int bound;              // Last recorded value of Exchanger.bound
    int collides;           // Number of CAS failures at current bound
    int hash;               // Pseudo-random for spins
    Object item;            // This thread's current item
    volatile Object match;  // Item provided by releasing thread
    volatile Thread parked; // Set to this thread when parked, else null
}
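To make the "one Node per thread" idea concrete, here is a small sketch with the same shape as Participant. The Node class below is a cut-down stand-in with a single field, and the class and method names are invented for illustration.

```java
public class PerThreadNodeDemo {
    // Hypothetical stand-in for Exchanger.Node, reduced to one field.
    static final class Node {
        Object item;
    }

    // Same shape as Participant: each thread lazily gets its own Node.
    static final class Participant extends ThreadLocal<Node> {
        @Override
        protected Node initialValue() {
            return new Node();
        }
    }

    static final Participant participant = new Participant();

    // Returns true if another thread sees a different Node than the caller.
    static boolean otherThreadGetsOwnNode() throws InterruptedException {
        Node mine = participant.get();
        Node[] theirs = new Node[1];
        Thread t = new Thread(() -> theirs[0] = participant.get());
        t.start();
        t.join();
        return theirs[0] != null && theirs[0] != mine;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(participant.get() == participant.get()); // same thread: same Node
        System.out.println(otherThreadGetsOwnNode());               // other thread: its own Node
    }
}
```

Because a thread reuses its Node across exchanges, the real implementation has to reset fields such as item and match after every exchange.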

Single-slot exchange

Exchanger exchanges data in two ways. Under low contention it uses "single-slot exchange" internally; under high contention it uses "multi-slot exchange".

Let's look at the exchange method first:

public V exchange(V x) throws InterruptedException {
        Object v;
        Object item = (x == null) ? NULL_ITEM : x; // translate null args
        if ((arena != null ||
             (v = slotExchange(item, false, 0L)) == null) &&
            ((Thread.interrupted() || // disambiguates null return
              (v = arenaExchange(item, false, 0L)) == null)))
            throw new InterruptedException();
        return (v == NULL_ITEM) ? null : (V)v;
    }
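Two things follow from this method as seen from the caller's side: null is a legal item (it is translated to NULL_ITEM and back), and a null return from the internal methods signals interruption. The sketch below exercises this through the public API. It also uses the timed overload exchange(x, timeout, unit), which is not shown in this article and throws TimeoutException when no partner arrives in time. The class and method names are invented for the example.

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ExchangerApiDemo {
    // Returns true if a lone thread's timed exchange gives up with TimeoutException.
    static boolean timesOutWithoutPartner() throws InterruptedException {
        Exchanger<String> exchanger = new Exchanger<>();
        try {
            exchanger.exchange("lonely", 50, TimeUnit.MILLISECONDS);
            return false; // would mean a partner showed up, which cannot happen here
        } catch (TimeoutException expected) {
            return true;
        }
    }

    // Sends null to a partner that sends "value"; returns what the null-sender received.
    static String exchangeNull() throws InterruptedException {
        Exchanger<String> exchanger = new Exchanger<>();
        Thread partner = new Thread(() -> {
            try {
                exchanger.exchange("value"); // the partner receives null
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        partner.start();
        String got = exchanger.exchange(null); // null is allowed as an item
        partner.join();
        return got;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timed out: " + timesOutWithoutPartner());
        System.out.println("received:  " + exchangeNull());
    }
}
```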

As you can see, exchange is really just a dispatcher that chooses the exchange mode. If arena is null, it tries single-slot exchange first. If arena is already non-null, or slotExchange returns null without the thread having been interrupted, it falls through to arenaExchange. If the thread was interrupted, it throws InterruptedException.

The arena field of Exchanger is an array of Node, the slot array; it is used only in multi-slot exchange. Exchanger also has a slot field, a single Node reference that is used only in single-slot exchange.

During a single-slot exchange, slot points to the Node of the thread that arrived first, indicating that this thread occupies the slot.

    // Slot array used by multi-slot (arena) exchange
    private volatile Node[] arena;
    // Node used by single-slot exchange
    private volatile Node slot;

Let's look at how Exchanger implements single-slot exchange. The method is not complicated. The item parameter of slotExchange is the data carried by the current thread, and the return value is normally the data carried by the matching thread:

/**
 * Single-slot exchange
 *
 * @param item Data to be exchanged
 * @return Data of the matching thread; null if multi-slot exchange has been activated or the thread was interrupted; TIMED_OUT (a sentinel object) on timeout
 */
private final Object slotExchange(Object item, boolean timed, long ns) {
    Node p = participant.get();         // Exchange node carried by current thread
    Thread t = Thread.currentThread();
    if (t.isInterrupted())              // Thread interrupt status check
        return null;

    for (Node q; ; ) {
        if ((q = slot) != null) {       // slot != null: another thread arrived first and occupies the slot
            if (U.compareAndSwapObject(this, SLOT, q, null)) {
                Object v = q.item;      // Get exchange value
                q.match = item;         // Set exchange value
                Thread w = q.parked;
                if (w != null)          // Wake up the thread waiting in this slot
                    U.unpark(w);
                return v;               // Exchange succeeded, return result
            }
            // The CAS on slot failed (contention): if there is more than one CPU and bound is still 0, set bound to SEQ and create the arena array
            if (NCPU > 1 && bound == 0 && U.compareAndSwapInt(this, BOUND, 0, SEQ))
                arena = new Node[(FULL + 2) << ASHIFT];
        } else if (arena != null)       // slot == null && arena != null
            // The arena was initialized in the middle of a single-slot exchange; return null so that the caller reroutes to multi-slot exchange
            return null;
        else {                          // The current thread arrived first: try to occupy the slot
            p.item = item;
            if (U.compareAndSwapObject(this, SLOT, null, p))    // Occupy the slot
                break;
            p.item = null;              // CAS failed; retry on the next loop iteration
        }
    }

    // Reaching here means the current thread arrived first and occupies the slot; it must now wait for a partner thread
    int h = p.hash;
    long end = timed ? System.nanoTime() + ns : 0L;
    int spins = (NCPU > 1) ? SPINS : 1;             // Spin count; there is no point in spinning on a single CPU
    Object v;
    while ((v = p.match) == null) {                 // p.match == null: the matching thread has not delivered its item yet
        if (spins > 0) {                            // Spin, yielding the CPU at pseudo-random points
            h ^= h << 1;
            h ^= h >>> 3;
            h ^= h << 10;
            if (h == 0)
                h = SPINS | (int) t.getId();
            else if (h < 0 && (--spins & ((SPINS >>> 1) - 1)) == 0)
                Thread.yield();
        } else if (slot != p)                       // A partner has already taken the node out of the slot but has not set match yet: spin a little longer
            spins = SPINS;
        else if (!t.isInterrupted() && arena == null &&
                (!timed || (ns = end - System.nanoTime()) > 0L)) {  // Spun long enough without a partner: block the current thread
            U.putObject(t, BLOCKER, this);
            p.parked = t;
            if (slot == p)
                U.park(false, ns);               // Block current thread
            p.parked = null;
            U.putObject(t, BLOCKER, null);
        } else if (U.compareAndSwapObject(this, SLOT, p, null)) {   // Timed out, interrupted, or arena created: give the slot up for other threads
            v = timed && ns <= 0L && !t.isInterrupted() ? TIMED_OUT : null;
            break;
        }
    }
    U.putOrderedObject(p, MATCH, null);
    p.item = null;
    p.hash = h;
    return v;
}

The overall flow of the code above is roughly as follows:

The thread that arrives first:

  1. If the current thread is the first to arrive, the slot field is set to point to its Node, indicating that the slot is occupied;
  2. The thread then spins for a while. If no partner thread arrives during that time, it blocks. (It spins first instead of blocking immediately because a thread context switch is expensive; this is an optimization.)

The partner thread that arrives later:
If the current thread (the partner) is not the first to arrive, the slot is already occupied when it arrives and points to the Node of the first thread. The partner clears the slot and returns the item in that Node as the data it obtained from the exchange. It also stores the data it carries in the match field of the Node and wakes up the thread referenced by Node.parked (that is, the first thread), if there is one.

The first thread is woken up:
After it wakes up, match is no longer null (it holds the data carried by the partner thread), so the thread leaves the wait loop and returns the value stored in match.

In this way, thread A and thread B exchange their data, and the whole process uses no locks (no synchronized), only CAS and park/unpark.
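The steps above can be condensed into a much-simplified sketch built on AtomicReference and LockSupport. This is not the JDK implementation: it allocates a fresh Node per call instead of reusing a ThreadLocal one, has no arena and no timeout, does not respond to interruption, and uses an arbitrary spin count. All names are chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

public class SingleSlotSketch<V> {
    private static final class Node {
        final Object item;        // item offered by the thread that arrived first
        volatile Object match;    // item delivered by the partner
        volatile Thread parked;   // set while the first thread is about to park
        Node(Object item) { this.item = item; }
    }

    private static final Object NULL_ITEM = new Object(); // stands in for a null item
    private final AtomicReference<Node> slot = new AtomicReference<>();

    @SuppressWarnings("unchecked")
    public V exchange(V x) {
        Object item = (x == null) ? NULL_ITEM : x;
        Node p = new Node(item);
        for (;;) {
            Node q = slot.get();
            if (q != null) {
                if (slot.compareAndSet(q, null)) {     // later arrival: claim the waiting node
                    Object v = q.item;
                    q.match = item;                    // deliver our item
                    Thread w = q.parked;
                    if (w != null)
                        LockSupport.unpark(w);         // wake the first thread if it parked
                    return (v == NULL_ITEM) ? null : (V) v;
                }
                // lost the race for q; retry
            } else if (slot.compareAndSet(null, p)) {  // first arrival: occupy the slot
                break;
            }
        }
        int spins = 1024;                              // arbitrary value for this sketch
        Object v;
        while ((v = p.match) == null) {
            if (spins > 0) {
                if ((--spins & 63) == 0)
                    Thread.yield();                    // spin first, yielding now and then
            } else {
                p.parked = Thread.currentThread();
                if (p.match == null)                   // re-check to avoid a lost wake-up
                    LockSupport.park(this);
                p.parked = null;
            }
        }
        return (v == NULL_ITEM) ? null : (V) v;
    }

    public static void main(String[] args) throws InterruptedException {
        SingleSlotSketch<String> ex = new SingleSlotSketch<>();
        Thread t = new Thread(() -> System.out.println("t received " + ex.exchange("from-t")));
        t.start();
        System.out.println("main received " + ex.exchange("from-main"));
        t.join();
    }
}
```

The waiter writes parked and then re-reads match, while the partner writes match and then reads parked. Since both fields are volatile, at least one side sees the other's write, so a wake-up cannot be lost.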

Multi-slot exchange

The most complex part of Exchanger is its multi-slot (arena) exchange. When is it triggered?
Earlier we said that multi-slot exchange kicks in under high concurrency, which is not quite precise.

The trigger is the branch in slotExchange that runs right after a failed CAS on slot: if NCPU > 1 and bound == 0, it CASes bound from 0 to SEQ and allocates the arena array.

In other words, if several threads compete to take the node out of slot during a single-slot exchange and one of them fails its CAS, the arena array is initialized, and all subsequent exchanges use arenaExchange:

/**
 * Multi-slot (arena) exchange
 *
 * @param item Data to be exchanged
 * @return Data of the matching thread; null if interrupted; TIMED_OUT (a sentinel object) on timeout
 */
private final Object arenaExchange(Object item, boolean timed, long ns) {
    Node[] a = arena;
    Node p = participant.get();                     // Exchange node carried by current thread
    for (int i = p.index; ; ) {                     // arena index of the current thread
        int b, m, c;
        long j;

        // Read the arena element at byte offset (i << ASHIFT) + ABASE, i.e. the slot that is actually used for index i
        Node q = (Node) U.getObjectVolatile(a, j = (i << ASHIFT) + ABASE);

        if (q != null && U.compareAndSwapObject(a, j, q, null)) {   // CASE1: the slot is occupied, so another thread has arrived and is waiting
            Object v = q.item;                     // Take the item of the waiting thread
            q.match = item;                        // Hand the current thread's item to the waiting thread
            Thread w = q.parked;                   // q.parked references the waiting thread if it has parked
            if (w != null)
                U.unpark(w);                       // Wake up the waiting thread
            return v;
        } else if (i <= (m = (b = bound) & MMASK) && q == null) {       // CASE2: the index is within bounds and the slot is empty
            p.item = item;
            if (U.compareAndSwapObject(a, j, null, p)) {            // Successfully occupied the slot
                long end = (timed && m == 0) ? System.nanoTime() + ns : 0L;
                Thread t = Thread.currentThread();
                for (int h = p.hash, spins = SPINS; ; ) {               // Spin for a while, waiting for a partner thread to reach this slot
                    Object v = p.match;
                    if (v != null) {                                    // A matching thread has reached the slot
                        U.putOrderedObject(p, MATCH, null);
                        p.item = null;
                        p.hash = h;
                        return v;   // Returns the value exchanged by the paired thread
                    } else if (spins > 0) {
                        h ^= h << 1;
                        h ^= h >>> 3;
                        h ^= h << 10;
                        if (h == 0)                // initialize hash
                            h = SPINS | (int) t.getId();
                        else if (h < 0 &&          // approx 50% true
                                (--spins & ((SPINS >>> 1) - 1)) == 0)
                            Thread.yield();        // two yields per wait
                    } else if (U.getObjectVolatile(a, j) != p)       // A partner has already taken the node out of the slot but has not set match yet: spin a little longer
                        spins = SPINS;
                    else if (!t.isInterrupted() && m == 0 &&
                            (!timed || (ns = end - System.nanoTime()) > 0L)) {      // No partner after spinning (and the arena has a single slot): block the current thread
                        U.putObject(t, BLOCKER, this);
                        p.parked = t;                           // Record the current thread in the node so that the partner can wake it up
                        if (U.getObjectVolatile(a, j) == p)
                            U.park(false, ns);
                        p.parked = null;
                        U.putObject(t, BLOCKER, null);
                    } else if (U.getObjectVolatile(a, j) == p &&
                            U.compareAndSwapObject(a, j, p, null)) {    // Gave up waiting: withdraw from the slot
                        if (m != 0)                // try to shrink
                            U.compareAndSwapInt(this, BOUND, b, b + SEQ - 1);
                        p.item = null;
                        p.hash = h;
                        i = p.index >>>= 1;        // descend
                        if (Thread.interrupted())
                            return null;
                        if (timed && m == 0 && ns <= 0L)
                            return TIMED_OUT;
                        break;                     // expired; restart
                    }
                }
            } else                                 // Failed to occupy the slot
                p.item = null;
        } else {                                   // CASE3: index out of bounds or CAS lost to another thread; try another slot or grow the arena
            if (p.bound != b) {
                p.bound = b;
                p.collides = 0;
                i = (i != m || m == 0) ? m : m - 1;
            } else if ((c = p.collides) < m || m == FULL ||
                    !U.compareAndSwapInt(this, BOUND, b, b + SEQ + 1)) {
                p.collides = c + 1;
                i = (i == 0) ? m : i - 1;          // cyclically traverse
            } else
                i = m + 1;                         // grow
            p.index = i;
        }
    }
}

The overall flow of arenaExchange is similar to slotExchange. The main difference is that the slot to try is selected by the index field of the Node carried by the current thread.

If that slot is occupied, another thread arrived first, and the handling is the same as in slotExchange.

If the slot is valid and empty, the current thread arrived first and occupies it, then waits in escalating steps, much like lock escalation: spin -> yield -> block. It blocks only when spinning has not produced a partner and, as the m == 0 check shows, only when the arena is down to a single usable slot; otherwise it withdraws and retries at a lower index.

In addition, because arenaExchange works on a slot array, it also has to grow and shrink the range of usable slots (tracked by the bound field). Readers can study that part of the source code on their own.
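From the outside, the arena is invisible: many threads can share one Exchanger and are still matched strictly in pairs. The sketch below starts an even number of threads that each perform one exchange and checks that the pairing is symmetric. Whether the arena is actually activated depends on timing and CPU count, so this only demonstrates the observable behaviour. The class and method names are invented.

```java
import java.util.Arrays;
import java.util.concurrent.Exchanger;

public class ManyThreadsPairing {
    // Starts n threads (n must be even); each exchanges its own id exactly once.
    // Returns partner[i] = the id that thread i received.
    static int[] pairUp(int n) throws InterruptedException {
        Exchanger<Integer> exchanger = new Exchanger<>();
        int[] partner = new int[n];
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    partner[id] = exchanger.exchange(id); // each thread writes only its own cell
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads)
            t.join(); // after join(), all writes to 'partner' are visible
        return partner;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] partner = pairUp(8);
        System.out.println(Arrays.toString(partner)); // the pairing varies from run to run
    }
}
```

If thread i received the id of thread k, then thread k received the id of thread i; which threads end up paired is decided by arrival order and is not deterministic.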

Also, when locating the usable slots in the arena array, cache lines must be taken into account. Data moves between cache and memory in units of cache lines, and by the principle of locality, adjacent addresses are loaded into the same cache line. An array occupies a contiguous range of (virtual) addresses, so several adjacent slots would land on the same cache line. When one slot is modified, the whole cache line it sits on, including the other slots, is invalidated in the caches of other cores and must be reloaded, which hurts performance (false sharing). This is why usable slots are spaced (1 << ASHIFT) bytes apart instead of being adjacent array elements, and why Node is annotated with @sun.misc.Contended.
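The spacing idea can be illustrated with an ordinary array in which only every 16th element is used as a slot. This is a sketch of the technique only: the JDK source computes a byte offset with (i << ASHIFT) + ABASE through Unsafe, whereas this version strides over array indices, and the stride of 16 elements is an assumed value that merely approximates a cache line for typical reference sizes.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class StridedSlots {
    // Assumption: 1 << 4 = 16 array elements between two used slots.
    static final int STRIDE_SHIFT = 4;

    final AtomicReferenceArray<Object> cells;

    StridedSlots(int logicalSlots) {
        // One extra stride in front so that slot 0 is also padded on both sides.
        cells = new AtomicReferenceArray<>((logicalSlots + 1) << STRIDE_SHIFT);
    }

    // Maps a logical slot index to the array index that is actually used.
    static int physicalIndex(int logicalIndex) {
        return (logicalIndex + 1) << STRIDE_SHIFT;
    }

    // Tries to occupy logical slot i; fails if the slot is already taken.
    boolean tryOccupy(int i, Object node) {
        return cells.compareAndSet(physicalIndex(i), null, node);
    }

    // Tries to take 'expected' out of logical slot i, leaving it empty.
    boolean tryRelease(int i, Object expected) {
        return cells.compareAndSet(physicalIndex(i), expected, null);
    }

    public static void main(String[] args) {
        System.out.println(physicalIndex(0)); // prints 16
        System.out.println(physicalIndex(1)); // prints 32
        StridedSlots slots = new StridedSlots(4);
        System.out.println(slots.tryOccupy(0, "node-A")); // prints true
        System.out.println(slots.tryOccupy(0, "node-B")); // prints false, slot already taken
    }
}
```

The unused elements between two slots are pure padding: memory is traded for fewer cache-line invalidations when different threads CAS on different slots.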

Note that the implementation details of the synchronization utilities differ considerably between JDK versions, so the most important thing is to understand the design idea. Exchanger's design is similar to LongAdder's: it improves performance by being lock-free and by spreading contention across multiple slots. Still, I find the JDK 1.8 implementation of Exchanger rather complex, especially the multi-slot exchange, which also has to deal with cache lines.

Tags: Java JDK Programming

Posted on Tue, 12 May 2020 07:53:10 -0400 by cullouch