|Preface
This article does not jump straight to why LongAdder outperforms AtomicLong. It starts with volatile, for two reasons: it lets me organize what I have learned recently, and AtomicLong exists to cover the scenarios that volatile cannot handle, so volatile is a natural lead-in. After that comes AtomicLong, and finally LongAdder and a performance comparison between the two. If you only want the explanation, skip to the final part: Reasons for performance differences.
| volatile
The volatile keyword can be thought of as a lightweight synchronized. It does not cause thread context switches or scheduling, so it costs less than synchronized. However, volatile guarantees visibility, not atomicity: when one thread modifies a volatile variable, other threads see the new value on their next read, but a compound operation on that variable can still be interleaved with other threads. volatile is therefore unsuitable for operations such as i++, where the result depends on the variable's current value. Take VolatileTest.java as an example:
public class VolatileTest {

    private static final int THREAD_COUNT = 20;

    private static volatile int race = 0;

    public static void increase() {
        race++; // read-modify-write: not atomic even though race is volatile
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[THREAD_COUNT];
        for (int i = 0; i < THREAD_COUNT; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    increase();
                }
            });
            threads[i].start();
        }
        // Wait until all accumulation threads end; join() does not depend on
        // how many other threads the runtime or IDE has started
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("race: " + race);
    }
}
The program is simple: 20 threads each increment race 1000 times, so the expected result is 20 * 1000 = 20000. In practice, however, the printed result is almost always less than 20000.
The reason lies in the increase method. Although its body is a single line of source code, decompiling it (for example with javap -c) shows that race++ is made up of four bytecode instructions: getstatic reads race onto the operand stack, iconst_1 and iadd compute the new value, and putstatic writes it back. volatile guarantees that getstatic reads the latest value, but another thread may update race before putstatic runs, so the value written back is stale and an increment is lost.
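The lost update described above can be replayed by hand in a single thread. The following is only an illustrative sketch: the class LostUpdateSketch and its method are hypothetical names, and the two "threads" are simulated by ordinary statements executed in one unlucky order.

```java
// Hypothetical illustration class; the names are not from the article.
class LostUpdateSketch {

    static volatile int race = 0;

    // Replays one unlucky interleaving of two race++ operations by hand.
    static int replayInterleaving() {
        race = 0;
        int readByA = race;  // thread A reads race (getstatic)
        int readByB = race;  // thread B reads race before A writes back
        race = readByA + 1;  // thread A adds 1 and writes back (iadd, putstatic)
        race = readByB + 1;  // thread B adds 1 to its stale copy and overwrites A's write
        return race;
    }

    public static void main(String[] args) {
        // Two increments ran, but only one is visible in the result.
        System.out.println("after two increments: " + replayInterleaving());
    }
}
```

Both reads see the latest value at the time they happen, which is all that volatile promises. The increment is lost because the read and the write are separate steps.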
| AtomicLong
Locking the increase method would guarantee a correct result, but synchronized and ReentrantLock are mutually exclusive locks: only one thread may execute at a time while the others wait, so execution efficiency is poor. Fortunately, the JDK provides atomic classes for this scenario. Change the volatile int variable race to an AtomicLong. The code is as follows: AtomicLongTest.java.
import java.util.concurrent.atomic.AtomicLong;

public class AtomicLongTest {

    private static final int THREAD_COUNT = 20;

    // The reference never changes, so final is enough; AtomicLong handles visibility itself
    private static final AtomicLong race = new AtomicLong(0);

    public static void increase() {
        race.getAndIncrement();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[THREAD_COUNT];
        for (int i = 0; i < THREAD_COUNT; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    increase();
                }
            });
            threads[i].start();
        }
        // Wait until all accumulation threads end
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("race: " + race);
    }
}

The result is 20000, as expected.
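For comparison, the lock-based alternative mentioned above could look roughly like the sketch below. SynchronizedCounterSketch and its run helper are hypothetical names chosen for illustration. The result is correct as well, but every increment has to acquire the same monitor.

```java
// Hypothetical lock-based baseline; not part of the original test code.
class SynchronizedCounterSketch {

    private long race = 0;

    // Only one thread at a time may execute either synchronized method.
    synchronized void increase() {
        race++;
    }

    synchronized long get() {
        return race;
    }

    // Starts threadCount threads, each incrementing perThread times, and returns the total.
    static long run(int threadCount, int perThread) {
        SynchronizedCounterSketch counter = new SynchronizedCounterSketch();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increase();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("race: " + run(20, 1000));
    }
}
```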
Although AtomicLong guarantees a correct result, it performs poorly under high concurrency. To address this performance problem, LongAdder was introduced in JDK 1.8.
| LongAdder
LongAdder is used in much the same way as AtomicLong. Replace AtomicLong with LongAdder in the code above. The test code is as follows:
import java.util.concurrent.atomic.LongAdder;

public class LongAdderTest {

    private static final int THREAD_COUNT = 20;

    // A new LongAdder starts at 0
    private static final LongAdder race = new LongAdder();

    public static void increase() {
        race.increment();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[THREAD_COUNT];
        for (int i = 0; i < THREAD_COUNT; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    increase();
                }
            });
            threads[i].start();
        }
        // Wait until all accumulation threads end
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("race: " + race);
    }
}

The result is also 20000, as expected.
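Beyond increment(), the LongAdder methods most often needed are add(long), sum() and sumThenReset(). The sketch below exercises them. The class and method names are hypothetical, and the thread and iteration counts simply mirror the test above.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical usage sketch for the LongAdder API.
class LongAdderUsageSketch {

    // Counts threadCount * perThread increments and reads the total with sum().
    static long countWithThreads(int threadCount, int perThread) {
        LongAdder race = new LongAdder();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    race.increment();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return race.sum();
    }

    // Returns {value before reset, value after reset}.
    static long[] readAndReset() {
        LongAdder adder = new LongAdder();
        adder.add(5);       // add an arbitrary delta
        adder.increment();  // same as add(1)
        long before = adder.sumThenReset();
        return new long[] { before, adder.sum() };
    }

    public static void main(String[] args) {
        System.out.println("race: " + countWithThreads(20, 1000));
        long[] values = readAndReset();
        System.out.println("before reset: " + values[0] + ", after reset: " + values[1]);
    }
}
```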
|Performance comparison between AtomicLong and LongAdder
Now that we have covered the volatile keyword, AtomicLong and LongAdder, let's compare the performance of AtomicLong and LongAdder. The two offer similar functionality, so we let the data decide which one to choose. The benchmark uses JMH, and the test code is as follows:
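import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class PerformaceTest {

    // Shared by all benchmark threads, so every thread contends on the same counter
    private static AtomicLong atomicLong = new AtomicLong();
    private static LongAdder longAdder = new LongAdder();

    @Benchmark
    @Threads(10)
    public void atomicLongAdd() {
        atomicLong.getAndIncrement();
    }

    @Benchmark
    @Threads(10)
    public void longAdderAdd() {
        longAdder.increment();
    }

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(PerformaceTest.class.getSimpleName())
                .build();
        new Runner(options).run();
    }
}

Notes: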
- @BenchmarkMode(Mode.Throughput) => measure throughput
- @OutputTimeUnit(TimeUnit.MILLISECONDS) => time unit used in the output
- @Threads(10) => number of test threads in each forked process
Test results:

Number of threads is 1:

Benchmark                      Mode  Cnt       Score      Error   Units
PerformaceTest.atomicLongAdd  thrpt  200  153824.699 ±  137.947  ops/ms
PerformaceTest.longAdderAdd   thrpt  200  124087.220 ±   81.015  ops/ms

Number of threads is 5:

Benchmark                      Mode  Cnt       Score      Error   Units
PerformaceTest.atomicLongAdd  thrpt  200   56392.136 ± 1165.361  ops/ms
PerformaceTest.longAdderAdd   thrpt  200  605501.870 ± 4140.190  ops/ms

Number of threads is 10:

Benchmark                      Mode  Cnt       Score      Error   Units
PerformaceTest.atomicLongAdd  thrpt  200   53286.334 ±  957.765  ops/ms
PerformaceTest.longAdderAdd   thrpt  200  713884.602 ± 3950.884  ops/ms

With a single thread, AtomicLong is still ahead (about 154,000 versus 124,000 ops/ms). With 5 threads, LongAdder already outperforms AtomicLong by roughly 10 times, and with 10 threads the gap widens to roughly 13 times.
|Reasons for performance differences
To understand the performance difference, we have to dig into the source code. First, take a look at AtomicLong's getAndIncrement method.
AtomicLong#getAndIncrement method analysis
// AtomicLong#getAndIncrement
public final long getAndIncrement() {
    return unsafe.getAndAddLong(this, valueOffset, 1L);
}

// Unsafe#getAndAddLong
public final long getAndAddLong(Object var1, long var2, long var4) {
    long var6;
    do {
        var6 = this.getLongVolatile(var1, var2);
    } while (!this.compareAndSwapLong(var1, var2, var6, var6 + var4));
    return var6;
}
Under the hood this is the CAS (compare-and-swap) algorithm. In the JVM, the CAS operation is implemented with the CMPXCHG instruction provided by the processor (on x86). The basic idea of spin CAS is to retry the CAS operation in a loop until it succeeds, and this is also what causes the performance problem under high concurrency. When N threads spin on the same variable at the same time, only one CAS can succeed in each round, so all the other threads fail and must retry. If a spinning CAS keeps failing for a long time, it imposes a very large execution overhead on the processor. This is why, in the test above, LongAdder outperforms AtomicLong once the number of test threads grows.
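The same retry loop can be written against the public AtomicLong API, which makes the failure case easy to see. This is a sketch for illustration only: CasRetrySketch and its methods are hypothetical names, and the "other thread" in staleCasFails is simulated by a plain method call.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the spin-CAS pattern using the public AtomicLong API.
class CasRetrySketch {

    // Hand-written equivalent of the loop in Unsafe#getAndAddLong with a delta of 1.
    static long getAndIncrement(AtomicLong value) {
        long current;
        do {
            current = value.get();  // read the current value
            // retry if another thread changed the value between the read and the CAS
        } while (!value.compareAndSet(current, current + 1));
        return current;
    }

    // Shows why a CAS fails: the expected value is stale by the time the CAS runs.
    static boolean staleCasFails() {
        AtomicLong value = new AtomicLong(0);
        long stale = value.get();   // this thread reads 0
        value.incrementAndGet();    // stands in for another thread moving the value to 1
        return !value.compareAndSet(stale, stale + 1);  // expects 0, finds 1, so the CAS fails
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(41);
        System.out.println("previous: " + getAndIncrement(counter) + ", now: " + counter.get());
        System.out.println("stale CAS failed: " + staleCasFails());
    }
}
```

Under contention, every failed compareAndSet costs one more trip through the loop, which is the overhead described above.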
Analysis of LongAdder#increment method
public void increment() {
    add(1L);
}

public void add(long x) {
    Cell[] as; long b, v; int m; Cell a;
    if ((as = cells) != null || !casBase(b = base, b + x)) {
        boolean uncontended = true;
        if (as == null || (m = as.length - 1) < 0 ||
            (a = as[getProbe() & m]) == null ||
            !(uncontended = a.cas(v = a.value, v + x)))
            longAccumulate(x, null, uncontended);
    }
}

final void longAccumulate(long x, LongBinaryOperator fn,
                          boolean wasUncontended) {
    int h;
    if ((h = getProbe()) == 0) {
        ThreadLocalRandom.current(); // force initialization
        h = getProbe();
        wasUncontended = true;
    }
    boolean collide = false;                // True if last slot nonempty
    for (;;) {
        Cell[] as; Cell a; int n; long v;
        if ((as = cells) != null && (n = as.length) > 0) {
            if ((a = as[(n - 1) & h]) == null) {
                if (cellsBusy == 0) {       // Try to attach new Cell
                    Cell r = new Cell(x);   // Optimistically create
                    if (cellsBusy == 0 && casCellsBusy()) {
                        boolean created = false;
                        try {               // Recheck under lock
                            Cell[] rs; int m, j;
                            if ((rs = cells) != null &&
                                (m = rs.length) > 0 &&
                                rs[j = (m - 1) & h] == null) {
                                rs[j] = r;
                                created = true;
                            }
                        } finally {
                            cellsBusy = 0;
                        }
                        if (created)
                            break;
                        continue;           // Slot is now non-empty
                    }
                }
                collide = false;
            }
            else if (!wasUncontended)       // CAS already known to fail
                wasUncontended = true;      // Continue after rehash
            else if (a.cas(v = a.value, ((fn == null) ? v + x :
                                         fn.applyAsLong(v, x))))
                break;
            else if (n >= NCPU || cells != as)
                collide = false;            // At max size or stale
            else if (!collide)
                collide = true;
            else if (cellsBusy == 0 && casCellsBusy()) {
                try {
                    if (cells == as) {      // Expand table unless stale
                        Cell[] rs = new Cell[n << 1];
                        for (int i = 0; i < n; ++i)
                            rs[i] = as[i];
                        cells = rs;
                    }
                } finally {
                    cellsBusy = 0;
                }
                collide = false;
                continue;                   // Retry with expanded table
            }
            h = advanceProbe(h);
        }
        else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
            boolean init = false;
            try {                           // Initialize table
                if (cells == as) {
                    Cell[] rs = new Cell[2];
                    rs[h & 1] = new Cell(x);
                    cells = rs;
                    init = true;
                }
            } finally {
                cellsBusy = 0;
            }
            if (init)
                break;
        }
        else if (casBase(v = base, ((fn == null) ? v + x :
                                    fn.applyAsLong(v, x))))
            break;                          // Fall back on using base
    }
}

The code is very long and is easier to understand alongside a diagram:

LongAdder is fast because it uses the Cell array to avoid contention on a single shared variable, trading space for time. Internally, LongAdder keeps a long value in the base field. When there is no thread conflict, CAS is used to update base. When there is a conflict, a thread whose CAS on base fails turns to the Cell array: it selects a slot from its thread probe value and adds its delta to that Cell, creating the Cell with the delta as its initial value if the slot is empty (for increment, a new Cell starts at 1). Once the Cell array exists, updates go to the cells first. When the count is read, the values of all cell[i] are summed and base is added to give the final result. The sum code is as follows:
public long sum() {
    Cell[] as = cells; Cell a;
    long sum = base;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}
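The base-plus-cells idea can be sketched in a few lines. This is a deliberately simplified illustration and not the JDK implementation: StripedCounterSketch is a hypothetical class, the number of cells is fixed instead of growing on collisions, the slot is chosen from the thread's hash code rather than a rehashable probe, and the cells are not padded against false sharing.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical, simplified striped counter; NOT the JDK's LongAdder implementation.
class StripedCounterSketch {

    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells;  // fixed size here; LongAdder grows its table on demand

    StripedCounterSketch(int cellCount) {
        cells = new AtomicLongArray(cellCount);
    }

    void add(long x) {
        long b = base.get();
        // Fast path: a single CAS on base succeeds when there is no contention.
        if (!base.compareAndSet(b, b + x)) {
            // Contended: spread the update over the cells instead of spinning on base.
            int index = Math.floorMod(Thread.currentThread().hashCode(), cells.length());
            cells.addAndGet(index, x);
        }
    }

    // Same shape as LongAdder#sum: base plus every cell.
    long sum() {
        long sum = base.get();
        for (int i = 0; i < cells.length(); i++) {
            sum += cells.get(i);
        }
        return sum;
    }

    // Runs threadCount threads, each adding 1 perThread times, and returns the sum.
    static long run(int threadCount, int perThread) {
        StripedCounterSketch counter = new StripedCounterSketch(8);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add(1);
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println("sum: " + run(20, 1000));
    }
}
```

Every update lands in exactly one place, either base or one cell, so the sum is exact once all writers have finished.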

|AtomicLong and LongAdder selection
Choose LongAdder for high-concurrency scenarios and AtomicLong otherwise.
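One more factor may be worth weighing besides concurrency: LongAdder's increment() returns nothing, and sum() is not an atomic snapshot while updates are in flight, so a caller that needs the updated value from each operation still needs AtomicLong. The sketch below illustrates that split. CounterChoiceSketch and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: one counter per use case.
class CounterChoiceSketch {

    // Each caller needs its own distinct value, so the read and the update must be one atomic step.
    private final AtomicLong nextId = new AtomicLong();

    // Write-heavy statistic that is only read occasionally.
    private final LongAdder hits = new LongAdder();

    long newId() {
        return nextId.incrementAndGet();
    }

    void recordHit() {
        hits.increment();
    }

    long hitCount() {
        return hits.sum();
    }

    public static void main(String[] args) {
        CounterChoiceSketch counters = new CounterChoiceSketch();
        System.out.println("first id: " + counters.newId());
        counters.recordHit();
        counters.recordHit();
        System.out.println("hits: " + counters.hitCount());
    }
}
```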