Java: Copy-on-Write (COW)

Introduction

Copy-on-write (COW) is an optimization strategy used in computer programming.

Core Ideas

Multiple callers read the same resource through a shared pointer. Only when a caller writes is the resource copied: the write is applied to the copy, and the copy then replaces the old resource.
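The idea can be illustrated with a minimal sketch. The class CowHolder and its methods are hypothetical names used only for illustration, and serializing writers with synchronized is an assumption of this sketch rather than part of the COW idea itself.

```java
import java.util.Arrays;

// Hypothetical illustration of copy-on-write; not taken from any library.
class CowHolder {
    // Readers share this array; it is never modified in place.
    private volatile String[] items = new String[0];

    // Reading returns the shared array without copying or locking.
    String[] read() {
        return items;
    }

    // Writing copies the array, changes the copy, then swaps the reference.
    synchronized void append(String value) {
        String[] copy = Arrays.copyOf(items, items.length + 1);
        copy[copy.length - 1] = value;
        items = copy;
    }

    public static void main(String[] args) {
        CowHolder holder = new CowHolder();
        holder.append("a");
        String[] before = holder.read();   // the reader keeps the old array
        holder.append("b");                // the writer publishes a new array
        System.out.println(Arrays.toString(before));        // [a]
        System.out.println(Arrays.toString(holder.read())); // [a, b]
    }
}
```

A reader that obtained the array before the second write still sees the old contents, which is the behavior the rest of this article relies on.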

Applications

  1. Linux uses COW to reduce the overhead of fork;
  2. File systems use COW to help guarantee data integrity;
  3. Databases use a copy-on-write strategy to provide users with snapshots;
  4. The JDK's CopyOnWriteArrayList and CopyOnWriteArraySet also use COW.

Vector and Collections.synchronizedXxx

ArrayList is not thread-safe; Vector and the wrappers returned by Collections.synchronizedXxx are.

Vector ensures thread safety by declaring each of its methods with the synchronized keyword.

The Collections.synchronizedXxx wrappers ensure thread safety by wrapping the operation inside each method in a synchronized block.
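The two locking styles can be sketched as follows. MethodSyncList and BlockSyncList are hypothetical classes written for this illustration; they mimic the pattern but are not the JDK source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes that mimic the two locking styles; not the JDK source.
class LockingStyles {

    // Vector style: the whole method is declared synchronized (lock = this).
    static class MethodSyncList<E> {
        private final List<E> items = new ArrayList<>();

        public synchronized boolean add(E e) {
            return items.add(e);
        }

        public synchronized int size() {
            return items.size();
        }
    }

    // Collections.synchronizedXxx style: a wrapper delegates to the backing
    // list inside a synchronized block on a mutex object.
    static class BlockSyncList<E> {
        private final List<E> delegate;
        private final Object mutex = this;

        BlockSyncList(List<E> delegate) {
            this.delegate = delegate;
        }

        public boolean add(E e) {
            synchronized (mutex) {
                return delegate.add(e);
            }
        }

        public int size() {
            synchronized (mutex) {
                return delegate.size();
            }
        }
    }

    public static void main(String[] args) {
        MethodSyncList<Integer> a = new MethodSyncList<>();
        BlockSyncList<Integer> b = new BlockSyncList<>(new ArrayList<>());
        a.add(1);
        b.add(1);
        System.out.println(a.size() + " " + b.size()); // 1 1
    }
}
```

In both styles every call, reads included, has to acquire the same lock.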

However, a thread-safe container does not mean it can be used concurrently without care; how it is used still matters. For example:

import java.util.Vector;

public class CopyOnWriteTest {

    public static void main(String[] args) {
        Vector<Integer> vector = new Vector<>();
        vector.add(1);
        vector.add(2);
        vector.add(3);
        vector.add(4);
        vector.add(5);
        for (Integer item : vector) {
            new Thread(vector::clear).start();
            System.out.println(item);
        }
    }
}

Execution results:

1
Exception in thread "main" java.util.ConcurrentModificationException
    at java.util.Vector$Itr.checkForComodification(Vector.java:1210)
    at java.util.Vector$Itr.next(Vector.java:1163)
    at com.wkw.study.copyonwrite.CopyOnWriteTest.main(CopyOnWriteTest.java:20)

The root cause is that Vector extends AbstractList, which maintains a modification counter, modCount, that is incremented on every structural modification of the Vector. Vector's iterator checks this counter:

/**
 * Returns an iterator over the elements in this list in proper sequence.
 *
 * <p>The returned iterator is <a href="#fail-fast"><i>fail-fast</i></a>.
 *
 * @return an iterator over the elements in this list in proper sequence
 */
public synchronized Iterator<E> iterator() {
    return new Itr();
}

/**
 * An optimized version of AbstractList.Itr
 */
private class Itr implements Iterator<E> {
    int cursor;       // index of next element to return
    int lastRet = -1; // index of last element returned; -1 if no such
    int expectedModCount = modCount;
    
    public E next() {
        synchronized (Vector.this) {
            checkForComodification();
            int i = cursor;
            if (i >= elementCount)
                throw new NoSuchElementException();
            cursor = i + 1;
            return elementData(lastRet = i);
        }
    }
    
    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
    
    ......
}

As the code shows, the iterator records Vector's modification count when it is created and checks it each time it fetches the next element. If the count no longer matches (that is, the Vector has been modified), it throws an exception in order to fail fast.

Lists returned by Collections.synchronizedXxx have the same problem when traversed with an iterator.

The rule that a collection's remove/add/clear methods must not be called inside a foreach loop over that collection applies not only to the non-thread-safe ArrayList and LinkedList, but also to the thread-safe Vector and Collections.synchronizedXxx.

To solve this problem, you have to lock the entire Vector for the duration of the iteration. Alternatively, the concurrent container CopyOnWriteArrayList avoids the problem altogether.
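The rule can be checked without any extra threads. The following sketch (the class and method names are made up for this example) modifies each kind of list inside a foreach loop over it and reports whether the iterator fails fast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Vector;

class ForEachModification {

    // Returns true if adding to the list inside a foreach loop over it
    // triggers a ConcurrentModificationException.
    static boolean failsFast(List<Integer> list) {
        try {
            for (Integer item : list) {
                list.add(item + 100); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> seed = Arrays.asList(1, 2, 3);
        System.out.println(failsFast(new ArrayList<>(seed)));  // true
        System.out.println(failsFast(new Vector<>(seed)));     // true
        System.out.println(failsFast(
                Collections.synchronizedList(new ArrayList<>(seed)))); // true
    }
}
```

Thread safety of the individual methods makes no difference here, because the check lives in the iterator and not in the lock.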
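Both options can be sketched by reworking the earlier example. The class and method names below are illustrative assumptions. The first method holds the Vector's own monitor during the loop; the second iterates a CopyOnWriteArrayList with no lock at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

class SafeIteration {

    // Option 1: hold the Vector's monitor for the whole loop. Vector.clear()
    // synchronizes on the same monitor, so the clearing threads block until
    // the loop is done.
    static List<Integer> iterateLocked(Vector<Integer> vector) {
        List<Integer> seen = new ArrayList<>();
        List<Thread> writers = new ArrayList<>();
        synchronized (vector) {
            for (Integer item : vector) {
                writers.add(startThread(vector::clear));
                seen.add(item);
            }
        }
        joinAll(writers);
        return seen;
    }

    // Option 2: no lock; the iterator reads the array that was current
    // when the loop started.
    static List<Integer> iterateCopyOnWrite(CopyOnWriteArrayList<Integer> list) {
        List<Integer> seen = new ArrayList<>();
        List<Thread> writers = new ArrayList<>();
        for (Integer item : list) {
            writers.add(startThread(list::clear));
            seen.add(item);
        }
        joinAll(writers);
        return seen;
    }

    private static Thread startThread(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        return t;
    }

    private static void joinAll(List<Thread> threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> seed = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(iterateLocked(new Vector<>(seed)));                    // [1, 2, 3, 4, 5]
        System.out.println(iterateCopyOnWrite(new CopyOnWriteArrayList<>(seed))); // [1, 2, 3, 4, 5]
    }
}
```

In both cases the loop sees all five elements and the container is empty once the clearing threads have finished. The difference is that option 1 blocks every other user of the Vector while the loop runs.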

JUC Concurrent Containers vs. the Older java.util Thread-Safe Containers

CopyOnWriteArrayList is a replacement for a synchronized List, and CopyOnWriteArraySet is a replacement for a synchronized Set.
Likewise, Hashtable is replaced by ConcurrentHashMap and Vector by CopyOnWriteArrayList. Compared with the older thread-safe classes, the main difference in the JUC containers comes down to lock granularity:

  1. Hashtable, Vector, and Collections.synchronizedXxx use coarse-grained locks: the synchronized keyword covers each method as a whole.

  2. ConcurrentHashMap, CopyOnWriteArrayXxx, and similar classes use finer-grained locking.

    Thread safety is implemented in a variety of ways; ConcurrentHashMap, for example, relies on CAS and volatile.

  3. Thread-safe containers under JUC do not throw ConcurrentModificationException when traversed.

So in general, it is recommended to use the thread-safe containers provided by the JUC package (ConcurrentHashMap, CopyOnWriteArrayList, CopyOnWriteArraySet, and so on) instead of the older thread-safe containers (Hashtable, Vector, Collections.synchronizedXxx, and so on). Note that the JDK has no ConcurrentHashSet class; a concurrent Set can be obtained from ConcurrentHashMap.newKeySet().
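As a small illustration of traversal without ConcurrentModificationException, the sketch below removes entries from a ConcurrentHashMap while iterating over it. The class and method names are assumptions made for this example.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class JucTraversal {

    // Removes every entry while iterating over the map's key set.
    // ConcurrentHashMap iterators are weakly consistent, so this does not
    // throw ConcurrentModificationException.
    static int removeWhileIterating(Map<String, Integer> map) {
        int visited = 0;
        for (String key : map.keySet()) {
            map.remove(key);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(removeWhileIterating(map)); // 3
        System.out.println(map.isEmpty());             // true

        // A concurrent Set backed by a ConcurrentHashMap.
        Set<String> set = ConcurrentHashMap.newKeySet();
        set.add("x");
        System.out.println(set.contains("x"));         // true
    }
}
```

The same loop over a HashMap or Hashtable key set would fail fast on the second element.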

Principle of CopyOnWriteArrayList

COW is one way of handling concurrency. The basic principle is to separate reads from writes:

When writing, copy the collection and add or delete elements in the copy; after all modifications are done, point the original collection's reference at the new copy.

The advantage is that a COW collection can be read and traversed under high concurrency without locking, because the array being read is never modified in place.

Basic Definitions

public class CopyOnWriteArrayList<E>
    implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
    private static final long serialVersionUID = 8673264195747942595L;

    /** The lock protecting all mutators */
    //Write Lock
    final transient ReentrantLock lock = new ReentrantLock();

    /** The array, accessed only via getArray/setArray. */
    private transient volatile Object[] array;
    
    ......
}

Write operation

/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return {@code true} (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}

/**
 * Gets the array.  Non-private so as to also be accessible
 * from CopyOnWriteArraySet class.
 */
final Object[] getArray() {
    return array;
}

/**
 * Sets the array.
 */
final void setArray(Object[] a) {
    array = a;
}    

As you can see, the principle is simple:

  1. Acquire the lock, so that concurrent writes cannot lose data.
  2. Copy the array and apply the addition to the new array.
  3. Point array at the new array.
  4. Finally, release the lock.
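The effect of step 1 can be observed with a small check: several threads append concurrently and no element is lost. The class name, method name, and the thread and element counts below are arbitrary choices for this sketch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ConcurrentAdds {

    // Starts `threads` threads that each append `perThread` elements,
    // waits for them to finish, and returns the final list size.
    static int addConcurrently(List<Integer> list, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        // Each add copies the array while holding the lock, so no update is lost.
        System.out.println(addConcurrently(new CopyOnWriteArrayList<>(), 4, 250)); // 1000
    }
}
```

Passing a plain ArrayList instead would not be safe: adds could be lost or the call could fail, because nothing serializes the writers.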

Read operation

/**
 * {@inheritDoc}
 *
 * @throws IndexOutOfBoundsException {@inheritDoc}
 */
public E get(int index) {
    return get(getArray(), index);
}

private E get(Object[] a, int index) {
    return (E) a[index];
}

Read operations read the current array directly, without locking.

Iterator

/**
 * Returns an iterator over the elements in this list in proper sequence.
 *
 * <p>The returned iterator provides a snapshot of the state of the list
 * when the iterator was constructed. No synchronization is needed while
 * traversing the iterator. The iterator does <em>NOT</em> support the
 * {@code remove} method.
 *
 * @return an iterator over the elements in this list in proper sequence
 */
public Iterator<E> iterator() {
    return new COWIterator<E>(getArray(), 0);
}

static final class COWIterator<E> implements ListIterator<E> {
    /** Snapshot of the array */
    private final Object[] snapshot;
    /** Index of element to be returned by subsequent call to next.  */
    private int cursor;

    private COWIterator(Object[] elements, int initialCursor) {
        cursor = initialCursor;
        snapshot = elements;
    }

    public boolean hasNext() {
        return cursor < snapshot.length;
    }

    public boolean hasPrevious() {
        return cursor > 0;
    }

    @SuppressWarnings("unchecked")
    public E next() {
        if (! hasNext())
            throw new NoSuchElementException();
        return (E) snapshot[cursor++];
    }
    
    ......
}

You can see that iteration also works on the original array. If the collection is modified, the array field inside the collection points to a new array object, while the snapshot inside COWIterator still points to the old array passed in at construction. No exception is thrown, because the old array never changes and reading it is always safe.

Because reads are unaffected by writes, CopyOnWriteArrayList does not need to maintain a modCount modification counter.

Performance: CopyOnWriteArrayList vs. Collections.synchronizedList
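The snapshot behavior, and the lack of remove support mentioned in the Javadoc above, can be seen in a short single-threaded sketch. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIterator {

    // Counts the elements seen by an iterator that was created before
    // the list was modified.
    static int countSnapshot() {
        CopyOnWriteArrayList<String> list =
                new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        Iterator<String> it = list.iterator(); // captures the current array
        list.add("c");                         // publishes a new array
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    // Returns true if calling remove() on the snapshot iterator is rejected.
    static boolean removeIsUnsupported() {
        CopyOnWriteArrayList<String> list =
                new CopyOnWriteArrayList<>(Arrays.asList("a"));
        Iterator<String> it = list.iterator();
        it.next();
        try {
            it.remove();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(countSnapshot());       // 2
        System.out.println(removeIsUnsupported()); // true
    }
}
```

The iterator sees two elements even though the list holds three by the time the loop runs.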

Analysis

  1. In terms of memory, CopyOnWriteArrayList has to copy the array on every write, so Collections.synchronizedList uses memory more efficiently;
  2. For reads, Collections.synchronizedList has to acquire a lock, while CopyOnWriteArrayList reads the current array directly, so CopyOnWriteArrayList reads more efficiently;
  3. For writes, CopyOnWriteArrayList copies the array on each write and locks with ReentrantLock, while Collections.synchronizedList uses the synchronized keyword, whose monitor lock can take more time under high contention. However, Collections.synchronizedList does not need to copy the array, so on balance its writes are likely to perform better.

Verification

public static void main(String[] args) {
    List<Integer> copyOnWriteArrayList = new CopyOnWriteArrayList<>();
    List<Integer> synchronizedList = Collections.synchronizedList(new ArrayList<>());
    StopWatch stopWatch = new StopWatch();
    int loopCount = 1000;
    stopWatch.start("CopyOnWriteList write");
    /*
     * ThreadLocalRandom (added in JDK 7) generates random numbers per thread,
     * which avoids contention between threads.
     * It is not instantiated with new; obtain it through the static method current().
     * Unlike Math.random(), threads do not compete for one shared generator instance.
     */
    IntStream.rangeClosed(1, loopCount).parallel().forEach(
            item -> copyOnWriteArrayList.add(ThreadLocalRandom.current().nextInt(loopCount)));
    stopWatch.stop();

    stopWatch.start("Collections.synchronizedList write");
    /*
     * A parallel stream's thread count is bounded by the number of CPU cores:
     * on an eight-core machine roughly eight threads run it, because it uses
     * the common ForkJoinPool rather than a custom thread pool.
     */
    IntStream.rangeClosed(1, loopCount).parallel().forEach(
            item -> synchronizedList.add(ThreadLocalRandom.current().nextInt(loopCount)));
    stopWatch.stop();

    System.out.println(stopWatch.prettyPrint());
}

Result:

StopWatch '': running time (millis) = 55
-----------------------------------------
ms     %     Task name
-----------------------------------------
00054  098%  CopyOnWriteList write
00001  002%  Collections.synchronizedList write

CopyOnWriteArrayList takes more time to write than Collections.synchronizedList.

public static void main(String[] args) {
    List<Integer> copyOnWriteArrayList = new CopyOnWriteArrayList<>();
    List<Integer> synchronizedList = Collections.synchronizedList(new ArrayList<>());
    copyOnWriteArrayList.addAll(IntStream.rangeClosed(1, 1000000).boxed().collect(Collectors.toList()));
    synchronizedList.addAll(IntStream.rangeClosed(1, 1000000).boxed().collect(Collectors.toList()));

    int copyOnWriteArrayListSize = copyOnWriteArrayList.size();
    StopWatch stopWatch = new StopWatch();
    int loopCount = 1000000;
    stopWatch.start("CopyOnWriteArrayList read");
    /*
     * ThreadLocalRandom (added in JDK 7) generates random numbers per thread,
     * which avoids contention between threads.
     * It is not instantiated with new; obtain it through the static method current().
     * Unlike Math.random(), threads do not compete for one shared generator instance.
     */
    IntStream.rangeClosed(1, loopCount).parallel().forEach(
            item -> copyOnWriteArrayList.get(ThreadLocalRandom.current().nextInt(copyOnWriteArrayListSize)));
    stopWatch.stop();

    stopWatch.start("Collections.synchronizedList read");
    int synchronizedListSize = synchronizedList.size();
    /*
     * A parallel stream's thread count is bounded by the number of CPU cores:
     * on an eight-core machine roughly eight threads run it, because it uses
     * the common ForkJoinPool rather than a custom thread pool.
     */
    IntStream.rangeClosed(1, loopCount).parallel().forEach(
            item -> synchronizedList.get(ThreadLocalRandom.current().nextInt(synchronizedListSize)));
    stopWatch.stop();

    System.out.println(stopWatch.prettyPrint());
}

Result:

StopWatch '': running time (millis) = 158
-----------------------------------------
ms     %     Task name
-----------------------------------------
00030  019%  CopyOnWriteArrayList read
00128  081%  Collections.synchronizedList read

Collections.synchronizedList takes longer to read than CopyOnWriteArrayList.
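The StopWatch class used above is not part of the JDK (it appears to be Spring's utility class, which is an assumption here). A rough JDK-only version of the same measurement, using System.nanoTime, might look like the sketch below. The class and method names are illustrative, the timings vary by machine and JVM warm-up, and this is not a rigorous benchmark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

class ListTiming {

    // Runs `count` parallel add operations and returns the elapsed milliseconds.
    static long timeWrites(List<Integer> list, int count) {
        long start = System.nanoTime();
        IntStream.rangeClosed(1, count).parallel()
                .forEach(i -> list.add(ThreadLocalRandom.current().nextInt(count)));
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Runs `count` parallel random reads and returns the elapsed milliseconds.
    // The list must not be empty.
    static long timeReads(List<Integer> list, int count) {
        int size = list.size();
        long start = System.nanoTime();
        IntStream.rangeClosed(1, count).parallel()
                .forEach(i -> list.get(ThreadLocalRandom.current().nextInt(size)));
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<Integer> cow = new CopyOnWriteArrayList<>();
        List<Integer> sync = Collections.synchronizedList(new ArrayList<>());

        System.out.println("CopyOnWriteArrayList write ms: " + timeWrites(cow, 1000));
        System.out.println("synchronizedList write ms:     " + timeWrites(sync, 1000));
        System.out.println("CopyOnWriteArrayList read ms:  " + timeReads(cow, 1_000_000));
        System.out.println("synchronizedList read ms:      " + timeReads(sync, 1_000_000));
    }
}
```

For dependable numbers a benchmark harness such as JMH would be a better fit than hand-rolled timing.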

Advantages and disadvantages of CopyOnWriteArrayList

Advantages

  1. COW suits scenarios with many reads and few writes.

    Examples are configuration data, blacklists, and logistics addresses, which change rarely. Reads are lock-free, which allows higher concurrency.

  2. CopyOnWriteArrayList is thread-safe and performs better than Vector.

    Vector makes its methods synchronized, so every call, reads included, has to acquire the lock, which greatly reduces performance. CopyOnWriteArrayList locks only additions and deletions; reads take no lock, so read performance is better than Vector's.
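A read-mostly blacklist of the kind mentioned in point 1 could be sketched as follows. The Blacklist class and its method names are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical read-mostly blacklist; the class and method names are
// illustrative only.
class Blacklist {
    private final CopyOnWriteArrayList<String> blocked = new CopyOnWriteArrayList<>();

    // Frequent operation: reads the current array without locking.
    boolean isBlocked(String user) {
        return blocked.contains(user);
    }

    // Rare operation: copies the array only if the user is not yet listed.
    boolean block(String user) {
        return blocked.addIfAbsent(user);
    }

    // Rare operation: replaces the array with a copy that omits the user.
    boolean unblock(String user) {
        return blocked.remove(user);
    }

    public static void main(String[] args) {
        Blacklist blacklist = new Blacklist();
        blacklist.block("mallory");
        System.out.println(blacklist.isBlocked("mallory")); // true
        System.out.println(blacklist.isBlocked("alice"));   // false
        blacklist.unblock("mallory");
        System.out.println(blacklist.isBlocked("mallory")); // false
    }
}
```

Lookups scan the array linearly, so this shape fits small lists that are read often and changed rarely.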

Disadvantages

  1. Data consistency: a CopyOnWrite container guarantees only eventual consistency, not real-time consistency.

    For example, thread A iterates over a CopyOnWriteArrayList. If thread B modifies the list while A is iterating, A still iterates over the old data.

  2. Memory usage: every write copies the array. If a CopyOnWriteArrayList is written to frequently and holds many or large objects, the repeated copies consume memory and can cause GC problems; in that case consider other containers such as ConcurrentHashMap.
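When writes do arrive in groups, one possible mitigation is to batch them, since addAll replaces the backing array once instead of once per element. The sketch below contrasts the two approaches; the class and method names are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class BatchedWrites {

    // One add per element: every call copies the whole backing array.
    static void addOneByOne(CopyOnWriteArrayList<Integer> target, List<Integer> values) {
        for (Integer value : values) {
            target.add(value);
        }
    }

    // One addAll call: the backing array is replaced a single time.
    static void addInBatch(CopyOnWriteArrayList<Integer> target, List<Integer> values) {
        target.addAll(values);
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            values.add(i);
        }

        CopyOnWriteArrayList<Integer> slow = new CopyOnWriteArrayList<>();
        CopyOnWriteArrayList<Integer> fast = new CopyOnWriteArrayList<>();
        addOneByOne(slow, values); // 1000 array copies
        addInBatch(fast, values);  // 1 array copy
        System.out.println(slow.size() + " " + fast.size()); // 1000 1000
    }
}
```

Both lists end up with the same contents; only the number of intermediate arrays created along the way differs.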

Posted on Wed, 24 Nov 2021 12:53:46 -0500 by lalomarquez