Implementation principle of ArrayList (JDK 1.8)

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable

ArrayList extends AbstractList and implements the List interface. AbstractList already implements List, so declaring it again is redundant; it simply makes the class's contract explicit, a pattern found in many JDK classes.

Cloneable and Serializable are marker interfaces: they declare that the class supports cloning and serialization. RandomAccess is also a marker interface. It indicates that the class supports fast random access, so for such lists an indexed for loop is typically faster than traversal with an Iterator.
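As a rough illustration of how the RandomAccess marker can be used, the sketch below picks a loop style based on it. TraversalDemo and sum are hypothetical names invented for this example; they are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical helper class for illustration only.
class TraversalDemo {
    // Sums a list, choosing the loop style from the RandomAccess marker.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Array-backed lists: get(i) is cheap, so an indexed loop is fine.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // Sequential lists such as LinkedList: walk them with an iterator.
            for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
                total += it.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> arrayBacked = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> linked = new LinkedList<>(arrayBacked);
        System.out.println(sum(arrayBacked)); // 6
        System.out.println(sum(linked));      // 6
    }
}
```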

1. Member variable

   // Default initial capacity
   private static final int DEFAULT_CAPACITY = 10;

   // Empty array instance, used when the initial capacity is 0 or the incoming collection is empty (not null)
   private static final Object[] EMPTY_ELEMENTDATA = {};
   
   // Empty array instance, used by the no-argument constructor
   private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
   
   // ArrayList internal data container
   transient Object[] elementData; // non-private to simplify nested class access
   
   // Actual number of elements
   private int size;

ArrayList has five main member variables. DEFAULT_CAPACITY is the default initial capacity: if no capacity is specified when the ArrayList is created, the capacity will be 10. Object[] elementData is the container that actually stores the elements; in other words, ArrayList is backed by an array. size is the number of elements actually stored, which may be smaller than elementData.length.

In 1.8, the empty array comes in two variants, EMPTY_ELEMENTDATA and DEFAULTCAPACITY_EMPTY_ELEMENTDATA. Using two distinct instances lets ArrayList tell how an empty list was created, and therefore how much to grow when the first element is added.

2. Construction method

ArrayList has three constructors: ArrayList(int initialCapacity) with a specified capacity, the no-argument ArrayList(), and ArrayList(Collection<? extends E> c), which is built from an existing collection.

    public ArrayList() {
        this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }

The simplest is the no-argument constructor, which just assigns the empty array DEFAULTCAPACITY_EMPTY_ELEMENTDATA. The default capacity of 10 is actually applied when add() is called for the first time, not in the constructor.

    public ArrayList(int initialCapacity) {
        if (initialCapacity > 0) {
            this.elementData = new Object[initialCapacity];
        } else if (initialCapacity == 0) {
            this.elementData = EMPTY_ELEMENTDATA;
        } else {
            throw new IllegalArgumentException("Illegal Capacity: "+
                    initialCapacity);
        }
    }

For the constructor that takes a capacity, when the argument is > 0 an array of that capacity is allocated directly. The parameter type is int, so the initial capacity cannot exceed Integer.MAX_VALUE; in fact, the capacity of an ArrayList can never exceed Integer.MAX_VALUE. When the initial capacity is 0, elementData is assigned the shared empty array EMPTY_ELEMENTDATA. A value < 0 is not allowed and results in an IllegalArgumentException.
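The three branches of this constructor can be observed from the outside. The following is a small sketch; CapacityConstructorDemo and isRejected are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class CapacityConstructorDemo {
    // Returns true if ArrayList rejects the given initial capacity.
    static boolean isRejected(int initialCapacity) {
        try {
            new ArrayList<String>(initialCapacity);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(16)); // false: array of length 16 is allocated
        System.out.println(isRejected(0));  // false: shared empty array is used
        System.out.println(isRejected(-1)); // true: IllegalArgumentException

        // A list created with capacity 0 still grows on demand.
        List<String> list = new ArrayList<>(0);
        list.add("x");
        System.out.println(list.size()); // 1
    }
}
```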

    public ArrayList(Collection<? extends E> c) {
        elementData = c.toArray();
        if ((size = elementData.length) != 0) {
            // c.toArray might (incorrectly) not return Object[] (see 6260652)
            if (elementData.getClass() != Object[].class)
                elementData = Arrays.copyOf(elementData, size, Object[].class);
        } else {
            // replace with empty array.
            this.elementData = EMPTY_ELEMENTDATA;
        }
    }

The collection constructor performs no null check, so passing null results in a NullPointerException as soon as c.toArray() is called. The rest of the logic is simple. When the incoming collection is not empty, the array returned by c.toArray() becomes elementData, so the capacity equals the size of the incoming collection; Arrays.copyOf is called only if that array's runtime type is not Object[]. When the incoming collection is empty, elementData is set to EMPTY_ELEMENTDATA.
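A short sketch of the two observable consequences: null is rejected with a NullPointerException, and the new list does not share storage with the source list. CollectionConstructorDemo and its methods are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper class for illustration only.
class CollectionConstructorDemo {
    // Returns true if passing null to the collection constructor throws NPE.
    static boolean throwsNpeOnNull() {
        try {
            new ArrayList<String>((Collection<String>) null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Copies a list, then modifies the source; the copy must stay unchanged.
    static List<String> copyThenModifySource() {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> copy = new ArrayList<>(source);
        source.add("c"); // only the source's backing array is affected
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(throwsNpeOnNull());      // true
        System.out.println(copyThenModifySource()); // [a, b]
    }
}
```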

3. Add elements

When ArrayList adds an element, it first ensures there is enough capacity, which may involve expanding and copying the array, so an add that triggers growth is relatively expensive. ArrayList does not validate the elements themselves, so null is allowed in the list.

3.1. Add elements at the end
    public boolean add(E e) {
        // Ensure capacity
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        // Set value
        elementData[size++] = e;
        return true;
    }

In add(), the key step is the call to ensureCapacityInternal(int minCapacity), which makes sure the array is large enough.

    private void ensureCapacityInternal(int minCapacity) {
        ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
    }

It first calls calculateCapacity(Object[] elementData, int minCapacity) to compute the required capacity and then passes the result to ensureExplicitCapacity(int minCapacity).

    private static int calculateCapacity(Object[] elementData, int minCapacity) {
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            return Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        return minCapacity;
    }

This method only checks whether the array is the DEFAULTCAPACITY_EMPTY_ELEMENTDATA instance. As noted earlier, elementData is set to DEFAULTCAPACITY_EMPTY_ELEMENTDATA only by the no-argument constructor. In that case the larger of DEFAULT_CAPACITY (10) and the incoming minCapacity is used. This is where the default capacity of 10 comes from.

In all other cases it returns minCapacity unchanged, that is, size + 1. For the first add, that is 1.

    private void ensureExplicitCapacity(int minCapacity) {
        modCount++;
        // overflow-conscious code
        if (minCapacity - elementData.length > 0)
            grow(minCapacity);
    }

modCount counts structural modifications; both add and remove increment it. To delete elements of an ArrayList while looping over it, use the remove() method of the Iterator. If the list's own remove() is called during iteration instead, the iterator's check against modCount fails and a ConcurrentModificationException is thrown.

If minCapacity is larger than the current array length, grow(int minCapacity) is called to expand the array.

    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        // New capacity is 1.5x the old capacity (grows by half)
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0) // MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

When expanding, the new capacity is the old capacity plus half of it, i.e. 1.5 times the old capacity (a 50% increase). If that is still smaller than the required minCapacity, minCapacity is used instead. If the result is larger than MAX_ARRAY_SIZE (Integer.MAX_VALUE - 8), hugeCapacity(int minCapacity) decides the final value.
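To make the growth rule concrete, here is a small model of the arithmetic for a list created with the no-argument constructor and filled one element at a time. It is only a sketch: GrowthModel, nextCapacity and capacitiesFor are hypothetical names, the real array is not inspected, and the MAX_ARRAY_SIZE branch is left out (Math.max replaces the overflow-conscious comparison, which is equivalent for small values).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the capacity arithmetic; it does not touch a real ArrayList.
class GrowthModel {
    static final int DEFAULT_CAPACITY = 10;

    // Mirrors grow() for capacities far below MAX_ARRAY_SIZE.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1); // old + old / 2
        return Math.max(newCapacity, minCapacity);
    }

    // Capacities the backing array would pass through while appending
    // elementCount elements to a default-constructed list.
    static List<Integer> capacitiesFor(int elementCount) {
        List<Integer> capacities = new ArrayList<>();
        int capacity = 0; // models DEFAULTCAPACITY_EMPTY_ELEMENTDATA
        for (int size = 0; size < elementCount; size++) {
            int minCapacity = size + 1;
            if (capacity == 0) {
                // First add: calculateCapacity picks max(10, minCapacity).
                minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
            }
            if (minCapacity > capacity) {
                capacity = nextCapacity(capacity, minCapacity);
                capacities.add(capacity);
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        System.out.println(capacitiesFor(40)); // [10, 15, 22, 33, 49]
    }
}
```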

    private static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ?
            Integer.MAX_VALUE :
            MAX_ARRAY_SIZE;
    }

As you can see, when minCapacity < 0 an OutOfMemoryError is thrown. It is a subclass of Error rather than Exception, so it should be avoided rather than handled. When can minCapacity be negative? When the size is already Integer.MAX_VALUE and another element is added, size + 1 overflows int and becomes negative.

This method also shows what happens near the limit. If the 1.5x capacity exceeds MAX_ARRAY_SIZE while the required minCapacity does not, the capacity is set to MAX_ARRAY_SIZE. Once minCapacity itself exceeds MAX_ARRAY_SIZE, the capacity becomes Integer.MAX_VALUE. If that is still not enough, an error occurs.
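The decision made near the limit can be checked with plain int arithmetic, without allocating huge arrays. The sketch below is a model only: HugeCapacityModel is a hypothetical name, and it returns -1 where the real method throws OutOfMemoryError so that the overflow case is easy to observe.

```java
// Hypothetical model of hugeCapacity(); no large arrays are allocated.
class HugeCapacityModel {
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Same decision as hugeCapacity(), but returns -1 instead of
    // throwing OutOfMemoryError when minCapacity has overflowed.
    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) {
            return -1; // the real method throws OutOfMemoryError here
        }
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        int size = Integer.MAX_VALUE;
        int minCapacity = size + 1; // int overflow wraps to a negative value
        System.out.println(minCapacity < 0);                             // true
        System.out.println(hugeCapacity(minCapacity));                   // -1
        System.out.println(hugeCapacity(MAX_ARRAY_SIZE) == MAX_ARRAY_SIZE);        // true
        System.out.println(hugeCapacity(MAX_ARRAY_SIZE + 1) == Integer.MAX_VALUE); // true
    }
}
```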

The last step of the expansion calls Arrays.copyOf to copy the elements into the new array, which in turn uses System.arraycopy. Back in add(), size++ then increases the number of stored elements by 1.

3.2. Add elements in the middle
    public void add(int index, E element) {
        rangeCheckForAdd(index);
        // Confirm capacity
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        System.arraycopy(elementData, index, elementData, index + 1, size - index);
        elementData[index] = element;
        size++;
    }

The logic of adding elements in the middle is basically the same as adding elements at the end.

    private void rangeCheckForAdd(int index) {
        if (index > size || index < 0)
            throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
    }

Before adding an element, the index is range-checked; valid indexes lie in [0, size], and index == size is simply an append at the tail. Next the capacity is ensured, as described for add(E e). Then System.arraycopy shifts the elements from index onward one position to the right, leaving the slot at index free. Finally, the new element is assigned to that slot.

Because size++ is always executed at the end, add(int index, E element) always adds an element. Even if the index position already holds data, the existing element is only pushed back by one position; it is not overwritten.
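The shifting behavior is easy to confirm through the public API. In this sketch, InsertDemo and insertAt are hypothetical names; the helper works on a copy so the input list is left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
class InsertDemo {
    // Returns a copy of source with value inserted at index.
    static List<String> insertAt(List<String> source, int index, String value) {
        List<String> list = new ArrayList<>(source);
        list.add(index, value); // shifts elements at index and beyond to the right
        return list;
    }

    public static void main(String[] args) {
        List<String> base = Arrays.asList("a", "b", "c");
        System.out.println(insertAt(base, 1, "x")); // [a, x, b, c]
        System.out.println(insertAt(base, 3, "x")); // [a, b, c, x] (index == size appends)
        try {
            insertAt(base, 4, "x"); // outside [0, size]
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected index 4");
        }
    }
}
```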

3.3. Batch adding

In addition to add() and add(int index, E element), ArrayList also has two methods for batch adding.

    public boolean addAll(Collection<? extends E> c) {
        Object[] a = c.toArray();
        int numNew = a.length;
        // Capacity confirmation
        ensureCapacityInternal(size + numNew);  // Increments modCount
        System.arraycopy(a, 0, elementData, size, numNew);
        size += numNew;
        return numNew != 0;
    }

    public boolean addAll(int index, Collection<? extends E> c) {
        // Scope check
        rangeCheckForAdd(index);
        Object[] a = c.toArray();
        int numNew = a.length;
        // Capacity confirmation
        ensureCapacityInternal(size + numNew);  // Increments modCount
        int numMoved = size - index;
        if (numMoved > 0)
            System.arraycopy(elementData, index, elementData, index + numNew, numMoved);
        System.arraycopy(a, 0, elementData, index, numNew);
        size += numNew;
        return numNew != 0;
    }

With single-element insertion understood, batch adding is easy to follow. The only difference is that the whole incoming collection is copied in one go. For addAll at an index, if the insertion is in the middle (numMoved > 0), the first copy shifts the existing elements to open a gap as long as the incoming collection, and the second copy writes the incoming elements into the gap starting at index.
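The two copy steps can be reproduced on a plain array. This is a simplified sketch, not the JDK code: BulkInsertSketch and insertAll are hypothetical names, and the array is always reallocated with Arrays.copyOf instead of going through ensureCapacityInternal.

```java
import java.util.Arrays;

// Hypothetical sketch of the two-step copy on a plain array.
class BulkInsertSketch {
    // Inserts all of added at index into the first size slots of data
    // and returns the resulting array.
    static Object[] insertAll(Object[] data, int size, int index, Object[] added) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        int numNew = added.length;
        // Simplification: copy into an array that is certainly large enough.
        Object[] result = Arrays.copyOf(data, Math.max(data.length, size + numNew));
        int numMoved = size - index;
        if (numMoved > 0) {
            // Step 1: shift the tail right to open a gap of numNew slots.
            System.arraycopy(result, index, result, index + numNew, numMoved);
        }
        // Step 2: write the new elements into the gap.
        System.arraycopy(added, 0, result, index, numNew);
        return result;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", null, null}; // size 3, capacity 5
        Object[] result = insertAll(data, 3, 1, new Object[] {"x", "y"});
        System.out.println(Arrays.toString(result)); // [a, x, y, b, c]
    }
}
```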

4. Modify element

To modify an element in an ArrayList, if you only need to change the attributes of a mutable object, you can modify it through its reference. Primitive wrapper classes and String are immutable, however, so they cannot be changed that way; the reference stored in the list has to be replaced, as it does whenever you want a different object at that position. In these cases, call set(int index, E element).

    public E set(int index, E element) {
        // Scope check
        rangeCheck(index);
        E oldValue = elementData(index);
        elementData[index] = element;
        return oldValue;
    }

This method is simple: modifying an ArrayList means changing a value in the array. It first checks the range to prevent an out-of-bounds access, which makes sense given that ArrayList is backed by an array, and then replaces the value at index.

    private void rangeCheck(int index) {
        if (index >= size)
            throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
    }

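elementData(int index) fetches the original value, which set() uses as its return value. Its implementation is trivial: it reads elementData[index] and casts it to E. Note that rangeCheck only tests index >= size; a negative index is caught by the array access itself, which throws ArrayIndexOutOfBoundsException.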
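A brief sketch of set() with an immutable element type: each slot has to be given a new reference, and the call returns the value it replaced. SetDemo and doubleInPlace are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
class SetDemo {
    // Doubles every element in place and returns the values that were replaced.
    static List<Integer> doubleInPlace(List<Integer> list) {
        List<Integer> replaced = new ArrayList<>();
        for (int i = 0; i < list.size(); i++) {
            // Integer is immutable, so the slot must receive a new reference.
            Integer old = list.set(i, list.get(i) * 2);
            replaced.add(old);
        }
        return replaced;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> old = doubleInPlace(numbers);
        System.out.println(numbers); // [2, 4, 6]
        System.out.println(old);     // [1, 2, 3]
    }
}
```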

5. Remove elements

There are three ways to remove elements from an ArrayList: by index with remove(int index), by element with remove(Object o), and in bulk with removeAll(Collection<?> c).

5.1. Index deletion
    public E remove(int index) {
        // Scope check
        rangeCheck(index);
        modCount++;
        E oldValue = elementData(index);
        int numMoved = size - index - 1;
        // Shift elements only when the removed element is not the last one
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index, numMoved);
        elementData[--size] = null; // clear to let GC do its work
        return oldValue;
    }

Because removing an element does not change the size of the internal array, the implementation is relatively simple. After the familiar range check, it determines whether the removal is at the tail. If not, System.arraycopy moves the elements after index forward by one position, overwriting the element being removed. Then size is decreased by one.

Note that remove also increments modCount, and that the now-unused last slot is set to null so that the GC can reclaim the object.

5.2. Delete by element
    public boolean remove(Object o) {
        if (o == null) {
            for (int index = 0; index < size; index++)
                if (elementData[index] == null) {
                    fastRemove(index);
                    return true;
                }
        } else {
            for (int index = 0; index < size; index++)
                if (o.equals(elementData[index])) {
                    fastRemove(index);
                    return true;
                }
        }
        return false;
    }

When deleting by element, the method first checks whether the argument is null, because ArrayList may contain null. Both branches do the same thing: traverse the array and compare each element with the argument, using == null in one branch and equals in the other. On the first match the element is removed and the method returns, so remove(Object o) deletes only the first element equal to the argument.

The point is this fastRemove.

    private void fastRemove(int index) {
        modCount++;
        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index, numMoved);
        elementData[--size] = null; // clear to let GC do its work
    }

Does this method look familiar? fastRemove is essentially the same as index-based removal, minus the range check and the retrieval of the old value.
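The difference between the two overloads shows up most clearly with a List<Integer>, where an int argument selects remove(int index) and an Integer argument selects remove(Object o). This overload detail goes slightly beyond the text above; RemoveOverloadDemo and its methods are hypothetical names, and both helpers work on a copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
class RemoveOverloadDemo {
    // remove(Object): deletes the first element equal to value.
    static List<Integer> removeByValue(List<Integer> source, int value) {
        List<Integer> list = new ArrayList<>(source);
        list.remove(Integer.valueOf(value));
        return list;
    }

    // remove(int): deletes the element at the given index.
    static List<Integer> removeByIndex(List<Integer> source, int index) {
        List<Integer> list = new ArrayList<>(source);
        list.remove(index);
        return list;
    }

    public static void main(String[] args) {
        List<Integer> base = Arrays.asList(10, 20, 10, 30);
        System.out.println(removeByValue(base, 10)); // [20, 10, 30] (first match only)
        System.out.println(removeByIndex(base, 1));  // [10, 10, 30]
    }
}
```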

5.3. Batch deletion
    public boolean removeAll(Collection<?> c) {
        Objects.requireNonNull(c);
        return batchRemove(c, false);
    }

removeAll(Collection<?> c) checks that c is not null and then calls batchRemove(Collection<?> c, boolean complement).

    private boolean batchRemove(Collection<?> c, boolean complement) {
        final Object[] elementData = this.elementData;
        int r = 0, w = 0;
        boolean modified = false;
        try {
            for (; r < size; r++)
                // Find the elements that do not need to be removed and put them in front of the array
                if (c.contains(elementData[r]) == complement)
                    elementData[w++] = elementData[r];
        } finally {
            // Preserve behavioral compatibility with AbstractCollection,
            // even if c.contains() throws.
            if (r != size) {
                System.arraycopy(elementData, r, elementData, w, size - r);
                w += size - r;
            }
            if (w != size) {
                // clear to let GC do its work
                for (int i = w; i < size; i++)
                    elementData[i] = null;
                modCount += size - w;
                size = w;
                modified = true;
            }
        }
        return modified;
    }

This method may look a little convoluted, but it is clear once the principle is understood. It traverses the array with a read index r and copies every element that should be kept to the front of the same array at write index w. For removeAll (complement == false), the kept elements are those not contained in c. Afterwards the first w slots hold the surviving elements and the rest no longer matter, so the slots from w to size are set to null to let the GC do its work, and size becomes w.
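The read/write index technique can be sketched on a plain array. This is a simplified model rather than the JDK code: CompactSketch and compact are hypothetical names, and the try/finally handling for a throwing c.contains() as well as the modCount update are omitted.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Hypothetical sketch of the in-place compaction used by batchRemove.
class CompactSketch {
    // Keeps the elements whose membership in c equals complement,
    // nulls out the unused tail, and returns the new size.
    static int compact(Object[] data, int size, Collection<?> c, boolean complement) {
        int w = 0; // write index: next slot for a kept element
        for (int r = 0; r < size; r++) { // read index
            if (c.contains(data[r]) == complement) {
                data[w++] = data[r];
            }
        }
        for (int i = w; i < size; i++) {
            data[i] = null; // clear stale references for the GC
        }
        return w;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "a", "c"};
        // complement == false behaves like removeAll.
        int newSize = compact(data, 4, Collections.singleton("a"), false);
        System.out.println(newSize);                // 2
        System.out.println(Arrays.toString(data));  // [b, c, null, null]
    }
}
```

Passing complement == true keeps only the elements contained in c instead.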

6. Deleting in a loop

As mentioned earlier, deleting from an ArrayList while looping over it can raise an error. Why?

Suppose we want to delete every occurrence of an element from a collection, such as "a" in the following list ss.

        List<String> ss = new ArrayList<>();
        ss.add("a");
        ss.add("b");
        ss.add("a");
        ss.add("b");
        ss.add("c");

To delete a single element we can call remove, either by index or by value. When there are several, we do not know the index of each one, and when deleting by value we do not know how many a's there are, so we have to traverse the collection.

Then there may be a problem.

        for (String s : ss) {
            if("a".equals(s)){
                ss.remove(s);
            }
        }

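This foreach loop throws java.util.ConcurrentModificationException, because a foreach loop over an ArrayList obtains each value by calling next() on the inner class Itr. An indexed for loop does not use the iterator and therefore does not throw, but after a removal the following elements shift left, so it can skip elements unless the index is adjusted.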

        public E next() {
            // Verify modCount
            checkForComodification();
            int i = cursor;
            if (i >= size)
                throw new NoSuchElementException();
            Object[] elementData = ArrayList.this.elementData;
            if (i >= elementData.length)
                throw new ConcurrentModificationException();
            cursor = i + 1;
            return (E) elementData[lastRet = i];
        }

This method starts by calling checkForComodification(), which compares modCount with expectedModCount and throws a ConcurrentModificationException if they differ.

        final void checkForComodification() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
        }

What is expectedModCount, and why would it differ from modCount?

    private class Itr implements Iterator<E> {
        int cursor;       // index of next element to return
        int lastRet = -1; // index of last element returned; -1 if no such
        int expectedModCount = modCount;

expectedModCount is a member variable of Itr, initialized to modCount when the iterator is created, so the two start out equal. As we saw earlier, ArrayList's remove increments modCount, so the next call to checkForComodification throws.

The usual solution is to remove through the iterator.

        Iterator<String> it = ss.iterator();
        while (it.hasNext()){
            if("a".equals(it.next())){
                it.remove();
            }
        }

Deleting this way causes no problem, because Itr's remove reassigns expectedModCount, so the two values are equal again after each call.

        public void remove() {
            if (lastRet < 0)
                throw new IllegalStateException();
            checkForComodification();
            try {
                // Call the deletion of ArrayList
                ArrayList.this.remove(lastRet);
                cursor = lastRet;
                lastRet = -1;
                // expectedModCount reassignment
                expectedModCount = modCount;
            } catch (IndexOutOfBoundsException ex) {
                throw new ConcurrentModificationException();
            }
        }
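The behavior described in this section can be reproduced with the sample list ss. The sketch below contrasts the failing foreach removal with Iterator.remove() and with removeIf, a method available on ArrayList since Java 8 that the text above does not cover. LoopRemovalDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class for illustration only.
class LoopRemovalDemo {
    static List<String> sample() {
        return new ArrayList<>(Arrays.asList("a", "b", "a", "b", "c"));
    }

    // Returns true if removing inside a foreach loop throws.
    static boolean foreachRemovalThrows() {
        List<String> ss = sample();
        try {
            for (String s : ss) {
                if ("a".equals(s)) {
                    ss.remove(s); // bumps modCount behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe: the iterator keeps expectedModCount in sync.
    static List<String> removeWithIterator() {
        List<String> ss = sample();
        for (Iterator<String> it = ss.iterator(); it.hasNext(); ) {
            if ("a".equals(it.next())) {
                it.remove();
            }
        }
        return ss;
    }

    // Safe: removeIf handles the traversal internally.
    static List<String> removeWithRemoveIf() {
        List<String> ss = sample();
        ss.removeIf("a"::equals);
        return ss;
    }

    public static void main(String[] args) {
        System.out.println(foreachRemovalThrows()); // true
        System.out.println(removeWithIterator());   // [b, b, c]
        System.out.println(removeWithRemoveIf());   // [b, b, c]
    }
}
```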

7. Other methods

The main methods of ArrayList are the constructors, add and remove. Once these are understood, the implementation of the other methods is straightforward.

For example, the get method actually gets the elements of the array according to the index.

    public E get(int index) {
        // Scope check
        rangeCheck(index);
        // Get the value from the array, that is, elementData[index]
        return elementData(index);
    }

For example, the size method returns the value of the size attribute.

    public int size() {
        return size;
    }

The isEmpty method checks whether size is 0.

    public boolean isEmpty() {
        return size == 0;
    }

ArrayList also has a subList method for obtaining a sub-range. It returns an instance of the inner class SubList, which does not create a new array but keeps referring to the elements of the ArrayList's array. Therefore, modifying elements through the ArrayList also changes what the sub-list sees, and vice versa. Keep this in mind in real code.

    public List<E> subList(int fromIndex, int toIndex) {
        subListRangeCheck(fromIndex, toIndex, size);
        return new SubList(this, 0, fromIndex, toIndex);
    }
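A short sketch of the view semantics: writes through the sub-list reach the parent list, while wrapping the sub-list in a new ArrayList gives an independent copy. The last method shows an additional point not covered above, namely that a structural change to the parent makes an existing sub-list unusable. SubListDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical helper class for illustration only.
class SubListDemo {
    // Writes through the view and returns the parent list.
    static List<String> writeThroughView() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> view = list.subList(1, 3); // view of [b, c]
        view.set(0, "B");                       // also changes list.get(1)
        return list;
    }

    // Copies the view first, so the parent stays unchanged.
    static List<String> writeToCopy() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> copy = new ArrayList<>(list.subList(1, 3));
        copy.set(0, "B");
        return list;
    }

    // Returns true if the view fails after the parent is structurally modified.
    static boolean staleViewThrows() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> view = list.subList(1, 3);
        list.add("e"); // structural change on the parent
        try {
            view.get(0);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThroughView()); // [a, B, c, d]
        System.out.println(writeToCopy());      // [a, b, c, d]
        System.out.println(staleViewThrows());  // true
    }
}
```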

Posted on Wed, 04 Dec 2019 17:22:53 -0500 by Copyright