Java collections (super detailed)

Collection

1, Collection overview

1. What is a collection?

A collection is a container that holds other data; an array is one simple example. Objects are put into the collection as they are created and taken out one by one when they are needed, so the collection acts as a carrier for data.

2. What does the collection store?

Collections cannot directly store primitive data types, and strictly speaking they do not store Java objects themselves either. What a collection stores is the memory addresses (references) of Java objects.

The collection is used to store other objects. The collection itself is an object, as shown in the following figure:

The following code can add a primitive value directly because, when an int is added to the collection, it is automatically boxed, that is, converted from int to an Integer object:

public static void main(String[] args) {
    Collection collection= new ArrayList();
    collection.add(1);
    collection.add(2);
    System.out.println(collection);
}
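
To make the autoboxing step visible, here is a minimal sketch that stores an int and then inspects the runtime type of what the collection actually holds. The class name BoxingDemo and the helper storedTypeOf are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;

class BoxingDemo {
    // Stores an int and returns the runtime class name of the stored element.
    static String storedTypeOf(int value) {
        Collection<Object> collection = new ArrayList<>();
        collection.add(value); // the int is autoboxed to an Integer here
        Object stored = collection.iterator().next();
        return stored.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(storedTypeOf(1)); // prints java.lang.Integer
    }
}
```

The element that comes back out is an Integer object, not an int, which matches the statement that collections only hold references.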

3. Data structure of collection

In Java, each kind of collection is backed by a different data structure, so adding data to a collection means putting it into that underlying data structure. For example, data put into collection c1 may end up in an array, while data put into collection c2 may end up in a binary tree.

Using different collections is equivalent to using different data structures.

Arrays, binary trees, linked lists and hash tables are all common data structures.

4. Which package are collections in?

All collection classes and collection interfaces are in the java.util package of the JDK.

5.Collection

5.1 Collection storage modes

Collections are divided into two categories: those that store objects one at a time, and those that store objects as key-value pairs.

Single mode: each object is stored in the collection on its own. The root interface of this kind of collection is java.util.Collection.

Key-value pair mode: each stored value is paired with a key that is used to look it up. The root interface of this kind of collection is java.util.Map.
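
The two storage modes can be sketched side by side as follows. This is only an illustration: the class name StorageModeDemo, the helper names and the sample data are assumptions, not part of the original text.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class StorageModeDemo {
    // Single mode: one object per add() call.
    static Collection<String> singleMode() {
        Collection<String> names = new ArrayList<>();
        names.add("jack");
        names.add("rose");
        return names;
    }

    // Key-value mode: a key and its value per put() call.
    static Map<String, Integer> keyValueMode() {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("jack", 20);
        ages.put("rose", 18);
        return ages;
    }

    public static void main(String[] args) {
        System.out.println(singleMode().size());        // prints 2
        System.out.println(keyValueMode().get("jack")); // prints 20
    }
}
```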

5.2 Collection inheritance diagram

The Collection interface extends the Iterable interface, which means that every Collection can be iterated (traversed).

5.3 collection characteristics

List storage features: ordered and repeatable, and the stored elements have subscripts (indexes). Ordered means that elements are taken out in the same order in which they were stored. It does not mean that they are sorted by size.

The ordering can be seen in the following code:

public static void main(String[] args) {
    List list=new ArrayList();
    list.add(1);
    list.add(2);
    list.add(3);
    list.add(4);
    Iterator it=list.iterator();
    while (it.hasNext()){
        System.out.println(it.next());
    }
}

Set storage features: unordered and non-repeatable. The order in which elements are stored and the order in which they are taken out are not necessarily the same. The stored elements do not have subscripts, and a Set cannot hold two equal elements.

Note: that a Set cannot hold duplicates does not mean that adding a duplicate element reports an error. The duplicate is simply not saved. The following code demonstrates this:

public static void main(String[] args) {
    Set set=new HashSet();
    set.add(1);
    set.add(1);
    set.add(2);
    set.add(3);
    set.add(3);
    Iterator it=set.iterator();
    while (it.hasNext()){
        System.out.println(it.next());
    }
}
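
The return value of add() makes the "no error, just not saved" behaviour explicit. A minimal sketch, with the class name DuplicateAddDemo and the helper addTwice invented for illustration:

```java
import java.util.HashSet;
import java.util.Set;

class DuplicateAddDemo {
    // Adds the same value twice and reports {first add result, second add result, size is 1}.
    static boolean[] addTwice(int value) {
        Set<Integer> set = new HashSet<>();
        boolean first = set.add(value);  // true: the element was saved
        boolean second = set.add(value); // false: duplicate, silently ignored
        return new boolean[] {first, second, set.size() == 1};
    }

    public static void main(String[] args) {
        boolean[] result = addTwice(1);
        System.out.println(result[0]); // prints true
        System.out.println(result[1]); // prints false
        System.out.println(result[2]); // prints true
    }
}
```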

5.4 List implementation classes

The List implementation classes we want to learn are mainly ArrayList, LinkedList and Vector.

ArrayList: backed by an array. ArrayList is not thread safe.

LinkedList: backed by a doubly linked list.

Vector: backed by an array. Vector is thread safe, but it is rarely used because it is less efficient and there are other ways to ensure thread safety.

5.5 Set implementation classes

The Set implementation classes we need to learn are mainly HashSet and TreeSet.

HashSet: when a HashSet is created, it actually creates a HashMap underneath, and the data stored in the HashSet is actually stored in that HashMap. HashMap is backed by a hash table.

The bottom layer of HashSet is a HashMap. The following is the source code:

SortedSet: SortedSet extends Set, so its elements have no subscripts, do not keep their insertion order and cannot be repeated. However, the elements put into a SortedSet are sorted automatically, as the following code shows:

public static void main(String[] args) {
    Set set=new TreeSet();
    set.add(44);
    set.add(3);
    set.add(65);
    set.add(2);
    Iterator it=set.iterator();
    while (it.hasNext()){
        System.out.println(it.next());
    }
}

TreeSet: TreeSet implements SortedSet. When a TreeSet is created, it actually creates a TreeMap underneath, and data put into the TreeSet is stored in that TreeMap. TreeMap is backed by a binary tree (specifically, a red-black tree).

The bottom layer of TreeSet is a TreeMap. The following is the source code:

6.Map

There is no inheritance relationship between Map and Collection. The Collection types above store elements one at a time, while Map types store elements as key-value pairs. The keys of a Map cannot be repeated.

The main implementation classes of Map are HashMap and Hashtable. Map also has a sub-interface, SortedMap, whose implementation class is TreeMap.

6.1HashMap

HashMap is backed by a hash table and is not thread safe.

6.2Hashtable

Hashtable is backed by a hash table. It is thread safe but inefficient, and it is rarely used because there are now other ways to ensure thread safety.

6.2.1 Properties

Properties is a subclass of Hashtable, so Properties is also thread safe. The keys and values of a Properties object are meant to be Strings only.
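
A short sketch of storing and reading String pairs with Properties, using its setProperty and getProperty methods. The class name PropertiesDemo, the helper lookup and the sample keys are assumptions made for this example.

```java
import java.util.Properties;

class PropertiesDemo {
    // Builds a small Properties object and looks up one key in it.
    static String lookup(String key) {
        Properties props = new Properties();
        props.setProperty("username", "jack"); // both key and value are Strings
        props.setProperty("password", "123");
        return props.getProperty(key);         // null when the key is absent
    }

    public static void main(String[] args) {
        System.out.println(lookup("username")); // prints jack
        System.out.println(lookup("missing"));  // prints null
    }
}
```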

6.3SortedMap

SortedMap extends Map, so its keys also cannot be repeated and do not keep their insertion order. However, the keys in a SortedMap are automatically sorted by size, which is why it is called a sortable collection.

6.3.1 TreeMap

TreeMap is the implementation class of SortedMap. It is backed by a binary tree (specifically, a red-black tree). Its keys cannot be repeated and do not keep their insertion order, but they are sorted by size.
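
The sorting of keys can be checked with a small sketch like the one below. The class name TreeMapOrderDemo, the helper sortedKeys and the sample values are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TreeMapOrderDemo {
    // Inserts keys in arbitrary order and returns them in iteration order.
    static List<Integer> sortedKeys() {
        Map<Integer, String> map = new TreeMap<>();
        map.put(44, "d");
        map.put(3, "b");
        map.put(65, "e");
        map.put(2, "a");
        map.put(3, "c"); // same key again: the value is replaced, no new entry
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys()); // prints [2, 3, 44, 65]
    }
}
```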

6.4 Map inheritance diagram

2, Collection common methods

1.add

Collection has an add() method. If no generic type is specified, data of any type can be added to the collection (primitive values are automatically boxed), as follows:

public static void main(String[] args) {
    Collection collection=new ArrayList();
    collection.add(1);
    collection.add(new LinkedList<>());
    collection.add(3.14);
    collection.add("Hello!");
}
2.size

Collection has a size() method that returns the number of elements in the collection.

3.contains

To check whether a collection contains an element, use the contains method as follows:

public static void main(String[] args) {
    Collection collection=new ArrayList();
    collection.add(1);
    collection.add(new LinkedList<>());
    collection.add(3.14);
    collection.add("Hello!");
    boolean contains = collection.contains(3.14);
    System.out.println(contains);
}
3.1 source code analysis

The contains method determines whether a given element is in the collection. The element can be a boxed primitive value such as an int, float or double, or any other reference type such as String. The following is the source code of contains in ArrayList:

public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

public int indexOf(Object o) {
    return indexOfRange(o, 0, size);
}

int indexOfRange(Object o, int start, int end) {
    Object[] es = elementData;
    if (o == null) {
        for (int i = start; i < end; i++) {
            if (es[i] == null) {
                return i;
            }
        }
    } else {
        for (int i = start; i < end; i++) {
            if (o.equals(es[i])) {
                return i;
            }
        }
    }
    return -1;
}

The contains source code calls the equals method, so whether an element is found depends on how equals compares objects. For classes that override equals, such as String, equals compares content.

The following tests a String. The String class overrides the equals method, so the result is true:

public static void main(String[] args) {
    Collection collection=new ArrayList();
    String s1=new String("jack");
    String s2=new String("jack");
    collection.add(s1);
    System.out.println(collection.contains(s2));
}

The equals method in Object compares the address of the Object, not the content. The source code is as follows:

The following tests a user-defined class. The result is false because the class does not override the equals method, so the equals method inherited from Object is called:

public class Test01 {

    public static void main(String[] args) {
        Collection collection=new ArrayList();
        User u1=new User("jack");
        User u2=new User("jack");
        collection.add(u1);
        System.out.println(collection.contains(u2));
    }
}

class User{
    private String name;
    public User(){}
    public User(String name){
        this.name=name;
    }
}

If the equals method in the User class is overridden, the result is true.
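
One way to override equals so that contains returns true is sketched below. This is an assumption about how the User class might be written, not the author's code: it compares users by name, and it also overrides hashCode, which should always be kept consistent with equals. The outer class name EqualsOverrideDemo is invented for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Objects;

class EqualsOverrideDemo {
    static class User {
        private final String name;

        User(String name) {
            this.name = name;
        }

        // Two users are equal when their names are equal.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof User)) return false;
            return Objects.equals(name, ((User) o).name);
        }

        // Kept consistent with equals so hash-based collections also work.
        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    static boolean containsEqualUser() {
        Collection<User> collection = new ArrayList<>();
        collection.add(new User("jack"));
        // A different object with the same name is now found.
        return collection.contains(new User("jack"));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualUser()); // prints true
    }
}
```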

3.2 contains JVM diagram

The following figure is the JVM memory diagram of the program at run time. It shows that putting String objects a and b into the collection actually puts their memory addresses into the collection, and that the memory address of object x is not in the collection. The contains method uses the equals method rather than a plain "==", which would only compare object addresses. For this reason, Java objects put into a collection should override the equals method.

3.3 conclusion

From the analysis in 3.1, the java objects stored in the collection should override the equals method.

4.remove

Delete an element in the collection as follows:

public static void main(String[] args) {
    Collection collection=new ArrayList();
    collection.add(1);
    collection.add(new LinkedList<>());
    collection.add(3.14);
    collection.add("Hello!");
    collection.remove(3.14);
}
4.1 source code analysis

The remove source code, like the contains source code, calls the equals method. Because contains has already been analyzed above, this section only shows the source code of the remove method in ArrayList:

5.isEmpty

Checks whether the collection has no elements. It returns true if the collection is empty. Usage is as follows:

public static void main(String[] args) {
    Collection collection=new ArrayList();
    collection.add(1);
    collection.add(new LinkedList<>());
    collection.add(3.14);
    collection.add("Hello!");
    boolean flag = collection.isEmpty();
    System.out.println(flag);
}
6.toArray

Convert a collection into an array using the following method:

public static void main(String[] args) {
    Collection collection=new ArrayList();
    collection.add(1);
    collection.add(new LinkedList<>());
    collection.add(3.14);
    collection.add("Hello!");
    Object[] objects = collection.toArray();
    for (Object object : objects) {
        System.out.println(object);
    }
}
7. Iterator (key)

Iteration is a way of traversing a Collection. It can be used on any Collection, but not directly on a Map.

iterator() is a method that Collection inherits from Iterable. It returns an object of type Iterator, which is called an iterator.

7.1 iterator working diagram

7.2 iterator method

There are two methods commonly used in iterators: hasNext(), next()

hasNext: checks whether there is an element at the next position the iterator points to. If there is, it returns true.

next: advances the iterator by one position and returns the element it now points to.

7.3 application method
public static void main(String[] args) {
    Collection collection=new ArrayList();
    collection.add(1);
    collection.add(new Object());
    collection.add(3.14);
    collection.add("Hello!");
    //Get iterator
    Iterator it=collection.iterator();
    //Traversal iterator
    while (it.hasNext()){
        System.out.println(it.next());
    }
}
7.4 key points

When the structure of the collection changes (elements are added or removed), the iterator must be obtained again.

In the following code the iterator is obtained before the elements are added, so the elements added afterwards cannot be traversed with it. The code is as follows:

public static void main(String[] args) {
    Collection c=new ArrayList();
    Iterator it=c.iterator();
    c.add(1);
    c.add(2);
    c.add(3);
    while (it.hasNext()){
        System.out.println(it.next());
    }
}

Whenever elements are added to or removed from the collection after the iterator has been obtained, continuing to use that iterator will generally throw java.util.ConcurrentModificationException, as the code above does.

Why must the iterator be obtained again after the collection structure changes?

Obtaining the iterator is like taking a snapshot of the collection's state at that moment, and the iterator iterates with reference to that snapshot. After the collection structure is changed, the snapshot no longer matches the collection.

Under the hood there is a check, performed as the iterator advances, that compares the structural state recorded in the snapshot with that of the current collection. If they differ, an exception is thrown.
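
The check described above can be observed directly, and the iterator's own remove() method offers a safe way to delete while traversing. The sketch below uses an ArrayList; the class name IteratorRemoveDemo and the helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {
    // Removes through the list while iterating; returns true if the iterator fails fast.
    static boolean directRemoveFails() {
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        list.add(3);
        Iterator<Integer> it = list.iterator();
        try {
            while (it.hasNext()) {
                Integer n = it.next();
                if (n == 1) {
                    list.remove(n); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the next call to it.next() detected the change
        }
    }

    // Removes through the iterator itself, which keeps iterator and list in step.
    static List<Integer> iteratorRemove() {
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        list.add(3);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() == 2) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(directRemoveFails()); // prints true
        System.out.println(iteratorRemove());    // prints [1, 3]
    }
}
```

Note that this fail-fast detection is a best-effort check rather than a guarantee, so code should not rely on the exception for correctness.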

3, Common methods specific to List

1.add

The full signature of the method is: add(int index, Object element). Collection also has an add() method, but List adds this overload, which inserts an element at a given subscript. Usage is as follows:

public static void main(String[] args) {
    List list=new ArrayList();
    list.add(1);
    list.add(2);
    list.add(3);
    list.add(2,"nihao");
    Iterator it = list.iterator();
    while (it.hasNext()){
        System.out.println(it.next());
    }
}

Attention

1. Since ArrayList is backed by an array, subscripts start from 0 (0, 1, 2, 3...), and the element is inserted at the given subscript.

2. Because ArrayList is backed by an array, inserting in the middle is inefficient (the later elements must be shifted), while looking up an element by subscript is efficient, so this method is not used very much.

2.get

The full signature of the method is: get(int index), which returns the element at the given subscript. Because ArrayList is backed by an array, this lookup is efficient. Usage is as follows:

public static void main(String[] args) {
    List list=new ArrayList();
    list.add(1);
    list.add(2);
    list.add(3);
    list.add(2,"nihao");
    Object o = list.get(3);
    System.out.println(o);
}
3.indexOf

The full signature of the method: indexOf(Object o). It returns the subscript of the first occurrence of the given element. Usage is as follows:

public static void main(String[] args) {
    List list=new ArrayList();
    list.add(1);
    list.add(2);
    list.add(3);
    list.add(2,"nihao");
    int i = list.indexOf("nihao");
    System.out.println(i);
}
4.remove

The full name of the method: remove(int index). You can delete the elements in the collection according to the subscript. The usage method is as follows:

public static void main(String[] args) {
    List list=new ArrayList();
    list.add(1);
    list.add(2);
    list.add(3);
    list.add(2,"nihao");
    list.remove(1);
}
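
One thing to watch for with a list of integers: remove(int index) and the remove(Object o) method inherited from Collection are different overloads. The following sketch contrasts them; the class name RemoveOverloadDemo and the helper names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveOverloadDemo {
    private static List<Integer> sample() {
        List<Integer> list = new ArrayList<>();
        list.add(10);
        list.add(20);
        list.add(30);
        return list;
    }

    // An int argument selects remove(int index): the element at subscript 1 goes.
    static List<Integer> removeByIndex() {
        List<Integer> list = sample();
        list.remove(1);
        return list;
    }

    // An Integer argument selects remove(Object o): the element equal to 10 goes.
    static List<Integer> removeByValue() {
        List<Integer> list = sample();
        list.remove(Integer.valueOf(10));
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeByIndex()); // prints [10, 30]
        System.out.println(removeByValue()); // prints [20, 30]
    }
}
```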
5.set

The full signature of the method: set(int index, Object element). It replaces the element at the given subscript. Usage is as follows:

public static void main(String[] args) {
    List list=new ArrayList();
    list.add(1);
    list.add(2);
    list.add(3);
    list.add(2,"nihao");
    list.set(2,"Hello");
}

4, ArrayList source code analysis

The ArrayList collection is non thread safe.

At the bottom of the ArrayList collection is an array of Object type. The source code is as follows:

// non-private to simplify nested class access
transient Object[] elementData; 

There are two ways to create an ArrayList: with the no-argument constructor, or with a constructor that takes an int and creates an array with that initial capacity. The source code is as follows:

//No-argument constructor
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
//Constructor with a parameter
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+initialCapacity);
    }
}

The following code creates ArrayList collections with and without a parameter:

List list1=new ArrayList();
List list2=new ArrayList(20);
1. Initial capacity

In earlier versions of the JDK, an array with an initial capacity of 10 was created directly.

In JDK 8 and later, an array of length 0 is created first, and the array grows to length 10 when the first element is added. The source code is as follows:

/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

/**
 * Shared empty array instance used for default sized empty instances. We
 * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when first element is added.
 */
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
2. Automatic capacity expansion

Because the bottom layer of ArrayList is array, when the array capacity is insufficient, it needs to be expanded automatically.

In the source code, the growth is the old capacity shifted right by one bit (a bit operation), which is equivalent to half of the old capacity. For example, if the old capacity is 4, the growth is 4 / 2 = 2 and the new capacity is 4 + 2 = 6, a growth multiple of 6 / 4 = 1.5. So the new capacity is 1.5 times the old capacity. The source code is as follows:

[A little background] Bit operation: 10 >> 1

This shifts 10 to the right by one bit. The binary form of 10 is 1010. Shifting everything one place to the right drops the last bit and fills the leftmost position with 0, giving 0101, which is 5.
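
The arithmetic can be tried out with a few lines of code. This is a simplified sketch of the growth rule only: the real ArrayList code also makes sure the new capacity is at least the minimum required and guards against overflow. The class name GrowthDemo and the helper nextCapacity are invented for illustration.

```java
class GrowthDemo {
    // Simplified growth rule: old capacity plus half of it (old >> 1).
    static int nextCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        System.out.println(10 >> 1);          // prints 5
        System.out.println(nextCapacity(4));  // prints 6
        System.out.println(nextCapacity(10)); // prints 15
        System.out.println(nextCapacity(15)); // prints 22 (15 >> 1 is 7, integer division)
    }
}
```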

3. Making it thread safe

ArrayList is not thread safe. How can an ArrayList be made thread safe?

Use the collection utility class java.util.Collections.

The following is the code to convert the ArrayList collection to thread safe:

public static void main(String[] args) {

    //The ArrayList collection is not thread safe
    List list=new ArrayList();
    //synchronizedList returns a thread-safe wrapper, so keep the returned list
    list=Collections.synchronizedList(list);
    list.add(1);
    list.add(2);
    list.add(3);
}
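
To see the effect of the wrapper, the following sketch lets two threads add to the same synchronized list and checks that no element is lost. The class name SynchronizedListDemo and the helper addFromTwoThreads are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListDemo {
    // Two threads each add perThread elements; returns the final size.
    static int addFromTwoThreads(int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                list.add(i); // each add is synchronized by the wrapper
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(addFromTwoThreads(1000)); // prints 2000
    }
}
```

Individual calls such as add are synchronized by the wrapper, but iterating over the returned list still has to be synchronized manually on the list, as the Collections documentation notes.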

5, LinkedList source code analysis

LinkedList is backed by a doubly linked list. This can be seen in the node class that is used when adding elements, where every node keeps a reference to both its next and its previous node:

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}
1. Initialization

When a LinkedList is created with the no-argument constructor, an empty collection is constructed. The following is the relevant source code:

/**
 * Pointer to first node.
 */
//A reference that always points to the first (head) node
transient Node<E> first;

/**
 * Pointer to last node.
 */
//A reference that always points to the tail node
transient Node<E> last;
/**
 * Constructs an empty list.
 */
//No-argument constructor
public LinkedList() {
}
2. Add elements

When an element is added to the collection, a new node is created. The first reference always points to the head node and the last reference always points to the tail node.

The following is the source code for adding elements. Later, we will explain how to add elements according to the source code.

/**
 * Pointer to first node.
 */
transient Node<E> first;//Reference to the first node

/**
 * Pointer to last node.
 */
transient Node<E> last;//Reference to tail node
//Method of adding elements
public boolean add(E e) {
    linkLast(e);
    return true;
}

void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
    else
        l.next = newNode;
    size++;
    modCount++;
}

When the collection has just been created and is empty, both last and first are null. The code analysis diagram for adding the first element is as follows:

When adding elements other than the first to the collection, the code analysis diagram is as follows:

The following is the combination of the two diagrams above:
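
The two cases (adding to an empty list and adding to a non-empty list) can also be followed in a stripped-down list. The sketch below is not the JDK class: MiniLinkedList and its forward and backward helpers are hypothetical, and only the add logic mirrors linkLast. Walking the list in both directions confirms that the next and prev links are both set.

```java
class MiniLinkedList<E> {
    private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E item, Node<E> next) {
            this.item = item;
            this.next = next;
            this.prev = prev;
        }
    }

    private Node<E> first; // head node, null while the list is empty
    private Node<E> last;  // tail node, null while the list is empty

    // Same steps as linkLast: the new node becomes the tail.
    void add(E e) {
        Node<E> oldLast = last;
        Node<E> newNode = new Node<>(oldLast, e, null);
        last = newNode;
        if (oldLast == null) {
            first = newNode;        // case 1: first element, head and tail are the same node
        } else {
            oldLast.next = newNode; // case 2: link the old tail forward to the new node
        }
    }

    // Concatenates the items from head to tail by following next.
    String forward() {
        StringBuilder sb = new StringBuilder();
        for (Node<E> n = first; n != null; n = n.next) {
            sb.append(n.item);
        }
        return sb.toString();
    }

    // Concatenates the items from tail to head by following prev.
    String backward() {
        StringBuilder sb = new StringBuilder();
        for (Node<E> n = last; n != null; n = n.prev) {
            sb.append(n.item);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MiniLinkedList<String> list = new MiniLinkedList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(list.forward());  // prints abc
        System.out.println(list.backward()); // prints cba
    }
}
```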

6, Vector source code analysis

1. Underlying data structure

Like ArrayList, Vector is backed by an array. The following is the source code:

@SuppressWarnings("serial") // Conditionally serializable
protected Object[] elementData;
2. Initial capacity

When a Vector is created with the no-argument constructor, it allocates an array with a capacity (initialCapacity) of 10 and a capacity increment of 0. At this point the number of elements (elementCount) is 0, and it increases automatically each time the add method is used. The comments in the source code explain this. The following is the source code:

/**
 * Constructs an empty vector so that its internal data array
 * has size {@code 10} and its standard capacity increment is
 * zero.
 */
public Vector() {
    this(10);
}

/**
 * Constructs an empty vector with the specified initial capacity and
 * with its capacity increment equal to zero.
 *
 * @param   initialCapacity   the initial capacity of the vector
 * @throws IllegalArgumentException if the specified initial capacity is negative
 */
public Vector(int initialCapacity) {
    this(initialCapacity, 0);
}

/**
 * Constructs an empty vector with the specified initial capacity and
 * capacity increment.
 *
 * @param   initialCapacity     the initial capacity of the vector
 * @param   capacityIncrement   the amount by which the capacity is
 *                              increased when the vector overflows
 * @throws IllegalArgumentException if the specified initial capacity is negative
 */
public Vector(int initialCapacity, int capacityIncrement) {
    super();
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    this.elementData = new Object[initialCapacity];
    this.capacityIncrement = capacityIncrement;
}
3. Automatic capacity expansion

To test automatic capacity expansion, add 11 elements, set a breakpoint on adding the 11th element, and set a breakpoint in the grow() method. It can be concluded that the new capacity is twice the old capacity (when the capacity increment is 0). The following is a screenshot of the debug session:

The following is the source code for adding elements and for automatic capacity expansion:

//Call the add method
public synchronized boolean add(E e) {
    modCount++;
    add(e, elementData, elementCount);
    return true;
}
//The add private method called above
private void add(E e, Object[] elementData, int s) {
    if (s == elementData.length)
        elementData = grow();//Expansion method
    elementData[s] = e;
    elementCount = s + 1;
}
//Capacity expansion method
private Object[] grow() {
    return grow(elementCount + 1);//The parameter is the current total number of elements + 1
}
//The parameter is the current total number of elements + 1
private Object[] grow(int minCapacity) {
    int oldCapacity = elementData.length;
    int newCapacity = ArraysSupport.newLength(oldCapacity,
            minCapacity - oldCapacity, /* minimum growth */
            capacityIncrement > 0 ? capacityIncrement : oldCapacity
                                       /* preferred growth */);
    return elementData = Arrays.copyOf(elementData, newCapacity);
}
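
The doubling can also be checked without a debugger, because Vector exposes its current capacity through the public capacity() method. A minimal sketch follows; the class name VectorGrowthDemo and the helper capacityAfterAdds are made up for illustration.

```java
import java.util.Vector;

class VectorGrowthDemo {
    // Creates a Vector, adds count elements and returns its capacity afterwards.
    static int capacityAfterAdds(int initialCapacity, int capacityIncrement, int count) {
        Vector<Integer> vector = new Vector<>(initialCapacity, capacityIncrement);
        for (int i = 0; i < count; i++) {
            vector.add(i);
        }
        return vector.capacity();
    }

    public static void main(String[] args) {
        // Increment 0 (the default): the capacity doubles when the 11th element arrives.
        System.out.println(capacityAfterAdds(10, 0, 10)); // prints 10
        System.out.println(capacityAfterAdds(10, 0, 11)); // prints 20
        // A positive increment is added instead of doubling.
        System.out.println(capacityAfterAdds(10, 5, 11)); // prints 15
    }
}
```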
4. Thread synchronization

The methods of Vector that read or modify its data are declared synchronized, so Vector is thread safe, but it is less efficient and rarely used.

7, HashSet source code analysis

1.HashSet features

HashSet is unordered and non-repeatable. Unordered means that the order in which elements are taken out may differ from the order in which they were saved. Non-repeatable means that an equal element is saved only once, but saving it again does not report an error.

2. Underlying data structure

When a HashSet is created, a HashMap is actually created underneath. The source code is as follows:

/**
 * Constructs a new, empty set; the backing {@code HashMap} instance has
 * default initial capacity (16) and load factor (0.75).
 */
public HashSet() {
    map = new HashMap<>();
}

Adding an element to a HashSet actually stores it as a key of the HashMap. The source code is as follows:

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
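
The idea of a set that delegates to a map can be reproduced in a few lines. The class below is a hypothetical sketch, not the JDK implementation: MapBackedSet is an invented name, and only the put(e, PRESENT) == null trick is taken from the source shown above.

```java
import java.util.HashMap;
import java.util.Map;

class MapBackedSet<E> {
    // Shared dummy value: only the keys of the map matter.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // put returns the previous value, so null means the key was not there yet.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<Integer> set = new MapBackedSet<>();
        System.out.println(set.add(1)); // prints true
        System.out.println(set.add(1)); // prints false, the key already exists
        System.out.println(set.size()); // prints 1
    }
}
```

Because map keys cannot be repeated, the set gets its non-repeatable behaviour for free.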

8, TreeSet source code analysis

1.TreeSet features

TreeSet also has no subscripts, does not keep insertion order and does not allow duplicates, but the stored elements are sorted by size.

2. Underlying data structure

When creating a TreeSet, a TreeMap is created at the bottom. The source code is as follows:

public TreeSet() {
    this(new TreeMap<>());
}

Adding elements to TreeSet actually adds elements to the key part of TreeMap, and the key part of TreeMap can be sorted according to size. The source code is as follows:

public boolean add(E e) {
    return m.put(e, PRESENT)==null;
}

Posted on Sun, 19 Sep 2021 02:01:05 -0400 by ajcalvert