My JDK source code: String is a special and powerful class!

1, Overview

The String class holds a special place in Java. Although it is not a primitive type, it is used almost everywhere. One reason is that the value of a String object is a constant: once created, it cannot change. We will see why in the source code below. Because of this immutability, String objects are thread safe. Next, let's analyze the source code in depth and uncover the mystery of the String class!

2, Source code

(1) the source code of the class definition is as follows

//String is a final class, so it cannot be subclassed
public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {

We can see that the String class is modified by the keyword final. As mentioned in the previous article, a class modified by final is a final class, which cannot be inherited. It also implements the Serializable, Comparable and CharSequence interfaces. So what is the use of implementing these three interfaces?

* implement the Serializable interface: like Cloneable, Serializable is a marker interface recognized by the JVM. If you want to serialize objects of a class, the class must implement Serializable. Serialization converts an object that implements Serializable into a byte sequence that can be stored and later turned back into the original object. Two typical uses are Java's RMI (remote method invocation) and JavaBeans. RMI, which EJB also relies on, lets objects on a remote machine be manipulated as if they were on the local machine; sending a message to a remote object requires the serialization mechanism to pass the parameters and receive the return value. The state of a Bean is usually configured at design time and must be stored so that it can be restored when the program runs, which again requires serialization. Serialization therefore works not only on the local machine but also across the network, which is a great benefit.

* implement the Comparable interface: this interface imposes a total ordering on the objects of each class that implements it. This ordering is called the natural ordering of the class. String implements the interface's compareTo method; its source code is analyzed below.

* implement the CharSequence interface: String implements the methods declared by CharSequence, such as length() and charAt(); the interface also provides chars(), which returns the characters as an IntStream.
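
As a rough illustration of what these three interfaces give us in practice, the sketch below round-trips a String through Java serialization, sorts strings by their natural ordering, and reads characters through the CharSequence view. The class and method names (StringInterfacesDemo, roundTrip, sortedCopy, countChar) are made up for this example and are not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK.
class StringInterfacesDemo {

    // Serializable: write the string to a byte sequence and read it back.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (String) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Comparable: Arrays.sort without a comparator uses compareTo,
    // i.e. the natural ordering of String.
    static String[] sortedCopy(String... items) {
        String[] copy = items.clone();
        Arrays.sort(copy);
        return copy;
    }

    // CharSequence: chars() exposes the characters as an IntStream.
    static long countChar(CharSequence text, char target) {
        return text.chars().filter(c -> c == target).count();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));                                // hello
        System.out.println(Arrays.toString(sortedCopy("pear", "apple", "fig"))); // [apple, fig, pear]
        System.out.println(countChar("banana", 'a'));                          // 3
    }
}
```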

(2) the member variables are as follows:

    //The characters of the string are stored in a char array (since JDK 9 a byte array is used instead, to save space).
    //The field is final, so it can never point to another array, and the class never modifies or exposes the array, which makes the string immutable
    private final char value[];

    // Caches the hash code computed by hashCode()
    private int hash; // Default to 0

    // Version identifier used for serialization
    private static final long serialVersionUID = -6849794470754667710L;

(3) constructors. We mainly look at the source code of the following five constructors:

    public String() {
        this.value = "".value;
    }

    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
    }

    public String(byte bytes[], String charsetName)
            throws UnsupportedEncodingException {
        this(bytes, 0, bytes.length, charsetName);
    }

Note:

* when we use String str = new String(), we get an empty string "" rather than null.

* in JDK 1.8 and earlier, the String class stores its characters in a char array. When a char array is passed to the constructor, Arrays.copyOf copies its content, so later changes to the caller's array do not affect the string.

* we can also use new String("string".getBytes("iso-8859-1"), "utf-8") to re-decode a string, that is, turn it back into bytes with one charset and decode those bytes with another.
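
The three notes above can be checked with a small sketch. It assumes nothing beyond the public String API; the class name StringConstructorDemo and its methods are invented for illustration. StandardCharsets constants are used instead of charset names only to avoid the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
class StringConstructorDemo {

    // new String() yields an empty string, not null.
    static boolean emptyNotNull() {
        String s = new String();
        return s != null && s.isEmpty();
    }

    // The char[] constructor copies its argument, so later changes
    // to the array do not leak into the string.
    static String copyIsDefensive() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars);
        chars[0] = 'x';
        return s;
    }

    // Text that was wrongly decoded as ISO-8859-1 can be recovered by
    // turning it back into bytes and decoding those bytes as UTF-8.
    static String redecode(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(emptyNotNull());      // true
        System.out.println(copyIsDefensive());   // abc

        String original = "caf\u00e9";
        // Simulate the mistake: UTF-8 bytes decoded with the wrong charset.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(redecode(garbled).equals(original)); // true
    }
}
```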

(4) analysis of important methods. The first is the equals(Object anObject) method. The source code is as follows:

    //Compares contents: first check whether both references point to the same object, then compare the character arrays element by element
    public boolean equals(Object anObject) {
        //Determine whether it is the same reference object
        if (this == anObject) {
            return true;
        }
        //Judge whether it is of String type
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            //Compare the contents only if both strings have the same length
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                //All characters matched, so the strings are equal
                return true;
            }
        }
        return false;
    }

String overrides the equals method, so it must also override hashCode, shown below. Hash codes are used for fast lookup in hash-based collections. For example, HashSet, HashMap and Hashtable first use the hashCode value of an object to find the bucket where it is stored, and only then call equals. Suppose we override equals so that two objects are equal whenever their member variables are equal, but we do not override hashCode. If we then create a new object for which originalObject.equals(newObject) returns true, the hash codes of the two objects will generally still differ. That contradicts what we expect: a hash-based collection such as a HashSet would store two objects that are supposed to be identical, leading to confusion. Therefore, hashCode must be overridden whenever equals is.

Put simply, the goal is to avoid logical conflicts in HashMap or HashSet. For example, you create a Map, create a new student object with identifying attributes such as name, height and weight, and store the student number with map.put(stu, 23). Calling get(stu) then returns the student number. Now you create a second student object with exactly the same attributes. If hashCode is not overridden, the default implementation inherited from Object is used, which is typically derived from the object's identity, so the second object gets a different hash code and get() with it cannot find the student number.
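
The student example can be sketched as follows. The Student class, its fields and the number 23 are placeholders taken from the scenario above, not real API. The sketch shows the corrected version, where equals and hashCode are overridden together, so a lookup with an equal but distinct instance succeeds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class, not part of the JDK.
class StudentHashDemo {

    // Hypothetical value class that overrides equals() AND hashCode().
    static final class Student {
        private final String name;
        private final int heightCm;

        Student(String name, int heightCm) {
            this.name = name;
            this.heightCm = heightCm;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Student)) return false;
            Student other = (Student) o;
            return heightCm == other.heightCm && name.equals(other.name);
        }

        // Equal objects must produce equal hash codes. Without this
        // override, the lookup in findNumber() would normally miss.
        @Override
        public int hashCode() {
            return Objects.hash(name, heightCm);
        }
    }

    // Store a student number under one instance, look it up with another.
    static Integer findNumber() {
        Map<Student, Integer> numbers = new HashMap<>();
        numbers.put(new Student("Li", 170), 23);
        return numbers.get(new Student("Li", 170));
    }

    public static void main(String[] args) {
        System.out.println(findNumber()); // 23
    }
}
```

If the hashCode override is removed, the two Student instances usually land in different buckets and findNumber() returns null, which is exactly the confusion described above.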

(5) the source code of hashCode() method is as follows:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;
            //Unlike Object.hashCode(), a native method whose result is typically derived from the object's identity, the hash here is computed from the characters of the string
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

The constant 31 is used because it is an odd prime, and 31 * i = 32 * i - i = (i << 5) - i. A multiplication by 31 can therefore be replaced by a shift and a subtraction, which can be cheaper than a general multiplication.
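
To make the formula concrete, here is a small sketch that recomputes the hash by hand, once with the multiplication and once with the shift-and-subtract form, and compares both with String.hashCode(). The class and method names are made up for this example.

```java
// Hypothetical demo class, not part of the JDK.
class StringHashDemo {

    // Same polynomial as String.hashCode(): h = 31 * h + c for each char.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Variant that spells out 31 * h as (h << 5) - h.
    static int shiftHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = ((h << 5) - h) + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99:
        // ((97 * 31) + 98) * 31 + 99 = 96354
        System.out.println(polynomialHash("abc"));                    // 96354
        System.out.println(polynomialHash("abc") == "abc".hashCode()); // true
        System.out.println(shiftHash("abc") == "abc".hashCode());      // true
    }
}
```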

(6) the source code of charAt(int index) method is as follows:

    //Returns the character at the given index of the character array
    public char charAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }

(7) compareTo(String anotherString) and compareToIgnoreCase(String str) methods

    public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

The logic is easy to follow. The characters of the two strings are compared position by position. At the first position where they differ, the method returns the difference between the character of this string and the character of the argument string at that position, and the remaining characters are not compared. If no difference is found within the length of the shorter string, the method returns the length of this string minus the length of the argument string.
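
The rule can be replayed with public methods only. The sketch below is a re-implementation for illustration, not the JDK code itself; CompareToDemo and compare are invented names.

```java
// Hypothetical demo class, not part of the JDK.
class CompareToDemo {

    // Re-implementation of the comparison using only public String methods.
    static int compare(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // the first differing position decides
            }
        }
        // One string is a prefix of the other (or they are equal).
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        // First difference at index 2: 'p' (112) - 'r' (114) = -2
        System.out.println(compare("apple", "apricot")); // -2
        // No difference within the shorter string: 3 - 5 = -2
        System.out.println(compare("abc", "abcde"));     // -2
        System.out.println(compare("abc", "abc"));       // 0
    }
}
```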

    public int compareToIgnoreCase(String str) {
        return CASE_INSENSITIVE_ORDER.compare(this, str);
    }

This method behaves like compareTo() except that case is ignored. CASE_INSENSITIVE_ORDER is a comparator defined in the String source code, as follows:

public static final Comparator<String> CASE_INSENSITIVE_ORDER = new CaseInsensitiveComparator();
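
Because CASE_INSENSITIVE_ORDER is a public Comparator, it can also be passed to sorting methods directly. A minimal sketch, with made-up class and method names:

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK.
class IgnoreCaseDemo {

    // Sort a copy of the input with the comparator that
    // compareToIgnoreCase delegates to.
    static String[] sortIgnoringCase(String... items) {
        String[] copy = items.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("Hello".compareToIgnoreCase("hello")); // 0
        // Case-sensitive comparison: 'H' (72) - 'h' (104) = -32
        System.out.println("Hello".compareTo("hello"));           // -32
        // Natural ordering would put "Banana" first, because 'B' < 'a'.
        System.out.println(Arrays.toString(
                sortIgnoringCase("cherry", "Banana", "apple"))); // [apple, Banana, cherry]
    }
}
```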

(8) concat(String str) method

    public String concat(String str) {
        //Length of the string to append
        int otherLen = str.length();
        //If the string to append is empty, return the original string directly
        if (otherLen == 0) {
            return this;
        }
        //Length of the original string
        int len = value.length;
        //Character array of length len + otherLen that will hold the result. Arrays.copyOf copies the original characters into it
        char buf[] = Arrays.copyOf(value, len + otherLen);
        //getChars copies the characters of str into buf starting at index len, and then the new string is returned
        str.getChars(buf, len);
        return new String(buf, true);
    }

The API documentation describes this method as follows:

* this String object is returned if the length of the parameter String is 0.

* otherwise, create a new String object to represent the character sequence formed by connecting the character sequence represented by this String object and the character sequence represented by the parameter String.
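
Both documented cases are easy to observe. In this sketch the class and method names are invented; only String.concat itself is real API.

```java
// Hypothetical demo class, not part of the JDK.
class ConcatDemo {

    // An empty argument returns the very same object (reference equality).
    static boolean emptyArgumentReturnsSameObject(String s) {
        return s.concat("") == s;
    }

    // A non-empty argument produces a new string; the receiver is unchanged.
    static String join(String a, String b) {
        return a.concat(b);
    }

    public static void main(String[] args) {
        String base = "foo";
        System.out.println(emptyArgumentReturnsSameObject(base)); // true
        System.out.println(join(base, "bar"));                    // foobar
        System.out.println(base);                                 // foo
    }
}
```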

(9) indexOf(String str) and indexOf(String str, int fromIndex) methods

    public int indexOf(String str) {
        return indexOf(str, 0);
    }
 
    public int indexOf(String str, int fromIndex) {
        return indexOf(value, 0, value.length,
                str.value, 0, str.value.length, fromIndex);
    }
    //The core implementation is static because indexOf(String str, int fromIndex) of the AbstractStringBuilder class also calls it. A static method that works on plain char arrays can be shared by other classes
    static int indexOf(char[] source, int sourceOffset, int sourceCount,
            char[] target, int targetOffset, int targetCount,
            int fromIndex) {
        if (fromIndex >= sourceCount) {
            return (targetCount == 0 ? sourceCount : -1);
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (targetCount == 0) {
            return fromIndex;
        }
 
        char first = target[targetOffset];
        //Find the maximum position to traverse, because we may not need to traverse all the way to the end
        int max = sourceOffset + (sourceCount - targetCount);
 
        for (int i = sourceOffset + fromIndex; i <= max; i++) {
            /* Look for first character. */
            //Skip ahead to the next position whose character equals the first character of the target, without going past max
            if (source[i] != first) {
                while (++i <= max && source[i] != first);
            }
 
            /* Found first character, now look at the rest of v2 */
            //Check the boundary again. If i is beyond max, the loop ends and -1 is returned
            if (i <= max) {
                int j = i + 1;
                int end = j + targetCount - 1;
                //Compare the remaining characters of the target one by one; the loop stops at the first mismatch
                for (int k = targetOffset + 1; j < end && source[j]
                        == target[k]; j++, k++);
                //If j reached end, every character of the target matched, so the target is found
                if (j == end) {
                    /* Found whole string. */
                    return i - sourceOffset;
                }
            }
        }
        return -1;
    }

This method first finds the first occurrence of the substring's leading character in this string. Starting from the next position, it then compares the remaining characters of the substring with the characters of this string one by one. If all characters are equal, the position of the leading character in this string is returned. If any character differs, the search continues in the rest of the string, repeating the process until the substring is found or -1 is returned.
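
The search described above can be sketched in a simplified form without the offset parameters. This is a teaching version written for this article, not the JDK implementation; IndexOfDemo and its methods are made-up names.

```java
// Hypothetical demo class, not part of the JDK.
class IndexOfDemo {

    // Simplified search over whole char arrays (no source/target offsets).
    static int indexOf(char[] source, char[] target, int fromIndex) {
        if (fromIndex >= source.length) {
            return target.length == 0 ? source.length : -1;
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (target.length == 0) {
            return fromIndex;
        }

        char first = target[0];
        int max = source.length - target.length; // last possible start position
        for (int i = fromIndex; i <= max; i++) {
            if (source[i] != first) {
                continue; // keep scanning for the leading character
            }
            // Leading character found: compare the rest of the target.
            int k = 1;
            while (k < target.length && source[i + k] == target[k]) {
                k++;
            }
            if (k == target.length) {
                return i; // every character matched
            }
        }
        return -1;
    }

    // Convenience wrapper for strings.
    static int indexOf(String source, String target, int fromIndex) {
        return indexOf(source.toCharArray(), target.toCharArray(), fromIndex);
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello world", "o w", 0)); // 4
        System.out.println(indexOf("aaab", "aab", 0));        // 1
        System.out.println(indexOf("abc", "d", 0));           // -1
    }
}
```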

(10) split(String regex) and split(String regex, int limit) methods, which split by regular expression

    public String[] split(String regex) {
        return split(regex, 0);
    }
    public String[] split(String regex, int limit) {
        char ch = 0;
        //Fast path, taken when the regex is either:
        //1. a single character that is not one of the regex metacharacters ".$|()[{^?*+\\", or
        //2. two characters, where the first is a backslash and the second is not an ASCII digit or letter. ((ch-'0')|('9'-ch)) < 0 holds exactly when ch is less than '0' or greater than '9'; the same trick is used for 'a'..'z' and 'A'..'Z'
        //3. In both cases ch must also not be a UTF-16 surrogate, i.e. it must lie outside the range MIN_HIGH_SURROGATE..MAX_LOW_SURROGATE
        if (((regex.value.length == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            //1. If limit > 0, the string is cut at most limit - 1 times, so the array length is at most limit. With limit 1 the string is cut 0 times and the array contains only the original string
            //2. If limit < 0, the string is cut as many times as possible and the array can have any length; trailing empty strings are kept
            //3. If limit == 0, the string is cut as many times as possible and all trailing empty strings are removed from the array
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add the remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Build the result. If limit is 0, drop trailing empty strings first
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        //In all other cases split() delegates to Pattern, the class that implements regular expression matching
        return Pattern.compile(regex).split(this, limit);
    }

The Pattern class represents a compiled regular expression. Its constructor is private, so it cannot be instantiated directly; instead, a Pattern is created through the static factory method Pattern.compile(String regex).
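
The effect of the limit parameter is easiest to see on an input with trailing separators. The sketch below uses only String.split and Pattern.compile; the class name SplitDemo and its helper methods are invented for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK.
class SplitDemo {

    // Thin wrapper so the three limit cases can be compared side by side.
    static String[] split(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    // The same split done through an explicitly compiled Pattern.
    static String[] splitWithPattern(String input, String regex, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        String csv = "a,b,c,,";
        // limit == 0: trailing empty strings are removed
        System.out.println(Arrays.toString(split(csv, ",", 0)));  // [a, b, c]
        // limit < 0: trailing empty strings are kept
        System.out.println(split(csv, ",", -1).length);           // 5
        // limit > 0: at most limit - 1 cuts
        System.out.println(Arrays.toString(split(csv, ",", 2)));  // [a, b,c,,]
        // A multi-character regex does not qualify for the fast path
        System.out.println(Arrays.toString(
                splitWithPattern("x  y z", "\\s+", 0)));          // [x, y, z]
    }
}
```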

(11) replace(char oldChar, char newChar) method

    public String replace(char oldChar, char newChar) {
        //Determine whether the replacement character and the replaced character are the same
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            //Local reference to the character array of the source string
            char[] val = value; /* avoid getfield opcode */

            while (++i < len) {
                //Find the position of the first occurrence of oldChar
                if (val[i] == oldChar) {
                    break;
                }
            }
            //i < len means oldChar occurs in the source string
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    //Copy the characters before the first occurrence of oldChar unchanged
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    //From here on, replace every oldChar with newChar and copy all other characters as they are
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                //Create the new String from the buffer
                return new String(buf, true);
            }
        }
        return this;
    }
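
A short sketch of the two outcomes: a new string when oldChar occurs, and the same object when it does not. ReplaceDemo and its methods are made-up names; only String.replace is real API.

```java
// Hypothetical demo class, not part of the JDK.
class ReplaceDemo {

    // Replaces every occurrence of oldChar and returns the result.
    static String swap(String s, char oldChar, char newChar) {
        return s.replace(oldChar, newChar);
    }

    // When oldChar does not occur, the same String object is returned.
    static boolean returnsSameObjectWhenAbsent(String s, char absent, char newChar) {
        return s.replace(absent, newChar) == s;
    }

    public static void main(String[] args) {
        String word = "hello";
        System.out.println(swap(word, 'l', 'L'));                        // heLLo
        System.out.println(word);                                        // hello
        System.out.println(returnsSameObjectWhenAbsent(word, 'x', 'y')); // true
    }
}
```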

(12) substring(int beginIndex) and substring(int beginIndex, int endIndex) methods

    public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }
    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        //endIndex cannot be greater than the string length of the array;
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        //beginIndex <= endIndex and endIndex <= length() must hold, otherwise a StringIndexOutOfBoundsException is thrown
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        //When returning a string, include the element at the beginIndex position, but not the element at the endIndex position
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

We can see that the second method performs the same checks as the first and adds a check on endIndex. If the requested range is not the whole string, a new String object is returned. The returned string includes the character at beginIndex but not the character at endIndex.
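
The index rules can be tried out with a few calls. This is only a usage sketch; SubstringDemo and its helper methods are invented names.

```java
// Hypothetical demo class, not part of the JDK.
class SubstringDemo {

    // beginIndex is inclusive, endIndex is exclusive.
    static String slice(String s, int beginIndex, int endIndex) {
        return s.substring(beginIndex, endIndex);
    }

    // Reports whether the given range is rejected by substring().
    static boolean throwsOutOfBounds(String s, int beginIndex, int endIndex) {
        try {
            s.substring(beginIndex, endIndex);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String word = "hello";
        System.out.println(slice(word, 1, 3));              // el
        System.out.println(word.substring(2));              // llo
        System.out.println(throwsOutOfBounds(word, 3, 2));  // true (begin > end)
        System.out.println(throwsOutOfBounds(word, 0, 6));  // true (end > length)
        // With the source shown above, the full range returns the same object.
        System.out.println(word.substring(0, word.length()) == word);
    }
}
```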

(13) intern() method

    public native String intern();

This native method checks whether the string pool already contains a string equal to this one. If it does, the string from the pool is returned. If it does not, this String object is added to the pool and a reference to it is returned. Since JDK 7 the pool stores a reference to the object in the heap rather than a copy of it.
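
The observable effect of intern() is that equal strings end up sharing one object. A minimal sketch with invented class and method names:

```java
// Hypothetical demo class, not part of the JDK.
class InternDemo {

    // A string created with new is a distinct object from the literal.
    static boolean sameObjectBeforeIntern() {
        String literal = "hello";
        String created = new String("hello");
        return literal == created;
    }

    // intern() returns the pooled instance, which is the literal itself.
    static boolean sameObjectAfterIntern() {
        String literal = "hello";
        String created = new String("hello");
        return literal == created.intern();
    }

    public static void main(String[] args) {
        System.out.println(sameObjectBeforeIntern()); // false
        System.out.println(sameObjectAfterIntern());  // true
    }
}
```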

3, Summary

Please look forward to my jdk source code (3): AbstractStringBuilder

Posted on Wed, 13 May 2020 15:20:55 -0400 by arion279