Understanding the String Constant Pool from the Bytecode Perspective

All experiments below were run on JDK 1.8.

I used to have only a vague understanding of the string constant pool, so I read a few articles and ran some experiments of my own. This article mainly addresses two questions: "How many objects does a piece of code create?" and "Do two string variables point to the same address?".

First, String is an immutable type. Much like the wrapper types keep a cache (Integer, for example, caches the values in [-128, 127]), String has its own constant pool. The difference is in when the entries appear: the Integer cache is filled all at once when the IntegerCache class is initialized (source below), whereas strings are added to the string pool as they are created.

    private static class IntegerCache {
        static final int low = -128;
        static final int high;
        static final Integer cache[];

        static {
            // high value may be configured by property
            int h = 127;
            String integerCacheHighPropValue =
                sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
            if (integerCacheHighPropValue != null) {
                try {
                    int i = parseInt(integerCacheHighPropValue);
                    i = Math.max(i, 127);
                    // Maximum array size is Integer.MAX_VALUE
                    h = Math.min(i, Integer.MAX_VALUE - (-low) -1);
                } catch( NumberFormatException nfe) {
                    // If the property cannot be parsed into an int, ignore it.
                }
            }
            high = h;

            cache = new Integer[(high - low) + 1];
            int j = low;
            for(int k = 0; k < cache.length; k++)
                cache[k] = new Integer(j++);

            // range [-128, 127] must be interned (JLS7 5.1.7)
            assert IntegerCache.high >= 127;
        }

        private IntegerCache() {}
    }
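To see the effect of this cache, here is a minimal sketch that compares boxed values by reference. The class name IntegerCacheDemo and the helper sameBoxedObject are made up for illustration. Values inside [-128, 127] always come from the cache; 128 normally does not, although that depends on the java.lang.Integer.IntegerCache.high property read in the source above.

```java
class IntegerCacheDemo {
    // Hypothetical helper: true when boxing the same int twice yields one object.
    static boolean sameBoxedObject(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // reference comparison on purpose
    }

    public static void main(String[] args) {
        // Inside the guaranteed cache range: both references point to the cached object.
        System.out.println(sameBoxedObject(127));
        // Outside the default range: usually two distinct objects,
        // unless the upper bound of the cache was raised.
        System.out.println(sameBoxedObject(128));
    }
}
```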

There are two ways to create strings:

        String s1 = "abc"; // 1

        String s2 = new String("abc"); // 2

This is compiled as follows:

 0 ldc #2 <abc>
 2 astore_1
 
 3 new #3 <java/lang/String>
 6 dup
 7 ldc #2 <abc>
 9 invokespecial #4 <java/lang/String.<init>>
12 astore_2

Here ldc simply pushes an item from the run-time constant pool onto the operand stack.

You can see that statement 1 makes s1 point directly at "abc" in the constant pool. Statement 2 first allocates a String object, then loads "abc" from the constant pool, and then runs the <init> constructor with it. Both statements use the same "abc" string constant: s1 refers to the pooled string itself, while s2 refers to a separate String object on the heap that was built from it.
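The difference between the two references can be checked with ==. The following is a minimal sketch; the class name LiteralVsNewDemo and the method compare are made up for illustration.

```java
import java.util.Arrays;

class LiteralVsNewDemo {
    // Returns the results of three comparisons between a literal and a new String.
    static boolean[] compare() {
        String s1 = "abc";              // ldc: reference to the pooled string
        String s2 = new String("abc");  // new + ldc + <init>: a separate heap object
        return new boolean[] {
            s1 == s2,          // different objects
            s1.equals(s2),     // same characters
            s1 == s2.intern()  // intern() returns the pooled "abc"
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // prints [false, true, true]
    }
}
```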

The following two statements look similar in source but compile very differently:

        String s3 = new String(s1 + "def"); // 3
        
        String s4 = new String("abc" + "def"); // 4

Compiled as follows:

 13 new #3 <java/lang/String>
 16 dup
 17 new #6 <java/lang/StringBuilder>
 20 dup
 21 invokespecial #7 <java/lang/StringBuilder.<init>>
 24 aload_1  // loads local variable 1, i.e. s1, which astore_1 stored earlier
 25 invokevirtual #8 <java/lang/StringBuilder.append>
 28 ldc #4 <def>
 30 invokevirtual #8 <java/lang/StringBuilder.append>
 33 invokevirtual #9 <java/lang/StringBuilder.toString>
 36 invokespecial #5 <java/lang/String.<init>>
 39 astore_3
 
 40 new #3 <java/lang/String>
 43 dup
 44 ldc #10 <abcdef>
 46 invokespecial #5 <java/lang/String.<init>>
 49 astore 4

The bytecode reads this way because the JVM uses a stack-based instruction set: operands are pushed onto an operand stack and instructions pop them off again. The alternative design is a register-based instruction set.
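To make the stack-based style concrete, here is a toy simulation of the concatenation part of statement 3. It is only a sketch: the class OperandStackSketch, its Deque-based stack and its locals array are invented for illustration and are not how the JVM is implemented. The new/dup/invokespecial sequence is collapsed into a single step, and the outer new String is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class OperandStackSketch {
    static String run() {
        Deque<Object> stack = new ArrayDeque<>(); // toy operand stack
        Object[] locals = new Object[4];          // toy local variable table
        locals[1] = "abc";                        // s1, as stored by astore_1

        stack.push(new StringBuilder());          // new + dup + invokespecial <init>, collapsed
        stack.push(locals[1]);                    // aload_1
        append(stack);                            // invokevirtual append
        stack.push("def");                        // ldc <def>
        append(stack);                            // invokevirtual append
        stack.push(stack.pop().toString());       // invokevirtual toString
        locals[3] = stack.pop();                  // astore_3
        return (String) locals[3];
    }

    // Pops the argument and the receiver, then pushes the result,
    // which is the stack effect of invokevirtual on append.
    private static void append(Deque<Object> stack) {
        String arg = (String) stack.pop();
        StringBuilder receiver = (StringBuilder) stack.pop();
        stack.push(receiver.append(arg));
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints abcdef
    }
}
```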

As the listings show, statements 3 and 4 compile very differently. The main reason is that the compiler folds constant expressions when it generates bytecode: new String("abc" + "def") in statement 4 becomes new String("abcdef"), so the bytecode loads the single string constant "abcdef" directly from the constant pool. In statement 3, s1 is an ordinary variable, so the concatenation has to happen at run time through a StringBuilder. Statement 4 is therefore no different from statement 2: a string constant is created in the constant pool, and a new String object is built from it.
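The compile-time folding can also be observed without reading bytecode. The sketch below (class and method names are made up for illustration) compares a literal with three concatenations; only the ones made entirely of compile-time constants end up as the same pooled object.

```java
import java.util.Arrays;

class ConstantFoldingDemo {
    static boolean[] compare() {
        String literal = "abcdef";

        String folded = "abc" + "def";     // constant expression, folded by the compiler

        String s1 = "abc";
        String runtime = s1 + "def";       // s1 is not final: concatenated at run time

        final String f1 = "abc";
        String foldedFinal = f1 + "def";   // f1 is a constant variable, so this is folded too

        return new boolean[] {
            literal == folded,
            literal == runtime,
            literal == foldedFinal
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // prints [true, false, true]
    }
}
```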

Statement 3 illustrates the question "How many objects does a piece of code create?" It creates up to four objects: the StringBuilder, the String returned by StringBuilder.toString(), the outer String, and the "def" constant. (This count ignores the internal char[] arrays, and s1, which already exists.) The exact number depends on the state of the constant pool: if "def" is already in the pool, it is not created again, which leaves three.
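Written out by hand, the bytecode for statement 3 corresponds roughly to the byHand method below, which makes each allocation visible. This is a sketch of what the JDK 1.8 compiler output amounts to, not its literal output; newer compilers (JDK 9 and later) may translate string concatenation differently. The class and method names are made up for illustration.

```java
class ConcatDesugarDemo {
    // Statement 3 as written in the source.
    static String viaOperator(String s1) {
        return new String(s1 + "def");
    }

    // Roughly what the JDK 1.8 bytecode above does, one step per line.
    static String byHand(String s1) {
        StringBuilder sb = new StringBuilder(); // a new StringBuilder
        sb.append(s1);                          // s1 already exists
        sb.append("def");                       // "def" comes from the constant pool
        String joined = sb.toString();          // a new String built by toString()
        return new String(joined);              // the outer String
    }

    public static void main(String[] args) {
        System.out.println(viaOperator("abc")); // prints abcdef
        System.out.println(byHand("abc"));      // prints abcdef
    }
}
```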

Next comes another, more complex, analysis:

        String s5 = new String(new String("aa") + new String("bb")); // 5

Compiled as follows:

 51 new #3 <java/lang/String>
 54 dup
 55 new #6 <java/lang/StringBuilder>
 58 dup
 59 invokespecial #7 <java/lang/StringBuilder.<init>>
 62 new #3 <java/lang/String>
 65 dup
 66 ldc #11 <aa>
 68 invokespecial #5 <java/lang/String.<init>>
 71 invokevirtual #8 <java/lang/StringBuilder.append>
 74 new #3 <java/lang/String>
 77 dup
 78 ldc #12 <bb>
 80 invokespecial #5 <java/lang/String.<init>>
 83 invokevirtual #8 <java/lang/StringBuilder.append>
 86 invokevirtual #9 <java/lang/StringBuilder.toString>
 89 invokespecial #5 <java/lang/String.<init>>
 92 astore 5

Here the outermost String object is allocated first, followed by a StringBuilder object, then the two inner String objects together with the constants "aa" and "bb", and finally the String returned by toString(): seven objects in total, not counting internal char[] arrays. That is the maximum. If "aa" and "bb" are already in the constant pool, only five objects are created.
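One object that is easy to overlook when counting is the String produced by StringBuilder.toString(), which the bytecode calls at offset 86. The sketch below (names are made up for illustration) uses reference comparisons to show that the inner strings, the toString() result and the outer String are all distinct objects.

```java
import java.util.Arrays;

class HiddenStringDemo {
    static boolean[] check() {
        String aa = new String("aa");
        String bb = new String("bb");
        StringBuilder sb = new StringBuilder().append(aa).append(bb);
        String first = sb.toString();
        String second = sb.toString();
        String s5 = new String(first);
        return new boolean[] {
            aa == "aa",        // new String is distinct from the pooled literal
            first == second,   // each toString() call allocates a new String
            s5 == first,       // the outer String is yet another object
            s5.equals("aabb")  // the contents are what we expect
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check())); // prints [false, false, false, true]
    }
}
```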

The rest of the article addresses the second question: "Do two string variables point to the same address?"

The answer depends on what happens when intern() is called, and there are two situations. First, if an equal string already exists in the constant pool, a call such as String s6 = s4.intern(); returns a reference to that pooled constant. Second, if no equal string exists in the pool, an entry is added to the pool, but in JDK 1.8 that entry is simply a reference to the heap object on which intern() was called.

There are two main situations:

  1. First case
        //String s1 = "abc";
        //String s3 = new String(s1 + "def");
        String s4 = new String("abc" + "def");
        String s6 = s4.intern();
        String s5 = "abcdef";
        System.out.println(s4 == s5 ); // false


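Here s4 and s5 are clearly different objects: the constant "abcdef" is already in the pool by the time s4 is constructed, so intern() returns that pooled constant (s6 == s5 would be true), while s4 remains a separate object on the heap.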

  2. Second case
        String s1 = "abc";
        String s3 = new String(s1 + "def");
        //String s4 = new String("abc" + "def");
        s3.intern();
        String s5 = "abcdef";
        System.out.println(s3 == s5 ); // true

When s3 is created here, no "abcdef" constant is created in the constant pool, because the concatenation happens at run time. The intern() call therefore only records in the pool a reference to the s3 object on the heap. When s5 is assigned afterwards, the literal resolves to that same heap object, so s3 == s5 returns true.
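Both cases can be put side by side in one runnable sketch. The class InternDemo and its method names are made up for illustration. For the second case the sketch appends a random UUID, which is an assumption added here so that no equal string can already be in the pool; the result described in the comment is what is expected on JDK 7 and later, where intern() pools a reference to the existing heap object.

```java
import java.util.UUID;

class InternDemo {
    // Case 1: an equal string is already pooled, so intern() returns the pooled one.
    static boolean[] alreadyPooled() {
        String s4 = new String("abc" + "def"); // the folded "abcdef" constant is pooled here
        String s6 = s4.intern();
        String s5 = "abcdef";
        return new boolean[] { s4 == s5, s6 == s5 };
    }

    // Case 2: nothing equal is pooled yet, so intern() pools the caller itself.
    static boolean notYetPooled() {
        // Random suffix (illustrative assumption) so the string cannot be in the pool already.
        String built = new StringBuilder("abcdef-").append(UUID.randomUUID()).toString();
        String interned = built.intern();
        return built == interned; // expected to be true on JDK 7 and later
    }

    public static void main(String[] args) {
        boolean[] first = alreadyPooled();
        System.out.println(first[0] + " " + first[1]); // prints false true
        System.out.println(notYetPooled());
    }
}
```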

Tags: Java Interview string

Posted on Tue, 21 Sep 2021 14:43:38 -0400 by SilveR316