JVM (7): Runtime optimization

Just-in-time compilation

Tiered compilation

(TieredCompilation)

Let's start with an example

public class JIT1 { 
  public static void main(String[] args) { 
    for (int i = 0; i < 200; i++) { 
      long start = System.nanoTime(); 
      for (int j = 0; j < 1000; j++) { 
        new Object(); 
      }
      long end = System.nanoTime(); 
      System.out.printf("%d\t%d\n",i,(end - start)); 
    } 
  } 
}

Results (partial screenshot, not reproduced): the time per iteration decreases gradually as the loop runs, and later becomes faster still.

Why?

The JVM divides execution into five tiers:

  • Tier 0: interpreted execution (Interpreter)
  • Tier 1: compiled by the C1 just-in-time compiler and executed (no profiling)
  • Tier 2: compiled by the C1 just-in-time compiler and executed (basic profiling)
  • Tier 3: compiled by the C1 just-in-time compiler and executed (full profiling)
  • Tier 4: compiled by the C2 just-in-time compiler and executed

Profiling means collecting data about the program's execution while it runs, such as the number of method invocations and the number of loop back-edges.

The difference between the just-in-time (JIT) compiler and the interpreter

  • The interpreter translates bytecode as it executes it. If the same bytecode is encountered again, it is interpreted again from scratch
  • The JIT compiles selected bytecode into machine code and stores it in the Code Cache. When the same code is encountered again, the cached machine code is executed directly, with no further compilation
  • The interpreter treats bytecode generically, in the same way on every platform, and applies no platform-specific optimization
  • The JIT generates machine code specific to the platform it is running on

Most code runs infrequently, so it is not worth spending time compiling it to machine code; it is simply interpreted. Only a small part of the code is hot, and that part is compiled to machine code to reach the desired speed. In terms of execution efficiency, roughly: Interpreter < C1 < C2. The overall goal is to find the hot spots (hence the name HotSpot) and optimize them.
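
One way to see that the JIT is active while hot code runs is to read the cumulative compilation time that the JVM reports through the standard java.lang.management API. The following is a minimal sketch, not part of the original example: the class name JitActivity and the methods totalCompilationMillis and hotLoop are made up for illustration, and the reported times depend on the JVM and its flags.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitActivity {
    // Cumulative JIT compilation time in milliseconds,
    // or -1 if this JVM does not report it.
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    // A deliberately hot method (hypothetical workload):
    // it is called often enough to become a compilation candidate.
    static long hotLoop(int rounds) {
        long acc = 0;
        for (int i = 0; i < rounds; i++) {
            acc += i % 7;
        }
        return acc;
    }

    public static void main(String[] args) {
        long before = totalCompilationMillis();
        long result = 0;
        for (int i = 0; i < 20_000; i++) {
            result += hotLoop(1_000);
        }
        long after = totalCompilationMillis();
        System.out.println("result = " + result);
        System.out.println("JIT time before: " + before + " ms, after: " + after + " ms");
    }
}
```

If the second number is larger than the first, the JVM spent time compiling while the loop was running. The sketch does not show which tier was used; -XX:+PrintCompilation gives that level of detail.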

One of the optimizations at work in the example above is escape analysis, which determines whether a newly created object escapes. You can turn escape analysis off with -XX:-DoEscapeAnalysis and rerun the example to observe the difference.

Execution flow of the example above

  • At first there is no optimization, so each iteration takes a long time
  • Then, because some code is executed repeatedly, the C1 just-in-time compiler compiles it to machine code. Later executions run that machine code directly with no further compilation, so they are faster
  • Finally, escape analysis shows that the new Object created in the for loop is never referenced, so the C2 just-in-time compiler generates code that omits the allocation altogether. The time then drops to about 800 ns
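
To make "escape" concrete, the sketch below contrasts an object that never leaves the method that creates it with one that is published through a static field. The class EscapeDemo, the Point type, and the field leaked are hypothetical names used only for illustration. Whether the JIT actually removes the first allocation depends on the JVM version, the flags, and inlining decisions.

```java
class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Hypothetical sink: storing a reference here makes the object escape.
    static Point leaked;

    // The Point never leaves this method, so escape analysis may let the
    // compiler avoid the heap allocation.
    static int noEscape(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // The Point is stored in a static field, so it escapes and must be
    // allocated on the heap.
    static int escapes(int a, int b) {
        Point p = new Point(a, b);
        leaked = p;
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += noEscape(i, 1) + escapes(i, 2);
        }
        System.out.println(sum);
    }
}
```

Both methods return the same value; the difference is only in what the compiler is allowed to do with the allocation.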

Reference: https://docs.oracle.com/en/java/javase/12/vm/java-hotspot-virtual-machine-performance-enhancements.html#GUID-D2E3DC58-D18B-4A6C-8167-4A1DFB4888E4

Method Inlining

(Inlining)

private static int square(final int i) { 
  return i * i; 
}
System.out.println(square(9));

If square is found to be a hot method and is not too long, it will be inlined. Inlining means copying the body of the method into the call site:

System.out.println(9 * 9);

Constant folding can then be applied as well:

System.out.println(81);

Experiment

public class JIT2 { 
  // -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining  prints inlining information
  // -XX:CompileCommand=dontinline,*JIT2.square  disables inlining of a method
  // -XX:+PrintCompilation  prints compilation information
  public static void main(String[] args) { 
    int x = 0; 
    for (int i = 0; i < 500; i++) { 
      long start = System.nanoTime(); 
      for (int j = 0; j < 1000; j++) { 
        x = square(9); 
      }
      long end = System.nanoTime(); 
      System.out.printf("%d\t%d\t%d\n",i,x,(end - start)); 
    } 
  }
  
  private static int square(final int i) { 
    return i * i; 
  } 
}
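
When experimenting with flags like these, it can help to confirm from inside the program which options the JVM was actually started with. The sketch below uses the standard RuntimeMXBean.getInputArguments() call; the class name JvmFlags and the helper hasFlag are hypothetical, and the check is a simple prefix match, not a parser for JVM options.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmFlags {
    // Returns true if any JVM input argument starts with the given prefix.
    static boolean hasFlag(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("PrintInlining requested: " + hasFlag(jvmArgs, "-XX:+PrintInlining"));
        System.out.println("CompileCommand given: " + hasFlag(jvmArgs, "-XX:CompileCommand="));
    }
}
```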

Results (screenshots not reproduced): initial run, after just-in-time compilation, and after inlining.

Field optimization

For JMH benchmarking, see: http://openjdk.java.net/projects/code-tools/jmh/

Create a Maven project and add the following dependencies:

<dependency> 
  <groupId>org.openjdk.jmh</groupId> 
  <artifactId>jmh-core</artifactId> 
  <version>${jmh.version}</version> 
</dependency> 
<dependency> 
  <groupId>org.openjdk.jmh</groupId> 
  <artifactId>jmh-generator-annprocess</artifactId> 
  <version>${jmh.version}</version> 
  <scope>provided</scope> 
</dependency>

Write the benchmark code:

package test;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner; 
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options; 
import org.openjdk.jmh.runner.options.OptionsBuilder; 
import java.util.Random; 
import java.util.concurrent.ThreadLocalRandom; 

@Warmup(iterations = 2, time = 1) 
@Measurement(iterations = 5, time = 1) 
@State(Scope.Benchmark) 
public class Benchmark1 { 
  int[] elements = randomInts(1_000); 
  
  private static int[] randomInts(int size) { 
    Random random = ThreadLocalRandom.current(); 
    int[] values = new int[size]; 
    for (int i = 0; i < size; i++) { 
      values[i] = random.nextInt(); 
    }
    return values; 
  }
  
  @Benchmark 
  public void test1() { 
    for (int i = 0; i < elements.length; i++) { 
      doSum(elements[i]); 
    } 
  }
  
  @Benchmark 
  public void test2() { 
    int[] local = this.elements; 
    for (int i = 0; i < local.length; i++) {
      doSum(local[i]); 
    } 
  }
  
  @Benchmark 
  public void test3() { 
    for (int element : elements) { 
      doSum(element); 
    } 
  }
  
  static int sum = 0; 
  
  @CompilerControl(CompilerControl.Mode.INLINE) 
  static void doSum(int x) { 
    sum += x; 
  }
  
  public static void main(String[] args) throws RunnerException { 
    Options opt = new OptionsBuilder() 
      .include(Benchmark1.class.getSimpleName()) 
      .forks(1) 
      .build(); 
    new Runner(opt).run();
  } 
}

First, enable inlining of doSum. The results are as follows (throughput in operations per second; higher is better):

Benchmark            Mode   Samples  Score        Score error  Units
t.Benchmark1.test1   thrpt  5        2420286.539  390747.467   ops/s
t.Benchmark1.test2   thrpt  5        2544313.594  91304.136    ops/s
t.Benchmark1.test3   thrpt  5        2469176.697  450570.647   ops/s

Next, disable inlining of doSum:

@CompilerControl(CompilerControl.Mode.DONT_INLINE) 
static void doSum(int x) { 
  sum += x; 
}

The results are as follows:

Benchmark            Mode   Samples  Score       Score error  Units
t.Benchmark1.test1   thrpt  5        296141.478  63649.220    ops/s
t.Benchmark1.test2   thrpt  5        371262.351  83890.984    ops/s
t.Benchmark1.test3   thrpt  5        368960.847  60163.391    ops/s

Analysis:

In this example, whether doSum is inlined affects how reads of the elements field are optimized.

If doSum is inlined, test1 is optimized roughly as follows (pseudocode):

@Benchmark
public void test1() {
  // elements is read from the field once and cached in a local -> int[] local
  for (int i = 0; i < elements.length; i++) {
    // the remaining 999 reads of length come from local
    sum += elements[i];
    // all 1000 reads of the element at index i come from local
  }
}

This saves 1999 field reads.

However, if doSum is not inlined, this optimization is not performed.

Exercise: with inlining enabled, add the volatile modifier to elements and observe the results.
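
The effect of the field-read optimization can be written out by hand. The sketch below is a hand-written equivalent for illustration, not what the JIT literally emits: sumViaField reads the field in the loop condition and in the loop body, while sumViaLocal reads it once and then works on locals only, as test2 in the benchmark does. The class name FieldHoisting and the sample data are made up.

```java
class FieldHoisting {
    int[] elements = {3, 1, 4, 1, 5};

    // As written in source: the field is read for every length check
    // and for every element access.
    long sumViaField() {
        long sum = 0;
        for (int i = 0; i < elements.length; i++) {
            sum += elements[i];
        }
        return sum;
    }

    // Hand-hoisted version: the field is read once, and the loop then
    // uses only the local array reference and the local length.
    long sumViaLocal() {
        int[] local = this.elements;
        int length = local.length;
        long sum = 0;
        for (int i = 0; i < length; i++) {
            sum += local[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        FieldHoisting demo = new FieldHoisting();
        System.out.println(demo.sumViaField() + " " + demo.sumViaLocal()); // prints "14 14"
    }
}
```

Both methods compute the same sum; they differ only in how often the field is read.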

Reflection optimization

package cn.itcast.jvm.t3.reflect; 
import java.io.IOException; 
import java.lang.reflect.InvocationTargetException; 
import java.lang.reflect.Method; 

public class Reflect1 { 
  public static void foo() { 
    System.out.println("foo..."); 
  }
  
  public static void main(String[] args) throws Exception { 
    Method foo = Reflect1.class.getMethod("foo"); 
    for (int i = 0; i <= 16; i++) { 
      System.out.printf("%d\t", i); 
      foo.invoke(null); 
    }
    System.in.read(); 
  } 
}

The first 16 calls to foo.invoke (i = 0 to 15) use the NativeMethodAccessorImpl implementation of MethodAccessor:

package sun.reflect; 
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method; 
import sun.reflect.misc.ReflectUtil; 

class NativeMethodAccessorImpl extends MethodAccessorImpl { 
  private final Method method; 
  private DelegatingMethodAccessorImpl parent; 
  private int numInvocations; 
  
  NativeMethodAccessorImpl(Method method) { 
    this.method = method; 
  }
  
  public Object invoke(Object target, Object[] args) throws IllegalArgumentException, InvocationTargetException { 
    // inflationThreshold: the inflation threshold, 15 by default
    if (++this.numInvocations > ReflectionFactory.inflationThreshold() && !ReflectUtil.isVMAnonymousClass(this.method.getDeclaringClass())) { 
      // Replace the native implementation with a new one generated dynamically at runtime, which is about 20 times faster than the native implementation
      MethodAccessorImpl generatedMethodAccessor = (MethodAccessorImpl) (new MethodAccessorGenerator()) 
        .generateMethod( 
        this.method.getDeclaringClass(),
        this.method.getName(),
        this.method.getParameterTypes(), 
        this.method.getReturnType(), 
        this.method.getExceptionTypes(), 
        this.method.getModifiers() ); 
      this.parent.setDelegate(generatedMethodAccessor); 
    }
    
    // Call the native implementation
    return invoke0(this.method, target, args); 
  }
  
  void setParent(DelegatingMethodAccessorImpl parent) { 
    this.parent = parent; 
  }
  
  private static native Object invoke0(Method method, Object target, Object[] args); 
  
}

On call number 16 (counting from 0), the original implementation has been replaced by a class generated at runtime. In a debugger you can see that its class name is sun.reflect.GeneratedMethodAccessor1.
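
The switch-over logic can be modelled in a few lines. The sketch below is a toy model, not the JDK's code: Accessor, Delegating, SlowAccessor, and newAccessor are hypothetical names, and the strings "native" and "generated" stand in for the two implementations. It mirrors only the counting and delegate replacement shown in the source above.

```java
class InflationSketch {
    interface Accessor {
        String invoke();
    }

    // Stands in for DelegatingMethodAccessorImpl: forwards to the current delegate.
    static final class Delegating implements Accessor {
        Accessor delegate;

        public String invoke() {
            return delegate.invoke();
        }
    }

    // Stands in for NativeMethodAccessorImpl: counts calls and, once the
    // threshold is exceeded, installs a faster delegate in its parent.
    static final class SlowAccessor implements Accessor {
        private final Delegating parent;
        private final int threshold;
        private int numInvocations;

        SlowAccessor(Delegating parent, int threshold) {
            this.parent = parent;
            this.threshold = threshold;
        }

        public String invoke() {
            if (++numInvocations > threshold) {
                // Later calls go to the "generated" implementation.
                parent.delegate = () -> "generated";
            }
            // The current call is still served by the slow path.
            return "native";
        }
    }

    static Accessor newAccessor(int threshold) {
        Delegating d = new Delegating();
        d.delegate = new SlowAccessor(d, threshold);
        return d;
    }

    public static void main(String[] args) {
        Accessor foo = newAccessor(15);
        for (int i = 0; i <= 16; i++) {
            System.out.println(i + "\t" + foo.invoke());
        }
    }
}
```

With a threshold of 15, calls 0 to 15 print "native" and call 16 prints "generated", which matches the behaviour described above.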

To inspect the generated class, you can use Alibaba's Arthas tool:

java -jar arthas-boot.jar 
  [INFO] arthas-boot version: 3.1.1 
  [INFO] Found existing java process, please choose one and hit RETURN.
  * [1]: 13065 cn.itcast.jvm.t3.reflect.Reflect1

Enter 1 and press RETURN to attach to the process.

Then enter jad followed by the class name to decompile the class.

Decompile output

Note

From the source code of ReflectionFactory you can see that:

  • sun.reflect.noInflation can be used to disable inflation (GeneratedMethodAccessor1 is then generated directly, but generating it the first time is expensive, so this does not pay off if the reflective call is made only once)
  • sun.reflect.inflationThreshold can be used to change the inflation threshold

Posted on Wed, 29 Sep 2021 16:41:01 -0400 by tr0gd0rr