Analysis and solution of 9 common CMS GC problems in Java

1. Preface

This article summarizes common problem scenarios of the "CMS + ParNew" collector combination in the HotSpot VM, focusing on root-cause analysis (partly through the source code) and the resulting troubleshooting methods; the step-by-step troubleshooting process itself is omitted. The article also uses many technical terms and assumes some background knowledge; where a concept is not fully introduced, please consult the relevant references yourself.

The article runs to roughly 20,000 words (excluding code fragments) and takes about 30 minutes to read in full. It is long, so feel free to jump straight to the scenarios that interest you.

1.1 Introduction

Ever since Sun released the Java language, it has used GC for automatic memory management, avoiding the dangling-pointer problems of manual management and greatly improving development efficiency; GC has been famous ever since. GC actually has a very long history: John McCarthy, known as the "father of Lisp" and a "father of artificial intelligence", published the first GC algorithm in his 1960 paper. In the 60 years since, GC technology has advanced rapidly, but no matter how cutting-edge a collector is, it is still a combination or refinement of three basic algorithms; in other words, the fundamental problem GC solves has not changed in all these years. The author believes GC will not become outdated any time soon: compared with new technologies that change by the day, a classical technology like GC is all the more worth studying.

At present, most Java GC material online either focuses on theory or analyzes a GC problem in a single scenario; there is little that summarizes the topic as a whole system. Taking past incidents as lessons for the future, several Meituan engineers collected internal analyses of various GC problems and, combined with their own understanding, produced this summary, hoping it serves as a starting point for discussion. If there are mistakes in the article, please don't hesitate to point them out.

Can GC problem handling be mastered systematically? Some influencing factors are mutually cause and effect, so how should a problem be analyzed? For example, a service's RT suddenly rises, with four symptoms: GC time increases, thread Blocks increase, slow queries increase, and CPU load is high. Which one is the root cause? How do we judge whether GC itself is the problem? What are the common problems with CMS? How do we determine the root cause, and how do we solve or avoid these problems? After reading this article, you should have a systematic understanding of CMS GC problem handling and be able to solve such problems with much more ease. Let's begin!


1.2 overview

To master GC problem handling systematically, the author lays out a learning path here; the whole article follows the same structure, divided into four steps.

  • Establish a knowledge system:   Learn GC fundamentals, from the JVM memory structure to garbage collection algorithms and collectors, and master some common GC analysis tools.

  • Determine evaluation indicators:   Understand the basic ways of evaluating GC, work out how to set indicators for your own system, and learn how to judge whether GC is misbehaving in a business scenario.

  • Scenario tuning practice:   Use the acquired knowledge and the systematic evaluation indicators to analyze and solve nine common CMS GC problem scenarios.

  • Summarize the optimization experience:   Review the whole process, offer some suggestions, and fold the distilled experience back into the knowledge system.

2. GC basis

Before we start, let's briefly review some common concepts such as JVM memory partitioning, collection algorithms, and collectors. Readers with a solid foundation can skip this part.

2.1 basic concepts

  • GC:   "GC" itself carries three meanings; which one applies must be inferred from the specific context:

    • Garbage Collection: Garbage Collection technology, noun.

    • Garbage Collector: Garbage Collector, noun.

    • Garbage Collecting: garbage collection action, verb.

  • Mutator:   The role that produces garbage, i.e., our application itself; it allocates and frees memory through the Allocator.

  • TLAB:   Thread Local Allocation Buffer. Using a CAS-based mechanism, each Mutator thread claims its own buffer inside Eden and preferentially allocates objects there. Because that memory region is exclusive to a single Java thread, allocation involves no lock contention and is therefore faster. Each TLAB belongs to exactly one thread.

  • Card Table:   The card table marks the status of card pages; each card table entry corresponds to one card page. When a reference inside a card page is written, the write barrier marks that page's card table entry as dirty. The card table essentially exists to solve the cross-generation reference problem. For the concrete mechanism, see the StackOverflow question   how-actually-card-table-and-writer-barrier-works , or read the source in cardTableRS.cpp.
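The dirty-card mechanism can be sketched in a few lines (an illustrative model with names of our own; HotSpot really does use 512-byte card pages, i.e. a 9-bit shift, but operates on raw memory rather than a Java array):

```java
// Sketch of a card table plus write barrier. Class and method names are
// ours; the 512-byte card size matches HotSpot's default.
public class CardTableSketch {
    static final int CARD_SHIFT = 9;          // log2(512): one card covers 512 bytes
    static final byte CLEAN = 0, DIRTY = 1;

    final byte[] cards;                       // one byte of state per card page

    CardTableSketch(long heapBytes) {
        cards = new byte[(int) (heapBytes >>> CARD_SHIFT) + 1];
    }

    static long cardIndexFor(long address) {
        return address >>> CARD_SHIFT;
    }

    // The "write barrier": after a reference store, dirty the card
    // containing the updated field so the next Young GC rescans it.
    void onReferenceStore(long fieldAddress) {
        cards[(int) cardIndexFor(fieldAddress)] = DIRTY;
    }

    boolean isDirty(long address) {
        return cards[(int) cardIndexFor(address)] == DIRTY;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(1 << 20); // 1 MB "heap"
        table.onReferenceStore(600);                          // dirties card 1
        if (!table.isDirty(600)) throw new AssertionError();
        if (table.isDirty(4096)) throw new AssertionError();  // card 8 untouched
    }
}
```

During a Young GC the collector then only has to scan Old-generation objects on dirty cards for references into the Young generation, instead of the whole Old generation.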

2.2 JVM memory partition

From the JCP (Java Community Process) official site we can see that the latest Java version has reached Java 16; the upcoming Java 17, together with the current Java 11 and Java 8, are LTS versions, and the JVM specification keeps changing with each iteration. Since this article mainly discusses CMS, the memory structure shown here is still that of Java 8.

GC mainly works in the Heap and MetaSpace areas (the blue parts in the figure above). For Direct Memory, if DirectByteBuffer is used, GC manages it indirectly via   Cleaner#clean   when the allocatable memory runs short.

Any automatic memory management system faces the same two tasks: allocating space for new objects, then reclaiming the space of garbage objects. Let's introduce these basics.

2.3 allocation object

In Java, object address manipulation mainly goes through Unsafe, which calls down to C's allocate and free. There are two allocation strategies:

  • free list:   Free addresses are tracked in extra bookkeeping storage, which turns random I/O into sequential I/O but costs additional space.

  • bump pointer:   A pointer marks the boundary between allocated and free space; allocation simply bumps the pointer toward the free side by the size of the object. Allocation is very efficient, but the usable scenarios are limited (the free space must be contiguous).
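A minimal sketch of bump-pointer allocation (names are ours; the real JVM bumps a raw pointer inside Eden/TLABs rather than an index into a Java array):

```java
// Bump-pointer allocator over a plain byte array, for illustration only.
public class BumpAllocator {
    private final byte[] space;
    private int top = 0; // boundary between allocated and free memory

    BumpAllocator(int capacity) { space = new byte[capacity]; }

    /** Returns the start offset of the new "object", or -1 if space is exhausted. */
    int allocate(int size) {
        if (top + size > space.length) return -1; // a GC would be needed here
        int start = top;
        top += size;      // the whole allocation is a single pointer bump
        return start;
    }

    int used() { return top; }

    public static void main(String[] args) {
        BumpAllocator eden = new BumpAllocator(1024);
        int a = eden.allocate(100);
        int b = eden.allocate(200);
        System.out.println(a + " " + b + " used=" + eden.used()); // 0 100 used=300
    }
}
```

The contrast with a free list is visible immediately: allocation here is two additions and a bounds check, with no search through free blocks.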

2.4 collection object

2.4.1 garbage identification

  • Reference Counting:   Each object keeps a count of incoming references, stored in the object header: +1 whenever a new reference is created, -1 when a reference goes away; objects with a count greater than 0 are considered alive. Although the circular-reference problem can be solved by the Recycler algorithm, in a multithreaded environment updating the counts requires expensive synchronization, so performance is poor. This approach was used in early programming languages.

  • Reachability analysis, also known as Tracing GC:   Starting from the GC Roots, every object that can be reached is live. A single pass is not enough to judge liveness accurately, so objects are marked over several phases; everything outside the resulting connected graph can then be reclaimed as garbage. Mainstream Java virtual machines all use this approach today.

Note: reference counting actually can handle circular references (e.g., via the Recycler algorithm mentioned above), so don't claim otherwise in your next interview ~~
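The difference matters in practice: a reference cycle that naive reference counting could never free is reclaimed by HotSpot's tracing GC. The following sketch demonstrates this (System.gc() is only a request, hence the retry loop):

```java
import java.lang.ref.WeakReference;

// Demonstrates that a tracing GC reclaims a reference cycle that naive
// reference counting (without cycle detection) would leak forever.
public class CycleDemo {
    static class Node { Node other; }

    /** Builds two mutually-referencing objects, drops the strong refs,
     *  and reports whether GC reclaimed them (observed via a WeakReference). */
    static boolean cycleWasCollected() throws InterruptedException {
        Node a = new Node();
        Node b = new Node();
        a.other = b;            // a -> b
        b.other = a;            // b -> a: a reference cycle
        WeakReference<Node> probe = new WeakReference<>(a);
        a = null;
        b = null;               // the cycle is now unreachable from any GC Root
        for (int i = 0; i < 50 && probe.get() != null; i++) {
            System.gc();        // a request, not a guarantee, hence the loop
            Thread.sleep(10);
        }
        return probe.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cycle collected: " + cycleWasCollected());
    }
}
```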

2.4.2 collection algorithm

Collection algorithms have existed since the beginning of automatic memory management, and different collectors combine them differently for different scenarios.

  • Mark-Sweep:   Collection has two phases. The first is the Tracing phase: traverse the object graph from the GC Roots and mark every object encountered. The second is the Sweep phase: the collector scans every object in the heap and reclaims all unmarked ones. The process never moves objects. Different implementations use techniques such as the Tricolour Abstraction and BitMaps to improve efficiency. It is comparatively efficient when many objects survive.

  • Mark-Compact:   This algorithm mainly exists to solve the fragmentation problem of non-moving collectors. It also has two phases: the first resembles Mark-Sweep, and the second relocates the surviving objects so that they become contiguous. The main implementations are the Two-Finger algorithm, the sliding (Lisp2) algorithm, and threaded compaction.

  • Copying:   Space is divided into two equal halves, From and To, only one of which is in use at any time; each collection copies the surviving objects from one half into the other. There are recursive variants (proposed by Robert R. Fenichel and Jerome C. Yochelson) and iterative ones (proposed by Cheney), as well as approximately depth-first traversal variants that fix the recursion-stack and cache-line problems of the former two. The copying algorithm allows fast bump-pointer allocation, but its space utilization is low, and copying is expensive when the surviving objects are large.

The three algorithms can be compared in terms of object movement, space, and time. Let the number of live objects be *L* and the heap size be *H*; then:

Comparing the time costs of mark, sweep, compaction, and copying, the general relationship is that mark, compaction, and copying are all proportional to the number of live objects *L*, while sweep must scan the whole heap and is proportional to *H*.

Although both compaction and copying move objects, a compacting collector may (depending on the algorithm) first compute each object's target address, then fix up pointers, and only then move the objects, whereas copying does all of this in one pass, so it can be faster. Also note that GC overhead should not be measured by the Collector alone; the Allocator matters too. If the heap is guaranteed fragmentation-free, allocation can use bump-the-pointer and complete by moving a single pointer, which is very fast; with fragmentation, memory has to be managed through a freelist and allocation is usually slower.


2.5 collector

At present, Hotspot VM collectors fall into two broad categories, generational and region-based (see the figure below), and the trend is gradually toward region-based collection. Within Meituan, some businesses have tried ZGC (interested readers can study the article   Exploration and practice of a new generation of garbage collector ZGC ); the rest basically remain on CMS and G1. In addition, since JDK 11 there is Epsilon, a no-op garbage collector that performs no collection at all, provided for performance analysis. There is also Azul's Zing JVM, whose C4 (Continuously Concurrent Compacting Collector) also has a certain influence in the industry.

Note: it is worth mentioning RednaxelaFX, a well-known evangelist of GC technology in the Chinese Java community who also worked at Azul in the early years; some material in this article draws on his writings.

2.5.1 generation collector

  • ParNew:   A multithreaded collector using the copying algorithm, working mainly on the Young generation; the number of collection threads can be controlled with the -XX:ParallelGCThreads parameter. The whole process is STW, and it is usually paired with CMS.

  • CMS:   Aims for the shortest possible collection pauses, performing garbage collection with the mark-sweep algorithm in four main phases, of which the initial mark and the remark are STW. It is mostly used on the server side of Internet sites and B/S systems. It was deprecated in JDK 9 and removed in JDK 14; see   JEP 363.

2.5.2 partition collector

  • G1:   A server-style garbage collector for multiprocessor machines with large memories; it pursues high throughput while meeting a garbage collection pause-time goal as far as possible.

  • ZGC:   A low-latency garbage collector introduced in JDK 11, suited to memory management and collection for large-heap, low-latency services. In the SPECjbb 2015 benchmark, its maximum pause time on a 128 G heap was only 1.68 ms, far better than G1 and CMS.

  • Shenandoah:   Developed by a team at Red Hat. Like G1 it is a Region-based collector, but it needs no Remembered Set or Card Table to record cross-Region references, and its pause time is independent of heap size, approaching that of ZGC. The figure below shows a benchmark against collectors such as CMS and G1.

2.5.3 common collectors

At present, CMS and G1 collectors are most used. Both have the concept of generation. The main memory structures are as follows:

2.5.4 other collectors

Only the common collectors are listed above. There are many others: real-time collectors such as Metronome, Stopless, Staccato, Chicken, and Clover; concurrent copying/compacting collectors such as Sapphire, Compressor, and Pauseless; and mark-compact collectors such as Doligez-Leroy-Gonthier. For reasons of space they are not covered here.

2.6 common tools

"To do a good job, one must first sharpen one's tools." Below are some tools the author commonly uses; choose freely according to your situation. The problems in this article were all located and analyzed with these tools.

2.6.1 command line terminal

  • Standard terminal classes: jps, jinfo, jstat, jstack, jmap
  • Function integration classes: jcmd, vjtools, arthas, greys

2.6.2 visual interface

  • Simple: JConsole, JVisualvm, HA, GCHisto, GCViewer
  • Advanced: MAT, JProfiler

On the command line, arthas is recommended; for a visual interface, JProfiler. There are also online platforms such as   gceasy, heaphero, fastthread ; Scalpel (a self-developed JVM diagnosis tool inside Meituan, not yet open source) is also convenient.

3. GC problem judgment

Before troubleshooting and tuning a GC problem, we need to clarify whether the fault is caused directly by GC, or whether it is application code that causes GC to misbehave.

3.1 is there any problem with GC?

3.1.1 setting evaluation criteria

Two core indicators for judging GC:

  • Latency:   Can also be understood as maximum pause time, i.e., the longest single STW pause during garbage collection; the shorter, the better, and some increase in frequency is acceptable in exchange. Reducing latency is the main direction of GC development.

  • Throughput:   Over the application's lifetime, GC threads steal CPU cycles that would otherwise go to the Mutator; throughput is the percentage of total running time spent doing effective Mutator work. For example, if the system runs for 100 min and GC takes 1 min of that, throughput is 99%. A throughput-oriented collector can tolerate longer pauses.

At present, the systems of the major Internet companies basically pursue low latency, to avoid the damage to user experience caused by overly long GC pauses. The metrics need to be combined with the application's SLA and are judged mainly from the following two points:

Simply put, a single pause should not exceed the application's TP9999, and GC throughput should be no less than 99.99%. For example, suppose a service A has a TP9999 of 80 ms and an average GC pause of 30 ms; then its maximum pause time should not exceed 80 ms, and the interval between GCs should be kept above 5 min. If that cannot be met, tuning is needed, or the load must be carried by redundant capacity. (You can pause here and check the minute-level gc.mean metric on your monitoring platform: if it exceeds 6 ms, a single machine's GC throughput cannot reach four nines.)
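The arithmetic behind these figures can be spelled out (a back-of-the-envelope sketch using the example numbers above):

```java
// Checks the SLA arithmetic above: one 30 ms pause every 5 minutes, and
// the per-minute GC budget that "four nines" (99.99%) throughput allows.
public class GcSlaMath {
    /** Throughput = mutator time / total time. */
    static double throughput(double gcMillis, double totalMillis) {
        return (totalMillis - gcMillis) / totalMillis;
    }

    public static void main(String[] args) {
        double fiveMinutes = 5 * 60 * 1000.0;
        // one 30 ms pause per 5 min window
        System.out.printf("throughput = %.4f%%%n", 100 * throughput(30, fiveMinutes));

        // largest per-minute GC time compatible with 99.99% throughput
        System.out.printf("GC budget per minute: %.1f ms%n", 60_000 * (1 - 0.9999));
    }
}
```

This is why the text says a 30 ms pause every 5 min is exactly at the 99.99% line, and why a gc.mean above 6 ms per minute means four nines is out of reach.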

Note: besides these two indicators, there are others such as Footprint (resource usage) and responsiveness. Latency-sensitive Internet systems pursue low latency, while many embedded systems care most about Footprint.

3.1.2 understand GC Cause

Once we have the GC log, we can analyze the GC behavior. Some tools can directly visualize the distribution of causes; the chart below was drawn by gceasy:

As the figure shows, we can clearly see what triggered each GC and how long it took. But to analyze GC problems we first need to understand GC Cause, i.e., under what conditions the JVM decides to perform a GC. The full classification of causes can be found in the Hotspot source: src/share/vm/gc/shared/gcCause.hpp and src/share/vm/gc/shared/gcCause.cpp.

const char* GCCause::to_string(GCCause::Cause cause) {
  switch (cause) {
    case _java_lang_system_gc:
      return "System.gc()";

    case _full_gc_alot:
      return "FullGCAlot";

    case _scavenge_alot:
      return "ScavengeAlot";

    case _allocation_profiler:
      return "Allocation Profiler";

    case _jvmti_force_gc:
      return "JvmtiEnv ForceGarbageCollection";

    case _gc_locker:
      return "GCLocker Initiated GC";

    case _heap_inspection:
      return "Heap Inspection Initiated GC";

    case _heap_dump:
      return "Heap Dump Initiated GC";

    case _wb_young_gc:
      return "WhiteBox Initiated Young GC";

    case _wb_conc_mark:
      return "WhiteBox Initiated Concurrent Mark";

    case _wb_full_gc:
      return "WhiteBox Initiated Full GC";

    case _no_gc:
      return "No GC";

    case _allocation_failure:
      return "Allocation Failure";

    case _tenured_generation_full:
      return "Tenured Generation Full";

    case _metadata_GC_threshold:
      return "Metadata GC Threshold";

    case _metadata_GC_clear_soft_refs:
      return "Metadata GC Clear Soft References";

    case _cms_generation_full:
      return "CMS Generation Full";

    case _cms_initial_mark:
      return "CMS Initial Mark";

    case _cms_final_remark:
      return "CMS Final Remark";

    case _cms_concurrent_mark:
      return "CMS Concurrent Mark";

    case _old_generation_expanded_on_last_scavenge:
      return "Old Generation Expanded On Last Scavenge";

    case _old_generation_too_full_to_scavenge:
      return "Old Generation Too Full To Scavenge";

    case _adaptive_size_policy:
      return "Ergonomics";

    case _g1_inc_collection_pause:
      return "G1 Evacuation Pause";

    case _g1_humongous_allocation:
      return "G1 Humongous Allocation";

    case _dcmd_gc_run:
      return "Diagnostic Command";

    case _last_gc_cause:
      return "ILLEGAL VALUE - last gc cause - ILLEGAL VALUE";

    default:
      return "unknown GCCause";
  }
  ShouldNotReachHere();
}

Several key GC causes to focus on:

  • System.gc():   A manually triggered GC.

  • CMS:   Actions performed during a CMS GC cycle, with particular attention to the two STW phases: CMS Initial Mark and CMS Final Remark.

  • Promotion Failure:   The Old generation has no room for objects promoted from the Young generation (even if its total free memory is large enough, fragmentation can prevent the allocation).

  • Concurrent Mode Failure:   While a CMS GC is running, the space reserved in the Old generation is insufficient for new allocations; the collector then degrades, seriously hurting GC performance. One of the cases below is exactly this scenario.

  • GCLocker Initiated GC:   If a GC becomes necessary while a thread is executing inside a JNI critical region, GC Locker blocks the GC and prevents other threads from entering the critical region, until the last thread exits and the GC can proceed.
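Besides reading the log, collector names, collection counts, and accumulated pause times can also be read at runtime through the standard GarbageCollectorMXBean API, which is handy when correlating application metrics with GC Cause entries:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints, for each registered collector, how many collections it has run
// and how much time they took in total.
public class GcStats {
    public static void main(String[] args) {
        System.gc(); // make sure at least one collection has been requested
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // e.g. "ParNew" / "ConcurrentMarkSweep" under -XX:+UseConcMarkSweepGC,
            // or "G1 Young Generation" / "G1 Old Generation" under G1
            System.out.printf("%-25s count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```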

To see when these causes actually trigger a collection, take a look at the CMS code (not discussed in detail here); see /src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp.

bool CMSCollector::shouldConcurrentCollect() {
  LogTarget(Trace, gc) log;

  if (_full_gc_requested) {
    log.print("CMSCollector: collect because of explicit  gc request (or GCLocker)");
    return true;
  }

  FreelistLocker x(this);
  // ------------------------------------------------------------------
  // Print out lots of information which affects the initiation of
  // a collection.
  if (log.is_enabled() && stats().valid()) {
    log.print("CMSCollector shouldConcurrentCollect: ");

    LogStream out(log);
    stats().print_on(&out);

    log.print("time_until_cms_gen_full %3.7f", stats().time_until_cms_gen_full());
    log.print("free=" SIZE_FORMAT, _cmsGen->free());
    log.print("contiguous_available=" SIZE_FORMAT, _cmsGen->contiguous_available());
    log.print("promotion_rate=%g", stats().promotion_rate());
    log.print("cms_allocation_rate=%g", stats().cms_allocation_rate());
    log.print("occupancy=%3.7f", _cmsGen->occupancy());
    log.print("initiatingOccupancy=%3.7f", _cmsGen->initiating_occupancy());
    log.print("cms_time_since_begin=%3.7f", stats().cms_time_since_begin());
    log.print("cms_time_since_end=%3.7f", stats().cms_time_since_end());
    log.print("metadata initialized %d", MetaspaceGC::should_concurrent_collect());
  }
  // ------------------------------------------------------------------

  // If the estimated time to complete a cms collection (cms_duration())
  // is less than the estimated time remaining until the cms generation
  // is full, start a collection.
  if (!UseCMSInitiatingOccupancyOnly) {
    if (stats().valid()) {
      if (stats().time_until_cms_start() == 0.0) {
        return true;
      }
    } else {
   
      if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
        log.print(" CMSCollector: collect for bootstrapping statistics: occupancy = %f, boot occupancy = %f",
                  _cmsGen->occupancy(), _bootstrap_occupancy);
        return true;
      }
    }
  }
  if (_cmsGen->should_concurrent_collect()) {
    log.print("CMS old gen initiated");
    return true;
  }

  CMSHeap* heap = CMSHeap::heap();
  if (heap->incremental_collection_will_fail(true /* consult_young */)) {
    log.print("CMSCollector: collect because incremental collection will fail ");
    return true;
  }

  if (MetaspaceGC::should_concurrent_collect()) {
    log.print("CMSCollector: collect for metadata allocation ");
    return true;
  }

  // CMSTriggerInterval starts a CMS cycle if enough time has passed.
  if (CMSTriggerInterval >= 0) {
    if (CMSTriggerInterval == 0) {
      // Trigger always
      return true;
    }

    // Check the CMS time since begin (we do not check the stats validity
    // as we want to be able to trigger the first CMS cycle as well)
    if (stats().cms_time_since_begin() >= (CMSTriggerInterval / ((double) MILLIUNITS))) {
      if (stats().valid()) {
        log.print("CMSCollector: collect because of trigger interval (time since last begin %3.7f secs)",
                  stats().cms_time_since_begin());
      } else {
        log.print("CMSCollector: collect because of trigger interval (first collection)");
      }
      return true;
    }
  }

  return false;
}


3.2 judge whether the problem is caused by GC?

When handling a GC problem, whether we start from the result (the symptom) or the cause, how do we judge whether GC caused the fault or the system's own problem caused the abnormal GC? Take the Case from the beginning of this article: "among the four symptoms of increased GC time, increased thread Blocks, increased slow queries, and high CPU load, which is the root cause?" From experience, the author has sorted out four rough methods of judgment for reference:

  • Timing analysis:   The event that happens first is more likely to be the root cause. Use monitoring to find the time point at which each metric became abnormal and reconstruct the timeline. If, say, high CPU load is observed first (with a sufficient time gap), the causal chain is probably: high CPU load -> more slow queries -> longer GC time -> more thread Blocks -> higher RT.

  • Probability analysis:   Infer from the statistics of historical problems, checking the likeliest categories first. For example, if slow queries have often been the culprit in the past, the chain is probably: more slow queries -> longer GC time -> high CPU load -> more thread Blocks -> higher RT.

  • Experimental analysis:   Reproduce the situation with fault-injection drills, triggering one or more of the conditions and observing whether the problem appears. If triggering thread Blocks alone reproduces it, the chain may be: more thread Blocks -> high CPU load -> more slow queries -> longer GC time -> higher RT.

  • Counter-evidence analysis:   Take one of the symptoms and argue by contradiction, i.e., check whether that symptom is actually correlated with the outcome. For example, if across the cluster we see nodes whose slow queries and CPU are normal yet still show the problem, the chain may be: longer GC time -> more thread Blocks -> higher RT.

Different root causes then call for completely different follow-up analysis: high CPU load may need a flame graph to find hotspots; more slow queries means looking at the DB; thread Blocks point at lock contention. If every other symptom is ruled out, the problem may indeed lie with GC, and you can continue with GC-specific analysis.

3.3 introduction to problem classification

3.3.1 Mutator type

Mutator types fall into two kinds according to the distribution of object survival times, a point also made by the weak generational hypothesis; see the figure below, where "Survival Time" is the object's lifetime and "Rate" the proportion of allocations:

  • IO-interactive:   Most Internet services today are of this type, e.g. distributed RPC, MQ, and HTTP gateway services. They do not demand much memory, and most objects die within the TP9999 window; for them, the larger the Young generation, the better.

  • MEM-computational:   Mainly distributed data computing (Hadoop), distributed storage (HBase, Cassandra), self-built distributed caches, etc. They demand a lot of memory and objects live long; for them, the larger the Old generation, the better.

Of course there are also scenarios in between; this article mainly discusses the first kind. The object survival time distribution is a very important guide when setting GC parameters; as the figure below shows, it lets us roughly compute where the generational boundary should be.

3.3.2 GC problem classification

The author has selected nine different types of GC problems, covering most scenarios. If you have better examples, please share them in the comments.

  • Unexpected GC:   GC that did not actually need to happen; we can avoid it by some means.

    • Space Shock:   Space-shock problems; see "scenario 1: space shock caused by dynamic capacity expansion".
    • Explicit GC:   Problems around explicitly invoked GC; see "scenario 2: explicit GC removal and retention".
  • Partial GC:   A partial collection that reclaims only some of the generations/regions.

    • Young GC:   The collection action of young area in generational collection can also be called Minor GC.

      • ParNew:   Young GC is frequent. See "scenario 4: early promotion".
    • Old GC:   The collection of the Old generation in a generational collector, also called Major GC; some people also call it Full GC, but that usage is nonstandard. A CMS collection only becomes a Full GC when a foreground GC occurs; the CMSScavengeBeforeRemark parameter merely triggers a Young GC before the Remark phase.

      • CMS:   Old GC is frequent. See "scenario 5: CMS Old GC is frequent".
      • CMS:   The Old GC is infrequent but takes a long time. See "scenario 6: a single CMS Old GC takes a long time".
  • Full GC:   A full collection of the entire heap, usually long-running and high-impact once it occurs; some also call it Major GC. See "scenario 7: memory fragmentation & collector degradation".

  • MetaSpace:   Problems caused by metaspace collection; see "scenario 3: Metaspace area OOM".

  • Direct Memory:   Direct memory (also known as off heap memory) recycling causes problems. See "scenario 8: off heap memory OOM".

  • JNI:   For problems caused by local Native methods, see "scenario 9: GC problems caused by JNI".

3.3.3 troubleshooting difficulty

The difficulty of a problem is inversely proportional to how common it is. For most problems we can find similar cases through search engines and solve them the same way. When no similar case can be found anywhere, there are two possibilities: either it is not actually a problem, or it is a deeply hidden one; for the latter, you may have to debug down to the source-code level. In the GC problem scenarios below, the troubleshooting difficulty increases from top to bottom.

4. Common scenario analysis and solution

4.1 scenario 1: space shock caused by dynamic capacity expansion

4.1.1 phenomena

When the service has just started, GCs are frequent even though plenty of space remains below the configured maximum. In this situation we can observe the heap-space changes in the GC log or via monitoring tools. The GC Cause is generally Allocation Failure, and the log shows that after each GC the sizes of the heap spaces are adjusted, as in the figure below:

4.1.2 causes

The JVM parameters -Xms and -Xmx are set to different values. At startup only -Xms of space is committed; whenever that space becomes insufficient, the JVM requests more from the operating system, and in that case a GC must be performed first. Concretely, the new space size is computed by the   ConcurrentMarkSweepGeneration::compute_new_size()   method:

void ConcurrentMarkSweepGeneration::compute_new_size() {
  assert_locked_or_safepoint(Heap_lock);

  // If incremental collection failed, we just want to expand
  // to the limit.
  if (incremental_collection_failed()) {
    clear_incremental_collection_failed();
    grow_to_reserved();
    return;
  }

  // The heap has been compacted but not reset yet.
  // Any metric such as free() or used() will be incorrect.

  CardGeneration::compute_new_size();

  // Reset again after a possible resizing
  if (did_compact()) {
    cmsSpace()->reset_after_compaction();
  }
}

Conversely, if a lot of space is left over, the heap is shrunk. The JVM parameters -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio control the grow/shrink ratios, so adjusting them also controls when resizing happens. Expansion, for example, goes through   GenCollectedHeap::expand_heap_and_allocate()  , whose code is as follows:

HeapWord* GenCollectedHeap::expand_heap_and_allocate(size_t size, bool   is_tlab) {
  HeapWord* result = NULL;
  if (_old_gen->should_allocate(size, is_tlab)) {
    result = _old_gen->expand_and_allocate(size, is_tlab);
  }
  if (result == NULL) {
    if (_young_gen->should_allocate(size, is_tlab)) {
      result = _young_gen->expand_and_allocate(size, is_tlab);
    }
  }
  assert(result == NULL || is_in_reserved(result), "result not in heap");
  return result;
}


For an understanding of the whole resizing model, see the figure below: when the committed size crosses the low or high watermark, the capacity is adjusted accordingly:
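This expand-on-demand behavior can be observed from plain Java. The demo below is my own illustrative sketch (not from the article): run it with something like `java -Xms32m -Xmx512m -verbose:gc HeapOscillationDemo` and the committed size reported by Runtime.totalMemory() creeps toward Runtime.maxMemory(), with Allocation Failure GCs preceding each expansion.

```java
// Illustrative demo (assumption: run with -Xms smaller than -Xmx to see growth).
import java.util.ArrayList;
import java.util.List;

public class HeapOscillationDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            retained.add(new byte[4 * 1024 * 1024]); // retain 4MB per step
            // totalMemory() is the committed size; with -Xms < -Xmx it grows
            // step by step, each expansion preceded by an Allocation Failure GC.
            System.out.printf("step %d: committed=%dMB max=%dMB%n",
                    i, rt.totalMemory() >> 20, rt.maxMemory() >> 20);
        }
    }
}
```

With -Xms equal to -Xmx, the committed value stays constant from the first line, which is exactly the stability the strategy below recommends.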

4.1.3 strategy

Positioning: observe whether the committed ratio of the Old/Metaspace area at the time the CMS GC triggers is a fixed value, or observe the total memory utilization as described above.

Solution: set the paired space-size parameters to fixed values, e.g. -Xms and -Xmx, -XX:NewSize and -XX:MaxNewSize, -XX:MetaspaceSize and -XX:MaxMetaspaceSize, etc.

4.1.4 summary

Generally speaking, we should set -Xms and -Xmx to the same value (i.e. initial size equals maximum size) to obtain a stable heap; the Metaspace area has the same issue. However, when pause time is not a concern, an elastic heap also has advantages: it can shrink dynamically to save memory, for example in Java applications running as rich clients.

Although this problem is elementary, it occurs quite often, especially where configuration standards are lax.

4.2 scenario 2: explicit GC removal and retention

4.2.1 phenomena

Besides the CMS GC triggered by heap resizing, there are several other trigger conditions: the Old area reaching its occupancy threshold, Metaspace running short, promotion failure in the Young area, failure of the large-object guarantee, and so on. What if none of these occurs but a GC is still triggered? In that case the code is probably calling System.gc() explicitly, which you can confirm from the GC Cause in the GC log. Is such a GC a problem? Some sources on the Internet say you should add the -XX:+DisableExplicitGC parameter to avoid it; others say you must not, because the flag affects the reclamation of Native Memory. To state the conclusion first: the author suggests keeping System.gc here. Why keep it? Let's analyze it together.

4.2.2 causes

Find the source of System.gc in HotSpot: with the -XX:+DisableExplicitGC parameter added, the method becomes a no-op; without it, it calls Universe::heap()->collect(). Following that call shows that System.gc triggers a stop-the-world Full GC that collects the whole heap.

JVM_ENTRY_NO_ENV(void, JVM_GC(void))
  JVMWrapper("JVM_GC");
  if (!DisableExplicitGC) {
    Universe::heap()->collect(GCCause::_java_lang_system_gc);
  }
JVM_END
void GenCollectedHeap::collect(GCCause::Cause cause) {
  if (cause == GCCause::_wb_young_gc) {
    // Young collection for the WhiteBox API.
    collect(cause, YoungGen);
  } else {
#ifdef ASSERT
  if (cause == GCCause::_scavenge_alot) {
    // Young collection only.
    collect(cause, YoungGen);
  } else {
    // Stop-the-world full collection.
    collect(cause, OldGen);
  }
#else
    // Stop-the-world full collection.
    collect(cause, OldGen);
#endif
  }
}

Keep System.gc

A bit of background knowledge here: CMS GC has two modes, Background and Foreground. The former is the concurrent collection in our general understanding, which barely disturbs business threads. The Foreground collector is very different: it performs a compacting GC, using the same Lisp2 (mark-compact) algorithm as Serial Old, i.e. a Full GC commonly known as MSC (Mark-Sweep-Compact). It collects the Young area, Old area and Metaspace of the Java heap. From the algorithm chapter above we know the cost of compaction is huge, so a Foreground collection brings a very long STW. If System.gc is called frequently in an application, it is very dangerous.

Remove System.gc

If System.gc is disabled, there is another risk: memory leaks via DirectByteBuffer. DirectByteBuffer has zero-copy characteristics and is used by NIO frameworks such as Netty; it uses off-heap memory. Heap memory is managed by the JVM itself, but off-heap memory must be released explicitly. DirectByteBuffer has no Finalizer; its Native Memory is cleaned up by sun.misc.Cleaner, a phantom-reference-based cleanup mechanism that is lighter than an ordinary Finalizer.

While allocating space for a DirectByteBuffer, System.gc is called explicitly, the intent being to force dead DirectByteBuffer objects to release their associated Native Memory through a Full GC. The implementation is as follows:

// These methods should be called whenever direct memory is allocated or
// freed.  They allow the user to control the amount of direct memory
// which a process may access.  All sizes are specified in bytes.
static void reserveMemory(long size) {

    synchronized (Bits.class) {
        if (!memoryLimitSet && VM.isBooted()) {
            maxMemory = VM.maxDirectMemory();
            memoryLimitSet = true;
        }
        if (size <= maxMemory - reservedMemory) {
            reservedMemory += size;
            return;
        }
    }

    System.gc();
    try {
        Thread.sleep(100);
    } catch (InterruptedException x) {
        // Restore interrupt status
        Thread.currentThread().interrupt();
    }
    synchronized (Bits.class) {
        if (reservedMemory + size > maxMemory)
            throw new OutOfMemoryError("Direct buffer memory");
        reservedMemory += size;
    }

}

HotSpot only does Reference Processing for Old-generation objects during an Old GC, and for Young-generation objects during a Young GC. DirectByteBuffer objects in the Young area are handled during Young GC; a CMS GC does Reference Processing on the Old area and can thus trigger the Cleaner for dead DirectByteBuffer objects there. But if no GC (or only Young GC) happens for a long time, the Cleaner is never triggered for the Old area, so the Native Memory behind DirectByteBuffers that have died after being promoted to the Old area may not be released in time. These implementation details are why the JDK relies on System.gc to guarantee timely cleanup of Direct Memory. If -XX:+DisableExplicitGC is enabled, that cleanup may not happen in time, and an OOM of Direct Memory occurs.
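A minimal sketch of the failure mode (illustrative only; the flags and sizes are assumptions): under -XX:+DisableExplicitGC and a small -XX:MaxDirectMemorySize, a loop like the one below can die with "OutOfMemoryError: Direct buffer memory" if no GC happens to run, because reserveMemory's System.gc() bail-out (shown above) is disabled.

```java
import java.nio.ByteBuffer;

public class DirectBufferChurn {
    public static void main(String[] args) {
        for (int i = 0; i < 50; i++) {
            // Each allocation goes through Bits.reserveMemory; the 1MB of
            // native memory is only returned when the buffer object is
            // collected and its Cleaner runs.
            ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);
            buf = null; // dead, but native memory stays reserved until a GC
        }
        System.out.println("allocated and dropped 50 direct buffers");
    }
}
```

With explicit GC left enabled, the same loop survives: once the direct-memory budget is exhausted, reserveMemory triggers a Full GC that runs the Cleaners.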

4.2.3 strategy

The analysis above shows there are risks whether System.gc is retained or removed. However, since NIO is widely used for RPC communication, the author suggests keeping it. In addition, the JVM provides the -XX:+ExplicitGCInvokesConcurrent and -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses parameters, which change the collection triggered by System.gc from Foreground to Background; Background collections also do Reference Processing. This greatly reduces the STW overhead while still avoiding the NIO Direct Memory OOM.

4.2.4 summary

Not only CMS: in G1 and ZGC, the ExplicitGCInvokesConcurrent mode also turns an explicit GC into a high-performance concurrent collection. Still, it is advisable to constrain this through code standards and regulate the use of System.gc.

P.S. HotSpot treats System.gc specially, mainly in whether it updates GC statistics/threshold data the way an ordinary GC does. Many GC algorithms in HotSpot are adaptive: the parameters for the next GC are derived from previously measured efficiency. By default, System.gc does not update these statistics, to prevent a user-forced GC from disturbing the adaptive policies (see the -XX:+UseAdaptiveSizePolicyWithSystemGC parameter, which defaults to false).

4.3 scenario 3: OOM in MetaSpace area

4.3.1 phenomena

After the JVM starts, or from some point in time onward, the used size of Metaspace keeps growing; each GC fails to release it, and enlarging Metaspace does not completely solve the problem.

4.3.2 causes

Before discussing why OOM occurs, let's look at what is stored in this area. Before Java 7 the String constant pool was kept in the Perm area, and all interned strings were stored there. Because String.intern is uncontrolled, -XX:MaxPermSize was hard to size and java.lang.OutOfMemoryError: PermGen space was common. So in Java 7, constant-pool literals, class statics and symbol references were moved out of the Perm area; in Java 8, PermGen was removed entirely and replaced by Metaspace.

At the bottom layer, the JVM requests memory mappings from the operating system through the mmap interface, 2MB of virtual address space at a time. This is only a mapping; the 2MB of physical memory is consumed only when actually used later. Each requested block is added as a node to a linked list called VirtualSpaceList.

In the upper layer, MetaSpace is mainly composed of Klass Metaspace and NoKlass Metaspace.

  • Klass Metaspace:   stores Klass, the JVM's runtime representation of the Class file. By default this part lives in the Compressed Class Pointer Space, a contiguous memory region right after the Heap. The Compressed Class Pointer Space is not mandatory: with -XX:-UseCompressedClassPointers, or when -Xmx is set larger than 32G, this region does not exist and Klass is stored in the NoKlass Metaspace instead.
  • NoKlass Metaspace:   stores the other metadata related to Klass, such as Method and ConstantPool; it can consist of multiple discontiguous memory blocks. Despite the name, it can also store Klass itself, in the case mentioned above.

Specific definitions can be found in the source code shared/vm/memory/metaspace.hpp:

class Metaspace : public AllStatic {

  friend class MetaspaceShared;

 public:
  enum MetadataType {
    ClassType,
    NonClassType,
    MetadataTypeCount
  };
  enum MetaspaceType {
    ZeroMetaspaceType = 0,
    StandardMetaspaceType = ZeroMetaspaceType,
    BootMetaspaceType = StandardMetaspaceType + 1,
    AnonymousMetaspaceType = BootMetaspaceType + 1,
    ReflectionMetaspaceType = AnonymousMetaspaceType + 1,
    MetaspaceTypeCount
  };

 private:

  // Align up the word size to the allocation word size
  static size_t align_word_size_up(size_t);

  // Aligned size of the metaspace.
  static size_t _compressed_class_space_size;

  static size_t compressed_class_space_size() {
    return _compressed_class_space_size;
  }

  static void set_compressed_class_space_size(size_t size) {
    _compressed_class_space_size = size;
  }

  static size_t _first_chunk_word_size;
  static size_t _first_class_chunk_word_size;

  static size_t _commit_alignment;
  static size_t _reserve_alignment;
  DEBUG_ONLY(static bool   _frozen;)

  // Virtual Space lists for both classes and other metadata
  static metaspace::VirtualSpaceList* _space_list;
  static metaspace::VirtualSpaceList* _class_space_list;

  static metaspace::ChunkManager* _chunk_manager_metadata;
  static metaspace::ChunkManager* _chunk_manager_class;

  static const MetaspaceTracer* _tracer;
};

Why can't MetaSpace objects be released? Let's look at the following two points:

  • Metaspace memory management:   class metadata has the same life cycle as its class loader. As long as the loader is alive, its classes' metadata in Metaspace stays alive and cannot be reclaimed. Each loader has its own storage area, managed through ClassLoaderMetaspace; the SpaceManager* pointers are isolated from each other.

  • Metaspace elastic resizing:   since Metaspace is separate from the Heap, its size can be left unset or set separately. Usually, to keep Metaspace from exhausting the machine's memory, a MaxMetaspaceSize is set. At run time, while the actual size is below that value, the JVM uses the -XX:MinMetaspaceFreeRatio and -XX:MaxMetaspaceFreeRatio parameters to control the size of the whole Metaspace dynamically. The details are in MetaspaceGC::compute_new_size() (code below), which collectors such as CMSCollector and G1CollectedHeap call when executing GC. It computes a new _capacity_until_GC value (the watermark) from used_after_gc, MinMetaspaceFreeRatio and MaxMetaspaceFreeRatio, then expands or shrinks via MetaspaceGC::inc_capacity_until_GC() and MetaspaceGC::dec_capacity_until_GC() accordingly. This process can also be understood with the resizing model from scenario 1.

void MetaspaceGC::compute_new_size() {
  assert(_shrink_factor <= 100, "invalid shrink factor");
  uint current_shrink_factor = _shrink_factor;
  _shrink_factor = 0;
  const size_t used_after_gc = MetaspaceUtils::committed_bytes();
  const size_t capacity_until_GC = MetaspaceGC::capacity_until_GC();

  const double minimum_free_percentage = MinMetaspaceFreeRatio / 100.0;
  const double maximum_used_percentage = 1.0 - minimum_free_percentage;

  const double min_tmp = used_after_gc / maximum_used_percentage;
  size_t minimum_desired_capacity =
    (size_t)MIN2(min_tmp, double(max_uintx));
  // Don't shrink less than the initial generation size
  minimum_desired_capacity = MAX2(minimum_desired_capacity,
                                  MetaspaceSize);

  log_trace(gc, metaspace)("MetaspaceGC::compute_new_size: ");
  log_trace(gc, metaspace)("    minimum_free_percentage: %6.2f  maximum_used_percentage: %6.2f",
                           minimum_free_percentage, maximum_used_percentage);
  log_trace(gc, metaspace)("     used_after_gc       : %6.1fKB", used_after_gc / (double) K);


  size_t shrink_bytes = 0;
  if (capacity_until_GC < minimum_desired_capacity) {
    // If we have less capacity below the metaspace HWM, then
    // increment the HWM.
    size_t expand_bytes = minimum_desired_capacity - capacity_until_GC;
    expand_bytes = align_up(expand_bytes, Metaspace::commit_alignment());
    // Don't expand unless it's significant
    if (expand_bytes >= MinMetaspaceExpansion) {
      size_t new_capacity_until_GC = 0;
      bool succeeded = MetaspaceGC::inc_capacity_until_GC(expand_bytes, &new_capacity_until_GC);
      assert(succeeded, "Should always succesfully increment HWM when at safepoint");

      Metaspace::tracer()->report_gc_threshold(capacity_until_GC,
                                               new_capacity_until_GC,
                                               MetaspaceGCThresholdUpdater::ComputeNewSize);
      log_trace(gc, metaspace)("    expanding:  minimum_desired_capacity: %6.1fKB  expand_bytes: %6.1fKB  MinMetaspaceExpansion: %6.1fKB  new metaspace HWM:  %6.1fKB",
                               minimum_desired_capacity / (double) K,
                               expand_bytes / (double) K,
                               MinMetaspaceExpansion / (double) K,
                               new_capacity_until_GC / (double) K);
    }
    return;
  }

  // No expansion, now see if we want to shrink
  // We would never want to shrink more than this
  assert(capacity_until_GC >= minimum_desired_capacity,
         SIZE_FORMAT " >= " SIZE_FORMAT,
         capacity_until_GC, minimum_desired_capacity);
  size_t max_shrink_bytes = capacity_until_GC - minimum_desired_capacity;

  // Should shrinking be considered?
  if (MaxMetaspaceFreeRatio < 100) {
    const double maximum_free_percentage = MaxMetaspaceFreeRatio / 100.0;
    const double minimum_used_percentage = 1.0 - maximum_free_percentage;
    const double max_tmp = used_after_gc / minimum_used_percentage;
    size_t maximum_desired_capacity = (size_t)MIN2(max_tmp, double(max_uintx));
    maximum_desired_capacity = MAX2(maximum_desired_capacity,
                                    MetaspaceSize);
    log_trace(gc, metaspace)("    maximum_free_percentage: %6.2f  minimum_used_percentage: %6.2f",
                             maximum_free_percentage, minimum_used_percentage);
    log_trace(gc, metaspace)("    minimum_desired_capacity: %6.1fKB  maximum_desired_capacity: %6.1fKB",
                             minimum_desired_capacity / (double) K, maximum_desired_capacity / (double) K);

    assert(minimum_desired_capacity <= maximum_desired_capacity,
           "sanity check");

    if (capacity_until_GC > maximum_desired_capacity) {
      // Capacity too large, compute shrinking size
      shrink_bytes = capacity_until_GC - maximum_desired_capacity;
      shrink_bytes = shrink_bytes / 100 * current_shrink_factor;

      shrink_bytes = align_down(shrink_bytes, Metaspace::commit_alignment());

      assert(shrink_bytes <= max_shrink_bytes,
             "invalid shrink size " SIZE_FORMAT " not <= " SIZE_FORMAT,
             shrink_bytes, max_shrink_bytes);
      if (current_shrink_factor == 0) {
        _shrink_factor = 10;
      } else {
        _shrink_factor = MIN2(current_shrink_factor * 4, (uint) 100);
      }
      log_trace(gc, metaspace)("    shrinking:  initThreshold: %.1fK  maximum_desired_capacity: %.1fK",
                               MetaspaceSize / (double) K, maximum_desired_capacity / (double) K);
      log_trace(gc, metaspace)("    shrink_bytes: %.1fK  current_shrink_factor: %d  new shrink factor: %d  MinMetaspaceExpansion: %.1fK",
                               shrink_bytes / (double) K, current_shrink_factor, _shrink_factor, MinMetaspaceExpansion / (double) K);
    }
  }

  // Don't shrink unless it's significant
  if (shrink_bytes >= MinMetaspaceExpansion &&
      ((capacity_until_GC - shrink_bytes) >= MetaspaceSize)) {
    size_t new_capacity_until_GC = MetaspaceGC::dec_capacity_until_GC(shrink_bytes);
    Metaspace::tracer()->report_gc_threshold(capacity_until_GC,
                                             new_capacity_until_GC,
                                             MetaspaceGCThresholdUpdater::ComputeNewSize);
  }
}

As seen in scenario 1, to avoid the extra GCs caused by elastic resizing we set -XX:MetaspaceSize and -XX:MaxMetaspaceSize to the same fixed value. But then, when space runs out, Metaspace cannot expand, GC is triggered frequently, and finally OOM occurs. The root cause is therefore a ClassLoader that keeps loading new classes into memory; this problem generally arises with dynamic class loading and the like.
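A hypothetical way to reproduce this class-loading pattern is to generate classes through fresh class loaders, for example with JDK dynamic proxies. The demo below (my own sketch, not from the article) only counts the generated classes; run it with a large loop count and a small -XX:MaxMetaspaceSize, while also keeping the loaders reachable, to drive Metaspace toward OOM.

```java
import java.lang.reflect.Proxy;
import java.util.HashSet;
import java.util.Set;

public class MetaspaceGrowthDemo {
    public interface Service { void call(); }

    // Returns how many distinct proxy classes get generated for n fresh loaders.
    static int generateProxyClasses(int n) {
        Set<Class<?>> generated = new HashSet<>();
        for (int i = 0; i < n; i++) {
            // A new loader per iteration: the generated proxy class is defined
            // in it, so its metadata occupies a fresh Metaspace region that can
            // only be reclaimed when the loader itself becomes unreachable.
            ClassLoader fresh =
                new ClassLoader(MetaspaceGrowthDemo.class.getClassLoader()) {};
            generated.add(Proxy.getProxyClass(fresh, Service.class));
        }
        return generated.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct proxy classes: " + generateProxyClasses(100));
    }
}
```

This mirrors the real-world offenders listed in the summary below: frameworks that keep defining new classes (proxies, bytecode enhancement) through loaders that never die.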

4.3.3 strategy

Once the cause is known, locating and solving the problem is straightforward. You can dump a heap snapshot and inspect the class histogram with JProfiler or MAT, or locate it directly from the command line: run jcmd a few times to print the histogram and see which packages gain the most classes. Sometimes you also need to combine indicators such as InstBytes, KlassBytes, Bytecodes and MethodAll. The figure below shows an Orika problem the author found using jcmd.

jcmd <PID> GC.class_stats|awk '{print$13}'|sed  's/\(.*\)\.\(.*\)/\1/g'|sort |uniq -c|sort -nrk1

If this overall view is not enough to locate the problem, add the -XX:+TraceClassLoading and -XX:+TraceClassUnloading parameters to observe detailed class loading and unloading information.


4.3.4 summary

The principles are involved, but locating and solving the problem is easy. Typical offenders include Orika's classMap, fastjson's ASMSerializer, Groovy's dynamically loaded classes, and so on, basically centered on reflection, Javassist bytecode enhancement, CGLIB dynamic proxies, OSGi custom class loaders and similar techniques. In addition, monitor Metaspace utilization continuously; if the metric fluctuates, find and fix the problem early.

4.4 scenario 4: early promotion*

4.4.1 phenomena

This scenario mainly occurs with generational collectors; the technical term is "premature promotion". Ninety percent of objects die young: they should be promoted to the Old area only after surviving several GCs in the Young area. An object's GC Age increases by 1 with each GC it survives, and the maximum is controlled by -XX:MaxTenuringThreshold.

Premature promotion does not usually affect GC directly; it tends to come with floating garbage, failed large-object guarantees and other problems, which may not show up immediately. We can judge whether premature promotion has occurred from the following phenomena:

  • The allocation rate is close to the promotion rate, and objects are promoted at a small age.

  • Messages such as "Desired survivor size 107347968 bytes, new threshold 1 (max 6)" appear in the GC log, meaning objects will be moved to the Old area after surviving just one GC.

  • Full GC is frequent, and the occupancy of the Old area changes dramatically after a single GC.

For example, if the Old area's collection threshold is 80% and occupancy drops to 10% after one GC, then 70% of the objects in the Old area actually had short lifetimes. As shown in the figure below, each GC takes the Old area from 2.1G down to 300M, i.e. 1.8G of garbage is reclaimed and only 300M of objects are live. The whole Heap is currently 4G, so live objects account for less than a tenth.

Hazards of early promotion:

  • Young GC is frequent and the total throughput decreases.
  • Full GC is frequent, and there may be a large pause.
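A rough way to check the "allocation rate close to promotion rate" symptom numerically is to compare the two rates over the same sampling window. The helper below and its 30% threshold are illustrative assumptions, not HotSpot values:

```java
public class PromotionCheck {
    // Illustrative heuristic (not a JDK API): premature promotion is suspected
    // when the promotion rate approaches the allocation rate.
    static boolean prematurePromotionLikely(double allocRateMBs, double promoRateMBs) {
        // Assumed threshold: promoting more than ~30% of what is allocated.
        return promoRateMBs > 0.3 * allocRateMBs;
    }

    public static void main(String[] args) {
        // Deltas derived from two GC log samples taken 10 seconds apart:
        double edenDeltaMB = 500, oldDeltaMB = 200;
        double allocRate = edenDeltaMB / 10, promoRate = oldDeltaMB / 10;
        System.out.println("premature promotion likely: "
                + prematurePromotionLikely(allocRate, promoRate));
    }
}
```

In practice the deltas come from the Eden and Old occupancy figures in consecutive GC log records or from a monitoring system.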

4.4.2 causes

The main reasons are as follows:

  • Young/Eden area too small:   the direct consequence is that Eden fills up faster, so objects that should have been reclaimed survive the GC and get promoted. Young GC uses a copying algorithm; from the basics chapter we know copying costs much more than marking, i.e. Young GC time is essentially copying time (setting aside CMS's Card Table scan and G1's Remembered Set scan). Objects not reclaimed in time raise the cost of each collection, so Young GC takes longer, space cannot be freed quickly, and the number of Young GCs also increases.

  • Allocation rate too high:   observe the Mutator's allocation rate before and after the problem. If it fluctuates noticeably, check network card traffic, slow-query logs of storage middleware and similar signals to see whether large amounts of data are being loaded into memory.

Failing to collect objects in the Young area also triggers another mechanism, dynamic age calculation. The JVM controls promotion age with the -XX:MaxTenuringThreshold parameter; an object's age increases by one per GC survived, and on reaching the threshold it enters the Old area. The maximum value is 15, because the JVM uses 4 bits to represent an object's age. With a fixed MaxTenuringThreshold value as the promotion condition:

  • If MaxTenuringThreshold is set too large, objects that should have been promoted linger in the Survivor area until it overflows. Once overflow occurs, objects in Eden + Survivor are promoted to the Old area without regard to their age, so the object aging mechanism fails.

  • If MaxTenuringThreshold is set too small, it will be promoted too early, that is, objects cannot be fully recycled in the Young area, and a large number of short-term objects will be promoted to the Old area. The space in the Old area will grow rapidly, causing frequent Major GC. Generational recycling will lose its significance and seriously affect GC performance.

The same application behaves differently at different times: special jobs or changes in the traffic mix shift the distribution of object lifetimes, so a fixed threshold cannot adapt to the change and causes the problems above. Hence HotSpot adjusts the promotion threshold through dynamic calculation.

For specific dynamic calculation, please refer to the Hotspot source code, which is specified in / src/hotspot/share/gc/shared/ageTable.cpp   compute_tenuring_threshold   Method:

uint ageTable::compute_tenuring_threshold(size_t survivor_capacity) {
  //TargetSurvivorRatio defaults to 50, which means that after recycling, you want the occupation rate of survivor area to reach this ratio
  size_t desired_survivor_size = (size_t)((((double) survivor_capacity)*TargetSurvivorRatio)/100);
  size_t total = 0;
  uint age = 1;
  assert(sizes[0] == 0, "no objects with age zero should be recorded");
  while (age < table_size) {//table_size=16
    total += sizes[age];
    //If the size of all objects at this age is added and the consumption is > the expected size, age is set as the new promotion threshold
    if (total > desired_survivor_size) break;
    age++;
  }

  uint result = age < MaxTenuringThreshold ? age : MaxTenuringThreshold;
  if (PrintTenuringDistribution || UsePerfData) {

    //Print the expected survivor size, the newly calculated threshold, and the set maximum threshold
    if (PrintTenuringDistribution) {
      gclog_or_tty->cr();
      gclog_or_tty->print_cr("Desired survivor size " SIZE_FORMAT " bytes, new threshold %u (max %u)",
        desired_survivor_size*oopSize, result, (int) MaxTenuringThreshold);
    }

    total = 0;
    age = 1;
    while (age < table_size) {
      total += sizes[age];
      if (sizes[age] > 0) {
        if (PrintTenuringDistribution) {
          gclog_or_tty->print_cr("- age %3u: " SIZE_FORMAT_W(10) " bytes, " SIZE_FORMAT_W(10) " total",
                                        age,    sizes[age]*oopSize,          total*oopSize);
        }
      }
      if (UsePerfData) {
        _perf_sizes[age]->set_value(sizes[age]*oopSize);
      }
      age++;
    }
    if (UsePerfData) {
      SharedHeap* sh = SharedHeap::heap();
      CollectorPolicy* policy = sh->collector_policy();
      GCPolicyCounters* gc_counters = policy->counters();
      gc_counters->tenuring_threshold()->set_value(result);
      gc_counters->desired_survivor_size()->set_value(
        desired_survivor_size*oopSize);
    }
  }

  return result;
}

As the code shows, HotSpot accumulates the space occupied by objects starting from age 1; once the running total for ages up to n exceeds the target share of the Survivor area (TargetSurvivorRatio / 100, with TargetSurvivorRatio defaulting to 50), the loop ends and n is compared with MaxTenuringThreshold: the smaller of the two becomes the new promotion threshold. Once the dynamic age mechanism kicks in, more objects enter the Old area, wasting resources there.

4.4.3 strategy

Knowing the cause of the problem gives us the direction of the fix. If the Young/Eden area is too small, we can appropriately enlarge the Young area while keeping the total Heap size unchanged. By how much? In general the Old area should be about 2-3 times the size of the live objects; considering floating garbage, about 3 times is better, and the rest can go to the Young area.

In a typical premature-promotion optimization by the author, the original configuration was Young 1.2G + Old 2.8G. Observing the CMS GC showed about 300~400M of surviving objects, so the Old area was adjusted to 1.5G and the remaining 2.5G given to the Young area. By adjusting only the Young area size parameter (-Xmn), the whole JVM went from 26 to 11 Young GCs per minute with no increase in single-pause time, total GC time fell from 1100ms to 500ms, and the CMS GC interval grew from about once every 40 minutes to once every 7.5 hours.
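The arithmetic behind that adjustment can be written down as a tiny sketch. The helper name and the 0.3G headroom constant below are illustrative assumptions; only the "Old ≈ 3× live set" rule of thumb comes from the text:

```java
public class HeapSizing {
    // Rule of thumb from the text: size Old at about 3x the live set,
    // plus a little assumed headroom; the remainder goes to Young.
    static double oldSizeGB(double liveSetGB) {
        return 3 * liveSetGB + 0.3;
    }

    public static void main(String[] args) {
        double heapGB = 4.0;
        double liveSetGB = 0.4; // observed surviving objects after a CMS GC
        double oldGB = oldSizeGB(liveSetGB); // ~1.5G, as in the tuning story
        System.out.printf("Old=%.1fG Young=%.1fG%n", oldGB, heapGB - oldGB);
    }
}
```

Plugging in the observed 400M live set reproduces the Old 1.5G + Young 2.5G split used in the case above.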

If the allocation rate is too high:

  • Occasionally high: use a memory analysis tool to find the problem code and optimize the business logic.

  • Consistently high: the current collector no longer meets the Mutator's expectations. In this case, either scale out the Mutator's VMs, or adjust the GC collector type, or enlarge the space.

4.4.4 summary

The problem of premature promotion is usually not very conspicuous, but over time it can erupt into a wave of collector degradation, so avoid it in advance. Check whether these phenomena exist in your system; if they match, try the optimization. The ROI of this one-line tuning is still very high.

If, when observing the Old-area occupancy before and after GC, the reclaimable fraction turns out to be small, e.g. only dropping from 80% to 60%, then most objects there really are alive, and the Old area can be appropriately enlarged instead.

4.4.5 meal addition

As for how to choose the concrete NewRatio value when adjusting the Young/Old split, we can abstract the problem as a reservoir model, identify the key metrics below, and compute them for your own scenario.

  • The NewRatio value r has some functional relationship with v_a (allocation rate), v_p (promotion rate), v_yc (Young GC frequency), v_oc (Old GC frequency), r_s (survival ratio) and similar quantities (changing r changes v_p, and so on; the author tried to use a neural network to assist the modeling but has not worked out a concrete formula yet — students with ideas can give their answer in the comment area).

  • The total pause time T is the sum of the total Young GC time T_yc and the total Old GC time T_oc, where T_yc is related to v_yc and v_p, and T_oc is related to v_oc.

  • Ignoring the GC time itself, the interval between two Young GCs should be greater than the TP9999 latency. That way objects are reclaimed in the Eden area as much as possible, which removes many pauses.
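The last point (Young GC interval versus TP9999) can be sketched numerically: the interval between two Young GCs is roughly Eden size divided by the allocation rate. The helper and sample numbers below are illustrative assumptions, not values from the article:

```java
public class GcIntervalEstimate {
    // Rough model: time to fill Eden = Eden size / allocation rate.
    static double youngGcIntervalSeconds(double edenMB, double allocRateMBperSec) {
        return edenMB / allocRateMBperSec;
    }

    public static void main(String[] args) {
        double edenMB = 1800;   // assumed Eden size
        double allocRate = 300; // assumed allocation rate, MB/s
        double tp9999Sec = 3.0; // assumed TP9999 request latency
        double interval = youngGcIntervalSeconds(edenMB, allocRate);
        // If the interval exceeds TP9999, most request-scoped objects die in
        // Eden before a Young GC can copy (and possibly promote) them.
        System.out.printf("Young GC every %.1fs (TP9999=%.1fs, ok=%b)%n",
                interval, tp9999Sec, interval > tp9999Sec);
    }
}
```

When the computed interval falls below TP9999, either Eden is too small or the allocation rate too high for request-scoped objects to be collected in place.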

4.5 scenario 5: CMS Old GC frequent*

4.5.1 phenomena

CMS GC runs frequently in the Old area, but each cycle is not long and the overall maximum STW stays within the acceptable range; however, the overly frequent GC causes a large drop in throughput.

4.5.2 causes

This situation is common. After a Young GC completes, a background thread responsible for CMS GC, ConcurrentMarkSweepThread, keeps polling: it calls shouldConcurrentCollect() to check whether the collection conditions are met, and if so starts a Background-mode GC via collect_in_background(). The polling interval is implemented in sleepBeforeNextCycle() and controlled by -XX:CMSWaitDuration, which defaults to 2s.

The specific code is: src/hotspot/share/gc/cms/concurrentMarkSweepThread.cpp.

void ConcurrentMarkSweepThread::run_service() {
  assert(this == cmst(), "just checking");

  if (BindCMSThreadToCPU && !os::bind_to_processor(CPUForCMSThread)) {
    log_warning(gc)("Couldn't bind CMS thread to processor " UINTX_FORMAT, CPUForCMSThread);
  }

  while (!should_terminate()) {
    sleepBeforeNextCycle();
    if (should_terminate()) break;
    GCIdMark gc_id_mark;
    GCCause::Cause cause = _collector->_full_gc_requested ?
      _collector->_full_gc_cause : GCCause::_cms_concurrent_mark;
    _collector->collect_in_background(cause);
  }
  verify_ok_to_terminate();
}
void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
  while (!should_terminate()) {
    if(CMSWaitDuration >= 0) {
      // Wait until the next synchronous GC, a concurrent full gc
      // request or a timeout, whichever is earlier.
      wait_on_cms_lock_for_scavenge(CMSWaitDuration);
    } else {
      // Wait until any cms_lock event or check interval not to call shouldConcurrentCollect permanently
      wait_on_cms_lock(CMSCheckInterval);
    }
    // Check if we should start a CMS collection cycle
    if (_collector->shouldConcurrentCollect()) {
      return;
    }
    // .. collection criterion not yet met, let's go back
    // and wait some more
  }
}

The code that decides whether to collect is in src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp.

bool CMSCollector::shouldConcurrentCollect() {
  LogTarget(Trace, gc) log;

  if (_full_gc_requested) {
    log.print("CMSCollector: collect because of explicit  gc request (or GCLocker)");
    return true;
  }

  FreelistLocker x(this);
  // ------------------------------------------------------------------
  // Print out lots of information which affects the initiation of
  // a collection.
  if (log.is_enabled() && stats().valid()) {
    log.print("CMSCollector shouldConcurrentCollect: ");

    LogStream out(log);
    stats().print_on(&out);

    log.print("time_until_cms_gen_full %3.7f", stats().time_until_cms_gen_full());
    log.print("free=" SIZE_FORMAT, _cmsGen->free());
    log.print("contiguous_available=" SIZE_FORMAT, _cmsGen->contiguous_available());
    log.print("promotion_rate=%g", stats().promotion_rate());
    log.print("cms_allocation_rate=%g", stats().cms_allocation_rate());
    log.print("occupancy=%3.7f", _cmsGen->occupancy());
    log.print("initiatingOccupancy=%3.7f", _cmsGen->initiating_occupancy());
    log.print("cms_time_since_begin=%3.7f", stats().cms_time_since_begin());
    log.print("cms_time_since_end=%3.7f", stats().cms_time_since_end());
    log.print("metadata initialized %d", MetaspaceGC::should_concurrent_collect());
  }
  // ------------------------------------------------------------------
  if (!UseCMSInitiatingOccupancyOnly) {
    if (stats().valid()) {
      if (stats().time_until_cms_start() == 0.0) {
        return true;
      }
    } else {
  
      if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
        log.print(" CMSCollector: collect for bootstrapping statistics: occupancy = %f, boot occupancy = %f",
                  _cmsGen->occupancy(), _bootstrap_occupancy);
        return true;
      }
    }
  }

  if (_cmsGen->should_concurrent_collect()) {
    log.print("CMS old gen initiated");
    return true;
  }

  // We start a collection if we believe an incremental collection may fail;
  // this is not likely to be productive in practice because it's probably too
  // late anyway.
  CMSHeap* heap = CMSHeap::heap();
  if (heap->incremental_collection_will_fail(true /* consult_young */)) {
    log.print("CMSCollector: collect because incremental collection will fail ");
    return true;
  }

  if (MetaspaceGC::should_concurrent_collect()) {
    log.print("CMSCollector: collect for metadata allocation ");
    return true;
  }

  // CMSTriggerInterval starts a CMS cycle if enough time has passed.
  if (CMSTriggerInterval >= 0) {
    if (CMSTriggerInterval == 0) {
      // Trigger always
      return true;
    }

    // Check the CMS time since begin (we do not check the stats validity
    // as we want to be able to trigger the first CMS cycle as well)
    if (stats().cms_time_since_begin() >= (CMSTriggerInterval / ((double) MILLIUNITS))) {
      if (stats().valid()) {
        log.print("CMSCollector: collect because of trigger interval (time since last begin %3.7f secs)",
                  stats().cms_time_since_begin());
      } else {
        log.print("CMSCollector: collect because of trigger interval (first collection)");
      }
      return true;
    }
  }

  return false;
}

Analyzing the logic that decides whether to trigger a GC, we can divide it into the following situations:

  • Trigger CMS GC:   triggered as a Background GC by calling  _collector->collect_in_background().

    • By default, CMS uses JVM runtime statistics to decide whether a CMS GC needs to be triggered. If you want the decision to be based only on the value of  -XX:CMSInitiatingOccupancyFraction, you must also set  -XX:+UseCMSInitiatingOccupancyOnly.

    • If  -XX:+UseCMSInitiatingOccupancyOnly  is enabled, CMS GC is triggered when the current Old-area utilization exceeds the threshold set by  -XX:CMSInitiatingOccupancyFraction; if that parameter is not set, the default threshold is 92%.

    • If the previous Young GC has failed, or the next Young GC in the Young area may fail, CMS GC needs to be triggered in both cases.

    • CMS does not collect MetaSpace or Perm by default. To collect these areas, set  -XX:+CMSClassUnloadingEnabled.

  • Trigger Full GC:   a Full GC is performed directly; this is described in scenario 7.

    • If  _full_gc_requested  is true, there is an explicit GC request, for example a System.gc() call.

    • Allocation of an object or TLAB in the Eden area failed, causing a Young GC and ultimately a call to the  satisfy_failed_allocation()  method of the  GenCollectorPolicy  class.

Look at the log printing in the source code: through these logs we can know the specific trigger reason clearly, and from there start the analysis.
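The branches above can be condensed into a simplified decision sketch. This is not the HotSpot code; the fields and the decision order are an illustration of the rules just listed:

```java
// Simplified illustration of the CMS Background GC trigger decision.
// Not the real HotSpot logic; field names and values are illustrative.
public class CmsTriggerSketch {
    boolean useOccupancyOnly;              // -XX:+UseCMSInitiatingOccupancyOnly
    double occupancy;                      // current Old-area utilization, 0..1
    double initiatingOccupancy = 0.92;     // -XX:CMSInitiatingOccupancyFraction default
    boolean statsSayCollect;               // runtime statistics predict Old will fill soon
    boolean incrementalCollectionWillFail; // next Young GC may fail to promote

    boolean shouldConcurrentCollect() {
        if (!useOccupancyOnly && statsSayCollect) return true; // statistics-driven
        if (occupancy >= initiatingOccupancy) return true;     // threshold-driven
        if (incrementalCollectionWillFail) return true;        // promotion at risk
        return false;
    }
}
```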

4.5.3 strategy

Let's take the most common case, reaching the collection threshold, as an example. Unlike premature promotion, these objects really did survive for a while, with a Survival Time exceeding TP9999, but they could not survive long-term; typical examples are database connections, network connections, caches with expiration times, and so on.

Dealing with this kind of conventional memory leak follows a basic routine. The main steps are as follows:

Dump Diff and Leak Suspects are intuitive, so I won't introduce them. Here are some other key points:

  • Memory dump:   When taking heap snapshots with jmap, arthas and similar tools, remember to take the instance out of traffic first, and dump both before and after a CMS GC.
  • Analyze Top Component:   Remember to observe the Histogram along several dimensions: object, class, class loader and package. Also use outgoing and incoming references to analyze associated objects, and take a look at Soft Reference, Weak Reference and Finalizer.
  • Analyze Unreachable:   Focus on this part, paying attention to both Shallow and Retained sizes. As shown in the figure below, in a previous GC optimization the author found Hystrix's sliding-window problem through Unreachable Objects.

4.5.4 summary

After going through the whole process you can basically locate the problem, but remember to change one variable at a time during optimization, to prevent changes that aggravate the problem from being masked.

4.6 scenario 6: a single CMS Old GC takes a long time*

4.6.1 phenomena

The maximum single STW of a CMS GC exceeds 1000ms and does not occur frequently. As shown in the figure below, the longest STW reaches 8000ms. In some scenarios this causes an "avalanche effect", which is very dangerous, so we should try to avoid it.

4.6.2 causes

In the CMS collection process, the STW phases are mainly Init Mark and Final Remark, and they account for most of the CMS Old GC pause time. In some cases, waiting for Mutator threads to reach a SafePoint before the STW also takes a long time, but this is rarer; we mainly discuss the former here. For collector degradation and fragment compaction, see scenario 7.

To understand why these two phases take time, we need to take a look at what these two phases will do.

The core code is in src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp, and a polling thread, ConcurrentMarkSweepThread, performs the check. The details of Old-area collection are completely encapsulated in   CMSCollector. The entry points are   CMSCollector::collect_in_background, called by ConcurrentMarkSweepThread, and   CMSCollector::collect, invoked by   ConcurrentMarkSweepGeneration; here we discuss the most common path,   collect_in_background. During the whole process, the STW phases are mainly the Initial Mark and the Final Remark; their core code is in   VM_CMS_Initial_Mark  /  VM_CMS_Final_Remark, and during execution control is handed over to the VMThread.

  • CMS Init Mark is mainly implemented in   CMSCollector::checkpointRootsInitialWork()   and   CMSParInitialMarkTask::work. The overall steps and code are as follows:
void CMSCollector::checkpointRootsInitialWork() {
  assert(SafepointSynchronize::is_at_safepoint(), "world should be stopped");
  assert(_collectorState == InitialMarking, "just checking");

  // Already have locks.
  assert_lock_strong(bitMapLock());
  assert(_markBitMap.isAllClear(), "was reset at end of previous cycle");

  // Setup the verification and class unloading state for this
  // CMS collection cycle.
  setup_cms_unloading_and_verification_state();

  GCTraceTime(Trace, gc, phases) ts("checkpointRootsInitialWork", _gc_timer_cm);

  // Reset all the PLAB chunk arrays if necessary.
  if (_survivor_plab_array != NULL && !CMSPLABRecordAlways) {
    reset_survivor_plab_arrays();
  }

  ResourceMark rm;
  HandleMark  hm;

  MarkRefsIntoClosure notOlder(_span, &_markBitMap);
  CMSHeap* heap = CMSHeap::heap();

  verify_work_stacks_empty();
  verify_overflow_empty();

  heap->ensure_parsability(false);  // fill TLABs, but no need to retire them
  // Update the saved marks which may affect the root scans.
  heap->save_marks();

  // weak reference processing has not started yet.
  ref_processor()->set_enqueuing_is_done(false);

  // Need to remember all newly created CLDs,
  // so that we can guarantee that the remark finds them.
  ClassLoaderDataGraph::remember_new_clds(true);

  // Whenever a CLD is found, it will be claimed before proceeding to mark
  // the klasses. The claimed marks need to be cleared before marking starts.
  ClassLoaderDataGraph::clear_claimed_marks();

  print_eden_and_survivor_chunk_arrays();

  {
    if (CMSParallelInitialMarkEnabled) {
      // The parallel version.
      WorkGang* workers = heap->workers();
      assert(workers != NULL, "Need parallel worker threads.");
      uint n_workers = workers->active_workers();

      StrongRootsScope srs(n_workers);

      CMSParInitialMarkTask tsk(this, &srs, n_workers);
      initialize_sequential_subtasks_for_young_gen_rescan(n_workers);
      // If the total workers is greater than 1, then multiple workers
      // may be used at some time and the initialization has been set
      // such that the single threaded path cannot be used.
      if (workers->total_workers() > 1) {
        workers->run_task(&tsk);
      } else {
        tsk.work(0);
      }
    } else {
      // The serial version.
      CLDToOopClosure cld_closure(&notOlder, true);
      heap->rem_set()->prepare_for_younger_refs_iterate(false); // Not parallel.

      StrongRootsScope srs(1);

      heap->cms_process_roots(&srs,
                             true,   // young gen as roots
                             GenCollectedHeap::ScanningOption(roots_scanning_options()),
                             should_unload_classes(),
                             &notOlder,
                             &cld_closure);
    }
  }

  // Clear mod-union table; it will be dirtied in the prologue of
  // CMS generation per each young generation collection.
  assert(_modUnionTable.isAllClear(),
       "Was cleared in most recent final checkpoint phase"
       " or no bits are set in the gc_prologue before the start of the next "
       "subsequent marking phase.");

  assert(_ct->cld_rem_set()->mod_union_is_clear(), "Must be");
  // Save the end of the used_region of the constituent generations
  // to be used to limit the extent of sweep in each generation.
  save_sweep_limits();
  verify_overflow_empty();
}
void CMSParInitialMarkTask::work(uint worker_id) {
  elapsedTimer _timer;
  ResourceMark rm;
  HandleMark   hm;

  // ---------- scan from roots --------------
  _timer.start();
  CMSHeap* heap = CMSHeap::heap();
  ParMarkRefsIntoClosure par_mri_cl(_collector->_span, &(_collector->_markBitMap));

  // ---------- young gen roots --------------
  {
    work_on_young_gen_roots(&par_mri_cl);
    _timer.stop();
    log_trace(gc, task)("Finished young gen initial mark scan work in %dth thread: %3.3f sec", worker_id, _timer.seconds());
  }

  // ---------- remaining roots --------------
  _timer.reset();
  _timer.start();

  CLDToOopClosure cld_closure(&par_mri_cl, true);

  heap->cms_process_roots(_strong_roots_scope,
                          false,     // yg was scanned above
                          GenCollectedHeap::ScanningOption(_collector->CMSCollector::roots_scanning_options()),
                          _collector->should_unload_classes(),
                          &par_mri_cl,
                          &cld_closure,
                          &_par_state_string);

  assert(_collector->should_unload_classes()
         || (_collector->CMSCollector::roots_scanning_options() & GenCollectedHeap::SO_AllCodeCache),
         "if we didn't scan the code cache, we have to be ready to drop nmethods with expired weak oops");
  _timer.stop();
  log_trace(gc, task)("Finished remaining root initial mark scan work in %dth thread: %3.3f sec", worker_id, _timer.seconds());
}

The whole process is relatively simple: start from the GC Roots and mark the objects in the Old area, then process the references from the Young area into the Old area using the BitMap. This phase is basically fast, and large pauses are rare.
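The BitMap mentioned above maps each possible object address in the region to one bit. A toy Java version, assuming 8-byte-aligned addresses, just to show the idea (the real CMS bitmap lives in native memory):

```java
import java.util.BitSet;

// Toy mark bitmap: one bit per 8-byte-aligned "address" inside a
// region [base, base + sizeBytes). Illustrative only.
public class MarkBitMap {
    private final long base;
    private final BitSet bits;

    MarkBitMap(long base, long sizeBytes) {
        this.base = base;
        this.bits = new BitSet((int) (sizeBytes >> 3)); // one bit per 8-byte word
    }

    void mark(long addr)        { bits.set((int) ((addr - base) >> 3)); }
    boolean isMarked(long addr) { return bits.get((int) ((addr - base) >> 3)); }
}
```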

  • CMS Final Remark is implemented in   CMSCollector::checkpointRootsFinalWork(). The overall code and steps are as follows:
void CMSCollector::checkpointRootsFinalWork() {
  GCTraceTime(Trace, gc, phases) tm("checkpointRootsFinalWork", _gc_timer_cm);

  assert(haveFreelistLocks(), "must have free list locks");
  assert_lock_strong(bitMapLock());

  ResourceMark rm;
  HandleMark   hm;

  CMSHeap* heap = CMSHeap::heap();

  if (should_unload_classes()) {
    CodeCache::gc_prologue();
  }
  assert(haveFreelistLocks(), "must have free list locks");
  assert_lock_strong(bitMapLock());

  heap->ensure_parsability(false);  // fill TLAB's, but no need to retire them
  // Update the saved marks which may affect the root scans.
  heap->save_marks();

  print_eden_and_survivor_chunk_arrays();

  {
    if (CMSParallelRemarkEnabled) {
      GCTraceTime(Debug, gc, phases) t("Rescan (parallel)", _gc_timer_cm);
      do_remark_parallel();
    } else {
      GCTraceTime(Debug, gc, phases) t("Rescan (non-parallel)", _gc_timer_cm);
      do_remark_non_parallel();
    }
  }
  verify_work_stacks_empty();
  verify_overflow_empty();

  {
    GCTraceTime(Trace, gc, phases) ts("refProcessingWork", _gc_timer_cm);
    refProcessingWork();
  }
  verify_work_stacks_empty();
  verify_overflow_empty();

  if (should_unload_classes()) {
    CodeCache::gc_epilogue();
  }
  JvmtiExport::gc_epilogue();
  assert(_markStack.isEmpty(), "No grey objects");
  size_t ser_ovflw = _ser_pmc_remark_ovflw + _ser_pmc_preclean_ovflw +
                     _ser_kac_ovflw        + _ser_kac_preclean_ovflw;
  if (ser_ovflw > 0) {
    log_trace(gc)("Marking stack overflow (benign) (pmc_pc=" SIZE_FORMAT ", pmc_rm=" SIZE_FORMAT ", kac=" SIZE_FORMAT ", kac_preclean=" SIZE_FORMAT ")",
                         _ser_pmc_preclean_ovflw, _ser_pmc_remark_ovflw, _ser_kac_ovflw, _ser_kac_preclean_ovflw);
    _markStack.expand();
    _ser_pmc_remark_ovflw = 0;
    _ser_pmc_preclean_ovflw = 0;
    _ser_kac_preclean_ovflw = 0;
    _ser_kac_ovflw = 0;
  }
  if (_par_pmc_remark_ovflw > 0 || _par_kac_ovflw > 0) {
     log_trace(gc)("Work queue overflow (benign) (pmc_rm=" SIZE_FORMAT ", kac=" SIZE_FORMAT ")",
                          _par_pmc_remark_ovflw, _par_kac_ovflw);
     _par_pmc_remark_ovflw = 0;
    _par_kac_ovflw = 0;
  }
   if (_markStack._hit_limit > 0) {
     log_trace(gc)(" (benign) Hit max stack size limit (" SIZE_FORMAT ")",
                          _markStack._hit_limit);
   }
   if (_markStack._failed_double > 0) {
     log_trace(gc)(" (benign) Failed stack doubling (" SIZE_FORMAT "), current capacity " SIZE_FORMAT,
                          _markStack._failed_double, _markStack.capacity());
   }
  _markStack._hit_limit = 0;
  _markStack._failed_double = 0;

  if ((VerifyAfterGC || VerifyDuringGC) &&
      CMSHeap::heap()->total_collections() >= VerifyGCStartAt) {
    verify_after_remark();
  }

  _gc_tracer_cm->report_object_count_after_gc(&_is_alive_closure);

  // Change under the freelistLocks.
  _collectorState = Sweeping;
  // Call isAllClear() under bitMapLock
  assert(_modUnionTable.isAllClear(),
      "Should be clear by end of the final marking");
  assert(_ct->cld_rem_set()->mod_union_is_clear(),
      "Should be clear by end of the final marking");
}

Final Remark is the final, second marking phase, executed only when the Background GC has already performed the InitialMarking step; if InitialMarking was executed by a Foreground GC, there is no need to run Final Remark again. The initial part of Final Remark is the same as Init Mark, but it additionally traverses the Card Table and cleans up Reference instances, maintaining the   pend_list   of References; if metadata is also being collected, it must additionally clean up resources no longer used in SystemDictionary, CodeCache, SymbolTable, StringTable and other components.
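The Reference handling above revolves around references being put on a pending list and then moved to their ReferenceQueue. The mechanics can be shown with the public java.lang.ref API; here we call enqueue() by hand so the example is deterministic rather than waiting for a real GC to clear the referent:

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Demonstrates the Reference/ReferenceQueue mechanics that CMS's
// Reference processing relies on. We enqueue manually for determinism;
// at runtime the GC enqueues via the pending list after clearing.
public class RefQueueDemo {
    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        // Normally the GC clears and enqueues the reference; we simulate it.
        ref.enqueue();

        // A reference-handling thread would now poll the queue and clean up.
        System.out.println(queue.poll() == ref); // prints "true"
    }
}
```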

4.6.3 strategy

After knowing the execution processes of the two STW processes, we can easily analyze and solve them. Since most of the problems are in the Final Remark process, we also take this scenario as an example. The main steps are as follows:

  • [direction]   Observe the detailed GC log, find the Final Remark entries at the time of the problem, and check whether the real time spent on Reference processing and metadata processing is normal. The details require turning on the  -XX:+PrintReferenceGC  parameter. The log can basically pinpoint the problem; anything taking more than 10% of the pause deserves attention.
2019-02-27T19:55:37.920+0800: 516952.915: [GC (CMS Final Remark) 516952.915: [ParNew516952.939: [SoftReference, 0 refs, 0.0003857 secs]516952.939: [WeakReference, 1362 refs, 0.0002415 secs]516952.940: [FinalReference, 146 refs, 0.0001233 secs]516952.940: [PhantomReference, 0 refs, 57 refs, 0.0002369 secs]516952.940: [JNI Weak Reference, 0.0000662 secs]
[class unloading, 0.1770490 secs]516953.329: [scrub symbol table, 0.0442567 secs]516953.373: [scrub string table, 0.0036072 secs][1 CMS-remark: 1638504K(2048000K)] 1667558K(4352000K), 0.5269311 secs] [Times: user=1.20 sys=0.03, real=0.53 secs]
  • [root cause]   With a specific direction we can analyze in depth. Generally, the most likely culprits are FinalReference among the References and the scrub symbol table step in metadata processing. To find the specific problem code we need a memory analysis tool such as MAT or JProfiler; note that the heap should be dumped just before the CMS GC starts. Before using tools like MAT, you can also inspect the object Histogram from the command line, which may locate the problem directly.

    • When analyzing FinalReference, mainly observe   java.lang.ref.Finalizer   objects to find the source of the leak. Common culprits include Socket's   SocksSocketImpl , Jersey's   ClientRuntime , MySQL's   ConnectionImpl   and so on.

    • scrub symbol table indicates time spent cleaning up metadata symbol references. Symbol references are the JVM-level representation of methods after Java code is compiled to bytecode; their life cycle is generally consistent with that of the Class. When  _should_unload_classes  is set to true, they are processed in   CMSCollector::refProcessingWork()   together with Class Unloading and the String Table.

if (should_unload_classes()) {
    {
      GCTraceTime(Debug, gc, phases) t("Class Unloading", _gc_timer_cm);

      // Unload classes and purge the SystemDictionary.
      bool purged_class = SystemDictionary::do_unloading(_gc_timer_cm);

      // Unload nmethods.
      CodeCache::do_unloading(&_is_alive_closure, purged_class);

      // Prune dead klasses from subklass/sibling/implementor lists.
      Klass::clean_weak_klass_links(purged_class);
    }

    {
      GCTraceTime(Debug, gc, phases) t("Scrub Symbol Table", _gc_timer_cm);
      // Clean up unreferenced symbols in symbol table.
      SymbolTable::unlink();
    }

    {
      GCTraceTime(Debug, gc, phases) t("Scrub String Table", _gc_timer_cm);
      // Delete entries for dead interned strings.
      StringTable::unlink(&_is_alive_closure);
    }
  }
  • [strategy]   Once the root cause of the GC time is identified, handling it is relatively easy. This kind of problem does not break out everywhere at once, but a single STW can be long; if the business impact is large, take the instance out of traffic promptly. Specific follow-up optimization strategies are as follows:

    • FinalReference: find the memory source and fix it by optimizing the code. If it cannot be located in a short time, you can add  -XX:+ParallelRefProcEnabled  to process References in parallel.

    • symbol table: observe the historical peak usage of the MetaSpace area and how much each GC reclaims. Generally, if dynamic class loading or DSL processing is not used, MetaSpace usage hardly changes, and you can set  -XX:-CMSClassUnloadingEnabled  to skip MetaSpace processing. JDK 8 enables CMSClassUnloadingEnabled by default, which makes CMS attempt to unload classes during the CMS Remark phase.

4.6.4 summary

In a normal Background CMS GC, problems basically concentrate in Reference processing and metadata (Class) processing. For Reference problems, whether FinalReference, SoftReference or WeakReference, the core technique is to dump a snapshot at the right moment and analyze it with a memory analysis tool. For Class processing there is currently no better approach than turning off the class-unloading switch.

There is also a Reference problem in G1. You can observe the Ref Proc in the log. The processing method is similar to that in CMS.

4.7 scenario 7: memory fragmentation & collector degradation

4.7.1 phenomena

The concurrent CMS GC algorithm degrades into a Foreground single-threaded serial GC mode with very long STW times, sometimes more than ten seconds. There are two single-threaded serial GC algorithms after CMS collector degradation:

  • The algorithm with compaction, called MSC, introduced above: it collects the whole heap with a single-threaded, fully pausing mark-sweep-compact, i.e. a Full GC in the true sense, with pauses longer than an ordinary CMS.
  • The algorithm without compaction collects the Old area, similar to the ordinary CMS algorithm, with pauses shorter than MSC.

4.7.2 causes

The collector degradation of CMS mainly includes the following situations:

Promotion Failed

As the name suggests, promotion failure means that during a Young GC the Survivor space cannot hold the surviving objects, so they must go into the Old area, but at that moment the Old area cannot hold them either. Intuitively this seems like it should happen often, but because of the ConcurrentMarkSweepThread and the guarantee mechanism, the conditions are actually very harsh: either the remaining Old space is filled very quickly in a short time, for example by the premature promotion caused by dynamic age judgment mentioned above (see "Incremental collection guarantee failed" below), or memory fragmentation causes the Promotion Failed: the Young GC believes the Old area has enough space, but at allocation time the promoted large object cannot find contiguous storage space.

When CMS is used as the GC collector, the Old area after running for a while looks like the figure below: the sweep algorithm leaves many discontinuous segments and a large number of memory fragments.

Fragmentation poses two problems:

  • Allocation efficiency becomes low: as mentioned above, with contiguous space the JVM can allocate by pointer bumping; with a free list full of fragments, it must walk the freelist entries one by one to find an address that can hold the new object.
  • Space utilization becomes low: if an object promoted from the Young area is larger than every contiguous free chunk, Promotion Failed is triggered; even if the Old area as a whole has enough capacity, the new object cannot be stored because the free space is discontinuous. This is the problem discussed in this scenario.
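The second problem can be made concrete with a toy free list: the total free space is sufficient, but no single contiguous chunk can hold the promoted object, so the promotion fails. A sketch:

```java
// Toy free list showing how fragmentation causes Promotion Failed:
// total free space is enough, yet no single contiguous chunk fits.
public class FreeListSketch {
    /** True if some contiguous free chunk can hold the object. */
    static boolean canPromote(long[] freeChunkSizes, long objectSize) {
        for (long chunk : freeChunkSizes) {
            if (chunk >= objectSize) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] fragmented = {64, 128, 96, 64};           // 352 bytes free in total
        System.out.println(canPromote(fragmented, 256)); // false: no chunk fits
        System.out.println(canPromote(fragmented, 100)); // true: the 128 chunk fits
    }
}
```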

Incremental collection guarantee failed

After a memory allocation fails, the JVM checks whether the average amount promoted per Young GC, and the amount currently used in the Young area (i.e. the largest possible promotion), exceed the remaining space in the Old area. As long as the Old area's free space is larger than either of those two, CMS considers promotion still safe; otherwise promotion is unsafe, the Young GC is skipped, and a Full GC is triggered directly.
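Following that description literally, the guarantee can be sketched as below. The exact HotSpot check differs slightly between versions, so treat this as an illustration, not the real code:

```java
// Sketch of the incremental-collection guarantee described above.
// Promotion is considered safe if the Old area's free space covers either
// the average promoted size or the current Young-area usage (worst case).
public class PromotionGuarantee {
    static boolean promotionIsSafe(long oldFree, long avgPromoted, long youngUsed) {
        return oldFree >= avgPromoted || oldFree >= youngUsed;
    }
}
```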

Explicit GC

See scenario 2 for this situation.

Concurrent Mode Failure

This last case occurs with high probability; you can often see the keyword Concurrent Mode Failure in the GC log. It happens when a concurrent Background CMS GC is executing while objects promoted by a Young GC need to be placed into the Old area, and at that moment the Old area does not have enough space.

Why does a running CMS GC cause collector degradation? Mainly because CMS cannot handle Floating Garbage. During CMS's concurrent sweeping phase the Mutator is still running, so new garbage keeps being generated; this garbage is outside the scope of this cycle's marking and cannot be cleared in this GC. In addition, objects whose references are cut before Remark, escaping the barrier's control, are also Floating Garbage. Therefore the Old-area collection threshold cannot be set too high, or the reserved memory space may not be enough, resulting in Concurrent Mode Failure.

4.7.3 strategy

After analyzing the specific causes, we can solve them. The specific ideas are still based on the root causes, and the specific solutions are as follows:

  • Memory fragmentation:   use  -XX:+UseCMSCompactAtFullCollection  (enabled by default; note this applies to Full GC, not ordinary CMS GC) to control whether space compaction is performed during a Full GC, and  -XX:CMSFullGCsBeforeCompaction=n  to control after how many Full GCs a compaction is performed.

  • Incremental collection:   lower the CMS GC trigger threshold, i.e. the  -XX:CMSInitiatingOccupancyFraction  parameter, so that CMS GC executes earlier, ensuring enough contiguous space and reducing the Old area's usage. It must be combined with  -XX:+UseCMSInitiatingOccupancyOnly ; otherwise the JVM only uses the configured value for the first collection and adjusts it automatically afterwards.

  • Floating garbage:   control the amount promoted each time as appropriate, or shorten each CMS GC cycle, adjusting the value of NewRatio if necessary. Another option is  -XX:+CMSScavengeBeforeRemark , which triggers a Young GC before Remark to prevent too many objects from being promoted afterwards.

4.7.4 summary

Under normal circumstances, a CMS GC triggered in concurrent mode pauses very briefly and has little impact on the business. Once CMS GC degrades, however, the impact is very large, and it is recommended to cure the root cause after the first occurrence. As long as the specific cause, memory fragmentation, floating garbage or incremental-collection failure, can be located, it is still relatively easy to solve. For memory fragmentation, if the value of  -XX:CMSFullGCsBeforeCompaction  is hard to choose, use  -XX:PrintFLSStatistics  to observe the memory fragmentation rate and then set a concrete value.

Finally, when coding, avoid creating objects that need large contiguous address space, such as long strings and byte arrays used for attachments, serialization or deserialization, and avoid the premature promotion problem; try to head these off before they break out.

4.8 scenario 8: off-heap memory OOM

4.8.1 phenomena

The memory utilization rate keeps rising and the process may even start using SWAP; at the same time GC times may soar, threads may be blocked, and so on. Through the top command you find that the RES of the Java process even exceeds  -Xmx . When these phenomena appear, an off-heap memory leak can basically be confirmed.

4.8.2 causes

There are two main causes of JVM off-heap memory leaks:

  • Off-heap memory actively requested via   Unsafe#allocateMemory   or   ByteBuffer#allocateDirect   and never released; this is common in NIO, Netty and related components.
  • Memory requested by Native Code called through JNI and never released.

4.8.3 strategy

What causes the off-heap memory leak?

First, determine what kind of off-heap memory is leaking. Native Memory Tracking (NMT) can be used for the analysis: add the JVM parameter  -XX:NativeMemoryTracking=detail  and restart the project (note that enabling NMT causes a 5% ~ 10% performance loss). Then use the command   jcmd pid VM.native_memory detail   to view the memory distribution. Focus on the committed value in Total, because the memory shown by jcmd includes the heap, the Code area, and memory requested through   Unsafe.allocateMemory   and   DirectByteBuffer , but does not include off-heap memory requested by other Native Code (C code).

If the difference between committed in Total and RES in top is small, the leak is most likely actively requested off-heap memory that was never released; if the difference is large, it can basically be attributed to JNI calls.

Reason 1: active application not released

The JVM uses the  -XX:MaxDirectMemorySize=size  parameter to control the maximum amount of direct (off-heap) memory that can be requested. In Java 8, if this parameter is not configured, it defaults to the value of  -Xmx .

Both NIO and Netty read  -XX:MaxDirectMemorySize  to limit the amount of off-heap memory they request, and each keeps a counter field recording the currently requested off-heap memory size: in NIO it is   java.nio.Bits#totalCapacity , in Netty   io.netty.util.internal.PlatformDependent#DIRECT_MEMORY_COUNTER .

When requesting off-heap memory, NIO and Netty compare this counter with the maximum value; if the counter would exceed the limit, an OOM exception is thrown.

NIO: OutOfMemoryError: Direct buffer memory.

In Netty: OutOfDirectMemoryError: failed to allocate capacity byte(s) of direct memory (used: usedMemory, max: DIRECT_MEMORY_LIMIT).
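The check-then-throw accounting described above can be sketched as follows. This is a simplification of what NIO's Bits.reserveMemory and Netty's PlatformDependent do, not their actual code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified version of the accounting NIO/Netty perform before
// allocating direct memory: bump a counter, and roll back + throw
// if the limit (-XX:MaxDirectMemorySize) would be exceeded.
public class DirectMemoryCounter {
    private final long limit;
    private final AtomicLong used = new AtomicLong();

    DirectMemoryCounter(long limit) { this.limit = limit; }

    void reserve(long size) {
        long newUsed = used.addAndGet(size);
        if (newUsed > limit) {
            used.addAndGet(-size); // roll back before failing
            throw new OutOfMemoryError("Direct buffer memory");
        }
    }

    void release(long size) { used.addAndGet(-size); }

    long used() { return used.get(); }
}
```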

We can check how off-heap memory is used in the code. For NIO or Netty, obtain the counter field of the corresponding component through reflection and report its value as a metric in the project, giving accurate monitoring of this part of off-heap memory.
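If reflecting on JDK internals feels too fragile (it breaks under the module system of newer JDKs), the public BufferPoolMXBean API exposes the same direct-buffer counters and is a safer way to monitor this part of memory:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Monitor direct-buffer usage via the public BufferPoolMXBean API
// instead of reflecting on java.nio.Bits internals.
public class DirectPoolMonitor {
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // keep a reference
        System.out.println("direct memory used: " + directMemoryUsed() + " bytes");
    }
}
```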

At this point, you can use a debugger to determine whether the code that frees the memory is executed correctly wherever off-heap memory is used. In addition, check whether the JVM parameters contain the -XX:+DisableExplicitGC option; if so, remove it, because this parameter disables System.gc (see Scenario 2: removing or retaining explicit GC).

Reason 2: memory allocated by native code called via JNI is not released

This situation is more difficult to troubleshoot. We can use tools such as gperftools and BTrace to help analyze where the problematic code is.

Gperftools is a very practical tool set developed by Google. Its principle is to have the Java application use its libtcmalloc.so when calling malloc at runtime, so that memory allocations can be counted. As shown in the figure below, tracking memory-allocating calls with gperftools reveals that Java_java_util_zip_Inflater_init looks suspicious.

Next, BTrace can be used to locate the specific call stack. BTrace is a Java tracing and monitoring tool launched by Sun that can monitor a running Java program without downtime. As shown in the figure below, BTrace locates frequent calls in the project's ZipHelper to GZIPInputStream, which allocates off-heap memory.

The final conclusion: GZIPInputStream was used incorrectly in the project and was not properly closed with close().
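A minimal sketch of the fix, with illustrative class and method names: wrapping the stream in try-with-resources guarantees that close() runs, which calls Inflater.end() and frees the native memory allocated by Java_java_util_zip_Inflater_init.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipHelper {

    // Correct usage: try-with-resources closes the stream even on exceptions,
    // so the Inflater's native buffer is always released.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello off-heap".getBytes("UTF-8");
        byte[] roundTrip = gunzip(gzip(data));
        System.out.println(new String(roundTrip, "UTF-8"));
    }
}
```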

Besides problems in the project itself, leaks may also be caused by external dependencies such as Netty and Spring Boot. You can learn more from these two articles: "Troubleshooting and experience summary of the off-heap memory leak caused by Spring Boot" and "A feast of Netty off-heap memory leak troubleshooting".

4.8.4 summary

First, NMT + jcmd can be used to analyze where the leaked off-heap memory was requested. After determining the cause, different means can be used to locate the root cause.

4.9 scenario 9: GC problems caused by JNI

4.9.1 phenomena

In the GC log, the GC Cause is GCLocker Initiated GC.

2020-09-23T16:49:09.727+0800: 504426.742: [GC (GCLocker Initiated GC) 504426.742: [ParNew (promotion failed): 209716K->6042K(1887488K), 0.0843330 secs] 1449487K->1347626K(3984640K), 0.0848963 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
2020-09-23T16:49:09.812+0800: 504426.827: [Full GC (GCLocker Initiated GC) 504426.827: [CMS: 1341583K->419699K(2097152K), 1.8482275 secs] 1347626K->419699K(3984640K), [Metaspace: 297780K->297780K(1329152K)], 1.8490564 secs] [Times: user=1.62 sys=0.20, real=1.85 secs]

4.9.2 causes

JNI (Java Native Interface) stands for Java native invocation. It allows Java code to interact with native code written in other languages.

JNI can access strings or arrays in the JVM in two ways:

  • Copy passing.
  • Sharing a reference (pointer), for higher performance.

Because the native code directly uses a pointer into the JVM heap, a GC occurring at this moment would corrupt the data. Therefore, when such a JNI call occurs, GC is prohibited, and other threads are prevented from entering the JNI critical region, until the last thread exits the critical region.

GC Locker experiment:

public class GCLockerTest {

  static final int ITERS = 100;
  static final int ARR_SIZE =  10000;
  static final int WINDOW = 10000000;

  static native void acquire(int[] arr);
  static native void release(int[] arr);

  static final Object[] window = new Object[WINDOW];

  public static void main(String... args) throws Throwable {
    System.loadLibrary("GCLockerTest");
    int[] arr = new int[ARR_SIZE];

    for (int i = 0; i < ITERS; i++) {
      acquire(arr);
      System.out.println("Acquired");
      try {
        for (int c = 0; c < WINDOW; c++) {
          window[c] = new Object();
        }
      } catch (Throwable t) {
        // omit
      } finally {
        System.out.println("Releasing");
        release(arr);
      }
    }
  }
}
The corresponding native implementation (GCLockerTest.c):

#include <jni.h>
#include "GCLockerTest.h"

static jbyte* sink;

JNIEXPORT void JNICALL
Java_GCLockerTest_acquire(JNIEnv* env, jclass klass, jintArray arr) {
  sink = (*env)->GetPrimitiveArrayCritical(env, arr, 0);
}

JNIEXPORT void JNICALL
Java_GCLockerTest_release(JNIEnv* env, jclass klass, jintArray arr) {
  (*env)->ReleasePrimitiveArrayCritical(env, arr, sink, 0);
}

When running this JNI program, you can see that all GCs are GCLocker Initiated GC, and note that no GC can occur between "Acquired" and "Releasing".

Possible adverse consequences of GC Locker include:

  • If the GC is caused by an Allocation Failure in the Young generation, objects will be allocated directly in the Old generation because the Young GC cannot be performed.

  • If the Old generation has no space either, the allocating thread will wait for the lock to be released, causing it to block.

  • Additional, unnecessary Young GCs may be triggered. The JDK has a bug (JDK-8048556) where, with a certain probability, the Young GC that should have been triggered only once as a GCLocker Initiated GC shows up as an Allocation Failure GC immediately followed by a GCLocker Initiated GC. Because the GCLocker Initiated GC's attribute is set to full, the two GCs cannot be merged into one.

4.9.3 strategy

  • Add the -XX:+PrintJNIGCStalls parameter to print the thread involved when the JNI call occurs, and analyze further to find the problematic JNI call.

  • JNI calls need to be used with caution: they do not necessarily improve performance, and may instead cause GC problems.

  • Upgrade the JDK version to 14 to avoid the duplicate GC caused by JDK-8048556.

4.9.4 summary

GC problems caused by JNI are relatively difficult to troubleshoot, so JNI should be used with caution.

5. Summary

Here we summarize the whole article to facilitate an overall understanding and review.

5.1 process flow (SOP)

The following figure shows the general process for handling GC problems; the key points are noted separately below. The other steps are standard and will not be repeated here. Finally, after the whole problem has been handled, it is recommended to do a retrospective if conditions permit.

  • Set standards: This is actually very important, but most systems lack it. Fewer than 10% of the engineers the author has interviewed could state the GC standard for their own system; the rest used a unified metric template and lacked predictability. For concrete metrics, refer to Section 3.1: they should be set based on specific indicators such as the application's TP9999 time, latency, and throughput, rather than being driven by problems.

  • Preserve the scene: Online services today are basically distributed. If conditions permit, when a node has a problem, never recover it directly by restarting, rolling back, or similar actions; prefer to recover by removing it from traffic. This way we preserve the heap, the stack, the GC log, and other key information; otherwise we lose the chance to locate the root cause, and subsequent resolution becomes much harder. Of course, beyond these, application logs, middleware logs, kernel logs, and various metrics are also very helpful for analysis.

  • Causal analysis: To judge the causal relationship between GC anomalies and other system metric anomalies, refer to the four causal analysis methods introduced in Section 3.2: timing analysis, probability analysis, experimental analysis, and counter-evidence analysis, so as to avoid going down the wrong path during troubleshooting.

  • Root cause analysis: After confirming that it really is a GC problem, use the tools mentioned above together with 5-why root cause analysis to match the problem against each of the nine common scenarios, or refer directly to the root-cause fishbone diagram below to find the root cause, and finally choose the optimization method.

5.2 root cause fishbone diagram

Here is a fishbone diagram of the root causes. Generally, when handling a GC problem, if we can locate the "focus" of the problem and have a clear direction, 80% of the problem is effectively solved. In scenarios where the root cause is not easy to locate, this diagram can be used to narrow it down by elimination.

5.3 tuning suggestions

  • Trade-off: Like CAP, GC optimization needs to balance Latency, Throughput, and Capacity.

  • Last resort: When a GC problem occurs, it does not necessarily have to be solved by tuning the JVM's GC parameters; in most cases, the GC exposes some business problem. Remember not to rush to adjust GC parameters, except in scenarios with clear configuration errors.

  • Control variables: The control variates method is a variance-reduction technique used in Monte Carlo methods. We should borrow its idea during tuning: adjust only one variable at a time in each tuning iteration, as far as possible.

  • Make good use of search: In theory, 99.99% of GC problems have been encountered by someone before. We should learn advanced search engine skills, focusing on StackOverflow, GitHub Issues, and various forums and blogs, and first see how others solved the same problem; this yields twice the result with half the effort. Being able to find this article means your search ability is basically qualified~

  • Tuning focus: Generally speaking, the types of problems we encounter in development roughly follow a normal distribution: problems that are too simple or too complex are rarely met. The author has marked the three most important scenarios in the middle with "*"; after reading this article, check whether the systems you are responsible for have the problems described above.

  • GC parameters: If the heap and stack cannot be preserved at the first scene, the GC log must at least be preserved, so that we can see the GC Cause and have a general troubleshooting direction. Regarding GC-log-related parameters, the most basic ones such as -XX:+HeapDumpOnOutOfMemoryError will not be repeated here; the author suggests adding the following parameters to improve analysis efficiency.

  • Other suggestions: Some suggestions not covered by the scenarios above that can also improve GC performance.

    • Active GC: Another approach is to observe the usage of the Old generation through monitoring; when the threshold is about to be reached, remove the application instance from traffic and manually trigger a Major GC, to reduce the pauses caused by CMS GC. However, this reduces the system's robustness; do not introduce it unless necessary.

    • Disable biased locking: Biased locking is very efficient when only one thread uses a lock, but under heavy contention it is upgraded to a lightweight lock, and the biased lock must first be revoked; this revocation is STW. If every synchronization resource goes through this upgrade process, the overhead is very large. Therefore, when intense contention is known in advance, biased locking is generally disabled with -XX:-UseBiasedLocking to improve performance.

    • Virtual memory: At startup, some operating systems (such as Linux) do not actually allocate physical memory to the JVM but only virtual memory; physical pages are allocated only on first use, which can also lengthen GC pauses. In this case, the -XX:+AlwaysPreTouch parameter can be added to make the VM touch every page when committing memory, forcing the requested memory to be committed and avoiding page faults at runtime. In some large-memory scenarios this can sometimes reduce the first few GC pauses by an order of magnitude, but adding this parameter may slow down the startup process.
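The "Active GC" suggestion above can be sketched with the standard MemoryPoolMXBean API. The threshold and class name below are illustrative, and the traffic-removal step is only indicated in a comment:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class ActiveGcSketch {

    // Hypothetical threshold: trigger a manual Major GC when the Old
    // generation is more than 75% full.
    static final double OLD_GEN_THRESHOLD = 0.75;

    // Returns Old-generation usage as a fraction of its max, or 0.0 if
    // no matching pool is found. The pool name depends on the collector:
    // "CMS Old Gen", "PS Old Gen", "G1 Old Gen", "Tenured Gen", ...
    static double oldGenUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Old Gen") || name.contains("Tenured")) {
                MemoryUsage u = pool.getUsage();
                if (u != null && u.getMax() > 0) {
                    return (double) u.getUsed() / u.getMax();
                }
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        double usage = oldGenUsage();
        System.out.println("Old generation usage: " + usage);
        if (usage > OLD_GEN_THRESHOLD) {
            // In production: remove this node from traffic FIRST, then
            // trigger a Major GC. This requires that -XX:+DisableExplicitGC
            // is not set, otherwise System.gc() is a no-op.
            System.gc();
        }
    }
}
```

In practice the check would run in a scheduled monitoring task and coordinate with service discovery to pull the node out of rotation before calling System.gc(), as the article notes.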

6. Write at the end

Finally, some personal suggestions. When you encounter a GC problem, if you have the energy, always trace it to the source and find the deepest cause. In addition, in this era of information overload, some experience that is "taken as standard" may be wrong; try to form the habit of reading source code. As the saying goes, "there are no secrets in front of the source code", and in some scenarios reading the source works wonders. But learning cannot rely on source code alone: if you bite into the source code while ignoring the theory behind it, it is easy to "pick up the sesame and drop the watermelon" and "see the trees but not the forest", turning "no secrets" into empty words. We still need to learn in a targeted way, combined with actual business scenarios.

Where your time goes, your achievements follow. Over the past two years the author gradually went deeper into GC, troubleshooting problems, reading source code, and writing summaries, so that each case forms a small closed loop. He has now found some approaches to dealing with GC problems, and by applying the summarized experience to the production environment, a virtuous circle has slowly formed.

This article mainly introduced the analysis of common CMS GC scenarios. Other problems, such as JIT failures caused by CodeCache issues, long SafePoint waiting time, and long Card Table scanning time, are not very common, so not much time was spent on them. Java GC worked within the "generational" idea for many years before breaking through to "region-based" collection. Meituan has also begun to use G1 to replace the CMS that has been in use for many years; although G1 is slightly inferior to CMS on small heaps, this is the trend, and upgrading to ZGC is not possible in the short term, so G1 problems encountered in the future may gradually increase. We have already collected problems such as Remembered Set coarsening, Humongous allocation, Ergonomics anomalies, and evacuation failure in Mixed GC, and will also give some suggestions for upgrading from CMS to G1. The author will continue to work on that part of the article; please look forward to it.

"Fire prevention" is always better than "fire fighting": not letting any small anomalous metric slip by may prevent a failure (generally speaking, any curve that is not smooth is worth questioning). As Java programmers we will basically all encounter some GC problems, and solving them independently is a hurdle we must cross. As mentioned in the opening, GC is a classic technology well worth studying; GC learning materials such as The Garbage Collection Handbook and Understanding the Java Virtual Machine in Depth reward repeated reading. Hurry up and practice your GC fundamentals.

One last word. The first sentence of every article on GC tuning is "don't optimize prematurely", which makes many engineers shy away from GC optimization. Here the author offers a different view. The law of entropy increase (in an isolated system, if no external work is done, the total disorder, i.e. entropy, keeps increasing) also applies to computer systems: if we do not actively do work to reduce entropy, the system will eventually spin out of our control. When we understand the business system and GC principles deeply enough, we can optimize with confidence and boldness, because we can basically predict the result of each operation. Let's go!



Posted on Tue, 30 Nov 2021 20:03:53 -0500 by Hamlets666