A Bug that causes the JVM to consume a large amount of physical memory


Recently, while troubleshooting a JVM problem for a customer (JDK 1.8.0_191-b12), we found that a system was repeatedly being killed by the OS because of a memory leak. In the course of that investigation we stumbled on another JVM bug, one that can cause a large amount of physical memory to be consumed. We reported it to the community and received a quick response; the fix is expected in an upcoming OpenJDK 8 release (the problem also exists in JDK 11).

PS: the customer's original problem was eventually resolved as well. It was identified as a design flaw in C2 that can cause a large amount of memory to be used, with no guaranteed upper bound on the consumption.

Identifying the thread that consumes large amounts of memory

Next I will mainly share how the bug was discovered. First, we tracked the process on the customer's system in real time. When memory usage grew noticeably, /proc/<pid>/smaps showed many 64MB memory regions, with their Rss almost fully consumed:

7fd690000000-7fd693f23000 rw-p 00000000 00:00 0 
Size:              64652 kB
Rss:               64652 kB
Pss:               64652 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:     64652 kB
Referenced:        64652 kB
Anonymous:         64652 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me nr sd 
7fd693f23000-7fd694000000 ---p 00000000 00:00 0 
Size:                884 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: mr mw me nr sd 

We then traced system calls with the strace command and, matching against the virtual address above, found the relevant mmap system call:

[pid 71] 13:34:41.982589 mmap(0x7fd690000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fd690000000 <0.000107>

The mmap was issued by thread 71 (0x47 in hex). Dumping threads with jstack shows that the corresponding thread is actually C2 CompilerThread0:

"C2 CompilerThread0" #39 daemon prio=9 os_prio=0 tid=0x00007fd8acebb000 nid=0x47 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

Finally, grepping the strace output for this thread confirmed that it was indeed allocating a lot of memory, more than 2GB in total.

The classic 64MB problem

The 64MB problem is a classic one. There is no logic in the JVM itself that allocates many 64MB blocks, so allocation by the JVM in the strict sense can be ruled out. It is in fact the arena mechanism of glibc's malloc: since version 2.10, to make allocation in multithreaded programs more efficient, glibc maintains multiple arenas, and on 64-bit the default size of each arena is 64MB. Here is the calculation behind the 64MB figure, where sizeof(long) is 8:

#define DEFAULT_MMAP_THRESHOLD_MAX (4 * 1024 * 1024 * sizeof(long))
#define HEAP_MAX_SIZE (2 * DEFAULT_MMAP_THRESHOLD_MAX)  /* 2 * 32MB = 64MB on 64-bit */

p2 = (char *) MMAP (aligned_heap_area, HEAP_MAX_SIZE, PROT_NONE,
                    MAP_NORESERVE);

The maximum number of arenas a process can create is 8 × cores on 64-bit and 2 × cores on 32-bit:

#define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))

              int n = __get_nprocs ();

              if (n >= 1)
                narenas_limit = NARENAS_FROM_NCORES (n);
              else
                /* We have no information about the system.  Assume two
                   cores.  */
                narenas_limit = NARENAS_FROM_NCORES (2);
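As a quick sanity check of that formula, here is a small helper (my own code, mirroring the glibc macro above; the function names are invented) that computes the arena limit glibc would pick on the current machine:

```cpp
#include <sys/sysinfo.h>   // get_nprocs(), a GNU extension

// Mirror of glibc's NARENAS_FROM_NCORES: 2 arenas per core on 32-bit,
// 8 arenas per core on 64-bit.
static long narenas_from_ncores(long n) {
    return n * (sizeof(long) == 4 ? 2 : 8);
}

// The limit glibc would use on this machine; like glibc, assume two
// cores if the core count cannot be determined.
static long arena_limit() {
    int n = get_nprocs();
    return narenas_from_ncores(n >= 1 ? n : 2);
}
```

On a 64-bit 4-core machine this gives 32 arenas, i.e. up to 32 × 64MB = 2GB of reserved address space from glibc arenas alone, which is why a busy process can show so many 64MB regions.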

The main benefit of this mechanism is performance in multithreaded programs: with several 64MB arenas available per core, a thread can usually allocate without contending for a lock. Once the arena limit is reached, allocation falls back to the slower, shared main_arena. Note that this 64MB is reserved address space; physical memory is only consumed as pages are touched.

You can set the environment variable MALLOC_ARENA_MAX to limit the number of these 64MB arenas. When we set it to 1, the 64MB blocks disappeared and allocations instead went to one large region, the main_arena, confirming that the parameter took effect.
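The effect can also be observed from inside a process. The sketch below (my own demo, not from the original investigation; names like count_arena_heaps are invented, and the counting heuristic is approximate) spawns a few threads, each of whose first malloc makes glibc create a new arena, then counts the 64MB-aligned anonymous mappings in /proc/self/maps, since glibc aligns each mmap'ed arena heap to HEAP_MAX_SIZE:

```cpp
#include <pthread.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// glibc's HEAP_MAX_SIZE on 64-bit: each non-main arena reserves 64MB.
static const unsigned long kHeapSize = 64UL * 1024 * 1024;

// A fresh thread's first malloc() makes glibc create a new arena for it
// (until the arena limit is reached), backed by a 64MB reservation.
static void* worker(void*) {
    void* p = malloc(4096);
    memset(p, 1, 4096);   // touch the memory so the allocation is real
    free(p);
    return nullptr;
}

static void run_threads() {
    pthread_t t[4];
    for (auto& th : t) pthread_create(&th, nullptr, worker, nullptr);
    for (auto& th : t) pthread_join(th, nullptr);
}

// Approximate the number of non-main arenas by counting anonymous
// mappings whose base address is 64MB-aligned; the program image,
// libraries, and thread stacks almost never match this pattern.
static int count_arena_heaps() {
    FILE* f = fopen("/proc/self/maps", "r");
    if (!f) return -1;
    char line[512];
    unsigned long start, end;
    int count = 0;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            start % kHeapSize == 0 && end - start <= kHeapSize &&
            !strchr(line, '/'))   // anonymous mappings only
            count++;
    }
    fclose(f);
    return count;
}
```

Calling mallopt(M_ARENA_MAX, 1) at startup, before any threads run, is the in-process equivalent of MALLOC_ARENA_MAX=1: every thread then shares main_arena and no 64MB heaps appear. Compile with -pthread on Linux/glibc.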

Unintentional discovery

While puzzling over why the C2 thread consumed more than 2GB of memory, I happened to be tracing through the C2-related code and noticed that the following code could consume a large amount of memory. It lives in the nmethod::metadata_do method of nmethod.cpp. Note that when this bug does strike, it is not the C2 threads you will see allocating heavily but VMThread, because this code is mainly executed by the VMThread.

void nmethod::metadata_do(void f(Metadata*)) {
  address low_boundary = verified_entry_point();
  if (is_not_entrant()) {
    low_boundary += NativeJump::instruction_size;
    // %%% Note:  On SPARC we patch only a 4-byte trap, not a full NativeJump.
    // (See comment above.)
  }
  {
    // Visit all immediate references that are embedded in the instruction stream.
    RelocIterator iter(this, low_boundary);
    while (iter.next()) {
      if (iter.type() == relocInfo::metadata_type ) {
        metadata_Relocation* r = iter.metadata_reloc();
        // In this metadata, we must only follow those metadatas directly embedded in
        // the code.  Other metadatas (oop_index>0) are seen as part of
        // the metadata section below.
        assert(1 == (r->metadata_is_immediate()) +
               (r->metadata_addr() >= metadata_begin() && r->metadata_addr() < metadata_end()),
               "metadata must be found in exactly one place");
        if (r->metadata_is_immediate() && r->metadata_value() != NULL) {
          Metadata* md = r->metadata_value();
          if (md != _method) f(md);
        }
      } else if (iter.type() == relocInfo::virtual_call_type) {
        // Check compiledIC holders associated with this nmethod
        CompiledIC *ic = CompiledIC_at(&iter);
        if (ic->is_icholder_call()) {
          CompiledICHolder* cichk = ic->cached_icholder();
          // ...
        } else {
          Metadata* ic_oop = ic->cached_metadata();
          if (ic_oop != NULL) {
            // ...
          }
        }
      }
    }
  }
  // ...
}

inline CompiledIC* CompiledIC_at(RelocIterator* reloc_iter) {
  assert(reloc_iter->type() == relocInfo::virtual_call_type ||
      reloc_iter->type() == relocInfo::opt_virtual_call_type, "wrong reloc. info");
  CompiledIC* c_ic = new CompiledIC(reloc_iter);
  return c_ic;
}
Note the line CompiledIC *ic = CompiledIC_at(&iter); above. CompiledIC is a ResourceObj, and resources of this kind are allocated (via malloc) on the C heap but tracked through a thread-local resource area. If a ResourceMark has been declared somewhere up the call stack, the current allocation position is recorded when it executes. When the thread then allocates, a new chunk is malloc'ed and chained in only if the thread's existing chunks have no room; otherwise existing memory is reused. When the ResourceMark's destructor runs, it restores the earlier position, and subsequent allocations by the thread reuse the memory from that position on. Note that the memory chunks discussed here are not the same concept as the 64MB blocks above.
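To see concretely why a missing mark matters, here is a drastically simplified model of a resource area with mark/rollback semantics (my own illustration, with invented class and method names; real HotSpot chains fixed-size chunks rather than calling realloc, so previously returned pointers stay valid there):

```cpp
#include <cstddef>
#include <cstdlib>

// A toy model of HotSpot's thread resource area: a bump allocator whose
// total footprint only grows unless a mark rolls the position back.
class ToyResourceArea {
  static const size_t kChunk = 64 * 1024;
  char*  buf_;
  size_t pos_ = 0;
  size_t footprint_ = kChunk;
public:
  ToyResourceArea() : buf_(static_cast<char*>(malloc(kChunk))) {}
  ~ToyResourceArea() { free(buf_); }
  void* alloc(size_t n) {
    if (pos_ + n > footprint_) {            // no room left: grab more memory
      footprint_ += kChunk;                 // (model: grow the footprint)
      buf_ = static_cast<char*>(realloc(buf_, footprint_));
    }
    void* p = buf_ + pos_;
    pos_ += n;
    return p;
  }
  size_t position()  const { return pos_; }
  void   rollback(size_t p) { pos_ = p; }   // what a ResourceMark destructor does
  size_t footprint() const { return footprint_; }
};

// RAII mark, like HotSpot's ResourceMark: records the position on entry
// and restores it on destruction, so later allocations reuse the memory.
class ToyMark {
  ToyResourceArea& a_;
  size_t saved_;
public:
  explicit ToyMark(ToyResourceArea& a) : a_(a), saved_(a.position()) {}
  ~ToyMark() { a_.rollback(saved_); }
};
```

Without a mark inside a loop, the footprint grows with the iteration count; with a mark declared per iteration, it stays at one chunk.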

Because this call sits inside the while loop, it is executed many times over, and without an inner mark the memory from each iteration is never made available for reuse, so new memory keeps being allocated. The result can be physical memory consumption that is very large, far beyond -Xmx.

The fix is also very simple: add ResourceMark rm; immediately before CompiledIC *ic = CompiledIC_at(&iter); so each iteration's allocations are released for reuse.

The main trigger for this problem is frequent, large-scale Class Retransform or Class Redefine, so if your system runs an agent that does this, it is worth keeping an eye on.

After finding the problem, we proposed a patch to the community, only to discover that JDK 12 had already fixed it, while none of the earlier versions had been patched. Once the issue was submitted, someone responded quickly, and the fix may land in OpenJDK 8u212.

Finally, a brief note on the customer's own problem: the C2 threads' memory consumption was so high mainly because some very large methods needed to be compiled, and compiling them requires a great deal of memory, which is why memory rose so suddenly. Hence one suggestion: do not write overly large methods. If such a method is also invoked frequently, the result can be truly painful.

Tags: Java

Posted on Sat, 09 Oct 2021 03:13:58 -0400 by richierich