HBase source code analysis: MemStore flush processing (Part 2) 2021SC@SDUSC

Preface

This article continues the walkthrough of the main process and details of the MemStore flush on HRegion, focusing on how cacheFlusher handles flush requests.

How does cacheFlusher handle flush requests

From the discussion of how cacheFlusher is initialized, we know that MemStoreFlusher keeps two structures for flush requests: flushQueue, a queue of flush requests, and regionsInQueue, a map from each HRegion to the entry that wraps it. MemStoreFlusher provides a requestFlush() method. Let's take a general look at it:

  public void requestFlush(HRegion r) {
    // Lock on regionsInQueue so that concurrent callers cannot enqueue the same region twice
    synchronized (regionsInQueue) {
      // Only enqueue if this HRegion has no pending entry in regionsInQueue
      if (!regionsInQueue.containsKey(r)) {
        // Wrap the HRegion in a FlushRegionEntry. This entry has no delay, so it
        // will be added at the top of the flush queue and come out almost immediately.
        FlushRegionEntry fqe = new FlushRegionEntry(r);

        // Record the HRegion -> FlushRegionEntry mapping in regionsInQueue and add
        // the request to flushQueue; the two structures are always updated together
        this.regionsInQueue.put(r, fqe);
        this.flushQueue.add(fqe);
      }
    }
  }

requestFlush()

The main job of requestFlush() is to add a region flush request to MemStoreFlusher's internal queue. The logic is as follows:
1. Synchronize on regionsInQueue with the synchronized keyword, so that concurrent threads cannot modify it at the same time;

2. Check whether regionsInQueue already contains the given HRegion. If it does not, continue; otherwise return directly;

3. Since regionsInQueue has no entry for this HRegion, wrap r (of type HRegion) in fqe (of type FlushRegionEntry);

4. Add the HRegion -> FlushRegionEntry mapping to regionsInQueue;

5. Add the flush request (the FlushRegionEntry) to flushQueue.

Steps 4 and 5 show that the two member variables, regionsInQueue and flushQueue, are always updated together. A newly created fqe has no delay: its expiration time equals its creation time, so it goes to the top of the flush queue and is dequeued and processed almost immediately. Recall the definition of flushQueue: it is the queue that stores region flush requests, and its elements implement the FlushQueueEntry interface. FlushQueueEntry declares no behavior of its own but extends the java.util.concurrent.Delayed interface. flushQueue is therefore a Java DelayQueue, and every object stored in it has an expiration time.

Adding a flush request to flushQueue is the producer side. A consumer is also needed, and that role is played by the FlushHandler thread. Since it is a thread, its processing logic lives in its run() method. But first, what exactly is stored in flushQueue?

As noted, flushQueue is a DelayQueue that stores FlushQueueEntry objects. Let's first look at the definition of FlushQueueEntry:

interface FlushQueueEntry extends Delayed {
  }

It is an empty interface with no methods of its own that extends Java's Delayed interface. Its implementation classes are WakeupFlushThread and FlushRegionEntry. First, let's look at the queue type behind flushQueue: Java's DelayQueue.

DelayQueue is an unbounded BlockingQueue whose elements must implement the Delayed interface, which is why FlushQueueEntry extends Delayed. The defining characteristic of this queue is that an element can be dequeued only after it has expired, and the elements are ordered from head to tail by expiration time. How is expiration determined? When an element's getDelay() method returns a value less than or equal to 0, the element has expired and can be taken from the queue.

Because the elements of a DelayQueue are ordered, a class that implements Delayed must provide a compareTo() method for sorting, and it must implement getDelay() so the queue can tell whether an element has expired and can be taken.
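The DelayQueue behavior described above can be seen in a small standalone sketch. TimedItem and DelayQueueDemo are hypothetical names invented for this illustration and are not part of HBase; only DelayQueue, Delayed and TimeUnit come from the JDK.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical element type for illustration; not part of HBase.
class TimedItem implements Delayed {
  final String name;
  private final long expiresAtMillis;

  TimedItem(String name, long delayMillis) {
    this.name = name;
    this.expiresAtMillis = System.currentTimeMillis() + delayMillis;
  }

  @Override
  public long getDelay(TimeUnit unit) {
    // Remaining time until expiry; zero or negative means "expired".
    return unit.convert(expiresAtMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
  }

  @Override
  public int compareTo(Delayed other) {
    // Order by remaining delay so the earliest expiry sits at the head.
    return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
  }
}

class DelayQueueDemo {
  // Returns what three successive reads from the queue observe.
  static String drainOrder() {
    DelayQueue<TimedItem> queue = new DelayQueue<>();
    queue.add(new TimedItem("late", 500)); // expires in 500 ms
    queue.add(new TimedItem("now", 0));    // already expired

    StringBuilder order = new StringBuilder();
    // Non-blocking poll(): returns the head only if it has expired.
    TimedItem first = queue.poll();
    order.append(first == null ? "null" : first.name);
    // "late" is still delayed, so a second non-blocking poll sees nothing.
    TimedItem second = queue.poll();
    order.append(',').append(second == null ? "null" : second.name);
    try {
      // take() blocks until the head expires.
      order.append(',').append(queue.take().name);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return order.toString();
  }

  public static void main(String[] args) {
    System.out.println(drainOrder());
  }
}
```

Although "late" is added first, "now" is returned first because it has already expired, and "late" only becomes visible once its delay has run out.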

Let's start to study WakeupFlushThread and FlushRegionEntry.

WakeupFlushThread

WakeupFlushThread is very simple and carries no data of its own. The code is as follows:
static class WakeupFlushThread implements FlushQueueEntry {

    @Override
    public long getDelay(TimeUnit unit) {
      return 0;
    }

    @Override
    public int compareTo(Delayed o) {
      return -1;
    }

    @Override
    public boolean equals(Object obj) {
      return (this == obj);
    }
  }

Its main function is to be inserted into the flush queue as a placeholder, or token, so that the FlushHandler does not sleep. Its getDelay() method returns 0, meaning it has no delay: as soon as it is enqueued it can be dequeued. Its compareTo() method always returns -1, so it reports itself as ordering before whatever entry it is compared with. No ordering among WakeupFlushThread tokens themselves is needed, since a token carries no content of its own.
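To make the token's effect concrete, here is a minimal sketch, assuming a queue that otherwise holds only an entry that is far from expiring. WakeupToken, FarFutureEntry and WakeupTokenDemo are hypothetical names for this illustration; WakeupToken simply copies the getDelay() and compareTo() behavior shown above.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical token modeled on WakeupFlushThread: no delay, always sorts first.
class WakeupToken implements Delayed {
  @Override
  public long getDelay(TimeUnit unit) {
    return 0;
  }

  @Override
  public int compareTo(Delayed o) {
    return -1;
  }
}

// Hypothetical entry that will not expire for a minute.
class FarFutureEntry implements Delayed {
  private final long expiresAt = System.currentTimeMillis() + 60_000;

  @Override
  public long getDelay(TimeUnit unit) {
    return unit.convert(expiresAt - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
  }

  @Override
  public int compareTo(Delayed o) {
    return Long.compare(getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS));
  }
}

class WakeupTokenDemo {
  static String run() {
    DelayQueue<Delayed> queue = new DelayQueue<>();
    queue.add(new FarFutureEntry());
    // The only element has not expired, so a consumer would get nothing.
    Delayed before = queue.poll();
    // The token moves to the head and is available at once.
    queue.add(new WakeupToken());
    Delayed after = queue.poll();
    return (before == null) + "," + (after instanceof WakeupToken);
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

A consumer polling this queue is handed the token immediately, even though the other entry is still delayed, which is how the token keeps the handler from sleeping.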

FlushRegionEntry

Next, let's take a look at the FlushRegionEntry class, which is defined as follows:
static class FlushRegionEntry implements FlushQueueEntry {
    
    // The HRegion to be flushed
	private final HRegion region;
 
    // Creation time 
    private final long createTime;
    
    // When will it expire
    private long whenToExpire;
    
    // Number of re-entry queues
    private int requeueCount = 0;
 
    FlushRegionEntry(final HRegion r) {
      
      // The HRegion to be flushed
      this.region = r;
      
      // The creation time is the current time
      this.createTime = EnvironmentEdgeManager.currentTime();
      
      // The expiration time is also the current time: there is no delay on first entry, so the entry can be dequeued immediately
      this.whenToExpire = this.createTime;
    }
 
    /**
     * @param maximumWait
     * @return True if we have been delayed > <code>maximumWait</code> milliseconds.
     */
    public boolean isMaximumWait(final long maximumWait) {
      return (EnvironmentEdgeManager.currentTime() - this.createTime) > maximumWait;
    }
 
    /**
     * @return Count of times {@link #requeue(long)} was called; i.e this is
     * number of times we've been requeued.
     */
    public int getRequeueCount() {
      return this.requeueCount;
    }
 
    /**
     * Requeue handling: increments requeueCount by 1 and sets the expiration time to the current time plus the parameter when.
     * 
     * @param when When to expire, when to come up out of the queue.
     * Specify in milliseconds.  This method adds EnvironmentEdgeManager.currentTime()
     * to whatever you pass.
     * @return This.
     */
    public FlushRegionEntry requeue(final long when) {
      this.whenToExpire = EnvironmentEdgeManager.currentTime() + when;
      this.requeueCount++;
      return this;
    }
 
    /**
     * Returns the time remaining until this entry expires
     */
    @Override
    public long getDelay(TimeUnit unit) {
      // Expiration time minus the current time
      return unit.convert(this.whenToExpire - EnvironmentEdgeManager.currentTime(),
          TimeUnit.MILLISECONDS);
    }
 
    /**
     * The sorting comparison method determines the order according to the getDelay() method that determines when to expire
     */
    @Override
    public int compareTo(Delayed other) {
      // Delay is compared first. If there is a tie, compare region's hash code
      int ret = Long.valueOf(getDelay(TimeUnit.MILLISECONDS) -
        other.getDelay(TimeUnit.MILLISECONDS)).intValue();
      if (ret != 0) {
        return ret;
      }
      
      // When the delays are equal, order by hashCode(), which is derived from the HRegion's hashCode()
      FlushQueueEntry otherEntry = (FlushQueueEntry) other;
      return hashCode() - otherEntry.hashCode();
    }
 
    @Override
    public String toString() {
      return "[flush region " + Bytes.toStringBinary(region.getRegionName()) + "]";
    }
 
    @Override
    public int hashCode() {
      int hash = (int) getDelay(TimeUnit.MILLISECONDS);
      return hash ^ region.hashCode();
    }
 
   @Override
    public boolean equals(Object obj) {
      if (this == obj) {
        return true;
      }
      if (obj == null || getClass() != obj.getClass()) {
        return false;
      }
      Delayed other = (Delayed) obj;
      return compareTo(other) == 0;
    }
  }
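The effect of requeue() on the queue can be sketched as follows. This is a simplified stand-in and not the HBase class: SketchFlushEntry and RequeueDemo are hypothetical names, a plain region name replaces HRegion, System.currentTimeMillis() replaces EnvironmentEdgeManager.currentTime(), and Long.compare is swapped in for the subtraction used in the original compareTo() to avoid narrowing the difference to an int.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified entry; not the HBase FlushRegionEntry.
class SketchFlushEntry implements Delayed {
  final String regionName;
  private long whenToExpire;
  private int requeueCount = 0;

  SketchFlushEntry(String regionName) {
    this.regionName = regionName;
    // No delay on first enqueue: expiry equals creation time.
    this.whenToExpire = System.currentTimeMillis();
  }

  // Pushes the expiry into the future and counts the requeue.
  SketchFlushEntry requeue(long whenMillis) {
    this.whenToExpire = System.currentTimeMillis() + whenMillis;
    this.requeueCount++;
    return this;
  }

  int getRequeueCount() {
    return requeueCount;
  }

  @Override
  public long getDelay(TimeUnit unit) {
    return unit.convert(whenToExpire - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
  }

  @Override
  public int compareTo(Delayed other) {
    return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
  }
}

class RequeueDemo {
  static String run() {
    DelayQueue<SketchFlushEntry> queue = new DelayQueue<>();
    queue.add(new SketchFlushEntry("region-a"));
    // A fresh entry has no delay, so it is available immediately.
    SketchFlushEntry first = queue.poll();
    // Put it back with a 60 second delay (mutated only while outside the queue).
    queue.add(first.requeue(60_000));
    // The entry has not expired yet, so nothing is returned.
    SketchFlushEntry second = queue.poll();
    return first.regionName + "," + first.getRequeueCount() + "," + (second == null);
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

The entry is modified only after it has been removed from the queue and before it is added back, so the queue's internal ordering is never invalidated by a changing delay.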

Next, let's look at the actual processing flow of a flush request, that is, the run() method of the FlushHandler.
Its main processing logic is:
1. As long as the HRegionServer has not stopped, the run() method keeps looping;

2. Set the flag wakeupPending (of type AtomicBoolean) to false;

3. Pull a FlushQueueEntry, fqe, from flushQueue:

3.1. If fqe is null or is a WakeupFlushThread:

3.1.1. If isAboveLowWaterMark() reports that the global MemStore size is above the low water mark of the limit, call flushOneForGlobalPressure(). This picks one HRegion according to certain policies and flushes its MemStore, which reduces the global MemStore size and guards against abnormal conditions such as OOM. Then enqueue another token so that the thread wakes up again;

3.2. If fqe is not null and is not a WakeupFlushThread, cast it to fre of type FlushRegionEntry and call the flushRegion() method. If the result is false, break out of the loop;

4. When the loop ends, clear regionsInQueue and flushQueue together (the two go together once again);

5. Wake up all waiting threads so that they can see the close flag;

6. Log the exit.

WakeupFlushThread mainly serves as a placeholder, or token, inserted into flushQueue so that the FlushHandler does not sleep. In fact it does more than that. While the FlushHandler thread keeps polling elements from flushQueue, obtaining a WakeupFlushThread triggers a check of whether the global memstore size of the RegionServer exceeds the low water mark. If it does not, the WakeupFlushThread is only a placeholder. If it does, the token has a second effect beyond keeping the flush thread awake: a region on the RegionServer is selected according to certain policies and its memstore is flushed to relieve the RegionServer's memory pressure.
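The producer and consumer sides described above can be put together in a compact sketch. This is not the HBase implementation: FlushLoopSketch and everything inside it are hypothetical stand-ins, regions are plain strings, a null region marks a wakeup token, the aboveLowWaterMark flag replaces isAboveLowWaterMark(), and "flushing" only records a name in a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified stand-in for the flusher and its handler loop.
class FlushLoopSketch {
  // Every entry is immediately available here; a null region marks a wakeup token.
  static class Entry implements Delayed {
    final String region;
    Entry(String region) { this.region = region; }
    @Override public long getDelay(TimeUnit unit) { return 0; }
    @Override public int compareTo(Delayed o) {
      // Tokens sort ahead of region entries; otherwise entries tie.
      return Boolean.compare(((Entry) o).region == null, region == null);
    }
  }

  final DelayQueue<Entry> flushQueue = new DelayQueue<>();
  final Map<String, Entry> regionsInQueue = new HashMap<>();
  final AtomicBoolean wakeupPending = new AtomicBoolean(false);
  final List<String> flushed = Collections.synchronizedList(new ArrayList<>());
  volatile boolean stopped = false;
  volatile boolean aboveLowWaterMark = false; // stand-in for isAboveLowWaterMark()

  // Producer: enqueue a region at most once, updating both structures together.
  void requestFlush(String region) {
    synchronized (regionsInQueue) {
      if (!regionsInQueue.containsKey(region)) {
        Entry e = new Entry(region);
        regionsInQueue.put(region, e);
        flushQueue.add(e);
      }
    }
  }

  // Enqueue at most one token until the handler resets the flag.
  void wakeupFlushThread() {
    if (wakeupPending.compareAndSet(false, true)) {
      flushQueue.add(new Entry(null));
    }
  }

  // Consumer: the loop described in steps 1 to 4.
  void run() {
    while (!stopped) {
      try {
        wakeupPending.set(false);
        Entry e = flushQueue.poll(50, TimeUnit.MILLISECONDS);
        if (e == null || e.region == null) {
          if (aboveLowWaterMark) {
            flushed.add("<picked-under-pressure>"); // pretend one region was flushed
            aboveLowWaterMark = false;              // pretend the pressure is relieved
            wakeupFlushThread();                    // enqueue another token
          }
          continue;
        }
        synchronized (regionsInQueue) {
          regionsInQueue.remove(e.region);
        }
        flushed.add(e.region);
      } catch (InterruptedException ex) {
        continue;
      }
    }
    synchronized (regionsInQueue) {
      regionsInQueue.clear();
      flushQueue.clear();
    }
  }

  static List<String> demo() {
    FlushLoopSketch f = new FlushLoopSketch();
    f.requestFlush("r1");
    f.requestFlush("r1"); // duplicate request, ignored
    f.requestFlush("r2");
    f.aboveLowWaterMark = true;
    f.wakeupFlushThread();
    Thread handler = new Thread(f::run);
    handler.start();
    try {
      long deadline = System.currentTimeMillis() + 2000;
      while (f.flushed.size() < 3 && System.currentTimeMillis() < deadline) {
        Thread.sleep(10);
      }
      f.stopped = true;
      handler.join();
    } catch (InterruptedException ex) {
      Thread.currentThread().interrupt();
    }
    return new ArrayList<>(f.flushed);
  }
}
```

In this sketch the duplicate request for "r1" is dropped by the regionsInQueue check, and the token is consumed before the region entries, so the pressure check runs first.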

Summary

This article covered the main process and details of the MemStore flush on HRegion and described how cacheFlusher handles flush requests. How an HRegion is selected for flushing to relieve MemStore pressure, along with the remaining questions, will be covered in the next article.

Tags: Database HBase

Posted on Fri, 03 Dec 2021 06:12:15 -0500 by JD^