2021SC@SDUSC
This article looks at how an HRegion is selected for flushing to relieve MemStore pressure, and at how the flush of an HRegion is initiated.
As mentioned in the previous article, the flush handler thread may call the flushOneForGlobalPressure() method while it runs. Following a fixed policy, this method picks one HRegion and flushes its MemStore to bring the total MemStore size down before it causes problems. This time, let's focus on flushOneForGlobalPressure():

    /**
     * The memstore across all regions has exceeded the low water mark. Pick
     * one region to flush and flush it synchronously (this is called from the
     * flush thread)
     *
     * @return true if successful
     */
    private boolean flushOneForGlobalPressure() {
      SortedMap<Long, HRegion> regionsBySize =
          server.getCopyOfOnlineRegionsSortedBySize();

      Set<HRegion> excludedRegions = new HashSet<HRegion>();

      boolean flushedOne = false;
      while (!flushedOne) {
        // Find the biggest region that doesn't have too many storefiles
        // (might be null!)
        HRegion bestFlushableRegion = getBiggestMemstoreRegion(
            regionsBySize, excludedRegions, true);
        // Find the biggest region, total, even if it might have too many flushes.
        HRegion bestAnyRegion = getBiggestMemstoreRegion(
            regionsBySize, excludedRegions, false);

        if (bestAnyRegion == null) {
          LOG.error("Above memory mark but there are no flushable regions!");
          return false;
        }

        HRegion regionToFlush;
        if (bestFlushableRegion != null &&
            bestAnyRegion.memstoreSize.get() > 2 * bestFlushableRegion.memstoreSize.get()) {
          // Even if it's not supposed to be flushed, pick a region if it's more than twice
          // as big as the best flushable one - otherwise when we're under pressure we make
          // lots of little flushes and cause lots of compactions, etc, which just makes
          // life worse!
          if (LOG.isDebugEnabled()) {
            LOG.debug("Under global heap pressure: " +
              "Region " + bestAnyRegion.getRegionNameAsString() + " has too many " +
              "store files, but is " +
              StringUtils.humanReadableInt(bestAnyRegion.memstoreSize.get()) +
              " vs best flushable region's " +
              StringUtils.humanReadableInt(bestFlushableRegion.memstoreSize.get()) +
              ". Choosing the bigger.");
          }
          regionToFlush = bestAnyRegion;
        } else {
          if (bestFlushableRegion == null) {
            regionToFlush = bestAnyRegion;
          } else {
            regionToFlush = bestFlushableRegion;
          }
        }

        Preconditions.checkState(regionToFlush.memstoreSize.get() > 0);

        LOG.info("Flush of region " + regionToFlush + " due to global heap pressure");
        flushedOne = flushRegion(regionToFlush, true);
        if (!flushedOne) {
          LOG.info("Excluding unflushable region " + regionToFlush +
            " - trying to find a different region to flush.");
          excludedRegions.add(regionToFlush);
        }
      }
      return true;
    }

The process of this method is as follows:
- Get the online regions on the RegionServer, sorted in descending order of MemStore size, as regionsBySize.
- Create the set of excluded regions, excludedRegions.
- Set flushedOne to false.
- Loop until one flush succeeds. In each iteration, walk regionsBySize and pick the region with the largest MemStore that does not have too many store files as bestFlushableRegion. A region is skipped if it is in excludedRegions, if its write state is flushing, if its writes are not enabled, or if the store file count is being checked and the region has too many store files. The first region that passes these checks is returned.
- Walk regionsBySize again and pick the region with the largest MemStore, even if it has too many store files, as bestAnyRegion. A region is skipped if it is in excludedRegions, if its write state is flushing, or if its writes are not enabled. The first region that passes these checks is returned.
- If bestAnyRegion is null, memory is above the threshold but no region can be flushed, so log an error and return false.
- Select the region to flush: if bestFlushableRegion is not null and the MemStore of bestAnyRegion is more than twice the size of the MemStore of bestFlushableRegion, choose bestAnyRegion; otherwise choose bestFlushableRegion, falling back to bestAnyRegion when bestFlushableRegion is null.
- Check that the memstoreSize of the selected region is greater than zero.
- Call flushRegion(regionToFlush, true) to flush the MemStore of that single region.
- If the flush fails, add the region to excludedRegions so that it is not selected again, and repeat the loop.
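The selection rule in the steps above can be sketched in isolation. This is only a minimal illustration, not the HBase implementation: RegionStub and FlushSelectionSketch are hypothetical stand-ins that carry just the fields the rule needs, and the skip conditions follow the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for HRegion: only what the selection rule looks at.
class RegionStub {
    final String name;
    final long memstoreSize;
    final boolean flushing;
    final boolean writesEnabled;
    final boolean tooManyStoreFiles;

    RegionStub(String name, long memstoreSize, boolean flushing,
               boolean writesEnabled, boolean tooManyStoreFiles) {
        this.name = name;
        this.memstoreSize = memstoreSize;
        this.flushing = flushing;
        this.writesEnabled = writesEnabled;
        this.tooManyStoreFiles = tooManyStoreFiles;
    }
}

class FlushSelectionSketch {

    // Returns the first eligible region of a list sorted by descending
    // memstore size, or null if every region is skipped.
    static RegionStub biggest(List<RegionStub> sortedDesc, Set<RegionStub> excluded,
                              boolean checkStoreFileCount) {
        for (RegionStub r : sortedDesc) {
            if (excluded.contains(r)) continue;
            if (r.flushing || !r.writesEnabled) continue;
            if (checkStoreFileCount && r.tooManyStoreFiles) continue;
            return r;
        }
        return null;
    }

    // Applies the "more than twice as big" rule; returns null when nothing can be flushed.
    static RegionStub pick(List<RegionStub> regions, Set<RegionStub> excluded) {
        List<RegionStub> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Long.compare(b.memstoreSize, a.memstoreSize));

        RegionStub bestFlushable = biggest(sorted, excluded, true);
        RegionStub bestAny = biggest(sorted, excluded, false);
        if (bestAny == null) return null;
        if (bestFlushable == null) return bestAny;
        return bestAny.memstoreSize > 2 * bestFlushable.memstoreSize ? bestAny : bestFlushable;
    }
}
```

With a 500-byte region that has too many store files and a 200-byte flushable region, pick() returns the larger one because 500 > 2 * 200. If the larger region held only 300 bytes, the flushable region would be chosen, which avoids blocking on a compaction for a small gain.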
That covers flushOneForGlobalPressure(), the method that selects one HRegion according to a fixed strategy and flushes its MemStore to relieve MemStore pressure. Next comes the question of how the flush of an HRegion is initiated. First, take a look at the flushRegion() method with one parameter:

    /*
     * A flushRegion that checks store file count. If too many, puts the flush
     * on delay queue to retry later.
     *
     * @param fqe
     * @return true if the region was successfully flushed, false otherwise. If
     * false, there will be accompanying log messages explaining why the region was
     * not flushed.
     */
    private boolean flushRegion(final FlushRegionEntry fqe) {
      HRegion region = fqe.region;
      if (!region.getRegionInfo().isMetaRegion() &&
          isTooManyStoreFiles(region)) {
        if (fqe.isMaximumWait(this.blockingWaitTime)) {
          LOG.info("Waited " + (EnvironmentEdgeManager.currentTime() - fqe.createTime) +
            "ms on a compaction to clean up 'too many store files'; waited " +
            "long enough... proceeding with flush of " +
            region.getRegionNameAsString());
        } else {
          // If this is first time we've been put off, then emit a log message.
          if (fqe.getRequeueCount() <= 0) {
            // Note: We don't impose blockingStoreFiles constraint on meta regions
            LOG.warn("Region " + region.getRegionNameAsString() + " has too many " +
              "store files; delaying flush up to " + this.blockingWaitTime + "ms");
            if (!this.server.compactSplitThread.requestSplit(region)) {
              try {
                this.server.compactSplitThread.requestSystemCompaction(
                    region, Thread.currentThread().getName());
              } catch (IOException e) {
                LOG.error(
                  "Cache flush failed for region " + Bytes.toStringBinary(region.getRegionName()),
                  RemoteExceptionHandler.checkIOException(e));
              }
            }
          }

          // Put back on the queue. Have it come back out of the queue
          // after a delay of this.blockingWaitTime / 100 ms.
          this.flushQueue.add(fqe.requeue(this.blockingWaitTime / 100));
          // Tell a lie, it's not flushed but it's ok
          return true;
        }
      }
      return flushRegion(region, false);
    }

The method flow is as follows:
- If the region is not a meta region and it has too many store files:
  - isMaximumWait() checks how long the flush has been blocked. If the wait has exceeded blockingWaitTime, log a message and go straight to the final step to execute the flush.
  - Otherwise, if this is the first time the flush has been delayed, log a warning and request a split for the HRegion. If the split request is not accepted, request a system compaction instead.
  - Put fqe back on flushQueue with a delay of blockingWaitTime / 100 ms (900 ms when blockingWaitTime is 90000 ms), so that it comes out of the queue again once the delay has expired.
  - Return true even though the region has not been flushed, because the flush has only been postponed and its outcome is not yet known.
- Call the flushRegion() method with two parameters, passing false for emergencyFlush, to have the HRegion execute the flush.
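The requeue step relies on a delay queue: an entry put back with a delay cannot be polled until that delay has expired. The following is a minimal sketch of that mechanism using java.util.concurrent.DelayQueue. FlushEntrySketch is a hypothetical, simplified stand-in for FlushRegionEntry, and its field names are assumptions made for illustration.

```java
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for FlushRegionEntry.
class FlushEntrySketch implements Delayed {
    final String regionName;
    final long createTime;          // when the flush was first requested
    private long whenToExpire;      // earliest time the entry may be polled
    private int requeueCount = 0;   // how many times the flush was put off

    FlushEntrySketch(String regionName) {
        this.regionName = regionName;
        this.createTime = System.currentTimeMillis();
        this.whenToExpire = this.createTime;
    }

    // True once the entry has waited longer than maximumWaitMs in total.
    boolean isMaximumWait(long maximumWaitMs) {
        return System.currentTimeMillis() - createTime > maximumWaitMs;
    }

    int getRequeueCount() {
        return requeueCount;
    }

    // Pushes the expiry delayMs into the future and counts the requeue.
    FlushEntrySketch requeue(long delayMs) {
        this.whenToExpire = System.currentTimeMillis() + delayMs;
        this.requeueCount++;
        return this;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(whenToExpire - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                            other.getDelay(TimeUnit.MILLISECONDS));
    }
}
```

Adding `entry.requeue(blockingWaitTime / 100)` to a `DelayQueue<FlushEntrySketch>` and polling immediately returns null; the entry only becomes visible to poll() after the delay, which is what gives a pending compaction time to reduce the store file count.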
Next is the flushRegion() method with two parameters:

    /*
     * Flush a region.
     * @param region Region to flush.
     * @param emergencyFlush Set if we are being force flushed. If true the region
     * needs to be removed from the flush queue. If false, when we were called
     * from the main flusher run loop and we got the entry to flush by calling
     * poll on the flush queue (which removed it).
     *
     * @return true if the region was successfully flushed, false otherwise. If
     * false, there will be accompanying log messages explaining why the region was
     * not flushed.
     */
    private boolean flushRegion(final HRegion region, final boolean emergencyFlush) {
      long startTime = 0;
      synchronized (this.regionsInQueue) {
        FlushRegionEntry fqe = this.regionsInQueue.remove(region);
        // Use the start time of the FlushRegionEntry if available
        if (fqe != null) {
          startTime = fqe.createTime;
        }
        if (fqe != null && emergencyFlush) {
          // Need to remove from region from delay queue. When NOT an
          // emergencyFlush, then item was removed via a flushQueue.poll.
          flushQueue.remove(fqe);
        }
      }
      if (startTime == 0) {
        // Avoid getting the system time unless we don't have a FlushRegionEntry;
        // shame we can't capture the time also spent in the above synchronized
        // block
        startTime = EnvironmentEdgeManager.currentTime();
      }
      lock.readLock().lock();
      try {
        notifyFlushRequest(region, emergencyFlush);
        HRegion.FlushResult flushResult = region.flushcache();
        boolean shouldCompact = flushResult.isCompactionNeeded();
        // We just want to check the size
        boolean shouldSplit = region.checkSplit() != null;
        if (shouldSplit) {
          this.server.compactSplitThread.requestSplit(region);
        } else if (shouldCompact) {
          server.compactSplitThread.requestSystemCompaction(
              region, Thread.currentThread().getName());
        }
        if (flushResult.isFlushSucceeded()) {
          long endTime = EnvironmentEdgeManager.currentTime();
          server.metricsRegionServer.updateFlushTime(endTime - startTime);
        }
      } catch (DroppedSnapshotException ex) {
        // Cache flush can fail in a few places. If it fails in a critical
        // section, we get a DroppedSnapshotException and a replay of wal
        // is required. Currently the only way to do this is a restart of
        // the server. Abort because hdfs is probably bad (HBASE-644 is a case
        // where hdfs was bad but passed the hdfs check).
        server.abort("Replay of WAL required. Forcing server shutdown", ex);
        return false;
      } catch (IOException ex) {
        LOG.error("Cache flush failed" + (region != null ? (" for region " +
          Bytes.toStringBinary(region.getRegionName())) : ""),
          RemoteExceptionHandler.checkIOException(ex));
        if (!server.checkFileSystem()) {
          return false;
        }
      } finally {
        lock.readLock().unlock();
        wakeUpIfBlocking();
      }
      return true;
    }

The method flow is as follows:
- Remove the entry for the HRegion from regionsInQueue.
- If an entry was found, use its createTime as the start time of the flush.
- If this is an emergency flush, the corresponding fqe must also be removed from flushQueue. Otherwise, the fqe has already been removed by flushQueue.poll().
- If the start time is still 0 because no entry was found, use the current time as the start time of the flush.
- Acquire the read lock.
- Notify the registered listeners of the flush request and its type.
- Call the flushcache() method of the HRegion to flush the MemStore and obtain the flush result.
- Decide from the flush result whether a compaction is needed (flag shouldCompact).
- Call the checkSplit() method of the HRegion to check whether a split should be performed (flag shouldSplit).
- Act on the two flags: if a split is needed, request a split; otherwise, if a compaction is needed, request a system compaction.
- If the flush succeeded, get the end time, compute the elapsed time, and record it in the RegionServer metrics.
- If a DroppedSnapshotException is thrown, abort the server because a WAL replay is required, and return false. If another IOException is thrown, log the error and return false if the file system check fails.
- Finally, release the read lock and wake up other blocked threads.
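Two details of the steps above are easy to miss: a split request takes precedence over a compaction request, and the read lock is released in a finally block so that it is given back whatever happens during the flush. A minimal sketch of both, where PostFlushSketch and its members are hypothetical names and the flush itself is replaced by a comment:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class PostFlushSketch {
    enum FollowUp { SPLIT, COMPACT, NONE }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Split wins over compaction; compaction is requested only when no split is needed.
    static FollowUp decide(boolean shouldSplit, boolean shouldCompact) {
        if (shouldSplit) {
            return FollowUp.SPLIT;
        }
        if (shouldCompact) {
            return FollowUp.COMPACT;
        }
        return FollowUp.NONE;
    }

    // Holds the read lock for the duration of the (omitted) flush and
    // always releases it, even if the body throws.
    FollowUp flushUnderReadLock(boolean shouldSplit, boolean shouldCompact) {
        lock.readLock().lock();
        try {
            // The real method calls region.flushcache() at this point.
            return decide(shouldSplit, shouldCompact);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Number of read locks currently held, used here to show the lock was released.
    int readLocksHeld() {
        return lock.getReadLockCount();
    }
}
```

Because only the read lock is taken, several flushes can hold it at the same time, while any code path that takes the write lock on the same ReentrantReadWriteLock waits until all of them have finished.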
That is all for this article.
If there are any mistakes, please point them out.