Probe into ReadType of scan in HBase

Background knowledge

Linux level

Linux provides several functions for reading files. Some background:

When random reads or writes are performed on the same file handle (on Windows) or file descriptor (on Linux), each access takes two steps: positioning the file pointer, then reading or writing. Because the two steps together are not atomic, the following can happen: process A positions the file pointer at f1 and is interrupted; process B then positions the same file pointer at f2; when A resumes, it reads or writes from the current pointer, which is now f2, and produces unexpected results. (Note that if you open the same file twice, you get two different handles or descriptors, each with its own file pointer, so this problem does not arise.)


On Linux, the pread function addresses exactly this problem. It is an atomic operation that combines positioning and reading, and the read does not change the file pointer.

Overall, there are two commonly used approaches, seek()+read() and pread(), with the following advantages and disadvantages:
seek()+read() is not thread-safe, so multi-threaded access must be synchronized at the application level. However, because it uses the file pointer stored with the file descriptor, sequential reads do not need to reposition before each read, which makes them more efficient.
pread() is atomic and thread-safe, but it is less efficient for sequential reads because the position has to be supplied on every call.
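
The difference can be seen from Java as well. The following is a minimal sketch, not code from HBase or HDFS: RandomAccessFile.seek() followed by a read moves the shared file pointer, while FileChannel.read(ByteBuffer, long) is a positional read that leaves the channel position untouched (on Linux the JDK is generally understood to implement it with pread, which is stated here as an assumption). The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; names are illustrative only.
class PositionalReadDemo {

  /** seek()+read(): two steps that use and move the shared file pointer. */
  static String seekThenRead(RandomAccessFile file, long offset, int length) throws IOException {
    byte[] buf = new byte[length];
    synchronized (file) {      // callers sharing the handle must serialize both steps
      file.seek(offset);       // step 1: move the file pointer
      file.readFully(buf);     // step 2: read from it, which advances the pointer
    }
    return new String(buf, StandardCharsets.US_ASCII);
  }

  /** pread-style read: the offset is an argument and the channel position is not modified. */
  static String positionalRead(FileChannel channel, long offset, int length) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(length);
    long pos = offset;
    while (buf.hasRemaining()) {
      int n = channel.read(buf, pos);  // absolute read at pos
      if (n < 0) {
        break;                         // end of file
      }
      pos += n;
    }
    return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("pread-demo", ".txt");
    try {
      Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
      try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
           FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
        // prints "234 pointer=5": the file pointer moved past the bytes read
        System.out.println(seekThenRead(raf, 2, 3) + " pointer=" + raf.getFilePointer());
        // prints "234 position=0": the channel position is unchanged
        System.out.println(positionalRead(ch, 2, 3) + " position=" + ch.position());
      }
    } finally {
      Files.delete(tmp);
    }
  }
}
```

Because the positional read carries its own offset, several threads can call positionalRead on the same channel without the lock that seekThenRead needs.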

HDFS level

HDFS provides different implementations corresponding to these Linux functions. The text of the corresponding issue is as follows:

HDFS File API should be extended to include positional read

HDFS Input streams should support positional read. Positional read (such as the pread syscall on linux) allows reading for a specified offset without affecting the current file offset. Since the underlying file state is not touched, pread can be used efficiently in multi-threaded programs.

Here is how I plan to implement it.

Provide PositionedReadable interface, with the following methods:

int read(long position, byte[] buffer, int offset, int length);
void readFully(long position, byte[] buffer, int offset, int length);
void readFully(long position, byte[] buffer);

Abstract class FSInputStream would provide default implementation of the above methods using getPos(), seek() and read() methods. The default implementation is inefficient in multi-threaded programs since it locks the object while seeking, reading, and restoring to old state.

DFSClient.DFSInputStream, which extends FSInputStream will provide an efficient non-synchronized implementation for above calls.

In addition, FSDataInputStream, which is a wrapper around FSInputStream, will provide wrapper methods for above read methods as well.
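
The default implementation described in the issue (lock the object, seek, read, then restore the old position) can be sketched as follows. This is a self-contained illustration and not the Hadoop source: SeekableInput and ByteArrayInput are hypothetical stand-ins, and only the method shape read(long position, byte[] buffer, int offset, int length) is taken from the issue text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical driver class for the sketch.
class DefaultPreadSketch {
  public static void main(String[] args) throws IOException {
    ByteArrayInput in = new ByteArrayInput("abcdefghij".getBytes(StandardCharsets.US_ASCII));
    in.seek(1);                          // the stream's own position
    byte[] buf = new byte[3];
    int n = in.read(5L, buf, 0, 3);      // positional read at offset 5
    // prints "fgh pos=1": the stream position was restored
    System.out.println(new String(buf, 0, n, StandardCharsets.US_ASCII) + " pos=" + in.getPos());
  }
}

/** Hypothetical stand-in for a seekable stream; not the Hadoop FSInputStream class. */
abstract class SeekableInput {
  abstract long getPos() throws IOException;
  abstract void seek(long pos) throws IOException;
  abstract int read(byte[] buffer, int offset, int length) throws IOException;

  /**
   * Default positional read built from getPos(), seek() and read().
   * The whole sequence holds the object lock, which is why it scales poorly
   * when many threads read through the same stream.
   */
  synchronized int read(long position, byte[] buffer, int offset, int length) throws IOException {
    long oldPos = getPos();
    try {
      seek(position);
      return read(buffer, offset, length);
    } finally {
      seek(oldPos);                      // restore the previous position
    }
  }
}

/** In-memory implementation used only to exercise the sketch. */
class ByteArrayInput extends SeekableInput {
  private final byte[] data;
  private int pos;

  ByteArrayInput(byte[] data) {
    this.data = data;
  }

  @Override
  long getPos() {
    return pos;
  }

  @Override
  void seek(long newPos) throws IOException {
    if (newPos < 0 || newPos > data.length) {
      throw new IOException("seek out of range: " + newPos);
    }
    pos = (int) newPos;
  }

  @Override
  int read(byte[] buffer, int offset, int length) {
    if (pos >= data.length) {
      return -1;                         // end of data
    }
    int n = Math.min(length, data.length - pos);
    System.arraycopy(data, pos, buffer, offset, n);
    pos += n;
    return n;
  }
}
```

An implementation that can read at an offset without touching shared state, as the issue proposes for DFSClient.DFSInputStream, would override the positional read and drop the lock.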

Application of HBase

In HBase, the ReadType enum defines PREAD and STREAM, corresponding to pread() and seek()+read(), plus DEFAULT, which leaves the choice to the server:

  public enum ReadType {
    DEFAULT, STREAM, PREAD
  }

Reading HFiles requires a scanner, and when a StoreFileScanner is created, a different code path is taken depending on the ReadType:

      for (int i = 0, n = files.size(); i < n; i++) {
        HStoreFile sf = sortedFiles.remove();
        StoreFileScanner scanner;
        if (usePread) {
          scanner = sf.getPreadScanner(cacheBlocks, readPt, i, canOptimizeForNonNullColumn);
        } else {
          scanner = sf.getStreamScanner(canUseDrop, cacheBlocks, isCompaction, readPt, i,
            canOptimizeForNonNullColumn);
        }
      }

getPreadScanner returns a scanner on the shared reader object, so all pread scanners share the same underlying inputStream:

  /**
   * Get a scanner which uses pread.
   * <p>
   * Must be called after initReader.
   */
  public StoreFileScanner getPreadScanner(boolean cacheBlocks, long readPt, long scannerOrder,
      boolean canOptimizeForNonNullColumn) {
    return getReader().getStoreFileScanner(cacheBlocks, true, false, readPt, scannerOrder,
      canOptimizeForNonNullColumn);
  }

  /**
   * @return Current reader. Must call initReader first else returns null.
   * @see #initReader()
   */
  public StoreFileReader getReader() {
    return this.reader;
  }

getStreamScanner creates a new reader: it opens a new inputStream and reads the relevant metadata in the HFile. If preFetchOnOpen is enabled, it also triggers reading of the data blocks:

  /**
   * Get a scanner which uses streaming read.
   * <p>
   * Must be called after initReader.
   */
  public StoreFileScanner getStreamScanner(boolean canUseDropBehind, boolean cacheBlocks,
      boolean isCompaction, long readPt, long scannerOrder, boolean canOptimizeForNonNullColumn)
      throws IOException {
    return createStreamReader(canUseDropBehind).getStoreFileScanner(cacheBlocks, false,
      isCompaction, readPt, scannerOrder, canOptimizeForNonNullColumn);
  }

  private StoreFileReader createStreamReader(boolean canUseDropBehind) throws IOException {
    StoreFileReader reader =, this.cacheConf, canUseDropBehind, -1L,
      primaryReplica, refCount, false);
    return reader;
  }

  /**
   * Open a Reader for the StoreFile
   * @param fs The current file system to use.
   * @param cacheConf The cache configuration and block cache reference.
   * @return The StoreFile.Reader for the file
   */
  public StoreFileReader open(FileSystem fs, CacheConfig cacheConf, boolean canUseDropBehind,
      long readahead, boolean isPrimaryReplicaStoreFile, AtomicInteger refCount, boolean shared)
      throws IOException {
    FSDataInputStreamWrapper in;
    FileStatus status;

    final boolean doDropBehind = canUseDropBehind && cacheConf.shouldDropBehindCompaction();
    if ( != null) {
      // HFileLink
      in = new FSDataInputStreamWrapper(fs,, doDropBehind, readahead);
      status =;
    } else if (this.reference != null) {
      // HFile Reference
      Path referencePath = getReferredToFile(this.getPath());
      in = new FSDataInputStreamWrapper(fs, referencePath, doDropBehind, readahead);
      status = fs.getFileStatus(referencePath);
    } else {
      in = new FSDataInputStreamWrapper(fs, this.getPath(), doDropBehind, readahead);
      status = fs.getFileStatus(initialPath);
    }
    long length = status.getLen();
    hdfsBlocksDistribution = computeHDFSBlocksDistribution(fs);

    StoreFileReader reader = null;
    if (this.coprocessorHost != null) {
      reader = this.coprocessorHost.preStoreFileReaderOpen(fs, this.getPath(), in, length,
        cacheConf, reference);
    }
    if (reader == null) {
      if (this.reference != null) {
        reader = new HalfStoreFileReader(fs, this.getPath(), in, length, cacheConf, reference,
            isPrimaryReplicaStoreFile, refCount, shared, conf);
      } else {
        reader = new StoreFileReader(fs, status.getPath(), in, length, cacheConf,
            isPrimaryReplicaStoreFile, refCount, shared, conf);
      }
    }
    if (this.coprocessorHost != null) {
      reader = this.coprocessorHost.postStoreFileReaderOpen(fs, this.getPath(), in, length,
        cacheConf, reference, reader);
    }
    return reader;
  }

This raises two questions.

1: Where does the shared reader come from?

When a store file is opened, whether during region open or after a flush or bulk load, a reader is created to read its metadata. This is the shared reader, and its shared field is set to true.

  // indicate that whether this StoreFileReader is shared, i.e., used for pread. If not, we will
  // close the internal reader when readCompleted is called.
  final boolean shared;

2: When are pread and stream used?

By default, get requests use pread and compaction scans use stream.
For user scans, the following rules apply:

  • If the client explicitly specifies a readType, it is used directly.
  • If the client does not specify one, the server starts in pread mode and switches to stream mode once the scan has read more than 4 × blocksize bytes; the threshold is configured with hbase.storescanner.pread.max.bytes.
  • To disable this switch and always use pread, set hbase.storescanner.use.pread to true.
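
The three rules for user scans can be condensed into a small decision function. This is a sketch of the rules as stated above, not the HBase implementation: the class ReadTypeChooser, the method choose and its parameters are hypothetical, DEFAULT stands for "the client did not specify a read type", and the 64 KB block size in main is an assumption used only to compute an example threshold.

```java
// Hypothetical helper; mirrors the three rules for user scans described in the text.
class ReadTypeChooser {

  enum ReadType { DEFAULT, STREAM, PREAD }

  /**
   * @param requested     read type set by the client (DEFAULT means not specified)
   * @param alwaysPread   value of hbase.storescanner.use.pread
   * @param preadMaxBytes value of hbase.storescanner.pread.max.bytes
   * @param bytesRead     number of bytes this scan has read so far
   * @return the read type the scan should use at this point
   */
  static ReadType choose(ReadType requested, boolean alwaysPread, long preadMaxBytes,
      long bytesRead) {
    if (requested != ReadType.DEFAULT) {
      return requested;                  // rule 1: an explicit client choice wins
    }
    if (alwaysPread) {
      return ReadType.PREAD;             // rule 3: switching is disabled
    }
    // rule 2: start with pread, switch to stream after the threshold
    return bytesRead > preadMaxBytes ? ReadType.STREAM : ReadType.PREAD;
  }

  public static void main(String[] args) {
    long blockSize = 64L * 1024;         // assumed block size for the example
    long preadMaxBytes = 4 * blockSize;  // threshold of 4 x blocksize

    System.out.println(choose(ReadType.DEFAULT, false, preadMaxBytes, 0));                 // PREAD
    System.out.println(choose(ReadType.DEFAULT, false, preadMaxBytes, preadMaxBytes + 1)); // STREAM
    System.out.println(choose(ReadType.DEFAULT, true, preadMaxBytes, preadMaxBytes + 1));  // PREAD
    System.out.println(choose(ReadType.STREAM, true, preadMaxBytes, 0));                   // STREAM
  }
}
```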

In addition, when a scanner is closed after reading is complete, it calls the readCompleted method, which closes the reader only if it is not shared:

  public void close() {
    if (closed) return;
    cur = null;
    if (this.reader != null) {
      this.reader.readCompleted();
    }
    closed = true;
  }

  /**
   * Indicate that the scanner has finished reading with this reader. We need to decrement the ref
   * count, and also, if this is not the common pread reader, we should close it.
   */
  void readCompleted() {
    if (!shared) {
      try {
      } catch (IOException e) {
        LOG.warn("failed to close stream reader", e);
      }
    }
  }
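
The lifecycle difference between the shared pread reader and a per-scan stream reader can be modeled in a few lines. The sketch below is an assumption-laden illustration and not HBase code: the Reader class, its openScanner method and the closed flag are hypothetical, and setting closed stands in for closing the underlying stream.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of shared versus per-scan readers.
class SharedReaderSketch {

  static class Reader {
    final boolean shared;            // true for the common pread reader
    final AtomicInteger refCount;    // number of scanners currently using the store file
    boolean closed;

    Reader(boolean shared, AtomicInteger refCount) {
      this.shared = shared;
      this.refCount = refCount;
    }

    /** Called when a scanner starts using this reader. */
    void openScanner() {
      refCount.incrementAndGet();
    }

    /** Called when a scanner is done: always release the reference, close only if not shared. */
    void readCompleted() {
      refCount.decrementAndGet();
      if (!shared) {
        closed = true;               // stands in for closing the stream reader
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger refCount = new AtomicInteger();
    Reader preadReader = new Reader(true, refCount);    // shared across pread scanners
    Reader streamReader = new Reader(false, refCount);  // created for a single stream scan

    preadReader.openScanner();
    streamReader.openScanner();
    preadReader.readCompleted();
    streamReader.readCompleted();

    System.out.println("pread reader closed:  " + preadReader.closed);   // false
    System.out.println("stream reader closed: " + streamReader.closed);  // true
    System.out.println("refCount: " + refCount.get());                   // 0
  }
}
```

The shared reader stays open for later pread scanners, while each stream reader lives only as long as the scan that created it.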

Problem and optimization

The version 2.0 code above has one obvious problem: every stream scan runs the method that opens a new reader, and that method contains a lot of logic. With many scans this causes many unnecessary reads, which hurts scan performance and wastes system resources. Newer community versions optimize this in a dedicated issue.

Tags: Database Linux HBase Java Apache

Posted on Sat, 04 Apr 2020 17:10:10 -0400 by MrAlaska