2021SC@SDUSC HBase project analysis: Master startup

2021SC@SDUSC

Contents

Master introduction

Master overall architecture

Introduction to Master components

Master startup process source code analysis

Master introduction

Master overall architecture

Introduction to Master components

ZooKeeperWatcher: any component that needs to be notified of, and react to, ZNode state changes registers a ZooKeeperListener on the ZooKeeperWatcher; the watcher also provides the ability to operate on nodes in ZooKeeper (see the sketch after this list);

ActiveMasterManager: the manager of the active master; it is responsible for monitoring changes to the master znode on ZooKeeper;

RpcServer: a component that provides RPC services. The specific services are supported by RpcEngine;

InfoServer: a web server that answers requests to http://MasterHost:60010; it is essentially an embedded Jetty web server;

RegionServerTracker: tracks the status of the online region server. If an RS znode is deleted, it will terminate the RS through the server manager and remove it from the online servers list;

DrainingServerTracker: tracks the status of draining region servers;

MasterFileSystem: abstracts the Master's operations on the underlying file system;

ServerManager: manages region servers; it maintains the online and offline server lists, handles region server startup and shutdown, receives load reports from region servers, and closes and opens regions;

AssignmentManager: responsible for region allocation and maintenance of region status;

CatalogTracker: tracker for -ROOT- and .META.; the actual work is done by RootRegionTracker and MetaNodeTracker: the former tracks the state of the -ROOT- region, the latter tracks the state of the ZNode corresponding to .META.;

MemoryBoundedLogMessageBuffer: stores fatal error messages reported by region servers; when the buffer exceeds its size limit, the oldest messages are automatically discarded;

ExecutorService: the event executor; different event types are submitted to different queues, each with its own default resources. Note that this is not java.util.concurrent.ExecutorService but org.apache.hadoop.hbase.executor.ExecutorService;

LoadBalancer: balances region load across the region servers;

BalancerChore: execute master.balance() regularly;

CatalogJanitor: periodically cleans up the parent region entries left in .META. by splits;

LogCleaner: periodically cleans up logs in the .oldlogs directory;

HFileCleaner: periodically cleans up HFiles in the .archive directory;

MasterCoprocessorHost: provides the execution environment and framework for coprocessors on the Master side. The coprocessors are wrapped in MasterEnvironment objects; when an action occurs, the MasterCoprocessorHost traverses all of its MasterEnvironments, obtains the MasterObserver in each, and invokes the relevant methods;

SnapshotManager: manages table snapshots;

HealthCheckChore: not a required component; it is started only when hbase.node.health.script.location is configured. It periodically executes the script set by that property to check node health; if the number of failures within the failure window reaches the threshold, the master stops itself.
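
To make the ZooKeeperWatcher/ZooKeeperListener relationship above concrete, here is a minimal sketch of the registration pattern, assuming the pre-2.0 API discussed in this post; the class name MyServerTracker and the callback bodies are made up for illustration, while real components such as RegionServerTracker follow the same pattern:

import org.apache.hadoop.hbase.zookeeper.ZooKeeperListener;
import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;

// Sketch only: a component that wants to react to ZNode changes extends
// ZooKeeperListener and registers itself on the shared ZooKeeperWatcher.
public class MyServerTracker extends ZooKeeperListener {

  public MyServerTracker(ZooKeeperWatcher watcher) {
    super(watcher);
    // After registration, ZNode events observed by the watcher are routed here.
    watcher.registerListener(this);
  }

  @Override
  public void nodeDeleted(String path) {
    // e.g. a region server znode disappeared; a real tracker would ask the
    // ServerManager to expire that server.
  }

  @Override
  public void nodeChildrenChanged(String path) {
    // e.g. region servers joined or left under the servers znode.
  }
}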

Master startup process source code analysis

The entry point of the Master is the main() method of org.apache.hadoop.hbase.master.HMaster, so the analysis starts there. When main() is executed it must be passed one argument, start or stop. main() first prints the HBase version information and then calls the doMain() method of HMasterCommandLine.

 public static void main(String [] args) {
    VersionInfo.logVersion();
    new HMasterCommandLine(HMaster.class).doMain(args);
  }

Look at the HMasterCommandLine.doMain() method: HMasterCommandLine extends the ServerCommandLine class, and ServerCommandLine implements the Tool interface, which is used to parse the HBase Master command line arguments and start the Master thread.

public void doMain(String args[]) {
    try {
      int ret = ToolRunner.run(HBaseConfiguration.create(), this, args);
      if (ret != 0) {
        System.exit(ret);
      }
    } catch (Exception e) {
      LOG.error("Failed to run", e);
      System.exit(-1);
    }
  }

Next, look at the ToolRunner.run() method: it ultimately calls tool.run(). Since the Tool passed in above is "this", i.e. the HMasterCommandLine instance, this call ends up invoking HMasterCommandLine.run().

public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
        if (conf == null) {
            conf = new Configuration();
        }

        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        tool.setConf(conf);
        String[] toolArgs = parser.getRemainingArgs();
        return tool.run(toolArgs);
    }

Now let's look at the HMasterCommandLine.run() method: it first processes the command line arguments and converts them into the corresponding configuration settings. The mapping between arguments and configuration keys is as follows:

Correspondence between command line arguments and configuration keys:

Argument              Configuration key
minRegionServers      hbase.regions.server.count.min
minServers            hbase.regions.server.count.min
backup                hbase.master.backup
localRegionServers    hbase.regionservers
masters               hbase.masters

Then, a different method is called depending on the remaining command line argument; if the argument is start, the startMaster() method is called.

public int run(String args[]) throws Exception {
    Options opt = new Options();
    opt.addOption("localRegionServers", true,
      "RegionServers to start in master process when running standalone");
    opt.addOption("masters", true, "Masters to start in this process");
    opt.addOption("minRegionServers", true, "Minimum RegionServers needed to host user tables");
    opt.addOption("backup", false, "Do not try to become HMaster until the primary fails");

    CommandLine cmd;
    try {
      cmd = new GnuParser().parse(opt, args);
    } catch (ParseException e) {
      LOG.error("Could not parse: ", e);
      usage(null);
      return 1;
    }
    if (cmd.hasOption("minRegionServers")) {
      String val = cmd.getOptionValue("minRegionServers");
      getConf().setInt("hbase.regions.server.count.min",
                  Integer.valueOf(val));
      LOG.debug("minRegionServers set to " + val);
    }

    if (cmd.hasOption("minServers")) {
      String val = cmd.getOptionValue("minServers");
      getConf().setInt("hbase.regions.server.count.min",
                  Integer.valueOf(val));
      LOG.debug("minServers set to " + val);
    }
    if (cmd.hasOption("backup")) {
      getConf().setBoolean(HConstants.MASTER_TYPE_BACKUP, true);
    }
    if (cmd.hasOption("localRegionServers")) {
      String val = cmd.getOptionValue("localRegionServers");
      getConf().setInt("hbase.regionservers", Integer.valueOf(val));
      LOG.debug("localRegionServers set to " + val);
    }
    if (cmd.hasOption("masters")) {
      String val = cmd.getOptionValue("masters");
      getConf().setInt("hbase.masters", Integer.valueOf(val));
      LOG.debug("masters set to " + val);
    }

    List<String> remainingArgs = cmd.getArgList();
    if (remainingArgs.size() != 1) {
      usage(null);
      return 1;
    }

    String command = remainingArgs.get(0);

    if ("start".equals(command)) {
      return startMaster();
    } else if ("stop".equals(command)) {
      return stopMaster();
    } else if ("clear".equals(command)) {
      return (ZNodeClearer.clear(getConf()) ? 0 : 1);
    } else {
      usage("Invalid command: " + command);
      return 1;
    }
  }

Check the startMaster() method: it decides between local mode and distributed mode based on hbase.cluster.distributed. If hbase.cluster.distributed is false, HBase runs in local mode: the master thread and the region server threads are started in the same JVM. A MiniZooKeeperCluster is started first to provide ZooKeeper, then LocalHBaseCluster is used to start the master and region server threads; the number of master threads is given by hbase.masters and the number of region server threads by hbase.regionservers. If hbase.cluster.distributed is true, HBase runs in distributed mode: the HMaster constructor is invoked via reflection to create an HMaster instance (this is done so that users can subclass HMaster to extend its behavior), and then its start() and join() methods are called.

 private int startMaster() {
    Configuration conf = getConf();
    try {
      if (LocalHBaseCluster.isLocal(conf)) {
        final MiniZooKeeperCluster zooKeeperCluster = new MiniZooKeeperCluster(conf);
        File zkDataPath = new File(conf.get(HConstants.ZOOKEEPER_DATA_DIR));
        int zkClientPort = conf.getInt(HConstants.ZOOKEEPER_CLIENT_PORT, 0);
        if (zkClientPort == 0) {
          throw new IOException("No config value for "
              + HConstants.ZOOKEEPER_CLIENT_PORT);
        }
        zooKeeperCluster.setDefaultClientPort(zkClientPort);
        int zkTickTime = conf.getInt(HConstants.ZOOKEEPER_TICK_TIME, 0);
        if (zkTickTime > 0) {
          zooKeeperCluster.setTickTime(zkTickTime);
        }
        ZKUtil.loginServer(conf, HConstants.ZK_SERVER_KEYTAB_FILE,
          HConstants.ZK_SERVER_KERBEROS_PRINCIPAL, null);
        int clientPort = zooKeeperCluster.startup(zkDataPath);
        if (clientPort != zkClientPort) {
          String errorMsg = "Could not start ZK at requested port of " +
            zkClientPort + ".  ZK was started at port: " + clientPort +
            ".  Aborting as clients (e.g. shell) will not be able to find " +
            "this ZK quorum.";
          System.err.println(errorMsg);
          throw new IOException(errorMsg);
        }
        conf.set(HConstants.ZOOKEEPER_CLIENT_PORT,
                 Integer.toString(clientPort));
        int localZKClusterSessionTimeout =
            conf.getInt(HConstants.ZK_SESSION_TIMEOUT + ".localHBaseCluster", 10*1000);
        conf.setInt(HConstants.ZK_SESSION_TIMEOUT, localZKClusterSessionTimeout);
        LocalHBaseCluster cluster = new LocalHBaseCluster(conf, conf.getInt("hbase.masters", 1),
          conf.getInt("hbase.regionservers", 1), LocalHMaster.class, HRegionServer.class);
        ((LocalHMaster)cluster.getMaster(0)).setZKCluster(zooKeeperCluster);
        cluster.startup();
        waitOnMasterThreads(cluster);
      } else {
        logProcessInfo(getConf());
        HMaster master = HMaster.constructMaster(masterClass, conf);
        if (master.isStopped()) {
          LOG.info("Won't bring the Master up as a shutdown is requested");
          return 1;
        }
        master.start();
        master.join();
        if(master.isAborted())
          throw new RuntimeException("HMaster Aborted");
      }
    } catch (Throwable t) {
      LOG.error("Master exiting", t);
      return 1;
    }
    return 0;
  }
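
The reflective construction mentioned above is done by HMaster.constructMaster(masterClass, conf). As a rough sketch of the underlying idiom (not the exact HBase code), it amounts to looking up the (Configuration) constructor of the configured master class and invoking it, which is why an HMaster subclass only needs to provide that constructor in order to be plugged in:

import java.lang.reflect.Constructor;

import org.apache.hadoop.conf.Configuration;

// Sketch only: instantiate a (possibly user-supplied) master class by reflection.
public final class MasterReflectionSketch {

  public static <T> T construct(Class<? extends T> clazz, Configuration conf) throws Exception {
    // Look up the single-argument (Configuration) constructor and call it.
    Constructor<? extends T> c = clazz.getConstructor(Configuration.class);
    return c.newInstance(conf);
  }

  // Usage (hypothetical): HMaster master = construct(MyHMaster.class, HBaseConfiguration.create());
}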

Now check the constructor of HMaster. It first instantiates the Configuration object and then performs the following setup steps:

1. disable the block cache on the master;
2. set the number of reconnection attempts to the server;
3. create the rpcServer (Java NIO is used here; a separate post will be dedicated to Java NIO);
4. set the serverName;
5. log in to ZooKeeper;
6. initialize the server principal;
7. set the thread name;
8. register the user-defined ReplicationLogCleaner class;
9. configure parameters for the task trackers;
10. create the ZooKeeperWatcher;
11. start the rpcServer threads;
12. create the metricsMaster;
13. decide whether to perform the health check.

  public HMaster(final Configuration conf)
  throws IOException, KeeperException, InterruptedException {
    this.conf = new Configuration(conf);

    //block cache is prohibited on the master
    this.conf.setFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY, 0.0f);
    FSUtils.setupShortCircuitRead(conf);
    String hostname = Strings.domainNamePointerToHostName(DNS.getDefaultHost(
      conf.get("hbase.master.dns.interface", "default"),
      conf.get("hbase.master.dns.nameserver", "default")));
    int port = conf.getInt(HConstants.MASTER_PORT, HConstants.DEFAULT_MASTER_PORT);
    InetSocketAddress initialIsa = new InetSocketAddress(hostname, port);
    if (initialIsa.getAddress() == null) {
      throw new IllegalArgumentException("Failed resolve of hostname " + initialIsa);
    }
    String bindAddress = conf.get("hbase.master.ipc.address");
    if (bindAddress != null) {
      initialIsa = new InetSocketAddress(bindAddress, port);
      if (initialIsa.getAddress() == null) {
        throw new IllegalArgumentException("Failed resolve of bind address " + initialIsa);
      }
    }
    String name = "master/" + initialIsa.toString();

    //Sets the number of reconnections to the server
    HConnectionManager.setServerSideHConnectionRetries(this.conf, name, LOG);
    int numHandlers = conf.getInt(HConstants.MASTER_HANDLER_COUNT,
      conf.getInt(HConstants.REGION_SERVER_HANDLER_COUNT, HConstants.DEFAULT_MASTER_HANLDER_COUNT));

    //Create rpcServer
    this.rpcServer = new RpcServer(this, name, getServices(),
      initialIsa, // BindAddress is IP we got for this server.
      conf,
      new FifoRpcScheduler(conf, numHandlers));
    this.isa = this.rpcServer.getListenerAddress();

    //Set serverName
    this.serverName = ServerName.valueOf(hostname, this.isa.getPort(), System.currentTimeMillis());
    this.rsFatals = new MemoryBoundedLogMessageBuffer(
      conf.getLong("hbase.master.buffer.for.rs.fatals", 1*1024*1024));

    //Log in to zookeeper
    ZKUtil.loginClient(this.conf, HConstants.ZK_CLIENT_KEYTAB_FILE,
      HConstants.ZK_CLIENT_KERBEROS_PRINCIPAL, this.isa.getHostName());
    
    //Initialize the server principal
    UserProvider provider = UserProvider.instantiate(conf);
    provider.login("hbase.master.keytab.file",
      "hbase.master.kerberos.principal", this.isa.getHostName());

    LOG.info("hbase.rootdir=" + FSUtils.getRootDir(this.conf) +
        ", hbase.cluster.distributed=" + this.conf.getBoolean("hbase.cluster.distributed", false));

    //Set thread name
    setName(MASTER + ":" + this.serverName.toShortString());

    //Register the user-defined ReplicationLogCleaner class
    Replication.decorateMasterConfiguration(this.conf);

    //Configure parameters for task trackers
    if (this.conf.get("mapred.task.id") == null) {
      this.conf.set("mapred.task.id", "hb_m_" + this.serverName.toString());
    }

    //Create ZooKeeperWatcher
    this.zooKeeper = new ZooKeeperWatcher(conf, MASTER + ":" + isa.getPort(), this, true);
    //Start rpcServer thread
    this.rpcServer.startThreads();
    this.pauseMonitor = new JvmPauseMonitor(conf);
    this.pauseMonitor.start();

    this.msgInterval = conf.getInt("hbase.regionserver.msginterval", 3 * 1000);

    this.masterCheckCompression = conf.getBoolean("hbase.master.check.compression", true);

    this.masterCheckEncryption = conf.getBoolean("hbase.master.check.encryption", true);
    
    //Create a metricsMaster 
    this.metricsMaster = new MetricsMaster( new MetricsMasterWrapperImpl(this));

    this.preLoadTableDescriptors = conf.getBoolean("hbase.master.preload.tabledescriptors", true);

    //Determine whether health testing is performed
    int sleepTime = this.conf.getInt(HConstants.HEALTH_CHORE_WAKE_FREQ,
      HConstants.DEFAULT_THREAD_WAKE_FREQUENCY);
    if (isHealthCheckerConfigured()) {
      healthCheckChore = new HealthCheckChore(sleepTime, this, getConfiguration());
    }

    boolean shouldPublish = conf.getBoolean(HConstants.STATUS_PUBLISHED,
        HConstants.STATUS_PUBLISHED_DEFAULT);
    Class<? extends ClusterStatusPublisher.Publisher> publisherClass =
        conf.getClass(ClusterStatusPublisher.STATUS_PUBLISHER_CLASS,
            ClusterStatusPublisher.DEFAULT_STATUS_PUBLISHER_CLASS,
            ClusterStatusPublisher.Publisher.class);

    if (shouldPublish) {
      if (publisherClass == null) {
        LOG.warn(HConstants.STATUS_PUBLISHED + " is true, but " +
            ClusterStatusPublisher.DEFAULT_STATUS_PUBLISHER_CLASS +
            " is not set - not publishing status");
      } else {
        clusterStatusPublisherChore = new ClusterStatusPublisher(this, conf, publisherClass);
        Threads.setDaemonThreadRunning(clusterStatusPublisherChore.getThread());
      }
    }
  }
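
As a side note, the isHealthCheckerConfigured() call above only checks whether a health check script has been configured. A minimal sketch of enabling it programmatically is shown below; the script path is made up, and in a real deployment the property would normally be set in hbase-site.xml:

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.hbase.HBaseConfiguration;

public class HealthCheckConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // HealthCheckChore is only created when hbase.node.health.script.location is set
    // (hypothetical script path, for illustration only).
    conf.set("hbase.node.health.script.location", "/opt/hbase/bin/node_health.sh");
    System.out.println("health script: " + conf.get("hbase.node.health.script.location"));
  }
}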

Tags: Big Data HBase
