Hadoop source code analysis: ApplicationMasterService source code analysis 2021SC@SDUSC

1, Introduction to ApplicationMasterService

In the previous chapter, we introduced the ApplicationMasterLauncher. ApplicationMasterLauncher is one part of ApplicationMaster management and is mainly responsible for communicating with the NodeManager to start the ApplicationMaster. ApplicationMasterService, the subject of this chapter, processes requests from the ApplicationMaster, mainly registration and heartbeat requests. A registration request carries the node on which the ApplicationMaster was started, its external RPC port number, its tracking URL and other information. The heartbeat is periodic; each one reports the resources the application needs, the list of containers to release, the blacklist and so on. In response, ApplicationMasterService returns newly allocated containers, failed containers, the list of containers to be preempted and so on.

2, ApplicationMasterService interface

ApplicationMasterProtocol is the protocol for communication between the AM and the RM. ApplicationMasterService implements it on the RM side and manages all submitted AMs. Its main tasks are as follows:
Registering new AMs and handling termination / unregistration requests from finishing AMs.
Authorizing all requests from the different AMs, so that only requests sent by legitimate AMs are passed to the corresponding application objects in the RM.
Obtaining container allocation and release requests from all running AMs and forwarding them asynchronously to the YARN scheduler.
ApplicationMasterService also ensures that, at any point in time, only one thread of any AM can send requests to the RM, because all RPC requests from AMs are serialized on the RM.
The protocol has three main methods:
registerApplicationMaster, finishApplicationMaster, allocate

// A new ApplicationMaster registers with the RM.
// The ApplicationMaster provides its RPC port, tracking URL and other information to the RM; the response returns the maximum resource capability that the cluster can allocate.
  public RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) 
  throws YarnException, IOException;

// The ApplicationMaster notifies the RM of its final status (success / failure)
  public FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request) 
  throws YarnException, IOException;

// The ApplicationMaster requests resources from the ResourceManager; the same call also serves as the heartbeat
  public AllocateResponse allocate(AllocateRequest request) 
  throws YarnException, IOException;
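
The order in which an AM uses these three methods can be illustrated with a minimal sketch. It does not use any Hadoop classes: `MiniProtocol`, `InMemoryRm`, the attempt id string and the "at most 2 containers per heartbeat" policy are all hypothetical, chosen only to show that allocate and finish are accepted after registration and rejected before it.

```java
import java.util.HashSet;
import java.util.Set;

class LifecycleSketch {
    // Hypothetical stand-in for the three protocol methods; not the Hadoop interface.
    interface MiniProtocol {
        void register(String attemptId, String host, int rpcPort, String trackingUrl);
        int allocate(String attemptId, int requestedContainers);
        boolean finish(String attemptId, String finalStatus);
    }

    // Toy in-memory "RM" that only enforces the call order.
    static class InMemoryRm implements MiniProtocol {
        private final Set<String> registered = new HashSet<>();

        @Override
        public void register(String attemptId, String host, int rpcPort, String trackingUrl) {
            if (!registered.add(attemptId)) {
                throw new IllegalStateException("already registered: " + attemptId);
            }
        }

        @Override
        public int allocate(String attemptId, int requestedContainers) {
            requireRegistered(attemptId);
            // Toy policy (assumption): grant at most 2 containers per heartbeat.
            return Math.min(requestedContainers, 2);
        }

        @Override
        public boolean finish(String attemptId, String finalStatus) {
            requireRegistered(attemptId);
            registered.remove(attemptId);
            return true;
        }

        private void requireRegistered(String attemptId) {
            if (!registered.contains(attemptId)) {
                throw new IllegalStateException("not registered: " + attemptId);
            }
        }
    }

    public static void main(String[] args) {
        MiniProtocol rm = new InMemoryRm();
        String attempt = "attempt-1"; // hypothetical id
        rm.register(attempt, "node-1", 40000, "http://node-1/track");
        int granted = rm.allocate(attempt, 5); // heartbeat that also asks for containers
        boolean done = rm.finish(attempt, "SUCCEEDED");
        System.out.println("granted=" + granted + ", unregistered=" + done);
    }
}
```

In the real service the same ordering rule is enforced through the cached last response of each attempt rather than a simple set, as the later sections show.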

3, ApplicationMasterService source code analysis

3.1 construction method

ApplicationMasterService is constructed during the serviceInit phase of ResourceManager.

 public ApplicationMasterService(RMContext rmContext,
      YarnScheduler scheduler) {
    this(ApplicationMasterService.class.getName(), rmContext, scheduler);
  }

  public ApplicationMasterService(String name, RMContext rmContext,
      YarnScheduler scheduler) {
    super(name);
    this.amLivelinessMonitor = rmContext.getAMLivelinessMonitor();
    this.rScheduler = scheduler;
    this.rmContext = rmContext;
    // AMSProcessingChain handles ApplicationMaster requests using the chain of responsibility pattern
    // At construction time, the head processor of the chain is DefaultAMSProcessor
    this.amsProcessingChain = new AMSProcessingChain(new DefaultAMSProcessor());
  }

3.2 properties

  // Liveliness monitor for AMs (tracks heartbeats)
  private final AMLivelinessMonitor amLivelinessMonitor;
  // Scheduler
  private YarnScheduler rScheduler;
  // Address of the RPC service
  protected InetSocketAddress masterServiceAddress;

  // RPC server instance
  protected Server server;
  protected final RecordFactory recordFactory =  RecordFactoryProvider.getRecordFactory(null);

  // Last response (wrapped in a lock object) for each application attempt
  private final ConcurrentMap<ApplicationAttemptId, AllocateResponseLock> responseMap = new ConcurrentHashMap<ApplicationAttemptId, AllocateResponseLock>();

  // Application attempts whose finish request has already been handled
  private final ConcurrentHashMap<ApplicationAttemptId, Boolean> finishedAttemptCache = new ConcurrentHashMap<>();

  // RM context
  protected final RMContext rmContext;
  // Processing chain that handles AM requests
  private final AMSProcessingChain amsProcessingChain;
  // Whether to enable timelineServiceV2. The default is false
  private boolean timelineServiceV2Enabled;

3.3 serviceInit method

serviceInit initializes masterServiceAddress, the address of the RPC service (0.0.0.0/0.0.0.0:8030 by default), and then calls initializeProcessingChain.

  @Override
  protected void serviceInit(Configuration conf) throws Exception {

    // Resolve the address of the RPC service
    // Default: 0.0.0.0/0.0.0.0:8030
    masterServiceAddress = conf.getSocketAddr(
        YarnConfiguration.RM_BIND_HOST,
        YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);

    // Initialize amsProcessingChain
    initializeProcessingChain(conf);
  }
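
How the four arguments of the address lookup combine can be sketched without Hadoop. The helper below is hypothetical and is not Hadoop's `Configuration` API; it assumes that the scheduler address (or its default) supplies host and port, and that a non-empty bind host replaces only the host part. The two property names are the values I believe `RM_BIND_HOST` and `RM_SCHEDULER_ADDRESS` map to; treat them as assumptions as well.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

class BindAddressSketch {
    // Assumed property names behind RM_BIND_HOST and RM_SCHEDULER_ADDRESS.
    static final String BIND_HOST_KEY = "yarn.resourcemanager.bind-host";
    static final String ADDRESS_KEY = "yarn.resourcemanager.scheduler.address";

    // Hypothetical helper mirroring the lookup order; not the Hadoop implementation.
    static InetSocketAddress resolve(Map<String, String> conf, String bindHostKey,
            String addressKey, String defaultAddress, int defaultPort) {
        String address = conf.getOrDefault(addressKey, defaultAddress);
        int colon = address.lastIndexOf(':');
        String host = colon >= 0 ? address.substring(0, colon) : address;
        int port = colon >= 0 ? Integer.parseInt(address.substring(colon + 1)) : defaultPort;
        String bindHost = conf.get(bindHostKey);
        if (bindHost != null && !bindHost.isEmpty()) {
            host = bindHost; // the bind host overrides the host, the port is kept
        }
        // Unresolved on purpose, so the sketch needs no DNS lookup.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        InetSocketAddress byDefault =
            resolve(conf, BIND_HOST_KEY, ADDRESS_KEY, "0.0.0.0:8030", 8030);
        System.out.println(byDefault.getHostString() + ":" + byDefault.getPort());

        conf.put(ADDRESS_KEY, "rm-host:9030"); // hypothetical host name
        conf.put(BIND_HOST_KEY, "0.0.0.0");
        InetSocketAddress bound =
            resolve(conf, BIND_HOST_KEY, ADDRESS_KEY, "0.0.0.0:8030", 8030);
        System.out.println(bound.getHostString() + ":" + bound.getPort());
    }
}
```

With an empty configuration the sketch yields 0.0.0.0:8030, which matches the default address mentioned above.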

initializeProcessingChain method

  private void initializeProcessingChain(Configuration conf) {
    amsProcessingChain.init(rmContext, null);

    // Add the placement constraint handler, disabled by default
    // yarn.resourcemanager.placement-constraints.handler : disabled
    addPlacementConstraintHandler(conf);

    // Get the ApplicationMasterServiceProcessor list from the configuration and add each processor to amsProcessingChain
    List<ApplicationMasterServiceProcessor> processors = getProcessorList(conf);
    if (processors != null) {
      Collections.reverse(processors);
      for (ApplicationMasterServiceProcessor p : processors) {
        // Ensure only single instance of PlacementProcessor is included
        if (p instanceof AbstractPlacementProcessor) {
          LOG.warn("Found PlacementProcessor=" + p.getClass().getCanonicalName()
              + " defined in "
              + YarnConfiguration.RM_APPLICATION_MASTER_SERVICE_PROCESSORS
              + ", however PlacementProcessor handler should be configured "
              + "by using " + YarnConfiguration.RM_PLACEMENT_CONSTRAINTS_HANDLER
              + ", this processor will be ignored.");
          continue;
        }
        this.amsProcessingChain.addProcessor(p);
      }
    }
  }
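
Why the configured list is reversed before the processors are added can be shown with a small chain of responsibility sketch. `Processor` and `Chain` are hypothetical, simplified types, and the sketch assumes that each newly added processor is linked in front of the current head; under that assumption, reversing the list first makes the processors run in their configured order, with the default processor last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ChainSketch {
    // Hypothetical processor: records its name, then delegates to the next one.
    static class Processor {
        final String name;
        Processor next;

        Processor(String name) {
            this.name = name;
        }

        void allocate(List<String> trace) {
            trace.add(name);
            if (next != null) {
                next.allocate(trace);
            }
        }
    }

    static class Chain {
        private Processor head;

        Chain(Processor root) {
            this.head = root;
        }

        // Assumption: a newly added processor becomes the new head of the chain.
        void addProcessor(Processor p) {
            p.next = head;
            head = p;
        }

        List<String> allocate() {
            List<String> trace = new ArrayList<>();
            head.allocate(trace);
            return trace;
        }
    }

    // Builds the chain the same way as the loop above: reverse, then add one by one.
    static List<String> run(List<String> configuredNames) {
        Chain chain = new Chain(new Processor("Default"));
        List<Processor> processors = new ArrayList<>();
        for (String n : configuredNames) {
            processors.add(new Processor(n));
        }
        Collections.reverse(processors);
        for (Processor p : processors) {
            chain.addProcessor(p);
        }
        return chain.allocate();
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("A", "B"))); // prints [A, B, Default]
    }
}
```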

3.4 serviceStart method

The core of this method is starting the RPC server, for example at Boyi pro.local/192.168.xx.xxx: 8030

  @Override
  protected void serviceStart() throws Exception {
    Configuration conf = getConfig();
    YarnRPC rpc = YarnRPC.create(conf);

    Configuration serverConf = conf;
    // If the auth is not-simple, enforce it to be token-based.
    serverConf = new Configuration(conf);

    serverConf.set(  CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, SaslRpcServer.AuthMethod.TOKEN.toString());

    // ProtobufRpcEngine$Server   ==> 0.0.0.0: 8030
    this.server = getServer(rpc, serverConf, masterServiceAddress, this.rmContext.getAMRMTokenSecretManager());
    // TODO more exceptions could be added later.

    this.server.addTerseExceptions(ApplicationMasterNotRegisteredException.class);

    // Enable service authorization?
    if (conf.getBoolean(
        CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION, 
        false)) {


      InputStream inputStream =
          this.rmContext.getConfigurationProvider()
              .getConfigurationInputStream(conf,
                  YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);

      if (inputStream != null) {
        conf.addResource(inputStream);
      }
      refreshServiceAcls(conf, RMPolicyProvider.getInstance());
    }


    this.server.start();

    // Update the configured address with the actual listener address, e.g. Boyi pro.local/192.168.xx.xxx: 8030
    this.masterServiceAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
                               YarnConfiguration.RM_SCHEDULER_ADDRESS,
                               YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
                               server.getListenerAddress());

    this.timelineServiceV2Enabled = YarnConfiguration.timelineServiceV2Enabled(conf);

    super.serviceStart();
  }

3.5 registerApplicationMaster method

This method, defined in ApplicationMasterProtocol, is used by the ApplicationMaster to register with the RM.
After the request is authorized and passes the checks, the method builds the response and then delegates the registration to amsProcessingChain.

public RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) throws YarnException,
      IOException {

    AMRMTokenIdentifier amrmTokenIdentifier =
        YarnServerSecurityUtils.authorizeRequest();
    ApplicationAttemptId applicationAttemptId =
        amrmTokenIdentifier.getApplicationAttemptId();

    // Get ApplicationId
    ApplicationId appID = applicationAttemptId.getApplicationId();

    AllocateResponseLock lock = responseMap.get(applicationAttemptId);
    if (lock == null) {
      RMAuditLogger.logFailure(this.rmContext.getRMApps().get(appID).getUser(),
          AuditConstants.REGISTER_AM, "Application doesn't exist in cache "
              + applicationAttemptId, "ApplicationMasterService",
          "Error in registering application master", appID,
          applicationAttemptId);
      throwApplicationDoesNotExistInCacheException(applicationAttemptId);
    }

    // Allow only one thread in AM to do registerApp at a time.
    synchronized (lock) {

      AllocateResponse lastResponse = lock.getAllocateResponse();
      if (hasApplicationMasterRegistered(applicationAttemptId)) {
        // allow UAM re-register if work preservation is enabled
        ApplicationSubmissionContext appContext =
            rmContext.getRMApps().get(appID).getApplicationSubmissionContext();
        if (!(appContext.getUnmanagedAM()
            && appContext.getKeepContainersAcrossApplicationAttempts())) {
          String message =
              AMRMClientUtils.APP_ALREADY_REGISTERED_MESSAGE + appID;
          LOG.warn(message);
          RMAuditLogger.logFailure(
              this.rmContext.getRMApps().get(appID).getUser(),
              AuditConstants.REGISTER_AM, "", "ApplicationMasterService",
              message, appID, applicationAttemptId);
          throw new InvalidApplicationMasterRequestException(message);
        }
      }
      // Update heartbeat time
      this.amLivelinessMonitor.receivedPing(applicationAttemptId);

      // Set the response id to 0 to mark that the ApplicationMaster has registered for this attempt id
      lastResponse.setResponseId(0);
      // Update lastResponse
      lock.setAllocateResponse(lastResponse);

      RegisterApplicationMasterResponse response =
          recordFactory.newRecordInstance(
              RegisterApplicationMasterResponse.class);

      // Perform the registration operation
      this.amsProcessingChain.registerApplicationMaster(amrmTokenIdentifier.getApplicationAttemptId(), request, response);

      return response;
    }
  }
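
The combination of a per-attempt lock object and a response id that doubles as a registration flag can be sketched as follows. `ResponseLock` is a hypothetical stand-in for AllocateResponseLock that stores only the id. The sketch assumes that an attempt is cached with response id -1 before its AM registers and that "registered" means the id is 0 or greater; the listing above only shows the id being set to 0, so the -1 starting value is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class RegistrationSketch {
    // Assumption: -1 marks "attempt known, AM not registered yet".
    static final int PRE_REGISTER_RESPONSE_ID = -1;

    // Hypothetical holder playing the role of AllocateResponseLock.
    static class ResponseLock {
        private int lastResponseId;

        ResponseLock(int lastResponseId) {
            this.lastResponseId = lastResponseId;
        }
    }

    private final ConcurrentMap<String, ResponseLock> responseMap = new ConcurrentHashMap<>();

    // Called when the attempt is created, before the AM process starts.
    void registerAppAttempt(String attemptId) {
        responseMap.put(attemptId, new ResponseLock(PRE_REGISTER_RESPONSE_ID));
    }

    boolean hasRegistered(String attemptId) {
        ResponseLock lock = responseMap.get(attemptId);
        if (lock == null) {
            return false;
        }
        synchronized (lock) {
            return lock.lastResponseId >= 0;
        }
    }

    void registerApplicationMaster(String attemptId) {
        ResponseLock lock = responseMap.get(attemptId);
        if (lock == null) {
            throw new IllegalStateException("attempt not in cache: " + attemptId);
        }
        // One lock per attempt: different attempts never block each other.
        synchronized (lock) {
            if (lock.lastResponseId >= 0) {
                throw new IllegalStateException("already registered: " + attemptId);
            }
            lock.lastResponseId = 0; // 0 marks the attempt as registered
        }
    }

    public static void main(String[] args) {
        RegistrationSketch service = new RegistrationSketch();
        service.registerAppAttempt("attempt-1"); // hypothetical id
        System.out.println(service.hasRegistered("attempt-1")); // prints false
        service.registerApplicationMaster("attempt-1");
        System.out.println(service.hasRegistered("attempt-1")); // prints true
    }
}
```

Unlike this sketch, the real method still lets an unmanaged AM re-register when containers are kept across attempts, as the code above shows.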

3.6 finishApplicationMaster method

This method, defined in ApplicationMasterProtocol, is used by the ApplicationMaster to notify ApplicationMasterService that it has finished.
After the checks pass, it calls this.amsProcessingChain.finishApplicationMaster to perform the unregistration.

@Override
  public FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request) throws YarnException,
      IOException {

    // Get applicationAttemptId
    ApplicationAttemptId applicationAttemptId =
        YarnServerSecurityUtils.authorizeRequest().getApplicationAttemptId();

    // Get ApplicationId
    ApplicationId appId = applicationAttemptId.getApplicationId();

    // Get RMApp
    RMApp rmApp =
        rmContext.getRMApps().get(applicationAttemptId.getApplicationId());

    // Remove collector address when app get finished.
    if (timelineServiceV2Enabled) {
      ((RMAppImpl) rmApp).removeCollectorData();
    }
    if (rmApp.isAppFinalStateStored()) {
      LOG.info(rmApp.getApplicationId() + " unregistered successfully. ");
      return FinishApplicationMasterResponse.newInstance(true);
    }

    AllocateResponseLock lock = responseMap.get(applicationAttemptId);
    if (lock == null) {
      throwApplicationDoesNotExistInCacheException(applicationAttemptId);
    }

    // Allow only one thread in AM to do finishApp at a time.
    synchronized (lock) {
      if (!hasApplicationMasterRegistered(applicationAttemptId)) {
        String message =
            "Application Master is trying to unregister before registering for: "
                + appId;
        LOG.error(message);
        RMAuditLogger.logFailure(
            this.rmContext.getRMApps()
                .get(appId).getUser(),
            AuditConstants.UNREGISTER_AM, "", "ApplicationMasterService",
            message, appId,
            applicationAttemptId);
        throw new ApplicationMasterNotRegisteredException(message);
      }

      FinishApplicationMasterResponse response =
          FinishApplicationMasterResponse.newInstance(false);

      // Check whether finishedAttemptCache already contains applicationAttemptId
      if (finishedAttemptCache.putIfAbsent(applicationAttemptId, true)
          == null) {
        // Not handled yet, so handle the request now
        this.amsProcessingChain
            .finishApplicationMaster(applicationAttemptId, request, response);
      }
      // Update heartbeat time
      this.amLivelinessMonitor.receivedPing(applicationAttemptId);
      return response;
    }
  }
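
The putIfAbsent check makes a repeated finish request harmless: only the first call for an attempt reaches the processing chain. A minimal sketch of that idea follows; the class name and the `chainCalls` counter, which stands in for the call to amsProcessingChain.finishApplicationMaster, are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class FinishOnceSketch {
    private final ConcurrentHashMap<String, Boolean> finishedAttemptCache =
        new ConcurrentHashMap<>();

    // Counts how often the (stand-in) processing chain was invoked.
    final AtomicInteger chainCalls = new AtomicInteger();

    // Returns true only for the call that actually triggered processing.
    boolean finish(String attemptId) {
        // putIfAbsent returns null exactly once per key, even under concurrency.
        if (finishedAttemptCache.putIfAbsent(attemptId, Boolean.TRUE) == null) {
            chainCalls.incrementAndGet();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FinishOnceSketch service = new FinishOnceSketch();
        service.finish("attempt-1"); // hypothetical id
        service.finish("attempt-1"); // retried request, ignored by the cache check
        System.out.println(service.chainCalls.get()); // prints 1
    }
}
```

In the listing above the response is created with false, and true is returned only once rmApp.isAppFinalStateStored() holds, so an AM that retries the call does not trigger the chain a second time.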

3.7 allocate method

After authorizing the request and checking the response id, allocate likewise calls amsProcessingChain.allocate to process the request.

  @Override
  public AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {

    AMRMTokenIdentifier amrmTokenIdentifier = YarnServerSecurityUtils.authorizeRequest();

    ApplicationAttemptId appAttemptId = amrmTokenIdentifier.getApplicationAttemptId();

    // Update heartbeat time
    this.amLivelinessMonitor.receivedPing(appAttemptId);

    // If the attempt is not in the cache, throw an exception directly

    AllocateResponseLock lock = responseMap.get(appAttemptId);
    if (lock == null) {
      String message =
          "Application attempt " + appAttemptId
              + " doesn't exist in ApplicationMasterService cache.";
      LOG.error(message);
      throw new ApplicationAttemptNotFoundException(message);
    }
    synchronized (lock) {
      AllocateResponse lastResponse = lock.getAllocateResponse();
      if (!hasApplicationMasterRegistered(appAttemptId)) {
        String message =
            "AM is not registered for known application attempt: "
                + appAttemptId
                + " or RM had restarted after AM registered. "
                + " AM should re-register.";
        throw new ApplicationMasterNotRegisteredException(message);
      }

      // Normally request.getResponseId() == lastResponse.getResponseId()
      if (AMRMClientUtils.getNextResponseId(
          request.getResponseId()) == lastResponse.getResponseId()) {
        // heartbeat one step old, simply return lastResponse
        return lastResponse;
      } else if (request.getResponseId() != lastResponse.getResponseId()) {
        throw new InvalidApplicationMasterRequestException(AMRMClientUtils
            .assembleInvalidResponseIdExceptionMessage(appAttemptId,
                lastResponse.getResponseId(), request.getResponseId()));
      }

      // Build response
      AllocateResponse response =  recordFactory.newRecordInstance(AllocateResponse.class);

      // Key step: delegate the allocation to the processing chain
      this.amsProcessingChain.allocate(  amrmTokenIdentifier.getApplicationAttemptId(), request, response);

      // update AMRMToken if the token is rolled-up
      MasterKeyData nextMasterKey =
          this.rmContext.getAMRMTokenSecretManager().getNextMasterKeyData();

      if (nextMasterKey != null
          && nextMasterKey.getMasterKey().getKeyId() != amrmTokenIdentifier
          .getKeyId()) {

        // Get RM application
        RMApp app =  this.rmContext.getRMApps().get(appAttemptId.getApplicationId());


        RMAppAttempt appAttempt = app.getRMAppAttempt(appAttemptId);

        RMAppAttemptImpl appAttemptImpl = (RMAppAttemptImpl)appAttempt;

        Token<AMRMTokenIdentifier> amrmToken = appAttempt.getAMRMToken();
        if (nextMasterKey.getMasterKey().getKeyId() !=
            appAttemptImpl.getAMRMTokenKeyId()) {
          LOG.info("The AMRMToken has been rolled-over. Send new AMRMToken back"
              + " to application: " + appAttemptId.getApplicationId());
          amrmToken = rmContext.getAMRMTokenSecretManager()
              .createAndGetAMRMToken(appAttemptId);
          appAttemptImpl.setAMRMToken(amrmToken);
        }

        response.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
            .newInstance(amrmToken.getIdentifier(), amrmToken.getKind()
                .toString(), amrmToken.getPassword(), amrmToken.getService()
                .toString()));
      }

      response.setResponseId(
          AMRMClientUtils.getNextResponseId(lastResponse.getResponseId()));
      lock.setAllocateResponse(response);
      return response;
    }
  }
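
The response id check in allocate distinguishes three cases: a heartbeat that is one step old is answered with the cached response, an id that matches is processed, and any other id is rejected. The sketch below isolates that decision. `classify` and `Outcome` are hypothetical names, and `nextResponseId` assumes that ids advance by one and wrap from Integer.MAX_VALUE back to 0, which is how I understand AMRMClientUtils.getNextResponseId to behave.

```java
class ResponseIdSketch {
    enum Outcome { RESEND_LAST, PROCESS, REJECT }

    // Assumption: ids advance by one and wrap from Integer.MAX_VALUE back to 0.
    static int nextResponseId(int id) {
        return (id + 1) & Integer.MAX_VALUE;
    }

    // Same branch order as the allocate method above.
    static Outcome classify(int requestId, int lastResponseId) {
        if (nextResponseId(requestId) == lastResponseId) {
            return Outcome.RESEND_LAST; // duplicate of the previous heartbeat
        }
        if (requestId != lastResponseId) {
            return Outcome.REJECT; // AM and RM are out of sync
        }
        return Outcome.PROCESS; // normal case: ids match
    }

    public static void main(String[] args) {
        System.out.println(classify(5, 5)); // prints PROCESS
        System.out.println(classify(4, 5)); // prints RESEND_LAST
        System.out.println(classify(7, 5)); // prints REJECT
    }
}
```

After a request is processed, the new response carries the next id, so the following heartbeat from the AM is expected to echo that value.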

Next, we will introduce AMLivelinessMonitor in detail to round out the analysis of ApplicationMaster management.

Tags: Big Data Hadoop

Posted on Sun, 07 Nov 2021 12:26:17 -0500 by aminnuto