2021SC@SDUSC
1, Introduction to ApplicationMasterService
The previous chapter introduced the ApplicationMaster launcher, the part of ApplicationMaster management that communicates with the NodeManager to start the ApplicationMaster. ApplicationMasterService handles the requests that come from the ApplicationMaster itself, mainly registration and heartbeat requests. A registration request carries the node on which the ApplicationMaster started, its external RPC port, its tracking URL and other information. The heartbeat is periodic: each one reports the resources the ApplicationMaster still needs, the list of containers to release, the blacklist, and so on. In response, ApplicationMasterService returns the newly allocated containers, the failed containers, the list of containers to be preempted, etc.
2, ApplicationMasterService interface
ApplicationMasterProtocol is the protocol for communication between the AM and the RM. ApplicationMasterService implements it and uses it to manage all submitted AMs. Its main tasks are as follows:
Register new AMs, handle termination / deregistration requests from any AM that is ending, and authenticate all requests from the different AMs.
Ensure that requests sent by legitimate AMs are passed to the corresponding application objects in the RM, receive container allocation and release requests from all running AMs, and forward them asynchronously to the YARN scheduler.
ApplicationMasterService ensures that only one thread of any AM can send requests to the RM at any point in time, because all RPC requests from one AM are serialized on the RM.
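The per-AM serialization described above can be illustrated with a minimal sketch. It is not the Hadoop code: AttemptLock and PerAttemptSerializer are hypothetical names, and attempts are identified by plain strings. It assumes, as the source code later in this article suggests, that the service keeps one lock object per application attempt in a concurrent map and synchronizes on it for every request.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the per-attempt lock object stored in responseMap.
final class AttemptLock {
    private int lastResponseId = -1;

    int getLastResponseId() { return lastResponseId; }

    void setLastResponseId(int id) { lastResponseId = id; }
}

// Hypothetical sketch: one lock per attempt, so requests of the same attempt
// are handled one at a time while different attempts do not block each other.
final class PerAttemptSerializer {
    private final ConcurrentMap<String, AttemptLock> locks = new ConcurrentHashMap<>();

    void registerAttempt(String attemptId) {
        locks.putIfAbsent(attemptId, new AttemptLock());
    }

    int handle(String attemptId) {
        AttemptLock lock = locks.get(attemptId);
        if (lock == null) {
            throw new IllegalStateException("Unknown attempt: " + attemptId);
        }
        synchronized (lock) {
            // Only one thread per attempt can be inside this block at a time.
            int next = lock.getLastResponseId() + 1;
            lock.setLastResponseId(next);
            return next;
        }
    }
}
```

Locking per attempt rather than globally keeps a slow request from one AM from delaying the heartbeats of every other AM.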
The protocol has three main methods:
registerApplicationMaster, finishApplicationMaster, allocate

    // Registers a new ApplicationMaster with the RM.
    // The ApplicationMaster provides its RPC port, tracking URL and other information to the RM;
    // the response returns the maximum resource capability the cluster can provide.
    public RegisterApplicationMasterResponse registerApplicationMaster(
        RegisterApplicationMasterRequest request)
        throws YarnException, IOException;

    // The ApplicationMaster notifies the RM of its final status (success / failure).
    public FinishApplicationMasterResponse finishApplicationMaster(
        FinishApplicationMasterRequest request)
        throws YarnException, IOException;

    // The ApplicationMaster requests resources from the ResourceManager; this call is also the heartbeat.
    public AllocateResponse allocate(AllocateRequest request)
        throws YarnException, IOException;
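To show how the three calls fit together over the life of an AM, here is a small self-contained sketch. MiniAmProtocol, InMemoryRm and AmLifecycleDemo are hypothetical, heavily simplified stand-ins and not the real Hadoop types; the host name, the capacity value 8 and the container names are made up for illustration. The order it assumes is register first, then allocate repeatedly as the heartbeat, then finish.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the three protocol methods.
interface MiniAmProtocol {
    int register(String host, int rpcPort, String trackingUrl); // returns an illustrative capacity
    List<String> allocate(int containersWanted);                 // heartbeat plus resource request
    boolean finish(String finalStatus);                          // true once unregistered
}

// Hypothetical in-memory "RM" that grants every request immediately.
final class InMemoryRm implements MiniAmProtocol {
    private boolean registered;
    private int nextContainer = 1;

    public int register(String host, int rpcPort, String trackingUrl) {
        registered = true;
        return 8; // made-up capacity value
    }

    public List<String> allocate(int containersWanted) {
        if (!registered) {
            throw new IllegalStateException("AM must register before calling allocate");
        }
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < containersWanted; i++) {
            granted.add("container_" + nextContainer++);
        }
        return granted;
    }

    public boolean finish(String finalStatus) {
        if (!registered) {
            throw new IllegalStateException("AM must register before calling finish");
        }
        registered = false;
        return true;
    }
}

final class AmLifecycleDemo {
    // Register, heartbeat until enough containers are held, then unregister.
    static List<String> run(MiniAmProtocol rm, int needed) {
        rm.register("host-a", 0, "");
        List<String> held = new ArrayList<>();
        while (held.size() < needed) {
            held.addAll(rm.allocate(needed - held.size()));
        }
        rm.finish("SUCCEEDED");
        return held;
    }

    public static void main(String[] args) {
        System.out.println(run(new InMemoryRm(), 3));
    }
}
```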
3, ApplicationMasterService source code analysis
3.1 Constructor
The instance is built in the serviceInit method of ResourceManager.

    public ApplicationMasterService(RMContext rmContext, YarnScheduler scheduler) {
      this(ApplicationMasterService.class.getName(), rmContext, scheduler);
    }

    public ApplicationMasterService(String name, RMContext rmContext,
        YarnScheduler scheduler) {
      super(name);
      this.amLivelinessMonitor = rmContext.getAMLivelinessMonitor();
      this.rScheduler = scheduler;
      this.rmContext = rmContext;
      // AMSProcessingChain handles ApplicationMaster requests such as registration
      // through the chain-of-responsibility pattern.
      // The head processor of the chain is DefaultAMSProcessor.
      this.amsProcessingChain = new AMSProcessingChain(new DefaultAMSProcessor());
    }
3.2 Properties

    // AM liveliness monitor
    private final AMLivelinessMonitor amLivelinessMonitor;
    // Scheduler
    private YarnScheduler rScheduler;
    // Service address
    protected InetSocketAddress masterServiceAddress;
    // RPC server
    protected Server server;
    protected final RecordFactory recordFactory =
        RecordFactoryProvider.getRecordFactory(null);
    // Stores the last response (and lock) of each application attempt
    private final ConcurrentMap<ApplicationAttemptId, AllocateResponseLock> responseMap =
        new ConcurrentHashMap<ApplicationAttemptId, AllocateResponseLock>();
    // Application attempts that have already been finished
    private final ConcurrentHashMap<ApplicationAttemptId, Boolean> finishedAttemptCache =
        new ConcurrentHashMap<>();
    // RM context
    protected final RMContext rmContext;
    // Processing chain for AM requests
    private final AMSProcessingChain amsProcessingChain;
    // Whether timelineServiceV2 is enabled. The default is false
    private boolean timelineServiceV2Enabled;
3.3 serviceInit method
serviceInit initializes masterServiceAddress (the service address, here 0.0.0.0/0.0.0.0:8030) and then calls initializeProcessingChain.

    @Override
    protected void serviceInit(Configuration conf) throws Exception {
      // Build the RPC service address
      // 0.0.0.0/0.0.0.0:8030
      masterServiceAddress = conf.getSocketAddr(
          YarnConfiguration.RM_BIND_HOST,
          YarnConfiguration.RM_SCHEDULER_ADDRESS,
          YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
          YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);
      // Initialize amsProcessingChain
      initializeProcessingChain(conf);
    }

The initializeProcessingChain method:

    private void initializeProcessingChain(Configuration conf) {
      amsProcessingChain.init(rmContext, null);
      // Add the placement constraint handler, disabled by default
      // yarn.resourcemanager.placement-constraints.handler : disabled
      addPlacementConstraintHandler(conf);

      // Get the ApplicationMasterServiceProcessor list from the configuration
      // and add each processor to amsProcessingChain
      List<ApplicationMasterServiceProcessor> processors = getProcessorList(conf);
      if (processors != null) {
        Collections.reverse(processors);
        for (ApplicationMasterServiceProcessor p : processors) {
          // Ensure only single instance of PlacementProcessor is included
          if (p instanceof AbstractPlacementProcessor) {
            LOG.warn("Found PlacementProcessor=" + p.getClass().getCanonicalName()
                + " defined in "
                + YarnConfiguration.RM_APPLICATION_MASTER_SERVICE_PROCESSORS
                + ", however PlacementProcessor handler should be configured "
                + "by using " + YarnConfiguration.RM_PLACEMENT_CONSTRAINTS_HANDLER
                + ", this processor will be ignored.");
            continue;
          }
          this.amsProcessingChain.addProcessor(p);
        }
      }
    }
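The call to Collections.reverse before the loop is easier to follow with a small model of the chain. The sketch below is not the real AMSProcessingChain: MiniProcessor, MiniChain and ChainDemo are hypothetical names, and it assumes that addProcessor makes the new processor the head of the chain and lets it delegate to the previous head. Under that assumption, adding the configured processors in reverse order means requests pass through them in the configured order and reach the default processor last.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical processor: records its name, then delegates to the next one.
final class MiniProcessor {
    private final String name;
    private MiniProcessor next;

    MiniProcessor(String name) { this.name = name; }

    void init(MiniProcessor next) { this.next = next; }

    void handle(List<String> trace) {
        trace.add(name);
        if (next != null) {
            next.handle(trace);
        }
    }
}

// Hypothetical chain. Assumption: a newly added processor becomes the head.
final class MiniChain {
    private MiniProcessor head;

    MiniChain(MiniProcessor root) {
        root.init(null);
        head = root;
    }

    void addProcessor(MiniProcessor p) {
        p.init(head); // the new processor delegates to the old head
        head = p;
    }

    List<String> handle() {
        List<String> trace = new ArrayList<>();
        head.handle(trace);
        return trace;
    }
}

final class ChainDemo {
    // Builds a chain from the configured names and returns the visiting order.
    static List<String> build(List<String> configured) {
        MiniChain chain = new MiniChain(new MiniProcessor("default"));
        List<String> names = new ArrayList<>(configured);
        Collections.reverse(names); // mirrors Collections.reverse(processors)
        for (String n : names) {
            chain.addProcessor(new MiniProcessor(n));
        }
        return chain.handle();
    }
}
```

With the configured list A, B the request visits A, then B, then the default processor; without the reverse it would visit B before A.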
3.4 serviceStart method
The core step is starting the RPC server. After startup the service address is refreshed to the actual listener address, for example Boyi pro.local/192.168.xx.xxx: 8030.

    @Override
    protected void serviceStart() throws Exception {
      Configuration conf = getConfig();
      YarnRPC rpc = YarnRPC.create(conf);

      Configuration serverConf = conf;
      // If the auth is not-simple, enforce it to be token-based.
      serverConf = new Configuration(conf);
      serverConf.set(
          CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
          SaslRpcServer.AuthMethod.TOKEN.toString());
      // ProtobufRpcEngine$Server ==> 0.0.0.0: 8030
      this.server = getServer(rpc, serverConf, masterServiceAddress,
          this.rmContext.getAMRMTokenSecretManager());
      // TODO more exceptions could be added later.
      this.server.addTerseExceptions(ApplicationMasterNotRegisteredException.class);

      // Enable service authorization?
      if (conf.getBoolean(
          CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION, false)) {
        InputStream inputStream =
            this.rmContext.getConfigurationProvider()
                .getConfigurationInputStream(conf,
                    YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);
        if (inputStream != null) {
          conf.addResource(inputStream);
        }
        refreshServiceAcls(conf, RMPolicyProvider.getInstance());
      }

      this.server.start();
      // Refresh the configured address: Boyi pro.local/192.168.xx.xxx: 8030
      this.masterServiceAddress =
          conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
              YarnConfiguration.RM_SCHEDULER_ADDRESS,
              YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
              server.getListenerAddress());
      this.timelineServiceV2Enabled =
          YarnConfiguration.timelineServiceV2Enabled(conf);
      super.serviceStart();
    }
3.5 registerApplicationMaster method
This method is defined in ApplicationMasterProtocol and is used to register the ApplicationMaster.
After the registration request is received and passes verification, the method builds the response and then uses amsProcessingChain to perform the registration.

    public RegisterApplicationMasterResponse registerApplicationMaster(
        RegisterApplicationMasterRequest request)
        throws YarnException, IOException {

      AMRMTokenIdentifier amrmTokenIdentifier =
          YarnServerSecurityUtils.authorizeRequest();
      ApplicationAttemptId applicationAttemptId =
          amrmTokenIdentifier.getApplicationAttemptId();
      // Get ApplicationId
      ApplicationId appID = applicationAttemptId.getApplicationId();

      AllocateResponseLock lock = responseMap.get(applicationAttemptId);
      if (lock == null) {
        RMAuditLogger.logFailure(this.rmContext.getRMApps().get(appID).getUser(),
            AuditConstants.REGISTER_AM,
            "Application doesn't exist in cache " + applicationAttemptId,
            "ApplicationMasterService",
            "Error in registering application master",
            appID, applicationAttemptId);
        throwApplicationDoesNotExistInCacheException(applicationAttemptId);
      }

      // Allow only one thread in AM to do registerApp at a time.
      synchronized (lock) {
        AllocateResponse lastResponse = lock.getAllocateResponse();
        if (hasApplicationMasterRegistered(applicationAttemptId)) {
          // allow UAM re-register if work preservation is enabled
          ApplicationSubmissionContext appContext =
              rmContext.getRMApps().get(appID).getApplicationSubmissionContext();
          if (!(appContext.getUnmanagedAM()
              && appContext.getKeepContainersAcrossApplicationAttempts())) {
            String message =
                AMRMClientUtils.APP_ALREADY_REGISTERED_MESSAGE + appID;
            LOG.warn(message);
            RMAuditLogger.logFailure(
                this.rmContext.getRMApps().get(appID).getUser(),
                AuditConstants.REGISTER_AM, "", "ApplicationMasterService",
                message, appID, applicationAttemptId);
            throw new InvalidApplicationMasterRequestException(message);
          }
        }

        // Update heartbeat time
        this.amLivelinessMonitor.receivedPing(applicationAttemptId);

        // Set the response id to 0 to mark that the ApplicationMaster
        // has registered for this attempt
        lastResponse.setResponseId(0);
        // Update lastResponse
        lock.setAllocateResponse(lastResponse);

        RegisterApplicationMasterResponse response =
            recordFactory.newRecordInstance(
                RegisterApplicationMasterResponse.class);
        // Perform the registration operation
        this.amsProcessingChain.registerApplicationMaster(
            amrmTokenIdentifier.getApplicationAttemptId(), request, response);
        return response;
      }
    }
3.6 finishApplicationMaster method
This method is defined in ApplicationMasterProtocol and is used by the ApplicationMaster to notify ApplicationMasterService that it has finished.
After validation it calls this.amsProcessingChain.finishApplicationMaster to perform the unregistration.

    @Override
    public FinishApplicationMasterResponse finishApplicationMaster(
        FinishApplicationMasterRequest request)
        throws YarnException, IOException {

      // Get applicationAttemptId
      ApplicationAttemptId applicationAttemptId =
          YarnServerSecurityUtils.authorizeRequest().getApplicationAttemptId();
      // Get ApplicationId
      ApplicationId appId = applicationAttemptId.getApplicationId();
      // Get RMApp
      RMApp rmApp =
          rmContext.getRMApps().get(applicationAttemptId.getApplicationId());

      // Remove collector address when app get finished.
      if (timelineServiceV2Enabled) {
        ((RMAppImpl) rmApp).removeCollectorData();
      }

      if (rmApp.isAppFinalStateStored()) {
        LOG.info(rmApp.getApplicationId() + " unregistered successfully. ");
        return FinishApplicationMasterResponse.newInstance(true);
      }

      AllocateResponseLock lock = responseMap.get(applicationAttemptId);
      if (lock == null) {
        throwApplicationDoesNotExistInCacheException(applicationAttemptId);
      }

      // Allow only one thread in AM to do finishApp at a time.
      synchronized (lock) {
        if (!hasApplicationMasterRegistered(applicationAttemptId)) {
          String message =
              "Application Master is trying to unregister before registering for: "
                  + appId;
          LOG.error(message);
          RMAuditLogger.logFailure(
              this.rmContext.getRMApps().get(appId).getUser(),
              AuditConstants.UNREGISTER_AM, "", "ApplicationMasterService",
              message, appId, applicationAttemptId);
          throw new ApplicationMasterNotRegisteredException(message);
        }

        FinishApplicationMasterResponse response =
            FinishApplicationMasterResponse.newInstance(false);
        // Check whether finishedAttemptCache already contains applicationAttemptId
        if (finishedAttemptCache.putIfAbsent(applicationAttemptId, true) == null) {
          // Not handled yet, so forward the request to the processing chain
          this.amsProcessingChain
              .finishApplicationMaster(applicationAttemptId, request, response);
        }
        // Update the liveliness monitor (heartbeat)
        this.amLivelinessMonitor.receivedPing(applicationAttemptId);
        return response;
      }
    }
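The putIfAbsent check on finishedAttemptCache makes the finish request idempotent: if the AM repeats the call, only the first one reaches the processing chain. The following sketch isolates that idea. FinishOnce is a hypothetical class, attempts are plain strings, and the counter stands in for the call to the chain.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "forward the finish request at most once per attempt".
final class FinishOnce {
    private final ConcurrentHashMap<String, Boolean> finishedAttemptCache =
        new ConcurrentHashMap<>();
    // Stand-in for the number of requests handed to the processing chain.
    private final AtomicInteger forwarded = new AtomicInteger();

    // Returns true only when this call was the first finish for the attempt.
    boolean finish(String attemptId) {
        // putIfAbsent returns null only if no entry existed before this call.
        if (finishedAttemptCache.putIfAbsent(attemptId, true) == null) {
            forwarded.incrementAndGet();
            return true;
        }
        return false;
    }

    int forwardedCount() {
        return forwarded.get();
    }
}
```

Because putIfAbsent is atomic, the check and the insert cannot be interleaved by two callers, so no separate contains-then-put sequence is needed.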
3.7 allocate method
As with the previous methods, the request is validated and then handed to amsProcessingChain.allocate for processing.

    @Override
    public AllocateResponse allocate(AllocateRequest request)
        throws YarnException, IOException {

      AMRMTokenIdentifier amrmTokenIdentifier =
          YarnServerSecurityUtils.authorizeRequest();
      ApplicationAttemptId appAttemptId =
          amrmTokenIdentifier.getApplicationAttemptId();

      // Update heartbeat time
      this.amLivelinessMonitor.receivedPing(appAttemptId);

      // If the attempt is not in the cache, throw an exception directly
      AllocateResponseLock lock = responseMap.get(appAttemptId);
      if (lock == null) {
        String message =
            "Application attempt " + appAttemptId
                + " doesn't exist in ApplicationMasterService cache.";
        LOG.error(message);
        throw new ApplicationAttemptNotFoundException(message);
      }

      synchronized (lock) {
        AllocateResponse lastResponse = lock.getAllocateResponse();
        if (!hasApplicationMasterRegistered(appAttemptId)) {
          String message =
              "AM is not registered for known application attempt: "
                  + appAttemptId
                  + " or RM had restarted after AM registered. "
                  + " AM should re-register.";
          throw new ApplicationMasterNotRegisteredException(message);
        }

        // Normally request.getResponseId() == lastResponse.getResponseId()
        if (AMRMClientUtils.getNextResponseId(
            request.getResponseId()) == lastResponse.getResponseId()) {
          // heartbeat one step old, simply return lastResponse
          return lastResponse;
        } else if (request.getResponseId() != lastResponse.getResponseId()) {
          throw new InvalidApplicationMasterRequestException(AMRMClientUtils
              .assembleInvalidResponseIdExceptionMessage(appAttemptId,
                  lastResponse.getResponseId(), request.getResponseId()));
        }

        // Build response
        AllocateResponse response =
            recordFactory.newRecordInstance(AllocateResponse.class);

        // Key step: hand the request to the processing chain
        this.amsProcessingChain.allocate(
            amrmTokenIdentifier.getApplicationAttemptId(), request, response);

        // update AMRMToken if the token is rolled-up
        MasterKeyData nextMasterKey =
            this.rmContext.getAMRMTokenSecretManager().getNextMasterKeyData();

        if (nextMasterKey != null
            && nextMasterKey.getMasterKey().getKeyId() != amrmTokenIdentifier
                .getKeyId()) {
          // Get RM application
          RMApp app =
              this.rmContext.getRMApps().get(appAttemptId.getApplicationId());
          RMAppAttempt appAttempt = app.getRMAppAttempt(appAttemptId);
          RMAppAttemptImpl appAttemptImpl = (RMAppAttemptImpl) appAttempt;
          Token<AMRMTokenIdentifier> amrmToken = appAttempt.getAMRMToken();
          if (nextMasterKey.getMasterKey().getKeyId()
              != appAttemptImpl.getAMRMTokenKeyId()) {
            LOG.info("The AMRMToken has been rolled-over. Send new AMRMToken back"
                + " to application: " + appAttemptId.getApplicationId());
            amrmToken = rmContext.getAMRMTokenSecretManager()
                .createAndGetAMRMToken(appAttemptId);
            appAttemptImpl.setAMRMToken(amrmToken);
          }
          response.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
              .newInstance(amrmToken.getIdentifier(), amrmToken.getKind()
                  .toString(), amrmToken.getPassword(), amrmToken.getService()
                  .toString()));
        }

        response.setResponseId(
            AMRMClientUtils.getNextResponseId(lastResponse.getResponseId()));
        lock.setAllocateResponse(response);
        return response;
      }
    }
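The response id comparison in allocate works like a sequence number and is worth isolating. The sketch below is a hypothetical model, not the Hadoop implementation: ResponseIdCheck and Outcome are invented names, and next is an assumption about how AMRMClientUtils.getNextResponseId behaves (increment by one, wrapping to 0 after Integer.MAX_VALUE). A request whose id equals the last response id is processed, a request that is exactly one step behind gets the cached last response again, and anything else is rejected.

```java
// Hypothetical model of the response id check performed in allocate.
final class ResponseIdCheck {
    enum Outcome { PROCESS, RESEND_LAST, REJECT }

    // Assumption: ids increase by one and wrap around to 0 after Integer.MAX_VALUE.
    static int next(int responseId) {
        return (responseId + 1) & Integer.MAX_VALUE;
    }

    static Outcome classify(int requestId, int lastResponseId) {
        if (next(requestId) == lastResponseId) {
            // The AM missed the previous response and repeated its heartbeat.
            return Outcome.RESEND_LAST;
        }
        if (requestId != lastResponseId) {
            // The ids are out of sync, so the request is invalid.
            return Outcome.REJECT;
        }
        // Normal case: the AM has seen the last response.
        return Outcome.PROCESS;
    }

    public static void main(String[] args) {
        System.out.println(classify(5, 5));
        System.out.println(classify(4, 5));
        System.out.println(classify(7, 5));
    }
}
```

Returning the cached response for a heartbeat that is one step old lets an AM retry after a lost reply without the RM allocating the same resources twice.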
Next, we will introduce AMLivelinessMonitor in detail to round out the analysis of ApplicationMaster management.