Senior Big Data Development Engineer - Hadoop Learning Notes

Hadoop advanced level. YARN: Hadoop's resource scheduling system. What is YARN? Apache Hadoop YARN (Yet Another Resource Negotiator) is a subproject of Hadoop, introduced in Hadoop 2.0 to separate resource management from the computing components. YARN is general enough that it can also support other distributed computing models. Analys ...

Posted on Sun, 05 Dec 2021 18:08:55 -0500 by snpo123

7. Hadoop 3.3.1 HA (High Availability) Cluster with QJM (based on ZooKeeper, NameNode High Availability + YARN High Availability)

Previous: 1. CentOS 7 Hadoop 3.3.1 installation (single-machine, pseudo-distributed, distributed); 2. Implementation of HDFS by Java API; 3. MapReduce programming examples; 4. ZooKeeper 3.7 installation; 5. Shell operations of ZooKeeper; 6. Java API operations on ZooKeeper nodes. Setup of Hadoop 3.3.1 HA High Availability Cluster (NameNode High ...

Posted on Mon, 22 Nov 2021 20:33:21 -0500 by Hendricus

Spark source code analysis (based on YARN cluster mode) - RDDs and dependencies

The RDD is a particularly important concept in Spark; nearly all of Spark's logic relies on it. In this article, we briefly discuss RDDs in Spark. The definition of RDD in Spark is as follows: abstract class RDD[T: ClassTag]( @transient private var _sc: SparkContext, @transient private var deps: Seq[Depend ...

Posted on Tue, 02 Nov 2021 04:05:18 -0400 by Elle0000

YARN learning tutorial for Hadoop

Learning YARN in Hadoop. MapReduce overview. MapReduce definition: MapReduce is a programming framework for distributed computing programs and the core framework with which users develop "Hadoop-based data analysis applications". The core function of MapReduce is to integrate the business logic code written by the user and its own de ...

Posted on Wed, 22 Sep 2021 18:43:54 -0400 by andrewmcgibbon