Senior Big Data Development Engineer - Hadoop learning notes
Hadoop advanced level
YARN: Hadoop resource scheduling system
What is YARN
Apache Hadoop YARN (Yet Another Resource Negotiator) is a subproject of Hadoop, introduced in Hadoop 2.0 to separate resource management from the computing components. YARN is general enough to support other distributed computing models as well.
Analys ...
Posted on Sun, 05 Dec 2021 18:08:55 -0500 by snpo123
7. Hadoop3.3.1 HA High Availability Cluster QJM (based on Zookeeper, NameNode High Availability + Yarn High Availability)
Previous
1. CentOS7 hadoop3.3.1 Installation (standalone, pseudo-distributed, fully distributed)
2. Implementation of HDFS by Java API
3. MapReduce programming examples
4. Zookeeper3.7 Installation
5. Shell operation of Zookeeper
6. Java API operations on Zookeeper nodes
Setup of Hadoop3.3.1 HA High Availability Cluster
(NameNode High ...
Posted on Mon, 22 Nov 2021 20:33:21 -0500 by Hendricus
Spark source code analysis (based on YARN cluster mode) - RDDs and dependencies
RDD is one of the most important concepts in Spark; nearly all of Spark's logic is built on RDDs. This article takes a brief look at RDDs in Spark. Spark defines RDD as follows:
abstract class RDD[T: ClassTag](
@transient private var _sc: SparkContext,
@transient private var deps: Seq[Depend ...
Posted on Tue, 02 Nov 2021 04:05:18 -0400 by Elle0000
YARN tutorial for Hadoop
Learning YARN in Hadoop
MapReduce overview
MapReduce definition
MapReduce is a programming framework for distributed computation and the core framework on which users develop Hadoop-based data analysis applications.
The core function of MapReduce is to integrate the business logic code written by the user and its own de ...
Posted on Wed, 22 Sep 2021 18:43:54 -0400 by andrewmcgibbon