Flume deployment and an introductory example

1. Flume installation and deployment. 1.1 Installation addresses: 1. Flume official website: http://flume.apache.org/ 2. Download address: http://archive.apache.org/dist/flume/ 3. Documentation: http://flume.apache.org/FlumeUserGuide.html 1.2 Installation and deployment: 1. Upload apache-flume-1 ...

Posted on Wed, 26 Feb 2020 01:02:44 -0500 by john_zakaria

Building a Hadoop cluster from the stand-alone version, part 3: the Hadoop cluster version

Build a simple cluster based on the Hadoop stand-alone installation. Contents: 0. Planning 1. Clone the virtual machine 2. Start the virtual machine and set a static IP 3. Modify the IP mapping in the hosts file 4. Modify the hostname 5. Modify the Hadoop configuration files 5.1 Modify core-site.xml 5.2 ...

Posted on Tue, 25 Feb 2020 22:40:06 -0500 by stargate03

Install Sqoop on Linux (and test the connection to MySQL)

Contents: Environment description; Download and unzip; Change the configuration (sqoop-env.sh): after decompression, modify sqoop-env.sh; Configure environment variables; Copy the MySQL driver; Start MySQL; View the Sqoop version; Test with MySQL. Environment description: Software, Version, operating syste ...

Posted on Tue, 25 Feb 2020 10:36:44 -0500 by FluxNYC

Spark RDD creation APIs: MySQL and HBase

Generally speaking, each Spark application consists of a driver program, which runs the user's main method and executes various parallel operations on the cluster. The main abstraction Spark provides is the resilient distributed dataset (RDD), a collection of elements partitioned across the nodes of the cluster that can be operated ...

Posted on Sun, 23 Feb 2020 06:15:35 -0500 by ramesh_iridium
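The RDD idea in the Spark entry above, a collection split into partitions that are processed in parallel and then combined by the driver, can be illustrated without Spark. The following is a minimal plain-Python sketch, not Spark's API: `make_partitions` and `parallel_map_reduce` are hypothetical helpers, and a local thread pool merely stands in for the cluster's executors.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def make_partitions(data: List[T], num_partitions: int) -> List[List[T]]:
    """Split data into num_partitions contiguous slices (hypothetical helper)."""
    size, rem = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        # The first `rem` partitions take one extra element.
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


def parallel_map_reduce(
    parts: List[List[T]],
    map_fn: Callable[[T], U],
    reduce_fn: Callable[[U, U], U],
    zero: U,
) -> U:
    """Map and reduce each partition in parallel, then combine the partial
    results on the "driver". `zero` must be an identity for reduce_fn."""

    def process(part: List[T]) -> U:
        return reduce(reduce_fn, (map_fn(x) for x in part), zero)

    # Threads stand in for executors; each handles one partition.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(process, parts))
    # The driver merges the per-partition results.
    return reduce(reduce_fn, partials, zero)


if __name__ == "__main__":
    parts = make_partitions(list(range(1, 11)), 3)
    print(parts)  # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
    print(parallel_map_reduce(parts, lambda x: x * x, lambda a, b: a + b, 0))  # 385
```

In real Spark the equivalent would be a `map` followed by a `reduce` on an RDD; the sketch only shows why the reduce function has to be associative, since partial results are computed per partition before being merged.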

Building a Hadoop cluster on CentOS 6.8

Building a Hadoop cluster on CentOS 6.8: 1. Prepare a clean CentOS 6.8 virtual machine. 2. Turn off the firewall: (1) stop the firewall temporarily: service iptables stop; (2) disable the firewall at boot: chkconfig iptables off; (3) check the firewall status: service iptables status. 3. Set a static IP: vim /etc/sys ...

Posted on Thu, 20 Feb 2020 02:33:01 -0500 by igorek

Quick learning: Hive enterprise-level tuning

Chapter 9 Enterprise-level optimization. 9.1 Fetch: fetch-task conversion means that Hive can answer some queries without running MapReduce at all. For example, for SELECT * FROM employees; Hive can simply read the files in the storage directory of the employees table and then output the query ...

Posted on Mon, 17 Feb 2020 22:19:15 -0500 by lazytiger

Spark Streaming updateStateByKey: stateful computation

Contents: 1. Theoretical basis 2. Code test (wordCount): 1. Code 2. Test data 3. Results. 1. Theoretical basis: 1. In stream computing there is often a need for stateful computation, that is, the current result depends not only on the data just received but also needs to merge the p ...

Posted on Sun, 16 Feb 2020 01:04:55 -0500 by godwisam
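The running word count described in the updateStateByKey entry can be simulated in plain Python. The helper `update_state_by_key` below is hypothetical and is not the Spark API; it only mimics the semantics described in the Spark Streaming guide: in every batch the update function receives the batch's new values for a key together with that key's previous state, it is applied to all keys that already have state even when they received no new data, and returning `None` removes the key.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
S = TypeVar("S")


def update_state_by_key(
    state: Dict[K, S],
    batch: Iterable[Tuple[K, V]],
    update_fn: Callable[[List[V], Optional[S]], Optional[S]],
) -> Dict[K, S]:
    """Return the new state after one micro-batch (simulation, not Spark code)."""
    grouped: Dict[K, List[V]] = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)
    new_state: Dict[K, S] = {}
    # Keys with existing state are updated even if this batch has no values for them.
    for key in set(state) | set(grouped):
        result = update_fn(grouped.get(key, []), state.get(key))
        if result is not None:  # returning None drops the key from the state
            new_state[key] = result
    return new_state


def running_count(new_values: List[int], old: Optional[int]) -> Optional[int]:
    """Merge this batch's counts into the previous total."""
    return sum(new_values) + (old or 0)


if __name__ == "__main__":
    batches = [["hello spark", "hello"], ["spark streaming"]]
    state: Dict[str, int] = {}
    for lines in batches:
        pairs = [(word, 1) for line in lines for word in line.split()]
        state = update_state_by_key(state, pairs, running_count)
    print(sorted(state.items()))  # [('hello', 2), ('spark', 2), ('streaming', 1)]
```

Note that "hello" keeps its count of 2 after the second batch even though it does not appear there: the update function is still called for it, with an empty list of new values.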

Big data technology: Sqoop

Big data technology: Sqoop. Chapter 1 Introduction to Sqoop. Sqoop is an open-source tool used mainly to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL). It can import data from a relational database (such as MySQL, Oracle, or Postgres) into Hadoop's HDFS, or i ...

Posted on Wed, 12 Feb 2020 23:54:15 -0500 by vickie

RabbitMQ: the concepts of message communication explained

1. RabbitMQ message communication architecture. When it comes to message communication, we may first think of email, QQ, WeChat, SMS, and similar means of communication. Each of them has a sender, a receiver, and a container that stores offline messages. But these communication modes are diffe ...

Posted on Wed, 12 Feb 2020 22:10:25 -0500 by james_holden
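The sender / receiver / offline-message container model from the RabbitMQ entry can be sketched in plain Python. The `Broker` class below is hypothetical and purely in-memory; it only mimics the idea of a broker that keeps messages in a named queue until a receiver comes online, and it is not RabbitMQ's client API or the AMQP protocol.

```python
import queue
from typing import Dict, List


class Broker:
    """Hypothetical in-memory broker: named queues hold messages until consumed."""

    def __init__(self) -> None:
        self._queues: Dict[str, "queue.Queue[str]"] = {}

    def declare_queue(self, name: str) -> None:
        # Declaring a queue that already exists is a no-op.
        self._queues.setdefault(name, queue.Queue())

    def publish(self, queue_name: str, message: str) -> None:
        # The sender does not need the receiver to be online; the queue stores the message.
        self._queues[queue_name].put(message)

    def drain(self, queue_name: str) -> List[str]:
        """Deliver every stored message, in order, to a receiver that has come online."""
        q = self._queues[queue_name]
        messages: List[str] = []
        while True:
            try:
                messages.append(q.get_nowait())
            except queue.Empty:
                return messages


if __name__ == "__main__":
    broker = Broker()
    broker.declare_queue("inbox")
    broker.publish("inbox", "hello")  # receiver offline: message is stored
    broker.publish("inbox", "are you there?")
    print(broker.drain("inbox"))  # ['hello', 'are you there?']
    print(broker.drain("inbox"))  # []
```

The point of the design is decoupling: the sender only knows the queue name, and delivery happens whenever the receiver asks for messages.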

Hadoop HDFS 2.x high availability setup

Architecture: HDFS 2.x HA (HDFS High Availability Using the Quorum Journal Manager). Setup plan (roles per virtual machine): node01 runs NN-1, ZKFC, and JNN; node02 runs NN-2, DN, ZK, ZKFC, and JNN; node03 runs DN, ZK, and JNN; node04 runs DN and ZK. Steps to build. Official documentation: https://hadoop.apache.org/d ...

Posted on Wed, 05 Feb 2020 08:32:28 -0500 by ntroycondo
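The HDFS HA entry relies on quorums: the Quorum Journal Manager needs a majority of JournalNodes (JNN) to accept each edit, and ZooKeeper (ZK) likewise needs a majority of its servers. The Hadoop HA documentation states that with N JournalNodes the system tolerates at most (N - 1) / 2 failures, which is why an odd number is recommended. The small sketch below just computes that arithmetic; the function names are illustrative, not part of any Hadoop tool.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority survives; equals (n - 1) // 2."""
    return n - majority(n)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nodes: quorum {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
    # 3 nodes: quorum 2, tolerates 1 failure(s)
    # 4 nodes: quorum 3, tolerates 1 failure(s)
    # 5 nodes: quorum 3, tolerates 2 failure(s)
```

Going from three to four nodes raises the quorum without tolerating any extra failure, so three JournalNodes and three ZooKeeper servers is the usual minimum for a layout like this one.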