1, Flume installation and deployment
1.1 Installation addresses
1. Flume official website: http://flume.apache.org/
2. Download address: http://archive.apache.org/dist/flume/
3. Documentation: http://flume.apache.org/FlumeUserGuide.html
1.2 Installation and deployment
1. Upload apache-flume-1 ...
Posted on Wed, 26 Feb 2020 01:02:44 -0500 by john_zakaria
Build a simple cluster based on a stand-alone Hadoop installation
1. Clone the virtual machine
2. Start the virtual machine and change the static IP
3. Modify the IP mapping in the hosts file
4. Modify the hostname
5. Modify the Hadoop configuration files
5.1 Modify core-site.xml
Posted on Tue, 25 Feb 2020 22:40:06 -0500 by stargate03
Download and unzip
Edit the sqoop-env.sh configuration
Configure environment variables
Copy the MySQL driver
View the Sqoop version
Test with MySQL
operating syste ...
Posted on Tue, 25 Feb 2020 10:36:44 -0500 by FluxNYC
Generally speaking, each Spark application contains a driver program, which runs the user's main method and performs various parallel operations on the cluster.
The main abstraction Spark provides is the resilient distributed dataset (RDD), a collection of elements partitioned across the nodes of the cluster that can be operated ...
Posted on Sun, 23 Feb 2020 06:15:35 -0500 by ramesh_iridium
Build a Hadoop cluster on CentOS 6.8
1. Prepare a clean CentOS 6.8 virtual machine
2. Turn off the firewall
1. Temporarily stop the firewall
service iptables stop
2. Disable firewall startup on boot
chkconfig iptables off
3. View firewall status
service iptables status
3. Set a static IP
vim /etc/sys ...
Posted on Thu, 20 Feb 2020 02:33:01 -0500 by igorek
Chapter 9 Enterprise-level optimization
Fetch means that some Hive queries can be answered without launching a MapReduce job. For example, for SELECT * FROM employees; Hive can simply read the files in the storage directory of the employees table and then output the query ...
Posted on Mon, 17 Feb 2020 22:19:15 -0500 by lazytiger
1, Theoretical basis
2, Code test wordCount
2. Test data
3. Results display
1, Theoretical basis
1. Stream computing often requires stateful computation, that is, the current result depends not only on the currently received data but also needs to merge the p ...
Posted on Sun, 16 Feb 2020 01:04:55 -0500 by godwisam
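The stateful word count that the excerpt above describes can be sketched in plain Python. This is only an illustration of merging each new batch into previously accumulated state; it is not the post's code and not the API of any streaming framework. The names `update_state` and `run`, and the sample batches, are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def update_state(state: Dict[str, int], batch: Iterable[str]) -> Dict[str, int]:
    """Merge the word counts of one micro-batch into the running state."""
    merged = Counter(state)          # start from a copy of the previous state
    for line in batch:
        merged.update(line.split())  # add the counts from the current data
    return dict(merged)


def run(batches: List[List[str]]) -> List[Dict[str, int]]:
    """Return the cumulative word counts after each batch (hypothetical driver)."""
    state: Dict[str, int] = {}
    history = []
    for batch in batches:
        state = update_state(state, batch)
        history.append(state)
    return history


if __name__ == "__main__":
    # Two made-up batches; the second result includes counts from the first.
    for i, snapshot in enumerate(run([["a b a"], ["b c"]])):
        print(i, snapshot)
```

Each snapshot depends on both the current batch and everything seen before it, which is the difference from a stateless count that would reset on every batch.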
Sqoop of big data technology
Chapter 1 Introduction to Sqoop
Sqoop is an open source tool mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL). It can import data from a relational database (such as MySQL, Oracle, or Postgres) into HDFS of Hadoop, or i ...
Posted on Wed, 12 Feb 2020 23:54:15 -0500 by vickie
1. RabbitMQ message communication architecture
When it comes to message communication, we may first think of email, QQ, WeChat, SMS, and other communication methods. Each of these has a sender, a receiver, and a container for storing offline messages. But these communication modes are diffe ...
Posted on Wed, 12 Feb 2020 22:10:25 -0500 by james_holden
HDFS 2.x HA
HDFS High Availability Using the Quorum Journal Manager
Set up instructions
Steps to build
Official document: https://hadoop.apache.org/d ...
Posted on Wed, 05 Feb 2020 08:32:28 -0500 by ntroycondo