Spark Task Scheduling
Several key components of Spark
Concepts and characteristics of RDDs
Two ways of generating RDDs
Two types of RDD operators
DAG: Directed Acyclic Graph
Custom Partiti ...
Posted on Mon, 11 Nov 2019 22:46:13 -0500 by 2oMst
Download MySQL server from the official website (yum installation)
If wget is not available, install it first:
yum -y install wget
rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install m ...
Posted on Mon, 04 Nov 2019 13:46:40 -0500 by Bike Racer
Hive Built-in Functions
This is my first blog post ~ written as a learning note. If there are any mistakes or better approaches, I hope you will let me know in the comments!
Query Hive built-in functions
Enter the Hive client:
Query Hive built-in functions:
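The lookup commands themselves are standard Hive; a minimal session from the Hive client might look like this:

```sql
-- List all built-in functions
SHOW FUNCTIONS;

-- Print a one-line description of a single function
DESC FUNCTION upper;

-- Print extended help for the function, including usage examples
DESC FUNCTION EXTENDED upper;
```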
Posted on Tue, 22 Oct 2019 15:16:03 -0400 by BrandonKahre
This article is compiled from the IBM Developer community. It describes the problems that small ORC and Parquet files cause in HDFS, how these small files affect the read performance of Big SQL, and possible solutions that compact small files into larger ones using existing tools in order to improve read performance.
brief intr ...
Posted on Tue, 08 Oct 2019 19:41:06 -0400 by kincet7
The data analysis process of this project runs on a Hadoop cluster, mainly using the Hive data warehouse tool. Therefore, the collected and pre-processed data must be loaded into the Hive data warehouse for subsequent mining and analysis.
1. Create raw data tables
Establishment of Post Sour ...
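As a sketch of this step, a raw-data table plus a load statement might look like the following; the table and column names are assumptions, since the original DDL is truncated:

```sql
-- Hypothetical raw-data table for the collected posts
-- (table name, columns, and paths are assumptions)
CREATE TABLE IF NOT EXISTS ods_post_raw (
    post_id    STRING,
    author     STRING,
    content    STRING,
    post_time  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Load the pre-processed file from HDFS into the table
LOAD DATA INPATH '/data/preprocessed/posts.txt' INTO TABLE ods_post_raw;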
Posted on Sun, 06 Oct 2019 01:40:02 -0400 by inni
Execution and testing process:
1. Create the LZO-related tables (a validation step; it can be skipped):
create external table test_lzo(
)partitioned by(`date_par` string)
ROW FORMAT SERDE
STORED AS INPUTFORMAT
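The DDL above is truncated; a complete sketch of an LZO-backed external table follows. The input/output format classes come from the hadoop-lzo project; the column names and location are placeholders:

```sql
-- Sketch of an LZO text table (columns and LOCATION are assumptions)
CREATE EXTERNAL TABLE test_lzo (
    id  STRING,
    msg STRING
)
PARTITIONED BY (`date_par` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/test_lzo';
```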
Posted on Sat, 05 Oct 2019 04:13:14 -0400 by nestorvaldez
Flume - Common Errors
1 Closing file failed. Will retry again in 120 seconds.
1.1 Error Symptom and Solution
The detailed error message is as follows:
09 Aug 2019 17:00:31,787 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter$Close ...
Posted on Thu, 03 Oct 2019 07:32:02 -0400 by LazyJones
Create a partitioned table that stores each day's new data in its own partition.
Compute daily active user statistics (DAU) (requires the user's IP, the user's account, and the URL of the user's earliest access time ...
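Assuming a hypothetical access-log table `access_log(account, ip, url, access_time)` partitioned by `day_par` (all names are assumptions), each user's earliest access on a given day could be computed with a window function:

```sql
-- For each active account on one day, keep the row with the earliest access time
SELECT account, ip, url, access_time
FROM (
    SELECT account, ip, url, access_time,
           ROW_NUMBER() OVER (PARTITION BY account
                              ORDER BY access_time) AS rn
    FROM access_log
    WHERE day_par = '2019-10-01'
) t
WHERE rn = 1;
```

Counting the rows of this result gives the day's DAU.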
Posted on Tue, 01 Oct 2019 11:25:31 -0400 by dino345
For the convenience of query and analysis, almost all tables in the Hive offline warehouse are partitioned, most commonly by day. Flink writes data to HDFS through the following configuration.
BucketingSink<Object> sink = new BucketingSink<>(path);
// This way, the data can be partitioned across days.
Posted on Sun, 29 Sep 2019 09:31:41 -0400 by silversinner
First, let's set up a database:
create database if not exists myhive1;
Use this database: use myhive1;
Delete the table student if it exists: drop table if exists student;
Create a partitioned table student: create table student(id int, name string, sex string, age int, department string) row format delimited fi ...
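A complete version of the truncated statement might look like this; the partition column and field terminator are assumptions, since they are cut off in the original:

```sql
-- Sketch of the partitioned table (partition column and delimiter assumed)
CREATE TABLE student (
    id         INT,
    name       STRING,
    sex        STRING,
    age        INT,
    department STRING
)
PARTITIONED BY (city STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```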
Posted on Mon, 23 Sep 2019 21:54:42 -0400 by flhtc