Flink real-time data warehouse
E-commerce basics. Since this project is based on e-commerce data, here is a brief introduction to some commo...
Find Common Friends - Data Mining - Scala Edition
Hello! There are many implementations of the "Find common friends" algorithm on the Internet, in many languages. W...
Spark: correct use of checkpoint in Spark and how it differs from cache
1. Spark performance tuning: use of checkpoint
https://blo...
Machine learning - overview, feature extraction of data (notes)
1. The relationship among artificial intelligence, machine learning, and deep learning. What machine learning can do. Reco...
Spark Streaming reads database data, extracted by Flume, from Kafka and saves it in HBase; Hive maps HBase for querying
Recently, the company has been working on real-time stream processing. The specific requirements are: real-time import of releva...
Spark notes: common RDD operators
Hello everyone! Here are the Spark operator notes I made while studying during the epidemic holiday. I just spent the whole afternoo...
Machine Learning Model Training Scheme in Mass Data Scenarios
It is very difficult to train a machine learning model on a single node in the process of actual processing and solving...
Analysis of a Hadoop YARN ResourceManager crash caused by the ZooKeeper node data size limit
We have run into this problem again. It happens infrequently, but when it does, it causes ResourceManager serv...
Several ways for Spark to read from and write to HBase
1. Overview of how HBase is read and written
...
Flink's common source and sink operations in stream processing
Flink's sources for stream processing are basically the same as those for batch processing. There are four categories C...
Spark.ml -- Naive Bayes
Preface: The Naive Bayes classifier has low variance and high bias. It assumes conditional independence between each f...
MLlib basic data types
MLlib uses vectors as its local storage type, which come in two kinds: sparse and dense. Code: import org.apache.spark.mllib.linalg...
Scala: tuples, arrays, maps
Tuple: an aggregation of values of different types. It combines a fixed number of items so that they can be passed around as a whole. Unlike arrays or lists, ...
Configuring Spark in Kylin and building a Cube
HDP version: 2.6.4.0. Kylin version: 2.5.1. Machines: three CentOS 7 hosts, 8 GB memory. In addition to MapReduce, Kylin's computing engine also has a faster S...