Spark

Flink real-time data warehouse

E-commerce basics. Since this project is based on e-commerce data, here is a brief introduction to some commo...
16:23 1 December 2021

[Spark] [RDD] Summary of notes from first learning RDDs

RDD Author: cute wolf blue sky ...
18:55 30 October 2021

Find Common Friends - Data Mining - Scala Edition

Hello, there are implementations of the "Find common friends" algorithm in many languages on the Internet. W...
10:58 4 July 2020

Spark: Correct use of checkpoint in Spark and how it differs from cache

1. Spark performance tuning: use of checkPoint https://blo...
0:48 14 June 2020

Machine learning - overview, feature extraction of data (notes)

1. The relationship among artificial intelligence, machine learning, and deep learning. What machine learning can do. Reco...
0:23 12 June 2020

Spark Streaming reads database data, extracted by Flume, from Kafka and saves it in HBase; Hive maps HBase for queries

Recently, the company has been working on real-time stream processing. The specific requirements are: real-time import of releva...
0:55 10 June 2020

Spark notes: common RDD operators

Hello everyone! Here are the Spark operator notes I took during the epidemic holiday. I just spent the whole afternoo...
4:02 18 May 2020

Machine Learning Model Training Scheme in Mass Data Scenarios

It is very difficult to train a machine learning model on a single node in the process of actually processing and solving...
23:50 11 May 2020

Analysis of Hadoop YARN ResourceManager crash caused by data limit of ZooKeeper node

We have run into this problem again. It happens infrequently, but once it does, it will cause resource manager serv...
10:38 10 May 2020

Spark articles: several ways of reading and writing HBase

1. Overview of how HBase is read and written ...
23:05 9 May 2020

Flink's common source and sink operations in stream processing

Flink's sources for stream processing are basically the same as those for batch processing. There are four categories C...
10:54 7 May 2020

Spark.ml -- Naive Bayes

Preface: The Naive Bayes classifier is a classifier with low variance and high bias. It assumes that there is conditional independence between each f...
10:05 16 January 2020

MLlib basic data type

MLlib uses vectors as its local storage type, which come in two main kinds: sparse and dense. Code: import org.apache.spark.mllib.linalg...
4:44 3 December 2019

Scala: tuples, arrays, maps

Tuple: an aggregation of values of different types. It combines a fixed number of items so that they can be passed as a whole. Unlike arrays or lists, ...
15:31 2 December 2019

Kylin configures Spark and builds Cube

HDP version: 2.6.4.0. Kylin version: 2.5.1. Machine: three CentOS 7, 8 GB memory. In addition to MapReduce, Kylin's computing engine also has a faster S...
12:08 24 September 2019