Flink standalone cluster deployment and HA deployment

Scenario description: 172.19.9.202 is the master node (JobManager, master/slave); 172.19.9.201 and 172.19.9.203 are slave nodes (TaskManager, master/slave). 1. The SSH settings on the master and slave nodes should be unified: ssh-keygen -t rsa -P "" (do not set a passphrase), then cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys ...
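The SSH step the excerpt sketches can be written out as follows. This is a minimal sketch: the article uses /root/.ssh, generalized here to ~/.ssh, and ssh-copy-id is one common way to distribute the key to the peers (the IPs are the ones from the scenario description).

```shell
# Run on every node: generate an RSA key pair with an empty passphrase
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa -q

# Authorize the node's own key so local ssh works
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Then push the key to each peer (repeat on every node):
#   ssh-copy-id root@172.19.9.201
#   ssh-copy-id root@172.19.9.203
```

Strict permissions matter here: sshd ignores authorized_keys if the directory or file is group/world writable.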

Posted on Thu, 02 Dec 2021 16:30:34 -0500 by angelssin

Flink real-time data warehouse

Common sense of e-commerce: since this project is based on e-commerce data, here is a quick primer on some e-commerce terms. SKU and SPU. SKU: a silver, 128 GB iPhone X that supports the China Unicom network. SPU: iPhone X. Tm_id: brand id, e.g. Apple, which covers iPhone, headsets, Mac, etc. What is the difference between an order ta ...

Posted on Wed, 01 Dec 2021 16:23:01 -0500 by Xil3

Blocking, waking up, and the stages of the wait queue

1. Foreword. In the previous article we introduced the lock and unlock methods in the AQS source code; these two methods mainly solve the mutual-exclusion problem in concurrency. In this article we focus on the await, signal and signalAll methods, which are used to solve the thread-synchronization problem ...
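The await/signal pairing the article analyzes can be exercised through the JDK's ReentrantLock and Condition, which are built on AQS. The one-slot buffer below is my own minimal construction, not code from the article: the consumer blocks on the condition queue in await, and signal moves it back toward the lock's sync queue.

```scala
import java.util.concurrent.locks.ReentrantLock

// Hypothetical one-slot buffer illustrating await/signal on an AQS-based lock.
object AwaitSignalDemo {
  private val lock     = new ReentrantLock()
  private val notEmpty = lock.newCondition()
  private var slot: Option[Int] = None

  def put(v: Int): Unit = {
    lock.lock()
    try {
      slot = Some(v)
      notEmpty.signal() // move one waiter from the condition queue back to the sync queue
    } finally lock.unlock()
  }

  def take(): Int = {
    lock.lock()
    try {
      // await releases the lock and parks the thread on the condition queue;
      // the while-loop guards against spurious wakeups
      while (slot.isEmpty) notEmpty.await()
      val v = slot.get
      slot = None
      v
    } finally lock.unlock()
  }
}
```

Usage: start a consumer thread calling take(), then put a value from the main thread; the consumer unblocks once signal is called and the lock is released.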

Posted on Sun, 28 Nov 2021 08:51:46 -0500 by Jurik

Memory overflow caused by Spark reading Snappy compressed files on HDFS

Some files on HDFS grow every day and are currently Snappy-compressed. One day, suddenly: OOM. 1. Reason: because Snappy is not splittable, each file is read by a single task. After the file is read and decompressed, the data expands many times over; if there are many files and your parallelism is very high, it will lead to ...
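The cause chain above (non-splittable file → one task per whole file → decompression expansion × parallelism) can be put into a back-of-the-envelope estimate. All numbers below are made-up illustrations, not measurements from the article:

```scala
// Illustrative numbers only (assumptions, not measurements)
val compressedFileGB = 1.0 // one Snappy file on HDFS
val expansionFactor  = 4.0 // decompressed size / compressed size
val concurrentTasks  = 8   // tasks running at once on one executor

// Each task holds roughly one whole decompressed file in memory,
// since a Snappy file cannot be split across tasks.
val perTaskGB     = compressedFileGB * expansionFactor
val perExecutorGB = perTaskGB * concurrentTasks

// Far beyond a typical executor heap -> OOM
println(f"~$perExecutorGB%.0f GB needed per executor")
```

The fix directions follow from the arithmetic: lower the concurrent task count, shrink the input files, or switch to a splittable container format.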

Posted on Fri, 19 Nov 2021 01:54:04 -0500 by tstout2

Spark phase summary

Kafka: consuming data from kafka. Within one consumer group, each record in kafka can be consumed by only one consumer. kafka consumers are grouped when they consume; consumption by different groups does not affect each other. For consumption within the same group, note that if there are 3 partitions and 3 consumers, t ...
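The group semantics described above can be sketched without a broker. The toy assignment function below is my own illustration (real Kafka uses pluggable partition assignors); it shows that within one group each partition belongs to exactly one consumer, while different groups get independent assignments.

```scala
// Toy model: assign each partition to exactly one consumer in the group,
// round-robin style. Different groups get independent assignments.
def assign(partitions: Seq[Int], consumers: Seq[String]): Map[Int, String] =
  partitions.zipWithIndex.map { case (p, i) =>
    p -> consumers(i % consumers.size)
  }.toMap

val partitions = Seq(0, 1, 2)

// 3 partitions, 3 consumers: exactly one partition each
val groupA = assign(partitions, Seq("a1", "a2", "a3"))

// Same topic, different group with 2 consumers: one consumer owns two partitions
val groupB = assign(partitions, Seq("b1", "b2"))

println(groupA)
println(groupB)
```

With more consumers than partitions, the surplus consumers would sit idle — the flip side of the one-consumer-per-partition rule.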

Posted on Wed, 17 Nov 2021 11:07:48 -0500 by Cantaloupe

Spark learning path 3 - the core of Spark - advanced RDD

Spark learning path 3 - the core of Spark - advanced RDD. 1. Spark optimization. 1.1 Description of common parameters. ## Driver memory size: generally 4g is enough when there is no broadcast variable; with broadcast variables, 6g, 8g, 12g, etc. can be set as appropriate --driver-memory 4g ## The memory of each executor is usually ...
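The flags above are spark-submit parameters. A sketch of a full invocation under the sizes the excerpt suggests — the class name, jar, and executor counts are placeholders of mine, not values from the article:

```shell
# Driver memory: 4g usually suffices without broadcast variables;
# raise to 6g/8g/12g when broadcasting large tables.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 8g \
  --num-executors 10 \
  --executor-cores 4 \
  --class com.example.MyApp \
  my-app.jar
```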

Posted on Tue, 02 Nov 2021 11:31:40 -0400 by marshdabeachy

Spark source code analysis (based on the YARN cluster mode) - on RDD and dependencies

We know that RDD is a particularly important concept in Spark; it can be said that all of Spark's logic relies on RDD. In this article we briefly discuss RDD in Spark. RDD is defined in Spark as follows: abstract class RDD[T: ClassTag]( @transient private var _sc: SparkContext, @transient private var deps: Seq[Depend ...
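A stripped-down model of the definition quoted above — this is not Spark's actual code, only the shape it describes: an RDD carries a sequence of dependencies, each dependency points at a parent RDD, and the chain forms the lineage.

```scala
// Toy re-creation of the shape of Spark's RDD/Dependency pair.
class Dep[T](val rdd: MiniRDD[T]) // stand-in for Spark's Dependency[T]

abstract class MiniRDD[T](val deps: Seq[Dep[_]]) {
  def compute(): Seq[T]
  // Walk the dependency chain back to the source, like a lineage
  def lineageDepth: Int =
    if (deps.isEmpty) 1 else 1 + deps.map(_.rdd.lineageDepth).max
}

class SourceRDD(data: Seq[Int]) extends MiniRDD[Int](Nil) {
  def compute(): Seq[Int] = data
}

class MappedRDD(parent: MiniRDD[Int], f: Int => Int)
    extends MiniRDD[Int](Seq(new Dep(parent))) {
  // Narrow dependency: each output element comes from one parent element
  def compute(): Seq[Int] = parent.compute().map(f)
}

val src     = new SourceRDD(Seq(1, 2, 3))
val doubled = new MappedRDD(src, _ * 2)
println(doubled.compute())    // List(2, 4, 6)
println(doubled.lineageDepth) // 2
```

Spark's real RDD adds partitions, a ClassTag context bound, and narrow/wide dependency subclasses, but the dependency-chain structure is the same idea.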

Posted on Tue, 02 Nov 2021 04:05:18 -0400 by Elle0000

Pattern Matching in Scala [simple pattern matching, matching types, guards, matching case classes, matching collections, pattern matching in variable declarations, pattern matching in for expressions]

Pattern matching. Scala has a very powerful matching mechanism, used for example for: judging fixed values, type queries, and quick data extraction. Simple pattern matching: a pattern match contains a series of alternatives, each starting with the keyword case, and each alternative contains a pattern and one or more expressions. The arrow symbol => se ...
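The bracketed topics in the title can be shown in one short sketch; the examples themselves are mine, matching each listed variant in turn:

```scala
// Simple value match with a default case
def describe(x: Int): String = x match {
  case 1 => "one"
  case 2 => "two"
  case _ => "other"
}

// Type query with a guard
def inspect(a: Any): String = a match {
  case s: String       => s"string of length ${s.length}"
  case n: Int if n > 0 => "positive int"
  case _               => "something else"
}

// Case class and collection matching
case class Person(name: String, age: Int)
val who = Person("Tom", 20) match {
  case Person(name, age) => s"$name is $age"
}
val headOfList = List(10, 20, 30) match {
  case head :: _ => head
  case Nil       => 0
}

// Pattern match in a variable declaration and in a for expression
val (x, y) = (1, 2)
val firsts = for ((a, _) <- List((1, "a"), (2, "b"))) yield a

println(describe(2))  // two
println(inspect(-5))  // something else
println(who)          // Tom is 20
println(headOfList)   // 10
println(firsts)       // List(1, 2)
```

Note how the guard in inspect lets -5 fall through to the default case even though it is an Int.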

Posted on Sun, 31 Oct 2021 14:47:29 -0400 by bdlang

[Spark][RDD] Summary of notes from first learning RDD

RDD. Author: cute wolf blue sky. [Bilibili] cute wolf blue sky; [blog] https://mllt.cc; [blog park] Menglang blue sky blog park; WeChat official account mllt9920; [learning and communication QQ group] 238948804. Catalogue: RDD; characteristics; creation; create RDD from memory; create RDD from external storage; 1. create a local file; 2. start spark-shell; 3. read ...

Posted on Sat, 30 Oct 2021 18:55:43 -0400 by [Demonoid]

Troubleshooting record after an HBase outage

HBase downtime troubleshooting. The background: while using Spark to write database data to HBase, three Spark tasks suddenly hung during otherwise normal operation. Checking the log showed the program was stuck at the following position: 2021-10-18 18:23:58,158 INFO jdbc.Utils: Supplied authorities: 192.168.xx.x ...

Posted on Mon, 18 Oct 2021 22:22:13 -0400 by Vibralux