Common sense of e-commerce
Since this project is based on e-commerce data, here is a brief introduction to some basic e-commerce concepts.
SKU and SPU
SKU: a silver iPhoneX with 128G of memory that supports the China Unicom network
SPU: iPhoneX
Tm_id: the brand ID, e.g. Apple, which covers iPhone, headsets, Mac, etc.
What is the difference between an order ta ...
In the previous article, we introduced the lock method and unlock method in the AQS source code. These two methods are mainly used to solve the problem of mutual exclusion in concurrency. In this article, we mainly introduce the await method, signal method and signalAll method used to solve the problem of thread synchronization ...
Posted on Sun, 28 Nov 2021 08:51:46 -0500 by Jurik
Some files on HDFS grow every day and are currently compressed with Snappy. Then one day, out of nowhere, an OOM occurred.
Because Snappy-compressed files cannot be split, each file is read by a single task. After being read and decompressed, the data expands to many times its original size. If the number of files is too large and your parallelism is very high, it will lead to ...
Posted on Fri, 19 Nov 2021 01:54:04 -0500 by tstout2
Kafka data consumption
At any given time, a piece of data in Kafka can be consumed by only one consumer within a given consumer group.
Kafka consumers are organized into groups when they consume data, and consumption by one group does not affect another. For consumption within the same group, note that if there are 3 partitions and 3 consumers, t ...
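The one-consumer-per-partition rule within a group can be illustrated with a small simulation. This is only a sketch: the `assign` function and the consumer names are hypothetical, and the round-robin spread is a simplification, not Kafka's actual partition assignor. It only shows that each partition has a single owner within the group and that consumers beyond the partition count receive nothing.

```scala
// Simplified illustration only: this is NOT Kafka's real partition assignor.
object ConsumerGroupSketch {

  /** Spread partitions over the consumers of ONE group, round-robin.
    * Every partition gets exactly one owner; consumers beyond the
    * partition count end up with an empty list and sit idle.
    */
  def assign(partitions: Seq[Int], consumers: Seq[String]): Map[String, Seq[Int]] = {
    require(consumers.nonEmpty, "a group needs at least one consumer")
    // Start with every consumer owning nothing.
    val empty = consumers.map(c => c -> Seq.empty[Int]).toMap
    partitions.zipWithIndex.foldLeft(empty) { case (acc, (partition, i)) =>
      val owner = consumers(i % consumers.size)
      acc.updated(owner, acc(owner) :+ partition)
    }
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(0, 1, 2)
    // 3 partitions, 3 consumers: one partition each.
    println(assign(partitions, Seq("c1", "c2", "c3")))
    // 3 partitions, 4 consumers: the fourth consumer owns nothing.
    println(assign(partitions, Seq("c1", "c2", "c3", "c4")))
    // 3 partitions, 2 consumers: one consumer owns two partitions.
    println(assign(partitions, Seq("c1", "c2")))
  }
}
```

A second group would run the same assignment independently over all partitions, which is why consumption by different groups does not interfere.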
Posted on Wed, 17 Nov 2021 11:07:48 -0500 by Cantaloupe
Spark learning road 3 - the core of spark - Advanced RDD
1, Spark optimization
1.1 description of common parameters
## Driver memory size. 4g is generally enough when there are no broadcast variables. If there are broadcast variables, it can be set to 6G, 8G, 12G, etc. as appropriate
## The memory of each executor is usually ...
Posted on Tue, 02 Nov 2021 11:31:40 -0400 by marshdabeachy
We know that RDD is a particularly important concept in Spark. It is fair to say that all of Spark's logic relies on RDDs. In this article, we briefly discuss RDDs in Spark. The definition of RDD in Spark is as follows:
abstract class RDD[T: ClassTag](
@transient private var _sc: SparkContext,
@transient private var deps: Seq[Depend ...
Posted on Tue, 02 Nov 2021 04:05:18 -0400 by Elle0000
Pattern matching
Scala has a very powerful matching mechanism, which can be used for things such as:
Judging fixed values
Type queries
Quick data acquisition
Simple pattern matching
A pattern match contains a series of alternatives, each starting with the keyword case, and each alternative contains a pattern and one or more expressions. The arrow symbol => se ...
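As a minimal sketch of such a match expression, the hypothetical `describe` function below has one `case` alternative per pattern, with the result expression to the right of each `=>`. The function name and the sample patterns are illustrative assumptions, not taken from the original post.

```scala
object PatternMatchSketch {

  /** Return a description of x chosen by the first alternative that matches. */
  def describe(x: Any): String = x match {
    case 1         => "the fixed value one"                 // match a fixed value
    case "spark"   => "the fixed string spark"              // fixed values may be strings too
    case n: Int    => s"some other Int: $n"                 // match on type and bind the value
    case s: String => s"a String of length ${s.length}"
    case (a, b)    => s"a pair of $a and $b"                // pull data out of a tuple
    case _         => "something else"                      // default alternative
  }

  def main(args: Array[String]): Unit = {
    Seq(1, 42, "spark", "hi", (1, "a"), 3.5).foreach(v => println(describe(v)))
  }
}
```

Alternatives are tried from top to bottom, so the fixed-value cases come before the more general type cases.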
Posted on Sun, 31 Oct 2021 14:47:29 -0400 by bdlang
Author: cute wolf blue sky
[Bilibili] cute wolf blue sky
[Blog Park] Menglang blue sky Blog Park
[WeChat official account] mllt9920
[learning and communication QQ group] 238948804
Contents
RDD
Characteristics
Creation
Create RDD from memory
Create RDD from external storage
1. Create local file
2. Start spark shell
3. Read ...
Posted on Sat, 30 Oct 2021 18:55:43 -0400 by [Demonoid]
HBase downtime troubleshooting
The problem came up while using Spark to write database data to HBase: three Spark tasks that had been running normally suddenly hung. The logs showed that the program was stuck at the following point:
2021-10-18 18:23:58,158 INFO jdbc.Utils: Supplied authorities: 192.168.xx.x ...
Posted on Mon, 18 Oct 2021 22:22:13 -0400 by Vibralux