Spark is built to process large data sets. Data volume is rarely the problem; data skew is. When data is skewed, a Spark job can take a very long time to finish, and an out-of-memory (OOM) error can exhaust an executor's memory and terminate the program.
A Spark job is composed of several stages, which have a sequential relationshi ...
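One common way to soften the skew described above is key salting. The following is a minimal sketch in plain Python, not Spark code: the partition count, the number of salt buckets, the made-up data, and the crc32-based partitioner are all assumptions chosen for illustration. It shows how a single hot key overloads one partition, and how a two-stage aggregation over salted keys spreads the load while producing the same totals.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 4   # hypothetical number of shuffle partitions
SALT_BUCKETS = 16    # hypothetical number of salt values per key


def partition_of(key):
    # Deterministic stand-in for a hash partitioner (an assumption, not Spark's own).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    # Count how many records land in each partition.
    counts = Counter(partition_of(k) for k in keys)
    return [counts[p] for p in range(NUM_PARTITIONS)]


# Made-up data: one hot key owns 90% of the records.
keys = ["hot"] * 9000 + ["key%d" % (i % 100) for i in range(1000)]

# Stage 1: append a random salt so the hot key becomes many distinct keys,
# then aggregate per salted key.
rng = random.Random(0)
salted_keys = ["%s#%d" % (k, rng.randrange(SALT_BUCKETS)) for k in keys]
partial_counts = Counter(salted_keys)

# Stage 2: strip the salt and merge the partial results per original key.
final_counts = Counter()
for salted_key, count in partial_counts.items():
    final_counts[salted_key.rsplit("#", 1)[0]] += count

plain_sizes = partition_sizes(keys)
salted_sizes = partition_sizes(salted_keys)
print("records per partition, plain keys: ", plain_sizes)
print("records per partition, salted keys:", salted_sizes)
```

The second stage matters: without merging the salted partial results, the counts would be reported per salted key rather than per original key.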
Posted on Tue, 29 Jan 2019 02:09:16 -0500 by superdan_35
1. Implementing Functions
Some scenarios need accumulated results: the result of the current batch is calculated and then combined with the results of previous batches. For this, you need the updateStateByKey operator together with checkpointing.
//import com.beifeng.util.Sp ...
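The accumulation idea behind updateStateByKey can be sketched without Spark. The plain-Python sketch below is an illustration under assumptions, not the post's code: `update_state_by_key`, `update_count`, and the sample batches are hypothetical names and data. For each batch, the new values of a key are grouped and handed to an update function together with that key's previous state; returning None drops the key.

```python
from collections import defaultdict


def update_count(new_values, running_count):
    # Update function: add this batch's values to the previous total.
    return sum(new_values) + (running_count or 0)


def update_state_by_key(state, batch, update_func):
    # Group the batch's values by key.
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    # Call the update function for every key that has new values or old state.
    new_state = {}
    for key in set(state) | set(grouped):
        result = update_func(grouped.get(key, []), state.get(key))
        if result is not None:  # None means "forget this key"
            new_state[key] = result
    return new_state


# Hypothetical word-count batches arriving one after another.
batches = [
    [("spark", 1), ("storm", 1), ("spark", 1)],
    [("spark", 1)],
    [("kafka", 1), ("storm", 1)],
]

state = {}
for batch in batches:
    state = update_state_by_key(state, batch, update_count)
    print(sorted(state.items()))
```

In a real streaming job the state has to survive failures, which is why the operator is paired with a checkpoint directory; this sketch keeps the state in a local dictionary only.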
Posted on Tue, 29 Jan 2019 01:42:16 -0500 by RestlessThoughts
Differences between HDFS and RDBMS
MapReduce and grid computing, volunteer computing
1. Data storage
Solving Distributed Problems
Hardware failure, data accuracy of multiple data sources
Data transmission: hard disk bandwidth
Posted on Tue, 29 Jan 2019 01:36:15 -0500 by t31os
Storm application scenario: integrating with Kafka 0.10.x+ as a new consumer (1)
As Kafka has been upgraded, the way Storm connects to Kafka 0.10.x+ as a consumer differs from the previous approach. Here we record the new approach as a reference for future scenarios that use Storm to process the new versi ...
Posted on Mon, 28 Jan 2019 18:24:15 -0500 by danelkayam
This time, we introduce querying with Spark SQL and reading from and writing to relational databases programmatically, mainly including
Inferring the schema by reflection
Specifying the schema through StructType
Running HiveQL programmatically with Spark SQL
Reading database files with Spark SQL
Writing to a relational database from Spark
1. Programming SparkSQ ...
Posted on Mon, 28 Jan 2019 12:45:14 -0500 by shehroz
Step 1. Enter the /root directory
Step 2. Create the directory jenkins-installer under /root with the following commands:
Step 3. Download the Tomcat, Jenkins, and JDK packages into the jenkins-installer directory:
Tomcat download address: http://mirror.bit.edu.cn/apache/tomcat/tomcat-9/v ...
Posted on Mon, 28 Jan 2019 05:06:15 -0500 by sachavdk
Preface
Mention permissions and you think of security, which is a very difficult topic. This is just a record of using Shiro, not a claim that permissions should be designed this way.
1. Shiro is a powerful and flexible open source security framework from Apache.
2. Shiro provides authentication, authorization, e ...
Posted on Sun, 27 Jan 2019 22:27:15 -0500 by kpulatsu
Recently, the blogger has been learning Spark, which is a good framework. Its idea of distributed processing of large data sets is worth learning.
I feel that Java development in the future will definitely be more than just SSM, and we need to learn how to use these big data tools as the data volume gets bigger and bigger ...
Posted on Sun, 27 Jan 2019 19:39:14 -0500 by Jarl
In practice with Spark + Parquet, two things are confusing:
We have only one Parquet file (smaller than the HDFS block size), but Spark generates four tasks in one stage to process it.
Only one of the four tasks handles all the data, while the others handle none.
Both issues come down to how Spark partitions a Parquet file and what part o ...
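One plausible explanation of both observations can be sketched in plain Python. Everything here is an assumption for illustration: the 100 MB file size, a default parallelism of 4, a single row group spanning the file, a split-size formula modeled on Spark 2.x file-source behavior with its default 128 MB maximum partition size and 4 MB open cost, and the rule that a row group is read by the split containing its midpoint. It is a model of the behavior, not Spark or Parquet code.

```python
MB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB, open_cost=4 * MB):
    # Assumed model: small inputs are divided so that every core gets a share.
    total = sum(size + open_cost for size in file_sizes)
    bytes_per_core = total // default_parallelism
    return min(max_partition_bytes, max(open_cost, bytes_per_core))


def split_file(file_size, split_size):
    # Cut the file into byte ranges [start, end); one task per range.
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def row_groups_for_split(split, row_groups):
    # Assumed rule: a split reads a row group only if it contains its midpoint.
    start, end = split
    return [(rg_start, rg_len) for rg_start, rg_len in row_groups
            if start <= rg_start + rg_len // 2 < end]


file_size = 100 * MB                      # hypothetical file, below a 128 MB block
split_size = max_split_bytes([file_size], default_parallelism=4)
splits = split_file(file_size, split_size)
row_groups = [(4, file_size - 1024)]      # hypothetical single row group

work = [row_groups_for_split(s, row_groups) for s in splits]
for index, (split, assigned) in enumerate(zip(splits, work)):
    print("task", index, "byte range", split, "row groups:", len(assigned))
```

Under these assumptions the file is cut into four byte ranges, so four tasks are scheduled, yet the only row group belongs to a single range, so one task reads all the data and the rest finish with nothing to do.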
Posted on Sun, 27 Jan 2019 04:27:14 -0500 by simply
For a Java programmer, the mainstream frameworks used in projects are more or less related to Spring. In interviews, questions about Spring, Spring MVC, and Spring Boot are unavoidable, such as which design patterns are used, how Spring IoC is implemented, how Spring MVC is implemented, and so on. Today we will expl ...
Posted on Sun, 27 Jan 2019 01:39:15 -0500 by mharju