Discovery of Spark Data Skew

Spark is built to process large volumes of data; what it struggles with is not big data but data skew. When the data is skewed, a Spark job can take a very long time to finish, or an executor can run out of memory (OOM), which terminates the program. A Spark job is composed of several stages, which have a sequential relationshi ...

Posted on Tue, 29 Jan 2019 02:09:16 -0500 by superdan_35
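
The data skew excerpt names the problem but not a fix. One common mitigation for skewed aggregations is key salting, also called two-stage aggregation. You add a random prefix to each key so one hot key is spread over several partitions, aggregate partially, then strip the prefix and aggregate again. We don't know whether the post uses this technique. The sketch below is a hypothetical illustration on plain Scala collections rather than Spark RDDs, and `saltBuckets`, `twoStageSum` and `SaltedAggregation` are names of our own.

```scala
import scala.util.Random

object SaltedAggregation {
  // Hypothetical sketch: two-stage (salted) sum per key on local collections.
  // In Spark, each of the two groupBy-and-sum steps would be a reduceByKey with its own shuffle.
  def twoStageSum(records: Seq[(String, Int)], saltBuckets: Int, seed: Long = 42L): Map[String, Int] = {
    val rnd = new Random(seed)

    // Stage 1: prefix each key with a random salt in [0, saltBuckets),
    // so a single hot key is spread over up to saltBuckets distinct keys.
    val salted: Seq[(String, Int)] =
      records.map { case (k, v) => (s"${rnd.nextInt(saltBuckets)}_$k", v) }

    // Partial aggregation on the salted keys.
    val partial: Map[String, Int] =
      salted.groupBy(_._1).map { case (saltedKey, kvs) => saltedKey -> kvs.map(_._2).sum }

    // Stage 2: strip the salt (digits up to the first '_') and aggregate again on the original key.
    partial.toSeq
      .map { case (saltedKey, v) => (saltedKey.substring(saltedKey.indexOf('_') + 1), v) }
      .groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2).sum }
  }

  def main(args: Array[String]): Unit = {
    // A skewed input: "hot" appears far more often than "cold".
    val data = Seq.fill(1000)(("hot", 1)) ++ Seq.fill(3)(("cold", 1))
    println(twoStageSum(data, saltBuckets = 8))
  }
}
```

The salt only changes how work is distributed during the first stage. The second stage restores the original keys, so the totals are the same as a direct per-key sum.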

SparkStreaming (16): updateStateByKey operator

1. Functionality: in scenarios where data must be accumulated across batches, the result of the current batch is computed and then added to the accumulated results of previous batches. This requires the updateStateByKey operator together with checkpointing.
2. Code
package _0809kafka
//import com.beifeng.util.Sp ...

Posted on Tue, 29 Jan 2019 01:42:16 -0500 by RestlessThoughts
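
In Spark Streaming, `updateStateByKey` takes an update function of type `(Seq[V], Option[S]) => Option[S]`. Its inputs are the values that arrived for a key in the current batch and that key's previous state. Below is a minimal sketch of such a function for a running count, assuming `Int` counts. In a real job it would be passed as `pairs.updateStateByKey(updateCount)` after `ssc.checkpoint(dir)` has been set. To keep the example runnable without a cluster, `step` and `main` replay hypothetical batches locally instead of using a DStream.

```scala
object RunningCount {
  // Update function in the shape updateStateByKey expects:
  // (values for this key in the current batch, previous state) => new state.
  def updateCount(newValues: Seq[Int], state: Option[Int]): Option[Int] =
    Some(newValues.sum + state.getOrElse(0))

  // Local stand-in for one streaming batch: apply updateCount to every key that has
  // new values or existing state (Spark also calls it for state keys with no new values).
  def step(state: Map[String, Int], batch: Seq[(String, Int)]): Map[String, Int] = {
    val grouped: Map[String, Seq[Int]] =
      batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    val keys = state.keySet ++ grouped.keySet
    keys.flatMap { k =>
      // Returning None would drop the key from the state.
      updateCount(grouped.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical word-count batches arriving one after another.
    val batches = Seq(
      Seq(("spark", 1), ("kafka", 1)),
      Seq(("spark", 1)),
      Seq(("kafka", 1), ("kafka", 1))
    )
    val finalState = batches.foldLeft(Map.empty[String, Int])(step)
    println(finalState) // contains spark -> 2 and kafka -> 3
  }
}
```

Because the state for every key is carried from batch to batch, Spark requires a checkpoint directory for this operator.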

A Preliminary Introduction to Hadoop: HDFS Distributed Storage + MR Distributed Computing

Differences between HDFS and RDBMS, and between MR and grid computing / volunteer computing
1. Data storage
Columns: disk storage | distributed problems solved | hardware requirement | system bottleneck
hdfs: RAID-Cluster | hardware failure, data accuracy of multiple data sources | ordinary machines | data transmission: hard disk bandwidth
RDBMS: Singl ...

Posted on Tue, 29 Jan 2019 01:36:15 -0500 by t31os

Storm Connecting to Kafka 0.10.x+ as a New Consumer

Storm application scenario: connecting to Kafka 0.10.x+ as a new consumer (1)
00 Background
With the upgrade of Kafka, the way Storm consumes from Kafka 0.10.x+ differs from the previous approach. Here we record the new way as a reference for future scenarios that use Storm to process the new versi ...

Posted on Mon, 28 Jan 2019 18:24:15 -0500 by danelkayam

Spark Learning (7): Programming Spark SQL and Reading/Writing Relational Databases

This time, we introduce programmatic Spark SQL queries and reading from and writing to relational databases. The main topics are: inferring the schema by reflection, specifying the schema with StructType, operating on HiveQL with Spark SQL, reading database files with Spark SQL, and writing to a relational database with Spark.
1. Programming SparkSQ ...

Posted on Mon, 28 Jan 2019 12:45:14 -0500 by shehroz

One-click deployment of Jenkins (CentOS 7.3)

Step 1. Enter the /root directory.
Step 2. Create a directory named jenkins-installer under /root with the following command: mkdir jenkins-installer
Step 3. Download the Tomcat, Jenkins and JDK packages into the jenkins-installer directory. Tomcat download address: http://mirror.bit.edu.cn/apache/tomcat/tomcat-9/v ...

Posted on Mon, 28 Jan 2019 05:06:15 -0500 by sachavdk

Shiro | Implementing a Full Version of Permission Validation

Foreword
Permissions bring security to mind, which is a very difficult topic. This is just a record of using Shiro, not a claim that permissions should be designed this way.
Shiro framework
1. Shiro is a powerful and flexible open-source security framework from Apache.
2. Shiro provides authentication, authorization, e ...

Posted on Sun, 27 Jan 2019 22:27:15 -0500 by kpulatsu

Writing Spark WordCount Examples in Java and Scala

Preface: Recently I have been learning Spark, which is a good framework; its approach to distributed processing of large data sets is worth studying. I feel that Java development in the future will definitely not be just SSM; as data volumes keep growing, we need to learn how to use these big data tools ...

Posted on Sun, 27 Jan 2019 19:39:14 -0500 by Jarl
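
A Spark WordCount in Scala is typically written as `sc.textFile(path).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)`. As a dependency-free sketch of the same flatMap / map / reduce-by-key pipeline, the version below runs on plain Scala collections instead of an RDD. The object name and sample lines are made up for illustration.

```scala
object LocalWordCount {
  // Same shape as the Spark pipeline: flatMap lines into words, map each word to (word, 1),
  // then reduce by key. Locally, the "reduceByKey" step is a groupBy followed by a sum.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))      // split on runs of whitespace
      .filter(_.nonEmpty)            // drop the empty token produced by leading spaces
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello spark", "hello scala", "hello java")
    // Print words sorted by descending count, tab-separated like a typical WordCount output.
    wordCount(lines).toSeq.sortBy(-_._2).foreach { case (w, c) => println(s"$w\t$c") }
  }
}
```

In Spark, `reduceByKey` also combines values on each partition before the shuffle. The local `groupBy` has no such distinction, but the result per word is the same.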

Spark Parquet file split

In real-world use of Spark + Parquet, there are two confusing points. First, we have only one Parquet file (smaller than the HDFS block size), yet Spark generates four tasks in one stage to process it. Second, only one of the four tasks handles all the data, while the others handle none. Both issues come down to how Spark splits Parquet files into partitions and what part o ...

Posted on Sun, 27 Jan 2019 04:27:14 -0500 by simply
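
The Parquet puzzle can be reproduced with arithmetic. This sketch makes two assumptions that may differ across Spark and Parquet versions:

- Spark 2.x file sources size splits as `maxSplitBytes = min(spark.sql.files.maxPartitionBytes, max(spark.sql.files.openCostInBytes, bytesPerCore))`. Here `bytesPerCore` is the total file size, with the open cost added per file, divided by the default parallelism.
- The Parquet reader assigns each row group to the split that contains the row group's midpoint.

The 40 MB file, its single row group, and the parallelism of 4 are hypothetical numbers, not figures from the post.

```scala
object ParquetSplitSketch {
  val MB: Long = 1024L * 1024L

  // Assumed Spark 2.x rule for sizing file splits (defaults: 128 MB max partition, 4 MB open cost).
  def maxSplitBytes(fileLens: Seq[Long], parallelism: Int,
                    maxPartitionBytes: Long = 128 * MB, openCostInBytes: Long = 4 * MB): Long = {
    val totalBytes = fileLens.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / parallelism
    math.min(maxPartitionBytes, math.max(openCostInBytes, bytesPerCore))
  }

  // Cut one file into (start, length) splits of at most maxSplit bytes; each split becomes a task.
  def splits(fileLen: Long, maxSplit: Long): Seq[(Long, Long)] =
    (0L until fileLen by maxSplit).map(start => (start, math.min(maxSplit, fileLen - start)))

  // Assumed Parquet rule: a row group is read only by the split that contains its midpoint.
  def splitOwningRowGroup(ss: Seq[(Long, Long)], rgStart: Long, rgLen: Long): Int = {
    val mid = rgStart + rgLen / 2
    ss.indexWhere { case (start, len) => mid >= start && mid < start + len }
  }

  def main(args: Array[String]): Unit = {
    val fileLen = 40 * MB // hypothetical single Parquet file, smaller than an HDFS block
    val maxSplit = maxSplitBytes(Seq(fileLen), parallelism = 4)
    val ss = splits(fileLen, maxSplit)
    // One row group covering almost the whole file: 4-byte magic at the start, ~1000-byte footer at the end.
    val owner = splitOwningRowGroup(ss, rgStart = 4, rgLen = fileLen - 4 - 1000)
    println(s"splits = ${ss.size}, row group read by split #$owner")
  }
}
```

Under these assumptions, the 40 MB file yields an 11 MB split size and therefore four tasks. The single row group's midpoint falls in the second split, so that task reads all the rows and the other three read nothing. This matches the behaviour the post describes.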

Reinventing the Wheel with Me: Writing Spring MVC by Hand

As a Java programmer, you will find that the mainstream frameworks used in projects are more or less related to Spring. In interviews, questions about Spring, Spring MVC and Spring Boot are unavoidable, such as the use of design patterns, how Spring IoC is implemented, how Spring MVC is implemented, and so on. Today we will expl ...

Posted on Sun, 27 Jan 2019 01:39:15 -0500 by mharju