4. Principles and uses of Spark -- Spark SQL

[TOC] 1. Overview of Spark SQL 1.1 What is Spark SQL Spark SQL is the module Spark uses to process structured data. It provides a programming abstraction called DataFrame and functions as a distributed SQL query engine, similar to what Hive does. 1.2 Features of Spark SQL 1. Easy to integrate: when you install Spark, it is already included. N ...

Posted on Sat, 16 Nov 2019 01:24:56 -0500 by HokieTracks

5. Principles and uses of Spark -- Spark Streaming

1. Overview of Spark Streaming 1.1 Common real-time computing engines Real-time computing engines, also known as streaming computing engines, currently come in three commonly used forms: 1. Apache Storm: true streaming. 2. Spark Streaming: strictly speaking, not true streaming (real-time computing); it processes continuous streaming data as discrete RD ...

Posted on Sat, 16 Nov 2019 01:20:27 -0500 by Snart

Spark machine learning: R/Python implementation of decision trees and computing nodes

Spark machine learning: decision trees -------- for personal study notes and R/Python code organization only 1. Preface This project uses decision trees in a Spark environment, together with the ml functions of R and Python; for regression it uses Python's sklearn package, since ml is not convenient for draw ...
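As a hedged sketch of the Python side mentioned above, using scikit-learn's DecisionTreeClassifier (the iris dataset and the `max_depth` setting are illustrative assumptions, not from the original project):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a toy dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit a shallow decision tree; max_depth=3 keeps it easy to inspect/plot.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out split
```

sklearn also ships `sklearn.tree.plot_tree` for drawing the fitted tree, which is the kind of visualization the summary alludes to.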

Posted on Tue, 12 Nov 2019 15:04:59 -0500 by InternetX

Spark Core Knowledge Points Review-1

Day1111 Spark task scheduling, key components of Spark, Spark Core, concepts and characteristics of RDDs, two ways to create RDDs, two types of RDD operators, operator practice, partitions, RDD dependencies, DAG (directed acyclic graph), task submission, cache, checkpoint, custom sorting, custom partiti ...

Posted on Mon, 11 Nov 2019 22:46:13 -0500 by 2oMst

Two ways to convert Spark RDD to DataFrame

Spark SQL supports two ways to convert an existing RDD to a DataFrame. The first uses reflection to infer the RDD's schema, creating a Dataset and then converting it to a DataFrame. This reflection-based approach is concise, but it works only if you already know the RDD's schema when writing your Spark application. The second approach is to use the Struc ...

Posted on Sun, 03 Nov 2019 04:10:18 -0500 by truCido

flume+springboot+kafka+sparkStream integration

Following the previous article, flume+springboot+kafka integration, this article also integrates sparkStream. As Kafka's consumer, sparkStream receives Kafka's data and performs real-time computation over error and warning log data. (1) The environment is the same as in the previous article; only one sparkStream ...

Posted on Tue, 29 Oct 2019 17:32:40 -0400 by zushiba

Reading ODPS table data

Preface This is my first time writing a blog. In this period of work I learn something new almost every day. As an ordinary person, my memory is also ordinary, so I want to record the technical points I encounter in my daily work by blogging, and share them when I am ...

Posted on Tue, 29 Oct 2019 12:59:38 -0400 by fahrvergnuugen

Creating a connection pool with BoneCP and writing Spark Streaming data to MySQL

Preparation First, go to the Maven repository to find the BoneCP dependency and add it to pom.xml. My Kafka version is 0.10 and my Spark version is 2.4.2; since this is experimental, I don't package it for Linux but run it directly in IDEA. Startup First, I create a new table, kafka_test_tbl, in the G6 database ...
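BoneCP is a Java connection pool, so the article's actual code is JVM-based. As a language-neutral illustration of the pooling idea it relies on (check a connection out, reuse it, return it instead of reconnecting per record), here is a tiny Python analogue backed by sqlite3; the class name and pool size are made up for the demo:

```python
import sqlite3
from queue import Queue

class SimplePool:
    """A toy fixed-size connection pool (stand-in for BoneCP)."""

    def __init__(self, db_path, size=4):
        self._pool = Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

    def get(self):
        return self._pool.get()   # blocks if all connections are checked out

    def put(self, conn):
        self._pool.put(conn)      # return the connection for reuse

pool = SimplePool(":memory:", size=2)
conn = pool.get()
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1
pool.put(conn)
```

In the Spark Streaming case, the pool is typically consulted inside `foreachPartition`, so each task reuses one connection for its whole partition rather than opening one per row.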

Posted on Mon, 28 Oct 2019 14:27:34 -0400 by Mr P!nk

Partitioned database reads with Spark JDBC

Spark reads data from a database through JDBC. If the data volume is large, the read must be partitioned; otherwise it runs slowly. The number of partitions can be seen in the web UI, and the number of partitions equals the number of tasks. If some tasks finish quickly and others slowly after pa ...
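In PySpark, the commented `jdbc()` call below shows the four options that control a partitioned read (the URL, table, and credentials are placeholders); `partition_ranges()` then illustrates, in plain Python, roughly how a numeric column range gets split into one contiguous slice per task. This is an approximation of the splitting behaviour, not Spark's exact internal algorithm:

```python
# df = spark.read.jdbc(
#     url="jdbc:mysql://host:3306/db",   # placeholder connection URL
#     table="my_table",                  # placeholder table name
#     column="id",                       # numeric column to partition on
#     lowerBound=0, upperBound=100, numPartitions=4,
#     properties={"user": "...", "password": "..."},
# )

def partition_ranges(lower, upper, num_partitions):
    """Split [lower, upper) into num_partitions contiguous ranges,
    one per task -- a rough model of what the options above produce."""
    stride = (upper - lower) // num_partitions
    bounds = [lower + i * stride for i in range(num_partitions)] + [upper]
    return [(bounds[i], bounds[i + 1]) for i in range(num_partitions)]

print(partition_ranges(0, 100, 4))  # [(0, 25), (25, 50), (50, 75), (75, 100)]
```

If the partition column's values are skewed, these even strides produce the fast-task/slow-task imbalance the article describes, since some ranges hold far more rows than others.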

Posted on Tue, 22 Oct 2019 15:04:42 -0400 by Sven70

Big data day 33 - Spark Java related operators

Big data day 33 - Spark Java related operator practice Transformations operators package com.cs.java.spark; import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.api.java.function.Function; import org.a ...

Posted on Sun, 20 Oct 2019 14:42:09 -0400 by TCovert