11-flink-1.10.1 - Flink Window API

Contents

1. Window API concept

2. Window API types

  2.1 Time window

  2.2 Count window

3. Window functions

4. Window API combined code

1. Window API concept

  • Most real-world streams are unbounded. How do we process an unbounded stream?
  • An unbounded stream can be segmented into finite data sets, i.e. bounded streams, which can then be processed.
  • A window is a way of cutting an infinite stream into finite slices: it distributes the stream's data into buckets of finite size for processing (see the sketch below).
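To make the bucket idea concrete, here is a tiny pure-Scala sketch, with invented values and no Flink at all: every element is routed to a finite bucket determined by its timestamp, and each bucket can then be processed as a small bounded data set.

object WindowBucketsSketch {
  def main(args: Array[String]): Unit = {
    // (timestampSeconds, temperature) pairs; in real life this is an unbounded stream
    val events = Seq(1L -> 30.1, 7L -> 29.8, 16L -> 31.0, 44L -> 28.5)
    // Route each element to a finite bucket keyed by its 15-second window start
    val buckets = events.groupBy { case (ts, _) => ts - ts % 15 }
    buckets.toSeq.sortBy(_._1).foreach { case (start, es) =>
      println(s"window [$start, ${start + 15}) -> min temp ${es.map(_._2).min}")
    }
  }
}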

2. Window API types

  2.1 Time window

    (1) Tumbling time window

  • The data is segmented according to a fixed window length
  • Windows are aligned in time, the window length is fixed, and there is no overlap
.window(TumblingProcessingTimeWindows.of(Time.seconds(15))) // tumbling window
.timeWindow(Time.seconds(15)) // shorthand for a tumbling window
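For reference, the window an element falls into can be computed from its timestamp alone; the helper below is modeled on Flink's TimeWindow.getWindowStartWithOffset, simplified to offset 0 (the function name is mine):

// Modeled on Flink's TimeWindow.getWindowStartWithOffset, with offset = 0
def tumblingWindowStart(timestamp: Long, windowSize: Long): Long =
  timestamp - (timestamp + windowSize) % windowSize

// A 15 000 ms window: an event at t = 37 500 ms lands in [30 000, 45 000)
println(tumblingWindowStart(37500L, 15000L)) // 30000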

    (2) Sliding time window

  • The sliding window is a generalization of the fixed window: it consists of a fixed window length plus a sliding interval
  • The window length is fixed, and windows can overlap
  • Typical use cases: statistics over the last 24 hours, the last 1 hour, and so on
.window(SlidingEventTimeWindows.of(Time.seconds(15),Time.seconds(10))) // sliding window
.timeWindow(Time.seconds(15),Time.seconds(10)) // shorthand for a sliding window
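Because the windows overlap, one element generally belongs to size / slide windows. The sketch below models the assignment logic of Flink's SlidingEventTimeWindows.assignWindows with offset 0 (the helper name is mine):

def slidingWindowStarts(ts: Long, size: Long, slide: Long): Seq[Long] = {
  val lastStart = ts - (ts + slide) % slide // start of the latest window containing ts
  Iterator.iterate(lastStart)(_ - slide).takeWhile(_ > ts - size).toSeq
}

// size = 15 s, slide = 10 s: an event at t = 12 s is in [10, 25) and [0, 15)
println(slidingWindowStarts(12L, 15L, 10L)) // List(10, 0)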

  (3) Session window

  • A series of events is grouped together, bounded by a timeout gap of a specified length: if no data arrives for that period of time, the current window closes and the next event starts a new one
  • Characteristic: session windows are not aligned in time; their boundaries are driven by the data (see the sketch below)
.window(EventTimeSessionWindows.withGap(Time.seconds(20))) // session window
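A pure-Scala sketch of the session semantics (illustrative only, not the Flink API): an event within the gap extends the current session, a larger gap starts a new one, which is exactly why the resulting windows cannot be aligned in advance.

def sessions(timestamps: Seq[Long], gap: Long): Seq[Seq[Long]] =
  timestamps.sorted.foldLeft(Vector.empty[Vector[Long]]) {
    case (acc, ts) if acc.nonEmpty && ts - acc.last.last < gap =>
      acc.init :+ (acc.last :+ ts) // within the gap: extend the current session
    case (acc, ts) =>
      acc :+ Vector(ts)            // gap exceeded (or first event): new session
  }

println(sessions(Seq(0L, 5L, 12L, 40L, 45L), gap = 20L))
// Vector(Vector(0, 5, 12), Vector(40, 45))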

  2.2 Count window

    (1) Tumbling count window

.countWindow(10) // tumbling count window

    (2) Sliding count window

.countWindow(10,5) // sliding count window: size 10, slide 5 (see the sketch below)
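A pure-Scala sketch of what .countWindow(10, 5) emits for one key (Flink applies this per key; the element values are invented): the window fires after every 5th element and covers up to the last 10.

val elems = (1 to 17).toList
val fired = elems.indices.collect {
  case i if (i + 1) % 5 == 0 => elems.slice(math.max(0, i - 9), i + 1)
}
fired.foreach(w => println(s"window of ${w.size}: $w"))
// window of 5:  List(1, 2, 3, 4, 5)   <- the first firing has only 5 elements yet
// window of 10: List(1, 2, ..., 10)
// window of 10: List(6, 7, ..., 15)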

3. Window functions

A window function defines the computation to be performed on the data collected in a window.

Window functions fall into two categories:

① Incremental aggregation functions

    Each element is processed as soon as it arrives, and only a simple accumulated state is kept

    ReduceFunction, AggregateFunction
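For illustration, a sketch of an AggregateFunction (the class name and the average-temperature logic are mine, not from the original post; the tuple layout matches the job in section 4). Unlike ReduceFunction, the input, accumulator and output types may all differ:

import org.apache.flink.api.common.functions.AggregateFunction

class AvgTemp extends AggregateFunction[(String, Double, Long), (Double, Long), Double] {
  override def createAccumulator(): (Double, Long) = (0.0, 0L)
  override def add(in: (String, Double, Long), acc: (Double, Long)): (Double, Long) =
    (acc._1 + in._2, acc._2 + 1) // running sum and count
  override def getResult(acc: (Double, Long)): Double = acc._1 / acc._2
  override def merge(a: (Double, Long), b: (Double, Long)): (Double, Long) =
    (a._1 + b._1, a._2 + b._2)
}

// Usage: keyedStream.timeWindow(Time.seconds(15)).aggregate(new AvgTemp())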

② Full window functions

    All the data in the window is collected first, and is only traversed when the window fires

    ProcessWindowFunction, WindowFunction
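And a sketch of the full-window style: a ProcessWindowFunction gets all the buffered elements at once, plus window metadata through its context (class name and output format are illustrative):

import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

class WindowMinTemp
    extends ProcessWindowFunction[(String, Double, Long), String, String, TimeWindow] {
  override def process(key: String,
                       context: Context,
                       elements: Iterable[(String, Double, Long)],
                       out: Collector[String]): Unit = {
    val minTemp = elements.map(_._2).min // traverse all buffered elements
    out.collect(s"$key: min temp $minTemp in window ending at ${context.window.getEnd}")
  }
}

// Usage: keyedStream.timeWindow(Time.seconds(15)).process(new WindowMinTemp())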

4. Window API combined code

  Requirement: among the data received in the last 15 seconds, find the lowest temperature and the timestamp of the latest temperature reading.

  ① Reading bounded data produces no output

package com.study.liucf.unbounded.window

import com.study.liucf.bean.LiucfSensorReding
import org.apache.flink.api.common.functions.ReduceFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.{EventTimeSessionWindows, SlidingEventTimeWindows, TumblingEventTimeWindows, TumblingProcessingTimeWindows}
import org.apache.flink.streaming.api.windowing.time.Time

/**
 * @Author liucf
 * @Date 2021/9/22
 *      Requirement: among the data received in the last 15 seconds, find the lowest temperature and the timestamp of the latest temperature reading.
 */
object LiucfWindowApi {
  def main(args: Array[String]): Unit = {
    //Create a Flink execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    //Read data
    val inputStream: DataStream[String] = env.readTextFile("src\\main\\resources\\sensor.txt")
    //Convert the String records to LiucfSensorReding and find the minimum value
    val ds = inputStream.map(r => {
      val arr = r.split(",")
      LiucfSensorReding(arr(0), arr(1).toLong, arr(2).toDouble)
    }).map(d=>(d.id,d.temperature,d.timestamp))
      .keyBy(_._1) // Grouped by sensor id
//      .window(TumblingProcessingTimeWindows.of(Time.seconds(15))) // tumbling window
//      .window(SlidingEventTimeWindows.of(Time.seconds(15), Time.seconds(10))) // sliding window
//      .window(EventTimeSessionWindows.withGap(Time.seconds(20))) // session window
//
//      .timeWindow(Time.seconds(15), Time.seconds(10)) // shorthand for a sliding window
//      .countWindow(10) // tumbling count window
//      .countWindow(10, 5) // sliding count window
      .timeWindow(Time.seconds(15)) // shorthand for a tumbling window
//        .reduce((currentData, newData) => {
//          (currentData._1, currentData._2.min(newData._2), currentData._3.max(newData._3))
//        })
        .reduce(new LiucfReduceFunction())
    //Output to console
    ds.print()
    env.execute("flink window api:liucf window api test ")
  }
}

/**
 * Custom ReduceFunction: keeps the minimum temperature and the latest timestamp per sensor
 */
class LiucfReduceFunction extends ReduceFunction[(String,Double,Long)]{
  override def reduce(value1: (String, Double, Long), value2: (String, Double, Long)): (String, Double, Long) = {
    // Keep the sensor id, the lower of the two temperatures, and the later timestamp
    (value1._1, value1._2.min(value2._2), value1._3.max(value2._3))
  }
}

As you can see, there is no output: the bounded file stream is consumed almost instantly, so the job finishes before the 15-second window can close and fire.

The solution is to switch to an unbounded source for the data. In the code below I use Kafka to test this.

To make the effect visible, I changed the window to 60 seconds and quickly produced several records into Kafka.
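The modified job itself isn't shown above, so here is a hedged sketch of what it could look like, assuming Flink's universal Kafka connector (flink-connector-kafka) is on the classpath. The consumer group id is my own placeholder; the broker address, topic, 60-second window and reduce function all come from the surrounding text.

package com.study.liucf.unbounded.window

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object LiucfWindowApiKafka {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val props = new Properties()
    props.put("bootstrap.servers", "192.168.109.151:9092")
    props.put("group.id", "liucf-window-test") // assumed consumer group id
    // Unbounded source: read the CSV records from Kafka instead of a file
    val inputStream = env.addSource(
      new FlinkKafkaConsumer[String]("sensor_input_csv", new SimpleStringSchema(), props))
    inputStream
      .map(r => { val arr = r.split(","); (arr(0), arr(2).toDouble, arr(1).toLong) })
      .keyBy(_._1)
      .timeWindow(Time.seconds(60))      // 60 s so the window has time to fill
      .reduce(new LiucfReduceFunction()) // same ReduceFunction as in the job above
      .print()
    env.execute("flink window api: kafka source")
  }
}

The producer below is what I used to write test records into the sensor_input_csv topic: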

package com.study.liucf.kafka

import java.util.Properties

import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}
import org.apache.kafka.common.serialization.StringSerializer

object SensorProduce2 {
  def main(args: Array[String]): Unit = {
    val kafkaProp = new Properties()
    kafkaProp.put("bootstrap.servers", "192.168.109.151:9092")
    kafkaProp.put("acks", "1")
    kafkaProp.put("retries", "3")
    //kafkaProp.put("batch.size", 16384)//16k
    kafkaProp.put("key.serializer", classOf[StringSerializer].getName)
    kafkaProp.put("value.serializer", classOf[StringSerializer].getName)
    kafkaProp.put("topic","sensor_input_csv")
    val producer = new KafkaProducer[String, String](kafkaProp)
    val sensor = "sensor_1,1617505481,30.6"
    send(sensor,producer)
    producer.close()
  }


  def send(str:String,producer: KafkaProducer[String, String]): Unit ={
    val record = new ProducerRecord[String, String]("sensor_input_csv", str)
    producer.send(record, new Callback {
      override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit = {
        if (metadata != null) {
          println("Sent successfully")
        }
        if (exception != null) {
          println("Message sending failed")
        }
      }
    })
  }
}
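The main method above sends only one record; to "quickly generate several pieces of data" as described, a loop along these lines inside main would do (timestamps and temperatures invented):

(1 to 10).foreach { i =>
  send(s"sensor_1,${1617505481 + i},${25.0 + i}", producer)
  Thread.sleep(500) // spread the records out slightly
}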

 

As you can see, the expected result is achieved.
