Flink window Basics

Flink window Foundation (I)

1. Window concept:

A stream processing engine is designed to process unbounded data sets, that is, data sets that keep growing and are essentially infinite. A window is a means of cutting that unbounded data into finite blocks that can be processed.

2. Classification of windows:

Time driven: time-based windows

**Time driven → tumbling windows:** tumbling windows have a fixed size, and there is no overlap or gap between consecutive windows.

A tumbling window cuts the data stream into non-overlapping windows, so each event belongs to exactly one window.

code:

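import java.util.Arrays;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env
  .socketTextStream("CentOS", 9999)
  .flatMap(new FlatMapFunction<String, Tuple2<String, Long>>() {
      @Override
      public void flatMap(String value, Collector<Tuple2<String, Long>> out) throws Exception {
          // Split each line into words and emit (word, 1)
          Arrays.stream(value.split("\\W+")).forEach(word -> out.collect(Tuple2.of(word, 1L)));
      }
  })
  .keyBy(t -> t.f0)
  // The window size can be specified with Time.milliseconds(x), Time.seconds(x), Time.minutes(x), and so on
  // The object passed to window() is called a window assigner
  // Add a tumbling window of 8 seconds
  .window(TumblingProcessingTimeWindows.of(Time.seconds(8)))
  .sum(1)
  .print();
env.execute();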
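To see why each event lands in exactly one tumbling window, note that the window bounds can be derived from the timestamp alone. The following plain-Java sketch is not Flink code: the class `TumblingSketch` and the method `windowStart` are hypothetical names, and the arithmetic assumes non-negative timestamps and windows aligned to the epoch with no offset.

```java
public class TumblingSketch {
    // Start of the tumbling window that contains timestampMs.
    // Assumption of this sketch: timestampMs >= 0 and no window offset.
    public static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    public static void main(String[] args) {
        long size = 8_000L; // 8-second windows, as in the example above
        for (long ts : new long[] {0L, 7_999L, 8_000L, 12_345L}) {
            long start = windowStart(ts, size);
            System.out.println(ts + " -> [" + start + ", " + (start + size) + ")");
        }
    }
}
```

Because the start is a pure function of the timestamp, two windows can never claim the same event.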

**Time driven → sliding windows:** like tumbling windows, sliding windows have a fixed length. A second parameter, the slide, controls how often a new window starts. If the slide is smaller than the window length, consecutive windows overlap, and an element may be assigned to more than one window.

code:

env
  .socketTextStream("CentOS", 9999)
  .flatMap(new FlatMapFunction<String, Tuple2<String, Long>>() {
      @Override
      public void flatMap(String value, Collector<Tuple2<String, Long>> out) throws Exception {
          // Split each line into words and emit (word, 1)
          Arrays.stream(value.split("\\W+")).forEach(word -> out.collect(Tuple2.of(word, 1L)));
      }
  })
  .keyBy(t -> t.f0)
  // Add a sliding window: length 10 seconds, slide 5 seconds
  .window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
  .sum(1)
  .print();
env.execute();
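The overlap can be made concrete by listing every window that contains a given timestamp. This is a plain-Java sketch rather than Flink code: `SlidingSketch` and `windowsFor` are hypothetical names, and it assumes non-negative timestamps and windows aligned to the epoch.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingSketch {
    // Returns the [start, end) bounds of every sliding window containing timestampMs.
    // Assumption of this sketch: timestampMs >= 0 and no window offset.
    public static List<long[]> windowsFor(long timestampMs, long sizeMs, long slideMs) {
        List<long[]> windows = new ArrayList<>();
        // Latest window start that is not after the timestamp
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk backwards one slide at a time while the window still covers the timestamp
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            windows.add(new long[] {start, start + sizeMs});
        }
        return windows;
    }

    public static void main(String[] args) {
        // Length 10 s, slide 5 s: each element falls into 10 / 5 = 2 windows
        for (long[] w : windowsFor(12_000L, 10_000L, 5_000L)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

With a slide equal to the window length the loop runs once, which is the tumbling case.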

**Time driven → session windows:** the session window assigner groups elements by periods of activity. Session windows do not overlap and, unlike tumbling and sliding windows, have no fixed start or end time. A session window closes when it has received no data for a certain period of time; that period of inactivity is the session gap. The gap can be configured as a static value or computed per element by a gap extractor function. Once the gap has elapsed, the current session window is closed and subsequent elements are assigned to a new session window.

code

Static gap:

.window(ProcessingTimeSessionWindows.withGap(Time.seconds(10)))

Dynamic gap:

.window(ProcessingTimeSessionWindows.withDynamicGap(new SessionWindowTimeGapExtractor<Tuple2<String, Long>>() {
    @Override
    public long extract(Tuple2<String, Long> element) {
        // Return the gap value, in milliseconds
        return element.f0.length() * 1000;
    }
}))

Creation principle:

Because session windows have no fixed start and end times, they are created and closed differently from tumbling and sliding windows. Internally, Flink creates a new session window for every arriving element. Windows that lie closer to each other than the gap are then merged. To support merging, the session window operator requires a merging trigger and a merging window function, such as ReduceFunction, AggregateFunction, or ProcessWindowFunction.
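The create-then-merge idea can be sketched in plain Java. This is a simplified model, not Flink's implementation: `SessionMergeSketch` and `sessions` are hypothetical names, timestamps are sorted up front for simplicity, and treating windows that merely touch as mergeable is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SessionMergeSketch {
    // Every timestamp opens its own window [t, t + gap); overlapping windows are merged.
    // Returns the merged sessions as [start, end) pairs, ordered by start.
    public static List<long[]> sessions(long[] timestampsMs, long gapMs) {
        long[] sorted = timestampsMs.clone();
        Arrays.sort(sorted);
        List<long[]> merged = new ArrayList<>();
        for (long t : sorted) {
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && t <= last[1]) {
                // The new window overlaps (or touches) the current session: extend it
                last[1] = Math.max(last[1], t + gapMs);
            } else {
                // The gap has elapsed: start a new session
                merged.add(new long[] {t, t + gapMs});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] arrivals = {1_000L, 3_000L, 4_000L, 20_000L, 21_000L};
        for (long[] s : sessions(arrivals, 5_000L)) {
            System.out.println("[" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

In this model the first three arrivals collapse into one session and the last two into another, because more than the 5-second gap separates the two groups.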

Time driven → global windows: the global window assigner assigns all elements with the same key to a single global window. This windowing scheme is only useful if you also specify a custom trigger. Without one, no computation is ever performed, because a global window has no natural end at which the collected elements could be processed.

code

.window(GlobalWindows.create());
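Why a global window needs a trigger can be shown with a small plain-Java model. It is not Flink code: `GlobalWindowSketch`, its `add` method, and the `Predicate`-based trigger are hypothetical stand-ins, and clearing the buffer after each firing is a design choice of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class GlobalWindowSketch {
    // One never-ending window (buffer) per key
    private final Map<String, List<Long>> buffers = new HashMap<>();
    // The trigger decides when a window is evaluated
    private final Predicate<List<Long>> trigger;

    public GlobalWindowSketch(Predicate<List<Long>> trigger) {
        this.trigger = trigger;
    }

    // Adds a value to the key's window; returns the sum if the trigger fires, else null.
    public Long add(String key, long value) {
        List<Long> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buffer.add(value);
        if (!trigger.test(buffer)) {
            return null; // nothing is emitted while the trigger stays silent
        }
        long sum = 0;
        for (long v : buffer) {
            sum += v;
        }
        buffer.clear(); // purge after firing so the next result starts fresh
        return sum;
    }

    public static void main(String[] args) {
        GlobalWindowSketch noTrigger = new GlobalWindowSketch(buf -> false);
        GlobalWindowSketch everyThree = new GlobalWindowSketch(buf -> buf.size() >= 3);
        for (int i = 0; i < 3; i++) {
            System.out.println(noTrigger.add("a", 1L) + " / " + everyThree.add("a", 1L));
        }
    }
}
```

In Flink itself the trigger would be attached to the windowed stream with `.trigger(...)`, for example a `CountTrigger` wrapped in a `PurgingTrigger`; the model only mirrors the firing logic.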

Data driven: window based on the number of elements

Data driven → tumbling window:

By default, countWindow creates a tumbling window, so you only need to specify the window size. The window fires when the number of elements for a key reaches the window size. With a size of 3, as in the code below, whichever key first collects three elements has its window evaluated and closed, without affecting the windows of the other keys.

code

.countWindow(3)

Data driven → sliding window:

The sliding window uses the same function name as the tumbling window, but takes two parameters: the window size (window_size) and the slide (sliding_size). In the following code the slide is set to 2, so a result is computed every time two elements with the same key arrive, and each computation covers at most 3 elements.

code

.countWindow(3, 2)
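The firing pattern of a count window can be traced with a plain-Java model. This is a sketch of the behaviour described above, not Flink's implementation: `CountWindowSketch` and its `add` method are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountWindowSketch {
    private final int size;
    private final int slide;
    // The most recent `size` values per key
    private final Map<String, Deque<Long>> recent = new HashMap<>();
    // Number of arrivals per key since the window last fired
    private final Map<String, Integer> sinceLastFire = new HashMap<>();

    public CountWindowSketch(int size, int slide) {
        this.size = size;
        this.slide = slide;
    }

    // Adds a value; returns the window contents if this arrival fires the window, else null.
    public List<Long> add(String key, long value) {
        Deque<Long> window = recent.computeIfAbsent(key, k -> new ArrayDeque<>());
        window.addLast(value);
        if (window.size() > size) {
            window.removeFirst(); // keep at most `size` elements
        }
        int count = sinceLastFire.merge(key, 1, Integer::sum);
        if (count < slide) {
            return null;
        }
        sinceLastFire.put(key, 0);
        return new ArrayList<>(window);
    }

    public static void main(String[] args) {
        CountWindowSketch sliding = new CountWindowSketch(3, 2);
        for (long v = 1; v <= 5; v++) {
            System.out.println(v + " -> " + sliding.add("a", v));
        }
    }
}
```

Setting the slide equal to the size in this model reproduces the tumbling case, where every window holds exactly `size` elements and no element is counted twice.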

Tags: Java flink

Posted on Thu, 02 Sep 2021 04:36:42 -0400 by tlawless