Research on Storm source code analysis

2021SC@SDUSC

Bolt source code analysis (IV)

This article introduces the Bolt output collectors. Every message a Bolt processes is sent out through an output collector, and different types of Bolt use different output collectors.

The previous article introduced several Bolt interfaces. Their output collectors are as follows:
IRichBolt:
It uses OutputCollector, which implements the IOutputCollector interface and is actually a proxy class.
IBasicBolt:
It uses BasicOutputCollector, which wraps an OutputCollector and implements the IBasicOutputCollector interface.
IBatchBolt:
It uses BatchOutputCollector, an abstract base class provided by Storm. Storm also provides a default implementation, BatchOutputCollectorImpl, which sends messages by wrapping an OutputCollector.

IOutputCollector.java

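package org.apache.storm.task;

import java.util.Collection;
import java.util.List;
import org.apache.storm.tuple.Tuple;

public interface IOutputCollector extends IErrorReporter {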
    /**
     * Returns the task ids that received the tuples.
     */
    List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> tuple);

    void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> tuple);

    void ack(Tuple input);

    void fail(Tuple input);

    void resetTimeout(Tuple input);

    void flush();
}

The IErrorReporter interface defines the reportError method, which takes a Throwable object; users can handle exceptions in this method. IOutputCollector extends IErrorReporter and adds the basic methods shown above.

emit method:
It sends data downstream. Its return value is the collection of task IDs of all the targets the message was sent to. Its input parameters are:
streamId: the stream to which the message will be output.
anchors: the tuples the output message is anchored to, usually the input messages from which it was derived. They are used mainly by the Ack mechanism.
tuple: the message to output, as a list of objects.
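
To make these parameters concrete, here is a minimal Java sketch. SketchTuple and RecordingCollector are hypothetical stand-ins written for this article, not Storm classes, and the task ids 3 and 4 are made up. Only the shape of the emit call follows IOutputCollector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Stand-in for a Storm tuple (assumption: identified by a plain string id).
final class SketchTuple {
    final String id;
    SketchTuple(String id) { this.id = id; }
}

// Stand-in collector: records each emit and pretends two tasks subscribe.
class RecordingCollector {
    final List<String> log = new ArrayList<>();
    private final List<Integer> targetTasks = Arrays.asList(3, 4); // made-up task ids

    // Same parameter order as IOutputCollector.emit: stream, anchors, values.
    List<Integer> emit(String streamId, Collection<SketchTuple> anchors, List<Object> values) {
        List<String> anchorIds = new ArrayList<>();
        for (SketchTuple anchor : anchors) {
            anchorIds.add(anchor.id);
        }
        log.add(streamId + " " + values + " anchoredTo=" + anchorIds);
        return targetTasks; // ids of the tasks the message was sent to
    }
}

class EmitSketch {
    public static void main(String[] args) {
        RecordingCollector collector = new RecordingCollector();
        SketchTuple input = new SketchTuple("t1");
        // The output message is derived from "input", so "input" is its anchor.
        List<Integer> tasks = collector.emit("default", Arrays.asList(input), Arrays.asList("hello", 1));
        System.out.println(tasks);
        System.out.println(collector.log);
    }
}
```

In the sketch, the anchor list is what ties the new message back to the input it came from; the returned list is only informational.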

emitDirect method:
Its parameter list is similar to that of emit. The main difference is that a message sent by emitDirect can be received only by the specified Task. This method requires that the stream identified by streamId be declared as a direct stream and that the receiving Task subscribe to it with direct grouping; otherwise an exception is thrown. If no downstream node subscribes to the message, it is not actually sent.
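
The two rules above (the stream must be direct, and an unsubscribed target means nothing is sent) can be sketched as follows. DirectStreamSketch is a hypothetical model, not Storm code; in particular, the use of IllegalArgumentException is an assumption made for the example, since the text only says that an exception is thrown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the checks described for emitDirect.
class DirectStreamSketch {
    private final Map<String, Boolean> streamIsDirect = new HashMap<>();
    private final Set<Integer> subscribedTasks = new HashSet<>();
    final List<String> sent = new ArrayList<>();

    void declareStream(String streamId, boolean direct) {
        streamIsDirect.put(streamId, direct);
    }

    void subscribe(int taskId) {
        subscribedTasks.add(taskId);
    }

    void emitDirect(int taskId, String streamId, List<Object> values) {
        // Rule 1: the stream must have been declared as a direct stream.
        if (!Boolean.TRUE.equals(streamIsDirect.get(streamId))) {
            throw new IllegalArgumentException("'" + streamId + "' is not a direct stream");
        }
        // Rule 2: if the target task does not subscribe, nothing is sent.
        if (!subscribedTasks.contains(taskId)) {
            return;
        }
        sent.add("task " + taskId + " <- " + values);
    }

    public static void main(String[] args) {
        DirectStreamSketch collector = new DirectStreamSketch();
        collector.declareStream("direct-out", true);
        collector.subscribe(7);
        collector.emitDirect(7, "direct-out", Arrays.asList("a")); // delivered
        collector.emitDirect(8, "direct-out", Arrays.asList("b")); // silently dropped
        System.out.println(collector.sent);
    }
}
```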

fail and ack methods:
They report whether the input message was processed successfully: ack marks it as succeeded, and fail marks it as failed.

Storm provides OutputCollector as the default implementation of the IOutputCollector interface. It is actually a proxy: it holds a working IOutputCollector instance, which is defined in Clojure code, and forwards every operation to that instance. OutputCollector is mainly used to send data from an IRichBolt.
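
The proxy arrangement can be pictured with a small sketch. MiniCollector, WorkerCollector and ProxyCollector are hypothetical names invented here, and tuples are reduced to plain strings. The real OutputCollector has more methods, but the assumption is that each follows the same forwarding pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Trimmed-down stand-in for IOutputCollector (tuples are plain strings here).
interface MiniCollector {
    List<Integer> emit(String streamId, List<Object> values);
    void ack(String tupleId);
    void fail(String tupleId);
}

// Plays the role of the working instance that does the real sending.
class WorkerCollector implements MiniCollector {
    final List<String> calls = new ArrayList<>();

    public List<Integer> emit(String streamId, List<Object> values) {
        calls.add("emit " + streamId + " " + values);
        return Arrays.asList(1); // made-up target task id
    }

    public void ack(String tupleId) { calls.add("ack " + tupleId); }

    public void fail(String tupleId) { calls.add("fail " + tupleId); }
}

// Plays the role of OutputCollector: holds a delegate and forwards every call.
class ProxyCollector implements MiniCollector {
    private final MiniCollector delegate;

    ProxyCollector(MiniCollector delegate) { this.delegate = delegate; }

    public List<Integer> emit(String streamId, List<Object> values) {
        return delegate.emit(streamId, values);
    }

    public void ack(String tupleId) { delegate.ack(tupleId); }

    public void fail(String tupleId) { delegate.fail(tupleId); }
}

class ProxySketch {
    public static void main(String[] args) {
        WorkerCollector worker = new WorkerCollector();
        MiniCollector collector = new ProxyCollector(worker);
        collector.emit("default", Arrays.asList("x"));
        collector.ack("t1");
        System.out.println(worker.calls); // every call ended up in the worker
    }
}
```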

IBasicOutputCollector.java

package org.apache.storm.topology;

import java.util.List;
import org.apache.storm.task.IErrorReporter;
import org.apache.storm.tuple.Tuple;

public interface IBasicOutputCollector extends IErrorReporter {
    List<Integer> emit(String streamId, List<Object> tuple);

    void emitDirect(int taskId, String streamId, List<Object> tuple);

    void resetTimeout(Tuple tuple);
}



If IBasicBolt is used, the Storm framework automatically performs the Ack, Fail and Anchor operations on the user's behalf.
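
One way to picture this automation is a wrapper that runs around the user's bolt. The sketch below is a simplified model under stated assumptions, not Storm's actual code: all type names are hypothetical, tuples are plain strings, and SketchFailedException stands in for whatever exception the bolt uses to signal failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception a bolt throws to signal that processing failed.
class SketchFailedException extends RuntimeException {}

// What the user writes: business logic only, with no ack, fail or anchor calls.
interface SketchBasicBolt {
    void execute(String input, SketchBasicCollector collector);
}

// Stand-in collector that remembers the current input so emits can be anchored to it.
class SketchBasicCollector {
    final List<String> events = new ArrayList<>();
    private String currentInput;

    void setContext(String input) { currentInput = input; }

    // Anchoring is implicit: each emit is tied to the input set by setContext.
    void emit(String value) { events.add("emit " + value + " anchoredTo=" + currentInput); }

    void ack(String input) { events.add("ack " + input); }

    void fail(String input) { events.add("fail " + input); }
}

// The framework side: wraps the user's bolt and does the bookkeeping for it.
class SketchBasicExecutor {
    private final SketchBasicBolt bolt;
    private final SketchBasicCollector collector;

    SketchBasicExecutor(SketchBasicBolt bolt, SketchBasicCollector collector) {
        this.bolt = bolt;
        this.collector = collector;
    }

    void execute(String input) {
        collector.setContext(input);      // automatic anchor
        try {
            bolt.execute(input, collector);
            collector.ack(input);         // automatic ack on normal return
        } catch (SketchFailedException e) {
            collector.fail(input);        // automatic fail on signalled failure
        }
    }
}

class BasicBoltSketch {
    public static void main(String[] args) {
        SketchBasicCollector collector = new SketchBasicCollector();
        SketchBasicExecutor executor =
            new SketchBasicExecutor((input, c) -> c.emit(input.toUpperCase()), collector);
        executor.execute("a");
        System.out.println(collector.events);
    }
}
```

Because the ack is issued as soon as execute returns, any message derived from the input has to be emitted inside that same call, which is the source of the limitation discussed in the summary.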

BatchOutputCollector.java

package org.apache.storm.coordination;

import java.util.List;
import org.apache.storm.utils.Utils;

public abstract class BatchOutputCollector {

    public List<Integer> emit(List<Object> tuple) {
        return emit(Utils.DEFAULT_STREAM_ID, tuple);
    }

    public abstract List<Integer> emit(String streamId, List<Object> tuple);

    public void emitDirect(int taskId, List<Object> tuple) {
        emitDirect(taskId, Utils.DEFAULT_STREAM_ID, tuple);
    }

    public abstract void emitDirect(int taskId, String streamId, List<Object> tuple);

    public abstract void flush();

    public abstract void reportError(Throwable error);
}

BatchOutputCollector is the output collector Storm uses for batch processing. Its methods are largely the same as those defined in IBasicOutputCollector. Storm provides BatchOutputCollectorImpl as the default implementation of BatchOutputCollector. It is actually a proxy class that wraps an OutputCollector field, and every method is implemented by calling the corresponding OutputCollector method.
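
The split between the abstract class and its implementation can be sketched in a few lines. All class names below are hypothetical stand-ins; the stream name "default" is the value of Utils.DEFAULT_STREAM_ID in Storm, and the rest is a simplified model rather than the real implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for BatchOutputCollector: the short overload fills in the default stream.
abstract class MiniBatchCollector {
    static final String DEFAULT_STREAM_ID = "default";

    List<Integer> emit(List<Object> values) {
        return emit(DEFAULT_STREAM_ID, values);
    }

    abstract List<Integer> emit(String streamId, List<Object> values);
}

// Stand-in for the wrapped OutputCollector.
class MiniRichCollector {
    final List<String> sent = new ArrayList<>();

    List<Integer> emit(String streamId, List<Object> values) {
        sent.add(streamId + " " + values);
        return Arrays.asList(5); // made-up target task id
    }
}

// Stand-in for BatchOutputCollectorImpl: implements the abstract method by delegation.
class MiniBatchCollectorImpl extends MiniBatchCollector {
    private final MiniRichCollector wrapped;

    MiniBatchCollectorImpl(MiniRichCollector wrapped) { this.wrapped = wrapped; }

    @Override
    List<Integer> emit(String streamId, List<Object> values) {
        return wrapped.emit(streamId, values);
    }
}

class BatchCollectorSketch {
    public static void main(String[] args) {
        MiniRichCollector rich = new MiniRichCollector();
        MiniBatchCollector batch = new MiniBatchCollectorImpl(rich);
        batch.emit(Arrays.asList("row")); // routed to the default stream
        System.out.println(rich.sent);
    }
}
```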

Understanding the Bolt output collectors makes the Bolt interfaces themselves easier to understand. To sum up:

IRichBolt: the interface most commonly used in Storm to define Topology components. It is very flexible: users can implement all kinds of control logic with it and decide for themselves when to perform the Ack, Fail and Anchor operations.

IBasicBolt: the Topology component interface Storm provides for simple logic. For this kind of Bolt, Storm has built-in mechanisms that perform Ack, Fail and Anchor, so it is relatively easy for users to implement their own Bolt on top of it. Its use is limited, however. All messages derived from a received message must be sent within a single execution (otherwise the message has to be cached and numbered), or the built-in Ack mechanism cannot guarantee that the Bolt works correctly. Users should therefore avoid this type of Bolt for operations such as aggregations or joins.

IBatchBolt: the interface Storm provides for processing batches of data. At present it is used only in transactional Topologies, and it is the basis on which Storm implements them.

Tags: Big Data storm

Posted on Thu, 25 Nov 2021 13:38:52 -0500 by delhiris