Ignite 2.11.0 node discovery: principles and source code analysis

Introduction to node discovery

The main goal of the discovery mechanism is to create the topology of Ignite nodes and to build and maintain a consistent in-memory view of it on each node. For example, this view contains the number and order of nodes in the cluster.

The discovery mechanism is represented by the DiscoverySpi interface, and the default implementation is TcpDiscoverySpi. Other implementations exist, such as ZookeeperDiscoverySpi. This article focuses on TcpDiscoverySpi.

The topology is defined by a specific DiscoverySpi implementation. For example, TcpDiscoverySpi defines a ring topology.

When describing the cluster topology, we are talking about logical layouts that exist only at the "discovery" level. For example, when querying data residing in the cache, the cluster may use a different topology than that described in this article.

Ring topology

TcpDiscoverySpi organizes all server nodes in the cluster into a ring structure, where each node sends discovery messages to only a single node (called its "neighbor"). Client nodes are located outside the ring, each connected to a server node. This logic is contained in the ServerImpl class for server nodes and the ClientImpl class for client nodes.
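
To make the "single neighbor" rule concrete, here is a minimal sketch of a ring in plain Java. It is a toy model and not Ignite code: the class RingSketch, its methods, and the node names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a discovery ring: every node forwards to exactly one neighbor. Not Ignite code. */
final class RingSketch {
    private final List<String> nodes = new ArrayList<>();

    RingSketch(List<String> nodeIds) {
        if (nodeIds.isEmpty())
            throw new IllegalArgumentException("ring needs at least one node");
        nodes.addAll(nodeIds);
    }

    /** Returns the single neighbor that the given node sends discovery messages to. */
    String next(String nodeId) {
        int idx = nodes.indexOf(nodeId);
        if (idx < 0)
            throw new IllegalArgumentException("unknown node: " + nodeId);
        // The last node wraps around to the first one, which closes the ring.
        return nodes.get((idx + 1) % nodes.size());
    }

    /** Follows neighbor links from the start node until the message is back where it began. */
    List<String> circulate(String start) {
        List<String> path = new ArrayList<>();
        String cur = start;
        do {
            path.add(cur);
            cur = next(cur);
        } while (!cur.equals(start));
        return path;
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch(List.of("A", "B", "C"));
        System.out.println(ring.next("C"));
        System.out.println(ring.circulate("B"));
    }
}
```

A message that starts at any node visits every server node exactly once before it returns to its origin, which is why one pass around the ring is enough to reach the whole cluster.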

Node joining process

Summary

When a new node starts, it attempts to find an existing cluster by probing the address list provided by TcpDiscoveryIpFinder. If all addresses are unavailable, the node considers itself the only node, forms a cluster from itself, and becomes the coordinator of the cluster. Otherwise, the following node joining process will be performed.
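
The decision described above (probe the address list, and fall back to a single-node cluster if nothing answers) can be sketched as follows. This is a simplified model under stated assumptions: JoinProbeSketch and findExistingCluster are hypothetical names, the addresses are made up, and the probe is passed in as a predicate instead of opening real sockets.

```java
import java.net.InetSocketAddress;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Toy model of the startup probe; not Ignite code. */
final class JoinProbeSketch {
    /**
     * Returns the first address that answers the probe, or empty if none does.
     * An empty result corresponds to the "I am the only node" case.
     */
    static Optional<InetSocketAddress> findExistingCluster(
        List<InetSocketAddress> addrs, Predicate<InetSocketAddress> probe) {
        for (InetSocketAddress addr : addrs) {
            if (probe.test(addr))
                return Optional.of(addr);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical address list, standing in for what an IP finder would return.
        List<InetSocketAddress> addrs = List.of(
            InetSocketAddress.createUnresolved("10.0.0.1", 47500),
            InetSocketAddress.createUnresolved("10.0.0.2", 47500));

        // Stand-in probe: pretend that only the second address is alive.
        Optional<InetSocketAddress> found =
            findExistingCluster(addrs, a -> a.getHostString().equals("10.0.0.2"));

        System.out.println(found.isPresent()
            ? "join via " + found.get()
            : "no cluster found, start as coordinator");
    }
}
```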

The node joining process includes the following stages:

  1. The joining node sends a TcpDiscoveryJoinRequestMessage to a random node in the cluster, and that node forwards the message to the coordinator.
  2. The coordinator places the new node between the last node and itself, and propagates the topology change by sending a TcpDiscoveryNodeAddedMessage around the ring.
  3. After all members of the cluster have received the TcpDiscoveryNodeAddedMessage, the coordinator sends a TcpDiscoveryNodeAddFinishedMessage to complete the change.
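
The three stages can be walked through with a small simulation. This is only a sketch under simplifying assumptions: JoinFlowSketch is a hypothetical class, the coordinator is modeled as index 0 of a list, and forwarding of the join request is reduced to a single log line.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the three join stages; not Ignite code. */
final class JoinFlowSketch {
    /** Ring order; index 0 plays the coordinator in this simplified model. */
    private final List<String> ring = new ArrayList<>();

    JoinFlowSketch(List<String> initialRing) {
        if (initialRing.isEmpty())
            throw new IllegalArgumentException("ring needs at least a coordinator");
        ring.addAll(initialRing);
    }

    /** Runs the three stages for a joining node that first contacts entryNode. */
    List<String> join(String joiningNode, String entryNode) {
        List<String> log = new ArrayList<>();
        String coordinator = ring.get(0);

        // Stage 1: the join request reaches the coordinator, possibly via another node.
        log.add("JoinRequest: " + joiningNode + " -> " + entryNode);
        if (!entryNode.equals(coordinator))
            log.add("JoinRequest forwarded: " + entryNode + " -> " + coordinator);

        // Stage 2: appending places the node between the last node and the coordinator,
        // because the ring wraps around from the last index back to index 0.
        ring.add(joiningNode);
        for (String node : ring)
            log.add("NodeAdded processed by " + node);

        // Stage 3: a second pass around the ring finalizes the change.
        for (String node : ring)
            log.add("NodeAddFinished processed by " + node);

        return log;
    }

    /** Returns a copy of the current ring order. */
    List<String> ring() {
        return new ArrayList<>(ring);
    }

    public static void main(String[] args) {
        JoinFlowSketch flow = new JoinFlowSketch(List.of("coord", "n1", "n2"));
        flow.join("n3", "n1").forEach(System.out::println);
        System.out.println("ring: " + flow.ring());
    }
}
```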

Create connection

The connection created by the client can be traced through the code as follows:

Ignite ignite = Ignition.start(cfg); // client startup code -> IgnitionEx$IgniteNamedInstance grid0.start (line 2112) -> ... -> ClientImpl#spiStart, which starts the IgniteSpiThread thread

In the IgniteSpiThread thread: run:58, IgniteSpiThread (calls body()) -> ... -> joinTopology:629, ClientImpl -> sendJoinRequest:734, ClientImpl -> ... -> openSocket:1592, TcpDiscoverySpi

In the openSocket method, the client creates a socket connection and sends the 4-byte header 0x00004747 to the server.

    /** Network packet header. */
    public static final byte[] IGNITE_HEADER = intToBytes(0x00004747);

    protected Socket openSocket(Socket sock, InetSocketAddress remAddr, IgniteSpiOperationTimeoutHelper timeoutHelper)
        throws IOException, IgniteSpiOperationTimeoutException {
            ...
            writeToSocket(sock, null, U.IGNITE_HEADER, timeoutHelper.nextTimeoutChunk(sockTimeout));
            ...
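
To see what these header bytes look like on the wire, the following sketch encodes the same constant with the JDK only. The local intToBytes here is a stand-in written for this example, and it assumes big-endian (most significant byte first) order; it is not Ignite's own helper.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Encodes the header constant with JDK classes only; not Ignite code. */
final class HeaderBytesSketch {
    /** Encodes an int as 4 bytes, most significant byte first (ByteBuffer's default order). */
    static byte[] intToBytes(int val) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(val).array();
    }

    /** Renders bytes as lowercase hex, two digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes)
            sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] hdr = intToBytes(0x00004747);
        System.out.println(toHex(hdr));
        // The two non-zero bytes are both 0x47, the ASCII code for 'G'.
        System.out.println(new String(hdr, 2, 2, StandardCharsets.US_ASCII));
    }
}
```

Under that byte-order assumption, a packet capture of a new discovery connection would begin with the bytes 00 00 47 47.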

[Figure: Wireshark packet capture]

TcpDiscoveryJoinRequestMessage

The node starts the joining process by calling ServerImpl#joinTopology (for server nodes) or ClientImpl#joinTopology (for client nodes), which then calls TcpDiscoverySpi#collectExchangeData to collect all the necessary discovery data (for example, cache configuration from GridCacheProcessor; see the different GridComponent#collectJoiningNodeData implementations). The data is packaged into a join request (TcpDiscoveryJoinRequestMessage) and sent to the coordinator.

Taking the client-side ClientImpl as an example, the joinTopology method calls the sendJoinRequest method, as follows
(code path: run:58, IgniteSpiThread -> body:317, ClientImpl -> ... -> tryJoin:2108, ClientImpl$MessageWorker -> ... -> joinTopology:629, ClientImpl -> sendJoinRequest, ClientImpl)

    @Nullable private T3<SocketStream, Integer, Boolean> sendJoinRequest(boolean recon,
        InetSocketAddress addr) {
                ...
                // Send TcpDiscoveryHandshakeRequest message to the server
                TcpDiscoveryHandshakeRequest req = new TcpDiscoveryHandshakeRequest(locNodeId);
                req.client(true);
                spi.writeToSocket(sock, req, timeoutHelper.nextTimeoutChunk(spi.getSocketTimeout()));
                  ...       
                    // collectExchangeData to collect all necessary discovery data
                    if (discoveryData == null)
                        discoveryData = spi.collectExchangeData(new DiscoveryDataPacket(getLocalNodeId()));
                    // The data is packaged into the join request and sent to the coordinator
                    TcpDiscoveryJoinRequestMessage joinReqMsg = new TcpDiscoveryJoinRequestMessage(node, discoveryData);

[Figure: structure of the discovery data]

When the coordinator receives the request, it validates the message and, if the validation is successful, generates a TcpDiscoveryNodeAddedMessage (see ServerImpl.RingMessageWorker#processJoinRequestMessage). This message is then sent around the ring.

Code example of the server side (ServerImpl) processing TcpDiscoveryJoinRequestMessage:

        @Override protected void body() throws InterruptedException {
                       ...
                        else if (msg instanceof TcpDiscoveryJoinRequestMessage) {
                            TcpDiscoveryJoinRequestMessage req = (TcpDiscoveryJoinRequestMessage)msg;

                            if (!req.responded()) {
                                boolean ok = processJoinRequestMessage(req, clientMsgWrk);

                                if (clientMsgWrk != null && ok)
                                    continue;
                                else
                                    // Direct join request - no need to handle this socket anymore.
                                    break;
                            }
                        }
                        ...

Code example for generating TcpDiscoveryNodeAddedMessage:

        private void processJoinRequestMessage(final TcpDiscoveryJoinRequestMessage msg) {
                ...
                //Generate TcpDiscoveryNodeAddedMessage
                TcpDiscoveryNodeAddedMessage nodeAddedMsg = new TcpDiscoveryNodeAddedMessage(locNodeId,
                    node, data, spi.gridStartTime);

                nodeAddedMsg = tracing.messages().branch(nodeAddedMsg, msg);

                nodeAddedMsg.client(msg.client());

                processNodeAddedMessage(nodeAddedMsg);

                tracing.messages().finishProcessing(nodeAddedMsg);
                ...

TcpDiscoveryNodeAddedMessage

When processing TcpDiscoveryNodeAddedMessage, each node in the cluster applies the joining node's discovery data to its components, collects its own local discovery data, and adds it to the message (see ServerImpl.RingMessageWorker#processNodeAddedMessage for details). The message is then propagated further around the ring by calling ServerImpl.RingMessageWorker#sendMessageAcrossRing.
When TcpDiscoveryNodeAddedMessage completes the full circle and reaches the coordinator again, the coordinator consumes it and issues a TcpDiscoveryNodeAddFinishedMessage.

TcpDiscoveryNodeAddedMessage is also passed to the joining node, which receives the message after all other nodes have processed it.
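
The way the message gathers data on its way around the ring can be illustrated with a toy model. Everything here is hypothetical (the NodeAddedPassSketch class, the NodeAddedMsg stand-in, and the string payloads); the sketch only assumes what is described above: existing nodes apply the joining node's data and append their own, and the joining node sees the message last.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a NodeAdded message travelling around the ring; not Ignite code. */
final class NodeAddedPassSketch {
    /** Hypothetical stand-in for the message: joining node data plus data collected on the way. */
    static final class NodeAddedMsg {
        final String joiningNode;
        final String joiningNodeData;
        final Map<String, String> collected = new LinkedHashMap<>();

        NodeAddedMsg(String joiningNode, String joiningNodeData) {
            this.joiningNode = joiningNode;
            this.joiningNodeData = joiningNodeData;
        }
    }

    /** Passes the message through the ring in order and records what each node does with it. */
    static List<String> propagate(List<String> ring, NodeAddedMsg msg) {
        List<String> steps = new ArrayList<>();
        for (String node : ring) {
            if (node.equals(msg.joiningNode)) {
                // The joining node sees what every earlier node added to the message.
                steps.add(node + " receives data collected from " + msg.collected.keySet());
            }
            else {
                // An existing node applies the joining node's data, then appends its own.
                steps.add(node + " applies " + msg.joiningNodeData);
                msg.collected.put(node, "data-of-" + node);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        NodeAddedMsg msg = new NodeAddedMsg("new", "cfg-new");
        // Ring order with the coordinator first and the joining node last.
        propagate(List.of("coord", "n1", "new"), msg).forEach(System.out::println);
    }
}
```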

Code example:

        private void processNodeAddedMessage(TcpDiscoveryNodeAddedMessage msg) {
                    ...
                    DiscoveryDataPacket dataPacket = msg.gridDiscoveryData();
                    ...
                    if (dataPacket.hasJoiningNodeData()) {
                        if (spiState == CONNECTED) {
                            // A node already connected to the cluster can apply the joining node's disco data immediately.
                            // Apply the joining node's discovery data to the local components.
                            spi.onExchange(dataPacket, U.resolveClassLoader(spi.ignite().configuration()));

                            // Collect this node's own discovery data and add it to the packet.
                            if (!node.isDaemon())
                                spi.collectExchangeData(dataPacket);
                        }
                    ...

TcpDiscoveryNodeAddFinishedMessage

TcpDiscoveryNodeAddFinishedMessage completes the node joining process. When this message is received, each node triggers the NODE_JOINED event to notify the discovery manager about the newly joined node.

NodeAddFinished and additional join requests
If the joining node does not receive the TcpDiscoveryNodeAddFinishedMessage in time, it sends an additional join request. The waiting time is defined by TcpDiscoverySpi#networkTimeout, which defaults to 5 seconds (TcpDiscoverySpi#DFLT_NETWORK_TIMEOUT).

        private void processNodeAddFinishedMessage(TcpDiscoveryNodeAddFinishedMessage msg) {
                ...
                if (state == CONNECTED) {
                    // Trigger the NODE_JOINED event
                    boolean notified = notifyDiscovery(EVT_NODE_JOINED, topVer, node, msg.spanContainer());
                ...
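
The resend behavior for a missing TcpDiscoveryNodeAddFinishedMessage can be modeled with a blocking queue and a timed wait. This is a sketch, not the Ignite implementation: JoinRetrySketch, joinWithRetry, and the maxAttempts cap are hypothetical, and the demo uses a 50 ms timeout instead of the 5 second default so that it finishes quickly.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Toy model of "resend the join request if no finish message arrives in time"; not Ignite code. */
final class JoinRetrySketch {
    /**
     * Sends a join request, waits up to networkTimeoutMs for a finish message, and resends
     * if nothing arrived. Returns the number of requests sent, or -1 if the finish message
     * never arrived within maxAttempts (a hypothetical cap) or the wait was interrupted.
     */
    static int joinWithRetry(Runnable sendJoinRequest, BlockingQueue<String> inbox,
        long networkTimeoutMs, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            sendJoinRequest.run();
            try {
                String msg = inbox.poll(networkTimeoutMs, TimeUnit.MILLISECONDS);
                if (msg != null)
                    return attempt;
            }
            catch (InterruptedException e) {
                // Preserve the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        int[] sent = {0};
        // The simulated cluster answers only after the second request.
        Runnable send = () -> {
            if (++sent[0] == 2)
                inbox.add("NodeAddFinished");
        };
        System.out.println("requests sent: " + joinWithRetry(send, inbox, 50, 5));
    }
}
```

In a real configuration the timeout itself would be changed through TcpDiscoverySpi#setNetworkTimeout rather than passed around as a parameter.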

Posted on Thu, 28 Oct 2021 01:50:28 -0400 by Code_guy