How FusionInsight Spark supports the multi-instance feature of JDBC server

Abstract: The HA scheme in multi-master instance mode not only avoids service interruption during active/standby switchover, achieving little or no interruption, but also improves concurrency by scaling the cluster out horizontally.

This article is shared from the Huawei Cloud community post "FusionInsight Spark supports the multi-instance feature of JDBC server", author: a walnut.

The high-availability scheme builds on the JDBC server that already exists in the community and adopts a multi-master instance mode. Multiple JDBC server services can coexist in the cluster, and a client connects to a randomly chosen one for its business operations. Even if one or more JDBC server services in the cluster stop working, users can still connect to the remaining healthy JDBC server services through the same client interface.

Compared with the HA scheme in active/standby mode, multi-master instance mode addresses the following two problems.

  • In active/standby mode, the service is unavailable for a period of time whenever an active/standby switchover occurs. The JDBC server cannot control how long this lasts, because it depends on the resources of the Yarn service.
  • Spark provides services through a Thrift JDBC interface similar to HiveServer2, and users access it through Beeline and JDBC interfaces. The processing capacity of a JDBC server cluster therefore depends on the single-node capacity of the active server, so scalability is limited.

The HA scheme in multi-master instance mode not only avoids service interruption during active/standby switchover, achieving little or no interruption, but also improves concurrency by scaling the cluster out horizontally.

Implementation scheme

The following figure shows how the HA scheme works in multi-master instance mode.

1. When a JDBC server starts, it registers its own information with ZooKeeper by writing a node in the specified directory. The node contains the instance's IP address, port, version number, serial number, and other information (entries for multiple nodes are separated by commas).

Examples are as follows:

[serverUri=192.168.169.84:22550;version=8.1.2;sequence=0000001244,serverUri=192.168.195.232:22550;version=8.1.2;sequence=0000001242,serverUri=192.168.81.37:22550;version=8.1.2;sequence=0000001243]

2. When a client connects to the JDBC server, it must specify the namespace, that is, the ZooKeeper directory whose JDBC server instances it wants to access. On connection, one instance is randomly selected from that namespace. See "URL connection introduction" for details.

3. After the client successfully connects to the JDBC server service, it sends SQL statements to the JDBC server service.

4. After the JDBC server service executes the SQL statement sent by the client, it returns the result to the client.
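Steps 1 and 2 can be illustrated with a small Python sketch. It is not the actual client code: the function names `parse_instances` and `pick_instance` are hypothetical, and the sketch only assumes the comma- and semicolon-separated node format shown in the example above.

```python
import random


def parse_instances(node_data):
    """Parse '[serverUri=...;version=...;sequence=...,serverUri=...]' into dicts."""
    entries = node_data.strip().strip("[]").split(",")
    instances = []
    for entry in entries:
        fields = {}
        for pair in entry.split(";"):
            key, _, value = pair.strip().partition("=")
            if key:
                fields[key] = value.strip()
        instances.append(fields)
    return instances


def pick_instance(instances, rng=random):
    """Pick one registered instance uniformly at random and return its serverUri."""
    return rng.choice(instances)["serverUri"]


node_data = (
    "[serverUri=192.168.169.84:22550;version=8.1.2;sequence=0000001244,"
    "serverUri=192.168.195.232:22550;version=8.1.2;sequence=0000001242,"
    "serverUri=192.168.81.37:22550;version=8.1.2;sequence=0000001243]"
)
instances = parse_instances(node_data)
print(pick_instance(instances))  # one of the three serverUri values
```

Because every instance is equivalent, the sketch does not rank the entries by version or sequence number; it simply chooses one at random.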

In the HA scheme, each JDBC server service (that is, each instance) is independent and equivalent. When one instance is being upgraded or its service is interrupted, the other instances can still accept client connection requests.

The multi-master instance scheme follows these rules:

  • When an instance exits abnormally, the other instances do not take over its sessions or the business running on it.
  • When a JDBC server process stops, its corresponding node on ZooKeeper is deleted.
  • Because the client selects a server at random, sessions may be distributed unevenly, which can lead to load imbalance between instances.
  • After an instance enters maintenance mode (in which it no longer accepts new client connections), business still running on that instance may fail when the decommissioning timeout is reached.
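The rules about node deletion and random selection can be illustrated with a toy simulation. The sketch below uses a hypothetical in-memory `InstanceRegistry` class as a stand-in for the ZooKeeper directory; it does not talk to a real ZooKeeper, and the fixed random seed is only there to make runs repeatable.

```python
import random
from collections import Counter


class InstanceRegistry:
    """Hypothetical in-memory stand-in for the ZooKeeper namespace directory."""

    def __init__(self):
        self._nodes = {}

    def start(self, server_uri, version="8.1.2"):
        # On startup an instance writes its own node.
        self._nodes[server_uri] = {"serverUri": server_uri, "version": version}

    def stop(self, server_uri):
        # When the process stops, its node is deleted.
        self._nodes.pop(server_uri, None)

    def live_instances(self):
        return sorted(self._nodes)


def simulate_sessions(registry, session_count, seed=0):
    """Assign each new session to a random live instance and count sessions per instance."""
    live = registry.live_instances()
    if not live:
        raise RuntimeError("no live JDBC server instances")
    rng = random.Random(seed)
    return Counter(rng.choice(live) for _ in range(session_count))


registry = InstanceRegistry()
for uri in ("192.168.169.84:22550", "192.168.195.232:22550", "192.168.81.37:22550"):
    registry.start(uri)

print(simulate_sessions(registry, 30))  # sessions per instance; the split need not be even

registry.stop("192.168.81.37:22550")
print(simulate_sessions(registry, 30))  # only the two remaining instances receive sessions
```

Sessions that were already open on the stopped instance are not modelled here, which matches the rule that other instances do not take them over.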

URL connection introduction

Multi-master instance mode

In multi-master instance mode, the client reads the content of the ZooKeeper node and connects to the corresponding JDBC server service. The connection strings are as follows:

  • In safe mode:
    • The JDBC URL in Kinit authentication mode is as follows:
jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver2x;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<system domain name>@<system domain name>;

Note:

      • "<zkNode_IP>:<zkNode_Port>" is the URL of a ZooKeeper node. Multiple URLs are separated by commas.

For example: "192.168.81.37:24002,192.168.195.232:24002,192.168.169.84:24002".

      • "sparkthriftserver2x" is the directory on ZooKeeper from which the client randomly selects a JDBC server instance to connect to.

Example: to connect through the Beeline client in safe mode, execute the following command:

sh CLIENT_HOME/spark/bin/beeline -u "jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver2x;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<system domain name>@<system domain name>;"
    • The JDBC URL in Keytab authentication mode is as follows:
jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver2x;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<system domain name>@<system domain name>;user.principal=<principal_name>;user.keytab=<path_to_keytab>

Here, <principal_name> is the principal of the Kerberos user, for example "test@<system domain name>", and <path_to_keytab> is the path of the keytab file corresponding to <principal_name>, for example "/opt/auth/test/user.keytab".

  • In normal mode:
jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver2x;

Example: when connecting through Beeline client in normal mode, execute the following command:

sh CLIENT_HOME/spark/bin/beeline -u "jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver2x;"
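The connection strings above share one structure: a comma-separated ZooKeeper node list followed by semicolon-separated parameters. As a rough sketch of that structure, the Python helper below assembles the three variants. The function name `build_multi_master_url` and the domain `EXAMPLE.COM` are hypothetical, and writing the `hadoop.` part of the principal in lowercase is an assumption; check the value used in your own cluster.

```python
def build_multi_master_url(zk_nodes, namespace="sparkthriftserver2x",
                           system_domain=None, user_principal=None, keytab_path=None):
    """Assemble a multi-master JDBC URL from ZooKeeper 'ip:port' strings.

    system_domain=None gives the normal-mode URL; otherwise the safe-mode
    (Kerberos) parameters are added. user_principal and keytab_path together
    add the Keytab authentication parameters.
    """
    if not zk_nodes:
        raise ValueError("at least one ZooKeeper node is required")
    params = [
        "serviceDiscoveryMode=zooKeeper",
        f"zooKeeperNamespace={namespace}",
    ]
    if system_domain is not None:
        params += [
            "saslQop=auth-conf",
            "auth=KERBEROS",
            # Assumed principal layout, following the connection strings above.
            f"principal=spark2x/hadoop.{system_domain}@{system_domain}",
        ]
        if user_principal and keytab_path:
            params += [f"user.principal={user_principal}", f"user.keytab={keytab_path}"]
    return "jdbc:hive2://" + ",".join(zk_nodes) + "/;" + ";".join(params) + ";"


zk_nodes = ["192.168.81.37:24002", "192.168.195.232:24002", "192.168.169.84:24002"]
print(build_multi_master_url(zk_nodes))                               # normal mode
print(build_multi_master_url(zk_nodes, system_domain="EXAMPLE.COM"))  # Kinit mode
print(build_multi_master_url(zk_nodes, system_domain="EXAMPLE.COM",
                             user_principal="test@EXAMPLE.COM",
                             keytab_path="/opt/auth/test/user.keytab"))  # Keytab mode
```

The resulting string can be passed to Beeline with the -u option in the same way as the hand-written URLs.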

Non-multi-master instance mode

In non-multi-master instance mode, the client connects to one specified JDBC server node. Compared with the connection string of multi-master instance mode, the ZooKeeper-related parameters "serviceDiscoveryMode" and "zooKeeperNamespace" are removed.

Example: to connect in non-multi-master instance mode through the Beeline client in safe mode, execute the following command:

sh CLIENT_HOME/spark/bin/beeline -u "jdbc:hive2://<server_IP>:<server_Port>/;user.principal=spark2x/hadoop.<system domain name>@<system domain name>;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<system domain name>@<system domain name>;"

Note:

  • "<server_IP>:<server_Port>" is the URL of the specified JDBC server node.
  • "CLIENT_HOME" refers to the client path.
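To make the difference between the two connection modes concrete, the following Python sketch splits a connection string into its host list and parameters and checks whether the two ZooKeeper parameters are present. The function names and the domain `EXAMPLE.COM` are hypothetical, and this is not how the Hive JDBC driver itself parses URLs.

```python
def parse_hive2_url(url):
    """Split a 'jdbc:hive2://hosts/;key=value;...' string into hosts and parameters."""
    prefix = "jdbc:hive2://"
    if not url.startswith(prefix):
        raise ValueError("not a jdbc:hive2 URL")
    # Only the first '/' ends the host list; later ones belong to parameter values.
    authority, _, rest = url[len(prefix):].partition("/")
    hosts = [h.strip() for h in authority.split(",") if h.strip()]
    params = {}
    for pair in rest.split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return hosts, params


def uses_service_discovery(url):
    """True when the URL carries both ZooKeeper discovery parameters."""
    _, params = parse_hive2_url(url)
    return (params.get("serviceDiscoveryMode", "").lower() == "zookeeper"
            and "zooKeeperNamespace" in params)


multi = ("jdbc:hive2://192.168.81.37:24002,192.168.195.232:24002/;"
         "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver2x;")
direct = ("jdbc:hive2://192.168.169.84:22550/;saslQop=auth-conf;auth=KERBEROS;"
          "principal=spark2x/hadoop.EXAMPLE.COM@EXAMPLE.COM;")

print(uses_service_discovery(multi))   # True
print(uses_service_discovery(direct))  # False
```

In the first string the hosts are ZooKeeper nodes, while in the second the single host is the JDBC server itself.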

Apart from the connection method, the JDBC server interface is used in the same way in non-multi-master and multi-master instance modes. Because the Spark JDBC server is another implementation of Hive's HiveServer2, see the Hive official website for detailed usage: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients.

 



Posted on Wed, 01 Dec 2021 12:47:39 -0500 by lurius