ShardingSphere source code analysis (V) -- execution engine

Official introduction

The link is as follows:
https://shardingsphere.apache.org/document/current/cn/features/sharding/principle/rewrite/

ShardingSphere uses an automated execution engine that is responsible for sending the real SQL, after routing and rewriting, to the underlying data sources safely and efficiently. It does not simply send SQL to the data source through JDBC, nor does it just submit execution requests to a thread pool for concurrent execution. Instead, it focuses on balancing the cost of creating data source connections against memory usage, while making the most reasonable use of concurrency. The goal of the execution engine is to balance resource control and execution efficiency automatically.

  • Connection mode
    • Memory restriction mode (MEMORY_STRICTLY)
      This mode assumes that ShardingSphere does not limit the number of database connections consumed by one operation. If the actual SQL needs to operate on 200 tables in one database instance, a new database connection is created for each table and the tables are processed concurrently with multiple threads to maximize execution efficiency. In addition, when the SQL allows it, stream merging is preferred to prevent memory overflow or frequent garbage collection.
    • Connection restriction mode (CONNECTION_STRICTLY)
      This mode assumes that ShardingSphere strictly controls the number of database connections consumed by one operation. If the actual SQL needs to operate on 200 tables in one database instance, only one database connection is created and the 200 tables are processed serially. If the shards of one operation are spread across different databases, multiple threads are still used to handle the different databases, but each database still gets only one connection per operation. This prevents a single request from occupying too many database connections. This mode always uses memory merging.

The memory restriction mode suits OLAP operations, where relaxing the limit on database connections improves system throughput. The connection restriction mode suits OLTP operations. OLTP statements usually carry a sharding key and are routed to a single shard, so strictly controlling database connections is a sensible choice that leaves the online system's database resources available to more applications.

  • Automated execution engine
    ShardingSphere initially left the choice of mode to user configuration, so that developers could pick memory restriction mode or connection restriction mode according to the needs of their business scenario.
    This approach pushes a difficult trade-off onto users, who must understand the advantages and disadvantages of both modes before choosing. That raises the cost of learning and using ShardingSphere, so it is not the best solution.
    To lower the cost for users and to make the connection mode dynamic, ShardingSphere developed the idea of an automated execution engine and absorbed the concept of connection mode internally. Users no longer need to know what memory restriction mode and connection restriction mode are; the execution engine selects the best execution plan for the current scenario automatically.
    The automated execution engine refines the granularity of connection mode selection down to each SQL operation. For every SQL request, it evaluates the routing result in real time and independently picks the appropriate connection mode, to reach the best balance between resource control and efficiency. Users only need to configure max-connections-size-per-query, the maximum number of connections that one query may use on each database.
    The execution engine works in two stages: preparation and execution.
    • Preparation stage
      As the name suggests, this stage prepares the data needed for execution. It consists of two steps: result set grouping and execution unit creation.
    • Execution stage
      This stage performs the real SQL execution. It consists of two steps: group execution and merged result set generation.
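
The per-request decision described above can be sketched roughly as follows. This is a minimal illustration, not ShardingSphere's own code: the class ConnectionModeSketch, the method chooseModes, and the map-based routing result are hypothetical names invented for this example. Only the two mode names and the comparison against max-connections-size-per-query are taken from ShardingSphere (the same comparison appears in the AbstractExecutionPrepareEngine excerpt later in this article).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConnectionModeSketch {

    enum ConnectionMode { MEMORY_STRICTLY, CONNECTION_STRICTLY }

    // Hypothetical helper: picks a connection mode for each data source touched by one SQL request.
    static Map<String, ConnectionMode> chooseModes(Map<String, List<String>> sqlsByDataSource, int maxConnectionsSizePerQuery) {
        Map<String, ConnectionMode> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : sqlsByDataSource.entrySet()) {
            int sqlCount = entry.getValue().size();
            // More SQL than allowed connections: connections must be shared, so restrict connections.
            // Otherwise every SQL can have its own connection, so restrict memory instead.
            ConnectionMode mode = maxConnectionsSizePerQuery < sqlCount
                    ? ConnectionMode.CONNECTION_STRICTLY : ConnectionMode.MEMORY_STRICTLY;
            result.put(entry.getKey(), mode);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed routing result: two actual SQL on ds_0, three on ds_1.
        Map<String, List<String>> routed = new LinkedHashMap<>();
        routed.put("ds_0", Arrays.asList("SELECT * FROM t_order_0", "SELECT * FROM t_order_1"));
        routed.put("ds_1", Arrays.asList("SELECT * FROM t_order_0", "SELECT * FROM t_order_1", "SELECT * FROM t_order_2"));
        // prints {ds_0=MEMORY_STRICTLY, ds_1=CONNECTION_STRICTLY}
        System.out.println(chooseModes(routed, 2));
    }
}
```

With max-connections-size-per-query set to 2, the data source that receives two statements can give each statement its own connection, while the one that receives three cannot, so the two data sources end up in different modes within the same request.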

The overall structure of the execution engine is divided as shown in the figure below.

Debug

Modify the configuration in examples/shardingsphere-jdbc-example/sharding-example/sharding-raw-jdbc-example/src/main/resources/sharding-databases.yaml:

props:
  sql-show: true # Print SQL
  max-connections-size-per-query: 2 # Maximum number of connections one query may use per database

Run examples/shardingsphere-jdbc-example/sharding-example/sharding-raw-jdbc-example/src/main/java/org/apache/shardingsphere/example/sharding/raw/jdbc/YamlRangeConfigurationExampleMain.java

After startup, the maximum number of connections per database is set from the max-connections-size-per-query value we configured.

Next come some familiar classes. After routing and rewriting, the SQL is executed:

// KernelProcessor.java
public ExecutionContext generateExecutionContext(LogicSQL logicSQL, ShardingSphereMetaData metaData, ConfigurationProperties props) {
    // Route
    RouteContext routeContext = this.route(logicSQL, metaData, props);
    // Rewrite
    SQLRewriteResult rewriteResult = this.rewrite(logicSQL, metaData, props, routeContext);
    // Build the execution context from the routing and rewriting results
    ExecutionContext result = this.createExecutionContext(logicSQL, metaData, routeContext, rewriteResult);
    // Print the SQL
    this.logSQL(logicSQL, props, result);
    return result;
}

private void logSQL(LogicSQL logicSQL, ConfigurationProperties props, ExecutionContext executionContext) {
    // Only print when the sql-show switch is enabled in the configuration file
    if ((Boolean)props.getValue(ConfigurationPropertyKey.SQL_SHOW)) {
        // Print the SQL
        SQLLogger.logSQL(logicSQL, (Boolean)props.getValue(ConfigurationPropertyKey.SQL_SIMPLE), executionContext);
    }
}

The printed SQL is as follows

After the ExecutionContext is generated, the execute method of the execution engine is called:

// ExecutorEngine.java
public <I, O> List<O> execute(ExecutionGroupContext<I> executionGroupContext, ExecutorCallback<I, O> firstCallback, ExecutorCallback<I, O> callback, boolean serial) throws SQLException {
    if (executionGroupContext.getInputGroups().isEmpty()) {
        return Collections.emptyList();
    } else {
        // Decide between serial and parallel execution.
        // The serial flag passed in by JDBCExecutor is false by default, so the engine executes in parallel by default.
        return serial ? this.serialExecute(executionGroupContext.getInputGroups().iterator(), firstCallback, callback) : this.parallelExecute(executionGroupContext.getInputGroups().iterator(), firstCallback, callback);
    }
}

private <I, O> List<O> parallelExecute(Iterator<ExecutionGroup<I>> executionGroups, ExecutorCallback<I, O> firstCallback, ExecutorCallback<I, O> callback) throws SQLException {
    // Take the first group out of the iterator; it is executed synchronously below
    ExecutionGroup<I> firstInputs = (ExecutionGroup)executionGroups.next();
    // Execute the remaining groups asynchronously
    Collection<ListenableFuture<Collection<O>>> restResultFutures = this.asyncExecute(executionGroups, callback);
    // Execute the first group synchronously, then collect the results of all groups
    return this.getGroupResults(this.syncExecute(firstInputs, null == firstCallback ? callback : firstCallback), restResultFutures);
}
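
The pattern in parallelExecute (first group on the calling thread, remaining groups on a thread pool) can be reproduced with plain java.util.concurrent. The sketch below is a simplified stand-in written for this article, not the ShardingSphere implementation: the class ParallelExecuteSketch, its parallelExecute method, and the use of Function and ExecutorService in place of ExecutorCallback and ListenableFuture are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelExecuteSketch {

    // Hypothetical stand-in: runs the first group on the calling thread and the rest on the pool.
    static <I, O> List<O> parallelExecute(List<I> groups, Function<I, O> callback, ExecutorService pool) {
        List<O> results = new ArrayList<>();
        if (groups.isEmpty()) {
            return results;
        }
        Iterator<I> iterator = groups.iterator();
        I first = iterator.next();
        // Submit the remaining groups first, so they run while the caller handles the first group.
        List<Future<O>> restFutures = new ArrayList<>();
        while (iterator.hasNext()) {
            I group = iterator.next();
            Callable<O> task = () -> callback.apply(group);
            restFutures.add(pool.submit(task));
        }
        // Execute the first group synchronously on the current thread.
        results.add(callback.apply(first));
        // Collect the asynchronous results in submission order.
        for (Future<O> future : restFutures) {
            try {
                results.add(future.get());
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            } catch (ExecutionException ex) {
                throw new IllegalStateException(ex.getCause());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<String> groups = Arrays.asList("ds_0", "ds_1", "ds_2");
            // The thread name in each line shows which groups ran on the caller and which on the pool.
            parallelExecute(groups, ds -> ds + " executed on " + Thread.currentThread().getName(), pool)
                    .forEach(System.out::println);
        } finally {
            pool.shutdown();
        }
    }
}
```

Running the first group on the caller means the calling thread does useful work instead of only waiting, and a request with a single group never touches the pool at all.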

The debug information shows that the connection mode of this SQL is memory restriction mode (MEMORY_STRICTLY).

Let's find where the connection mode is set:

// AbstractExecutionPrepareEngine.java
public final ExecutionGroupContext<T> prepare(RouteContext routeContext, Collection<ExecutionUnit> executionUnits) throws SQLException {
    ...
    // More SQL units than allowed connections: CONNECTION_STRICTLY; otherwise MEMORY_STRICTLY
    ConnectionMode connectionMode = this.maxConnectionsSizePerQuery < sqlUnits.size() ? ConnectionMode.CONNECTION_STRICTLY : ConnectionMode.MEMORY_STRICTLY;
    ...
}

The documentation says:

Within the range allowed by maxConnectionSizePerQuery, obtain the group of SQL routing results that each database instance needs to execute, and calculate the optimal connection mode for this request.
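
To make the grouping step concrete, here is a small sketch of how the SQL units of one database could be split into groups, one group per connection. It is written under assumptions: the class SqlUnitGroupingSketch and the method partition are hypothetical, and the splitting rule (ceiling division, so that no more than max-connections-size-per-query groups are produced) is an assumed rule chosen to be consistent with the comparison in prepare above, not a copy of the ShardingSphere source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SqlUnitGroupingSketch {

    // Hypothetical helper: splits the SQL units of one data source into at most
    // maxConnectionsSizePerQuery groups; each group would be executed on one connection.
    static <T> List<List<T>> partition(List<T> sqlUnits, int maxConnectionsSizePerQuery) {
        if (maxConnectionsSizePerQuery < 1) {
            throw new IllegalArgumentException("maxConnectionsSizePerQuery must be at least 1");
        }
        // Ceiling division: how many SQL units each connection has to execute.
        int unitsPerConnection = Math.max((sqlUnits.size() + maxConnectionsSizePerQuery - 1) / maxConnectionsSizePerQuery, 1);
        List<List<T>> groups = new ArrayList<>();
        for (int from = 0; from < sqlUnits.size(); from += unitsPerConnection) {
            int to = Math.min(from + unitsPerConnection, sqlUnits.size());
            groups.add(new ArrayList<>(sqlUnits.subList(from, to)));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> sqlUnits = Arrays.asList("t_order_0", "t_order_1", "t_order_2", "t_order_3", "t_order_4");
        // Five units, two connections allowed: each connection executes several units serially.
        // prints [[t_order_0, t_order_1, t_order_2], [t_order_3, t_order_4]]
        System.out.println(partition(sqlUnits, 2));
        // Five units, eight connections allowed: every unit gets its own connection.
        // prints [[t_order_0], [t_order_1], [t_order_2], [t_order_3], [t_order_4]]
        System.out.println(partition(sqlUnits, 8));
    }
}
```

In the first call a group holds more than one unit, which is exactly the case where maxConnectionsSizePerQuery < sqlUnits.size() and the mode is CONNECTION_STRICTLY. In the second call every group holds a single unit, which corresponds to MEMORY_STRICTLY.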

Execution completes here. The table creation statement returns two zeros, because creating a table affects 0 rows:

// JDBCLockEngine.java
private <T> List<T> doExecute(ExecutionGroupContext<JDBCExecutionUnit> executionGroupContext, Collection<RouteUnit> routeUnits, JDBCExecutorCallback<T> callback, SQLStatement sqlStatement) throws SQLException {
    List<T> results = this.jdbcExecutor.execute(executionGroupContext, callback);
    this.refreshMetadata(sqlStatement, routeUnits);
    return results;
}


We can see that the tables have been created in the database as well.

Summary

The default value of max-connections-size-per-query is 1. In general, this value should not be set too high. As with the size of a database connection pool, something around 10-30 is usually enough, and it should not exceed 100, since too many connections consume resources without improving execution efficiency.

Posted on Thu, 02 Sep 2021 13:47:58 -0400 by irandoct