This article was first published on the Nebula Graph Community official account.
In the previous article, we covered the Optimizer of the Query Engine. In this article, we explain the remaining parts of the Query Engine: the Scheduler and the Executor.
Summary
In the execution phase, the execution engine hands the physical execution plan generated by the Planner to the Scheduler, which turns it into a series of Executors and drives their execution.
Each PlanNode in the physical execution plan corresponds to one Executor.
Source location
The source code of the Scheduler is in the src/scheduler directory:
```
src/scheduler
├── AsyncMsgNotifyBasedScheduler.cpp
├── AsyncMsgNotifyBasedScheduler.h
├── CMakeLists.txt
├── Scheduler.cpp
└── Scheduler.h
```
The Scheduler abstract class defines the public scheduler interface; a variety of schedulers can be implemented by inheriting from it.
At present, one implementation is provided: AsyncMsgNotifyBasedScheduler, which relies on asynchronous message passing and breadth-first search to avoid stack overflow.
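For orientation, here is a minimal sketch of what such a base class can look like. It is simplified, and the member names and include paths are illustrative rather than copied from the repository:

```cpp
#include <folly/futures/Future.h>

namespace nebula {

class Status;        // Nebula's status/error type
namespace graph {
class QueryContext;  // holds the plan, runtime context, etc.

// Simplified sketch of the scheduler abstraction.
class Scheduler {
 public:
  explicit Scheduler(QueryContext* qctx) : qctx_(qctx) {}
  virtual ~Scheduler() = default;

  // Drive the whole physical plan; the future resolves when the root
  // executor (and therefore the whole plan) has finished.
  virtual folly::Future<Status> schedule() = 0;

 protected:
  QueryContext* qctx_{nullptr};
};

// AsyncMsgNotifyBasedScheduler derives from this interface and implements
// schedule() with the BFS + promise/future notification scheme described
// in the next section.

}  // namespace graph
}  // namespace nebula
```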
The source code of the Executor is in the src/executor directory:
```
src/executor
├── admin
├── algo
├── CMakeLists.txt
├── ExecutionError.h
├── Executor.cpp
├── Executor.h
├── logic
├── maintain
├── mutate
├── query
├── StorageAccessExecutor.cpp
├── StorageAccessExecutor.h
└── test
```
Execution process
First, the Scheduler starts from the root node of the execution plan, traverses the whole plan with a breadth-first search, and builds a message notification mechanism between nodes according to their execution dependencies.
During execution, a node is scheduled to run only after it has received messages telling it that all the nodes it depends on have finished. Once its own execution completes, it sends a message to the nodes that depend on it, and so on until the whole plan has been executed.
```cpp
void AsyncMsgNotifyBasedScheduler::runExecutor(
    std::vector<folly::Future<Status>>&& futures,
    Executor* exe,
    folly::Executor* runner,
    std::vector<folly::Promise<Status>>&& promises) const {
  folly::collect(futures).via(runner).thenTry(
      [exe, pros = std::move(promises), this](auto&& t) mutable {
        if (t.hasException()) {
          return notifyError(pros, Status::Error(t.exception().what()));
        }
        auto status = std::move(t).value();
        auto depStatus = checkStatus(std::move(status));
        if (!depStatus.ok()) {
          return notifyError(pros, depStatus);
        }
        // Execute in current thread.
        std::move(execute(exe)).thenTry(
            [pros = std::move(pros), this](auto&& exeTry) mutable {
              if (exeTry.hasException()) {
                return notifyError(pros, Status::Error(exeTry.exception().what()));
              }
              auto exeStatus = std::move(exeTry).value();
              if (!exeStatus.ok()) {
                return notifyError(pros, exeStatus);
              }
              return notifyOK(pros);
            });
      });
}
```
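The wiring that runExecutor() relies on is built once per plan. The sketch below illustrates that wiring in a simplified way; it is not the scheduler's actual doSchedule() code. It assumes that Executor::depends() returns the executors a node depends on, and scheduleSketch is a hypothetical name:

```cpp
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include <folly/futures/Future.h>

// The member shown above, treated as a free function for this sketch.
void runExecutor(std::vector<folly::Future<Status>>&& futures, Executor* exe,
                 folly::Executor* runner, std::vector<folly::Promise<Status>>&& promises);

// Simplified wiring: walk the plan breadth-first from the root and, for every
// dependency edge "exe -> dep", create one promise/future pair. `dep` fulfills
// the promise when it finishes; `exe` waits on the future before it runs.
folly::Future<Status> scheduleSketch(Executor* root, folly::Executor* runner) {
  folly::Promise<Status> rootPromise;
  auto resultFuture = rootPromise.getFuture();

  std::unordered_map<Executor*, std::vector<folly::Promise<Status>>> promisesOf;
  std::unordered_map<Executor*, std::vector<folly::Future<Status>>> futuresOf;
  promisesOf[root].emplace_back(std::move(rootPromise));  // resolves the caller's future

  std::queue<Executor*> bfs;
  std::unordered_set<Executor*> visited{root};
  bfs.push(root);
  while (!bfs.empty()) {
    auto* exe = bfs.front();
    bfs.pop();
    for (auto* dep : exe->depends()) {             // assumed accessor for dependencies
      folly::Promise<Status> p;
      futuresOf[exe].emplace_back(p.getFuture());  // awaited by exe
      promisesOf[dep].emplace_back(std::move(p));  // fulfilled by dep when it finishes
      if (visited.insert(dep).second) {
        bfs.push(dep);
      }
    }
  }

  // Schedule every executor exactly once: wait for all incoming futures, run,
  // then notify all outgoing promises -- which is what runExecutor() above does.
  // Leaf nodes have no futures, so folly::collect resolves immediately for them.
  for (auto& [exe, promises] : promisesOf) {
    runExecutor(std::move(futuresOf[exe]), exe, runner, std::move(promises));
  }
  return resultFuture;
}
```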
Each Executor goes through four stages: create, open, execute, and close.
create
Generate the corresponding Executor according to the node type.
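Conceptually, this step is a factory that switches on the plan node's kind. The following is a hypothetical, heavily trimmed sketch; it assumes executors are kept alive in the query's object pool, while the real factory in src/executor/Executor.cpp covers every PlanNode::Kind:

```cpp
// Hypothetical, trimmed sketch of the create step: pick the executor class
// that matches the plan node kind. The real factory covers every kind.
Executor* makeExecutorSketch(QueryContext* qctx, const PlanNode* node) {
  auto* pool = qctx->objPool();  // executors live as long as the query does
  switch (node->kind()) {
    case PlanNode::Kind::kStart:
      return pool->add(new StartExecutor(node, qctx));
    case PlanNode::Kind::kProject:
      return pool->add(new ProjectExecutor(node, qctx));
    case PlanNode::Kind::kFilter:
      return pool->add(new FilterExecutor(node, qctx));
    // ... one case per PlanNode::Kind ...
    default:
      LOG(FATAL) << "Unknown plan node kind " << static_cast<int>(node->kind());
      return nullptr;
  }
}
```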
open
Before the Executor formally executes, open performs some initialization and checks for slow-query termination and the memory watermark.
Nebula supports manually killing a running query, so before each Executor runs it checks the state of the current execution plan; if the plan has been marked as killed, execution is terminated.
Before each Query-type Executor runs, it also checks whether the memory currently used by the system has reached the configured watermark; if it has, execution is terminated, which avoids OOM to some extent.
```cpp
Status Executor::open() {
  if (qctx_->isKilled()) {
    VLOG(1) << "Execution is being killed. session: " << qctx()->rctx()->session()->id()
            << "ep: " << qctx()->plan()->id()
            << "query: " << qctx()->rctx()->query();
    return Status::Error("Execution had been killed");
  }
  auto status = MemInfo::make();
  NG_RETURN_IF_ERROR(status);
  auto mem = std::move(status).value();
  if (node_->isQueryNode() && mem->hitsHighWatermark(FLAGS_system_memory_high_watermark_ratio)) {
    return Status::Error(
        "Used memory(%ldKB) hits the high watermark(%lf) of total system memory(%ldKB).",
        mem->usedInKB(),
        FLAGS_system_memory_high_watermark_ratio,
        mem->totalInKB());
  }
  numRows_ = 0;
  execTime_ = 0;
  totalDuration_.reset();
  return Status::OK();
}
```
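The watermark checked above comes from a gflag, so it can be adjusted in the graphd configuration file without recompiling; for example (the value here is only illustrative, not necessarily the shipped default):

```
# nebula-graphd.conf (illustrative value)
--system_memory_high_watermark_ratio=0.8
```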
execute
The input and output of a Query-type Executor are both tables (DataSet).
The Executor runs on the iterator model: in each round of computation, it calls the next() method of the input table's iterator to fetch one row of data and computes on it, until the whole input table has been traversed.
The computed results are assembled into a new table, which is passed on to the subsequent Executors as their input.
```cpp
folly::Future<Status> ProjectExecutor::execute() {
  SCOPED_TIMER(&execTime_);
  auto* project = asNode<Project>(node());
  auto columns = project->columns()->columns();
  auto iter = ectx_->getResult(project->inputVar()).iter();
  DCHECK(!!iter);
  QueryExpressionContext ctx(ectx_);
  VLOG(1) << "input: " << project->inputVar();
  DataSet ds;
  ds.colNames = project->colNames();
  ds.rows.reserve(iter->size());
  for (; iter->valid(); iter->next()) {
    Row row;
    for (auto& col : columns) {
      Value val = col->expr()->eval(ctx(iter.get()));
      row.values.emplace_back(std::move(val));
    }
    ds.rows.emplace_back(std::move(row));
  }
  VLOG(1) << node()->outputVar() << ":" << ds;
  return finish(ResultBuilder().value(Value(std::move(ds))).finish());
}
```
If the input tables of the current Executor will not be read by any other Executor afterwards, they are dropped during the execution phase to reduce memory usage.
```cpp
void Executor::drop() {
  for (const auto &inputVar : node()->inputVars()) {
    if (inputVar != nullptr) {
      // Make sure use the variable happened-before decrement count
      if (inputVar->userCount.fetch_sub(1, std::memory_order_release) == 1) {
        // Make sure drop happened-after count decrement
        CHECK_EQ(inputVar->userCount.load(std::memory_order_acquire), 0);
        ectx_->dropResult(inputVar->name);
        VLOG(1) << "Drop variable " << node()->outputVar();
      }
    }
  }
}
```
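The happened-before comments above follow the standard release/acquire reference-counting idiom: the last reader to decrement the counter is the one that drops the result. Below is a self-contained illustration of just that pattern; it is not Nebula code:

```cpp
#include <atomic>
#include <cassert>

// Shared result with a user count, mirroring inputVar->userCount above.
struct SharedResult {
  std::atomic<int> userCount{2};  // e.g. two downstream executors read this result
};

void releaseUse(SharedResult& r, bool& dropped) {
  // Release ordering: every read of the result by this user happens-before the decrement.
  if (r.userCount.fetch_sub(1, std::memory_order_release) == 1) {
    // The acquire load pairs with the other users' release decrements,
    // so dropping happens-after all of their reads.
    assert(r.userCount.load(std::memory_order_acquire) == 0);
    dropped = true;  // stand-in for ectx_->dropResult(...)
  }
}

int main() {
  SharedResult r;
  bool dropped = false;
  releaseUse(r, dropped);  // first user finishes: 2 -> 1, no drop
  releaseUse(r, dropped);  // last user finishes: 1 -> 0, drop
  assert(dropped);
  return 0;
}
```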
close
After the Executor finishes executing, the execution information collected during the run, such as the execution time and the number of rows in the output table, is added to the profiling stats.
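As a rough illustration, the close step amounts to something like the sketch below; the field and method names are assumptions rather than the repository's exact API:

```cpp
// Rough sketch of close(): package the counters collected while the executor
// ran and attach them to the plan node so that PROFILE can print them later.
// Names such as addProfileStats and elapsedInUSec are assumptions.
Status ExecutorSketch::close() {
  ProfilingStats stats;
  stats.totalDurationInUs = totalDuration_.elapsedInUSec();  // whole open/execute/close span
  stats.execDurationInUs = execTime_;                        // time spent inside execute()
  stats.rows = numRows_;                                     // rows in the output table
  qctx_->plan()->addProfileStats(node_->id(), std::move(stats));
  return Status::OK();
}
```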
Users can view these statistics in the execution plan that is printed after the result when a statement is run with PROFILE.
```
Execution Plan (optimize time 141 us)

-----+------------------+--------------+-----------------------------------------------------+----------------------------------
| id | name             | dependencies | profiling data                                      | operator info                    |
-----+------------------+--------------+-----------------------------------------------------+----------------------------------
| 2  | Project          | 3            | ver: 0, rows: 56, execTime: 147us, totalTime: 160us | outputVar: [                     |
|    |                  |              |                                                     | {                                |
|    |                  |              |                                                     |   "colNames": [                  |
|    |                  |              |                                                     |     "VertexID",                  |
|    |                  |              |                                                     |     "player.age"                 |
|    |                  |              |                                                     |   ],                             |
|    |                  |              |                                                     |   "name": "__Project_2",         |
|    |                  |              |                                                     |   "type": "DATASET"              |
|    |                  |              |                                                     | }                                |
|    |                  |              |                                                     | ]                                |
|    |                  |              |                                                     | inputVar: __TagIndexFullScan_1   |
|    |                  |              |                                                     | columns: [                       |
|    |                  |              |                                                     |   "$-.VertexID AS VertexID",     |
|    |                  |              |                                                     |   "player.age"                   |
|    |                  |              |                                                     | ]                                |
-----+------------------+--------------+-----------------------------------------------------+----------------------------------
| 3  | TagIndexFullScan | 0            | ver: 0, rows: 56, execTime: 0us, totalTime: 6863us  | outputVar: [                     |
|    |                  |              |                                                     | {                                |
|    |                  |              |                                                     |   "colNames": [                  |
|    |                  |              |                                                     |     "VertexID",                  |
|    |                  |              |                                                     |     "player.age"                 |
|    |                  |              |                                                     |   ],                             |
|    |                  |              |                                                     |   "name": "__TagIndexFullScan_1",|
|    |                  |              |                                                     |   "type": "DATASET"              |
|    |                  |              |                                                     | }                                |
|    |                  |              |                                                     | ]                                |
|    |                  |              |                                                     | inputVar:                        |
|    |                  |              |                                                     | space: 318                       |
|    |                  |              |                                                     | dedup: false                     |
|    |                  |              |                                                     | limit: 9223372036854775807       |
|    |                  |              |                                                     | filter:                          |
|    |                  |              |                                                     | orderBy: []                      |
|    |                  |              |                                                     | schemaId: 319                    |
|    |                  |              |                                                     | isEdge: false                    |
|    |                  |              |                                                     | returnCols: [                    |
|    |                  |              |                                                     |   "_vid",                        |
|    |                  |              |                                                     |   "age"                          |
|    |                  |              |                                                     | ]                                |
|    |                  |              |                                                     | indexCtx: [                      |
|    |                  |              |                                                     | {                                |
|    |                  |              |                                                     |   "columnHints": [],             |
|    |                  |              |                                                     |   "index_id": 325,               |
|    |                  |              |                                                     |   "filter": ""                   |
|    |                  |              |                                                     | }                                |
|    |                  |              |                                                     | ]                                |
-----+------------------+--------------+-----------------------------------------------------+----------------------------------
| 0  | Start            |              | ver: 0, rows: 0, execTime: 1us, totalTime: 19us     | outputVar: [                     |
|    |                  |              |                                                     | {                                |
|    |                  |              |                                                     |   "colNames": [],                |
|    |                  |              |                                                     |   "type": "DATASET",             |
|    |                  |              |                                                     |   "name": "__Start_0"            |
|    |                  |              |                                                     | }                                |
|    |                  |              |                                                     | ]                                |
-----+------------------+--------------+-----------------------------------------------------+----------------------------------
```
This wraps up the source code analysis of the Query Engine modules; some individual features will be explained in later articles.
Want to exchange ideas about graph database technology? To join the Nebula exchange group, first fill out your Nebula card, and Nebula's assistant will pull you into the group~~
[Event] Nebula Hackathon 2021 is in progress. Explore the unknown with us and compete for a ¥150,000 prize → https://nebula-graph.com.cn/hackathon/