Detailed analysis of MySQL execution plan and cost model

background

In order to facilitate subsequent communication, a ** Stack-Java Technology Exchange Group ** was created in the morning of 2020,213, which also facilitates active participation by scanning QR codes.

If the QR code is out of date, you can add the mini-signal (lastpass4u) of the stack culture-assistant and let him pull you into the group.We keep the spirit of small and beautiful, rather than abuse.

Then in the morning someone in the group asked a question:

The scan functions in the execution plan do not match the execution time. For example, the query optimizer finds that the scan a index has more rows and is therefore slower, so the optimizer chooses index b, but in fact the walk B index is slower than a. The walk a index is about 4 seconds and B is about 8 seconds.

This question has stirred up discussion, and some people have suggested that:

1. This can force index execution

2. The number of lines scanned is not predicted accurately. If you can manipulate the shell, execute the analyse table to see.

3. Take a look at your index, DDL, explain lain, etc.

But the questioner clearly knows this all by himself, he is concerned about the underlying optimizer cost rules, and so on.This makes me realize that EXPLAIN is the result. In fact, there are optimizers at the bottom of the database itself, and there are rules for who to choose and whether to index or not.This involves both efficiency and cost.

Explain Execution Plan Details

Using the explain keyword, you can simulate the optimizer to execute SQL queries to learn how MySQL handles your SQL statements, analyze your queries, or analyze performance bottlenecks in the table structure.

Information contained in Explain execution plan

![http://static.cyblogs.com/SouthEast (6).png](http://static.cyblogs.com/SouthEast (6).png)

The most important fields are id, type, key, rows, Extra.

id field

Sequence number of the select query, containing a set of numbers indicating the order in which the select clause or table of operations is executed in the query

  • Three cases:
    • 1. The id is the same: execution order is top-down

![http://static.cyblogs.com/SouthEast (4).png](http://static.cyblogs.com/SouthEast (4).png)

  • 2. The ID is different: if it is a subquery, the id's ordinal number will increase. The higher the ID value, the higher the priority, the more it will be executed first

![http://static.cyblogs.com/SouthEast (8).png](http://static.cyblogs.com/SouthEast (8).png)

  • 3. The id is the same but different (both cases exist simultaneously): if the id is the same, it can be thought of as a group and executed from top to bottom; in all groups, the higher the id value, the higher the priority, the earlier the execution

![http://static.cyblogs.com/SouthEast (5).png](http://static.cyblogs.com/SouthEast (5).png)

select_type field

The type of query used to distinguish complex queries such as general queries, joint queries, subqueries, etc.

1. SIMPLE: Simple select query with no subquery or union 2. PRIMARY: Queries contain any complex subparts, while outermost queries are marked as primary 3. SUBQUERY: Include subqueries in the select or where list 4. DERIVED: Subqueries contained in the from list are marked derived, mysql or recursively executed, and the results are placed in a zero-time table 5. UNION: If the second select appears after the union, it is marked as union; if the union is included in a subquery of the from clause, the outer select is marked as derived 6. UNION RESULT: select the results from the union table

![http://static.cyblogs.com/SouthEast (9).png](http://static.cyblogs.com/SouthEast (9).png)

type field

Access type, an important indicator in sql query optimization, results from good to bad are:

system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

In general, a good sql query is at least range level, preferably ref

1, system: a table has only one row of records (equal to the system table), which is a special case of const type. It usually does not appear and can be ignored.

2. const: Indicates that the index is found once, and const is used to compare primary key or unique indexes.Because you only need to match one row of data, everything is fast.If you place the primary key in the where list, mysql can convert the query to a const.

3. eq_ref: Unique index scan in which only one record in the table matches each index key.Common in primary key or unique index scans.

Note: ALL full table scans have the least table records such as t1 table

![http://static.cyblogs.com/SouthEast (2).png](http://static.cyblogs.com/SouthEast (2).png)

4. ref: A non-unique index scan that returns all rows matching a single value.Essentially, it is also an index access that returns all rows that match a single value, but it may find multiple qualified rows, so it should be a mixture of lookup and scan

![http://static.cyblogs.com/SouthEast (3).png](http://static.cyblogs.com/SouthEast (3).png)

5. range: Retrieve only the rows of a given range and use an index to select the rows.The key column shows which index is used.Typically, queries such as bettween, <, >, in appear in where statements.Range scans on this indexed column are better than full-index scans.Just start at one point and end at another point without scanning all indexes

![http://static.cyblogs.com/SouthEast (7).png](http://static.cyblogs.com/SouthEast (7).png)

6. Index:Full Index Scan, index differs from ALL in that index type traverses only the index tree.This is usually an ALL block and should be an index file that is usually smaller than a data file.(Both Index and ALL read the full table, but index reads from the index, while ALL reads from the hard disk)

![http://static.cyblogs.com/SouthEast (1).png](http://static.cyblogs.com/SouthEast (1).png)

7. ALL:Full Table Scan, traverse the entire table to find matching rows

![http://static.cyblogs.com/SouthEast (10).png](http://static.cyblogs.com/SouthEast (10).png)

possible_keys field

If an index exists on the fields involved in the query, it will be listed, but not necessarily actually used by the query

key field

The actual index used, if NULL, is not used. If an override index is used in a query, it appears only in the key list

![http://static.cyblogs.com/SouthEast (11).png](http://static.cyblogs.com/SouthEast (11).png)

![http://static.cyblogs.com/SouthEast (2).png](http://static.cyblogs.com/SouthEast (12).png)

key_len field

Represents the number of bytes used in an index. The length of the index used in a query (the maximum possible length) is not the actual length used. In theory, the shorter the length, the better.key_len is calculated from the table definition, not retrieved from the table

ref field

The column displaying the index is used, if possible, as a constant const.

rows field

Roughly estimate the number of rows to read to find the required records based on table statistics and index selection

Extra field Not suitable for display in other fields, but important extra information

1,Using filesort

mysql uses an external index to sort the data rather than sorting it by an index within the table.That is, the sort operation mysql cannot do with the index becomes "File Sort"

Since the indexes are sorted by email and then by address, if they are sorted directly by address when querying, the index will not meet the requirements and must be "file sorted" again inside mysql

![http://static.cyblogs.com/SouthEast (13).png](http://static.cyblogs.com/SouthEast (13).png)

2,Using temporary

Use temporary tables to hold intermediate results, which means mysql uses temporary tables when sorting query results, common in order by and group by

![http://static.cyblogs.com/SouthEast (14).png](http://static.cyblogs.com/SouthEast (14).png)

3,Using index

Indicates that a Covering Index is used in the corresponding select operation, which avoids accessing the data rows of the table and is efficient If Using where occurs at the same time, the index is used to perform lookups of index key values (see figure above) If Using where does not occur at the same time, the index is used to read data instead of performing a lookup

Covering Index: Also called index override.This is the field in the select list, which can be retrieved only from the index, without having to read the data file again from the index. In other words, the query column is overwritten by the index that is built. Be careful: a. If you need to use an override index, only the columns you want will be taken out of the fields in the select list. Do not use select * b. If all fields are indexed, the index file will be too large and crud performance will be reduced

![http://static.cyblogs.com/SouthEast (15).png](http://static.cyblogs.com/SouthEast (15).png)

4,Using where

where filtering is used

5,Using join buffer

Link cache used

6,Impossible WHERE

The value of the where clause is always false and cannot be used to get any meta-ancestors

7,select tables optimized away

Optimizing MIN/MAX operations based on indexes or COUNT (*) operations for the MyISAM storage engine without a group by clause does not have to wait until the execution phase is computed and the query execution plan generation phase completes the optimization

8,distinct

Optimize distinct operation to stop finding the same worthwhile action after finding the first matching meta-ancestor

Optimizer Cost Model

index
  1. Cost model: RBO (rule-based optimization), CBO (cost-based optimization)
  2. For each execution path of SQL, a corresponding execution cost can be calculated, and the lower the cost, the more efficient the execution
Calculation of CBO Mode Cost
Total cost = CPU cost + IO cost
CPU Costt Computing Model

CPU cost = rows/5 + (rows/10 if comparing key)

CPU cost:

  1. MySQL upper level, overhead for processing returned records
  2. CPU Cost=records/TIME_FOR_COMPARE=Records/5
  3. Time at every 5 records, as 1 Cost
IO Costt Computing Model

IO cost is calculated as the number of clustered index leaf nodes

  • Full Scan IO Cost = table οƒ stat_clustered_index_size Total number of pages clustered index, one page as 1 cost
  • Scan Range IO Cost = [(ranges+rows)/total_rows]*Full scan IO Cost Cluster Index Range Scan to Return Record Ratio

If a return is required, IO cost is calculated as the estimated number of records, which is expensive

  • Index Coverage Scan for Secondary Indexes
    • Index override scan to reduce IO cost for returning clustered indexes keys_per_block=(stats_block_size/2)/(key_info[keynr].key_lenth+ref_length+1) stats_block_size/2 =index page half full
    • IO Cost: (records+keys_per_block-1)/keys_per_block
    • Calculates how many secondary index pages the range occupies, both overwriting the IO Cost scanned for the index
  • Index non-overlay scan for secondary index
    • Index non-override scan, requiring clustered index to read full records, increasing IO cost
    • IO Cost = (range+rows)
    • range:How many ranges For an IN query, it is converted to multiple index range queries
    • row:How many records are in range Since each record needs to return a clustered index, 1 Cost is generated for each record
Cost Model Analysis
  • Clustered index scan costs total number of index pages
  • Secondary index override scan costs less
  • Secondary index non-overlay scanning, expensive
  • Cost model calculation, requires statistical information support
    • stat_clustered_index_size
    • ranges
    • records/rows
    • stats_block_size
    • key_info[keynr].key_length
    • rec_per_key
    • ......
How to View the Log to Determine Cost's Choice
drop database lyj;
create database lyj;
use lyj;

create table t1 (
    c1 int(11) not null default '0',
    c2 varchar(128) default null,
    c3 varchar(64) default null,
    c4 int(11) default null,
    primary key (c1),
    key ind_c2 (c2),
    key ind_c4 (c4));

insert into t1 values(1,'a','A',10);
insert into t1 values(2,'b','B',20);
insert into t1 values(3,'b','BB',20);
insert into t1 values(4,'b','BBB',30);
insert into t1 values(5,'b','BBB',40);
insert into t1 values(6,'c','C',50);
insert into t1 values(7,'d','D',60);
commit;

select * from t1;
+----+------+------+------+
| c1 | c2   | c3   | c4   |
+----+------+------+------+
|  1 | a    | A    |   10 |
|  2 | b    | B    |   20 |
|  3 | b    | BB   |   20 |
|  4 | b    | BBB  |   30 |
|  5 | b    | BBB  |   40 |
|  6 | c    | C    |   50 |
|  7 | d    | D    |   60 |
+----+------+------+------+

Why not index ind_c2 when executing the following SQL?

mysql> explain select * from t1 where c4=20 and c2='b'\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c4
      key_len: 5
          ref: const
         rows: 2
        Extra: Using where

set optimizer_trace='enabled=on';
set optimizer_trace_max_mem_size=1000000;
set end_markers_in_json=on;

select * from t1 where c4=20 and c2='b';

mysql> select * from information_schema.optimizer_trace\G;
*************************** 1. row ***************************
                            QUERY: select * from t1 where c4=20 and c2='b'
                            TRACE: {
......
                  "potential_range_indices": [ # List alternate indexes
                    {
                      "index": "PRIMARY", 
                      "usable": false,         # This line indicates that the primary key index is not available
                      "cause": "not_applicable"
                    },
                    {
                      "index": "ind_c2",
                      "usable": true,
                      "key_parts": [
                        "c2",
                        "c1"
                      ] /* key_parts */
                    },
                    {
                      "index": "ind_c4",
                      "usable": true,
                      "key_parts": [
                        "c4",
                        "c1"
                      ] /* key_parts */
                    }
                  ] /* potential_range_indices */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "analyzing_range_alternatives": {   # Start calculating the cost of a range scan for each index
                    "range_scan_alternatives": [
                      {
                        "index": "ind_c2",
                        "ranges": [
                          "b <= c2 <= b"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 4,         # The result of c2=b has 4 rows
                        "cost": 5.81,
                        "chosen": false,   # This index was not selected because cost
                        "cause": "cost"   
                      },
                      {
                        "index": "ind_c4",
                        "ranges": [
                          "20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": true     # This index is the cheapest and selected
                      }
......
                  "chosen_range_access_summary": {  # Summary: Because cost selected ind_c4 as the minimum
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "ind_c4",
                      "rows": 2,
                      "ranges": [
                        "20 <= c4 <= 20"
                      ] /* ranges */
                    } /* range_access_plan */,
                    "rows_for_plan": 2,
                    "cost_for_plan": 3.41,
                    "chosen": true
                  } /* chosen_range_access_summary */
......

Because the cost of the ind_c4 range scan is less than ind_c2, the index does not walk ind_c2

where does the field C2 and c4 change places, or does the index not go ind_c2?Why?

explain select * from t1 where  c2='b' and c4=20\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c4
      key_len: 5
          ref: const
         rows: 2
        Extra: Using where

Because the cost of the ind_c4 range scan is less than ind_c2, the index does not walk ind_c2, regardless of the positions of C2 and c4.The validation method is the same as above.

In the following statement, replace the condition c2='c', why can I index ind_c2?

mysql> explain select * from t1 where c2='c' and c4=20\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c2
      key_len: 387
          ref: const
         rows: 1
        Extra: Using index condition; Using where

mysql> select * from information_schema.optimizer_trace\G;
*************************** 1. row ***************************
                            QUERY: select * from t1 where c2='c' and c4=20
                            TRACE: {
......
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "ind_c2",
                        "ranges": [
                          "c <= c2 <= c"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 1,           # The result set of c2=c has one row
                        "cost": 2.21,
                        "chosen": true       # This index is the cheapest and selected
                      },
                      {
                        "index": "ind_c4",
                        "ranges": [
                          "20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": false,   # This index was not selected because cost
                        "cause": "cost"
                      }
                  ......

Create Composite Index

ALTER TABLE t1 ADD KEY ind_c2_c4(c2,c4);

Why doesn't the following statement take the composite index ind_c2_c4?

explain select * from t1 where  c2='b' and c4=20\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c4
      key_len: 5
          ref: const
         rows: 2
        Extra: Using where

set optimizer_trace='enabled=on';
set optimizer_trace_max_mem_size=1000000;
set end_markers_in_json=on;
select * from t1 where  c2='b' and c4=20;
select * from information_schema.optimizer_trace\G;
*************************** 1. row ***************************
                            QUERY: select * from t1 where  c2='b' and c4=20
                            TRACE: {
......
                  "potential_range_indices": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "ind_c2",
                      "usable": true,
                      "key_parts": [
                        "c2",
                        "c1"
                      ] /* key_parts */
                    },
                    {
                      "index": "ind_c4",
                      "usable": true,
                      "key_parts": [
                        "c4",
                        "c1"
                      ] /* key_parts */
                    },
                    {
                      "index": "ind_c2_c4",
                      "usable": true,
                      "key_parts": [
                        "c2",
                        "c4",
                        "c1"
                      ] /* key_parts */
                    }
                  ] /* potential_range_indices */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "ind_c2",
                        "ranges": [
                          "b <= c2 <= b"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 4,
                        "cost": 5.81,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "ind_c4",
                        "ranges": [
                          "20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": true
                      },
                      {
                        "index": "ind_c2_c4",
                        "ranges": [
                          "b <= c2 <= b AND 20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": false,
                        "cause": "cost"
                      }
......
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "ind_c4",
                      "rows": 2,
                      "ranges": [
                        "20 <= c4 <= 20"
                      ] /* ranges */
                    } /* range_access_plan */,
                    "rows_for_plan": 2,
                    "cost_for_plan": 3.41,
                    "chosen": true
                  } /* chosen_range_access_summary */
                } /* range_analysis */
              }
 ......

The indices ind_c4 and ind_c2_c4 are both non-overlay scans, while the cost s of ind_c4 and ind_c2_c4 are the same. mysql will select the index with a smaller number of leaf blocks, which is obvious that ind_c4 has a smaller number of leaf blocks.

Why can the following statement take the composite index ind_c2_c4?

explain select c2,c4 from t1 where c2='b' and c4=20\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c2_c4
      key_len: 392
          ref: const,const
         rows: 2
        Extra: Using where; Using index

set optimizer_trace='enabled=on';
set optimizer_trace_max_mem_size=1000000;
set end_markers_in_json=on;
select c2,c4 from t1 where c2='b' and c4=20;
select * from information_schema.optimizer_trace\G;

*************************** 1. row ***************************
                            QUERY: select c2,c4 from t1 where c2='b' and c4=20
                            TRACE: {
......
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "ind_c2",
                        "ranges": [
                          "b <= c2 <= b"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 4,
                        "cost": 5.81,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "ind_c4",
                        "ranges": [
                          "20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "ind_c2_c4",
                        "ranges": [
                          "b <= c2 <= b AND 20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": true,    # Index Coverage Scan
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": {
                      "intersecting_indices": [
                        {
                          "index": "ind_c2_c4",
                          "index_scan_cost": 1.0476,
                          "cumulated_index_scan_cost": 1.0476,
                          "disk_sweep_cost": 0,
                          "cumulated_total_cost": 1.0476,
                          "usable": true,
                          "matching_rows_now": 2,
                          "isect_covering_with_this_index": true,
                          "chosen": true
                        }
                      ] /* intersecting_indices */,
                      "clustered_pk": {
                        "clustered_pk_added_to_intersect": false,
                        "cause": "no_clustered_pk_index"
                      } /* clustered_pk */,
                      "chosen": false,
                      "cause": "too_few_indexes_to_merge"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */
                } /* range_analysis */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`t1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "ref",
                      "index": "ind_c2",
                      "rows": 4,
                      "cost": 2.8,
                      "chosen": true
                    },
                    {
                      "access_type": "ref",
                      "index": "ind_c4",
                      "rows": 2,
                      "cost": 2.4,
                      "chosen": true
                    },
                    {
                      "access_type": "ref",
                      "index": "ind_c2_c4",
                      "rows": 2,
                      "cost": 1.4476,
                      "chosen": true
                    },
                    {
                      "access_type": "scan",
                      "cause": "covering_index_better_than_full_scan",  
                      "chosen": false
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "cost_for_plan": 1.4476,
                "rows_for_plan": 2,
                "chosen": true
              }
......

Since ind_c2_c4 is an index override scan, no loopback is required and the cost is small.

summary

** We see Explain as a result only, Cost as a process.** If we really want to understand how the database optimizes our SQL or how it actually executes, we need to have a deep understanding of the underlying level.

Reference Address

If you like my article, you can focus on your personal subscription number.Welcome to leave a message and exchange at any time.

Tags: Programming MySQL SQL Database less

Posted on Thu, 20 Feb 2020 20:16:45 -0500 by curlious