Use of Mysql index optimization trace tool index design principles how to select an appropriate index what is index push down why does range lookup Mysql not use index push down optimization?

preface

In the previous chapters, I only learned about the principles of using indexes and what indexes to use with the explain keyword. The next two chapters consider how to make better use of indexes.

Example table

drop table `employees`;
CREATE TABLE `employees` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(24) NOT NULL DEFAULT '' COMMENT 'full name',
 `age` int(11) NOT NULL DEFAULT '0' COMMENT 'Age',
 `position` varchar(20) NOT NULL DEFAULT '' COMMENT 'position',
 `hire_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Entry time',
 PRIMARY KEY (`id`),
 KEY `idx_name_age_position` (`name`,`age`,`position`) USING BTREE
 ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='Employee record form';
 INSERT INTO employees(name,age,position,hire_time) VALUES('LiLei',22,'manager',NOW());
 INSERT INTO employees(name,age,position,hire_time) VALUES('HanMeimei', 23,'dev',NOW());
 INSERT INTO employees(name,age,position,hire_time) VALUES('Lucy',23,'dev',NOW());
-- Insert some sample data
 drop procedure if exists insert_emp;
 delimiter ;;
 create procedure insert_emp()
 begin
 declare i int;
 set i=1;
 while(i<=100000)do
 insert into employees(name,age,position) values(CONCAT('zhuge',i),i,'dev');
 set i=i+1;
 end while;
 end;;
 delimiter ;
 call insert_emp();

The range of the first field of the joint index will not go through the index

EXPLAIN SELECT * FROM employees WHERE name > 'LiLei' AND age = 22 AND position ='manager';

mysql may think that the first field is used in the range, and the result set should be large. The efficiency of returning to the table is not high. It's better to scan the whole table

Forced index

EXPLAIN SELECT * FROM employees force index(idx_name_age_position) WHERE name > 'LiLei' AND age = 22 AND position ='manager';

Although the forced index is used to make the search of the first field range of the joint index follow the index, and the rows scanned look a little less, the final search efficiency is not necessarily higher than that of the full table scan, because the return table efficiency is not high.

Overlay index optimization

EXPLAIN SELECT name,age,position FROM employees WHERE name > 'LiLei' AND age = 22 AND position ='manager';

ken_ The 74 of len column comes from this. When we use utf-8 encoding, the calculation method is 3 * n + 2. When we build the table, the name field is 24 bits, that is, 3 * 24 + 2 = 74, so the index is name

in and or will go through the index when the amount of table data is large, and full table scanning will be selected when there are few table records

EXPLAIN SELECT * FROM employees WHERE name in ('LiLei','HanMeimei','Lucy') AND age = 22 AND position ='manager';

EXPLAIN SELECT * FROM employees WHERE (name = 'LiLei' or name = 'HanMeimei') AND age = 22 AND position='manager';

like KK% usually use the index

EXPLAIN SELECT * FROM employees WHERE name like 'LiLei%' AND age = 22 AND position='manager';

Index Condition Pushdown (ICP)

like KK% actually uses index push down optimization

For the auxiliary joint index (name,age,position), normally, according to the leftmost prefix principle, SELECT * FROM employees WHERE name like 'LiLei%' AND age = 22 AND position = 'manager'. In this case, only the name field index will be used, because after filtering according to the name field, the age and position in the index line are unordered, so the index cannot be used well.

In versions before MySQL 5.6, this query can only match indexes with names beginning with 'LiLei' in the joint index, then take the primary keys corresponding to these indexes back to the table one by one, find the corresponding records in the primary key index, and then compare the values of age and position.

MySQL 5.6 introduces index push down optimization. During index traversal, you can judge all fields contained in the index first, filter out unqualified records, and then return to the table, which can effectively reduce the number of return to the table. After using the index push down optimization, after the above query matches the index with the name beginning with 'LiLei' in the joint index, it will also filter the age and position fields in the index, take the primary key id corresponding to the filtered index, and then go back to the table to check the whole row of data. Index push down will reduce the number of table returns. For the table of innodb engine, index push down can only be used for secondary indexes. The leaf node of innodb's primary key index (clustered index) tree stores all row data, so index push down will not reduce the effect of querying all row data at this time.

Why is range lookup Mysql not optimized by index push down?

It is estimated that Mysql thinks that the filtered result set of range lookup is too large. Like KK% in most cases, the filtered result set is relatively small. Therefore, Mysql chooses to use index push down optimization for like KK%. Of course, this is not absolute. Sometimes like KK% does not necessarily push down.

How does Mysql select the appropriate index

 EXPLAIN select * from employees where name > 'a'

If you use the name index, you need to traverse the name field joint index tree, and then you need to find the final data in the primary key index tree according to the traversed primary key value. The cost is higher than the full table scan. You can use the overlay index optimization. In this way, you only need to traverse the name field joint index tree to get all the results, as follows:

EXPLAIN select name,age,position from employees where name > 'a' ;

For the execution results of name > 'a' and name > 'zzz', whether mysql finally chooses to go through the index or whether a table involves multiple indexes, and how mysql finally selects the index, you can use the trace tool to find out. Opening the trace tool will affect the performance of mysql, so you can only analyze the use of sql temporarily and close it immediately after use.

 EXPLAIN select * from employees where name > 'zzz' ;

Usage of trace tool:

set session optimizer_trace="enabled=on",end_markers_in_json=on; ‐‐open trace
select * from employees where name > 'a' order by position;
SELECT * FROM information_schema.OPTIMIZER_TRACE;

Contents of trace in result set 2

{
  "steps": [
    {
      "join_preparation": { --Preparatory work
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `employees`.`id` AS `id`,`employees`.`name` AS `name`,`employees`.`age` AS `age`,`employees`.`position` AS `position`,`employees`.`hire_time` AS `hire_time` from `employees` where (`employees`.`name` > 'a') order by `employees`.`position`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": { --Optimize 
        "select#": 1,
        "steps": [
          {
            "condition_processing": {--Conditional processing
              "condition": "WHERE",
              "original_condition": "(`employees`.`name` > 'a')",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                }
              ] /* steps */
            } /* condition_processing */
          },
          {
            "substitute_generated_columns": { -- Replace generated columns
            } /* substitute_generated_columns */
          },
          {
            "table_dependencies": [ -- Table dependencies
              {
                "table": "`employees`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [ -- Use key
            ] /* ref_optimizer_key_uses */
          },
          {
            "rows_estimation": [ -- Estimated table access costs
              {
                "table": "`employees`",
                "range_analysis": {
                  "table_scan": { -- Full table scanning
                    "rows": 100099, -- Number of scan lines
                    "cost": 10084.3 -- Scanning cost
                  } /* table_scan */,
                  "potential_range_indexes": [ -- Possible indexes for query
                    {
                      "index": "PRIMARY", -- primary key 
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_name_age_position", -- Secondary index
                      "usable": true,
                      "key_parts": [
                        "name",
                        "age",
                        "position",
                        "id"
                      ] /* key_parts */
                    }
                  ] /* potential_range_indexes */,
                  "setup_range_conditions": [ -- Set range conditions
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "skip_scan_range": {
                    "potential_skip_scan_indexes": [
                      {
                        "index": "idx_name_age_position",
                        "usable": false,
                        "cause": "query_references_nonkey_column"
                      }
                    ] /* potential_skip_scan_indexes */
                  } /* skip_scan_range */,
                  "analyzing_range_alternatives": { -- Analyze the use cost of each index
                    "range_scan_alternatives": [
                      {
                        "index": "idx_name_age_position", 
                        "ranges": [
                          "a < name" -- Index usage range
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false, -- Whether the records obtained using this index are sorted by primary key
                        "using_mrr": false,
                        "index_only": false,-- Use overlay index
                        "rows": 50049, -- Index scan rows
                        "cost": 17517.4, -- Index scan cost
                        "chosen": false,-- Select this index
                        "cause": "cost"
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": { 
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */
                } /* range_analysis */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [ -- Consider implementation plan
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`employees`",
                "best_access_path": { -- Optimal access path
                  "considered_access_paths": [ -- Final selected access path
                    {
                      "rows_to_scan": 100099,
                      "access_type": "scan", -- Access type: is scan,Full table scan
                      "resulting_rows": 100099,
                      "cost": 10082.2,
                      "chosen": true, -- Confirm selection
                      "use_tmp_table": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100, -- Row filter percentage
                "rows_for_plan": 100099,
                "cost_for_plan": 10082.2,
                "sort_cost": 100099,
                "new_cost_for_plan": 110181,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": { --Attach condition to table
              "original_condition": "(`employees`.`name` > 'a')",
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [ -- Summary of additional conditions
                {
                  "table": "`employees`",
                  "attached": "(`employees`.`name` > 'a')"
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "optimizing_distinct_group_by_order_by": {
              "simplifying_order_by": {
                "original_clause": "`employees`.`position`",
                "items": [
                  {
                    "item": "`employees`.`position`"
                  }
                ] /* items */,
                "resulting_clause_is_simple": true,
                "resulting_clause": "`employees`.`position`"
              } /* simplifying_order_by */
            } /* optimizing_distinct_group_by_order_by */
          },
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "steps": [
              ] /* steps */,
              "index_order_summary": {
                "table": "`employees`",
                "index_provides_order": false,
                "order_direction": "undefined",
                "index": "unknown",
                "plan_changed": false
              } /* index_order_summary */
            } /* reconsidering_access_paths_for_index_ordering */
          },
          {
            "finalizing_table_conditions": [
              {
                "table": "`employees`",
                "original_table_condition": "(`employees`.`name` > 'a')",
                "final_table_condition   ": "(`employees`.`name` > 'a')"
              }
            ] /* finalizing_table_conditions */
          },
          {
            "refine_plan": [ -- Streamlining plan
              {
                "table": "`employees`"
              }
            ] /* refine_plan */
          },
          {
            "considering_tmp_tables": [
              {
                "adding_sort_to_table": "employees"
              } /* filesort */
            ] /* considering_tmp_tables */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": { -- implement
        "select#": 1,
        "steps": [
          {
            "sorting_table": "employees",
            "filesort_information": [
              {
                "direction": "asc",
                "expression": "`employees`.`position`"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "memory_available": 262144,
              "key_size": 40,
              "row_size": 190,
              "max_rows_per_buffer": 1379,
              "num_rows_estimate": 100099,
              "num_rows_found": 100003,
              "num_initial_chunks_spilled_to_disk": 31,
              "peak_memory_used": 269496,
              "sort_algorithm": "std::stable_sort",
              "sort_mode": "<fixed_sort_key, packed_additional_fields>"
            } /* filesort_summary */
          }
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}

The cost of full table scanning is less than that of index lookup, so full table scanning is finally selected

set session optimizer_trace="enabled=off"; ‐‐close trace

Order by and Group by optimization

  • Case1:
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' and position = 'dev' ORDER BY age; 


Using the leftmost prefix rule: the middle field cannot be broken, so the query uses the name index from the key_len=74 also shows that the age index column is used in the sorting process because there is no using filesort in the Extra field

  • Case 2:
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' ORDER BY position;

From the execution result of explain: key_len=74, the query uses the name index. Because position is used for sorting, age is skipped and Using filesort appears.

  • Case 3:
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' ORDER BY age,position;

Only the indexes name, age and position are used for sorting without Using filesort.

  • Case 4:
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' ORDER BY position,age;

The result is the same as that of explain in Case 3, but Using filesort appears because the index is created in the order of name, age and position, but age and position are reversed during sorting.

  • Case 5:
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' and age = 18 ORDER BY position,age;

Compared with Case 4, Using filesort does not appear in Extra. Because age is a constant and optimized in sorting, the index is not reversed and Using filesort does not appear.

  • Case 6:
EXPLAIN SELECT * FROM employees WHERE name = 'zhuge' ORDER BY age asc,position DESC;

Although the sorted field columns are in the same order as the index, and order by is ascending by default, position desc here becomes descending, resulting in a different sorting method from the index, resulting in Using filesort. Mysql8 or above has a descending index, which can support this query method

  • Case 7:
EXPLAIN SELECT * FROM employees WHERE name in ('LiLei','zhuge') ORDER BY age,position;

For sorting, multiple equality conditions are also range queries

  • Case 8:
EXPLAIN SELECT * FROM employees WHERE name > 'a' ORDER BY name;

You can optimize with overlay indexes

EXPLAIN SELECT name,age,position FROM employees WHERE name > 'a' ORDER BY name;

Optimization summary:

  • MySQL supports two sorting methods, filesort and index. Using index means that MySQL scans the index itself to complete sorting. High index efficiency and low filesort efficiency.
  • order by uses the Using index when two conditions are met.
    • The order by statement uses the leftmost row of the index.
    • Use the combination of where clause and order by clause to satisfy the leftmost row of the index.
  • Try to sort on the index column and follow the leftmost prefix rule when establishing the index (the order in which the index is created).
  • If the condition of order by is not on the index column, Using filesort will be generated.
  • If you can use overlay index, try to use overlay index
  • Group by is very similar to order by. Its essence is to sort first and then group. It follows the leftmost prefix rule of index creation order. For the optimization of group by, if sorting is not required, order by null can be added to prohibit sorting. Note that where is higher than having. Do not have to limit the conditions that can be written in where.

Detailed explanation of Using filesort file sorting principle

filesort file sort method

  • Single channel sorting:
    Take out all the fields of the qualified rows at one time, and then sort them in the sort buffer; Use the trace tool to see sort_ The mode message displays < sort_ key, additional_ Fields > or < sort_ key, packed_ additional_ fields >
  • Two way sorting (also called back table sorting mode):
    First, take out the corresponding sorting field and the row ID that can directly locate the row data according to the corresponding conditions, and then sort in the sort buffer. After sorting, you need to retrieve other required fields again; Use the trace tool to see sort_ The mode message displays < sort_ key, rowid >

MySQL compares the system variable max_ length_ for_ sort_ The size of data (1024 bytes by default) and the total size of the fields to be queried determine which sort mode to use.

  • If the total length of the field is less than max_length_for_sort_data, then the one-way sorting mode is used;
  • If the total length of the field is greater than max_length_for_sort_data, then use the two-way sorting module.

Do not consider the underlying data structure of mysql and other factors
Detailed process of single channel sorting:

  • Find the first primary key id satisfying the condition of name = 'zhuge' from the index name
  • Take out the whole row according to the primary key id, take out the values of all fields, and store them in sort_ In buffer
  • Find the next primary key id satisfying the condition of name = 'zhuge' from the index name
  • Repeat steps 2 and 3 until name = 'zhuge' is not satisfied
  • Yes, sort_ The data in the buffer is sorted according to the field position
  • Return results to the client

Detailed process of two-way sorting:

  • Find the first primary key id satisfying name = 'zhuge' from the index name
  • Take out the whole row according to the primary key id, and put the sorting field position and primary key id into the sort buffer
  • Remove the primary key id of a record with name = 'zhuge' from the index name
  • Repeat 3 and 4 until name = 'zhuge' is not satisfied
  • Yes, sort_ The field position and primary key id in the buffer are sorted by the field position
  • Traverse the sorted id and field position, return to the original table according to the id value, take out the values of all fields and return them to the client

In fact, comparing the two sorting modes, one-way sorting will put all the fields to be queried into the sort buffer, while two-way sorting will only put the primary key and the fields to be sorted into the sort buffer, and then return to the fields to be queried in the original table through the primary key.

If MySQL sorts memory sort_ The buffer configuration is relatively small and there is no condition to continue to increase. You can properly set max_ length_ for_ sort_ Configure the small point of data to let the optimizer choose to use the two-way sorting algorithm, which can be found in sort_ More rows are sorted in the buffer at a time, but the data needs to be returned to the original table according to the primary key.

If MySQL sorting memory can be configured to be large, you can increase Max appropriately_ length_ for_ sort_ Data, let the optimizer give priority to full field sorting (one-way sorting) and put the required fields into sort_buffer, so that the query results will be returned directly from memory after sorting.

Therefore, MySQL passes max_length_for_sort_data is used to control sorting. Different sorting modes are used in different scenarios to improve sorting efficiency.

Note that if you use sort all_ In general, the efficiency of buffer memory sorting is higher than that of disk file sorting, but sort cannot be arbitrarily increased because of this_ Buffer (1m by default). Many mysql parameter settings have been optimized. Do not adjust them easily.

Index design principles

Code first, index last

I don't know how you usually build indexes for data tables. Do you build indexes immediately after building tables?

In fact, this is wrong. Generally, you should wait until the main business function is developed and take out and analyze the sql related to the table before creating an index.

Joint index coverage conditions

For example, you can design one or two or three joint indexes (try to build less single value indexes), make each joint index include the fields of where, order by and group by in the sql statement as much as possible, and ensure that the field order of these joint indexes meets the leftmost prefix principle of sql query as much as possible.

Do not index on small cardinality fields

Index cardinality refers to the total number of different values of this field in the table. For example, if there are 1 million rows of records in a table, there is a gender field whose value is either male or female, then the cardinality of this field is 2.

If you index this small base number segment, it's better to scan the whole table, because your index tree contains male and female values, so it's impossible to carry out fast binary search, so it doesn't make much sense to use the index.

Generally, when establishing an index, try to use the fields with large cardinality, that is, the fields with more values, so as to give play to the advantage of fast binary search of B + tree.

We can use prefix index for long strings

Try to design indexes for columns with small field types, such as tinyint, because if the field type is small, the disk space will be small, and your performance will be better when searching.

Of course, this so-called column with a smaller field type is not absolute. Many times, you just need to establish an index for fields such as varchar(255), even if it takes up more disk space.

The large field of varchar(255) may occupy more disk space. It can be slightly optimized. For example, index the first 20 characters of this field, that is, put the first 20 characters of each value in this field in the index tree, similar to KEY index(name(20),age,position).

At this time, when you search in the where condition, if you search according to the name field, you will first search according to the first 20 characters of the name field in the index tree, locate some data matching the prefix of the next 20 characters, and then return to the cluster index to extract the complete name field value for comparison.

However, if you order by name, your name only contains the first 20 characters in the index tree, so you can't use the index for this sort, and the same is true for group by. So here we should have an understanding of prefix index.

In case of conflict between where and order by, where takes precedence

When there is an index design conflict between where and order by, do you design the index for where or order by? Do you want where to use the index or order by to use the index?

In general, the where condition is often used to use the index to quickly filter out a part of the specified data, and then sort it.

Because in most cases, where filtering based on index can filter out a small part of the data you want as quickly as possible, and then the cost of sorting may be much less.

Optimization based on slow sql query

You can perform specific index optimization for these slow sql queries according to some slow sql in the monitoring background.

Tags: Database MySQL SQL

Posted on Mon, 04 Oct 2021 20:44:10 -0400 by alco19357