MYSQL index analysis and optimization

MYSQL index analysis and optimization

EXPLAIN

MySQL provides an EXPLAIN command, which can analyze the SELECT statement and output the details of the SELECT execution for targeted optimization by developers.
For example:

EXPLAIN SELECT * from user WHERE id < 3;

The output of the EXPLAIN command is roughly as follows:

select_type

Represents the type of query. Common values are as follows:

  • SIMPLE: indicates that the query statement does not contain subqueries or union s
  • RIMARY: indicates that this query is the outermost query
  • UNION: indicates that this query is the second or subsequent query of UNION
  • Dependent Union: the second or subsequent query statement in the union, which uses external query results
  • Union result: result of Union
  • Subquery: select subquery statement
  • Dependent subquery: the select subquery statement depends on the results of the outer query.

The most common query type is SIMPLE, which means that our query has no sub query or UNION query.

Case 1

Case 2

Case 3

TYPE

all

This is the so-called "full table scan". If you display all the data items in a data table, it doesn't matter. If the all type appears in the sql for finding data items, it usually means that your sql statement is in the most native state and has a lot of optimization space.

# This is because the no column is neither a primary key nor an index, so you can only use a full table scan to find the target No.
mysql> explain select * from employee where `no` = '20150001';
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | employee | ALL  | NULL          | NULL | NULL    | NULL |    5 | Using where |

index

This connection type is just another form of full table scanning, but its scanning order is in the order of index. This kind of scanning is based on the index and then returns to the table to retrieve data. Compared with all, they all obtain the data of the whole table, and index needs to read the index first and return to the table to retrieve data randomly. Therefore, index cannot be faster than all (retrieve the data of the same table), but the only possible reason why the official manual says its efficiency is better than all is that, The data that scans the entire table by index is ordered. In this way, the results are different, and there is no way to compare the efficiency.

mysql> explain select * from employee order by `no` ;
+----+-------------+----------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+----------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | employee | ALL  | NULL          | NULL | NULL    | NULL |    5 | Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+------+----------------+
mysql> explain select * from employee order by rec_id ;
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------+
|  1 | SIMPLE      | employee | index | NULL          | PRIMARY | 4       | NULL |    5 | NULL  |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------+

As can be seen from the above, the connection type sorted according to the no column is all, but note that the extra column uses sorting (Using filesort), and the connection type sorted according to REC_ The connection type of ID column sorting is index, and the results are naturally ordered. No additional sorting is allowed. It may be for this reason that index is more efficient than all, but note that this requires the same conditions (i.e. sorting).

If the connection type is index and the value in the extra column is' Using index ', it is called index overwrite;

mysql> explain select rec_id from employee ;
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | employee | index | NULL          | PRIMARY | 4       | NULL |    5 | Using index |

Rec obtained in the above example_ The ID is just an index column, so there is no need to retrieve data from the table.

range

Range refers to the index scan with a range. Compared with the full index scan of index, it has a range limit, so it is better than index. The range is easy to understand. It should be remembered that if a range appears, it must be based on the index. Besides the obvious between, and and '>', '<', in and or are also index range scans.

ref

This connection type occurs when the lookup criteria column uses an index and is not primary key or unique. In fact, it means that although the index is used, the value of the index column is not unique and repeated. In this way, even if the first piece of data is quickly found by using the index, it still cannot be stopped, and a small-scale scan near the target value is required. But its advantage is that it does not need to scan the whole table, because the index is ordered. Even if there are duplicate values, it is scanned in a very small range.

To illustrate this situation, add a common key to the name column in the employee table (duplicate values are allowed)

 alter table employee add key I_EMPLOYEE_NAME(`name`); 

Next, when searching data by name in the employee table, the mysql optimizer selects the connection type of ref.

mysql> explain select * from employee where `name` = 'Zhang San';
+----+-------------+----------+------+----------------+----------------+---------+-------+------+-----------------------+
| id | select_type | table    | type | possible_keys  | key            | key_len | ref   | rows | Extra                 |
+----+-------------+----------+------+----------------+----------------+---------+-------+------+-----------------------+
|  1 | SIMPLE      | employee | ref  | I_EMPLOYEE_NAM | I_EMPLOYEE_NAM | 62      | const |    1 | Using index condition |
+----+-------------+----------+------+----------------+----------------+---------+-------+------+-----------------------+

ref_eq

ref_ The difference between EQ and ref is that it knows that there is only one search result set of this type? When there is only one result set!
That is the case when the primary key or unique index is used for search. For example, we find a student in a school according to the student number. Before we find it, we know that there must be only one result. Therefore, when we find the student number for the first time, we immediately stop the query. This connection type carries out accurate query every time without too much scanning, so the search efficiency is higher. Of course, the uniqueness of the column needs to be determined according to the actual situation.

Create a table

CREATE TABLE `score` (
  `rec_id` INT(11) NOT NULL AUTO_INCREMENT,
  `stu_id` INT(11) NOT NULL,
  `mark` INT(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`rec_id`),
  UNIQUE KEY `UK_SCORE_STU_ID` (`stu_id`)
) ENGINE=INNODB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8

There are five pieces of data in the employee table and five corresponding pieces of data in the score table, including the REC of employee_ Stu of ID and score_ ID is a one-to-one correspondence.

mysql> explain select ep.name,sc.mark from employee ep,score sc where ep.rec_id = sc.stu_id;
+----+-------------+-------+--------+-----------------+---------+---------+-----------------+------+-------+
| id | select_type | table | type   | possible_keys   | key     | key_len | ref             | rows | Extra |
+----+-------------+-------+--------+-----------------+---------+---------+-----------------+------+-------+
|  1 | SIMPLE      | sc    | ALL    | UK_SCORE_STU_ID | NULL    | NULL    | NULL            |    5 | NULL  |
|  1 | SIMPLE      | ep    | eq_ref | PRIMARY         | PRIMARY | 4       | my_db.sc.stu_id |    1 | NULL  |
+----+-------------+-------+--------+-----------------+---------+---------+-----------------+------+-------+

As you can see from the above, the score table is the type of full table scanning. rows=5 means that the outer table has been cycled for five times (because there are five pieces of data), but how is it possible that the rows of the employee table is 1? At the beginning, I was also very confused. This is closely related to the query principle of mysql. Rows actually reflects the number of inner loops of the query. For each data match in the outer layer, employees can hit with one shot, so rows is 1.

const

Usually, if you place a primary key behind where as a conditional query, the mysql optimizer can convert the query optimization into a constant. How and when to convert depends on the optimizer.

possible keys

Represents the index that can be used when querying. Note that it is not necessarily used, but the index name is displayed.

key

Indicates the index actually used during query, and displays the index name.

rows

MySQL query optimizer will estimate how many rows of records need to be scanned for SQL query results according to statistics. In principle, the fewer rows, the higher the efficiency. You can intuitively understand the efficiency of SQL.

key_len

Indicates the number of bytes in the index used by the query. You can determine whether all composite indexes are used, or only some field values of the leftmost part of the index are used.

key_ The calculation rules of len are as follows:

  • key_ The calculation rules of len are as follows:

    String type
    The string length is related to the character set: latin1 = 1,gbk = 2, utf8 = 3, utf8mb4 = 4
    char(n): n * character set length
    varchar(n): n * character set length + 2 bytes

    value type
    TINYINT: 1 byte
    SMALLINT: 2 bytes
    MEDIUMINT: 3 bytes
    INT, FLOAT: 4 bytes
    BIGINT, DOUBLE: 8 bytes

    Time type
    DATA: 3 bytes
    TIMESTAMP: 4 bytes
    DATETIME: 8 bytes

    Field properties
    The NULL property takes 1 byte. If NOT NULL is set for a field, this item is not available.

Extra

Extra indicates that there are a lot of additional information. Various operations will prompt relevant information in extra. The common types are as follows:

  • Using where
    Indicates that the query needs to query data through the index back table.
  • Using Index
    Indicates that the query needs to pass the index, and the index can meet the required data.
  • Using filesort
    It means that the query results need additional sorting. The small amount of data is in memory and the large amount is on disk. Therefore, it is recommended to optimize Using filesort.
  • Using temprorary
    The query uses temporary tables, which generally appear in de duplication and grouping operations.

Tags: MySQL

Posted on Sat, 27 Nov 2021 22:18:58 -0500 by greatme