One business is to query the latest approved 5 pieces of data
SELECT `id`, `title` FROM `th_content` WHERE `audit_time` < 1541984478 AND `status` = 'ONLINE' ORDER BY `audit_time` DESC, `id` DESC LIMIT 5;
Check the monitoring situation at that time. The cpu utilization rate is over 100%. show processlist sees that many similar queries are in the state of create sort index.
View the structure of the table
CREATE TABLE `th_content` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `title` varchar(500) CHARACTER SET utf8 NOT NULL DEFAULT '' COMMENT 'Content title', `content` mediumtext CHARACTER SET utf8 NOT NULL COMMENT 'Text content', `audit_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'Audit time', `last_edit_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Last edit time', `status` enum('CREATED','CHECKING','IGNORED','ONLINE','OFFLINE') CHARACTER SET utf8 NOT NULL DEFAULT 'CREATED' COMMENT 'Information status', PRIMARY KEY (`id`), KEY `idx_at_let` (`audit_time`,`last_edit_time`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
The index has an audit_time is in the union index on the left, and there is no index on status.
Analyze the above sql execution logic:
- Find all primary key IDs that are less than the approval time from the federated index (if 1 million pieces of data have been approved before the timestamp, the corresponding primary key IDs of 1 million pieces of data will be taken out from the federated index)
- In the future, it would be better if there were an optimization. At present, there are 100 primary key IDs to be sorted, and then in the next step of back to the table operation, the nearest primary keys may be obtained at one disk I/O
- Go back to the table one by one, find 1 million row records, and filter out the row records with status='ONLINE '
- Finally, sort the query results (if 500000 rows are ONLINE, continue to sort the 500000 rows)
Finally, because of the large amount of data, although only 5 rows are taken, according to the extreme example we just mentioned, 1 million rows of data are actually queried, and finally 500000 rows of database memory sorting are carried out in memory.
So it is very inefficient.
A schematic diagram is drawn to illustrate the query process in the first step. The pink part represents the data rows that need to be queried back to the table at last.
In the figure, I forge YY and fill in some data according to the index storage law. If there is any error, please leave a message and point out. I hope you can see the way of joint index storage and index query through this figure
Improvement idea 1
It's never easy to find the range and use a good index. If we add an audit_ What improvements will be made to the joint index of time and status?
ALTER TABLE `th_content` ADD INDEX `idx_audit_status` (`audit_time`, `status`);
mysql> explain select `id`, `title` from `th_content` where `audit_time` < 1541984478 and `status` = 'ONLINE' order by `audit_time` desc, `id` desc limit 5; +----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+ | 1 | SIMPLE | th_content | range | idx_at_ft_pt_let,idx_audit_status | idx_audit_status | 4 | NULL | 209754 | Using where | +----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+
Details: because audit_time is a range search, so the index of the second column can't be used, only audit can be used_ Time, so key_len is 4. In idea 2 below, these two fields are still the key_len is 5.
Or analyze the execution process after adding the index:
- Find the audit that is less than the audit time from the union index_ The union index of the largest row at time
- Then look down in turn, because < audit_ Time is a range lookup, and the values of the second column index are scattered. Therefore, you need to look forward in turn to match the index rows that meet the condition (status = 'ONLINE') until you get line 5.
- Specific data required for back table query
In the above diagram, pink identifies the rows that meet the index requirements of the first column, and queries forward in turn. Three records are filtered on this leaf node, and then you need to continue to the left and continue to query on the previous leaf node. Until five rows that meet the record are found.
Improvements
Because there is a value of status in the index, there is no need to query back to the table when the filter does not meet the row of status = 'ONLINE'. When returning to the table, there are only 5 rows of data to query, which will be greatly reduced on iops.
Disadvantages of this index
If idx_ audit_ If all 5 lines scanned in status are ONLINE, only 5 lines need to be scanned;
If idx_ audit_ Among the first 1 million lines scanned in status, only 4 lines of status are ONLINE, so 1 million 1 lines need to be scanned to get the required 5 lines of records. The number of rows that the index needs to scan is uncertain.
Improvement ideas 2
ALTER TABLE `th_content` DROP INDEX `idx_audit_status`; ALTER TABLE `th_content` ADD INDEX `idx_status_audit` (`status`, `audit_time`);
In this way, there is no pressure for sorting or returning to the table.
Source: mengkang.net/1302.html