Are MySQL JOIN statements inefficient?

We often hear that JOIN queries should be avoided in MySQL because of poor performance. Is that really the case? Let's try it out.

1. Environment

MySQL: 5.7.30

Storage Engine: InnoDB

Index type: B+tree (InnoDB default)

 

2. Prepare

2.1 Table preparation and data preparation

We plan to create five tables and insert 100,000 (10W) rows into each, as follows:

2.1.1 01_test

CREATE TABLE IF NOT EXISTS `01_test` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
  `phone_number` bigint(11) NOT NULL DEFAULT 0 COMMENT 'Phone number',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uidx_pn` (`phone_number`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Test Table 1';

Insert the data through a stored procedure:

DELIMITER $$
    CREATE PROCEDURE test_01_insert()
    BEGIN
        declare i int;
        set i = 0 ;
        WHILE (i < 100000) DO
            INSERT INTO `01_test` (`phone_number`) VALUES (CONCAT('130', LPAD(i, 8, '0')));	 
            set i = i + 1;
        END WHILE;
        commit;
END$$
DELIMITER ;

CALL test_01_insert();
Query OK, 0 rows affected (17.24 sec)
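As a sanity check, a simple count should report exactly 100,000 rows, since the loop ran i from 0 to 99999 with one insert per iteration:

```sql
-- Verify the load: one row per loop iteration, i = 0 .. 99999
SELECT COUNT(*) FROM `01_test`;
-- returns 100000
```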

2.1.2 02_test

CREATE TABLE IF NOT EXISTS `02_test` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
  `first_test_id` int(11) unsigned NOT NULL DEFAULT 0 COMMENT 'Test Table 1 Primary Key ID',
  PRIMARY KEY (`id`),
  KEY `idx_fti` (`first_test_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Test Table 2';

Tables 03_test through 05_test have almost the same structure as 02_test; only the reference column (which points at the previous table's primary key) and the corresponding index name change, so their DDL is not repeated here.
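For example, 03_test would look roughly like this; the column and index names are inferred from the join query used later, so treat this as a sketch rather than the author's exact DDL:

```sql
-- Sketch of 03_test, following the same pattern as 02_test
CREATE TABLE IF NOT EXISTS `03_test` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
  `second_test_id` int(11) unsigned NOT NULL DEFAULT 0 COMMENT 'Test Table 2 Primary Key ID',
  PRIMARY KEY (`id`),
  KEY `idx_sti` (`second_test_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Test Table 3';
```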

Insert the data:

DELIMITER $$
    CREATE PROCEDURE test_insert()
    BEGIN
        declare i int;
        set i = 1;
        WHILE (i < 100001) DO
            INSERT INTO `02_test` (`first_test_id`) VALUES (i);	
            set i = i + 1;
        END WHILE;
        commit;
END$$
DELIMITER ;

CALL test_insert;
DROP PROCEDURE test_insert;
Query OK, 0 rows affected (17.17 sec)

Each batch of 100,000 rows is inserted in about 17 seconds, which is reasonably fast.

 

3. Experiment

Let's execute the following statement directly:

SELECT T1.* FROM 01_test AS T1
INNER JOIN 02_test AS T2 ON T1.id = T2.first_test_id
INNER JOIN 03_test AS T3 ON T2.id = T3.second_test_id
INNER JOIN 04_test AS T4 ON T3.id = T4.third_test_id
INNER JOIN 05_test AS T5 ON T4.id = T5.forth_test_id
ORDER BY T1.id DESC LIMIT 10;

This query joins five tables, and the result is:

+--------+--------------+
| id     | phone_number |
+--------+--------------+
| 100000 |  13000099999 |
|  99999 |  13000099998 |
|  99998 |  13000099997 |
|  99997 |  13000099996 |
|  99996 |  13000099995 |
|  99995 |  13000099994 |
|  99994 |  13000099993 |
|  99993 |  13000099992 |
|  99992 |  13000099991 |
|  99991 |  13000099990 |
+--------+--------------+
10 rows in set (0.00 sec)

The result comes back almost instantly (0.00 sec).

Now let's look at the execution plan:

EXPLAIN
    -> SELECT T1.* FROM 01_test AS T1
    -> INNER JOIN 02_test AS T2 ON T1.id = T2.first_test_id
    -> INNER JOIN 03_test AS T3 ON T2.id = T3.second_test_id
    -> INNER JOIN 04_test AS T4 ON T3.id = T4.third_test_id
    -> INNER JOIN 05_test AS T5 ON T4.id = T5.forth_test_id
    -> ORDER BY T1.id DESC LIMIT 10;
+----+-------------+-------+------------+-------+------------------+----------+---------+------------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys    | key      | key_len | ref        | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+------------------+----------+---------+------------+------+----------+-------------+
|  1 | SIMPLE      | T1    | NULL       | index | PRIMARY          | PRIMARY  | 4       | NULL       |   10 |   100.00 | NULL        |
|  1 | SIMPLE      | T2    | NULL       | ref   | PRIMARY,uidx_fti | uidx_fti | 4       | test.T1.id |    1 |   100.00 | Using index |
|  1 | SIMPLE      | T3    | NULL       | ref   | PRIMARY,uidx_sti | uidx_sti | 4       | test.T2.id |    1 |   100.00 | Using index |
|  1 | SIMPLE      | T4    | NULL       | ref   | PRIMARY,uidx_tti | uidx_tti | 4       | test.T3.id |    1 |   100.00 | Using index |
|  1 | SIMPLE      | T5    | NULL       | ref   | uidx_fti         | uidx_fti | 4       | test.T4.id |    1 |   100.00 | Using index |
+----+-------------+-------+------------+-------+------------------+----------+---------+------------+------+----------+-------------+
5 rows in set, 1 warning (0.00 sec)

As you can see, every join uses an index (access type `ref` with `Using index`), and the `filtered` value is 100% for each table.
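Incidentally, the "1 warning" reported after EXPLAIN is not an error: in MySQL, EXPLAIN emits a note (code 1003) containing the query as rewritten by the optimizer, which you can inspect with:

```sql
-- Run immediately after EXPLAIN in the same session
SHOW WARNINGS;
-- Note 1003 shows the optimizer's rewritten form of the SELECT
```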

 

4. Summary

From the experiment above we can see that when the query is written appropriately, the columns on both sides of each ON condition have identical types, and each ON condition can use an index, JOIN queries are still fast.

So why do we so often see advice to avoid JOIN queries as much as possible? Even Alibaba's published database development specification explicitly discourages multi-table joins.

4.1 Reason

  • In real-world scenarios, the conditions above may not all be satisfied, which leads to slow queries;
  • When the database is under heavy business load, avoiding JOINs reduces its burden;
  • With sharded (distributed) databases, current MySQL middleware handles cross-shard JOINs poorly, so cross-shard JOINs are not recommended;
  • When a table's schema changes, single-table queries are easy to adapt, while JOIN statements scattered through the codebase are hard to find and costly to maintain in a large system.

4.2 Alternatives

  • Replace the MySQL JOIN with application logic: query table A first, then query table B with WHERE B.column IN (array);
  • Denormalize the design: add redundant fields so that a single table can answer the query on its own.
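Using the tables from this experiment, the first alternative would look roughly like this; the IN list is assembled by application code, and the values shown are purely illustrative:

```sql
-- Step 1: query the driving table on its own
SELECT `first_test_id` FROM `02_test` ORDER BY `id` DESC LIMIT 10;

-- Step 2: collect the returned IDs in application code and issue a second query
SELECT * FROM `01_test` WHERE `id` IN (100000, 99999, 99998);  -- values from step 1
```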

Tags: Programming MySQL Database SQL

Posted on Tue, 02 Jun 2020 07:53:10 -0400 by hanpedro