The difference between Mysql join on filtering and where filtering

The tests are as follows:

(1) Create two tables and insert data. The sql statement is as follows:

a table:

CREATE TABLE `a` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) DEFAULT '',
  `grade` int(11) DEFAULT NULL,
  `dept` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  CHARSET=utf8mb4


insert into `a` (`id`, `name`, `grade`, `dept`) values('1','LIJIE1','100','10');
insert into `a` (`id`, `name`, `grade`, `dept`) values('2','LIJIE2','90','20');
insert into `a` (`id`, `name`, `grade`, `dept`) values('3','LIJIE3','60','10');
insert into `a` (`id`, `name`, `grade`, `dept`) values('4','LIJIE4','80','10');
insert into `a` (`id`, `name`, `grade`, `dept`) values('5','LIJIE5','70','20');

b table:

CREATE TABLE `b` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `NAME` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB CHARSET=utf8mb4

insert into `b` (`id`, `NAME`) values('10','IT');
insert into `b` (`id`, `NAME`) values('20','IT2');

Table data shows:

a table:

id  name    grade   dept
1   lijie1  100     10
2   lijie2  90      20
3   lijie3  60      10
4   lijie4  80      10
5   lijie5  70      20

b table:

id  name
10  IT1
20  IT2

(2) Comparison of on and where conditions in left join

1. If two tables join and the filter score is greater than or equal to 80, the condition is put on the join on

select 
    A.id,A.name,A.grade,A.dept,B.id,B.name 
from
    A left outer join B
on
    A.dept = B.id 
and 
    A.grade >=80 

Query results:

2. If two tables join and the filter score is greater than or equal to 80, the condition is put on where

select 
    A.id,A.name,A.grade,A.dept,B.id,B.name 
from
    A left outer join B
on
    A.dept = B.id
where
    A.grade >=80 

Query results:

  • conclusion

When the filter condition is written in left join on enables all data in the base table to be displayed (no matter how the base table is limited in the on condition or not, only the base table is grouped at most, not filtered, that is, if the base table is limited in the on condition, the records of the base table will be displayed, but the records of the non base table will be filtered in the on condition). The right table that does not meet the condition will be filled with null, and when the filter condition is written on where, only the right table that meets the filter condition will be filled with null Data display of the condition.

The difference between where and on when conditions are written in multi table join (summary)

This situation is often encountered in the development process:
When multiple tables are associated with a join, the restriction is to write it after where with high efficiency or after on, or to filter the table first to reduce the data volume of the table. In the end, which of the three kinds of efficiency is higher? After reading a lot of information on the Internet, it doesn't mention the specific points. Now, make the following specific instructions for these three situations, and you will understand what's going on

Dry goods summary: (the following only applies to left join, right join, full join, not inner join)
1. left join where + base table filter condition: first filter the base table, then left join;
2. left join where + filter condition of associated table: first execute left join, then execute filter condition of associated table;
3. left join on + base table filtering condition: the base table records that meet the requirements of filtering perform left join, and the base table records that do not meet the requirements are followed by null, and then the two sets are combined;
4. left join on + filter conditions of the associated table: first execute the filter conditions of the associated table, and then execute left join;

Example:

sql:

CREATE TABLE app_test_01 (
	id INT AUTO_INCREMENT PRIMARY KEY,
	city VARCHAR(50) DEFAULT ''
)

INSERT INTO app_test_01 VALUES
(NULL,'Beijing'),(NULL,'Shanghai'),(NULL,'Shenzhen'),(NULL,'Shanghai'),(NULL,'Hunan'),(NULL,'Hubei'),(NULL,'Wuhan');


CREATE TABLE app_test_02 (
	id INT AUTO_INCREMENT PRIMARY KEY,
	stu VARCHAR(50) DEFAULT '',
	city VARCHAR(50) DEFAULT ''
)

INSERT INTO app_test_02 VALUES
(NULL,'One','Beijing'),(NULL,'Two','Beijing'),
(NULL,'Three','Shanghai'),(NULL,'Four','Beijing'),
(NULL,'Five','Shenzhen'),(NULL,'Six','Shenzhen'),
(NULL,'Seven','Hunan'),(NULL,'Eight','Hubei');

on vs. where:

1, The first situation:

(1) Table conditions are written after where:

SELECT 
  a.id,
  b.stu 
FROM
  app_test_01 a 
  LEFT JOIN app_test_02 b 
    ON a.id = b.id 
WHERE a.`city` = 'Shenzhen' ;

result:

(2) Filter the base table first, and then associate

SELECT 
  a.id,
  b.stu 
FROM
  (SELECT 
    id 
  FROM
    app_test_01 
  WHERE city = 'Shenzhen') a 
  LEFT JOIN app_test_02 b 
    ON a.id = b.id ;

result:

The execution order of these two methods is the same, they are filtering first, then Association; so the operation efficiency is the same!

2, The second situation:

(1)where conditional base table

SELECT 
  a.id,
  b.stu 
FROM
  app_test_01 a 
  LEFT JOIN app_test_02 b 
    ON a.id = b.id
    WHERE a.city = 'Shenzhen';

result:

(2)where conditional place association table

SELECT 
  a.id,
  b.stu 
FROM
  app_test_01 a 
  LEFT JOIN app_test_02 b 
    ON a.id = b.id 
WHERE b.city = 'Shenzhen' ;

result:

The first execution order: < 1 > first filter the a table, and then associate the filtered a table with the b table
The second execution order: < 1 > associate table a and table b first, < 2 > then execute the conditions of table b after where for the associated results

3, The third situation

(1) First, write where as the base table filter condition

SELECT 
  a.id,
  b.stu 
FROM
  app_test_01 a 
  LEFT JOIN app_test_02 b 
    ON a.id = b.id 
WHERE a.city = 'Shenzhen' ;

result:

(2) Second: after the condition of the base table is written on

SELECT 
  a.id,
  b.stu 
FROM
  app_test_01 a 
  LEFT JOIN app_test_02 b 
    ON a.id = b.id 
    AND a.city = 'Shenzhen' ;

result:

(3) The third case: the filter conditions of the base table and the associated table are written after on

SELECT 
  a.id,
  b.stu 
FROM
  app_test_01 a 
  LEFT JOIN app_test_02 b 
    ON a.id = b.id 
    AND a.city = 'Shenzhen' 
    AND b.`city` = 'Shenzhen' 

result:

The first execution order is:

< 1 > filter the a table first,
Then the filtered a and b tables were correlated;
The second order of execution:

<1> First, a table is divided into two parts by using the filter condition of a.city = 'Shenzhen'. One part meets the filter condition and the other part does not meet the filter condition (that is, the condition of the base table after on is only used to associate with the associated table),
<2> For the association between the satisfied and the b table, the fields after the unsatisfied are filled with null, and then the satisfied and unsatisfied two part sets union are combined to form the final result set;

The third order of execution:

<1> First, filter table B under the condition of b.city = Shanghai,
<2> Using a.city = 'Shenzhen' condition, a table is divided into two parts: satisfied and unsatisfied
<3> Associate the satisfied set with the filtered b table, fill in null directly for the fields behind the unsatisfied set, and finally unite the two sets to form the final result set

 

All of the above are the results of reviewing the execution plan and specific tests, so different writing methods can be selected for different business scenarios to improve the execution efficiency.

I believe that with the above knowledge, no matter how to add conditions, you can quickly determine the execution order of the code!!!

Tags: Database SQL

Posted on Sun, 24 May 2020 23:38:25 -0400 by runnee