Grouping and filtering of DQL statements

Note: the following will use the emp table in the mysql database in the mysql database management system as an example

This content continues from the previous study notes:
DQL statement 2 in SQL (learning note 3)

1, Grouping function

Grouping function is also called multi line processing function, and there are only 5 multi line processing functions;
avg averaging
max Max
min for minimum
count how many numbers are there in a column
sum
It has the following characteristics:

1. All grouping functions operate on "a group" of data, and input multiple lines, and the final output result is one line

Example: find the maximum wage

select max(sal) from emp;

Output results:

+----------+
| max(sal) |
+----------+
|  5000.00 |
+----------+

Example:

select ename,max(sal) from emp;

Output results:

ERROR 1140 (42000): In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'mysql.emp.ENAME'; this is incompatible with sql_mode=only_full_group_by

2. The multiline processing function automatically ignores null

Note: all databases are specified in this way. As long as there is NULL, the operation result must be NULL

The table of emp is as follows:

+-------+--------+-----------+------+------------+---------+---------+--------+
| EMPNO | ENAME  | JOB       | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
+-------+--------+-----------+------+------------+---------+---------+--------+
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |    NULL |     20 |
|  7499 | ALLEN  | SALESMAN  | 7698 | 1981-02-20 | 1600.00 |  300.00 |     30 |
|  7521 | WARD   | SALESMAN  | 7698 | 1981-02-22 | 1250.00 |  500.00 |     30 |
|  7566 | JONES  | MANAGER   | 7839 | 1981-04-02 | 2975.00 |    NULL |     20 |
|  7654 | MARTIN | SALESMAN  | 7698 | 1981-09-28 | 1250.00 | 1400.00 |     30 |
|  7698 | BLAKE  | MANAGER   | 7839 | 1981-05-01 | 2850.00 |    NULL |     30 |
|  7782 | CLARK  | MANAGER   | 7839 | 1981-06-09 | 2450.00 |    NULL |     10 |
|  7788 | SCOTT  | ANALYST   | 7566 | 1987-04-19 | 3000.00 |    NULL |     20 |
|  7839 | KING   | PRESIDENT | NULL | 1981-11-17 | 5000.00 |    NULL |     10 |
|  7844 | TURNER | SALESMAN  | 7698 | 1981-09-08 | 1500.00 |    0.00 |     30 |
|  7876 | ADAMS  | CLERK     | 7788 | 1987-05-23 | 1100.00 |    NULL |     20 |
|  7900 | JAMES  | CLERK     | 7698 | 1981-12-03 |  950.00 |    NULL |     30 |
|  7902 | FORD   | ANALYST   | 7566 | 1981-12-03 | 3000.00 |    NULL |     20 |
|  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |    NULL |     10 |
+-------+--------+-----------+------+------------+---------+---------+--------+

There are many null s in comm (subsidies and benefits). Let's take comm as an example:
For example, give each employee a subsidy of 10 yuan

select ename,(comm+10) as comm  from emp;

Output results:

+--------+---------+
| ename  | comm    |
+--------+---------+
| SMITH  |    NULL |
| ALLEN  |  310.00 |
| WARD   |  510.00 |
| JONES  |    NULL |
| MARTIN | 1410.00 |
| BLAKE  |    NULL |
| CLARK  |    NULL |
| SCOTT  |    NULL |
| KING   |    NULL |
| TURNER |   10.00 |
| ADAMS  |    NULL |
| JAMES  |    NULL |
| FORD   |    NULL |
| MILLER |    NULL |
+--------+---------+

It can be seen that all with NULL do not change after adding 10. This very intuitively shows that as long as there is NULL, the operation result must be NULL

Find the average value of comm;

select avg(comm) from emp;

Output results

+------------+
| avg(comm)  |
+------------+
| 550.000000 |
+------------+

This example also intuitively shows that the multiline processing function automatically ignores null

Note: it should be noted here that since the multiline processing function automatically ignores NULL, it must not be written like this:

select sum(comm) from emp where comm is not null;

When using multiple lines to process the number of lines, do not add where xxxx is not null after it
The program will not report errors, but it can be written, but it is completely unnecessary.

3. Grouping functions are generally used in combination with group by, which is why they are called grouping functions.

Any grouping function (count sum avg max min) is executed after the group by statement is executed.

Not writing group by is equivalent to the default of java. It will exist. The whole table is a group. Group by will be discussed below.

Note: 1. What is the difference between count (*) and count (a specific field)?

count(*): not the number of data in a field, but the total number of records. (independent of a field)
Count (specific field): indicates the total quantity of non NULL data in a specific field.

Small tips: grouping functions can also be combined

select count(*) , sum(sal) ,avg(sal) ,max (sal) ,min(sal) from emp;

Single line processing function

There are many single line processing functions. If you repeat them here one by one, it will appear that this blog post is too long. Here is only an example.
Here is a link to someone else's blog:
MySQL learning notes (2) – data processing function / single line processing function

The characteristics of single line processing function are different from multi line processing function. It inputs one line and outputs one line.
Review this: all databases are specified in this way. As long as there is NULL, the operation result must be NULL

Let's take an example: calculate the annual salary of all employees:
For the above reasons, we can't find it directly. Therefore, this time, we introduce our new function ifnull() null handler

ifnull is used as follows:
Ifnull (data that may be NULL is treated as what)

Then we can write:

select ename,(sal+ifnull(comm,0))*12 from emp;

Output results:

+--------+-------------------------+
| ename  | (sal+ifnull(comm,0))*12 |
+--------+-------------------------+
| SMITH  |                 9600.00 |
| ALLEN  |                22800.00 |
| WARD   |                21000.00 |
| JONES  |                35700.00 |
| MARTIN |                31800.00 |
| BLAKE  |                34200.00 |
| CLARK  |                29400.00 |
| SCOTT  |                36000.00 |
| KING   |                60000.00 |
| TURNER |                18000.00 |
| ADAMS  |                13200.00 |
| JAMES  |                11400.00 |
| FORD   |                36000.00 |
| MILLER |                15600.00 |
+--------+-------------------------+

2, Group by and having

Group by: group by a field or some fields.
Having: having is to filter the grouped data again

Key points: about the execution order of SQL statements

select    5
...
from      1
...
where     2
....
group by  3 
....
having    4
....
order by  6
...



Note: grouping function is in group by After execution
 Not written group by In fact, it is equivalent to existing by default, with the whole table as a group

At this time, there will be some problems, such as:
Select employees with more than average salary:

select ename,sal from emp where sal > avg(sal);

An error will be reported:
ERROR 1111 (HY000): Invalid use of group function

The correct wording should be:

select ename,sal from emp where sal > (select avg(sal) from emp);

Or, after the average wage is calculated, the average wage is substituted where sal > xxx Yes.

Note: note that this writing order and execution order are easy to make mistakes.
In addition, except that select from must be written, others should be combined according to requirements.

1. Usage of group by

Group by: group by a field or some fields.
Note: grouping functions are generally used in conjunction with group by, which is why they are called grouping functions.
And any grouping function (count sum avg max min) will be executed after the group by statement is executed.

Case: find out the highest salary for each job.

select max(sal),job from emp group by job;


Output results:
+----------+-----------+
| max(sal) | job       |
+----------+-----------+
|  1300.00 | CLERK     |
|  1600.00 | SALESMAN  |
|  2975.00 | MANAGER   |
|  3000.00 | ANALYST   |
|  5000.00 | PRESIDENT |
+----------+-----------+

Can you find out who these wages correspond to?

select ename´╝îmax(sal),job from emp group by job;

ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'mysql.emp.ENAME' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I can't

Note: if there is group by in a statement, select can only be followed by grouping functions and fields participating in grouping.

Multi field grouping query

This is an important point. Generally speaking, it is not difficult to split requirements and clarify ideas
give an example:
Find out the highest salary for different positions in each department.

Here we need to disassemble the needs and clarify our ideas.

First, let's refer to the previous execution sequence. Here, I'm putting it here.

select    5
...
from      1
...
where     2
....
group by  3 
....
having    4
....
order by  6

Note: grouping function is in group by After execution

First filter where, then group by, then filter the grouped tables again, and finally sort them.

Well, this thing is easy to write.

Find out the highest salary for different positions in each department.

First, we can group first. Here we need to group two kinds of data. The first is part and the second is job.

select job,deptno from emp group by deptno,job;

Then find out the highest salary for different jobs in each department.

select job,deptno,max(sal) from emp group by deptno,job order by deptno;
+-----------+--------+----------+
| job       | deptno | max(sal) |
+-----------+--------+----------+
| CLERK     |     10 |  1300.00 |
| MANAGER   |     10 |  2450.00 |
| PRESIDENT |     10 |  5000.00 |
| ANALYST   |     20 |  3000.00 |
| CLERK     |     20 |  1100.00 |
| MANAGER   |     20 |  2975.00 |
| CLERK     |     30 |   950.00 |
| MANAGER   |     30 |  2850.00 |
| SALESMAN  |     30 |  1600.00 |
+-----------+--------+----------+

having and where

Both having and where are used to filter tables. The difference between them is that where is before group by and having is after group by. in other words,
When you need to filter before grouping, use where;
When you need to filter after grouping, select having;
When there is no difference between pre grouping filtering and post grouping filtering, there is no difference between selecting having and selecting where.

It is recommended to use where at this time, because after you filter, the amount of operations required for group by grouping is small.

Here are some examples:

There is no difference between using where and using having.

Find out the maximum salary of each department and display the data with salary greater than 2900.
There are two ways to write here.

 select max(sal),deptno from emp group by deptno having max(sal) > 2900;
 
select max(sal),deptno from emp where sal > 2900 group by deptno;

You must use having

Example:
Find out the average salary of each department and display the data with salary greater than 2000.

  select deptno,avg(sal) from emp group by deptno having avg(sal) > 2000;

Tags: Database MySQL SQL

Posted on Fri, 19 Nov 2021 13:32:06 -0500 by nainil