MySQL 8.0 window function practice and summary

Before MySQL 8.0, it was quite painful to do data ranking statistics, because there was no window function like Oracle, SQL SERVER, PostgreSQL and other databases. However, with the addition of window functions in MySQL 8.0, it is no longer a problem for this kind of statistics. This paper introduces MySQL window functions with common sorting examples.

1. Preparations

Create tables and test data

mysql> use testdb;
Database changed
/*  Create table */
mysql> create  table tb_score(id int primary key auto_increment,stu_no varchar(10),course varchar(50),score decimal(4,1),key idx_stuNo_course(stu_no,course));
Query OK, 0 rows affected (0.03 sec)

mysql> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| tb_score         |
+------------------+

/* Add a batch of test data */
mysql> insert into tb_score(stu_no,course,score)values('2020001','mysql',90),('2020001','C++',85),('2020003','English',100),('2020002','mysql',50),('2020002','C++',70),('2020002','English',99);
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0
mysql> insert into tb_score(stu_no,course,score)values('2020003','mysql',78),('2020003','C++',81),('2020003','English',80),('2020004','mysql',80),('2020004','C++',60),('2020004','English',100);
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0
mysql> insert into tb_score(stu_no,course,score)values('2020005','mysql',98),('2020005','C++',96),('2020005','English',70),('2020006','mysql',60),('2020006','C++',90),('2020006','English',70);
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0
mysql> insert into tb_score(stu_no,course,score)values('2020007','mysql',50),('2020007','C++',66),('2020007','English',76),('2020008','mysql',90),('2020008','C++',69),('2020008','English',86);
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0
mysql> insert into tb_score(stu_no,course,score)values('2020009','mysql',70),('2020009','C++',66),('2020009','English',86),('2020010','mysql',75),('2020010','C++',76),('2020010','English',81);
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0
mysql> insert into tb_score(stu_no,course,score)values('2020011','mysql',90),('2020012','C++',85),('2020011','English',84),('2020012','English',75),('2020013','C++',96),('2020013','English',88);
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

2. Rank the scores of each course

Rank according to the scores of each course from high to low. At this time, there will be the problem of how to deal with the same score. Next, we will deal with the needs of different scenarios according to different window functions

ROW_NUMBER

It can be seen from the results that the scores are ranked according to the order of student number at the same time

mysql> select stu_no,course,score, row_number()over(partition by course order by score desc ) rn
    -> from tb_score;
+---------+---------+-------+----+
| stu_no  | course  | score | rn |
+---------+---------+-------+----+
| 2020005 | C++     |  96.0 |  1 |
| 2020013 | C++     |  96.0 |  2 |
| 2020006 | C++     |  90.0 |  3 |
| 2020001 | C++     |  85.0 |  4 |
| 2020012 | C++     |  85.0 |  5 |
| 2020003 | C++     |  81.0 |  6 |
| 2020010 | C++     |  76.0 |  7 |
| 2020002 | C++     |  70.0 |  8 |
| 2020008 | C++     |  69.0 |  9 |
| 2020007 | C++     |  66.0 | 10 |
| 2020009 | C++     |  66.0 | 11 |
| 2020004 | C++     |  60.0 | 12 |
| 2020003 | English | 100.0 |  1 |
| 2020004 | English | 100.0 |  2 |
| 2020002 | English |  99.0 |  3 |
| 2020013 | English |  88.0 |  4 |
| 2020008 | English |  86.0 |  5 |
| 2020009 | English |  86.0 |  6 |
| 2020011 | English |  84.0 |  7 |
| 2020010 | English |  81.0 |  8 |
| 2020003 | English |  80.0 |  9 |
| 2020007 | English |  76.0 | 10 |
| 2020012 | English |  75.0 | 11 |
| 2020005 | English |  70.0 | 12 |
| 2020006 | English |  70.0 | 13 |
| 2020005 | mysql   |  98.0 |  1 |
| 2020001 | mysql   |  90.0 |  2 |
| 2020008 | mysql   |  90.0 |  3 |
| 2020011 | mysql   |  90.0 |  4 |
| 2020004 | mysql   |  80.0 |  5 |
| 2020003 | mysql   |  78.0 |  6 |
| 2020010 | mysql   |  75.0 |  7 |
| 2020009 | mysql   |  70.0 |  8 |
| 2020006 | mysql   |  60.0 |  9 |
| 2020002 | mysql   |  50.0 | 10 |
| 2020007 | mysql   |  50.0 | 11 |
+---------+---------+-------+----+
36 rows in set (0.00 sec)

DENSE_RANK

In order to have the same score and rank, you can use DENSE_RANK function, the result is as follows:

mysql> select stu_no,course,score, DENSE_RANK()over(partition by course order by score desc ) rn 
    -> from  tb_score  ;    
+---------+---------+-------+----+
| stu_no  | course  | score | rn |
+---------+---------+-------+----+
| 2020005 | C++     |  96.0 |  1 |
| 2020013 | C++     |  96.0 |  1 |
| 2020006 | C++     |  90.0 |  2 |
| 2020001 | C++     |  85.0 |  3 |
| 2020012 | C++     |  85.0 |  3 |
| 2020003 | C++     |  81.0 |  4 |
| 2020010 | C++     |  76.0 |  5 |
| 2020002 | C++     |  70.0 |  6 |
| 2020008 | C++     |  69.0 |  7 |
| 2020007 | C++     |  66.0 |  8 |
| 2020009 | C++     |  66.0 |  8 |
| 2020004 | C++     |  60.0 |  9 |
| 2020003 | English | 100.0 |  1 |
| 2020004 | English | 100.0 |  1 |
| 2020002 | English |  99.0 |  2 |
| 2020013 | English |  88.0 |  3 |
| 2020008 | English |  86.0 |  4 |
| 2020009 | English |  86.0 |  4 |
| 2020011 | English |  84.0 |  5 |
| 2020010 | English |  81.0 |  6 |
| 2020003 | English |  80.0 |  7 |
| 2020007 | English |  76.0 |  8 |
| 2020012 | English |  75.0 |  9 |
| 2020005 | English |  70.0 | 10 |
| 2020006 | English |  70.0 | 10 |
| 2020005 | mysql   |  98.0 |  1 |
| 2020001 | mysql   |  90.0 |  2 |
| 2020008 | mysql   |  90.0 |  2 |
| 2020011 | mysql   |  90.0 |  2 |
| 2020004 | mysql   |  80.0 |  3 |
| 2020003 | mysql   |  78.0 |  4 |
| 2020010 | mysql   |  75.0 |  5 |
| 2020009 | mysql   |  70.0 |  6 |
| 2020006 | mysql   |  60.0 |  7 |
| 2020002 | mysql   |  50.0 |  8 |
| 2020007 | mysql   |  50.0 |  8 |
+---------+---------+-------+----+
36 rows in set (0.00 sec)

RANK

DENSE_ The result of RANK is that the scores are the same at the same time, but the next RANK is next to the previous one. If the next RANK is the third one after the first one in parallel, I think it can be achieved by using RANK function

mysql> select stu_no,course,score, rank()over(partition by course order by score desc ) rn 
    -> from  tb_score;
+---------+---------+-------+----+
| stu_no  | course  | score | rn |
+---------+---------+-------+----+
| 2020005 | C++     |  96.0 |  1 |
| 2020013 | C++     |  96.0 |  1 |
| 2020006 | C++     |  90.0 |  3 |
| 2020001 | C++     |  85.0 |  4 |
| 2020012 | C++     |  85.0 |  4 |
| 2020003 | C++     |  81.0 |  6 |
| 2020010 | C++     |  76.0 |  7 |
| 2020002 | C++     |  70.0 |  8 |
| 2020008 | C++     |  69.0 |  9 |
| 2020007 | C++     |  66.0 | 10 |
| 2020009 | C++     |  66.0 | 10 |
| 2020004 | C++     |  60.0 | 12 |
| 2020003 | English | 100.0 |  1 |
| 2020004 | English | 100.0 |  1 |
| 2020002 | English |  99.0 |  3 |
| 2020013 | English |  88.0 |  4 |
| 2020008 | English |  86.0 |  5 |
| 2020009 | English |  86.0 |  5 |
| 2020011 | English |  84.0 |  7 |
| 2020010 | English |  81.0 |  8 |
| 2020003 | English |  80.0 |  9 |
| 2020007 | English |  76.0 | 10 |
| 2020012 | English |  75.0 | 11 |
| 2020005 | English |  70.0 | 12 |
| 2020006 | English |  70.0 | 12 |
| 2020005 | mysql   |  98.0 |  1 |
| 2020001 | mysql   |  90.0 |  2 |
| 2020008 | mysql   |  90.0 |  2 |
| 2020011 | mysql   |  90.0 |  2 |
| 2020004 | mysql   |  80.0 |  5 |
| 2020003 | mysql   |  78.0 |  6 |
| 2020010 | mysql   |  75.0 |  7 |
| 2020009 | mysql   |  70.0 |  8 |
| 2020006 | mysql   |  60.0 |  9 |
| 2020002 | mysql   |  50.0 | 10 |
| 2020007 | mysql   |  50.0 | 10 |
+---------+---------+-------+----+
36 rows in set (0.01 sec)

In this way, various sorting requirements are realized.

NTILE

The NTILE function is used to rank each group, and then divide the corresponding group into N groups, for example

mysql> select stu_no,course,score, rank()over(partition by course order by score desc )rn,NTILE(2)over(partition by course order by score desc ) rn_group  from  tb_score;
+---------+---------+-------+----+----------+
| stu_no  | course  | score | rn | rn_group |
+---------+---------+-------+----+----------+
| 2020005 | C++     |  96.0 |  1 |        1 |
| 2020013 | C++     |  96.0 |  1 |        1 |
| 2020006 | C++     |  90.0 |  3 |        1 |
| 2020001 | C++     |  85.0 |  4 |        1 |
| 2020012 | C++     |  85.0 |  4 |        1 |
| 2020003 | C++     |  81.0 |  6 |        1 |
| 2020010 | C++     |  76.0 |  7 |        2 |
| 2020002 | C++     |  70.0 |  8 |        2 |
| 2020008 | C++     |  69.0 |  9 |        2 |
| 2020007 | C++     |  66.0 | 10 |        2 |
| 2020009 | C++     |  66.0 | 10 |        2 |
| 2020004 | C++     |  60.0 | 12 |        2 |
| 2020003 | English | 100.0 |  1 |        1 |
| 2020004 | English | 100.0 |  1 |        1 |
| 2020002 | English |  99.0 |  3 |        1 |
| 2020013 | English |  88.0 |  4 |        1 |
| 2020008 | English |  86.0 |  5 |        1 |
| 2020009 | English |  86.0 |  5 |        1 |
| 2020011 | English |  84.0 |  7 |        1 |
| 2020010 | English |  81.0 |  8 |        2 |
| 2020003 | English |  80.0 |  9 |        2 |
| 2020007 | English |  76.0 | 10 |        2 |
| 2020012 | English |  75.0 | 11 |        2 |
| 2020005 | English |  70.0 | 12 |        2 |
| 2020006 | English |  70.0 | 12 |        2 |
| 2020005 | mysql   |  98.0 |  1 |        1 |
| 2020001 | mysql   |  90.0 |  2 |        1 |
| 2020008 | mysql   |  90.0 |  2 |        1 |
| 2020011 | mysql   |  90.0 |  2 |        1 |
| 2020004 | mysql   |  80.0 |  5 |        1 |
| 2020003 | mysql   |  78.0 |  6 |        1 |
| 2020010 | mysql   |  75.0 |  7 |        2 |
| 2020009 | mysql   |  70.0 |  8 |        2 |
| 2020006 | mysql   |  60.0 |  9 |        2 |
| 2020002 | mysql   |  50.0 | 10 |        2 |
| 2020007 | mysql   |  50.0 | 10 |        2 |
+---------+---------+-------+----+----------+
36 rows in set (0.01 sec)

3. Summary of window functions

There are many other window functions in MySQL. This article lists some of them. You can test them yourself

 
category function explain
sort ROW_NUMBER Assign a sequence number to each row in the table. You can specify grouping (or not) and sorting fields
DENSE_RANK Each row in each group is assigned a sequence number based on the sort field. When the ranking value is the same, the sequence number is the same, and there is no gap in the sequence number (such as 1, 1, 2, 3)
RANK Each row in each group is assigned a sequence number based on the sort field. When the ranking value is the same, the sequence number is the same, but there is a gap in the sequence number (1,1,3,4)
NTILE According to the sorting field, each group is divided into corresponding groups according to the sorting of the specified field
distribution PERCENT_RANK Calculate the percentage level for each group or result set row
CUME_DIST Calculate the cumulative distribution of a value in an ordered set of data
around LEAD Returns the value of line N after the current line in the group. NULL if no corresponding row exists. For example, when N=1, the value corresponding to the first name is the second, and the result of the last name is NULL
LAG Returns the value of line N before the current line in the group. NULL if no corresponding row exists. For example, when N=1, the value corresponding to the first place is NUL, and the result of the last place is the value next to the last
Middle of head and tail FIRST_VALUE Returns the value of the field (or expression) corresponding to the first name in each group. For example, in this paper, it can be the value of any field such as the score and student number of the first name
LAST_VALUE Returns the value of the field (or expression) corresponding to the last name in each group. For example, in this paper, it can be the value of any field such as score, student number, etc. of the last name
NTH_VALUE

Returns the value of the corresponding field (or expression) ranking N in each group, but the value of the row less than N is NULL

The main window functions in MySQL are summarized first. It's suggested to practice them. In addition, many people have summarized the implementation of sorting methods in MySQL 5.7 and earlier versions, and it is also recommended to implement them.

For more information or technical communication, you can focus on WeChat public database, dry cargo shop or official account.

Tags: MySQL Database Oracle SQL

Posted on Sat, 27 Jun 2020 03:42:25 -0400 by always_confused