Introduction to Postgresql table connection method (comparison test with Oracle)

Author: Yang Yunlong, core member of Hangao PG laboratory, Senior Database Engineer, good at HGDB, post gresql, Oracle and other mainstream databases.

Introduction to table connection methods (Nested Loop/Hash Join/Merge Join/Join) database version (oracle11.2.0.4 and PostgreSQL 13.1)

Environmental structure

-- oracle11.2.0.4
--City, country
drop table country purge;
CREATE TABLE country (
country_id int primary key,
country_name VARCHAR(50) NOT NULL
);

drop table city purge;
CREATE TABLE city (
city_id int primary key,
city_name VARCHAR(50) NOT NULL,
country_id int NOT NULL
);

begin
for i in 1 .. 10 loop
insert into country values(i,'country'||i);
end loop;
commit;
end;

begin
for i in 1 .. 10000 loop
insert into city values(i,'city'||i,ceil(i/1000));
end loop;
commit;
end;

execute dbms_stats.gather_table_stats(ownname =>'SCOTT',tabname => 'CITY' ,estimate_percent=> 100 ,cascade => true);
execute dbms_stats.gather_table_stats(ownname =>'SCOTT',tabname => 'COUNTRY' ,estimate_percent=> 100 ,cascade => true);

-- PostgreSQL 13.1
drop table country;
CREATE TABLE country (
country_id integer primary key,
country_name text NOT NULL
);

drop table city purge;
CREATE TABLE city (
city_id integer primary key,
city_name text NOT NULL,
country_id integer NOT NULL
);

insert into country values (generate_series(1,10),'country'||generate_series(1,10));

insert into city values(generate_series(1,10000),'city'||generate_series(1,10000),ceil(random()*(10-1)+1));

analyze city;
analyze country;

Nested Loop

As shown in the figure above, it is introduced in Nested Loop mode. Its algorithm: the driver table returns a row of data, and transmits the value to the driven table through the connection column. The driven table will be scanned as many times as many rows the driver table returns. Driven table index mode index unique scan or range scan. When there are few driven table data subsets, Nested Loop is a better choice, that is, it is suitable for queries with small result set. Generally, more than 10000 rows are considered as large rows, which will become inefficient. Examples are as follows:

Oracle execution example
SQL> explain plan for select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id=99;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2738185913

---------------------------------------------------------------------------------------------
| Id  | Operation		     | Name	    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	     |		    |	  1 |	 93 |	  2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS		     |		    |	  1 |	 93 |	  2   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CITY	    |	  1 |	 53 |	  1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN	     | SYS_C0018242 |	  1 |	    |	  1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| COUNTRY	    |	  1 |	 40 |	  1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN	     | SYS_C0018239 |	  1 |	    |	  0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CITY"."CITY_ID"=99)
   5 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

18 rows selected.

Sentence comparison

--The following way oracle Will choose hash  Full table scan
explain plan for select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id>=10;

--Mandatory use nested loop mode
explain plan for select /*+ leading(city) use_nl(country) */ city_name,country_name from city,country
 where city.country_id=country.country_id and city.city_id>=10;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 103883790

---------------------------------------------------------------------------------------------
| Id  | Operation		     | Name	    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	     |		    |  9991 |	907K| 10005   (1)| 00:02:01 |
|   1 |  NESTED LOOPS		     |		    |  9991 |	907K| 10005   (1)| 00:02:01 |
|   2 |   NESTED LOOPS		     |		    |  9991 |	907K| 10005   (1)| 00:02:01 |
|*  3 |    TABLE ACCESS FULL	     | CITY	    |  9991 |	517K|	 11   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN	     | SYS_C0018239 |	  1 |	    |	  0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| COUNTRY	    |	  1 |	 40 |	  1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("CITY"."CITY_ID">=10)
   4 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

22 rows selected.
--As shown above, the driver performs 9991 times, that is (10000)-9),Resource consumption increased significantly
Postgresql execution example
mydb=# explain analyze select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id=99;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.44..16.49 rows=1 width=40) (actual time=0.031..0.036 rows=1 loops=1)
   ->  Index Scan using city_pkey on city  (cost=0.29..8.30 rows=1 width=12) (actual time=0.015..0.017 rows=1 loops=1)
         Index Cond: (city_id = 99)
   ->  Index Scan using country_pkey on country  (cost=0.15..8.17 rows=1 width=36) (actual time=0.006..0.006 rows=1 loops=1)
         Index Cond: (country_id = city.country_id)
 Planning Time: 0.621 ms
 Execution Time: 0.098 ms
(7 rows)

--Return multiple lines, mandatory nested loop,Than the default selection hash join More than 30 ms
mydb=# set enable_hashjoin=off;
SET
mydb=# set enable_mergejoin=off;
SET
mydb=# explain analyze select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id>=10;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.15..1934.29 rows=9991 width=40) (actual time=0.050..43.354 rows=9991 loops=1)
   ->  Seq Scan on city  (cost=0.00..188.00 rows=9991 width=12) (actual time=0.025..4.870 rows=9991 loops=1)
         Filter: (city_id >= 10)
         Rows Removed by Filter: 9
   ->  Index Scan using country_pkey on country  (cost=0.15..0.17 rows=1 width=36) (actual time=0.002..0.002 rows=1 loops=9991)
         Index Cond: (country_id = city.country_id)
 Planning Time: 0.393 ms
 Execution Time: 45.366 ms
(8 rows)

PG Nested loop description

Postgresql uses nested loops to execute queries. It can iterate to represent all entries in 1 and all entries in 2, and then issue a row when the row pairs in Table 1 and table 2 meet the filter criteria. Nested loops are the only connection algorithm Postgresql can use to handle any connection. For example, in the following statement, Oracle will select Merge Join, and Postgresql will select nested lo OP, please refer to the Merge Join section for details.

--Even if the amount of data is large, the following statement pg Also choose Nested Loop
explain  select a.city_name,b.city_id from city a,city2 b where a.country_id<b.country_id;

Hash Join

When two tables are associated, they mainly rely on hash operation to obtain the table connection mode of the result set, and only support equivalence. Algorithm: two tables are connected equivalently, return a large amount of data, select fewer tables as the driving table, and read the relevant columns of the driving table into the work area in PGA (PG is put into the memory work_mem) , then hash the connected columns of the drive table to generate a hash table, then read the driven table and hash the associated columns, and then probe the hash table in PGA to find the data association. If the hash table is too large to be constructed in memory at one time, it will be divided into several partition s and written to the temporary segment of the disk, which will increase the cost of writing and reduce the efficiency

The figure above shows the Hash process. If the statistics are accurate, the database will automatically select the best execution plan. When the ORDERED prompt is used, the first table in the FROM clause will be used to create a Hash table.

Oracle example
-- country As drive table,Whatever you use join(left/right/full)Way, Oracle The small table will be selected as the driving table.
SQL> explain plan for select city_name,country_name from city,country where city.country_id=country.country_id;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 114462077

------------------------------------------------------------------------------
| Id  | Operation	   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	     | 10000 |	 781K|	  14   (0)| 00:00:01 |
|*  1 |  HASH JOIN	   |	     | 10000 |	 781K|	  14   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| COUNTRY |	  10 |	 400 |	   3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| CITY    | 10000 |	 390K|	  11   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

19 rows selected.

Postgresql example
mydb=# explain analyze select city_name,country_name from city,country where city.country_id=country.country_id;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=38.58..227.91 rows=10000 width=40) (actual time=0.185..12.086 rows=10000 loops=1)
   Hash Cond: (city.country_id = country.country_id)
   ->  Seq Scan on city  (cost=0.00..163.00 rows=10000 width=12) (actual time=0.026..3.735 rows=10000 loops=1)
   ->  Hash  (cost=22.70..22.70 rows=1270 width=36) (actual time=0.064..0.067 rows=10 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 17kB
         ->  Seq Scan on country  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.025..0.031 rows=10 loops=1)
 Planning Time: 1.953 ms
 Execution Time: 13.983 ms
(8 rows)

In order to create a hash table using a small table, the optimizer will convert it. For example, right join is changed to Left Join

mydb=# explain analyze select * from country right join city on city.country_id=country.country_id;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=38.58..227.91 rows=10000 width=52) (actual time=0.087..11.883 rows=10000 loops=1)
   Hash Cond: (city.country_id = country.country_id)
   ->  Seq Scan on city  (cost=0.00..163.00 rows=10000 width=16) (actual time=0.020..3.035 rows=10000 loops=1)
   ->  Hash  (cost=22.70..22.70 rows=1270 width=36) (actual time=0.044..0.047 rows=10 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 17kB
         ->  Seq Scan on country  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.024..0.028 rows=10 loops=1)
 Planning Time: 0.413 ms
 Execution Time: 13.588 ms
(8 rows)

Merge Join

When two tables are connected, sorting and merging operations are used to obtain the connection mode of the result set. Sorting mainly deals with non equivalent management. Algorithm: first sort the two tables according to the connection column. The nested loop matches the data from the index of the driven table, and the sorting merge connection is from memory (work area in PGA) Strictly speaking, there are no driven tables in the matching data in the. Fewer tables can be considered as driven tables. HASH JOIN only needs to put the driven tables into PGA, but sorting join merging needs to put the result sets of both tables into PGA

The figure above shows the general process of Merge. If the statistics are accurate, the database will automatically select the best execution plan.

Oracle example
col PLAN_TABLE_OUTPUT for a100
set lines 200 pages 999
explain plan for select a.city_name,b.country_name from city a,country b where a.country_id<b.country_id;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1026867539

------------------------------------------------------------------------------------------------------
| Id  | Operation		      | Name	     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	      | 	     | 45000 |	1054K|	     |	  62   (2)| 00:00:01 |
|   1 |  MERGE JOIN		      | 	     | 45000 |	1054K|	     |	  62   (2)| 00:00:01 |
|   2 |   SORT JOIN		      | 	     |	  10 |	 120 |	     |	   2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| COUNTRY      |	  10 |	 120 |	     |	   2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN	      | SYS_C0018239 |	  10 |	     |	     |	   1   (0)| 00:00:01 |
|*  5 |   SORT JOIN		      | 	     | 10000 |	 117K|	 408K|	  60   (2)| 00:00:01 |
|   6 |    TABLE ACCESS FULL	      | CITY	     | 10000 |	 117K|	     |	  11   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(INTERNAL_FUNCTION("A"."COUNTRY_ID")<INTERNAL_FUNCTION("B"."COUNTRY_ID"))
       filter(INTERNAL_FUNCTION("A"."COUNTRY_ID")<INTERNAL_FUNCTION("B"."COUNTRY_ID"))

19 rows selected.
Postgresql example

merge join requires that the keys of two joined tables are sorted first. If there is an index, there is no sorting process. The advantage of merge join is that both tables are scanned only once. If there is a nested loop, one table is scanned once and the other table is cycled many times

In Postgresql, Merge Join is only applicable to natural connection and equivalent connection. As shown in the following example, even if enable_nestloop is turned off, pg will be transformed into Nested Loop in this connection mode, and the initial cost estimation is very large. Therefore, Postgresql is different from Oracle mechanism.

mydb=# analyze city;
ANALYZE
mydb=# analyze country;
ANALYZE
mydb=# set enable_nestloop=off;
SET
mydb=#  explain analyze select a.city_name,b.country_name from city a,country b where a.country_id<b.country_id;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000000.00..10000016637.12 rows=333333 width=18) (actual time=0.104..419.608 rows=400687 loops=1)
   Join Filter: (a.country_id < b.country_id)
   Rows Removed by Join Filter: 599313
   ->  Seq Scan on city a  (cost=0.00..1636.00 rows=100000 width=13) (actual time=0.066..20.058 rows=100000 loops=1)
   ->  Materialize  (cost=0.00..1.15 rows=10 width=13) (actual time=0.000..0.001 rows=10 loops=100000)
         ->  Seq Scan on country b  (cost=0.00..1.10 rows=10 width=13) (actual time=0.014..0.019 rows=10 loops=1)
 Planning Time: 0.649 ms
 Execution Time: 457.499 ms
(8 rows)

hash join is the fastest connection mode only when it is executed in memory. When pg thinks that the hash connection needs more memory, it will select merge join, which can use the disk more effectively. In Postgresql, merge join is only applicable to connections with equal conditions. Its working principle is better than the first row (minimum value) of each output , if they are equal, add a row to the result and delete the two rows that have been compared from input; if the fields are different, delete the one with the smallest id and continue to compare with the next smallest.

Postgres can perform merging on disk very effectively by saving a small part of the data in memory, merging the data, writing the merging results to disk, and then reading more input from disk to memory. This method reads and writes a large amount of data on disk at one time, which is more efficient than Hash random reading.

mydb=# explain  select a.city_name,b.city_id from city a,city2 b where a.country_id=b.country_id and a.city_id<30000;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Hash Join  (cost=3274.21..3764393.24 rows=333242482 width=13)
   Hash Cond: (a.country_id = b.country_id)
   ->  Index Scan using city_pkey on city a  (cost=0.29..1052.87 rows=30033 width=13)
         Index Cond: (city_id < 30000)
   ->  Hash  (cost=1634.63..1634.63 rows=99863 width=8)
         ->  Seq Scan on city2 b  (cost=0.00..1634.63 rows=99863 width=8)
(6 rows)

mydb=# set enable_hashjoin=off;
SET
mydb=# explain  select a.city_name,b.city_id from city a,city2 b where a.country_id=b.country_id and a.city_id<30000;
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Merge Join  (cost=13213.55..5012000.94 rows=333242482 width=13)
   Merge Cond: (a.country_id = b.country_id)
   ->  Sort  (cost=3286.46..3361.55 rows=30033 width=13)
         Sort Key: a.country_id
         ->  Index Scan using city_pkey on city a  (cost=0.29..1052.87 rows=30033 width=13)
               Index Cond: (city_id < 30000)
   ->  Sort  (cost=9927.09..10176.74 rows=99863 width=8)
         Sort Key: b.country_id
         ->  Seq Scan on city2 b  (cost=0.00..1634.63 rows=99863 width=8)
(9 rows)

The Merge Join method works better when memory is not enough

When the amount of data is very large, Merge Join will be selected. If there is no index, Materialize will be selected to cache the results in memory or files to avoid multiple full table scans, as shown below:

mydb=# create  unlogged table tbl1(id int, info text) with (autovacuum_enabled=off);
CREATE TABLE
mydb=#  create  unlogged table tbl2(id int, info text) with (autovacuum_enabled=off);
CREATE TABLE
mydb=# insert into tbl1 select generate_series(1,10000000);  
INSERT 0 10000000
mydb=# insert into tbl2 select generate_series(1,10000000);  
INSERT 0 10000000
mydb=#  explain select * from tbl1,tbl2 where tbl1.id=tbl2.id;  
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Merge Join  (cost=2075531.84..2370522825.11 rows=157893676470 width=72)
   Merge Cond: (tbl1.id = tbl2.id)
   ->  Sort  (cost=1037765.92..1051814.66 rows=5619496 width=36)
         Sort Key: tbl1.id
         ->  Seq Scan on tbl1  (cost=0.00..100442.96 rows=5619496 width=36)
   ->  Materialize  (cost=1037765.92..1065863.40 rows=5619496 width=36)
         ->  Sort  (cost=1037765.92..1051814.66 rows=5619496 width=36)
               Sort Key: tbl2.id
               ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(9 rows)

mydb=#  explain select * from tbl1,tbl2 where tbl1.id<tbl2.id;  
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Nested Loop  (cost=0.00..720393977232.90 rows=10526245098005 width=72)
   Join Filter: (tbl1.id < tbl2.id)
   ->  Seq Scan on tbl1  (cost=0.00..100442.96 rows=5619496 width=36)
   ->  Materialize  (cost=0.00..172443.44 rows=5619496 width=36)
         ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(5 rows)

mydb=# set enable_mergejoin=off;
SET
mydb=#  explain select * from tbl1,tbl2 where tbl1.id=tbl2.id;  
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Hash Join  (cost=214589.66..5526742276.56 rows=157893676470 width=72)
   Hash Cond: (tbl1.id = tbl2.id)
   ->  Seq Scan on tbl1  (cost=0.00..100442.96 rows=5619496 width=36)
   ->  Hash  (cost=100442.96..100442.96 rows=5619496 width=36)
         ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(5 rows)

mydb=#  explain select * from tbl1,tbl2 where tbl1.id=tbl2.id and tbl1.id<2000;  
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Merge Join  (cost=1449858.75..790941515.17 rows=52631216124 width=72)
   Merge Cond: (tbl1.id = tbl2.id)
   ->  Sort  (cost=412092.83..416775.74 rows=1873165 width=36)
         Sort Key: tbl1.id
         ->  Seq Scan on tbl1  (cost=0.00..114491.70 rows=1873165 width=36)
               Filter: (id < 2000)
   ->  Materialize  (cost=1037765.92..1065863.40 rows=5619496 width=36)
         ->  Sort  (cost=1037765.92..1051814.66 rows=5619496 width=36)
               Sort Key: tbl2.id
               ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(10 rows)

Advantages and disadvantages of three connection methods

There is also a connection called cross join / Cartesian product. There are no conditions for the association of two tables. The result set is the product of the number of rows of two tables. This method will not be used. It will not be discussed this time.

categoryNested LoopHash JoinMerge Join
Service conditionsAny conditionEquivalent connectionNon equivalent, except '< >'
Use resourcesCPU, disk IOMemory, temporary spaceMemory, temporary space
characteristicHigh selection index or restricted search has high timeliness and can quickly return the first search resultsMany table records, lack of effective index and high efficiencyNon equivalent connection, high efficiency
shortcomingLarge amount of data and low efficiency, especially when there is no index or insufficient restrictionsRequires a lot of memory and may use diskAll tables need to be sorted, and the results are returned finally. PG sorts the memory in batch order

Join connection type

inner Join
--The result contains only records that meet the connection conditions
select a.id,a.col1,b.col2 from a,b where a.id=b.id;
--or
select a.id,a.col1,b.col2 from a inner join b on a.id=b.id;
--or
select id,a.col1,b.col2 from a inner join b using(id);
Full outer Join
SELECT * FROM TableA FULL OUTER JOIN TableB ON TableA.name = TableB.name;
	ID NAME 	  ID NAME
---------- ------ ---------- ------
	 1 aaa		   1 aaa
			       2 bbb
	 3 ccc

--If there is no matching record, null As values, the same result set can be removed
SELECT * FROM TableA FULL OUTER JOIN TableB ON TableA.name = TableB.name 
where tablea.id is null or tableb.id is null;
	ID      NAME  ID       NAME
---------- ------ ------ ------
			        2     bbb
	 3       ccc
left/right join
--As shown in the figure above, the A Table all results, B It's no use null As value,Except for the same value, right And left Just the opposite
SELECT * FROM TableA left JOIN TableB ON TableA.name = TableB.name;
	ID NAME 	  ID NAME
---------- ------ ---------- ------
	 1 aaa		   1 aaa
	 3 ccc
--You can remove the same result set
SELECT * FROM TableA left JOIN TableB ON TableA.name = TableB.name 
where  tableb.id is null;
	ID NAME 	  ID NAME
---------- ------ ---------- ------
	 3 ccc

union introduction

The UNION operator is used to combine the result sets of two or more SELECT statements.

Note that the SELECT statement inside the UNION must have the same number of columns. Columns must also have similar data types. At the same time, the order of columns in each SELECT statement must be the same. UNION only selects records and has the function of de duplication, while UNION ALL lists all records.

SELECT name FROM TableA UNION SELECT name FROM TableB;
NAME
------------------------------
aaa
bbb

SELECT name FROM TableA UNION all SELECT name FROM TableB;

NAME
------------------------------
aaa
aaa
bbb

In some cases, union can be used instead of or to provide query efficiency

select * from t1 where owner='SCOTT' or object_id in (select object_id from t2);
--The above statement uses FILTER,Can adjust sql,perhaps t2.object_id Create index
select * from t1 where owner='SCOTT' 
union 
select * from t1 where object_id in (select object_id from t2);

--Column to row example
select employee,child_1 as child from personnel
union all
select employee,child_2 as child from personnel;

Self connection

Paging query
--Four layer paging is more suitable for paging queries with large pages. The small returned result set is used as the driven table and the driven table is scanned.
with tmp as
(SELECT * FROM
    ( SELECT rid, ROWNUM as RN
       FROM
           (SELECT rowid as rid
             FROM t1
             where owner='SYS'
             order by object_id desc
            ) WHERE ROWNUM <= 500
   )  WHERE RN > 490
) 
select  /*+ use_nl(a) leading(b) */ owner,object_name,object_id,rn
from t1 a,tmp b
where a.rowid=b.rid;
--three layers select Is commonly used
select * from (select owner,object_name,object_id,rownum as rn
 from(select owner,object_name,object_id from t1 where owner='SYS' order by object_id desc) 
 where rownum<=500) 
 where rn>490;
Delete duplicate lines
--Extreme function
delete from products p1
where rowid < (
   select max(p2.rowid) from products p2
   where p1.name = p2.name
   and p1.price =p2.price
);
--Use non equivalent connection
delete from products p1
where exists (
   select * from products p2
   where p1.name = p2.name
   and p1.price = p2.price
   and p1.rowid < p2.rowid
);
The associated subquery compares rows with rows eg
--Seek the same year as the previous year's turnover
select year,sale from sales s1
where sale = (
   select sale from sales s2 
   where s2.year = s1.year-1
) order by year;

--or
select s1.year,s1.sale from sales s1,sales s2
where s2.sale =s1.sale
and s2.year = s1.year-1
order by year;

--Comparison of the same period
select s1.year,s1.sale,
case when s1.sale = s2.sale then 'Flat'
     when s1.sale > s2.sale then 'increase'
     when s1.sale < s2.sale then 'reduce'
else '-' end as var
from sales s1,sales s2
where s2.year = s1.year -1
order by year;     

Reference documents

  • Join introduce: https://www.cnblogs.com/xqzt/p/4469673.html
  • Join introduce: https://logicalread.com/
  • pg join method: http://www.jasongj.com/2015/03/07/Join1/
  • pg join method: https://developer.aliyun.com/article/70992
  • pg join method: https://www.interdb.jp/pg/pgsql03.html
  • pg Merge Joins: https://malisper.me/postgres-merge-joins/
  • Table connection method reference: Book core ideas of SQL optimization
  • SQL connection example reference: Advanced SQL tutorial

Tags: Database Oracle PostgreSQL

Posted on Tue, 09 Nov 2021 06:34:37 -0500 by peranha