Basic introduction to OushuDB products - table

1. Watch

The OushuDB table consists of rows and columns. Each column has a column name and a data type. The number and order of columns in a table are fixed. The number of rows in a table is variable. SQL does not assume the order of rows in a table. When reading a table, the returned rows will appear in any order unless the display requires sorting. In addition, SQL does not give each row a unique identifier, so it is possible to have the same several rows in a table.

To create a table, you can use the create table command. In the command, you need to specify the table name, column name and column type. For example:

create table my_first_table (
    first_column text,
    second_column integer
);

The above command creates a two column table with one column of text type and one column of integer type. To delete the table you just created, you can use the drop table command.

drop table my_first_table;

2. Table storage format

OushuDB now supports multiple storage formats: AO, Parquet, ORC and MagmaAP. AO is stored by row, while Parquet, ORC and MagmaAP are stored by column. MagmaAP is a new storage format released in 4.0.0.0. Both MagmaAP and ORC support update / delete, support transactions, and MagmaAP also supports index.
Note: similar to GPDB, the previous version of OushuDB supports the CO format, but the CO format is not suitable for large clusters and many partitions. The subsequent new version removes the CO support.

For the table creation syntax of tables in various formats, several examples are given below.

# The AO table is created by default
CREATE TABLE rank1 (id int, rank int, year smallint,gender char(1), count int );

# As with the table created above, the storage format type is explicitly specified
CREATE TABLE rank2 (id int, rank int, year smallint,gender char(1), count int ) with (appendonly =true, orientation =row);

# Create a snappy compressed AO table
CREATE TABLE rank3 (id int, rank int, year smallint,gender char(1), count int ) with (appendonly =true, orientation =row, compresstype = snappy);

# Create a snappy compressed Parquet table. If the compression type is not specified, it will not be compressed by default.
CREATE TABLE rank3 (id int, rank int, year smallint,gender char(1), count int ) with (appendonly =true, orientation =parquet, compresstype = snappy);


# Create an uncompressed ORC table. If the compression type is not specified, it will not be compressed by default.
CREATE TABLE rank3 (id int, rank int, year smallint,gender char(1), count int ) with (appendonly =true, orientation =orc);

# To create an ORC table with compression, specify the compression type.
CREATE TABLE rank3 (id int, rank int, year smallint,gender char(1), count int ) with (appendonly =true, orientation =orc, compresstype = lz4);

# Create a compressed magma table, and the compression is automatically realized within magma.
CREATE TABLE rank3 (id int, rank int, year smallint,gender char(1), count int ) format 'magmaap';

# Create a magma table with a primary key, and the internal compression of magma is automatically realized.
CREATE TABLE rank3 (id int, rank int, year smallint,gender char(1), count int,primary key(id) ) format 'magmaap';

3. Table distribution

In OushuDB, tables can be distributed in two ways: Hash based distribution and Random distribution. The distribution method based on Hash is based on the Hash value of the distribution column, and the Random distribution mode is adopted for the Random distribution.

When the user does not specify the distribution method when creating a table, the Random distribution is used by default for non magmap tables. At present, the Random distribution is not supported for magmap tables. The following two examples are equivalent.

CREATE TABLE rank (id int, rank int, year smallint,gender char(1), count int );


CREATE TABLE rank (id int, rank int, year smallint,gender char(1), count int );
DISTRIBUTED RANDOMLY;

The following example creates a Hash distributed table. The distributed Key uses a combination of three columns (rank, gender and year), and the data is distributed into 32 bucket s.

If you do not specify bucketnum, default is used by default_hash_table_bucket_number is the value of the system parameter as bucketnum.

CREATE TABLE rank (id int, rank int, year smallint,gender char(1), count int )
WITH (bucketnum = 32)

DISTRIBUTED BY (rank, gender,year);

4. Selection of Hash distribution and Random distribution

Non Magma tables: Random distributed tables are more flexible, and there is no need to redistribute data after adding nodes in the system expansion. After the system is expanded, the Hash distributed tables need to redistribute the data in order to take advantage of the computing power of the newly added nodes. In addition, the table resource manager for Hash distribution allocates a fixed number of virtual segment s when allocating resources, which is not as flexible as Random distribution.

Hash distributed tables have performance benefits on some queries, because it is sometimes possible to avoid redistributing some tables.
For example, in the query of the following example, if the lineitem and orders tables are listed by l_orderkey and o_ If the orderkey is distributed, the query does not need to redistribute any table during execution, and the connection operation can be performed in parallel at each node.

SELECT l_orderkey, count(l_quantity)
FROM lineitem, orders
WHERE l_orderkey = o_orderkey

For most queries, experiments show that they are not network bottlenecks, and there is little difference between Hash distribution and Random distribution. Therefore, we recommend that users adopt Random distribution by default and use Hash distributed tables only for specific occasions that need to be optimized.

The Magma table has the advantages of Hash and random tables through default_magma_hash_table_nvseg_per_node to control the number of virtual segment s that each node can start. There is no need to redistribute data after system expansion.

5. Selection of bucket num in Hash distribution table

For Hash distributed tables, bucketnum determines the parallelism of a query. In some common hardware configurations (128G memory and 12 SAS disks), we recommend selecting 6 nodes or 8 nodes. If the hardware is better, you can add bucketnum. During system initialization, default_ Hash_ table_ bucket_ The initialization default value of number is 8 * number of nodes. Magma table uses default_magma_hash_table_nvseg_per_node, indicating the number of virtual segment s of each node

6. Table partition

For large data warehouse fact tables, we can often split a large table into multiple sub tables by partitioning the table. In this case, there are two benefits:
● the query optimizer can optimize the partition table. If the query is designed to only some partitions, the query plan only needs to scan these partitions to speed up the query
● if we partition by date, we can simply add partitions and delete expired partitions.
OushuDB supports two partitioning methods based on Range and List.
● Range partition: partition according to the value Range, such as date, price, etc
● List partition: partition according to a List of values, such as regions
Let's illustrate the use of these two partitions through examples.

Range partition

# Create a sales table, partition by date column Range, and create a partition every month from 2008 to 2009

postgres=# CREATE TABLE sales (id int, date date, amt decimal(10,2))
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
   END (date '2009-01-01') EXCLUSIVE
   EVERY (INTERVAL '1 month') );

View the information of the created table, and d + gives all the information of the table

postgres=# \d+ sales
              Append-Only Table "public.sales"
 Column |     Type      | Modifiers | Storage | Description
--------+---------------+-----------+---------+-------------
 id     | integer       |           | plain   |
 date   | date          |           | plain   |
 amt    | numeric(10,2) |           | main    |
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: f
Child tables: sales_1_prt_1,
              sales_1_prt_10,
              sales_1_prt_11,
              sales_1_prt_12,
              sales_1_prt_2,
              sales_1_prt_3,
              sales_1_prt_4,
              sales_1_prt_5,
              sales_1_prt_6,
              sales_1_prt_7,
              sales_1_prt_8,
              sales_1_prt_9
Has OIDs: no
Options: appendonly=true
Distributed randomly
Partition by: (date)

You can also explicitly declare the sub partition and specify the sub table name.

CREATE TABLE sales_exp (id int, date date, amt decimal(10,2))
PARTITION BY RANGE (date)
( PARTITION Jan08 START (date '2008-01-01') INCLUSIVE ,
  PARTITION Feb08 START (date '2008-02-01') INCLUSIVE ,
  PARTITION Mar08 START (date '2008-03-01') INCLUSIVE ,
  PARTITION Apr08 START (date '2008-04-01') INCLUSIVE ,
  PARTITION May08 START (date '2008-05-01') INCLUSIVE ,
  PARTITION Jun08 START (date '2008-06-01') INCLUSIVE ,
  PARTITION Jul08 START (date '2008-07-01') INCLUSIVE ,
  PARTITION Aug08 START (date '2008-08-01') INCLUSIVE ,
  PARTITION Sep08 START (date '2008-09-01') INCLUSIVE ,
  PARTITION Oct08 START (date '2008-10-01') INCLUSIVE ,
  PARTITION Nov08 START (date '2008-11-01') INCLUSIVE ,
  PARTITION Dec08 START (date '2008-12-01') INCLUSIVE
                   END (date '2009-01-01') EXCLUSIVE );

View created table information

postgres=# \d+ sales_exp
            Append-Only Table "public.sales_exp"
 Column |     Type      | Modifiers | Storage | Description
--------+---------------+-----------+---------+-------------
 id     | integer       |           | plain   |
 date   | date          |           | plain   |
 amt    | numeric(10,2) |           | main    |
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: f
Child tables: sales_exp_1_prt_apr08,
              sales_exp_1_prt_aug08,
              sales_exp_1_prt_dec08,
              sales_exp_1_prt_feb08,
              sales_exp_1_prt_jan08,
              sales_exp_1_prt_jul08,
              sales_exp_1_prt_jun08,
              sales_exp_1_prt_mar08,
              sales_exp_1_prt_may08,
              sales_exp_1_prt_nov08,
              sales_exp_1_prt_oct08,
              sales_exp_1_prt_sep08
Has OIDs: no
Options: appendonly=true
Distributed randomly
Partition by: (date)

The following is another example of partitioning according to Range. This time, integer columns are used for partitioning. We have added a DEFAULT PARTITION. If other partitions are not satisfied, the data will be inserted into the DEFAULT PARTITION.

CREATE TABLE rank (id int, rank int, year int, gender char(1), count int)
PARTITION BY RANGE (year)
( START (2001) END (2008) EVERY (1),
  DEFAULT PARTITION extra );

7. List partition

The following example creates a List based partitioned table. The List partition table can be based on any data type that supports equivalence comparison. For List partitions, you need to explicitly specify all sub partitions.

postgres=# CREATE TABLE rank (id int, rank int, year int, gender char(1), count int )
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'),
  PARTITION boys VALUES ('M'),
  DEFAULT PARTITION other );
postgres=# \d+ rank
              Append-Only Table "public.rank"
 Column |     Type     | Modifiers | Storage  | Description
--------+--------------+-----------+----------+-------------
 id     | integer      |           | plain    |
 rank   | integer      |           | plain    |
 year   | integer      |           | plain    |
 gender | character(1) |           | extended |
 count  | integer      |           | plain    |
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: f
Child tables: rank_1_prt_boys,
              rank_1_prt_girls,
              rank_1_prt_other
Has OIDs: no
Options: appendonly=true
Distributed randomly
Partition by: (gender)

8. Multilevel partition

You can use the subartition template to define multi-level partitions. The following example defines a two-level partition table. The first level installs the date column to partition the Range, and the second level partitions the List according to the region column.

CREATE TABLE sales (trans_id int, date date, amount decimal(9,2), region text)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
  SUBPARTITION asia VALUES ('asia'),
  SUBPARTITION europe VALUES ('europe'),
  DEFAULT SUBPARTITION other_regions)
(START (date '2011-01-01') INCLUSIVE
 END (date '2012-01-01') EXCLUSIVE
 EVERY (INTERVAL '1 month'),
 DEFAULT PARTITION outlying_dates);

Note: when you use multi-level partitions, the system will generate a large number of small tables. Some tables may have no data or contain little data, which will put too much pressure on the system metadata management. It is not recommended to create tables with too many partitions. Generally, it is reasonable to limit the number of partitions to 100 or less.

9. View your partition design

You can use PG_ Use the partitions view to view your partition table design. For example, you can view the partition design of the sales table through the following statement.

postgres=# SELECT partitionboundary, partitiontablename, partitionname, partitionlevel, partitionrank
postgres-# FROM pg_partitions
postgres-# WHERE tablename='sales';
                                           partitionboundary                                          | partitiontablename | partitionname | partitionlevel | partitionrank
------------------------------------------------------------------------------------------------------+--------------------+---------------+----------------+---------------
 START ('2008-01-01'::date) END ('2008-02-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_1      |               |              0 |             1
 START ('2008-02-01'::date) END ('2008-03-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_2      |               |              0 |             2
 START ('2008-03-01'::date) END ('2008-04-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_3      |               |              0 |             3
 START ('2008-04-01'::date) END ('2008-05-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_4      |               |              0 |             4
 START ('2008-05-01'::date) END ('2008-06-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_5      |               |              0 |             5
 START ('2008-06-01'::date) END ('2008-07-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_6      |               |              0 |             6
 START ('2008-07-01'::date) END ('2008-08-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_7      |               |              0 |             7
 START ('2008-08-01'::date) END ('2008-09-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_8      |               |              0 |             8
 START ('2008-09-01'::date) END ('2008-10-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_9      |               |              0 |             9
 START ('2008-10-01'::date) END ('2008-11-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_10     |               |              0 |            10
 START ('2008-11-01'::date) END ('2008-12-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_11     |               |              0 |            11
 START ('2008-12-01'::date) END ('2009-01-01'::date) EVERY ('1 mon'::interval) WITH (appendonly=true) | sales_1_prt_12     |               |              0 |            12
(12 rows)

10. Add a partition

You can add a partition through the following statement.

ALTER TABLE sales ADD PARTITION
            START (date '2009-02-01') INCLUSIVE
            END (date '2009-03-01') EXCLUSIVE;

If you did not use the subpartition template when creating the table, you need to give the definition of the subpartition when adding the partition, for example:

ALTER TABLE sales ADD PARTITION
    START (date '2009-02-01') INCLUSIVE
    END (date '2009-03-01') EXCLUSIVE
      ( SUBPARTITION usa VALUES ('usa'),
        SUBPARTITION asia VALUES ('asia'),
        SUBPARTITION europe VALUES ('europe') );

You can also modify a secondary partition separately:

CREATE TABLE sales_two_level (trans_id int, date date, amount decimal(9,2), region text)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
  SUBPARTITION asia VALUES ('asia'),
  SUBPARTITION europe VALUES ('europe'))
(START (date '2011-01-01') INCLUSIVE
 END (date '2012-01-01') EXCLUSIVE
 EVERY (INTERVAL '1 month'),
 DEFAULT PARTITION outlying_dates);

ALTER TABLE sales_two_level ALTER PARTITION FOR (RANK(12))
   ADD PARTITION africa VALUES ('africa');

Where RANK(12) represents the 12th partition.
Note: specify a partition to use

PARTITION FOR (value) or PARTITION FOR(RANK(number))Grammar.

If your partition table has a Default partition, you cannot add partitions to the partition table. You can only add partitions by splitting the Default partition.

11. Rename partition

Partitioned tables use the following naming convention. Partitioned subtable names are subject to uniqueness requirements and length limitations.
Partition tables use the following naming rules.
<parentname>_<level>_prt_<partition_name>
For example: sales_1_prt_jan08 refers to the partition whose parent table name is sales and the first level partition name is jan08. When creating a Range partition table, if no partition name is specified, the partition name will be automatically generated as a number.
Changing the name of the parent table will also change the name of the partition table. For example:

postgres=# ALTER TABLE sales_two_level RENAME TO globalsales;

postgres=# \d+ globalsales
            Append-Only Table "public.globalsales"
  Column  |     Type     | Modifiers | Storage  | Description
----------+--------------+-----------+----------+-------------
 trans_id | integer      |           | plain    |
 date     | date         |           | plain    |
 amount   | numeric(9,2) |           | main     |
 region   | text         |           | extended |
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: f
Child tables: globalsales_1_prt_10,
              globalsales_1_prt_11,
              globalsales_1_prt_12,
              globalsales_1_prt_13,
              globalsales_1_prt_2,
              globalsales_1_prt_3,
              globalsales_1_prt_4,
              globalsales_1_prt_5,
              globalsales_1_prt_6,
              globalsales_1_prt_7,
              globalsales_1_prt_8,
              globalsales_1_prt_9,
              globalsales_1_prt_outlying_dates
Has OIDs: no
Options: appendonly=true
Distributed randomly
Partition by: (date)

You can change the name of a partition, for example:

ALTER TABLE sales RENAME PARTITION FOR ('2008-01-01') TO jan08;

12. Add a default partition

You can use the Alter command to add a default partition. Partitions that do not meet any partition conditions will enter the default partition.

ALTER TABLE sales ADD DEFAULT PARTITION other;

ALTER TABLE sales ALTER PARTITION FOR (RANK(1)) ADD DEFAULT PARTITION other;

13. Delete a partition

You can delete a partition with the Alter command. If a partition has sub partitions, its sub partitions will also be deleted when the partition is deleted.
For a partitioned fact table, deleting a partition is often used to delete partition data outside the retention time window.

ALTER TABLE sales DROP PARTITION FOR (RANK(1));

14. Truncate partition

You can Truncate a partition with the Alter command. When Truncate a partition, its child partitions will also be Truncate.

ALTER TABLE sales TRUNCATE PARTITION FOR (RANK(1));

15. Swap partition

You can use the Alter Table command to Swap a partition. The Swap partition operation swaps a table with an existing partition. You can only Swap leaf node partitions.
Partition switching is often useful for data loading. For example, you can first load data into an intermediate table, and then exchange the intermediate table into the partitioned table.
You can also use partition exchange to change the type of partition table. For example:

CREATE TABLE sales (id int, date date, amt decimal(10,2))
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
END (date '2009-01-01') EXCLUSIVE
EVERY (INTERVAL '1 month') );

CREATE TABLE jan (LIKE sales) WITH (appendonly=true, orientation=parquet, compresstype = snappy);

INSERT INTO jan SELECT * FROM sales_1_prt_1 ;

ALTER TABLE sales EXCHANGE PARTITION FOR (RANK(1)) WITH TABLE jan;

16. Partition splitting

You can use Alter to split an existing partition, such as sales in the following example_ The split partition table is divided into two sub partitions: jan081to15 and jan0816to31.

CREATE TABLE sales_split (id int, date date, amt decimal(10,2))
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
END (date '2009-01-01') EXCLUSIVE
EVERY (INTERVAL '1 month') );

ALTER TABLE sales_split SPLIT PARTITION FOR ('2008-01-01')
AT ('2008-01-16')
INTO (PARTITION jan081to15, PARTITION jan0816to31);

If your partition table has a Default partition, you can only add sub partitions by splitting the Default partition. For example, the following example adds a jan2009 partition by splitting the Default partition.

CREATE TABLE sales_split_default (id int, date date, amt decimal(10,2))
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
END (date '2009-01-01') EXCLUSIVE
EVERY (INTERVAL '1 month'), DEFAULT PARTITION extra);

ALTER TABLE sales_split_default SPLIT DEFAULT PARTITION
START ('2009-01-01') INCLUSIVE
END ('2009-02-01') EXCLUSIVE
INTO (PARTITION jan2009, default partition);

17. Modify sub partition template

You can modify the sub partition template through the Alter command. First create a two-level partition table.

CREATE TABLE sales (trans_id int, date date, amount decimal(9,2), region text)
  DISTRIBUTED BY (trans_id)
  PARTITION BY RANGE (date)
  SUBPARTITION BY LIST (region)
  SUBPARTITION TEMPLATE
    ( SUBPARTITION usa VALUES ('usa'),
      SUBPARTITION asia VALUES ('asia'),
      SUBPARTITION europe VALUES ('europe'),
      DEFAULT SUBPARTITION other_regions )
  ( START (date '2014-01-01') INCLUSIVE
    END (date '2014-04-01') EXCLUSIVE
    EVERY (INTERVAL '1 month') );

The following command modifies the sub partition template.

ALTER TABLE sales SET SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
  SUBPARTITION asia VALUES ('asia'),
  SUBPARTITION europe VALUES ('europe'),
  SUBPARTITION africa VALUES ('africa'),
  DEFAULT SUBPARTITION regions );

The following command can delete the sub partition template.

ALTER TABLE sales SET SUBPARTITION TEMPLATE ();

Partition an existing non partitioned table
To partition an existing table, you need to create a new partitioned table and import the data of the table to be partitioned into the new table. And assign relevant permissions.

CREATE TABLE sales2 (LIKE sales)
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
   END (date '2009-01-01') EXCLUSIVE
   EVERY (INTERVAL '1 month') );

INSERT INTO sales2 SELECT * FROM sales;
DROP TABLE sales;

ALTER TABLE sales2 RENAME TO sales;
GRANT ALL PRIVILEGES ON sales TO admin;
GRANT SELECT ON sales TO guest;

Posted on Wed, 24 Nov 2021 09:30:34 -0500 by dawnrae