Matching and intersection of PostgreSQL JSONB

PostgreSQL has supported JSONB for more than ten years. Over the past ten years, the community has provided many powerful functions for JSONB. Personally, matching is the most commonly used operation  @> .

If JSON data is regarded as an abstract syntax tree (AST), this operator determines whether the right parameter is a subgraph of the left parameter.

There should be a diagram here, but there is a data set temporarily at the weekend, so there is no time to find the right tool. To give a few examples, the following example gets true, which should be well understood:

select '{"a": 1, "b": 2, "c": 3}'::jsonb @> '{"b":2}' ;
--------------
t

It can also match more complex situations. The following example is also true:

select '{"a": 1, "b": 2, "c": {"value": 3}}'::jsonb @> '{"c":{"value": 3}}';
 ?column?
----------
 t
(1 row)

The following example may be a little confused for new users, but it also fits the rule well:

select '{"a": 1, "b": 2, "c": {"value": 3}}'::jsonb @> '{"c":{}}';
 ?column?
----------
 t
(1 row)

However, it should be noted that the result of the following example is false:

select '{"a": 1, "b": 2, "c": {"value": 3}}'::jsonb @> '{"c":[]}';
 ?column?
----------
 f
(1 row)

It's not hard to understand, {}   and   []   Unequal.

The following example is interesting:

select '{"a": 1, "b": 2, "c": {"value": [1, 2, 3]}}'::jsonb @> '{"c":{"value": [2]}}';
 ?column?
----------
 t
(1 row)

It should be noted here that when comparing whether one JSON array matches the other, it does not require the order of the two arrays to be equal, as long as the right is the true subset on the left:

select '{"a": 1, "b": 2, "c": {"value": [1, 2, 3]}}'::jsonb @> '{"c":{"value": [2]}}';
 ?column?
----------
 t
(1 row)

select '{"a": 1, "b": 2, "c": {"value": [1, 2, 3]}}'::jsonb @> '{"c":{"value": [5, 2]}}';
 ?column?
----------
 f
(1 row)

select '{"a": 1, "b": 2, "c": {"value": [1, 2, 3]}}'::jsonb @> '{"c":{"value": [3, 2]}}';
 ?column?
----------
 t
(1 row)

This rule is in line with the inverted index of PostgreSQL, the gin index of PostgreSQL, the JSONB field type and the matching operation @ > and becomes a very powerful combination. In the past few years, I used to add a meta field of JSONB type to some important business tables and build a gin index on it

create index idx_xxx_meta on xxx using(gin);

Note the create index syntax when specifying the index type.

This design can solve many problems that are difficult to solve traditionally. For example, I can mark a tag list for each item. Taking an item with several tags is a simple matching query:

select xxx from data_table where meta @> '{"tags": ["tag1", "tagx", "tagy"]}'

With the help of gin index, the performance of this search is enough for conventional Internet applications.

Even my colleagues in the CSDN NLP group have discovered new usage. In a table that stores tree nodes, we save a meta field, in which there is a path list to store the path of the current field in the tree. Each item is   {"id": node_id, "title": something} and when we search for all child nodes under a node, including its inter generational child nodes, we only need to execute such a query:

select xxx from tree_node where meta @> '{"path": [{"id": node_id}]}'

Of course, this matching operation also has its limitations. It will succeed only when the right is the true subgraph on the left. For example, if I want to find any of my search terms in the tags list (that is, there is a non empty intersection between the two), this method will not work. At this point, we need another operator  ?|

select '["tag1", "tag2", "tag3"]'::jsonb ?| '{tag2, tag3}';
 ?column?
----------
 t
(1 row)

select '["tag1", "tag2", "tag3"]'::jsonb ?| '{tag2, tag3, tag5}';
 ?column?
----------
 t
(1 row)

select '["tag1", "tag2", "tag3"]'::jsonb ?| '{tag5}';
 ?column?
----------
 f
(1 row)

Note these examples. First, the operator on the right is no longer jsonb, but must be   text []. Secondly, it actually checks the key value -- that is, the value that can be stored through the gin index:

select '{"tag1":1, "tag2":2, "tag3":3}'::jsonb ?| '{tag5}';
 ?column?
----------
 f
(1 row)

select '{"tag1":1, "tag2":2, "tag3":3}'::jsonb ?| '{tag3}';
 ?column?
----------
 t
(1 row)

select '{"tag1":1, "tag2":2, "tag3":3}'::jsonb ?| '{tag3, tag1}';
 ?column?
----------
 t
(1 row)

PostgreSQL has supported JSON and JSONB for more than ten years. Each version is actively enhancing its JSON data processing ability. Even though I have actively explored and studied in recent ten years, I have no comprehensive understanding. This intersection operation was only noticed recently in the work of NLP group.

Tags: Database PostgreSQL csdn

Posted on Sat, 11 Sep 2021 15:17:49 -0400 by akano