System built-in function
1) View the functions provided by the system
show functions;
2) Display the usage of a built-in function: this shows what the nvl function does
desc function nvl;
3) Display the usage of a built-in function in detail: this shows what the nvl function does, together with examples
desc function extended nvl;
Common built-in functions
Empty field assignment
1) Function description
NVL: assigns a value to data that is NULL. Its format is NVL(value, default_value). It returns default_value if value is NULL; otherwise it returns value. If both parameters are NULL, it returns NULL.
2) Data preparation: use the employee table
3) Query: if an employee's comm is NULL, use -1 instead
4) Data preparation:
Employee table:
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
Create emp table:
create table if not exists emp(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
)
row format delimited fields terminated by '\t';
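Before the query below can return data, the employee rows above still need to be loaded into emp. A minimal sketch, assuming the data was saved locally as /opt/module/datas/emp.txt (the file name and path are assumptions, following the directory used later in this article):

load data local inpath '/opt/module/datas/emp.txt' into table emp;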
Query results:
hive (default)> select comm,nvl(comm,-1) from emp;
OK
comm    _c1
NULL    -1.0
300.0   300.0
500.0   500.0
20.0    20.0
1400.0  1400.0
30.0    30.0
NULL    -1.0
NULL    -1.0
NULL    -1.0
0.0     0.0
20.0    20.0
30.0    30.0
NULL    -1.0
10.0    10.0
If comm is NULL, -1 is used instead.
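The default value does not have to be a constant; it can also be another column. A hedged sketch (not part of the original walkthrough) that falls back to the employee's mgr value when comm is NULL:

-- when comm is NULL, use the mgr column of the same row instead
select comm, nvl(comm, mgr) from emp;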
CASE WHEN THEN ELSE END
(1) Data preparation
Sun WuKong A male
sea A male
Song song B male
Miss Luo Yu feng A female
Sister Ting B female
Tingting B female
(2) Requirement
Find the number of men and women in each department. The expected result is as follows:
dept_id male female
A 2 1
B 1 2
Count the number of men and women in departments A and B; this is a job for CASE WHEN ... THEN ... ELSE ... END.
create table emp_sex(
    name string,
    dept_id string,
    sex string
)
row format delimited fields terminated by '\t';
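The sample rows above still need to be loaded into the table. A sketch assuming they were saved locally as /opt/module/datas/emp_sex.txt (the file name and path are assumptions):

load data local inpath '/opt/module/datas/emp_sex.txt' into table emp_sex;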
The Hive SQL statement is as follows:
CASE is followed by the field name and WHEN by a value; if the field equals that value, the expression takes the 1 after THEN, otherwise the 0 after ELSE. Finally, SUM adds up all the 1s.
select
    dept_id,
    sum(case sex when 'male' then 1 else 0 end) sum_male,
    sum(case sex when 'female' then 1 else 0 end) sum_female
from emp_sex
group by dept_id;
The result is:
OK
dept_id sum_male sum_female
A 2 1
B 1 2
Time taken: 92.571 seconds, Fetched: 2 row(s)
In addition to CASE WHEN, the if function can be used for the same calculation.
Its form is if(expr1, expr2, expr3): if expr1 is true, it returns expr2; otherwise it returns expr3.
select
    dept_id,
    sum(if(sex='male',1,0)) maleCount,
    sum(if(sex='female',1,0)) femaleCount
from emp_sex
group by dept_id;
The same result can be obtained.
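For completeness, CASE also has a searched form in which each WHEN carries its own condition. A minimal sketch against the same emp_sex table (the sex_flag alias and the -1 fallback are just for illustration):

select
    name,
    case
        when sex = 'male' then 1
        when sex = 'female' then 0
        else -1
    end as sex_flag
from emp_sex;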
Row to column
Description of related functions
CONCAT(string A/col, string B/col...): returns the result of concatenating the input strings; any number of input strings is supported.
CONCAT_WS(separator, str1, str2, ...): a special form of CONCAT(). The first parameter is the separator placed between the remaining parameters; it can be any string of the same type as the remaining parameters. If the separator is NULL, the return value is also NULL. The function skips any NULL and empty strings that follow the separator argument. The separator is added between the strings being concatenated.
Note: the arguments to CONCAT_WS must be string or array<string>.
COLLECT_SET(col): accepts only primitive data types. Its main purpose is to deduplicate and aggregate the values of a field, producing a field of type Array.
CONCAT:
hive (default)> select concat('a','-','b','-','c');
OK
_c0
a-b-c
concat_ws:
When every part is joined with the same '-', CONCAT_WS saves us from writing the separator repeatedly:
hive (default)> select concat_ws('-','a','b','c');
OK
_c0
a-b-c
It can also splice arrays:
hive (default)> select concat_ws('.','www',array('baidu','com'));
OK
_c0
www.baidu.com
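Two more small sketches of the behaviour described above, assuming the emp_sex table from the previous section is still available: CONCAT_WS skips NULL arguments, and COLLECT_SET deduplicates while aggregating.

-- NULL arguments are skipped, so this returns a-b rather than NULL
select concat_ws('-', 'a', NULL, 'b');

-- each department's sex values are collected into a deduplicated array,
-- e.g. ["male","female"] (element order is not guaranteed)
select dept_id, collect_set(sex) from emp_sex group by dept_id;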
Next, let's deepen the application of this knowledge with a fuller example.
Edit data: vim person_info.txt:
Sun WuKong Aries A
sea sagittarius A
Song song Aries B
Zhu Bajie Aries A
Miss Luo Yu feng sagittarius A
Aoi Sora Aries B
The goal is to produce data in the following format:
sagittarius,A sea|Miss Luo Yu feng
Aries,A Sun WuKong|Zhu Bajie
Aries,B Song song|Aoi Sora
Create table person_info:
create table person_info(
    name string,
    constellation string,
    blood_type string
)
row format delimited fields terminated by '\t';
Import data:
load data local inpath '/opt/module/datas/person_info.txt' into table default.person_info;
The process of working out the SQL statement step by step:
1. Based on the target format, first concatenate the constellation and blood type together:
select concat(constellation,',',blood_type) con_blood, name from person_info;   -- refer to this result as t1
2. Collect the names of people who share the same constellation and blood type:
select con_blood, collect_set(name) name_arr from t1 group by con_blood;   -- refer to this result as t2
3. Process the array of names:
select con_blood, concat_ws("|",name_arr) from t2;
4. The above is the thought process; putting the steps together gives the following complete SQL:
select
    con_blood,
    concat_ws("|", name_arr)
from (
    select
        con_blood,
        collect_set(name) name_arr
    from (
        select
            concat(constellation, ',', blood_type) con_blood,
            name
        from person_info
    ) t1
    group by con_blood
) t2;
Note: UDF is one row in, one row out; UDTF is one row in, many rows out; UDAF is many rows in, one row out. collect_set, sum, count, etc. are aggregate functions and therefore belong to UDAF.
Column to row
1) Function Description:
EXPLODE(col): splits the complex Array or Map structure in a Hive column into multiple rows.
LATERAL VIEW
Usage: LATERAL VIEW udtf(expression) tableAlias AS columnAlias
Explanation: used together with UDTFs such as split and explode. It splits a column of data into multiple rows, and on that basis the split data can be aggregated.
Data preparation: create a local file test.txt:
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
Create a table based on the format of the data:
create table test3(
    name string,
    friends array<string>,
    children map<string,int>,
    address struct<street:string,city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
Note (a few things recorded while writing this CREATE TABLE statement):
1. In the map type declaration, string is followed by a comma, not a colon: map<string,int>.
2. The various "terminated by" clauses are not separated by commas; each one simply starts on a new line.
3. The row delimiter clause uses "lines", not "line": lines terminated by '\n'.
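Before the queries below, the local file still has to be loaded into test3. A sketch assuming test.txt sits in the same /opt/module/datas directory used earlier (the path is an assumption):

load data local inpath '/opt/module/datas/test.txt' into table test3;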
Using the explode function (one row in, many rows out), the results are as follows:
hive (default)> select explode(friends) from test3;
OK
col
bingbing
lili
caicai
susu
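explode also works on the map column. A minimal sketch (key and value are the column names Hive produces by default when exploding a map):

-- each map entry becomes one row with two columns (key, value),
-- e.g. xiao song 18, xiaoxiao song 19, xiao yang 18, xiaoxiao yang 19
select explode(children) from test3;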
Test with another set of data:
create table movie_info(
    movie string,
    category string
)
row format delimited fields terminated by '\t';
vi movie.txt:
<Suspect tracking> Suspense,action,science fiction,plot
<Lie to me> Suspense,gangster ,action,psychology,plot
<War wolf 2> Warfare,action,disaster
Import data:
load data local inpath '/opt/module/datas/movie.txt' into table movie_info;
Here, each value in the category column needs to be split apart. If the explode function is used directly,
select explode(category) from movie_info;
The following error will be reported:
UDFArgumentException explode() takes an array or a map as a parameter
The parameter of the explode function must be an array or a map. When the table was created, the category column was declared as string, so it must first be converted into an array, which can be done with the split function.
hive (default)> select split(category,',') from movie_info;
OK
_c0
["Suspense","action","science fiction","plot"]
["Suspense","gangster ","action","psychology","plot"]
["Warfare","action","disaster"]
As shown above, the split function converts the string into an array.
Then the explode function splits the elements of each array into separate rows:
hive (default)> select explode(split(category,',')) from movie_info;
OK
col
Suspense
action
science fiction
plot
Suspense
gangster 
action
psychology
plot
Warfare
action
disaster
On top of this, if another field is added to the SELECT alongside explode, an error occurs, for example:
select movie,explode(split(category,',')) from movie_info;
The error is reported as follows:
FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
We want to put the corresponding movie name in front of each category above; that is, the exploded column needs to be associated with the fields in the original table, but the row counts no longer match. Therefore, LATERAL VIEW is used to join the exploded values back to the original table:
select
    movie,
    category_name
from movie_info
lateral view explode(split(category, ",")) movie_info_tmp as category_name;
The query results are as follows:
OK
movie category_name
<Suspect tracking> Suspense
<Suspect tracking> action
<Suspect tracking> science fiction
<Suspect tracking> plot
<Lie to me> Suspense
<Lie to me> gangster
<Lie to me> action
<Lie to me> psychology
<Lie to me> plot
<War wolf 2> Warfare
<War wolf 2> action
<War wolf 2> disaster
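As mentioned earlier, the split data can also be aggregated on top of the lateral view. A hedged sketch counting how many movies fall into each category:

select
    category_name,
    count(*) cnt
from movie_info
lateral view explode(split(category, ",")) movie_info_tmp as category_name
group by category_name;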
Note: row-to-column and column-to-row need to be kept clearly apart here.
Row to column: multiple rows of data are combined into one column value; think of the concat and collect_set functions.
Column to row: the data in one column is exploded into multiple rows; think of explode and lateral view.