The differences between hive query statement and mysql

These are the two tables emp and Dept often used in online database examples. The first is dept and the second is emp

  For ease of viewing, I added the field name of each column to it  

 

1.hive does not support some sub queries

For example, this topic: query the names of employees with higher wages than SMITH

mysql can end in one sentence

select ename from emp where sal>(select sal from emp where ename='SMITH');

 

If we run this in hive, an error will be reported  

hive> select ename from emp where sal>(select sal from emp where ename='SMITH');

  hive does not support where followed by subqueries. These include

where   > Subquery statement

Where < subquery statement

where = subquery statement

Where < > subquery statement

select ename from emp where sal>(select sal from emp where ename='SMITH');
select ename from emp where sal<(select sal from emp where ename='SMITH');
select ename from emp where sal=(select sal from emp where ename='SMITH');
select ename from emp where sal<>(select sal from emp where ename='SMITH');

The solution given is:

We need to put a label on each row of the two tables, then connect the two tables and filter the conditions after where

For example, the people in this topic are those who earn more than smith

That is, when we encounter this problem, we must first consider the table connection, and put the corresponding filter criteria in where in the secondary query

select e.ENAME 
      ,e.SAL 
from 
(select ENAME 
       ,SAL 
       ,1 as cid 
from emp)e 
left join 
(select SAL 
        ,1 as cid 
from emp 
where ENAME='SMITH')
s on e.cid=s.cid 
where e.SAL>s.SAL;

2.hive does not support non equivalent connection

select * from emp join dept on emp.DEPTNO>dept.DEPTNO;

 

hive does not support non equivalent connection. I can't write greater than, less than or not equal to

Because mapreduce is difficult to process these conditions into a map task

hive> select * from emp join dept on emp.DEPTNO>dept.DEPTNO;
hive> select * from emp join dept on emp.DEPTNO<dept.DEPTNO;
hive> select * from emp join dept on emp.DEPTNO<>dept.DEPTNO;

//Unified error reporting:
FAILED: SemanticException [Error 10017]: Line 1:31 Both left and right aliases encountered in JOIN 'DEPTNO'

The solution hasn't been thought of yet. It will be updated in the future

3.hive supports multi table join queries

hive starts a mapreduce task on the join object. After connecting tables e and d, start a mapreduce task and connect the above connection results with the following table l

hive is executed from left to right, so don't worry that table e will skip table d and connect directly to table l

select e.ename,d.deptno,l.name
from emp e
join dept d
on d.deptno=e.deptno
join location l
on d.loc=l.loc;

4. Cartesian product in hive

1. Connection conditions are omitted

2. Invalid connection condition

3. All rows in all tables are connected to each other

hive> select empno,deptname from emp,dept;

5.hive connection predicate does not support or

This writing method is OK in mysql, but hive reports an error

select e.empno,d.dname from emp e 
join 
dept d 
on e.deptno=d.deptno or e.job=d.job;

So how? The two filter conditions are written into two statements separately, then spliced together by union, and finally de duplicated

select  DISTINCT tmp.ename,tmp.dname from
(select e.ename,d.dname,e.deptno,e.job from emp e 
join 
dept d 
on e.deptno=d.deptno
union all
select e.ename,d.dname,e.deptno,e.job  from emp e 
join 
dept d 
on e.job=d.job) tmp

 

Tags: Database hive

Posted on Sun, 28 Nov 2021 08:59:54 -0500 by mikes127