These are the two tables emp and Dept often used in online database examples. The first is dept and the second is emp
For ease of viewing, I added the field name of each column to it
1.hive does not support some sub queries
For example, this topic: query the names of employees with higher wages than SMITH
mysql can end in one sentence
select ename from emp where sal>(select sal from emp where ename='SMITH');
If we run this in hive, an error will be reported
hive> select ename from emp where sal>(select sal from emp where ename='SMITH');
hive does not support where followed by subqueries. These include
where > Subquery statement
Where < subquery statement
where = subquery statement
Where < > subquery statement
select ename from emp where sal>(select sal from emp where ename='SMITH'); select ename from emp where sal<(select sal from emp where ename='SMITH'); select ename from emp where sal=(select sal from emp where ename='SMITH'); select ename from emp where sal<>(select sal from emp where ename='SMITH');
The solution given is:
We need to put a label on each row of the two tables, then connect the two tables and filter the conditions after where
For example, the people in this topic are those who earn more than smith
That is, when we encounter this problem, we must first consider the table connection, and put the corresponding filter criteria in where in the secondary query
select e.ENAME ,e.SAL from (select ENAME ,SAL ,1 as cid from emp)e left join (select SAL ,1 as cid from emp where ENAME='SMITH') s on e.cid=s.cid where e.SAL>s.SAL;
2.hive does not support non equivalent connection
select * from emp join dept on emp.DEPTNO>dept.DEPTNO;
hive does not support non equivalent connection. I can't write greater than, less than or not equal to
Because mapreduce is difficult to process these conditions into a map task
hive> select * from emp join dept on emp.DEPTNO>dept.DEPTNO; hive> select * from emp join dept on emp.DEPTNO<dept.DEPTNO; hive> select * from emp join dept on emp.DEPTNO<>dept.DEPTNO; //Unified error reporting: FAILED: SemanticException [Error 10017]: Line 1:31 Both left and right aliases encountered in JOIN 'DEPTNO'
The solution hasn't been thought of yet. It will be updated in the future
3.hive supports multi table join queries
hive starts a mapreduce task on the join object. After connecting tables e and d, start a mapreduce task and connect the above connection results with the following table l
hive is executed from left to right, so don't worry that table e will skip table d and connect directly to table l
select e.ename,d.deptno,l.name from emp e join dept d on d.deptno=e.deptno join location l on d.loc=l.loc;
4. Cartesian product in hive
1. Connection conditions are omitted
2. Invalid connection condition
3. All rows in all tables are connected to each other
hive> select empno,deptname from emp,dept;
5.hive connection predicate does not support or
This writing method is OK in mysql, but hive reports an error
select e.empno,d.dname from emp e join dept d on e.deptno=d.deptno or e.job=d.job;
So how? The two filter conditions are written into two statements separately, then spliced together by union, and finally de duplicated
select DISTINCT tmp.ename,tmp.dname from (select e.ename,d.dname,e.deptno,e.job from emp e join dept d on e.deptno=d.deptno union all select e.ename,d.dname,e.deptno,e.job from emp e join dept d on e.job=d.job) tmp