[Hard Hive] MySQL/Hive SQL Written Test: Hive SQL

Welcome to my blog home page: search import_bigdata on WeChat — original hardcore big data content, _import_bigdata_ on the CSDN blog.
Comments, favorites, and messages are all welcome!
This article was originally written by Wang Zhiwu and first published on the CSDN blog!
Reproduction without official or personal permission is prohibited.

This article is a Hive supplement to the Learning Guide for Big Data Specialists from Zero (Fully Upgraded).

Interview Topic One

Scenario: a log table records every change to a merchant's rate. The requirement is to extract, in chronological order, the rows where the rate (status) actually changed.

1. The data are as follows:

create table datafrog_merchant
(f_merchant_id varchar(20),
f_rate varchar(20),
f_date date
);

insert into datafrog_merchant values
(100,0.1,'2016-03-02'),
(100,0.1,'2016-02-02'),
(100,0.2,'2016-03-05'),
(100,0.2,'2016-03-06'),
(100,0.3,'2016-03-07'),
(100,0.1,'2016-03-09'),
(100,0.1,'2016-03-10'),
(100,0.1,'2016-03-10'),
(200,0.1,'2016-03-10'),
(200,0.1,'2016-02-02'),
(200,0.2,'2016-03-05'),
(200,0.2,'2016-03-06'),
(200,0.3,'2016-03-07'),
(200,0.1,'2016-03-09'),
(200,0.1,'2016-03-10'),
(200,0.1,'2016-03-10');

Let's see how the data looks:
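The original screenshot is not reproduced here; a quick way to inspect the inserted rows yourself is:

-- view the sample data in merchant/date order
select * from datafrog_merchant order by f_merchant_id, f_date;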

2. Achieving the desired result
Solution idea: the LAG (or LEAD) window function returns a field's value from the previous (or next) row relative to the current row. Comparing that value with the current row's value tells us whether the rate changed, and therefore whether the row should be extracted.

select
    t1.f_merchant_id,
    t1.f_rate,
    t1.f_date
from
(
select
    f_merchant_id,
    f_rate,
    f_date,
    -- rate of the previous row for the same merchant; -999 is a sentinel for the first row
    lag(f_rate,1,-999) over(partition by f_merchant_id order by f_date) as f_rate2
from
    datafrog_merchant
) t1
where
    t1.f_rate <> t1.f_rate2;
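Based on the sample data above, the query should return one row per rate change; the first record of each merchant is also kept, because the sentinel default -999 never equals a real rate:

f_merchant_id  f_rate  f_date
100            0.1     2016-02-02
100            0.2     2016-03-05
100            0.3     2016-03-07
100            0.1     2016-03-09
200            0.1     2016-02-02
200            0.2     2016-03-05
200            0.3     2016-03-07
200            0.1     2016-03-09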

Interview Topic Two

1. The task is as follows: for each user, take the first two scenes (changjing) visited in chronological order and concatenate them as userid-scene1-scene2.

2. Start building tables and inserting data below

create table datafrog_test1
(userid varchar(20),
changjing varchar(20),
inttime varchar(20)
);

insert into datafrog_test1 values
(1,1001,1400),
(2,1002,1401),
(1,1002,1402),
(1,1001,1402),
(2,1003,1403),
(2,1004,1404),
(3,1003,1400),
(4,1004,1402),
(4,1003,1403),
(4,1001,1403),
(4,1002,1404),
(5,1002,1402),
(5,1002,1403),
(5,1001,1404),
(5,1003,1405);

3. MySQL solution idea: rank each user's scenes with user variables, then join them with GROUP_CONCAT.

select concat(t.userid, '-', group_concat(t.changjing order by t.new_rank separator '-')) as result
from (
    select userid, changjing, inttime,
        -- increment the rank while the userid stays the same, otherwise reset to 1
        if(@tmp = userid, @rank := @rank + 1, @rank := 1) as new_rank,
        @tmp := userid as tmp
    from (
        -- earliest visit time of each scene per user
        select userid, changjing, min(inttime) as inttime
        from datafrog_test1
        group by userid, changjing
    ) temp
    cross join (select @rank := 0, @tmp := null) init
    order by userid, inttime
) t
where t.new_rank <= 2
group by t.userid;
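If you are on MySQL 8.0 or later, the user-variable trick can be replaced with the same window functions used in the Hive solution below. A sketch under that assumption:

-- MySQL 8.0+ variant: ROW_NUMBER() instead of user variables
select concat(t.userid, '-', group_concat(t.changjing order by t.new_rank separator '-')) as result
from (
    select userid, changjing,
           row_number() over (partition by userid order by inttime) as new_rank
    from (
        select userid, changjing, min(inttime) as inttime
        from datafrog_test1
        group by userid, changjing
    ) temp
) t
where t.new_rank <= 2
group by t.userid;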

4. Output results:

5. Note:
Running the query above may raise an error caused by the ONLY_FULL_GROUP_BY SQL mode; the fix is to relax sql_mode for the session before running it. MySQL is a fine query engine, but it is a bit weak for analysis work such as ranking, medians, and so on, which is why pandas, SQL Server, Oracle, Hive, or Spark are usually used for analysis in practice. This is not to say MySQL is not useful: it is very useful, easy to use, and a good foundation for the rest.
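For example, a minimal way to clear the mode for the current session only (a sketch; which modes you keep may need adjusting for your environment):

-- relax sql_mode (including ONLY_FULL_GROUP_BY) for the current session only
SET SESSION sql_mode = '';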

6. Let's take a look at the hive solution

with tmp as (
    -- keep only the rows where the scene differs from the previous scene of the same user
    select
        userid, changjing, order_num, changjing1
    from
        (select userid,
                changjing,
                row_number() over(partition by userid order by inttime asc) as order_num,
                lag(changjing,1,'datafrog') over(partition by userid order by inttime asc) as changjing1
           from datafrog_test1) as a
    where changjing != changjing1
),
tmp2 as (
    -- re-rank the remaining scenes of each user in time order
    select userid, changjing, order_num, changjing1,
           row_number() over(partition by userid order by order_num) as changjing_num
    from tmp
)
-- collect_set gathers each user's first two scenes; note it does not guarantee element order
select concat(userid, '-', concat_ws('-', collect_set(changjing))) as result
from tmp2
where changjing_num < 3
group by userid;

This question mainly tests the use of WITH ... AS, ROW_NUMBER() OVER(), and LAG() in Hive SQL.

 

Tags: MySQL hive
