Ozie Common Statements

1. Introduction to oozie: oozie can be called a scheduling system, referring specifically to this blog: http://shiyanjun.cn/archives/684.html
In this blog post, the functions and usage of oozie are described in detail.
Second, oozie performs follow-up monitoring: the results of oozie execution can be written into the MySQL data table, and oozie execution can be viewed through the local sh mysql.sh:
1. Statistics of execution time:

    ,(unix_timestamp(tb.end_time) - unix_timestamp(tb.created_time))/60 long_minutes
    ,date_format(adddate(date(ta.nominal_time),interval -1 day),'%Y%m%d') data_date
    select * 
    from COORD_JOBS ---oozie job Statements
    where status='RUNNING'
    -- and app_name like 'ods%'
) tp inner join 
    SELECT * 
    FROM COORD_ACTIONS ---oozie actionOf list Report form
) ta on tp.id = ta.job_id 
AND tp.last_action_number=ta.action_number 
inner join 
    select * 
    from WF_ACTIONS --oozie eachactionExecution time report
    where id like '%hive%'
) tb on ta.external_id = tb.wf_id
order by long_minutes desc 

The above code can be used to view the names of tasks that take a long time to execute, so as to optimize the code.
2. Used to check task status

    -- ta.id,
    concat('oozie job -rerun ',tb.id,' -action ',ta.action_number) as cmd ,tb.app_name,ta.status    
from (select * from oozie.COORD_ACTIONS where status not in ('SUCCEEDED') ) ta 
inner join  
    where (app_name like '%fact%' or app_name like '%rpt%'  or app_name like 'dim_%' or app_name like 'ods%' or app_name like 'dw%') 
    and status = 'RUNNING' 
) tb 
on ta.job_id = tb.id 
-- where ta.status not in ('WAITING','READY')
order by status ,app_name;

3. Check the specified task to get the job name and how many action s it contains

SELECT app_name,concat('oozie job -rerun ',id,' -action ',last_action_number),status as rerun 
FROM COORD_JOBS   where  (app_name like '%yezhu%') 
and status = 'RUNNING' 

4. When rerun multi-day action is required to refresh historical data, the following methods can be used:

    update coord_jobs 
    set concurrency=3 ---The number of concurrent, should consider the difficulty of clustering, not too high
    where id in (
    commit ;

5. Jobs in oozie can't be repeated. When you need to check whether you start more job s, use the following methods:

    SELECT app_name,count(1) cnt  FROM COORD_JOBS   
    where   status = 'RUNNING' 
    group by app_name 
    having cnt >1 
  SELECT app_name,count(distinct id) cnt FROM COORD_JOBS   
    where (app_name like '%fact%' or app_name like '%rpt%'  or app_name like 'dim_%' or app_name like 'ods%' or app_name like 'dw%' ) 
    and status = 'RUNNING' 
    group by app_name
    having cnt >1 ;

6. If you want to change the state of some action s, you can use the following methods:

update oozie.COORD_ACTIONS 
set status='SUCCEEDED'
where id in (
commit ;

If you look at the oozie configuration document, you will find that oozie needs to configure three files: job.properties,coordinator.xml,workflow.xml. In addition, there should be another file that needs to be executed regularly, assuming the file name is test.sql. After debugging, these four files are placed in the local directory and need to be uploaded to the hadoop file directory: assume that the local directory is / home / datadev / oozie_jobs / RPT / Te St (folder name is best consistent with SQL file) the files in the directory are:
Then a folder with the same local name should be created on hadoop

Create the folder hadoop fs-mkdir oozie/rpt/test on hadoop
grep 'rpt--' *
2.hadoop dfs -put -f * oozie/rpt/test -- all passed over if repeated coverage
 3. Hadoop fs-put-f job.properties oozie/rpt/test/-- Pass only job.properties
 4. Hadoop fs-ls oozie/rpt/test/-- View folder content
 5. hadoop fs -ls oozie/rpt/test/test.sql -- View file content

oozie Common Command Operations:

 1,Test task
 oozie job -config job.properties -dryrun 
 2,Official submission to run a task
 oozie job -config job.properties -run 
 3,Kill a mission
 oozie job -kill job_id     
 4,Kill one action
 oozie job -kill job_id -action action_number
 5,Run one more. action 
 oozie job -rerun job_id -action action_number
 6,To update coordinator
 oozie job -update job_id

Sometimes it doesn't work with update after updating the code, but it just runs jobkill again.

Tags: Hadoop xml SQL MySQL

Posted on Wed, 10 Apr 2019 06:06:31 -0400 by anthonyfellows