1. Introduction to oozie: oozie can be called a scheduling system, referring specifically to this blog: http://shiyanjun.cn/archives/684.html
In this blog post, the functions and usage of oozie are described in detail.
Second, oozie performs follow-up monitoring: the results of oozie execution can be written into the MySQL data table, and oozie execution can be viewed through the local sh mysql.sh:
1. Statistics of execution time:
select tp.app_name ,ta.id ,(unix_timestamp(tb.end_time) - unix_timestamp(tb.created_time))/60 long_minutes ,date_format(adddate(date(ta.nominal_time),interval -1 day),'%Y%m%d') data_date from ( select * from COORD_JOBS ---oozie job Statements where status='RUNNING' -- and app_name like 'ods%' ) tp inner join ( SELECT * FROM COORD_ACTIONS ---oozie actionOf list Report form ) ta on tp.id = ta.job_id AND tp.last_action_number=ta.action_number inner join ( select * from WF_ACTIONS --oozie eachactionExecution time report where id like '%hive%' ) tb on ta.external_id = tb.wf_id order by long_minutes desc LIMIT 50;
The above code can be used to view the names of tasks that take a long time to execute, so as to optimize the code.
2. Used to check task status
select -- ta.id, concat('oozie job -rerun ',tb.id,' -action ',ta.action_number) as cmd ,tb.app_name,ta.status from (select * from oozie.COORD_ACTIONS where status not in ('SUCCEEDED') ) ta inner join ( SELECT * FROM COORD_JOBS where (app_name like '%fact%' or app_name like '%rpt%' or app_name like 'dim_%' or app_name like 'ods%' or app_name like 'dw%') and status = 'RUNNING' ) tb on ta.job_id = tb.id -- where ta.status not in ('WAITING','READY') order by status ,app_name;
3. Check the specified task to get the job name and how many action s it contains
SELECT app_name,concat('oozie job -rerun ',id,' -action ',last_action_number),status as rerun FROM COORD_JOBS where (app_name like '%yezhu%') and status = 'RUNNING'
4. When rerun multi-day action is required to refresh historical data, the following methods can be used:
update coord_jobs set concurrency=3 ---The number of concurrent, should consider the difficulty of clustering, not too high where id in ( '0402926-161231135848541-oozie-bigd-C' ); commit ;
5. Jobs in oozie can't be repeated. When you need to check whether you start more job s, use the following methods:
SELECT app_name,count(1) cnt FROM COORD_JOBS where status = 'RUNNING' group by app_name having cnt >1 //perhaps SELECT app_name,count(distinct id) cnt FROM COORD_JOBS where (app_name like '%fact%' or app_name like '%rpt%' or app_name like 'dim_%' or app_name like 'ods%' or app_name like 'dw%' ) and status = 'RUNNING' group by app_name having cnt >1 ;
6. If you want to change the state of some action s, you can use the following methods:
update oozie.COORD_ACTIONS set status='SUCCEEDED' where id in ( '0047657-160714171251022-oozie-bigd-C@75'---actionName ,'0047657-160714171251022-oozie-bigd-C@73' ,'0047657-160714171251022-oozie-bigd-C@74' ,'0047657-160714171251022-oozie-bigd-C@72' ); commit ;
If you look at the oozie configuration document, you will find that oozie needs to configure three files: job.properties,coordinator.xml,workflow.xml. In addition, there should be another file that needs to be executed regularly, assuming the file name is test.sql. After debugging, these four files are placed in the local directory and need to be uploaded to the hadoop file directory: assume that the local directory is / home / datadev / oozie_jobs / RPT / Te St (folder name is best consistent with SQL file) the files in the directory are:
Then a folder with the same local name should be created on hadoop
Create the folder hadoop fs-mkdir oozie/rpt/test on hadoop grep 'rpt--' * 2.hadoop dfs -put -f * oozie/rpt/test -- all passed over if repeated coverage 3. Hadoop fs-put-f job.properties oozie/rpt/test/-- Pass only job.properties 4. Hadoop fs-ls oozie/rpt/test/-- View folder content 5. hadoop fs -ls oozie/rpt/test/test.sql -- View file content
oozie Common Command Operations:
1,Test task oozie job -config job.properties -dryrun 2,Official submission to run a task oozie job -config job.properties -run 3,Kill a mission oozie job -kill job_id 4,Kill one action oozie job -kill job_id -action action_number 5,Run one more. action oozie job -rerun job_id -action action_number 6,To update coordinator oozie job -update job_id
Sometimes it doesn't work with update after updating the code, but it just runs jobkill again.