Mastering Oracle Scheduler in Oracle 11g Databases
This article is a sample chapter published for the book. Welcome to the world of Oracle Scheduler! Oracle Scheduler is a free utility included in the Oracle database that makes the Oracle RDBMS the most powerful scheduling tool on our planet (and in the known parts of the galaxy). The Oracle Scheduler can be used to automate not only the simple maintenance tasks, but also the complex business logic. Traditionally, only PL/SQL could be executed in the Scheduler. Later, operating system scripts were added to it, and now we can run jobs on remote systems and cross platform as well. This can turn the Oracle Scheduler into the spider in your Web, controlling all the jobs running in the organization and giving you a single point for control.
Relational database management systems (RDBMS) can be very powerful. With a little code, we can use the RDBMS as a filesystem, mail server, HTTP server, and now also as a full-blown job Scheduler that can compete very well with other commercial job Schedulers. The advantage that Oracle gives us is in terms of price, flexibility, and phenomenal power. The scheduling capabilities are all a part of the normal Oracle license for the RDBMS, whereas others have a serious price tag and often require a database for the repository to store the metadata of the jobs.
Scheduling events in the database
Since Oracle added the procedural option to the database, they also included some scheduling power provided by dbms_jobs. Although a bit restricted, it was used extensively. However, no one would even think about using this as an Enterprise-level
Scheduler tool. This changed when Oracle introduced 10gR2. In this release, Oracle could not only start jobs that ran outside the database, but they also added the job chain.
In 11g, Oracle also added the option to run jobs on remote systems where no database is running. Now it’s time to rethink what the database actually is. In the early days, a database was just a bunch of code that could hold data in tables. Today, the Oracle
RDBMS can do that—and that too well—along with many more things. In fact, the Oracle RDBMS can perform so many tasks so amazingly, that it’s surprising that we still call it just a database. We could easily turn it into a personal assistant.
Oracle Scheduler 11g can:
- Run jobs where no database ever was before
- Use different operating system credentials per job
- React on events
- Schedule jobs on multiple platforms simultaneously
- Give a tight security
Chapter 2 will show you a lot of possibilities of chains with many examples and explanations. In short, it will tell you all you ever wanted to know about chains, but were afraid to ask.
Chapter 3 is for all you people living in an organization that requires strict job separation. This chapter will show how to make good use of the Scheduler and apply job separation.
Chapter 4 is a very important chapter that explains how to crank up the power of a system to the limits by combining the Scheduler and the Resource Manager. Here you will find how to get the best out of your system.
Chapter 5 will be of a great help in setting up remote external jobs introduced in Oracle 11g. How is this related to the old-fashioned local external jobs that we know since Oracle 10g and why we should get rid of the old external jobs? Get your answers here.
Chapter 6 helps the reader to get a firm grip on events and explains how to make good use of events. Events sound like voodoo, but in the end are an extra tool found in the Scheduler.
Chapter 7 considers the fact that when the jobs get more complicated, it gets harder to understand why something works differently than planned. This chapter gives the reader a fresh look at how to follow and debug Scheduler jobs.
Chapter 8 will give you some creative implementations of more or less common tasks—this time implemented using the Scheduler. This chapter gives a working code with clear explanations. This broadens the horizon and will take down the barriers that might exist between other environments and Oracle.
Chapter 9 shows how the Scheduler can be used in other configurations such as standby databases and RAC.
Chapter 10 shows how the Scheduler can be managed and monitored remotely through a web interface.
So far, we have mostly used jobs that ran immediately upon being enabled, or when we called the run_job procedure of the dbms_scheduler package. Many jobs are time-based; they are controlled by a schedule based on some kind of calendar.
However, not everything in real life can be controlled by a calendar. Many things need an action on an ad hoc basis, depending on the occurrence of some other thing. This is called event-based scheduling. Events also exist as the outcome of a job. We can define a job to raise an event in several ways—when it ends, or when it ends in an error, or when it does not end within the expected runtime. Let’s start with creating job events in order to make job monitoring a lot easier for you.
In this chapter, we will see how events that are generated by a job or a chain step can be intercepted to enable the monitoring of jobs. After that, we will see how we can use events to start a job that is waiting for an event.
Monitoring job events
Most of the time when jobs just do their work as expected, there is not much to monitor. In most cases, the job controller has to fix application-specific problems for example, sometimes file systems or table spaces get filled up). To make this easier, we can incorporate events. We can make jobs raise events when something unexpected happens, and we can have the Scheduler generate events when a job runs for too long. This gives us tremendous power. We can also use this to make chains a little easier to maintain.
Events in chains
A chain consists of steps that depend on each other. In many cases, it does not make sense to continue to step 2 when step 1 fails. For example, when a create table fails, why try to load data into the nonexistent table? So it is logical to terminate the job if no other independent steps can be performed.
One of the ways to handle this is implementing error steps in the chain. This might be a good idea, but the disadvantage is that this quickly doubles the steps involved in the chain, where most of the steps—hopefully—will not be executed. Another disadvantage is that the chain becomes less maintainable. It’s a lot of extra code, and more code (mostly) gives us less oversight.
If a job chain has to be terminated because of a failure, using the option of creating an event handler to raise a Scheduler event is recommended instead of adding extra steps that try to tell which error possibly happened. This makes event notification a lot simpler because it’s all in separate code and not mixed up with the application code.
Another situation is when the application logic has to take care of steps that fail, and has well-defined countermeasures to be executed that make the total outcome of the job a success.
An example is a situation that starts with a test for the existence of a file. If the test fails, get it by FTP; and if this succeeds, load it into the database. In this case, the first step can fail and go to the step that gets the file. As there is no other action possible when the FTP action fails, this should raise a Scheduler event that triggers—for example—a notification action. The same should happen when the load fails.
In other third-party scheduling packages, I have seen these notification actions implemented as part of the chain definitions because they lack a Scheduler event queue. In such packages, messages are sent by mail in extra chain steps. In the Oracle Scheduler, this queue is present and is very useful for us. Compared to 10g, nothing has changed in 11g. An event monitoring package can de-queue from the SCHEDULER$_EVENT_QUEUE variable into a sys.scheduler$_event_info type variable. The definition is shown in the following screenshot:
starts running; and most of the time, they want a message when a job ends.
In a chain, it is especially important to be able to tell in which step the event happened and what that step was supposed to do. In the event message, only the job name is present and so you have to search a bit to find the name of the step that failed.
For this, we can use the LOG_ID to find the step name in the SCHEDULER_JOB_LOGS (user/dba/all_SCHEDULER_JOB_LOG) view, where the step name is listed as JOB_SUBNAME. The following query can be used to find the step_name from the dba all_scheduler_log view, assuming that the event message is received in msg:
select job_subname from all_scheduler_job_log where log_id = msg.log_id;
To enable the delivery of all the events a job can generate, we can set the raise_events attribute to a value of:
dbms_scheduler.job_started + dbms_scheduler.job_succeeded + dbms_scheduler.job_failed + dbms_scheduler.job_broken + dbms_scheduler.job_completed + dbms_scheduler.job_stopped + dbms_scheduler.job_sch_lim_reached + dbms_scheduler.job_disabled + dbms_scheduler.job_chain_stalled
Or in short, we can set it to: dbms_scheduler.job_all_events.
There are many things that can be called events. In the job system, there are basically two types of events: events caused by jobs (which we already discussed) and events that makes a job execute.
On many occasions, a calendar will do fine for scheduling jobs. However, there are situations that require an immediate action and which cannot wait for the next activation based on a calendar. An example might be of a user who logs on to the database and then, using a logon trigger, more actions are executed.
Another example could be a situation in which we want a backup server to be utilized to the maximum, but not beyond that. We schedule all the backups independent of each other and have each backup raise an event when ready, which tells the system that another backup can go ahead. By letting the backup jobs wait for an event that essentially fl ags “there is backup capacity available now”, we make sure that a backup does not take longer than needed. We also make sure that the backup system is pushed to the highest throughput.
When we just use a preset date and time to start the backups, chances are that more backups are running at the same time (possibly caused by the growth of one or more databases, which is potentially causing their backups to be longer than anticipated). On the other hand, when we make sure that more backups are never ever run at the same time, we will likely have lots of idle time in the backup system.
This is a reason enough to learn how we can make good use of events. However, there are a few things we need to do. It essentially comes down to:
- Creating a queue and defining a payload for that queue
- Having a process that puts the message on the queue
- Coupling one or more job definition(s) to the queue
Again, this gives a kind of control that is hard to find in third-party scheduling packages.
Event messages are placed on an event queue and this is handled by AQ. So we need to call the AQ packages and for that we require DBMS_AQ and DBMS_AQADM. In the days before Oracle 10g, we needed to set the AQ_TM_PROCESSES parameter to a non-zero value to work. Since Oracle 10g, this is no longer the case and we can leave the AQ_TM_PROCESSES value to zero.
First, make sure we can use AQ.
select grantee, privilege, table_name from dba_tab_privs where table_name in ( 'DBMS_AQ', 'DBMS_AQADM') and grantee = 'MARVIN'; /
The expected output is as shown in the following screenshot:
As a DBA, execute the following:
grant execute on dbms_aq to marvin; grant execute on dbms_aqadm to marvin; grant select on dba_aq_agents to marvin; grant create type to marvin; alter user marvin quota unlimited on users; --/ begin dbms_aqadm.grant_system_privilege ('ENQUEUE_ANY', 'marvin', FALSE); dbms_aqadm.grant_system_privilege ('DEQUEUE_ANY', 'marvin', FALSE); dbms_aqadm.grant_system_privilege ('MANAGE_ANY', 'marvin', TRUE); end; /
This makes sure that marvin has enough privileges to be able to create and use queues. Now connect as marvin, create an object type that we can use to put a message on the queue, and read from the queue later on.
connect marvin/panic create or replace type bckup_msgt as object ( msg varchar2(20) ) /
This defines a type consisting of one msg field of 20-character length. This is the type we will be using in the queue for which we create a queue table next:
--/ begin dbms_aqadm.create_queue_table ( queue_table => 'bckup_qt', queue_payload_type => 'bckup_msgt', multiple_consumers => TRUE ); dbms_aqadm.create_queue ( queue_name => 'bckup_q', queue_table => 'bckup_qt' ); dbms_aqadm.start_queue ( queue_name => 'bckup_q' ) ; end ; /
This creates a queue table called bckup_qt, which contains messages defined by bckup_msgt. After that, bckup_q starts immediately.
The following objects show up in the schema, which are created to support the queue table:
The queue definitions part is ready. Now, we can tie a job to the queue. First, create a job as follows:
<code>--/ BEGIN sys.dbms_scheduler.create_job ( job_name => '"MARVIN"."BCKUP_01"', job_type => 'EXECUTABLE', job_action => '/home/oracle/bin/rman.sh', event_condition => 'tab.user_data.msg=''GO''', queue_spec => '"MARVIN"."BCKUP_Q"', start_date => systimestamp at time zone 'Europe/Amsterdam', job_class => '"LONGER"', comments => 'backup a database', auto_drop => FALSE, number_of_arguments => 1, enable => FALSE ); sys.dbms_scheduler.set_attribute ( name => '"MARVIN"."BCKUP_01"', attribute => 'raise_events', value => dbms_scheduler.job_started + dbms_scheduler.job_succeeded + dbms_scheduler.job_failed + dbms_scheduler.job_broken + dbms_scheduler.job_completed + dbms_scheduler.job_stopped + dbms_scheduler.job_sch_lim_reached + dbms_scheduler.job_disabled + dbms_scheduler.job_chain_stalled ); sys.dbms_scheduler.set_job_argument_value ( job_name => '"MARVIN"."BCKUP_01"', argument_position => 1, argument_value => 'db_01' ); DBMS_SCHEDULER.SET_ATTRIBUTE ( name => '"MARVIN"."BCKUP_01"', attribute => 'destination', value => 'pantzer:15021' ); DBMS_SCHEDULER.SET_ATTRIBUTE ( name => '"MARVIN"."BCKUP_01"', attribute => 'credential_name', value => '"MARVIN"."JOBS_CRED2"' ); END; /</code>
This is just a simple remote external job that calls an RMAN script with an argument for the database to back up. As the backup will take longer than a few seconds, it looks obvious to put it in the job_class called LONGER that we defined a while ago. The queue that is coupled to this job is the queue we defined before. It is bckup_q as defined by the queue_spec parameter. As soon as the GO message appears in the payload of the queue, all of the jobs that listen to this queue and those waiting for this GO message will get started. The code listed for the MARVIN job can also be put together using DB Console. In the following Schedule screen, select Event as
--/ BEGIN sys.dbms_scheduler.enable( '"MARVIN"."BCKUP_01"' ); END; /
This produces the following:
--/ declare my_msgid RAW(16); props dbms_aq.message_properties_t; enqopts dbms_aq.enqueue_options_t; begin sys.dbms_aq.enqueue('marvin.bckup_q', enqopts, props, marvin.bckup_msgt('GO'), my_msgid); end; / commit;
The result is that all of the jobs waiting for the GO message are started at the same time. With the health of the backup system in mind, it would be wiser to query the jobs view, find the backup job that was scheduled first, and give that job its specific event. In that case, the BCKUP_01 job will wait for the message BCKUP_01; and BCKUP_02 will wait for the message “BCKUP_02”.
Another option is that Oracle can allow us to define an event that is delivered to exactly one waiting job at a time. An enhancement request has been filed for this. It will make this kind of waiting a bit easier because normal queuing behavior is then saved. This means that things such as job priorities will be honored. When we define a separate event for every job, we have manual control but we cannot infl uence the selection order of the jobs in the queue, for example, by raising the priority of a job.
When a backup is ready, the backup system can handle the next backup. We can utilize the enqueue operation by putting our next GO message into the queue in the epilogue of the backup script. However, what will happen if the script crashes? The next backup will never be released. Again, a smarter location for this code would be in an event handler routine that just waits for termination messages from the Scheduler event queue. As soon as it sees the termination of a backup, it can decide to call in the next one by giving that waiting job a signal at its location.