Data transformation with External Tables
One of the main uses of External Tables is supporting the ETL process: data can be loaded and transformed to the target format in a single step, without an intermediate staging table.
Let’s read an External Table named REGION_REVENUE, which holds a country name plus one revenue amount for each month of the year (columns COUNTRY and JAN through DEC).
This data can be loaded into multiple tables with a single command. Let’s create twelve tables with the same structure, one per month:
SQL> desc amount_jan
 Name                             Null?    Type
 -------------------------------- -------- ----------------
 REGION                                    VARCHAR2(16)
 AMOUNT                                    NUMBER(3)
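The remaining monthly tables can be cloned from the first one. A minimal sketch (the table names follow the AMOUNT_<month> pattern used below; the where 1 = 0 predicate copies the structure without copying any rows):

create table amount_jan(
  region varchar2(16),
  amount number(3));

-- clone the empty structure for the remaining months
create table amount_feb as select * from amount_jan where 1 = 0;
create table amount_mar as select * from amount_jan where 1 = 0;
-- ...and so on, through amount_dec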
Now we can issue a single multitable insert command to send the data from the External Table to the twelve monthly tables.
INSERT ALL
INTO AMOUNT_JAN (REGION, AMOUNT) VALUES(COUNTRY, JAN)
INTO AMOUNT_FEB (REGION, AMOUNT) VALUES(COUNTRY, FEB)
INTO AMOUNT_MAR (REGION, AMOUNT) VALUES(COUNTRY, MAR)
INTO AMOUNT_APR (REGION, AMOUNT) VALUES(COUNTRY, APR)
INTO AMOUNT_MAY (REGION, AMOUNT) VALUES(COUNTRY, MAY)
INTO AMOUNT_JUN (REGION, AMOUNT) VALUES(COUNTRY, JUN)
INTO AMOUNT_JUL (REGION, AMOUNT) VALUES(COUNTRY, JUL)
INTO AMOUNT_AUG (REGION, AMOUNT) VALUES(COUNTRY, AUG)
INTO AMOUNT_SEP (REGION, AMOUNT) VALUES(COUNTRY, SEP)
INTO AMOUNT_OCT (REGION, AMOUNT) VALUES(COUNTRY, OCT)
INTO AMOUNT_NOV (REGION, AMOUNT) VALUES(COUNTRY, NOV)
INTO AMOUNT_DEC (REGION, AMOUNT) VALUES(COUNTRY, DEC)
SELECT COUNTRY,
JAN,
FEB,
MAR,
APR,
MAY,
JUN,
JUL,
AUG,
SEP,
OCT,
NOV,
DEC
FROM REGION_REVENUE;
In the next example, we will perform a conditional insert into different tables depending on the value of the amount column. We will first create two tables, one for low amounts and another for high amounts:
SQL> create table low_amount(
2 region varchar2(16),
3 month number(2),
4 amount number(3));
Table created.
SQL> create table high_amount as select * from low_amount;
Table created.
Now we can read the External Table and have each monthly value inserted conditionally into one of two mutually exclusive targets.
INSERT ALL
WHEN ( JAN <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '01', JAN )
WHEN ( FEB <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '02', FEB )
WHEN ( MAR <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '03', MAR )
WHEN ( APR <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '04', APR )
WHEN ( MAY <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '05', MAY )
WHEN ( JUN <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '06', JUN )
WHEN ( JUL <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '07', JUL )
WHEN ( AUG <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '08', AUG )
WHEN ( SEP <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '09', SEP )
WHEN ( OCT <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '10', OCT )
WHEN ( NOV <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '11', NOV )
WHEN ( DEC <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '12', DEC )
WHEN ( JAN > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '01', JAN )
WHEN ( FEB > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '02', FEB )
WHEN ( MAR > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '03', MAR )
WHEN ( APR > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '04', APR )
WHEN ( MAY > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '05', MAY )
WHEN ( JUN > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '06', JUN )
WHEN ( JUL > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '07', JUL )
WHEN ( AUG > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '08', AUG )
WHEN ( SEP > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '09', SEP )
WHEN ( OCT > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '10', OCT )
WHEN ( NOV > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '11', NOV )
WHEN ( DEC > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '12', DEC )
SELECT COUNTRY,
JAN,
FEB,
MAR,
APR,
MAY,
JUN,
JUL,
AUG,
SEP,
OCT,
NOV,
DEC
FROM REGION_REVENUE;
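A quick sanity check after the load: each source row fires all twelve WHEN pairs, and for each month exactly one of the two conditions is true (assuming no NULL amounts), so the two targets together should hold twelve rows per row of REGION_REVENUE. A minimal verification sketch:

select (select count(*) from low_amount)  low_rows,
       (select count(*) from high_amount) high_rows
  from dual;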
Extending the alert.log analysis with External Tables
Reading the alert.log from within the database is a useful technique that can help you find any outstanding error messages reported in this file.
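The External Table below relies on a directory object named BACKGROUND_DUMP_DEST, which must exist first. A minimal sketch of this prerequisite, assuming a pre-11g bdump layout (the path shown is illustrative; take the real one from the background_dump_dest initialization parameter):

-- locate the directory that holds the alert.log
select value from v$parameter where name = 'background_dump_dest';

-- create the directory object over that path (illustrative path)
create directory BACKGROUND_DUMP_DEST
as '/u01/app/oracle/admin/beta/bdump';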
create table ALERT_LOG(
text_line varchar2(512)
)
organization external (
type ORACLE_LOADER
default directory BACKGROUND_DUMP_DEST
access parameters(
records delimited by newline
nobadfile
nodiscardfile
nologfile
)
location( 'alert_beta.log')
);
Once the External Table has been created, the alert.log file can be queried just like
any other regular table.
SQL> select text_line from alert_log
2 where text_line like 'ORA-%';
TEXT_LINE
--------------------------------------------------------------------------------
ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL...
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/u01/oracle/oradata/beta/redo01.log'
ORA-27037: unable to obtain file status
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/oracle/oradata/beta/redo02.log'
ORA-27037: unable to obtain file status
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/oracle/oradata/beta/redo03.log'
ORA-27037: unable to obtain file status
Querying the alert.log file up to this point is useful just for viewing its contents and scanning for basic ORA-% strings. The same could be achieved through the alert.log link in Enterprise Manager (EM). EM, however, only displays the file interactively, so you can rely only on its preset alerts. If further automated work needs to be done, it is worth extending this into a small alert-analysis tool. A temporary table can be used to store the contents of the ALERT_LOG table, along with an extra TIMESTAMP column, so it can be queried in detail in an EM-like manner.
create global temporary table TMP_ALERT_LOG (
LINE_NO NUMBER(6),
TIMESTAMP DATE,
TEXT_LINE VARCHAR2(512)
)
on commit preserve rows;
A bit of PL/SQL programming is necessary to derive a timestamp for each line of the ALERT_LOG table and insert the result into TMP_ALERT_LOG, so that further queries can be run.
declare
   cursor alertLogCur is
      select ROWNUM LINE_NO, TEXT_LINE
        from ALERT_LOG;
   currentDate date;
   testDay     varchar2(10);
begin
   currentDate := sysdate;
   for alertLogInst in alertLogCur loop
      -- fetch the row and determine whether it is a date row
      testDay := substr(alertLogInst.text_line, 1, 3);
      if testDay in ('Sun','Mon','Tue','Wed','Thu','Fri','Sat')
      then
         -- a date row sets the current logical record date
         currentDate := to_date(alertLogInst.text_line,
                                'Dy Mon DD HH24:MI:SS YYYY');
      end if;
      insert into TMP_ALERT_LOG
      values(
         alertLogInst.line_no,
         currentDate,
         alertLogInst.text_line
      );
   end loop;
end;
/
As the contents of the alert.log end up in a temporary table, more than one DBA can query it at the same time, and access to it can be restricted. There is no need to manage purging or maintenance of the table after the session has ended, the table can be indexed, and the overhead of this procedure is small. Moreover, as this is a temporary object, minimal redo information is generated.
Once the external ALERT_LOG table and the temporary TMP_ALERT_LOG table have been created, it is possible to run not only the date filters provided by Enterprise Manager, but any query at all against the alert.log file.
SELECT TIMESTAMP, TEXT_LINE
FROM TMP_ALERT_LOG
WHERE TIMESTAMP IN (
SELECT TIMESTAMP
FROM TMP_ALERT_LOG
WHERE TEXT_LINE LIKE 'ORA-%'
)
AND TIMESTAMP BETWEEN SYSDATE-30 AND SYSDATE
ORDER BY LINE_NO;
Further work can be done on this concept to look for specific error messages, analyze specific time frames, and perform drill-down analysis.
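For example, a sketch of the kind of drill-down meant here, counting ORA- errors per day over the last month using only the columns defined above:

select trunc(TIMESTAMP) day, count(*) ora_errors
  from TMP_ALERT_LOG
 where TEXT_LINE like 'ORA-%'
   and TIMESTAMP > sysdate - 30
 group by trunc(TIMESTAMP)
 order by 1;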
This procedure can be extended to read the trace files or any other text file from
the database.
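For instance, the same definition with a different LOCATION reads a trace file, provided the directory object points to the directory that holds it (the file name below is hypothetical):

create table TRACE_FILE(
text_line varchar2(512)
)
organization external (
type ORACLE_LOADER
default directory BACKGROUND_DUMP_DEST
access parameters(
records delimited by newline
nobadfile
nodiscardfile
nologfile
)
location( 'beta_smon_1234.trc' )
);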
Reading the listener.log from the database
One particularly useful extension of the above procedure is reading the listener.log file. This file has a specific asterisk-delimited format, which makes it easy to parse with the ORACLE_LOADER driver.
The file format is as follows:
21-JUL-2008 00:39:50 * (CONNECT_DATA=(SID=beta)(CID=(PROGRAM=perl)(HOST=alpha.us.oracle.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.10)(PORT=8392)) * establish * beta * 0
21-JUL-2008 00:39:56 * (CONNECT_DATA=(SID=beta)(CID=(PROGRAM=perl)(HOST=alpha.us.oracle.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.10)(PORT=8398)) * establish * beta * 0
21-JUL-2008 00:40:16 * service_update * beta * 0
21-JUL-2008 00:41:19 * service_update * beta * 0
21-JUL-2008 00:44:43 * ping * 0
The general record layout can be deduced from the sample above:
TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE
As you can see, even though this format is structured, records may have a different number of fields, an issue that must be considered at load time. To map this file into the database, the External Table definition must allow for the variable number of fields. (As with the alert.log, the loaded rows can later be copied into a temporary table so that no additional transactional overhead is created.)
Now, let’s create an External Table based on this format, reading from the listener log directory:
$ORACLE_HOME/network/log
First we need a directory object. Oracle stores directory paths literally and does not expand environment variables, so the actual, expanded value of $ORACLE_HOME must be spelled out (the path below is illustrative):
create directory NETWORK_LOG_DIR
as '/u01/app/oracle/network/log';
Now, let’s create the External Table:
create table LISTENER_LOG (
TIMESTAMP date,
CONNECT_DATA varchar2(2048),
PROTOCOL_INFO varchar2(64),
EVENT varchar2(64),
SID varchar2(64),
RETURN_CODE number(5)
)
organization external (
type ORACLE_LOADER
default directory NETWORK_LOG_DIR
access parameters (
records delimited by NEWLINE
nobadfile
nodiscardfile
nologfile
fields terminated by "*" LDRTRIM
reject rows with all null fields
(
"TIMESTAMP" char date_format DATE mask "DD-MON-YYYY HH24:MI:SS",
"CONNECT_DATA",
"PROTOCOL_INFO",
"EVENT",
"SID",
"RETURN_CODE"
)
)
location ('listener.log')
)
reject limit unlimited;
Only the records of interest match the structure specified above, so several rows will be rejected. Since the file is not fully structured, it contains some unformatted information, and the bad file and the log file are not meaningful in this context; that is why they are disabled.
Another application of the LISTENER_LOG External Table is usage trend analysis. This query can be issued to detect peak usage hours:
SQL> select to_char(round(TIMESTAMP, 'HH'), 'HH24:MI') HOUR,
2 lpad('#', count(*), '#') CX
3 from listener_log
4 group by round(TIMESTAMP, 'HH')
5 order by 1;
HOUR CX
----- ------------------------------------------------
14:00 ###
15:00 ##########################
16:00 ######################
17:00 #####################
18:00 #####################
19:00 ###############
Reading the listener.log file this way allows the DBA not only to keep track of listener behavior, but also lets a security administrator easily spot hacking attempts.
Let’s find out who is trying to access the database with sqlplus.exe:
SQL> select timestamp, protocol_info
2 from listener_log
3 where connect_data like '%sqlplus.exe%'
4 /
TIMESTAMP PROTOCOL_INFO
-------------------- ----------------------------------------------
01-SEP-2008 14:30:37 (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.101)
(PORT=3651))
01-SEP-2008 14:31:08 (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.101)
(PORT=3666))
01-SEP-2008 14:31:35 (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.101)
(PORT=3681))
The use of External Tables to analyze the listener.log makes it possible not only to keep an in-database version of the file for periodic, programmatic analysis of listener behavior, but also to determine usage trends and to correlate information with the audit team, so that unauthorized connection programs can be easily and quickly spotted. Further useful applications can be found by reading the listener.log file. Two fields must be parsed further to extract all of their information; that parsing goes beyond the scope of this chapter, but the structure the analysis should consider is detailed next:
Connect String
- SID: The database Oracle SID, populated if the connection was performed by SID; otherwise it is NULL.
- CID: Contains two subfields, PROGRAM and HOST.
- SERVER: Indicates the connection type, either dedicated or shared.
- SERVICE_NAME: Populated when the connection is performed by a Service name instead of a SID.
- COMMAND: The command issued by the user.
- SERVICE: Present only when listener commands are issued.
- FAILOVER_MODE: In Real Application Clusters (RAC) environments, this field is used if the client performed a connection due to a failover; it shows the failover mode used.
Protocol
- PROTOCOL: Indicates the protocol used to perform the connection; this will be TCP most of the time.
- HOST: The client's IP address.
- PORT: The port number used to establish the connection.
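As a starting point for that analysis, a minimal sketch that pulls the PROGRAM and HOST subfields out of CONNECT_DATA with REGEXP_SUBSTR (the patterns are simplifying assumptions about the field layout):

select TIMESTAMP,
       replace(regexp_substr(CONNECT_DATA, 'PROGRAM=[^)]+'), 'PROGRAM=') program,
       replace(regexp_substr(CONNECT_DATA, 'HOST=[^)]+'), 'HOST=') host
  from LISTENER_LOG
 where EVENT = 'establish';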