
SimpleDB versus RDBMS

October 28, 2010 by Krishna Srinivasan

Amazon SimpleDB Developer Guide


SimpleDB is a highly scalable, simple-to-use, and inexpensive database in the cloud from
Amazon Web Services. But in order to use SimpleDB, you really have to change your
mindset. This isn’t a traditional relational database; in fact it’s not relational at all. For
developers who have experience working with relational databases, this may lead to
misconceptions as to how SimpleDB works.


This practical book aims to address your preconceptions on how SimpleDB will work for
you. You will be led quickly through the differences between relational databases and
SimpleDB, and the implications of using SimpleDB. Throughout this book, there is an
emphasis on demonstrating key concepts with practical examples for Java, PHP, and
Python developers.


You will be introduced to this massively scalable, schemaless key-value data store: what
it is, how it works, and why it is such a game changer. You will then explore the basic
functionality offered by SimpleDB including querying, code samples, and a lot more.
This book will help you deploy services outside the Amazon cloud and access them from
any web host.


You will see how SimpleDB gives you the freedom to focus on application development.
As you work through this book you will be able to optimize the performance of your
applications using parallel operations, caching with memcache, asynchronous operations,
and more.


Gain in-depth understanding of Amazon SimpleDB with PHP, Java, and Python
examples, and run optimized database-backed applications on Amazon’s Web
Services cloud.


What This Book Covers


Chapter 1, Getting to Know SimpleDB, explores SimpleDB and the advantages of
utilizing it to build web-scale applications.


Chapter 2, Getting Started with SimpleDB, moves on to set up an AWS account, enable
SimpleDB service for the account, and install and set up libraries for Java, PHP, and
Python. It also illustrates several SimpleDB operations using these libraries.


Chapter 3, SimpleDB versus RDBMS, sheds light on the differences between SimpleDB
and a traditional RDBMS, as well as the pros and cons of using SimpleDB as the storage
engine in your application.


Chapter 4, The SimpleDB Data Model, takes a detailed look at the SimpleDB data model
and different methods for interacting with a domain, its items, and their attributes. It
further talks about the domain metadata and reviews the various constraints imposed by
SimpleDB on domains, items, and attributes.


Chapter 5, Data Types, discusses the techniques needed for storing different data types in
SimpleDB, and explores a technique for storing numbers, Boolean values, and dates. It
also teaches you about XML-restricted characters and encoding them using base64
encoding.


Chapter 6, Querying, describes the Select syntax for retrieving results from SimpleDB,
and looks at the various operators and how to create predicates that allow you to get back
the information you need.


Chapter 7, Storing Data on S3, introduces you to Amazon S3 and its use for storing large
files. It modifies a sample domain to add additional metadata, including a file key that is
then used to name the MP3 file uploaded to S3. The example used in this chapter shows
you a simple way to store metadata in SimpleDB while storing the associated content,
in the form of binary files, on Amazon S3.


Chapter 8, Tuning and Usage Costs, mainly covers the BoxUsage of different SimpleDB
queries and the usage costs, along with viewing the usage activity reports.


Chapter 9, Caching, explains memcached and Cache_Lite in detail and their use for
caching. It further explores how you can use memcached with SimpleDB, via libraries in
Java, PHP, and Python, to avoid making unnecessary requests to SimpleDB.


Chapter 10, Parallel Processing, analyzes how to utilize multiple threads for running
parallel operations against SimpleDB in Java, PHP, and Python in order to speed up
processing times and take advantage of the excellent support for concurrency in
SimpleDB.


SimpleDB versus RDBMS


We have all used a Relational Database Management System (RDBMS) at some
point in our careers. These relational databases are ubiquitous and are available
from a wide range of companies such as Oracle, Microsoft, IBM, and so on. These
databases have served us well for our application needs. However, there is a new
breed of applications coming to the forefront in the current Internet-driven and
socially networked economy. The new applications require large scaling to meet
demand peaks that can quickly reach massive levels. This is a scenario that is hard
to satisfy using a traditional relational database, as it is impossible to requisition
and provision the hardware and software resources that will be needed to service
the demand peaks. It is also non-trivial and difficult to scale a normal RDBMS to
hundreds or thousands of nodes. The overwhelming complexity of doing this makes
the RDBMS not viable for these kinds of applications. SimpleDB provides a great
alternative to an RDBMS and can provide a solution to all these problems. However,
in order to provide this solution, SimpleDB makes some choices and design decisions
that you need to understand in order to make an informed choice about the data
storage for your application domain.


In this chapter, we are going to discuss the differences between SimpleDB and a
traditional RDBMS, as well as the pros and cons of using SimpleDB as the storage
engine in your application.


No normalization


Normalization is a process of organizing data efficiently in a relational database
by eliminating redundant data, while at the same time ensuring that the data
dependencies make sense. SimpleDB data models do not conform to any of the
normalization forms, and tend to be completely de-normalized. The lack of need
for normalization in SimpleDB allows you a great deal of flexibility with your
model, and enables you to use the power of multi-valued attributes in your data.


Let’s look at a simple example of a database starting with a basic spreadsheet
structure and then design it for an RDBMS and a SimpleDB. In this example,
we will create a simple contact database, with contact information as raw data.



The obvious issue is the repetition of the name data. The table is inefficient and
requires care when updating to keep the name data in sync. Finding a person by his
or her phone number, however, is easy.



SELECT * FROM Contact_Info WHERE Phone_Num = '555-854-9885'


So let’s analyze the strengths and weaknesses of this database design.



The design is simple, but as the name data is repeated, it would require care to keep
the data in sync. Searching for phone numbers by name would be ugly if the names
got out of sync.


To improve the design, we can rationalize the data. One approach would be to create
multiple phone number fields such as the following. While this is a simple solution,
it does limit the phone numbers to three. Add e-mail and Twitter, and the table
becomes wider and wider.



Finding a person by a phone number is ugly.



SELECT * FROM Contact_Info WHERE Phone_Num_1 = '555-854-9885'
OR Phone_Num_2 = '555-854-9885'
OR Phone_Num_3 = '555-854-9885'


Now let’s analyze the strengths and weaknesses of this database design.



The design is simple, but the phone numbers are limited to three, and searching by
phone number involves three index searches.


Another approach would be to use a delimited list for the phone number as follows:



This approach has the advantage of no data repetition and is easy to maintain,
compact, and extendable, but the only way to find a record by the phone number
is with a substring search.



SELECT * FROM Contact_Info WHERE Phone_Nums LIKE '%555-854-9885%'


This type of SQL forces a complete table scan. Do this with a small table and no
one will notice, but try this on a large database with millions of records, and the
performance of the database will suffer.



A delimited field is fine for data that is all of one type and only needs to be retrieved, not searched.


The normalization for relational databases results in splitting up your data into
separate tables that are related to one another by keys. A join is an operation that
allows you to retrieve the data back easily across the multiple tables.


Let’s first normalize this data.


This is the Person_Info table:



And this is the Phone_Info table:



Now a join of the Person_Info table with the Phone_Info table can retrieve the list of
phone numbers as well as the e-mail addresses. The table structure is clean and other
than the ID primary key, no data is duplicated. Provided Phone_Num is indexed,
retrieving a contact by the phone number is efficient.



SELECT First_Name, Last_Name, Phone_Num, Person_Info.ID
FROM Person_Info JOIN Phone_Info
ON Person_Info.ID = Phone_Info.ID
WHERE Phone_Num = '555-854-9885'


So if we analyze the strengths and weaknesses of this database design, we get:



While this is an efficient relational model, there is no join command in SimpleDB.
Using two tables would force two selects to retrieve the complete contact
information. Let’s look at how this would be done using the SimpleDB principles.


No joins


SimpleDB does not support the concept of joins. Instead, SimpleDB provides you
with the ability to store multiple values for an attribute, thus avoiding the necessity
to perform a join to retrieve all the values.



In the SimpleDB table, each record is stored as an item with attribute/value pairs. The
difference here is that the Phone_Num field has multiple values. Unlike a delimited list
field, SimpleDB indexes all the values, enabling an efficient search on each value.



SELECT * FROM Contact_Info WHERE Phone_Num = '555-854-9885'


This SELECT is very quick and efficient. It is even possible to use Phone_Num multiple
times in the same query, as follows:



SELECT * FROM Contact_Info WHERE Phone_Num = '555-854-9885'
OR Phone_Num = '555-748-7854'
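
As a rough illustration of what this looks like in code (not taken from the book's samples), here is a minimal Python sketch assuming the boto library's SimpleDB module; the domain name, item name, and credentials setup are illustrative:

import boto

# Credentials are assumed to come from the environment or a boto config file.
conn = boto.connect_sdb()
domain = conn.create_domain('Contact_Info')  # idempotent if the domain already exists

# Assigning a list stores Phone_Num as a multi-valued attribute.
item = domain.new_item('1001')
item['First_Name'] = 'John'
item['Last_Name'] = 'Doe'
item['Phone_Num'] = ['555-854-9885', '555-748-7854']
item.save()

# Every value is indexed, so either number matches the item.
query = ("select * from Contact_Info "
         "where Phone_Num = '555-854-9885' or Phone_Num = '555-748-7854'")
for match in domain.select(query):
    print(match.name, dict(match))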


Let’s analyze the strengths and weaknesses of this approach:



No schemas


There are no schemas anywhere in sight of SimpleDB. You don’t have to create
schemas, change schemas, migrate schemas to a new version, or maintain schemas.
This is yet another thing that is difficult for some people from a traditional relational
database world to grasp, but this flexibility is one of the keys to the power of scaling
offered by SimpleDB. You can store any attribute-value data you like in any way you
want. If the requirements for your application should suddenly change and you need
to start storing data on a customer’s Twitter handle for instance, all you need to do is
store the data without worrying about any schema changes!
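
To make that concrete, here is a hedged Python sketch (boto assumed, names illustrative) that adds a Twitter handle to an existing contact; no schema definition or migration step is involved:

import boto

conn = boto.connect_sdb()
domain = conn.get_domain('Contact_Info')

# A brand-new attribute can be written to any item at any time; items that
# never receive Twitter_Handle simply do not carry it.
domain.put_attributes('1001', {'Twitter_Handle': '@johndoe'})

# The new attribute is immediately queryable like any other.
for item in domain.select("select * from Contact_Info "
                          "where Twitter_Handle = '@johndoe'"):
    print(item.name, item.get('Twitter_Handle'))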


Let’s add an e-mail address to the database in the previous example. In the relational
database, it is necessary to either add e-mail to the phone table with a type of contact
field or add another field. Let’s add another table named Email_Info.


Person_Info table:



Phone_Info table:



Email_Info table:



Using a traditional relational database approach, we join the three tables to extract
the requested data in one call.



SELECT First_Name, Last_Name, Phone_Num, Person_Info.ID, Email_Addr
FROM Person_Info
JOIN Phone_Info ON Person_Info.ID = Phone_Info.ID
JOIN Email_Info ON Person_Info.ID = Email_Info.ID
WHERE Phone_Num = '555-854-9885'


Now let’s analyze the strengths and weaknesses of this approach:



We ignored the issue of join versus left outer join, which is really what should be
used here unless all contacts have a phone number and e-mail address. The example
is just to illustrate that the Contact_Info schema must be modified.


Contact_Info domain:



The obvious question is: why is Email_Addr not in its own column? In SimpleDB,
there is no concept of a column in a table. The spreadsheet view of the SimpleDB
data was done for ease of readability, not because it reflects the data structure. The
only structure in SimpleDB consists of the item name and attribute/value pairs. The
proper representation of the SimpleDB data is:



Use the following query to fetch a contact item by the e-mail address:



SELECT * FROM Contact_Info WHERE Email_Addr = 'john@def.ccc'


Let’s analyze the strengths and weaknesses of this approach:



Simpler SQL


Structured Query Language (SQL) is a standard language that is widely used for
accessing and manipulating the data stored in a relational database. SQL has evolved
over the years into a highly complex language that can do a vast variety of things to
your database. SimpleDB does not support the complete SQL language, but instead
it lets you perform your data retrieval using a much smaller and simpler subset of an
SQL-like query language. This simplifies the whole process of querying your data.
A big difference between the simpler SQL supported by SimpleDB and standard SQL is
the support for multi-valued SimpleDB attributes, which makes it very simple to query
your data and get back multiple values for an attribute.


The SimpleDB select syntax is summarized as follows:



select output_list
from domain_name
[where expression]
[sort_instructions]
[limit limit]
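
As a rough illustration of the where, sort, and limit pieces together, here is a sketch assuming boto; note that SimpleDB requires the attribute you sort on to also appear in a predicate:

import boto

conn = boto.connect_sdb()
domain = conn.get_domain('Contact_Info')

# Up to 10 contacts that have a Last_Name, sorted by Last_Name.
query = ("select * from Contact_Info "
         "where Last_Name is not null "
         "order by Last_Name limit 10")
for item in domain.select(query):
    print(item.name, item.get('Last_Name'))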


We will go into detail on SimpleDB SQL in Chapter 6, Querying.


Only strings


SimpleDB uses a very simple data model, and all data is stored as a UTF-8 string.
This simplified textual data makes it easy for SimpleDB to automatically index your
data and give you the ability to retrieve the data very quickly. If you need to store
and retrieve other kinds of data types such as numbers and dates, you must encode
these data types into strings whose lexicographical ordering will be the same as your
intended ordering of the data. As SimpleDB does not have the concept of schemas
that enforce type correctness for your domains, it is the developer’s responsibility to
ensure the correct encoding of data before storage into SimpleDB.


Working only in strings impacts two aspects of using the database: queries and sorts.


Consider the following Sample_Qty table:



Now try and execute the following SQL statement:



SELECT * FROM Sample_Qty WHERE Quantity = '1'


This SQL statement will retrieve nothing—not even items 101 and 102.


Selecting all records sorted by Quantity will return the order 101, 102, 103, 105, 104.


Dates present an easier problem, as they can be stored in ISO 8601 format to enable
sorting as well as predictable searching. We will cover this in detail in Chapter 5,
Data Types.
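
The usual techniques are zero-padding (plus an offset for negative numbers) and ISO 8601 timestamps, so that string order matches the intended order. Here is a minimal Python sketch of the idea; the padding width is an illustrative choice, not a SimpleDB requirement:

from datetime import datetime, timezone

def encode_quantity(n, width=10):
    # Zero-pad a non-negative integer so lexicographic order equals numeric order.
    # Negative numbers additionally need a fixed offset added before padding.
    return str(n).zfill(width)

def encode_timestamp(dt):
    # ISO 8601 strings in UTC sort chronologically as plain strings.
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')

print(sorted(['12', '9', '105']))                        # ['105', '12', '9'], plain string order
print(sorted(encode_quantity(n) for n in (12, 9, 105)))  # padded values sort numerically
print(encode_timestamp(datetime(2010, 10, 28, tzinfo=timezone.utc)))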


Eventual consistency


SimpleDB can be thought of as a Write-Seldom-Read-Many model. Updates are
done to a central database, but reads can be done from many read-only database
slave servers.


SimpleDB keeps multiple copies of each domain. Whenever data is written or
updated within a domain, first a success status code is returned to your application,
and then all the different copies of the data are updated. The propagation of these
changes to all of the nodes at all the storage locations might take some time, but
eventually the data will become consistent across all the nodes.


SimpleDB provides this assurance only of eventual consistency for your data. This
means that the data you retrieve from SimpleDB at any particular time may be
slightly out of date. The main reason for this is that SimpleDB service is implemented
as a distributed system, and all of the information is stored across multiple physical
servers and potentially across multiple data centers in a completely redundant
manner. This ensures the ready accessibility and safety of your data at large scale,
but comes at the cost of a slight delay before any addition, alteration, or deletion
you perform on the data is propagated throughout the entire
distributed SimpleDB system. Your data will eventually be globally consistent, but
until it is consistent, the possibility of retrieving slightly outdated information from
SimpleDB exists.


Amazon has stated in the past that global consistency across all the nodes
will usually be achieved “within seconds”; however, please be aware that this
timeframe will depend to a great degree on the processing and the network load on
SimpleDB at the time that you make a change to your data. An intermediate caching
layer can quickly solve this consistency issue if data consistency is
essential to your application. The principle of eventual consistency is the hardest
to grasp, and it is the biggest difference between an RDBMS and SimpleDB. In order
to scale massively, this is a trade-off that needs to be made at design time for your
application. If you consider how often you will require immediate consistency within
your web applications, you might find that this trade-off is well worth the improved
scalability of your application.


Flash: February 24, 2010 — consistent read added


While eventual consistency is still the normal mode for SimpleDB, Amazon
announced several extensions for consistent reads. When using GetAttributes
or Select, ConsistentRead = true can be specified, forcing a read of the most
current value. This tells SimpleDB to read the items from the master database rather
than from one of the slaves, guaranteeing that the latest updates or deletes are seen.
This does not mean you can use it on all reads and still get the extreme scaling. In
Chapter 8, Tuning and Usage Costs, we will look at the cost of using consistent reads.
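
A hedged Python sketch of the difference, assuming boto's SimpleDB module (the consistent_read keyword is boto's spelling of the ConsistentRead flag):

import boto

conn = boto.connect_sdb()
domain = conn.get_domain('Contact_Info')

# Default read: may return slightly stale data under eventual consistency.
maybe_stale = domain.get_item('1001')

# Consistent read: reflects the latest successful write, at a higher BoxUsage cost.
latest = domain.get_item('1001', consistent_read=True)

rows = domain.select("select * from Contact_Info where Last_Name = 'Doe'",
                     consistent_read=True)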


A conditional PUT or DELETE was also announced, which will execute a database
PUT or DELETE only if a consistent read of a specific attribute returns a specific value
or shows that the attribute does not exist. This is useful for concurrency controls or
counter primitives. In later chapters, we will look at the implications of these new features.
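
For example, a conditional PUT can implement a naive counter. The following is only a sketch, assuming boto's expected_value support and an item that already has a count attribute; if another writer got there first, SimpleDB rejects the update and the caller can re-read and retry:

import boto
from boto.exception import SDBResponseError

conn = boto.connect_sdb()
domain = conn.get_domain('Counters')  # illustrative domain name

def increment(item_name, attr='count'):
    current = domain.get_attributes(item_name, attr, consistent_read=True)
    old = current[attr]                  # SimpleDB values are always strings
    new = str(int(old) + 1)
    try:
        # Conditional PUT: applied only if 'count' still equals the value we read.
        domain.put_attributes(item_name, {attr: new},
                              replace=True, expected_value=[attr, old])
        return int(new)
    except SDBResponseError:
        return None  # lost the race; caller should retry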


Scalability


Relational databases are designed around the entities and the relationships between
the entities, and need a large investment in hardware and servers in order to
provide high scaling. SimpleDB provides a great alternative that is designed around
partitioning your data into independent chunks that are stored in a distributed
manner and can scale up massively. SimpleDB provides the automatic partitioning
and replication of your data, while at the same time guaranteeing fast access and
reliability for your data. You can let Amazon scale their platform as needed using
their extensive resources, while you enjoy the ability to easily scale up in response to
increased demand!


The best feature of SimpleDB scalability is that you only pay for usage, not for the
large cluster needed in anticipation of large usage.


Low maintenance


Maintaining a relational database and keeping it humming with indexing takes
effort, know-how, and technical and administrative resources. Applications are
not static but dynamic things, and change constantly along with additions of new
features. All of these updates can result in changes and modifications to the database
schema along with increased maintenance and tuning costs. SimpleDB is hosted
and maintained for you by Amazon. Your task is as simple as storing your data and
retrieving it when needed. The simplicity of structured data and lack of schemas
helps your application be more flexible and adaptable to change, which is always
around the corner. SimpleDB ensures that your queries are optimized and retrieval
times are fast by indexing all your data automatically.


Advantages of the SimpleDB model


SimpleDB’s alternative approach for storing data can be advantageous for meeting
your application needs when compared to a traditional relational database. Here’s
the list of advantages:



  • Reduced maintenance as compared to a relational database

  • Automated indexing of your data for fast performance

  • Flexibility to modify or change your stored data without the need to
    worry about schemas

  • Failover for your data automatically being provided by Amazon

  • Replication for your data across multiple nodes also handled for you
    by Amazon

  • Ability to easily scale up in response to increased demand without
    worrying about running out of hardware or processing capacity

  • Simplified data storage and querying using a simple API

  • The lack of object-to-relational mapping that is common for an RDBMS
    allows your structured data to map more directly to your underlying
    application code, reducing the application development time


Disadvantages of the SimpleDB model


SimpleDB’s alternative approach also has some disadvantages compared to a
relational database for certain applications.



  • Those using applications that always need to ensure immediate consistency
    of data will find that SimpleDB’s eventual data consistency model may not
    suit their needs. The consistent read announcement does change this, but the
    eventual consistency model is still the basis of the extreme scalability.

  • Using SimpleDB as the data storage engine in your applications requires
    the development team to get used to concepts that differ from those of a
    traditional RDBMS.

  • Because relationships are not explicitly defined at the schema level as in a
    relational database, you might need to enforce some data constraints within
    your application code.

  • If your application needs to store data other than strings, such as numbers
    and dates, additional effort will be required on your part to encode these
    values as strings before storing them in SimpleDB.

  • The ability to have multiple values for an attribute is a completely different
    way of storing data and has a learning curve attached to it for new users
    who are exposed to SimpleDB.


Summary


In this chapter, we discussed the differences between SimpleDB and the traditional
relational database systems in detail. In the next chapter, we are going to review the
data model used by SimpleDB.


Oracle SQL Developer 2.1

July 2, 2010 by Krishna Srinivasan

Oracle SQL Developer 2.1


Oracle SQL Developer is a relatively new product included in the range of Oracle
products. It was first introduced to the world in September 2005 under its code name,
Project Raptor. Raptor is a name many still cling to, despite its being replaced very
quickly with the full product name, Oracle SQL Developer (referred to in the rest of the
book as SQL Developer). The first production version was released in early 2006 and had
many Oracle customers both skeptical and excited.


SQL Developer is a Graphical User Interface (GUI) for browsing and working with
databases. Primarily developed for the Oracle database, it also provides a GUI for a
selection of popular non-Oracle databases. As technology has evolved, many developers
are looking for modern tools to assist them in daily tasks. A clean, easy-to-navigate
interface is essential, and many tools are judged on looks, performance, and functionality.


Initially billed as a light-weight, clean, easy-to-install GUI for the Oracle database, SQL
Developer is gaining momentum in the market and broadening the scope of its offering.


The skeptics mentioned are ever concerned that Oracle is not really interested in the
product, backing up this concern with the fact that SQL Developer is free. Disappointing
as it is that “free” is equated with “poor quality”, the last couple of years have
considerably stilled many of these dissenting voices and the market using SQL Developer
continues to increase. Time will tell.


What This Book Covers


The book is designed to allow you to dip into any chapter that is your current area
of focus. Having said that, if you have never used SQL Developer before, are new to
Oracle and SQL, or have never used a GUI tool before, we recommend that you read
the introduction, which gets you started slowly. You can then work through the first
few chapters in order. Once you have covered the first three chapters, you should be on
your way.


As the chapters progress, while they do not necessarily get more complex, they do rely to
some extent on existing knowledge. The latter portion of the book is not considered to be
more complex, but instead focuses on areas used by specific target audiences, such as
Oracle Application Express or migrating from non-Oracle databases.


We have loosely divided the book into the following three sections.


I: Getting started


The first few chapters introduce SQL Developer. Use them to get to know the layout of
the product and the essential features it provides. For those new to Oracle, or application
developers who spend the day using other development tools, these first few chapters
may be the only chapters needed.


In Chapter 1: Getting Started with Oracle SQL Developer, we provide general
information about SQL Developer, where you can find it, how to install it, and how to get
started. Using a simple example, you quickly walk through the product, touching a
variety of areas to get a feel for the navigation, the layout, and some of the features
available. We introduce the environment, how to navigate and manage the interface, the
database, and schemas used in this book. The chapter includes tips for customizing the
environment to suit your preferences.


In Chapter 2: Browsing and Editing Database Objects and Data, you will create, edit,
and update database objects, such as tables, views, sequences, and instance data (the data
in those tables), using data grids. You’ll also learn how to access and use the utilities,
wizards, and dialogs, which SQL Developer provides, to work with data and database
objects. Lastly, you will use SQL commands and scripts, accessing the code formatter
and other assistants that separate a GUI tool from a command-line environment.


In Chapter 3: Working with the SQL Worksheet, you see that the SQL Worksheet
provides a scratch pad for all SQL, PL/SQL, and SQL*Plus statements and commands. In
this chapter, you learn to work with the SQL Worksheet and the commands it supports,
how to use code insight, templates, snippets, and formatting code. You learn how to
manage the environment and to work with multiple worksheets. We introduce the Query
Builder and how to write, test, and run SQL queries using the Query Builder.


In Chapter 4: The Power of SQL Reports, you look at the variety of SQL reports
provided by SQL Developer, why they are useful in daily tasks, and how to run and use
them. You’ll learn how to create your own reports, such as master-detail or drill-down
reports, and share your user-defined reports.


II: Delving a little deeper


These chapters are intended for all developers, perhaps more focused on those of you
who spend your days working with database objects and data. While not strictly advanced
features, there are areas here that can be more involved, and knowledge of how the tool
works will complement your existing knowledge of developing and working with code.


In Chapter 5: Working with PL/SQL, we start by reviewing various PL/SQL structures.
This chapter covers creating, editing, compiling, and debugging PL/SQL code. You’ll see
how the various search tools work, and use code templates and snippets. You’ll learn how
to refactor PL/SQL code, and search using the Oracle Database 11g PL/Scope feature.


Chapter 6: SQL and PL/SQL Tuning Tools. Although not designed to teach you how to
optimize your code, this chapter shows you the different utilities that SQL Developer
provides to help you see problem areas and work at fixing them. Utilities include using
Explain Plan, reviewing trace files, and auto trace.


In Chapter 7: Managing Files, you’ll see that SQL Developer provides an integrated file
navigator that allows you to browse the file system and open files from within the tool.
This chapter introduces the file navigator and the features it supports. Working in
conjunction with the file navigator, SQL Developer integrates with open source version
control systems. This chapter discusses the alternatives and shows you how to place your
code under version control and work within a team sharing code, by comparing and
merging code.


In Chapter 8: Importing, Exporting, and Working with Data, you’ll learn how SQL
Developer provides a number of utilities for moving or copying both the data structures
and the instance data between schemas and databases. This chapter introduces you to the
schema copy, diff, and export wizards and related export and import utilities.


III: Focus areas


The last few chapters are focused on specific audiences. We don’t specifically cover
complex features, but there are aspects of some of these features that require knowledge
beyond the scope of this book.


In Chapter 9: Database Connections and JDBC Drivers, we’ll look at the various types
of database connections available, how to set them up, and when to use them. We’ll look
at various authentication methods, such as LDAP, OS authentication or strong
authentication methods like Kerberos, and compare Basic, TNS, and JDBC connections.


This chapter also looks at setting up the required JDBC drivers and creating connections
to non-Oracle databases. The chapter closes with a section on managing your
connections.


Chapter 10: Introducing SQL Developer Data Modeler. Whether you just want a diagram
of how your tables connect, or you want to build an entity relationship diagram,
transform that to a relational model, and generate the DDL script to create the objects in
your database, the SQL Developer Data Modeler provides the tools you need. This
chapter introduces some of the modeling utilities available in SQL Developer Data
Modeler. You’ll also learn about the integration points with SQL Developer and the Data
Modeler Viewer in SQL Developer.


In Chapter 11: Extending SQL Developer, you’ll see that SQL Developer is an extensible
tool. This means you, as an end user of the tool, can add in your own features. These may
be small, providing just a single input or output field, and only require XML and SQL
knowledge, while others are more extensive and require Java skills. This chapter reviews
the various ways you can extend SQL Developer using SQL and XML.


In Chapter 12: Working with Application Express, you’ll see how SQL Developer
connects to and provides utilities for interacting with Application Express (Oracle
APEX). In this chapter, features you’ll learn to use include setting up the environment to
work with Oracle APEX, importing, browsing, and deploying applications. You’ll also
see how to tune and refactor SQL and PL/SQL code.


In Chapter 13: Working with SQL Developer Migrations, you’ll understand how SQL
Developer helps you connect to and browse non-Oracle or third-party databases. This
chapter looks at preparing the SQL Developer environment, and browsing the supported
third-party databases. The focus of the chapter is on migrating from a third-party database
to Oracle, from setting up the repository, through the capture and conversion phases, and
ending with data migration. You’ll also learn to translate pieces of code, such as
translating T-SQL to PL/SQL.


Standards and assumptions


Unless otherwise stated, the environment used in the examples and referred to throughout
the book is Oracle Database 11g Release 1, installed on Microsoft Windows XP. We use
the shipped Oracle sample schemas, in addition to using the SYSTEM and SYS users.


Oracle offers a free database, Oracle Express Edition 10g (XE), which you can download
and use for many of the examples. Some of the discussion is related to new Oracle
Database 11g functionality that is not available on any Oracle Database 10g release,
including XE. In the case where the feature is Oracle Database 11g specific, the text will
indicate this. However, the text does not list anomalies for each database version.


This book was written using and is based on Oracle SQL Developer 2.1. Screenshots and
menu items all refer to SQL Developer 2.1 and were accurate at the time of writing,
completed just prior to the production release of the software. Minor differences are
inevitable due to changes in the later stages of product development prior to production.
You can also use the book, and much of the content, as a guide for working with earlier
releases of SQL Developer. Where they are significant, we make note of the differences
in features that have changed from earlier releases.


Getting Started with SQL Developer


This book is divided into chapters that focus on the different areas or functionality
in SQL Developer. The progression through the chapters is from the more frequently
used features to those less frequently used. This initial chapter is all about preparing
your environment, installation, and getting started.


SQL Developer is easy to set up and use, so there is very little setup required to
follow the examples in this book. The best way to learn is by practice, and for that
you’ll need a computer with access to an Oracle database and SQL Developer.
This chapter, and indeed the rest of the book, assumes you have a computer with
Microsoft Windows, Linux, or Mac OS X installed, and that you have access to
an Oracle database. It focuses on the alternative installations available for SQL
Developer, where to find the product, and how to install it. Once your environment
is set up, you can follow a quick product walk-through to familiarize yourself with
the landscape. You’ll create a few connections, touch on the various areas available
(such as the SQL Worksheet and Reports navigator), and learn about the control
of the windows and general product layout.


Preparing your environment


Preparing your environment depends on a few factors, including the platform you
are working on and whether you have an early edition of SQL Developer previously
installed. First, you need to locate the software, download, and install it.


Finding and downloading the software


SQL Developer is available through a variety of sources as both a standalone
download and as part of the Oracle Database and Oracle JDeveloper installations.


SQL Developer is a free product, and you can download it from the Oracle
Technology Network, http://www.oracle.com/technology/products/database/sql_developer.
Use this link to reach the download for the latest standalone
production release. It also includes details of the release and is regularly updated
with news of preview releases and new articles. While SQL Developer is free to
download and use, you are required to read and agree to the license before you can
proceed with the download. The product also falls under Oracle Support contracts,
if you have a Support contract for the database, which means that you can log Oracle
Support tickets.


Downloading and installing the Java Development Kit


SQL Developer requires the Java SE Development Kit (JDK); this includes the Java
Runtime Environment (JRE) and other tools, which are used by SQL Developer
utilities such as the PL/SQL Debugger.


For Microsoft Windows, you can download and install SQL Developer with the JDK
already installed. This means you’ll download and unzip the product and will be
ready to start, as there are no extra steps required. For the other operating systems,
you’ll need to download the JDK and direct SQL Developer to the path yourself.
Indeed, as many other products require a JDK to be installed, you may already
have one on your system. In this case, just direct the product to use an existing JDK
installation. For Microsoft Windows, ensure you download SQL Developer without
the JDK to make use of an existing JDK installation.


The SQL Developer download site offers a selection of download choices:



  • Microsoft Windows (with or without the JDK)

  • Linux (without the JDK)

  • Mac OS X (without the JDK)


In each case, make your selection and download the required file.


The download sites for the JDK are as follows:



  • For Microsoft Windows and Linux:

    http://java.sun.com/javase/downloads/index.jsp

  • For Mac OS X:

    http://developer.apple.com/java/download/



SQL Developer is shipped with the minimum JDK required. You can
download and use the latest updates to the JDK. You should be aware
that some updates to the JDK are not supported. This detail is posted on
the SQL Developer Downloads page for each release. Starting from SQL
Developer 2.1, JDK 1.6 is the minimum JDK supported.


Once you have installed the JDK, you can start SQL Developer.



Getting Started with Oracle SOA Suite

June 8, 2010 by Krishna Srinivasan

Getting Started with Oracle SOA Suite 11g R1


A Hands-On Tutorial


As the concept of Service-Oriented Architecture has matured, it has triggered the
emergence of new, sophisticated, and specialized tools: Enterprise Service Buses (ESB)
for service virtualization, BPEL for orchestration, Human Workflow, Business Rules for
externalizing key pieces of logic, and so on. As a result, developers now have a rich set of
tools to work with. However, this can itself present a challenge: how can one keep up
with all these various tools and their capabilities? One of the key goals of Oracle SOA
Suite 11g is to assemble these tools in a cohesive, simple-to-use, and highly-integrated
development environment. This book, organized around a tutorial that is built in an
iterative fashion, will guide you through all the components of the suite and how they
relate to each other.


The authors are part of the Oracle SOA Suite product management team, and the idea
of the book came as we were delivering an earlier version of this material, as an
accelerated internal training at Oracle—before the product was even released. These
training sessions were very well received and we decided it was worth sharing this
material with a larger audience.


This book is not meant to be used as reference material—it is an accelerated learning
path to the Oracle SOA Suite. The focus is on breadth rather than on depth. More
specifically, we wanted to highlight the key capabilities and role of each product in the
Oracle SOA Suite and explain how they can be put to work together to deliver highly
capable and flexible applications. Too often we, as developers, tend to stretch the limits
of (not to say abuse!) a few technologies, simply to stay within our comfort zone—and
because there is always so little time to learn new things. With its streamlined format, we
hope this book will give you the confidence to further explore some of these technologies
you had never looked at before.


What This Book Covers


The principal aim of this book is to get you operational with Oracle SOA Suite 11gR1
quickly and easily. In this spirit, the largest part of this book is dedicated towards a set of
hands-on step-by-step tutorials that build a non-trivial SOA composite that you can
deploy, test, run, monitor, and manage.


Chapter 1 starts the book off with a quick refresher on some of the useful concepts
regarding SOA and services and concludes with an introduction to Service Component
Architecture (SCA).


Chapter 2 discusses the key challenges in the technical implementation of SOA-based
applications and how Oracle SOA Suite 11g leverages SCA principles to address
these challenges.


Chapter 3 describes the business and technical requirements for a purchase order
(PO) processing composite and gives you an overview of how the complete solution
will be built up in a set of discrete steps using a series of tutorials using Oracle SOA
Suite 11gR1.


Chapter 4 gives you the necessary instructions for download, installation, and
configuration of Oracle SOA Suite 11gR1.


The core functionalities of the PO processing composite that is described in Chapter 3 are
built in Chapters 5 through 10. This series of chapters will teach you the basics of
working with Oracle SOA Suite 11g and the IDE (JDeveloper).


You start building the composite using a mediator, as well as web services and database
adapters. You then add a file adapter and a BPEL (Business Process Execution
Language) component to create a process that orchestrates the overall flow, adding
human interaction, creating conditional process execution using business rules, and
accessing external services via a JMS (Java Message Service) adapter. At the end of each
and every chapter, you will have a composite that can be deployed, run, and tested. You
are advised to go through these tutorial chapters, 5-10 in a sequential manner.


The tutorials in Chapters 11 through 19 let you add more functionality to the composite
and explore some of the operational features of Oracle SOA Suite 11gR1. You will learn
service re-use and virtualization using Oracle Service Bus (OSB), explore some of the
composite life cycle management features, test the composite using the unit testing
framework, incorporate exception handling, add security policies to a service, set up a
business activity-level tracking of the composite transactions using Oracle Business
Activity Monitoring (BAM), work with events using the unified services and events
platform of Oracle SOA Suite 11g, handle data using Service Data Object (SDO)
specification, and connect the composite to a Business-to-Business (B2B) gateway using
Oracle B2B.


By the end of Chapter 19, you should have a good grasp of all components in Oracle
SOA Suite 11gR1, and be able to create modular, full-featured service composites. The
concluding remarks in Chapter 20 will briefly discuss some of the ways you could use
such composites to provide business benefits.


Event Delivery Network


Oracle Warehouse Builder 11g Getting Started

August 29, 2009 by Krishna Srinivasan

Oracle Warehouse Builder 11g Getting Started

Competing in today’s world requires a greater emphasis on strategy, long-range
planning, and decision making, and this is why businesses are building data warehouses.
Data warehouses are becoming more and more common as businesses have realized
the need to mine the information that is stored in electronic form. Data warehouses
provide valuable insight into the operation of a business and how best to improve it.
Organizations need to monitor these processes, define policy, and at a more strategic
level, define the visions and goals that will move the company forward in the future. If
you are new to data warehousing in general, and to Extract, Transform, and Load (ETL)
in particular, and need a way to get started, the Oracle Warehouse Builder is a
great application to use to build your warehouse. The Oracle Warehouse Builder (OWB)
is a tool provided by Oracle that can be used at every stage of the implementation
of a data warehouse, right from the initial design and creation of the table structure to ETL
and data-quality auditing.

We will build a basic data warehouse using Oracle Warehouse Builder. It has the
ability to support all phases of the implementation of a data warehouse from designing
the source and target information, the mappings to map data from source to target,
the transformations needed on the data, and building the code to implementing the
mappings to load the data. You are free to use any or all of the features in your
own implementation.

What This Book Covers

This book is an introduction to the Oracle Warehouse Builder (OWB). This is an
introductory, hands-on book so we will be including in this book the features available in
Oracle Warehouse Builder that we will need to build our first data warehouse.

The chapters are in chronological order to flow through the steps required to build a data
warehouse. So if you are building your first data warehouse, it is a good idea to read
through each chapter sequentially to gain maximum benefit from the book. Those who
have already built a data warehouse and just need a refresher on some basics can skip
around to whatever topic they need at that moment.

We’ll use a fictional toy company, ACME Toys and Gizmos, to illustrate the concepts
that will be presented throughout the book. This will provide some context to the
information presented to help you apply the concepts to your own organization. We’ll
actually be constructing a simple data warehouse for the ACME Toys and Gizmos
company. At the end of the book, we’ll have all the code, scripts, and saved metadata that
was used. So we can build a data warehouse for practice, or use it as a model for building
another data warehouse.

Chapter 1: An Introduction to Oracle Warehouse Builder starts off with a high-level look
at the architecture of OWB and the steps for installing it. It covers the schemas created in
the database that are required by OWB, and touches upon some installation topics to
provide some further clarification that is not necessarily found in the Oracle
documentation. Most installation tasks can be found in the Oracle README files and
installation documents, and so they won’t be covered in depth in this book.

Chapter 2: Defining and Importing Source Data Structures covers the initial task of
building a data warehouse from scratch, that is, determining what the source of the data
will be. OWB needs to know the details about what the source data structures look like
and where they are located in order to properly pull data from them using OWB. This
chapter also covers how to define the source data structures using the Data Object Editor
and how to import source structure information. It talks about three common sources of
data—flat files, Oracle Databases, and Microsoft SQL Server databases—while
discussing how to configure Oracle and OWB to connect to these sources.

Chapter 3: Designing the Target Structure explains designing of the data warehouse
target. It covers some options for defining a data warehouse target structure using
relational objects (star schemas and snowflake schemas) and dimensional objects
(cubes and dimensions). Some of the pros and cons of the usage of these objects are
also covered. It introduces the Warehouse Builder for designing and starts with the
creation of a target user and module.

Chapter 4: Creating the Target Structure in OWB implements the design of the target
using the Warehouse Builder. It has step-by-step explanations for creating cubes and
dimensions using the wizards provided by OWB.

Chapter 5: Extract, Transform, and Load Basics introduces the ETL process by
explaining what it is and how to implement it in OWB. It discusses whether to use a
staging table or not, and describes mappings and some of the main operators in OWB that
can be used in mappings. It introduces the Warehouse Builder Mapping Editor, which is
the interface for designing mappings.

Chapter 6: ETL: Putting it Together is about creating a new mapping using the Mapping
Editor. A staging table is created with the Data Object Editor, and a mapping is created to
map data directly from the source tables into the staging table. This chapter explains how
to add and edit operators, and how to connect them together. It also discusses operator
properties and how to modify them.

Chapter 7: ETL: Transformations and Other Operators expands on the concept of
building a mapping by creating additional mappings to map data from the staging table
into cube and dimensions. Additional operators are introduced for doing transformations
of the data as it is loaded from source to target.

Chapter 8: Validating, Generating, Deploying, and Executing Objects covers in great
detail the validating of mappings, the generation of the code for mappings and objects,
and deploying the code to the target database. This chapter introduces the Control
Center Service, which is the interface with the target database for controlling this
process, and explains how to start and stop it. The mappings are then executed to actually
load data from source to target. It also introduces the Control Center Manager, which is
the user interface for interacting with the Control Center Service for deploying and
executing objects.

Chapter 9: Extra Features covers some extra features provided in the Warehouse
Builder that can be very useful for more advanced implementations as mappings get
more numerous and complex. The metadata change management features of OWB are
discussed for controlling changes to mappings and objects. This includes the recycle bin,
cutting/copying and pasting objects to make copies or backups, the snapshot feature, and
the metadata loader facility for exporting metadata to a file. Keeping objects
synchronized as changes are made is discussed, and so is the auto-binding of tables to
dimensional objects. Lastly, some additional online references are provided for further
study and reference.

Designing the Target Structure

We have our entire source structures defined in the Warehouse Builder. But before
we can do anything with them, we need to design what our target data warehouse
structure is going to look like. When we have that figured out, we can start mapping
data from the source to the target. So, let’s design our target structure. First, we’re
going to take a look at some design topics related to a data warehouse that are
different from what we would use if we were designing a regular relational database.
We’ll then discuss what our design will look like, and after that we’ll be ready to
move right into creating that design using the Warehouse Builder in the next chapter.

Data warehouse design

When it comes to the design of a data warehouse, there is basically one option
that makes the most sense for how we will structure our database and that is the
dimensional model. This is a way of looking at the data from a business perspective
that makes the data simple, understandable, and easy to query for the business end
user. It doesn’t require a database administrator to be able to retrieve data from it.

When looking at the source databases in the last chapter, we saw a normalized
method of modelling a database. A normalized model removes redundancies in data
by storing information in discrete tables, and then referencing those tables when
needed. This has an advantage for a transactional system because information needs
to be entered at only one place in the database, without duplicating any information
already entered. For example, in the ACME Toys and Gizmos transactional database,
each time a transaction is recorded for the sale of an item at a register, a record
needs to be added only to the transactions table. In the table, all details regarding
the information to identify the register, the item information, and the employee
who processed the transaction do not need to be entered because that information
is already stored in separate tables. The main transaction record just needs to be
entered with references to all that other information.

This works extremely well for a transactional type of system concerned with daily
operational processing where the focus is on getting data into the system. However,
it does not work well for a data warehouse whose focus is on getting data out of the
system. Users do not want to navigate through the spider web of tables that compose
a normalized database model to extract the information they need. Therefore,
dimensional models were introduced to provide the end user with a flattened structure
of easily queried tables that he or she can understand from a business perspective.

Dimensional design

A dimensional model takes the business rules of our organization and represents
them in the database in a more understandable way. A business manager looking at
sales data is naturally going to think more along the lines of “how many gizmos did
I sell last month in all stores in the south and how does that compare to how many
I sold in the same month last year?” Managers just want to know what the result
is, and don’t want to worry about how many tables need to be joined in a complex
query to get that result. In the last chapter, we saw how many tables would have to
be joined together in such a query just to be able to answer a question like the one
above. A dimensional model removes the complexity and represents the data in a
way that end users can relate to more easily from a business perspective.

Users can intuitively think of the data for the above question as a cube, and the edges
(or dimensions) of the cube labeled as stores, products, and time frame. So let’s take a
look at this concept of a cube with dimensions, and how we can use that to represent
our data.

Cube and dimensions

The dimensions become the business characteristics about the sales, for example:

  • A time dimension—users can look back in time and check various time
    periods
  • A store dimension—information can be retrieved by store and location
  • A product dimension—various products for sale can be broken out

Think of the dimensions as the edges of a cube, and the intersection of the
dimensions as the measure we are interested in for that particular combination of
time, store, and product. A picture is worth a thousand words, so let’s look at what
we’re talking about in the following image:

Notice what this cube looks like. How about a Rubik’s Cube? We’re doing a data
warehouse for a toy store company, so we ought to know what a Rubik’s cube is! If
you have one, maybe you should go get it now because that will exactly model what
we’re talking about. Think of the width of the cube, or a row going across, as the
product dimension. Every piece of information or measure in the same row refers to
the same product, so there are as many rows in the cube as there are products. Think
of the height of the cube, or a column going up and down, as the store dimension.
Every piece of information in a column represents one single store, so there are as
many columns as there are stores. Finally, think of the depth of the cube as the time
dimension, so any piece of information in the rows and columns at the same depth
represent the same point in time. The intersection of each of these three dimensions
locates a single individual cube in the big cube, and that represents the measure
amount we’re interested in. In this case, it’s dollar sales for a single product in a
single store at a single point in time.

But one might wonder if we are restricted to just three dimensions with this model.
After all, a cube has only three dimensions—length, width, and depth. Well, the
answer is no. We can have many more dimensions than just three. In our ACME
example, we might want to know the sales each employee has accomplished for the
day. This would mean we would need a fourth dimension for employees. But what
about our visualization above using a cube? How is this fourth dimension going to
be modelled? And no, the answer is not that we’re entering the Twilight Zone here
with that “dimension not only of sight and sound but of mind…” We can think of
additional dimensions as being cubes within a cube. If we think of an individual
intersection of the three dimensions of the cube as being another cube, we can see
that we’ve just opened up another three dimensions to use—the three for that inner
cube. The Rubik’s Cube example used above is good because it is literally a cube of
cubes and illustrates exactly what we’re talking about.

We do not need to model additional cubes. The concept of cubes within cubes was
just to provide a way to visualize further dimensions. We just model our main cube,
add as many dimensions as we need to describe the measures, and leave it for the
implementation to handle.

This is a very intuitive way for users to look at the design of the data warehouse.
When it’s implemented in a database, it becomes easy for users to query the
information from it.

Implementation of a dimensional model in a database

We have seen how a dimensional model is preferred over a normalized model
for designing a data warehouse. Now before we finalize our model for the ACME
Toys and Gizmos data warehouse, let’s look at the implementation of the model
to see how it gets physically represented in the database. There are two options: a
relational implementation and a multidimensional implementation. The relational
implementation, which is the most common for a data warehouse structure, is
implemented in the database with tables and foreign keys. The multidimensional
implementation requires a special feature in a database that allows defining cubes
directly as objects in the database. Let’s discuss a few more details of these two
implementations. But we will look at the relational implementation in greater detail
as that is the one we’re going to use throughout the remainder of the book for our
data warehouse project.

Relational implementation (star schema)

Back in Chapter 2, we saw how ACME’s POS Transactional database and Order
Entry databases were structured when we did our initial analysis. The diagrams
presented showed all the tables interconnected, and we discussed the use of foreign
keys in a table to refer to a row in another table. That is fundamentally a relational
database. The term relational is used because the tables in it relate to each other in
some way. We can’t have a POS transaction without the corresponding register it
was processed on, so those two relate to each other when represented in the database
as tables.

For a relational data warehouse design, the relational characteristics are retained
between tables. But a design principle is followed to keep the number of levels of
foreign key relationships to a minimum. It’s much faster and easier to understand if
we don’t have to include multiple levels of referenced tables. For this reason, a data
warehouse dimensional design that is represented relationally in the database will
have one main table to hold the primary facts, or measures we want to store, such as
count of items sold or dollar amount of sales. The descriptive information that places those measures in context is held in separate tables, which the main table accesses using foreign keys. The important principle here is that the tables referenced by the main table contain all the information they need, and do not need to go down any more levels to reference any other tables.

The ER diagram of such an implementation would be shaped somewhat like a star,
and thus the term star schema is used to refer to this kind of an implementation.
The main table in the middle is referred to as the fact table because it holds the
facts, or measures that we are interested in about our organization. This represents
the cube that we discussed earlier. The tables surrounding the fact table are known
as dimension tables. These are the dimensions of our cube. These tables contain
descriptive information, which places the facts in a context that makes them
understandable. We can’t have a dollar amount of sales that means much to us unless
we know what item it was for, or what store made the sale, or any of a number of
other pieces of descriptive information that we might want to know about it.

It is the job of data warehouse design to determine what pieces of information need
to be included. We’ll then design dimension tables to hold the information. Using the
dimensions we referred to above in our cube discussion as our dimension tables, we
have the following diagram that illustrates a star schema:

Of course our star only has three points, but with a much larger data warehouse of
many more dimensions, it would be even more star-like. Keep in mind the principle
that we want to follow here of not using any more than one level of foreign key
referencing. As a result, we are going to end up with a de-normalized database
structure. We discussed normalization back in Chapter 2, which involved the use
of foreign key references to information in other tables to lessen the duplication
and improve data accuracy. For a data warehouse, however, query time and simplicity are of paramount importance and outweigh the cost of duplicating data. As for the data
accuracy, it’s a read-only database so we can take care of that up front when we load
the data. For these reasons, we will want to include all the information we need right
in the dimension tables, rather than create further levels of foreign key references.
This is the opposite of normalization, and thus the term de-normalized is used.

Let’s look at an example of this for ACME Toys and Gizmos to get a better
idea of what we’re talking about with this concept of de-normalization. Every
product in our stores is associated with a department. If we have a dimension for
product information, one of the pieces of information about the product would
be the department it is in. In a normalized database, we would consider creating
a department table to store department descriptions with one row for each
department, and would use a short key code to refer to the department record in
the product table.

However, in our data warehouse, we would include that department information,
description and all, right in the product dimension. This will result in the same
information being duplicated for each product in the department. What that buys
us is a simpler structure that is easier to query and more efficient for retrieving
information from, which is key to data warehouse usability. The extra space we
consume in repeating the information is more than paid for in the improvement in
speed and ease of querying the information. That will result in a greater acceptance
of the data warehouse by the user community who now find it more intuitive and
easier to retrieve their data.
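
As a rough illustration of this principle, here is a minimal sketch of a three-dimension star schema in SQL with the department description de-normalized into the product dimension. The table and column names (sales_fact, product_dim, store_dim, date_dim, department_name) are assumptions made for this sketch, not the book's actual design:

	create table product_dim (
	    product_key      number primary key,
	    product_name     varchar2(100),
	    department_name  varchar2(50)    -- de-normalized: repeated on every product row
	);
	create table store_dim (
	    store_key        number primary key,
	    store_name       varchar2(100),
	    region_name      varchar2(50)
	);
	create table date_dim (
	    date_key         number primary key,
	    calendar_date    date,
	    month_name       varchar2(20),
	    year_number      number(4)
	);
	create table sales_fact (
	    product_key      number references product_dim,
	    store_key        number references store_dim,
	    date_key         number references date_dim,
	    quantity_sold    number,
	    dollar_sales     number(12,2)
	);

A query against this structure joins the fact table directly to whichever dimension tables it needs, which is the single level of foreign key references described above.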

In general, we will want to de-normalize our data warehouse implementation
in all cases, but there is the possibility that we might want to include another
level—basically a dimension table referenced by another dimension table. In most
cases, we will not need nor want to do this and instances should be kept to an
absolute minimum; but there are some cases where it might make sense.

This is a variation of the star schema referred to as a snowflake schema because
with this type of implementation, dimension tables are partially normalized to pull
common data out into secondary dimension tables. The resulting schema diagram
looks somewhat like a snowflake. The secondary dimension tables are the tips of the snowflake hanging off the main dimension tables in a star schema.

In reality, we’d want at most one or two of the secondary dimension tables, but it serves to illustrate the point. A snowflake dimension table is really not recommended in most cases because of ease-of-use and performance considerations, but can be used in very limited circumstances. The Kimball book on Dimensional Modelling was referred to at the beginning of Chapter 2. It discusses some limited circumstances where it might be acceptable to implement a snowflake design, but it is highly discouraged for most cases.
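
For contrast, a sketch of the snowflake variation (again with assumed names) would pull the department description out of the product dimension into a secondary dimension table:

	create table department_dim (
	    department_key   number primary key,
	    department_name  varchar2(50)
	);

	-- product_dim would now reference the secondary dimension instead of
	-- carrying the department description on every row
	create table product_dim (
	    product_key      number primary key,
	    product_name     varchar2(100),
	    department_key   number references department_dim
	);

Every query that needs the department description now has to join through an extra level, which is exactly the ease-of-use and performance cost mentioned above.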

Let’s now talk a little bit about the multidimensional implementation of a
dimensional model in the database, and then we’ll design our cube and dimensions
specifically for the ACME Toys and Gizmos Company data warehouse.

Multidimensional implementation (OLAP)

A multidimensional implementation, or OLAP (online analytic or analytical processing), requires a database with special features that allow it to store cubes as actual objects in the database, and not just tables that are used to represent a cube
and dimensions. It also provides advanced calculation and analytic content built
into the database to facilitate advanced analytic querying. Oracle’s Essbase product
is one such database and was originally developed by Hyperion. Oracle recently
acquired Hyperion, and is now promoting Essbase as a tool for custom analytics
and enterprise performance management applications. The Oracle Database
Enterprise Edition has an additional feature that can be licensed called OLAP that
embeds a full-featured OLAP server directly in an Oracle database. This is an option
organizations can leverage to make use of their existing database.

These kinds of analytic databases are well suited to providing the end user with
increased capability to perform highly optimized analytical queries of information.
Therefore, they are quite frequently utilized to build a highly specialized data mart, or subset of the data warehouse, for a particular user community. The data mart then draws the data it loads from the main data warehouse, which would be a relational dimensional star schema. A data warehouse implementation may contain any number of these smaller subset data marts.

We’ll be designing dimensionally and implementing relationally, so let’s now
design our actual dimensions that we’ll need for our ACME Toys and Gizmos data
warehouse, and talk about some issues with the fact data (or cube) that we’ll need.
This will make the concepts we just discussed more concrete, and will form the basis
for the work we do in the rest of the book as we implement this design. We’ll then
close out this chapter with a discussion on designing in the Warehouse Builder,
where we’ll see how it can support either of these implementations.


	We have seen the word dimension used in describing both a relational
	implementation and a multidimensional implementation. It is even in the
	name of the second implementation method we discussed, so why does
	the relational method use it also? In the relational case, the word is used
	more as an adjective to describe the type of table taken from the name of
	the model being implemented; whereas in the multidimensional model
	it's more a noun, referring to the dimension itself that actually gets
	created in the database. In both cases, the type of information conveyed
	is the same—descriptive information about the facts or measures—so its
	use in both cases is really not contradictory. There is a strong correlation
	between the fact table of the relational model and the cube of the
	dimensional model, and between the dimension tables of the relational
	model and the dimensions of the dimensional model.



Mastering Oracle Scheduler in Oracle 11g Databases

August 27, 2009 by Krishna Srinivasan Leave a Comment

Mastering Oracle Scheduler in Oracle 11g Databases

This article is a sample chapter from the book. Welcome to the world of Oracle Scheduler! Oracle Scheduler is a free utility included in the Oracle database that makes the Oracle RDBMS the most powerful scheduling tool on our planet (and in the known parts of the galaxy). The Oracle Scheduler can be used to automate not only simple maintenance tasks, but also complex business logic. Traditionally, only PL/SQL could be executed in the Scheduler. Later, operating system scripts were added, and now we can run jobs on remote systems and across platforms as well. This can turn the Oracle Scheduler into the spider in your web, controlling all the jobs running in the organization and giving you a single point of control.


Database background

Relational database management systems (RDBMS) can be very powerful. With a little code, we can use the RDBMS as a filesystem, mail server, HTTP server, and now also as a full-blown job Scheduler that can compete very well with other commercial job Schedulers. The advantage that Oracle gives us is in terms of price, flexibility, and phenomenal power. The scheduling capabilities are all a part of the normal Oracle license for the RDBMS, whereas others have a serious price tag and often require a database for the repository to store the metadata of the jobs.

Scheduling events in the database

Since Oracle added the procedural option to the database, it has also included some scheduling power, provided by the DBMS_JOB package. Although a bit restricted, it was used extensively. However, no one would even think about using this as an Enterprise-level Scheduler tool. This changed when Oracle introduced 10gR2. In this release, Oracle could not only start jobs that ran outside the database, but it also added the job chain.

In 11g, Oracle also added the option to run jobs on remote systems where no database is running. Now it’s time to rethink what the database actually is. In the early days, a database was just a bunch of code that could hold data in tables. Today, the Oracle
RDBMS can do that—and that too well—along with many more things. In fact, the Oracle RDBMS can perform so many tasks so amazingly, that it’s surprising that we still call it just a database. We could easily turn it into a personal assistant.

Oracle Scheduler 11g can:

  • Run jobs where no database ever was before
  • Use different operating system credentials per job
  • React to events
  • Schedule jobs on multiple platforms simultaneously
  • Provide tight security

What This Book Covers
Chapter 1 will get you going with the Scheduler as quickly as possible. By the end, you will be able to automate simple tasks that are currently maintained in cron, Task Manager, or the good old DBMS_JOB package, for example.

Chapter 2 will show you a lot of possibilities of chains with many examples and explanations. In short, it will tell you all you ever wanted to know about chains, but were afraid to ask.

Chapter 3 is for all you people living in an organization that requires strict job separation. This chapter will show how to make good use of the Scheduler and apply job separation.

Chapter 4 is a very important chapter that explains how to crank up the power of a system to the limits by combining the Scheduler and the Resource Manager. Here you will find how to get the best out of your system.

Chapter 5 will be of great help in setting up the remote external jobs introduced in Oracle 11g. How does this relate to the old-fashioned local external jobs that we have known since Oracle 10g, and why should we get rid of the old external jobs? Get your answers here.

Chapter 6 helps the reader to get a firm grip on events and explains how to make good use of events. Events sound like voodoo, but in the end are an extra tool found in the Scheduler.

Chapter 7 considers the fact that when the jobs get more complicated, it gets harder to understand why something works differently than planned. This chapter gives the reader a fresh look at how to follow and debug Scheduler jobs.

Chapter 8 will give you some creative implementations of more or less common tasks—this time implemented using the Scheduler. This chapter gives working code with clear explanations. This broadens the horizon and will take down the barriers that might exist between other environments and Oracle.

Chapter 9 shows how the Scheduler can be used in other configurations such as standby databases and RAC.

Chapter 10 shows how the Scheduler can be managed and monitored remotely through a web interface.

Events

So far, we have mostly used jobs that ran immediately upon being enabled, or when we called the run_job procedure of the dbms_scheduler package. Many jobs are time-based; they are controlled by a schedule based on some kind of calendar.

However, not everything in real life can be controlled by a calendar. Many things need an action on an ad hoc basis, depending on the occurrence of some other thing. This is called event-based scheduling. Events also exist as the outcome of a job. We can define a job to raise an event in several ways—when it ends, or when it ends in an error, or when it does not end within the expected runtime. Let’s start with creating job events in order to make job monitoring a lot easier for you.

In this chapter, we will see how events that are generated by a job or a chain step can be intercepted to enable the monitoring of jobs. After that, we will see how we can use events to start a job that is waiting for an event.

Monitoring job events

Most of the time when jobs just do their work as expected, there is not much to monitor. In most cases, the job controller has to fix application-specific problems (for example, sometimes file systems or tablespaces get filled up). To make this easier, we can incorporate events. We can make jobs raise events when something unexpected happens, and we can have the Scheduler generate events when a job runs for too long. This gives us tremendous power. We can also use this to make chains a little easier to maintain.

Events in chains

A chain consists of steps that depend on each other. In many cases, it does not make sense to continue to step 2 when step 1 fails. For example, when a create table fails, why try to load data into the nonexistent table? So it is logical to terminate the job if no other independent steps can be performed.

One of the ways to handle this is implementing error steps in the chain. This might be a good idea, but the disadvantage is that this quickly doubles the steps involved in the chain, where most of the steps—hopefully—will not be executed. Another disadvantage is that the chain becomes less maintainable. It’s a lot of extra code, and more code (mostly) gives us a poorer overview.

If a job chain has to be terminated because of a failure, it is recommended to raise a Scheduler event and handle it in an event handler, instead of adding extra steps that try to work out which error might have happened. This makes event notification a lot simpler because it’s all in separate code and not mixed up with the application code.

Another situation is when the application logic has to take care of steps that fail, and has well-defined countermeasures to be executed that make the total outcome of the job a success.

An example is a situation that starts with a test for the existence of a file. If the test fails, get it by FTP; and if this succeeds, load it into the database. In this case, the first step can fail and go to the step that gets the file. As there is no other action possible when the FTP action fails, this should raise a Scheduler event that triggers—for example—a notification action. The same should happen when the load fails.

In other third-party scheduling packages, I have seen these notification actions implemented as part of the chain definitions because they lack a Scheduler event queue. In such packages, messages are sent by mail in extra chain steps. In the Oracle Scheduler, this queue is present and is very useful for us. Compared to 10g, nothing has changed in 11g. An event monitoring package can dequeue messages from the SYS.SCHEDULER$_EVENT_QUEUE queue into a variable of the sys.scheduler$_event_info type. The definition is shown in the following screenshot:

[Screenshot: definition of the sys.scheduler$_event_info object type]
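
As a minimal sketch of how an event monitoring routine could dequeue these messages (the subscriber name EVENT_AGENT is an assumption; it would first have to be registered with dbms_scheduler.add_event_queue_subscriber):

[java]
--/
declare
  deq_opts  dbms_aq.dequeue_options_t;
  msg_props dbms_aq.message_properties_t;
  payload   sys.scheduler$_event_info;
  msgid     raw(16);
begin
  -- EVENT_AGENT is a hypothetical subscriber added earlier with
  -- dbms_scheduler.add_event_queue_subscriber('EVENT_AGENT')
  deq_opts.consumer_name := 'EVENT_AGENT';
  deq_opts.wait          := dbms_aq.forever;
  dbms_aq.dequeue(
    queue_name         => 'SYS.SCHEDULER$_EVENT_QUEUE',
    dequeue_options    => deq_opts,
    message_properties => msg_props,
    payload            => payload,
    msgid              => msgid
  );
  -- the payload fields tell us which job raised which event
  dbms_output.put_line(payload.object_owner || '.' || payload.object_name ||
                       ' raised event ' || payload.event_type ||
                       ' (log_id ' || payload.log_id || ')');
end;
/
[/java]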

What you can do with an event handler is up to your imagination. The following DB Console screenshot shows the interface that can be used to specify which events to raise:

[Screenshot: DB Console screen for selecting which job events to raise]

It is easy to generate an event for every possible event listed above and have the handler decide (by rules defined in tables) what to do. This does sound a little creepy, but it is not very different from having a lookup table keyed on the job found in the event message that decides whether—most of the time—a notification mail is sent or not. Sometimes, a user wants to get a message when a job starts running; and most of the time, they want a message when a job ends.

In a chain, it is especially important to be able to tell in which step the event happened and what that step was supposed to do. In the event message, only the job name is present and so you have to search a bit to find the name of the step that failed.

For this, we can use the LOG_ID to find the step name in the {USER | DBA | ALL}_SCHEDULER_JOB_LOG view, where the step name is listed as JOB_SUBNAME. The following query can be used to find the step name from the ALL_SCHEDULER_JOB_LOG view, assuming that the event message is received in msg:

[java]
select job_subname from all_scheduler_job_log where log_id = msg.log_id;
[/java]

To enable the delivery of all the events a job can generate, we can set the raise_events attribute to a value of:

[java]
dbms_scheduler.job_started + dbms_scheduler.job_succeeded +
dbms_scheduler.job_failed + dbms_scheduler.job_broken +
dbms_scheduler.job_completed + dbms_scheduler.job_stopped +
dbms_scheduler.job_sch_lim_reached + dbms_scheduler.job_disabled +
dbms_scheduler.job_chain_stalled
[/java]

Or in short, we can set it to: dbms_scheduler.job_all_events.
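
As a hedged sketch (using the BCKUP_01 job that is created later in this article), the short form could be applied like this:

[java]
--/
begin
  dbms_scheduler.set_attribute(
    name      => '"MARVIN"."BCKUP_01"',
    attribute => 'raise_events',
    value     => dbms_scheduler.job_all_events
  );
end;
/
[/java]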

There are many things that can be called events. In the job system, there are basically two types of events: events caused by jobs (which we already discussed) and events that make a job execute.

Event-based scheduling

On many occasions, a calendar will do fine for scheduling jobs. However, there are situations that require an immediate action and which cannot wait for the next activation based on a calendar. An example might be a user logging on to the database, where a logon trigger then executes further actions.

Another example could be a situation in which we want a backup server to be utilized to the maximum, but not beyond that. We schedule all the backups independent of each other and have each backup raise an event when ready, which tells the system that another backup can go ahead. By letting the backup jobs wait for an event that essentially flags “there is backup capacity available now”, we make sure that a backup does not take longer than needed. We also make sure that the backup system is pushed to the highest throughput.

When we just use a preset date and time to start the backups, chances are that several backups end up running at the same time (possibly caused by the growth of one or more databases, which potentially causes their backups to take longer than anticipated). On the other hand, when we make sure that multiple backups never run at the same time, we will likely have lots of idle time in the backup system.

This is reason enough to learn how we can make good use of events. However, there are a few things we need to do. It essentially comes down to:

  • Creating a queue and defining a payload for that queue
  • Having a process that puts the message on the queue
  • Coupling one or more job definition(s) to the queue

Again, this gives a kind of control that is hard to find in third-party scheduling packages.

Event messages are placed on an event queue, and this is handled by AQ. So we need to call the AQ packages, and for that we require DBMS_AQ and DBMS_AQADM. In the days before Oracle 10g, we needed to set the AQ_TM_PROCESSES parameter to a non-zero value for AQ to work. Since Oracle 10g, this is no longer the case and we can leave the AQ_TM_PROCESSES value at zero.

First, make sure we can use AQ.

[java]
select grantee, privilege, table_name
from dba_tab_privs
where table_name in ('DBMS_AQ', 'DBMS_AQADM')
and grantee = 'MARVIN';
[/java]


The expected output is as shown in the following screenshot:

[Screenshot: query output showing the EXECUTE privileges granted to MARVIN]

If this query does not show MARVIN having the EXECUTE privileges on both DBMS_AQ and DBMS_AQADM, we need to give them to our user.

As a DBA, execute the following:

[java]
grant execute on dbms_aq to marvin;
grant execute on dbms_aqadm to marvin;
grant select on dba_aq_agents to marvin;
grant create type to marvin;
alter user marvin quota unlimited on users;
--/
begin
dbms_aqadm.grant_system_privilege ('ENQUEUE_ANY', 'marvin', FALSE);
dbms_aqadm.grant_system_privilege ('DEQUEUE_ANY', 'marvin', FALSE);
dbms_aqadm.grant_system_privilege ('MANAGE_ANY', 'marvin', TRUE);
end;
/
[/java]

This makes sure that marvin has enough privileges to be able to create and use queues. Now connect as marvin, create an object type that we can use to put a message on the queue, and read from the queue later on.

[java]
connect marvin/panic
create or replace type bckup_msgt as object ( msg varchar2(20) )
/
[/java]

This defines a type consisting of one msg field of 20-character length. This is the type we will be using in the queue for which we create a queue table next:

[java]
--/
begin
dbms_aqadm.create_queue_table
(
queue_table => 'bckup_qt',
queue_payload_type => 'bckup_msgt',
multiple_consumers => TRUE
);
dbms_aqadm.create_queue
(
queue_name => 'bckup_q',
queue_table => 'bckup_qt'
);
dbms_aqadm.start_queue ( queue_name => 'bckup_q' ) ;
end ;
/
[/java]

This creates a queue table called bckup_qt, which holds messages of type bckup_msgt. After that, the queue bckup_q is created on this queue table and started immediately.

The following objects show up in the schema, which are created to support the queue table:

[Screenshot: supporting objects created in the schema for the bckup_qt queue table]

This also explains why MARVIN needs quota on his default tablespace.

The queue definitions part is ready. Now, we can tie a job to the queue. First, create a job as follows:

[java]
--/
BEGIN
sys.dbms_scheduler.create_job
(
job_name => '"MARVIN"."BCKUP_01"',
job_type => 'EXECUTABLE',
job_action => '/home/oracle/bin/rman.sh',
event_condition => 'tab.user_data.msg=''GO''',
queue_spec => '"MARVIN"."BCKUP_Q"',
start_date => systimestamp at time zone 'Europe/Amsterdam',
job_class => '"LONGER"',
comments => 'backup a database',
auto_drop => FALSE,
number_of_arguments => 1,
enable => FALSE
);
sys.dbms_scheduler.set_attribute
(
name => '"MARVIN"."BCKUP_01"',
attribute => 'raise_events',
value => dbms_scheduler.job_started +
dbms_scheduler.job_succeeded +
dbms_scheduler.job_failed +
dbms_scheduler.job_broken +
dbms_scheduler.job_completed +
dbms_scheduler.job_stopped +
dbms_scheduler.job_sch_lim_reached +
dbms_scheduler.job_disabled +
dbms_scheduler.job_chain_stalled
);
sys.dbms_scheduler.set_job_argument_value
(
job_name => '"MARVIN"."BCKUP_01"',
argument_position => 1,
argument_value => 'db_01'
);
DBMS_SCHEDULER.SET_ATTRIBUTE
(
name => '"MARVIN"."BCKUP_01"',
attribute => 'destination',
value => 'pantzer:15021'
);
DBMS_SCHEDULER.SET_ATTRIBUTE
(
name => '"MARVIN"."BCKUP_01"',
attribute => 'credential_name',
value => '"MARVIN"."JOBS_CRED2"'
);
END;
/
[/java]

This is just a simple remote external job that calls an RMAN script with an argument for the database to back up. As the backup will take longer than a few seconds, it seems obvious to put it in the job_class called LONGER that we defined a while ago. The queue coupled to this job is the queue we defined before: bckup_q, as given by the queue_spec parameter. As soon as the GO message appears in the payload of the queue, all of the jobs that listen to this queue and are waiting for this GO message will be started. The code listed for the MARVIN job can also be put together using DB Console. In the following Schedule screen, select Event as Schedule Type:

[Screenshot: DB Console Schedule screen with Event selected as the Schedule Type]

As the job was not Enabled, it now looks like the following:

[Screenshot: the BCKUP_01 job listed in DB Console, still disabled]

So, let’s enable the job:

[java]
--/
BEGIN
sys.dbms_scheduler.enable( '"MARVIN"."BCKUP_01"' );
END;
/
[/java]

This produces the following:

[Screenshot: the BCKUP_01 job enabled and scheduled on the event]

The job is currently scheduled, but not on a date. All we need to do now is have someone put a GO message in the bckup_q.

[java]
--/
declare
my_msgid RAW(16);
props dbms_aq.message_properties_t;
enqopts dbms_aq.enqueue_options_t;
begin
sys.dbms_aq.enqueue('marvin.bckup_q', enqopts, props,
marvin.bckup_msgt('GO'), my_msgid);
end;
/
commit;
[/java]

The result is that all of the jobs waiting for the GO message are started at the same time. With the health of the backup system in mind, it would be wiser to query the jobs view, find the backup job that was scheduled first, and give that job its own specific event. In that case, the BCKUP_01 job will wait for the message "BCKUP_01", and BCKUP_02 will wait for the message "BCKUP_02".
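
A sketch of that per-job variant: BCKUP_01 would then be created with an event_condition that matches the message 'BCKUP_01' instead of 'GO', and it would be released individually with an enqueue like the following (adapted from the GO example above):

[java]
--/
declare
  my_msgid raw(16);
  props    dbms_aq.message_properties_t;
  enqopts  dbms_aq.enqueue_options_t;
begin
  -- release only the job whose event_condition matches this message
  sys.dbms_aq.enqueue('marvin.bckup_q', enqopts, props,
                      marvin.bckup_msgt('BCKUP_01'), my_msgid);
end;
/
commit;
[/java]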

Another option is that Oracle can allow us to define an event that is delivered to exactly one waiting job at a time. An enhancement request has been filed for this. It will make this kind of waiting a bit easier because normal queuing behavior is then preserved. This means that things such as job priorities will be honored. When we define a separate event for every job, we have manual control but we cannot influence the selection order of the jobs in the queue, for example, by raising the priority of a job.

When a backup is ready, the backup system can handle the next backup. We can utilize the enqueue operation by putting our next GO message into the queue in the epilogue of the backup script. However, what will happen if the script crashes? The next backup will never be released. Again, a smarter location for this code would be in an event handler routine that just waits for termination messages from the Scheduler event queue. As soon as it sees the termination of a backup, it can decide to call in the next one by sending the message that the waiting job is listening for.


Oracle 10g/11g Data and Database Management Utilities

August 27, 2009 by Krishna Srinivasan Leave a Comment

Oracle 10g/11g Data and Database Management Utilities

Does your database seem complicated? Are you finding it difficult to work with it
efficiently? Database administration is part of a daily routine for all database
professionals. Using Oracle Utilities, administrators can benefit from improved
maintenance windows, optimized backups, faster data transfers, and more reliable
security, and can in general do more with the same time and resources.

You don’t have to reinvent the wheel, just learn how to use Oracle Utilities properly to
achieve your goals. That is what this book is about; it covers topics which are oriented
towards data management, session management, batch processing, massive deployment,
troubleshooting, and how to make the most out of frequently used DBA tools to improve
your daily work.

Data management is one of the most frequently required tasks; doing a backup is a must-do
task for any company. Data management includes several tasks such as data transfers,
data uploading and downloading, reorganizing data, and data cloning, among many
others. If people learn to use a tool and things appear to go well, few will question if their
approach is optimal. Often it is only when maintenance windows start shrinking, due to the ever-increasing amount of data and the need for business availability, that problems with any particular approach get identified. People tend to get used to using the old
export/import utilities to perform data management and if it works, they probably will
consider the problem solved and continue to use an obsolete tool. This book explores
further possibilities and new tools. It makes the user question if his/her current
environment is optimized and teaches the reader how to adopt more optimized data
management techniques focusing on the tools and requirements most frequently seen in
modern production environments.

What This Book Covers

Chapter 1 deals with Data Pump. Data Pump is a versatile data management tool. It is
much more than just an exp/imp upgrade; it allows remapping, dump file size estimation,
restartable tasks, network transfers, advanced filtering operations, recovering data after a
commit has been issued, and transferring data files among different Oracle versions. It
includes a PL/SQL API so it can be used as a base to develop data pump-based systems.

Chapter 2 involves a description of SQL*Loader. It describes how SQL*Loader is the tool to upload plain text format files to the database. If SQL*Loader is properly configured, you can greatly increase the speed with which uploads are completed.
Loading data with proper attention to the character set will avoid unnecessary headaches, and you can optimize your loading window. There are several tips and tricks to load different character sets into the database and to load binary data into BLOB fields. This tool can be used to load data on the fly, and you will learn how to proactively configure it to get a smooth load.

Chapter 3 is all about External Tables. The external table is a concept Oracle introduced
in 9i to ease the ETL (Extraction Transformation and Loading) DWH process. An
external table can be created to map an external file to the database so you can seamlessly
read it as if it was a regular table. You can extend the use of the external tables concept
to analyze log files such as the alert.log or the network log files inside the database.
The external table concept can be implemented with the Data Pump drivers; this way you
can easily and selectively perform data transfers among databases spanning different
Oracle versions.

Chapter 4 specializes in advanced techniques involved in optimizing the Recovery
Manager. Recovery Manager can be optimized to minimize the impact in production
environments; or it can run faster using parallel techniques. It can be used to clone a
database on the same OS or transport it over different platforms, or even change the
storage method between ASM and conventional file system storage and vice versa.

Chapter 5 talks about the Recovery Manager. Recovery manager first appeared back
in 8.0, but it was not until 9i that it began to gain popularity among DBAs as the default
backup/recover tool. It is simple and elegant and the most frequently used commands
are pretty simple and intuitive. This chapter presents several practical database
recovery scenarios.

Chapter 6 is about Session Management. The users are the main reason why the DBA
exists. If it were not for the users, there would be no database activity and there would be
no problems to be solved. How can you easily spot a row lock contention problem? What
should be done to diagnose and solve this problem? What does it mean to kill a user
session? Managing sessions means you can regulate them by means of Oracle profiles;
this may sooner or later lead to snipped sessions; what are those snipped sessions? How
do you get rid of them? This chapter discusses several user session management issues.

Chapter 7 talks about the Oracle Scheduler. The Oracle Scheduler is a powerful tool used
to schedule tasks in Oracle. This tool can perform simple schedules as well as complex
schedules; you need to understand time expressions and the Oracle scheduler architecture
to take advantage of this utility.

Chapter 8 will teach you about Oracle Wallet Manager. Oracle Wallet Manager is the
cornerstone and entry point for advanced security management. You can use it to manage
certificates and certificate requests. You can store identity certificates and retrieve them
from a central location, or you can use the registry in a Windows environment. You can
hide passwords without using OS Authentication mechanisms by storing the user
password inside the wallet.

Chapter 9 deals with security of the system. Most people worry about having a valid
backup that can be used to effectively recover data, but not all of them are concerned
about the backup security; if a backup can be used to recover data, this doesn’t actually
mean the data will be recovered at the same site where it was taken from. OWM is a tool
which can be used to have the backup encrypted, so sensitive data can be secured not
only from the availability point of view, but also from the confidentiality point of view.
Security has to do also with identifying who the real user is; this can be achieved with the
enterprise user. This chapter explains step by step how to set up an environment with
enterprise identity management using the Enterprise Security Manager.

Chapter 10 talks about Database Configuration Assistant. Creating a database is one
of the first tasks the user performs when installing Oracle, but this tool goes far beyond
the simple task of creating the database; it can be used to manage templates, create a
database in silent mode, and configure services in an RAC environment. Configuring
database options and enabling the Enterprise Manager DB Control can be done here.
DBCA is also the easy way to start up and configure an Automatic Storage Management
(ASM) environment.

Chapter 11 provides details about the Oracle Universal Installer. Installing Oracle is
more than just a next → next button pressing activity; OUI is a tool to manage software.
Most people care about database backups, as well as configuration file backups, but what
about the Oracle installer repository? This set of files is most often underestimated unless
a hardware failure makes the DBA understand what Oracle software maintenance is. OUI
can perform silent and batch installations; it can also perform installations from a central
software depot accessible through the Web.

Chapter 12 is about the Enterprise Manager Configuration Assistant. Most DBAs use EM
as the basic DBA administration tool; it is a very intuitive database management console.
Most people depend on it to easily perform most of the administration and operation tasks
that otherwise would be time consuming to complete through character console mode.
But what happens when it is not available, either because of a change in the network
topology or a firewall that restricts access to the managing port? Then the user needs to
have the console reconfigured to bring it back into operation. EMCA is the character
mode tool used to perform this task.

Chapter 13 talks about OPatch. Patching the RDBMS is required to keep the software up
to date. When a patchset is to be applied OUI is used, but when a single patch or a CPU is
to be applied OPatch must be used. You will learn how to perform a basic patch
application task, list the patch inventory, find out if a patch has already been applied,
maintain the software and the software inventory, and learn how and when to perform a
patch application while the database is up and running.

External Tables

When working in data warehouse environments, the Extraction-Transformation-Loading (ETL) cycle frequently requires the user to load information from external sources in plain file format, or to perform data transfers among Oracle databases in a proprietary format. This requires the user to create control files to perform the load. As the format of the source data regularly doesn't fit the one required by the data warehouse, a common practice is to create stage tables that load data into the database, and then several queries that perform the transformation from this point on, to take the data to its final destination.

A better approach would be to perform this transformation ‘on the fly’ at load time. That is what External Tables are for. They are basically external files, managed either by means of the SQL*Loader or the data pump driver, which from the database perspective can be seen as if they were regular read-only tables.

This format allows the user to think about the data source as if the data was already
loaded into its stage table. This lets the user concentrate on the queries to perform
the transformation, thus saving precious time during the load phase.

External Tables can serve not only as improvements to the ETL process, but also as
a means to manage database environments, and a means of reducing the complexity
level of data management from the user’s point of view.

The External Table basics

An External Table is basically a file that resides on the server side, either as a regular flat file or as a data pump formatted file. The External Table is not a table itself; it is the definition of an external file, with its Oracle format and its physical location. This feature first appeared back in Oracle 9i Release 1 and was intended as a way of enhancing the ETL process by reading an external flat file as if it were a regular Oracle table. On its initial release it was only possible to create read-only External Tables but, starting with 10g, it is possible to unload data to External Tables too.

Prior to 10g, only the SQL*Loader driver could be used to read an External Table, but from 10g onwards it is also possible to load the table by means of the data pump driver. The kind of driver that will be used to read the External Table is defined at creation time. In the case of ORACLE_LOADER it is the same driver used by SQL*Loader. The flat files are loaded in the same way that a flat file is loaded to the database by means of the SQL*Loader utility, and the creation script can be based on a SQL*Loader control file. In fact, most of the keywords that are valid for data loading are also valid for reading an external flat file table.

The main differences between SQL*Loader and External Tables are:

  • When there are several input datafiles SQL*Loader will generate a bad file
    and a discard file for each datafile.
  • The CONTINUEIF and CONCATENATE keywords are not supported by
    External Tables.
  • The GRAPHIC, GRAPHIC EXTERNAL, and VARGRAPHIC types are not supported
    for External Tables.
  • LONG, nested tables, VARRAY, REF, primary key REF, and SID are
    not supported.
  • For fields in External Tables the character set, decimal separator, date mask
    and other locale settings are determined by the database NLS settings.
  • The use of the backslash character is allowed for SQL*Loader, but for
    External Tables this would raise an error. External Tables must use quotation
    marks instead.

    For example:
    SQL*Loader
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY "\"
    External Tables
    TERMINATED BY ',' ENCLOSED BY "'"
    

A second driver is available, the ORACLE_DATAPUMP access driver, which uses the Data Pump technology to read the table and unload data to an External Table. This driver allows the user to perform a logical backup that can later be read back to the database without actually loading the data. The ORACLE_DATAPUMP access driver utilizes a proprietary binary format for the external file, so it is not possible to view it as a flat file.

Let’s set up the environment

Let’s create the demonstration user, and prepare its environment to create an
External Table. The example that will be developed first refers to the External Table
using the ORACLE_LOADER driver.


	create user EXTTABDEMO
		identified by ORACLE
		default tablespace USERS;
	alter user exttabdemo
		quota unlimited on users;
	grant   CREATE SESSION,
			CREATE TABLE,
			CREATE PROCEDURE,
			CREATE MATERIALIZED VIEW,
			ALTER SESSION,
			CREATE VIEW,
			CREATE ANY DIRECTORY
	to EXTTABDEMO;

A simple formatted spool of the following query will generate the required external table demonstration data. The original source table is the demonstration HR.EMPLOYEES table.


	select
		EMPLOYEE_ID ||','||
		DEPARTMENT_ID ||','||
		FIRST_NAME ||','||
		LAST_NAME ||','||
		PHONE_NUMBER ||','||
		HIRE_DATE ||','||
		JOB_ID ||','||
		SALARY ||','||
		COMMISSION_PCT ||','||
		MANAGER_ID ||','||
		EMAIL
	from    HR.EMPLOYEES
	order by   EMPLOYEE_ID

The above query will produce the following sample data:
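
For illustration only, one line of the spooled output would look something like this (the exact values, and especially the date format, depend on the HR sample data and NLS settings in your database):

	100,90,Steven,King,515.123.4567,17-JUN-03,AD_PRES,24000,,,SKING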


	The External Table directory is defined inside the database by
	means of a DIRECTORY object. This object is not validated at
	creation time, so the user must make sure the physical directory
	exists and the oracle OS user has read/write privileges on it.

$ mkdir $HOME/external_table_dest
SQL> CREATE DIRECTORY EXTTABDIR AS '/home/oracle/external_table_dest';

The above example was developed in a Linux environment; on Windows platforms the paths will need to be changed to correctly reflect how Oracle has been set up.

Now, the first External Table can be created.

A basic External Table

Here is the source code of the External Table creation.

The create table command syntax is just like any other regular table creation (A), (B), up to the point where the ORGANIZATION EXTERNAL (C) keyword appears; this is where the actual External Table definition starts. In this case the External Table is accessed by the ORACLE_LOADER driver (D). Next, the external flat file is defined, and here the Oracle DIRECTORY (E) where the flat file resides is declared. The ACCESS PARAMETERS (F) section specifies how to access the flat file; it declares whether the file has fixed or variable size records, and which other SQL*Loader loading options are used. The LOCATION (H) keyword defines the name of the external data file. It must be pointed out that, as this is an External Table managed by the ORACLE_LOADER driver, the ACCESS PARAMETERS section must be defined; in the case of External Tables based on the ORACLE_DATAPUMP driver this section is not required.
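
A minimal sketch of a creation statement matching the description above is shown below. The column datatypes and the file name employees_data.txt are illustrative assumptions, while the directory EXTTABDIR and the table name EXT_EMPLOYEES match the ones used elsewhere in this excerpt; the letter markers correspond to the clauses just described.

	create table EXT_EMPLOYEES              -- (A), (B): a regular column list
	(
		EMPLOYEE_ID     NUMBER,
		DEPARTMENT_ID   NUMBER,
		FIRST_NAME      VARCHAR2(20),
		LAST_NAME       VARCHAR2(25),
		PHONE_NUMBER    VARCHAR2(20),
		HIRE_DATE       DATE,
		JOB_ID          VARCHAR2(10),
		SALARY          NUMBER(8,2),
		COMMISSION_PCT  NUMBER(2,2),
		MANAGER_ID      NUMBER,
		EMAIL           VARCHAR2(25)
	)
	organization external                   -- (C)
	(
		type ORACLE_LOADER                  -- (D)
		default directory EXTTABDIR         -- (E)
		access parameters                   -- (F)
		(
			records delimited by newline
			fields terminated by ','
			missing field values are null
			( "EMPLOYEE_ID", "DEPARTMENT_ID", "FIRST_NAME", "LAST_NAME",
			  "PHONE_NUMBER", "HIRE_DATE", "JOB_ID", "SALARY",
			  "COMMISSION_PCT", "MANAGER_ID", "EMAIL" )   -- (G): fields by name only
		)
		location ('employees_data.txt')     -- (H)
	)
	reject limit unlimited;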

The columns are defined only by name (G), not by type. This is permitted from the
SQL*Loader perspective, and allows for dynamic column definition. This column
schema definition is more flexible, but it has a drawback—data formats such as those
in DATE columns must match the database date format, otherwise the row will be
rejected. There are ways to define the date format working around this requirement.
Assuming the date column changes from its original default format mask
“DD-MON-RR” to “DD-MM-RR”, then the column definition must change from a
simple CHAR column to a DATE with format mask column definition.


	Original format:
		"HIRE_DATE" CHAR(255)
	Changed format:
		"HIRE_DATE" DATE "DD-MM-RR"

		When working with an External Table, the access parameters are
		not validated at creation time, so if there are malformed rows,
		or if there are improperly defined access parameters, an error is
		shown, similar to the one below.

	ERROR at line 1:
	ORA-29913: error in executing ODCIEXTTABLEFETCH callout
	ORA-30653: reject limit reached
	ORA-06512: at "SYS.ORACLE_LOADER", line 52

Once the data is created and all required OS privileges have been properly validated,
the data can be seen from inside the database, just as if it were a regular Oracle table.

This table is read only, so if the user attempts to perform any DML operation against
it, it will result in this error:


SQL> delete ext_employees;
delete ext_employees
		*
ERROR at line 1:
ORA-30657: operation not supported on external organized table

As the error message clearly states, this kind of table is only useful for read
only operations.

This kind of table doesn’t support most of the operations available for regular tables, such as index creation and statistics gathering; these types of operations will
cause an ORA-30657 error too. The only access method available for External Tables
is Full Table Scan, so there is no way to perform a selective data retrieval operation.

External Tables cannot be recovered; they are just metadata definitions stored in the dictionary tables. The actual data resides in external files, and there is no
way to protect them with the regular backup database routines, so it is the user’s
sole responsibility to provide proper backup and data management procedures. At
the database level the only kind of protection the External Table receives is at the
metadata level, as it is an object stored as a definition at the database dictionary level.
As the data resides in the external data file, if by any means it were to be corrupted,
altered, or somehow modified, there would be no way to get back the original data.
If the external data file is lost, then this may go unnoticed, until the next SELECT
operation takes place.

The metadata for an External Table is recorded in the {USER | ALL | DBA}_TABLES views, and as this table doesn’t actually require physical database storage, all storage-related columns appear as null, as well as the columns that relate to statistical information. The table is further described in the {USER | ALL | DBA}_EXTERNAL_TABLES views, where information such as the access driver, the reject limit, and the access parameters, amongst others, is recorded.


SQL> DESC USER_EXTERNAL_TABLES

 Name                          Null?    Type
 ----------------------------- -------- --------------
 TABLE_NAME                    NOT NULL VARCHAR2(30)
 TYPE_OWNER                             CHAR(3)
 TYPE_NAME                     NOT NULL VARCHAR2(30)
 DEFAULT_DIRECTORY_OWNER                CHAR(3)
 DEFAULT_DIRECTORY_NAME        NOT NULL VARCHAR2(30)
 REJECT_LIMIT                           VARCHAR2(40)
 ACCESS_TYPE                            VARCHAR2(7)
 ACCESS_PARAMETERS                      VARCHAR2(4000)
 PROPERTY                               VARCHAR2(10)
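
To see those attributes for the EXT_EMPLOYEES table created above, a query along the following lines can be used (a sketch):

	select table_name, type_name, default_directory_name, reject_limit
	from   user_external_tables
	where  table_name = 'EXT_EMPLOYEES';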

This is the first basic External Table, and as previously shown, its creation is pretty
simple. It allows external data to be easily accessed from inside the database,
allowing the user to see the external data just as if it was already loaded inside
a regular stage table.
