Connect To MySQL With JDBC Driver

November 11, 2013 by Krishna Srinivasan

Here is an example of connecting your JDBC code to a MySQL database. You have to download the MySQL Connector/J JAR (mysql.jar), which contains the JDBC driver class, from the MySQL website.

[code lang="java"]
// Load the MySQL JDBC driver class
Class.forName("com.mysql.jdbc.Driver");
// Open a connection using the JDBC URL, username, and password
Connection conn = DriverManager.getConnection(
        "jdbc:mysql://hostname:port/dbname", "username", "password");
// Close the connection when you are done with it
conn.close();
[/code]

[code lang="java"]
package javabeat.net.db;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class MySQLConnectionExample {
    public static void main(String[] argv) {

        try {
            // Load the MySQL JDBC driver class
            Class.forName("com.mysql.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            System.out.println("Problem in loading the MySQL Driver class!!!");
            e.printStackTrace();
            return;
        }

        Connection connection = null;

        try {
            // Get the connection object
            connection = DriverManager
                    .getConnection("jdbc:mysql://<host>:3306/<dbname>", "<username>", "<password>");
        } catch (SQLException e) {
            System.out.println("Problem in establishing connection!!");
            e.printStackTrace();
            return;
        }

        if (connection != null) {
            System.out.println("Connection created successfully!!");
        } else {
            System.out.println("Problem in establishing connection!!");
        }
    }
}
[/code]

If you do not copy the mysql.jar file into the classpath, you will get the following error:

[code]
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:188)
at javabeat.net.db.MySQLConnectionExample.main(MySQLConnectionExample.java:11)
[/code]


How to perform indexing in MySQL?

October 28, 2010 by Krishna Srinivasan

MySQL Admin Cookbook


MySQL is the most popular open-source database and is also known for being easy to
set up. However, proper configuration beyond the default settings is still a challenge,
along with other day-to-day maintenance tasks such as backup and restore,
performance tuning, and server monitoring.


This book provides both step-by-step recipes and relevant background information on
these topics and more. It covers everything from basic to advanced aspects of MySQL
administration and configuration. All recipes are based on real-world experience and
were derived from proven solutions used in an enterprise environment.


What This Book Covers


Chapter 1, Replication: In this chapter, you will see how to set up MySQL replication,
useful for load balancing, online backups, and fail-over scenarios. Advanced replication
scenarios using the blackhole engine and streaming slave deployment are discussed
beyond the basic topics.


Chapter 2, Indexing: You will be shown how to create, drop, and modify indexes,
perhaps the most important means of optimizing your MySQL servers’ performance.
Fulltext, clustered, and non-clustered indexes are compared and presented with
their respective strengths and typical use cases. Moreover, you will learn how to identify
duplicate indexes, which can hinder your servers’ performance.


Chapter 3, Tools: This chapter will get you acquainted with the MySQL Administrator
and Query Browser GUI tools as well as the MySQL command-line client and how to
use it in concert with external scripts and tools. You will also see how to create custom
diagrams for MySQL Administrator and share connection profiles between
multiple computers.


Chapter 4, Backing Up and Restoring MySQL Data: In this chapter, we introduce the
basic approaches to backing up your database and restoring data again. Advanced
techniques like on-the-fly compression, point-in-time recovery, avoiding extended lock
situations, backup in replication scenarios, and partial backup and restore
are also covered.


Chapter 5, Managing Data: You will learn some tricks beyond the basic SQL
commands, which enable you to delete data in a highly efficient manner and insert data
based on existing database content, and how to import and export data to and
from your database.


Chapter 6, Monitoring and Analyzing a MySQL Installation: We present approaches to
monitoring table space usage, and how to use database metadata to your advantage.
Typical performance bottlenecks and lock contention problems are discussed as well.


Chapter 7, Configuring MySQL: This chapter deals with MySQL configuration and
how to best leverage available settings to their full potential. Table space management,
pool sizing, and logging options are discussed along with platform-specific caveats and
advanced installation scenarios, such as multiple instances on one server.


Chapter 8, MySQL User Management: Management of MySQL user accounts is
discussed in detail throughout this chapter. Typical user roles with appropriate privileges
and approaches to restricting access sensibly are proposed. You will also learn how to
regain access to your database in case the administrative user credentials are lost.


Chapter 9, Managing Schemas: This chapter includes topics such as adding and
removing columns to and from tables and choosing a suitable storage engine and
character set for individual needs. Another recipe covers a technique to add a new
primary key column to a table already filled with data. Ways to manage and automate
database schema evolution, as part of a software life cycle are presented as well. And if
you have always missed “ADD INDEX IF NOT EXISTS”, you will find a
solution to this, too.


Appendix, Good to Know: In this final part of the book you can find several things that
can turn out useful in everyday situations, but did not fit the step-by-step recipe format
naturally. Topics range from choosing character sets to getting the most out of 32 bit
address space limitations.


Indexing


In this chapter, we will cover:



  • Adding indexes to tables

  • Adding a fulltext index

  • Creating a normalized text search column

  • Removing indexes from tables

  • Estimating InnoDB index space requirements

  • Using prefix primary keys

  • Choosing InnoDB primary key columns

  • Speeding up searches for (sub)domains

  • Finding duplicate indexes


Introduction


One of the most important features of relational database management systems—MySQL
being no exception—is the use of indexes to allow rapid and efficient access to the enormous
amounts of data they keep safe for us. In this chapter, we will provide some useful recipes for
you to get the most out of your databases.


Infinite storage, infinite expectations


We have grown accustomed to nearly infinite storage space at our disposal—storing everything
from music to movies to high resolution medical imagery, detailed geographical information,
or just plain old business data. While we take it for granted that we hardly ever run out of
space, we also expect to be able to locate and retrieve every bit of information we save in an
instant. There are examples everywhere in our lives—business and personal:



  • Your pocket music player’s library can easily contain tens of thousands of songs and
    yet can be browsed effortlessly by artist name or album title, or show you last week’s
    top 10 rock songs.

  • Search engines provide thousands of results in milliseconds for any arbitrary search
    term or combination.

  • A line of business application can render your sales numbers charted and displayed
    on a map, grouped by sales district in real-time.


These are a few simple examples, yet for each of them huge amounts of data must be
combed to quickly provide just the right subset to satisfy each request. Even with the
immense speed of modern hardware, this is not a trivial task to do and requires some
clever techniques.


Speed by redundancy


Indexes are based on the principle that searching in sorted data sets is way faster than
searching in unsorted collections of records. So when MySQL is told to create an index on one
or more columns, it copies these columns’ contents and stores them in a sorted manner. The
remaining columns are replaced by a reference to the original table with the unsorted data.


This combines two benefits—providing fast retrieval while maintaining reasonably efficient
storage requirements. So, without wasting too much space this approach enables you to
create several of those indexes (or indices, both are correct) at a relatively low cost.


However, there is a drawback to this as well: while reading data, indexes allow for immense
speeds, especially in large databases; however, they do slow down writing operations. In the
course of INSERTs, UPDATEs, and DELETEs, all indexes need to be updated in addition to
the data table itself. This can place significant additional load on the server, slowing down
all operations.


For this reason, keeping the number of indexes as low as possible is paramount, especially for
the largest tables where they are most important. In this chapter, you’ll find some recipes that
will help you to decide how to define indexes and show you some pitfalls to avoid.


Storage engine differences


We will not go into much detail here about the differences between the MyISAM and the
InnoDB storage engines offered by MySQL. However, regarding indexes there are some
important differences to know between MySQL’s two most important storage engines. They
influence some decisions you will have to make.


MyISAM


In the figure below you can see a simplified schema of how indexes work with the MyISAM
storage engine. Their most important property can be summed up as “all indexes are created
equal”. This means that there is no technical difference between the primary and
secondary keys.



The diagram shows a single (theoretical) data table called books. It has three columns
named isbn, title, and author. This is a very simple schema, but it is sufficient for explanation
purposes. The exact definition can be found in the Adding indexes to tables recipe in this
chapter. For now, it is not important.
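The exact definition is given in that recipe and is not reproduced here. Purely for illustration, a minimal sketch consistent with the description above (the column types and sizes are assumptions, not the book's definition) could look like this:


	-- MyISAM version of the example table: one primary key and two secondary indexes
	CREATE TABLE books (
		isbn char(13) NOT NULL,
		title varchar(100) NOT NULL,
		author varchar(100) NOT NULL,
		PRIMARY KEY (isbn),
		INDEX IDX_TITLE (title),
		INDEX IDX_AUTHOR (author)
	) ENGINE=MyISAM;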


MyISAM tables store information in the order it is inserted. In the example, there are three
records representing a single book each. The ISBN number is declared as the primary key for
this table. As you can see, the records are not ordered in the table itself—the ISBN numbers
are out of what would be their lexical order. Let’s assume they have been inserted by someone
in this order.


Now, have a look at the first index—the PRIMARY KEY. The index is sorted by the isbn column.
Associated with each index entry is a row pointer that leads to the actual data record in the
books table. When looking up a specific ISBN number in the primary key index, the database
server follows the row pointer to retrieve the remaining data fields. The same holds true for
the other two indexes IDX_TITLE and IDX_AUTHOR, which are sorted by the respective fields
and also contain a row pointer each.


Looking up a book’s details by any one of the three possible search criteria is a two-part
operation: first, find the index record, and then follow the row pointer to get the rest of the data.


With this technique you can insert data very quickly because the actual data records are
simply appended to the table. Only the relatively small index records need to be kept in order,
meaning much less data has to be shuffled around on the disk.


There are drawbacks to this approach as well. Even in cases where you only ever want to look
up data by a single search column, there will be two accesses to the storage subsystem—one
for the index, another for the data.


InnoDB


However, InnoDB is different. Its index system is a little more complicated, but it has
some advantages:



Primary (clustered) indexes


Whereas in MyISAM all indexes are structured identically, InnoDB makes a distinction between
the primary key and additional secondary ones.


The primary index in InnoDB is a clustered index. This means that one or more columns
of each record make up a unique key that identifies this exact record. In contrast to other
indexes, a clustered index’s main property is that it itself is part of the data instead of being
stored in a different location. Both data and index are clustered together.


An index only serves its purpose if it is stored in a sorted fashion. As a result, whenever you
insert data or modify the key column(s), it needs to be put in the correct location according to
the sort order. For a clustered index, the whole record with all its data has to be relocated.


That is why bulk data insertion into InnoDB tables is best performed in correct primary key
order to minimize the amount of disk I/O needed to keep the records in index order. Moreover,
the clustered index should be defined so that it is hardly ever changed for existing rows, as
that too would mean relocating full records to different sectors on the disk.


Of course, there are significant advantages to this approach. One of the most important
aspects of a clustered key is that it actually is a part of the data. This means that when
accessing data through a primary key lookup, there is no need for a two-part operation as
with MyISAM indexes. The location of the index is at the same time the location of the data
itself—there is no need for following a row pointer to get the rest of the column data, saving
an expensive disk access.


Looking up a book by ISBN in our example table simply means locating it quickly, as it is
stored in order, and then reading the complete record in one go.


Secondary indexes


Consider if you were to search for a book by title to find out the ISBN number. An index on
the title column is required to prevent the database from scanning through the whole
(ISBN-sorted) table. In contrast to MyISAM, the InnoDB storage engine creates secondary
indexes differently.


Instead of record pointers, it uses a copy of the whole primary key for each record to establish
the connection to the actual data contents.


In the previous figure, have a look at the IDX_TITLE index. Instead of a simple pointer to the
corresponding record in the data table, you can see the ISBN number duplicated as well. This
is because the isbn column is the primary key of the books table. The same goes for the other
indexes in the figure—they all contain the book ISBN number as well. You do not need to (and
should not) specify this yourself when creating indexes on InnoDB tables; it all happens
automatically under the covers.


Lookups by secondary index are similar to MyISAM index lookups. In the first step, the index
record that matches your search term is located. Then secondly, the remaining data is
retrieved from the data table by means of another access—this time by primary key.


As you might have figured, the second access is optional, depending on what information you
request in your query. Consider a query looking for the ISBN numbers of all known issues of
Moby Dick:



SELECT isbn FROM books WHERE title LIKE 'Moby Dick%';


Issued against a presumably large library database, it will most certainly result in an index
lookup on the IDX_TITLE key. Once the index records are found, there is no need for another
lookup to the actual data pages on disk because the ISBN number is already present in the
index. Even though you cannot see the column in the index definition, MySQL will skip the
second seek, saving valuable I/O operations.
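If you want to verify this behavior on your own server, EXPLAIN is a convenient way to do so. When MySQL can answer the query from the secondary index alone—as it can here on an InnoDB table, because the secondary index carries the primary key—the Extra column of the output typically reports "Using index" (the exact output depends on your MySQL version and table contents):


	-- Ask MySQL how it plans to execute the covering query
	EXPLAIN SELECT isbn FROM books WHERE title LIKE 'Moby Dick%';
	-- On an InnoDB books table with an index on title, Extra typically shows
	-- 'Using index', meaning no second lookup by primary key is needed.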


But there is a drawback to this as well. MyISAM’s row pointers are comparatively small. The
primary key of an InnoDB table can be much bigger—the longer the key, the more data
is stored redundantly.


In the end, it can often be quite difficult to decide on the optimal balance between increased
space requirements and maintenance costs on index updates. But do not worry; we are going
to provide help on that in this chapter as well.


General requirements for the recipes in this chapter


All the recipes in this chapter revolve around changing the database schema. In order to add
indexes or remove them, you will need access to a user account that has an effective INDEX
privilege or the ALTER privilege on the tables you are going to modify.
While the INDEX privilege allows for use of the CREATE INDEX command, ALTER is required
for the ALTER TABLE ADD INDEX syntax. The MySQL manual states that the former is mapped
to the latter automatically. However, an important difference exists: CREATE INDEX can only
be used to add a single index at a time, while ALTER TABLE ADD INDEX can be used to add
more than one index to a table in a single go.
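To illustrate the difference with the books example table from the introduction (the index names follow that example):


	-- CREATE INDEX adds exactly one index per statement
	CREATE INDEX IDX_TITLE ON books (title);

	-- Alternatively, ALTER TABLE ... ADD INDEX can add several indexes in a
	-- single statement, so the table is rebuilt only once
	ALTER TABLE books
		ADD INDEX IDX_TITLE (title),
		ADD INDEX IDX_AUTHOR (author);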


This is especially relevant for InnoDB tables because up to MySQL version 5.1 every change
to the definition of a table internally performs a copy of the whole table. While for small
databases this might not be of any concern, it quickly becomes infeasible for large tables due
to the high load copying may put on the server. With more recent versions this might have
changed, but make sure to consult your version’s manual.


In the recipes throughout this chapter, we will consistently use the ALTER TABLE ADD INDEX
syntax to modify tables, assuming you have the appropriate privileges. If you do not, you will
have to rewrite the statements to use the CREATE INDEX syntax.



Creating your MySQL Database

September 30, 2009 by Krishna Srinivasan

Creating your MySQL Database

MySQL, launched in 1995, has become the most popular open source database system. The
popularity of MySQL and phpMyAdmin has allowed many non-IT specialists to build dynamic
websites with a MySQL backend. This book is a short but complete guide showing beginners how
to design good data structures for MySQL. It teaches how to plan the data structure and how to
implement it physically using MySQL’s model.

What This Book Covers

Chapter 1 introduces the concept of MySQL, and discusses MySQL’s growing popularity and its
impact as a powerful tool. This chapter gives us a brief overview of the relational models and
Codd’s rules, which are required for design purposes. A brief introduction to our case study,
the “car dealer”, is provided at the end.

Chapter 2 shows how to deal with the raw data information that comes from the users or other
sources, and the techniques that can help us build a comprehensive data collection. Also, this
chapter covers the exact limits of the analyzed system, how one should gather documents, and
interview activities for our case study.

Chapter 3 focuses on transforming the data elements gathered in the collection process into a
cohesive set of column names. The concept of data naming is also discussed in this chapter.

Chapter 4 provides the technique of grouping column names into tables. Rules for table layout, the
concepts such as primary key, unique key, data redundancy, and data dependency are covered in
this chapter.

Chapter 5 presents various techniques for improving our data structure in terms of security,
performance, and documentation. The final data structure for the car dealer’s case study is
provided at the end.

Chapter 6 covers a supplemental case study about an airline system. This case study involves
various steps such as gathering documents, preparing a preliminary list of data elements, preparing a
list of tables, sample values, and queries for the airline system.

Data Naming

In this chapter, we focus on transforming the data elements gathered in the collection
process into a cohesive set of column names. Although this chapter has sections for
the various steps we should accomplish for efficient data naming, there is no specific
order in which to apply those steps. In fact, the whole process is broken down into
steps to shed some light on each one in turn, but the actual naming process applies
all those steps at the same time. Moreover, the division between the naming and
grouping processes is somewhat artificial – you’ll see that some decisions about
naming influence the grouping phase, which is the subject of the next chapter.

Data Cleaning

Having gathered information elements from various sources, some cleaning work is
appropriate to improve the significance of these elements. The way each interviewee
named elements might be inconsistent; moreover, the significance of a term can vary
from person to person. Thus, a synonym detection process is in order.

Since we took note of sample values, now it is time to cross-reference our list of
elements with those sample values. Here is a practical example, using the car’s
id number.

When the decision is made to order a car – a Mitsou 2007 – the office clerk opens
a new file and assigns a sequential number dubbed car_id number to the file, for
instance, 725. At this point, no confirmation has been received from any car supplier,
so the clerk does not know the future car’s serial number – a unique number stamped
on the engine and other critical parts of the vehicle.

This car’s id number is referred to as the car_number by the office clerk. The store
assistants who register car movements use the name stock_number. But using this
car number or the stock number is not meaningful for financing and insurance
purposes; the car’s serial number is used instead for that purpose.

At this point, a consensus must be reached by convincing users about the importance
of standard terms. It must become clear to everyone that the term car_number is not
precise enough to be used, so it will be replaced by car_internal_number in the
data elements list, probably also in any user interface (UI) or report.

It can be argued that car_internal_number should be replaced by something more
appropriate; the important point here is we merged two synonyms: car_number and
stock_number, and established the difference between two elements that looked
similar but were not, eliminating a source of confusion.

Therefore we end up with the following elements:

  • Car_serial_number
  • Car_internal_number (former car id number and stock number)

Eventually, when dealing with data grouping, another decision will have to be taken:
to which number, serial or internal, do we associate the car’s physical key number.

Subdividing Data Elements

In this section, we try to find out if some elements should be broken into more simple
ones. The reason for doing so is that, if an element is composed of many parts,
applications will have to break it for sorting and selection purposes. Thus it’s better
to break the elements right now at the source. Recomposing it will be easier at the
application level.

Breaking the elements provides more clarity at the UI level. Therefore, at this
level we will avoid (as much as possible) the well-known last-name/first-name
inversion problem.

As an example for this problem, let’s take the buyer’s name. During the interview, we
noticed that the name is expressed in various ways on the forms:

	Form                     How the name is expressed
	Delivery certificate     Mr Joe Smith
	Sales contract           Smith, Joe

We notice that

  • There is a salutation element, Mr
  • The element name is too imprecise; we really have a first name and a last name
  • On the sales contract, the comma after our last name should really be
    excluded from the element, as it’s only a formatting character

As a result, we determine that we should sub-divide the name into the following
elements:

  • Salutation
  • First name
  • Last name

Sometimes it’s useful to sub-divide an element, sometimes it’s not. Let’s consider
the date elements. We could sub-divide each one into year, month, and day (three
integers) but by doing so, we would lose the date calculation possibilities that
MySQL offers. Among those are, finding the week day from a date, or determining
the date that falls thirty days after a certain date. So for the date (and time), a single
column can handle it all, although at the UI level, separate entry fields should be
displayed for year, month, and day. This is to avoid any possibility of mix-up and
also because we cannot expect users to know about what MySQL accepts as a valid
date. There is a certain latitude in the range of valid values but we can take it for
granted that users have unlimited creativity, regarding how to enter invalid values.
If a single field is present on the UI, clear directions should be provided to help
with filling this field correctly.
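For instance, here are two of the MySQL date calculations mentioned above, which would be lost if the date were stored as three separate integer columns (the sample date is arbitrary):


	-- Week day of a given date
	SELECT DAYNAME('2007-05-20');                      -- 'Sunday'
	-- Date that falls thirty days after a certain date
	SELECT DATE_ADD('2007-05-20', INTERVAL 30 DAY);    -- '2007-06-19'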

Data Elements Containing Formatting Characters

The last case we’ll examine is the phone number. In many parts of the world, the
phone number follows a specific pattern and also uses formatting characters for
legibility. In North America, we have a regional code, an exchange number, and
phone number, for example, 418-111-2222; an extension could possibly be appended
to the phone number. However, in practice only the regional code and extension
are separated from the rest into data elements of their own. Moreover, people often
enter formatting characters like (418) 111-2222 and expect those to be output back.
So, a standard output format must be chosen, and then the correct number of
sub-elements will have to be set into the model to be able to recreate the
expected output.

Data that are Results

Even though it might seem natural to have a distinct element for the total_price of
the car, in practice this is not justified. The reason is that the total price is a computed
result. Having the total price printed on a sales contract constitutes an output. Thus,
we eliminate this information in the list of column names. For the same reason, we
could omit the tax column because it can be computed.
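As a purely hypothetical sketch (the car_sale table and its price columns are invented for illustration and are not part of the case study's model), the total could be computed at query time instead of being stored:


	-- Hypothetical columns: base_price, options_price, other_rebate, tax_rate
	SELECT internal_number,
	       (base_price + options_price - other_rebate) * (1 + tax_rate) AS total_price
	FROM car_sale;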

By removing the total price column, we could encounter a pitfall. We have to be sure
that we can reconstruct this total price from other sub-total elements, now and in the
future. This might not be possible for a number of reasons:

  • The total price includes an amount located in another table, and this table
    will change over time (for example, the tax rate). To avoid this problem, see
    the recommendations in the Scalability over Time section in Chapter 4.
  • This total price contains an arbitrary value, due to some exceptional cases,
    for example, where there is a special sale, and the rebate was not planned
    in the system, or when the lucky buyer is the brother-in-law of the general
    manager! In this case, a decision can be made: adding a new column
    other_rebate.

Data as a Column’s or Table’s Name

Now is the time to uncover what is perhaps the least known of the data naming
problems: data hidden in a column’s or even a table’s name.

We had one example of this in Chapter 1. Remember the qty_2006_1 column name.
Although this is a commonly seen mistake, it’s a mistake nonetheless. We clearly
have two ideas here, the quantity and the date. Of course, to be able to use just two
columns, some work will have to be done regarding the keys – this is covered in
Chapter 4. For now, we should just use elements like quantity and date in our
elements list, avoiding representing data in a column’s name.

To find those problematic cases in our model, a possible method is to look for
numbers. Column names like address1, address2 or phone1, phone2 should
look suspicious.
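One way to hunt for such names systematically (assuming MySQL 5.0 or later, where the information_schema database is available) is to query the catalog for column names that end in digits:


	-- List columns whose names end with a digit, e.g. address1, phone2, qty_2006_1
	SELECT table_name, column_name
	FROM information_schema.columns
	WHERE table_schema = 'car_dealer'        -- put your database name here
	  AND column_name REGEXP '[0-9]+$';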

Now, have a look in Chapter 2 at the data elements we got from our store assistant.
Can you find a case of data being hidden in a column name?

If you have done this exercise, you might have found many past participles hidden
in the column names, like ordered, arrived, and washed. These describe the events
that happen to a car. We could try to anticipate all possible events but it might prove
impossible. Who knows when a new column car_provided_with_big_ribbon will
be needed? Such events, if treated as distinct column names, must be addressed by

  • A change in the data structure
  • A change in the code (UI and reports)

To stay flexible and avoid the wide-table syndrome, we need two tables: car_event
and event.

Here are the structure and sample values for those tables:


	CREATE TABLE `event` (
		`code` int(11) NOT NULL,
		`description` char(40) NOT NULL,
		PRIMARY KEY (`code`)
	) ENGINE=MyISAM DEFAULT CHARSET=latin1;

	INSERT INTO `event` VALUES (1, 'washed');

		The usage of backticks here (`event`), although not standard
		SQL, is a MySQL extension used to enclose and protect
		identifiers. In this specific case, it could help us with
		MySQL 5.1, in which the event keyword is scheduled to
		become part of the language for another purpose
		(CREATE EVENT). At the time of writing, beta version
		MySQL 5.1.11 accepts CREATE TABLE event, but this might
		not always remain true.

The following image shows sample values entered into the event table from within
the Insert sub-page of phpMyAdmin:


	CREATE TABLE `car_event` (
		`internal_number` int(11) NOT NULL,
		`moment` datetime NOT NULL,
		`event_code` int(11) NOT NULL,
		PRIMARY KEY (`internal_number`)
	) ENGINE=MyISAM DEFAULT CHARSET=latin1;

	INSERT INTO `car_event` VALUES (412, '2006-05-20 09:58:38', 1);

Again, sample values are entered via phpMyAdmin:

Data can also hide in a table name. Let’s consider the car and truck tables. They
should probably be merged into a vehicle table, since the vehicle’s category – truck,
car, and other values like minivan is really an attribute of a particular vehicle.
We could also find another case for this table name problem: a table named
vehicle_1996.
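A hedged sketch of such a merged table follows; apart from the category column discussed above, the column names and types are assumptions for illustration, not the book's model:


	CREATE TABLE `vehicle` (
		`internal_number` int(11) NOT NULL,
		`category` varchar(20) NOT NULL,     -- 'car', 'truck', 'minivan', ...
		`serial_number` varchar(30) NOT NULL,
		PRIMARY KEY (`internal_number`)
	) ENGINE=MyISAM DEFAULT CHARSET=latin1;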

Planning for Changes

When designing a data structure, we have to think about how to manage its growth
and the possible implications of the chosen technique.

Let’s say an unplanned car characteristic – the weight – has to be supported. The
normal way of solving this is to find the proper table and add a column. Indeed, this
is the best solution; however, someone has to alter the table’s structure, and probably
the UI too.

The free fields technique, also called second-level data or EAV (Entity-Attribute-
Value) technique is sometimes used in this case. To summarize this technique, we
use a column whose value is a column name by itself.


	Even if this technique is shown here, I do not recommend
	using it, for the reasons explained in the Pitfalls of the Free
	Fields Technique section below.

The difference between this technique and our car_event table is that, for
car_event, the various attributes can all be related to a common subject, which is
the event. On the contrary, free fields can store any kind of dissimilar data. This
might also be a way to store data specific to a single instance or row of a table.

In the following example, we use the car_free_field table to store unplanned
information about the car whose internal_number is 412. The weight and special
paint had not been planned, so the UI gave the user the chance to specify which
information they want to keep, and the corresponding value. We see here a
screenshot from phpMyAdmin but most probably, another UI would be presented
to the user – for example the salesperson who might not be trained to play at the
database level.


	CREATE TABLE `car_free_field` (
		`internal_number` int(11) NOT NULL,
		`free_name` varchar(30) NOT NULL,
		`free_value` varchar(30) NOT NULL,
		PRIMARY KEY (`internal_number`,`free_name`)
	) ENGINE=MyISAM DEFAULT CHARSET=latin1;

	INSERT INTO `car_free_field` VALUES (412, 'weight', '2000');
	INSERT INTO `car_free_field` VALUES (412, 'special paint needed', 'gold');

Pitfalls of the Free Fields Technique

Even if it’s tempting to use this kind of table for added flexibility and to avoid user
interface maintenance, there are a number of reasons why we should avoid using it.

  • It becomes impossible to link this “column” (for example, the special paint needed)
    to a lookup table containing the possible colors, with a foreign key constraint.
  • The free_value field itself must be defined with a generic field type like
    VARCHAR whose size must be wide enough to accommodate all values for all
    possible corresponding free_name values.
  • It prevents easy validation (for a weight, we need a numeric value).
  • Coding the SQL queries on these free fields becomes more complex (see the sketch
    after this list), for example: SELECT internal_number FROM car_free_field WHERE
    free_name = 'weight' AND free_value > 2000;
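For contrast, here is a sketch of the same lookup if weight were added as a real, numeric column to the car table (the car table and its weight column are assumed here purely for illustration):


	-- With a proper typed column, validation and comparisons are natural:
	-- ALTER TABLE car ADD COLUMN weight INT;
	SELECT internal_number FROM car WHERE weight > 2000;

	-- With the free fields table, the same question goes through the generic
	-- free_name / free_value pair instead of a dedicated column:
	SELECT internal_number FROM car_free_field
	WHERE free_name = 'weight' AND free_value > 2000;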

Naming Recommendations

Here we touch a subject that can become sensitive. Establishing a naming convention
is not easily done, because it can interfere with the psychology of the designers.

Designer’s Creativity

Programmers and designers usually think of themselves as imaginative, creative
people; UI design and data modeling are the areas in which they want to express
those qualities. Since naming is writing, they want to put a personal stamp on the
column and table names. This is why working as a team for data structure design
necessitates a good dose of humility and achieves good results only if everyone is a
good team player.

Also, when looking at the work of others in this area, there is a great temptation to
improve the data elements names. Some discipline in the standardization has to be
applied and all the team members have to collaborate.

Abbreviations

Probably because older database systems had severe restrictions about the
representation of variables and data elements in general, the practice of abbreviating
has been taught for many years and is followed by many data structure designers
and programmers. I used programming languages that accepted only two characters
for variable names – we had to extensively comment the correspondence between
those cropped variables and their meaning.

Nowadays, I see no valid reasons for systematically abbreviating all column and
table names; after all, who will understand the meaning of your T1 table or your
B7 field?

Clarity versus Length: an Art

A consistent style of abbreviations should be used. In general, only the most
meaningful words of a sentence should be put into a name, dropping prepositions,
and other small words. As an example, let’s take the postal code. We could express
this element with different column names:

  • the_postal_code
  • pstl_code
  • pstlcd
  • postal_code

I recommend the last one for its simplicity.

Suffixing

Carefully chosen suffixes can add clarity to column names. As an example,
for the date of first payment element, I would suggest first_payment_date. In fact,
the last word of a column name is often used to describe the type of content – like
customer_no, color_code, interest_amount.

The Plural Form

Another point of controversy for table names: should we use the plural form, as in a cars
table? It can be argued that the answer is yes because this table contains many cars
– in other words, it is a set. Nonetheless, I tend not to use the plural form for the
simple reason that it adds nothing in terms of information. I know that a table is a
set, so using the plural form would be redundant. It can be said also that each row
describes one car.

If we consider the subject from the angle of queries, we can draw different
conclusions depending on the query. A query referring to the car table –
select car.color_code from car where car.id = 34 is more elegant if the
plural form is not used, because the main idea here is that we retrieve one car
whose id equals 34. Some other queries might make more sense with a plural, like
select count(*) from cars.

As a conclusion for this section, the debate is not over, but the most important point
is to choose a form and be consistent throughout the whole system.

Naming Consistency

We should ensure that a data element that is present in more than one table is
represented everywhere by the same column name. In MySQL, a column name does
not exist by itself; it is always inside a table. This is why, unfortunately, we cannot
pick up consistent column names from, say, a pool of standardized column names
and associate them with tables. Instead, during each table’s creation we indicate
the exact column names we want and their attributes. So, let’s avoid using different
names – internal_number and internal_num when they refer to the same reality.

An exception to this: if the column’s name refers to a key in another table – the
state column – and we have more than one column referring to it, like
state_of_birth and state_of_residence.

MySQL’s Possibilities versus Portability

MySQL permits many more characters in identifiers – database, table, and column
names – than its competitors. Blank spaces are accepted, as are accented characters.
The simple trade-off is that we need to enclose such special names in backticks, like
`state of residence`. This gives great freedom in the expression of data elements,
especially for non-English designers, but introduces non-portability because those
identifiers are not accepted in standard SQL. Some SQL implementations even accept
only uppercase characters for identifiers.

I recommend being very prudent before deciding to include such characters.
Even when staying faithful to MySQL, there has been a portability issue between
versions earlier than 4.1 when upgrading to 4.1. In 4.1.x, MySQL started to represent
identifiers internally in UTF-8 code, so a renaming operation had to be done to
ensure that no accented characters in the database, table, column and constraint
names were present before the upgrade. This tedious operation is not very practical
in a 24/7 system availability context.

Table Name into a Column Name

Another style I often see: one would systematically add the table name as a prefix
to every column name. Thus the car table would be comprised of the columns:
car_id_number, car_serial_number. I think this is redundant and it shows its
inelegance when examining the queries we build:


	select car_id_number from car

is not too bad, but when joining tables we get a query such as


	select car.car_id_number,
	buyer.buyer_name
	from car, buyer

Since at the application level the majority of queries we code are multi-table, like
the one used above, the clumsiness of using a table name (even abbreviated) as part of
column names becomes readily apparent. Of course, the same exception we saw in
the Naming Consistency section applies: a column – foreign key – referring to a lookup
table normally includes this table’s name as part of the column’s name. For example,
in the car_event table, we have event_code which refers to the code column in
table event.

Summary

To get a clear and understandable data structure, proper data elements naming is
important. We examined many techniques to apply in order to build consistent table
and column names.

