Introduction
An XML document is of little use until a component called a parser processes it and extracts the meaningful data. Two of the most popular XML parsing APIs are the Simple API for XML (SAX) and the Document Object Model (DOM). Each has its own advantages and disadvantages when parsing XML documents. XPath is a simple query language for selecting data from an XML document, and it is a standard specification from the W3C.
XPath – A Query Language for XML
Let us see how XPath can be used to query the various pieces of data in an XML document. Consider the following simple XML file (employees.xml):
<employees>
    <employee id="001">
        <name>Johny</name>
    </employee>
    <employee id="002">
        <name>Williams</name>
    </employee>
</employees>
The above XML file represents a collection of employees, each represented by an employee tag. All employees share a common root tag, employees. This article uses the terms tag, element and node loosely and interchangeably, although strictly speaking a node is the broader concept: attributes and text are nodes too. An XML document is a collection of properly nested tags. An element can contain other elements or the value (text) it stores, and each element can also have attributes. For example, in the above employees.xml, employees, employee and name are elements, and the values "Johny" and "Williams" are text. Also note that the employee element has an "id" attribute associated with it.
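To make the distinction between elements, attributes and text concrete, here is a minimal sketch that reads them through the standard DOM API. The class name ElementsAndAttributes and the helper describe() are hypothetical names chosen for illustration, and the XML is inlined without whitespace so that the text content is exactly the name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ElementsAndAttributes {

    // employees.xml inlined (assumption: no whitespace between tags).
    static final String EMPLOYEES_XML =
          "<employees>"
        + "<employee id=\"001\"><name>Johny</name></employee>"
        + "<employee id=\"002\"><name>Williams</name></employee>"
        + "</employees>";

    // Describes one employee element: its tag name, its id attribute and its text.
    // Note: the DOM NodeList index used here is zero-based.
    static String describe(int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            EMPLOYEES_XML.getBytes(StandardCharsets.UTF_8)));
            Element employee = (Element) doc.getElementsByTagName("employee").item(index);
            return employee.getTagName()
                    + " id=" + employee.getAttribute("id")
                    + " text=" + employee.getTextContent();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse the sample XML", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // employee id=001 text=Johny
        System.out.println(describe(1)); // employee id=002 text=Williams
    }
}
```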
XPath uses simple expressions to query or select a portion of information from an XML document. For instance, if we want to get the name of the first employee, we can frame an expression like this:
/employees/employee[1]/name
The above expression can be interpreted like this: starting from the root of the XML document (which is represented by "/"), find the employees element, then descend to the first employee element, represented by employee[1], and finally retrieve the value of its name element. As seen, the XML document is traversed hierarchically to retrieve the information. The leading forward slash represents the root of the document, and multiple elements having the same name can be accessed using an array-like index notation. Positions in XPath are counted from 1, not 0, so employee[1] is the first employee and employee[2] is the second. To select an attribute, prefix the attribute name with the "@" symbol. For example, if we wish to query the id value of the second employee, the following XPath expression does just that:
/employees/employee[2]/@id
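The two expressions above can be tried out with a small sketch like the following. It uses the javax.xml.xpath API from the JDK; the class name EmployeeQueries and the helper query() are hypothetical, and the employees.xml content is inlined as a string so that the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class EmployeeQueries {

    // employees.xml inlined so the sketch does not depend on a file on disk.
    static final String EMPLOYEES_XML =
          "<employees>"
        + "<employee id=\"001\"><name>Johny</name></employee>"
        + "<employee id=\"002\"><name>Williams</name></employee>"
        + "</employees>";

    // Evaluates the expression against the inlined XML and returns the result as a string.
    static String query(String expression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(expression,
                    new InputSource(new StringReader(EMPLOYEES_XML)));
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("Invalid expression: " + expression, ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(query("/employees/employee[1]/name")); // Johny
        System.out.println(query("/employees/employee[2]/@id"));  // 002
    }
}
```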
Java and XPath
Java provides an easy-to-use XPath API for accessing XML data. It ships with the standard JDK distribution in the javax.xml.xpath package. All we have to do is use the XPathFactory, XPath and XPathExpression classes and interfaces.
The XPathFactory class follows the standard Factory pattern to create XPath objects. An XPath object provides an environment for compiling expressions, and a compiled expression is represented by an XPathExpression. The compiled XPathExpression can then be evaluated to get the desired results. Following is the code snippet:
// Get an instance of the XPathFactory itself.
XPathFactory xPathFactory = XPathFactory.newInstance();

// Create an instance of XPath from the factory.
XPath xPath = xPathFactory.newXPath();

// Compile the expression to get an XPathExpression object.
String expression = "SomeXPathExpression";
XPathExpression xPathExpression = xPath.compile(expression);

// Evaluate the expression against the XML document to get the result.
Object result = xPathExpression.evaluate(xmlDocument);
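The snippet above leaves out where xmlDocument comes from. As a rough, self-contained sketch of the same steps, the following example first parses XML text into a DOM Document and then compiles and evaluates an expression against it. The class name CompileAndEvaluate and its evaluate() helper are hypothetical, and the sample XML is an assumption made only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class CompileAndEvaluate {

    // Parses the XML text, compiles the expression and evaluates it as a string.
    static String evaluate(String xml, String expression) {
        try {
            // Build the DOM Document that the expression is evaluated against.
            Document xmlDocument = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Factory -> XPath -> compiled XPathExpression, as described above.
            XPath xPath = XPathFactory.newInstance().newXPath();
            XPathExpression xPathExpression = xPath.compile(expression);

            // The single-argument evaluate() returns the result as a String.
            return xPathExpression.evaluate(xmlDocument);
        } catch (Exception ex) {
            throw new IllegalStateException("Could not evaluate " + expression, ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<employees><employee id=\"001\"><name>Johny</name></employee></employees>";
        System.out.println(evaluate(xml, "/employees/employee/name")); // Johny
    }
}
```

Compiling is mainly worthwhile when the same expression is evaluated many times; for a one-off query, XPath.evaluate() can be called directly with the expression string.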
Sample Application:
The following section provides a sample application that demonstrates the use of XPath in Java applications. The sample application selects the value of an element, the value of an attribute, and an element set (an element containing multiple child elements) by compiling and executing different expressions.
projects.xml
Here is an XML file called projects.xml which contains structured information about various projects. Each project element has an attribute called id and nested elements named name, start-date and end-date. The structure of the XML file is given below:
<?xml version="1.0" encoding="UTF-8"?>
<projects>
    <project id="BP001">
        <name>Banking Project</name>
        <start-date>Jan 10 1999</start-date>
        <end-date>Jan 10 2003</end-date>
    </project>
    <project id="TP001">
        <name>Telecommunication Project</name>
        <start-date>March 20, 1999</start-date>
        <end-date>July 30, 2004</end-date>
    </project>
    <project id="PP001">
        <name>Portal Project</name>
        <start-date>Dec 10 1999</start-date>
        <end-date>March 10 2006</end-date>
    </project>
</projects>
XPathReader.java
Now, let us write a simple Java application which acts as a reader for the various pieces of information in the XML document. Following is the Java source that parses the XML document and evaluates expressions against it:
// Same package as XPathReaderTest, which uses this class without an import.
package com.javabeat.tips.xpath;

import java.io.IOException;
import javax.xml.namespace.QName;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XPathReader {

    private String xmlFile;
    private Document xmlDocument;
    private XPath xPath;

    public XPathReader(String xmlFile) {
        this.xmlFile = xmlFile;
        initObjects();
    }

    // Parses the XML file into a DOM Document and creates the XPath object.
    private void initObjects() {
        try {
            xmlDocument = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(xmlFile);
            xPath = XPathFactory.newInstance().newXPath();
        } catch (IOException | SAXException | ParserConfigurationException ex) {
            ex.printStackTrace();
        }
    }

    // Compiles the expression and evaluates it against the document,
    // returning the result as the requested XPath type (null on error).
    public Object read(String expression, QName returnType) {
        try {
            XPathExpression xPathExpression = xPath.compile(expression);
            return xPathExpression.evaluate(xmlDocument, returnType);
        } catch (XPathExpressionException ex) {
            ex.printStackTrace();
            return null;
        }
    }
}
The constructor of this class is passed the XML file from which the information has to be read. It immediately calls the initObjects() method, which initializes the XML Document and the XPath objects. A Document representation of the XML file is created by calling the DocumentBuilder.parse() method. Then, a new XPath object is created by calling the XPathFactory.newXPath() method.
Client applications can then call the XPathReader.read() method, passing the expression to be evaluated and the return type of the expression. The return type is a QName, which in XML terms stands for Qualified Name. The standard XPath data types are String, Number, Boolean, Node and NodeSet, which are represented as constants in XPathConstants, namely XPathConstants.STRING, XPathConstants.NUMBER, XPathConstants.BOOLEAN, XPathConstants.NODE and XPathConstants.NODESET. Hence, the return type requested when evaluating an expression should be one of these data types. Within the read() method, the expression is compiled using the XPath.compile() method, which returns an XPathExpression, and the compiled expression is then evaluated using the XPathExpression.evaluate() method.
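To illustrate how the requested return type changes the Java type of the result, here is a small sketch that evaluates four expressions with different XPathConstants values. The class name ReturnTypes and its read() helper are hypothetical and only mirror the shape of XPathReader.read(); the XML is a shortened, inlined version of projects.xml used as an assumption for the example.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReturnTypes {

    // Shortened projects.xml with two projects, inlined for the example.
    static final String PROJECTS_XML =
          "<projects>"
        + "<project id=\"BP001\"><name>Banking Project</name></project>"
        + "<project id=\"TP001\"><name>Telecommunication Project</name></project>"
        + "</projects>";

    // Evaluates the expression and returns the result as the requested XPath type.
    static Object read(String expression, QName returnType) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                    expression,
                    new InputSource(new StringReader(PROJECTS_XML)),
                    returnType);
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("Invalid expression: " + expression, ex);
        }
    }

    public static void main(String[] args) {
        // STRING results come back as java.lang.String.
        String id = (String) read("/projects/project[1]/@id", XPathConstants.STRING);
        // NUMBER results come back as java.lang.Double.
        Double count = (Double) read("count(/projects/project)", XPathConstants.NUMBER);
        // BOOLEAN results come back as java.lang.Boolean.
        Boolean hasTelecom = (Boolean) read(
                "boolean(/projects/project[@id='TP001'])", XPathConstants.BOOLEAN);
        // NODESET results come back as org.w3c.dom.NodeList.
        NodeList names = (NodeList) read("/projects/project/name", XPathConstants.NODESET);

        System.out.println(id);                // BP001
        System.out.println(count);             // 2.0
        System.out.println(hasTelecom);        // true
        System.out.println(names.getLength()); // 2
    }
}
```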
XPathReaderTest.java
package com.javabeat.tips.xpath;

import javax.xml.xpath.XPathConstants;
import org.w3c.dom.*;

public class XPathReaderTest {

    public XPathReaderTest() {
    }

    public static void main(String[] args) {
        // Forward slashes avoid invalid backslash escapes in the string literal.
        XPathReader reader = new XPathReader(
                "src/com/javabeat/tips/xpath/projects.xml");

        // To get an XML attribute.
        String expression = "/projects/project[1]/@id";
        System.out.println(reader.read(expression, XPathConstants.STRING) + "\n");

        // To get a child element's value.
        expression = "/projects/project[2]/name";
        System.out.println(reader.read(expression, XPathConstants.STRING) + "\n");

        // To get an entire node.
        expression = "/projects/project[3]";
        NodeList thirdProject =
                (NodeList) reader.read(expression, XPathConstants.NODESET);
        traverse(thirdProject);
    }

    // Recursively prints the name and text content of every element node.
    private static void traverse(NodeList rootNode) {
        for (int index = 0; index < rootNode.getLength(); index++) {
            Node aNode = rootNode.item(index);
            if (aNode.getNodeType() == Node.ELEMENT_NODE) {
                NodeList childNodes = aNode.getChildNodes();
                if (childNodes.getLength() > 0) {
                    System.out.println("Node Name-->" + aNode.getNodeName()
                            + " , Node Value-->" + aNode.getTextContent());
                }
                traverse(aNode.getChildNodes());
            }
        }
    }
}
This test application creates an instance of the XPathReader class and then calls the XPathReader.read() method with different expressions and return types. As we see, the third expression retrieves an entire node-set by passing the return type XPathConstants.NODESET. Since a node-set contains a collection of nodes, which in turn can contain other nodes, the node-set is traversed recursively to get the name and the value of each node by calling the Node.getNodeName() and Node.getTextContent() methods. The following would be the expected output for the above sample client application.