CoffeeCup: 2013

What is an XML Database?

An eXtensible Markup Language (XML) database is a software system that stores data in XML format. XML is a meta-markup language that uses user-customizable tags to organize information. Its flexibility, which allows the creation of custom data structures and organizational systems, has led to its widespread use for exchanging data in many forms. XML databases are often used in applications such as informational portals, document exchanges, and product catalogs.

Because XML is so widely used to transport data, storing that data in an XML database is generally considered more efficient in terms of data conversion costs. There are two major categories of these databases: XML-enabled databases and native XML databases (NXD). Each type is suited to storing a different kind of data.

An XML-enabled database accepts data in XML format and maps it into a traditional relational database. The data is translated for storage and returned to its original XML format on output. This type of database is used to store data-centric documents, which contain highly structured information such as patient records and use XML only for data transfer.
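The translate-on-input, rebuild-on-output cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules as a stand-in for a real XML-enabled database, and the `patient` table, its columns, and the sample record are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical patient record; element and column names are invented for illustration.
PATIENT_XML = "<patient><id>17</id><name>A. Rao</name><ward>Cardiology</ward></patient>"

def store_patient(conn, xml_text):
    """Translate one <patient> document into a relational row."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO patient (id, name, ward) VALUES (?, ?, ?)",
        (int(root.findtext("id")), root.findtext("name"), root.findtext("ward")),
    )

def load_patient(conn, patient_id):
    """Rebuild the XML document from the stored row."""
    row = conn.execute(
        "SELECT id, name, ward FROM patient WHERE id = ?", (patient_id,)
    ).fetchone()
    root = ET.Element("patient")
    for tag, value in zip(("id", "name", "ward"), row):
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, ward TEXT)")
store_patient(conn, PATIENT_XML)
round_tripped = load_patient(conn, 17)
```

Inside the database the record is an ordinary row, so XML serves only as the transfer format. That is why this approach suits highly structured, data-centric documents.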

Native XML databases store XML documents as a whole, instead of separating out the data within them, and are designed to store semi-structured information, such as marketing brochures or health data. XML documents that contain semi-structured data are referred to as document-centric. A native XML database is not tied to a particular physical storage model: it can use relational, hierarchical, or object-oriented structures as well as custom storage formats. It manages documents by grouping them into logical collections, and can set up and manage multiple collections simultaneously. This type of database lets the user store any type of XML document, regardless of structure, within the same collection. Queries can be run across the whole collection, which generally makes data organization and manipulation more flexible.
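The idea of one collection that holds differently structured documents, queried as a whole, can be sketched as follows. This is not how any particular native XML database is implemented: the collection is just a Python dictionary, the document names and contents are made up, and the query uses the limited path syntax of the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: document names and contents are invented for illustration.
collection = {
    "brochure.xml": "<brochure><title>Spring Sale</title>"
                    "<p>Save on <product>kettles</product>.</p></brochure>",
    "catalog.xml": "<catalog><product>kettles</product>"
                   "<product>toasters</product></catalog>",
}

def query_collection(docs, path):
    """Apply one path expression to every document, whatever its structure."""
    hits = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        for element in root.findall(path):
            hits.append((name, element.text))
    return hits

# Find every <product> element at any depth in any document of the collection.
products = query_collection(collection, ".//product")
```

Each document is kept whole, and the same query reaches into both the brochure and the catalog even though they share no common structure.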

XML databases are typically queried with XQuery, a language designed specifically to extract and manipulate data in XML documents, as well as in other sources that can be represented as XML. Applications of XQuery include searching text documents on the Web for relevant data and compiling the results, extracting data from databases for use in application integration, and generating reports on the data contained in an XML database.
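To give a feel for the kind of report query described above, the comment in the sketch below shows what an XQuery expression might look like, followed by a rough Python equivalent. Python's standard library has no XQuery engine, so `xml.etree.ElementTree` is used as a stand-in. The catalog data, the file name `books.xml`, and the function name `titles_over` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# An XQuery version of this report might read:
#
#   for $b in doc("books.xml")/catalog/book
#   where $b/price > 20
#   order by $b/title
#   return $b/title/text()
#
# The Python below mimics that query without using an XQuery engine.

CATALOG = """
<catalog>
  <book><title>XML Basics</title><price>15.00</price></book>
  <book><title>Query Design</title><price>32.50</price></book>
  <book><title>Data Exchange</title><price>27.00</price></book>
</catalog>
"""

def titles_over(xml_text, min_price):
    """Return the titles of books priced above min_price, sorted by title."""
    root = ET.fromstring(xml_text)
    matches = [
        book.findtext("title")
        for book in root.findall("book")
        if float(book.findtext("price")) > min_price
    ]
    return sorted(matches)

report = titles_over(CATALOG, 20)
```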

XML and Databases

In the document-centric model, XML is typically used to create semi-structured documents with irregular content that are meant for human consumption. An example of document-centric usage of XML is XHTML, the XML-based successor to HTML.

Sample XHTML document

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Sample Web Page</title>
  </head>
  <body>
    <h1>My Sample Web Page</h1>
    <p>All XHTML documents must be well-formed and valid.</p>
  </body>
</html>
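One quick way to see what "well-formed" means in practice is to hand a document to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. The function name `is_well_formed` and the two test strings are assumptions for illustration, and the check covers well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A document with one root element and properly nested, closed tags.
good = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Sample Web Page</title></head>"
    "<body><p>ok</p></body></html>"
)

# The same content without the opening <html> tag: two top-level elements
# followed by a stray closing tag, which a parser rejects.
bad = "<head><title>Sample Web Page</title></head><body><p>ok</p></body></html>"
```

Calling `is_well_formed(good)` succeeds, while `is_well_formed(bad)` fails, because an XML document must have exactly one root element.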

The other primary usage of XML is in a data-centric model. Here XML is used as a storage or interchange format for data that is structured, appears in a regular order, and is most likely to be processed by a machine rather than read by a human. In this model, the fact that the data is stored or transferred as XML is typically incidental, since it could be stored or transferred in a number of other formats that may or may not be better suited to the task, depending on the data and how it is used. An example of data-centric usage of XML is SOAP, an XML-based protocol for exchanging information in a decentralized, distributed environment. SOAP consists of three parts: an envelope that defines a framework for describing what is in a message and how to process it, a set of encoding rules for expressing instances of application-defined data types, and a convention for representing remote procedure calls and responses.

Sample SOAP message, taken from the W3C SOAP specification

<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
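Because a SOAP message is ordinary namespaced XML, a receiver can find the requested operation with a general-purpose XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `body_operation` is an assumption, and real SOAP toolkits do considerably more (headers, faults, encoding rules).

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

MESSAGE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

def body_operation(xml_text):
    """Return (namespace URI, local name) of the first element inside the SOAP Body."""
    envelope = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace-uri}local-name".
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    operation = body[0]
    namespace, _, local_name = operation.tag[1:].partition("}")
    return namespace, local_name

operation = body_operation(MESSAGE)
```

The lookup relies on the namespace URI rather than the `SOAP-ENV` prefix, since the prefix is only a local abbreviation that a sender is free to change.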

In both models, it is sometimes necessary to store the XML in a repository or database that allows more sophisticated storage and retrieval of the data, especially if the XML is to be accessed by multiple users. Below is a description of the storage options for each model of XML usage.

Data-centric model

In a data-centric model where data is stored in a relational database or similar repository, one may want to extract data from the database as XML, store XML in the database, or both. Where one only needs to extract XML from the database, one may use a middleware application or component that retrieves data from the database and returns it as XML. Middleware components that transform relational data to XML and back vary widely in the functionality they provide and how they provide it. For instance, Microsoft's ADO.NET provides XML integration to such a degree that results from queries on XML documents or SQL databases can be accessed identically via the same API.
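A very small middleware component of the extract-only kind might look like the sketch below. It is not modelled on ADO.NET or any other product. It uses Python's standard `sqlite3` and `xml.etree.ElementTree`, and the function name `query_as_xml`, the `product` table, and the default `rows`/`row` element names are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_as_xml(conn, sql, root_tag="rows", row_tag="row"):
    """Run a SQL query and return the result set as an XML string.

    Column names become element names, so they must be valid XML names.
    """
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for column, value in zip(columns, record):
            ET.SubElement(row, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical relational data to export.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)", [("K-100", 19.5), ("T-200", 24.0)]
)
xml_out = query_as_xml(conn, "SELECT sku, price FROM product ORDER BY sku")
```

The database itself knows nothing about XML here. The conversion lives entirely in the component that sits between the database and its callers.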

The alternative to using middleware components to retrieve or store XML in a database is to use an XML-enabled database that understands how to convert relational data to XML and back. At the time of writing, the "Big 3" relational database products all support retrieving and storing XML in one form or another. IBM's DB2, for example, uses the DB2 XML Extender. The extender gives one the option to store an entire XML document and its DTD in a user-defined column, or to slice the document into multiple tables and columns. XML documents can then be queried with syntax that complies with the W3C XPath recommendation. XML data can also be updated using stored procedures.
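The first of the two storage options above, keeping each XML document whole in a single column and querying it with a path expression, can be imitated as follows. This is not the DB2 XML Extender API. It is a sketch using Python's standard `sqlite3` and the XPath subset supported by `xml.etree.ElementTree`, with a hypothetical `orders` table and made-up documents.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Each XML document is stored intact in one text column.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO orders (id, doc) VALUES (?, ?)",
    [
        (1, "<order><item qty='2'>kettle</item></order>"),
        (2, "<order><item qty='1'>toaster</item><item qty='4'>kettle</item></order>"),
    ],
)

def orders_matching(conn, path):
    """Return ids of stored documents in which the path selects at least one element."""
    ids = []
    for order_id, doc in conn.execute("SELECT id, doc FROM orders ORDER BY id"):
        if ET.fromstring(doc).find(path) is not None:
            ids.append(order_id)
    return ids

# Orders containing an <item> whose qty attribute is 4.
big_orders = orders_matching(conn, "item[@qty='4']")
```

Storing the document whole preserves it exactly as received, but every query has to parse it again. Slicing it into tables and columns trades that fidelity for ordinary SQL access.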

Document-centric model

Content management systems are typically the tool of choice for storing, updating and retrieving various XML documents in a shared repository. A content management system typically consists of a repository that stores a variety of XML documents, an editor, and an engine that provides one or more of the following features:

o version, revision and access control

o ability to reuse documents in different formats

o collaboration

o web publishing facilities

o support for a variety of text editors (e.g. Microsoft Word, Adobe FrameMaker)

o indexing and search capabilities

Content management systems have been of benefit primarily in two areas: workflow management in corporate environments where information sharing is vital, and the modular creation of web content, which lets web developers and content creators perform their tasks with less interdependence than exists in a traditional web authoring environment.

Hybrid model

In situations where both document-centric and data-centric models of XML usage will occur, the best data storage choice is usually a native XML database. The most coherent definition of the term so far is one reached by consensus among members of the XML:DB mailing list: a native XML database has an XML document as its fundamental unit of (logical) storage, defines a (logical) model for an XML document, as opposed to the data in that document, and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order.

Tamino is a native XML database management system developed by Software AG. It is a relatively mature application, at version 2.3.1 at the time of writing, that provides the means to store and retrieve XML documents, store and retrieve relational data, and interface with external applications and data sources. Schemas in Tamino are DTD-based and are used primarily to describe how the XML data should be indexed. When storing XML documents in Tamino, one can specify a pre-existing DTD, which is then converted to a Tamino schema; store a well-formed XML document without a schema, in which case default indexing is applied; or create a schema from scratch for the document being stored. A secondary use of schemas is to specify the data types in XML documents.
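The minimal logical model named in the definition of a native XML database (elements, attributes, PCDATA, and document order) can be made concrete by walking a document and listing those four things. The sketch below uses Python's standard `xml.etree.ElementTree`. The sample memo and the function name `document_model` are assumptions for illustration, and the sketch says nothing about how Tamino or any other product stores documents internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing attributes, nested elements and running text.
DOC = '<memo id="m1"><to>Sales</to><body>Ship <b>today</b> please</body></memo>'

def document_model(xml_text):
    """List (kind, value) pairs for elements, attributes and text in document order."""
    nodes = []

    def visit(element):
        nodes.append(("element", element.tag))
        for name, value in element.attrib.items():
            nodes.append(("attribute", f"{name}={value}"))
        # Text that appears before the first child element.
        if element.text and element.text.strip():
            nodes.append(("text", element.text.strip()))
        for child in element:
            visit(child)
            # Text that follows a child element inside the same parent.
            if child.tail and child.tail.strip():
                nodes.append(("text", child.tail.strip()))

    visit(ET.fromstring(xml_text))
    return nodes

model = document_model(DOC)
```

The mixed content in `<body>` shows why document order belongs in the model: "Ship", the `<b>` element, and "please" only make sense in the sequence in which they were written.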

XSD (XML Schema Definition)

XSD, a Recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. This description can be used to verify that each item of content in a document adheres to the description of the element in which the content is placed.

The name XSD also refers to a reference library that provides an API for use with any code that examines, creates or modifies W3C XML Schema documents, whether standalone or as part of other artifacts such as XForms or WSDL documents. The library exposes the components of an XML Schema as described by the W3C XML Schema specifications, along with the DOM-accessible representation of the schema as a series of XML documents, and keeps the two representations in agreement as schemas are modified.

XSD has several advantages over earlier XML schema languages, such as document type definition (DTD) or Schema for Object-Oriented XML (SOX). For example, it is more direct: XSD, in contrast to DTD, is itself written in XML, so a schema can be read with the same XML parser and tools as the documents it describes, without a separate parser for the schema syntax. Other benefits include self-documentation, automatic schema creation, and the ability to be queried through Extensible Stylesheet Language Transformations (XSLT). Despite these advantages, XSD has some detractors who claim, for example, that the language is unnecessarily complex.
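Since a schema written in XSD is itself an XML document, an ordinary XML parser can read it and answer questions about it. The sketch below lists the element declarations in a small schema fragment using Python's standard `xml.etree.ElementTree`. The fragment and the function name `declared_elements` are assumptions for illustration. The code inspects the schema but does not validate documents against it, which the standard library cannot do.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment used only for this illustration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shipto">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="country" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def declared_elements(schema_text):
    """Return (name, type) for every xs:element declaration, in document order."""
    root = ET.fromstring(schema_text)
    return [
        (element.get("name"), element.get("type"))
        for element in root.iter(f"{{{XS}}}element")
    ]

declarations = declared_elements(SCHEMA)
```

The `shipto` declaration has no `type` attribute because its type is defined inline as a complex type, so its entry carries `None`.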

For example:

Hello.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<shiporder>
  <shipto>
    <city>Kadegaon</city>
    <country>India</country>
  </shipto>
  <item>
  </item>
  <item>
  </item>
</shiporder>

Hello.xsd

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="addr" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="note" type="xs:string"/>
              <xs:element name="quantity" type="xs:string"/>
              <xs:element name="price" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Friday, 6 December 2013

XML DATABASE

Wednesday, 4 December 2013

XML Schema Definition with example
