The eXtensible Markup Language (XML) is a de facto standard on the Internet and is now used to exchange a wide variety of data structures. This raises the problem of efficiently storing, querying and retrieving the large amounts of data contained in XML documents. Moreover, XML data often need to coexist with historical data. At present, the best solution for storing XML in pre-existing data structures is to extract the information from the XML documents and adapt it to the logical model of those data structures (e.g., the relational model of a DBMS). In this paper, we introduce a technique called Xere (XML entity-relationship exchange) to assist the integration of XML data with other data sources. To this end, we present an algorithm that maps XML schemas into entity-relationship diagrams, discuss its soundness and completeness, and show its implementation in XSLT.
Table of Contents
Abstract
1. Introduction
2. Background
2.1. The eXtensible Markup Language
2.2. Entity-relationship diagrams
3. The Xere XML schema-to-ER mapping algorithm
3.1. The Xere mapping methodology
3.2. The mapping algorithm
3.2.1. Mapping XML schema elements
3.2.2. Mapping of element attributes
3.2.3. Mapping of content models
3.2.4. Mapping of substitution groups
3.2.5. Mapping of identity constraints
3.2.6. Model refinement
4. An example
4.1. The problem
4.2. The XSD model
4.3. The ER model obtained with Xere
4.4. Discussion
5. Soundness and completeness
6. Design and implementation
7. Related works
7.1. XML support in relational databases
7.2. XML document mapping techniques
8. Conclusions and future works
References
1. Introduction
A great deal of information is exchanged every day through the Internet. The exponential growth of the Web has increased the number of documents shared within a global community that is on the verge of including almost everyone in the coming years.
The eXtensible Markup Language (XML), a flexible tagged text format derived from the Standard Generalized Markup Language (SGML), was proposed in 1996 by the W3C as a tool to standardize the format of the documents used on the Internet and to meet the challenges of large-scale electronic publishing (Fernandez and Suciu 2001 103-114).
During the last few years, XML has become a de facto standard and, as the next step of its evolution, it is also being adopted for data description and manipulation. The hierarchical structure of markup documents is well suited to representing a wide variety of data, especially objects. XML fragments are hence used to hold data structures and to make them more portable and open-format. This trend has been pushed further by the fact that applications can easily interface with XML-structured data streams or packets, using XML parsers and querying tools (such as XPath and XSLT) to manipulate their content. In fact, XML is at present used much more as a data-definition formalism than as a document-definition language. XML protocols such as SOAP are widely used to transport data on the Internet, and a number of organizations use XML to exchange platform-independent, open-format data.
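As a minimal illustration of how an application might read data from an XML fragment with a parser and a path query, consider the following sketch. It is not part of Xere: the order document and its element and attribute names (order, customer, item, sku, qty) are hypothetical, and the helper item_quantities is introduced here only for the example. It relies on Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; all element and attribute names are illustrative.
ORDER_XML = """
<order id="A-17">
  <customer>Rossi</customer>
  <item sku="p1" qty="2">Pencil</item>
  <item sku="p2" qty="1">Notebook</item>
</order>
"""


def item_quantities(xml_text):
    """Return a {sku: quantity} mapping extracted from an order document."""
    root = ET.fromstring(xml_text)
    # ".//item" selects every item element below the root (XPath subset).
    return {
        item.get("sku"): int(item.get("qty"))
        for item in root.findall(".//item")
    }


if __name__ == "__main__":
    print(item_quantities(ORDER_XML))
```

The point of the sketch is that the hierarchical document is treated as a data structure: the application navigates it by path and reads typed values from attributes, without depending on how the document was produced.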
This scenario raises the problem of efficiently storing, querying and retrieving the large amounts of data exchanged by means of XML documents. To this end, native XML DBMSs such as Tamino have been ...