XML2OWL

 

1. Introduction

XML2OWL is an ANTLR-based program written in C++ for automatic conversion of an Extensible Markup Language (XML) file to a Web Ontology Language (OWL) file. The mapping rules are defined in a text file with a .rules extension. Note that the OWL file created during the conversion process contains only instance data (i.e., OWL individuals and their properties); the converter presumes the existence of an OWL ontology that specifies the classes and properties instantiated in the OWL file.

2. Motivation

The main motivation for writing this program was the conversion of data available in legacy XML documents into a semantically richer representation, which can then be processed and interpreted by ontology-based software applications. Some exemplary use cases for such semantic applications are described here.

We are presently using this converter for two applications. In the first application, XML2OWL converts work process models created with the Workflow Modeling System WOMS into an OWL-based representation. In the second application, the converter automatically creates semantic annotations for CAPE model files (i.e., data files containing the specification of a mathematical model used for simulation or optimization in the domain of Computer Aided Process Engineering, CAPE). It is assumed that the mathematical model is represented in CapeML, an XML-based model exchange language specifically developed for the CAPE domain. The converter extracts relevant information from the CapeML model file to create an associated OWL file that summarizes the main features of the model, thus facilitating its later retrieval through a semantic search engine. The data contained in the OWL file are instantiations of concepts defined in the ontology OntoCAPE. All examples presented in the following arise from this application.

3. Conversion Process

The converter program takes at least two arguments, a rules definition file (.rules) that defines XML to OWL mappings and an XML file containing the legacy XML data.

The .rules file is parsed by a parser that is automatically generated by ANTLR. The grammar (syntactical structure) of the .rules file is defined in an ANTLR grammar (.g) file (not shown in Fig. 1). The grammar file also contains actions for reading/searching the input XML file and writing the output OWL file. These XML-related functions use libxml++, which is a C++ wrapper library around libxml2, the well-known C-based XML parser and toolkit for the GNOME project.

To use this converter, it is important to understand the grammatical structure of the rules definition file (.rules). Each mapping rule consists of a capitalized keyword followed by a comma-separated list of strings enclosed in parentheses. These strings completely define the rule and thereby the action that the converter performs for it. A detailed description of the available rules is contained in this document.
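The rule syntax described above (keyword plus parenthesized argument list) can be illustrated with a minimal Python sketch. It is not the ANTLR grammar used by XML2OWL; the function parse_rule and the regular expression are hypothetical and only show how one rule line decomposes into its parts.

```python
import re

# Hypothetical helper, NOT the ANTLR-generated parser of XML2OWL:
# a capitalized keyword followed by a parenthesized argument list.
RULE_PATTERN = re.compile(r"^\s*([A-Z]+)\s*\((.*)\)\s*$")


def parse_rule(line):
    """Split one rule line into its keyword and its argument strings."""
    match = RULE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a rule: {line!r}")
    keyword, body = match.groups()
    # Naive split: assumes that no comma occurs inside a quoted string.
    arguments = [part.strip() for part in body.split(",")]
    return keyword, arguments


keyword, arguments = parse_rule(
    "IND(phase_system:ChemicalReactionNetwork, ReactionNetwork, ReactionNetwork@name)"
)
print(keyword, arguments)
```

A real grammar would additionally have to handle quoted strings, the + concatenation operator and test conditions, which this sketch leaves untouched as plain argument strings.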

4. Mapping XML to OWL

The mapping of XML content to OWL is elaborated in this section. Most of the examples are taken from an example XML file that is available here along with the used rules definition file and the OWL file produced by the converter.

4.1 XML element → OWL individual

Mapping XML elements of a given type to OWL individuals of a predefined class can be done either conditionally or unconditionally, depending on whether the mapping rule defines a test condition or not. As a test condition, one may specify a target value for a particular XML attribute; for XML elements without any attribute, a conditional mapping to OWL individuals is not possible.

Additionally, the converter can retrieve rdf:IDs of selected individuals from already existing OWL files. Such individuals are termed “external individuals” in the following. That way, it is possible to define relations between external individuals and newly created individuals through object properties.

4.1.1 Unconditional mapping

Each XML element of a given type is mapped to an OWL individual of a given class. The rdf:ID of each created individual may be named according to the value of a particular XML attribute.

As an example, consider the following excerpt of an XML source file, containing two XML elements of type ReactionNetwork that have attributes named name and type.

<ReactionNetwork name="R-1" type="POWERLAW"> ... </ReactionNetwork>
<ReactionNetwork name="R-2" type="POWERLAW"> ... </ReactionNetwork>

Now, the mapping rule stated below is applied to the above XML source.

IND(phase_system:ChemicalReactionNetwork, ReactionNetwork, ReactionNetwork@name)

The mapping rule specifies that, for each XML element of type ReactionNetwork, an OWL individual is created which is an instance of the OWL class ChemicalReactionNetwork belonging to the namespace phase_system. Moreover, the rdf:IDs of the created individuals are named according to the value of the attribute name of the XML element ReactionNetwork.

The resulting OWL code is shown below. Two individuals are created by the converter, whose rdf:IDs have the values R-1 and R-2, respectively.

<phase_system:ChemicalReactionNetwork rdf:ID="R-1"/>
<phase_system:ChemicalReactionNetwork rdf:ID="R-2"/>

A slightly different situation occurs when the rdf:ID of the created individual is not specified by an attribute’s value. In this case, the individual is named by a string that is specified by the mapping rule. If multiple individuals are created by such a rule, the name string is followed by a number where the counting starts from 1. The numbering ensures that each individual gets a unique identifier.

As an example, consider the XML fragment below, which contains three XML elements of type Stoichiometry that have attributes named component and coefficient.

<Stoichiometry component="ETHYLENE" coefficient="-2."/>
<Stoichiometry component="O2" coefficient="-1."/>
<Stoichiometry component="EO" coefficient="2." />

Each Stoichiometry element is to be mapped onto an OWL individual of class StoichiometricCoefficient; the rdf:IDs of the newly created individuals consist of the string StCoeff_ followed by a number to differentiate between the individuals.

IND(phase_system:StoichiometricCoefficient, Stoichiometry, "StCoeff_")

The resulting OWL code is shown below. Three individuals are created by the converter, whose rdf:IDs have the values StCoeff_1, StCoeff_2 and StCoeff_3, respectively.

<phase_system:StoichiometricCoefficient rdf:ID="StCoeff_1"/>
<phase_system:StoichiometricCoefficient rdf:ID="StCoeff_2"/>
<phase_system:StoichiometricCoefficient rdf:ID="StCoeff_3"/>
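The two naming schemes of the unconditional IND rule (attribute value versus prefix string plus running number) can be sketched in Python as follows. This is an illustration under assumptions, not the converter's C++ implementation: the function create_individuals and the wrapper element Reaction are hypothetical, and the sketch always appends a number, whereas the converter omits it when only one individual is created.

```python
import xml.etree.ElementTree as ET


def create_individuals(root, owl_class, element_type, name_attr=None, prefix=""):
    """Return one OWL individual (as a string) per matching XML element."""
    individuals = []
    for count, element in enumerate(root.iter(element_type), start=1):
        if name_attr is not None:
            # rdf:ID taken from the value of an XML attribute
            rdf_id = prefix + element.get(name_attr)
        else:
            # rdf:ID built from the prefix string plus a running number
            rdf_id = f"{prefix}{count}"
        individuals.append(f'<{owl_class} rdf:ID="{rdf_id}"/>')
    return individuals


# <Reaction> is a hypothetical wrapper added to obtain a single root element.
source = ET.fromstring(
    "<Reaction>"
    '<Stoichiometry component="ETHYLENE" coefficient="-2."/>'
    '<Stoichiometry component="O2" coefficient="-1."/>'
    '<Stoichiometry component="EO" coefficient="2."/>'
    "</Reaction>"
)
stcoeffs = create_individuals(
    source, "phase_system:StoichiometricCoefficient", "Stoichiometry", prefix="StCoeff_"
)
by_attribute = create_individuals(
    source, "phase_system:StoichiometricCoefficient", "Stoichiometry", name_attr="component"
)
print("\n".join(stcoeffs))
```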

It is possible to further specify the XML elements that are to be converted by means of XPath expressions (more precisely, by indicating their respective location paths). This might be necessary if there is ambiguity in the selection of XML elements due to a rather simple XML structure, such that the mere indication of the type of the XML element is not sufficient to resolve this ambiguity. The XPath expression must provide only the information that is required to resolve the ambiguity; for example, one does not have to specify the complete location path but only the differentiating ancestor element. This can be better understood with the help of the following example.

The XML excerpt shown below contains elements of type PARAMETER with two different ancestor elements, DATABANKS and COMPONENTS.

<DATABANKS>
<PARAMETERLIST recordsize="1" index="0">
<PARAMETER index="0">PURE11</PARAMETER>
<PARAMETER index="0">AQUEOUS</PARAMETER>
<PARAMETER index="0">SOLIDS</PARAMETER>
</PARAMETERLIST>
</DATABANKS>
<COMPONENTS>
<PARAMETERLIST recordsize="1" index="0">
<PARAMETER index="0">O2</PARAMETER>
<PARAMETER index="0">ETHEN</PARAMETER>
<PARAMETER index="0">ETHAN</PARAMETER>
<PARAMETER index="0">CO2</PARAMETER>
</PARAMETERLIST>
</COMPONENTS>

The mapping rule should select only the PARAMETER elements under the DATABANKS element in order to create individuals of class databank with a namespace prefix of model. The rule given below inserts the text content of each selected PARAMETER element into the rdf:ID of the created individual. This is specified by using text() in place of an attribute's name (i.e., in place of the index attribute in this example).

IND(model:databank, DATABANKS/descendant::PARAMETER, "Databank_" + DATABANKS/descendant::PARAMETER@text())

Three OWL individuals of class databank are created with rdf:IDs of Databank_PURE11, Databank_AQUEOUS and Databank_SOLIDS, as shown below.

<model:databank rdf:ID="Databank_PURE11"/>
<model:databank rdf:ID="Databank_AQUEOUS"/>
<model:databank rdf:ID="Databank_SOLIDS"/>
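The effect of restricting the selection to a differentiating ancestor can be reproduced with Python's standard library. The sketch below is only an analogy to the DATABANKS/descendant::PARAMETER location path: the root element MODEL is a hypothetical wrapper, and the nested iteration stands in for the XPath evaluation that the converter performs through libxml++.

```python
import xml.etree.ElementTree as ET

# <MODEL> is a hypothetical root element wrapping the excerpt.
xml_source = """
<MODEL>
  <DATABANKS>
    <PARAMETERLIST recordsize="1" index="0">
      <PARAMETER index="0">PURE11</PARAMETER>
      <PARAMETER index="0">AQUEOUS</PARAMETER>
      <PARAMETER index="0">SOLIDS</PARAMETER>
    </PARAMETERLIST>
  </DATABANKS>
  <COMPONENTS>
    <PARAMETERLIST recordsize="1" index="0">
      <PARAMETER index="0">O2</PARAMETER>
      <PARAMETER index="0">ETHEN</PARAMETER>
    </PARAMETERLIST>
  </COMPONENTS>
</MODEL>
"""
root = ET.fromstring(xml_source)

# Only PARAMETER elements that descend from DATABANKS are selected;
# the text content of each element becomes part of the rdf:ID.
databank_ids = [
    "Databank_" + parameter.text
    for databanks in root.iter("DATABANKS")
    for parameter in databanks.iter("PARAMETER")
]
print(databank_ids)
```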

4.1.2 Conditional mapping

For conditional mapping, the value of an attribute is compared against a given string or numeric value, depending on the type of the attribute. For string attributes, the test condition checks whether the value of the attribute contains the test string. For numerical comparisons, the test condition checks whether the attribute’s value is greater, greater or equal, equal, less or equal, or less than the test number.

Similar to the unconditional case, the rdf:IDs of the created individuals may contain the value of an attribute or may be named by a string specified in the mapping rule followed by a number. No number is appended if only one individual is created for a mapping rule.

Three examples of conditional creation of individuals are given below.

In the first example, an XML excerpt contains three XML elements of type SubmodelDefinition with attributes named name and href.

<SubmodelDefinition name="WATERMIX" href="MIXER"> ... </SubmodelDefinition>
<SubmodelDefinition name="DEGCOL" href="RADFRAC"> ... </SubmodelDefinition>
<SubmodelDefinition name="EGCOL" href="RADFRAC"> ... </SubmodelDefinition>

The conditional mapping rule shown below searches for XML elements of type SubmodelDefinition where the href attribute takes the value "RADFRAC". If such an element is found, an OWL individual of class RadFrac is created, and its rdf:ID takes the value of the name attribute of the corresponding XML element.

IND(aspen_plus_model:RadFrac, SubmodelDefinition, SubmodelDefinition@name, SubmodelDefinition@href="RADFRAC")

The two OWL individuals created by the converter are shown below, with rdf:IDs of DEGCOL and EGCOL, respectively.

<aspen_plus_model:RadFrac rdf:ID="DEGCOL"/>
<aspen_plus_model:RadFrac rdf:ID="EGCOL"/>

As a second example, consider the following XML fragment containing five XML elements of type Parameter that have attributes named name and value.

<SubmodelDefinition name="R1" href="RPLUG">
<Parameter name="temperature" value="250."/>
<Parameter name="NTUBE" value="2000"/>
<Parameter name="LENGTH" value="8."/>
</SubmodelDefinition>
<SubmodelDefinition name="R2" href="RCSTR">
<Parameter name="pressure" value="10."/>
<Parameter name="LENGTH" value="2."/>
</SubmodelDefinition>

The conditional mapping rule below searches for XML elements of type Parameter where the name attribute takes the value "LENGTH". In this case, an OWL individual of class ModelVariableDef is created. The rdf:IDs of the created individuals consist of the string VariableDef_, followed by the value of the name attribute of the corresponding XML element Parameter, followed by a number to differentiate between the individuals.

IND(equation_system:ModelVariableDef, Parameter, "VariableDef_"+Parameter@name, Parameter@name="LENGTH")

The two OWL individuals created conditionally by the converter are shown below, with rdf:IDs of VariableDef_LENGTH_1 and VariableDef_LENGTH_2, respectively.

<equation_system:ModelVariableDef rdf:ID="VariableDef_LENGTH_1"/>
<equation_system:ModelVariableDef rdf:ID="VariableDef_LENGTH_2"/>

The third example demonstrates the conditional creation of individuals depending on an additional test condition.

<SubmodelDefinition name="R1" href="RPLUG">
<Parameter name="pressure" value="20."/>
<Parameter name="NTUBE" value="2000"/>
</SubmodelDefinition>
<SubmodelDefinition name="B4" href="MIXER">
<Parameter name="pressure" value="1." unit="bar" />
<Parameter name="MAXIT" value="50" unit="-" />
<Parameter name="T-EST" value="372.6" unit="K" />
</SubmodelDefinition>

The rule below creates individuals of class VariableLarge by selecting only those Parameter elements whose value attribute is greater than or equal to 50.

IND(equation_system:VariableLarge, Parameter, "VariableDef_"+Parameter@name, Parameter@value>=50)

The three conditionally created OWL individuals of class VariableLarge are shown below, with rdf:IDs of VariableDef_NTUBE, VariableDef_MAXIT and VariableDef_T-EST, respectively.

<equation_system:VariableLarge rdf:ID="VariableDef_NTUBE"/>
<equation_system:VariableLarge rdf:ID="VariableDef_MAXIT"/>
<equation_system:VariableLarge rdf:ID="VariableDef_T-EST"/>
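The test conditions used in the three examples (substring test for strings, relational test for numbers) can be summarized in a short Python sketch. The function satisfies and the table NUMERIC_TESTS are hypothetical names chosen for illustration; the sketch assumes that the type of the target value decides which kind of comparison is applied.

```python
import operator

# The five numerical comparisons named in the text.
NUMERIC_TESTS = {
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "<=": operator.le,
    "<": operator.lt,
}


def satisfies(attribute_value, op, target):
    """Evaluate a test condition against one XML attribute value."""
    if attribute_value is None:
        # Elements without the attribute can never satisfy the condition.
        return False
    if isinstance(target, str):
        # String test: the attribute value must contain the test string.
        return target in attribute_value
    return NUMERIC_TESTS[op](float(attribute_value), target)


# (name, value) pairs of the Parameter elements from the third example.
parameters = [
    ("pressure", "20."), ("NTUBE", "2000"),
    ("pressure", "1."), ("MAXIT", "50"), ("T-EST", "372.6"),
]
selected = [
    "VariableDef_" + name
    for name, value in parameters
    if satisfies(value, ">=", 50)
]
print(selected)
```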

4.1.3 External individuals

Object properties may be defined between newly created individuals and individuals retrieved from an already existing OWL file (the so-called “external individuals”). The class of the external individual serves as the range of the object property in this case.

To retrieve the rdf:IDs of external individuals, the converter compares the value of a datatype property in the OWL file with the value of an attribute in the XML file. For example, consider the rule given below.

EXTIND("molecular_species.owl"@substance:aspen_name, Component@name, Component@userid)

This rule specifies that an OWL file named molecular_species.owl should be searched for external individuals. The desired external individual is identified through its datatype property with qualified name substance:aspen_name that takes the value given by the XML attribute name of the Component element.

The rdf:IDs of the individuals found are saved in a correspondence table, paired with the values of the userid attribute of the Component element. The userid attribute is specified in this rule because it is later used to link the found individuals with newly created individuals through object properties. This issue is explained later in Section 4.2. As an example, Table 1 lists corresponding pairs of attribute values and rdf:IDs of external individuals, which result from the application of the above mapping rule to the XML fragment given below.

<Component userid="EO" formula="C2H4O-2" name="ETHYLENE-OXIDE"/>
<Component userid="EG" formula="C2H6O2" name="ETHYLENE-GLYCOL"/>
<Component userid="DIEG" formula="C4H10O3" name="DIETHYLENE-GLYCOL"/>
<Component userid="TREG" formula="C6H14O4" name="TRIETHYLENE-GLYCOL"/>

Value of attribute (userid)    OWL file name + rdf:ID
EO                             molecular_species.owl + EthyleneOxide
EG                             molecular_species.owl + _1.2-Ethanediol
DIEG                           molecular_species.owl + _2.2-oxybis-Ethanol
TREG                           molecular_species.owl + TriethyleneGlycol

Table 1: Pairs of attribute values and rdf:IDs of individuals found in the already existing OWL file
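How such a correspondence table might be assembled is sketched below in Python. The contents of molecular_species.owl are not shown in this document, so the dictionary aspen_name_to_rdf_id is an assumption that stands in for the search over the substance:aspen_name datatype property; the wrapper element Components is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

OWL_FILE = "molecular_species.owl"

# Assumed lookup (value of substance:aspen_name -> rdf:ID); it replaces
# the search in the existing OWL file, whose contents are not shown here.
aspen_name_to_rdf_id = {
    "ETHYLENE-OXIDE": "EthyleneOxide",
    "ETHYLENE-GLYCOL": "_1.2-Ethanediol",
    "DIETHYLENE-GLYCOL": "_2.2-oxybis-Ethanol",
    "TRIETHYLENE-GLYCOL": "TriethyleneGlycol",
}

components = ET.fromstring(
    "<Components>"  # hypothetical wrapper element
    '<Component userid="EO" formula="C2H4O-2" name="ETHYLENE-OXIDE"/>'
    '<Component userid="EG" formula="C2H6O2" name="ETHYLENE-GLYCOL"/>'
    '<Component userid="DIEG" formula="C4H10O3" name="DIETHYLENE-GLYCOL"/>'
    '<Component userid="TREG" formula="C6H14O4" name="TRIETHYLENE-GLYCOL"/>'
    "</Components>"
)

# Correspondence table: value of userid -> OWL file name + rdf:ID
correspondence = {}
for component in components.iter("Component"):
    rdf_id = aspen_name_to_rdf_id.get(component.get("name"))
    if rdf_id is not None:
        correspondence[component.get("userid")] = f"{OWL_FILE}#{rdf_id}"

print(correspondence)
```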

4.2 OWL object properties

4.2.1 XML ancestor-descendant relationship → OWL object property

Object properties represent relations among OWL classes. Object properties can be defined between newly created individuals when the corresponding XML elements have an ancestor-descendant relationship, such as a parent-child relationship or a grandparent-grandchild relationship. The converter can handle object properties between any pair of conditionally or unconditionally created individuals. The syntax for the mapping rule simply defines the name of the property, followed by domain and range classes of the individuals.

As an example, consider an XML element of type ReactionNetwork, which has three child elements of type Reaction.

<ReactionNetwork name="R-1" type="POWERLAW">
<Reaction id="1" stream="MIXED"> ... </Reaction>
<Reaction id="2" stream="MIXED"> ... </Reaction>
<Reaction id="3" stream="MIXED"> ... </Reaction>
</ReactionNetwork>

Assume that OWL individuals of classes ChemicalReactionNetwork and ChemicalReaction have been created from the XML elements ReactionNetwork and Reaction, respectively, through unconditional mapping rules (cf. Section 4.1.1).

<phase_system:ChemicalReactionNetwork rdf:ID="R-1"/>
<phase_system:ChemicalReaction rdf:ID="ChemicalReaction_1"/>
<phase_system:ChemicalReaction rdf:ID="ChemicalReaction_2"/>
<phase_system:ChemicalReaction rdf:ID="ChemicalReaction_3"/>

The following mapping rule specifies that each OWL individual of the domain class ChemicalReactionNetwork is linked via an object property is_aggregated_of with individuals of the range class ChemicalReaction. Object properties are established only between those OWL individuals whose corresponding XML elements (i.e., ReactionNetwork and Reaction) have a parent-child relationship.

OTP(phase_system:is_aggregated_of, ChemicalReactionNetwork, ChemicalReaction)

The resulting OWL code is given below.

<phase_system:ChemicalReactionNetwork rdf:about="#R-1">
<phase_system:is_aggregated_of rdf:resource="#ChemicalReaction_1"/>
<phase_system:is_aggregated_of rdf:resource="#ChemicalReaction_2"/>
<phase_system:is_aggregated_of rdf:resource="#ChemicalReaction_3"/>
</phase_system:ChemicalReactionNetwork>
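The linking logic of this OTP rule can be sketched in Python under one assumption: the converter remembers, for every created individual, the XML element it was derived from. The dictionaries network_ids and reaction_ids and the function link_descendants are hypothetical names that model this bookkeeping.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    '<ReactionNetwork name="R-1" type="POWERLAW">'
    '<Reaction id="1" stream="MIXED"/>'
    '<Reaction id="2" stream="MIXED"/>'
    '<Reaction id="3" stream="MIXED"/>'
    "</ReactionNetwork>"
)

# Assumed bookkeeping: XML element -> rdf:ID of the derived individual.
network_ids = {source: source.get("name")}
reaction_ids = {
    reaction: f"ChemicalReaction_{count}"
    for count, reaction in enumerate(source.iter("Reaction"), start=1)
}


def link_descendants(domain_ids, range_ids):
    """Pair each domain individual with the range individuals whose
    XML elements are descendants of the domain's XML element."""
    links = []
    for domain_element, domain_id in domain_ids.items():
        for descendant in domain_element.iter():
            if descendant is not domain_element and descendant in range_ids:
                links.append((domain_id, range_ids[descendant]))
    return links


links = link_descendants(network_ids, reaction_ids)
for domain_id, range_id in links:
    print(f'#{domain_id} is_aggregated_of #{range_id}')
```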

4.2.2 Link established via an XML attribute value → OWL object property

A link between XML elements may also be established via an unambiguous attribute value. OWL individuals derived from such XML elements can then be related by an object property.

For example, the XML elements:

<source_element name="source_1" linking_attribute="link_1"/>
<target_element name="target_1" linking_attribute="link_1"/>

are explicitly related by the attribute value "link_1".

Now assume that two individuals named source_1 and target_1 were created from these two XML elements; the individuals are instances of the OWL classes Source and Target, respectively. To establish an object property named link between these two individuals, the following rule needs to be specified:

OTP(link, Source, Target, source_element@linking_attribute, target_element@linking_attribute)

The result would be

<Source rdf:about="#source_1">
<link rdf:resource="#target_1"/>
</Source>
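Linking via a shared attribute value amounts to a join on that value. The following Python sketch illustrates the idea; the wrapper element root and the variable names are hypothetical, and the rdf:IDs are assumed to equal the name attributes, as in the example above.

```python
import xml.etree.ElementTree as ET

# <root> is a hypothetical wrapper around the two example elements.
root = ET.fromstring(
    "<root>"
    '<source_element name="source_1" linking_attribute="link_1"/>'
    '<target_element name="target_1" linking_attribute="link_1"/>'
    "</root>"
)

# Index the target individuals by the value of the linking attribute.
targets_by_link = {
    target.get("linking_attribute"): target.get("name")
    for target in root.iter("target_element")
}

# Join: a source is linked to the target with the same attribute value.
pairs = [
    (source.get("name"), targets_by_link[source.get("linking_attribute")])
    for source in root.iter("source_element")
    if source.get("linking_attribute") in targets_by_link
]
print(pairs)
```

The dictionary lookup relies on the attribute value being unambiguous, as the text requires; with duplicate values among the targets, only the last one would be kept.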

4.2.3 Object property connecting individuals derived from the same XML element

Object properties can also be established between domain and range class individuals that were derived from the same XML element. As an example, consider the following XML code.

<SubmodelDefinition name="DEGCOL" href="RADFRAC"> ... </SubmodelDefinition>
<SubmodelDefinition name="EGCOL" href="RADFRAC"> ... </SubmodelDefinition>

For each XML element of type SubmodelDefinition, two individuals have been created: one individual of class RadFrac, which has been created conditionally based on the value of the href attribute (as explained in Section 4.1.2), and one individual of class BehavioralUnit, which has been created unconditionally.

<aspen_plus_model:RadFrac rdf:ID="DEGCOL"/>
<aspen_plus_model:RadFrac rdf:ID="EGCOL"/>
<behavior:BehavioralUnit rdf:ID="Unit_DEGCOL"/>
<behavior:BehavioralUnit rdf:ID="Unit_EGCOL"/>

The following rule creates an object property models with a namespace prefix of mathematical_model between the individuals of the domain class RadFrac and the range class BehavioralUnit. This linking applies only to those OWL individuals that have been derived from the same XML SubmodelDefinition element.

OTP(mathematical_model:models, RadFrac, BehavioralUnit)

If the rule is applied to the above XML code, the converter creates the following OWL code.

<aspen_plus_model:RadFrac rdf:about="#DEGCOL">
<mathematical_model:models rdf:resource="#Unit_DEGCOL"/>
</aspen_plus_model:RadFrac>
<aspen_plus_model:RadFrac rdf:about="#EGCOL">
<mathematical_model:models rdf:resource="#Unit_EGCOL"/>
</aspen_plus_model:RadFrac>

4.2.4 Object property connecting a newly created individual and an external individual

An object property may also be established between a newly created individual and an external individual (cf. Section 4.1.3). As a precondition, the rdf:IDs of both the newly created individuals and the corresponding external individuals must already have been identified by means of an EXTIND rule, as explained in Section 4.1.3. The XML elements that correspond to the domain and range class individuals must have an ancestor-descendant relationship.

For instance, consider the following mapping rule.

OTPEXT(phase_system:has_reactant, ChemicalReaction, Stoichiometry@component, Stoichiometry@coefficient<0)

For each occurrence of an XML element of type Stoichiometry, the rule creates an object property named has_reactant between individuals of the domain class ChemicalReaction and external individuals which have been identified through a corresponding EXTIND rule; the value of the component attribute of the Stoichiometry element must match a value in the correspondence table that has been created as the result of the EXTIND rule (cf. Table 1). The rule also incorporates a test condition that checks whether the value of the coefficient attribute is less than zero.

Assume that the rule is applied to the following XML code.

<Reaction id="3" stream="MIXED">
<Stoichiometry component="EO" coefficient="-1."/>
<Stoichiometry component="H2O" coefficient="-1."/>
<Stoichiometry component="EG" coefficient="1."/>
</Reaction>

<Reaction id="4" stream="MIXED">
<Stoichiometry component="EO" coefficient="-1."/>
<Stoichiometry component="EG" coefficient="-1."/>
<Stoichiometry component="DIEG" coefficient="1."/>
</Reaction>

As a result, the converter will create the following OWL code.

<phase_system:ChemicalReaction rdf:about="#ChemicalReaction_3">
<phase_system:has_reactant rdf:resource="molecular_species.owl#EthyleneOxide"/>
<phase_system:has_reactant rdf:resource="molecular_species.owl#Water"/>
</phase_system:ChemicalReaction>
<phase_system:ChemicalReaction rdf:about="#ChemicalReaction_4">
<phase_system:has_reactant rdf:resource="molecular_species.owl#EthyleneOxide"/>
<phase_system:has_reactant rdf:resource="molecular_species.owl#_1.2-Ethanediol"/>
</phase_system:ChemicalReaction>
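The interplay of correspondence table, ancestor-descendant relationship and test condition in this OTPEXT rule can be sketched in Python. Several details are assumptions made for the illustration: the wrapper element Reactions, the entry for H2O in the correspondence table (inferred from the output above), and the derivation of the rdf:ID ChemicalReaction_N from the id attribute.

```python
import xml.etree.ElementTree as ET

# Assumed correspondence table (component userid -> external individual).
correspondence = {
    "EO": "molecular_species.owl#EthyleneOxide",
    "H2O": "molecular_species.owl#Water",
    "EG": "molecular_species.owl#_1.2-Ethanediol",
    "DIEG": "molecular_species.owl#_2.2-oxybis-Ethanol",
}

root = ET.fromstring(
    "<Reactions>"  # hypothetical wrapper element
    '<Reaction id="3" stream="MIXED">'
    '<Stoichiometry component="EO" coefficient="-1."/>'
    '<Stoichiometry component="H2O" coefficient="-1."/>'
    '<Stoichiometry component="EG" coefficient="1."/>'
    "</Reaction>"
    '<Reaction id="4" stream="MIXED">'
    '<Stoichiometry component="EO" coefficient="-1."/>'
    '<Stoichiometry component="EG" coefficient="-1."/>'
    '<Stoichiometry component="DIEG" coefficient="1."/>'
    "</Reaction>"
    "</Reactions>"
)

reactants = {}
for reaction in root.iter("Reaction"):
    # Assumption: the rdf:ID of the domain individual follows the id attribute.
    domain_id = "ChemicalReaction_" + reaction.get("id")
    for stoichiometry in reaction.iter("Stoichiometry"):
        resource = correspondence.get(stoichiometry.get("component"))
        # Test condition of the rule: coefficient < 0
        if resource is not None and float(stoichiometry.get("coefficient")) < 0:
            reactants.setdefault(domain_id, []).append(resource)

print(reactants)
```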

If the rdf:ID of an external individual is known beforehand, there is no need to perform a search such as the one triggered by the EXTIND rule (cf. Section 4.1.3). Instead, the ID can be specified directly in the OTPEXT rule. Consider the following OWL fragment that contains two individuals of class ModelVariableDef.

<equation_system:ModelVariableDef rdf:ID="VariableDef_pressure"/>
<equation_system:ModelVariableDef rdf:ID="VariableDef_NTUBE"/>

The rule below creates an object property is_of_type that runs from individuals of class ModelVariableDef to an external individual PARAMETER that is located in the file equation_system.owl.

OTPEXT(equation_system:is_of_type, ModelVariableDef, "equation_system.owl#PARAMETER")

If this rule is applied to the above OWL fragment, it results in the following OWL code.

<equation_system:ModelVariableDef rdf:about="#VariableDef_pressure">
<equation_system:is_of_type rdf:resource="equation_system.owl#PARAMETER"/>
</equation_system:ModelVariableDef>
<equation_system:ModelVariableDef rdf:about="#VariableDef_NTUBE">
<equation_system:is_of_type rdf:resource="equation_system.owl#PARAMETER"/>
</equation_system:ModelVariableDef>

4.3 OWL datatype properties

4.3.1 XML attribute → OWL datatype property

The attribute value of an XML element can be mapped to the datatype property of an OWL individual that has been derived from the same XML element. The range of this datatype property can be one of the following XML Schema datatypes: xsd:string or xsd:float.

In the following example, individuals of class VariableSpecification have been previously created from XML elements of type Parameter. By the rule given below, a string datatype property named has_unit is established for each individual of class VariableSpecification. If the corresponding Parameter element has an attribute unit, the datatype property takes the value of the unit attribute. For individuals corresponding to those Parameter elements that do not have a unit attribute, the datatype property takes the value “-”.

DTP(equation_system:has_unit, VariableSpecification, Parameter@unit, string)

The rule is applied to the following XML code.

<SubmodelDefinition name="WATCOL" href="RADFRAC">
<Parameter name="NSTAGE" value="25"/>
</SubmodelDefinition>
<SubmodelDefinition name="REAKTOR" href="RSTOIC">
<Parameter name="TEMP" value="250." unit="C"/>
<Parameter name="PRES" value="20." unit="bar"/>
</SubmodelDefinition>

The resulting OWL code is shown below.

<equation_system:VariableSpecification rdf:about="#Spec_NSTAGE">
<equation_system:has_unit rdf:datatype="&xsd;#string">-</equation_system:has_unit>
</equation_system:VariableSpecification>
<equation_system:VariableSpecification rdf:about="#Spec_TEMP">
<equation_system:has_unit rdf:datatype="&xsd;#string">C</equation_system:has_unit>
</equation_system:VariableSpecification>
<equation_system:VariableSpecification rdf:about="#Spec_PRES">
<equation_system:has_unit rdf:datatype="&xsd;#string">bar</equation_system:has_unit>
</equation_system:VariableSpecification>
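The fallback to “-” for a missing attribute is the only non-obvious step of this DTP rule; a minimal Python sketch of it follows. The wrapper element Model is hypothetical, and the rdf:IDs are assumed to be formed as Spec_ plus the name attribute, matching the output above.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Model>"  # hypothetical wrapper element
    '<SubmodelDefinition name="WATCOL" href="RADFRAC">'
    '<Parameter name="NSTAGE" value="25"/>'
    "</SubmodelDefinition>"
    '<SubmodelDefinition name="REAKTOR" href="RSTOIC">'
    '<Parameter name="TEMP" value="250." unit="C"/>'
    '<Parameter name="PRES" value="20." unit="bar"/>'
    "</SubmodelDefinition>"
    "</Model>"
)

# rdf:ID of the individual -> value of the has_unit datatype property;
# "-" is used when the Parameter element has no unit attribute.
units = {
    "Spec_" + parameter.get("name"): parameter.get("unit", "-")
    for parameter in root.iter("Parameter")
}
print(units)
```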

4.3.2 XML text element → OWL datatype property

The content of an XML text element can be mapped to the datatype property of an OWL individual. In this case, the XML text element must be a descendant of the XML element from which the OWL individual owning the datatype property was derived.

In the mapping rule, the XML text element is identified by an XPath expression of type xml_element/child_element. If an XML element has several child text elements that match the XPath expression, their contents are combined and then represented by a single datatype property, using newline as a separator character.

For example, if the rule

DTP(hasComment, Element, element/comment/line, string)

is applied to the XML fragment

<element id="e_1">
<comment>
<line>This is a comment.</line>
<line>The comment is continued in this line.</line>
</comment>
</element>

the following OWL code will be produced by the converter:

<Element rdf:about="#e_1">
<hasComment rdf:datatype="&xsd;#string">This is a comment.
The comment is continued in this line.</hasComment>
</Element>
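The combination of several text elements into a single property value can be sketched in Python as follows. The sketch assumes that the path comment/line is evaluated relative to the element from which the individual was derived, and that surrounding whitespace is stripped before the contents are joined; neither detail is confirmed by the converter's source.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring(
    '<element id="e_1">'
    "<comment>"
    "<line>This is a comment.</line>"
    "<line>The comment is continued in this line.</line>"
    "</comment>"
    "</element>"
)

# Collect all text elements matching the path and join them,
# using newline as the separator character.
lines = [line.text.strip() for line in element.findall("comment/line")]
comment = "\n".join(lines)
print(comment)
```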

5. Usage/Download

XML2OWL is a C++ based program, which can be executed from the command prompt (DOS shell) of Microsoft Windows. The program has been developed using Microsoft Visual C++ .NET 2005 and may not be portable to operating systems other than Windows without making changes to the source code.

Assuming that the executable (xml2owl.exe) is located in the current working directory, the command for executing the program is given below:

>> xml2owl -r <rules_def_file> -i <input_xml_file> [-o <output_owl_file>]

For example, for a rules definition file named mapping.rules and an input XML file named example.xml, both located in the current working directory, the above command would look like

>> xml2owl -r mapping.rules -i example.xml

If the optional -o flag is not used, the output OWL file is named after the input XML file. Executing the program with no input arguments outputs the usage instructions for the program:

>> XML to OWL Converter (xml2owl)

>> usage: xml2owl -r <rules_def_file> -i <input_xml_file> [-o <output_owl_file>]
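The command-line interface described above can be mirrored with a small Python sketch. It is not the C++ argument handling of xml2owl: the function parse_arguments is hypothetical, and the choice of .owl as the extension of the default output file is an assumption, since the text only states that the output file is named after the input file.

```python
import argparse
from pathlib import Path


def parse_arguments(argv):
    """Parse the -r, -i and optional -o flags of the usage line."""
    parser = argparse.ArgumentParser(
        prog="xml2owl", description="XML to OWL Converter (xml2owl)"
    )
    parser.add_argument("-r", dest="rules", required=True, metavar="rules_def_file")
    parser.add_argument("-i", dest="input", required=True, metavar="input_xml_file")
    parser.add_argument("-o", dest="output", metavar="output_owl_file")
    args = parser.parse_args(argv)
    if args.output is None:
        # Assumption: same base name as the input file, extension .owl
        args.output = str(Path(args.input).with_suffix(".owl"))
    return args


args = parse_arguments(["-r", "mapping.rules", "-i", "example.xml"])
print(args.rules, args.input, args.output)
```

Unlike the converter, which prints its usage instructions when called without arguments, this sketch lets argparse report the missing required flags.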

The converter executable (.exe) along with the required DLLs can be downloaded from here (1.42 MB). The development package (10.0 MB) contains the Visual C++ .Net 2005 solution files along with the source code and the required libraries.

6. Related Work

6.1 XML2OWL XSLT

XML2OWL XSLT is a framework for converting an XML/XML Schema file to OWL file(s). For a given XML instance file, the converter creates two OWL files: the first one contains an ontology that introduces classes and properties with cardinality constraints on minimum and maximum occurrences; the second file contains the actual data, represented as instantiations of the previously created ontology.

The conversion steps include the extraction of an XML schema file, the generation of an OWL ontology and a configured stylesheet (for creating the OWL instances file), and finally the creation of the OWL instance file.

There are some significant differences between “XML2OWL XSLT” and our converter that are stated below.

1. The ontology generated by the XML2OWL XSLT framework is automatically derived from a given XML schema. As a result, it can only contain semantic information that was already present in the XML schema: basically class hierarchies as well as some elementary class definitions and object properties. Such a rather simple ontology, commonly known as a “lightweight ontology”, is not sufficient for semantic applications that require sophisticated reasoning.

Our converter, by contrast, maps XML data onto an existing ontology. Consequently, no limitations are imposed on the semantic richness of the target ontology. If required, the data can be mapped onto a carefully crafted “heavyweight ontology” containing complex properties, property restrictions, axioms, etc., which can be exploited by sophisticated software applications.

2. The XML2OWL XSLT framework uses fixed conversion rules for the creation of classes and properties. In contrast, our converter allows the definition of individual rules within a rules definition file. This approach requires a certain additional effort but is much more flexible, as will be elaborated in the following.

a. XML2OWL XSLT creates an OWL class for each element type in the input XML file. Thus, a selective mapping of only some of the element types is not possible. Moreover, the class name is the same as the type name of the corresponding XML element. The identifiers of the created OWL individuals consist of a seven-digit random number.

In our approach, the OWL individuals may also be created conditionally, where only those XML elements of a certain type are mapped that satisfy a test condition. The mapping rule for creating OWL individuals gives complete flexibility in selecting the XML element type and the corresponding OWL class. The OWL individuals have meaningful names and may even contain the value of an XML attribute.

b. XML2OWL XSLT creates a datatype property for each attribute of each element present in the XML source file, where the domain of the datatype property is the class corresponding to the XML element and the range is xsd:string. The datatype property takes the name of the XML attribute; in addition, a (global) prefix may be prepended to the name of the properties.

Our converter defines datatype properties using the DTP rule. For each instance of the DTP rule, a datatype property is defined for individuals of a given class. The value of the datatype property is equal to the value of an attribute of the corresponding XML element. The datatype can be either xsd:string or xsd:float. The user may define an arbitrary name for the datatype property.

c. The XML2OWL XSLT converter creates an object property where the elements corresponding to domain and range classes have a parent-child relationship. Each parent-child relationship in the XML instance file is mapped to an object property. The object property takes its name from a globally specified prefix, followed by the name of the respective child element. A further limitation is that an object property cannot be created between two individuals if the corresponding XML elements have an ancestor-descendant relationship other than the direct parent-child relationship.

In our converter, object properties are created only between selected individuals. Two individuals linked via an object property must have been derived either from the same XML element or from separate XML elements that have an ancestor-descendant relationship. The converter may also link derived individuals with individuals from an already existing OWL file. As in the case of datatype properties, the user may freely choose names for the object properties.

We tried the XML2OWL demonstration platform in early June 2006 for converting some example CapeML files to OWL instances files. We noticed that the OWL ontology was generated correctly but the OWL instances file was not created properly. The converter did not create the correct number of individuals for a given type of XML elements – for example, only one individual of class Stoichiometry was created, even though there were 13 corresponding XML elements in the XML source file. As a result, the associated object and datatype properties were also incorrect.

6.2 SWAD-Europe Mapping Tool

A mapping tool for manual generation of mappings between existing OWL ontologies and XML Schemas has been developed as part of the SWAD-Europe project. This Java-based mapping tool can be used to graphically define mappings, which are then stored in a specific XML format. However, there is no converter available that executes the specified rules. In the following, we present a comparison between the mapping rules defined in the SWAD-Europe project and those applied in our approach.

In the SWAD-Europe mapping tool, the mapping between an OWL class and an XML element is specified by using the complete XPath of the XML element. An abstract representation of the mapping element is given below.

<map:classMap>
<map:source class="class_name"/>
<map:target path='/xsd:schema/xsd:element[@name="xml_element"]'/>
</map:classMap>

This mapping rule is similar to the IND rule used in our approach. The IND rule gives more flexibility by allowing conditional creation of individuals based on the value of an attribute. Furthermore, the identifiers of the OWL individuals may contain the value of an attribute, a prefix string and/or a suffix string. The syntax of the corresponding IND rule is given below.

IND(class_name, xml_element, "prefix_string"+xml_element@attribute1+"suffix_string", [xml_element@attribute2="attribute_value"] )

The structure of an example datatype mapping element of the SWAD-Europe tool is given below. The domain and the range are mapped to the corresponding XML element and attribute respectively.

<map:propertyMap>
<map:source property="property_name"/>
<map:target path='/xsd:schema/xsd:complexType[@name="xml_element"]/xsd:attribute[@name="xml_attribute"]'/>
<map:domain path='/xsd:schema/xsd:element[@name="owl_class"]'/>
<map:range path='/xsd:schema/xsd:complexType[@name="xml_element"]/xsd:attribute[@name="xml_attribute"]'/>
</map:propertyMap>

Our converter defines a datatype property using the DTP mapping rule. The rule’s definition includes the property’s name, the XML element that was mapped to the domain class, the XML attribute and the OWL datatype.

DTP(property_name, domain_class, xml_element@attribute, datatype)

One difference here is that the xml_element must be the one that has been used to create the individuals of the domain class in the first place. This restriction may not be imposed on datatype properties by the SWAD-Europe mapping tool.

The definition of mapping rules for object properties is similar to that of datatype properties in the SWAD-Europe mapping tool. The only difference is that, for an object property, the map:range child element of the map:propertyMap element gives the XPath expression for a schema element instead of an attribute.

Our converter uses the OTP mapping rule for the creation of object properties. An advantage of the OTP rule over the corresponding rule of the SWAD-Europe mapping tool is that the OTP rule may also include a test condition. Moreover, object properties may be created between newly created individuals and individuals defined in an existing OWL instance file by means of an OTPEXT rule.

During the testing of the SWAD-Europe mapping tool in early June 2006, several of its functions did not work properly, such as the drag ‘n’ drop option for graphical mapping or the highlighting of the datatype and object properties of an OWL class. Consequently, we could not evaluate the mapping tool for our test case of a CapeML file and the OntoCAPE ontology.