Schema tutorials

How to validate an XML document against a schema, how to express algorithmic rules and co-occurrence constraints in active schemas, and how to design a semantic data type. Most of what is shown here cannot be achieved with DTDs, Relax NG schemas, or W3C XML Schema schemas, and sometimes not even with Schematron: if you want more control over your XML documents, this is the right place.

About XML validation

This section is dedicated to validation with the Active Schema Language. To validate with a DTD or a W3C XML Schema, please refer to DTD validation. To validate with Relax NG or Schematron, you have to implement your own active tag; see the "how-to" section for that purpose.

XML validation with Active Schema (co-occurrence constraint on table columns)

The idea of the Active Schema Language (ASL) is to be more expressive than other schema technologies (DTD, W3C XML Schema, Relax NG). For this purpose, ASL lets you define dynamic (active) content models, makes occurrence boundaries computable, and lets you mix other modules into the schema if necessary, for example to retrieve with SQL, from an RDBMS, the values allowed in an attribute. ASL also offers smart support for datatypes, as shown in this other example.

In this example, we wish to apply a simple constraint to <table>s: the number of <cell>s must be the same in all <column>s. This constraint can't be expressed in any of the schema technologies listed above. We show how to define a simple schema that expresses this constraint, and how to display an error report.

  • The sample data
  • The schema :

    [doc/tutorial/schema/co-oc/schema.asl]

    <?xml version="1.0" encoding="iso-8859-1"?>
    <asl:active-schema target=""
        xmlns:xcl="http://ns.inria.org/active-tags/xcl"
        xmlns:asl="http://ns.inria.org/active-schema">
        <!--
            Testing remote co-occurrence constraint :
            checking that the number of <cell>s is the same in any <column> of a <table>
        -->
        <asl:element name="test" root="always">
            <asl:attribute ref-ns="xml" min-occurs="0"/>
            <asl:sequence>
                <asl:element ref-elem="table" min-occurs="0" max-occurs="unbounded"/>
            </asl:sequence>
        </asl:element>
        <asl:element name="table">
            <asl:sequence>
                <asl:element ref-elem="column" min-occurs="1" max-occurs="unbounded"/>
            </asl:sequence>
        </asl:element>
        <asl:element name="column">
            <asl:sequence>
                <xcl:if test="{ asl:element()/preceding-sibling::column }">
                    <xcl:then>
                        <asl:element ref-elem="cell"
                            min-occurs="{ $asl:max-occurs }"
                            max-occurs="{ count( asl:element()/../column[1]/cell ) }"/>
                    </xcl:then>
                    <xcl:else>
                        <asl:element ref-elem="cell" min-occurs="1" max-occurs="unbounded"/>
                    </xcl:else>
                </xcl:if>
            </asl:sequence>
        </asl:element>
    </asl:active-schema>

    This <asl:active-schema> declares some <asl:element>s. Each defines a simple content model with an <asl:sequence>, but the last content model is dynamic: depending on whether a preceding <column> exists, we simply switch from one content model to another. As a result, the first <column> defines the number of <cell>s allowed in the others.
  • The active sheet that reports validation errors :

    [doc/tutorial/schema/co-oc/validate.xcl]

    <?xml version="1.0" encoding="iso-8859-1"?>
    <!--
        A validator that validates XML documents with an Active Schema,
        and display the errors encountered
    -->
    <xcl:active-sheet
        xmlns:xcl="http://ns.inria.org/active-tags/xcl"
        xmlns:asl="http://ns.inria.org/active-schema">
        <xcl:parse name="datas" source="datas.xml"/>
        <asl:parse-schema name="schema" source="schema.asl"/>
        <asl:validate schema="{ $schema }" node="{ $datas }" deep="yes" report="report"/>
        <!--display the errors-->
        <xcl:for-each name="err" select="{ $report }">
            <xcl:echo value="Error { name( value( $err/@reason-id ) ) }"/>
            <xcl:echo value=" node : { value( $err/@path ) }"/>
            <xcl:echo value=" candidate : { value( $err/@candidate-path ) }"
                xcl:if="{ $err/@candidate-path }"/>
            <xcl:echo value=" candidate value : { string( $err/@candidate ) }"
                xcl:if="{ not( $err/@candidate-path ) and value( $err/@candidate ) }"/>
            <xcl:echo value=" rule : { value( $err/@rule-path ) }"/>
        </xcl:for-each>
    </xcl:active-sheet>

  • Open a console and, at the prompt, type the following command from the RefleX home directory (note that the (line cut) marker means that you MUST NOT insert a line break) :
     $ java -jar reflex-0.4.0.jar (line cut)
         run doc/tutorial/schema/co-oc/validate.xcl
  • The error report displayed :
    Error asl:elementExpected
            node : /test[1]/table[3]/column[2]
            rule : /asl:active-schema[1]/asl:element[3]/asl:sequence[1]/xcl:if[1]/xcl:then[1]/asl:element[1]
    Error asl:noMoreContentAllowed
            node : /test[1]/table[4]/column[3]
       candidate : /test[1]/table[4]/column[3]/cell[3]
            rule : /asl:active-schema[1]/asl:element[3]
    Error asl:elementExpected
            node : /test[1]/table[4]/column[4]
            rule : /asl:active-schema[1]/asl:element[3]/asl:sequence[1]/xcl:if[1]/xcl:then[1]/asl:element[1]
    Error asl:elementExpected
            node : /test[1]/table[5]
            rule : /asl:active-schema[1]/asl:element[2]/asl:sequence[1]/asl:element[1]
    Error asl:noMoreContentAllowed
            node : /test[1]/table[5]
       candidate : /test[1]/table[5]/cell[1]
            rule : /asl:active-schema[1]/asl:element[2]
    Error asl:noMoreContentAllowed
            node : /test[1]/table[5]
       candidate : /test[1]/table[5]/cell[2]
            rule : /asl:active-schema[1]/asl:element[2]
Error report

In this version of RefleX, getting an error report is not straightforward; a future release will provide more suitable objects for obtaining formatted messages.

Co-occurrence constraint on classification level

Schematron was designed to fill the gaps left by other schema technologies (DTD, W3C XML Schema, Relax NG). Unfortunately, Schematron expresses additional assertions outside the context of grammar validation; that is to say, within an XML editor, the user will be able to insert an element that Schematron will then reject. ASL, on the contrary, acts directly on the content model.

In this example from www.xfront.com, for the instance document to be valid, the classification level of the <Para> elements must not be higher than the classification level of the host <Document> element. Instead of using 2 schema technologies, ASL does the job directly :

  • The sample data :

    [doc/tutorial/schema/classification/classification.xml]

    <?xml version="1.0" encoding="iso-8859-1"?>
    <Set>
        <!--valid-->
        <Document classification="secret">
            <Para classification="unclassified">
                One if by land, two if by sea;
            </Para>
            <Para classification="confidential">
                And I on the opposite shore will be,
                Ready to ride and spread the alarm
            </Para>
            <Para classification="unclassified">
                Ready to ride and spread the alarm
                Through every Middlesex, village and farm,
            </Para>
            <Para classification="secret">
                For the country folk to be up and to arm.
            </Para>
        </Document>
        <!--invalid-->
        <Document classification="confidential">
            <Para classification="unclassified">
                One if by land, two if by sea;
            </Para>
            <Para classification="confidential">
                And I on the opposite shore will be,
                Ready to ride and spread the alarm
            </Para>
            <Para classification="secret">
                Ready to ride and spread the alarm
                Through every Middlesex, village and farm,
            </Para>
            <Para classification="top-secret">
                For the country folk to be up and to arm.
            </Para>
        </Document>
    </Set>

  • The schema :

    [doc/tutorial/schema/classification/schema.asl]

    <?xml version="1.0" encoding="iso-8859-1"?>
    <asl:active-schema target=""
        xmlns:xcl="http://ns.inria.org/active-tags/xcl"
        xmlns:asl="http://ns.inria.org/active-schema"
        xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes">
        <!--
            Testing co-occurrence constraint :
            checking that a classification level is not higher than the enclosing classification level
            an integer value is bound to the attribute
        -->
        <asl:element name="Set" root="always">
            <asl:attribute ref-ns="xml" min-occurs="0"/>
            <asl:sequence>
                <asl:element ref-elem="Document" min-occurs="1" max-occurs="unbounded"/>
            </asl:sequence>
        </asl:element>
        <asl:element name="Document">
            <asl:attribute name="classification" ref-type="classification"/>
            <asl:sequence>
                <asl:element ref-elem="Para" min-occurs="1" max-occurs="unbounded"/>
            </asl:sequence>
        </asl:element>
        <asl:element name="Para">
            <asl:attribute name="classification" ref-type="classification"/>
            <asl:sequence>
                <asl:text ref-type="xs:string" min-occurs="0"/>
            </asl:sequence>
        </asl:element>
        <asl:type name="classification">
            <asl:choice min-occurs="0">
                <asl:text
                    xcl:if="{ asl:element()/self::Document or asl:element()/parent::Document/@classification &lt;= 1 }"
                    ignore="yes" min-occurs="0" value="top-secret">
                    <asl:interim>
                        <xcl:update referent="{ $asl:data }" operand="1"/>
                    </asl:interim>
                </asl:text>
                <asl:text
                    xcl:if="{ asl:element()/self::Document or asl:element()/parent::Document/@classification &lt;= 2 }"
                    ignore="yes" min-occurs="0" value="secret">
                    <asl:interim>
                        <xcl:update referent="{ $asl:data }" operand="2"/>
                    </asl:interim>
                </asl:text>
                <asl:text
                    xcl:if="{ asl:element()/self::Document or asl:element()/parent::Document/@classification &lt;= 3 }"
                    ignore="yes" min-occurs="0" value="confidential">
                    <asl:interim>
                        <xcl:update referent="{ $asl:data }" operand="3"/>
                    </asl:interim>
                </asl:text>
                <asl:text ignore="yes" min-occurs="0" value="unclassified">
                    <asl:interim>
                        <xcl:update referent="{ $asl:data }" operand="4"/>
                    </asl:interim>
                </asl:text>
            </asl:choice>
        </asl:type>
    </asl:active-schema>

    This <asl:active-schema> declares some <asl:element>s. Each defines a simple content model with an <asl:sequence> and with <asl:attribute> definitions; the classification attributes refer to a custom type named #classification, which is defined with the <asl:type> element.
    The definition of the #classification type consists of :
    • drawing up the list of the 4 labels available ("top-secret", "secret", "confidential", "unclassified") with the <asl:text> element
    • binding each label to an integer value (1..4) ; this is done with the <asl:interim> and <xcl:update> elements when the text matches the attribute value
    • for a <Para> element, checking whether the level of the parent <Document> element is compatible ; if not, the label is not available in the list (the <asl:text> element is skipped) ; this is done with the @xcl:if attribute. Notice that the @ignore attribute is not involved in this behaviour : it only indicates that, when the text is matched, the text value must be ignored when building the typed data bound to the attribute.
  • The active sheet that reports validation errors :

    [doc/tutorial/schema/classification/validate.xcl]

    <?xml version="1.0" encoding="iso-8859-1"?>
    <xcl:active-sheet
        xmlns:xcl="http://ns.inria.org/active-tags/xcl"
        xmlns:sys="http://ns.inria.org/active-tags/sys"
        xmlns:asl="http://ns.inria.org/active-schema">
        <asl:parse-schema name="schema" source="schema.asl"/>
        <xcl:parse name="datas" source="classification.xml"/>
        <asl:validate schema="{ $schema }" node="{ $datas }" augment="yes" deep="yes" report="report"/>
        <xcl:for-each name="err" select="{ $report }">
            <xcl:echo value="Error { name( value( $err/@reason-id ) ) }"/>
            <xcl:echo value=" node : { value( $err/@path ) }"/>
            <xcl:echo value=" candidate : { value( $err/@candidate-path ) }"
                xcl:if="{ $err/@candidate-path }"/>
            <xcl:echo value=" candidate value : { string( $err/@candidate ) }"
                xcl:if="{ $err/@candidate }"/>
            <xcl:echo value=" rule : { value( $err/@rule-path ) }"/>
        </xcl:for-each>
    </xcl:active-sheet>

  • To run the script from the console prompt :
     $ java -jar reflex-0.4.0.jar (line cut)
         run doc/tutorial/schema/classification/validate.xcl
  • The error report displayed :
    As the classification level of the second <Document> element is "confidential", the only text values available in the enclosed <Para> elements are "confidential" and "unclassified"; other text values are rejected, as shown below :
    Error asl:badAttributeValue
                 node : /Set[1]/Document[2]/Para[3]
            candidate : /Set[1]/Document[2]/Para[3]/@classification
      candidate value : secret
                 rule : /asl:active-schema[1]/asl:element[3]/asl:attribute[1]
    Error asl:badAttributeValue
                 node : /Set[1]/Document[2]/Para[4]
            candidate : /Set[1]/Document[2]/Para[4]/@classification
      candidate value : top-secret
                 rule : /asl:active-schema[1]/asl:element[3]/asl:attribute[1]
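
The rule enforced by the #classification type can be restated as a plain comparison of the integers bound to the labels. The following Python sketch is only an illustration of that rule, not of ASL's mechanics: the shortened sample document and the function name `bad_paras` are assumptions, while the label-to-integer mapping is the one used in the schema.

```python
import xml.etree.ElementTree as ET

# Integer bound to each label, as in the schema (1 is the highest level).
LEVELS = {"top-secret": 1, "secret": 2, "confidential": 3, "unclassified": 4}

# Hypothetical, shortened sample (the tutorial's poem text is omitted).
SAMPLE = """
<Set>
  <Document classification="secret">
    <Para classification="unclassified">a</Para>
    <Para classification="secret">b</Para>
  </Document>
  <Document classification="confidential">
    <Para classification="unclassified">c</Para>
    <Para classification="secret">d</Para>
    <Para classification="top-secret">e</Para>
  </Document>
</Set>
"""

def bad_paras(root):
    """Return (path, label) for each <Para> classified higher than its <Document>."""
    errors = []
    for d, doc in enumerate(root.findall("Document"), start=1):
        doc_level = LEVELS[doc.get("classification")]
        for p, para in enumerate(doc.findall("Para"), start=1):
            label = para.get("classification")
            # A label is allowed only if the document level is <= the label's integer.
            if LEVELS[label] < doc_level:
                errors.append((f"/Set[1]/Document[{d}]/Para[{p}]", label))
    return errors

for path, label in bad_paras(ET.fromstring(SAMPLE)):
    print(f"{path}: '{label}' is not allowed here")
```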
    

Expressing algorithmic rules with Active Schema

Once again, a single Active Schema can produce a content model dynamically without the help of Schematron, which is simpler and more efficient.

In this example from www.xfront.com, for the instance document to be valid, the sum of the <Candidate> values must be 100. Instead of using Schematron, the expected content model is built with ASL :

  • The sample data :

    [doc/tutorial/schema/vote-count/datas-total-100.xml]

    <?xml version="1.0" encoding="iso-8859-1"?>
    <ElectionResultsByPercentage>
        <Candidate name="John">61</Candidate>
        <Candidate name="Sara">24</Candidate>
        <Candidate name="Bill">15</Candidate>
    </ElectionResultsByPercentage>

  • The schema :

    [doc/tutorial/schema/vote-count/schema.asl]

    <?xml version="1.0" encoding="iso-8859-1"?>
    <asl:active-schema target=""
        xmlns:xcl="http://ns.inria.org/active-tags/xcl"
        xmlns:asl="http://ns.inria.org/active-schema"
        xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes">
        <!--
            Testing algorithmic validation :
            checking that the sum of the Candidate values must be 100.
        -->
        <asl:element name="ElectionResultsByPercentage" root="always">
            <asl:attribute ref-ns="xml" min-occurs="0"/>
            <asl:sequence>
                <xcl:set name="total" value="{ sum( asl:element()/Candidate ) }"/>
                <asl:element ref-elem="Candidate"
                    min-occurs="{ count( asl:element()/Candidate ) + number( $total &lt; 100 ) }"
                    max-occurs="{ count( asl:element()/Candidate ) - number( $total > 100 ) }"/>
            </asl:sequence>
        </asl:element>
        <asl:element name="Candidate">
            <asl:attribute name="name" ref-type="xs:string"/>
            <asl:sequence>
                <asl:text ref-type="xs:int"/>
            </asl:sequence>
        </asl:element>
    </asl:active-schema>

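    This time the occurrence boundaries adjust themselves when the total differs from 100 : if the total is lower than 100, min-occurs exceeds the number of <Candidate>s present, and if it is greater than 100, max-occurs falls below that number, so the validation fails in both cases.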
  • The active sheet that reports validation errors.
  • To run the script from the console prompt :
     $ java -Dn=100 -jar reflex-0.4.0.jar (line cut)
         run doc/tutorial/schema/vote-count/validate.xcl
    The parameter selects the input document whose total is that number. With 100, the validation shouldn't fail. 2 other documents are supplied to produce validation errors; type 95 or 105 instead of 100 in the console.
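
The arithmetic behind the computed occurrence boundaries can be checked with a few lines of Python. This is a sketch of the two XPath expressions only, under the assumption that number() of a boolean yields 1 or 0 (as in XPath 1.0); the function names and the 95/105 value lists are made up for illustration and are not the contents of the supplied documents.

```python
def occurrence_bounds(values, target=100):
    """Mirror the schema's min-occurs / max-occurs expressions."""
    count = len(values)
    total = sum(values)
    min_occurs = count + int(total < target)  # one more element required if too low
    max_occurs = count - int(total > target)  # one element too many if too high
    return min_occurs, max_occurs

def is_valid(values, target=100):
    """The content is valid only if the actual count fits the computed bounds."""
    lo, hi = occurrence_bounds(values, target)
    return lo <= len(values) <= hi

print(occurrence_bounds([61, 24, 15]), is_valid([61, 24, 15]))  # total is 100
print(occurrence_bounds([61, 24, 10]), is_valid([61, 24, 10]))  # hypothetical total of 95
print(occurrence_bounds([61, 24, 20]), is_valid([61, 24, 20]))  # hypothetical total of 105
```

When the total is exactly 100, both bounds equal the actual count, which is the only case where the content model accepts the document.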

Playing with semantic datatypes and PSVI

This example shows how to define a semantic type with the Active Schema Language and use it to augment the information carried by an XML document. Then, the data are sorted according to the typed data.

There are 3 variants of this type in this tutorial :

  1. a simple semantic type
  2. a simple semantic type which relies on dispatched data
  3. a polymorphic semantic type that uses both previous types

Batch script

This script...

[doc/tutorial/schema/weather-report/weather-report.xcl]

<?xml version="1.0" encoding="iso-8859-1"?>
<xcl:active-sheet
    xmlns:xcl="http://ns.inria.org/active-tags/xcl"
    xmlns:sys="http://ns.inria.org/active-tags/sys"
    xmlns:asl="http://ns.inria.org/active-schema">
    <!-- invoke this active sheet with the system property "n" which can be "1", "2", or "3" -->
    <xcl:parse name="wr" source="wr{ string( $sys:env/n ) }.xml"/>
    <asl:parse-schema name="schema" source="schema{ string( $sys:env/n ) }.asl"/>
    <asl:validate schema="{ $schema }" node="{ $wr }" augment="yes" deep="yes"/>
    <xcl:echo value="List of towns, sorted in temperature order."/>
    <xcl:for-each name="town" select="{ xcl:sort( $wr/*/town, @temp ) }">
        <xcl:echo value="{ $town/@temp }{ $town/@scale } { $town/@name } { $town/@date }"/>
    </xcl:for-each>
    <xcl:echo value="( °C as well as °F are sorted correctly )"/>
</xcl:active-sheet>

...simply sorts a list of towns...

[doc/tutorial/schema/weather-report/wr1.xml]

<?xml version="1.0" encoding="iso-8859-1"?>
<weather-report>
    <town name="Paris" date="2005/09/09" temp="19°C"/>
    <town name="Paris" date="2005/09/08" temp="22°C"/>
    <town name="Vladivostok" date="2005/09/09" temp="32°F"/><!-- 32°F = 0°C -->
    <town name="Paris" date="2005/09/07" temp="23°C"/>
    <town name="London" date="2005/09/08" temp="68°F"/><!-- 68°F = 20°C -->
</weather-report>

...according to their temperatures, which are expressed in °C as well as in °F. The type that makes it possible to sort a set of temperatures in °C and °F is defined in the Active Schema below :

[doc/tutorial/schema/weather-report/schema1.asl]

<?xml version="1.0" encoding="iso-8859-1"?>
<asl:active-schema target=""
    xmlns:xcl="http://ns.inria.org/active-tags/xcl"
    xmlns:asl="http://ns.inria.org/active-schema"
    xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes">
    <asl:type name="temperature" base="xs:int" init="{.}">
        <asl:sequence>
            <asl:text ignore="yes" min-occurs="0" value=" "/>
        </asl:sequence>
        <asl:choice>
            <asl:text ignore="yes" min-occurs="0" value="°C"/>
            <asl:text ignore="yes" min-occurs="0" value="°F">
                <asl:interim>
                    <xcl:update referent="{ $asl:data }" operand="{ (value( . ) - 32) * 5 div 9 }"/>
                </asl:interim>
            </asl:text>
        </asl:choice>
    </asl:type>
    <asl:element name="weather-report" root="always">
        <asl:sequence>
            <asl:element ref-elem="town" min-occurs="1" max-occurs="unbounded"/>
        </asl:sequence>
    </asl:element>
    <asl:element name="town">
        <asl:attribute name="name" ref-type="xs:string"/>
        <asl:attribute name="date" ref-type="xs:date"/>
        <asl:attribute name="temp" ref-type="temperature"/>
    </asl:element>
</asl:active-schema>

This <asl:active-schema> declares a <asl:type> and 2 <asl:element>s. Each declaration is made of <asl:sequence>, <asl:choice>, <asl:attribute> references, and pieces of <asl:text>.
The <asl:interim> step is invoked when its host element matches a given text ; this step updates the data model that is built while parsing the attribute value : the $asl:data property contains the current typed data, based on an #xs:int. The typed data will be added to each attribute of this type (the attribute is kept unchanged : the typed data is just bound to the attribute).

In the script that validates the XML document and sorts the result, the xcl:sort() function sorts the temperatures according to the typed data rather than the values of the attributes, because the "augmentation" feature has been explicitly set while validating.

To run the script from the console prompt :

 $ java -Dn=1 -jar reflex-0.4.0.jar (line cut)
     run doc/tutorial/schema/weather-report/weather-report.xcl

...which produces this result :

List of towns, sorted in temperature order.
32°F Vladivostok 2005/09/09
19°C Paris 2005/09/09
68°F London 2005/09/08
22°C Paris 2005/09/08
23°C Paris 2005/09/07
( °C as well as °F are sorted correctly )

Try to change some values in the XML source, for example with 68°F=20°C...
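
The idea of a typed value bound to an untouched attribute can be mimicked in Python to see why the sort order above comes out right. This is only a sketch under stated assumptions: the `TypedAttr` class, the regular expression, and the use of a float for the Celsius value are illustrative choices, not part of ASL or RefleX.

```python
import re
from dataclasses import dataclass

@dataclass
class TypedAttr:
    raw: str        # the attribute value, kept unchanged
    celsius: float  # the typed data bound to it

# Assumed lexical form: an integer, an optional space, then °C or °F.
PATTERN = re.compile(r"(-?\d+) ?(°C|°F)")

def parse_temperature(raw):
    match = PATTERN.fullmatch(raw)
    if match is None:
        raise ValueError(f"not a temperature: {raw!r}")
    value, scale = int(match.group(1)), match.group(2)
    # Same conversion as the schema's <xcl:update> operand for °F.
    celsius = (value - 32) * 5 / 9 if scale == "°F" else float(value)
    return TypedAttr(raw, celsius)

# Same rows as wr1.xml: (name, date, temp).
TOWNS = [
    ("Paris", "2005/09/09", "19°C"),
    ("Paris", "2005/09/08", "22°C"),
    ("Vladivostok", "2005/09/09", "32°F"),
    ("Paris", "2005/09/07", "23°C"),
    ("London", "2005/09/08", "68°F"),
]

# Sort on the typed data, but print the raw attribute value.
rows = sorted(TOWNS, key=lambda town: parse_temperature(town[2]).celsius)
for name, date, temp in rows:
    print(temp, name, date)
```

Sorting on the raw strings instead would compare "19°C" and "32°F" lexically and place Vladivostok after the Paris rows.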


Variant : trying another format

The raw XML data are in the following format :

  <town date="2005/09/09" name="Paris" temp="21°C"/>

Try to define on your own a data type that behaves like the one above but that can handle the following format :

  <town date="2005/09/09" name="Paris" scale="°F" temp="21"/>

Using the asl:element() function, which refers to the element currently being processed, may help.

Solution : this is the schema that can check these XML data. To try the solution from the RefleX home directory :

 $ java -Dn=2 -jar reflex-0.4.0.jar (line cut)
     run doc/tutorial/schema/weather-report/weather-report.xcl

Variant : trying both formats simultaneously !

Do you think that a single data type could be defined that handles both formats equally well ?

  <town date="2005/09/09" name="Paris" temp="21°C"/>
  <town date="2005/09/08" name="Paris" scale="°C" temp="22"/>

The Active Schema Language also supports polymorphism... Try to define such a schema yourself.

Solution : this is the schema that can check these XML data. To try the solution from the RefleX home directory :

 $ java -Dn=3 -jar reflex-0.4.0.jar (line cut)
     run doc/tutorial/schema/weather-report/weather-report.xcl

Real polymorphism with a hybrid format

In fact, in the previous example, the type defined tests whether the @scale attribute is present or not, but to make a truly polymorphic type, we would rather define 2 other types :

  • #temperature-with-scale : the scale is in the text that follows the int value
  • #temperature-without-scale : the scale is inside another attribute

...then, we simply define our #temperature type as a choice between the 2 previous ones.
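
The "choice between two types" design can be sketched in Python as two independent parsers tried in turn. This does not show the ASL solution, which is left as the exercise; the function names and the dictionary representation of a <town>'s attributes are assumptions made for illustration.

```python
import re

def to_celsius(value, scale):
    return (value - 32) * 5 / 9 if scale == "°F" else float(value)

def with_scale(attrs):
    """Variant 1: the scale follows the integer inside @temp."""
    match = re.fullmatch(r"(-?\d+) ?(°C|°F)", attrs.get("temp", ""))
    return to_celsius(int(match.group(1)), match.group(2)) if match else None

def without_scale(attrs):
    """Variant 2: @temp holds the integer, @scale holds the unit."""
    temp, scale = attrs.get("temp", ""), attrs.get("scale")
    if re.fullmatch(r"-?\d+", temp) and scale in ("°C", "°F"):
        return to_celsius(int(temp), scale)
    return None

def temperature(attrs):
    """The polymorphic type: a choice between the two variants."""
    for variant in (with_scale, without_scale):
        result = variant(attrs)
        if result is not None:
            return result
    raise ValueError(f"no variant applies to {attrs!r}")

print(temperature({"temp": "21°C"}))
print(temperature({"temp": "68", "scale": "°F"}))
```

Keeping each variant self-contained means a third format could be added without touching the existing two, which is the point of defining the type as a choice.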

Solution : this is the schema that can check these XML data. To try the solution from the RefleX home directory :

 $ java -Dn=4 -jar reflex-0.4.0.jar (line cut)
     run doc/tutorial/schema/weather-report/weather-report.xcl
Data preservation

Notice that in all the above examples, the XML source document is kept as-is; the value of the attribute is not updated (although it could be) : the typed data is just bound to the attribute as additional information.

Actually, it is not recommended for a typed data to update the XML tree it belongs to, because if the underlying type is used in a composite type of which another part fails, the tree might have been updated even though the type has been proved not to be applicable.


This tutorial demonstrates the ability of ASL to define semantic data types in order to use them in an application (here, sorting the towns according to their temperature).