Schemas tutorials
How to validate an XML document against a schema, how to express algorithmic rules and co-occurrence constraints in active schemas, how to design a semantic data type. Most things that you can't achieve with your DTDs, Relax NG schemas, W3C XML Schema schemas, and sometimes not even with Schematron, are shown here : if you want more control over your XML documents, this is the right place...
- XML validation with Active Schema (co-occurrence constraint on table columns)
- Co-occurrence constraint on classification level
- Expressing algorithmic rules with Active Schema
- Playing with semantic datatypes and PSVI
About XML validation
This section is dedicated to validation with the Active Schema Language. If you have to validate with a DTD or a W3C XML Schema, please refer to DTD validation. If you have to validate with Relax NG or Schematron, you will have to implement your own active tag ; see the "how-to" section for that purpose.
XML validation with Active Schema (co-occurrence constraint on table columns)
The idea of the Active Schema Language is to be more expressive than other schema technologies (DTD, W3C XML Schema, Relax NG). For this purpose, ASL lets you define dynamic (active) content models, makes occurrence boundaries computable, and lets you mix other modules into the schema if necessary, for example to fetch with SQL the values allowed for an attribute from an RDBMS. ASL also offers smart support for datatypes, as shown in this other example.
In this example, there is a simple constraint that we wish to apply on <table>s : the number of <cell>s must be the same in all <column>s. This simple constraint can't be expressed in any of the schema technologies listed above. We show how to define a simple schema that expresses this constraint, and how to display an error report.
- The sample datas
- The schema :
[doc/tutorial/schema/co-oc/schema.asl]
This <asl:active-schema> declares some <asl:element>s. Each defines a simple content model with an <asl:sequence>, but the last content model is dynamic : depending on whether a previous <column> is present, we simply switch from one content model to another. That way, the first <column> defines the number of <cell>s allowed in the others.
<?xml version="1.0" encoding="iso-8859-1"?>
<asl:active-schema target="" xmlns:xcl="http://ns.inria.org/active-tags/xcl" xmlns:asl="http://ns.inria.org/active-schema">
    <!--
        Testing remote co-occurrence constraint :
        checking that the number of <cell>s is the same in any <column> of a <table>
    -->
    <asl:element name="test" root="always">
        <asl:attribute ref-ns="xml" min-occurs="0"/>
        <asl:sequence>
            <asl:element ref-elem="table" min-occurs="0" max-occurs="unbounded"/>
        </asl:sequence>
    </asl:element>
    <asl:element name="table">
        <asl:sequence>
            <asl:element ref-elem="column" min-occurs="1" max-occurs="unbounded"/>
        </asl:sequence>
    </asl:element>
    <asl:element name="column">
        <asl:sequence>
            <xcl:if test="{ asl:element()/preceding-sibling::column }">
                <xcl:then>
                    <asl:element ref-elem="cell" min-occurs="{ $asl:max-occurs }"
                        max-occurs="{ count( asl:element()/../column[1]/cell ) }"/>
                </xcl:then>
                <xcl:else>
                    <asl:element ref-elem="cell" min-occurs="1" max-occurs="unbounded"/>
                </xcl:else>
            </xcl:if>
        </asl:sequence>
    </asl:element>
</asl:active-schema>
- The active sheet that reports validation errors :
[doc/tutorial/schema/co-oc/validate.xcl]
<?xml version="1.0" encoding="iso-8859-1"?>
<!--
    A validator that validates XML documents with an Active Schema,
    and displays the errors encountered
-->
<xcl:active-sheet xmlns:xcl="http://ns.inria.org/active-tags/xcl" xmlns:asl="http://ns.inria.org/active-schema">
    <xcl:parse name="datas" source="datas.xml"/>
    <asl:parse-schema name="schema" source="schema.asl"/>
    <asl:validate schema="{ $schema }" node="{ $datas }" deep="yes" report="report"/>
    <!--display the errors-->
    <xcl:for-each name="err" select="{ $report }">
        <xcl:echo value="Error { name( value( $err/@reason-id ) ) }"/>
        <xcl:echo value=" node : { value( $err/@path ) }"/>
        <xcl:echo value=" candidate : { value( $err/@candidate-path ) }"
            xcl:if="{ $err/@candidate-path }"/>
        <xcl:echo value=" candidate value : { string( $err/@candidate ) }"
            xcl:if="{ not( $err/@candidate-path ) and value( $err/@candidate ) }"/>
        <xcl:echo value=" rule : { value( $err/@rule-path ) }"/>
    </xcl:for-each>
</xcl:active-sheet>
- Open a console and at the prompt, type the following command from the home directory
(type the command on a single line ; you MUST NOT insert a line break) :
$ java -jar reflex-0.3.2.jar run doc/tutorial/schema/co-oc/validate.xcl
- The error report displayed :
Error asl:elementExpected
 node : /test[1]/table[3]/column[2]
 rule : /asl:active-schema[1]/asl:element[3]/asl:sequence[1]/xcl:if[1]/xcl:then[1]/asl:element[1]
Error asl:noMoreContentAllowed
 node : /test[1]/table[4]/column[3]
 candidate : /test[1]/table[4]/column[3]/cell[3]
 rule : /asl:active-schema[1]/asl:element[3]
Error asl:elementExpected
 node : /test[1]/table[4]/column[4]
 rule : /asl:active-schema[1]/asl:element[3]/asl:sequence[1]/xcl:if[1]/xcl:then[1]/asl:element[1]
Error asl:elementExpected
 node : /test[1]/table[5]
 rule : /asl:active-schema[1]/asl:element[2]/asl:sequence[1]/asl:element[1]
Error asl:noMoreContentAllowed
 node : /test[1]/table[5]
 candidate : /test[1]/table[5]/cell[1]
 rule : /asl:active-schema[1]/asl:element[2]
Error asl:noMoreContentAllowed
 node : /test[1]/table[5]
 candidate : /test[1]/table[5]/cell[2]
 rule : /asl:active-schema[1]/asl:element[2]
Error report
In this version, getting an error report is not obvious ; in a future release, more suitable objects will be designed to get formatted messages.
Co-occurrence constraint on classification level
Schematron was designed to fill the gap left by the weakness of other schema technologies (DTD, W3C XML Schema, Relax NG). Unfortunately, Schematron expresses additional assertions outside the context of the grammar validation ; that is to say that within an XML editor, the user will be able to insert an element that Schematron will refuse. ASL, on the contrary, acts directly on the content model.
In this example from www.xfront.com, for the instance document to be valid, the classification level of the <Para> elements must not be more permissive than the classification level of the host <Document> element. Instead of using 2 schema technologies, ASL does the job directly :
- The sample datas :
[doc/tutorial/schema/classification/classification.xml]
<?xml version="1.0" encoding="iso-8859-1"?>
<Set>
    <!--valid-->
    <Document classification="secret">
        <Para classification="unclassified">
            One if by land, two if by sea;
        </Para>
        <Para classification="confidential">
            And I on the opposite shore will be,
            Ready to ride and spread the alarm
        </Para>
        <Para classification="unclassified">
            Ready to ride and spread the alarm
            Through every Middlesex, village and farm,
        </Para>
        <Para classification="secret">
            For the country folk to be up and to arm.
        </Para>
    </Document>
    <!--invalid-->
    <Document classification="confidential">
        <Para classification="unclassified">
            One if by land, two if by sea;
        </Para>
        <Para classification="confidential">
            And I on the opposite shore will be,
            Ready to ride and spread the alarm
        </Para>
        <Para classification="secret">
            Ready to ride and spread the alarm
            Through every Middlesex, village and farm,
        </Para>
        <Para classification="top-secret">
            For the country folk to be up and to arm.
        </Para>
    </Document>
</Set>
- The schema :
[doc/tutorial/schema/classification/schema.asl]
This <asl:active-schema> declares some <asl:element>s. Each defines a simple content model with an <asl:sequence> and with <asl:attribute> definitions ; the @classification attributes refer to a custom type named #classification, which is defined with the <asl:type> element.
<?xml version="1.0" encoding="iso-8859-1"?>
<asl:active-schema target="" xmlns:xcl="http://ns.inria.org/active-tags/xcl" xmlns:asl="http://ns.inria.org/active-schema" xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes">
    <!--
        Testing co-occurrence constraint :
        checking that a classification level is not higher than the enclosing classification level
        an integer value is bound to the attribute
    -->
    <asl:element name="Set" root="always">
        <asl:attribute ref-ns="xml" min-occurs="0"/>
        <asl:sequence>
            <asl:element ref-elem="Document" min-occurs="1" max-occurs="unbounded"/>
        </asl:sequence>
    </asl:element>
    <asl:element name="Document">
        <asl:attribute name="classification" ref-type="classification"/>
        <asl:sequence>
            <asl:element ref-elem="Para" min-occurs="1" max-occurs="unbounded"/>
        </asl:sequence>
    </asl:element>
    <asl:element name="Para">
        <asl:attribute name="classification" ref-type="classification"/>
        <asl:sequence>
            <asl:text ref-type="xs:string" min-occurs="0"/>
        </asl:sequence>
    </asl:element>
    <asl:type name="classification">
        <asl:choice min-occurs="0">
            <asl:text
                xcl:if="{ asl:element()/self::Document or asl:element()/parent::Document/@classification &lt;= 1 }"
                ignore="yes" min-occurs="0" value="top-secret">
                <asl:interim>
                    <xcl:update referent="{ $asl:data }" operand="1"/>
                </asl:interim>
            </asl:text>
            <asl:text
                xcl:if="{ asl:element()/self::Document or asl:element()/parent::Document/@classification &lt;= 2 }"
                ignore="yes" min-occurs="0" value="secret">
                <asl:interim>
                    <xcl:update referent="{ $asl:data }" operand="2"/>
                </asl:interim>
            </asl:text>
            <asl:text
                xcl:if="{ asl:element()/self::Document or asl:element()/parent::Document/@classification &lt;= 3 }"
                ignore="yes" min-occurs="0" value="confidential">
                <asl:interim>
                    <xcl:update referent="{ $asl:data }" operand="3"/>
                </asl:interim>
            </asl:text>
            <asl:text ignore="yes" min-occurs="0" value="unclassified">
                <asl:interim>
                    <xcl:update referent="{ $asl:data }" operand="4"/>
                </asl:interim>
            </asl:text>
        </asl:choice>
    </asl:type>
</asl:active-schema>
The definition of the #classification type consists of :
- drawing up the list of the 4 labels available ("top-secret", "secret", "confidential", "unclassified") with the <asl:text> element
- binding each label to an integer value (1..4) ; this is done with the <asl:interim> and <xcl:update> elements when the text matches the attribute value
- for a <Para> element, checking whether the level of the parent <Document> element is compatible ; if not, the label is not available in the list (the <asl:text> element is skipped) ; this is done with the @xcl:if attribute ; notice that the @ignore attribute is not involved in this behaviour : it just indicates that, once the text is matched, the text value must be ignored when building the typed data bound to the attribute.
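The three steps above can be restated as a short Python sketch. It is only an illustration of the rule, not of the ASL engine : the RANK table mirrors the integer binding of the schema (1 for "top-secret" up to 4 for "unclassified"), while the function names and the trimmed-down SAMPLE document (same attributes as the tutorial's data, text omitted) are assumptions.

```python
import xml.etree.ElementTree as ET

# Same label-to-integer binding as in the schema : 1 is the most restrictive level.
RANK = {"top-secret": 1, "secret": 2, "confidential": 3, "unclassified": 4}

# Trimmed-down copy of the sample data (paragraph text omitted).
SAMPLE = """<Set>
  <Document classification="secret">
    <Para classification="unclassified"/>
    <Para classification="confidential"/>
    <Para classification="unclassified"/>
    <Para classification="secret"/>
  </Document>
  <Document classification="confidential">
    <Para classification="unclassified"/>
    <Para classification="confidential"/>
    <Para classification="secret"/>
    <Para classification="top-secret"/>
  </Document>
</Set>"""


def allowed_labels(document_label):
    """Labels that a <Para> may carry inside a <Document> with the given label."""
    doc_rank = RANK[document_label]
    # A label stays in the list only if the document rank is <= the label rank.
    return [label for label, rank in RANK.items() if doc_rank <= rank]


def check(root):
    """Return (path, label) for every <Para> whose label is not available."""
    errors = []
    for d, doc in enumerate(root.findall("Document"), start=1):
        allowed = allowed_labels(doc.get("classification"))
        for p, para in enumerate(doc.findall("Para"), start=1):
            label = para.get("classification")
            if label not in allowed:
                errors.append((f"/Set[1]/Document[{d}]/Para[{p}]", label))
    return errors


errors = check(ET.fromstring(SAMPLE))
for path, label in errors:
    print(f"{path} : '{label}' is not allowed here")
```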
- The active sheet that reports validation errors :
[doc/tutorial/schema/classification/validate.xcl]
<?xml version="1.0" encoding="iso-8859-1"?>
<xcl:active-sheet xmlns:xcl="http://ns.inria.org/active-tags/xcl" xmlns:sys="http://ns.inria.org/active-tags/sys" xmlns:asl="http://ns.inria.org/active-schema">
    <asl:parse-schema name="schema" source="schema.asl"/>
    <xcl:parse name="datas" source="classification.xml"/>
    <asl:validate schema="{ $schema }" node="{ $datas }" augment="yes" deep="yes" report="report"/>
    <xcl:for-each name="err" select="{ $report }">
        <xcl:echo value="Error { name( value( $err/@reason-id ) ) }"/>
        <xcl:echo value=" node : { value( $err/@path ) }"/>
        <xcl:echo value=" candidate : { value( $err/@candidate-path ) }"
            xcl:if="{ $err/@candidate-path }"/>
        <xcl:echo value=" candidate value : { string( $err/@candidate ) }"
            xcl:if="{ $err/@candidate }"/>
        <xcl:echo value=" rule : { value( $err/@rule-path ) }"/>
    </xcl:for-each>
</xcl:active-sheet>
- To run the script from the console prompt :
$ java -jar reflex-0.3.2.jar run doc/tutorial/schema/classification/validate.xcl
- The error report displayed :
As the classification level in the second <Document> element is "confidential", the only text values available in the enclosed <Para> elements are "confidential" and "unclassified" ; other text values are rejected, as shown below :
Error asl:badAttributeValue
 node : /Set[1]/Document[2]/Para[3]
 candidate : /Set[1]/Document[2]/Para[3]/@classification
 candidate value : secret
 rule : /asl:active-schema[1]/asl:element[3]/asl:attribute[1]
Error asl:badAttributeValue
 node : /Set[1]/Document[2]/Para[4]
 candidate : /Set[1]/Document[2]/Para[4]/@classification
 candidate value : top-secret
 rule : /asl:active-schema[1]/asl:element[3]/asl:attribute[1]
Expressing algorithmic rules with Active Schema
Once again, a single Active Schema can produce a content model dynamically without the help of Schematron. Simpler and more efficient.
In this example from www.xfront.com, for the instance document to be valid, the sum of the <Candidate> values must be 100. Instead of using Schematron, the expected content model is built with ASL :
- The sample datas :
[doc/tutorial/schema/vote-count/datas-total-100.xml]
<?xml version="1.0" encoding="iso-8859-1"?>
<ElectionResultsByPercentage>
    <Candidate name="John">61</Candidate>
    <Candidate name="Sara">24</Candidate>
    <Candidate name="Bill">15</Candidate>
</ElectionResultsByPercentage>
- The schema :
[doc/tutorial/schema/vote-count/schema.asl]
This time the occurrence boundaries adjust themselves when the total is greater or lower than 100 : if the total is below 100, min-occurs exceeds the number of <Candidate>s present ; if it is above 100, max-occurs falls below that number ; either way the content model cannot be satisfied.
<?xml version="1.0" encoding="iso-8859-1"?>
<asl:active-schema target="" xmlns:xcl="http://ns.inria.org/active-tags/xcl" xmlns:asl="http://ns.inria.org/active-schema" xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes">
    <!--
        Testing algorithmic validation :
        checking that the sum of the Candidate values must be 100.
    -->
    <asl:element name="ElectionResultsByPercentage" root="always">
        <asl:attribute ref-ns="xml" min-occurs="0"/>
        <asl:sequence>
            <xcl:set name="total" value="{ sum( asl:element()/Candidate ) }"/>
            <asl:element ref-elem="Candidate"
                min-occurs="{ count( asl:element()/Candidate ) + number( $total &lt; 100 ) }"
                max-occurs="{ count( asl:element()/Candidate ) - number( $total > 100 ) }"/>
        </asl:sequence>
    </asl:element>
    <asl:element name="Candidate">
        <asl:attribute name="name" ref-type="xs:string"/>
        <asl:sequence>
            <asl:text ref-type="xs:int"/>
        </asl:sequence>
    </asl:element>
</asl:active-schema>
- The active sheet that reports validation errors.
- To run the script from the console prompt :
$ java -Dn=100 -jar reflex-0.3.2.jar run doc/tutorial/schema/vote-count/validate.xcl
The n parameter selects the input document whose total is that number. With 100, the validation shouldn't fail. 2 other documents are supplied to produce validation errors ; type 95 or 105 instead of 100 in the console.
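The arithmetic behind the computed occurrence boundaries can be checked with a few lines of Python. This is a sketch of the two expressions used in the schema, where number( $total < 100 ) and number( $total > 100 ) evaluate to 1 or 0 ; the function names are assumptions, and the lists for 95 and 105 are made-up values, since the tutorial only shows the document that totals 100.

```python
def occurrence_bounds(values):
    """Compute (min_occurs, max_occurs) as the schema does for <Candidate>."""
    total = sum(values)
    count = len(values)
    min_occurs = count + int(total < 100)  # one more element expected if the total is too low
    max_occurs = count - int(total > 100)  # one element too many if the total is too high
    return min_occurs, max_occurs


def is_valid(values):
    """The content model is satisfied only if the actual count fits the bounds."""
    min_occurs, max_occurs = occurrence_bounds(values)
    return min_occurs <= len(values) <= max_occurs


for values in ([61, 24, 15], [61, 24, 10], [61, 24, 20]):
    print(sum(values), occurrence_bounds(values), is_valid(values))
```

Only a total of exactly 100 leaves both boundaries equal to the actual number of elements.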
Playing with semantic datatypes and PSVI
This example shows how to define a semantic type with the Active Schema Language and uses it to augment the amount of information of an XML document. Then, the data are sorted according to the typed data.
There are 3 variants for this type in this tutorial :
- a simple semantic type
- a simple semantic type which relies on dispatched datas
- a polymorphic semantic type that uses both previous types
Batch script
This script...
[doc/tutorial/schema/weather-report/weather-report.xcl]
<?xml version="1.0" encoding="iso-8859-1"?>
<xcl:active-sheet xmlns:xcl="http://ns.inria.org/active-tags/xcl" xmlns:sys="http://ns.inria.org/active-tags/sys" xmlns:asl="http://ns.inria.org/active-schema">
    <!--
        invoke this active sheet with the system property "n"
        which can be "1", "2", or "3"
    -->
    <xcl:parse name="wr" source="wr{ string( $sys:env/n ) }.xml"/>
    <asl:parse-schema name="schema" source="schema{ string( $sys:env/n ) }.asl"/>
    <asl:validate schema="{ $schema }" node="{ $wr }" augment="yes" deep="yes"/>
    <xcl:echo value="List of towns, sorted in temperature order."/>
    <xcl:for-each name="town" select="{ xcl:sort( $wr/*/town, @temp ) }">
        <xcl:echo value="{ $town/@temp }{ $town/@scale } { $town/@name } { $town/@date }"/>
    </xcl:for-each>
    <xcl:echo value="( °C as well as °F are sorted correctly )"/>
</xcl:active-sheet>
...simply sorts a list of towns...
[doc/tutorial/schema/weather-report/wr1.xml]
<?xml version="1.0" encoding="iso-8859-1"?>
<weather-report>
    <town name="Paris" date="2005/09/09" temp="19°C"/>
    <town name="Paris" date="2005/09/08" temp="22°C"/>
    <town name="Vladivostok" date="2005/09/09" temp="32°F"/><!-- 32°F = 0°C -->
    <town name="Paris" date="2005/09/07" temp="23°C"/>
    <town name="London" date="2005/09/08" temp="68°F"/><!-- 68°F = 20°C -->
</weather-report>
...according to their temperatures, which are expressed in °C as well as in °F. The type that makes it possible to sort a set of temperatures in °C and °F is defined in the Active Schema below :
[doc/tutorial/schema/weather-report/schema1.asl]
<?xml version="1.0" encoding="iso-8859-1"?>
<asl:active-schema target="" xmlns:xcl="http://ns.inria.org/active-tags/xcl" xmlns:asl="http://ns.inria.org/active-schema" xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes">
    <asl:type name="temperature" base="xs:int" init="{.}">
        <asl:sequence>
            <asl:text ignore="yes" min-occurs="0" value=" "/>
        </asl:sequence>
        <asl:choice>
            <asl:text ignore="yes" min-occurs="0" value="°C"/>
            <asl:text ignore="yes" min-occurs="0" value="°F">
                <asl:interim>
                    <xcl:update referent="{ $asl:data }" operand="{ (value( . ) - 32) * 5 div 9 }"/>
                </asl:interim>
            </asl:text>
        </asl:choice>
    </asl:type>
    <asl:element name="weather-report" root="always">
        <asl:sequence>
            <asl:element ref-elem="town" min-occurs="1" max-occurs="unbounded"/>
        </asl:sequence>
    </asl:element>
    <asl:element name="town">
        <asl:attribute name="name" ref-type="xs:string"/>
        <asl:attribute name="date" ref-type="xs:string"/>
        <!-- it should be xs:date but it is not yet implemented :) -->
        <asl:attribute name="temp" ref-type="temperature"/>
    </asl:element>
</asl:active-schema>
This <asl:active-schema> declares a
<asl:type> and 2 <asl:element>s.
Each declaration is made of <asl:sequence>,
<asl:choice>, <asl:attribute>
references, and pieces of <asl:text>.
The <asl:interim> step is invoked when its
host element matches a given text ; this step updates the data model that
is built while parsing the attribute value :
the $asl:data property contains the current
typed data, based on an #xs:int.
The typed data will be added to each attribute of this type
(the attribute is kept unchanged : the typed data is just bound to the attribute).
In the script that validates the XML document and sorts the result, the xcl:sort() function sorts the temperatures according to the typed data rather than the values of the attributes, because the "augmentation" feature has been explicitly set while validating.
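The effect of the typed data on sorting can be imitated in Python : each raw attribute value is kept unchanged, and a numeric value in °C is computed next to it and used as the sort key. This is only a sketch of the idea ; the typed_temperature name and the regular expression are assumptions, and unlike the schema the sketch requires a scale to be present.

```python
import re

# An integer, an optional space, then the scale (assumed mandatory in this sketch).
TEMPERATURE = re.compile(r"(-?\d+) ?(°C|°F)")


def typed_temperature(raw):
    """Return the temperature in °C for a raw value such as '19°C' or '68°F'."""
    match = TEMPERATURE.fullmatch(raw)
    if match is None:
        raise ValueError(f"not a temperature : {raw!r}")
    value = int(match.group(1))
    if match.group(2) == "°F":
        # Same conversion as the operand of <xcl:update> in the schema.
        value = (value - 32) * 5 / 9
    return value


# Same rows as wr1.xml : (name, date, temp).
towns = [
    ("Paris", "2005/09/09", "19°C"),
    ("Paris", "2005/09/08", "22°C"),
    ("Vladivostok", "2005/09/09", "32°F"),
    ("Paris", "2005/09/07", "23°C"),
    ("London", "2005/09/08", "68°F"),
]

# The raw strings are left untouched ; only the sort key uses the typed value.
ordered = sorted(towns, key=lambda town: typed_temperature(town[2]))
for name, date, temp in ordered:
    print(temp, name, date)
```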
To run the script from the console prompt :
$ java -Dn=1 -jar reflex-0.3.2.jar run doc/tutorial/schema/weather-report/weather-report.xcl
...which produces this result :
List of towns, sorted in temperature order.
32°F Vladivostok 2005/09/09
19°C Paris 2005/09/09
68°F London 2005/09/08
22°C Paris 2005/09/08
23°C Paris 2005/09/07
( °C as well as °F are sorted correctly )
Try to change some values in the XML source, for example with 68°F=20°C...
Variant : trying another format
The raw XML data are in the following format :
<town date="2005/09/09" name="Paris" temp="21°C"/>
Try to define on your own a data type that behaves like the one above but that can handle the following format :
<town date="2005/09/09" name="Paris" scale="°F" temp="21"/>
Using the asl:element() function, which refers to the current element being processed, may help.
Solution : this is the schema that can check these XML datas. Trying the solution from the home directory :
$ java -Dn=2 -jar reflex-0.3.2.jar run doc/tutorial/schema/weather-report/weather-report.xcl
Variant : trying both formats simultaneously !
Do you think that a data type could be defined to handle both formats interchangeably ?
<town date="2005/09/09" name="Paris" temp="21°C"/>
<town date="2005/09/08" name="Paris" scale="°C" temp="22"/>
ASL also supports polymorphism... Try to define such a schema by yourself.
Solution : this is the schema that can check these XML datas. Trying the solution from the home directory :
$ java -Dn=3 -jar reflex-0.3.2.jar run doc/tutorial/schema/weather-report/weather-report.xcl
Real polymorphism with a hybrid format
In fact, in the previous example, the type defined tests whether the @scale attribute is present or not ; to make a truly polymorphic type, we'd rather define 2 other types :
- #temperature-with-scale : the scale is in the text that follows the int value
- #temperature-without-scale : the scale is inside another attribute
...then, we simply define our #temperature type as a choice between the 2 previous ones.
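The "choice of two types" idea can be sketched in Python as two small parsers tried in turn. This is an analogy only, under the assumption that each variant either recognises the attributes of a <town> or declines ; the function names are hypothetical and do not come from the tutorial's schema.

```python
import re


def with_scale(attrs):
    """temp='21°C' : the scale follows the int value in the same attribute."""
    match = re.fullmatch(r"(-?\d+) ?(°C|°F)", attrs.get("temp", ""))
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def without_scale(attrs):
    """temp='21' scale='°F' : the scale is inside another attribute."""
    temp, scale = attrs.get("temp", ""), attrs.get("scale")
    if scale in ("°C", "°F") and re.fullmatch(r"-?\d+", temp):
        return int(temp), scale
    return None


def temperature(attrs):
    """Choice of the 2 variants : the first one that applies gives the typed data in °C."""
    for variant in (with_scale, without_scale):
        parsed = variant(attrs)
        if parsed is not None:
            value, scale = parsed
            return (value - 32) * 5 / 9 if scale == "°F" else value
    raise ValueError(f"no temperature variant applies to {attrs!r}")


print(temperature({"name": "Paris", "temp": "21°C"}))
print(temperature({"name": "Paris", "temp": "22", "scale": "°C"}))
print(temperature({"name": "London", "temp": "68", "scale": "°F"}))
```

Keeping each variant separate means a new format can be added as one more entry in the choice, without touching the existing ones.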
Solution : this is the schema that can check these XML datas. Trying the solution from the home directory :
$ java -Dn=4 -jar reflex-0.3.2.jar run doc/tutorial/schema/weather-report/weather-report.xcl
Data preservation
Notice that in all the above examples, the XML source document is kept as-is ; the value of the attribute is not updated (although it could be) : the typed data is just bound to the attribute as additional information.
Actually, it is not recommended for a typed data to update the XML tree it belongs to, because if the underlying type is used in a composite type which has a part that fails, the tree might be updated even though the type has proved not to be applicable.
This tutorial demonstrates the ability of ASL to define semantic data types in order to use them in an application (here : sorting the towns according to their temperature).

