As we learned in the Relax NG tutorial, we write and associate a schema to constrain the content of an XML document. This helps if you are working with many complex files or trying to coordinate a team of coders to maintain consistency across an entire project. Relax NG is a grammar-based schema language, which means that it defines the hierarchical relationship of elements and attributes in an entire document from its starting root to all its branches. It may seem like Relax NG ought to be able to govern everything we need, but there are certain kinds of constraints that it can’t handle. For these we apply a rule-based schema to function alongside our grammar-based schema in order to fine-tune precise relationships among elements and attributes. We work with Schematron, a rule-based constraint language that uses XPath expressions to assert or report on the presence or absence of patterns. Rule-based schema languages like Schematron typically do not constrain every element and attribute as our Relax NG schemas do. Instead, when we write Schematron, we usually concentrate on just a few things that we need to control very precisely, as we will show you here.
Relax NG and Schematron are commonly used together. For example, let’s say we are collecting data from 100 people and want to record their votes for their favorite ice cream flavor: vanilla, chocolate, or strawberry. Limiting our attributes to those three flavors and defining the responses as integers would not be difficult using Relax NG. But what if, instead of 31 votes for chocolate, I accidentally entered 131 votes? A basic Relax NG schema that defines the element vote this way:
vote = element vote {type, xsd:integer}
type = attribute type {"chocolate" | "vanilla" | "strawberry"}
wouldn’t catch any problems with the specific numbers I enter, because the integer datatype cannot be limited to specific numerical values in relation to a total. If we want to make sure that the numerical values of all <vote> elements add up to 100, Schematron is the tool we need. More generally, we use Schematron if we need to define rules that assert relationships in the content of our elements and attributes, such as (among other things) to make sure that a preceding-sibling header does not contain the identical text of a following-sibling header, to check that elements holding page number values appear in the correct order, or to flag every time we are missing a punctuation mark that is supposed to appear inside a sentence element.
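As a minimal sketch of the vote-total check described above, a Schematron rule might look like the following. It assumes, hypothetically, that all the vote elements sit together inside a parent element named survey (that name is an illustration only, not part of the example above). The pattern, rule, and assert elements it uses are explained in the rest of this tutorial.

```xml
<pattern>
    <!-- Assumption: <survey> is a hypothetical parent element holding all the <vote> elements. -->
    <rule context="survey">
        <!-- sum() adds up the numerical content of every child <vote>. -->
        <assert test="sum(vote) eq 100">
            The votes must add up to 100, but they currently total <value-of select="sum(vote)"/>.
        </assert>
    </rule>
</pattern>
```

If 131 were typed in place of 31 for one flavor, the total would no longer equal 100, so the assertion would fail and the message would be shown.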
When you open a new Schematron file in <oXygen/>, you will see the following superstructure:
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
xmlns:sqf="http://www.schematron-quickfix.com/validator/process">
</sch:schema>
We will first add some namespace information that will dictate how we represent the elements in a) the Schematron document we are writing, and b) the XML document it will constrain if that XML document is in a special namespace. We typically set the Schematron namespace as a default. (Without this line, we would have to type sch:, a namespace prefix, in front of all of our Schematron elements, so we really prefer to use it.) Add the default namespace declaration xmlns="http://purl.oclc.org/dsdl/schematron" to the root element of your new Schematron and drop the sch: prefix from the element name, as shown below:
<schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
xmlns="http://purl.oclc.org/dsdl/schematron"
>
</schema>
If the XML document(s) you’re trying to constrain are in a specific namespace, such as the TEI, you must identify that namespace with an empty element called <ns/>, and you will also have to use a namespace prefix when representing the XML elements in your schema rules. The next box shows how to define the TEI namespace and its special namespace prefix. If you are writing Schematron to govern TEI XML and you don’t define your namespace, or if you forget to use a prefix to point out the elements that belong to that namespace, the Schematron’s rules simply will not fire when you associate it with your TEI document!
<schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
xmlns="http://purl.oclc.org/dsdl/schematron">
<ns uri="http://www.tei-c.org/ns/1.0" prefix="tei"/>
</schema>
About namespaces: Documents are either in a namespace or in no namespace, as signaled in their root element. We can see in the code above that a Schematron document has a special xmlns (or XML namespace) attribute that seems to point to a web address. This is not really a website (though sometimes developers put up placeholder websites at namespace URIs): it is simply a uniform resource identifier (that is what URI stands for), a unique string of characters used to identify the Schematron namespace. The TEI has its own namespace URI too, and so do other forms of XML (like XSLT) that we are presenting in this course. If your input document is in the TEI namespace (that is, the root element is <TEI xmlns="http://www.tei-c.org/ns/1.0">), you have to include the <ns uri="http://www.tei-c.org/ns/1.0" prefix="tei"/> element we illustrated above in your Schematron, and you must use the tei: prefix before all references to elements (but not attributes) from the input TEI document in your Schematron file. That means you need to write //tei:body/tei:div and not just //body/div. Attributes are special: an attribute written without a prefix is in no namespace, so it takes no prefix in your XPath (and you will not be able to find it if you apply one). So if we are looking for @ref attributes on TEI <div> elements, we would write: //tei:body/tei:div/@ref. You can think of this as a magic incantation that’s needed for Schematron to match just the elements in the TEI document, but if you’d like more explanation of how namespaces work, see http://www.w3schools.com/xml/xml_namespaces.asp.
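To make the prefix rule concrete, here is a small sketch of a complete Schematron file aimed at a TEI document. The requirement it checks (that each div directly inside body carries a ref attribute) is made up purely for illustration. The pattern, rule, and assert elements are explained in the sections that follow.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <!-- Bind the tei: prefix to the TEI namespace URI. -->
    <ns uri="http://www.tei-c.org/ns/1.0" prefix="tei"/>
    <pattern>
        <!-- Elements from the TEI document take the tei: prefix... -->
        <rule context="tei:body/tei:div">
            <!-- ...but the unprefixed attribute ref does not. -->
            <assert test="@ref">
                Each div directly inside body should have a ref attribute (an invented requirement for this sketch).
            </assert>
        </rule>
    </pattern>
</schema>
```

Writing the context as body/div, without the prefix, would match nothing in a TEI document, so the rule would silently never fire.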
Each new schema rule starts with a <pattern>
element. Inside the <pattern>
is a <rule>
element with an @context
attribute. It looks like this:
<pattern>
<rule context=" ">
</rule>
</pattern>
We can set as many rules as we wish inside a pattern element, which simply works as a convenient organizing structure for you to put related rules together. Each rule element needs an @context attribute, and within a single pattern that @context should be distinct from those of the other rule elements, because a node is tested only against the first rule in the pattern whose context matches it.
The value of @context is the specific place in your XML document where the rule applies. (When you have associated your Schematron file with your XML and run validation in <oXygen/>, the node matched by your @context is where <oXygen/> will mark a validation error triggered by a test of your Schematron rule.) The form this takes is called an XPath pattern, and we also use it in XSLT: it is a pattern of elements and/or attributes set in relation to each other that might appear at any level of your document hierarchy. For example, if you write the XPath pattern p/said as the value of @context, the rule will apply to any <said> element that is a child of a <p> element positioned at any level in the XML document hierarchy, whether the parent p element is sitting inside a TEI header in an outer level of the hierarchy or deeply nested inside a note element inside another body p. XPath patterns let us locate particular patterned relationships wherever they sit in the document hierarchy, so they can be a powerful tool to keep our Schematron and XSLT code tidy and efficient. Why is this more efficient? Because we do not have to write the same rule for said elements over and over again depending on the different XPath positions of p, and we may save processing time by not restarting our search from the document node, as we would if we began with //p/said. Constructing an XPath pattern such as p/said takes advantage of the relational patterns that rule-based schema languages are designed for.
XPath patterns can also be set to use predicates, so that, for example, said[@who]
matches on any <said>
elements that have @who
attributes anywhere they are sitting in our XML document.
The <assert>
or <report>
element is the heart of each Schematron rule. Within each <rule>
element we can set one or more <assert>
or <report>
elements, which contain an attribute called @test
. With all of these pieces together, here is the basic skeleton of a Schematron rule using <assert>
:
<pattern>
<rule context=" ">
<assert test=" "> </assert>
</rule>
</pattern>
The value of @test is an XPath expression evaluated in immediate relation to the current XPath location of @context, wherever this is. The @test sets a condition that evaluates to true or false: For example, does a particular string pattern exist here? Does the numerical value of this element equal that of its preceding-sibling? Imagine the current context to be shifting with each discovery of the XPath pattern. As the validation checker lands on each new instance, it runs your @test and checks for some condition, true or false, that hinges on that pattern in some way. Basically, @context tells <oXygen/> where to look, and @test tells <oXygen/> what to test when it gets there. You then type a message, your very own customized validation error message, inside the <assert> or <report> element as its text content, and explain (to yourself and/or your project team) the reason the rule is firing. When a rule fires, it will generate an alert message in <oXygen/> just like a message from Relax NG, although in Schematron, it’s your own custom-made message that fires.
Okay, now that we understand the structure, let’s construct some sample rules so we understand how and why they function. Let’s say you’re keeping track of points in a game where the goal is to get as many points as possible. The person in first place got 23 points, second place got 16, and third place got 12. Let’s construct a basic XML document to store the results:
<gameResults>
<first>23</first>
<second>16</second>
<third>12</third>
</gameResults>
In our very simple example, the first place score should always be more points than the second place score. Let’s write a Schematron rule to make sure the values are entered correctly. First, let’s start by writing the <pattern>
, <rule>
, and @context
. We want the rule to fire (or alert the user) on the <gameResults>
element.
<pattern>
<rule context="gameResults">
</rule>
</pattern>
Now, we want to write the rule. We want to assert (or say definitively) that the first-place score must always be greater than the second-place score. This means that the rule will fire when the defined assert test fails.
<pattern>
<rule context="gameResults">
<assert test="number(first) gt number(second)">The first-place score must be greater than the second-place score.</assert>
</rule>
</pattern>
When we associate our schema, if we have entered 116 instead of 16 for the second-place score, our schema will fire an error because what we typed fails to fulfill our Schematron assert test. Notice that in this example we need to use the XPath number() function for our rule to treat the contents of the first and second elements as numerical values to be compared. This is because we are using a value comparison operator, which we discuss in more detail in the section on comparison operators later in this tutorial. Even when a Relax NG schema defines a numerical datatype for the contents of first, second, and third, we discover that our Schematron still reads the contents of those elements as a string of text until we convert them to a number in XPath. The Relax NG grammar constructs a numerical data format, then, that is nevertheless not read as a number by an XPath parser unless it is prompted to do so. (We provide links to our sample Relax NG and Schematron files so you can test this for yourself.)
Now that we have a working schema rule to test the difference between the first- and second-place scores, let’s make a rule that tests the second- and third-place scores. The rule is essentially the same (the second-place score must always be greater than the third-place score), but we’ll use the report element instead to demonstrate how it works. We add the new test inside our existing rule because it shares the same @context, the gameResults element. Note: If we define a second rule with the same @context as the first inside the same pattern, only the first rule will be applied and the other ignored! So within a given rule @context, we need to define all our assert and report tests together.
When we write a report element, we are asking to be told (by a flag or report) when a particular condition in an @test is met. The difference between assert and report, then, is that an assert test fires an error when its assertion is violated, while a report test fires an error when its condition is met. In this case, we call for a report when the second-place score is less than or equal to the third-place score. Using report in our second test in the example below, the rule will fire when this condition is met.
<pattern>
<rule context="gameResults">
<assert test="number(first) gt number(second)">
The first-place score must be greater than the second-place score.
</assert>
<report test="number(second) le number(third)">
The second-place score must be greater than the third-place score.
</report>
</rule>
</pattern>
Here is another way we might write that report statement, to illustrate how we might use the XPath function not()
wrapped around a test value:
<report test="not(number(second) gt number(third))">
The second-place score must be greater than the third-place score.
</report>
XPath functions that return numerical values are frequently used in Schematron for comparison tests. Some of these functions operate over text content that needs to be converted to numbers as we did here, and some of them calculate and measure things (like string-length()) to return a numerical value. Here are the standard ways to indicate comparisons in XPath and Schematron, with value comparison and with general comparison. These function a little differently, as we note below:
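The value comparison operators below only work to compare exactly one item to exactly one other item: they are for one-to-one comparison only:
eq (equal to)
ne (not equal to)
gt (greater than)
ge (greater than or equal to)
lt (less than)
le (less than or equal to)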
Note: Value comparison of numbers may require you to use the number() function to convert a numerical string in your element content into a number. That will not be necessary in using count() or sum() functions, however. For example, we can write a report test to fire if there is a count of more than three instances of a <geo> element inside a <p> (or paragraph), with:
<rule context="p">
<report test="count(geo) gt 3">There should never be more than three geo elements inside a paragraph!</report>
</rule>
This works because the output of the count() function is a single item, an integer, and it is being compared to another single item, the integer value 3.
You can also use value comparison operators to compare strings of text (not just numbers). For example, we could write a report test for <place> elements with an @name attribute set to "london" to help us catch this and replace it with "London" instead. This works because each <place> element can have only one @name attribute to compare to a single string of text, "london". It also works (and the test does not fire) if there is no @name attribute, because value comparison permits an empty sequence (no items). But if there is more than one item on either side, the comparison cannot be made and raises an XPath error.
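The report test just described could be sketched as follows, using the place element and name attribute from the paragraph above. The wording of the message is only a suggestion.

```xml
<pattern>
    <rule context="place">
        <!-- eq compares the single name value (or an empty sequence) with one string. -->
        <report test="@name eq 'london'">
            This place name is written as "london"; it should be "London".
        </report>
    </rule>
</pattern>
```

Because this is a report, the message appears only on place elements where the condition is true. A place with no name attribute is passed over without complaint.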
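While value comparison operators can compare only one thing on the left to one thing on the right, general comparison operators can have one or more items on either side of the comparison (also zero items, since the empty sequence is also allowed). The general comparison operators are:
= (equal to)
!= (not equal to)
> (greater than)
>= (greater than or equal to)
< (less than, written as &lt; inside an attribute value such as @test)
<= (less than or equal to, written as &lt;=)
These operators are more commonly used than value comparisons in Schematron expressions because they have broader applicability (for one-to-one and many-to-one comparisons). When element content holding a numerical string is compared with an actual number, as in first > 20, a general comparison converts the content to a number for us, so the number() function is not needed. When the items on both sides are element or attribute content, however, they are compared as strings of text (so that 116 counts as less than 23), and we still need number() on each side. For more detailed discussion of comparison operators in XPath, please see our Follow the XPath! tutorial.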
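As a sketch of a many-to-one general comparison, the rule below checks all the scores in the gameResults example at once. The requirement that no score be negative is an assumption added for illustration. It is not part of the game described earlier.

```xml
<pattern>
    <rule context="gameResults">
        <!-- * selects every child element; the test is true if ANY one of them is below 0. -->
        <!-- The less-than sign is written as an entity because a raw less-than sign is not allowed inside an attribute value. -->
        <report test="* &lt; 0">
            Scores may not be negative.
        </report>
    </rule>
</pattern>
```

A value comparison such as lt would raise an error here, because the left side holds three items instead of one. The general comparison accepts the whole sequence and, since the right side is a number, treats each score as a number too.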
Associating a Schematron schema is a lot like associating a Relax NG schema. While viewing your XML document, go to the menu bar and click on Document -> Schema -> Associate Schema. From there, locate your schema file (the file extension should be .sch). When you associate a .sch file, <oXygen/> should automatically set the schema type to Schematron. A note on mindful file management: Remember to save your Schematron in a directory where you can easily and consistently locate it. Finalize that, and <oXygen/> should insert a line near the top of your document that looks like this:
<?xml-model href="your_file_name.sch"
type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
If you also have a Relax NG schema associated, you will have two different schema lines at the top of your XML document. The two different kinds of schema will function together so that, as you code, validation errors from either one will appear as red squares in <oXygen/>. The bottom window will feature messages associated with these validation errors, and this will include the messages you write in the text content of your Schematron assert and report elements.
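For orientation, here is roughly what the top of the game results document might look like with both kinds of schema associated. The two file names are hypothetical, and the exact form of the Relax NG line is an assumption based on what <oXygen/> typically inserts for a compact-syntax schema. Yours may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical file names: substitute the paths to your own schema files. -->
<?xml-model href="gameResults.rnc" type="application/relax-ng-compact-syntax"?>
<?xml-model href="gameResults.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<gameResults>
    <first>23</first>
    <second>16</second>
    <third>12</third>
</gameResults>
```

Changing 16 to 116 in a document like this is a quick way to confirm that the Schematron assert on the first- and second-place scores fires as intended.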
When you associate your schema, always tinker with your XML to create conditions that will cause your Schematron rules to fire! Testing your schema code should be a back-and-forth process to ensure that your assert and report tests are functioning as you want them to.