For this assignment and the next, you will be working with a digitized XML collection of the Behrend family's travel letters from Europe made in Fall 2021. You will need to access this collection from our textEncoding-Hub, and one of our goals is for you to write XSLT to process a local directory of files rather than just one at a time as we have been doing up to this point. Here is how to access the directory:
Locate the behrendTravel2021/ directory in our textEncoding-Hub. Inside it are a directory called xml-letters/ that contains a few XML files that we are working with as a collection, a starter XSLT file to help you begin this assignment, and a web-out/ directory with some sample output and CSS to style it.
Copy the behrendTravel2021/ directory to some other location on your computer outside of the textEncoding-Hub. (We do not want you to push your homework to the whole class over our textEncoding-Hub, so we just need you to make your own private copy of this directory to work with in the same folder in which you do your homework for this assignment and the next.) Please be careful to copy rather than move the directory out of GitHub! If you move it out of the directory, the next time you sync our textEncoding-Hub, GitHub will prompt you to commit the change and push it, which will effectively eliminate the behrendTravel2021 folder. I can easily put it back if that happens, but please alert me ASAP if something goes awry!
We can process a whole directory of files using the collection()
function in XSLT, so we can represent content from a whole collection of XML files in one or more output HTML files. One useful application for working with a collection is to process several short XML files and unify them on a single HTML page designed to merge their content. For this assignment, we will transform the small collection of XML files so that they output on one HTML page, which we will produce with a table of contents, followed by the full representations of each letter in the collection.
Since these documents are all encoded with the same structural elements, we use the collection()
function to reach into them as a group, and output their content one by one based on their XML hierarchy. We are actually treating the collection itself as part of the hierarchy as we write our XSLT, so we move from the directory down into the document node of each file to do our XSLT processing.
Besides working with a collection of files, the other interesting new application in this assignment is modal XSLT, which lets you process the same nodes in your document in two different ways. How can you output the same element contents to sit in a table of contents at the top of an HTML page, and also in another section of your document, below the table of contents? Wouldn’t it be handy to be able to have two completely different template rules that match exactly the same elements: one rule to output an element node selectively to preview in a table of contents, and the other to output the same node more fully in section
or div
elements? You can write two template rules that will match the same nodes (have the same value for their @match
attribute), but how do you make sure that the correct template rule is handling the data in the correct place?
To permit us to write multiple template rules that process the same input nodes in different ways for different purposes, we write modal XSLT, and that is what you will be learning to write with this assignment. Modal XSLT allows you to output the same parts of the input XML document in multiple locations and treat them differently each time. That is, it lets you have two different template rules for processing the same elements or other nodes in different ways, and you use the @mode
attribute to control how the elements are processed at a particular place in the transformation. Please read the explanation and view the examples in Obdurodon’s tutorial on Modal XSLT before proceeding with the assignment, so you can see where and how to set the @mode
attribute and how it works to control processing.
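As a quick orientation before you read the tutorial, here is a minimal sketch of modes at work. It is not part of our solution: the input elements (book, chapter, heading, para) are hypothetical names invented for this illustration, so do not expect to find them in the letters collection.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <!-- Hypothetical input: <book><chapter><heading>..</heading><para>..</para></chapter></book> -->
    <xsl:template match="/">
        <html>
            <head><title>Modes sketch</title></head>
            <body>
                <ul>
                    <!-- First pass over the chapters, in "toc" mode -->
                    <xsl:apply-templates select="//chapter" mode="toc"/>
                </ul>
                <!-- Second pass over the same chapters, in no mode -->
                <xsl:apply-templates select="//chapter"/>
            </body>
        </html>
    </xsl:template>

    <!-- Fires only for the apply-templates that says mode="toc" -->
    <xsl:template match="chapter" mode="toc">
        <li><xsl:value-of select="heading"/></li>
    </xsl:template>

    <!-- Same @match, no @mode: fires only for the unmoded apply-templates -->
    <xsl:template match="chapter">
        <section>
            <h2><xsl:apply-templates select="heading"/></h2>
            <xsl:apply-templates select="para"/>
        </section>
    </xsl:template>

    <xsl:template match="para">
        <p><xsl:apply-templates/></p>
    </xsl:template>
</xsl:stylesheet>
```

Both chapter rules have the same @match value, and the @mode on each xsl:apply-templates decides which one responds.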
For this assignment you want to produce our collection of letters and documents in one HTML page, and that page needs to have a table of contents at the top. The table of contents should have one entry for each document, which outputs the information we have encoded in the <title> element that is a descendant of the <meta> element in our XML source code, together with the first line. Below the full table of contents you should output a new section that renders the complete text of all the documents as encoded. In the full text, you should wrap <span> elements around any markup of interest (including tagging of unclear passages, as well as persons, places, etc.). Preserve some info from your source XML by outputting markup information in the HTML @class attribute.
To generate the attribute value on @class, we used an Attribute Value Template, which you should review here.
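As a reminder of what an Attribute Value Template looks like, here is a small sketch of a template rule you could place inside your stylesheet. It assumes the inline elements of interest are named persName, placeName, and date in the source XML; adjust the @match to whatever markup you decide to preserve.

```xml
<!-- Fragment: belongs inside your xsl:stylesheet element. -->
<!-- The curly braces in @class are the Attribute Value Template: -->
<!-- the XPath inside them is evaluated, and name() returns the name -->
<!-- of the element currently being processed. -->
<xsl:template match="persName | placeName | date">
    <span class="{name()}">
        <xsl:apply-templates/>
    </span>
</xsl:template>
```

With a rule like this, a placeName element in the source would come out as a span whose class is placeName, which gives you something to target with CSS later.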
Here is a view of our output HTML, but yours does not need to look exactly like ours. For this assignment, just concentrate on outputting the full text and the table of contents at the top.
To ensure that the output would be in the XHTML namespace, we added a default namespace declaration, xmlns="http://www.w3.org/1999/xhtml", to the root <xsl:stylesheet> element. To output the required DOCTYPE declaration, we also created an <xsl:output> element as the first child of our root <xsl:stylesheet> element, and we needed to include an attribute there to omit the default XML declaration: if that XML line appears in our XHTML output, the page will not validate as HTML with the W3C and might produce quirky rendering problems in various web browsers. So, our modified stylesheet template and xsl:output line is this, and you should locate this in the starter XSLT we provided (or otherwise copy it into a new XSLT file for working with a collection):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>
</xsl:stylesheet>
First of all, file locations will be important in this assignment. Save your XSLT file just inside the behrendTravel2021/ directory that you copied to your local homework file space. The XSLT file should be sitting outside the xml-letters/ directory, not inside it.
Forget about the table of contents for the moment and concentrate now on just outputting the
full text of the documents. Except for having to pull the contents from a collection of files, this is just like the XML-to-HTML transformations you have
already written, and you’ll use regular template rules (without a @mode
attribute) to perform the transformation.
The collection() function: Here is how we write and run XSLT to process a collection of files. Just ahead of the first template match, after the <xsl:output method>
statement, we define a variable in XSLT, which simply sets up a convenient shorthand for something complicated that we need to use more than once, so we don’t have to keep retyping it.
<xsl:variable name="travelColl" as="document-node()+"
select="collection('xml-letters/?select=*.xml')"/>
An xsl:variable
works by designating an @name
which holds any name you like to refer to it later (we have used "travelColl" here to refer to the 2021 Behrend Travel Letters' collection of files), and with @select
it holds anything you wish: a complicated XPath expression or a function, or whatever it is that is easier to store or process in a variable rather than typing it out multiple times. We use variables to help keep our code easy to read! In this case, we are using a variable to define our collection, using the collection()
function in the @select
attribute. The collection()
function is set to designate the directory location of the collection of letters in relation to the stylesheet I am currently writing. My XSLT is saved in the directory immediately above the xml-letters/
directory, so I am simply instructing the XSLT processor to take a directory-path step down to that directory. Definitely keep the ?select=*.xml
because it helps make sure that only XML files are included in the collection, screening out the Relax NG file and hidden files that Mac or Windows operating systems sometimes add to file directories.
We will call this variable later in the XSLT file whenever we need it, to show how we are stepping into our collection of documents. That will happen in the first template rule, the one that matches the document node. Open any one of the input XML files in the collection in <oXygen/> and you will see that the transcription contents are all coded within the <letter> element, so we can write this stylesheet to look through the whole collection of files and process only the <letter> element and its contents. You call or invoke the variable by signalling it first with a dollar sign $, giving the variable name, and then simply stepping down the descendant axis straight to the <letter> element in each file. Here is how the code looks to call or invoke the variable in our first template match:
<xsl:apply-templates select="$travelColl//letter"/>
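If your output comes back empty, one way to check whether the variable really picked up your files is to print some temporary messages. The sketch below assumes the $travelColl variable is defined as shown earlier; the xsl:message lines are a debugging aid of our own suggestion, not part of the assignment, and you can delete them once the collection loads.

```xml
<!-- Fragment: a temporary version of the first template rule. -->
<xsl:template match="/">
    <!-- Report how many documents the collection() call found -->
    <xsl:message select="'Files in collection:', count($travelColl)"/>
    <!-- Report the location of each document in the collection -->
    <xsl:for-each select="$travelColl">
        <xsl:message select="base-uri(.)"/>
    </xsl:for-each>
    <xsl:apply-templates select="$travelColl//letter"/>
</xsl:template>
```

If the count is lower than you expect, or the transformation reports an error about the collection, the usual suspect is the relative path in collection(), which is resolved from where the stylesheet is saved.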
Note on running the transformation: Unlike other transformations we do on single XML files, when we run this XSLT in <oXygen/> it actually doesn’t matter what file we have selected in the XML input, because we have indicated in the stylesheet itself what we are processing, with the collection()
function. We can even set a file that is outside of our collection as the input XML file (and we actually ran it successfully with the HTML file of the previous exercise selected). You do need to enter something in the input window, but when you work with the collection()
function, your input file is just a dummy or placeholder that <oXygen/> needs to have entered so it can run your XSLT transformation.
In our HTML output (scroll down past the table of contents, to where the full text of the letters is rendered), the parts of each letter are held and spaced apart using HTML <p> elements. Here’s a sample of HTML output for one of our letter documents:
<!-- I output the xml:id from the letter element as an id on an HTML div element to organize my letters on the page and prepare them for linking. -->
<div class="letter" id="Greenwich-1955-07-18">
    <p><span class="placeName">Departure. On Queen Elizabeth.</span> At Noon, <span class="date">July 18, 1955.</span></p>
    <p> We got on board about 10 <span class="persName">Harriet</span> had stayed in <span class="placeName">Greenwich</span> over night so went in town with us. . . </p>
</div>
The fine print: Don’t worry if your HTML output isn’t structured exactly the same way ours is. But you should open your HTML output in <oXygen/> and simply check to make sure that what you’re producing is valid HTML and renders the text appropriately.
Remember to output span elements for interesting markup in the texts that you can style (later) with CSS.
Once your documents are all being formatted correctly in HTML, you can add the functionality to create the table of contents at the top, using modal XSLT.
For this portion we are outputting an HTML table to show a little preview of information from each file in the collection. It may help to orient yourself to HTML table coding. HTML tables are organized in rows, using <tr>
elements, which contain <td>
elements (which means table data
). You control the columns in an HTML table by setting the <td>
cells in an ordered sequence. Inside a <tr>
, the first <td>
is set in column 1, the second <td>
in column 2, the third in column 3, and so on. The top row conventionally contains headings in <th>
cells, which HTML will emphasize by default. Here is a simple HTML table (styled following our newtfire CSS, in which I’ve outlined the borders and given a background color to the table heading cells), followed by a view of the HTML code that creates the table structure:
Heading 1 | Heading 2 | Heading 3 |
---|---|---|
Row 1, cell 1 | Row 1, cell 2 | Row 1, cell 3 |
Row 2, cell 1 | Row 2, cell 2 | Row 2, cell 3 |
<table>
    <tr>
        <th>Heading 1</th>
        <th>Heading 2</th>
        <th>Heading 3</th>
    </tr>
    <tr>
        <td>Row 1, cell 1</td>
        <td>Row 1, cell 2</td>
        <td>Row 1, cell 3</td>
    </tr>
    <tr>
        <td>Row 2, cell 1</td>
        <td>Row 2, cell 2</td>
        <td>Row 2, cell 3</td>
    </tr>
</table>
The template rule for the document node in our solution, revised to output a table of contents with all the information we wish to show before the text of the letters, looks like this:
<xsl:variable name="travelColl" as="document-node()+"
    select="collection('xml-letters/?select=*.xml')"/>
<xsl:template match="/">
    <html>
        <head>
            <title>Behrend Travel Letters</title>
            <meta name="viewport" content="width=device-width, initial-scale=1.0" />
            <!--ebb: The line above helps your HTML scale to fit lots of different devices. -->
            <link rel="stylesheet" type="text/css" href="webstyle.css"/>
        </head>
        <body>
            <h1>The Behrends' Travel Adventures in Europe</h1>
            <section id="toc">
                <h2>Contents</h2>
                <!-- ebb: Let's set up the HTML table here. -->
                <table>
                    <tr>
                        <th>Letter Date and opening</th><!--first column table heading-->
                        <th>Places Mentioned</th><!--second column table heading-->
                        <th>People Mentioned</th><!--third column table heading-->
                    </tr>
                    <!--ebb: Here we use our $travelColl variable pointing into the collection AND use modal XSLT to set the toc mode for Table of Contents: -->
                    <xsl:apply-templates select="$travelColl//letter" mode="toc"/>
                </table>
            </section>
            <section id="fulltext">
                <xsl:apply-templates select="$travelColl//letter"/>
            </section>
        </body>
    </html>
</xsl:template>
The <section id="toc"> block is what we added to include a table of contents, and the important line is <xsl:apply-templates select="$travelColl//letter" mode="toc"/>. This is going to apply templates to each document with the @mode attribute value set to toc. The value of the @mode attribute is up to you (we used toc for table of contents), but whatever you call it, setting the @mode to any value means that only template rules that also specify a @mode with that same value will fire in response to this <xsl:apply-templates> element. Now we have to go write those template rules!
What this means is that if you write new template rules to process the <letter>
elements to
output the full text of the documents, you use <xsl:apply-templates>
and
<xsl:template>
elements without any @mode
attribute.
To create the table of contents, though, you can have <xsl:apply-templates> and <xsl:template> elements that select or match the same elements, but that specify a mode and apply completely different rules. A template rule for <letter> elements in table-of-contents mode will start with <xsl:template match="letter" mode="toc">, and you need to tell it to create a <tr> row holding a <td> element that contains the text of the <date> element. You can then apply-templates with mode="toc". The rule for those same elements not in any mode will start with <xsl:template match="letter"> (without the @mode attribute). That rule can create a <div> element for each letter, and then output the full text of the document using <p> elements. In this way, you can have two sets of rules for the letters, one for the table of contents and one to output the full text, and we use modes to ensure that each is used only in the correct place.
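The two rules described here might be sketched as follows. This is only a skeleton under stated assumptions, not our full solution: it assumes each letter contains at least one date element and carries an @xml:id, and it leaves the second and third cells empty for you to fill in.

```xml
<!-- Fragment: both rules belong inside your xsl:stylesheet element. -->

<!-- Table-of-contents mode: one table row per letter -->
<xsl:template match="letter" mode="toc">
    <tr>
        <!-- (descendant::date)[1] takes the first date in the letter -->
        <td><xsl:value-of select="(descendant::date)[1]"/></td>
        <td><!-- places mentioned go here --></td>
        <td><!-- people mentioned go here --></td>
    </tr>
</xsl:template>

<!-- No mode: the full text of the same letter -->
<xsl:template match="letter">
    <div class="letter" id="{@xml:id}">
        <xsl:apply-templates/>
    </div>
</xsl:template>
```

The unmoded rule hands its children on with a plain xsl:apply-templates, so the rest of the full-text output depends on the other unmoded rules you write for paragraphs and inline markup.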
Remember: both the <xsl:apply-templates>
, which tells the system
to process certain nodes, and the <xsl:template>
that responds to
that call and does the processing must agree on their mode values. For the main
output of the full text of every letter, neither the
<xsl:apply-templates>
nor the <xsl:template>
elements specifies a mode. To output the table of contents, both specify the same
mode.
In this assignment, we are inviting you to pull some data from the source files that provide a preview of what people will read about in the letters. We are creating an HTML
table that features at least the @xml:id
on the letter
element as a distinct identifier in the first cell.
The 2021 Travel Letters team coded information about places the Behrends visited in <placeName>
elements, and they coded references to people in
<persName>
elements. To populate the second table cell, we want to reach into each letter and pull out a sorted list of distinct-values of all the <placeName>
elements. And to populate the third table cell, we do the same thing with the <persName>
elements. This allows our readers to survey all the distinct places and people mentioned in each letter. Here are some XPath hints:
For the first cell, output the @xml:id on the letter. But we encourage you to also add the first 80 characters of the text of the letter to give people a quick view of how the letter begins! To do this, review our XPath Exercise on string functions, and specifically try reaching into the first paragraph of the letter and using the XPath substring() function to take the string value only up to a length that you specify.
For the second and third cells, apply normalize-space() to each node (use the simple map ! operator for this) so you remove extra spaces from individual nodes first. Then use the arrow operator => to operate on the whole sequence: apply distinct-values() first, then sort(). Finally, bundle these all together tidily with a comma separator by sending the result to string-join(), like this: string-join(',').
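Put together, the hints might look something like the sketch below, written as the contents of the row created by the template rule for letter in toc mode. Two things here are assumptions: that paragraphs in the source XML are p elements (check your files and adjust), and the separator ', ' with a space, which we chose for readability instead of a bare comma.

```xml
<!-- Fragment: the three cells inside the tr for one letter. -->
<td>
    <xsl:value-of select="@xml:id"/>
    <xsl:text>: </xsl:text>
    <!-- First 80 characters of the first paragraph, whitespace tidied -->
    <xsl:value-of select="substring(normalize-space((descendant::p)[1]), 1, 80)"/>
</td>
<td>
    <!-- ! cleans each node in turn; => then passes the whole sequence along -->
    <xsl:value-of select="descendant::placeName ! normalize-space()
        => distinct-values() => sort() => string-join(', ')"/>
</td>
<td>
    <xsl:value-of select="descendant::persName ! normalize-space()
        => distinct-values() => sort() => string-join(', ')"/>
</td>
```

The simple map operator binds more tightly than the arrow operator, so normalize-space() runs on each node before distinct-values() ever sees the sequence.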