In this exercise, we work with an XML file begun by Fall 2020 DIGIT 110 students working on a collection of poems titled Montage of a Dream Deferred by Langston Hughes.
Because this document is not in a namespace, we do not need the
@xpath-default-namespace
attribute, and the only thing we need to
add to <oXygen/>’s default XSLT stylesheet template is the @xmlns
attribute pointing to the HTML namespace. We also add our usual
<xsl:output>
line that we use when producing HTML (for making
sure we produce valid HTML 5 in XHTML format). Here’s what we need:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>
</xsl:stylesheet>
We will be preparing an HTML reading view styled for the web, based on the original XML
document. We’re going to work with the <poem>
elements,
and work out ways to transform them into block and inline HTML elements for display on the web. Elements of note for our transformation include the following:
- <stanza> elements holding groups of lines.
- The @n attribute on <line> elements.
- <format> elements and their associated attributes, which hold some display information about italics (with @wordType="italics") and indentation (with @margin="ind1").
To render these poems in HTML, it will help to orient ourselves to HTML block elements, and there are several choices that may make sense for this assignment. Block elements always start on a new line, so they are appropriate for formatting stanzas as well as lines inside them. w3schools provides a helpful list of block elements available in HTML 5.
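As a rough sketch of how block elements might carry the structure of a poem, the stylesheet below wraps each poem in a section, each stanza in a nested section, and each line in a div. It assumes that <poem> elements contain <stanza> elements, which in turn contain <line> elements. The class values, the page title, and the CSS filename style.css are illustrative placeholders, not requirements.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>

    <!-- Document node: build the HTML page once, then process every poem. -->
    <xsl:template match="/">
        <html>
            <head>
                <title>Montage of a Dream Deferred</title>
                <!-- style.css is a hypothetical filename; use your own. -->
                <link rel="stylesheet" type="text/css" href="style.css"/>
            </head>
            <body>
                <xsl:apply-templates select="descendant::poem"/>
            </body>
        </html>
    </xsl:template>

    <!-- Each poem becomes its own block. -->
    <xsl:template match="poem">
        <section class="poem">
            <xsl:apply-templates/>
        </section>
    </xsl:template>

    <!-- Each stanza becomes a block nested inside the poem. -->
    <xsl:template match="stanza">
        <section class="stanza">
            <xsl:apply-templates/>
        </section>
    </xsl:template>

    <!-- Each line becomes a block, so it starts on a new line in the browser. -->
    <xsl:template match="line">
        <div class="line">
            <xsl:apply-templates/>
        </div>
    </xsl:template>
</xsl:stylesheet>
```

Elements without a rule of their own (titles, for instance) fall through to XSLT’s built-in rules, which process their children and output their text without adding markup.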
For formatting within lines of poetry, we will want to work with inline HTML elements, which do not start on a new line but are meant to be part of a flow of text within a block. For example, we can output inline HTML code for emphasis (em
) that will render in italics, or output line numbers in a distinctive color and style. (We recommend outputting an HTML span
element with a @class
attribute to help make these available for CSS styling. Read more about these HTML elements and how you can use them for styling here: Using <span>
and
@class
to style your HTML page.)
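A minimal sketch of the inline side, assuming the line number sits in @n on <line> as described above: the rule below outputs each line as a div and puts its number in a span that CSS can target. The class values LineNum and line are placeholder names.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>

    <xsl:template match="line">
        <div class="line">
            <!-- Inline wrapper for the line number. The built-in rule for
                 attributes outputs the attribute's value as text. -->
            <span class="LineNum">
                <xsl:apply-templates select="@n"/>
            </span>
            <!-- A single space between the number and the text of the line. -->
            <xsl:text> </xsl:text>
            <!-- Process the text and any markup inside the line. -->
            <xsl:apply-templates/>
        </div>
    </xsl:template>
</xsl:stylesheet>
```

A CSS rule with the selector span.LineNum could then set the color, size, or font of the numbers without affecting the rest of the line.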
The <format>
element holds attributes and a range of values indicating things like italics, underlining, and circling in the source document. Those <format>
elements can appear in a few different places, not just inside <line>
elements but also inside title
, poemTitle
elements and elsewhere. You may not know at the outset which ones
can be inside which other ones, or how many different attributes they hold, or how deeply they can nest. Happily, with XSLT, unlike with many other programming languages, you don’t need to care about those questions!
Though the Montage source XML is carefully structured, it nevertheless contains format code that is flexibly worked in at various levels, and it is likely that, as the team continues working, they will apply tagging within the lines of poetry to mark words or phrases of interest, generating mixed content. XSLT was designed to work efficiently with such flexibility and variation, including tagging that appears at different levels of the same document. With a traditional procedural programming language, you’d have to write rules like “inside this poemTitle, if there’s a <format wordType="italics"> do X, and, oh, by the way, check whether there’s a <format wordType="italics"> inside the <poemTitle> or even nested inside another <format> element, etc.” That is, in most programming languages you have to tell the program what to look for at every step. The elegance of XSLT when dealing with this type of data is that all you have to say inside any given element is, “I’m not worried about what I’ll find here; just process (apply templates to) all my children, whatever they might be.”
The way to deal with mixed content in XSLT is to have a template rule for every
element and use it to output whatever HTML markup you want for that element and
then, inside that markup, to include a general
<xsl:apply-templates/>
, not specifying a @select
attribute. For example, if you want your <format wordType="italics">
to be
tagged with the HTML <em>
tag, which means emphasis
and which is usually rendered in italics, you could have a template
rule like:
<xsl:template match="format[@wordType='italics']">
    <em>
        <xsl:apply-templates/>
    </em>
</xsl:template>
You don’t know or care whether <format wordType="italics">
has any child nodes
or, if it does, what they are. Whatever they are, this rule tells the system to try
to process them, and as long as there’s a template rule for them, they’ll get taken
care of properly somewhere else in the stylesheet. If there are no child nodes, the
<xsl:apply-templates/>
will apply harmlessly (as there will
be nothing to process). As long as every element tells you to process its children,
you’ll work your way down through the hierarchy of the poems without having to
know which elements can contain which other elements or text nodes.
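To see how this plays out, here is a small sketch with two rules that know nothing about each other. The h3 for <poemTitle> is an assumed choice of heading level, not something the source prescribes.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>

    <!-- The title rule does not ask what is inside the title.
         It outputs a heading and passes its children along. -->
    <xsl:template match="poemTitle">
        <h3>
            <xsl:apply-templates/>
        </h3>
    </xsl:template>

    <!-- The italics rule does not ask where the format element sits.
         It outputs emphasis and passes its children along. -->
    <xsl:template match="format[@wordType='italics']">
        <em>
            <xsl:apply-templates/>
        </em>
    </xsl:template>
</xsl:stylesheet>
```

With these two rules in place, a <poemTitle> that contains an italic <format> comes out as an h3 with an em inside it, and the same em rule fires for an italic <format> inside a <line> or inside another <format>.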
@select
In the previous XSLT assignment, where you built HTML lists from XML code, you used <xsl:apply-templates select="…"/>
, specifying
exactly what part of the source XML tree you wanted to process where. That makes sense when your input contains a lot more than you want to output and when you are processing the tree selectively. Use
the @select
attribute when you know exactly what you’re looking for
and where you want to put it.
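Selective processing of this kind might look like the following sketch, which pulls only the poem titles out of the document and lists them, ignoring everything else. It assumes the titles are tagged as <poemTitle>; the page title and the choice of an unordered list are illustrative.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>Poem titles</title>
            </head>
            <body>
                <ul>
                    <!-- @select names exactly which nodes to process here:
                         the poem titles and nothing else. -->
                    <xsl:apply-templates select="descendant::poemTitle"/>
                </ul>
            </body>
        </html>
    </xsl:template>

    <!-- Each selected title becomes one list item. -->
    <xsl:template match="poemTitle">
        <li>
            <xsl:apply-templates/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```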
In this assignment, on the other hand, you don’t know (and don’t need to know) the
order and nesting hierarchy of whatever mixed content you might find within a poem or its subelements. You just want to process
whatever comes up whenever it comes up. <xsl:apply-templates/>
without the @select
attribute says “apply templates to whatever you find.”
The upshot: Omit the @select
attribute when you are processing lots
of different mixed up alternatives and you do not need to rearrange them.
(You can still treat them all differently because you’ll have different template
rules to match
them, but when you assert that they should be processed, you
don’t have to know which ones they actually are.)
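The parenthetical point can be sketched with a few competing rules for <format>. The attribute values come from the source description above; the span and its class value ind1 are assumed output choices, and the combined rule is only needed if a single <format> can carry both attributes.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>

    <!-- Italic text becomes emphasis. -->
    <xsl:template match="format[@wordType='italics']">
        <em><xsl:apply-templates/></em>
    </xsl:template>

    <!-- Indented text gets a span that CSS can shift to the right. -->
    <xsl:template match="format[@margin='ind1']">
        <span class="ind1"><xsl:apply-templates/></span>
    </xsl:template>

    <!-- Both at once: the explicit priority breaks the tie between
         the two rules above, which would otherwise match equally well. -->
    <xsl:template match="format[@wordType='italics'][@margin='ind1']" priority="1">
        <span class="ind1"><em><xsl:apply-templates/></em></span>
    </xsl:template>

    <!-- Any other format element: no markup of its own,
         but its children are still processed. -->
    <xsl:template match="format">
        <xsl:apply-templates/>
    </xsl:template>
</xsl:stylesheet>
```

Whichever element issues <xsl:apply-templates/> never names any of these rules. The processor picks the best match for each <format> it meets.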
HTML provides a limited number of elements for styling in-line text, which you can
read about at http://www.w3schools.com/html/html_formatting.asp. You can use any of these
in your output, but note that presentational elements, the kind that describe how
text looks (e.g., <i>
for italic
), are generally regarded
as less useful than descriptive tags, which describe what text means (e.g.,
<em>
for emphasis
). Both of the preceding are normally
rendered in italics in the browser, but the semantic tag is more consistent with the
spirit of XML than the presentational one.
The web would be a dull world if the only styling available were the handful of presentational tags available in vanilla HTML. In addition to those options, there are also ways to assign arbitrary style to a snippet of in-line text, changing fonts or colors or other features in mid-stream. To do that:
- Read our page Using <span> and @class to style your HTML page.
- Wrap the text you want to style in a <span> element with a @class attribute. For example, you might want to output the line numbers coded in the source document as @n on the line element. In a template rule that matches line elements, set down a <span class="LineNum"> (or another value that makes sense to signal line numbers) in the output HTML, and inside it use <xsl:apply-templates select="@n"/> to output the value of @n exactly where you want it to be. You can then specify CSS styling by reference to the @class attribute, as described in the page we link to above.
- Setting @class attributes in the output HTML makes it possible to style the various <span> elements differently according to the value of those attributes, but you need to create a CSS stylesheet to do that. Create the stylesheet (just as you’ve created CSS in the past), and specify how you want to style your <span> elements. Link the CSS stylesheet to the HTML you are outputting by creating the appropriate <link> element in your output HTML (you can remind yourself how to do that in this section of our Intro to CSS). Set that <link> element in your XSLT so that it is output every time you update your code.
What you should produce, then, is:
- An XSLT stylesheet that transforms the <body> element and its contents into HTML.
- Consider a section element with @class="poem" to section off the poems, and maybe try stanza sections with <section class="stanza">.
- You might also try the div element, or experiment with HTML list elements. (You can write CSS to suppress the bullets in an unordered list, for example.)
- Use <span> elements with the @class attribute for line numbers and other formatting.
- See the links under CSS styling for styling backgrounds, text, and fonts, as well as the link for borders under CSS box model.
- To indent lines, work with CSS properties such as padding-left:. Here are some helpful examples.
- HTML does not allow ol, ul, or div elements inside a p element, so you will need to come up with another configuration. You may, however, nest some HTML block elements, as in wrapping a section element around an h3 heading element and either div or ul elements. Test your combinations to make sure they are valid HTML, and test your CSS by viewing your files locally in a web browser. (Use Command + O on Mac or Control + O on Windows to open your HTML file, with its associated CSS, in a web browser and see how it looks. You can also use the Output preview in <oXygen/>, but that view a) will not show you whether the document is well-formed and valid, and b) will not be able to display as much styling.)
To submit the assignment, please upload your XSLT, HTML, and CSS files to Canvas. Remember to link your CSS to the HTML file, and ideally, do so in your XSLT document, so it will appear in your output HTML.