The Fall 2020 DIGIT 400 James Bond project team has prepared XML for the screenplay Goldeneye, which you can access by right-clicking on the file and downloading it from here: Goldeneye.xml. Open the file in oXygen and work with the XPath Window set to version 3.1. Respond to the XPath questions below in a text or markdown file, and upload it to Canvas for this assignment when you’re finished. (Please use an attachment! If you paste your answer into the text box, Canvas may munch the code formatting.) Some of these tasks are thought-provoking, and even difficult. If you get stuck, do the best you can. If you can’t get a working answer, give the expressions you tried and explain where they failed to get the results you wanted. Sometimes doing that will help you figure out what’s wrong, and even when it doesn’t, it will help us identify the difficult moments.
You should consult The XPath Functions We Use Most page and especially its section 4 on Strings. As always, consult our class notes and our introductory guide Follow the XPath!. Be sure to give the XPath expression you used in your answer, and don’t just report your results. This way, if the answer is incorrect, we can help explain what went wrong.
First of all, skim through the document to get a sense of how it is coded. To warm up and familiarize yourself with the file, try writing XPath expressions to find all the scenes, stage directions, speeches, and speakers.
The first set of questions works with the sd elements. These contain the stage directions.
Write an XPath expression to find all of the sd elements in the document.
The script is divided using the Scene element, and in each Scene the first element child is a Heading element. How can you reliably find the first stage direction immediately following that Heading element? (Hints: Take this in stages. First look for all of the Heading elements. Notice how the first sd element is positioned in relation to a Heading element: they are children of the same parent. Our solution uses the following-sibling:: axis and a numerical position predicate to indicate the first in a sequence.)
Of the Heading elements, we are interested in the ones that feature computers in the scene. How can you find out which ones contain the string "computer"? (Hint: add a predicate.)
Look for Scene elements without sd elements inside. How many of these scenes are there? (Hint: use a predicate with the not() function.)
The next set of questions works with the string() function, which pulls text strings out of XML nodes, and the string-length() function, which measures the number of text characters in the XML node that you visit.
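Before you try these functions on the Goldeneye file, here is a minimal sketch of how they behave on literal values. The strings and numbers are made up for illustration, and the expected results in the comments follow from the XPath 3.1 function definitions. Run each expression on its own in the XPath window.

```xpath
(: string-length() counts the characters in one string :)
string-length('Goldeneye')
(: expected: 9 :)

(: the simple map operator ! hands each item on its left,
   one at a time, to the expression on its right as the context item (.) :)
('one', 'three') ! string-length(.)
(: expected: 3, 5 :)

(: string() turns each item it visits into a plain string :)
(7, 12) ! string(.)
(: expected: "7", "12" :)
```

When you apply the same pattern to nodes, the left side of ! is a path expression instead of a literal sequence.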
Speeches are coded as sp elements. Write an XPath to locate all of the speeches (and notice how they are coded with a spk element inside). Now, use the simple map ! operator to apply the string() function to each sp element one by one. How is this return with string() different from just returning the sp elements? (Respond with your XPath expression, and a brief explanation of what you are seeing in the return window: How did the string() function change your results?)
Now remove the string() function, and instead, step to the text() node child of sp. How does this change the results in the return window? (Note: text() is a node in the XML tree, so this is not a function, but a path step from parent to child. Technically, text() is a child of the parent element.)
Now apply string-length() to the sp elements. What does this return?
Use the max() function to find out the longest length of a speech in the Goldeneye script.
The string-length() and max() functions took us off the XML tree to yield calculated results. How can we write XPath to return the XML element sp that has the maximum string-length()? Hint: Try searching for sp elements with a predicate that checks to see if the string-length() is equal to the maximum string-length you found in the previous step.
Next, we work with the spk elements, to return information about the speakers.
The spk elements are nested as children inside the sp elements. Write an XPath expression to return all the speakers who deliver speeches that contain the word "Iraq". (Hint: Try breaking this down: first return all of the speeches that contain "Iraq" and then take a step to return the spk element.)
The spk elements are entered in block caps. Use the XPath lower-case() function to return all the spk elements lower-cased instead and record your expression. Hint: For this special function, you will need to refer to the self:: node using the dot like this: lower-case(.)
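To see why the dot matters, here is a small sketch on made-up strings (not taken from the Goldeneye file). lower-case() accepts at most one string at a time, so it needs to be applied item by item, and the dot stands for whichever item is currently being visited.

```xpath
(: a single string works directly :)
lower-case('BLOCK CAPS')
(: expected: "block caps" :)

(: passing a sequence of two strings directly raises a type error:
   lower-case(('ALPHA', 'BETA'))  :)

(: the simple map operator visits each item, and . refers to that item :)
('ALPHA', 'BETA') ! lower-case(.)
(: expected: "alpha", "beta" :)
```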
Let's try some string-surgery in XPath by working with substrings. Consult this page to learn about the XPath substring() function and see how to write it out. Now, see if you can apply the substring() function to isolate the 2nd letter onward in the spk elements. Then, lower-case() that substring!
Now that you have used substring() to isolate letters 2 to the end, you should be able to change it to return only the very first letter. (This time, we do not want to apply the lower-case function, because we want to preserve the upper case of the first letter.) Try it and record your expression.
To join the two pieces together, there is a concat() function, and there is a convenient shorthand for it in XPath 3.1 which sets two vertical bars || between the expressions you want to connect. However, we need to be careful because concatenation requires joining exactly one thing to exactly one other thing. (XPath can't figure out on its own how to concat (or tie together) the whole sequence of substrings of the first letter to the whole sequence of the substrings of the rest.) To help XPath work one at a time over sequences of spk substrings, look up the for $i in (sequence) return ... XPath expression. (This is a for-loop in XPath, and $i is known as a range variable that isolates each member of the series, one by one.) With the for-loop, you can go one step at a time through the series of //spk nodes and return a concatenation of the substring functions you figured out, using $i as the first argument of your substring functions. See if you can work out how to write this XPath.
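If the syntax is unfamiliar, it may help to try the individual pieces on literal strings first. The sketch below is only an illustration: the words and the range variable name $w are made up, and the loop deliberately does something different from the task above. Run each expression on its own.

```xpath
(: substring(string, start) returns everything from the start position on :)
substring('screenplay', 7)
(: expected: "play" :)

(: substring(string, start, length) returns a fixed number of characters :)
substring('screenplay', 1, 6)
(: expected: "screen" :)

(: a for-loop visits each member of the sequence in turn, and || joins
   exactly one string to exactly one other string on each pass :)
for $w in ('alpha', 'beta', 'gamma')
return upper-case($w) || '!'
(: expected: "ALPHA!", "BETA!", "GAMMA!" :)
```

Because the range variable holds exactly one item on each pass, both sides of || are single strings, which is what concatenation needs.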