The Graveyard project currently archives and shares personography data (or data about people) from the Brush Creek Cemetery’s records, specifically for a group of 142 burials in Section I of the graveyard. These burials are organized into family plots, and the records frequently (but not always) indicate a location of death. Thanks to the data curation of the Graveyard team, we are able to look up and plot information about each family represented in Section I and get a sense of the family’s geographic distribution from the locations of death associated with each last name (or surname). XPath of the Graveyard team’s data file tells us there are 56 distinct surnames for the 142 persons buried here. For this assignment, we will concentrate only on the larger families, those where the same surname is associated with three or more people. Of these we will graph a total count of deaths per surname, and then superimpose stacked bars representing each regional death location and the count of deaths for that region.
Here is our sample output for the assignment. Yours may be styled differently, and you do not need to output the diagnostic information we did at the top of the plot (unless you want to do something similar). You should title and label your graph clearly and provide an explanation of colors (or textures) that you use to distinguish locations.
Access the Graveyard TEI personography file from our eXist database in this location:
doc("/db/graveyard/graveyardInfo-TEI.xml")
The personography file was prepared using TEI code. To read from the TEI and to output in the SVG namespace, you will need to declare your namespaces and work with the tei: prefix for all TEI elements.
xquery version "3.0";
declare default element namespace "http://www.w3.org/2000/svg";
declare namespace tei="http://www.tei-c.org/ns/1.0";
We should begin by surveying the personography file. Open it in eXide (with file-->open, and browse your way to it). Notice how the personography entries are organized, and see how the <surname> elements are positioned. Notice, too, how the <region> elements are nested inside the <death> element. Since the regions (US states or Canadian provinces) are more frequently shared and are easiest to understand, we will plot our stacked chart based on these elements (and bypass the cities encoded in <settlement>). Here is a sample entry, highlighting the elements we seek:
<person xml:id="L12P1" role="occupant" sex="m">
    <persName>
        <surname>Henderson</surname>
        <forename type="first">James</forename>
    </persName>
    <age>49</age>
    <death when="1931-05-21">
        <placeName>
            <settlement type="city">Mann County</settlement>
            <region type="state">New York</region>
        </placeName>
        <note type="cause">unknown</note>
    </death>
    <event type="interred" when="1931-05-24">
        <desc/>
    </event>
    <trait type="racial">
        <label>white</label>
    </trait>
    <geo><!--whitespace-separated geocoordinates look up how to do this in the TEI--></geo>
</person>
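As a starting point for that survey, here is a minimal sketch of a query that lists the larger families with their burial counts and the regions recorded for them. It assumes, based on the sample entry above, that each burial is a <person> whose surname sits inside <persName> and whose region sits somewhere inside <death>. The variable names are our own, and the real file may need adjusted paths.

```xquery
xquery version "3.0";
declare namespace tei="http://www.tei-c.org/ns/1.0";

(: Assumption: every burial is a tei:person shaped like the sample entry. :)
let $people := doc("/db/graveyard/graveyardInfo-TEI.xml")//tei:person
let $surnames := distinct-values($people/tei:persName/tei:surname)
for $s in $surnames
(: Collect everyone who shares this surname. :)
let $members := $people[tei:persName/tei:surname = $s]
let $count := count($members)
(: Keep only the families with three or more burials. :)
where $count ge 3
order by $count descending
return concat($s, ': ', $count, ' burials; regions: ',
    string-join(distinct-values($members/tei:death//tei:region), ', '))
```

If the surnames in the file carry stray whitespace, the same surname could show up more than once in the results; wrapping each value in normalize-space() is one possible remedy.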
To plot your graph in SVG from XQuery, apply what you have been learning about SVG in the previous assignments. For example, when you plotted the timeline, you learned how to code a viewport in the SVG root element, and you learned how to plot from x=0 and y=0 so that your plot is visible in the SVG coordinate space, using transform="translate(x, y)". You also learned how to plot and space hashmarks at regular intervals along a line. At the very least you want to space bars on a bar graph at regular intervals, and draw X and Y axes based on maximum values multiplied by spacer variables that you set. Keep in mind that when you output multiple SVG elements in a return, you will want to bundle them together in a single group, or <g>. And don’t forget to use the tei: prefix when reaching into the TEI elements!
Work out your maximum values for X and Y and set a viewport with a width and a height, and then a viewBox attribute to scale your output if you wish.
Look at examples of how we prepared SVG viewports in class, and check out Sara Soueidan’s excellent detailed explanation. Here is a brief overview of how to set the viewport attributes on the SVG root element:
width="{largest X value for the ENTIRE plot + something with some wiggle room}"
height="{largest Y for the ENTIRE plot + wiggle room}"
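As a rough sketch of how those two values might be computed from the data rather than typed in by hand, the query below finds the largest family count, multiplies by spacer variables, and adds a margin for wiggle room. The spacer and margin values are hypothetical, the element paths assume the shape of the sample entry, and the viewBox is simply set to the same extent as the width and height so that nothing is zoomed or cropped.

```xquery
xquery version "3.0";
declare default element namespace "http://www.w3.org/2000/svg";
declare namespace tei="http://www.tei-c.org/ns/1.0";

(: Hypothetical spacing values and margin; adjust to suit your plot. :)
declare variable $X_Spacer := 40;
declare variable $Y_StretchFactor := 20;
declare variable $margin := 50;

let $people := doc("/db/graveyard/graveyardInfo-TEI.xml")//tei:person
(: One count per surname that has three or more burials. :)
let $counts :=
    for $s in distinct-values($people/tei:persName/tei:surname)
    let $n := count($people[tei:persName/tei:surname = $s])
    where $n ge 3
    return $n
(: Largest X and Y values for the entire plot. :)
let $xMax := (count($counts) + 1) * $X_Spacer
let $yMax := max($counts) * $Y_StretchFactor
let $width := $xMax + 2 * $margin
let $height := $yMax + 2 * $margin
return
    <svg width="{$width}" height="{$height}" viewBox="0 0 {$width} {$height}">
        <!-- Shift the origin so bars can be drawn upward with negative y values. -->
        <g transform="translate({$margin}, {$yMax + $margin})">
            <line x1="0" y1="0" x2="{$xMax}" y2="0" style="stroke: black; stroke-width: 2"/>
            <line x1="0" y1="0" x2="0" y2="-{$yMax}" style="stroke: black; stroke-width: 2"/>
        </g>
    </svg>
```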
Now, if I want to define how the image behaves on a screen, I define the viewBox attribute. viewBox takes four values, separated by spaces or commas: viewBox="x1 y1 x2 y2", which define a new coordinate system to use in rendering our output image. x1,y1 defines the starting top-left part of the image, and x2,y2 gives the width and height of the region shown; when x1,y1 is 0,0, that is the same as the number that represents the bottom right of the user’s screen.
If x1,y1 does not start at 0,0, the viewBox will select the part of your image that starts where you say as the top left of the viewable SVG. (Notice what happens to the output SVG if you set x1,y1 to 200,200.)
If x2,y2 is SMALLER than the total width and height you defined for your image, you’ll be zooming and cropping, because the viewBox defines what you can see on the visible screen. (Notice what happens if you set x2,y2 to the width div 2 and height div 2.)
If x2,y2 is LARGER than the total width and height, the result effectively zooms out, making the output image take up LESS space on the screen. Think of x2,y2 as defining a ratio with your width and height attribute values.
We recommend beginning by plotting each surname in a text element running beneath your X axis, and in the same X locations, plotting the total count of deaths per surname. Then we will go on to superimpose the stacked bars on top of that total bar. Not every death was recorded with a location, so our stacked bars should stack from the bottom up, and in many cases leave some room at the top for those whose locations at death were not marked.
for loop, or array
After you have output your surnames with their total counts, you will need to make an inner FLWOR, within the return statement of the “surname” FLWOR.
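To make the outer “surname” FLWOR concrete, here is one possible sketch that outputs a total bar, a surname label beneath the X axis, and a count above the bar for each larger family. The spacer values, the translate offsets, and the global variable $bigFamilies are our own hypothetical choices, and the paths assume the shape of the sample entry.

```xquery
xquery version "3.0";
declare default element namespace "http://www.w3.org/2000/svg";
declare namespace tei="http://www.tei-c.org/ns/1.0";

(: Hypothetical spacing values; adjust to suit your plot. :)
declare variable $X_Spacer := 40;
declare variable $Y_StretchFactor := 20;
declare variable $people := doc("/db/graveyard/graveyardInfo-TEI.xml")//tei:person;
(: Hypothetical global: surnames shared by three or more people. :)
declare variable $bigFamilies :=
    for $s in distinct-values($people/tei:persName/tei:surname)
    where count($people[tei:persName/tei:surname = $s]) ge 3
    return $s;

<g transform="translate(50, 400)">
{
    for $i at $pos in $bigFamilies
    let $total := count($people[tei:persName/tei:surname = $i])
    let $barHeight := $total * $Y_StretchFactor
    (: The inner FLWOR for the regional bars would sit inside this return,
       after the total bar, so that the stacked bars are drawn on top of it. :)
    return
        <g>
            <rect x="{$pos * $X_Spacer}" y="-{$barHeight}" width="20" height="{$barHeight}"
                style="stroke: black; stroke-width: 1; fill: white"/>
            <text x="{$pos * $X_Spacer}" y="15" style="font-size: 10px">{$i}</text>
            <text x="{$pos * $X_Spacer}" y="-{$barHeight + 5}" style="font-size: 10px">{$total}</text>
        </g>
}
</g>
```

Wrapping each family in its own <g> keeps the bar and its labels together, which can make the output easier to inspect and style.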
We found it helpful to store some arrays (or lists of values) in global variables, and we looped through the arrays in FLWOR statements used to output the colors in our legend, as well as for the regional bars. Later on we found it absolutely essential to make a special kind of array to properly calculate the Y position of each stacked bar associated with a surname.
In order to stack bars you need to start each new bar where the previous bar completed. That means, if there are four bars to plot (bar 1, bar 2, bar 3, and bar 4), we have to plot like this:
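One way to spell out that rule, as a sketch with made-up heights rather than Graveyard data: each bar’s bottom edge is the sum of the heights of the bars before it, and its top edge is that sum plus its own height. Because the SVG y axis runs downward, the y attribute of each <rect> is the negative of its top edge. This assumes the baseline sits at y=0, as in a plot shifted with translate().

```xquery
xquery version "3.0";

(: Hypothetical bar heights, already multiplied by a stretch factor. :)
let $heights := (40, 20, 60, 20)
for $h at $pos in $heights
(: Sum the heights of all earlier bars; sum() of an empty sequence is 0. :)
let $bottom := sum($heights[position() lt $pos])
let $top := $bottom + $h
return concat('bar ', $pos, ': bottom edge at ', $bottom,
    ', top edge at ', $top, ', SVG y="-', $top, '"')
```

With these four heights, the bottom edges come out as 0, 40, 60, and 120, and the top edges as 40, 60, 120, and 140.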
We need a way to keep a running total of the heights, so that as we loop through each region associated with a surname, we output the cumulative sum() of an array storing the values from the previous loops. Here is some code to show how we prepared an accumulator array:
let $matchesRegionList :=
    for $d in $distDeathsSurRegion
    where substring-before($d, '_') = $i
    return $d
for $m at $posM in $matchesRegionList
let $reg := tokenize(substring-after($m, '_'), '-')[1]
let $count := tokenize(substring-after($m, '_'), '-')[last()]
let $intCount := xs:integer($count)
let $regYVal := $intCount * $Y_StretchFactor
let $accumYVal :=
    for $a in (0 to $posM - 1)
    (:ebb: This very useful loop lets us look up the counts at each of the *previous* $posM steps! :)
    let $accum := $matchesRegionList[$posM - $a]
    let $countAccum := (tokenize(substring-after($accum, '_'), '-')[last()], '0')[1]
    let $intCountAccum := xs:integer($countAccum)
    let $accumY := $intCountAccum * $Y_StretchFactor
    return $accumY
let $accumPos := sum($accumYVal)
let $cVal :=
    for $v in $colorStates
    where $reg = substring-before($v, '_')
    return substring-after($v, '_')
    (:ebb: Here we're looping over a global variable called $colorStates, and wherever its region substring matches our current region, we output its color value substring for use in coloring our region stacks. :)
return
    <rect class="{$reg}_{$count}" x="{$pos * $X_Spacer}" y="-{$accumPos}"
        width="20" height="{$regYVal}" style="stroke: black; stroke-width:1; fill: {$cVal}"/>
Note that $i refers to the surname value from our outermost for loop, and $distDeathsSurRegion is a reference to another array we stored up in a global variable, in which we stored, for each distinct region, a concatenated string (using the concat() function) piecing together a surname, followed by an underscore ("_"), followed by the region, followed by a hyphen ("-"), and the surname deathcount at that region. We are reaching up into that global variable, finding the substring that matches our current $i surname, and outputting it as a smaller array, stored in $matchesRegionList.
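For orientation, here is a sketch of how a global variable like $distDeathsSurRegion might be built and then filtered for one surname. The construction is our guess from the description above, not the original code, and the paths assume the shape of the sample entry. The surname 'Henderson' is taken from that entry only to illustrate the lookup; it may or may not be one of the larger families.

```xquery
xquery version "3.0";
declare namespace tei="http://www.tei-c.org/ns/1.0";

declare variable $people := doc("/db/graveyard/graveyardInfo-TEI.xml")//tei:person;

(: Assumed construction: one "surname_region-count" string for each
   surname and region pair, limited to families of three or more. :)
declare variable $distDeathsSurRegion :=
    for $s in distinct-values($people/tei:persName/tei:surname)
    let $family := $people[tei:persName/tei:surname = $s]
    where count($family) ge 3
    return
        for $r in distinct-values($family/tei:death//tei:region)
        let $n := count($family[tei:death//tei:region = $r])
        return concat($s, '_', $r, '-', $n);

(: Stand-in for the surname supplied by the outermost for loop. :)
let $i := 'Henderson'
return
    for $d in $distDeathsSurRegion
    where substring-before($d, '_') = $i
    return $d
```

Note that this string pattern relies on surnames containing no underscore and regions containing no hyphen; if the data breaks that assumption, different separator characters would be needed.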
Next we loop through the $matchesRegionList array and extract the substrings with the information we want for region ($reg) and deathcount ($count). We convert the count to an integer, and multiply it by the proportion we set for plotting on our graph. And this is where we need to work out how much we have to adjust the y position of each bar, by looking up the values at each preceding position in our $matchesRegionList array. This line of code is vital:
for $a in (0 to $posM - 1)
Here we define a for loop to range over integers from 0 to the last previous position of the $posM variable (or $posM - 1). For each integer, whether it is 0, 1, 2, 3, etc., we subtract it from the current $posM, and we set that value as the position to retrieve from $matchesRegionList. Because the range starts at 0, the first position retrieved is the current one, so the current bar’s own height is included along with all of the bars beneath it. The loop runs to its maximum, and outputs a series of numbers. In our next variable, $accumPos, we add up the values of that array using the sum() function, which is designed to add up a set of values. We use that $accumPos to calculate the position of the current bar for a region in our graph.
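To watch the accumulator at work, the sketch below runs the same lookup over three made-up strings and reports which positions each step retrieves. The strings and the stretch factor are hypothetical and are not real Graveyard data.

```xquery
xquery version "3.0";

(: Made-up strings in the surname_region-count pattern. :)
let $matchesRegionList := ('Example_Pennsylvania-3', 'Example_Ohio-1', 'Example_New York-2')
(: Hypothetical stretch factor. :)
let $Y_StretchFactor := 20
for $m at $posM in $matchesRegionList
(: Positions retrieved at this step: the current one, then each earlier one. :)
let $lookups := for $a in (0 to $posM - 1) return $posM - $a
let $accumYVal :=
    for $p in $lookups
    return xs:integer(tokenize(substring-after($matchesRegionList[$p], '_'), '-')[last()])
        * $Y_StretchFactor
return concat($m, ' looks up positions (',
    string-join(for $p in $lookups return string($p), ', '),
    ') and gets y="-', sum($accumYVal), '"')
```

The three steps retrieve positions (1), then (2, 1), then (3, 2, 1), and the resulting y values are -60, -80, and -120.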
The code block above shows how we associated colors with each region (where we work with $cVal). Notice that this involves a similar strategy to what we used above, with opening an inner for loop and finding where something in it matches something at the current position in the outer for loop. Here is how we recommend working with colors:
for loops to run together: Set this up by walking through your array of the distinct values of regions for all families with three or more deaths, and set a position variable, thus: for $i at $pos in $distinctUsualRegions.
concat() for this. This makes an array of values that hold region and color information together, that you can access later as you plot your graph. And you can use it to plot a legend, too!
The dimensions and style of your plot are up to you, though we expect your output to be clearly labelled, so visitors to the Graveyard project will understand what they are seeing. Save your SVG output in your folder in eXist, but paste a copy of your XQuery script in a text file, save it according to our usual homework file naming conventions, and upload it to Courseweb.