Spring 2021 Syllabus (Schedule)
Classes meet M W F 1:25 - 2:15pm over Zoom. Zoom
attendance is required for all students.
This contains a detailed explanation of course policies and the basis for grades.
Review the schedule as we get
started to get a sense of how this course will work on a daily basis.
All the Tools You Need As We Begin:
Download and install the following software on your own personal computer(s) on or
before the first day of class. These software tools are available in our campus computing
labs, too.
- <oXygen/>.
The DIGIT program has purchased a site license for this software, which
is installed in Kochel 77, the Lilley Library computers, and Witkowski 109. The license also permits
students enrolled in the
course to install the software on their home computers (for course-related use
only). To install it on your own computer, you will need the
license key, which we have posted in the Announcements section of our course site on
Canvas.
- Zoom: Make sure your Zoom installation is up to date and that you are ready to
connect. The Zoom link for our regularly scheduled class meetings is posted in
Canvas: Look for the Zoom menu option.
- We will use GitHub for sharing code and for project management. Create an account (choose the free options) at https://github.com and install the GitHub client software for your operating system on your own computer. (We will explain how to use git and GitHub in our course.)
- We will use the Slack chat platform for discussion and for asking questions (see https://slack.com/help/articles/218080037-Getting-started-for-new-members). Download and install the Slack client, configuring your account to use your Penn State email address (the official address, which looks like xyz123@psu.edu, and not an alias based on your name that you may have set up), so you can join our Slack workspace: DIGIT-coders. When you receive an invitation to join this workspace, accept it.
- Later in the semester we will ask you to install local copies of the eXist-db
XML database, which you can download from https://exist-db.org/. We will go through the installation process
with you when the time comes, since it can be confusing the first time, and
we recommend that you not install this application now in any case because
it is updated frequently, and there is likely to be a more advanced version
available by the time we need it. If you want to
install it early in order to begin to experiment with it, we recommend that
instead of the Latest Stable Release (version 5.2.0 as we write this) you
install the most recent Nightly Build; after you click on the link from the
main eXist-db page, scroll down past the FusionDB (which is a different
product than eXist-db) nightly builds to reach the eXist-db nightly
builds.
- Later in the semester, we will ask you to install Python version 3.7 or
higher on your computer, and install PyCharm Edu to assist in learning and
writing Python code with syntax checking. Follow instructions and links from
PyCharm ( https://www.jetbrains.com/help/pycharm/quick-start-guide.html#meet ), paying attention to what you need for your own computer system.
Feel free to download and explore PyCharm Edu on your own before we start
working with it together: https://www.jetbrains.com/pycharm-edu/. Also, configure Anaconda so it is available to work within PyCharm following this guide: https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html. As with eXist-db, we will go over installing these tools and
setting up a Python environment on your computer.
- We may be publishing projects on our web server, newtfire. To access it, you will need a secure file transfer (SFTP) client for homework
assignments and projects (also available in the campus computer labs). There are
several good options available. We recommend you download and install on your own
computers one (or more) of the following, depending on your platform: (Feel free
to experiment with these and others!)
- Windows users: one of the following SFTP clients (the functionality is
similar):
- FileZilla
(This is our favorite client because it behaves the same way across platforms.)
- WinSCP (We used this one for a long time, since the 1990s, but we now use SSH and FileZilla
more frequently.)
- SSH Secure Shell Client
- Mac users:
- Linux users: You probably don’t need to install anything,
but look at how your system handles secure file transfer (SFTP).
(FileZilla and other clients designed for Linux
environments are also available.)
- No coding experience? Don’t worry! Past students
in this course
who never saw anything like markup or XML code have designed projects (like these) and even spoken about them at undergraduate conferences! You will learn to develop
your own digital tools and how to manage digital projects as teamwork.
Class Web Resources:
Week 1 |
Topics
|
Do before class
|
W 1-20
|
Welcome! Intro to the course. Intro to document formats. Working in
oXygen XML Editor: |
Install oXygen XML Editor on your own computers and add our license key (in Canvas), ideally
before the first day of class but by Friday this week at the latest.
Instructions and license key posted on Canvas. |
F 1-22
|
- Discussion of the XML recipe homework: XML Comments and Well-formedness, and how to work with <oXygen/>.
- Introduce XML Exercise 2
|
|
Week 2 |
Topics
|
Do before class
|
M 1-25
|
|
XML Exercise 2: First, revise the code in your XML
Exercise 1:
- Do you have a
red square in oXygen on your first XML? Address any issues with
well-formedness.
- Try adding tags and attributes to help track
more specific pieces of information than you marked at first, or try to tag a little differently to organize information a little more efficiently, keeping related things tagged the same way, separating numbers from units, etc.
- Resubmit code for XML Ex 1 if I asked you to revise and resubmit (if it was not well-formed). Otherwise, just submit your revisions as part of XML Exercise 2 on Canvas.
- Now, choose another recipe with at least five ingredients and multiple steps, to code. Try to code this recipe in a way that is consistent with the bread recipe you just coded: try to track the same kinds of information using the same elements and attributes. Submit your newly coded files (both of them) on Canvas for XML Exercise 2.
|
W 1-27
|
- GitHub and version-controlled file management: pulling, adding,
committing, and pushing to your repo.
- Initiate Seven Days of Command Line GitHub: Every other day, alternate
between the textAnalysis-Hub and your own personal repo. Pull before you
push, and push something new every day. (Use the Sandbox in the
textAnalysis-Hub.) Day 1 of 7.
|
|
F 1-29
|
Introducing up-conversion with Regular Expressions. Character sets,
symbols, how to look stuff up and apply it. Repetition indicators. |
- XML Exercise 3: Review our feedback on your coding exercises so
far and submit revisions if we asked you to. Mark up a text of
your choice (any genre; manageable but reasonable size;
non-English languages welcome): With your markup:
- Make a hierarchy at least three or four levels deep (it can be deeper!)
- Work
on applying attributes with your elements, and doing so in a
careful and systematic way to make specific information easy to retrieve from your code.
- Seven Days of Git: Day 2 of 7
|
Week 3 |
Topics
|
Do before class
|
M 2-01
|
- Regular Expressions: thinking algorithmically. Greedy and non-greedy matching.
- Choosing a license for your GitHub repo.
|
|
W 2-03
|
- Regular Expressions: Thinking (and writing) in markdown, algorithmically! Simplifying overcomplicated expressions.
- Looking ahead: Project ideas for XML and text at scale. Initiate Project Ideas discussion on DIGIT-Coders Slack
|
|
F 2-05
|
- Project possibilities and what's next! Initiate project proposal process
- Regex Issues: Debugging and problem solving. Greedy and non-greedy matching. Selecting for what's not there.
|
|
Week 4 |
Topics
|
Do before class
|
M 2-08
|
Guest speaker: Dr. Patrick Juola, EVL Labs, Duquesne University, on Questions in Text Analysis.
|
|
W 2-10
|
|
From today through Wednesday 2/17:
- Post proposal ideas (each in its own distinct post) in the DIGIT-Coders Slack channel
#digit210-projectideas for team projects to work on this
semester.
- Each student should post an idea for the class to consider: a project involving a large text or a collection of texts, to be managed by a team of 2-4 students,
to investigate something we could study and visualize from our markup more effectively with computers than with human reading and description alone.
- Proposals should identify and link to a source of texts you want to
work with available on the public web.
- All projects must involve a team of at least two persons, but this first exploratory proposal is an individual assignment.
- Each student must respond to at least one of the proposed ideas from another student and indicate suggestions or further ideas.
- You may respond to more than one if you like, and indicate which proposals interest you to work on.
- Proposal discussions will run until class time on F 2/19 when we will form project teams.
|
F 2-12 |
Introducing schema code: Validity for a project: what is a schema? What is schema validation?
- Validation for Google Sheets
- How to write a Relax NG schema
Revisit project ideas.
|
Read Intro to Relax NG
- Post in Project ideas and options on DIGIT-Coders Slack
|
Week 5 |
Topics
|
Do before class
|
M 2-15
|
- Good projects: ideas, sources, teamwork expectations: discussion
- Relax NG: data types and mixed content
- Troubleshooting and debugging Relax NG
|
|
W 2-17
|
- Relax NG schemas for project management
- Project ideas
|
- Post your project idea(s) by today in the DIGIT-Coders Slack (#digit210-projectideas channel).
|
F 2-19
|
- Form Project Teams!
- Next steps for projects: first project milestone
|
|
Week 6 |
Topics
|
Do before class
|
M 2-22
|
What is an HTML file?
- Relationship to / difference from XML.
- Emphasis on very simple structure: head and body, block vs. inline elements
- HTML attributes for file associations: images, links
- Absolute vs. relative file associations (like with schemas)
- Our usage in this class: project documentation, output format for sharing data pulled from XML.
- Creating web publishing space in GitHub personal and project repos: docs directory setup
|
Project Checkpoint 1: Launch the project GitHub repo and invite your teammates and me to join (using Settings > Manage settings). Launch Slack channel for project and invite teammates and Dr. B. Post in your Slack project thread your available meeting times to help determine a regular meeting time for your group.
|
W 2-24
|
HTML and CSS
- HTML 5 semantic elements
- CSS box model
- Web browsers and display variations
- Positioning and controlling layouts with HTML: flexboxes
|
Complete HTML Exercise 1. The files go on your webspace: Provide the published web link to your files on Canvas.
|
F 2-26
|
- Debugging HTML / CSS issues; resources for Looking Stuff Up.
- Start XPath and XQuery in oXygen, and in eXist-db: simple functions and sequences. Exploring XML through child and descendant axes. Predicate
filters.
|
Complete HTML/CSS Exercise 2
- Read about HTML Accessibility and apply what you learn about accessible code to your HTML code for headings, images (providing alt attributes), links, and declaring the language. Try applying title attributes.
- Read about Responsive HTML and try applying what you learn to scaling some elements on your site.
The files go on your webspace: Provide the published web link to your files on Canvas.
|
Week 7 |
Topics
|
Do before class
|
M 3-01
|
XPath predicates [ ] as filters. Awareness of sequences: An XPath sequence can be zero, one, or more results.
XPath functions and their cardinality: can they handle only one node at a time? Or many at once? Introducing the FLWOR
|
|
W 3-03
|
- XQuery: Writing FLWOR statements and outputting HTML lists and
tables
- Outputting files and saving them to the eXist-db database for
previewing
- XQuery online and offline: in eXist and in <oXygen/>
|
- Read Michael Kay’s Learn XQuery in 10 Minutes tutorial and our Introduction to XQuery and the eXist XML Database,
and try some of the examples in our newtFire eXist environment.
- (Due before class) XQuery Exercise 1: XPath over a collection
- Project Checkpoint 2 (due by the end of the day):
- Create a file directory structure for the project GitHub repo(s): Initiate the project website within the docs directory with an index.html page and some CSS. Consult with your team and Dr. B to decide on a place to work on the text files (in its own directory, or in a separate private repo?) and create that space. Create a directory for XML files. Begin populating those file directories (even with placeholder Readme.md files to describe what belongs where).
- Assemble the text files you want to work with on the project. As a team, work on document analysis to plan for how you want these to be marked for structure. What XML structure do you want to use to contain meaningful units of text data? Aim for a clear, simple structure that distinguishes the kind of info you want to be able to track.
|
F 3-05
|
- What you can count and measure with XPath in XQuery
- Saving and Accessing files in the Newtfire eXist-db: set up individual
and team project directories.
|
XQuery Exercise 2: Writing a FLWOR
|
Week 8 |
Topics
|
Do before class
|
M 3-08
|
- Logging in to newtfire eXist-db
- What can we do with XQuery for loops: over XML nodes, and
off the tree over text values (like distinct-values).
- XQuery from eXist to Web: Writing HTML output from eXist-db
|
XQuery Ex 3: Querying the Disney Songs
|
W 3-10
|
XQuery to HTML. Working with eXist-db outputs. Installing and working with eXist-db locally.
|
- Follow our instructions to install eXist-db locally on your computer. If you run into trouble, shout out on Slack! Upload the Disney Songs collection (from our textAnalysis-Hub) to work with in your local eXist-db, and try starting your XQuery for Exercise 4 there. (If it doesn’t work, just do your XQuery on our newtfire eXist-db as usual.)
- Work on your project! Check in with your team. Work on preparing file(s) for your project collection to explore with XQuery in preparation for the next milestone (next Wed. 3/17).
|
F 3-12
|
XQuery to HTML. Other output formats to save:
preparing for network analysis: The CSV / TSV file.
|
|
Week 9 |
Topics
|
Do before class
|
M 3-15
|
Reviewing XQuery so far. Issues with for loops, building files, saving outputs. Prep for Project Milestone(s). |
XQuery Exercise 5
|
W 3-17
|
XML that makes graphics: SVG (Scalable Vector Graphics). Drawing elements,
and screen grid coordinates.
Introductory Slideshow and W3Schools SVG Tutorial.
|
- Projects: Midterm Checkpoint
- All or most of your texts are prepared in XML and (nearly) ready for XQuery and analysis. Or the work remaining to prepare your texts is easily defined on your GitHub Issues for the project.
- Relax NG schema is prepared and associated with your files. The team has been error-correcting and proofing the text base.
- The website is progressing: there is a site menu and more than one page. Some content appears to announce what this project is about and what questions the team is exploring.
- Some XQuery over all or some of the team XML is present in the team GitHub repo, and some results of that XQuery are shared, if not on the website, at least in the repo.
|
F 3-19
|
XQuery to SVG: Pulling data for visualizing. |
|
Week 10 |
Topics
|
Do before class
|
M 3-22
|
Plot a clear, simple, legible, labelled SVG graph by pulling data via
XQuery |
SVG Exercise 2 (from XQuery): Plot a clear, simple, legible,
labelled graph. |
W 3-24
|
XQuery to SVG development. Introducing Network Analysis via XQuery and TSV files. |
- SVG Exercise 3: Catch up on previous XQuery exercises if you need to repair them.
Continue with plotting SVG from Assassin's Creed, this time converting your plot to a bar graph: Prepare evenly-spaced side-by-side bars, so the count of actions is next to the count of distinct speakers. Prepare X and Y axes for your graph, and label your counts.
- Prepare for Network Analysis: Install Cytoscape on your computer. Begin familiarizing
yourself with the Cytoscape interface, working with Cytoscape session
files (with .cys extension) in the textAnalysis-Hub in Class Examples >> XQuery-NetworkAnalysis. Try opening one of the Cytoscape
session files (.cys) found in one of the project directories there following our tutorial instructions.
- Read An Introduction to Network Analysis
and Cytoscape for XML Coders
|
F 3-26
|
Network Analysis: working with Cytoscape: Importing Data and working with the Network Analyzer and network stats.
|
XQuery to Network Analysis: Exercise 1 (prepare a TSV for class today)
|
Week 11 |
Topics
|
Do before class
|
M 3-29
|
Network Analysis, continued: Reorganizing and styling network visualizations. Working with output files on your website.
|
|
W 3-31
|
- Network Analysis in project development
- Issue XQuery Test
|
|
F 4-02
|
Introducing Python for Digital Humanities work: Orientation to PyCharm
Edu IDE and tutorial work together |
- Project Visualization Milestone:
- Each team prepares visualizations for the project with SVG and/or network plots. Store these in the project GitHub repo and the newtfire eXist-db database, and post progress and links to these materials on Canvas.
- Progress on Project Websites:
- Work on incorporating places for analysis and visualizations on the site.
- Draft background information about your resources and their origins.
- Write up and post the research questions the team is exploring.
|
Week 12 |
Topics
|
Do before class
|
M 4-05
|
Python flipped class! Discussion of So, "All Models are Wrong" |
- Complete PyCharm Edu Community tutorials: Introduction through Strings and submit evidence of completion (via screen capture) on Canvas.
- XQuery Test due by Tuesday 4/6, 11:59pm
|
W 4-07
|
Wellness Day: No Classes |
Relax! It's a Wellness Day. :-) |
F 4-09
|
Python: Working with libraries (modules and packages), saving and executing files. Introduce first project-data Python exercise. |
PyCharm Edu: FINISH the Intro to Python tutorials: Data structures (lists
and dictionaries), functions, classes and
objects, modules and packages, file input and output.
|
Week 13 |
Topics
|
Do before class
|
M 4-12
|
Python NLP work on projects |
Python / project data exercise 1 |
W 4-14
|
Python NLP work on projects |
Python / project data exercise 2: output formats: making an SVG graph |
F 4-16
|
Python NLP work on projects |
Python Exercise 3 (and Catch-Up): Pull and save precisely-named and filed text files from your project and process with NLP, continuing to experiment with selections of NLP data (parts of speech, named entities).
File in Canvas and on GitHub repos |
Week 14 |
Topics
|
Do before class
|
M 4-19
|
Guest Speaker: Dr. Patrick Juola of Duquesne University returns:
Stylometry intro |
Read and post in Slack #stylometry discussion thread about the following articles:
|
W 4-21
|
Stylometry Workshop: Our guest Dr. Patrick Juola leads us in trying out JGAAP on collections of text files |
- (Within project teams): Prepare text files from your project for the Stylometry workshop: pull a collection of text files, or plan to work with files you’ve prepared for Python NLP.
- Work toward the Project Visualization and Documentation Development Milestone
|
F 4-23
|
Putting it all together: Web work |
Project Milestone due: Visualization and Documentation Development |
Week 15 |
Topics
|
Do before class
|
M 4-26
|
Ethics in public-facing digital data representation. Thinking about user experience, range of audiences. |
Read Timnit Gebru et al.,
Datasheets for datasets and respond to Slack discussion in #ethics channel. |
W 4-28
|
Documentation and reflection work: writing about what isn't there,
assessing what could come next. |
Prep for presentations |
F 4-30
|
Last Day! Teams deliver presentations together with the DIGIT 409 class, following the DIGIT Works presentation schedule for April 30. |
Prep for presentations |
Finals Week: M 5/03 - F 5/07 |
Due |
H 5-06
|
Semester projects due by 11:59pm. Finish developing
projects and send a post to me on GitHub and Canvas to indicate your team is
finished. |