Spring 2024: Classes meet M W F 3:35 - 4:25pm in Burke 153.
Schedule: Spring 2024
DIGIT 210: Lionpath class number: 5905. This course fulfills a core Digital Humanities requirement for the Digital Media, Arts, and Technology (DIGIT) major and an elective toward the Data Visualization Minor at Penn State.
Instructor
Dr. Elisa Beshero-Bondar ("Dr. B"), Professor of Digital Humanities and Program Chair of DIGIT.
- E-mail: eeb4 at psu.edu
- Office locations: in/near our classroom in Burke Center, 128 Kochel, and online over Slack and Zoom.
- Office Hours: Tuesdays 6:30 - 8pm on Zoom at https://psu.zoom.us/my/ebeshero, Wednesdays 12:15 - 2pm at Burke Center in or near our classroom, and by appointment.
Text Analysis: Course Description
This course orients you to text and document data formats, and engages you in hands-on coding and programming activities to manipulate them: to explore how computers read and process natural language patterns in texts, to translate unstructured texts into structure, and to track, visualize, and explore complex data. You will also learn how the large language models used in today's artificial intelligence systems apply natural language processing to predict and generate text and media. In this course, you will learn methods for marking, extracting, and analyzing data from digital documents to produce infographics such as graphs, charts, diagrams, and maps, which you will design in the context of real projects. This course is meant to complement DIGIT 110: Text Encoding, but where the emphasis in that course is on curating and preparing reading views of documents, this course concentrates on analyzing data. Neither course is meant to be a prerequisite for the other: you may take either one as a beginner. Returning students (in either semester) review and help mentor beginning students in overlapping units they have experienced in the other course.
Learning to Code: Our Context
You do not need any background at all with computer programming or web development to succeed in this course. We teach practical programming as a foundational skill (like reading, writing, and arithmetic) that all students should experience regardless of major or background. We also teach it in the writerly context of clear communication and documentation, which helps to build communities and connect projects over long periods of time.
Learning Objectives:
- Work with Texts as Data-bearing Digital Artifacts
- Prepare digital documents to curate and organize information accessible on the worldwide web.
- Gain practical experience with document data curation
- Learn and practice coding of various kinds to address unstructured texts and to produce marked-up structure at scale to support analytical research
- Gain confidence with reading and writing code in multiple environments
- Gain code literacy: Recognize common patterns in code syntax.
- Gain experience with "looking stuff up" and applying it to your purpose
- Gain confidence in your ability to learn, adapt, and experiment with code
- Gain Experience with Natural Language Processing (NLP), Document Data Modeling,
Distant Reading, and Autotagging Techniques
- Write code to apply searching and data extraction methods through multiple kinds of pattern-matching algorithms, including forms of regular expression matching. Take conventional boolean searches and library database searches to new levels.
- Learn how to "autotag" enormous texts or collections of texts for practical results: to code the structure of enormous texts from a distance in order to locate data and make them accessible for navigation.
- Apply "mining" and "drilling" methods to interact with texts and visualizations differently than we could "manually" or with unassisted eyes and brains.
- Reflect on the possibilities and limitations of text data processing and visualization.
- Gain Project Design and Editing Experience
- Gain digital editing experience with proposing, designing, and contributing to one or more digital research projects, applying coding to the preserving, sharing, and investigating of textual resources
- Transform unstructured and structured text into publishable web formats to build a project website.
- Design navigation elements, and build visual aids and models (such as timelines and tree diagrams) from texts: to generate charts and images from extracted data
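As a small illustration of the pattern-matching and autotagging objectives above, here is a minimal Python sketch using the standard `re` module. It assumes a toy input in which chapter headings sit on their own lines (like "CHAPTER IV."), and it wraps each one in a `<head>` element. The sample text, the function name `autotag_headings`, and the choice of element name are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical sample text: chapter headings sit on their own lines.
sample = """CHAPTER I.
It was a dark and stormy night.
CHAPTER II.
The storm had passed."""

# Match a whole line reading "CHAPTER", a Roman numeral, and a period.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
heading_pattern = re.compile(r"^(CHAPTER [IVXLC]+\.)$", re.MULTILINE)


def autotag_headings(text: str) -> str:
    """Wrap every matched heading line in a <head> element (illustrative markup)."""
    # \1 re-inserts the captured heading text inside the new tags.
    return heading_pattern.sub(r"<head>\1</head>", text)


print(heading_pattern.findall(sample))  # ['CHAPTER I.', 'CHAPTER II.']
print(autotag_headings(sample))
```

The same search-and-wrap idea scales from one short string to a whole corpus; a real pipeline would also need to escape characters such as `&` and `<` before producing XML.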
Optional Textbook and Other Class Resources
- Michael Kay, XSLT 2.0 and XPath 2.0: Programmer’s Reference, 4th edition (Wiley Publishing, 2008) ISBN-13: 978-0-470-19274-0 This book is optional, and I have not requested it at the bookstore. I have two copies, and it is available in the Penn State Libraries as an e-book. This is really the authoritative word on XSLT and XPath, written by a designer of the official W3C specifications of XSLT and XQuery that we’re using. We’re not requiring that you buy it, but we recommend it to have a powerful reference at your fingertips and for learning more on your own. There’s a Kindle edition available but poorly designed for searching, so we prefer the hardcover print edition. If you’re going to purchase it, be sure you pick up the latest edition (from 2008), and not the earlier versions.
Software to install
Download and install the following software on your own personal computer(s) on or before the first day of class. These software tools are available in our campus computing labs, too.
- <oXygen/> XML Editor. (You will probably have this installed from DIGIT 100 or 110.) The DIGIT program has purchased a site license for this software, which is installed in Kochel 77, the Lilley Library computers, and Witkowski 109. The license also permits students enrolled in the course to install the software on their home computers (for course-related use only). When installing this on your own computers, you will need the license key, which we have posted on our course Announcements section of Canvas.
- AntConc: (You may have this installed from DIGIT 100.) Free corpus text analysis tool.
- We will ask you to install Python version 3.8 or higher on your computer, and install PyCharm Edu to assist in learning and writing Python code with syntax checking. Follow the instructions and links from the PyCharm quick start guide (https://www.jetbrains.com/help/pycharm/quick-start-guide.html#meet), paying attention to what you need for your own computer system. Feel free to download and explore PyCharm Edu on your own before we start working with it together: https://www.jetbrains.com/pycharm-edu/. Also, configure Anaconda so it is available to work within PyCharm, following this guide: https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html. (We will provide guidance on this in class.)
- Zoom: Make sure your Zoom installation is up-to-date, and you are ready to connect. Sometimes we will record portions of class meetings and tutorial sessions for future reference to share over Zoom. Look for these in Canvas Announcements and use the Zoom menu option in Canvas to access these meetings.
- We will use GitHub for sharing code and for project management. Create an account (choose the free options) at https://github.com and install the GitHub client software for your operating system on your own computer. (We will explain how to use git and GitHub in our course.)
- We will use the Slack chat platform for discussion and for asking questions (see https://slack.com/help/articles/218080037-Getting-started-for-new-members). Download and install the Slack client, configuring your account to use your Penn State email address (the official address, which looks like xyz123@psu.edu, and not an alias based on your name that you may have set up), so you can join our Slack workspace: DIGIT-coders. When you receive an invitation to join this workspace, please accept it.
- Later in the semester we may ask you to install a local copy of the eXist-db XML database, which you can download from https://exist-db.org/.
Class Web Resources:
- Course Home Website: https://newtfire.org/courses/textAnalysis/ Home of our syllabus and schedule.
- textAnalysis-Hub: https://github.com/newtfire/textAnalysis-Hub Class GitHub Repository and Issues Board
- Canvas: http://canvas.psu.edu To submit homework assignments and exams and read private course announcements
- File Conventions for Canvas Assignments
- Guidelines for Projects Developed in This Course
- Student Course Projects
- Explanatory Guides and Exercises: Complete List
- More resources will be added as we work together this semester.
Grading:
Homework Exercises (30%):
To keep up with this class, you must work on exercises regularly. Each day will involve some small assignment to prepare you for the next class and to help you build your course project. Homework is time-sensitive and due before class begins. Therefore, points are awarded based on engagement with the assignment and timeliness of completion:
- If the homework is completed before class and demonstrates thoughtful engagement, it is awarded a full 3 points.
- If it is completed by the end of the day after it is reviewed in class, it can earn no more than 2 points.
- If it is completed within the week assigned, or before the test on the unit in which it is assigned, it can be worth a maximum of 1 point.
- Unless a special arrangement has been made with me, homework submitted after the test for its unit is worth no points, though you may wish to complete it to review a concept or method for work on a project.
- The homework portion of the course grade reflects the percentage of work completed in a timely manner with some flexibility. If you complete 90% of the homework on time, you will receive an A in this portion of the course grade. Homework does not have to be perfect to be awarded full credit.
- Students are not eligible to join a project team (see Project portion of the grade) if they have not completed a majority of the homework when the class forms project teams.
About homework assignments: Coding and project review exercises in this course are about your active learning, and not—as in other courses—a way of testing whether you have already learned something we covered in class or in an assigned reading. You may often need to look up how to do something that you don’t already know how to do. Often there will be multiple ways of accomplishing the task and I am not simply looking for you to do things perfectly in just one way. Instead, I am looking for signs of your active learning process as you take on a challenge. Documenting problems is key to learning, and sometimes just writing out what you are trying to do helps lead you to a solution! There may be times when you don’t get the result you want in the homework, and that is to be expected! In those cases you can still get full credit for the assignment if you’ve made a serious attempt and if you submit, along with your code, a description of what else you tried, what results you expected, what results you got, and what you think went wrong. Getting stuck is part of the learning process. You will see me get stuck sometimes, and I will need your eyes to help me fix something! As long as you’ve described your understanding of the problem and your attempts to resolve it on your own, you will do well: documentation of how you get stuck is key. One of our goals is to form a supportive coding community in this class, so we are comfortable with unsticking each other.
I will read and evaluate all student homework and will post assessments on Canvas. Coding homework is basically marked "complete" (1 point) or "incomplete" or "redo" (0 pts). If you are asked to "redo" an assignment, it is considered incomplete or problematic. If you resubmit a "redo" to correct a serious problem, you will receive full credit for the assignment. I will post comments for feedback and learning purposes, and you will find these comments on Canvas, sometimes in your coded homework file. If you have not engaged with the assignment adequately (whether that means solving the tasks or discussing the coding obstacles you encountered and how you dealt with them), I may ask you to meet with me to review the issues and then complete a follow-up ("redo") task in order to receive credit. For assignments with posted solutions, I will invite you to review the posted solution on GitHub and comment on it (we will show you how to do this) to address something you learned from the solution or did in a different way. For some assignments where we review posted solutions together in class, we will write back to you with individual comments only if your specific submission raises an issue that we don’t address elsewhere. When much of the class is stuck on something, we will go over assignments together in class, too. If I don’t return your assignment with comments, that means I found nothing to add to our posted solution. In those cases, if you have any questions about your work after reading the posted solution, please ask.
Presence and Connection (In-class and Virtual) (15%):
Coding and programming in real life is a social activity: professionals in the real world aren’t "know-it-all" experts who work alone, but rather are tuned into discussion boards and regularly ask and answer questions to stay sharp and to learn from their community. In this class, we want you to work together and talk to each other and to your instructors as your community resource, so we have built this into our course participation grade as a formal expectation. Beginning in week two, we’ll expect each student to post at least once per week on our course GitHub repo, and we strongly encourage you to do more than this minimum. Earn an A in participation by asking questions, making suggestions, and sharing helpful resources you’ve found. Help each other out by trying to answer questions on GitHub (and read the instructor posts too, as we wade in to help). Your instructors will likely be dominating the class time as we model concepts and methods, so the GitHub Issues board gives students a good space to form a coding community, help each other, and reflect together. Also, if you have a question about an assignment, always think of our GitHub Issues board as your first resource to check for helpful hints and to post your questions, because others may have the same question and answers are best shared! Of course you may e-mail us, but we really prefer that you go to the discussion board first, and doing so is, after all, worth course credit as your participation grade.
Issue posts: Throughout the course, we’ll assign discussion posts worth points toward your Participation grade on our class GitHub site and on our Slack group. You will be discussing online readings or evaluating web resources. Your post should do more than summarize the article or site (which you could do just by skimming or reading the first paragraph); it should demonstrate thoughtful reflection on specific ideas and issues. When evaluating a web resource, don’t simply praise or condemn it: go into detail about why a key component is effective or poorly designed. Good posts demonstrate care and reflection, and you may choose to respond to the overarching ideas of a piece or to selected details of specific interest.
Tests (25%):
As scheduled throughout the course, there will be a few (three or four) tests on the concepts and various kinds of markup technologies we are learning in the course. All will be take-home or taken online between classes. They are open-book and open-notes, but they must be completed individually and are designed to demonstrate that you have learned from the class material, coding assignments, and posted solutions. Tests may resemble homework assignments, but unlike homework exercises, they receive letter grades, because they are evaluative: they show what you have learned after we have finished a coding unit.
Project (30%):
This course involves working on a team-based semester project. Project work will be scheduled with paced due dates throughout the semester, and will give you experience with team work to explore a research question and to document methods and discoveries using the coding and text analysis technologies addressed in our course.
Grading Scale:
Grades for the course are calculated and posted on Canvas, and follow this standard scale: A: 93-100%, A-: 90-92%, B+: 87-89%, B: 83-86%, B-: 80-82%, C+: 77-79%, C: 70-76%, D: 60-69%, F: 59% and below. If you take the course on an S / NC (pass-fail) basis, you must earn a C to receive Satisfactory credit.
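To make the arithmetic concrete, here is a minimal Python sketch that combines the four grade components (Homework 30%, Presence and Connection 15%, Tests 25%, Project 30%) into a course percentage and maps it onto the scale above. It is an illustration under stated assumptions, not the official Canvas calculation: it assumes each component is already a 0-100 score, treats each letter's lower bound as inclusive, and does not round fractional percentages (the posted scale only lists whole numbers). The names `course_percentage`, `letter_grade`, and `earns_satisfactory` and the sample scores are hypothetical.

```python
from __future__ import annotations

# Component weights from the syllabus headings (percent of the course grade).
WEIGHTS = {"homework": 30, "presence": 15, "tests": 25, "project": 30}

# Lower bound of each letter grade on the posted scale, highest first.
# Assumption: bounds are inclusive and fractional percentages are not rounded.
SCALE = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (70, "C"), (60, "D"),
]


def course_percentage(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each given on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())  # 100 with the weights above
    return sum(scores[name] * w for name, w in WEIGHTS.items()) / total_weight


def letter_grade(percent: float) -> str:
    """Return the first letter whose lower bound the percentage reaches."""
    for cutoff, letter in SCALE:
        if percent >= cutoff:
            return letter
    return "F"


def earns_satisfactory(percent: float) -> bool:
    """S / NC option: a C (70% or above on this scale) earns Satisfactory."""
    return percent >= 70


# Hypothetical component scores for one student.
example = {"homework": 95, "presence": 90, "tests": 85, "project": 92}
pct = course_percentage(example)
print(pct, letter_grade(pct))  # 90.85 A-
```

Because the weights sum to 100, each component contributes in proportion to its share of the grade: strong homework and project scores together carry as much weight (60%) as everything else combined would if doubled.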
Course Policies:
Each day we are covering material that builds on earlier material and assignments, so your success depends upon regular attendance and completing each assignment on time.
Due dates and why we need them:
Your daily homework for this course is time-sensitive, because it is connected to a daily learning process. Keeping up with assignments, even if you do not always do them correctly, is key to what we discuss in class and helps you with your next steps and makes it possible for you to help build the semester project with your team. Work with the time schedule and upload coding assignments, response posts, and other homework exercises to Canvas (or GitHub or our web server as specified), by the due date and time indicated on the class schedule. Sometimes I will ask you to revise your work, and it will always help you to have a starting point, even if you know it is not correct when you submit it at first. Homework assignments will be posted online to our class website and linked from Canvas, so students who miss class are nevertheless expected to consult the schedule and submit assignments on time. Because we post, discuss, and share answers to homework exercises after submission deadlines, we will usually not accept late homework submissions.
Exam Policy:
Exams will be take-home, to do on your own time, with submissions due in Canvas or by web submission. Because I will be posting answers and sharing them in class, I do not allow people to write exams after the solutions are posted. However, I will drop your lowest exam score for the class, so that you may miss one exam without penalty.
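The drop-the-lowest rule can be sketched in a few lines of Python. This is only an illustration, assuming each exam is scored 0-100, a missed exam is recorded as 0, and the remaining exams count equally; the function name `exam_average` and the sample scores are hypothetical.

```python
from __future__ import annotations


def exam_average(scores: list[float]) -> float:
    """Average exam scores after dropping the single lowest one.

    Illustrative assumptions: each exam is scored 0-100, a missed exam is
    recorded as 0, and the remaining exams count equally.
    """
    if len(scores) < 2:
        raise ValueError("need at least two exam scores to drop one")
    kept = sorted(scores)[1:]  # drop exactly one copy of the lowest score
    return sum(kept) / len(kept)


# Four exams with one missed: the 0 is the score that gets dropped.
print(exam_average([90, 0, 84, 96]))  # 90.0
```

Dropping only one copy of the minimum matters: a student who scores 70 on two exams keeps one of those 70s in the average.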
Attendance and Classroom Courtesy:
I expect your active presence and interaction with me and your classmates this semester. Being an active part of this class means helping to form a community. We need to rely on each other in the classroom and online in our coding environments to learn and develop projects. Attendance is about connecting, being part of our class community of coders.
Our class is fast-paced and requires that we all make the best use we can of our in-person class sessions. Arriving late and leaving early (physically or remotely) disrupts the important collective mental activity of class. So does in-class texting and checking your cell phone. During class time, I ask that you put mobile devices in "Do Not Disturb" mode. While class is in progress, talking disruptively, leaving the classroom, texting or using a cell phone or computer, reading a newspaper, or other distracting behavior will be actively discouraged.
If you need to miss classes for health reasons, make arrangements with me and
your peers to catch up. Stay in the class loop
by consulting Canvas and checking in over Slack and GitHub.
Student (and Faculty) Health and Wellness Services
If any of us (you students or I) are feeling sick with COVID, flu-like symptoms, or other serious ailments this semester, please contact Behrend Student Health & Wellness Services at 814-898-6217. Reporting in when you do not feel well is not shameful; it is responsible and important to protect yourself and our community.
Please do not attend our physical class if you are not feeling healthy! Stay home, report symptoms, get tested. This applies to me as your professor as well as to you!
Counseling Services
This semester may be stressful for all of us. Many people at Penn State face personal challenges or have psychological needs that may interfere with their academic progress, social development, or emotional well-being. Seek help! The university offers a variety of confidential services to help you through difficult times, including individual and group counseling, crisis intervention, consultations, online chats, and mental health screenings: see resources posted at https://behrend.psu.edu/student-life/student-services/personal-counseling. These services are provided by staff who welcome all students and embrace a philosophy respectful of clients’ cultural and religious backgrounds, and sensitive to differences in race, ability, gender identity and sexual orientation. Counseling and Psychological services are available through the Personal Counseling Office in Reed Union Bldg. Rm 1: 814-898-6504.
LionHelp App
LionHELP is a smartphone application, available for both iOS and Android, that you can download if you or someone you know may be facing a mental health emergency. This app provides information about the signs of a mental health crisis, how to talk to someone who may be in crisis, a guide to help refer someone to the appropriate resource, and a full list of resources available on campus. The app can be downloaded free of charge, and there is absolutely no tracking of any information. Please note that LionHELP is not a diagnostic tool and should not take the place of services provided by a licensed mental health professional.
Equity
Penn State takes great pride in fostering a diverse and inclusive environment for students, faculty, and staff. Acts of intolerance, discrimination, or harassment due to age, ancestry, color, disability, gender, gender identity, national origin, race, religious belief, sexual orientation, or veteran status are not tolerated and can be reported through Educational Equity via the Report Bias webpage (http://equity.psu.edu/reportbias/).
E-mail:
Each student is issued a University email address (username@psu.edu) upon admission. This email address may be used by the University for official communication with students. Students are expected to read email sent to this account on a regular basis. Failure to read and react to University communications in a timely manner does not absolve the student from knowing and complying with the content of the communications. The University provides an email forwarding service that allows students to read their email via other service providers (e.g., Hotmail, AOL, Yahoo). Students who choose to forward their email from their psu.edu address to another address do so at their own risk. If email is lost as a result of forwarding, it does not absolve the student from responding to official communications sent to their University email address. To forward email sent to your University account, go to http://accounts.psu.edu, log into your account, click on Edit Forwarding Addresses, and follow the instructions on the page. Be sure to log out of your account when you have finished.
Academic Integrity
Penn State Erie, The Behrend College, puts a very high value on academic integrity, and violations are not tolerated. “Academic integrity is the pursuit of scholarly activity in an open, honest and responsible manner. Academic integrity is a basic guiding principle for all academic activity at The Pennsylvania State University, and all members of the University community are expected to act in accordance with this principle. Consistent with this expectation, the University’s Code of Conduct states that all students should act with personal integrity; respect other students’ dignity, rights and property; and help create and maintain an environment in which all can succeed through the fruits of their efforts. Academic integrity includes a commitment by all members of the University community not to engage in or tolerate acts of falsification, misrepresentation or deception. Such acts of dishonesty violate the fundamental ethical principles of the University community and compromise the worth of work completed by others.” (Senate Policy 49-20 and G-9 Procedures.) Any violation of academic integrity will receive academic and possibly disciplinary sanctions, including the possible awarding of an XF grade, which is recorded on the transcript and states that failure of the course was due to an act of academic dishonesty. All acts of academic dishonesty are recorded so repeat offenders can be sanctioned accordingly. More information on academic integrity can be found at: http://psbehrend.psu.edu/intranet/faculty-resources/academic-integrity/academic-integrity.
Students facing allegations of academic misconduct may not drop/withdraw from the affected course unless they are cleared of wrongdoing (see G-9: Academic Integrity). Attempted drops will be prevented or reversed, and students will be expected to complete course work and meet course deadlines. Students who are found responsible for academic integrity violations face academic outcomes, which can be severe, and put themselves at jeopardy for other outcomes which may include ineligibility for Dean’s List, pass/fail elections, and grade forgiveness. Students may also face consequences from their home/major program and/or The Schreyer Honors College.
Academic Integrity and Use of AI Text-Generative Technology
Here is Penn State’s recommended policy on academic integrity and generative technology: If your instructor allows you to use ideas, images, or word phrases created by another person or by generative technology, such as ChatGPT, you must identify their source. You may not submit false or fabricated information or use the same academic work for credit in multiple courses. Students with questions about academic integrity should ask their instructor before submitting work.
Here is how we will approach work with AI and generative technology in this class: Sometimes we will experiment with it, and I will specifically ask you to try something with AI. When I do that, there will always be something more that I ask you to do to reflect on the output and work with or build on it in some way. I will expect you to explain and document your process: what you are doing with the AI. If your method of preparing an assignment involves chat-generative AI, include a link to the chat transcript in your assignment when you submit it.
Source Citation and Plagiarism: One goal of our course is to reflect on how best to cite sources in digital contexts, including applications of artificial intelligence. We will consider how and why such citations differ from documenting printed texts. We will also consider the ease and frequency with which digital texts and graphics are plagiarized on the worldwide web, and discuss how the omission of source citations detracts from the authority of a digital information resource. We expect you to practice mindful source citation, and plagiarism on your part will have very serious consequences.
Representing the voice of another individual as your own voice constitutes plagiarism, however generous that person may be in helping you with an assignment. Turning in an assignment generated collectively under the name of a single individual is considered plagiarism. When instructed to collaborate on a project, project collaborators share collective authorship and should identify themselves directly as a team. To avoid plagiarism, cite your sources whenever you quote, paraphrase, or summarize material, or use digital images from any outside source (including websites, articles, books, course readings, Courseweb or GitHub postings, or someone else’s notes). When using the "copy and paste" features as you read and research, be sure to mark clearly that these passages are unprocessed from their source, so that you know to process them later. Forgetting to do so not only produces sloppy work but (whether you intended it or not) results in a false representation. As long as you make a good-faith and clear effort to cite your sources, you will not be faulted for plagiarism, but your work will be penalized if citations are inaccurate, unclear, or lack important information.
That said, the coding and digital development we do encourages collaboration, and for that reason we adopt our colleague David Birnbaum's Collaboration policy, since his course is very similar to ours. This policy specifies that students identify collaborators in a comment on submitted assignments and take care on projects that all students contribute equally (and that no student contributes excessively more than everyone else). When joining a group homework session, always work on the assignment by yourself first so you can be an equal participant, and write up the assignment by yourself after the session is over, taking care not to copy from the other students. While we want you to consult with each other, you are responsible for doing all your writing and coding by yourself, using your own words.
Disability Services:
This course could pose certain issues related to physical abilities. Please talk to me if you need help navigating the course or accessing our resources. In the case of documented disabilities, students must meet with the instructor to discuss their specific accommodations. In order to receive consideration for reasonable accommodations, you must contact the appropriate disability services office at the campus where you are officially enrolled, participate in an intake interview, and provide documentation: See documentation guidelines (http://equity.psu.edu/sdr/guidelines). If the documentation supports your request for reasonable accommodations, your campus disability services office will provide you with an accommodation letter. Please share this letter with your instructors and discuss the accommodations with them as early as possible. You must follow this process for every semester that you request accommodations. Penn State Behrend’s Disability Services Coordinator is Amy James (ajk7@psu.edu).
Career Services
Career Services prepares Penn State students to enter the workforce or graduate school through a variety of services. Career professionals will assist with resume and cover letter reviews, internship and job searches, interview prep and mock interviews, career fair prep, development of career competencies, and graduate school prep. Be sure to use Career Services for all of your career endeavors, and start planning your career early! Do not wait any longer: check out their website and/or stop into their office in Reed 125 during drop-in hours, Monday-Friday, 12:00-4:00 p.m. Make an appointment via the Career Services website or call 814-898-6164.
Inspiration
We gratefully acknowledge David Birnbaum’s Digital Humanities course at the University of Pittsburgh as our starting point and supporting resource for much of our development. Other inspirational resources include:
Projects that inspire us:
- Obdurodon: where we learned what we can teach, and where we’re still learning.
- Venice Time Machine: very ambitious, enormous project team of faculty and students to study and model a thousand years of Venice, digitizing "kilometers of archives."
- Map of Early Modern London
- Lord Byron and His Times: The very thoughtful stylistic design of this important project reproduces the style of nineteenth-century print and layout. The content makes many rare materials about Lord Byron’s social network searchable and connected to the web of linked open data.
- The Shelley-Godwin Archive: digitizes the manuscripts of Percy and Mary Shelley, and Mary Shelley’s parents, William Godwin and Mary Wollstonecraft—manuscripts often written in multiple hands. Provides an important study of the Frankenstein notebooks to demonstrate how much of a role Percy Shelley played in the writing of Frankenstein. The archive provides a good model of the use of TEI for manuscript encoding and of complex and multiple visualizations of manuscript texts.
- A Tour Through the Visualization Zoo
- Clay Shirky on Love, Internet Style (9 minutes of YouTube inspiration: on what lasts, and why community matters in our digital worlds.)
Previous versions of this course
- Spring 2023:
- Spring 2022:
- Spring 2021: