[This is my working script for the talk I gave to the CNY Humanities Corridor Early Modern Working Group last week. It probably does not reflect the talk completely accurately, but there you have it. Click here for pdf handout of bibliography & resources.]
Thank you so much for having me. I want to thank Crystal for inviting me and for all of her support for my participation in the Early Modern Digital Agendas program. It was, in fact, she who brought it to my attention and, knowing my interests, encouraged me to apply. Thanks also to Amanda Winkler and Dympna Callaghan for their support and to the Central New York Humanities Corridor and the Syracuse University Humanities Center for making this event possible. You may also detect a bit of a love letter to the Folger Shakespeare Library in this talk: my every visit there has been transformative, and the staff there has been so generous and supportive of me.
This talk has been a long time coming, but I feel like this is the right time to give it. I took my title, the book is swelled alreadie to a far bigger bulk then was intended, from a note in Charles Hoole’s 1649 An easie entrance to the Latine tongue … a work tending to the school-masters’s eas, and the weaker scholar’s encouragement in the first and most wearisome steps to learning, which I thought was appropriate for this talk for a number of reasons. First, as it’s been on my mind for many months, there is so much more that I want to share than we possibly have time for (hence the bibliography & resources I have provided), and second, because it gets us thinking about the contents and constraints of books, which I will explore.
The original Early Modern Digital Agendas program took place at the Folger Shakespeare Library during what must have been the three hottest weeks of the summer in July 2013. This program, sponsored by the NEH Office of Digital Humanities, brought together 20 faculty, students, and librarians from the US, Canada, Australia, and the UK to engage with visiting scholars and the Folger’s staff of genius librarians, bibliographers, and technologists. Our task was to consider the effects, past and future, of the Digital on Early Modern studies. In addition to this particular group flourishing over the last 15 months, the time for this talk is also fortuitous because in recent months a few projects that are expressive of the potentials that program covered have emerged, and also I am able to preview something very exciting that is coming in 2015.
I will probably now revert to calling the program EMDA, or hashtag EMDA 13,
in the parlance of our times & as it was referred to on twitter. Incidentally, EEBO features some differing opinions on Twitter, as I’m sure is true of the people in this room.
[terrible twitter, from Wing F1418b]
[Do you not twitter yet? from Wing B1582]
[Twitter, to laugh, from Wing E4]
I’ll give an overview of the EMDA program momentarily, but first I want to mention an unintended result of that program and bring it up as a major theme both of this talk and of what I see to be among the major agenda items for digital studies, both Early Modern and otherwise.
Within a few hours of our introductions to one another, the participants in this program formed a very strong community—despite our differences in experience (ranging from early-program doctoral students to full professors, and from avid coders and encoders to scholars of mostly print)—totally engaged in thoughtful discussion, criticisms, and informed speculation about the myriad problems and potentials outlined in our program. Additionally, in what I believe to be a brilliant stroke of ill-advised classroom management, our leaders Jonathan Hope (from University of Strathclyde) and the Folger’s own Owen Williams, encouraged us to tweet, post, and share what we were discussing with our greater networks, allowing for real-time reporting and the inclusion of voices, opinions, and expertise from outside the room, the group, the library, and even the field. It was this emphasis on an engaged, public community of scholars, in my opinion, that characterizes EMDA most of all, and is its most enduring lesson for me. I’ll be speaking more about this in a moment.
Since those original three weeks, the group, including the visiting scholars, has convened in different-sized chunks at the Modern Language Association conference and others, has been featured at an authorship symposium in Australia, and has engaged in dozens of collaborations among our ranks, from things as informal as guest lectures via Skype, to offering comments on each other’s projects, to collaborative papers and panels at upcoming conferences. In May, and I believe this is due to the continued strength of these collaborations, the NEH brought us all back to the Folger for a two-day program to report, reflect, and foster future collaboration.
Here we are in May during one of the few moments when we weren’t at the board room table or in the reading room (note our scholarly pallor and sensitivity to sunlight). Not everyone was able to attend in person, but many who couldn’t were present via Skype or Twitter. It meant a lot to me that the Folger and the NEH both appreciated the importance of in-person collaboration to support our continued attention to the questions of how new digital technologies can and do shape early modern research.
So, one thing I truly want to foreground in my talk today is the importance of actual humans in digital humanities work. Not just as actors, authors and scholars, but as the connective tissue of discovery, as the informal network of question askers, and as the contributors of labor that is not always emphasized in the projects given the digital humanities spotlight.
For those of you who know me, and who know my work, it probably comes as no surprise that the “work around the work” is something that struck me most. As a librarian, I am very interested in drawing attention to the underlying structures of organization and access that drive this work, and to the quiet labor of digitization, representation, and meaning-making that accompanies it.
That said, I’d like to return to that original program—rewinding to the three weeks in July 2013—for a section of my talk perhaps properly titled “what I did on my summer-before-last vacation.”
The stated purpose of the EMDA13 program was to be a “forum in which twenty faculty, information staffers, and advanced graduate student participants could historicize, theorize, and critically evaluate current and future digital approaches to early modern literary studies—from Early English Books Online-Text Creation Partnership (EEBO-TCP) to advanced corpus linguistics, semantic searching, and visualization theory—with discussion growing out of, and feeding back into, their own projects (current and envisaged). … participants paid attention to the ways new technologies were and are shaping the very nature of early modern research and the means by which scholars interpret texts, teach their students, and present their findings to other scholars.”
The plan over the course of the program was for us to investigate the whole of the digital humanities landscape—from perhaps its most public face of distant reading and its many forms—topic modelling, text mining, natural language processing, sentiment analysis, and other statistical analyses, to the less visible realms of encoding, transcription, network analysis, and modes of publication and access. We addressed these concepts as scholars, as teachers, and as librarians, with an eye out for what is unique to early modern studies within these concerns.
Early English Books Online and the Text Creation Partnership are two unique digital entities of concern to Early Modernists. The first week of our program focused on them as a major component of the early modern digital corpus. We were fortunate to have Wendy Chun, Ian Gadd, Jonathan Sawday, and Mark Davies join us for this portion of the program.
The second week of the program focused on ways of extending, enhancing, and encoding the corpus into major projects (think tools like EEBO, but also digital scholarly editions, visualization projects, transcription and encoding, and pedagogical exercises) with the help of Julia Flanders, Alan Galey, Kathy Rowe, Martin Mueller, and Gabriel Egan.
In the final week of the residency, we surveyed new analytic approaches to the early modern corpus, with the assistance of Folger Director Michael Witmore, Marc Alexander (from the Historical Thesaurus of English), and EMDA leader Jonathan Hope. At the end of this week, we each gave a talk on our own experiences, pointed to the issues most present in our work, and proposed ways in which we’d work together to address these issues.
I will get into what I see are the most critical themes that emerged from our discussion, and will talk through some specific examples of projects that embody them as well, but first I just want to give you a sense of the tone and approach of these days in the board room.
First, though, I feel I need to identify myself as a very print-culture-oriented person. In my research and work I have been active in digital realms, and much of my time there has been spent cataloging the millions of ways that these technologies create barriers, diminish experiences, or enforce arbitrary constraints on the people who use them. I think people see me as a “digital person,” but I identify much more closely with physical media: I have carted my collection of 5000 vinyl records across the country more times than I care to consider, and I don’t have a tablet or an e-reader. I’m not an early adopter or a tech evangelist; for the most part, I am a critical and frustrated user of digital technologies.
Even coming from that curmudgeonly perspective, I was struck by how connected all of our discussions of digital technology at EMDA were to the world of print to which it was being applied. We spent lots of time in the reading room and in the board room working with print and manuscript materials.
We considered pre-digital approaches to new ways of reading, and these creative and unexpected treasures guided our investigations.
We used the Hinman Collator to think about visualizing variation and reading multiple pages.
We looked at Teena Rochfort Smith’s 1883 4-text edition of Hamlet,
which was typeset in columns with textual variants revealed by the typography.
You can see the striking mix of typefaces and characters that represent differences in the four editions.
This whole title has been scanned by the Folger (images are courtesy of the Folger Shakespeare Library) and I recommend you take a look.
We viewed Walter Rogers Furness’s 1885 Composite photography applied to the portraits of Shakespeare, where budding photographic processes were used to compare and contrast paintings (and one sculpture) of the Bard.
We read Robert C Binkley’s 1934 memo New Tools, New Recruits, for the Republic of Letters, which describes an academic job market very similar to today’s and in which Binkley argues for investigations into a print-based data and image mining that presages many of the digital humanities projects we see today.
We talked about Vannevar Bush’s 1945 As We May Think, and the way his Memex tool sought to connect researchers through their own annotations & trails of moves through primary materials.
This is to say, we were not a group enamored with the allure of new digital technologies, convinced they are the sources of new ideas and new directions, but a group interested in investigating tools potentially useful for exploring the kinds of questions we are already exploring, albeit at different scales and from different angles, while being attentive to their own situated sets of problems, barriers, and failures. On day 2 of the program, Wendy Chun, in particular, set the tone for our critical approaches in examining digital tools, and helped us to define ways in which researchers using and developing digital tools can do so ethically. I’ve included some of her work in the bibliography if you are unfamiliar.
The remediations, and perhaps re-remediations, of work in Teena Rochfort Smith’s and Walter Furness’s books, and those present in the ideas of Binkley and Bush, as well as those in EEBO and other contemporary tools, were the starting place, and often the focus, of our discussions.
Our collaborative, exploratory use of these tools and texts, and our investigations of the causes and effects of different generations of technologies, repositioned the Folger Library as a lab, a comparison that John Unsworth has made before. Both as a librarian and a researcher (and not to go off on a tangent) I think it is essential that we consider, and speak about, libraries as labs for humanities scholars, and demand that, like their counterparts in the Sciences, our labs are resourced in a commensurate manner.
I’m certainly not the first to make this claim; I echo my friend Heather Froehlich (EMDA technical specialist & linguist) and others here. I am also very excited that the theme for RBMS this year is Preserve the Humanities! Special Collections as Liberal Arts Laboratory.
Folger Director Michael Witmore also asked us to reconsider our assumptions about what libraries and the texts that comprise them can be.
In a talk in which he presented his work on text as a massively addressable object, he asked us to think of the surfaces that make up a library: all folded into one another, represented by printed materials and manuscripts, organized on walls and shelves, with adjacencies representing similarities (while ignoring others) in specific buildings, under specific constraints, in disparate locations all over the world. His research questions stem from our ability, through digitization projects, to spread out those surfaces and work in impossible spaces.
His mazelike imagery had me considering the library not as a lab, but the library as a labyrinth.
It is a very familiar metaphor, actually, to anyone who has browsed the stacks of any library—it can be a solitary experience of twists, turns, and dead ends, and you are never really sure that you will find what you are looking for, or even what it would look like.
But there are labyrinths we walk through and labyrinths we follow with a pen or pencil. The only thing separating one from another is a question of scale, and the approach to using either to get what we need changes depending on our context. And information about one scale can be quite useful when you’re working at the other.
This tension between the micro and macro view, between close reading and distant reading, is very intriguing to me. I’m interested in what’s between. Call it medium data.
In the social sciences we’d call this purposive sampling, a form of non-probability sampling that engages the researcher’s judgement to select groups of critical cases from a population. EMDA participant Jacque Wernimont, writing with others about being the first generation of scholars for whom EEBO was the norm, puts it this way: “Close reading is not inimical to distant reading; on the contrary, it is part of it.” I think this is a critical point. With all the media interest in distant modes of reading, the ways in which close readings have informed and made distant reading possible often get lost in the hype.
If you’ve heard me talk about Digital Humanities before, you know I like to lean on a few concepts as questions we can ask of texts, collections, and projects.
Those concepts are overlay, chatter, timeline, and zoom. Overlay in the sense of considering what other documents, layers, or mergers can tell us more about a document in which we are interested—think Hinman collator or matching historical maps to their contemporary counterparts.
Chatter is in the sense of the conversations around a particular text or collection, in the form of annotations, citations or other links, like Bush’s trails, or reception and response from those outside the text or inquiry; the #EMDA13 Twitter archive is a good example of this.
Timeline asks us to situate a work on a temporal scale and to align it with other phenomena.
And finally, zoom, the ability to shift between those macro and micro views.
After my participation in EMDA, I’ve added network as another dimension to this schema because of how my work there brought my awareness to the bridges, the unbridgeable spaces, and the gaps that networks represent.
I find these five concepts helpful when thinking about what digital humanities projects actually do. They can serve as identifiers of types of treatments. These and many more are borne out in Matt Kirschenbaum & Folger Digital Strategist Sarah Werner’s article “Digital Scholarship and Digital Studies: The State of the Discipline” just out last month in Book History. The authors also provide a broader list of interesting Early Modern digital projects than I have time to cover here. But beyond the ones concerned with overlay, chatter, timeline, and zoom, Werner points out one project, preserving the Great Parchment Book at University College London which used digital imaging and manipulation to
“virtually unwrinkle” formerly unreadable pages, 90% of which are now readable. This is to say, there are early modern digital projects working well beyond the dimension of the textual treatments I’ll describe. But that particular concept was just so beautiful to me, I felt it deserved a slide.
As a group, we stumbled on some additional ideas that seem to provide good critiques to digital humanities tools and projects, and serve as reminders of the distances between methods like searching, text mining, and digital imaging from the print and manuscript materials they represent.
The first concept, which I’ll summarize as the idea of “Lossy Compression,” pertains to that gulf between a printed text and how its surrogates are represented in different systems. Anyone who spends 10 minutes comparing a print book to its digital surrogate in EEBO instantly realizes what has been lost in the remediation. The outside of the book is gone (very few bindings show up in EEBO), as is any information about different colored inks, since the original Early English Books microfilms, which supplied the page images, were high-contrast black and white. Only the edges of the most obvious cancel slips appear in the photographs, plates are often missing, and text is at times unreadable, not to mention the loss of access to any tactile information about the paper, the print, the ink, or the original item’s aura. Furthermore, puzzlingly, the EEBO interface presents all of its books as if they were the same size. It’s clear that not all EEBO texts are equal in terms of legibility, completeness, and quality.
Both the images and the metadata in EEBO are great examples of lossy compression, because we are so aware of the losses. As we think about other tools, and move toward higher fidelity representations, remediations, and visualizations, we must be very attentive to the ways in which they too are lossy—both in terms of our experiences as readers and in our endeavors to use these representations to make discoveries.
I think the Short Title Catalog is also an example of Lossy Compression: it’s a portable, powerful, and essential tool, but again, an entry in STC is about as lossy as a representation comes. A deliberate, very useful compression, of course, but quite far away from the original texts on the spectrum of representation.
Which leads me to another mantra of the EMDA experience, one concerned with reading the database.
Completeness is an Illusion. Early on in the process it became clear to us, most of us fairly heavy users of Early English Books Online and the Text Creation Partnership, that we had very divergent understandings of, and even access to, what the EEBO corpus covers. We were mistaken to assume that our colleagues from other institutions were seeing what we saw when we did a search against EEBO. There are as many as 5 different levels of access to EEBO, and you may be unaware of your own level until you see someone else’s.
There are those privileged institutions, like Syracuse, that have access to the 130,305 bibliographic records, the 129,541 records accompanied by page images, the 44,314 records with page images and TCP-encoded full text, and the additional 10,000 transcripts from TCP that have yet to make it into EEBO. Then there are institutions with access at any single one of those levels, some with access to only the bib records, and of course, some with no access at all. I’m pleased that SU is able to provide the level of access that we do, but I’m more pleased that, as TCP partners, we are supporting the imminent release of 25,000 TCP Phase I texts into the public domain in a matter of months. More on that to come.
Part of our further re-exploration of EEBO was to look critically at its history, which Ian Gadd has very thoroughly covered in “Use and Misuse of Early English Books Online.” At his urging, we spent the better part of the afternoon playfully misusing EEBO, getting to know the limits and holes in the archive in a way we might not have had the time or inclination to before.
This also brought awareness of some serious friction between what we found and what Early English Books Online describes itself as containing. There are plenty of items that contain manuscript materials, that are in other languages, and that deviate from the stated time period.
The result of this, of course, was an increase in our awareness of the need for training in EEBO literacy. It cannot be truthfully considered a comprehensive, or even a common, representation of the whole of Early Print. But it’s what we have, and for many applications it is good enough; it is up to us to make it better.
One morning in the reading room, while spending some time with a copy of Richard Baxter’s Two Treatises on Death, I came across this apologetic note from the microphotographer of the book reminding me that this book was very old, and difficult to work with. It occurred to me that the book in my hand had aged almost a hundred years since that note had likely been penned. I momentarily felt like I had a microform friend looking out for me.
On the next EEBO image, the pages are repeated and are essentially readable, but that’s not true in every case of workers’ notes. I love how these human interventions into the microfilmed images foreground the generations of labor that have gone into the construction of the databases, from Pollard, Redgrave, and Wing & their associates,
(sorry, no Redgrave!)
to Eugene Power, AKA Captain Microfilm, and his staff of photographers
to the transcribers busily encoding the TCP texts.
Eugene Power’s intent with the Early English Books microfilm project was not to replace the original print books, but to extend the reach of the STC project and to bring reference access, in the form of an edition of one, to those who could not consult the books themselves. I think he would be pleased with the extent to which EEBO demands our return to the printed materials it represents.
We have to consider the resources and constraints they had to deal with in this work and how those resources and constraints affect what is in the archive and what isn’t. EMDA participant and former Early Modern OCR Project manager Jacob Heil tells us that “deeply human and humanistic work is necessary for the advancement of current technologies.” We cannot forget this. For more on the EEBO production process, please check out Bonnie Mak’s JASIST piece, and for other EEBO image oddities, some human and some machine, check out Whitney Trettien’s entry in the bibliography (de-proxied mirror). I’m sure we all have our own favorites. Here’s a mesmerizingly great one:
If you are a fan of this kind of thing, you should also check out the Art of Google Books Blog, which documents the similar interactions and failures that persist in a modern large-scale digitization project.
The illusion of completeness in any dataset is important to test and challenge, especially when the interpretation of those data can have grave consequences. In an age of Big Data, humanistic critiques are essential, and I hope that an increased presence of humanists in data-driven discussions outside the humanities will be a result of our research and teaching in digital methods.
We can already see some of the effects of the presence of humanists among those who devise, define, and maintain technical systems. Until very recently, the Text Encoding Initiative (a flavor of XML designed for creating digital editions of literary texts) had a very problematic method of documenting the sex of people referred to in documents.
Melissa Terras, whose bug report is seen here, was able to communicate to the gatekeepers of TEI why this was a problem for humanistic scholarship, and had the standard changed from a numerical value to an open field that may be locally defined. Examples like this are precisely why we need humanists engaging with, influencing, and I’d say leading, those who build the technologies of digital scholarship. If we are not at the table to define the categories and functions we need, we will reify in our work the structural problems of the systems that already exist.
We also spent a lot of time discussing what is “good enough” –
what is good enough to serve as a surrogate, and in what contexts, what is a good enough level of access, what is a good enough tool, despite its flaws, to offer a meaningful object of analysis. Within digital scholarship, there is a need for peer review throughout the process—many projects are works-in-progress, tools that function more to generate questions than they serve as scholarly artifacts.
It is a question we can ask with regard to our own technical skills and to the query potential of our data sets, that is, the questions that are possible to ask relative to the ways in which our digital information is organized or structured.
This theme ran so deep in our discourse, Jonathan Hope and Owen Williams awarded all participants with a certificate of good enough at the end of the program. In return, Jonathan received a trophy we had inscribed with the same slogan.
One final related idea that served as a guiding principle for our explorations is borrowed from Johanna Drucker’s 2011 Digital Humanities Quarterly article “Humanities Approaches to Graphical Display.” In the piece, Drucker calls on humanists to complicate the impression that data are naturally occurring bits of truth by reframing the concept of data. According to Drucker, “This requires first and foremost that we reconceive all data as capta. Differences in the etymological roots of the terms data and capta make the distinction between constructivist and realist approaches clear. Capta is “taken” actively while data is assumed to be a “given” able to be recorded and observed. From this distinction, a world of differences arises. Humanistic inquiry acknowledges the situated, partial, and constitutive character of knowledge production, the recognition that knowledge is constructed, taken, not simply given as a natural representation of pre-existing fact.” I think this is so important as we as humanists move into worlds of big, medium, and tiny data, but more importantly, as we enter a future more and more enamored with Big Data as a savior.
We can look at capta as information that is aware of its lossy relationship to realities and contexts. Capta is compressed by politics, by bias, by technology, by law, by any system affecting the phenomenon, and by our abilities to observe all of those things.
Drucker has an excellent illustration of how this relates to digital humanities visualization projects, which I will share, if you will allow me to jump ahead to the mid-nineteenth century.
In this first graph, we have a representation of the numbers of new novels put into production by a publisher between 1855 and 1862.
This second graph is fractured by additional overlay of complicating information, and I’d add, information it takes a moment to understand. I’d argue that this is a very good thing. To aid in that, I will read Drucker’s caption: The “appearance” in 1855 of fourteen novels is shown in relation to the time of writing, acquisition, editing, pre-press work, and release thus showing publication date as a factor of many other processes whose temporal range is very varied. The date of a work, in terms of its cultural identity and relevance, can be considered in relation to any number of variables, not just the moment of its publication
In this view, that datapoint marked on the first chart is not a point at all, but a moment of selection. Scientists would not dispute this, Drucker says. She continues, “Any self-conscious historian of science or clinical researcher in the natural or social sciences insists the same is true for their work. Statisticians are extremely savvy about their artifices. Social scientists may divide between realist and constructivist foundations for their research, but none are naïve when it comes to the rhetorical character of statistics.” The social scientist half of me could not agree more; we savor the limitations of our methods and datasets.
We have the opportunity to employ visualizations like charts and graphs not as succinct proof of our claims, but as methods of complicating our questions and disrupting our audiences’ tendency to merely take “data” at face value. Additionally, digital surrogates and collections thereof are always going to be incomplete, and part of our work is making those incompletions known.
At risk of repeating myself, I do think that the chatter of others around work in digital scholarship—especially those not necessarily engaged in it—is an essential component of complicating the ways in which texts are displayed, analyzed, and visualized. Within early modern studies and book history, traditional scholars have, through their work with physical bibliography, close reading, deep historical knowledge of figures, periods and regions, and engaging with texts on a very tactile level, developed a keen sense of both the macro and micro with regards to the material that the scholars working primarily in digital modes use. Thoughtfully handling thousands of printed books will do that to a person. They are well equipped to recognize problems and their critiques are essential.
One EMDA moment that I will never forget occurred while we were discussing Early Modern Letters Online, a link to which I have provided. Kim McLean-Fiander, one of the project leads, was discussing her interface team’s struggle to design visualizations that didn’t imply completeness, as some letters had been lost or were not available to the project. Jonathan Sawday, who was visiting us that day, raised his hand and asked something along the lines of, “and how does this visualization account for the letters that were never written?”
This impossible question led to a really fruitful discussion of what we communicate, implicitly and explicitly, with summaries, visualizations, and representations. How do we visualize absence, uncertainty, silence, and bias? It’s an interesting question to bring to any large scale representation.
On the level of small-scale representations, Alan Galey has been working on creative methods for documenting and representing variation among editions of texts—he’s not the only one looking into this question, but his approach I think is indicative of the tension between the implicit and the explicit. This slide, which you no doubt have noticed is, well, moving, right now, is an animated gif of a screen capture of Alan’s website Visualizing Variation, where he explores disruption of the reader experience as a means of revealing, and hiding, variations among editions of a text. Now, I’ve sped it up in the animation for the benefit of people further from the screen, but when you visit the site on your own, you will notice that it is a much more subtle change in real life.
His aim is to present variation as a “field of possibilities” among readers, but I believe he also accomplishes a productive disorienting effect that forces readers into a metacommentary with themselves about the origins of the texts they read on the screen. It also calls back to the linear actions of type being set, and the technology necessary for doing so. It is a form that instantly generates more questions than answers, and it embodies an engaging playfulness that I believe many digital projects lack.
This image is of one of Lady Mary Wroth’s manuscripts in the Folger Library, which was among the manuscripts we worked on encoding in TEI. You can see already, it’s pretty interesting and complicated—strikethroughs, revisions, & if this image were large enough, you’d see the final two of the sonnet’s 14 lines.
The goal of the TEI marked up version would be to create a digital surrogate that captures as much of what is going on here as possible. In his recent Spenser Review article announcing the forthcoming release of the EEBO-TCP Phase I texts into the public domain, Martin Mueller gives us a very good overview of what TEI does: “You can think of TEI encoding as cataloging of the discursive parts of a book. It is particularly helpful in genres like drama, where a system of rigid and genre-wide notation is part of the genre itself. In TEI encoding, acts, scenes, speeches, stage directions and other discursive part are, as it were, containerized in “elements” bounded by the (in)famous angle brackets familiar from HTML. Such encoding increases the query potential of a corpus: it becomes much easier to look for words spoken in lines of verse, quotations, stuff in “paratext,” etc.”
As EMDA participants worked to mark-up, we co-authored TEI documents that sought to encode the elements of chosen manuscripts. I feel like I may need to issue a code warning before I show you what that looked like. So, please take a deep breath.
I hope you’ll agree with me that this is somewhat human-readable. Inside the brackets you see the elements of TEI which indicate where the text begins, what the title is, that identify the larger line group as a sonnet; the l’s in brackets represent line beginnings… and you see other things like “overwritten” to indicate the cross out, and “choice” which can be used to supply variant spellings or emendations. TEI gives us the opportunity to represent the things we see and read in documents like genre-based structures, poetic forms, revisions, meter & rhyme schemes and even sources of handwriting in the texts we encode.
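For readers of this script who would like something to poke at, here is a minimal sketch in Python of what that kind of markup makes possible. Everything in it is an assumption made for illustration: the two lines of verse are invented placeholders, not a transcription of the Wroth manuscript, and the tagging (a line group typed as a sonnet, `l` for lines, `del` and `add` for a revision, `choice` with `orig` and `reg` for a variant spelling) is one plausible way to encode these features rather than the encoding our group produced.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Invented placeholder verse, tagged with a handful of common TEI elements.
SAMPLE = """
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <lg type="sonnet">
      <l n="1">A first line with a <del rend="overstrike">struck</del> <add>revised</add> word</l>
      <l n="2">A second line with <choice><orig>loue</orig><reg>love</reg></choice> in it</l>
    </lg>
  </body>
</text>
"""


def reading_text(line, prefer="reg"):
    """Flatten one <l> into plain text.

    Deleted text is dropped, and only one side of each <choice>
    (regularized or original spelling) is kept.
    """
    unwanted_spelling = "orig" if prefer == "reg" else "reg"
    skip = {"del", unwanted_spelling}

    def walk(el):
        parts = [el.text or ""]
        for child in el:
            name = child.tag.split("}")[-1]  # strip the namespace
            if name not in skip:
                parts.append(walk(child))
            parts.append(child.tail or "")
        return "".join(parts)

    # Collapse the double spaces left behind by skipped elements.
    return " ".join(walk(line).split())


root = ET.fromstring(SAMPLE)
sonnet = root.find(".//tei:lg[@type='sonnet']", NS)
lines = sonnet.findall("tei:l", NS)

regularized = [reading_text(ln) for ln in lines]
original = [reading_text(ln, prefer="orig") for ln in lines]
deleted = [d.text for d in root.iter(f"{{{TEI_NS}}}del")]

print(regularized)
print(original)
print(deleted)
```

The same encoded file yields a modernized reading text, an original-spelling text, and a list of everything the writer crossed out, which is the "query potential" Mueller describes in miniature.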
Of course, there is an endless list of things we could notice in just this scrap of manuscript, even more if we are starting with the actual manuscript and not a poorly cropped image—those noticings are both contextual and personal. In a sense, the encoder of the text encodes a bit of herself or himself in the process. You can see in the final lines of the snippet of code, that disagreements about those encodings can even be represented in the text, as Owen, Scott, and DanP all propose alternates for a difficult-to-read span of text. Any or all of these options could be used in a search, processing, and display of this text.
Our responsibility as encoders is to understand the query potential of a particular approach and to balance our resources and future needs in the constant question of what is good enough.
This was a source of much anxiety in our encoding workshop, evidenced here by my colleagues’ tweets.
Pedagogically, I think encoding can be quite useful—it forces us to slow down, focus our attention, explicate our reactions to a text, and think about which parts are significant. I see encoding as a means of deeply engaging with a single text to make distant reading possible; which is to say we have to zoom in in order to zoom out in any meaningful way. Someone has to perform that work.
As people engaging in digital scholarship, it is our ethical responsibility to code thoughtfully, conscientiously, and with an eye toward future scholars. Ellen’s concern in the tweet at the beginning of this thread reminds me of the note our friend the microphotographer placed in the camera in the Baxter text. That is: stressed out about what future generations will think of her work!
There is a computer programming maxim that applies here, I believe: garbage in, garbage out. If your input is something the system can’t recognize, the system is going to give you garbage as output. In Early Modern studies, we are dealing with such variation in spelling, print, and script that it’s important to consider how our tools will process what we create. Many tools might not be able to appreciate our long Ss or variant spellings, and it’s important to know how they’ll be handled. This is why our humanistic attention to detail is so critical in early modern digital scholarship.
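To make the long-s problem concrete, here is a small, hedged sketch in Python. The sample sentence and the three-entry variant table are invented for illustration; a real project would curate a far larger list and would have to decide, case by case, when a regularization is safe. The point is only that a naive word count misses early modern spellings until something normalizes them.

```python
import re

# Hypothetical variant table, invented for this example.
VARIANTS = {"loue": "love", "vertue": "virtue", "bee": "be"}


def normalize(text, variants=VARIANTS):
    """Replace the long s and regularize any spelling found in the table."""
    text = text.replace("\u017f", "s")  # long s (ſ) becomes a round s

    def swap(match):
        word = match.group(0)
        regular = variants.get(word.lower())
        if regular is None:
            return word
        # Keep an initial capital if the original word had one.
        return regular.capitalize() if word[0].isupper() else regular

    # [^\W\d_]+ matches runs of letters only.
    return re.sub(r"[^\W\d_]+", swap, text)


def count(word, text):
    """Count whole-word, case-insensitive occurrences of a lowercase word."""
    return len(re.findall(rf"\b{re.escape(word)}\b", text.lower()))


raw = "Loue is a vertue, and \u017fo mu\u017ft bee prized."
clean = normalize(raw)

print(clean)
print(count("love", raw), count("love", clean))
print(count("must", raw), count("must", clean))
```

A blanket rule like mapping every "bee" to "be" is exactly the kind of decision that needs a human reader, since it would also rewrite the insect.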
So, with that briefest possible overview of TEI encoding, I want to point at a fantastic example of encoding in practice that has its origin at the Folger.
You may have seen the Folger Digital Texts—you may have used them in class, or pointed your students to them.
On one hand, this site offers free, open access, digital text from the Folger Shakespeare Library scholarly editions of Shakespeare’s plays and poems. Students can read, search, and even create print-format copies of the plays. It’s a one-stop online place for high-quality editions of Shakespeare’s work.
But what is even more remarkable is that, alongside the reader-oriented texts that are available, the Folger has made the TEI formatted XML files, which drive the display and use of the reading texts, available for use by others in noncommercial projects and apps.
Here is what the text of Macbeth from those displays looks like in XML. You can see there is a lot more going on here with regard to the code than in our manuscript example. You can still read the text fairly easily, but it’s not all laid out in lines like before.
Recall Mike Witmore’s characterization of text as a massively addressable object. The approach that Eric Johnson, Mike Poston, and Rebecca Niles have used in encoding Macbeth and the other plays embodies that characterization. You can see here, every word, every piece of punctuation, every space, has its own unique id. You also see FTLN, which stands for Folger Through Line Number—a direct address for every line in every work—and things like milestones, speakers, and even fanfares—there has been an attempt to encode every discernible feature of the text.
All of this encoding results in a fascinatingly high level of query potential, and enables a variety of operations and readings that can be queried straight from the code.
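As a rough illustration of what "massively addressable" means in practice, here is a simplified Python sketch. The fragment is my own approximation of this style of encoding: the element names (`w`, `c`, `pc`, `milestone`) and every id value are assumptions made for the example and should not be read as the Folger files' actual schema. Only the words themselves, the opening line of Macbeth, come from the play.

```python
import xml.etree.ElementTree as ET

# The xml:id attribute lives in the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, hypothetical fragment: one id per word, space, and punctuation mark.
SAMPLE = """
<sp who="#WITCHES.1">
  <speaker>FIRST WITCH</speaker>
  <milestone unit="ftln" xml:id="ftln-0001" n="1.1.1"/>
  <w xml:id="w0000010">When</w><c xml:id="c0000020"> </c>
  <w xml:id="w0000030">shall</w><c xml:id="c0000040"> </c>
  <w xml:id="w0000050">we</w><c xml:id="c0000060"> </c>
  <w xml:id="w0000070">three</w><c xml:id="c0000080"> </c>
  <w xml:id="w0000090">meet</w><c xml:id="c0000100"> </c>
  <w xml:id="w0000110">again</w><pc xml:id="p0000120">?</pc>
</sp>
"""


def index_by_id(root):
    """Map every xml:id in the fragment to its element."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}


def lines_by_ftln(speech):
    """Group the tokens that follow each line milestone into a line of text."""
    lines, current = {}, None
    for el in speech:
        if el.tag == "milestone" and el.get("unit") == "ftln":
            current = el.get("n")
            lines[current] = []
        elif current is not None and el.tag in ("w", "c", "pc"):
            lines[current].append(el.text or "")
    return {n: "".join(tokens) for n, tokens in lines.items()}


speech = ET.fromstring(SAMPLE)
by_id = index_by_id(speech)
lines = lines_by_ftln(speech)

print(by_id["w0000030"].text)
print(lines["1.1.1"])
```

Because each token has its own address, a citation, an annotation, or a visualization can point at a single word as easily as at a whole line.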
For example, the Digital Text team is able to pull a character chart for Macbeth, with moments when characters are on & off stage from this data. Grey is on stage, and Black is on-stage speaking.
They also account for character deaths, so we have different shades of green for onstage-dead, and onstage dead and speaking.
Furthermore, the code supports the on-the-fly generation of a concordance, both for the single texts and across the whole corpus.
It also allows us to create cue scripts for the individual characters. This is Banquo here, and this is the Bear’s single cue in The Winter’s Tale.
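The logic of a cue script is simple enough to sketch in a few lines of Python. This is my own illustration, not the Folger team's code: the speakers "A" and "B" and their dialogue are invented, and the choice of a three-word cue is an assumption. Each of a character's speeches is paired with the last few words spoken before it.

```python
# Invented dialogue in speaking order: (speaker, speech).
SPEECHES = [
    ("A", "Who goes along the road tonight"),
    ("B", "A friend and nothing more"),
    ("A", "Then pass and tell me what you saw"),
    ("B", "I saw the lanterns lit"),
]


def cue_script(speeches, character, cue_words=3):
    """Return (cue, speech) pairs for one character.

    The cue is the last `cue_words` words of the preceding speech,
    or an empty string if the character speaks first.
    """
    script = []
    for i, (speaker, text) in enumerate(speeches):
        if speaker != character:
            continue
        if i > 0:
            cue = " ".join(speeches[i - 1][1].split()[-cue_words:])
        else:
            cue = ""
        script.append((cue, text))
    return script


part_b = cue_script(SPEECHES, "B")
part_a = cue_script(SPEECHES, "A")

for cue, speech in part_b:
    print(f"[... {cue}]")
    print(speech)
```

With encoded plays, the list of speeches would come from the speaker and speech elements in the XML rather than from a hand-typed list.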
The team has also created witscripts that show readers all of the action and speech that takes place while a certain character is present. In this case, it’s Lady Macbeth.
This data structure also allows us to isolate any line of dialog as a reference point, or any particular utterance of a word.
Additionally, we can use these addresses to automatically determine which characters are on stage at any given line of dialogue.
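Here is one hedged way to picture that calculation in Python. The entrance and exit positions are toy values I made up, not real line numbers from Macbeth, and I assume a simple convention: an event at a given position has already taken effect by that line. Replaying the events in order up to the line in question gives the set of characters present.

```python
# (line position, action, character): invented toy values for illustration.
EVENTS = [
    (1, "enter", "WITCHES"),
    (10, "exit", "WITCHES"),
    (11, "enter", "DUNCAN"),
    (11, "enter", "MALCOLM"),
    (40, "enter", "ROSS"),
    (60, "exit", "DUNCAN"),
]


def on_stage_at(line, events):
    """Return the characters on stage at `line`, in alphabetical order."""
    present = set()
    # sorted() is stable, so events at the same position keep their order.
    for position, action, who in sorted(events, key=lambda e: e[0]):
        if position > line:
            break
        if action == "enter":
            present.add(who)
        else:
            present.discard(who)
    return sorted(present)


print(on_stage_at(5, EVENTS))
print(on_stage_at(45, EVENTS))
print(len(on_stage_at(45, EVENTS)))
```

Counting the result at every line is all it takes to produce the kind of characters-on-stage timeline that the character charts visualize.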
In a blog post from last month, EMDA participant Doug Duhaime
used this information to chart the relative numbers of characters on stage against a timeline of words in the plays to test a hypothesis about levels of interaction across the genre in Shakespeare’s plays—but this initial inquiry draws him toward several other treatments of the data and in a new direction. I won’t spoil it for you, but I recommend you check out Doug’s blog.
Earlier this month another exciting application of these texts was undertaken by JSTOR, connecting 6 of the play texts to individual works of scholarship in the JSTOR corpus, based on the Folger Through Line Numbers.
When we enter the Understanding Shakespeare interface and browse to a particular line or section, we get a number next to lines that are quoted in articles in JSTOR, and when we click on that number, we get a list of citations and links to the articles themselves.
I’ve stuck with our portion of Macbeth here for consistency’s sake, which I am afraid isn’t too heavily quoted. But if you open up Hamlet, you’ll see exponentially more citations on almost every line. This, within JSTOR at least, is a really meaningful index. Visualizations like this allow us to easily point toward heavily-studied portions of works, but they also help us to locate portions that have been unduly ignored.
This is exciting to me, because it accomplishes something that libraries have done very poorly. It directly links primary materials to the scholarship drawing on them, and assists researchers in moving through the past and future in citation networks. Imagine if every finding aid and rare book catalog had links to the scholarship in which their materials appeared. It’s a terrific pedagogical tool, and something that would revolutionize discovery.
Another Folger-related project with similar goals to what I’ve just described is Folgerpedia—a Wikipedia-like layer that connects and fills in gaps among the Folger’s online tools, the Hamnet catalog, the image collections, and the Collation, the Folger blog. It’s a wiki that only scholars affiliated with the Folger can contribute to, but it seeks to represent in one place all of the programs, activities, and interactions taking place there. Think of it as an explanatory glue for all things Folger. This is another thing libraries are terrible at. Due to their structure and purpose, catalogs work in a certain way, a way that often lacks meaningful integration with digital collections, with the work of the scholars who use them, and with the deeper content (beyond cataloging) of items and collections they use. & Hamnet, as many of you know, is an incredible catalog—when connected to the social fabric of the Collation blog, the Luna image library, and to Folgerpedia, it is also made present in discussions of the work done in and out of the library, of the problems of space and labor, and becomes an engaged, authoritative source of bibliographic information. Folgerpedia, as a constant and growing social overlay to the Folger’s digital presence, contextualizes all of these tools and is expressive of what the Folger Shakespeare Library actually is in a way that most library websites fail miserably to achieve.
As an example of that, you can also read more about the EMDA participants and our work in Folgerpedia, including extensive lists of readings and tools for digital scholarship. We’ve collaboratively authored content there and have plans for more—EEBO literacy tutorials, notifications of work based on our time in the program. Even this talk is represented there. We also have plans to offer Early Modern-inflected reviews and tutorials for the different tools available for digital scholarship with information about which tools play nicely with Early Modern texts.
In the coming weeks, I believe there will also be details about EMDA 15, coming next summer, that I encourage you to track down. Jonathan and Owen shared the proposed schedule with us in May, and it’s just as incredible a slate of visiting faculty as we saw in 2013.
In essence, Folgerpedia represents a more social, more humanist system for approaching libraries in the digital realm. It replicates online a space like the break room where scholars at the Folger meet every day for afternoon tea and discussion.
It’s the opposite of the EEBO interface, which I feel is just strikingly lonely.
Every time I visit EEBO and it treats me like it’s my first time, I die a little inside: “Of course, I want to search, EEBO. It’s like you don’t even recognize me.”
EEBO is one of very very few ProQuest properties that has remained mostly unchanged since the early 2000s. But that does bring us back to EEBO, about which I promised to share some exciting information.
I have made mention that 2015 marks the emergence of the EEBO-TCP Phase I texts into the public domain, meaning the first 25,000 of those double-keyed TEI encoded texts will be available to everyone for free. The next 40,000 will be coming in 2020.
Not only, as Martin Mueller reminds us, do these texts provide us with opportunities to curate and improve them, but their release also means that new tools for searching, text mining, and visualizing these texts will be developed.
One such tool has been developed by Mark Davies, linguistics professor at BYU, and will be available soon. Those of us from EMDA have had an extended preview, and it’s very exciting. Based on the tool through which the other popular corpora at BYU are available, Davies’s tool allows users to search for words across the entire corpus, view keywords in context, group and search specific variants, and trace the frequency of usage of a word or phrase across the entire timeline of EEBO. Much more engaging than the typical inscrutable list of EEBO texts at the title level, organized alphabetically by author. Not to say that this tool will not take you back to the individual titles in which terms appear—it absolutely will.
I expect we will see many new projects, as well as calls for proposals, dealing with the Phase I texts in the coming months. In 2013, I called next year the year of the Early English Bag of Words, a term computational linguists use to humorously describe their corpora, and I think it will be an exciting year.
As a point of conclusion, my experience in Early Modern Digital Agendas has me convinced that digital modes of scholarship enable many different routes for participation and inquiry by early modern scholars and their students. I was surprised to conclude that the most powerful change the digital brings to us is the distributed, persistent, accessible, and public discourse around texts, collections, and projects. The individual tools and projects themselves seem secondary to this, beyond our abilities to link them, remediate them, and interrogate them collaboratively.
The emerging tools of digital scholarship give us the chance to shift the scale and orientation of our questions as well as the ability to address our conclusions via alternate ways of knowing. They can also act as a source of new questions and return us to texts, debates, and moments past. They do not supplant or replace traditional modes of scholarship, but rather feed into them.
As these technologies themselves become more accessible and responsive, it is the responsibility of scholarly communities to engage, guide, and critique the manners in which they become that way. The emergence of critical code studies and critical database literacy are but two expressions of humanities scholars taking on this responsibility. There are plenty of other directions in which we can move. A humanist attentiveness to what the technologies of digital scholarship actually do in and to our communities ensures that we can employ them in ethical, conscientious ways.
The question of good enough is not new to the digital age, but the at-times opaque nature of the tools of digital scholarship forces us to evaluate the way we evaluate. Dealing with the digital within our own areas of expertise also prepares us to become, and to train, active human citizens for a world that seems more likely to listen to the stories our data inadvertently tell than the ones we tell ourselves.
As a post script, I will leave you with a post script. This is from Comedies, tragi-comedies, with other poems, by Mr William Cartwright, 1651, brought to my attention by Robin Davis. Here we have another book that has swelled too big—to bring us back to the problems of labor and space with which we continue to grapple.
Here the author documents, once again, the problems of remediation, and reminds us of our responsibility to disclose what is wrong with the representations we create and our responsibility to prepare others to use them.