Digital Humanities 2007 Trip Report

Report on the Alliance of Digital Humanities: 2007 Conference, Geoffrey Rockwell, June 2007

This is being written while I attend the conference and edited after. As there are simultaneous sessions this doesn't cover the whole conference. Please add to it.

General Thoughts

The quality of the papers this year was excellent. While it may reflect what I attended or the preferences of the program committee, there seem to be some general trends:

  • There were more graduate students and young scholars attending and giving papers, which is a good thing.
  • A number of sessions and papers dealt with issues around large-scale computing - what to do with large collections of evidence. The sessions that Mark Olsen chaired (and presented in) on text mining with Philomine, for example, were looking at thousands of documents. Bethany's paper on NINES dealt with issues around the management and collation of heterogeneous materials. In short, we are shifting from issues around the representation of single works or single-author corpora to issues around the study of large collections.
  • There were a number of sessions around visualization and representation. Visualization and interface design seem to be an accepted part of the discipline now.

Tuesday June 5th

Gender, Race, and Nationality in Black Drama, 1850-2000: Mining Differences in Language Use in Authors and their Characters - Mark Olsen

Mark began by introducing three types of tasks that mining can do:

  • Comparative - Allows us to see what distinguishes different sets of texts.
  • Predictive - Classifies unclassified texts - It is important to not trust these.
  • Similarity/Clustering - Mathematical representation of a group of documents.

Mark and colleagues did a study of American vs Non-American black authors using Philomine.

Very useful for looking at patterns across texts. We are still in the experimental phase, trying to figure out how it works. For example, are we just pulling out the most stereotypical features, especially when doing binomial classification (male/female, black/white)?

He left us with the question, "Should digital humanities be deconstructing machine learning?" Mining seems to be returning to structuralism.

Discourse, Power and Ecriture feminine - Russell Horton

Machine learning can achieve 70 - 80% accuracy on gender disambiguation on the BMC.

They created a corpus of 300 female-authored texts and 300 corresponding male-authored texts. They used a Support Vector Machine technique to study gender differences in writing. They had a "Sand" effect due to all the texts by George Sand. They looked at which words (features) were persistent across the time range and were discriminating.

Feminine writing is more personal, emotional, and familial. Male writing is more public, anatomical and authoritative.

This raises again the issue raised by Olsen about whether we are recapitulating gender stereotypes - a return to structuralism.

Mining 18th Century Ontologies - Glen Roe

Glen talked about how the Encyclopedie itself proposed a system for classifying knowledge. He showed a neat image of the tree of knowledge. They trained a miner on the classified articles in the Encyclopedie and then looked at the unclassified articles to see how they would be classified. Diderot and D'Alembert had a classification scheme, but didn't classify everything.

They then compared the classified and unclassified. He discussed how the mining software tried to classify the unclassified articles and the successes and failures. He uncovered a connection of letters and sciences.

What is neat is the corpora they are working with and the questions they are asking. The ARTFL group have corpora of the size and types (like the thousands of Encyclopedie articles) that can generate interesting results. They are prototyping the types of studies that should be possible - developing the questions.

The Perhaps-Naturally Patriarchal Bard: Computational Linguistics and Shakespeare's Characterization of Gender - Sobhan Raj Hota

Starts with the big question: what makes Shakespeare so great? We have to be careful that when we use digital texts we are making a material transition - reading a screen, not a page. It is a radical transformation of the form of literature. (We need to understand how youth are reading.)

They came up with a fundamental problem with balancing the corpus - the "patriarchal bard" problem where Shakespeare is read as both feminist and patriarchal.

They created a corpus of speeches of more than 200 words and balanced speeches by women and men. They eliminated the speakers who had more words than the most verbose woman speaker. This eliminated many primary characters like Hamlet.
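
The balancing step described above can be sketched roughly as follows. This is only an illustration of the idea, not the authors' procedure: the function name `balance_speeches`, the tuple layout, the 'F'/'M' labels and the order of the two filters are all assumptions made for the example.

```python
from collections import defaultdict

def balance_speeches(speeches, min_words=200):
    """speeches: list of (speaker, gender, text) tuples, gender 'F' or 'M'.

    Hypothetical helper: the data layout and filter order are assumptions.
    """
    # Keep only speeches longer than the minimum length.
    long_speeches = [s for s in speeches if len(s[2].split()) > min_words]

    # Total words per speaker across the retained speeches.
    totals = defaultdict(int)
    genders = {}
    for speaker, gender, text in long_speeches:
        totals[speaker] += len(text.split())
        genders[speaker] = gender

    # The most verbose female speaker sets the ceiling.
    female_totals = [n for sp, n in totals.items() if genders[sp] == "F"]
    if not female_totals:
        return []
    ceiling = max(female_totals)

    # Drop every speaker whose total exceeds that ceiling.
    return [s for s in long_speeches if totals[s[0]] <= ceiling]
```

Under these assumptions a speaker like Hamlet, whose total exceeds that of every female speaker, is removed entirely, which is exactly the effect the paper questions.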

They found similar features to the other studies. They looked at bigrams (two-word phrases) and trigrams. Women tend to use "my husband". Men are "in the field". They looked at who the most "female" characters are. It should be possible to linguistically map what male and female constructed discourse looks like. Without this we have a "pornographic" definition - "I know it when I see it." That's what these quantitative studies should be able to do for us - show us how gender is constructed in discourse, even if that isn't all there is.
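
To make the bigram idea concrete, here is a minimal sketch that counts n-grams in two sets of speeches and ranks them by the difference in raw counts. The names `ngrams` and `distinctive_ngrams` are hypothetical, and the raw-count difference is a deliberately naive stand-in for whatever feature weighting the study actually used.

```python
import re
from collections import Counter

def ngrams(text, n=2):
    """Return the n-word phrases in text, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def distinctive_ngrams(female_texts, male_texts, n=2, top=5):
    """Rank n-grams by (female count - male count); a naive, assumed measure."""
    f = Counter(g for t in female_texts for g in ngrams(t, n))
    m = Counter(g for t in male_texts for g in ngrams(t, n))
    # Counter returns 0 for missing keys, so every n-gram gets a score.
    scores = {g: f[g] - m[g] for g in set(f) | set(m)}
    # Ties are broken alphabetically so the ranking is deterministic.
    most_female = sorted(scores, key=lambda g: (-scores[g], g))[:top]
    most_male = sorted(scores, key=lambda g: (scores[g], g))[:top]
    return most_female, most_male
```

With real speeches one would normalize by corpus size and test significance; the point here is only how a phrase like "my husband" can surface from counting.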

The humanities is about questions that often can't be solved like "Does God exist". How will quantitative methods help us with these?

Sobhan talked about the gender balancing that the computer scientists did (that left Hamlet, for example, out, as his speeches would throw off the balance) and how that balancing creates problems. How can you look at gendered language without Hamlet's speech? Her point is that we have to think about the scientific assumptions behind things like "balancing".

Viewing Texts: An Art-Centered Representation of Picasso’s Writings - Neal Audenaert

Talked about how to represent the thousands of images in the Online Picasso Project. The idea was to visualize relationships for teachers and students so that they could do the types of art history tasks they are asked to do rather than just searching thousands of images. They are trying to model the relationships art historians see between studies and finished works, between works by one artist and others, and between works (of Paris) and place (Paris). They are working on a browsing interface that lets people shift focus on different relationships at will. This is to support serendipity and discovery. The goal is to support deployment.

How do they encode and discover the relationships? Is it automated or hand crafted? Are they going to make this mashable so others can create new relationships? At the moment metadata is manually entered. They haven't decided how to do the relationship building yet.

The Abbey Inside the Machine: The MonArch Project - Clifford Edward Wulfman

Talked about the MonArch (MONastery ARCHaeology) project, which is a joint Wesleyan-Brown project. They have three different web sites with different interfaces. The third one, St-Jean-des-Vignes, is more than just a site about a project. It is an attempt at creating a model or "synthesized interpretation" of monasticism. It is self-conscious about itself as an interpretation rather than being just a searchable research site. This means developing a presentational rhetoric that is reflective and sceptical.

This means they are trying to model things around the questions about monasticism that the evidence can help answer.

The resource is fundamentally rhetorical - it starts with questions about monasticism and presents an argument. It is thus tending toward being a type of digital book. But ... isn't that what all digital resources are, though perhaps without the self-conscious design of this project?

Macro-Analysis: One way to address the challenge of "what to do with a million books" - Matt Jockers

Two paths to computer analysis of texts:

  • Reading support (close reading) - tools that support reading
  • Text mining (far reading) - tools that support large scale mining

Matt explained what he meant by Macro-Analysis by comparing it to Economics. Micro-analysis is to close reading as Macro-analysis is to far reading. Macro-analysis is working with large scale corpora.

Matt looked at Irish-American fiction. He showed graphs of the large-scale publishing trends. He then got the full text of the Chadwyck-Healey corpus and that allowed him to look at textual trends. For example, he showed the usage of "he" and "she". He then showed example graphs comparing English and American novels and gendered words. What is great is the hypothesis forming that he goes through to explain the trends in the graphs. Each graph suggests questions for which we can hypothesize answers. For example, why does use of "the" drop at times?

Jennifer, a student in the digital humanities program, showed XTF - an interface to Lucene that lets people search his database and presents the results on a timeline.

Twelve Hamlets: A stylometric analysis of major characters' idiolects in three English versions and nine translations - Jan Rybicki

Jan uses Burrows's methods to study his translations. Do the microstructural levels that may change in translation affect the macrostructural levels?

Jan has 12 Hamlets by having 3 Quartos/Folios and different translations of each. He compares graphs of characters from the original and the translation. He took the 250 most frequent words for each of the translations (not the same words for each translation) for plotting. I didn't catch how the graphs are generated, but I assume it is plotting characters across this vocabulary and then reducing to 2 dimensions that can be graphed. There are clear differences between translations, but why? Should the characters plot similarly across languages? The top 250 words might behave very differently in different languages.

This paper raises interesting questions about how stylistics can be used in translation to "contrastively evaluate" the quality of a translation. But could it be that the closest graphs are produced by the most literal translations?
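
Since I didn't catch the actual method, the following is only a guess at the first step: building a frequency profile for each character over the most frequent words of one version of the play. The name `frequency_profiles` and the input format are hypothetical, and the reduction to 2 dimensions (for example by principal components analysis) is not shown.

```python
from collections import Counter

def frequency_profiles(speech_by_character, top_n=250):
    """speech_by_character: {name: all of that character's words as one string}.

    Returns (vocabulary, {name: [relative frequency of each vocabulary word]}).
    Hypothetical sketch; not the method used in the paper.
    """
    counts = {name: Counter(text.lower().split())
              for name, text in speech_by_character.items()}

    # Most frequent words over the whole version, ties broken alphabetically.
    overall = Counter()
    for c in counts.values():
        overall.update(c)
    ranked = sorted(overall.items(), key=lambda kv: (-kv[1], kv[0]))
    vocabulary = [w for w, _ in ranked[:top_n]]

    # Each character becomes a vector of relative frequencies.
    profiles = {}
    for name, c in counts.items():
        total = sum(c.values())
        profiles[name] = [c[w] / total if total else 0.0 for w in vocabulary]
    return vocabulary, profiles
```

Because the vocabulary is recomputed for each translation, the vectors for the same character in two languages are not measured on the same axes, which is one reason to be cautious when comparing the resulting plots.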

Zeta and Iota and Twentieth-Century American Poetry - David L. Hoover

A big difference between forensics and the digital humanities is that the humanists want to return to the texts. Zeta tests on frequent words and Iota on infrequent words. He found that with 21 words he could distinguish Frost from any other of the poets. David reports that Burrows says he only uses Zeta and Iota in head-to-head tests, not in multi-author tests.

David has an interesting way of presenting with spreadsheets. Spreadsheets make an interesting interface that could be developed.

Wednesday, June 6

I was giving papers Wednesday so I didn't take notes directly. I will add notes later.

Session 5: ALLC Panel: Digital Resources in Humanities Research: Evidence of Value - Session Chair: Prof. Harold Short

This session dealt with how humanities computing projects, services or centres are evaluated. The session is timely as the AHRC in the UK has cancelled funding to the AHDS (Arts and Humanities Data Service) which is endangering the matching funding. How would a national service like the AHDS present evidence of value?

The panel speakers included David Robey (convenor), Harold Short (who presented some case studies), Thorny Staples (who talked about Fedora and federated collections), myself (I talked about types of qualitative and quantitative evidence) and Susan Hockey.

Thorny argued that in the future there will be federated library managed collections and that in order for e-texts to be acquired by these collections they will have to accept a higher level of standardization.

Session 24: Representation and Analysis - Session Chair: Dr. Julia Flanders

Stéfan Sinclair and I presented a dialogue between two positions on how to theorize tools. Stéfan's character took a social construction position that tools should be read as the work of groups of people. My character took the position that tools should be read as theory.

Bethany Nowviskie presented on "Collex: facets, folksonomy, and fashioning the remixable web" and the NINES project.

Arianna Ciula and Paul Spence presented on a very sophisticated project that uses ontologies, "Expressing complex associations in medieval historical documents: the Henry III Fine Rolls project".

Thursday, June 7

Large-Scale Humanities Computing Projects: Snakes Chasing Tails, or Every End is a New Beginning? - William A. Kretzschmar, Jr.

This was the first paper in a session on "Done": Finished Projects in the Digital Humanities, which looked at how projects end. Bill talked about the LAMSAS project and how it evolved from a Foxbase database on the Mac to a web resource. They had to change, as having their data in a proprietary (Foxbase) form would have frozen the project. Because of their audio files they are one of the largest users of their institutional repository.

The largest humanities computing projects will need ongoing funding, but probably won't get grants reliably. Thus they need stable institutional support. They will probably never get enough money, but with stable funding they can keep moving their research forward.

"You do not have to finish the work, but neither may you desist from it."

It’s For Sale, So It Must Be Finished: Digital Projects in the Scholarly Publishing World - David Sewell

We tend to confuse Open with Incomplete and then assume that Incomplete is good. Is Wikipedia unfinished if there is no possibility it will ever be finished?

It is possible that the unfinished character of some works like hypertexts has made it harder for the academy to accept them.

The University of Virginia Press Digital Imprint is committed to the idea that digital works can be treated as done and published. It is done when the press is prepared to offer it for purchase and customers are willing to buy it.

Doneness has both intrinsic and extrinsic characteristics. Intrinsic characteristics: closed or bounded, complete or static.

Extrinsic characteristics are like social construction conditions. Some extrinsic factors:

  • Economics - if it doesn't sell, it's done
  • Aggregation - when you add something to something larger then it might have to be updated
  • Technological progress can make things appear
  • Failure to migrate as needed is tantamount to "going out of print"

Orlando Done! Thoughts on publishing an electronic text by subscription on the net - Susan Brown

It is important to design projects with discrete steps that can be "done" and published. They have negotiated with their publisher for regular updates so Orlando is done, but can still evolve.

Orlando will never be done due to the nature of digital editions. There are all sorts of things that could be changed or added to. If digital material is performative then no two electronic texts are the same and nothing can be done. Susan talked about responding to user feedback.

Matt K. raised a question about whether we are preserving the outdated digital editions - they are part of the social construction history. This raises the question of whether we want to keep all this stuff.

Part of the problem is how to define the artefact so we know what to preserve. Is the search interface part of an edition?

-- GeoffreyRockwell - 05 Jun 2007

