TREX 2008 Winning Proposal Summaries

TADA held its inaugural TREX (TADA Retrieval and Evaluation Exchange) event in 2008. Of the many submissions received, seven were chosen by the panel of judges to be winners in different categories. The texts below are not the full submissions, but rather summaries of the submissions (as prepared for a proposal to the Digital Humanities 2009 conference).

Degrees of Connection by Susan Brown, Jeffery Antoniuk, Sharon Balazs, Patricia Clements, Isobel Grundy, Stan Ruecker (Best New Tool)

This linkage tracing tool allows users working on a large collection of documents to explore the linkages within the collection based on the semantic tags it contains. Our prototype, based on the Orlando textbase, traces links between people mentioned in different XML documents based on co-occurrences of a small set of key tags that occur across many documents: personal names, organization names, places, and titles. This exploits the tagging to get at connections between people that may not be made by direct linkages between documents, but rather by the co-occurrence of tags within two documents, or by a pathway from document x to document y by way of a document z that contains tags in common with both x and y. It is a way of getting at implicit but nevertheless potentially important linkages, and while it emerges in this case from an interest in literary history, the tool could be useful to other fields ranging from journalism to creative writing, sociology or psychology. It provides a new way of exploring the large digital collections increasingly used by researchers.
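The pathway idea described above can be sketched in a few lines. This is only a minimal illustration, not the Degrees of Connection implementation: the document identifiers, the tag values, and the function names `build_links` and `shortest_path` are all hypothetical, and it assumes each document has already been reduced to a set of (tag type, value) pairs.

```python
from collections import deque


def build_links(doc_tags):
    """Link two documents whenever they share at least one tag."""
    links = {doc: set() for doc in doc_tags}
    docs = list(doc_tags)
    for i, a in enumerate(docs):
        for b in docs[i + 1:]:
            if doc_tags[a] & doc_tags[b]:  # non-empty intersection
                links[a].add(b)
                links[b].add(a)
    return links


def shortest_path(links, start, goal):
    """Breadth-first search; returns a list of documents, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(links[path[-1]] - seen):
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


# Hypothetical data: x and y share no tags, but each shares one with z.
doc_tags = {
    "x": {("place", "London"), ("name", "A. Author")},
    "y": {("org", "Some Society"), ("title", "Some Title")},
    "z": {("place", "London"), ("org", "Some Society")},
}
links = build_links(doc_tags)
path = shortest_path(links, "x", "y")
print(path)
```

Here x and y have no tag in common, so the only connection between them runs through z.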

Collocate Cloud by David Beavan (Best Idea for Improving an Existing Tool)

Clouds of information (e.g. keywords, tags or words) are a very useful way to aggregate and present vast quantities of data. These clouds have gone on to be used in many web 2.0 sites, and as such they are becoming a well-known and well-understood visualisation for many users. TAPoRware currently provides a Word Cloud visualisation, which shows the frequency of words in a document. Scholars often wish to go further, to see how a particular word is used, by examining which words co-occur near their search word. TAPoRware already has a Collocation tool for this, which shows its results in tabular form. The Collocate Cloud would merge the collocation output and the cloud visualisation technology: it would show the collocates of a particular search word in cloud format. The alphabetical ordering of the Collocate Cloud would allow the user to find or discount a word quickly. Frequencies and collocational strength would be shown by size and brightness, letting the most significant terms stand out visually.
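As a rough sketch of the data behind such a cloud, the following counts the words that occur within a fixed window of a search word, then orders them alphabetically and scales a font size by frequency. It is an illustration under stated assumptions, not TAPoRware code: the function names, the window size, and the size range are hypothetical, and brightness (collocational strength) is left out for brevity.

```python
from collections import Counter


def collocates(tokens, target, window=2):
    """Count words occurring within `window` tokens of each `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cloud_entries(counts, min_size=10, max_size=30):
    """Alphabetical (word, font size) pairs; size scales with frequency."""
    if not counts:
        return []
    top = max(counts.values())
    entries = []
    for word in sorted(counts):
        size = min_size + (max_size - min_size) * counts[word] / top
        entries.append((word, round(size)))
    return entries


tokens = "the cat sat on the mat and the cat slept".split()
counts = collocates(tokens, "cat", window=1)
entries = cloud_entries(counts)
print(entries)
```

A real tool would also filter stop words and compute a strength measure to drive brightness; both are omitted here.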

Magic Circle by Carlos Fiorentino, Stan Ruecker, Milena Radzikowska, Piotr Michura (Best Idea for New Tool)

The Magic Circle is an information glyph that allows scholars to visually summarize combinations of the lexical facets of data included in text collections and the bibliographic information attached to these texts. The tool consists of a set of rings organized outwards from the centre and divided into wedges or sections. The lexical data determine the size of the centre, which also shows a word, a lemma, parts of speech, etc. and the total number of matches found. The bibliographic data are related to authors specified by the user, and the rings allow the user to analyze how the matches are distributed in different collections as well as in different periods of time. The color sets of the rings follow patterns of associations with variations in hue, tone and saturation. A comparative scale helps the user to understand the volume of information found in the context of the whole volume of information present in the collections.

Ripper Browser by Alejandro Giacometti, Stan Ruecker, Ian Craig, Gerry Derksen (Best New Tool)

The Ripper Browser is a prototype for rich-prospect browsing of text collections. Rich-prospect browsing interfaces are designed to aid research tasks such as exploration and synthesis by providing both a meaningful representation of each item in a collection and tools to manipulate their visual organization (Ruecker 2003). The Ripper Browser offers an environment for exploration and interaction with digital text documents. The system creates tiles that contain faceted information about each document. The tiles can be manipulated with a series of controls to reveal or hide details, organize them according to a particular hierarchy, or select a specific group. By adapting the size of the tiles, the Ripper Browser allows researchers to keep both the complete collection and the precise information they need about each document in view at all times. The Ripper Browser was developed with web-native technologies, HTML and JavaScript, and uses the jQuery library. It is configured to use text collections provided by the MONK Project. The Ripper Browser is part of an ongoing effort to understand the potential of rich-prospect browsing and improve on our strategies for designing rich-prospect tools. It has allowed us to experiment further with meaningful representation, increased our understanding of the importance of sequences, and provided insight into new possibilities for organization in visualizations.

Back-of-the-Book Index Generation by Patrick Juola (Best Experiment of Text Analysis Using High Performance Computing)

This is a work in progress; as I have detailed elsewhere (Juola, 2005, ACH/ALLC; Lukon and Juola, 2006, DH2006), we are working on a program to apply standard machine learning techniques, including latent semantic analysis (LSA), to the problem of back-of-the-book index generation. LSA uses implicitly huge document-by-term matrices to determine which words appear in similar contexts; such words are good candidates for grouping under a single index term.

The sheer size of this matrix makes it difficult to work without High Performance Computing; one of the tools we are using is the 200+ node Beowulf cluster available at the Duquesne University Computer Science Department. We analyze the document to be indexed (which can in theory be arbitrarily large but in practice will be about novel-sized) to select candidate words (mostly nouns, via part-of-speech tagging) for indexing, then use LSA to identify potential relationships among those words.
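The underlying intuition, that words appearing in similar contexts may belong under one index term, can be shown on a toy scale. Note that the sketch below is a simplification and not LSA itself: it compares raw term-by-passage count vectors with cosine similarity and omits the singular value decomposition that LSA applies to the matrix. The passages, candidate words, and function names are hypothetical.

```python
import math
from collections import Counter


def term_vectors(passages, candidates):
    """One row per candidate term, one column per passage (raw counts)."""
    counts = [Counter(p.lower().split()) for p in passages]
    return {term: [c[term] for c in counts] for term in candidates}


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical passages standing in for sections of the book to be indexed.
passages = [
    "the ship left the harbour and the vessel followed",
    "a vessel and a ship anchored in the harbour",
    "the garden was full of roses",
]
candidates = ["ship", "vessel", "garden"]
vectors = term_vectors(passages, candidates)

print(cosine(vectors["ship"], vectors["vessel"]))  # close to 1: same passages
print(cosine(vectors["ship"], vectors["garden"]))  # 0: no shared passages
```

On this toy data "ship" and "vessel" occur in exactly the same passages, so they would be candidates for a shared index entry, while "garden" would not. At the scale of a novel the matrix becomes far larger, which is where the reduction step of LSA and the cluster come in.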

Bookmarklet for Immediate Text Analysis by Peter Organisciak (Best Idea for Improving the Interface of the TAPoR Portal)

This idea is for an online interface that generates TAPoR bookmarklets on demand. Bookmarklets are browser bookmarks that run JavaScript code. They provide value to text analysis tools in two ways: ease and ubiquity. They allow one-click connection of content to tool and, more importantly, allow the tool to be run on whatever content the user is viewing. One problem with bookmarklets is that they are static, which means that customization of the query is limited. One solution would be to call up an interface every time the bookmarklet is invoked. Doing so, however, is an impediment to the core concept of ease. Rather, through an interface for creating customized bookmarklets, a user can create single-purpose bookmarklet buttons that run the same command every time, immediately and directly.
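A generator of this kind could be sketched as follows. This is a hedged illustration only: the tool address `https://example.org/wordcloud`, the option names, and the `url` query parameter are placeholders rather than the actual TAPoR interface, and `make_bookmarklet` is a hypothetical helper. The user's chosen options are baked into the bookmarklet once, and only the address of the current page is filled in at click time.

```python
import json
from urllib.parse import urlencode


def make_bookmarklet(tool_url, options):
    """Return a javascript: URL that sends the current page to tool_url.

    The options are fixed when the bookmarklet is generated; the page
    address is added by the browser each time the bookmarklet is clicked.
    """
    fixed_query = urlencode(options)
    prefix = tool_url + "?" + (fixed_query + "&" if fixed_query else "") + "url="
    # json.dumps produces a double-quoted string literal usable in JavaScript.
    script = (
        "(function(){window.open("
        + json.dumps(prefix)
        + "+encodeURIComponent(location.href));})();"
    )
    return "javascript:" + script


# Placeholder tool address and options, for illustration only.
bookmarklet = make_bookmarklet(
    "https://example.org/wordcloud",
    {"stopwords": "english", "limit": 50},
)
print(bookmarklet)
```

The resulting string would be offered to the user as a link to drag to the bookmarks bar; if it is embedded in an HTML `href` attribute, the double quotes need to be HTML-escaped first.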

Throwing Bones by Kirsten C. Uszkalo (Best Idea for a New Tool)

The Throwing Bones interface operates as a means to discover meaningful relationships within a corpus of texts. These relationships will appear as a series of piles, which the user can zoom into and out of, shuffle through, and examine more closely for more comprehensive, annotated information. For example, in the case of a corpus of early modern witchcraft trials, a user might want to see the relationship between animal familiars and accusers. After shuffling, the top item in a pile would show the number of familiars, while the cards beneath would show the number of accusers, illustrating a connection between the imagination of accusers and the presence of familiars in trials and texts. The piles could also be based on geographic, temporal, textual, or relationship proximity. The concept behind Throwing Bones is that the interface will offer not only the pleasure of play, but also erudite and serendipitous textual analysis.

More on TREX

-- StefanSinclair - 12 Jan 2009

