Between Language and Literature: Digital Text Exploration
Geoffrey Rockwell and Stéfan Sinclair
Introduction
This article argues for a rethinking of the creation and use of computing resources for teaching at the interstices of language and literature. Much of the teaching in undergraduate language departments is in fact a blend of both linguistic and literary content: it is often about achieving a better grasp of the mechanics of language in order to better express ideas about texts (though of course the emphasis on linguistic proficiency may vary according to the level of the course and the native language of the students). However, very few electronic resources attempt to integrate both linguistic and literary sensibilities.
Computer-Assisted Language Learning (CALL) has, in over three decades, experimented with a variety of techniques for enhancing the teaching and learning of languages (particularly as a second language, or L2). Many of its most compelling exemplars are interactive and adaptable learning modules that address a specific linguistic area (for instance, mastering nominative and object pronouns). Although texts are sometimes used in CALL programs, they are most often hard-coded (determined) and secondary in importance to the linguistic objectives.
Computer-assisted text analysis has also matured considerably in the past 50 years, from a time when computers were primarily used as a means of producing concordances more efficiently to now, when sophisticated packages exist for performing a variety of statistical and quantitative functions. Nevertheless, rarely do text analysis tools blend text and data views, and rarer still are tools that are designed for enhancing the understanding of both language and literature.
This paper's objectives are two-fold: 1) to describe design principles of digital texts and tools that better correspond to the context of teaching in undergraduate language programmes (elements from tools that we have developed will be offered as examples); and 2) to provide to instructors some practical suggestions for using digital text exploration in the classroom as a means to both linguistic and literary ends.
What is Text Analysis and Why is it Important to Learning Online
Text analysis is a way of asking questions of a text that has its roots in concording. When we encounter a text we bring questions to it like,
- What is this article about?
- What happens to Hamlet in the end?
- Where is that passage about the war of 1812?
- What does the author have to say about “friendship”?
Reading the text is one way to answer the question, but most of us learn to skim ahead, to use an index or to use a tool like a concordance to answer certain questions. Computer-assisted text analysis leverages the searching and counting capabilities of the computer to help readers answer questions. Text analysis is really not one thing, but a class of interpretative methods that can be used appropriately or inappropriately in the careful study of a text.
Text analysis techniques evolved from concording. The concordance is an interpretative tool that dates back to the 13th century that brings together a collection of passages that agree in some fashion, usually because they contain the same keyword. The first concordances to the Bible allowed students to think through what the Bible might say about a subject like “friendship” even where there was no single explicit discussion.
The task of creating concordances of verbose authors was so time-consuming that, starting in the 1940s, people like Father Busa began to imagine how to use information technology to automate the creation of concordances. In the 1960s researchers developed software like COCOA for batch concording on mainframes, and this software evolved into microcomputer concording tools like Micro-OCP (Oxford Concordance Program). Concording became more than a tool for the preparation of print concordances with the advent of interactive tools like TACT, which was released by the University of Toronto in 1989. TACT made it easy to iteratively ask questions, search for words, view passages concorded, and look at the distribution of words.
sceptic (11)
[1,47] abstractions. In vain would the sceptic make a distinction
[1,48] to science, even no speculative sceptic, pretends to entertain
[1,49] and philosophy, that Atheist and Sceptic are almost synonymous.
[1,49] by which the most determined sceptic must allow himself to
[2,60] of my faculties? You might cry out sceptic and railer, as much as
[3,65] profession of every reasonable sceptic is only to reject
[8,97] prepare a compleat triumph for the Sceptic; who tells them, that
[11,121] to judge on such a subject. I am Sceptic enough to allow, that
[12,130] absolutely insolvable. No Sceptic denies that we lie
[12,130] merit that name, is, that the Sceptic, from habit, caprice,
[12,139] To be a philosophical Sceptic is, in a man of
TACTWeb Keyword in Context
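A display like the one above can be approximated in a few lines of code. The following is a minimal sketch in Python, not a reconstruction of how TACT or TACTWeb work internally; the function names `kwic` and `format_kwic` and the `width` parameter are our own illustrative choices, and the sample text is pieced together from two of the lines shown above.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left context, keyword, right context) for each occurrence."""
    # Collapse line breaks and runs of spaces so every hit fits on one line.
    flat = re.sub(r"\s+", " ", text)
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits

def format_kwic(hits, width=30):
    """Right-align the left context so the keywords line up in a column."""
    return [f"{left:>{width}}{word}{right}" for left, word, right in hits]

sample = ("In vain would the sceptic make a distinction. "
          "No Sceptic denies that we lie.")

for line in format_kwic(kwic(sample, "sceptic", width=20), width=20):
    print(line)
```

Aligning the keyword in a fixed column is what lets the eye scan down the concordance and compare the contexts on either side.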
Interactive concording tools are study aids that replace the print concordances and indexes we use to help understand a text. As such their primary use in learning is to help students think through a text, whether literary, linguistic, cultural, or other. Text analysis tools give students the ability to do research quickly and across large collections of information. Generally speaking there are three uses of computer-based text analysis in learning:
- Search large texts quickly. Text analysis environments that combine large text collections with easy to use search tools let students find information that only an expert would be able to find by other means. In the teaching of history, for example, students can use text analysis to survey electronic archives finding evidence.
- Conduct complex searches. Increasingly text analysis environments provide the ability to conduct complex searches that look for the co-occurrence of words or for patterns rather than words. Language and linguistics students can use text analysis to study the use of language in collections of authentic materials. Further, text analysis environments become the site for reflection on textuality and patterns in language. Having to use a tool forces students to ask what the tool can do with text, how to use it, and ultimately what computers can find about a text.
- Present results in informative ways. Text analysis environments can present information in ways that provoke reflection rather than simply answering questions.
As the wealth of evidence important to students becomes available in digital form, computer-assisted research methods are becoming important. With an estimated 2 exabytes of new information being produced each year we are all experiencing information overload (Lyman and Varian 2003). Much of this excess information is textual and most of it is born digital on computers so it is accessible to computer methods. Being able to use search and analysis tools is an important way of dealing with the excess – a way that has its roots in traditions of humanistic thought.
To teach using text analysis you need to create an environment, or build on an existing one, where the appropriate electronic texts and tools are brought together. For many pedagogical uses existing environments suffice, but where an instructor can control the environment there are three ways the environment can be prepared for student learning with text analysis.
- Text Enrichment. The electronic text itself can be enriched with annotations, associative links, or linguistic information.
- Powerful Tools. Tools can be provided that are optimized for answering certain types of questions.
- Rich Interfaces. The electronic texts can be displayed in conjunction with other resources to create a conducive study environment.
The Web of Evidence
A consideration of the evolution of technology-assisted language and literature instruction reveals an intriguing paradox. On the one hand, pioneers in the discipline (especially CALL) are consistently among the first implementers – and sometimes innovators – of new technologies. For instance, when online virtual worlds in the form of MUDs (Multi-User Dungeons) were first developing in the early 1980s, educators were already finding ways of integrating them into the classroom. Likewise, when CD-ROMs were being disseminated as data media for games and other interactive activities in the early 1990s, instructional applications were among the first to appear (though this time often with a more commercial impetus). On the other hand, the broad adoption of specific technologies in the classroom is usually much more gradual, often slower than that technology’s general penetration in society. Readily recognisable factors are at play, of course, including the economic realities of acquiring and maintaining technologies within scholastic budgets, and the usually bureaucratic pace of change of curricula and instructional techniques. We may lament the relatively slow pace of change in some ways, but it can also be argued that the delay provides an opportunity to fully determine the viability and promise of certain technologies.
If any technology since the personal computer itself has proven its potential for a range of instructional purposes, it is certainly the web. And though the web may seem as prevalent and ubiquitous as any technology currently available, upon closer inspection its use in instruction has followed the same trajectory as described above: intense, creative use by educational pioneers as far back as the first half of the 1990s, but it was not until the first part of this decade that high-speed internet, properly equipped computer labs, and integration of web-based activities into the curriculum were commonplace in schools (colleges and universities have had a faster rate of adoption, but in all cases the hardware and software almost always precede effectively framed uses).
Though students have obviously taken the initiative in exploiting the rich resources of the web (Miall, …), it is far less certain that teachers have been able to effectively harness the power of the web for language and literature instruction. An informal survey of almost two dozen colleagues in both K-12 and university settings reveals that the web is hardly used in a deliberate and conscious manner by instructors. Beyond token webpages of useful links and resources, or the reliance on relatively superficial features of Course Management Systems like WebCT or Moodle, the onus for benefiting from the web is left to students. (This is not to say that there are not many outstanding uses of the web by language and literature instructors, but those we know of tend to be isolated cases reported anecdotally in conferences and journals; they are by no means a reflection of the norm.)
We take the discrepancy between the broad, public use of the web and its much more limited use in the language and literature classroom to be a reflection of two realities in particular:
- an unfamiliarity with many of the web-based resources that exist, and effective techniques for exploiting them
- a paucity of easy-to-use, flexible, interactive and instructionally sound online resources for language and literature learning (particularly ones that go beyond relatively simple grammar exercises)
In many ways this article is meant to contribute, however modestly, to the first point, that of familiarity with resources and their use. We will refer to several resources that may be of use to some readers, but our primary objective is really to encourage a rethinking of web-based resources, not as neatly packaged and ready-to-use activities or exercises, but rather, as endlessly observable and reconfigurable digital texts, ones that have unfathomable potential for the learner (and teacher) of a language and its literature. As a primarily text-based medium (almost all web searches, even for images, are text-based), the web truly is a trove of useful content for language, representing the entire spectrum of authentic production (authentic in the sense of not contrived for the purposes of a pedagogical exercise), from the struggling non-native speaker to the most eloquent writer.
The second point, regarding a lack of worthy resources, has a natural link with the first: developers of web-based language resources have operated largely within the same paradigm as the pre-web world, for the most part replicating functionality that has long been possible with electronic gadgets and personal computers. Relatively little effort, it seems, has been devoted to exploiting the web as a rich and enormous corpus of authentic digital texts. We wish to emphasize the digital nature of these texts, i.e. the fact that they are essentially composed of many discrete units of information that can be broken down, analyzed, reconfigured, and reassembled in innumerable ways. Such operations are the very strength of text analysis.
In the next section we will provide a set of recipes – or scenarios – that suggest some of the practical ways in which online text analysis tools can play a role in the instruction of language and literature. For some readers these recipes may provide the outline for concrete activities to use in a course, for others they may spark ideas of other possible recipes. In any case, our primary intent is to demonstrate how text analysis can be used in conjunction with the web to study language and literature in ways that were not previously possible.
Recipes for Text Analysis in the Instruction of Language and Literature
Following a Theme Through a Work
One of the most common tasks we ask of students of literature is to discuss how a theme is handled in a literary or intellectual work. Students have traditionally used indexes, where available, to follow a theme through a work. With access to an electronic version of a text a student can now use the search function in a word processor to search for words that would be indicative of the theme. With more advanced search tools or text analysis environments students can build their own study concordance of passages around a theme. The steps a student can take typically involve the following:
- Access an appropriate edition of the text under study. The instructor who wants to encourage text analysis for learning should guide the students to appropriate editions.
- Identify the theme for study. This is the difficult part since most interesting themes are not found simply by searching for a single word. Students should be encouraged to develop a list of words that might be indicative of the theme. These words could be synonyms (see Thesaurus.com or WordNet).
- Use a search and concordance tool like the TAPoRware Find Text - Concordance tool. This tool will let the user submit the URL for an HTML version of a work and provide a list of words to search for. It then generates a concordance of passages for reflection. HyperPo is an online interactive concordance which, while more complex, lets one explore
Many themes can't be easily followed by searching for words. The challenge of using searches to follow a theme provides an opportunity to engage students on the issue of words and meaning. Some strategies to enhance this recipe are:
- Collocation tools like the TAPoRware Find Text - Collocation will show what words are located near the word you search for. Students can search for a word clearly related to the theme in order to find other words that might help follow the theme. The high frequency words in the neighborhood of a keyword can also provide a sense of the semantic field of an idea. This can be useful for brainstorming around a theme in order to develop original essay ideas. Students should be encouraged to ask what terms are anomalies - what words did they not expect? What stands out?
- List Words is a way to identify what might be the interesting themes in the first place. A tool like the List words TAPoRware tool can provide a list of words sorted by frequency. The words that appear often in a text, at least those that are not function words like "the" and "a", can indicate themes. Students should be encouraged to ask why an author would use a content word frequently as they scan the list of words. An alternative to this approach is to compare the text to another in order to identify words that appear more often in the target text than a control sample. Again, the high frequency words can be indicative of themes worth following with a concording tool.
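The three strategies above (frequency lists, collocation, and comparison with a control text) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a description of how the TAPoRware tools are implemented: the stop list is a tiny hand-picked one, the helper names (`tokenize`, `word_frequencies`, `collocates`, `overused`) are hypothetical, the comparison uses a simple smoothed frequency ratio rather than any particular statistical test, and the sample sentences are invented for the example.

```python
import re
from collections import Counter

# Assumption: a very small stop list of function words; real tools ship longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it"}

def tokenize(text):
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())

def word_frequencies(tokens, stop_words=STOP_WORDS):
    """Count content words, leaving out the function words in the stop list."""
    return Counter(t for t in tokens if t not in stop_words)

def collocates(tokens, keyword, span=5, stop_words=STOP_WORDS):
    """Count the words found within `span` tokens on either side of keyword."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(w for w in window if w not in stop_words)
    return counts

def overused(target_tokens, control_tokens, min_count=2):
    """Rank words that are relatively more frequent in the target text."""
    target, control = Counter(target_tokens), Counter(control_tokens)
    n_target, n_control = len(target_tokens), len(control_tokens)
    scores = {}
    for word, count in target.items():
        if count < min_count or word in STOP_WORDS:
            continue
        # Add-one smoothing keeps words absent from the control text usable.
        scores[word] = (count / n_target) / ((control[word] + 1) / (n_control + 1))
    return sorted(scores, key=scores.get, reverse=True)

tokens = tokenize("The sceptic doubts. The sceptic reasons, "
                  "and the friend doubts the sceptic.")
print(word_frequencies(tokens).most_common(3))
print(collocates(tokens, "friend", span=2))
print(overused(tokens, tokenize("the friend reasons")))
```

Students can be asked to vary `span` or the stop list and to discuss how the resulting lists change, which ties the exercise back to the question of words and meaning.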
Grammar Verification
One of the most common exercises for a language student is the written composition (where practice of correct written expression is more important than the given topic). With the decline of hand-written submissions (even in primary schools), word processors have become a key location of language production and learning. Yet surprisingly, the multilingual orthographic and grammatical capabilities of common word processors (like MS Word or WordPerfect) are seldom used, particularly for languages other than English (open-source and online editors, such as OpenOffice or Writely.com, usually lack grammar checking capabilities). Rarer still is the use of stand-alone grammar checkers (such as Correcteur 101 or Antidote for French). This situation may be explained in part by the additional cost of some language modules and in part by the additional steps needed to install those modules (assuming the user is even aware that they exist). But mostly, the underuse of word processors is explained by the fact that the relevant technologies are almost never integrated directly into the instructional context. This is a shame, since the moment of production is so pivotal in effective language acquisition (catching mistakes as they happen rather than expecting students to thoroughly examine a teacher-corrected text).
“LePatron: French Writing Assistant” is a free, online tool (see LePatron.ca) designed specifically for learners of French as a second language. Its initial objectives were twofold: 1) to provide friendly, accessible explanations to students of common linguistic pitfalls (feedback from grammar checkers is usually aimed at native speakers and can be linguistically complex and difficult to comprehend), and 2) to help the instructor avoid repetitive corrections for common mistakes – in other words, a first line of defence for both instructor and student.
Co-developed by Terry Nadasdi and one of the authors (Sinclair), “LePatron” differs from most grammar tools in its pedagogical design: potential errors are flagged and explained, but not automatically or even easily corrected. The student must become an active participant in learning by manually correcting the text (rather than, say, right-clicking on a suggested edit, thereby circumventing the need to actually write the correct form). The feedback provided is intended to clearly explain a grammatical point; it is left to the student to apply the principle. In some cases, when the explanation is insufficient, an additional built-in page of explanation can be invoked, including some interactive exercises, and in other cases external resources are suggested (in particular Martin Beaudoin’s Pomme site, http://www.pomme.ualberta.ca/). These linked resources are an exemplary way of making the most of the web-based learning context.
As is, LePatron can be a useful tool for the French language learner and instructor (the site currently receives over 20,000 hits per day from almost 100 different countries). We keep improving the site based on user feedback and by consulting the logs of over 190 million words in nearly one million texts (as of September 2006). One of the more interesting activities that can be done with students is to examine closely some of the strengths and weaknesses of the tool, and to speculate, linguistically, on why that might be. This emphasizes for students a fundamental reality: no grammar checker on the market today is perfect (far from it), for a variety of reasons, including the potential for syntactic complexity and semantic ambiguity. For instance, one of LePatron’s greatest weaknesses currently is its inability to deal intelligently with proper nouns. As such, almost all capitalized words are ignored (as potential proper nouns), even at the beginning of a sentence. Though this is a weakness of the tool, it is also an opportunity for discussion in the classroom about some characteristics of the French language.
Fill-in-the-Blanks for Dynamic Texts
A common drill in the toolbox of the language instructor is the fill-in-the-blank exercise. This type of exercise has easily made the transition from paper to the screen, often to great benefit. Creators of electronic fill-in-the-blank exercises are able to anticipate a variety of mistakes and have the computer provide immediate and meaningful feedback to the student – an obvious improvement over the print counterpart. However, despite excellent tools for creating electronic exercises (such as Hot Potatoes, see http://hotpot.uvic.ca/), electronic exercises take considerable time to develop, and involve hard-coded content (the same way that hypertextual links are almost always hard-coded in a document).
Since tools exist to perform syntactic and morphological analysis of texts (identifying parts of speech, like nouns and verbs, and canonical forms like the singular of a plural word), it should be possible to automatically generate fill-in-the-blank exercises, using any text on the web (in the appropriate language). This is precisely what the “HyperPoet: Linguistic Fill-in-the-Blanks” tool does; it was developed by one of the authors (Sinclair), see http://hyperpo.mcmaster.ca/LinguisticFillBlanks/ [this tool is ready, it will be available by publication time]. Users can point to a web address, upload a file, or paste content into a box, and the tool will use the specified options to dynamically create a fill-in-the-blanks exercise (in English, French, Italian, German, or Spanish). For instance, one page could be returned where the infinitive of all verbs is provided and the student must fill in the box with the correctly conjugated form. Alternatively, all prepositions could be replaced by blanks, and the student would need to provide the appropriate form.
One of the remarkable features of tools of this type is that students are able to generate useful exercises for themselves; they are not dependent solely on the instructor. Students can decide which type of text corresponds best to their level (e.g. newspaper articles, forum discussions, poetry), and generate as many exercises as they wish from authentic texts. Of course, such automatically generated exercises can also have disadvantages, such as misleading errors in the morphological analysis engine or the inability for instructors to provide contextual feedback (based on a specific incorrect answer). Still, much is gained in the use of text analysis on the rich corpus of authentic texts on the web.
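The preposition variant of such an exercise can be illustrated with a short Python sketch. It is not the implementation of the tool described above: to stay self-contained it replaces real morphological analysis with a small hand-written list of English prepositions (an assumption that will misfire, for example, on "to" used as an infinitive marker), and the names `make_cloze` and `check` are hypothetical.

```python
import re

# Assumption: a short hand-written list stands in for part-of-speech tagging.
PREPOSITIONS = {"in", "on", "at", "by", "for", "with", "from", "to", "of"}

def make_cloze(text, targets=PREPOSITIONS):
    """Replace each target word with a numbered blank; return text and answers."""
    answers = []

    def blank(match):
        word = match.group(0)
        if word.lower() in targets:
            answers.append(word)
            return f"__({len(answers)})__"
        return word

    # Visit every run of letters, leaving punctuation and spacing untouched.
    exercise = re.sub(r"[^\W\d_]+", blank, text)
    return exercise, answers

def check(answers, responses):
    """Compare student responses with the removed words, ignoring case."""
    return [a.lower() == r.strip().lower() for a, r in zip(answers, responses)]

exercise, answers = make_cloze("She walked to the park with a friend.")
print(exercise)
print(check(answers, ["to", "on"]))
```

Because the answer key is simply the list of removed words, this sketch shares the limitation noted above: it can say whether a response is right or wrong, but it cannot give feedback tailored to a specific incorrect answer.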
Examine the Language Level of Texts
Several techniques have been developed by text analysis scholars for algorithmically quantifying the level of language of a text. Naturally, each technique has its particular strengths and weaknesses, and each one may be more applicable to different genres. Potentially more revealing than the results of any of these techniques, however, is an examination of the techniques themselves and the linguistic principles underlying them.
One readily accessible text analysis tool that can provide a variety of data on a text is Textalyzer (see http://textalyser.net/; many other tools exist online, including those found at http://tapor.ca/). Each step in the use of the Textalyzer tool is a pedagogical opportunity to better understand language analysis and methodologies. For instance, the interface on the first screen, as relatively simple as it is, contains terms such as “stoplist” and “polyword phrases”. Similarly, the results screen contains several terms that are worth examining closely, such as “lexical density” and the “Gunning-Fog Index”; a valuable exercise might be to have students research these terms on the web and compare their definitions. This article is not the appropriate venue to consider each one of the concepts and merits underlying the results of the Textalyzer [see…]; suffice it to say that there is plenty of fodder for discussion of issues of language and its analysis.
We have found that students enjoy submitting their own texts to these types of analysis tools, where they discover aspects of their writing of which they were not aware (like a propensity for repeating a given phrase). An engaging activity can be to have students try to find texts on the web that most closely resemble the data profile of their own texts. Doing so can provoke interesting results and awaken the curiosity of the students for the relationship between text analysis and linguistic proficiency.
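To make the discussion of such measures concrete, students can be shown how little is needed to compute them. The sketch below, in Python, uses the standard Gunning fog formula, 0.4 × (average sentence length + percentage of words with three or more syllables), together with a simple ratio of distinct words to total words. It rests on several assumptions that are themselves worth discussing in class: syllables are estimated by counting vowel groups, the usual exclusions for proper nouns and common suffixes are ignored, only unaccented Latin letters are recognized, and the type-token ratio is only one of several measures that are reported under names like "lexical density". The results need not match those of any particular tool.

```python
import re

def count_syllables(word):
    """Rough estimate for English: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """0.4 * (words per sentence + percentage of words with 3+ syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

def type_token_ratio(text):
    """Distinct words divided by total words (1.0 means no word is repeated)."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    return len(set(words)) / len(words) if words else 0.0

sample = "Reading is difficult. Reading is rewarding."
print(round(gunning_fog(sample), 2))
print(round(type_token_ratio(sample), 2))
```

Asking why a crude syllable count can misjudge a word, or why a longer text tends to have a lower type-token ratio, leads directly into the linguistic principles behind the measures.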
Conclusions
The use of the web for the instruction of language and literature has two noteworthy characteristics: 1) a vast majority of the tools available (such as interactive grammar exercises) are modest reformulations of what was possible prior to the web (and in some cases even more modest reformulations of what was possible before computers); and 2) links to helpful resources on the web are often compiled, but the actual textual content – a vast corpus of authentic texts – is seldom truly exploited. We have attempted to argue that computer-assisted text analysis can play a transformative role in the teaching of language and literature, especially when explored in the context of the web. The recipes that we have provided might serve as a starting point for further exploration of the interstices of computer-assisted literary and linguistic study.