What is an electronic text?

Note: This is a work in progress.

"This purpose entails reading the text without reading it, that is reading without interpreting ..." (Ferrara 189)

Machine Readable Text

Electronic texts are frequently referred to as "machine readable", meaning that an e-text is a text that can be read by a machine such as a computer. To be machine readable, the text has to be:

  • Physically inscribed on some device that a computer can read like the platter in a hard drive or a floppy disk.
  • Formally encoded as digital data, by which we mean in binary digital code so that the computer can read the data off the physical device.
  • Logically encoded in a useful (standard) character set that maps the digital codes to textual characters like an alphabet. ASCII would be an example of a standard character set.
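The last two conditions can be illustrated with a minimal Python sketch. The sample string "e-text" is our own choice, and ASCII stands in for whatever character set a real text might use; the physical inscription is left to the storage device.

```python
text = "e-text"

# Logical encoding: a character set (here ASCII) maps each character to a number.
data = text.encode("ascii")
codes = list(data)  # [101, 45, 116, 101, 120, 116]

# Formal encoding: each number is held as binary digits, eight per character here.
bits = [format(code, "08b") for code in codes]

# "Reading" reverses the mapping, turning the codes back into characters.
decoded = data.decode("ascii")

print(codes)
print(bits)
print(decoded)
```

The round trip only works because the writer and the reader agree on the character set, which is why a standard one is needed.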

To actually be able to humanly read an electronic text you then need a computer capable of "reading" the e-text and rendering it as legible text on the screen. The computer reads the text to us so that we can read it.

We can distinguish the following ways in which electronic texts are talked about:

  • Material (copy) The physical e-text is the material object, like a particular floppy disk, from which the computer can render a reading. Material copies of e-texts are burned onto CD-ROMs, stored on disks, saved to USB keys and physically manipulated.
  • File The file is an object of the operating system. It is the sequence of bits that can be saved, copied, loaded, or transmitted. Operating systems will often inscribe files in chunks spread out and out of order on a device, but the operating system can reconstitute the file as needed. Files are what we usually talk about doing things to with the computer when we are manipulating e-texts as objects, as opposed to opening and reading them. We copy files, back them up, attach them to e-mail and so on.
  • Document A document is how we think about a text, electronic or not. A document contains information that has been written or recorded by someone to be read by others. A document has a history and its reading can have consequences. It can be private or public. It can be a version of something in progress or something published. It doesn't have to be electronic, though many documents are electronic at some point in their history of production and some are distributed electronically over the web.
  • String A string is a data type in computer science that is usually used for text. A string is an ordered sequence of symbols. The symbols are encoded using a finite character set. Strings are what we process with text analysis tools.
  • View A view is what we see when the computer renders the e-text to the screen. It is the output that we can use, even when that output purports to represent the underlying structure of the file, like the "reveal codes" view in WordPerfect.
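The file, string and view senses can be told apart in a short Python sketch. The file name etext.txt, the temporary directory and the UTF-8 encoding are assumptions made for illustration; the material copy and the document have no counterpart in code.

```python
import os
import tempfile

text = "What is an electronic text?\n"

# File: a sequence of bytes that the operating system saves and retrieves.
path = os.path.join(tempfile.mkdtemp(), "etext.txt")  # hypothetical file name
with open(path, "wb") as handle:
    handle.write(text.encode("utf-8"))

with open(path, "rb") as handle:
    raw = handle.read()  # file level: bytes, not yet characters

# String: an ordered sequence of symbols from a character set.
string = raw.decode("utf-8")

# View: one rendering for reading, and one that exposes the non-printing newline.
reading_view = string.rstrip("\n")
codes_view = repr(string)

print(reading_view)
print(codes_view)
```

The second view is a rough analogue of a "reveal codes" display: it is still only a rendering, not the file itself.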

We can therefore theorize that an electronic text is the set of conditions for the reliable rendering of a text for human use. It doesn't really matter what is under the hood, if we agree that what is rendered is the "same" text. These conditions include access to a working computer with an operating system and software that can produce a view that has a family resemblance to what was expected.

Text and Metadata

We need to account for the knowledge added to a text with markup. Most electronic texts we view on the web have markup that describes how they could be rendered, and this computer-parsable markup seems to be what distinguishes electronic texts from print. An HTML page, an XML document or an RTF document looks like plain print text with the addition of formal information about how the document should look or work. Markup seems an essential possibility for electronic texts.

Take the following statement:

<A href="http://tada.mcmaster.ca/WhatET">What is an electronic text?</A>

There are two obviously different types of information in this sentence: the linguistic text read when it is rendered, and the markup, namely the A or Anchor tags surrounding the text. The electronic text is the plain text enhanced with descriptive markup that can be processed by the computer. The markup is not meant to be read by us, but by the computer, so that the browser can render this as a link which, if clicked, will take you to this page.
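As a sketch of how a program might separate these two types of information, the following uses Python's standard html.parser module on the anchor statement above. The class name Splitter and its two lists are our own illustrative choices, not part of any tool discussed here.

```python
from html.parser import HTMLParser


class Splitter(HTMLParser):
    """Collects markup and linguistic text into separate lists."""

    def __init__(self):
        super().__init__()
        self.markup = []  # (tag name, attributes) pairs
        self.text = []    # runs of plain text

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case.
        self.markup.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)


source = '<A href="http://tada.mcmaster.ca/WhatET">What is an electronic text?</A>'
splitter = Splitter()
splitter.feed(source)
splitter.close()

print(splitter.markup)
print("".join(splitter.text))
```

What a browser shows is built from the second list; the first list tells it how to present that text and what to do when it is clicked.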

What are these two orders of electronic textuality? The plain text is the sequence of characters, punctuation, and non-printing characters that we can see in a view. The plain text is functionally equivalent to what you would read in a print representation even if the typeface and arrangement were different. The markup is also made up of short strings of characters which can, when looking at the code, be read by us. The markup is set off by escape characters like "<", ">" and "&". (To even show you these characters I have to escape them.) When the computer encounters these characters it recognizes that it is encountering markup (a tag or, in the case of "&", a character reference) which should be interpreted differently. The name of the tag, its attributes, and its location are used by the computer for processing and rendering the e-text.
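The need to escape these characters in order to display them can be shown with Python's standard html module. This is a minimal sketch; the variable names are ours.

```python
import html

tag = '<A href="http://tada.mcmaster.ca/WhatET">'

# Escaping replaces the characters that would otherwise signal markup,
# so that a browser displays the tag instead of interpreting it.
escaped = html.escape(tag, quote=False)

# Unescaping recovers the original characters.
restored = html.unescape(escaped)

print(escaped)
print(html.escape("&"))
print(restored == tag)
```

Note that "&" has to be escaped as well, since it is the character that introduces the escapes themselves.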

We can think about markup as embedded metadata, or data about the text that is embedded in the electronic text. Metadata is usually thought of as editorial information about the text as a whole that is associated with the text but not in it. Markup in a text can likewise be thought of as metadata about segments of text - information about the segment that is associated with it by positioning (the tags surround the segment) but not in the text in the linguistic sense. For that matter metadata can stand outside the file in another file like a CSS file and still perform its function of describing things about the text for the computer.
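One way to picture metadata that is associated with a segment by position, yet kept outside the text, is a record of offsets. The record layout (start, end, tag name, attributes) and the function name embed below are hypothetical, and the sketch assumes segments do not overlap and does not escape special characters.

```python
plain = "What is an electronic text?"

# Hypothetical external record: start offset, end offset, tag name, attributes.
annotations = [(0, len(plain), "A", {"href": "http://tada.mcmaster.ca/WhatET"})]


def embed(text, notes):
    """Wrap each annotated segment in tags; assumes segments do not overlap."""
    out = []
    cursor = 0
    for start, end, name, attrs in sorted(notes, key=lambda note: note[0]):
        attr_text = "".join(f' {key}="{value}"' for key, value in attrs.items())
        out.append(text[cursor:start])  # unannotated text before the segment
        out.append(f"<{name}{attr_text}>{text[start:end]}</{name}>")
        cursor = end
    out.append(text[cursor:])  # unannotated text after the last segment
    return "".join(out)


print(embed(plain, annotations))
```

The plain text is never altered by the record; the embedded form is produced only when the two are combined.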

Dino Buzzetti calls markup "diacritical" in the sense that it describes how the text should be expressed (or rendered) by the computer. A diacritical mark "distinguishes" or alters the character which it marks. As Buzzetti points out, it is both of the text and about the text. As such markup can be thought of as extended punctuation - a full stop is both in the flow of the text and provides information about its performance like when to pause when reading.

As Allen Renear and others have argued (Renear) we can usefully think about a text as "an Ordered Hierarchy of Content Objects" (OHCO thesis).
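To make the idea of an ordered hierarchy concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The sample document and its element names (text, chapter, title, p) are invented for illustration and are not drawn from Renear.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: content objects nested inside one another, in order.
source = (
    "<text><chapter><title>One</title>"
    "<p>First.</p><p>Second.</p></chapter></text>"
)
root = ET.fromstring(source)


def outline(element, depth=0):
    """List element names in document order, indented by nesting depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))
```

Each object sits wholly inside one parent, and siblings keep their order. This is the strict nesting that the OHCO thesis assumes.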

Bibliography

A. Ferrara. "Ideas for the automatic generation of textual maps" in Augmenting Comprehension: Digital Tools and the History of Ideas. London: Office for Humanities Communication, 2004. pages 189-205.

A. Renear, E. Mylonas and D. Durand. Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies. Scholarly Technology Group at Brown University, 1993.

-- GeoffreyRockwell - 11 Mar 2007

