TAPoRware Online Documentation

TAPoRware is a collection of tools that enable users to perform text analysis on XML, HTML and plain text files over the Web. Go to http://taporware.mcmaster.ca to try them.

Description

TAPoRware tools are open tools written in Ruby (and in some cases Java) that can be used on the Web or downloaded for local use. Some of the features of TAPoRware are:

  • Each of the tools can be run from our server (TAPoRware) on texts anywhere on the web by entering the URL. TAPoRware tools will retrieve the text, process it and return results to you. TAPoRware tools can also process an uploaded text.
  • TAPoRware tools come in four flavours: those that work on XML texts (including TEI encoded texts), those that work with HTML texts, those that work with plain texts, and those that can work on a combination of text types (Other). Wherever possible we have similar tools for each type of text.
  • TAPoRware tools typically return results in four forms.
    • HTML that your browser renders for a readable results display.
    • An XML results file that could be saved and used for further processing.
    • The XML results converted so that they can be rendered as HTML in a browser.
    • A Java client-side interactive visualization.
  • You can download and install the TAPoRware suite. The tools will work on Windows, Mac OS X, and Linux. See instructions on the TAPoRware site.
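
As a rough illustration of the retrieve, process and return cycle described above, here is a minimal Python sketch of what a List Words style tool for HTML might do. It is not the TAPoRware code (the tools themselves are written in Ruby and Java); the names fetch, extract_text and list_words are hypothetical, and the tokenizing rule (runs of letters, lowercased) is an assumption made for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def fetch(url):
    """Retrieve a page by URL (hypothetical helper, not a TAPoRware function)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html):
    """Strip the markup and return the visible text of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def list_words(html):
    """Count word frequencies; a word is assumed to be a run of letters."""
    text = extract_text(html).lower()
    return Counter(re.findall(r"[^\W\d_]+", text))
```

Calling list_words(fetch(url)) on a page would return a counter whose most_common() method gives the words in descending order of frequency. The real tools offer further options, such as stemming, that this sketch leaves out.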

History

  • TAPoRware tools began as a project by Geoffrey Rockwell while on sabbatical at the University of Virginia Institute for Advanced Technology in the Humanities. They have been redeveloped by Lian Yan to work within the context of the TAPoR project. Others have begun to contribute tools, notably the Fixed Phrase tools contributed by NYU.
  • In May of 2005 Matt Patey began work on the project. He developed a new interface, the help, and visualization tools like the Distribution, Word Rain, and Word Brush tools.

General Notes

General Problems with TAPoRware Tools

The TAPoRware tools generally work only on texts smaller than one megabyte. Because the text is not preindexed, certain tools can take a long time to run, especially those that use stemming or part-of-speech tagging.
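
If you want to check a text against this limit before submitting it, something like the following Python sketch could be used. It is only an illustration: the exact threshold is an assumption (one megabyte is read here as 1,000,000 bytes of UTF-8), and both function names are hypothetical. Splitting on blank lines is just one possible way to break up a long text, and results computed on separate pieces will not be the same as results for the whole text.

```python
MAX_BYTES = 1_000_000  # assumption: "a megabyte" taken as 10**6 bytes


def fits_size_limit(text, max_bytes=MAX_BYTES):
    """Return True if the UTF-8 encoding of text is under max_bytes."""
    return len(text.encode("utf-8")) < max_bytes


def split_by_paragraph(text, max_bytes=MAX_BYTES):
    """Group blank-line-separated paragraphs into chunks under max_bytes.

    A single paragraph that is itself over the limit is kept as its own
    oversized chunk; this sketch does not split inside paragraphs.
    """
    chunks, current, size = [], [], 0
    for para in text.split("\n\n"):
        para_size = len(para.encode("utf-8")) + 2  # +2 for the separator
        if current and size + para_size >= max_bytes:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += para_size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```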

Recipes for Text Analysis

We have a collection of recipes for doing research tasks with text analysis tools. Many of them use the TAPoR versions of the TAPoRware tools. The recipes cover tasks such as identifying a theme in a text.

Contributing: Editing TAPoRware Wiki Pages

We encourage people to add comments to these pages on the TAPoRware tools or to edit the pages themselves.

Individual Tools


HTML Tools

General Notes on HTML Tools

List Words

List HTML Tags

Extract Text

Concordance

Co-occurrence

Collocation

Tokenize

Fixed Phrase

Date Finder

Summarizer

Distribution Graph

Comparator

Link Extractor

XML Tools

General Notes on XML Tools

List XML Elements

Extract from XML

List Words

Concordance

Co-occurrence

Collocation

Tokenize

Distribution

Fixed Phrase

Hypergraph

Date Finder

Summarizer

Comparator

Transformer

Plain Text Tools

General Notes on Plain Text Tools

List Words

Concordance

Co-occurrence

Collocation

Tokenize

Fixed Phrase

Date Finder

Summarizer

Comparator

Distribution

Speech Tagger

Word Pair Finder

Other Tools

Aggregator

Googlizer

Raining Words

Weighted Centroid

Visual Collocator

Pattern Match -- Raw Grep

Tagger

Beta Tools

Word Brush

Principal Components Analysis

Word Cloud

Analysis Tool Bar

CAPs Finder

HTML Text Extract

Synonym Finder

Get TEI Meta Data

Acronym Finder

Keywords Finder

Compare Control

Web Page Cleaner


Using TAPoRware as a web service

The TAPoRware tools are available to the TAPoR Portal as a web service. The portal has a searchable list of TAPoR Tools. For each tool you can get "Detailed Info" which provides a definition of the web service.


Testing

Click Testing Document to get our TAPoRware testing document.


Browser compatibility table

An ongoing browser compatibility table of supported and unsupported features, sorted by major web browser.

To Do

  • We need to update these wiki pages - as the TAPoRware project is ongoing, these pages are by definition always catching up.
  • We need to go back through the tools and make sure they are Unicode compliant. This will mean upgrading to Ruby 1.8.
  • We need to develop tools that work better for languages other than English.
  • We need to redevelop our XML results language.


Useful Links

TAPoR Project: http://www.tapor.ca/

TAPoR Portal: http://portal.mcmaster.ca

TAPoR Development Portal: http://tapor2-dev.mcmaster.ca

Google Timeline View: http://www.google.com/views?q=geoffrey+rockwell+view:timeline&hl=en&esrch=RefinementBarTopViewTabs&sa=N&ct=timeline

-- GeoffreyRockwell - 27 Jun 2006

