Comparator

See: http://taporware.mcmaster.ca/~taporware/textTools/comparator.shtml

Description

This tool compares two plain texts on basic statistics: the first paragraph and second paragraph (these may be the title and the author), the number of words, the highest and average word frequency, and so on. It also lists the words common to both texts and the words that occur in each text, with their counts. The words can be listed in different orders.
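
As an illustration of the statistics described above, here is a minimal sketch in Python. It is not the tool's actual code. The function name `basic_stats`, the rule that paragraphs are separated by blank lines, the rule that a word is a run of letters or apostrophes, and the definition of average frequency as total words divided by distinct words are all assumptions made for this example.

```python
import re
from collections import Counter


def basic_stats(text):
    """Return simple statistics for one plain text (hypothetical helper)."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Assumption: a word is a run of letters or apostrophes, case-folded.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {
        "first_paragraph": paragraphs[0] if paragraphs else "",
        "second_paragraph": paragraphs[1] if len(paragraphs) > 1 else "",
        "word_count": len(words),
        "distinct_words": len(counts),
        # Count of the single most frequent word.
        "highest_frequency": max(counts.values(), default=0),
        # Assumption: average frequency = total words / distinct words.
        "average_frequency": len(words) / len(counts) if counts else 0.0,
    }
```

Calling `basic_stats` on each of the two texts gives two dictionaries that can be shown side by side.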

Pseudocode

  • Obtain texts 1 and 2 by URL or from the user's local disk. Both texts must be in plain text format.
  • Compute the statistics for both texts.
  • List and count the words in both texts.
  • Extract the words common to both texts and the words that occur in only one of them.
  • Sort the words according to the user's specification.
  • Generate the output along with the graphics.
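
The counting, extraction, and sorting steps above can be sketched as follows. This is a rough Python illustration under stated assumptions, not the tool's implementation. The names `count_words` and `compare_texts`, the tokenization rule, and the `sort_by` values `"count1"` and `"ratio"` are hypothetical. The ratio is taken here to be the word's relative frequency in text 1 divided by its relative frequency in text 2, which is one possible reading of "ratio of relative count".

```python
import re
from collections import Counter


def count_words(text):
    # Assumption: a word is a run of letters or apostrophes, case-folded.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def compare_texts(text1, text2, sort_by="count1"):
    """Split the vocabulary of two texts into common and unique words."""
    c1, c2 = count_words(text1), count_words(text2)
    n1, n2 = sum(c1.values()), sum(c2.values())

    # Common words: (word, count in text 1, count in text 2, ratio).
    common = []
    for word in c1.keys() & c2.keys():
        ratio = (c1[word] / n1) / (c2[word] / n2)
        common.append((word, c1[word], c2[word], ratio))

    # Sort descending by the chosen criterion, ties broken alphabetically.
    if sort_by == "ratio":
        common.sort(key=lambda row: (-row[3], row[0]))
    else:
        common.sort(key=lambda row: (-row[1], row[0]))

    def unique(a, b):
        # Words that occur in a but not in b, most frequent first.
        rows = [(word, a[word]) for word in a.keys() - b.keys()]
        return sorted(rows, key=lambda row: (-row[1], row[0]))

    return {
        "common": common,
        "only_in_1": unique(c1, c2),
        "only_in_2": unique(c2, c1),
    }
```

For example, `compare_texts("the cat sat on the mat", "the dog sat")` lists "the" and "sat" as common words, "cat", "mat", and "on" as unique to text 1, and "dog" as unique to text 2.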

Ways of Using

  • Enter a valid URL in the source text 1 URL field or enter a local path to upload the plain text 1
  • Enter a valid URL in the source text 2 URL field or enter a local path to upload the plain text 2
  • Select how the words are to be counted. If the option has an associated field, select it and/or fill it in with the appropriate words or pattern
  • Select the sorting criteria
  • Select the output result format, though currently only HTML is implemented
  • Check the box if you want to display the result in a new window
  • Click the submit button

Developing a Theme

This tool can be used to develop a theme. The collocates of a word (or words) that you think are typical of a theme might suggest other words associated with the theme.

Semantic Field

The collocates of a target word or pattern might help you identify which words are associated with the target. Be aware that some collocates may not be in the same sentence as the target, and that others may be high-frequency words that appear as collocates by chance.

CGI Interface

If you want to use this tool from your web site, here is the CGI interface. (Note: You need to use the attribute name/value pair enctype="multipart/form-data" within the form tag, because the tool was designed to allow local file uploading, even if you do not use this feature.)

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | For plain text 1. Lets the user select the input text (either a URL or an uploaded local plain text file) |
| texturl | | text | | A valid URL that points to a plain text document |
| localFile | | file | | The path to your local plain text file |
| source2 | url/local | radio button | url | For plain text 2. Lets the user select the input text (either a URL or an uploaded local plain text file) |
| texturl2 | | text | | A valid URL that points to a plain text document |
| localFile2 | | file | | The path to your local plain text file |
| range | all/patt/find/stop | radio button | all | Options that let the user select the word list they want to see |
| wpat | | text | | A Unix-style pattern. This field corresponds to the value "patt" in the radio button group named "range" |
| findstop | typedin/textfile/glasgow | radio button | glasgow | These options are connected with the values "find" and "stop" in the radio button group named "range" |
| typedinword | | text | | This text field corresponds to the value "typedin" in the radio button group named "findstop" |
| wordfile | | file | | This field corresponds to the value "textfile" in the radio button group named "findstop" |
| sorting | 1/2 | selection | 1 | Sorting criterion, corresponding to word count for text 1 / ratio of relative counts |
| HowToList | 1 | selection | 1 | Output format (HTML) |
| taporface | | checkbox | checked | If checked, the result is displayed in a new window without the TAPoRware interface |
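
For callers that are not HTML forms, the parameters above could be assembled into a multipart/form-data request body by hand. The following Python sketch uses only the standard library and only parameter names from the table. The helper names `comparator_fields` and `encode_multipart` are hypothetical, and the sketch has not been run against the server.

```python
import uuid

# Parameter values allowed for "range", as listed in the table.
VALID_RANGES = {"all", "patt", "find", "stop"}


def comparator_fields(url1, url2, word_range="all", sorting="1"):
    """Build the form fields for comparing two texts given by URL."""
    if word_range not in VALID_RANGES:
        raise ValueError(f"unknown range value: {word_range!r}")
    return {
        "source": "url",
        "texturl": url1,
        "source2": "url",
        "texturl2": url2,
        "range": word_range,
        "findstop": "glasgow",  # default from the table
        "sorting": sorting,
        "HowToList": "1",       # HTML is the only output format listed
    }


def encode_multipart(fields, boundary=None):
    """Encode text-only fields as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type
```

The returned body and content type could then be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type})`, with `url` set to the address of the comparator CGI script. The helpers cover text fields only, so the file-upload parameters (localFile, localFile2, wordfile) would need a filename and file content in their parts.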

Use Comparator TAPoRware Tool in Your Web Page

You can add a URL text field and a button to your web page that compare the text at the URL you enter with the text of the current page by calling the TAPoRware CGI script.

Source URL: [text field] [Compare button]

Here is the code for the interface above:

<form method="post" name="textForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tcomparator.cgi" onsubmit="document.textForm.texturl.value=document.location.href">

<input type="hidden" name="source" value="url" />

<input type="hidden" name="texturl" />

<input type="hidden" name="freetext" value="yes"/>

<input type="hidden" name="source2" value="url" />

Source URL: <input type="text" name="texturl2" />

<input type="hidden" name="freetext2" value="yes"/>

<input type="hidden" name="range" value="stop" />

<input type="hidden" name="findstop" value="glasgow" />

<input type="hidden" name="sorting" value="1" />

<input type="hidden" name="HowToList" value="1" />

<input type="submit" name="doIt" value="Compare" />

</form>

Web Service Interface

Not implemented yet

Known Bugs

To Do

  • We need to include the word or pattern for which the collocates are found.
  • It would be nice to offer the same stop list options as in the list words tools.
  • It would be nice to allow people to specify how many words before and after they want.

-- GeoffreyRockwell - 19 May 2005

