Comparator
See:
http://taporware.mcmaster.ca/~taporware/textTools/comparator.shtml
Description
This tool compares two plain texts on basic statistics such as the first paragraph and second paragraph (these may be the title and the author), the number of words, and the highest and average word frequencies. It also lists the words common to both texts and the words in each text, with their counts. The words can be listed in different orders.
Pseudocode
- Obtain texts 1 and 2 by URL or from the user's local disk. Both texts must be in plain text format.
- Get statistics for both texts.
- List and count the words in both texts.
- Extract the common words and the uncommon words separately.
- Sort the words based on the user's specification.
- Generate the output along with the graphics.
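The steps above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the function names (`tokenize`, `basic_stats`, `compare`), the word definition (runs of letters and apostrophes, lowercased), the paragraph split on blank lines, and the exact formula for the ratio of relative counts are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a "word" is a run of letters or apostrophes, lowercased.
    return re.findall(r"[a-z']+", text.lower())

def basic_stats(text):
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {
        "first_paragraph": paragraphs[0] if paragraphs else "",
        "second_paragraph": paragraphs[1] if len(paragraphs) > 1 else "",
        "word_count": total,
        "highest_frequency": max(counts.values(), default=0),
        "average_frequency": total / len(counts) if counts else 0.0,
    }

def compare(text1, text2, sort_by="count1"):
    c1, c2 = Counter(tokenize(text1)), Counter(tokenize(text2))
    n1, n2 = sum(c1.values()), sum(c2.values())
    common = []
    for word in c1.keys() & c2.keys():
        # Assumed definition: relative count in text 1 over relative count in text 2.
        ratio = (c1[word] / n1) / (c2[word] / n2)
        common.append((word, c1[word], c2[word], ratio))
    if sort_by == "count1":
        common.sort(key=lambda row: (-row[1], row[0]))  # by count in text 1
    else:
        common.sort(key=lambda row: (-row[3], row[0]))  # by ratio of relative counts
    only1 = sorted(c1.keys() - c2.keys())  # words found only in text 1
    only2 = sorted(c2.keys() - c1.keys())  # words found only in text 2
    return common, only1, only2
```

Calling `compare(text1, text2)` returns the common words as `(word, count in text 1, count in text 2, ratio)` rows, followed by the words unique to each text. Generating the HTML output and graphics is left out of the sketch.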
Ways of Using
- Enter a valid URL in the source text 1 URL field or enter a local path to upload the plain text 1
- Enter a valid URL in the source text 2 URL field or enter a local path to upload the plain text 2
- Select the way words are to be counted. If an associated field is required, select it and/or fill it in with the appropriate words or pattern
- Select the sorting criteria
- Select the output result format, though currently only HTML is implemented
- Check the box if you want to display the result in a new window
- Click the submit button
Developing a Theme
This can be used to develop a theme. The collocates of a word (or words) that are typical of a theme might suggest other words associated with the theme.
Semantic Field
The collocates of a target word or pattern might help you identify which words are associated with the target. Be aware that some collocates may not be in the same sentence as the target, and others may be high-frequency words that appear as collocates by chance.
CGI Interface
If you want to use this tool from your web site, here is the CGI Interface:
(Note: You need to use the attribute name/value pair enctype="multipart/form-data" within the form tag because the tool was designed to allow local file uploading, even if you do not use this feature.)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | For plain text 1. Lets the user select the input text (either a URL or an uploaded local plain text file) |
| texturl | | text | | A valid URL pointing to a plain text document |
| localFile | | file | | The path to your local plain text file |
| source2 | url/local | radio button | url | For plain text 2. Lets the user select the input text (either a URL or an uploaded local plain text file) |
| texturl2 | | text | | A valid URL pointing to a plain text document |
| localFile2 | | file | | The path to your local plain text file |
| range | all/patt/find/stop | radio button | all | Options that let the user select the word list they want to see |
| wpat | | text | | A Unix-style pattern. This field corresponds to the value "patt" in the radio button group named "range" |
| findstop | typedin/textfile/glasgow | radio button | glasgow | These options apply to the values "find" and "stop" in the radio button group named "range" |
| typedinword | | text | | This text field corresponds to the value "typedin" in the radio button group named "findstop" |
| wordfile | | file | | This field corresponds to the value "textfile" in the radio button group named "findstop" |
| sorting | 1/2 | selection | 1 | Sorting criterion: word count in text 1 (1) or ratio of relative counts (2) |
| HowToList | 1 | selection | 1 | Output format; 1 is HTML |
| taporface | | checkbox | checked | If checked, the result is displayed in a new window without the TAPoRware interface |
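As a rough illustration of how these parameters fit together, the following Python sketch encodes a set of fields as multipart/form-data using only the standard library. The helper names (`build_multipart`, `comparator_fields`) and the example.org URLs are hypothetical. The field names and values come from the table above, and the CGI address is the one used in the form on this page; whether the script is still reachable there is not something this sketch can confirm.

```python
import urllib.request
import uuid

# CGI address as it appears in the form on this page.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tcomparator.cgi"

def build_multipart(fields):
    """Encode a dict of simple text fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")
        lines.append(value)
    lines.append("--" + boundary + "--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    content_type = "multipart/form-data; boundary=" + boundary
    return body, content_type

def comparator_fields(url1, url2):
    # Hypothetical helper: compare two URL sources with the default options.
    return {
        "source": "url",
        "texturl": url1,
        "source2": "url",
        "texturl2": url2,
        "range": "all",
        "sorting": "1",
        "HowToList": "1",
    }

body, content_type = build_multipart(
    comparator_fields("http://example.org/a.txt", "http://example.org/b.txt")
)
# The request is only constructed here; nothing is sent over the network.
request = urllib.request.Request(
    CGI_URL, data=body, headers={"Content-Type": content_type}
)
```

Passing `request` to `urllib.request.urlopen` would submit the form. The sketch stops short of that so it can be run without network access.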
Use Comparator TAPoRware Tool in Your Web Page
You can add a text URL field and a button to your web page to compare the text at the URL you enter with the text of the current page by calling the TAPoRware CGI script.
Here is the code for the interface above:
<form method="post" name="textForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tcomparator.cgi" onsubmit="document.textForm.texturl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="texturl" />
<input type="hidden" name="freetext" value="yes"/>
<input type="hidden" name="source2" value="url" />
Source URL: <input type="text" name="texturl2" />
<input type="hidden" name="freetext2" value="yes"/>
<input type="hidden" name="range" value="stop" />
<input type="hidden" name="findstop" value="glasgow" />
<input type="hidden" name="sorting" value="1" />
<input type="hidden" name="HowToList" value="1" />
<input type="submit" name="doIt" value="Compare" />
</form>
Web Service Interface
Not implemented yet
Known Bugs
To Do
- We need to include the word or pattern for which the collocates are found.
- It would be nice to offer the same stop list options as in the list words tools.
- It would be nice to allow people to specify how many words before and after they want.
--
GeoffreyRockwell - 19 May 2005