Compare Two Documents
See
http://taporware.mcmaster.ca/~taporware/htmlTools/comparator.shtml
Description
This tool compares two HTML texts on basic statistics such as page title, headings, open text, number of words, and highest and average word frequency. It also lists the words common to both texts and the words in each text, with their counts. The words can be listed in different orders.
Pseudocode
- Obtain texts 1 and 2 by URL or from the user's local disk. Both texts must be in HTML format.
- Get basic statistics of both texts.
- Extract the subtexts specified by the user's HTML tags.
- Get statistics of both subtexts.
- List and count the words in both subtexts.
- Extract the common words and the uncommon words.
- Sort the words according to the user's specification.
- Generate the output along with the graphics.
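The counting and comparison steps above can be sketched in Python. This is only an illustration, not the tool's actual code: the names TagTextExtractor, count_words, basic_stats and compare are hypothetical, a word is assumed to be a run of letters or apostrophes, and the HTML is assumed to be well formed (every requested tag is closed).

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside any of the requested tags (hypothetical helper)."""

    def __init__(self, tags):
        super().__init__()
        self.tags = {t.strip().lower() for t in tags}
        self.depth = 0      # number of requested tags currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def count_words(html, tags=("body",)):
    """Return a Counter of lowercased words inside the given tags."""
    parser = TagTextExtractor(tags)
    parser.feed(html)
    parser.close()
    # Assumption: a word is a run of letters or apostrophes.
    return Counter(re.findall(r"[a-z']+", " ".join(parser.chunks).lower()))


def basic_stats(counts):
    """Number of words, highest frequency and average frequency."""
    total = sum(counts.values())
    return {
        "words": total,
        "highest": max(counts.values(), default=0),
        "average": total / len(counts) if counts else 0.0,
    }


def compare(html1, html2, tags=("body",)):
    """Split the vocabulary into common and uncommon words, sorted by count."""
    c1, c2 = count_words(html1, tags), count_words(html2, tags)

    def by_count(words, counts):
        # Highest count first, ties broken alphabetically.
        return sorted(words, key=lambda w: (-counts[w], w))

    return {
        "stats1": basic_stats(c1),
        "stats2": basic_stats(c2),
        "common": by_count(set(c1) & set(c2), c1),
        "only1": by_count(set(c1) - set(c2), c1),
        "only2": by_count(set(c2) - set(c1), c2),
    }


doc1 = "<html><head><title>A</title></head><body><p>the cat and the dog</p></body></html>"
doc2 = "<html><body><p>the bird and a fish</p></body></html>"
result = compare(doc1, doc2)
print(result["common"], result["only1"], result["only2"])
```

Here the common words are sorted by their count in text 1, which is one possible reading of the first sorting option; the graphics step is left out.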
Ways of Using
- Enter a valid URL in the source text 1 URL field or enter a local path to upload html text 1
- Enter a valid URL in the source text 2 URL field or enter a local path to upload html text 2
- Enter an HTML tag, or multiple tags separated by commas, for the subtext you want to compare
- Select how words are to be counted. If an associated field is required, select it and/or fill it in with the appropriate words or pattern
- Select the sorting criteria
- Select the output result format, though currently only HTML is implemented
- Check the box if you want to display the result in a new window
- Click the submit button
CGI Interface
If you want to use this tool from your web site, here is the CGI Interface:
(Note: You need to use the attribute name/value pair enctype="multipart/form-data" within the form tag, because the tool was designed to allow local file uploading, even if you do not use this feature.)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Input source for HTML text 1. Lets the user either give a URL or upload a local HTML text |
| htmlurl | | text | | A valid URL pointing to an HTML document |
| localFile | | file | | The path to your local HTML text file |
| source2 | url/local | radio button | url | Input source for HTML text 2. Lets the user either give a URL or upload a local HTML text |
| htmlurl2 | | text | | A valid URL pointing to an HTML document |
| localFile2 | | file | | The path to your local HTML text file |
| tagword | | text | body | A valid HTML element (tag) name, or multiple element names separated by commas. The tags apply to both texts |
| range | all/patt/find/stop | radio button | all | Lets the user select which word list to see |
| wpat | | text | | A Unix-style pattern. This field corresponds to the value "patt" in the radio button group named "range" |
| findstop | typedin/textfile/glasgow | radio button | glasgow | These options are used with the values "find" and "stop" in the radio button group named "range" |
| typedinword | | text | | This text field corresponds to the value "typedin" in the radio button group named "findstop" |
| wordfile | | file | | This field corresponds to the value "textfile" in the radio button group named "findstop" |
| sorting | 1/2 | selection | 1 | Sorting criterion: word count in text 1 (1) or ratio of relative counts (2) |
| HowToList | 1 | selection | 1 | Output format; 1 is HTML |
| taporface | | checkbox | checked | If checked, the result is displayed in a new window without the TAPoRware interface |
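If you assemble these parameters in a program instead of a form, a small helper can apply the defaults and catch invalid values before anything is submitted. The following Python sketch is an assumption-laden illustration: build_fields, DEFAULTS and ALLOWED are hypothetical names, and the rule that "patt" requires wpat and "typedin" requires typedinword is inferred from the table, not confirmed by the tool. The taporface checkbox is left out because the table does not give the value it submits.

```python
# Defaults and allowed values copied from the parameter table.
DEFAULTS = {
    "source": "url",
    "source2": "url",
    "tagword": "body",
    "range": "all",
    "findstop": "glasgow",
    "sorting": "1",
    "HowToList": "1",
}

ALLOWED = {
    "source": {"url", "local"},
    "source2": {"url", "local"},
    "range": {"all", "patt", "find", "stop"},
    "findstop": {"typedin", "textfile", "glasgow"},
    "sorting": {"1", "2"},
    "HowToList": {"1"},
}


def build_fields(htmlurl, htmlurl2, **overrides):
    """Return a dict of form fields for comparing two URLs (hypothetical helper)."""
    fields = dict(DEFAULTS)
    fields.update(htmlurl=htmlurl, htmlurl2=htmlurl2)
    fields.update(overrides)

    # Reject values the table does not list.
    for name, allowed in ALLOWED.items():
        if fields[name] not in allowed:
            raise ValueError(f"{name}={fields[name]!r} is not one of {sorted(allowed)}")

    # Assumption: a dependent field must be filled when its option is chosen.
    if fields["range"] == "patt" and not fields.get("wpat"):
        raise ValueError('range="patt" needs a pattern in wpat')
    if (fields["range"] in {"find", "stop"}
            and fields["findstop"] == "typedin"
            and not fields.get("typedinword")):
        raise ValueError('findstop="typedin" needs words in typedinword')
    return fields


fields = build_fields("http://example.org/a.html", "http://example.org/b.html",
                      range="stop")
print(fields["range"], fields["findstop"])
```

File uploads (localFile, localFile2, wordfile) are not covered here; they would need real file parts in the multipart request.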
Use Comparator TAPoRware Tool in Your Web Page
You can add a source URL field and a button to your web page to compare the text at that URL with the text of the current page by calling the TAPoRware CGI script.
Here is the code for the interface above:
<form method="post" name="htmlForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hcomparator.cgi" onsubmit="document.htmlForm.htmlurl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="htmlurl" />
<input type="hidden" name="source2" value="url" />
Source URL: <input type="text" name="htmlurl2" />
<input type="hidden" name="tagword" value="body" />
<input type="hidden" name="range" value="stop" />
<input type="hidden" name="findstop" value="glasgow" />
<input type="hidden" name="sorting" value="1" />
<input type="hidden" name="HowToList" value="1" />
<input type="submit" name="doIt" value="Compare" />
</form>
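The same request can also be prepared from a script. The Python sketch below mirrors the fields of the form above and encodes them as multipart/form-data using only the standard library. The helper names encode_multipart and build_request are hypothetical, the example URLs are placeholders, and the request is only built, not sent.

```python
import uuid
from urllib.request import Request

# Action URL taken from the form above.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hcomparator.cgi"


def encode_multipart(fields, boundary=None):
    """Encode simple text fields as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")            # blank line separates headers from value
        lines.append(str(value))
    lines.append(f"--{boundary}--")  # closing boundary
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(page_url, other_url):
    """Build (but do not send) a POST with the same fields as the form."""
    fields = {
        "source": "url",
        "htmlurl": page_url,     # the form fills this in from document.location.href
        "source2": "url",
        "htmlurl2": other_url,   # the URL typed by the user
        "tagword": "body",
        "range": "stop",
        "findstop": "glasgow",
        "sorting": "1",
        "HowToList": "1",
        "doIt": "Compare",
    }
    body, content_type = encode_multipart(fields)
    return Request(CGI_URL, data=body, headers={"Content-Type": content_type})


req = build_request("http://example.org/a.html", "http://example.org/b.html")
print(req.get_method(), req.get_header("Content-type"))
```

To actually submit the comparison, the request object would be passed to urllib.request.urlopen, and the response would be the HTML result page.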
Web Service Interface
Not implemented yet
Known Bugs
To Do
--
MattPatey - 15 Oct 2005