Compare Two Documents

See http://taporware.mcmaster.ca/~taporware/htmlTools/comparator.shtml

Description

This tool compares two HTML texts on basic statistics such as page title, headings, open text, number of words, and the highest and average word frequency. It also lists the words common to both texts and the words in each text, with their counts. The words can be listed in different orders.

Pseudocode

  • Obtain texts 1 and 2 by URL or from the user's local disk. Both texts must be in HTML format.
  • Get basic statistics for both texts.
  • Extract the subtexts specified by the user's HTML tags.
  • Get statistics for both subtexts.
  • List and count the words in both subtexts.
  • Extract the common words and the uncommon words, respectively.
  • Sort the words according to the user's specification.
  • Generate the output along with the graphics.
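The counting and comparison steps above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the names `TagTextExtractor`, `count_words`, `basic_stats`, and `compare` are hypothetical, the tokenization rule (lowercase runs of letters, digits, and apostrophes) is an assumption, and "uncommon words" is read here as words that occur in only one of the two texts. Fetching the documents and generating the graphics are left out.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside any of the requested HTML elements."""

    def __init__(self, tags):
        super().__init__()
        self.tags = {t.strip().lower() for t in tags}
        self.depth = 0    # number of requested elements currently open
        self.chunks = []  # text fragments seen while depth > 0

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def count_words(html, tags=("body",)):
    """Count the words inside the given elements (assumed tokenization)."""
    parser = TagTextExtractor(tags)
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z0-9']+", text))


def basic_stats(counts):
    """Number of words, distinct words, highest and average frequency."""
    total = sum(counts.values())
    distinct = len(counts)
    return {
        "words": total,
        "distinct": distinct,
        "highest": max(counts.values(), default=0),
        "average": total / distinct if distinct else 0.0,
    }


def compare(html1, html2, tags=("body",)):
    """Compare two HTML texts; words are sorted by descending count."""
    c1, c2 = count_words(html1, tags), count_words(html2, tags)
    by_count_1 = lambda w: (-c1[w], w)
    by_count_2 = lambda w: (-c2[w], w)
    return {
        "stats1": basic_stats(c1),
        "stats2": basic_stats(c2),
        "common": [(w, c1[w], c2[w])
                   for w in sorted(c1.keys() & c2.keys(), key=by_count_1)],
        "only1": [(w, c1[w])
                  for w in sorted(c1.keys() - c2.keys(), key=by_count_1)],
        "only2": [(w, c2[w])
                  for w in sorted(c2.keys() - c1.keys(), key=by_count_2)],
    }
```

The sketch relies on properly closed elements: an unclosed tag such as a bare `<p>` would keep the extractor "inside" that element for the rest of the document.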

Ways of Using

  • Enter a valid URL in the source text 1 URL field, or enter a local path to upload HTML text 1.
  • Enter a valid URL in the source text 2 URL field, or enter a local path to upload HTML text 2.
  • Enter an HTML tag, or multiple tags separated by commas, for the subtext you want to compare.
  • Select how the words are to be counted. If the associated field is required, select it and/or fill it in with the proper words or pattern.
  • Select the sorting criterion.
  • Select the output format (currently only HTML is implemented).
  • Check the box if you want to display the result in a new window.
  • Click the submit button.

CGI Interface

If you want to use this tool from your own web site, here is the CGI interface. (Note: you need to set the attribute enctype="multipart/form-data" on the form tag, because the tool was designed to allow local file uploads. This is required even if you do not use that feature.)

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Description |
|---|---|---|---|---|
| source | url/local | radio button | url | For HTML text 1. Lets the user select the input text (either a URL or an uploaded local HTML text). |
| htmlurl | | text | | A valid URL that points to an HTML document. |
| localFile | | file | | The path to your local HTML text file. |
| source2 | url/local | radio button | url | For HTML text 2. Lets the user select the input text (either a URL or an uploaded local HTML text). |
| htmlurl2 | | text | | A valid URL that points to an HTML document. |
| localFile2 | | file | | The path to your local HTML text file. |
| tagword | | text | body | A valid HTML element (tag) name, or multiple element names separated by commas. The tags apply to both texts. |
| range | all/patt/find/stop | radio button | all | Options that let the user select the word list he or she wants to see. |
| wpat | | text | | A Unix-style pattern. This field corresponds to the value "patt" in the radio button group named "range". |
| findstop | typedin/textfile/glasgow | radio button | glasgow | These options are connected with the values "find" and "stop" in the radio button group named "range". |
| typedinword | | text | | This text field corresponds to the value "typedin" in the radio button group named "findstop". |
| wordfile | | file | | This field corresponds to the value "textfile" in the radio button group named "findstop". |
| sorting | 1/2 | selection | 1 | Sorting criterion: word count in text 1, or ratio of relative counts. |
| HowToList | 1 | selection | 1 | Output format (HTML). |
| taporface | | checkbox | checked | If checked, the result is displayed in a new window without the TAPoRware interface. |
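The parameter table can be turned into a small helper that assembles the form fields and applies the documented defaults. The following Python sketch is only an illustration: `build_fields`, `DEFAULTS`, and `ALLOWED` are hypothetical names, and the dependency checks reflect one reading of the descriptions above. The `taporface` checkbox is left out because the table does not state which value it submits.

```python
# Defaults and allowed values copied from the parameter table.
DEFAULTS = {
    "source": "url",
    "source2": "url",
    "tagword": "body",
    "range": "all",
    "findstop": "glasgow",
    "sorting": "1",
    "HowToList": "1",
}

ALLOWED = {
    "source": {"url", "local"},
    "source2": {"url", "local"},
    "range": {"all", "patt", "find", "stop"},
    "findstop": {"typedin", "textfile", "glasgow"},
    "sorting": {"1", "2"},
    "HowToList": {"1"},
}


def build_fields(htmlurl, htmlurl2, **overrides):
    """Return the text fields for a URL-to-URL comparison request."""
    fields = dict(DEFAULTS, htmlurl=htmlurl, htmlurl2=htmlurl2)
    fields.update({name: str(value) for name, value in overrides.items()})

    for name, allowed in ALLOWED.items():
        if fields[name] not in allowed:
            raise ValueError(f"{name}={fields[name]!r} is not one of {sorted(allowed)}")

    # Assumed dependencies, based on the descriptions in the table.
    if fields["range"] == "patt" and not fields.get("wpat"):
        raise ValueError("range='patt' requires a pattern in 'wpat'")
    if (fields["range"] in {"find", "stop"}
            and fields["findstop"] == "typedin"
            and not fields.get("typedinword")):
        raise ValueError("findstop='typedin' requires words in 'typedinword'")
    return fields
```

For example, `build_fields("http://example.org/a.html", "http://example.org/b.html", range="stop")` yields the same field values as the hidden inputs of the sample form, with the stop-word option left at its "glasgow" default.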

Use Comparator TAPoRware Tool in Your Web Page

You can add a source URL field and a button to your web page to compare the text at that URL with the text of the current page by calling the TAPoRware CGI script.

Source URL:

Here is the code for the interface above:

<form method="post" name="htmlForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hcomparator.cgi" onsubmit="document.htmlForm.htmlurl.value=document.location.href">

<input type="hidden" name="source" value="url" />

<input type="hidden" name="htmlurl" />

<input type="hidden" name="source2" value="url" />

Source URL: <input type="text" name="htmlurl2" />

<input type="hidden" name="tagword" value="body" />

<input type="hidden" name="range" value="stop" />

<input type="hidden" name="findstop" value="glasgow" />

<input type="hidden" name="sorting" value="1" />

<input type="hidden" name="HowToList" value="1" />

<input type="submit" name="doIt" value="Compare" />

</form>
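The same request can also be prepared from a script instead of a browser form. The Python sketch below encodes the fields of the form above as multipart/form-data and builds a POST request for the form's action URL. It is an illustration under assumptions: `encode_multipart` and `build_request` are hypothetical helpers, only plain text fields are handled (no file uploads), and the request is built but not sent, since this page does not say how the script responds to non-browser clients.

```python
import urllib.request
import uuid

# Action URL taken from the form above.
ACTION = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hcomparator.cgi"


def encode_multipart(fields, boundary=None):
    """Encode text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(page_url, other_url):
    """Build (but do not send) the POST request the form would submit."""
    fields = {
        "source": "url",
        "htmlurl": page_url,    # the form fills this in from document.location.href
        "source2": "url",
        "htmlurl2": other_url,  # the URL typed into the "Source URL" field
        "tagword": "body",
        "range": "stop",
        "findstop": "glasgow",
        "sorting": "1",
        "HowToList": "1",
        "doIt": "Compare",
    }
    body, content_type = encode_multipart(fields)
    return urllib.request.Request(
        ACTION,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. The result is an HTML page, as noted in the parameter table.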

Web Service Interface

Not implemented yet

Known Bugs

To Do

-- MattPatey - 15 Oct 2005

