Compare two documents
See
http://taporware.mcmaster.ca/~taporware/xmlTools/comparator.shtml
Description
This tool compares two XML texts on basic statistics such as title, author, date of publication, number of words, and highest and average word frequency. It also lists the words common to both texts and the words in each text individually, with their counts. The words can be listed in different orders.
Note: the title, author, and similar information are extracted according to the TEI DTD. For texts that do not follow it, this information may not be extracted.
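As a rough illustration of the note above, the following Python sketch reads the title, author, and date from a TEI header. It is not the tool's own code. It assumes an un-namespaced header with the usual teiHeader, titleStmt, and publicationStmt elements, and the function name tei_metadata is hypothetical. Fields that cannot be found come back as None, which mirrors the "may not be extracted" case.

```python
import xml.etree.ElementTree as ET


def tei_metadata(xml_text):
    """Return title, author and date from a TEI header (None where absent).

    Assumption: element names carry no XML namespace. Namespaced
    documents would need qualified names in the paths below.
    """
    root = ET.fromstring(xml_text)

    def first_text(path):
        node = root.find(path)
        if node is None:
            return None
        # Join nested text so inline markup inside the field is kept.
        return "".join(node.itertext()).strip() or None

    return {
        "title": first_text(".//teiHeader//titleStmt/title"),
        "author": first_text(".//teiHeader//titleStmt/author"),
        "date": first_text(".//teiHeader//publicationStmt/date"),
    }
```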
Pseudocode
- Obtain texts 1 and 2 from URLs or from the user's local disk. Both texts must be in XML format.
- Get basic statistics for both texts.
- Extract the subtexts specified by the user's XML element.
- Get statistics for both subtexts.
- List and count the words in both subtexts.
- Extract the common words and the uncommon words respectively.
- Sort the words according to the user's specification.
- Generate the output along with the graphics.
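The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the tool's implementation: the function names, the word pattern, and the reading of "uncommon words" as words found in only one of the two texts are all assumptions, and the graphics step is left out.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a word is a run of letters or apostrophes, compared in lower case.
WORD_RE = re.compile(r"[A-Za-z']+")


def word_counts(xml_text, element=None):
    """Count words in the whole document, or only inside every <element>."""
    root = ET.fromstring(xml_text)
    nodes = [root] if element is None else list(root.iter(element))
    counts = Counter()
    for node in nodes:
        # Note: nested elements with the same name would be counted twice.
        text = "".join(node.itertext())
        counts.update(w.lower() for w in WORD_RE.findall(text))
    return counts


def basic_stats(counts):
    """Number of words, distinct words, highest and average frequency."""
    total = sum(counts.values())
    return {
        "words": total,
        "unique": len(counts),
        "highest": max(counts.values(), default=0),
        "average": total / len(counts) if counts else 0.0,
    }


def compare(counts1, counts2, sort_by="count1"):
    """Return (common rows, words only in text 1, words only in text 2).

    Each common row is (word, count1, count2, ratio of relative counts).
    """
    total1 = sum(counts1.values()) or 1
    total2 = sum(counts2.values()) or 1
    common = []
    for word in counts1.keys() & counts2.keys():
        ratio = (counts1[word] / total1) / (counts2[word] / total2)
        common.append((word, counts1[word], counts2[word], ratio))
    if sort_by == "ratio":
        common.sort(key=lambda row: (-row[3], row[0]))
    else:
        common.sort(key=lambda row: (-row[1], row[0]))
    only1 = sorted(counts1.keys() - counts2.keys())
    only2 = sorted(counts2.keys() - counts1.keys())
    return common, only1, only2
```

For example, calling word_counts on each text with the same element name and passing both results to compare gives the common word list plus the words unique to each text. The two sort_by values loosely correspond to the two sorting options the tool offers.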
Ways of Using
- Enter a valid URL in the source text 1 URL field or enter a local path to upload the xml text 1
- Enter a valid URL in the source text 2 URL field or enter a local path to upload the xml text 2
- Enter a valid xml element name or xpath for the subtext you want to compare. The default is "//"
- Select how words are to be counted. If an associated field is required, select it and/or fill it in with the appropriate words or pattern
- Select the sorting criteria
- Select the output result format, though currently only HTML is implemented
- Check if you want to display the result in a new window
- Click the submit button
CGI Interface
If you want to use this tool from your web site, here is the CGI Interface:
(Note: you need to set the attribute enctype="multipart/form-data" in the form tag, because the tool was designed to allow local file uploads, even if you do not use this feature.)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | This is for XML text 1. Lets the user select the input text (either a URL or an uploaded local XML text) |
| xmlurl | | text | | A valid URL that points to an XML text |
| localFile | | file | | The path to your local XML text file |
| source2 | url/local | radio button | url | This is for XML text 2. Lets the user select the input text (either a URL or an uploaded local XML text) |
| xmlurl2 | | text | | A valid URL that points to an XML text |
| localFile2 | | file | | The path to your local XML text file |
| xmlelem | | text | // | A valid XML element name or XPath. It applies to both texts |
| range | all/patt/find/stop | radio button | all | Options that let the user select the word list to display |
| wpat | | text | | A Unix-style pattern. This field corresponds to the value "patt" in the radio button group named "range" |
| findstop | typedin/textfile/glasgow | radio button | glasgow | These options apply to the values "find" and "stop" in the radio button group named "range" |
| typedinword | | text | | This text field corresponds to the value "typedin" in the radio button group named "findstop" |
| wordfile | | file | | This field corresponds to the value "textfile" in the radio button group named "findstop" |
| sorting | 1/2 | selection | 1 | Sorting criterion: 1 sorts by word count in text 1, 2 sorts by ratio of relative counts |
| HowToList | 1 | selection | 1 | The output format is HTML |
| taporface | | checkbox | checked | If checked, the result is displayed in a new window without the taporware interface |
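To show how these parameters might be sent from a script rather than an HTML form, here is a hedged Python sketch that fills in the table's defaults and encodes the fields as multipart/form-data. The helper names build_fields and encode_multipart are hypothetical, the file-upload fields are omitted, and the taporface checkbox is left out because the value it submits is not documented here. The address of the CGI script is also not given on this page, so it has to be supplied by the caller.

```python
import uuid

# Defaults taken from the parameter table above (URL input for both texts).
DEFAULTS = {
    "source": "url",
    "source2": "url",
    "xmlelem": "//",
    "range": "all",
    "findstop": "glasgow",
    "sorting": "1",
    "HowToList": "1",
}


def build_fields(xmlurl, xmlurl2, **overrides):
    """Combine the defaults with the two text URLs and any overrides."""
    fields = dict(DEFAULTS, xmlurl=xmlurl, xmlurl2=xmlurl2)
    fields.update(overrides)
    return fields


def encode_multipart(fields, boundary=None):
    """Encode plain text fields as a multipart/form-data request body.

    Returns (body bytes, Content-Type header value).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"
```

The resulting body and content type could then be passed to urllib.request.Request as the data argument and the Content-Type header, with the CGI address filled in by the caller.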
Web Service Interface
Not implemented yet
Known Bugs
To Do
--
MattPatey - 15 Oct 2005