List words within specified element
See http://taporware.mcmaster.ca/~taporware/xmlTools/listword.shtml
Description
This tool lists all of the words found within a specified element. The results can be sorted alphabetically, by frequency, by order of first appearance, or in reverse alphabetical order. If no element is specified, all words in the document are returned.
History
Pseudocode
- Obtain the XML text from a URL or from the user's local disk
- Extract the text contained in the specified XML element(s)
- Tokenize the text into words, using spaces and punctuation marks as delimiters
- Count the occurrences of each word, ignoring capitalization, and sort the words
- Filter the words by user-specified criteria (pattern, find list, or stop list) if requested
- Generate output in the format selected by the user
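The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the word-matching regular expression, the function names `element_text` and `list_words`, the use of ElementTree paths (which support only a subset of XPath, so `"."` stands in for the default `"//"`), and the reading of "reverse alphabetical" as Z-to-A order are all assumptions. The inflectional stemmer and output formatting are not modeled.

```python
import fnmatch
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a "word" is a run of letters, digits or apostrophes; spaces and
# all other punctuation act as delimiters.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def element_text(xml_string, paths=(".",)):
    """Join the text of every element matched by the given ElementTree paths.

    ElementTree supports only a subset of XPath; "." (the whole document)
    stands in for the tool's default "//".
    """
    root = ET.fromstring(xml_string)
    chunks = []
    for path in paths:
        for elem in root.iterfind(path):
            chunks.append(" ".join(elem.itertext()))
    return " ".join(chunks)


def list_words(text, sorting=2, range_opt="all", pattern="*", word_list=()):
    """Count words case-insensitively, filter them, then sort them.

    sorting: 1 alphabetical, 2 frequency, 3 first appearance,
             4 reverse alphabetical (assumed to mean Z to A).
    range_opt: "all", "patt" (Unix-style pattern), "find" (keep only words in
               word_list) or "stop" (drop words in word_list).
    Returns a list of (word, count) pairs.
    """
    counts = Counter(w.lower() for w in WORD_RE.findall(text))
    listed = {w.lower() for w in word_list}

    # Iterating over the Counter yields words in first-appearance order.
    if range_opt == "all":
        words = list(counts)
    elif range_opt == "patt":
        words = [w for w in counts if fnmatch.fnmatchcase(w, pattern.lower())]
    elif range_opt == "find":
        words = [w for w in counts if w in listed]
    elif range_opt == "stop":
        words = [w for w in counts if w not in listed]
    else:
        raise ValueError(f"unknown range option: {range_opt!r}")

    if sorting == 1:
        words.sort()
    elif sorting == 2:
        words.sort(key=lambda w: (-counts[w], w))  # ties broken alphabetically
    elif sorting == 4:
        words.sort(reverse=True)
    elif sorting != 3:  # 3 keeps the first-appearance order from above
        raise ValueError(f"unknown sorting option: {sorting!r}")
    return [(w, counts[w]) for w in words]


if __name__ == "__main__":
    doc = "<text><p>The cat saw the dog.</p><note>Ignore me</note><p>The END</p></text>"
    print(list_words(element_text(doc, (".//p",))))
    # → [('the', 3), ('cat', 1), ('dog', 1), ('end', 1), ('saw', 1)]
```

Counting before filtering keeps the frequencies the same whichever word list is requested; only the set of words shown changes.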
Ways of Using
- Enter a valid URL in the URL field, or enter (or browse to) a local path in the local file field to upload the XML text
- Enter valid XML element names (or XPaths) separated by commas; the default is "//"
- Select which word list you want and enter the corresponding text if necessary
- Select a sorting criterion
- Check the checkbox under the sort selection control to apply the inflectional stemmer (note: checking this box makes processing take longer)
- Select an output format
- If you want the results displayed in the same window as the Taporware interface, uncheck the "Open results in new window" checkbox
- Finally, click the "Submit" button
CGI Interface
If you want to use this tool from your own web site, here is the CGI interface.
Note: your form tag must include the attribute enctype="multipart/form-data", because the tool was designed with local file uploading in mind.
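The enctype requirement matters because, with the default application/x-www-form-urlencoded encoding, browsers send only the name of a selected file, not its contents. For a client that posts to the CGI script without a browser, the sketch below builds a multipart/form-data body by hand. The function name `encode_multipart` and the generic application/octet-stream type for uploads are assumptions; the script's action URL is not given on this page, so the code only builds the request body and header value.

```python
import uuid


def encode_multipart(fields, files=None, boundary=None):
    """Build a multipart/form-data request body.

    fields: dict mapping field name -> text value (e.g. "source", "xmlelem").
    files:  dict mapping field name -> (filename, bytes) (e.g. "localFile").
    Returns (body, content_type); content_type goes in the Content-Type header.
    Names and filenames are assumed to contain no quotes or line breaks.
    """
    boundary = boundary or uuid.uuid4().hex  # random, so unlikely to occur in the data
    parts = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(head.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    for name, (filename, data) in (files or {}).items():
        head = (f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                'Content-Type: application/octet-stream\r\n\r\n')
        parts.append(head.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

For example, `encode_multipart({"source": "local", "xmlelem": "//"}, {"localFile": ("play.xml", xml_bytes)})` produces a body that could be sent with `urllib.request.Request(action_url, data=body, headers={"Content-Type": content_type})`, where `action_url` is the (undocumented) CGI address.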
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Selects the input source (a URL or an uploaded local XML file) |
| xmlurl | | text | | A valid URL pointing to an XML document |
| localFile | | file | | The path to your local XML file |
| xmlelem | | text | // | Valid XML element names or XPaths, separated by commas |
| range | all/patt/find/stop | radio button | all | Selects which word list to return |
| wpat | | text | | A Unix-style pattern; used when "range" is "patt" |
| findstop | typedin/textfile/glasgow | radio button | glasgow | Source of the word list used when "range" is "find" or "stop" |
| typedinword | | text | | The word list typed in by the user; used when "findstop" is "typedin" |
| wordfile | | file | | A word list file; used when "findstop" is "textfile" |
| sorting | 1/2/3/4 | selection | 2 | Sort order: 1 = alphabetical, 2 = by frequency, 3 = by order of first appearance, 4 = reverse alphabetical |
| stem | | checkbox | unchecked | Check to apply the inflectional stemmer |
| display | 1/2/3 | selection | 1 | Output format: 1 = HTML, 2 = XML tree, 3 = tab-delimited text |
| taporface | | checkbox | checked | Checked (default): show results in a new window without the graphical interface; unchecked: show results with the Taporware interface in the same window |
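A client-side check against the table above can catch invalid combinations before posting. The sketch below merges caller-supplied fields over the documented defaults, enforces the documented value sets, and checks the dependencies between fields. The function name `build_params` is hypothetical, and the checkbox value "on" (the HTML default for a checked box without a value attribute) is an assumption; the real form may use different checkbox values.

```python
# Allowed values and defaults transcribed from the parameter table.
# Free-text and file fields have no fixed value set and are not checked here.
ALLOWED = {
    "source": {"url", "local"},
    "range": {"all", "patt", "find", "stop"},
    "findstop": {"typedin", "textfile", "glasgow"},
    "sorting": {"1", "2", "3", "4"},
    "display": {"1", "2", "3"},
}
DEFAULTS = {"source": "url", "xmlelem": "//", "range": "all",
            "findstop": "glasgow", "sorting": "2", "display": "1"}


def build_params(stem=False, taporface=True, **fields):
    """Return a dict of text form fields, validated against the table."""
    params = dict(DEFAULTS)
    params.update({name: str(value) for name, value in fields.items()})
    for name, allowed in ALLOWED.items():
        if params[name] not in allowed:
            raise ValueError(f"{name}={params[name]!r}; expected one of {sorted(allowed)}")
    # Dependencies between fields, as described in the table.
    if params["source"] == "url" and not params.get("xmlurl"):
        raise ValueError("source=url requires xmlurl")
    if params["range"] == "patt" and not params.get("wpat"):
        raise ValueError("range=patt requires wpat")
    if (params["range"] in ("find", "stop") and params["findstop"] == "typedin"
            and not params.get("typedinword")):
        raise ValueError("findstop=typedin requires typedinword")
    # Browsers omit unchecked checkboxes entirely; "on" is assumed for checked ones.
    if stem:
        params["stem"] = "on"
    if taporface:
        params["taporface"] = "on"
    return params
```

File fields (localFile, wordfile) are left out of the returned dict because they must be sent as file parts of a multipart/form-data body rather than as text values.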
Web (SOAP) Service Interface
Taporware provides web services to non-profit organizations. Here is the Taporware web service information:
- Endpoint URL: http://taporware.mcmaster.ca:9982
- Service URI: http://taporware.mcmaster.ca/~taporware/webservice
- Service Method: list_Words_XML
- Parameters:
- xmlInput -- any well-formed XML text
- element -- valid XML element names (XPaths) in the input XML text, separated by commas
- listOption -- values are the same as parameter "range" in the CGI interface above
- optionSeletion -- values depend on the value of listOption
- sorting -- values are same as parameter "sorting" in the CGI interface above
- outFormat -- values are same as parameter "display" in the CGI interface above
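A call to list_Words_XML can be sketched as a SOAP 1.1 request envelope. The service URI, method name, and parameter names come from the list above (optionSeletion is spelled as documented). The RPC-style layout (method element in the service URI's namespace, unqualified parameter elements in the listed order), the reuse of the CGI defaults, and the function name `list_words_request` are assumptions not confirmed by this page. The code only builds the XML and does not contact the endpoint.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_URI = "http://taporware.mcmaster.ca/~taporware/webservice"
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("tw", SERVICE_URI)


def list_words_request(xml_input, element="//", list_option="all",
                       option_selection="", sorting="2", out_format="1"):
    """Return a SOAP 1.1 envelope (as a string) calling list_Words_XML.

    Defaults mirror the CGI defaults (an assumption for the web service).
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_URI}}}list_Words_XML")
    params = [
        ("xmlInput", xml_input),            # XML text is escaped as a string value
        ("element", element),
        ("listOption", list_option),
        ("optionSeletion", option_selection),  # spelling as documented
        ("sorting", sorting),
        ("outFormat", out_format),
    ]
    for name, value in params:
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

To send it, a SOAP 1.1 client would POST the envelope to the endpoint URL with `Content-Type: text/xml` and a SOAPAction header; the SOAPAction value the service expects is not documented here.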
Known Bugs
To Do
--
MattPatey - 13 Oct 2005