Find co-occurring words
See
http://taporware.mcmaster.ca/~taporware/xmlTools/cooccur.shtml
Description
This tool looks for two words within a certain distance of each other. Given a primary and a secondary pattern,
TAPoR searches the document for every place where the two patterns occur within the user-specified limit, measured in words, sentences, or lines, or within a surrounding element.
Pseudocode
- Obtain the XML text by URL or from the user's local disk
- Obtain the text contained by the user-specified elements and/or attribute name/value pairs; the default is the root element. Note: if you specify an attribute value, you must also enter the attribute name.
- Within that text, find the user-specified primary pattern together with its user-specified context (the concordance)
- Extract the concordances that contain the secondary pattern
- Generate output for the concordances that contain both patterns
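The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not TAPoR's own implementation: it treats words as runs of `\w` characters, matches each pattern against whole words (as a regular expression, or as a shell-style wildcard when `glob=True`, which is only one possible reading of "Unix style"), and covers only the word-based "ignore tags" context. The function names `select_texts` and `cooccurrences` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from fnmatch import translate


def select_texts(xml_text, element=None, attr_name=None, attr_value=None):
    """Collect the text of the selected elements (default: the root element)."""
    if attr_value is not None and attr_name is None:
        raise ValueError("an attribute value requires an attribute name")
    root = ET.fromstring(xml_text)
    if element is None and attr_name is None:
        candidates = [root]              # default: the whole document
    else:
        candidates = root.iter(element)  # element=None visits every element
    texts = []
    for el in candidates:
        if attr_name is not None:
            value = el.get(attr_name)
            if value is None or (attr_value is not None and value != attr_value):
                continue
        # itertext() yields the element's text with all tags ignored
        texts.append(" ".join(el.itertext()))
    return texts


def cooccurrences(text, primary, secondary, ctlen=5, glob=False):
    """Return word-context concordance lines for `primary` that also contain `secondary`."""
    if glob:
        # Assumption: "Unix-style" patterns are read as shell wildcards
        primary, secondary = translate(primary), translate(secondary)
    pri, sec = re.compile(primary), re.compile(secondary)
    words = re.findall(r"\w+", text)
    results = []
    for i, word in enumerate(words):
        if not pri.fullmatch(word):
            continue
        # Concordance: up to ctlen words on each side of the keyword
        lo, hi = max(0, i - ctlen), min(len(words), i + ctlen + 1)
        # Keep it only if another word in the context matches the secondary pattern
        if any(sec.fullmatch(words[j]) for j in range(lo, hi) if j != i):
            results.append(" ".join(words[lo:hi]))
    return results
```

For example, `cooccurrences("the cat sat on the mat", "cat", "mat", ctlen=4)` keeps the whole line, while `ctlen=2` finds nothing because "mat" lies four words after "cat".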
Ways of Using
- Enter a valid URL pointing to an XML file in the URL field, or upload a local XML file (if the file is not XML, an error message is returned).
- Enter a valid XML element name or a comma-separated list of element names; the default is "//"
- Enter the primary pattern in the primary pattern field
- Enter the secondary pattern in the secondary pattern field
- Select the context type of the concordance and the length of the context. Note: there are two kinds of context: ignore element tags, or use element tags as context.
- Select output format
- If you want the results displayed in the same window as the taporware interface, uncheck the "Open results in new window" check box
- Finally, click the "Submit" button
CGI Interface
If you want to use this tool from your web site, here is the CGI Interface:
(Note: if you want to upload local XML text to the tool, the form tag must include the attribute name/value pair enctype="multipart/form-data".)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Let user select input text (either a url or upload local xml text) |
| xmlurl | | text | | A Valid URL pointing to an xml text |
| localFile | | file | | The path to your local XML text file |
| xmlpath | | text | // | A valid XML element (tag) name, or multiple XML element names separated by commas |
| attr_name | | text | | Valid xml attribute name |
| attr_value | | text | | Valid xml attribute value |
| pripat | | text | | primary pattern of the concordance |
| copat | | text | | secondary pattern of the concordance |
| dispop | 1/2 | radio button | 1 | Let user select context type, either ignore element tags or use the tags as context |
| notags | 1/2/3 | selection | Words (1) | Context type, matching the parameter values in order: Words/Lines/Sentences -- ignore tags |
| ctlen | | text | 5 | context length corresponding to the selected context -- ignore tags |
| showtag | 1/2 | radio button | 1 | use closest tag as context(1), or use specified element containing the text as context(2) -- use tag |
| surtag | | text | | specify a tag as context -- use tag |
| HowToList | 1/2/3/4 | selection | 1 | Output format, matching the parameter values in order: HTML / XML text in HTML / XML tree / tab-delimited text |
| taporface | | checkbox | checked | Display results in a new window without the graphical interface (checked, default) or in the same window with the taporware interface (unchecked) |
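As a sketch of calling the CGI interface from a script, the following Python builds a request URL from the parameters in the table, using a URL as input and the "ignore tags" context. `CGI_URL` is a hypothetical placeholder, since this page does not give the form's action address, and sending the parameters as a GET query rather than a POST is likewise an assumption. Uploading a local file would instead need a multipart/form-data POST.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real action URL of the form is not documented here.
CGI_URL = "http://example.invalid/cooccur-cgi"


def build_cgi_query(xmlurl, pripat, copat, xmlpath="//", notags=1, ctlen=5,
                    howtolist=1, attr_name="", attr_value=""):
    """Build a GET URL for the 'ignore tags' context with a URL as input."""
    if attr_value and not attr_name:
        raise ValueError("attr_value requires attr_name")
    if notags not in (1, 2, 3) or howtolist not in (1, 2, 3, 4):
        raise ValueError("notags must be 1-3 and HowToList must be 1-4")
    params = {
        "source": "url",         # input comes from xmlurl, not an upload
        "xmlurl": xmlurl,
        "xmlpath": xmlpath,
        "attr_name": attr_name,
        "attr_value": attr_value,
        "pripat": pripat,
        "copat": copat,
        "dispop": 1,             # 1 = ignore element tags
        "notags": notags,        # 1/2/3 = words/lines/sentences
        "ctlen": ctlen,          # context length in those units
        "HowToList": howtolist,  # 1-4 = output format
    }
    # taporface is left out: an unchecked HTML checkbox is simply not submitted.
    return CGI_URL + "?" + urlencode(params)
```

`urlencode` percent-encodes every value, so an `xmlpath` of `//` is sent as `%2F%2F`.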
Web Service Interface
Taporware provides web services to any non-profit organization. Here is the taporware web service information:
- Endpoint URL: http://taporware.mcmaster.ca:9982
- Service URI: http://taporware.mcmaster.ca/~taporware/webservice
- Service Method: find_Cooccurrence_XML
- parameters:
- xmlInput -- any xml string
- element -- any valid xml element name in the input text
- attributeName -- any valid xml attribute name in the input text
- attributeValue -- attribute value corresponding to the attribute name above in the input text
- pattern -- primary pattern, either a Unix-style pattern or a regular expression
- copattern -- secondary pattern, either a Unix-style pattern or a regular expression
- contextOption -- context type: ignore tags or use tags, the values are 1 (ignore tags) and 2 (use tags)
- optionSelection1 -- the meaning and value of this parameter depend on the value of contextOption. If contextOption is 1, the values 1/2/3 select a context of words/lines/sentences (ignore tags). If contextOption is 2, the values 1/2 select the closest element/a user-specified element (use tags).
- optionSelection2 -- the value of this parameter depends on the two parameters above. If contextOption is 1, enter an integer giving the context length in words/lines/sentences (as selected by optionSelection1 = 1/2/3). If contextOption is 2 and optionSelection1 is also 2, give the name of the surrounding XML element to use as the context.
- outFormat -- values are same as parameter "HowToList" in the CGI interface above
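The dependencies between contextOption, optionSelection1, and optionSelection2 are easy to get wrong, so the Python sketch below checks them before building a request. It then wraps the parameters, in the order listed above, in a plain SOAP 1.1 envelope addressed to the service URI and method given on this page. The exact encoding the server expects (for example, whether it requires typed rpc/encoded parameters or a particular SOAPAction header) is an assumption, sending the request to the endpoint is left out, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SERVICE_URI = "http://taporware.mcmaster.ca/~taporware/webservice"
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_ENV)


def cooccurrence_params(xml_input, pattern, copattern, context_option,
                        option1, option2="", element="", attribute_name="",
                        attribute_value="", out_format=1):
    """Validate the contextOption / optionSelection rules and return ordered parameters."""
    if context_option == 1:      # ignore tags
        if option1 not in (1, 2, 3):
            raise ValueError("with contextOption 1, optionSelection1 must be 1, 2 or 3")
        if not str(option2).isdigit():
            raise ValueError("with contextOption 1, optionSelection2 must be a context length")
    elif context_option == 2:    # use tags
        if option1 not in (1, 2):
            raise ValueError("with contextOption 2, optionSelection1 must be 1 or 2")
        if option1 == 2 and not option2:
            raise ValueError("optionSelection2 must name the surrounding element")
    else:
        raise ValueError("contextOption must be 1 or 2")
    if attribute_value and not attribute_name:
        raise ValueError("attributeValue requires attributeName")
    return [
        ("xmlInput", xml_input), ("element", element),
        ("attributeName", attribute_name), ("attributeValue", attribute_value),
        ("pattern", pattern), ("copattern", copattern),
        ("contextOption", context_option), ("optionSelection1", option1),
        ("optionSelection2", option2), ("outFormat", out_format),
    ]


def soap_envelope(params):
    """Wrap the parameters in a plain SOAP 1.1 envelope (encoding is an assumption)."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_URI}}}find_Cooccurrence_XML")
    for name, value in params:
        # ElementTree escapes <, > and & in text, so xmlInput travels as a string
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(env, encoding="unicode")
```

For a word context of five words, call `cooccurrence_params(xml, "cat", "sat", 1, 1, 5)`; asking for a user-specified surrounding element (`2, 2`) without naming it raises an error.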
Known Bugs
To Do
--
MattPatey - 13 Oct 2005