|
Find collocates
See http://taporware.mcmaster.ca/~taporware/xmlTools/collocation.shtml
|
|
Description
The tool takes a word or pattern from the user and returns the words that occur before and after it within the given context. The results can be listed alphabetically, by frequency, or by z-score (a measure of how far, and in which direction, a value deviates from the mean of its distribution, expressed in units of the distribution's standard deviation).
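To make the z-score option more concrete, here is a minimal Python sketch of one formulation often used for collocation analysis. It is an assumption for illustration only: this page does not state which formula the tool applies, and the function name and argument names are hypothetical.

```python
import math

def collocate_z_score(observed, collocate_total, node_total, corpus_size, span):
    """Z-score of one collocate under a simple random-distribution model.

    observed        -- times the collocate occurs inside the context windows
    collocate_total -- times the collocate occurs in the whole text
    node_total      -- times the search word (the node) occurs in the whole text
    corpus_size     -- total number of words in the text
    span            -- window size per hit (words before plus words after)

    Assumed formula, not confirmed as the one the tool uses.
    """
    # Chance of meeting the collocate at any position that is not the node.
    p = collocate_total / (corpus_size - node_total)
    # Expected number of co-occurrences if words were placed at random.
    expected = p * node_total * span
    # Standard deviation of that expectation (binomial model).
    std_dev = math.sqrt(expected * (1 - p))
    return (observed - expected) / std_dev

z = collocate_z_score(observed=8, collocate_total=20,
                      node_total=10, corpus_size=1010, span=10)
print(round(z, 2))  # 4.29
```

In this example the collocate would be expected about 2 times near the node by chance but is observed 8 times, so its z-score is well above zero. A negative z-score would mean the word appears near the node less often than chance predicts.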
|
History
|
|
Pseudocode
|
- Obtain the XML string from a URL or from the user's local disk. If the text is not XML, return an error message.
- Extract the text contained in the user-specified element name and/or attribute name and value.
- Find the user-specified word or pattern together with the user-specified context; this is the concordance.
- If the user selects "sorting by z-score", count the words within the span and the total number of words, then calculate the z-scores.
- Otherwise, sort and count the words of the concordance text.
- Generate the output listing the collocates of the concordance text.
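The steps above can be sketched in Python roughly as follows. This is an illustrative sketch, not the tool's actual implementation: the function name, its arguments, and the simple word tokenizer are assumptions. It covers only a word-based context with frequency or alphabetical sorting, and leaves out attribute filtering and the z-score branch.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

def find_collocates(xml_text, element, pattern, context_len=5, sorting="frequency"):
    """Return (word, count) pairs for the words near each match of `pattern`."""
    # Step 1: reject input that is not well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        raise ValueError(f"input is not well-formed XML: {err}") from err

    # Step 2: collect the words inside every element with the given name.
    words = []
    for node in root.iter(element):
        text = "".join(node.itertext()).lower()
        words.extend(re.findall(r"\w+", text))

    # Step 3: concordance. Take `context_len` words on each side of every hit.
    node_re = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    for i, word in enumerate(words):
        if node_re.fullmatch(word):
            left = words[max(0, i - context_len):i]
            right = words[i + 1:i + 1 + context_len]
            counts.update(left + right)

    # Steps 5 and 6: sort the counted collocates and return them.
    if sorting == "alphabetical":
        return sorted(counts.items())
    # Default: most frequent first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

sample = "<doc><p>The old cat sat. The cat ran home.</p><note>cat food</note></doc>"
print(find_collocates(sample, "p", "cat", context_len=2))
# [('the', 3), ('sat', 2), ('home', 1), ('old', 1), ('ran', 1)]
```

Note that in this sketch a word lying between two nearby hits is counted once for each hit, which is why "sat" is counted twice in the sample.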
|
|
Ways of Using
|
- Enter a valid URL in the URL field, or upload a local XML text.
- Enter a valid XML element name or XPath that exists in the XML text; the default is "//".
- Optionally, enter a valid attribute name and/or attribute value. If you enter an attribute value, you must enter an attribute name as well.
- Enter a word or pattern in the corresponding text field.
- Select whether or not to use XML element names as context. If you ignore element tags, select the context type of the concordance and the context length. If you use tags as context, select either the closest tag or a user-given tag, and enter a tag name if you select the user-given tag.
- Select the collocate sorting criterion.
- Select the output format.
- If you want the results displayed in the same window with the taporware interface, uncheck the "Open results in new window" check box.
- Finally, click the "Submit" button.
|
|
CGI Interface
|
If you want to use this tool from your own web site, use the following CGI interface.
(Note: To upload a local XML text to the tool, the form tag must include the attribute enctype="multipart/form-data".)
The parameters are:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local XML text) |
| xmlurl | | text | | A valid URL pointing to an XML text |
| localFile | | file | | The path to your local XML text file |
| xmlpath | | text | // | A valid XML element (tag) name, or multiple XML element names separated by commas |
| attr_name | | text | | A valid XML attribute name |
| attr_value | | text | | A valid XML attribute value |
| pattern | | text | | The word or pattern to search for in the concordance |
| dispop | 1/2 | radio button | 1 | Lets the user select the context type: either ignore element tags or use the tags as context |
| notags | 1/2/3 | selection | Words (1) | Context unit when tags are ignored: words (1), lines (2), or sentences (3) |
| ctlen | | text | 5 | Context length in the selected unit, when tags are ignored |
| showtag | 1/2 | radio button | 1 | When tags are used as context: use the closest tag (1) or a specified element containing the text (2) |
| surtag | | text | | The tag to use as context, when tags are used as context |
| sorting | 1/2/3 | selection | by frequency (1) | Sorting order: by frequency (1), alphabetically (2), or by z-score (3) |
| HowToList | 1/2/3/4 | selection | 1 | Display format: HTML (1), XML text in HTML (2), XML tree (3), or tab-delimited text (4) |
| taporface | | checkbox | checked | Display the result in a new window without the graphical interface (default), or in the same window with the taporware interface |
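As an illustration of how the parameters in the table fit together, the following Python sketch assembles a query string for a URL-based request. Several things here are assumptions: the address of the CGI script is not given on this page, so CGI_URL is a placeholder; the function name is hypothetical; and it is assumed that the script accepts these parameters in a GET request. Uploading a local file would instead need a multipart/form-data POST, as noted above.

```python
from urllib.parse import urlencode

# Placeholder only: this page does not give the address of the CGI script.
CGI_URL = "http://example.org/cgi-bin/collocation"

def build_collocation_query(xml_url, pattern, context_len=5, sorting=1, out_format=1):
    """Build a request URL from the CGI parameters listed in the table."""
    params = {
        "source": "url",          # read the text from a URL, not an upload
        "xmlurl": xml_url,
        "xmlpath": "//",          # default element selection
        "pattern": pattern,
        "dispop": 1,              # assumed to mean "ignore element tags"
        "notags": 1,              # context measured in words
        "ctlen": context_len,     # number of words of context
        "sorting": sorting,       # 1 = by frequency
        "HowToList": out_format,  # 1 = HTML output
    }
    # urlencode percent-escapes reserved characters such as "/" and ":".
    return CGI_URL + "?" + urlencode(params)

print(build_collocation_query("http://example.org/text.xml", "love"))
```

Keeping the parameters in a dictionary makes it easy to switch to the tag-based context by replacing notags and ctlen with showtag and surtag.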
|
|
Web Service Interface
|
Taporware provides web services to any non-profit organization. Here is the taporware web service information:
- Endpoint URL: http://taporware.mcmaster.ca:9982
- Service URI: http://taporware.mcmaster.ca/~taporware/webservice
- Service Method: find_Collocation_XML
- Parameters:
- xmlInput -- any XML string
- element -- any valid XML element name in the input text
- attributeName -- any valid XML attribute name in the input text
- attributeValue -- the attribute value corresponding to the attribute name above in the input text
- pattern -- a pattern in Unix style or a regular expression
- contextOption -- the context type: 1 (ignore tags) or 2 (use tags)
- optionSelection1 -- the meaning of this parameter depends on contextOption. If contextOption is 1, the values 1/2/3 select a context of words/lines/sentences (ignore tags). If contextOption is 2, the values 1/2 select the closest element/a user-specified element (use tags).
- optionSelection2 -- the value of this parameter depends on the two parameters above. If contextOption is 1, enter a number giving the context length in words/lines/sentences (according to the optionSelection1 value 1/2/3). If contextOption is 2 and optionSelection1 is also 2, give this parameter a string naming the surrounding XML element to use as the context.
- sorting -- the sorting criterion. The values 1/2/3 correspond to sorting by frequency/alphabetically/by z-score
- outFormat -- same values as the "HowToList" parameter in the CGI interface above
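Because optionSelection1 and optionSelection2 change meaning with contextOption, it can help to check the combination before calling find_Collocation_XML. The Python sketch below only validates and orders the arguments. It is a hypothetical helper: the function name is invented, the page does not say which protocol the service uses, and the argument order is assumed to follow the list above.

```python
def build_service_args(xml_input, pattern, context_option=1, option1=1, option2="5",
                       element="", attribute_name="", attribute_value="",
                       sorting=1, out_format=1):
    """Check the context options and return (name, value) pairs in list order."""
    if context_option == 1:
        # Ignore tags: option1 picks words/lines/sentences, option2 is a length.
        if option1 not in (1, 2, 3):
            raise ValueError("optionSelection1 must be 1, 2 or 3 when ignoring tags")
        if not str(option2).isdigit():
            raise ValueError("optionSelection2 must be a number when ignoring tags")
    elif context_option == 2:
        # Use tags: option1 picks closest element or user-specified element.
        if option1 not in (1, 2):
            raise ValueError("optionSelection1 must be 1 or 2 when using tags")
        if option1 == 2 and not str(option2).strip():
            raise ValueError("optionSelection2 must name the surrounding element")
    else:
        raise ValueError("contextOption must be 1 or 2")

    if sorting not in (1, 2, 3):
        raise ValueError("sorting must be 1, 2 or 3")

    return [
        ("xmlInput", xml_input),
        ("element", element),
        ("attributeName", attribute_name),
        ("attributeValue", attribute_value),
        ("pattern", pattern),
        ("contextOption", context_option),
        ("optionSelection1", option1),
        ("optionSelection2", option2),
        ("sorting", sorting),
        ("outFormat", out_format),
    ]

# Five words of context on each side, sorted by frequency.
args = build_service_args("<doc><p>some text</p></doc>", "text")
```

The resulting pairs can then be passed to whatever client library is used to reach the endpoint.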
|
|
Known Bugs
To Do
|