Find Text — Co-occurrence
See
http://taporware.mcmaster.ca/~taporware/textTools/cooccur.shtml
Description
This tool finds places where two patterns occur within a given distance of each other. You enter a primary and a secondary pattern,
and TAPoR searches the document for every place where the two patterns fall within the user-specified number of words, sentences, or lines.
Pseudocode
- Obtain the text from a URL or from the user's local disk. If the text is HTML or XML, strip all the tags
- Find each occurrence of the user-specified primary pattern together with its user-specified context (the concordance)
- Keep only the concordances that contain the secondary pattern
- Generate the output: the concordances, plus lists of the words that appear before and after the user-specified word/pattern in the concordance text
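The steps above can be sketched in Python as follows. This is only an illustration, not the TAPoRware implementation: the function names are hypothetical, tags are stripped with a crude regular expression, the context is measured in whitespace-separated words only, and each pattern is tested against one word at a time.

```python
import re
from collections import Counter
from html import unescape


def strip_tags(text):
    """Crudely remove anything that looks like an HTML/XML tag."""
    return unescape(re.sub(r"<[^>]+>", " ", text))


def cooccurrences(text, primary, secondary, length=5):
    """Return (left, match, right) concordances of `primary` whose
    context of `length` words on either side contains `secondary`."""
    words = strip_tags(text).split()
    primary_re = re.compile(primary)
    secondary_re = re.compile(secondary)
    hits = []
    for i, word in enumerate(words):
        if not primary_re.search(word):
            continue
        left = words[max(0, i - length):i]
        right = words[i + 1:i + 1 + length]
        # Keep the concordance only if the secondary pattern is in the context.
        if any(secondary_re.search(w) for w in left + right):
            hits.append((left, word, right))
    return hits


def word_lists(hits):
    """Count the words seen before and after the primary pattern."""
    before = Counter(w for left, _, _ in hits for w in left)
    after = Counter(w for _, _, right in hits for w in right)
    return before, after
```

For example, calling `cooccurrences(page_text, "fox", "dog", length=5)` keeps only those occurrences of "fox" that have "dog" within five words on either side, and `word_lists` then summarizes the surrounding vocabulary.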
Ways of Using
- Enter a valid URL in the URL field or enter a local path to upload the source text
- Enter the primary pattern in the primary pattern field
- Enter the secondary pattern in the co-pattern field
- Select the context of concordance and the length of context
- Select output format
- If you want the results displayed in the same window with the TAPoRware interface, uncheck the "Open results in new window" check box
- Finally, click the "Submit" button
CGI Interface
If you want to use this tool from your web site, here is the CGI interface.
(Note: To upload a local source text to the tool, the form tag must include the attribute enctype="multipart/form-data".)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local HTML text) |
| texturl | | text | | A valid URL pointing to an HTML document |
| localFile | | file | | The path to your local HTML text file |
| primary | | text | | Primary pattern of the concordance |
| co_pattern | | text | | Secondary pattern of the concordance |
| context | Word/Line/Sentence/Paragraph | selection | Word | Context type |
| contLength | | text | 5 | Context length, in units of the selected context type |
| HowToList | 1/2/3 | selection | 2 | Display format: HTML (1), XML text in HTML (2), or XML tree (3) |
| taporface | | checkbox | checked | Display the result in a new window without the graphical interface (default), or in the same window with the TAPoRware interface |
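As a rough illustration of how these parameters fit together, the Python sketch below assembles a POST request for the CGI script without sending it. The function name `build_request` is hypothetical, the script address is the one used in the form on this page, and it is an assumption that the script accepts a URL-encoded body when the source is a URL (the form on this page submits multipart/form-data).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Script address taken from the form example on this page.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tcooccur.cgi"

CONTEXT_TYPES = ("Word", "Line", "Sentence", "Paragraph")


def build_request(text_url, primary, co_pattern,
                  context="Word", cont_length=5, how_to_list=2):
    """Build (but do not send) a POST request using the documented parameters."""
    if context not in CONTEXT_TYPES:
        raise ValueError("context must be one of " + "/".join(CONTEXT_TYPES))
    if how_to_list not in (1, 2, 3):
        raise ValueError("HowToList must be 1, 2, or 3")
    fields = {
        "source": "url",
        "texturl": text_url,
        "primary": primary,
        "co_pattern": co_pattern,
        "context": context,
        "contLength": cont_length,
        "HowToList": how_to_list,
    }
    # Assumption: a URL-encoded body is acceptable for URL sources.
    body = urlencode(fields).encode("ascii")
    return Request(CGI_URL, data=body, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would submit the query. Uploading a local file through `localFile` would instead need a multipart/form-data body, which this sketch does not cover.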
Use Find Text -- Co-occurrence TAPoRware Tool in Your Web Page
You can add two text fields and a button to your web page that call the TAPoRware CGI script to find, within that page,
the co-occurrences of the primary and secondary patterns that a visitor enters.
Here is the code that you can cut and paste into your web pages:
<form method="post" name="textForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tcooccur.cgi" onsubmit="document.textForm.texturl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="texturl" />
<input type="hidden" name="freetext" value="yes"/>
Pattern: <input type="text" name="primary" /><br>
Co-pattern: <input type="text" name="co_pattern" />
<input type="hidden" name="context" value="Word" />
<input type="hidden" name="contLength" value="5" />
<input type="hidden" name="HowToList" value="1" />
<input type="hidden" name="taporface" value="same" />
<input type="submit" name="doIt" value="Get Co-occurrence of the Page" />
</form>
Web Service Interface
TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web service information:
- Endpoint URL: http://taporware.mcmaster.ca:9982
- Service URI: http://taporware.mcmaster.ca/~taporware/webservice
- Service Method: find_Cooccurrence_Plain
- Parameters:
  - textInput -- any text source. If the text format is HTML or XML, all the tags will be stripped
  - pattern -- primary pattern, as a Unix-style pattern or a regular expression
  - copattern -- secondary pattern, as a Unix-style pattern or a regular expression
  - context -- 1, 2, 3, or 4, corresponding to Words, Lines, Sentences, and Paragraphs respectively
  - contextLength -- length of the context
  - outFormat -- html, xml, or any other value; any other value gives XML text in HTML
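The sketch below shows one way a client might assemble the arguments for find_Cooccurrence_Plain before handing them to whatever web service library it uses. This page does not state the wire protocol, so the sketch stops short of making a call. The helper name `cooccurrence_params` and its default values are assumptions (the defaults mirror the CGI form defaults); only the parameter names and the context codes come from the list above.

```python
# Context codes as documented for the `context` parameter.
CONTEXT_CODES = {"words": 1, "lines": 2, "sentences": 3, "paragraphs": 4}


def cooccurrence_params(text_input, pattern, copattern,
                        context="words", context_length=5, out_format="html"):
    """Return the documented parameters as a dict, with the context
    name translated into its numeric code."""
    key = context.lower()
    if key not in CONTEXT_CODES:
        raise ValueError("context must be one of: " + ", ".join(CONTEXT_CODES))
    if int(context_length) < 1:
        raise ValueError("context_length must be a positive integer")
    return {
        "textInput": text_input,
        "pattern": pattern,
        "copattern": copattern,
        "context": CONTEXT_CODES[key],
        "contextLength": int(context_length),
        # Any value other than "html" or "xml" yields XML text in HTML.
        "outFormat": out_format,
    }
```

Translating the context name in one place keeps callers from having to remember which number stands for which unit.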
Known Bugs
To Do
--
LianYan - 28 Mar 2007