Find Text — Co-occurrence
See
http://taporware.mcmaster.ca/~taporware/htmlTools/cooccur.shtml
Description
This tool looks for two patterns that occur within a given distance of one another. Given a primary and a secondary pattern, TAPoRware searches the document for occurrences of the primary pattern that have the secondary pattern within a user-specified number of words, sentences, or lines. If desired, the search can be restricted to text found within certain tags.
Term Definition
- Co-occurrence can either mean concurrence / coincidence or, in a more specific sense, the above-chance frequent occurrence of two terms from a text corpus alongside each other in a certain order. Co-occurrence in this linguistic sense can be interpreted as an indicator of semantic proximity or an idiomatic expression. In contrast to collocation, co-occurrence assumes interdependency of the two terms.
- pattern: A sequence of characters used either with regular expression notation or for path name expansion, as a means of selecting various character strings or path names, respectively. Values are matched against patterns to decide whether they should be included or excluded. In path-name-style patterns, "*" matches any string and "?" matches any single character.
- context: the text that occurs before and after a piece of text (or a pattern in this case).
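The wildcard notation in the pattern definition can be tried out with Python's standard `fnmatch` module, which translates shell-style patterns into regular expressions. This is only a sketch of the notation, not TAPoRware's own matching code; the helper name `matches` and the case-insensitive comparison are assumptions made for the example.

```python
import fnmatch
import re

def matches(pattern: str, word: str) -> bool:
    """Return True if `word` matches the shell-style `pattern` (hypothetical helper)."""
    # fnmatch.translate rewrites "*" as "any string" and "?" as "any single
    # character", and anchors the expression so the whole word must match.
    regex = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    return regex.match(word) is not None

print(matches("lov*", "loving"))  # "*" absorbs "ing"
print(matches("l?ve", "love"))    # "?" stands for the single letter "o"
print(matches("l?ve", "leave"))   # fails: "?" cannot stand for two letters
```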
Predefined Parameter Values in Tool Bar
- Source: the page the user is currently on.
- Element: body or set by site owner
- Context: words
- Context length: 5
- Display format: HTML
Pseudocode
- Obtain the HTML string from a URL or from the user's local disk
- Obtain the text contained in the user-specified tags
- Find the user-specified primary pattern together with the user-specified context (the concordance)
- Extract the concordances that contain the secondary pattern
- Generate output of the concordance and of the word lists before and after the user-specified word/pattern in the concordance text
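The steps above can be sketched in Python for the simplest case, where the context is counted in words. This is an illustration under stated assumptions, not the TAPoRware implementation: the class `TagText` and the function `cooccurrences` are hypothetical names, both patterns are treated as regular expressions that must match a single whole word, and the "lines" and "sentences" context types are not covered.

```python
import re
from html.parser import HTMLParser

class TagText(HTMLParser):
    """Collect the text found inside any of the requested tags."""
    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.depth = 0       # greater than 0 while inside a requested tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)

def cooccurrences(html, primary, secondary, tags=("body",), length=5):
    """Return (before, word, after) for each primary hit with the secondary pattern nearby."""
    parser = TagText(tags)
    parser.feed(html)
    parser.close()
    words = re.findall(r"\w+", " ".join(parser.chunks))
    prim = re.compile(primary, re.IGNORECASE)
    sec = re.compile(secondary, re.IGNORECASE)
    hits = []
    for i, word in enumerate(words):
        if not prim.fullmatch(word):
            continue
        # The concordance: up to `length` words on each side of the hit.
        before = words[max(0, i - length):i]
        after = words[i + 1:i + 1 + length]
        # Keep only concordances that also contain the secondary pattern.
        if any(sec.fullmatch(w) for w in before + after):
            hits.append((before, word, after))
    return hits

sample = "<html><body><p>The quick brown fox jumps over the lazy dog.</p></body></html>"
for before, word, after in cooccurrences(sample, "fox", "lazy", length=4):
    print(" ".join(before), f"[{word}]", " ".join(after))
```

With a context length of 4 the word "lazy" falls inside the window around "fox", so one concordance is reported; with a length of 3 it falls outside and nothing is returned.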
Ways of Using
- Enter a valid URL in the URL field, or upload a local HTML file
- Enter a valid HTML tag, or a list of tags separated by commas; the default is "body"
- Enter the primary pattern in the primary pattern field
- Enter the secondary pattern in the secondary pattern field
- Select the context type of the concordance and the length of the context
- Select the output format
- If you want the results displayed in the same window with the TAPoRware interface, uncheck the "Open results in new window" check box
- Finally, click the "Submit" button
CGI Interface
If you want to use this tool from your web site, here is the CGI Interface:
(Note: If you want to upload local html text to the tool, you need to use the attribute name/value pair enctype="multipart/form-data" within the form tag.)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Selects the input text: either a URL or an uploaded local HTML file |
| htmlurl | | text | | A valid URL pointing to an HTML document |
| localFile | | file | | The path to your local HTML file |
| tagtext | | text | body | A valid HTML element (tag) name, or multiple element names separated by commas |
| primary | | text | | Primary pattern of the concordance |
| copat | | text | | Secondary pattern of the concordance |
| context | 1/2/3 | selection | Words (1) | Context type: 1 = Words, 2 = Lines, 3 = Sentences |
| conLeng | | text | 5 | Context length, counted in units of the selected context type |
| finddisp | 1/2/3 | selection | 2 | Display format: 1 = XML text in HTML, 2 = HTML, 3 = XML tree |
| taporface | | checkbox | checked | Displays the result in a new window without the graphical interface (default), or in the same window with the TAPoRware interface |
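As a rough illustration of how these parameters fit together, the following Python sketch assembles a POST request from the table without sending it. The script URL is the form action shown in the sample form on this page; the function name `build_request` is hypothetical, and it is an assumption that the script accepts URL-encoded form data when no file is uploaded. The parameter names follow the table; note that the sample form on this page names the primary field `find_patt`, and which name the script expects is not confirmed here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form action taken from the sample form on this page.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hcooccur.cgi"

def build_request(page_url, primary, secondary, context=1, length=5, display=2):
    """Build (but do not send) a POST request using the table's parameter names."""
    fields = {
        "source": "url",       # read the text from a URL rather than an upload
        "htmlurl": page_url,
        "tagtext": "body",
        "primary": primary,    # name as listed in the table (assumption)
        "copat": secondary,
        "context": context,    # 1 = Words, 2 = Lines, 3 = Sentences
        "conLeng": length,
        "finddisp": display,   # 2 = HTML
    }
    body = urlencode(fields).encode("ascii")
    return Request(CGI_URL, data=body)  # supplying data makes this a POST

req = build_request("http://example.org/page.html", "love", "hate")
print(req.get_method())
print(req.data.decode("ascii"))
```

Passing the returned object to `urllib.request.urlopen` would submit it; that step is left out so the example runs without network access.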
Use Find Text -- Co-occurrence TAPoRware Tool in Your Web Page
You can add text fields and a button to your web page that retrieve the co-occurrences of the patterns entered on that page by calling the TAPoRware CGI script.
Here is the code that you can cut and paste into your web pages:
<table style="border: solid gray 1pt"><tr><td>
<form method="post" name="htmlForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hcooccur.cgi" onsubmit="document.htmlForm.htmlurl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="htmlurl" />
<input type="hidden" name="freetext" value="yes"/>
Pattern: <input type="text" name="find_patt" />
Co-Pattern: <input type="text" name="copat" />
<input type="hidden" name="context" value="1" />
<input type="hidden" name="conLeng" value="5" />
<input type="hidden" name="finddisp" value="1" />
<input type="hidden" name="taporface" value="same" />
<input type="submit" name="doIt" value="Get Co-Patterns" />
</form>
</td></tr></table>
Web Service Interface
TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web service information:
- Endpoint URL: http://taporware.mcmaster.ca:9982
- Service URI: http://taporware.mcmaster.ca/~taporware/webservice
- Service Method: find_Cooccurrence_HTML
- Parameters:
  - htmlInput -- any HTML string
  - htmlTag -- any HTML element (tag) name, or multiple element names separated by commas
  - pattern -- primary pattern in Unix style or as a regular expression
  - copattern -- secondary pattern in Unix style or as a regular expression
  - context -- 1, 2, or 3, corresponding to Words, Lines, and Sentences respectively
  - contextLength -- length of the context
  - outFormat -- same values as the parameter "finddisp" in the CGI interface above
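This page does not state which wire protocol the web service uses. Purely as an illustration, and assuming a SOAP 1.1 RPC-style call in which the service URI acts as the method namespace and the parameters are passed as named child elements, a request body could be assembled as in the Python sketch below. The function name `build_envelope` is hypothetical, and the envelope layout is an assumption that should be checked against the actual service before use.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"            # SOAP 1.1 envelope namespace
SERVICE_URI = "http://taporware.mcmaster.ca/~taporware/webservice"  # from this page
METHOD = "find_Cooccurrence_HTML"                                  # from this page

def build_envelope(params):
    """Serialise a method call as a SOAP 1.1 envelope (assumed protocol)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_URI}}}{METHOD}")
    # One child element per parameter, in the order listed on this page.
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

xml_text = build_envelope({
    "htmlInput": "<html><body>love and hate</body></html>",
    "htmlTag": "body",
    "pattern": "love",
    "copattern": "hate",
    "context": 1,        # 1 = Words
    "contextLength": 5,
    "outFormat": 2,      # same values as "finddisp"
})
print(xml_text)
```

The resulting text would be posted to the endpoint URL listed above; sending is omitted here because the protocol details are unverified.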
Known Bugs
To Do
--
MattPatey - 13 Oct 2005