Find Text — Co-occurrence

See http://taporware.mcmaster.ca/~taporware/textTools/cooccur.shtml

Description

This tool finds places where two patterns occur within a given distance of one another. You enter a primary and a secondary pattern, and the tool searches the document for every place where the two fall within the user-specified number of words, lines, sentences, or paragraphs.

Pseudocode

  • Obtain the text from a URL or from the user's local disk. If the text is HTML or XML, strip all the tags
  • Find each occurrence of the user-specified primary pattern together with its user-specified context (the concordance)
  • Keep only the concordance entries that contain the secondary pattern
  • Output the concordance, along with lists of the words that appear before and after the user-specified word or pattern in the concordance text
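
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TAPoRware implementation: it handles only the word context, strips tags with a crude regular expression, and matches each pattern against whole words. The function names `strip_tags`, `find_cooccurrences`, and `word_lists` are hypothetical.

```python
import re
from collections import Counter
from html import unescape

PUNCT = ".,;:!?\"'()[]"  # punctuation trimmed from a word before matching


def strip_tags(text):
    """Replace HTML/XML tags with spaces and decode entities (crude)."""
    return unescape(re.sub(r"<[^>]+>", " ", text))


def find_cooccurrences(text, primary, co_pattern, cont_length=5):
    """Return (left_words, hit, right_words) for every word matching
    `primary` whose context also contains a word matching `co_pattern`."""
    words = strip_tags(text).split()
    primary_re = re.compile(primary, re.IGNORECASE)
    co_re = re.compile(co_pattern, re.IGNORECASE)
    hits = []
    for i, word in enumerate(words):
        if not primary_re.fullmatch(word.strip(PUNCT)):
            continue
        # Context window: cont_length words on each side of the hit.
        left = words[max(0, i - cont_length):i]
        right = words[i + 1:i + 1 + cont_length]
        if any(co_re.fullmatch(w.strip(PUNCT)) for w in left + right):
            hits.append((left, word, right))
    return hits


def word_lists(hits):
    """Count the words appearing before and after the hits."""
    before = Counter(w.strip(PUNCT).lower() for left, _, _ in hits for w in left)
    after = Counter(w.strip(PUNCT).lower() for _, _, right in hits for w in right)
    return before, after
```

For example, `find_cooccurrences(page_text, "fox", "dog", cont_length=2)` keeps only those occurrences of "fox" that have "dog" within two words on either side.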

Ways of Using

  • Enter a valid URL in the URL field or enter a local path to upload the source text
  • Enter the primary pattern in the primary pattern field
  • Enter the secondary pattern in the co-pattern field
  • Select the context type for the concordance and the context length
  • Select the output format
  • To display the results in the same window, within the TAPoRware interface, uncheck the "Open results in new window" check box
  • Finally, click the "Submit" button

CGI Interface

If you want to use this tool from your own web site, use the CGI interface described here. (Note: to upload local source text to the tool, the form tag must include the attribute enctype="multipart/form-data".)

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local HTML text) |
| texturl | | text | | A valid URL pointing to an HTML document |
| localFile | | file | | The path to your local HTML text file |
| primary | | text | | Primary pattern of the concordance |
| co_pattern | | text | | Secondary pattern of the concordance |
| context | Word/Line/Sentence/Paragraph | selection | Word | Context type |
| contLength | | text | 5 | Context length, in units of the selected context type |
| HowToList | 1/2/3 | selection | 2 | Display format: HTML, XML text in HTML, or XML tree, in the order of the parameter values |
| taporface | | checkbox | checked | Display the result in a new window without the graphical interface (default), or in the same window with the TAPoRware interface |
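
As a rough illustration of how these parameters fit together, the Python sketch below assembles the form fields for a URL source and encodes them as multipart/form-data. It is a sketch under assumptions, not a tested client: the helper names `build_fields` and `encode_multipart` are hypothetical, the script address is the one used in the sample form on this page, and nothing here confirms that the script is still reachable or how it responds. No request is sent.

```python
import uuid

# Script address taken from the sample form on this page.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tcooccur.cgi"
CONTEXTS = ("Word", "Line", "Sentence", "Paragraph")


def build_fields(texturl, primary, co_pattern,
                 context="Word", cont_length=5, how_to_list=2):
    """Collect the CGI parameters for a URL source, using the table defaults."""
    if context not in CONTEXTS:
        raise ValueError(f"context must be one of {CONTEXTS}")
    if how_to_list not in (1, 2, 3):
        raise ValueError("HowToList must be 1, 2 or 3")
    return {
        "source": "url",
        "texturl": texturl,
        "primary": primary,
        "co_pattern": co_pattern,
        "context": context,
        "contLength": str(cont_length),
        "HowToList": str(how_to_list),
    }


def encode_multipart(fields, boundary=None):
    """Encode plain text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value)."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",
            value,
        ]
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"
```

The resulting body and content type could then be passed to `urllib.request.Request(CGI_URL, data=body, headers={"Content-Type": content_type})` to build a POST request.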

Use Find Text -- Co-occurrence TAPoRware Tool in Your Web Page

You can add two text fields and a button to your web page that call the TAPoRware CGI script and return the co-occurrences, within that page, of the primary and secondary patterns you enter.

Here is code that you can cut and paste into your web pages:

<form method="post" name="textForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tcooccur.cgi" onsubmit="document.textForm.texturl.value=document.location.href">

<input type="hidden" name="source" value="url" />

<input type="hidden" name="texturl" />

<input type="hidden" name="freetext" value="yes"/>

Pattern: <input type="text" name="primary" /><br>

Co-pattern: <input type="text" name="co_pattern" />

<input type="hidden" name="context" value="Word" />

<input type="hidden" name="contLength" value="5" />

<input type="hidden" name="HowToList" value="1" />

<input type="hidden" name="taporface" value="same" />

<input type="submit" name="doIt" value="Get Co-occurrence of the Page" />

</form>

Web Service Interface

TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web services information:

  • Endpoint URL: http://taporware.mcmaster.ca:9982
  • Service URI: http://taporware.mcmaster.ca/~taporware/webservice
  • Service Method: find_Cooccurrence_Plain
  • Parameters:
    • textInput -- any text source. If the text is HTML or XML, all the tags will be stripped
    • pattern -- primary pattern, as a Unix-style pattern or a regular expression
    • copattern -- secondary pattern, as a Unix-style pattern or a regular expression
    • context -- one of 1/2/3/4, corresponding to Words/Lines/Sentences/Paragraphs respectively
    • contextLength -- length of the context
    • outFormat -- one of html/xml/others, where others gives XML text in HTML
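
The parameter list above can be prepared in Python along these lines. This sketch only assembles and validates the arguments for find_Cooccurrence_Plain; the wire protocol is not stated on this page, so no call is made. The helper name `cooccurrence_args`, the readable context names, the default values, and the choice to pass context as an integer are all assumptions for illustration.

```python
# Context codes as listed above: 1/2/3/4 = Words/Lines/Sentences/Paragraphs.
CONTEXT_CODES = {"words": 1, "lines": 2, "sentences": 3, "paragraphs": 4}


def cooccurrence_args(text_input, pattern, copattern,
                      context="words", context_length=5, out_format="html"):
    """Build the arguments for find_Cooccurrence_Plain, in the listed order.

    Whether the service expects the context code as a number or a string
    is not documented here; an integer is assumed."""
    try:
        code = CONTEXT_CODES[context.lower()]
    except KeyError:
        raise ValueError(f"context must be one of {sorted(CONTEXT_CODES)}")
    if context_length < 1:
        raise ValueError("contextLength must be positive")
    # outFormat is not validated: any value other than html or xml
    # falls under "others" (XML text in HTML).
    return {
        "textInput": text_input,
        "pattern": pattern,
        "copattern": copattern,
        "context": code,
        "contextLength": context_length,
        "outFormat": out_format,
    }
```

The returned dictionary keeps the parameters in the order listed above, so `list(args.values())` gives them as positional arguments if the client library expects that.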

Known Bugs

To Do

-- LianYan - 28 Mar 2007

