Find co-occurring words

See http://taporware.mcmaster.ca/~taporware/xmlTools/cooccur.shtml

Description

This tool finds places where two words occur within a given distance of one another. Given a primary and a secondary pattern, TAPoR searches the document for every place where the two patterns fall within the user-specified limit of words, sentences, or lines, or within the same surrounding element.

Pseudocode

  • Obtain XML text by URL or from user's local disk
  • Obtain the text contained in the user-specified elements and/or attribute name/value pairs; the default is the root element. Note: if you specify an attribute value, you must also enter the attribute name.
  • Find the user-specified primary pattern, together with its user-specified context (the concordance), within the selected text
  • Extract the concordances that contain the secondary pattern
  • Generate output of the concordances that contain both patterns
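
The steps above can be sketched in Python roughly as follows. This is only an illustration, not the Taporware implementation: the function names are hypothetical, the context is assumed to be a window of whole words on each side of the primary match, and patterns are treated as Python regular expressions matched against single words.

```python
import re
import xml.etree.ElementTree as ET


def element_texts(xml_string, element=None, attr_name=None, attr_value=None):
    """Yield the text of each selected element (default: the root element)."""
    root = ET.fromstring(xml_string)
    if element is None and attr_name is None:
        nodes = [root]
    else:
        nodes = root.iter(element)  # element=None walks every element
    for node in nodes:
        if attr_name is not None:
            if attr_name not in node.attrib:
                continue
            if attr_value is not None and node.get(attr_name) != attr_value:
                continue
        yield "".join(node.itertext())


def find_cooccurrences(xml_string, primary, secondary, context_len=5, **select):
    """Return concordances of `primary` whose context also matches `secondary`."""
    pri, sec = re.compile(primary), re.compile(secondary)
    hits = []
    for text in element_texts(xml_string, **select):
        words = text.split()
        for i, word in enumerate(words):
            if not pri.search(word):
                continue
            # Assumed context: up to context_len words on each side.
            start = max(0, i - context_len)
            end = i + 1 + context_len
            neighbours = words[start:i] + words[i + 1:end]
            if any(sec.search(w) for w in neighbours):
                hits.append(" ".join(words[start:end]))
    return hits


sample = "<doc><p n='1'>the quick brown fox jumps over the lazy dog</p></doc>"
print(find_cooccurrences(sample, "fox", "dog", element="p"))
```

Shrinking `context_len` to 2 in this example yields no hits, because "dog" is then outside the window around "fox".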

Ways of Using

  • Enter a valid URL that points to an XML file in the URL field, or upload a local XML text (if the file is not XML, an error message is returned).
  • Enter a valid XML element name, or a list of element names separated by commas; the default is "//"
  • Enter the primary pattern in the primary pattern field
  • Enter the secondary pattern in the secondary pattern field
  • Select the type of context for the concordance and the length of the context. Note: there are two kinds of context. One ignores element tags; the other uses element tags as the context.
  • Select output format
  • If you want the results displayed in the same window with the Taporware interface, uncheck the "Open results in new window" check box
  • Finally, click the "Submit" button

CGI Interface

If you want to use this tool from your own web site, use the CGI interface described here. (Note: to upload a local XML text to the tool, the form tag must carry the attribute enctype="multipart/form-data".)

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local XML text) |
| xmlurl | | text | | A valid URL pointing to an XML text |
| localFile | | file | | The path to your local XML text file |
| xmlpath | | text | // | A valid XML element (tag) name, or multiple XML element names separated by commas |
| attr_name | | text | | A valid XML attribute name |
| attr_value | | text | | A valid XML attribute value |
| pripat | | text | | Primary pattern of the concordance |
| copat | | text | | Secondary pattern of the concordance |
| dispop | 1/2 | radio button | 1 | Lets the user select the context type: ignore element tags, or use the tags as context |
| notags | 1/2/3 | selection | 1 (Words) | Context unit when tags are ignored: Words (1), Lines (2), or Sentences (3) |
| ctlen | | text | 5 | Context length, in the selected unit, when tags are ignored |
| showtag | 1/2 | radio button | 1 | When tags are used: use the closest tag as context (1), or use a specified element containing the text as context (2) |
| surtag | | text | | The tag to use as context, when tags are used |
| HowToList | 1/2/3/4 | selection | 1 | Display format: HTML (1), XML text in HTML (2), XML tree (3), or tab-delimited text (4) |
| taporface | | checkbox | checked | Checked (default): display the results in a new window without the graphical interface. Unchecked: display them in the same window with the Taporware interface |
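
As a rough sketch of how a client might assemble these form fields, the Python snippet below builds a URL-encoded query string from the parameter names and defaults listed above. The helper name `build_query` and the sample document URL are hypothetical, the address of the CGI script itself is not given on this page and so is left out, and the value sent by the `taporface` checkbox is unknown, so it is omitted.

```python
from urllib.parse import urlencode

# Defaults taken from the parameter list above.
DEFAULTS = {
    "source": "url",
    "xmlpath": "//",
    "dispop": "1",     # ignore element tags
    "notags": "1",     # context unit: words
    "ctlen": "5",      # context length
    "showtag": "1",
    "HowToList": "1",  # HTML output
}


def build_query(xmlurl, pripat, copat, **overrides):
    """Return a URL-encoded query string for the co-occurrence form."""
    params = dict(DEFAULTS, xmlurl=xmlurl, pripat=pripat, copat=copat)
    params.update(overrides)
    return urlencode(params)


# Hypothetical document URL, for illustration only.
query = build_query("http://example.org/sample.xml", "love", "death", ctlen="10")
print(query)
```

The resulting string can be appended to the form's action URL for a GET request or sent as the body of a POST. Uploading a local file instead would require a multipart/form-data request, which this sketch does not cover.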

Web Service Interface

Taporware provides web services to any non-profit organization. Here is the Taporware web services information:

  • Endpoint URL: http://taporware.mcmaster.ca:9982
  • Service URI: http://taporware.mcmaster.ca/~taporware/webservice
  • Service Method: find_Cooccurrence_XML
  • parameters:
    • xmlInput -- any xml string
    • element -- any valid xml element name in the input text
    • attributeName -- any valid xml attribute name in the input text
    • attributeValue -- attribute value corresponding to the attribute name above in the input text
    • pattern -- primary pattern, given as a Unix-style pattern or a regular expression
    • copattern -- secondary pattern, given as a Unix-style pattern or a regular expression
    • contextOption -- context type: ignore tags or use tags, the values are 1 (ignore tags) and 2 (use tags)
    • optionSelection1 -- the meaning and value of this parameter depend on the value of contextOption. If contextOption is 1, the values 1/2/3 select a context of words/lines/sentences (ignore tags). If contextOption is 2, the values 1/2 select the closest element/a user-specified element (use tags).
    • optionSelection2 -- the value of this parameter depends on the two parameters above. If contextOption is 1, enter a number giving the context length in words/lines/sentences (according to optionSelection1 being 1/2/3). If contextOption is 2 and optionSelection1 is also 2, give this parameter a string naming the surrounding XML element to use as the context.
    • outFormat -- values are same as parameter "HowToList" in the CGI interface above
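
Because contextOption, optionSelection1, and optionSelection2 depend on one another, a client may want to check the combination before calling the service. The Python function below is a minimal sketch of that check based only on the rules stated above; the function name is hypothetical, and treating the context length as a positive integer is an assumption.

```python
def describe_context(context_option, option_selection1, option_selection2=None):
    """Validate the three context parameters and describe the chosen context."""
    if context_option == 1:  # ignore tags
        units = {1: "words", 2: "lines", 3: "sentences"}
        if option_selection1 not in units:
            raise ValueError("optionSelection1 must be 1, 2, or 3 when contextOption is 1")
        # Assumption: the context length is a positive integer.
        if not isinstance(option_selection2, int) or option_selection2 <= 0:
            raise ValueError("optionSelection2 must be a positive context length")
        return f"{option_selection2} {units[option_selection1]}"
    if context_option == 2:  # use tags
        if option_selection1 == 1:
            return "closest element"
        if option_selection1 == 2:
            if not isinstance(option_selection2, str) or not option_selection2:
                raise ValueError("optionSelection2 must name the surrounding element")
            return f"element <{option_selection2}>"
        raise ValueError("optionSelection1 must be 1 or 2 when contextOption is 2")
    raise ValueError("contextOption must be 1 or 2")


print(describe_context(1, 3, 2))
print(describe_context(2, 2, "p"))
```

The page does not say what the service does with optionSelection2 when contextOption is 2 and optionSelection1 is 1, so the sketch simply ignores it in that case.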

Known Bugs

To Do

-- MattPatey - 13 Oct 2005

