Find Text — Concordance

See http://taporware.mcmaster.ca/~taporware/textTools/findtext.shtml

Description

This tool finds a word, phrase, or pattern anywhere in a text document. The results can also be viewed as a concordance, with the surrounding context measured in words, lines, sentences, or paragraphs.

History

  • Created by Lian Yan in 2004.
  • Adapted to the TAPoR portal in 2005.

Pseudocode

  • Obtain the source text string from a URL or from the user's local disk
  • If the source text format is HTML or XML, strip off all the tags
  • Find the user-specified word/pattern along with the user-specified context -- the concordance
  • Generate a sparkline showing the distribution of the word/pattern across successive 5% chunks of the text
  • Output the concordance and/or the lists of words that occur before and after the user-specified word/pattern in the concordance text
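
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the TAPoRware source: the function names are hypothetical, the context is counted in whitespace-separated words only, and the pattern is matched against one word at a time, so multi-word phrases are not handled.

```python
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and drops every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    """Return the text content of an HTML/XML string with the tags removed."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


def concordance(text, pattern, context_len=5):
    """Return (words_before, match, words_after) for each matching word."""
    words = text.split()
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for i, word in enumerate(words):
        if regex.search(word):
            before = words[max(0, i - context_len):i]
            after = words[i + 1:i + 1 + context_len]
            hits.append((before, word, after))
    return hits


def distribution(text, pattern, chunks=20):
    """Count matches per chunk; 20 chunks gives one count per 5% of the text."""
    words = text.split()
    regex = re.compile(pattern, re.IGNORECASE)
    counts = [0] * chunks
    for i, word in enumerate(words):
        if regex.search(word):
            # Map the word position onto its chunk index (always < chunks).
            counts[i * chunks // len(words)] += 1
    return counts
```

The list returned by `distribution` holds the values a sparkline would plot, and the `before`/`after` lists in each hit correspond to the word lists on either side of the pattern.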

Ways of Using

  • Enter a valid URL in the URL field, or enter a local path to upload the source text
  • Enter the word/pattern in the corresponding text field
  • Select the context of the concordance and enter an integer as the length of the context
  • Select the output format
  • If you want to display the word lists before and after the word/pattern, check the "Display words before and after the pattern" box. This applies to the HTML format only
  • If you want the results displayed in the same window with the TAPoRware interface, uncheck the "Open results in new window" check box
  • Finally, click the "Submit" button
  • On the result page, you can click any word of the concordance to get its concordance without going back to the tool page.

CGI Interface

If you want to use this tool from your web site, here is the CGI interface. (Note: to upload a local source text to the tool, the form tag must include the attribute enctype="multipart/form-data".)

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local text) |
| texturl | | text | | A valid URL that points to plain text, HTML, or XML text |
| localFile | | file | | The path to your local source text file |
| find_pattern | | text | | Word/pattern of the concordance |
| context | Word/Line/Sentence/paragraph | selection | Word | Context type |
| contLength | | text | 5 | Context length corresponding to the selected context |
| HowToList | 1/2/3 | selection | 2 | Display format: HTML, XML text in HTML, or XML tree, in the order of the parameter values |
| beforeafter | | checkbox | unchecked | Indicates whether the word lists before and after the key word/pattern are displayed |
| taporface | | checkbox | checked | Displays the result in a new window without the graphical interface (default), or with the TAPoRware interface in the same window |
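
As a rough sketch, the parameters above could also be sent from a script rather than a form. The Python example below only builds the request and does not send it. The script URL is the one used in the sample form on this page. The helper name `build_findtext_request` is hypothetical, and it is an assumption that the script accepts ordinary URL-encoded form data when `source` is `url` (the sample form uses multipart/form-data).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Script URL as it appears in the sample form on this page.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tfindtext.cgi"


def build_findtext_request(text_url, pattern, context="Word",
                           cont_length=5, how_to_list=2):
    """Build (but do not send) a POST request carrying the CGI parameters."""
    fields = {
        "source": "url",            # read the text from a URL, not an upload
        "texturl": text_url,
        "find_pattern": pattern,
        "context": context,         # Word/Line/Sentence/paragraph
        "contLength": str(cont_length),
        "HowToList": str(how_to_list),
    }
    # Assumption: URL-encoded bodies are accepted for source=url.
    body = urlencode(fields).encode("ascii")
    return Request(CGI_URL, data=body, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would submit it, provided the service is reachable.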

Use Find Text -- Concordance TAPoRware Tool in Your Web Page

You can add a text field and a button to your web page that return the concordance of a pattern entered on that page by calling the TAPoRware CGI script.

Here is the code that you can copy and paste into your web pages:

<table style="border: solid gray 1pt"><tr><td>
<form method="post" name="textForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tfindtext.cgi" onsubmit="document.textForm.texturl.value=document.location.href">
  <input type="hidden" name="source" value="url" />
  <input type="hidden" name="texturl" />
  <input type="hidden" name="freetext" value="yes"/>
  Pattern: <input type="text" name="find_pattern" />
  <input type="hidden" name="context" value="Word" />
  <input type="hidden" name="contLength" value="5" />
  <input type="hidden" name="HowToList" value="1" />
  <input type="hidden" name="taporface" value="same" />
  <input type="submit" name="doIt" value="Get Concordance of the Page" />
</form>
</td></tr></table>

Web Service Interface

TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web services information:

  • Endpoint URL: http://taporware.mcmaster.ca:9982
  • Service URI: http://taporware.mcmaster.ca/~taporware/webservice
  • Service Method: find_Concordance_Plain
  • parameters:
    • textInput -- any plain text, HTML, or XML string
    • pattern -- Unix-style pattern or regular expression
    • context -- value can be 1/2/3/4, corresponding to Words/Lines/Sentences/Paragraph respectively
    • contextLength -- length of the context
    • outFormat -- values are the same as for the parameter "HowToList" in the CGI interface above
    • bf -- a boolean indicating whether the word lists before and after the word/pattern are displayed. "Y" is true
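
The page does not state the wire protocol. Assuming the service speaks SOAP 1.1 in RPC style, a request body for the method and parameters listed above might be assembled as in the Python sketch below. The helper name `build_envelope`, the parameter order, and the value "N" for a false `bf` are assumptions; only "Y" is documented. The sketch builds the XML and does not contact the endpoint.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace
SERVICE_URI = "http://taporware.mcmaster.ca/~taporware/webservice"


def build_envelope(text_input, pattern, context=1, context_length=5,
                   out_format=2, bf=False):
    """Return a SOAP 1.1 envelope string calling find_Concordance_Plain."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_URI}}}find_Concordance_Plain")
    params = [
        ("textInput", text_input),
        ("pattern", pattern),
        ("context", context),              # 1/2/3/4 = Words/Lines/Sentences/Paragraph
        ("contextLength", context_length),
        ("outFormat", out_format),         # same values as HowToList
        ("bf", "Y" if bf else "N"),        # "N" for false is an assumption
    ]
    for name, value in params:
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

ElementTree escapes any markup inside `textInput`, so an HTML or XML source string travels as element text instead of being parsed as part of the envelope.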

Known Bugs

  • When used through the TAPoR Portal on plain text containing certain Microsoft non-printing characters, the tool fails. The TAPoRware implementation works.
  • Microsoft accented characters don't display or work properly.

To Do

  • Fix it so that it can handle Microsoft extended ASCII characters.

-- LianYan - 07 Dec 2005

