Main.TAPoRwarePlainListWords (r1.1 vs. r1.6)
Diffs

 <<O>>  Difference Topic TAPoRwarePlainListWords (r1.6 - 22 May 2008 - LianYan)

META TOPICPARENT TAPoRware

List Words

See http://taporware.mcmaster.ca/~taporware/textTools/listword.shtml
Line: 71 to 71

Changed:
<
<
>
>

Line: 93 to 93

<input type="hidden" name="sorting" value="2" />

Added:
>
>
<input type="hidden" name="sparkline" value="10" />

<input type="hidden" name="display" value="1" />

<input type="hidden" name="taporface" value="same" />


 <<O>>  Difference Topic TAPoRwarePlainListWords (r1.5 - 15 May 2008 - LianYan)

META TOPICPARENT TAPoRware

List Words

See http://taporware.mcmaster.ca/~taporware/textTools/listword.shtml
Line: 58 to 58

| sparkline | none/5/10/20/50 | selection | none | select the number of high frequency words to generate sparkline word distributions over each 5% chunk of text |
| taporface | | checkbox | checked | display result in a new window without graphics interface (default) or with taporware interface in the same window |
Added:
>
>

Use List Words TAPoRware Tool in Your Web Page

You can add a button to your web page that lists all the words on that page by calling the TAPoRware CGI script.

Here is the code for this function:

<form method="post" name="textForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tlistwordstem.cgi" onsubmit="document.textForm.texturl.value=document.location.href">

<input type="hidden" name="source" value="url" />

<input type="hidden" name="texturl" />

<input type="hidden" name="freetext" value="yes"/>

<input type="hidden" name="range" value="all" />

<input type="hidden" name="sorting" value="2" />

<input type="hidden" name="display" value="1" />

<input type="hidden" name="taporface" value="same" />

<input type="submit" name="doIt" value="List All Words of the Page" />

</form>


Web Service Interface

Taporware provides web services to any non-profit organization. Here is the taporware web services information:

 <<O>>  Difference Topic TAPoRwarePlainListWords (r1.4 - 07 Jun 2007 - LianYan)

META TOPICPARENT TAPoRware

List Words

See http://taporware.mcmaster.ca/~taporware/textTools/listword.shtml
Line: 16 to 16

Pseudocode

  • Obtain text string by URL or from user's local disk
Changed:
<
<
  • If the text format is xml or html, strip off all the tags
>
>
  • If the text format is XML or HTML, strip off all the tags

  • Tokenize text into words using taporware tokenizer
  • Apply stemmer if user selects it
  • Sort and count words with the capitalization ignored
Line: 30 to 30

  • Select sorting criterion
  • Check the "Apply inflectional stemmer" box if you want to apply the stemmer
  • Select output format
Changed:
<
<
  • Select a number in the "Display top ..." selection control if you want to see the top freqency words distributions
>
>
  • Select a number in the "Display top ..." selection control if you want to see the top frequency words distributions

  • If you want the results displayed in the same window with taporware interface, uncheck the check box - "Open results in new window"
  • Finally, click the "Submit" button
Added:
>
>
    • In the result page, you can click any word to get its concordance with 5 words of context.
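
The concordance step mentioned above can be illustrated with a minimal keyword-in-context sketch. It assumes that "5 words of context" means up to five words on each side of the hit; the function name `concordance` and its arguments are hypothetical and are not part of the TAPoRware code.

```python
def concordance(words, target, context=5):
    """Return (left context, hit, right context) for every match of target.

    Matching ignores capitalization; `context` is the number of words
    kept on each side (an assumption about the tool's behaviour).
    """
    target = target.lower()
    hits = []
    for i, word in enumerate(words):
        if word.lower() == target:
            left = words[max(0, i - context):i]
            right = words[i + 1:i + 1 + context]
            hits.append((" ".join(left), word, " ".join(right)))
    return hits


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog".split()
    for left, hit, right in concordance(sample, "the"):
        print(f"{left:>30} [{hit}] {right}")
```

Hits near the start or end of the text simply get a shorter context on that side.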

CGI Interface

If you want to use this tool from your web site, here is the CGI Interface:
Line: 42 to 43

Here are the parameters:

Changed:
<
<
| Parameter Name | Parameter Value | Control Type | Default | Discription |
>
>
| Parameter Name | Parameter Value | Control Type | Default | Description |

| source | url/local | radio button | url | Let user select input text (either a url or upload local html text) |
| texturl | | text | | A valid URL pointing plain text, html or xml document |
| localFile | | file | | The path to your local text file |

 <<O>>  Difference Topic TAPoRwarePlainListWords (r1.3 - 28 Mar 2007 - LianYan)

META TOPICPARENT TAPoRware

List Words

See http://taporware.mcmaster.ca/~taporware/textTools/listword.shtml
Line: 6 to 6

Description

Changed:
<
<
Tool can be used to list all of the words found within a given text document. The query results can be displayed alphabetically, by frequency, by order of appearance, or in reversed alphabetical order.
>
>
This tool lists the words found within a given text document in different ways. It can list all words, words matching a pattern, all words except stop words, etc. It can also list words after applying an inflectional stemmer. The results can be sorted alphabetically, by frequency, by order of appearance, or in reversed alphabetical order, and displayed in different formats.

History

Added:
>
>
  1. List all words, words matching pattern, user selected words and all words except user entered stop words
  2. Add Glasgow stop words as the default stop words
  3. Add inflectional stemmer
  4. Add sparkline images for the top frequency words distribution

Pseudocode

Added:
>
>
  • Obtain text string by URL or from user's local disk
  • If the text format is xml or html, strip off all the tags
  • Tokenize text into words using taporware tokenizer
  • Apply stemmer if user selects it
  • Sort and count words with the capitalization ignored
  • Extract words based on user specified criteria if necessary
  • Generate sparkline if user selects a number in the "Display top ..." selection control
  • Generate output format based on user's selection
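
The pseudocode above can be sketched in Python as follows. This is an illustration only, not the TAPoRware implementation: the regular-expression tokenizer stands in for the taporware tokenizer, the stemming, pattern/stop-word filtering and sparkline steps are left out, and the names `strip_tags`, `tokenize` and `list_words` are hypothetical. The `sorting` codes follow the 1/2/3/4 values of the CGI parameter of the same name.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Keeps only the text nodes of an HTML/XML-like document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    # Joining with a space keeps words in adjacent elements apart;
    # a tag in the middle of a word would split that word.
    return " ".join(collector.parts)


def tokenize(text):
    # Stand-in tokenizer: runs of ASCII letters with optional inner apostrophes.
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)


def list_words(markup, sorting=2):
    """Return (word, count) pairs, counted with capitalization ignored."""
    words = [w.lower() for w in tokenize(strip_tags(markup))]
    items = list(Counter(words).items())  # insertion order = first appearance
    if sorting == 1:      # alphabetical
        items.sort(key=lambda kv: kv[0])
    elif sorting == 2:    # by frequency, ties broken alphabetically
        items.sort(key=lambda kv: (-kv[1], kv[0]))
    elif sorting == 3:    # order of first appearance
        pass
    elif sorting == 4:    # reversed alphabetical
        items.sort(key=lambda kv: kv[0], reverse=True)
    else:
        raise ValueError("sorting must be 1, 2, 3 or 4")
    return items


if __name__ == "__main__":
    for word, count in list_words("<p>The cat and the hat.</p><p>The End</p>"):
        print(word, count)
```

The tie-breaking rule for the frequency sort is a choice made for this sketch; the page does not say how the tool orders words with equal counts.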

Ways of Using

Added:
>
>
  • Enter a valid URL in the URL field or enter a local path to upload text
  • Select which list you want to get and enter the corresponding text if necessary
  • Select sorting criterion
  • Check the "Apply inflectional stemmer" box if you want to apply the stemmer
  • Select output format
  • Select a number in the "Display top ..." selection control if you want to see the top freqency words distributions
  • If you want the results displayed in the same window with taporware interface, uncheck the check box - "Open results in new window"
  • Finally, click the "Submit" button

CGI Interface

Added:
>
>
If you want to use this tool from your web site, here is the CGI Interface: (Note: You need to use the attribute name/value pair enctype="multipart/form-data" within the form tag, even if you do not use local file uploading, because the tool was designed to allow it)

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Discription |
| source | url/local | radio button | url | Let user select input text (either a url or upload local html text) |
| texturl | | text | | A valid URL pointing plain text, html or xml document |
| localFile | | file | | The path to your local text file |
| range | all/patt/find/stop | radio button | all | Options that let user select the word list he/she want to see |
| wpat | | text | | A unix styled pattern. This field corresponding to the value "patt" in the radio button group named "range" |
| findstop | typedin/textfile/glasgow | radio button | glasgow | The option are connected with value "find" and "stop" in the radio button group named "range" |
| typedinword | | text | | This text field is corresponding to the value "typedin" of radio button group named "findstop" |
| wordfile | | file | | This field is corresponding to the value "textfile" of radio button group named "findstop" |
| sorting | 1/2/3/4 | selection | 2 | Sorting criteria which are alphabetically/by frequency/by order of first appearance/by reversed alphabetic order in the order of parameter values |
| stem | | checkbox | unchecked | Indicate if inflectional stemmer would be applied |
| display | 1/2/3/4 | selection | 2 | Display format which are XML tags in HTML/HTML/XML tree/tab Delimited Text in the order of parameter values |
| sparkline | none/5/10/20/50 | selection | none | select the number of high frequency words to generate sparkline word distributions over each 5% chunk of text |
| taporface | | checkbox | checked | display result in a new window without graphics interface (default) or with taporware interface in the same window |
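
As a rough illustration of how the parameters above fit together, the following Python sketch validates a set of field values against the table and encodes them as a multipart/form-data body, which is the encoding the note above requires. The helper names `build_fields` and `encode_multipart` are hypothetical. The checkbox parameters (stem, taporface) are left out because the table does not say which values the script expects for them, and nothing is sent over the network.

```python
# Allowed values and defaults, copied from the parameter table.
ALLOWED = {
    "source": {"url", "local"},
    "range": {"all", "patt", "find", "stop"},
    "findstop": {"typedin", "textfile", "glasgow"},
    "sorting": {"1", "2", "3", "4"},
    "display": {"1", "2", "3", "4"},
    "sparkline": {"none", "5", "10", "20", "50"},
}
DEFAULTS = {
    "source": "url",
    "range": "all",
    "sorting": "2",
    "display": "2",
    "sparkline": "none",
}


def build_fields(texturl, **overrides):
    """Merge defaults with overrides and reject values the table does not list."""
    fields = dict(DEFAULTS, texturl=texturl)
    fields.update({name: str(value) for name, value in overrides.items()})
    for name, allowed in ALLOWED.items():
        if name in fields and fields[name] not in allowed:
            raise ValueError(f"{name}={fields[name]!r} is not one of {sorted(allowed)}")
    return fields


def encode_multipart(fields, boundary):
    """Encode plain text fields as a multipart/form-data request body."""
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",
            value,
        ]
    lines += [f"--{boundary}--", ""]
    return "\r\n".join(lines).encode("utf-8")


if __name__ == "__main__":
    fields = build_fields("http://example.org/sample.txt", sorting=1, sparkline=10)
    print(encode_multipart(fields, "sketch-boundary").decode("utf-8"))
```

To submit the body, it would be posted to the CGI script with the header Content-Type: multipart/form-data; boundary=sketch-boundary, for example through urllib.request.Request(url, data=body, headers=...). The URL http://example.org/sample.txt is a placeholder.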

Web Service Interface

Added:
>
>
Taporware provides web services to any non-profit organization. Here is the taporware web services information:

  • Endpoint URL: http://taporware.mcmaster.ca:9982
  • Service URI: http://taporware.mcmaster.ca/~taporware/webservice
  • Service Method: list_Words_Plain
  • parameters:
    • textInput -- any text string. If the text format is HTML or XML, the tags will be stripped
    • listOption -- values are the same as for the parameter "range" in the CGI interface above
    • optionSeletion -- values correspond to the selected "list option"
    • sorting -- values are the same as for the parameter "sorting" in the CGI interface above
    • outFormat -- values are the same as for the parameter "display" in the CGI interface above
  • Note: The service will automatically generate sparkline on the top 10 frequency words' distribution
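
The data behind such a sparkline (the distribution of each top-frequency word over every 5% chunk of the text) can be sketched as below. This is an assumption-laden illustration: chunks are measured in word tokens, which the page does not confirm, and the function name `chunk_distribution` is hypothetical. Drawing the sparkline image itself is not covered.

```python
from collections import Counter


def chunk_distribution(words, top_n=10, chunks=20):
    """Count each of the top_n most frequent words per chunk of the text.

    With chunks=20 every chunk covers 5% of the word tokens. Returns a
    dict mapping each selected word to a list of per-chunk counts.
    """
    words = [w.lower() for w in words]
    top = [w for w, _ in Counter(words).most_common(top_n)]
    series = {w: [0] * chunks for w in top}
    total = len(words)
    for pos, word in enumerate(words):
        if word in series:
            # Integer arithmetic maps position 0..total-1 onto chunk 0..chunks-1.
            series[word][pos * chunks // total] += 1
    return series


if __name__ == "__main__":
    text = ("a b a c " * 10).split()
    for word, counts in chunk_distribution(text, top_n=1).items():
        print(word, counts)
```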

Known Bugs

To Do

Changed:
<
<
-- MattPatey - 13 Oct 2005
>
>
-- LianYan - 28 Mar 2007


 <<O>>  Difference Topic TAPoRwarePlainListWords (r1.2 - 15 Oct 2005 - MattPatey)

META TOPICPARENT TAPoRware
Added:
>
>

List Words

See http://taporware.mcmaster.ca/~taporware/textTools/listword.shtml

Description


Tool can be used to list all of the words found within a given text document. The query results can be displayed alphabetically, by frequency, by order of appearance, or in reversed alphabetical order.
Added:
>
>

History

Pseudocode

Ways of Using

CGI Interface

Web Service Interface

Known Bugs

To Do


-- MattPatey - 13 Oct 2005

 <<O>>  Difference Topic TAPoRwarePlainListWords (r1.1 - 13 Oct 2005 - MattPatey)
Line: 1 to 1
Added:
>
>
META TOPICPARENT TAPoRware
Tool can be used to list all of the words found within a given text document. The query results can be displayed alphabetically, by frequency, by order of appearance, or in reversed alphabetical order.

-- MattPatey - 13 Oct 2005



Revision r1.1 - 13 Oct 2005 - 22:39 - MattPatey
Revision r1.6 - 22 May 2008 - 15:24 - LianYan