Main.TAPoRwarePlainSummarizer (r1.1 vs. r1.4)
Diffs

 <<O>>  Difference Topic TAPoRwarePlainSummarizer (r1.4 - 13 Jun 2008 - LianYan)

META TOPICPARENT TAPoRware

Text Summarizer

See http://taporware.mcmaster.ca/~taporware/textTools/summarizer.shtml
Line: 55 to 55

| sorting | 1/2/3/4 | selection | 2 | How the listed words are sorted: 1 = alphabetically, 2 = by frequency, 3 = by first appearance, 4 = reverse alphabetically |
| display | 1 | selection | 1 | The output format. Currently only HTML is implemented |
Added:
>
>

Use Summarizer TAPoRware Tool in Your Web Page

You can add a button to your web page that returns a summary of the page by calling the TAPoRware CGI script.

Here is the code for this button:

<form method="post" name="textForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tsummarizer.cgi" onsubmit="document.textForm.texturl.value=document.location.href">

<input type="hidden" name="source" value="url" />

<input type="hidden" name="texturl" />

<input type="hidden" name="freetext" value="yes"/>

<input type="hidden" name="highfre" value="true">

<input type="hidden" name="numoftop" value="10">

<input type="hidden" name="range" value="stop" />

<input type="hidden" name="fromwhere" value="glasgow" />

<input type="hidden" name="sentence" value="true">

<input type="hidden" name="numofword" value="2" />

<input type="hidden" name="concor" value="true">

<input type="hidden" name="concornum" value="three" />

<input type="hidden" name="contLength" value="5" />

<input type="hidden" name="colloc" value="true">

<input type="hidden" name="collength" value="1" />

<input type="hidden" name="distrib" value="true"/>

<input type="hidden" name="sorting" value="2" />

<input type="hidden" name="display" value="1" />

<input type="hidden" name="taporface" value="same" />

<input type="submit" name="doIt" value="Summarize This Page" />

</form>


Web Service Interface

TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web services information: (Note: the form layout is customized)

 <<O>>  Difference Topic TAPoRwarePlainSummarizer (r1.3 - 29 Mar 2007 - LianYan)

META TOPICPARENT TAPoRware

Text Summarizer

See http://taporware.mcmaster.ca/~taporware/textTools/summarizer.shtml
Line: 6 to 6

Description

Changed:
<
<
The tool extracts information about the XML document provided by the user. The information can include the author, title, total words, words in a specific element, etc. It also lets users select topics of interest, such as the (highest frequency) word list, the list of sentences that contain more than a selected number of words, and the element-against-text distribution.
>
>
This tool creates a summary of statistical information on a given document. It enables the user to select what types of information to display in the summary. Options include high frequency words, sentences with high frequency words, high frequency word context, collocation and element/text distribution.

Changed:
<
<

History

>
>
Note: This tool treats XML and HTML as plain text with all tags stripped off, so some data, such as title and author, will be missed. If your text is XML or HTML, it is recommended to use the XML or HTML version of the summarizer.

Pseudocode

Added:
>
>
  • Obtain the text string from a URL or from the user's local disk.
  • If the text format is XML or HTML, strip off all the tags.
  • Get the first two non-empty paragraphs.
  • Get the word statistics of the text.
  • Obtain and list the top N high frequency words based on user-specified criteria, where N is an integer.
  • Split the text into sentences and obtain the sentences that contain at least n high frequency words, where n is specified by the user.
  • For each high frequency word, get its concordance with the user-specified context length in words, then list the first m concordance entries for each word, where m is specified by the user.
  • Get the collocation of each high frequency word within the specified context length in words, and then list the collocates.
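The steps above can be sketched in Python as follows. This is only an illustrative reading of the pseudocode, not the tool's actual source: the function names, the tiny stop list (standing in for the Glasgow list), the blank-line paragraph rule, and the choice to count distinct high frequency words per sentence are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop list (assumption); the tool defaults to the
# Glasgow stop word list, which is not reproduced here.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "on", "at"}

def strip_tags(text):
    """Replace anything that looks like an XML/HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def first_paragraphs(text, count=2):
    """First `count` non-empty paragraphs, assuming blank-line separators."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return paragraphs[:count]

def tokenize(text):
    """Lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(text, n, stop_words=STOP_WORDS):
    """The n most frequent words that are not stop words."""
    counts = Counter(w for w in tokenize(text) if w not in stop_words)
    return [word for word, _ in counts.most_common(n)]

def sentences_with(text, words, minimum):
    """Sentences containing at least `minimum` distinct words from `words`."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    wanted = set(words)
    return [s for s in sentences if len(wanted & set(tokenize(s))) >= minimum]

def concordance(text, word, context, limit=None):
    """Up to `limit` occurrences of `word`, each with `context` words per side."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == word:
            left = tokens[max(0, i - context):i]
            right = tokens[i + 1:i + 1 + context]
            hits.append(" ".join(left + [token] + right))
            if limit is not None and len(hits) == limit:
                break
    return hits

def collocates(text, word, context):
    """Count the words found within `context` words of each occurrence of `word`."""
    tokens = tokenize(text)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == word:
            counts.update(tokens[max(0, i - context):i])
            counts.update(tokens[i + 1:i + 1 + context])
    return counts

sample = "<p>The cat sat on the mat. The cat saw a dog.</p><p>A dog barked at the cat.</p>"
plain = strip_tags(sample)
high = top_words(plain, 2)
```

In this sketch, `limit` plays the role of m (for example 1 for "first" and 3 for "three"), and `minimum` plays the role of n.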

Ways of Using

Changed:
<
<
>
>
  • Enter a valid URL pointing to any text document in the URL field, or enter a local path to upload the source text
  • Enter an integer in the "List top... words" field
  • Select how the words are listed (stop words or list-only words); remember to enter the word list if you do not select the Glasgow stop words
  • Enter an integer in the text field in the line "List sentences that have n or more high frequency words"
  • Use the form's only selection control to choose the number of concordance (context) entries for each high frequency word, and enter the context length in words in the text field that follows
  • Enter an integer in the text field in the line "List collocation within n words of the high frequency words"
  • In the results panel, select how the listed words are sorted
  • Select the display format; currently only HTML is supported
  • Click the submit button

CGI Interface

Added:
>
>
If you want to use this tool from your web site, here is the CGI interface. (Note: you need the attribute name/value pair enctype="multipart/form-data" in the form tag, because the tool was designed with local file uploading in mind.)

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local text file) |
| texturl | | text | | A valid URL that points to any text document |
| localFile | | file | | The path to your local source text file |
| numoftop | | text | 10 | The number of top high frequency words to display and to use in the functions that follow |
| range | all/pat/find/stop | radio button | stop | How the words are listed |
| wordpat | | text | | Enter a pattern here if you want to list only words that match it |
| fromwhere | thislist/userfile/glasgow | radio button | glasgow | Used when you select find or stop in the "how to list" selection |
| thislist | | text | | If you select "Type in", fill in this field with the comma-delimited words you want to list or exclude |
| userfile | | file | | If you select "local file", browse to the file that contains the word list |
| numofword | | text | 2 | The minimum number of high frequency words a sentence must contain to be listed |
| concornum | first/three/all | selection | first | The number of concordance entries displayed for each high frequency word |
| contLength | | text | 5 | The context length, in words, for concordance (context) |
| collength | | text | 1 | The context length, in words, for the collocation of each high frequency word |
| sorting | 1/2/3/4 | selection | 2 | How the listed words are sorted: 1 = alphabetically, 2 = by frequency, 3 = by first appearance, 4 = reverse alphabetically |
| display | 1 | selection | 1 | The output format. Currently only HTML is implemented |
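As a rough illustration of calling the CGI interface from a script rather than a form, the Python sketch below fills in the defaults from the parameter table and encodes them as multipart/form-data. The helper names `build_fields` and `encode_multipart` are hypothetical, and whether the script accepts a request with only these fields is untested, so treat this as a starting point rather than a verified client.

```python
import uuid

# CGI script URL as given in the form example on this page.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tsummarizer.cgi"

# Defaults copied from the parameter table.
DEFAULTS = {
    "source": "url",
    "numoftop": "10",
    "range": "stop",
    "fromwhere": "glasgow",
    "numofword": "2",
    "concornum": "first",
    "contLength": "5",
    "collength": "1",
    "sorting": "2",
    "display": "1",
}

def build_fields(text_url, **overrides):
    """Return the form fields for summarizing the document at text_url."""
    fields = dict(DEFAULTS, texturl=text_url)
    fields.update({name: str(value) for name, value in overrides.items()})
    return fields

def encode_multipart(fields, boundary=None):
    """Encode plain text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")
        lines.append(value)
    lines.append("--" + boundary + "--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, "multipart/form-data; boundary=" + boundary

# Sending the request (left commented out so the sketch has no network dependency):
# import urllib.request
# body, content_type = encode_multipart(build_fields("http://example.org/text.html"))
# request = urllib.request.Request(CGI_URL, data=body,
#                                  headers={"Content-Type": content_type})
# html = urllib.request.urlopen(request).read()
```

Keyword overrides map directly onto parameter names, for example `build_fields(url, numoftop=20, concornum="three")`.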

Web Service Interface

Added:
>
>
TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web services information: (Note: the form layout is customized)

  • Endpoint URL: http://taporware.mcmaster.ca:9982
  • Service URI: http://taporware.mcmaster.ca/~taporware/webservice
  • Service Method: summarizer_plain
  • Parameters:
    • textInput -- any text string, including XML, HTML and plain text. However, all tags will be stripped
    • numoftop -- the number of top frequency words, entered by the user, to be used in the functions that follow
    • listOption -- indicates how the words are listed. The values are all/patt/find/stop (default). They stand for all words / words matching a pattern / list only user-specified words / do not list user-specified words
    • optionSelection -- the value depends on the selection made in "listOption"
    • numofhigh -- the number of top frequency words that a listed sentence must contain
    • numofstart -- how many concordance entries are displayed for each top frequency word. The values are first/three/all, meaning the first, the first three, or all concordance entries of each top word
    • sorting -- how the listed words are sorted: 1 - alphabetically, 2 - by frequency, 3 - in the order of appearance, or 4 - in reverse alphabetical order
    • outFormat -- the result format; set it to "html", because that is the only format currently implemented
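Because the page does not state the wire protocol of the web service, the Python sketch below stops short of making a call and only shows how the arguments for summarizer_plain might be assembled and checked against the values listed above. The function name and the keyword defaults other than "stop" are assumptions (the numeric defaults are borrowed from the CGI parameter table), and the empty optionSelection is a placeholder, since its valid values are not documented here.

```python
# Connection details as listed on this page.
ENDPOINT_URL = "http://taporware.mcmaster.ca:9982"
SERVICE_URI = "http://taporware.mcmaster.ca/~taporware/webservice"
SERVICE_METHOD = "summarizer_plain"

# Allowed values taken from the parameter descriptions.
ALLOWED = {
    "listOption": {"all", "patt", "find", "stop"},
    "numofstart": {"first", "three", "all"},
    "sorting": {"1", "2", "3", "4"},
    "outFormat": {"html"},
}

def summarizer_plain_args(text_input, numoftop=10, list_option="stop",
                          option_selection="", numofhigh=2,
                          numofstart="first", sorting=2, out_format="html"):
    """Assemble and validate the arguments for the summarizer_plain method.

    Only list_option="stop" is a documented default; the other defaults
    are assumptions borrowed from the CGI parameter table.
    """
    args = {
        "textInput": text_input,
        "numoftop": str(int(numoftop)),
        "listOption": list_option,
        "optionSelection": option_selection,  # placeholder; values undocumented
        "numofhigh": str(int(numofhigh)),
        "numofstart": numofstart,
        "sorting": str(sorting),
        "outFormat": out_format,
    }
    for name, allowed in ALLOWED.items():
        if args[name] not in allowed:
            raise ValueError("%s must be one of %s, got %r"
                             % (name, sorted(allowed), args[name]))
    return args
```

Validating on the client side gives a clear error for a mistyped option before any request is built; the resulting dictionary can then be handed to whichever client library matches the service's protocol.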

Known Bugs

To Do

Update help
Changed:
<
<
-- MattPatey - 13 Oct 2005
>
>
-- LianYan - 29 Mar 2007


 <<O>>  Difference Topic TAPoRwarePlainSummarizer (r1.2 - 15 Oct 2005 - MattPatey)

META TOPICPARENT TAPoRware
Added:
>
>

Text Summarizer

See http://taporware.mcmaster.ca/~taporware/textTools/summarizer.shtml

Description


The tool extracts information about the XML document provided by the user. The information can include the author, title, total words, words in a specific element, etc. It also lets users select topics of interest, such as the (highest frequency) word list, the list of sentences that contain more than a selected number of words, and the element-against-text distribution.
Added:
>
>

History

Pseudocode

Ways of Using

CGI Interface

Web Service Interface

Known Bugs

To Do

Update help

-- MattPatey - 13 Oct 2005

 <<O>>  Difference Topic TAPoRwarePlainSummarizer (r1.1 - 13 Oct 2005 - MattPatey)
Line: 1 to 1
Added:
>
>
META TOPICPARENT TAPoRware
The tool extracts information about the XML document provided by the user. The information can include the author, title, total words, words in a specific element, etc. It also lets users select topics of interest, such as the (highest frequency) word list, the list of sentences that contain more than a selected number of words, and the element-against-text distribution.

-- MattPatey - 13 Oct 2005



Revision r1.1 - 13 Oct 2005 - 22:44 - MattPatey
Revision r1.4 - 13 Jun 2008 - 14:00 - LianYan