Raining Words

Try It

Description

'Raining Words' displays the most frequent words in a text: the more frequent a word is, the larger it is rendered and the more slowly it moves. The source HTML document (URL or local file) is read and reduced to the contents of a single HTML element (e.g. <body>). That text is filtered against a stop-word list, and the result is scanned for the 20 most frequent words.

Note: The input text must be in HTML format; otherwise the tool reports a file-type mismatch error.
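
The size-and-speed rule described above can be illustrated with a small sketch. This is not the applet's actual code: the function name `display_attributes`, the linear scaling, and the size and speed ranges are all assumptions chosen for illustration. The source only states that more frequent words are larger and slower.

```python
def display_attributes(counts, min_size=12, max_size=48,
                       min_speed=1.0, max_speed=5.0):
    """Map each word's count to a (font_size, speed) pair.

    Hypothetical linear scaling: the most frequent word gets the
    largest size and the lowest speed, the least frequent word the
    smallest size and the highest speed.
    """
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    result = {}
    for word, n in counts.items():
        # weight is 0.0 for the rarest word and 1.0 for the most frequent;
        # if every word has the same count, treat them all as most frequent.
        weight = (n - lo) / span if span else 1.0
        size = min_size + weight * (max_size - min_size)
        speed = max_speed - weight * (max_speed - min_speed)
        result[word] = (size, speed)
    return result
```

Any monotonic mapping would satisfy the description equally well; linear interpolation is simply the shortest one to write down.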

Issues

This tool requires the JRE (v1.4.2 or later) in order to work properly.* This is likely to cause problems for OS X users with browsers other than Safari: by default, Firefox and other OS X web browsers use an older version of the JRE that is incompatible with this tool.* A fix that allows Firefox to use JRE 1.4.2+ can be found here. Unfortunately, this fix does not seem to work for the OS X version of Internet Explorer.

* This is only relevant if the desired output is Java.

Pseudocode

  • Obtain the text from a URL that points to an HTML document
  • Extract the text inside the user-specified HTML element
  • Tokenize the text into words using the TAPoRware tokenizer
  • Sort and count the words, ignoring capitalization
  • Filter the words by the user-specified criteria, if necessary
  • Get the top 20 high-frequency words and generate a Java applet
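
The steps above can be sketched in Python as follows. This is a minimal illustration, not the TAPoRware implementation: the regular-expression tokenizer stands in for the TAPoRware tokenizer, the names `ElementText` and `top_words` are hypothetical, fetching the URL and generating the applet are left out, and only stop-word filtering is shown for the "user-specified criteria" step.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class ElementText(HTMLParser):
    """Collect the text found inside every occurrence of one element."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def top_words(html, tag="body", stop_words=(), limit=20):
    """Return the `limit` most frequent words inside `tag`, ignoring case."""
    parser = ElementText(tag)
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Stand-in tokenizer (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    stop = {w.lower() for w in stop_words}
    counts = Counter(t for t in tokens if t not in stop)
    return counts.most_common(limit)
```

For example, `top_words(page, "body", ["the", "in"])` returns a list of `(word, count)` pairs, most frequent first. Note that this sketch does not skip `<script>` or `<style>` content nested inside the chosen element.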

Ways of Using

  • Enter a valid URL in the URL field
  • In the corresponding field, enter the name of the HTML element whose text you want to investigate
  • Select the word list you want to see and enter the corresponding text if necessary
  • If you want the results displayed in the same window with the TAPoRware interface, uncheck the "Open results in new window" check box
  • Finally, click the "Submit" button

CGI Interface

If you want to use this tool from your web site, here is the CGI Interface:

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Description |
| htmlurl |  | text |  | A valid URL pointing to an HTML document |
| tagtext |  | text | body | The name of the element whose text is investigated |
| range | all/patt/find/stop | radio button | all | Selects the word list the user wants to see |
| wpat |  | text |  | A Unix-style pattern. This field corresponds to the value "patt" in the radio button group named "range" |
| findstop | typedin/textfile/glasgow | radio button | glasgow | These options apply to the values "find" and "stop" in the radio button group named "range" |
| typedinword |  | text |  | This text field corresponds to the value "typedin" in the radio button group named "findstop" |
| wordfile |  | file |  | This field corresponds to the value "textfile" in the radio button group named "findstop" |
| taporface | on | checkbox | checked | Display the result in a new window without the graphical interface (default), or in the same window with the TAPoRware interface |
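
The dependencies between these parameters can be made explicit with a small helper that assembles the form fields. This is only a sketch: `build_cgi_fields` is a hypothetical name, the validation rules are inferred from the parameter descriptions above, and the `wordfile` upload and the actual submission to the CGI script are left out.

```python
RANGE_VALUES = {"all", "patt", "find", "stop"}
FINDSTOP_VALUES = {"typedin", "textfile", "glasgow"}


def build_cgi_fields(htmlurl, tagtext="body", range_="all", wpat="",
                     findstop="glasgow", typedinword="", new_window=True):
    """Return the form fields for one request as a dict of strings."""
    if range_ not in RANGE_VALUES:
        raise ValueError(f"range must be one of {sorted(RANGE_VALUES)}")
    if findstop not in FINDSTOP_VALUES:
        raise ValueError(f"findstop must be one of {sorted(FINDSTOP_VALUES)}")
    if range_ == "patt" and not wpat:
        raise ValueError('range "patt" needs a pattern in wpat')

    fields = {"htmlurl": htmlurl, "tagtext": tagtext, "range": range_}
    if range_ == "patt":
        fields["wpat"] = wpat
    if range_ in ("find", "stop"):
        # findstop only matters for the "find" and "stop" lists.
        fields["findstop"] = findstop
        if findstop == "typedin":
            fields["typedinword"] = typedinword
    if new_window:
        # An unchecked HTML checkbox is simply absent from the submission.
        fields["taporface"] = "on"
    return fields
```

Calling `build_cgi_fields("http://example.org/", range_="stop")` yields the same five fields as a form that posts `htmlurl`, `tagtext=body`, `range=stop`, `findstop=glasgow` and `taporface=on`.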

Use Raining Words TAPoRware Tool in Your Web Page

You can add a button to your web page that displays the raining words for the text of the current page by calling the TAPoRware CGI script.

Here is the code for the raining words button:

<form method="post" name="htmlForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/brainingwords.cgi" onsubmit="document.htmlForm.htmlurl.value=document.location.href">

<input type="hidden" name="htmlurl" />

<input type="hidden" name="tagtext" value="body" />

<input type="hidden" name="range" value="stop" />

<input type="hidden" name="findstop" value="glasgow" />

<input type="hidden" name="taporface" value="on" />

<input type="submit" name="doit" value="Raining Words" />

</form>

Web Service Interface

TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web service information:

  • Endpoint URL: http://taporware.mcmaster.ca:9982
  • Service URI: http://taporware.mcmaster.ca/~taporware/webservice
  • Service Method: rain_words
  • Parameters:
    • textInput -- any text string. If the text is in HTML or XML format, the tags will be stripped
    • element -- the XML or HTML tag in the input text whose text will be investigated
    • listOption -- one of 1/2/3/4, corresponding to All words/Words in the list below/Words not in the list below/Words match pattern
    • pattern -- the pattern to match, if listOption selects "Words match pattern"
    • exclusion -- selects the stop-word list: 1 -- words typed in by the user, separated by commas; 2 -- Glasgow stop words
    • typein -- the stop list typed in by the user, used when exclusion is 1
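
The numeric codes for listOption and exclusion are easier to handle when given names. The sketch below assembles the arguments for a `rain_words` call as a plain dictionary. The enum names and the `rain_words_args` helper are hypothetical, and how the arguments are sent to the endpoint (protocol and client library) is not stated here and is deliberately left out.

```python
from enum import IntEnum


class ListOption(IntEnum):
    ALL_WORDS = 1
    WORDS_IN_LIST = 2
    WORDS_NOT_IN_LIST = 3
    WORDS_MATCHING_PATTERN = 4


class Exclusion(IntEnum):
    TYPED_IN = 1
    GLASGOW = 2


def rain_words_args(text_input, element="body",
                    list_option=ListOption.ALL_WORDS, pattern="",
                    exclusion=Exclusion.GLASGOW, typein=()):
    """Validate and return the rain_words parameters as a dict."""
    list_option = ListOption(list_option)   # raises ValueError if not 1-4
    exclusion = Exclusion(exclusion)        # raises ValueError if not 1-2
    if list_option is ListOption.WORDS_MATCHING_PATTERN and not pattern:
        raise ValueError("listOption 4 needs a pattern")
    uses_list = list_option in (ListOption.WORDS_IN_LIST,
                                ListOption.WORDS_NOT_IN_LIST)
    if uses_list and exclusion is Exclusion.TYPED_IN and not typein:
        raise ValueError("exclusion 1 needs at least one typed-in word")
    return {
        "textInput": text_input,
        "element": element,
        "listOption": int(list_option),
        "pattern": pattern,
        "exclusion": int(exclusion),
        "typein": ",".join(typein),   # typed-in words are comma separated
    }
```

For instance, `rain_words_args("<p>some text</p>", element="p", list_option=3)` produces arguments that ask for the words not in the Glasgow stop list.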

To Do

  • Needs to open in a new window
  • Needs to have the help
  • Needs to have a title
  • We need to add this to portal

Responsibility

This was programmed by Matt Patey as part of the TAPoR project.

-- MattPatey - 28 Jun 2005

