Weighted Centroid
Description
This applet displays a circular graph based on word distribution data.
The text is divided into an arbitrary number of units, which are positioned clockwise around the circumference of the circle. The more times a word appears in a particular text unit, the closer the word is to that unit on the circle. If a word appears an equal number of times in all units, it is located at the centre of the circle.
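The page does not spell out the placement formula, but the rule just described is what a count-weighted centroid produces. As a sketch, assume $K$ units placed at equal angles clockwise from the top of a circle with centre $(c_x, c_y)$ and radius $r$, in screen coordinates where $y$ grows downward. Let $n_{w,k}$ be the number of times word $w$ occurs in unit $k$. These symbols are introduced here for illustration only:

```latex
\theta_k = \frac{2\pi (k-1)}{K}, \qquad
\mathbf{p}_k = \bigl(c_x + r\sin\theta_k,\; c_y - r\cos\theta_k\bigr), \qquad k = 1,\dots,K

\mathbf{x}_w = \frac{\sum_{k=1}^{K} n_{w,k}\,\mathbf{p}_k}{\sum_{k=1}^{K} n_{w,k}}
```

If $n_{w,k}$ is the same for every $k$, then $\mathbf{x}_w$ reduces to the plain average of the $\mathbf{p}_k$. For $K \ge 2$, that average is the centre of the circle, because equally spaced points on a circle sum to $K\,(c_x, c_y)$.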
Words are colour coded based on the number of times they appear in the text as a whole; blue words have the highest word count. Rolling the mouse over a word displays lines connecting it to the units it occurs in. Clicking a word keeps its lines visible after you move the mouse off it; click the word again to hide the lines. The darker the line, the more times the word was found in that unit.
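The page does not say how line darkness is computed. The following is a minimal sketch that assumes a linear scale in which the largest count (one of the "maximum values" from the pseudocode) maps to black. `LineShade` and `greyLevel` are hypothetical names:

```java
// Hypothetical helper. The page only says "the darker the line, the more times
// the word was found in that unit"; the linear grey scale here is an assumption.
final class LineShade {
    private LineShade() {}

    /**
     * Maps an occurrence count to a grey level (0 = black, 255 = white).
     * maxCount is assumed to be the largest count of any word in any unit,
     * so the strongest connection in the whole graph is drawn black.
     */
    static int greyLevel(int count, int maxCount) {
        if (maxCount <= 0 || count <= 0) {
            return 255; // no occurrences: white (in practice no line is drawn)
        }
        double fraction = Math.min(1.0, (double) count / maxCount);
        return (int) Math.round(255 * (1.0 - fraction));
    }
}
```

In a Processing sketch, the result could be passed to `stroke()` before each line is drawn.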
Additionally, all the words in the graph are listed on the left side of the applet, with a scroll bar for viewing any words that extend past the bottom of the applet. This list supports the same rollover and click behaviour as the words in the graph itself.
This tool uses the Processing library.
* This tool requires the Java Runtime Environment (JRE) v1.4.2 or later to work properly.
Pseudocode
- get data from the Weighted Centroid program and process it to determine maximum values
- create nodes for each word using the data
- calculate node position
- calculate lines connecting node to each text unit that it occurs in
- initialize the Processing environment
- draw words, word list, circle, and text units
- add listener for mouse events
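The data-handling steps of the pseudocode can be sketched in Java as follows, leaving out the Processing drawing and mouse handling. The input format (`counts[w][k]` = occurrences of word `w` in unit `k`), the class and method names, and the choice to place the first unit at the top of the circle are assumptions, not the tool's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the pseudocode steps (no drawing). Not the applet's source.
final class CentroidLayout {

    /** A word node: its screen position and one line per unit it occurs in. */
    static final class Node {
        final String word;
        final double x, y;
        final List<int[]> lines = new ArrayList<>(); // each entry: {unitIndex, count}

        Node(String word, double x, double y) {
            this.word = word;
            this.x = x;
            this.y = y;
        }
    }

    /** Places k units clockwise around the circle, starting at the top. */
    static double[][] unitPositions(int k, double cx, double cy, double r) {
        double[][] p = new double[k][2];
        for (int i = 0; i < k; i++) {
            double theta = 2 * Math.PI * i / k;
            p[i][0] = cx + r * Math.sin(theta);
            p[i][1] = cy - r * Math.cos(theta); // screen y grows downward
        }
        return p;
    }

    /** Largest single count in the table (e.g. for scaling line darkness). */
    static int maxCount(int[][] counts) {
        int max = 0;
        for (int[] row : counts) {
            for (int c : row) {
                max = Math.max(max, c);
            }
        }
        return max;
    }

    /** Builds one node per word at the count-weighted centroid of the unit positions. */
    static List<Node> buildNodes(String[] words, int[][] counts,
                                 double cx, double cy, double r) {
        int k = counts.length == 0 ? 0 : counts[0].length;
        double[][] units = unitPositions(k, cx, cy, r);
        List<Node> nodes = new ArrayList<>();
        for (int w = 0; w < words.length; w++) {
            double sx = 0, sy = 0, total = 0;
            for (int u = 0; u < k; u++) {
                sx += counts[w][u] * units[u][0];
                sy += counts[w][u] * units[u][1];
                total += counts[w][u];
            }
            // A word with no occurrences at all falls back to the centre.
            Node n = total > 0 ? new Node(words[w], sx / total, sy / total)
                               : new Node(words[w], cx, cy);
            // One connecting line per unit the word actually occurs in.
            for (int u = 0; u < k; u++) {
                if (counts[w][u] > 0) {
                    n.lines.add(new int[] {u, counts[w][u]});
                }
            }
            nodes.add(n);
        }
        return nodes;
    }
}
```

For example, a word counted `{4, 0, 0, 0}` lands exactly on the first unit at `(cx, cy - r)`, while a word counted `{2, 2, 2, 2}` lands at the centre.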
CGI Interface
To use this tool from your own web site, submit a form to its CGI interface.
(Note: Your form tag must include the attribute enctype="multipart/form-data", because the tool was designed to allow local file uploads, even if you do not use this feature.)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text source (a URL or a local file to upload) |
| texturl | | text | | A valid URL pointing to a text, html, or xml file |
| localFile | | file | | The path to a local text, html, or xml file |
| freetext | on/off | checkbox | on | Turn on to treat xml/html as plain text |
| disType | 1/2/3 | radio button | 2 | Defines the granularity of the graph: paragraphs/n percent of the text/chunks of n words |
| percent | 5/10/20/50 | selection | 10 | Specifies the percentage for use with option 2 of disType |
| chunk | | text | 100 | Specifies the chunk size in words, for use with option 3 of disType |
| topfre | 0/5/10/20/50 | selection | 10 | Specifies how many of the highest-frequency words are included in the result |
| stoplist | on/off | checkbox | on | Specifies whether to exclude Glasgow Stop Words from the results (on = exclude) |
| user | on/off | checkbox | off | Specifies whether to include extra user-defined words in the results |
| userword | | text | | The user-defined words to include in the results |
| HowToList | 1/2 | selection | 2 | Determines how to display the results: HTML table/Java applet |
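The three `disType` options can be sketched as follows. The page does not describe how the CGI script splits the text, so this sketch makes two assumptions: paragraphs are split on blank lines, and `percent` is converted into a chunk size counted in words. The web service description below speaks of a percentage of characters instead. `TextUnits` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the disType options; the real script's splitting rules are unknown.
final class TextUnits {
    private TextUnits() {}

    /** disType = 1: one unit per paragraph (assumed: paragraphs separated by blank lines). */
    static List<String> byParagraph(String text) {
        List<String> units = new ArrayList<>();
        for (String p : text.split("\\n\\s*\\n")) {
            if (!p.trim().isEmpty()) {
                units.add(p.trim());
            }
        }
        return units;
    }

    /** disType = 3: consecutive chunks of n words; the last chunk may be shorter. */
    static List<String> byChunk(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("chunk size must be at least 1");
        }
        List<String> units = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return units;
        }
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i += n) {
            String[] chunk = Arrays.copyOfRange(words, i, Math.min(i + n, words.length));
            units.add(String.join(" ", chunk));
        }
        return units;
    }

    /** disType = 2: units of about `percent` percent of the words (assumed word-based). */
    static List<String> byPercent(String text, int percent) {
        int total = text.trim().isEmpty() ? 0 : text.trim().split("\\s+").length;
        int n = Math.max(1, (int) Math.ceil(total * percent / 100.0));
        return byChunk(text, n);
    }
}
```

Rounding the chunk size up means a percentage that does not divide the text evenly yields slightly fewer units than 100/percent.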
Use the Weighted Centroid TAPoRware Tool in Your Web Page
You can add a button (with a hidden text field) to your web page that calls the TAPoRware CGI script to display the word distribution of the current page.
Here is the code for the button:
<form method="post" name="textForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tweighted.cgi" onsubmit="document.textForm.texturl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="texturl" />
<input type="hidden" name="freetext" value="Y" />
<input type="hidden" name="disType" value="2" />
<input type="hidden" name="percent" value="10" />
<input type="hidden" name="topfre" value="30" />
<input type="hidden" name="stoplist" value="on" />
<input type="hidden" name="HowToList" value="2" />
<input type="submit" name="doIt" value="Weighted Centroid" />
</form>
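A client that is not a browser can build the same request by hand. This Java 11+ sketch encodes the form's fields, including the `doIt` submit button, as `multipart/form-data`. The class and method names and the boundary string are illustrative, and the 2006 endpoint may no longer respond:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client mirroring the button form; requires Java 11+ for java.net.http.
final class WeightedCentroidForm {
    static final String ACTION =
        "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tweighted.cgi";

    /** Encodes plain text fields as a multipart/form-data body with the given boundary. */
    static String multipartBody(Map<String, String> fields, String boundary) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("--").append(boundary).append("\r\n")
              .append("Content-Disposition: form-data; name=\"")
              .append(e.getKey()).append("\"\r\n\r\n")
              .append(e.getValue()).append("\r\n");
        }
        return sb.append("--").append(boundary).append("--\r\n").toString();
    }

    /** The same field values the HTML form sends, for a given page URL. */
    static Map<String, String> formFields(String pageUrl) {
        Map<String, String> f = new LinkedHashMap<>();
        f.put("source", "url");
        f.put("texturl", pageUrl);
        f.put("freetext", "Y");
        f.put("disType", "2");
        f.put("percent", "10");
        f.put("topfre", "30");
        f.put("stoplist", "on");
        f.put("HowToList", "2");
        f.put("doIt", "Weighted Centroid"); // a clicked submit button sends its name/value
        return f;
    }

    /** Posts the fields and returns the HTML the CGI script responds with. */
    static String submit(String pageUrl) throws Exception {
        String boundary = "----TAPoRwareSketch" + System.nanoTime();
        HttpRequest req = HttpRequest.newBuilder(URI.create(ACTION))
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofString(
                multipartBody(formFields(pageUrl), boundary)))
            .build();
        return HttpClient.newHttpClient()
            .send(req, HttpResponse.BodyHandlers.ofString())
            .body();
    }
}
```

With `HowToList` set to 2, the response is the Java applet version of the results.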
Web Service Interface
TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web service information:
- Endpoint URL: http://taporware.mcmaster.ca:9982
- Service URI: http://taporware.mcmaster.ca/~taporware/webservice
- Service Method: weighted_centroid
- Parameters:
  - textSource -- any text string. If the text is HTML or XML, all tags are stripped
  - suboption -- the type of text unit; the values 1/2/3 correspond to paragraph/percent of characters/chunk of text in words
  - percent -- the percentage to use when suboption is 2 ("percent of characters")
  - chunk -- the chunk size in words to use when suboption is 3 ("chunk of text in words")
  - topWords -- the number of top-frequency words (which may or may not exclude stop words; see below) to investigate
  - glasgow -- set to Y to exclude the Glasgow stop words from the top-frequency word list
  - userwords -- a comma-separated list of the user's own stop words. If glasgow is selected, this list is combined with the Glasgow stop list
  - outFormat -- 1 for HTML or 2 for a Java applet
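The `topWords`, `glasgow` and `userwords` parameters can be read as a frequency count followed by stop-word filtering. The following is a minimal sketch under that reading. The Glasgow list is passed in rather than reproduced, and case-folding, alphabetical tie-breaking and the name `TopWords` are assumptions. The CGI `user`/`userword` parameters are described as adding words to the results, but this sketch follows the web service description, which treats `userwords` as stop words:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of top-frequency selection with optional stop lists.
final class TopWords {
    private TopWords() {}

    static List<String> topWords(List<String> tokens, int topWords,
                                 Set<String> glasgowStopWords, boolean useGlasgow,
                                 String userwords) {
        // Build the combined stop list (case-folded; an assumption).
        Set<String> stop = new HashSet<>();
        if (useGlasgow) {
            for (String g : glasgowStopWords) {
                stop.add(g.toLowerCase(Locale.ROOT));
            }
        }
        for (String w : userwords.split(",")) {
            if (!w.trim().isEmpty()) {
                stop.add(w.trim().toLowerCase(Locale.ROOT));
            }
        }
        // Count every token that is not a stop word.
        Map<String, Integer> freq = new HashMap<>();
        for (String t : tokens) {
            String w = t.toLowerCase(Locale.ROOT);
            if (!stop.contains(w)) {
                freq.merge(w, 1, Integer::sum);
            }
        }
        // Highest counts first; ties broken alphabetically.
        return freq.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(topWords)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```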
Responsibility
This tool was programmed by Andrew MacDonald as part of the TAPoR project.
--
AndrewMacdonald - 10 Apr 2006