Pattern Distribution
See
http://taporware.mcmaster.ca/~taporware/textTools/patterndistrib.shtml
Description
This tool generates different kinds of distribution graphs for a word or pattern over specified chunks of text.
Pseudocode
- Obtain the text string from a URL or from the user's local disk. If the text is XML or HTML, strip all the tags.
- Split the text into chunks based on the user's selection.
- Find and record the user-specified word/pattern in each chunk. If "show relative distribution" is selected, the total number of words in each chunk is counted and recorded too.
- Based on the user's selection, generate the corresponding graphics.
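The steps above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the function names, the regex-based tag stripping, the blank-line paragraph split, and case-sensitive matching are all assumptions. The graphics step is left out; the function returns the per-chunk counts that a graph would be drawn from.

```python
import math
import re

def strip_tags(text):
    """Replace anything that looks like an XML/HTML tag with a space (rough approximation)."""
    return re.sub(r"<[^>]+>", " ", text)

def split_chunks(text, dis_type, percent=10, chunk_words=100):
    """Split by paragraph (3), percentage of characters (4), or words per chunk (5)."""
    if dis_type == 3:
        # Assumption: paragraphs are separated by blank lines.
        return [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if dis_type == 4:
        # Character-based slices, so a word may be cut at a chunk boundary.
        size = max(1, math.ceil(len(text) * percent / 100))
        return [text[i:i + size] for i in range(0, len(text), size)]
    if dis_type == 5:
        words = text.split()
        return [" ".join(words[i:i + chunk_words])
                for i in range(0, len(words), chunk_words)]
    raise ValueError("dis_type must be 3, 4, or 5")

def pattern_distribution(text, pattern, dis_type=4, percent=10,
                         chunk_words=100, relative=False):
    """Count matches of `pattern` in each chunk; optionally add relative figures."""
    regex = re.compile(pattern)
    rows = []
    for chunk in split_chunks(strip_tags(text), dis_type, percent, chunk_words):
        count = len(regex.findall(chunk))
        row = {"count": count}
        if relative:
            total = len(chunk.split())
            row["total_words"] = total
            row["relative"] = count / total if total else 0.0
        rows.append(row)
    return rows
```

For example, `pattern_distribution(text, r"\bthe\b", dis_type=3, relative=True)` returns one row per paragraph with the match count, the word total, and their ratio.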
Ways of Using
- Specify the source text in the source panel, either as a valid URL or as a path on your local disk
- Select the subtext unit you want the distribution computed over
- Select or enter the value that corresponds to your subtext choice
- Check or uncheck the "show relative distribution" box
- Enter the word/pattern in the "What to find" panel
- Select the format of the distribution
CGI Interface
If you want to use this tool from your web site, here is the CGI interface. (Note: if you want to upload a local XML text to the tool, you need to set the attribute enctype="multipart/form-data" on the form tag.)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Selects the input text source: either a URL or an uploaded local text |
| texturl | | text | | A valid URL pointing to an XML text |
| localFile | | file | | The path to your local HTML text file |
| disType | 3/4/5 | radio button | 4 | The unit of subtext to compute the distribution over. In order of the values in the "Parameter Value" field: paragraph, percentage of text, number of words per chunk |
| percent | 1/5/10/20/25 | selection | 10 | If distribution over percentage of text is selected, select a percentage here |
| chunk | | text | 100 | If distribution over chunks of words is selected, enter the number of words per chunk in this field |
| relative | | checkbox | unchecked | Check it to display both absolute and relative word distribution in the chunks |
| find_pattern | | text | | The word/pattern whose distribution you want to see |
| HowToList | 1/2/3/4 | radio button | 2 | The format of the distribution. In order of the values: SVG, HTML, tab-delimited text, Java applet. Note: you need an SVG viewer to view SVG and a Java runtime to view the applet. |
| taporface | | checkbox | checked | Indicates whether to open the result in a new window (without the TAPoRware interface) |
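As a rough illustration of how these parameters fit together, the following Python sketch assembles the form fields for a URL source and checks them against the values in the table. The helper name `build_fields` is hypothetical. The checkbox value "1" for `relative` is taken from the sample form on this page, and `taporface` is left out because its submitted value is not documented here. The sketch only builds and encodes the fields; whether the script accepts a URL-encoded body instead of multipart/form-data is not stated on this page.

```python
from urllib.parse import urlencode

# Defaults copied from the parameter table.
DEFAULTS = {"source": "url", "disType": "4", "percent": "10",
            "chunk": "100", "HowToList": "2"}

ALLOWED = {
    "source": {"url", "local"},
    "disType": {"3", "4", "5"},
    "percent": {"1", "5", "10", "20", "25"},
    "HowToList": {"1", "2", "3", "4"},
}

def build_fields(texturl, find_pattern, relative=False, **overrides):
    """Return a dict of form fields, validated against the documented values."""
    fields = dict(DEFAULTS)
    fields.update({name: str(value) for name, value in overrides.items()})
    fields["texturl"] = texturl
    fields["find_pattern"] = find_pattern
    if relative:
        # An unchecked checkbox is simply not submitted, so only add it when set.
        fields["relative"] = "1"
    for name, allowed in ALLOWED.items():
        if fields[name] not in allowed:
            raise ValueError(f"{name}={fields[name]!r} is not a documented value")
    return fields

# Example: distribution of "love" over chunks of 200 words, as tab-delimited text.
fields = build_fields("http://example.org/text.xml", "love",
                      relative=True, disType=5, chunk=200, HowToList=3)
body = urlencode(fields)
```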
Use the Pattern Distribution TAPoRware Tool in Your Web Page
You can add a text field and a button to your web page that generate a word/pattern distribution of the current page by calling the
TAPoRware CGI script.
Here is the code for such an interface:
<form method="post" name="textForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/tdistrib.cgi" onsubmit="document.textForm.texturl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="texturl" />
<input type="hidden" name="disType" value="4" />
<input type="hidden" name="percent" value="10" />
<input type="hidden" name="relative" value="1" />
Word/Pattern:<input type="text" name="find_pattern" />
<input type="hidden" name="HowToList" value="4" />
<input type="submit" name="doIt" value="Submit" />
</form>
Web Service Interface
TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web services information:
(Note: the form layout is customized)
- Endpoint URL: http://taporware.mcmaster.ca:9982
- Service URI: http://taporware.mcmaster.ca/~taporware/webservice
- Service Method: pattern_Distribution_Plain
- Parameters:
  - textSource -- any text string; if the text format is HTML or XML, all the tags will be stripped
  - option -- the unit of text to compute the distribution over:
    - 3 -- over paragraphs
    - 4 -- over percentage of text (counted by characters)
    - 5 -- over number of words in each chunk
  - percent -- the percentage, for distribution over percentage of the text. Use a selection control and keep it consistent with the percentage option (4).
  - chunk -- the number of words per chunk, for distribution over chunks of words. It should be consistent with the words-per-chunk option (5).
  - relative -- check to display the relative distribution as well. The checked value is "Y".
  - pattern -- the word or pattern whose distribution is to be generated
  - outFormat -- display format. The values are:
    - 1 -- SVG
    - 2 -- HTML
    - 3 -- tab-delimited text
    - 4 -- Java applet
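The web service parameters appear to mirror the CGI form fields under different names. The Python sketch below translates one set into the other. The correspondence (disType to option, find_pattern to pattern, HowToList to outFormat) is inferred from the matching value lists on this page and is not confirmed by it, and the helper name `to_service_params` is hypothetical. The sketch does not call the service, since this page does not describe the transport protocol.

```python
# Inferred mapping from CGI field names to web service parameter names.
CGI_TO_SERVICE = {
    "disType": "option",
    "percent": "percent",
    "chunk": "chunk",
    "find_pattern": "pattern",
    "HowToList": "outFormat",
}

def to_service_params(cgi_fields, text):
    """Build parameters for pattern_Distribution_Plain from CGI-style fields.

    `text` is the source text itself: the service takes a text string
    (textSource), not a URL or an uploaded file.
    """
    params = {"textSource": text}
    for cgi_name, service_name in CGI_TO_SERVICE.items():
        if cgi_name in cgi_fields:
            params[service_name] = str(cgi_fields[cgi_name])
    if "relative" in cgi_fields:
        # The documented checked value for the service is "Y".
        params["relative"] = "Y"
    return params

params = to_service_params(
    {"disType": 4, "percent": 10, "relative": "1",
     "find_pattern": "love", "HowToList": 2},
    "some source text",
)
```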
Known Bugs
To Do
Write help and walkthrough
--
MattPatey - 15 Oct 2005