Distribution Graph
Try It
Description
This tool lets users see a graphical representation of the distribution of words, patterns, or HTML tags over the course of an HTML document. Results are displayed as percentages of patterns/words found in each chunk (e.g. tag, percentage of document, block of n words). A number of options are also available that allow the user to view and/or interact with the graph in real time.
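As a rough illustration of the kind of calculation described above, here is a minimal Python sketch. It is not the tool's actual code: the function name `distribution`, the word-by-word matching, and the reading of "percentage" as each chunk's share of all matches are assumptions made for the example.

```python
import re

def distribution(text, pattern, chunk_percent=10):
    """Return, for each chunk of the text, its share (in percent) of all matches."""
    words = text.split()
    n_chunks = 100 // chunk_percent          # e.g. 10 percent -> 10 chunks
    counts = [0] * n_chunks
    regex = re.compile(pattern, re.IGNORECASE)
    for i, word in enumerate(words):
        if regex.search(word):
            # Map the word's position in the document to a chunk index.
            idx = min(i * n_chunks // len(words), n_chunks - 1)
            counts[idx] += 1
    total = sum(counts)
    # With no matches at all, every chunk reports 0.0.
    return [100.0 * c / total if total else 0.0 for c in counts]
```

Calling `distribution(page_text, "whale", 25)` would return four values, one per quarter of the document, each giving that quarter's share of the matches.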
Issues
This tool requires the JRE (v1.4.2 and up) in order to work properly.
* This is likely to cause problems for OS X users using browsers other than Safari. By default, Firefox and other OS X web browsers use an older version of the JRE, which is incompatible with this tool.
* There is a solution that allows Firefox to use JRE 1.4.2+, which can be found here. Unfortunately, this fix does not seem to work for the OS X version of Internet Explorer.
* The JRE is only required if the desired output format is the Java applet.
CGI Interface
If you want to use this tool from your web site, here is the CGI Interface:
(Note: You need to set the attribute enctype="multipart/form-data" in the form tag, because the tool was designed to allow local file uploads, even if you do not use this feature.)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local HTML file) |
| htmlurl | | text | | A valid URL pointing to an HTML document |
| localFile | | file | | The path to your local HTML file |
| disType | 4/1/5 | radio | 4 | The unit of subtext the distribution is computed over: percentage of text, HTML element, or chunk of words, respectively (note: the following three controls are paired with these radio buttons, in order) |
| percent | 2/5/10/25/50 | select | 10 | The percentage of text per unit of the distribution |
| elemonly | | text | body | The name of the HTML element that the distribution is over |
| chunk | | text | 100 | The number of words per unit of subtext that the distribution is over |
| relative | | checkbox | unchecked | Indicates whether the relative distribution is displayed |
| find_patt | | text | | The word or pattern whose distribution is graphed |
| HowToList | 1/2/3/4 | select | 2 | The display format: SVG, HTML, tab-delimited text, or Java applet, respectively |
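The parameters above could also be sent from a script instead of a browser form. The following Python sketch only builds the multipart/form-data request and does not send it. The helper name `build_request` and the chosen field values are illustrative assumptions; the CGI address is the one used in the form on this page, and whether the script still answers at that address is not verified here.

```python
import urllib.request
import uuid

# Address taken from the form on this page; availability is not verified.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hdistrib.cgi"

def build_request(page_url, pattern, how_to_list="2"):
    """Build (but do not send) a multipart/form-data POST for the CGI script."""
    fields = {
        "source": "url",            # read the text from a URL, not an upload
        "htmlurl": page_url,
        "disType": "4",             # distribute over a percentage of the text
        "percent": "10",
        "find_patt": pattern,
        "HowToList": how_to_list,   # 2 = HTML output, per the table above
    }
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")
        lines.append(value)
    lines.append("--" + boundary + "--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    headers = {"Content-Type": "multipart/form-data; boundary=" + boundary}
    return urllib.request.Request(CGI_URL, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would submit it. The body is assembled by hand because the Python standard library has no built-in multipart form encoder.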
Use Distribution TAPoRware Tool in Your Web Page
You can add a button and a text field to your web page to show the distribution of a word or pattern in that page by calling the TAPoRware CGI script.
Here is the code for the interface:
<form method="post" name="htmlForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hdistrib.cgi" onsubmit="document.htmlForm.htmlurl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="htmlurl" />
<input type="hidden" name="disType" value="4" />
<input type="hidden" name="percent" value="10" />
<input type="hidden" name="relative" value="1" />
Word/Pattern: <input type="text" name="find_patt" />
<input type="hidden" name="HowToList" value="4" />
<input type="submit" name="doIt" value="Submit" />
</form>
Web Service Interface
TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web services information:
- Endpoint URL: http://taporware.mcmaster.ca:9982
- Service URI: http://taporware.mcmaster.ca/~taporware/webservice
- Service Method: pattern_Distribution_HTML
- Parameters:
  - textSource -- any HTML text
  - option -- the subtext that the distribution is over; see "disType" in the CGI interface for the values
  - percent -- percentage of text, can be 2/5/10/25
  - element -- a valid HTML element
  - chunk -- number of words per unit of subtext
  - relative -- Y/N, indicates whether the relative distribution is displayed
  - outForm -- output format; the values 1/2/3/4 correspond to Java applet/HTML/tab-delimited text/SVG respectively
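This page does not state which protocol the web service speaks. The endpoint/URI/method layout resembles a SOAP RPC service, so the Python sketch below assumes SOAP 1.1; that is an assumption, not something confirmed here. The helper name `build_envelope` and the default values are also illustrative. Only the parameters listed above are included, in the listed order; the list names no parameter for the search pattern, so none is sent.

```python
from xml.sax.saxutils import escape

SERVICE_URI = "http://taporware.mcmaster.ca/~taporware/webservice"
METHOD = "pattern_Distribution_HTML"

def build_envelope(text_source, option="4", percent="10", element="body",
                   chunk="100", relative="N", out_form="2"):
    """Build a SOAP 1.1 envelope string (assumed protocol) for the service method."""
    params = [
        ("textSource", text_source),
        ("option", option),
        ("percent", percent),
        ("element", element),
        ("chunk", chunk),
        ("relative", relative),
        ("outForm", out_form),
    ]
    # Escape each value so that HTML input cannot break the XML envelope.
    parts = "".join("<%s>%s</%s>" % (n, escape(v), n) for n, v in params)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        '<soap:Body><m:%s xmlns:m="%s">%s</m:%s></soap:Body>'
        '</soap:Envelope>' % (METHOD, SERVICE_URI, parts, METHOD)
    )
```

If the service does use SOAP, the resulting string would be posted to the endpoint URL with a `text/xml` content type; the exact headers it expects are not documented on this page.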
To Do
- Add title
- Check Help - we need help
- Get rid of "Save Data"
- Incorporate pan/zoom feature for viewing large data sets (optional)
- Incorporate summary of findings (i.e. document statistics, individual result information etc.)
- Incorporate ability to change graph style on the fly (e.g. colour scheme, grid/value display etc.)
- Include different types of graphs (i.e. point display, charts etc.)
-- MattPatey - 19 Aug 2005