Web Page Cleaner
See
http://taporware.mcmaster.ca/~taporware/betaTools/webcleaner.shtml
Description
The tool offers two ways to convert an HTML web page to plain text: 1. Strip all HTML tags, applying some formatting based on the properties of each tag. 2. Use a third-party library to convert the HTML to plain text directly.
Pseudocode
- Obtain the submitted HTML text, either from a URL or from the user's local disk
- Perform the conversion according to the user's choice
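The first cleaning option (strip all tags, with some formatting based on the tag) can be illustrated with a minimal Python sketch. This is not the tool's actual implementation, which is not shown on this page. The names `TagStripper` and `strip_tags`, the set of tags treated as block-level, and the decision to drop `script` and `style` content are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tag sets are illustrative, not taken from the tool itself.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


class TagStripper(HTMLParser):
    """Collects text content and starts a new line around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when it is not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the text of an HTML string, one line per block-level element."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `strip_tags("<h1>Title</h1><p>Hello <b>world</b>!</p>")` yields the two lines `Title` and `Hello world!`.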
Ways of Using
- Enter a valid URL in the URL field, or enter (browse to) a local path to upload the source HTML
- Select the cleaning method
- If you want the results displayed in the same window as the TAPoRware interface, uncheck the "Open results in new window" check box
- Finally, click the "Submit" button
CGI Interface
If you want to use this tool from your web site, here is the CGI Interface:
(Note: You need to include the attribute name/value pair enctype="multipart/form-data" in the form tag, because the tool was designed to allow local file uploading. This applies even if you do not use that feature.)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input source (either a URL or an uploaded local HTML file) |
| htmlurl | | text | | A valid URL that points to an HTML document |
| localFile | | file | | The path to your local HTML file |
| cleanOpt | 1/2 | select | 1 | The cleaning method: 1 -- strip all tags, 2 -- use the HTML-to-text converter |
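The parameters above can also be sent from a script instead of a browser form. The following Python sketch builds a multipart/form-data POST request using only the standard library. The endpoint is the CGI address used in the form on this page; whether it is still reachable is not known. The function names `encode_multipart` and `build_clean_request` are hypothetical, and the sketch assumes the script accepts a request without a `localFile` part when `source` is `url`.

```python
import urllib.request
import uuid

# CGI address as given in the form action on this page.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/webcleaner.cgi"


def encode_multipart(fields):
    """Encode a dict of text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing boundary
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_clean_request(page_url, clean_opt="1"):
    """Build (but do not send) a POST request asking the tool to clean page_url."""
    fields = {"source": "url", "htmlurl": page_url, "cleanOpt": clean_opt}
    body, content_type = encode_multipart(fields)
    return urllib.request.Request(
        CGI_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

The request could then be sent with `urllib.request.urlopen(build_clean_request("http://example.org/"))`. The sketch stops short of doing so, because the format of the response is not documented here.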
Using the TAPoRware Web Cleaner Tool to Clean a Web Page
You can add a button to your web page that shows the cleaned text of that page by calling the TAPoRware CGI script.
Here is the code for this button:
<form method="post" name="htmlForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/webcleaner.cgi" onsubmit="document.htmlForm.htmlurl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="htmlurl" />
<input type="hidden" name="cleanOpt" value="1"/>
<input type="submit" name="doit" value="Clean It" />
</form>
Web Service Interface
--
LianYan - 16 Nov 2007