Extract Text
See
http://taporware.mcmaster.ca/~taporware/htmlTools/extract.shtml
Description
This tool displays text found within specific tags in an HTML document.
History
Pseudocode
- Obtain HTML string by URL or from user's local disk
- Extract text based on user specified element(s)
- Generate output based on user specified display format
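The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual implementation: the names `TagTextExtractor` and `extract_text` are hypothetical, the HTML is assumed to be already loaded into a string, and the output is a plain list of text pieces rather than one of the tool's display formats.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside the requested elements (hypothetical helper)."""

    def __init__(self, tags):
        super().__init__()
        # "h1, p" -> {"h1", "p"}; tag names are matched case-insensitively
        self.wanted = {t.strip().lower() for t in tags.split(",") if t.strip()}
        self.depth = 0    # number of requested elements currently open
        self.chunks = []  # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only while inside a requested element
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, tags="body"):
    """Return the text pieces found inside the given comma-separated tags."""
    parser = TagTextExtractor(tags)
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    page = "<html><body><h1>Head</h1><p>One <b>two</b></p></body></html>"
    for piece in extract_text(page, "h1, b"):
        print(piece)
```

The sketch assumes well-formed markup in which every requested element has a closing tag; unclosed elements such as a bare `<p>` would leave the extractor collecting text longer than intended.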
Ways of Using
- Enter a valid URL in the URL field, or choose a local HTML file to upload
- Enter a valid HTML tag, or a list of tags separated by commas; the default is "body"
- Select output format
- Click "submit" button
CGI Interface
If you want to use this tool from your web site, here is the CGI Interface:
(Note: To upload a local HTML file to the tool, the form tag must include the attribute name/value pair enctype="multipart/form-data".)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local HTML file) |
| htmlurl | | text | | A valid URL pointing to an HTML document |
| localFile | | file | | The path to your local HTML file |
| tagtext | | text | body | A valid HTML element (tag) name, or multiple HTML element names separated by commas |
| textdisp | 1/2/3 | selection | 2 | Display format. The options, listed in the order of the parameter values, are: HTML/XML text in HTML/XML tree |
| taporface | | checkbox | checked | Display the result in a new window without the graphical interface (default), or in the same window with the TAPoRware interface |
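As a rough illustration of how these parameters fit together, the following Python sketch builds a POST request for the URL case without sending it. It rests on assumptions that are not confirmed here: that the script accepts an ordinary URL-encoded POST when no file is uploaded, and that the other parameters can be left at their defaults. The function name `build_extract_request` is hypothetical; the script address is the one used in the form on this page.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Script address taken from the form example on this page
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hextract.cgi"


def build_extract_request(page_url, tags="body", textdisp="2"):
    """Build (but do not send) a POST request for the URL input case."""
    if textdisp not in {"1", "2", "3"}:
        raise ValueError("textdisp must be 1, 2 or 3")
    fields = {
        "source": "url",       # read the document from a URL, not an upload
        "htmlurl": page_url,   # address of the HTML document
        "tagtext": tags,       # comma-separated tag names
        "textdisp": textdisp,  # display format selector
    }
    # Assumption: URL-encoded form data is accepted when no file is uploaded
    body = urlencode(fields).encode("ascii")
    return Request(CGI_URL, data=body, method="POST")


if __name__ == "__main__":
    req = build_extract_request("http://example.org/index.html", tags="h1,p")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Passing the returned object to `urllib.request.urlopen` would submit it; uploading a local file instead would require a multipart/form-data body, as noted above.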
Use Extract HTML TAPoRware Tool in Your Web Page
You can add a button to your web page that extracts the text contained in specified HTML tags on that page by calling the TAPoRware CGI script.
Here is the code for the tool interface:
<form method="post" name="htmlForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hextract.cgi" onsubmit="document.htmlForm.htmlurl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="htmlurl" />
<input type="hidden" name="freetext" value="yes"/>
HTML Tags: <input type="text" name="tagtext" value="body" />
<input type="hidden" name="textdisp" value="1" />
<input type="hidden" name="taporface" value="same" />
<input type="submit" name="doit" value="Extract Text" />
</form>
Web Service Interface
TAPoRware provides web services to any non-profit organization. Here is the TAPoRware web services information:
Known Bugs
To Do
--
MattPatey - 13 Oct 2005