Link Extractor
See
http://taporware.mcmaster.ca/~taporware/htmlTools/linkextractor.shtml
Description
This tool extracts all links (the href values of the HTML tag <a>) from the input HTML source text and tries its best to convert all relative links to absolute links, so that clicking any link in the result page takes the user directly to the referenced page.
It is better to input a URL as the source, because relative links are resolved against the source's URL.
Pseudocode
- Obtain the HTML string from a URL or from the user's local disk
- Extract all the href attribute values of the tag <a>
- Based on input source's URL, convert all the relative links to absolute links.
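The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the tool's actual implementation: the names LinkCollector and extract_links and the example.com URLs are made up for the example, and the HTML is passed in as a string rather than fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url=None):
    """Return all <a href> values, made absolute when base_url is given."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    if base_url is None:
        # Without a source URL, relative links cannot be resolved.
        return collector.hrefs
    return [urljoin(base_url, href) for href in collector.hrefs]


sample = (
    '<p><a href="/about.html">About</a> '
    '<a href="page2.html">Next</a> '
    '<a href="http://example.org/">External</a> '
    '<a name="top">no href here</a></p>'
)
links = extract_links(sample, "http://example.com/docs/index.html")
for link in links:
    print(link)
```

When no base URL is supplied, as with an uploaded local file, the sketch returns the href values unchanged, which is why a URL source gives more useful results.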
Ways of Using
- Enter a valid URL in the URL field, or upload a local HTML file
- If you want the results displayed in the same window with the taporware interface, uncheck the "Open results in new window" check box
- Click the "Submit" button
CGI Interface
If you want to use this tool from your web site, here is the CGI Interface:
(Note: You need to use the attribute name/value pair enctype="multipart/form-data" within the form tag, because the tool was designed to allow local file uploading, even if you do not use this feature.)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input source (either a URL or an uploaded local HTML file) |
| htmlurl | | text | | A valid URL pointing to an HTML document |
| localFile | | file | | The path to your local HTML file |
| taporface | | checkbox | checked | Display the result in a new window without the graphical interface (default), or in the same window with the taporware interface |
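As a rough illustration of how a script might call this interface, the Python sketch below builds a multipart/form-data POST request carrying the source and htmlurl parameters. It is a sketch under assumptions: the function name build_request and the example.com URL are invented for the example, the CGI address is the one used in the form on this page, and the request is only constructed, not sent.

```python
import urllib.request
import uuid

# CGI address as given in the form example on this page.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hlinkextractor.cgi"


def build_request(page_url, boundary=None):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = boundary or uuid.uuid4().hex
    fields = {"source": "url", "htmlurl": page_url}
    parts = []
    for name, value in fields.items():
        # One form-data part per parameter.
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    body = "".join(parts).encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(CGI_URL, data=body, headers=headers, method="POST")


request = build_request("http://example.com/", boundary="demo")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually submit the request (requires the server to be reachable):
# with urllib.request.urlopen(request) as response:
#     result_html = response.read()
```

The taporface parameter is left out here because the value the script expects for it is not documented above.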
Use Link Extractor TAPoRware Tool in Your Web Page
You can add a button to your web page that extracts all the links in that page by calling the TAPoRware CGI script.
Here is the code for this button interface:
<form method="post" name="htmlForm" enctype="multipart/form-data" target="_blank"
action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/hlinkextractor.cgi"
onsubmit="document.htmlForm.htmlurl.value=document.location.href">
<input type="hidden" name="source" value="url" />
<input type="hidden" name="htmlurl" />
<input type="submit" name="doIt" value="Extract Links" />
</form>
--
LianYan - 31 May 2007