Keywords Finder

See: http://taporware.mcmaster.ca/~taporware/betaTools/keywordFinder.shtml

Description

This tool tries to find the likely keywords or key phrases of a source text and recommends them to the user. It rests on the assumption that keywords and key phrases occur more frequently in the text than other words or phrases. It also assumes that certain parts of speech, such as verbs and adverbs, should not be keywords.

To count the different forms of the same word correctly, an inflectional stemmer is applied to all words. This can cause some confusion, because a recommended keyword is shown in its stemmed form, which may not appear in the text. For example, if "human rights" is a key phrase, the tool will recommend "human right" instead. It is therefore up to the user to work from the recommended keywords and key phrases to the correct set.
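One way to make the stemmed recommendations easier to interpret is to remember which surface forms produced each stem. The following is a minimal sketch of that idea, not the tool's actual code: `naive_stem` is a hypothetical stand-in that only strips a plural "s" (a real inflectional stemmer handles many more cases), and `count_with_surface_forms` is likewise an illustrative name.

```python
from collections import Counter, defaultdict

def naive_stem(word):
    """Hypothetical stand-in for an inflectional stemmer: strips a plural 's' only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def count_with_surface_forms(words):
    """Count stems, and remember which original forms produced each stem."""
    counts = Counter()
    forms = defaultdict(Counter)
    for w in words:
        w = w.lower()
        stem = naive_stem(w)
        counts[stem] += 1
        forms[stem][w] += 1
    return counts, forms

words = "Human rights are rights that every human holds".split()
counts, forms = count_with_surface_forms(words)
print(counts["right"])                      # 2
print(forms["right"].most_common(1)[0][0])  # rights
```

With such a mapping, the stem "right" could be traced back to the form "rights" that actually occurs in the text.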

To help the user select the right keywords and key phrases, the result page lists the 20 most frequent words, the 10 most frequent word pairs, and the 10 most frequent word triplets. If you click any word or phrase in the list, the tool generates a concordance of that word or phrase with a context of 8 words. For the reason mentioned above, some words or phrases may not generate any concordance, because the stemmer has changed them to their base form. In that case, please go here to get the concordance.
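To illustrate why a stemmed phrase can produce an empty concordance, here is a small keyword-in-context sketch. It is an assumption-laden illustration rather than the tool's implementation: the function name `concordance` is hypothetical, and the context is taken to mean up to 8 words on each side, which the description does not state explicitly.

```python
import re

def concordance(text, phrase, context=8):
    """Return keyword-in-context lines for every occurrence of phrase.

    Matching is case-insensitive on exact surface forms, so a stemmed
    phrase such as "human right" will not match "human rights".
    """
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    target = phrase.lower().split()
    n = len(target)
    lines = []
    if n == 0:
        return lines
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:
            left = " ".join(tokens[max(0, i - context):i])
            hit = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + context])
            lines.append(f"{left} [{hit}] {right}".strip())
    return lines

text = "The charter protects human rights. Every state must respect human rights."
for line in concordance(text, "human rights", context=3):
    print(line)
# The charter protects [human rights] Every state must
# state must respect [human rights]
print(concordance(text, "human right"))  # [] (only the stemmed form was given)
```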

If you have any suggestions for improving this tool, or find any bugs, please send me an email: lyan (at) mcmaster (dot) ca.

Pseudocode

  • Get the source text by URL or from the user's local disk. If the text format is XML or HTML, strip all the tags
  • Obtain the word list ordered by frequency, applying the stop word list and the stemmer
  • Obtain the word pair list ordered by frequency, excluding pairs that contain stop words and applying the stemmer
  • Obtain the word triplet list ordered by frequency, excluding triplets that contain stop words and applying the stemmer
  • Pick the 20 most frequent single words that are neither verbs nor adverbs, using a part-of-speech tagger
  • Pick the 10 most frequent non-verb word pairs and word triplets respectively, or all those that occur more than once, whichever is fewer
  • Match the selected single words against the pairs and triplets, and match the pairs against the triplets. If a word matches a pair or triplet, keep the pair or triplet and drop the word; likewise, if a pair matches a triplet, keep the triplet and drop the pair. Give the matched pair or triplet a higher rank (put it before the other words)
  • Generate the final results
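The steps above can be sketched roughly as follows. This is an illustration under several assumptions, not the tool's source code: `STOP_WORDS` is a tiny made-up stop list, `naive_stem` is a hypothetical stand-in for the inflectional stemmer, tag stripping is a crude regular expression, and the part-of-speech filtering step is omitted entirely (so verbs such as "protect" are not removed here).

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop list; the tool's real list is not given.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

def naive_stem(word):
    """Hypothetical stand-in for the inflectional stemmer: strips a plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def ngram_counts(tokens, n):
    """Count stemmed n-grams that contain no stop word."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if not any(t in STOP_WORDS for t in gram):
            counts[" ".join(naive_stem(t) for t in gram)] += 1
    return counts

def top_repeated(counts, limit):
    """Top `limit` entries, keeping only those that occur more than once."""
    return [g for g, c in counts.most_common(limit) if c > 1]

def contains(phrase, part):
    """True if the words of `part` appear consecutively inside `phrase`."""
    p, q = phrase.split(), part.split()
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

def find_keywords(text, n_words=20, n_phrases=10):
    text = re.sub(r"<[^>]+>", " ", text)  # crude XML/HTML tag stripping
    tokens = re.findall(r"[a-z]+", text.lower())
    words = [w for w, _ in ngram_counts(tokens, 1).most_common(n_words)]
    pairs = top_repeated(ngram_counts(tokens, 2), n_phrases)
    triplets = top_repeated(ngram_counts(tokens, 3), n_phrases)
    # Drop a shorter item when a longer selected phrase contains it.
    kept_pairs = [p for p in pairs if not any(contains(t, p) for t in triplets)]
    kept_words = [w for w in words
                  if not any(contains(ph, w) for ph in pairs + triplets)]
    # Phrases that absorbed a word or pair are ranked first.
    phrases = triplets + kept_pairs
    matched = [ph for ph in phrases
               if any(contains(ph, x) for x in words + pairs if x != ph)]
    unmatched = [ph for ph in phrases if ph not in matched]
    return matched + unmatched + kept_words

sample = "<p>Human rights law protects human rights. Human rights law is universal.</p>"
print(find_keywords(sample))
# ['human right law', 'protect', 'universal']
```

In this sample the triplet "human right law" absorbs the pairs "human right" and "right law" as well as the single words "human", "right", and "law", so it is listed first and the absorbed items are dropped.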

How to use

  • Enter a valid URL in the URL field or enter a local path to upload the source text
  • Click the "Submit" button

CGI Interface

If you want to use this tool from your web site, here is the CGI interface. (Note: you need to include the attribute name/value pair enctype="multipart/form-data" in the form tag, because the tool was designed to allow local file uploading, even if you do not use this feature.)

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local HTML text) |
| texturl | | text | | A valid URL that points to XML, HTML, or plain text |
| localFile | | file | | The path to your local HTML text file |
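As an illustration of how these parameters fit together, the sketch below builds (but does not send) a multipart/form-data POST request using only the Python standard library. It is a hedged example, not an official client: the function name `build_keyword_request` is hypothetical, the endpoint is the one given in the form's action attribute on this page, and whether that server still responds is not something this sketch assumes.

```python
import uuid
import urllib.request

# Endpoint taken from the form's action attribute on this page.
CGI_URL = "http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/keywordFinder.cgi"

def build_keyword_request(page_url):
    """Build a multipart/form-data POST carrying the `source` and `texturl` fields."""
    boundary = uuid.uuid4().hex
    fields = {"source": "url", "texturl": page_url}
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"\r\n"
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    body = "".join(parts).encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(CGI_URL, data=body, headers=headers, method="POST")

req = build_keyword_request("http://example.org/page.html")
print(req.get_method())  # POST
# To send it (requires network access and a live server):
# with urllib.request.urlopen(req) as resp:
#     html = resp.read().decode("utf-8", errors="replace")
```

The request is encoded as multipart/form-data even though no file is uploaded, in line with the note above about the enctype attribute.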

Use Keywords Finder TAPoRware Tool in Your Web Page

You can add a button to your web page that finds the keywords of that page by calling the TAPoRware CGI script.

Here is the code for the button:

<form method="post" name="htmlForm" enctype="multipart/form-data" target="_blank" action="http://taporware.mcmaster.ca/~taporware/cgi-bin/prototype/keywordFinder.cgi" onsubmit="document.htmlForm.texturl.value=document.location.href">

<input type="hidden" name="source" value="url" />

<input type="hidden" name="texturl" />

<input type="submit" name="doit" value="Get Keywords" />

</form>

-- LianYan - 15 Jun 2007

