Main.TAPoRwareXMLTokenize (r1.1 vs. r1.5)
Diffs

 <<O>>  Difference Topic TAPoRwareXMLTokenize (r1.5 - 28 Mar 2007 - LianYan)

META TOPICPARENT TAPoRware

Tokenize XML document or specific elements

See http://taporware.mcmaster.ca/~taporware/xmlTools/tokenize.shtml
Line: 53 to 53

Changed:
<
<
  • Service Method: tokenizer_HTML
>
>
  • Service Method: tokenizer_XML

  • parameters:
Changed:
<
<
    • htmlInput -- any html text
    • htmlTag -- one or more valid html element names, separated by commas
    • tokenType -- A selection for token type: Word/Sentence/Paragraph/char/pat
    • option1 -- value depends on "tokenType". If the token type is "character", enter the character token in this field. If the token type is "pattern", enter unix or regexp (without quotation marks) to indicate the pattern type.
    • option2 -- Used only when "pattern" is selected as the token type. Its value depends on "option1": enter a pattern of the matching type.
>
>
    • xmlInput -- any well-formed xml text
    • element -- any valid xml element name or xpath in the submitted xml document
    • attributeName -- an attribute name that exists in the xml document
    • attributeValue -- an attribute value that exists in the xml document. You must specify an attribute name if this one is given
    • tokenType -- A selection for token type: Word/Sentence/Paragraph/char/pat/elem
    • option1 -- value depends on "tokenType". If the token type is "character", enter the character token in this field. If the token type is "pattern", enter unix or regexp (without quotation marks) to indicate the pattern type. If "element as separator" is selected, enter a valid xml element name in this field.
    • option2 -- Used only when "pattern" or "element" is selected as the token type. If "pattern as separator" is selected, its value depends on "option1": enter a pattern of the matching type. If "element as separator" is selected, enter "Yes" to keep the element with the token or "No" to ignore the element.

    • displayOption -- Indicates how to treat separators. The values are the same as for the parameter "dispop" in the CGI interface above
Changed:
<
<
    • outFormat -- values are the same as the parameter "HowToList" in the CGI interface above
>
>
    • outFormat -- values are html for "HTML", xml for "XML tree", and anything else for "XML text in HTML"
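
The parameter list above can be turned into a request body roughly as follows. This is only a sketch: the page lists an endpoint URL, a service URI and a method name but does not name the protocol, so the SOAP 1.1 RPC envelope, the parameter order, the default values and the function name build_tokenizer_xml_request are all assumptions. The code only builds the envelope and does not contact the server.

```python
import xml.etree.ElementTree as ET

# Assumption: the service is SOAP 1.1; the page does not say so explicitly.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Service URI as listed on this page, used here as the method namespace.
SERVICE_URI = "http://taporware.mcmaster.ca/~taporware/webservice"


def build_tokenizer_xml_request(xml_input, element="//", attribute_name="",
                                attribute_value="", token_type="Word",
                                option1="", option2="",
                                display_option="1", out_format="xml"):
    """Build a (hypothetical) SOAP envelope for the tokenizer_XML method."""
    # Rule taken from the parameter list: a value needs a name.
    if attribute_value and not attribute_name:
        raise ValueError("attributeValue requires attributeName")

    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_URI}}}tokenizer_XML")

    # Parameter names come from the list above; their order is assumed.
    params = [
        ("xmlInput", xml_input),
        ("element", element),
        ("attributeName", attribute_name),
        ("attributeValue", attribute_value),
        ("tokenType", token_type),
        ("option1", option1),
        ("option2", option2),
        ("displayOption", display_option),
        ("outFormat", out_format),
    ]
    for name, value in params:
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")
```

The returned string could then be POSTed to the endpoint URL with any HTTP client, provided the SOAP assumption holds for the deployed service.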

Known Bugs


 <<O>>  Difference Topic TAPoRwareXMLTokenize (r1.4 - 26 Mar 2007 - LianYan)

META TOPICPARENT TAPoRware

Tokenize XML document or specific elements

See http://taporware.mcmaster.ca/~taporware/xmlTools/tokenize.shtml
Line: 36 to 36

| xmlpath | | text | // | Valid xml element (tag) name, or multiple xml element names separated by commas |
| attr_name | | text | | Valid xml attribute name |
| attr_value | | text | | Valid xml attribute value |
Added:
>
>
| token | Word/Line/Sentence/Paragraph/char/elem/pat | radio | Word | A radio button group for selecting the token type |
| character | | text | | If you select "characters" as the token type, you need to fill in this field |
| element | | text | | If "Separate on tags" is selected, fill in this field with a valid xml tag from the submitted xml document |
| keeptag | | checkbox | unchecked | Check it to keep the element with the tokens when "Separate on tags" is selected |
| pFormat | unix/regexp | radio button | unix | When "Pattern" is selected as the token type, this control indicates the pattern type: unix-style pattern or regular expression |
| unix | | text | | When "Pattern" and "Unix" are selected, enter a unix-style pattern here |
| regexp | | text | | When "Pattern" and "regexp" are selected, enter a regular expression here |
| dispop | 1/2/3/4 | selection | 1 | Indicates how to treat separators. In order of parameter value: Strip separator / Keep separator as token / Keep with previous token / Keep with following token |
| HowToList | 1/2/3/4 | selection | 1 | Indicates how to display the results in the browser. In order of parameter value: HTML / XML text in HTML / XML tree / Tab delimited text |
| taporface | | checkbox | checked | Indicates whether the result is displayed with the taporware interface |
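
As a rough illustration of how the parameters in this table combine, the sketch below assembles a URL-encoded query string for the "url" source case. The parameter names and the numeric codes for dispop and HowToList come from the table; the helper name build_cgi_query, the symbolic keys, and the checkbox value "on" are assumptions. The address of the CGI script is not given on this page, so the code stops at the query string.

```python
from urllib.parse import urlencode, parse_qs

# Numeric codes as listed in the table, in order of parameter value.
DISPOP = {"strip": 1, "as_token": 2, "with_previous": 3, "with_following": 4}
HOWTOLIST = {"html": 1, "xml_in_html": 2, "xml_tree": 3, "tab": 4}


def build_cgi_query(xmlurl, xmlpath="//", token="Word", dispop="strip",
                    how_to_list="html", taporface=True, **extra):
    """Return a URL-encoded query string for the tokenizer CGI form."""
    fields = {
        "source": "url",            # tokenize a remote document, not an upload
        "xmlurl": xmlurl,
        "xmlpath": xmlpath,
        "token": token,
        "dispop": DISPOP[dispop],
        "HowToList": HOWTOLIST[how_to_list],
    }
    if taporface:
        # Assumption: the checkbox submits the browser default value "on".
        fields["taporface"] = "on"
    # Extra fields such as attr_name, character, pFormat, unix or regexp.
    fields.update(extra)
    return urlencode(fields)
```

For example, build_cgi_query("http://example.org/a.xml", token="pat", pFormat="regexp", regexp=r"\s+") would request pattern tokenization on whitespace, assuming the form accepts "pat" as the literal radio value.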

Web Service Interface

Added:
>
>
Taporware provides web services to any non-profit organization. Here is the taporware web services information:

  • Endpoint URL: http://taporware.mcmaster.ca:9982
  • Service URI: http://taporware.mcmaster.ca/~taporware/webservice
  • Service Method: tokenizer_HTML
  • parameters:
    • htmlInput -- any html text
    • htmlTag -- one or more valid html element names, separated by commas
    • tokenType -- A selection for token type: Word/Sentence/Paragraph/char/pat
    • option1 -- value depends on "tokenType". If the token type is "character", enter the character token in this field. If the token type is "pattern", enter unix or regexp (without quotation marks) to indicate the pattern type.
    • option2 -- Used only when "pattern" is selected as the token type. Its value depends on "option1": enter a pattern of the matching type.
    • displayOption -- Indicates how to treat separators. The values are the same as for the parameter "dispop" in the CGI interface above
    • outFormat -- values are the same as the parameter "HowToList" in the CGI interface above

Known Bugs

To Do

Changed:
<
<
-- MattPatey - 13 Oct 2005
>
>
-- LianYan - 26 Mar 2007


 <<O>>  Difference Topic TAPoRwareXMLTokenize (r1.3 - 01 Nov 2006 - LianYan)

META TOPICPARENT TAPoRware

Tokenize XML document or specific elements

See http://taporware.mcmaster.ca/~taporware/xmlTools/tokenize.shtml
Line: 8 to 8

Description

This tool splits an XML document at specified points, or tokens. These tokens can be words, lines, sentences, paragraphs, characters, patterns, or tags. The results can be listed with the token removed or preserved before or after the split.
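
The four ways of listing results (separator removed, kept as its own token, kept with the previous token, kept with the following token) can be illustrated with a small sketch. This is not the tool's own code: the function name split_with_separators and the use of a Python regular expression as the separator are assumptions made for illustration, and the mode numbers 1 to 4 follow the order just given.

```python
import re


def split_with_separators(text, pattern, mode=1):
    """Split text on a regex separator.

    mode 1: strip the separator
    mode 2: keep the separator as its own token
    mode 3: keep the separator with the previous token
    mode 4: keep the separator with the following token
    """
    if mode not in (1, 2, 3, 4):
        raise ValueError("mode must be 1, 2, 3 or 4")

    tokens = []
    pos = 0
    pending = ""  # separator waiting to be glued onto the next token (mode 4)
    for match in re.finditer(pattern, text):
        if match.start() == match.end():
            continue  # ignore empty matches so the loop always advances
        chunk, sep = text[pos:match.start()], match.group(0)
        pos = match.end()
        if mode == 1:
            tokens.append(chunk)
        elif mode == 2:
            tokens.extend([chunk, sep])
        elif mode == 3:
            tokens.append(chunk + sep)
        else:
            tokens.append(pending + chunk)
            pending = sep
    tokens.append(pending + text[pos:])  # trailing text after the last separator
    return [t for t in tokens if t]      # drop empty tokens


print(split_with_separators("a,b,c", ",", mode=3))  # ['a,', 'b,', 'c']
```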
Deleted:
<
<

History


Pseudocode

Changed:
<
<
>
>
  • Get the user-submitted xml text from a specified URL on the internet or from the user's local disk
  • Get the sub-text the user wants to tokenize, specified by element name and/or attribute
  • Run tokenizer program
  • Format output
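
A minimal sketch of these steps, assuming the xml text has already been fetched or uploaded, that tokens are words, and that the output is tab-delimited. The function names, the word pattern and the output layout are illustrative assumptions, not the tool's actual implementation.

```python
import re
import xml.etree.ElementTree as ET


def select_text(xml_text, element, attr_name=None, attr_value=None):
    """Return the text of every matching element, optionally filtered by attribute."""
    root = ET.fromstring(xml_text)
    texts = []
    for node in root.iter(element):
        if attr_name is not None:
            if attr_name not in node.attrib:
                continue
            if attr_value is not None and node.get(attr_name) != attr_value:
                continue
        # itertext() also collects text nested in child elements.
        texts.append("".join(node.itertext()))
    return texts


def tokenize_words(text):
    """Very simple word tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def format_tab_delimited(tokens):
    """Number the tokens and join them as tab-delimited lines."""
    return "\n".join(f"{i}\t{tok}" for i, tok in enumerate(tokens, 1))


def run(xml_text, element, attr_name=None, attr_value=None):
    tokens = []
    for text in select_text(xml_text, element, attr_name, attr_value):
        tokens.extend(tokenize_words(text))
    return format_tab_delimited(tokens)
```

Calling run(xml_text, "p", "type", "a") would tokenize only the p elements whose type attribute equals "a".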

Ways of Using

Changed:
<
<
>
>
  • Enter a valid URL in the URL field, or upload a local xml text
  • Enter a valid xml element name or xpath that exists in the xml text; the default is "//"
  • Select token type
  • Select the way to treat separators
  • Select output format
  • Click the submit button

CGI Interface

Added:
>
>
If you want to use this tool from your own web site, here is the CGI interface. (Note: to upload local xml text to the tool, the form tag must include the attribute enctype="multipart/form-data".)

Here are the parameters:

| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a url or an uploaded local xml text) |
| xmlurl | | text | | A valid URL pointing to an xml text |
| localFile | | file | | The path to your local xml text file |
| xmlpath | | text | // | Valid xml element (tag) name, or multiple xml element names separated by commas |
| attr_name | | text | | Valid xml attribute name |
| attr_value | | text | | Valid xml attribute value |

Web Service Interface

Known Bugs


 <<O>>  Difference Topic TAPoRwareXMLTokenize (r1.2 - 15 Oct 2005 - MattPatey)

META TOPICPARENT TAPoRware
Added:
>
>

Tokenize XML document or specific elements

See http://taporware.mcmaster.ca/~taporware/xmlTools/tokenize.shtml

Description


This tool splits an XML document at specified points, or tokens. These tokens can be words, lines, sentences, paragraphs, characters, patterns, or tags. The results can be listed with the token removed or preserved before or after the split.
Added:
>
>

History

Pseudocode

Ways of Using

CGI Interface

Web Service Interface

Known Bugs

To Do


-- MattPatey - 13 Oct 2005

 <<O>>  Difference Topic TAPoRwareXMLTokenize (r1.1 - 13 Oct 2005 - MattPatey)
Line: 1 to 1
Added:
>
>
META TOPICPARENT TAPoRware
This tool splits an XML document at specified points, or tokens. These tokens can be words, lines, sentences, paragraphs, characters, patterns, or tags. The results can be listed with the token removed or preserved before or after the split.

-- MattPatey - 13 Oct 2005



Revision r1.1 - 13 Oct 2005 - 21:51 - MattPatey
Revision r1.5 - 28 Mar 2007 - 13:46 - LianYan