Find text within an XML document or a specified element
See
http://taporware.mcmaster.ca/~taporware/xmlTools/findtext.shtml
Description
The Find Text tool finds text anywhere in an XML document. The search can be narrowed to specified elements or attributes, and all results are returned as a concordance whose context is words, sentences, lines, or surrounding elements.
History
Pseudocode
- Obtain the XML string from a URL or from the user's local disk
- Extract the text contained in the user-specified element/attribute
- Find the user-specified word/pattern together with the user-specified context (the concordance)
- Generate the concordance output, plus lists of the words before and after the word/pattern in the concordance text
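The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not Taporware's implementation: it starts from an XML string already in memory (fetching by URL or upload is left out), treats the element field as a single tag name matched with ElementTree's `iter()` rather than a full XPath, uses Python regular expressions in place of the tool's Unix-style patterns, and supports only word context. The function names `select_text`, `word_concordance` and `before_after_lists` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def select_text(xml_string, tag=None, attr_name=None, attr_value=None):
    """Step 2: collect the text of the selected elements, ignoring markup."""
    root = ET.fromstring(xml_string)
    if tag is None and attr_name is None:
        # No element or attribute given: search all text in the document.
        return " ".join(root.itertext())
    chunks = []
    for elem in root.iter(tag):  # iter(None) visits every element
        if attr_name is not None and elem.get(attr_name) != attr_value:
            continue
        # Nested matching elements are not de-duplicated in this sketch.
        chunks.append(" ".join(elem.itertext()))
    return " ".join(chunks)


def word_concordance(text, pattern, width=5):
    """Step 3: return (before, match, after) with `width` words on each side."""
    regex = re.compile(pattern)  # Python regex standing in for a Unix-style pattern
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        if regex.search(word):
            hits.append((words[max(0, i - width):i], word, words[i + 1:i + 1 + width]))
    return hits


def before_after_lists(hits):
    """Step 4: count the words that occur before and after each match."""
    before = Counter(w for b, _, _ in hits for w in b)
    after = Counter(w for _, _, a in hits for w in a)
    return before, after


doc = '<doc><p n="1">the cat sat on the mat</p><p n="2">a cat ran</p></doc>'
hits = word_concordance(select_text(doc, "p"), r"^cat$", width=2)
for before, match, after in hits:
    print(" ".join(before), f"[{match}]", " ".join(after))
# the [cat] sat on
# mat a [cat] ran
```

Passing `attr_name="n", attr_value="2"` to `select_text` would restrict the search to the second paragraph, mirroring the optional attribute name/value pair.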
Ways of Using
- Enter a valid URL in the URL field, or browse to a local file to upload XML text
- Enter valid XML element names (XPaths) separated by commas; the default is "//"
- Optionally, enter a valid attribute name and value as a pair
- Enter a word or pattern in the Word/Pattern field
- Select the context type of the concordance and the context length
- Select the output format
- To display the results in the same window with the Taporware interface, uncheck the "Open results in new window" check box
- Finally, click the "Submit" button
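The choice of context type and length in the steps above can be pictured as cutting the text into units and taking a fixed number of units on each side of every match. A minimal Python sketch, using the same 1/2/3 codes as the `notags` CGI parameter; the splitting rules (whitespace for words, newlines for lines, a naive break after `.`, `!` or `?` for sentences) are assumptions rather than the tool's documented behaviour, and the function names are hypothetical.

```python
import re

# Context-type codes follow the notags CGI parameter: 1 = words, 2 = lines, 3 = sentences.
WORDS, LINES, SENTENCES = 1, 2, 3


def split_units(text, context_type):
    """Cut text into context units; the splitting rules are illustrative assumptions."""
    if context_type == WORDS:
        return text.split()
    if context_type == LINES:
        return [line for line in text.splitlines() if line.strip()]
    if context_type == SENTENCES:
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    raise ValueError(f"unknown context type: {context_type}")


def unit_concordance(text, pattern, context_type=WORDS, length=5):
    """Return (before, matching unit, after) with `length` units of context on each side."""
    regex = re.compile(pattern)
    units = split_units(text, context_type)
    return [
        (units[max(0, i - length):i], unit, units[i + 1:i + 1 + length])
        for i, unit in enumerate(units)
        if regex.search(unit)
    ]


text = "It rained. The cat slept! Then it woke? Done."
print(unit_concordance(text, r"cat", SENTENCES, length=1))
# [(['It rained.'], 'The cat slept!', ['Then it woke?'])]
```

Splitting first keeps one matching routine for all three context types; only the unit boundaries change.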
CGI Interface
To call this tool from your own web site, use the following CGI interface.
(Note: To upload a local XML file to the tool, the form tag must include the attribute name/value pair enctype="multipart/form-data".)
Here are the parameters:
| Parameter Name | Parameter Value | Control Type | Default | Description |
| source | url/local | radio button | url | Lets the user select the input text (either a URL or an uploaded local XML file) |
| xmlurl | | text | | A valid URL pointing to an XML document |
| localFile | | file | | The path to your local XML file |
| xmlpath | | text | // | Valid XML element names (XPaths) in the input XML text, separated by commas |
| attr_name | | text | | A valid attribute name in the input XML text |
| attr_value | | text | | A valid attribute value corresponding to attr_name |
| pattern | | text | | Unix-style pattern to search for in the concordance |
| dispop | 1/2 | radio button | 1 | Whether XML elements are used in the analysis: 1 = ignore elements, 2 = use elements |
| notags | 1/2/3 | selection | Words (1) | Context type when tags are ignored: 1 = Words, 2 = Lines, 3 = Sentences |
| ctlen | | text | 5 | Context length, in units of the context type selected above |
| showtag | 1/2 | radio button | 1 | Applies when dispop = 2: 1 = use the closest elements as context, 2 = use the element named in surtah |
| surtah | | text | | Name of the element surrounding the pattern to use as context |
| HowToList | 1/2/3/4 | selection | 1 | Display format: 1 = HTML, 2 = XML text in HTML, 3 = XML tree, 4 = tag-delimited text |
| beforeafter | | checkbox | unchecked | Whether to display the words before and after the pattern (HTML output only) |
| taporface | | checkbox | checked | Checked (default): display results in a new window without the Taporware interface; unchecked: display them in the same window with the Taporware interface |
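The parameter table maps directly onto an HTML form or query string. A minimal Python sketch, assuming a `source=url` request: because this page does not give the CGI script's address, it only builds the URL-encoded parameters with the documented defaults and checks two dependencies described on this page. `build_query` is a hypothetical helper, and an upload (`source=local`) would need a multipart/form-data body instead, as the note above says.

```python
from urllib.parse import urlencode

# Defaults from the parameter table. The checkboxes (beforeafter, taporface)
# are left out because the values they submit are not documented here.
DEFAULTS = {
    "source": "url",
    "xmlpath": "//",
    "dispop": "1",
    "notags": "1",
    "ctlen": "5",
    "showtag": "1",
    "HowToList": "1",
}


def build_query(xmlurl, pattern, **overrides):
    """Return a URL-encoded query for a source=url request (hypothetical helper)."""
    params = {**DEFAULTS, "xmlurl": xmlurl, "pattern": pattern}
    params.update({name: str(value) for name, value in overrides.items()})
    # surtah only applies when elements are used (dispop=2) and the user
    # names the surrounding element (showtag=2).
    if params.get("surtah") and (params["dispop"], params["showtag"]) != ("2", "2"):
        raise ValueError("surtah requires dispop=2 and showtag=2")
    if ("attr_name" in params) != ("attr_value" in params):
        raise ValueError("attr_name and attr_value must be given together")
    return urlencode(params)


print(build_query("http://example.org/text.xml", "cat", ctlen=3))
```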
Web Service Interface
Taporware provides web services to any non-profit organization. The Taporware web service details are:
- Endpoint URL: http://taporware.mcmaster.ca:9982
- Service URI: http://taporware.mcmaster.ca/~taporware/webservice
- Service Method: find_Concordance_XML
- Parameters:
- xmlInput -- any well-formed XML string
- element -- any XML element name in the input XML text
- attributeName -- an attribute name found in the input XML text
- attributeValue -- the attribute value corresponding to attributeName; the two must be entered as a pair
- pattern -- a Unix-style pattern or regular expression
- contextOption -- whether to ignore tags (1) or include tags (2) in the process
- optionSelection1 -- its meaning depends on contextOption. If contextOption is 1 (ignore tags), the values 1/2/3 stand for word/line/sentence. If contextOption is 2 (use tags), the values 1/2 stand for closest tags/user-specified tags, and value 3 is not used.
- optionSelection2 -- its meaning depends on contextOption and optionSelection1. If tags are ignored, enter a number giving the context length in words/lines/sentences. If tags are used with a user-specified surrounding element, enter a valid XML element name.
- outFormat -- takes the same values as the display-format parameter (HowToList) in the CGI interface above
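A sketch of what a find_Concordance_XML call might look like from Python. The page does not say which SOAP version or encoding style the service uses, so this assumes SOAP 1.1 RPC style with the Service URI as the method namespace, the parameters sent in the order listed above, and a SOAPAction header in the `URI#method` form that Perl's SOAP::Lite sends by default. It builds the request without sending it; `build_request` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

ENDPOINT = "http://taporware.mcmaster.ca:9982"
SERVICE_URI = "http://taporware.mcmaster.ca/~taporware/webservice"
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_request(xml_input, element, pattern, context_option=1, option1=1,
                  option2="5", out_format=1, attribute_name="", attribute_value=""):
    """Wrap a find_Concordance_XML call in a SOAP 1.1 envelope (RPC style assumed)."""
    allowed = {1: (1, 2, 3), 2: (1, 2)}  # valid optionSelection1 values per contextOption
    if option1 not in allowed.get(context_option, ()):
        raise ValueError("optionSelection1 does not fit contextOption")
    if bool(attribute_name) != bool(attribute_value):
        raise ValueError("attributeName and attributeValue must be given as a pair")

    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_URI}}}find_Concordance_XML")
    # Parameters in the order listed on this page (positional order is assumed).
    params = [
        ("xmlInput", xml_input), ("element", element),
        ("attributeName", attribute_name), ("attributeValue", attribute_value),
        ("pattern", pattern), ("contextOption", context_option),
        ("optionSelection1", option1), ("optionSelection2", option2),
        ("outFormat", out_format),
    ]
    for name, value in params:
        # ElementTree escapes markup, so xmlInput travels as text, not as child elements.
        ET.SubElement(call, name).text = str(value)

    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{SERVICE_URI}#find_Concordance_XML"',  # SOAP::Lite-style, assumed
    }
    data = ET.tostring(envelope, encoding="utf-8")
    return Request(ENDPOINT, data=data, headers=headers, method="POST")


req = build_request("<doc><p>a cat</p></doc>", "p", "cat")
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; that step is deliberately left out of the sketch.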
Known Bugs
To Do
--
MattPatey - 13 Oct 2005