XTeXT and the TAPoR Portal
What is XTeXT?
XTeXT is a text search and retrieval/web application platform provided to the TAPoR project by isagn inc. It is based on search technology developed at the University of Waterloo.
It is fast, scalable, extensible and under continuous improvement. Read more about XTeXT or XTeXT workshops.
How can you use XTeXT through the portal?
Here is a quick introduction, which assumes you have an appropriate TAPoR account set up. The following steps walk you through creating a repository, adding text to it, and using the XTeXT tools to search the repository and retrieve relevant results for use with other tools.

- Go to myTools
   - click 'Click here to add XTeXT tools'.
- Go to myTexts
   - click 'Add Repository'.
   - provide a name and description.
   - click 'Add'.
   - click 'Load Text to Repository'.
   - select the repository name.
   - indicate the source of the input text:
      - from a URL.
      - from an uploaded file.
      - from typed/pasted input.
      - from your aggregate texts.
   - click 'Add'.
- Go to Workbench
   - element names
      - select the repository name from the list.
      - select the 'List XML Elements' tool.
      - click 'Use Tool on Source Text'.
      - when the Tool Broker panel appears, click submit.
      - result: a list containing the element names and their counts is returned.
   - element retrieval
      - select 'Extract XML elements'.
      - provide an XML element name occurring in the previous results.
      - submit.
      - result: a list of all of the named elements is retrieved.
   - concordance listing
      - select 'Find in XML element'.
      - provide terms to seek; this can be a word or phrase.
      - optionally, provide span (the number of words to retrieve on either side of the match) and limit (the maximum number of results to return).
      - submit.
      - result: a concordance listing of matches occurring within the selected elements is returned.

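To make the element listing, element retrieval and concordance steps concrete, here is a minimal local sketch in Python using only the standard library (`xml.etree.ElementTree`, `re`, `collections.Counter`). It is not XTeXT's implementation or its web-service API. The function names `list_elements`, `extract_elements` and `concordance`, the regex tokenizer, the default `span` of 5 words and the matching rules (case-insensitive, whole words) are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed tokenizer: runs of word characters. XTeXT's own tokenization may differ.
WORD = re.compile(r"\w+")


def list_elements(root):
    """Approximates 'List XML Elements': count every element name in the tree."""
    return Counter(el.tag for el in root.iter())


def extract_elements(root, name):
    """Approximates 'Extract XML elements': all elements with the given name."""
    return list(root.iter(name))


def concordance(root, name, phrase, span=5, limit=None):
    """Approximates 'Find in XML element': keyword-in-context matches of `phrase`,
    searching only the text inside elements called `name`.

    span  -- words returned on either side of each match (assumed default: 5)
    limit -- maximum number of results to return (None: no limit)
    """
    target = [w.lower() for w in WORD.findall(phrase)]
    if not target:
        return []
    n = len(target)
    results = []
    for el in extract_elements(root, name):
        # Join text fragments with spaces so words in adjacent child
        # elements are not glued together.
        words = WORD.findall(" ".join(el.itertext()))
        lowered = [w.lower() for w in words]
        for i in range(len(words) - n + 1):
            if lowered[i:i + n] == target:
                left = " ".join(words[max(0, i - span):i])
                right = " ".join(words[i + n:i + n + span])
                results.append((left, " ".join(words[i:i + n]), right))
                if limit is not None and len(results) >= limit:
                    return results
    return results


if __name__ == "__main__":
    doc = ET.fromstring(
        "<text><p>The quick brown fox jumps.</p>"
        "<p>A lazy dog sleeps near the brown fox.</p>"
        "<title>Fox tales</title></text>"
    )
    print(list_elements(doc))  # Counter({'p': 2, 'text': 1, 'title': 1})
    for left, hit, right in concordance(doc, "p", "brown fox", span=2):
        print(f"{left:>12} [{hit}] {right}")
```

Matching on word lists rather than raw substrings keeps `span` measured in words, as the steps describe. Nested elements with the same name would be searched twice in this sketch, which a real tool would need to handle.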
History of XTeXT TAPoR interface
- We had a demonstration of it working at the Face of Text conference. Now we are working on the full implementation.
Specifications for the interface
- Users can create, delete, edit, and add to repositories. (Will it only be advanced users? What will the storage issues be? How will the interface for this work?)
- There are XTeXT tools in the TAPoR tools list that can be used on repositories that are public or belong to you. These include:
   - An XML element lister: returns a list of element names and their counts.
   - An XML element extractor: returns a list of the matching elements, specified by XPath or by element name.
   - Find tools: return a concordance for a term or phrase in the repository, with the search restricted by XPath or element names.
- The following tools are not yet in place:
   - Advanced find
   - A text-listing tool that lists the texts in a repository (with the XPath information needed to restrict a search to just that text)
   - A word list tool that generates a list of words and their counts
   - Co-occurrence and collocates (nice, but can be done with two passes)
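
The word list and collocate tools above are not yet in place. As a rough illustration of what they would compute, here is a Python sketch; the function names, lowercasing, the `\w+` tokenizer and the fixed-size window that defines a collocate are all assumptions, not a description of any XTeXT tool. `collocates` follows the "two passes" remark: one pass finds the node word, a second counts its neighbours.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of word characters.
WORD = re.compile(r"\w+")


def tokens(text):
    return [w.lower() for w in WORD.findall(text)]


def word_list(text):
    """Sketch of the planned word list tool: (word, count) pairs, most frequent first."""
    return Counter(tokens(text)).most_common()


def collocates(text, node, window=3):
    """Count words appearing within `window` words of each occurrence of `node`."""
    words = tokens(text)
    node = node.lower()
    # Pass 1: positions of the node word.
    hits = [i for i, w in enumerate(words) if w == node]
    # Pass 2: tally the words on either side of each hit.
    counts = Counter()
    for i in hits:
        counts.update(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    sample = "The cat sat on the mat. The end."
    print(word_list(sample))
    print(collocates(sample, "cat", window=2))
```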
Further neat ideas for later projects
- There might be advanced tools that will be developed later.
- Repositories could be exported and imported
- Comparison tool that compares two subsets of the repository (comparison of relative frequencies) (Later)
- For material with date tags, such as an RSS feed, be able to do a date search and get a distribution graph of some sort. (Later)
- Add a data mining and cluster searching facility. Not sure what this would be, but it could be cool. (Later)
- Users should be able to launch a bot that gathers stuff for them.
- Allow XTeXT searching to happen from different sites, so that we can have installations at multiple sites. (????)
- Export and Import for Basic users without disk space
- Look at indexing RSS feeds and other materials automatically.
- Develop bots that add stuff automatically to XTeXT repositories
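
The comparison tool above is only a later idea, and the page does not say how relative frequencies would be compared. One simple possibility, sketched here in Python, is to rank words by the difference between their relative frequency in subset A and in subset B. The function names and the plain-difference measure are assumptions; a real tool might prefer a statistical test such as log-likelihood.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of word characters.
WORD = re.compile(r"\w+")


def relative_frequencies(text):
    """Map each word to its share of all word tokens in `text`."""
    counts = Counter(w.lower() for w in WORD.findall(text))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def compare_subsets(text_a, text_b, top=10):
    """Words whose relative frequency differs most between subsets A and B.

    A positive score means the word is relatively more frequent in A.
    Ties are broken alphabetically so the output is deterministic.
    """
    fa, fb = relative_frequencies(text_a), relative_frequencies(text_b)
    scores = {w: fa.get(w, 0.0) - fb.get(w, 0.0) for w in fa.keys() | fb.keys()}
    return sorted(scores.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top]


if __name__ == "__main__":
    a = "war war peace"
    b = "peace peace war love"
    for word, score in compare_subsets(a, b):
        print(f"{word:>6} {score:+.3f}")
```

With these two toy subsets the ranking is war (+0.417), love (-0.250), peace (-0.167): "war" makes up two thirds of A but only a quarter of B.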
To do
- Get the public private model working
- Testing and fixing
- Develop the user manuals and a tutorial on using XTeXT through the portal (shared)
- Develop an installation manual for XTeXT for nodes and train them to install
- Backup of repositories
Done
- Install a version on TAPoR 1 so that it can be adapted to work with the portal
- Install for other nodes a version they can test and play with
- Figure out the interface for managing repositories
- Get XTeXT working as a Web service through the portal. Get the tools working through the portal (even if the repository is a test one).
- Get the managing interface integrated into the portal so we can create, edit, delete and add to repositories.
-- GeoffreyRockwell - 21 Sep 2005