TAPoR Text Analysis Tutorial

This page provides a tutorial on simple text analysis strategies using the TAPoR Portal.

What's the point?

TAPoR is a portal for:

  • Collecting texts - TAPoR lets you keep a library of references with links to original documents on the web or elsewhere
  • Analyzing texts - You can then pass these texts to tools that analyze the text and then store the results

Warnings: The portal is still being tested and there are certain features that do not work smoothly. For example, in many cases you will have to refresh the screen to see the effect of a change. Please report bugs or inconsistencies using the "Report a problem/ make a suggestion" link in the myHelp panel.

Logging-in to the TAPoR Portal

portalmain.gif

  • Investigate TAPoR: The portal home page has three main parts, and each part may contain more than one panel. The left part contains the panel with information about TAPoR and the electronic text panel. The right part contains the TAPoR news panel and others. The middle part is where you run the text analysis tools hosted by the TAPoR portal.
    • The Try It panel at the top center gives you a quick start on what the tools can do for you. Select a text (1), select a tool (2), and click the button (3) to see the result.
    • The TAPoR Tools panel allows you to browse, search, and run the tools by clicking the Try it link at the bottom of each tool.
    • The TAPoR Texts panel allows you to search and browse the public texts saved in the portal.
    • Before requesting an advanced account, you can investigate the portal further by signing up for a BASIC account: click the link "Sign Up for an Account Here" on the left part of the page or under the login box. Fill in the required fields and submit the form. Within half an hour you will receive a confirmation email, with the corresponding login information, at the email address you provided.

Sign Up for an Account Here

  • Set up an Advanced Account: Before you can log in to the TAPoR portal as an advanced user, an advanced account needs to be set up for you. To request an account, email the account administrator at: tapor(at)mcmaster(dot)ca. Within a few days you will receive an email from the account administrator indicating what your personal UserID and Password are.

Login

  • Log In: Enter your Username and Password into the correct fields in the Login area. Please note that this area is case sensitive. Next, click on "Login".

Navigation Menu Explained

header.gif

For most users, the Navigation Menu appears as shown above. On the left-hand side, three navigation buttons direct you to the three main function areas of the portal. The INTRO and HELP buttons are on the right-hand side. Clicking either of them opens a popup window; the contents of the popup change depending on which of the three navigation buttons on the left is selected.

  • myTexts: The main purpose of this page is to let you define and add to the portal the texts you want to analyze using the TAPoR tools. You can edit and delete texts as well.
  • Workbench: This page allows you to run tools on any text you select. If you are a tool developer, you can register your web service tools with the portal.
  • myPreferences: This page lets you set the look and feel of the portal interface, add or change your personal information, and subscribe to and publish news.

Details about each function or page will be described below.

Using News

Once you log in, the default page is myTexts. In the lower-right corner, you will see recent news items from the news channels that you have subscribed to. These could be news channels on the portal or news from blogs elsewhere.

To choose which channels you want to subscribe to, go to myPreferences and select from the list. These will then appear in the news area.

If you have an Advanced account you can create a channel and edit news.

Setting up the electronic texts that you want to analyze

Overview for Setting Up Texts

Once you are logged into the Portal, you are on the myTexts page and can begin setting up a text to analyze. Texts are not typically loaded into the portal; instead, you provide the information so that you have a full bibliographic reference and the portal can get the text when it needs to.

There are 3 main ways to add texts: (1) use a Quick Add link, (2) add your own text by clicking the "Add New" button in the myTexts Library panel, or (3) add TAPoR public texts from the TAPoR Texts panel to your Library. The "myTexts" Library panel shows you which texts you have available to analyze. If it is your first time using TAPoR, the scrollbar area will be empty.

There is a fourth way to add a text to your Library, and this method does not require you to be logged into the Portal beforehand. It involves clicking a Quick Add button while viewing another site on the Internet. This takes you to your TAPoR account so that you can log in and, once you are logged in, adds the site’s content as a text to your Library.

The following sections will guide you through the process of adding a text to your library using the various methods available:

Quick Add
Adding Your Own Text
Defining File Format, Tags and Visibility
Defining More Details
MyTexts Library
TAPoR Texts
DUCT TAPoR

Quick Add (Texts)

  • When you first log in, you will see a popup window that introduces the myTexts page. You can check the check box at the bottom to hide this window; to see it again, click the INTRO link on the right-hand side of the navigation menu. The "Introduction" popup houses the Quick Add option. Here you can click the link for any of the sample texts (e.g. Click here to add "MOVING UPTOWN: Nineteenth-century Views of Manhattan" to your texts), and it will automatically be added to your myTexts library.

Adding your own texts

mylibrary.gif

  • To add your own text, click the "Add New" button in the myTexts Library panel.
  • Add and Edit Texts Panel: This panel appears to the right of the myTexts Library panel after you click the "Add New" button. It allows you to choose a label, source, file format, and tag for each text, and will allow you to choose whether the text’s visibility is private or public.

Label for Text

  • Label for Text: Type a label for your text in the text box under "Label for text". The label you give a particular text is what appears in the "myTexts" panel, so choose a label that clearly identifies the text you are defining.

Text Source

  • Text Source: The "Text Source" area allows you to select where you will retrieve your text from. There are four options for accessing your text: "URL", "File Upload", "Text" and "Aggregate Text".

Text Source: URL Tab

  • "URL": Here, you type the full URL of the webpage you would like to analyze. For example, to study the text on the home page of the CBC News website, enter "http://www.cbc.ca" into the text field provided. NOTE: you must make sure to include the “http://” in your URL. Click on "VIEW" to see the webpage that you have specified to analyze. Ensure that this is the page you want to study; otherwise, correct the URL.
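The rule about including "http://" can be sketched as a small check. This is only an illustration of the idea, not the portal's own validation; the function name `is_full_url` is hypothetical, and accepting "https" alongside "http" is an assumption that the tutorial does not confirm.

```python
from urllib.parse import urlparse


def is_full_url(url: str) -> bool:
    """Return True if the URL names a scheme (http or https) and a host.

    Hypothetical helper: the tutorial only says that "http://" must be
    present; treating "https" as acceptable is an assumption.
    """
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_full_url("http://www.cbc.ca"))  # scheme and host are both present
print(is_full_url("www.cbc.ca"))         # no scheme, so this is not a full URL
```

A URL typed without the scheme parses as a bare path, which is why the second call fails the check.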

Text Source: File Upload Tab

  • "File Upload": If the text for analysis is located on your computer, you need to upload the file to TAPoR. Click on the "Browse..." button to open up the "File Upload" window. Find where your text file is located on your computer. Highlight the file, and click the "Open" button. NOTE: the text file must be either XML, HTML, TEI, plain text, MEP, DocBook, or TARL. For instance, a document saved as a Microsoft Word Document cannot be uploaded. If you attempt to add a text with a file format that is invalid for the TAPoR portal, an exclamation mark (!) will appear next to "File Upload". Clicking on the button labeled "..." will allow you to make a new file selection.

Text Source: Text Tab

  • "Text": You can type the text you would like to analyze directly into the text box available when you select the "Text" tab. Type the text, or copy and paste text from a document into the text box. There is no word limit.

aggregtext.gif

  • "Aggregate Text": To combine two or more separate texts into one text for analysis, select "Aggregate Text". This allows you to define as a single text a work that is spread out over many Web pages, for example a book where each chapter is a separate Web page. To combine Web pages, enter a full URL into the "URL" text field. NOTE: you must make sure to include the "http://" in your URL. To define the second text, click on "Add URL". A second field appears, within which another URL can be entered. To add a third text, click "Add URL" again, and continue this process until you have defined all the texts you wish to combine.

If you wish to include a text that is already within your "myTexts" library, select the text from the drop down menu within the "Text List" area. Click "Add Text’s URL". Notice that the text has been added to the list of text sources to be combined.

There are small 'up arrow' and 'down arrow' icons located next to each URL. Clicking the "up arrow" moves the text up one position, above the text that precedes it, while clicking the "down arrow" moves it down one position, below the text that follows it.

To delete a URL, select it by clicking the "Select" box next to the URL. You may select multiple URLs for deletion at one time. After making your selections, click "Remove Selected URL(s)".
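The way an aggregate text behaves (an ordered list of URLs that can be extended, reordered, and pruned) can be modelled in a few lines. The sketch below is an illustration only and says nothing about how the portal stores aggregate texts; the class `AggregateText`, its method names, and the example.org URLs are all hypothetical.

```python
class AggregateText:
    """Ordered list of source URLs that are combined into one text (hypothetical model)."""

    def __init__(self) -> None:
        self.urls: list[str] = []

    def add_url(self, url: str) -> None:
        # Mirrors "Add URL": each new URL goes to the end of the list.
        self.urls.append(url)

    def move_up(self, index: int) -> None:
        # Mirrors the "up arrow": swap with the entry above; the first entry stays put.
        if 0 < index < len(self.urls):
            self.urls[index - 1], self.urls[index] = self.urls[index], self.urls[index - 1]

    def move_down(self, index: int) -> None:
        # Mirrors the "down arrow": swap with the entry below; the last entry stays put.
        if 0 <= index < len(self.urls) - 1:
            self.urls[index + 1], self.urls[index] = self.urls[index], self.urls[index + 1]

    def remove_selected(self, selected: set[int]) -> None:
        # Mirrors "Remove Selected URL(s)": drop every checked position at once.
        self.urls = [u for i, u in enumerate(self.urls) if i not in selected]


book = AggregateText()
for chapter in ("ch1", "ch3", "ch2"):
    book.add_url(f"http://example.org/book/{chapter}.html")
book.move_up(2)            # chapter 2 moves above chapter 3
book.remove_selected({0})  # delete the first URL
print(book.urls)
```

Order matters here because the combined text is presumably read in list order, so a book's chapters should be arranged in sequence before analysis.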

Defining File Format, Tags, and Visibility

format.gif
Once you have chosen a text source, you can define the file format, tags and visibility.

  • File Format: This allows you to specify the way the text is encoded or to specify its format. For example, a Web page is going to be "HTML", but some scholarly texts on the Web will be in XML formats like "TEI" (Text Encoding Initiative) or "DocBook". This will help you match tools that can operate on specific formats.

  • Tags: Tags allow you to create sections/categories within your myTexts library. These are like keywords and you can have more than one. Select what tag you would like the text to appear under.

To access the Tag Manager, click "More Details". A button labeled "Open Tag Manager" will appear next to the current choice of tags. Click on the button to open the Tag Manager window.

To add a new tag, type in a new tag name in the text field labeled "Tag Name". Click "Add Tag". Click "Close". The new tag will now appear within the "Tags" dropdown menu in the central "Add and Edit Texts" section.

To edit an existing tag, select the tag from the drop down menu next to "Select a Tag to Edit/Delete" and click the "Edit" button. The tag's current name will appear in the area next to "Tag Name". Type in a new name for the tag and click "Update Tag". Finally, click "Close". The tag will now have a new name in your myTexts library. To delete an existing tag, select the tag from the drop down menu next to "Select a Tag to Edit/Delete" and click the "Delete" button. Click "Close". NOTE: You cannot delete tags that contain texts.

  • Visibility: There are two options for visibility: Private, or Public. Select Private for the text to only be accessible through your own TAPoR account, or select Public to make the text available to anyone with a TAPoR account. If you select Public you can share your texts with other people.

  • "Add Text" Button: Once you have completed the steps for adding a text, simply click on the "Add Text" button, and your text will be added to your "myTexts" section. NOTE: You must at least enter a label for the text, choose a text source, specify a tag, and indicate visibility in order to upload a text. The added fields that appear after clicking "More Details" are not mandatory.

Define More Details

You can define more details for the text you are adding to your myTexts library by clicking on the "More Details" button. This allows you to enter enough information for a bibliographic reference that can then be formatted for you in different ways.

  • Type: Specify the type of text by selecting Web Page, Journal Article, Book Chapter, Book, or Other.
  • Authors: Input the author’s last name and first name in the provided fields. To add more authors, click "Add Author" and repeat the process. To delete an author, click within the selection box and hit "Remove Author".
  • Enter date, title, publisher, issue, place of publication, volume, and pages in the provided fields.
  • Secondary Authors: Input the secondary author's last name and first name in the provided fields. To add more secondary authors, click "Add Author" and repeat the process. To delete an author, click within the selection box and hit "Remove Author".
  • Enter secondary title, language, creators, source, rights and description in the provided fields. If choosing "other" for language, the text field becomes active for inputting the other language.

myTexts Library

All of the texts that you add appear in your myTexts library (See the myTexts Library image above).

  • Show: The dropdown menu next to "Show" allows you to choose which texts within your library you would like to see. Selecting "All" will show all of the texts within your collection, regardless of which tag they are categorized under. You may select a specific tag, or category, to limit the list of texts to those within that specific tag.

  • Edit: Select a text within the library of texts. Click "Edit". The "Add and Edit Texts" panel will display the information that is available for that text. Edit any field, and hit "Update Text".

  • Delete: Select a text for deletion. Hit the "Delete" button. The text will be removed from your list of texts. NOTE: Deleted texts cannot be retrieved without going through the "Add Text" process again. Make sure you wish to delete a text before hitting "Delete".

  • View: Select a text, and hit the "View" button. Your text will appear in a new window.

analyze.gif

  • Analyze Text: Select a text in your text library list above and click this button; a new window opens, as shown above. The right panel shows the text you selected. The left panel shows the text's tag, the tools compatible with the text, and the list of all tools on the portal server. Select the tool you want, and its interface will be displayed in the tool area. You can then enter the required parameters and run the tool on the text.

TAPoR Texts

The TAPoR Texts panel shows all of the texts that are publicly available to all TAPoR account holders. Therefore, any text that you set to "public" visibility will appear in the TAPoR Texts panel. This allows you to add to your account texts that others have defined, using the add to favorites button, which looks like a "plus" and a "heart".

You can search for a public text three ways: according to the tag it is filed under, what file format it is, or you can view a listing of all the texts and sort them by label, author or date.

Once you have found a text you would like to add to your myTexts library, simply click on the "Add to myTexts" button (the button with an addition sign and a heart). The text will now appear in your myTexts library.

DUCT TAPoR

D.U.C.T. (Digitally Unified Collections of Texts), developed by the Electronic Text Centre (ETC), is an infrastructure that brings together TAPoR content and the functions to be performed against it, most notably the managing and searching of texts and associated interoperability issues. For more information on D.U.C.T. please visit: http://dev.hil.unb.ca/Texts/Engine/display.php?c=brief

The following steps describe how to search for texts within D.U.C.T. from within the TAPoR portal.

  1. Show Duct Search Coplet: Within the myHelp panel, click the link labeled "Show Duct Search Coplet". A panel labeled D.U.C.T Search appears under the Introduction panel.
  2. Keywords: Enter the keywords you wish to search for in the text field labeled "Keywords to include". Enter any keywords that you wish not to be present in the search results in the text field labelled "Keywords to Disallow". You may enter as many keywords as you wish, by simply separating each term with a space.
  3. Perform Search: Select either "All terms must be present", or "At least one term must be present", and then click "Perform Search". Once the results appear, click on any article link to be taken to its location within D.U.C.T. Click the "BACK" button to be taken back to the D.U.C.T. search options.
  4. Reset Form: Clicking "Reset Form" will clear the keywords you have requested to search/omit to allow you to type in a new set of keywords.
  5. Toggle Advanced Search: For more advanced search options, click the "Toggle Advanced Search" button. Enter keywords into any of the fields, and select either "must contain", or "must not contain". Next, hit the "Perform Search" button. Once the results appear, click on any article link to be taken to its location within D.U.C.T. Click the "BACK" button to be taken back to the D.U.C.T. search options.
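The keyword options in steps 2 and 3 amount to a simple filter, sketched below. How D.U.C.T. actually matches terms is not described in this tutorial, so the whole-word, case-insensitive comparison is an assumption, and the function `matches` and its parameters are hypothetical names used for illustration.

```python
def matches(text: str, include: str, disallow: str = "", require_all: bool = True) -> bool:
    """Decide whether a text satisfies a keyword search (hypothetical model).

    include / disallow are space-separated keyword strings, as typed into
    "Keywords to include" and "Keywords to Disallow".
    require_all=True models "All terms must be present";
    require_all=False models "At least one term must be present".
    """
    words = set(text.lower().split())
    wanted = include.lower().split()
    banned = disallow.lower().split()

    # Any disallowed keyword rules the text out immediately.
    if any(term in words for term in banned):
        return False
    # With nothing to look for, every remaining text qualifies.
    if not wanted:
        return True
    check = all if require_all else any
    return check(term in words for term in wanted)


sample = "the whale hunts the ship"
print(matches(sample, "whale harpoon"))                     # all terms required
print(matches(sample, "whale harpoon", require_all=False))  # one term is enough
print(matches(sample, "whale", disallow="ship"))            # disallowed term present
```

Checking the disallowed keywords first keeps the two modes symmetric: a banned term excludes a text regardless of which "terms must be present" option is chosen.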

wbench.gif

Overview for Using the Workbench

The Workbench is the area where you add tools to your tool library and put them to use on your texts. The tools and texts you have available to use are housed in your My Texts Library and My Tools Library. Select a text, select a tool to use on that text, and then hit the "Use Tool on Source Text" button. The Tool Broker will display the options available for the tool you have chosen to use. Once the required information is filled out, you can hit "Submit" to view the results.

Overview for Setting Up Tools

There are 2 main ways to add tools to your myTools Library: click Add Tool in the My Texts and Tools panel, or click the "Add to My Tools" button (addition and heart symbol) next to a tool in the new window that opens when you search and browse tools in the TAPoR Tools panel. The "myTools" Library panel shows you which tools you have added. If it is your first time using TAPoR, the scrollbar area will be empty.

Add Tools from TAPoR Tools Panel

All of the tools available on TAPoR can be found within the "TAPoR Tools" panel. There are 4 ways to search for the tools you would like to use: type, source format, new and popular, and all a-z. To add a tool to your MyTools Library, simply click on the "Add to My Tools" button (addition and heart symbol).

Each tool appears with a brief description of its function. For more information, press "Detailed Info". "Website" will open the TAPoRware website in a new window. To the right of every tool name is written HTML, PLAIN, and/or XML; this indicates what type of text the tool will work with.

To try a tool, click "Try It", or select the arrow icon. Both of these options lead to the "Tool Broker". A "Use Tool" window will appear, within which you will need to input the information necessary for the tool. Click "Submit" to see results. The results will appear in a new window if "Show results in a new window" was selected.

The following describes the 4 ways to search for tools:

  • 1) Type: Allows you to search for the tool you would like to use according to the type of text analysis it does.
    • Search: If you need a tool that runs searches on texts, select "Search". All of the tools that fall within the category of search tools are listed under "Search Tools". The search tools currently available on TAPoR are: Find Concordance (HTML, XML, or plain text), Find Co-occurrence (HTML, XML or plain text), and Find Collocation (HTML, XML or plain text).
    • Text Gathering: If you need a tool that gathers text, select "Text Gathering". The text gathering tools currently available on TAPoR are: Extract Text (XML), and Googlizer (HTML, XML, or plain text).
    • List and Statistical: If you need a tool that creates lists or gives statistical results, select "List and Statistical". The list and statistical tools currently available on TAPoR are: List Element (XML), List Words (HTML, XML, or plain), HyperPo, Linguistic Micro View, Summarizer (Plain text), Comparator (Plain text), and QMatrix.
    • Visualization: If you need a tool that performs visualization of texts, select "Visualization".
    • Editing: If you need a tool that performs editing functions on texts, select "Editing". The editing tools currently available on TAPoR are: XSL Transformer, TEI Transformer, JTidy Transformer, and Tokenizer (Plain text).
    • Research Support: If you need research support tools, select "Research Support".
    • Miscellaneous: If you need tools that do not fall under any of the previous categories, select "Miscellaneous". The miscellaneous tool currently available on TAPoR is: LiteMorph.

  • 2) Source Format: Allows you to search for the tool you would like to use according to the format of the source text.
    • HTML Tools: If you need a tool that works on an HTML source, select "HTML Tools".
    • XML Tools: If you need a tool that works on an XML source, select "XML Tools".
    • Plain Text Tools: If you need a tool that works on a plain text source, select "Plain Text Tools".

  • 3) New and Popular: Allows you to select tools to use according to whether they are new or popular.
    • Top 10 Tools: If you would like to use a tool that falls within the category of Top Ten Most Popular Tools, select "Top 10 Tools".
    • New Tools: If you would like to use a tool that is recent, select "New Tools".

  • 4) All A-Z: Allows you to search through an alphabetical listing of all the tools available.
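Browsing by source format is essentially a lookup from a format to the tools that accept it. The sketch below illustrates this with a few of the tools and formats listed above; the table `TOOL_FORMATS` and the function `tools_for_format` are hypothetical names, the table is deliberately incomplete, and none of this reflects how the portal stores its tool registry.

```python
# A partial table built from the tool descriptions in this tutorial.
# Formats are written as they appear next to tool names: HTML, PLAIN, XML.
TOOL_FORMATS = {
    "Find Concordance": {"HTML", "XML", "PLAIN"},
    "Extract Text": {"XML"},
    "List Element": {"XML"},
    "List Words": {"HTML", "XML", "PLAIN"},
    "Summarizer": {"PLAIN"},
    "Tokenizer": {"PLAIN"},
}


def tools_for_format(source_format: str) -> list[str]:
    """Return the names of tools that accept the given source format, A-Z."""
    wanted = source_format.upper()
    return sorted(name for name, formats in TOOL_FORMATS.items() if wanted in formats)


print(tools_for_format("XML"))
print(tools_for_format("html"))
```

The same lookup explains why only some tools can be used with a given text: a tool qualifies only if the text's format is in that tool's set of accepted formats.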

Add Tool Button

addtools.gif

Click the ADD Tool button; an "Add and Edit Tools" panel opens to the right.

  • See how a tool works: Select a tool in the list of available tools and click the USE Tool button. This lets you run the tool before adding it to your tool library.
  • Add tools: Select a tool, modify the tool label if you like, add a note for the tool, select the category (tag) you want the tool to belong to, and click the Add Tool button at the bottom. Do not forget to click the Continue button that appears at the top.
  • If you want to add/delete/edit tags, click the Open Tag Manager button.

myTools Library

All of the tools that are added appear in the myTools library. In addition to the ADD Tool button, the following buttons are available in the library.

  • Edit: Select a tool within the library of tools. Click "Edit". The "Add and Edit Tools" panel will display the information that is available for that tool. Editing the tool label changes the name of the tool within your myTools library. You can also edit the notes about the tool or the tag that the tool is categorized under. To apply changes, hit "Update Tool".

  • Delete: Select a tool for deletion. Hit the "Delete" button. The tool will be removed from your list of tools. NOTE: Deleted tools cannot be retrieved without going through the "Add" process again. Make sure you wish to delete a tool before hitting "Delete".

  • Use Tool on Source Text:

    • Selecting a text: In the "Text" area of the My Texts and Tools panel, select whichever text you would like to analyze. To view the text (to make sure that it is the text you intend to use, etc.), you may click on the "View Text" button. The text will appear in a new window.

    • Selecting a tool: Once a text has been selected in the "Text" section above, certain tools become highlighted in blue. The blue tools are those that can be used to analyze the text. The tools that are not highlighted cannot be used on the text chosen, due to the format of the text. Select which tool you would like to use on the chosen text.

    • Use Tool on Source Text: Once a text and a tool (that was highlighted in blue) have been chosen, click on the "Use Tool on Source Text" button. The Tool Broker in the center column will display further fields that need to be filled out.

Tool Broker

The Tool Broker displays the options available for the tool that was chosen. At the top of the Tool Broker, in bold text, is the name of the chosen tool, followed in brackets by the text format compatible with the tool; for example, List Words (HTML).

  • Source Text: The Source Text area allows you to define which text to analyze with the chosen tool.
    • MyTexts: If you have already chosen a tool from the My Texts and Tools panel, the "myTexts" radio button will be selected, and the text you chose will be displayed in the dropdown menu beneath. If you would like to change to another text that is in your myTexts Library, simply drop down the menu, and select a different text. Note that only texts that are compatible with the chosen tool will appear in the dropdown menu.
    • Data Bench: Select the radio button next to "Data Bench". Select a data set stored in your Data Bench to analyze it with the chosen tool.
    • URL: Select the radio button next to "URL". Type in the URL of the web page you would like to analyze. Note: you must enter the full URL including "http://".
    • Upload Text: Select the radio button next to "Upload Text". Click on "Browse" and find the document you would like to analyze with the selected tool. Ensure that the document is in the format that corresponds with the tool - for instance a .html page for a tool that analyzes HTML. Note: Uploading a text within the Workbench will not store it in your MyTexts Library for future use. If you would like the text to reside in your MyTexts Library, upload it via the MyTexts area of the portal (See Section 4 of the tutorial).
    • Text: Select the radio button next to "Text". Type, or copy and paste, whatever text you would like analyzed into the text field.

  • Parameters: In the Parameters area, you must fill in all the required fields (indicated by an asterisk '*'). Then click the "Submit" button. If the "Show results in new window" box was checked, your results will appear in a new window.

  • Results: When your results appear, you can choose whether to save the results to your Data Bench, or to save them to your computer.
    • Save Results to Data Bench: The Data Bench is where you can store the results of your projects within your own myTAPoR account. To store the results, simply click on "Save Results to Data Bench". Next, the "Save Results" window will appear. Here, you will enter a label for the results, and any additional notes. Click "Save Results". Your results will now be saved in the Data Bench with the label you provided.
    • Save to My Computer: If you would like the results of your analysis to be saved onto your computer, click on "Save to My Computer". Note: the results will be saved as an XML document.

Data Bench

Databench

The Data Bench is where your saved results are stored. All of the stored results are listed in the dropdown menu. Each saved result is listed with a reference number (Ref#), label and tool. The reference number is assigned according to the order in which the results were saved. The Label is whatever you assigned the label to be after clicking "Save Results to Data Bench". The tool that is listed next to the label indicates which tool was used to generate the saved result.

  • ViewDetails: Select a data set from the dropdown list. Hit "ViewDetails". A new window will appear with the details of that result set, including:
    • Result label: the label you assigned the result
    • Tool Used: the tool that was used to generate the result
    • Your Notes: the notes you added with the results
    • Results generated on: the date and time the results were generated
    • the arguments you used to produce the results.

  • ViewResult: Select a data set from the dropdown list. Hit "ViewResult". A new window will appear displaying the results.

  • ViewResultCode: Select a data set from the dropdown list. Hit "ViewResultCode". A new window will appear displaying the HTML code of the results page.

  • Save to My Texts: Select a data set from the dropdown list. Hit "Save to My Texts". The results will be saved to your MyTexts Library, and will appear according to the label you assigned it.

  • Save to My Computer: Select a data set from the dropdown list. Hit "Save to My Computer". The results will be saved to your computer as an XML document.
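The information the Data Bench keeps for each saved result can be pictured as a small record. The sketch below is a hypothetical model built from the fields described above, not the portal's storage format; the names `SavedResult` and `DataBench` are invented for illustration, and starting reference numbers at 1 is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SavedResult:
    ref: int                  # Ref#, assigned in the order results are saved
    label: str                # the label you gave the result
    tool: str                 # the tool that generated the result
    notes: str = ""           # your notes
    arguments: dict = field(default_factory=dict)  # arguments used to produce the result
    generated_on: datetime = field(default_factory=datetime.now)


class DataBench:
    """Hypothetical in-memory stand-in for the Data Bench."""

    def __init__(self) -> None:
        self.results: list[SavedResult] = []
        self._next_ref = 1    # assumption: numbering starts at 1

    def save(self, label: str, tool: str, notes: str = "", arguments=None) -> SavedResult:
        result = SavedResult(self._next_ref, label, tool, notes, dict(arguments or {}))
        self._next_ref += 1
        self.results.append(result)
        return result

    def listing(self) -> list[str]:
        # One line per saved result: reference number, label, and tool.
        return [f"{r.ref} {r.label} ({r.tool})" for r in self.results]


bench = DataBench()
bench.save("Word list", "List Words")
bench.save("Concordance", "Find Concordance", notes="first pass")
print(bench.listing())
```

Keeping the tool name and arguments with each result is what makes "ViewDetails" useful: a saved result can be traced back to the settings that produced it.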

What next?

Now that you are familiar with TAPoR, the next step is to try using it to accomplish some real-world text analysis tasks. We have developed a series of recipes designed for this purpose. Three simple recipes make a great introduction to the portal.

-- ShawnDay - 3 November 2006

