Text Analysis Experiment, November 14, 2007

N.B.: We ran out of time for this experiment but we're keeping the text as there were some valuable insights.

Experiment run on November 14th by Geoffrey Rockwell and Stefan Sinclair on recent (US) Presidential Debates.

Planning

  • iterative refinement of corpus and analysis
    • gathering corpus
    • first tools
    • clean up of corpus
    • more tools
    • enrichment of corpus
    • write-up
  • developing a voice and style for academic text analysis

Theoretical Reflections

  • an important step in working with a corpus is developing a sense of what relevant text units are
  • alternate practices (or sometimes phases) for text analysis
    • figuring out what you can figure out with text analysis (questions about tools)
    • exploratory text analysis (formation of questions or hypotheses about corpora)
    • directed text analysis (answering questions or testing hypotheses)
  • the rigour and consistency needed in documenting the process depend on which practice (from above) you're engaged in

Hypotheses

  1. text analysis is iterative (things change as you're going along)
  2. 5 phases to text analysis
    1. gathering
    2. cleaning and enrichment
    3. analyein (unloose, break apart)
    4. data synthesis
    5. conceptual synthesis
  3. reporting text analysis involves 2 necessary components
    1. questions asked and interpretations made
    2. procedure for repeating the process
  4. academic text analysis needn't necessarily engage with secondary literature
    • academic text analysis can be a self-sufficient step
    • academic text analysis can also sometimes lead to a subsequent step of integrating with secondary scholarship
  5. all analysis is comparison
  6. inter-chunk (text unit) comparison is most fruitful for interpretation
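Hypotheses 5 and 6 can be illustrated with a minimal sketch of inter-chunk comparison. It ranks words by the difference in their relative frequency between two text units. The tokenizer, the function names and the ranking measure are assumptions made for illustration; this is not how the TAPoRware Comparator works.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text):
    """Map each word to its share of all tokens in the text."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def compare_chunks(chunk_a, chunk_b, top=10):
    """Rank words by how much their relative frequency differs.

    A positive score means the word is more prominent in chunk_a,
    a negative score means it is more prominent in chunk_b.
    """
    freq_a = relative_frequencies(chunk_a)
    freq_b = relative_frequencies(chunk_b)
    words = set(freq_a) | set(freq_b)
    diffs = {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0) for w in words}
    # Largest absolute difference first; ties broken alphabetically.
    ranked = sorted(diffs.items(), key=lambda item: (-abs(item[1]), item[0]))
    return ranked[:top]
```

Passing the text of one debate as `chunk_a` and another as `chunk_b` gives a first list of words worth reading in context. Relative rather than raw frequencies are used so that chunks of different lengths can be compared.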

Steps

Gather Corpus

Done within TAPoR Portal

Prepare (Clean and Enrich) Texts

We tended to extract the plain text from web pages and save a plain-text version of each. We used tags to manage the variant texts. We also saved a combined plain-text file.

  • In theory we could start adding light XML to be able to distinguish the Republican and Democratic debates.
  • To make the text useful for tools outside the portal (TAPoRware Compare, for example) we made them public.
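The preparation step above can be sketched outside the portal as follows. This is a rough illustration using only the Python standard library, not the procedure we followed in TAPoR. The element names `corpus` and `debate` and the `party` attribute are assumptions chosen for the example.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class PlainTextExtractor(HTMLParser):
    """Collect visible text, skipping script and style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_plain(html):
    """Return the visible text of an HTML page, one text run per line."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def combine_with_light_xml(texts_by_party):
    """Wrap each plain text in a <debate> element labelled by party.

    The element and attribute names are hypothetical.
    """
    chunks = []
    for party, text in texts_by_party.items():
        chunks.append(
            f"<debate party={quoteattr(party)}>\n{escape(text)}\n</debate>"
        )
    return "<corpus>\n" + "\n".join(chunks) + "\n</corpus>"
```

Keeping the per-debate plain texts alongside the combined file means a tool can be pointed at either a single debate or the whole corpus.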

Analysis

  • One thing that is useful is to gather tools you want to use in the Workbench.

Notes

  • documenting sources
  • look for multiple sources
  • web-based versions are often best when available as a single-page print edition

first tools

clean up of corpus

more tools

enrichment of corpus

write-up

Results

Interface & Tool Issues

  • revisit taporize
    • can we have widgets resize fluidly
    • can we have more metadata populated
    • should we drop the skin or use a simpler skin
  • Make Comparator a Web Service TAPoR tool
  • Comparator needs to produce decent HTML

Wish List

  • tools need to be able to compare to control corpora
  • interface for drag-n-drop organizer from lists
  • repository that supports
    • hierarchy of texts
    • alternate text sources
    • track context of source
    • store content of URL and subset of content (example: select text on web page, "add it", which takes URL with full content but also marks sub-content)
  • a fill-it tool that populates TAPoR Text bibliographic entries based on metadata (as available)
  • TAPoR search should support finding all occurrences of a word (not just single matches)
  • If TAPoRware had a "get" method then we could paste a URL that ran the tool (with parameters) right in blogs, portal Research Logs and elsewhere.
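The last wish above could look something like the sketch below, in which a tool run is encoded entirely in a URL that can be pasted anywhere. TAPoRware has no such "get" method at the time of writing, so the endpoint and the parameter names (`text_url`, `sort`) are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; no such TAPoRware URL is known to exist.
BASE_URL = "https://example.org/taporware/listwords"


def tool_url(base_url, **params):
    """Build a shareable URL that carries the tool parameters."""
    return base_url + "?" + urlencode(params)


def read_params(url):
    """Recover the parameters a tool would see from such a URL."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


url = tool_url(
    BASE_URL,
    text_url="http://example.org/debate.txt",  # hypothetical public text
    sort="frequency",                          # hypothetical tool option
)
print(url)
```

Because the text itself is referenced by URL, this only works for texts that have been made public, as we did for TAPoRware Compare.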

What Next

  • Watch the videos
  • Move over into doing critical summarization
  • Explore some more

-- StefanSinclair - 14 Nov 2007

