Compendium Code Archiving

The Globalization Compendium currently resides on a server running Slackware Linux at McMaster. The domain http://www.globalautonomy.ca points to this server, tapor1.mcmaster.ca, whose operating environment is detailed here. The Compendium's user-facing process serves information stored as TEI-encoded XML as well as data stored in a MySQL database. See the Content section for more on each document's life before it reaches the server, when it is turned into TEI-encoded XML by staff working for the project editor. This document describes the process by which material is received from the editor and transformed into data for use by the Compendium.

Adding Documents to the Compendium

The way in which the data received is processed depends on the type of document being received.
  • If it is a research paper, position paper, south-north dialogue item or research summary ...
    1. The Editor validates the XML code via the editor tool.
    2. The Compendium editor uploads a TEI encoded text file to the Compendium server via the editor tool.
    3. The text file is run through the TAPoR Extract Text tool to create an untagged text file.
    4. The untagged text file is used with Xindice to add document terms to the master index.
    5. The TEI-encoded text file is placed in a directory for nightly indexing via Lucene.
    6. The TAPoRWare Tagger is used to find glossary items in a document and to tag them with references to the glossary item.
    7. Lucene walks through the directory document by document, extracting title, author, category, date and other metadata to add to the index.
    8. The original document is stored in the ori* directory and the transformed XML is stored in a directory based on its entry type.
  • If it is a glossary entry ...
    1. The Compendium editor uploads a TEI encoded text file to the Compendium server.
    2. The text file is run through the TAPoR Extract Text tool to create an untagged text file.
    3. The untagged text file is used with Xindice to add document terms to the master index.
    4. The TAPoRWare Tagger is used to find glossary items in a document and to tag them with references to the glossary item, in this specific case excluding the glossary term of the document itself.
    5. The glossary item is merged into the master glossary document.
  • If it is a bibliographic entry...
    1. The editor verifies that the entry has not been previously entered.
    2. Bibliographic entries are made directly into the MySQL database by the Compendium editor.
    3. A flatfile export of the MySQL bibliography database is created and called bibl_data.txt.
  • Images and Figures
    1. These are managed manually by the editor working with the system administrator.
    2. They are coded directly into the XML file and stored in a figures directory for access by the Compendium when serving an entry.
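The text-extraction and glossary-tagging steps, which papers and glossary entries share, can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the TAPoR Extract Text tool or the TAPoRWare Tagger: the class and method names are hypothetical, markup is stripped with a regular expression rather than a real XML parser, the tagger is applied to plain text for brevity, and glossary hits are wrapped in a TEI `<ref target="...">` element with invented targets. The `selfTerm` parameter mirrors the rule that a glossary entry should not link to its own term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the TAPoR Extract Text tool and the TAPoRWare
// Tagger; neither tool's real interface is reproduced here.
class GlossaryPipelineSketch {

    // Any XML tag: start, end or empty-element.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Strips markup and collapses whitespace. Entities, comments and CDATA are not handled. */
    static String extractText(String teiXml) {
        String noTags = TAG.matcher(teiXml).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    /**
     * Wraps whole-word, case-insensitive glossary matches in a ref element.
     * selfTerm (may be null) is skipped so a glossary entry does not link to itself.
     */
    static String tagGlossaryTerms(String text, Map<String, String> glossary, String selfTerm) {
        // Lower-cased term -> link target, minus the entry's own term.
        Map<String, String> targets = new HashMap<>();
        for (Map.Entry<String, String> e : glossary.entrySet()) {
            if (!e.getKey().equalsIgnoreCase(selfTerm)) {
                targets.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
            }
        }
        if (targets.isEmpty()) {
            return text;
        }
        // Longest terms first so "global governance" wins over "global".
        List<String> terms = new ArrayList<>(targets.keySet());
        terms.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternatives = new StringBuilder();
        for (String term : terms) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(term));
        }
        // A single pass over the text, so inserted tags are never re-scanned.
        Pattern pattern = Pattern.compile("\\b(?:" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = targets.get(m.group().toLowerCase(Locale.ROOT));
            String tagged = "<ref target=\"" + target + "\">" + m.group() + "</ref>";
            m.appendReplacement(out, Matcher.quoteReplacement(tagged));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Matching all terms in one pass, longest first, keeps a short term such as "global" from splitting a longer one such as "global governance", and stops the tagger from matching text inside tags it has just inserted.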
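Step 7 of the paper workflow, in which Lucene collects metadata from each document, depends on reading fields out of the TEI header. The sketch below shows only that extraction, using the JDK's DOM parser; handing the fields to Lucene is not shown. It assumes the title, author and date are the first `title`, `author` and `date` elements under `teiHeader` (as in a typical `titleStmt`/`publicationStmt` layout), and it leaves out category because this page does not say how categories are encoded. The class name and field keys are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch: pulls index fields from a TEI header with the JDK's
// DOM parser. Field names and element choices are assumptions.
class TeiMetadataSketch {

    /** Text of the first descendant element with the given name, or "" if absent. */
    private static String firstText(Element parent, String tagName) {
        NodeList nodes = parent.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    /** Returns title, author and date from the first teiHeader, in that order. */
    static Map<String, String> extract(String teiXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(teiXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse TEI document", e);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList headers = doc.getElementsByTagName("teiHeader");
        if (headers.getLength() == 0) {
            return fields; // no header: nothing to index
        }
        Element header = (Element) headers.item(0);
        fields.put("title", firstText(header, "title"));
        fields.put("author", firstText(header, "author"));
        fields.put("date", firstText(header, "date"));
        return fields;
    }
}
```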

The code base for the Compendium exists as a collection of DTDs, XSL and Java routines that are bound to a series of frameworks for use with the Tomcat Servlet container.

    1. An archive of the complete Tomcat directory is stored here.
    2. An explanation of the individual files and directories comprising the Compendium is available on this page.

Output

When a user requests a document from the Compendium...
  1. The server consults the index and retrieves the XML file from the appropriate directory.
  2. An XSL Transformation is applied to retrieve the appropriate title and glossary entries, and the result is output as a fully formed HTML document.
  3. Three options are added to the top of each entry page: Print (displays a light HTML version), PDF (processed using FOP) and XML (displays the source XML of the document). These options are triggered via the appropriate .jsp file depending on the entry type.
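The XSL step can be illustrated with the XSLT 1.0 processor built into the JDK (`javax.xml.transform`). The stylesheet here is a toy stand-in for the Compendium's own XSL, not a copy of it: it assumes un-namespaced TEI element names, turns the `titleStmt` title into a heading, and renders glossary `<ref>` elements as hyperlinks. The class and constant names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of the output step using the JDK's built-in XSLT processor.
class EntryRendererSketch {

    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      // Page skeleton: the TEI title becomes the heading, body paragraphs follow.
      + "<xsl:template match=\"/\"><html><body>"
      + "<h1><xsl:value-of select=\"//titleStmt/title\"/></h1>"
      + "<xsl:apply-templates select=\"//body/p\"/>"
      + "</body></html></xsl:template>"
      + "<xsl:template match=\"p\"><p><xsl:apply-templates/></p></xsl:template>"
      // Glossary references added by the tagger become hyperlinks.
      + "<xsl:template match=\"ref\"><a href=\"{@target}\"><xsl:apply-templates/></a></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the given XSLT stylesheet to a TEI document and returns the HTML output. */
    static String toHtml(String teiXml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(teiXml)), new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

In a servlet, the stylesheet would normally be compiled once with `TransformerFactory.newTemplates(...)` and a fresh `Transformer` obtained from the shared `Templates` object for each request, since `Templates` is thread-safe and `Transformer` is not.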

Best Archiving Practice

Documentation


[Figure: UML diagram]

Best practice for code documentation is for programmers to document their code line by line. Ideally, this inline documentation is written so that a documentation generator such as Javadoc can aggregate it into an HTML or XML document to accompany the code proper. Additionally, UML should be provided to abstract the Compendium as an object model, with the individual objects in the UML linked to the actual code as developed. This makes the implicit logic of the code easier to share and allows the code base to be carried between alternate development environments. The technical specifications of the system should be compiled into a working document. Ideally this will also contain developer commentary justifying tool choices, along with current version references and, where possible, links to tool source code.
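As an illustration of this kind of inline documentation, here is a hypothetical Java helper, written with Javadoc comments, that resolves the directory in which a transformed entry is stored according to its entry type. The enum constants follow the entry types named earlier on this page, but the class name and directory names are placeholders, not the Compendium's actual layout. Because the class is package-private, it would be picked up by `javadoc -package` rather than by the tool's default setting, which documents only public and protected members.

```java
import java.nio.file.Path;

/**
 * Resolves where a transformed Compendium entry is stored.
 *
 * <p>Hypothetical example: the directory names below are placeholders,
 * not the Compendium's actual layout.</p>
 */
final class EntryStore {

    /** Entry types named in the Compendium workflow. */
    enum EntryType {
        RESEARCH_PAPER("papers"),
        POSITION_PAPER("position"),
        DIALOGUE("dialogue"),
        RESEARCH_SUMMARY("summaries"),
        GLOSSARY("glossary");

        private final String directory;

        EntryType(String directory) {
            this.directory = directory;
        }

        /** @return the sub-directory that holds entries of this type */
        String directory() {
            return directory;
        }
    }

    private final Path root;

    /**
     * @param root base directory under which each entry type has its own sub-directory
     */
    EntryStore(Path root) {
        this.root = root;
    }

    /**
     * Returns the path at which a transformed entry should be stored.
     *
     * @param type     the entry type, which selects the sub-directory
     * @param fileName the entry's file name, e.g. {@code "sovereignty.xml"}
     * @return {@code root/<type directory>/fileName}
     * @throws IllegalArgumentException if {@code fileName} contains a path separator
     */
    Path pathFor(EntryType type, String fileName) {
        if (fileName.contains("/") || fileName.contains("\\")) {
            throw new IllegalArgumentException("file name must not contain a path separator: " + fileName);
        }
        return root.resolve(type.directory()).resolve(fileName);
    }
}
```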

Printing

The code developed for serving the Compendium should be printed on archival-grade paper and filed along with the content locally at the McMaster library as well as at the National Library of Canada. The UML diagram and linked documentation of the system model should accompany this printed material.

Digital Storage

The source code itself will be archived as a tarball of the working directory as well as a printout of the functions themselves. The bibliographic and contributor database is currently held in MySQL and should be exported as a query file to allow for its replication. It should also be exported as a text flat file that would allow for its import into alternative data sources.
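A text flat-file export of bibliography rows (such as bibl_data.txt) could look like the sketch below. It is a minimal illustration, assuming tab-separated fields with one header line; the class name and column names are hypothetical. Field escaping follows MySQL's default `LOAD DATA INFILE` conventions (tab-separated, backslash escapes, `\N` for NULL), so such a file could in principle be reloaded into MySQL with `IGNORE 1 LINES` to skip the header, while remaining easy to import elsewhere. The query-file export itself would normally come from `mysqldump` and is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a tab-separated flat-file export of bibliography rows.
// Column names are illustrative only.
class FlatFileExportSketch {

    /** Escapes a field using MySQL's default LOAD DATA conventions; null becomes \N. */
    static String escape(String field) {
        if (field == null) {
            return "\\N";
        }
        StringBuilder sb = new StringBuilder(field.length());
        for (char c : field.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\0': sb.append("\\0"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** One header line followed by one line per row, fields separated by tabs. */
    static String toTsv(List<String> header, List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        appendLine(out, header);
        for (List<String> row : rows) {
            appendLine(out, row);
        }
        return out.toString();
    }

    private static void appendLine(StringBuilder out, List<String> fields) {
        List<String> escaped = new ArrayList<>(fields.size());
        for (String f : fields) {
            escaped.add(escape(f));
        }
        out.append(String.join("\t", escaped)).append('\n');
    }
}
```

Escaping embedded tabs and newlines keeps every record on exactly one line, which is what makes the file safe to split naively in other tools.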

To Do

  • List of code files, what they do, what they interact with
  • File structure of code and content
  • Technical description with specifications and dependencies

Appendix

-- ShawnDay - 6 June 2008

