Main.PaperCode (r1.1 vs. r1.8)
Diffs

 <<O>>  Difference Topic PaperCode (r1.8 - 06 Jun 2008 - ShawnDay)

META TOPICPARENT ProblemOverview

Compendium Code Archiving

Changed:
<
<
The Globalization Compendium currently resides on a server running Linux at McMaster. The operating environment of the server is detailed here. It receives TEI encoded articles from the Compendium editor. See Content section for more info on the document's pre-server life. This document describes the nature of the code itself and best practises for archiving it.
>
>
The Globalization Compendium currently resides on a server running Slackware Linux at McMaster. This server is pointed at from http://www.globalautonomy.ca. The Compendium is running on tapor1.mcmaster.ca. The operating environment of this server is detailed here. The Compendium's user-facing process provides information stored as TEI-encoded XML and data stored in a MySQL database. See the Content section for more information on a document's pre-server life, during which it is turned into a TEI-encoded XML document by staff working for the project editor. This document describes the process by which material is received from the editor and transformed into data for use by the Compendium.

Adding Documents to the Compendium

The way in which the data received is processed depends on the type of document being received.
Changed:
<
<
  • If it is a research paper, position paper, or research summary ...
    1. The Compendium editor uploads a TEI encoded text file to the Compendium server.
    2. The text file is run through the TAPoR Extract Text tool to create an untagged text file.
    3. The untagged text file is used with XIndex to add document terms to the master index.
    4. The TEI-encoded text file is placed in a directory for nightly indexing via Lucene.
    5. The TAPoRWare Tagger is used to find glossary items in a document and to tag them with references to the glossary item.
    6. Lucene walks through the directory document by document extracting title, author, category, date and other meta data to add to the index.
>
>
  • If it is a research paper, position paper, south-north dialogue item or research summary ...
    1. The Editor validates the XML code via the editor tool.
    2. The Compendium editor uploads a TEI encoded text file to the Compendium server via the editor tool.
    3. The text file is run through the TAPoR Extract Text tool to create an untagged text file.
    4. The untagged text file is used with XIndice to add document terms to the master index.
    5. The TEI-encoded text file is placed in a directory for nightly indexing via Lucene.
    6. The TAPoRWare Tagger is used to find glossary items in a document and to tag them with references to the glossary item.
    7. Lucene walks through the directory document by document extracting title, author, category, date and other meta data to add to the index.
    8. The original document is stored in the ori* directory and the transformed XML is stored in a directory based on its entry type.

  • If it is a glossary entry,..
    1. The Compendium editor uploads a TEI encoded textfile to the Compendium server.
    2. The textfile is run through the TAPoR Extract Text tool to create an untagged textfile.
Changed:
<
<
    1. The untagged text file is used with XIndex to add document terms to the master index.
>
>
    1. The untagged text file is used with XIndice to add document terms to the master index.

    1. The TAPoRWare Tagger is used to find glossary items in a document and to tag them with references to the glossary item, in this specific case excluding the glossary term of the document itself.
    2. The Glossary item is merged into the master glossary document.
  • If it is a bibliographic entry...
Changed:
<
<
    1. Bibliographic entries are made directly into the MySQL database by the Compendium editor
    2. A flatfile export of the MySQL bibliography database is created and called bibl_data.txt.
>
>
    1. The editor verifies that the entry has not been previously entered.
    2. Bibliographic entries are made directly into the MySQL database by the Compendium editor.
    3. A flatfile export of the MySQL bibliography database is created and called bibl_data.txt.

  • Images and Figures
Changed:
<
<
These are managed manually by the editor working with the system administrator.
>
>
    1. These are managed manually by the editor working with the system administrator.
    2. They are coded directly into the XML file and are stored in a figures directory for access by the Compendium when serving an entry.
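
The text-extraction, metadata and glossary-tagging steps listed above can be sketched in a few lines. This is a minimal illustration only, not the TAPoR Extract Text tool, the TAPoRWare Tagger or the Lucene indexer themselves. The sample TEI, the element paths, the `<ref target="...">` markup and the function names `extract_text`, `extract_metadata` and `tag_glossary_terms` are all assumptions made for the example; the real Compendium files and tools may differ.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical TEI sample: element names follow common TEI usage, but the
# actual Compendium documents may be structured differently.
SAMPLE_TEI = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Global Trade</title>
        <author>A. Writer</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Trade links every <hi>market</hi> to the state.</p>
    </body>
  </text>
</TEI.2>"""


def extract_text(tei_source):
    """Return the untagged, whitespace-normalised content of <text>."""
    root = ET.fromstring(tei_source)
    body = root.find("text")
    if body is None:
        return ""
    return " ".join("".join(body.itertext()).split())


def extract_metadata(tei_source):
    """Collect title and author from the TEI header (assumed location)."""
    root = ET.fromstring(tei_source)
    stmt = root.find("teiHeader/fileDesc/titleStmt")
    fields = {}
    if stmt is not None:
        for name in ("title", "author"):
            node = stmt.find(name)
            if node is not None and node.text:
                fields[name] = node.text.strip()
    return fields


def tag_glossary_terms(text, glossary, own_term=None):
    """Wrap glossary terms in a reference to their glossary item.

    own_term is skipped, mirroring the rule that a glossary entry is not
    tagged with a reference to itself. The <ref> markup is an assumption.
    """
    terms = {
        term.lower(): target
        for term, target in glossary.items()
        if own_term is None or term.lower() != own_term.lower()
    }
    if not terms:
        return text
    # One pass over the text, longest terms first, so that inserted tags
    # are never re-scanned and tagged a second time.
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def replace(match):
        target = terms[match.group(0).lower()]
        return '<ref target="{}">{}</ref>'.format(target, match.group(0))

    return pattern.sub(replace, text)
```

For a glossary entry, passing the entry's own headword as `own_term` leaves that word untagged while other glossary terms in the same text are still linked.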

The code base for the Compendium exists as a collection of DTDs, XSL and Java routines that are bound to a series of frameworks for use with the Tomcat Servlet container.

Added:
>
>
    1. An archive of the complete tomcat directory is stored here.
    2. An explanation of the individual files and directories comprising the Compendium is available on this page.

Output

When a user requests a document from the Compendium...
Changed:
<
<
  1. The server consults the index and retrieves the XML file from the directory,
  2. An XSL Transformation is applied to retrieve appropriate titles, glossary entries and this is output as an HTML document
  3. To the top of each HTML page is added three options: Print (display light HTML version), PDF (processed using FOP) and XML (which displays the source XML of the document)
>
>
  1. The server consults the index and retrieves the XML file from the appropriate directory.
  2. An XSL Transformation is applied to retrieve the appropriate titles and glossary entries, and the result is output as a fully formed HTML document.
  3. Three options are added to the top of each entry page: Print (displays a light HTML version), PDF (processed using FOP) and XML (displays the source XML of the document). These options are triggered via the appropriate .jsp file depending on the entry type.

Best Archiving Practise

Documentation

Line: 44 to 51

Printing

The actual code developed for serving this should be committed to archival grade paper and filed along with the content locally at McMaster library as well as the National Library of Canada. Additionally the UML diagram and linked documentation of the system model should accompany this printed material.

Digital Storage

Changed:
<
<
The source code itself will be committed... The Bibilographic database is currently contained in a MySQL? database and should be exported as a query file to allow for its replication. It should also be exported as an XML flatfile that would allow for its import into alternative data sources.
>
>
The source code itself will be committed via a tarball of the working directory as well as a printout of the functions themselves. The bibliographic and contributor data is currently held in a MySQL database and should be exported as a query file to allow for its replication. It should also be exported as a text flatfile that would allow for its import into alternative data sources.
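
As a rough sketch of the digital storage steps above, the following shows one way to write a working directory to a tarball and to write database rows to a tab-delimited flatfile. The function names `archive_directory` and `export_flatfile`, the checksum step and the tab delimiter are assumptions added for illustration; the rows are assumed to have already been read from MySQL, and the SQL query-file export is not covered here.

```python
import csv
import hashlib
import tarfile
from pathlib import Path


def archive_directory(source_dir, archive_path):
    """Write source_dir to a gzip-compressed tarball; return its SHA-256.

    The checksum is an added convenience so that a stored copy can later
    be verified against the original archive.
    """
    source = Path(source_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps paths in the archive relative to the directory name.
        tar.add(str(source), arcname=source.name)
    return hashlib.sha256(Path(archive_path).read_bytes()).hexdigest()


def export_flatfile(rows, fieldnames, out_path):
    """Write rows (a list of dicts) to a tab-delimited text file."""
    with open(str(out_path), "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Recording the returned checksum alongside the printed material would give a simple way to confirm that an archived copy is intact.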

To Do

Changed:
<
<
  • List of code files, what they do, what they interact with
  • File structure of code and content
  • Technical description with specifications and dependencies
>
>
  • List of code files, what they do, what they interact with
  • File structure of code and content
  • Technical description with specifications and dependencies?

Appendix

Changed:
<
<
-- ShawnDay - 19 May 2007
>
>
-- ShawnDay - 6 June 2008

META FILEATTACHMENT 800px-Uml_diagram.svg.png attr="" comment="" date="1179611418" path="800px-Uml_diagram.svg.png" size="36234" user="ShawnDay" version="1.1"

 <<O>>  Difference Topic PaperCode (r1.7 - 04 Jun 2008 - ShawnDay)

META TOPICPARENT ProblemOverview

Compendium Code Archiving

Changed:
<
<
The Globalization Compendium currently resides on a server running Linux at McMaster. It receives TEI encoded articles from the Compendium editor. See Content section for more info on the document's pre-server life. This document describes the nature of the code itself and best practises for archiving it.
>
>
The Globalization Compendium currently resides on a server running Linux at McMaster. The operating environment of the server is detailed here. It receives TEI encoded articles from the Compendium editor. See Content section for more info on the document's pre-server life. This document describes the nature of the code itself and best practises for archiving it.

Adding Documents to the Compendium

The way in which the data received is processed depends on the type of document being received.

 <<O>>  Difference Topic PaperCode (r1.6 - 14 Aug 2007 - GeoffreyRockwell)

META TOPICPARENT ProblemOverview

Compendium Code Archiving

Line: 47 to 47

The source code itself will be committed... The Bibilographic database is currently contained in a MySQL? database and should be exported as a query file to allow for its replication. It should also be exported as an XML flatfile that would allow for its import into alternative data sources.
Added:
>
>

To Do

  • List of code files, what they do, what they interact with
  • File structure of code and content
  • Technical description with specifications and dependencies

Appendix


 <<O>>  Difference Topic PaperCode (r1.5 - 24 May 2007 - GeoffreyRockwell)

META TOPICPARENT ProblemOverview
Added:
>
>

Compendium Code Archiving


The Globalization Compendium currently resides on a server running Linux at McMaster. It receives TEI encoded articles from the Compendium editor. See Content section for more info on the document's pre-server life. This document describes the nature of the code itself and best practises for archiving it.

Line: 12 to 14

    1. The untagged text file is used with XIndex to add document terms to the master index.
    2. The TEI-encoded text file is placed in a directory for nightly indexing via Lucene.
    3. The TAPoRWare Tagger is used to find glossary items in a document and to tag them with references to the glossary item.
Changed:
<
<
    1. Lucene walks the directory document by document extracting title, author, category, date and other meta data to add to the index.
>
>
    1. Lucene walks through the directory document by document extracting title, author, category, date and other meta data to add to the index.

  • If it is a glossary entry,..
    1. The Compendium editor uploads a TEI encoded textfile to the Compendium server.
    2. The textfile is run through the TAPoR Extract Text tool to create an untagged textfile.
Line: 20 to 22

    1. The TAPoRWare Tagger is used to find glossary items in a document and to tag them with references to the glossary item, in this specific case excluding the glossary term of the document itself.
    2. The Glossary item is merged into the master glossary document.
  • If it is a bibliographic entry...
Changed:
<
<
    1. Bibliographic entries are made directly into the MySQL? database by the Compendium editor
    2. A flatfile export of the MySQL? bibliography database is created and called bibl_data.txt.
>
>
    1. Bibliographic entries are made directly into the MySQL database by the Compendium editor
    2. A flatfile export of the MySQL bibliography database is created and called bibl_data.txt.

  • Images and Figures
These are managed manually by the editor working with the system administrator.

The code base for the Compendium exists as a collection of DTDs, XSL and Java routines that are bound to a series of frameworks for use with the Tomcat Servlet container.

Output

When a user requests a document from the Compendium...
Changed:
<
<
  1. The server consults the index and retrieves the text file from the directory,
  2. AN XSL is applied to retrieve appropriate titles, glossary entries and this is output as an HTML document
  3. To the top of each HTML page is added three options: print (display untagged text), PDF (sent using FOP) and XML (which displays the source of the document)
>
>
  1. The server consults the index and retrieves the XML file from the directory,
  2. An XSL Transformation is applied to retrieve appropriate titles, glossary entries and this is output as an HTML document
  3. To the top of each HTML page is added three options: Print (display light HTML version), PDF (processed using FOP) and XML (which displays the source XML of the document)

Best Archiving Practise

Documentation


800px-Uml_diagram.svg.png

Changed:
<
<
Best practises for code documentation specify that the lines of code be documented by the programmer. Ideally, this line by line documentation is written such that it can be compiled using a documentation generator such as JavaDoc? that can aggregate the inline documentation into and HTML or XML document to accompany the code proper.
>
>
Best practises for code documentation specify that the lines of code be documented by the programmer. Ideally, this line-by-line documentation is written such that it can be compiled using a documentation generator such as JavaDoc that can aggregate the inline documentation into an HTML or XML document to accompany the code proper.

Additionally, UML should be provided to abstract the Compendium as an object model. The individual objects in the UML should be linked to the actual code as developed. This can facilitate the sharing of the implicit logic of the code as written and also allow for cross propagation of the code base between alternate development environments. The technical specifications of the system should be compiled into a working document. Ideally this will also contain developer commentary to justify tool choices, along with current version references and links to tool source code if possible.

 <<O>>  Difference Topic PaperCode (r1.4 - 19 May 2007 - ShawnDay)
Changed:
<
<
META TOPICPARENT ArchivePlan?
>
>
META TOPICPARENT ProblemOverview

The Globalization Compendium currently resides on a server running Linux at McMaster. It receives TEI encoded articles from the Compendium editor. See Content section for more info on the document's pre-server life. This document describes the nature of the code itself and best practises for archiving it.


 <<O>>  Difference Topic PaperCode (r1.3 - 19 May 2007 - ShawnDay)

META TOPICPARENT ArchivePlan?

The Globalization Compendium currently resides on a server running Linux at McMaster. It receives TEI encoded articles from the Compendium editor. See Content section for more info on the document's pre-server life. This document describes the nature of the code itself and best practises for archiving it.

Changed:
<
<

Input

>
>

Adding Documents to the Compendium


The way in which the data received is processed depends on the type of document being received.
  • If it is a research paper, position paper, or research summary ...
Added:
>
>
    1. The Compendium editor uploads a TEI encoded text file to the Compendium server.
    2. The text file is run through the TAPoR Extract Text tool to create an untagged text file.
    3. The untagged text file is used with XIndex to add document terms to the master index.
    4. The TEI-encoded text file is placed in a directory for nightly indexing via Lucene.
    5. The TAPoRWare Tagger is used to find glossary items in a document and to tag them with references to the glossary item.
    6. Lucene walks the directory document by document extracting title, author, category, date and other meta data to add to the index.

  • If it is a glossary entry,..
Added:
>
>
    1. The Compendium editor uploads a TEI encoded textfile to the Compendium server.
    2. The textfile is run through the TAPoR Extract Text tool to create an untagged textfile.
    3. The untagged text file is used with XIndex to add document terms to the master index.
    4. The TAPoRWare Tagger is used to find glossary items in a document and to tag them with references to the glossary item, in this specific case excluding the glossary term of the document itself.
    5. The Glossary item is merged into the master glossary document.

  • If it is a bibliographic entry...
Added:
>
>
    1. Bibliographic entries are made directly into the MySQL? database by the Compendium editor
    2. A flatfile export of the MySQL? bibliography database is created and called bibl_data.txt.
  • Images and Figures
These are managed manually by the editor working with the system administrator.

The code base for the Compendium exists as a collection of DTDs, XSL and Java routines that are bound to a series of frameworks for use with the Tomcat Servlet container.

Output

Added:
>
>
When a user requests a document from the Compendium...
  1. The server consults the index and retrieves the text file from the directory,
  2. AN XSL is applied to retrieve appropriate titles, glossary entries and this is output as an HTML document
  3. To the top of each HTML page is added three options: print (display untagged text), PDF (sent using FOP) and XML (which displays the source of the document)

Best Archiving Practise

Documentation

Added:
>
>

800px-Uml_diagram.svg.png


Best practises for code documentation specify that the lines of code be documented by the programmer. Ideally, this line by line documentation is written such that it can be compiled using a documentation generator such as JavaDoc? that can aggregate the inline documentation into and HTML or XML document to accompany the code proper. Additionally UML should be provided to abstract the Compendium as an object model. The individual objects in the UML should be linked to the actual code as developed. This can facilitate the sharing for the implicit logic of the code as written and also allow for cross propagation of the code base between alternate development environments.
Line: 25 to 45

The source code itself will be committed... The Bibilographic database is currently contained in a MySQL? database and should be exported as a query file to allow for its replication. It should also be exported as an XML flatfile that would allow for its import into alternative data sources.
Added:
>
>

Appendix


-- ShawnDay - 19 May 2007

Added:
>
>
META FILEATTACHMENT 800px-Uml_diagram.svg.png attr="" comment="" date="1179611418" path="800px-Uml_diagram.svg.png" size="36234" user="ShawnDay" version="1.1"

 <<O>>  Difference Topic PaperCode (r1.2 - 19 May 2007 - ShawnDay)

META TOPICPARENT ArchivePlan?
Changed:
<
<
The Globalization Compendium currently resides on a server. It receives TEI encoded articles from the Compendium editor. See Content section for more info on the document's pre-server life. This document describes the nature of the code itself and best practises for archiving it.
>
>
The Globalization Compendium currently resides on a server running Linux at McMaster. It receives TEI encoded articles from the Compendium editor. See Content section for more info on the document's pre-server life. This document describes the nature of the code itself and best practises for archiving it.

Input

The way in which the data received is processed depends on the type of document being received.
Changed:
<
<
* If it is a research paper, position paper, or...it is ..
>
>
  • If it is a research paper, position paper, or research summary ...

* If it is a glossary entry,.. * If it is a bibliographic entry...
Added:
>
>
The code base for the Compendium exists as a collection of DTDs, XSL and Java routines that are bound to a series of frameworks for use with the Tomcat Servlet container.

Output

Best Archiving Practise

Added:
>
>

Documentation

Best practises for code documentation specify that the lines of code be documented by the programmer. Ideally, this line by line documentation is written such that it can be compiled using a documentation generator such as JavaDoc? that can aggregate the inline documentation into and HTML or XML document to accompany the code proper. Additionally UML should be provided to abstract the Compendium as an object model. The individual objects in the UML should be linked to the actual code as developed. This can facilitate the sharing for the implicit logic of the code as written and also allow for cross propagation of the code base between alternate development environments. The technical specifications of the system should be compiled into a working document. Ideally this will also contain developer commentary to justify tool choices along with current version references, links to tool source code if possible.

Printing

Changed:
<
<
The actual code developed for serving this should be committed to archival grade paper and filed along with the content locally at McMaster library as well as the National Library of Canada. Additionally the UML diagram of the structure of the system should accompany this printed material.
>
>
The actual code developed for serving this should be committed to archival grade paper and filed along with the content locally at McMaster library as well as the National Library of Canada. Additionally the UML diagram and linked documentation of the system model should accompany this printed material.

Digital Storage

Changed:
<
<
The UML diagram of the codebase should refer to the specific files which will also be committed to archival quality DVD and filed with paper media above.
>
>
The source code itself will be committed... The Bibilographic database is currently contained in a MySQL? database and should be exported as a query file to allow for its replication. It should also be exported as an XML flatfile that would allow for its import into alternative data sources.

-- ShawnDay - 19 May 2007


 <<O>>  Difference Topic PaperCode (r1.1 - 19 May 2007 - ShawnDay)
Line: 1 to 1
Added:
>
>
META TOPICPARENT ArchivePlan?

The Globalization Compendium currently resides on a server. It receives TEI encoded articles from the Compendium editor. See Content section for more info on the document's pre-server life. This document describes the nature of the code itself and best practises for archiving it.

Input

The way in which the data received is processed depends on the type of document being received. * If it is a research paper, position paper, or...it is .. * If it is a glossary entry,.. * If it is a bibliographic entry...

Output

Best Archiving Practise

Printing

The actual code developed for serving this should be committed to archival grade paper and filed along with the content locally at McMaster library as well as the National Library of Canada. Additionally the UML diagram of the structure of the system should accompany this printed material.

Digital Storage

The UML diagram of the codebase should refer to the specific files which will also be committed to archival quality DVD and filed with paper media above.

-- ShawnDay - 19 May 2007


Revision r1.1 - 19 May 2007 - 16:45 - ShawnDay
Revision r1.8 - 06 Jun 2008 - 19:26 - ShawnDay