Optical Character Recognition and Using Electronic Texts

This page is a script for a workshop at McMaster University on:

  • how to scan documents,
  • how to use OCR software to create an electronic text,
  • and what you can do with an electronic text.

Scanning a Document for OCR- OCR Software Initiated

1. Turn on the Scanner- Turn on the Microtek scanner and wait for the system to recognize the hardware

2. Place Document- Place the document to be scanned face down, towards the back edge of the scanner

3. Launch ABBYY- Launch ABBYY FineReader 7.0 Professional Edition, found in the Start menu

4. Initiate Scan- At the top of the ABBYY FineReader workspace are four buttons that are used to conduct the OCR scan; the first one we will use is the ‘Scan’ button
• Click the ‘Scan’ button in ABBYY FineReader to initiate a preview of the document
→ This action will launch the Microtek ScanWizard control panel in a window entitled ABBYY ScanManager 7.0

5. Adjust Scan Settings- Maximize the control panel (ABBYY ScanManager) that appears on the task bar and adjust the settings to suit an OCR scan

• Original: Select ‘Text Document’
• Scan Type: Select ‘Gray’ for most documents
• Purpose: Select ‘OCR Text’ to set an appropriate DPI resolution

6. Scan- In the Microtek ScanWizard control panel window, click the 'Scan' button
→ When the Microtek scanner has finished scanning, the image of your document will automatically appear in the 'Image' window of the ABBYY FineReader workspace

7. Resume- Close the ABBYY ScanManager (control panel) window and resume the OCR process in ABBYY FineReader


Scanning a Document for OCR- ScanWizard (Microtek Software) Initiated

1. Turn on the Scanner- Turn on the Microtek scanner and wait for the system to recognize the hardware

2. Place Document- Place the document to be scanned face down, towards the back edge of the scanner

3. Launch ScanWizard 5- Open Microtek's ScanWizard 5 (shortcut on the desktop)
→ The software will automatically initiate a preview of your document

4. Adjust Scan Settings- In the ScanWizard control panel, adjust the settings to suit an OCR scan

• Original: Select ‘Text Document’
• Scan Type: Select ‘Gray’ for most documents
• Purpose: Select ‘OCR Text’ to set an appropriate DPI resolution

5. Scan- Use the 'Scan to' button at the top of the control panel to select a destination and save your document
NOTE: The resulting document is still in an image (TIFF) format and has not been converted into an editable text file. Optical Character Recognition can be performed by opening the document in either ABBYY FineReader or OmniPage Pro.
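
The difference between an image file and a text file can also be checked programmatically. The Python sketch below is an illustration only: the function name `looks_like_tiff` is hypothetical, and it relies solely on the fact that TIFF files begin with the bytes `II*\0` (little-endian) or `MM\0*` (big-endian). A file that passes this check still has to go through OCR before its text can be searched or edited.

```python
def looks_like_tiff(first_bytes):
    """Return True if the given leading bytes carry a TIFF signature."""
    # "II" + 42 (little-endian) or "MM" + 42 (big-endian) opens every TIFF file.
    return first_bytes[:4] in (b"II*\x00", b"MM\x00*")


# Example with in-memory data: a TIFF-style header and a run of plain text.
print(looks_like_tiff(b"II*\x00\x08\x00\x00\x00"))
print(looks_like_tiff(b"Scanning a Document"))
```

To test a saved scan, open the file in binary mode and pass its first four bytes to the function.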

OCR with ABBYY

1. Launch ABBYY- Launch ABBYY FineReader 7.0 Professional Edition, found in the Start menu

2. Open Image Document- In ABBYY FineReader, open the document to be converted to text
→ You can either browse among files already saved to your computer or scan a hard copy directly into ABBYY FineReader

• Scan Document: To scan a paper document into the software, follow the steps outlined above in Scanning a Document for OCR- OCR Software Initiated

3. Workspace View- Once the file has been opened in or scanned into ABBYY FineReader, the default view shows an image of your document on the left side of the screen in a frame entitled 'Image'
→ This is not an editable text file; it is simply a picture (image) of the text

4. Read- To convert the image file, we must instruct the software to 'Read' the image. This is where the actual OCR takes place. At the top of the ABBYY FineReader workspace, click the ‘Read’ button to create a text document from the scanned image
→ Once the Optical Character Recognition, or 'Read', process is complete, another view of the document will appear in your workspace beside the 'Image' frame. This is an editable, searchable text file.

5. Check Spelling - To perform a spell check on the new text document, click the ‘Check Spelling’ button at the top of the ABBYY FineReader workspace

6. Save - Save your text file by clicking the ‘Save’ button.
→ The file will remain an editable text document that can be opened in other text editors or reopened in ABBYY or OmniPage for searching and batch editing.

Editing Electronic Texts with ABBYY

Searching

1. Advanced Search window - Click on ‘Edit’ in the main menu bar and select ‘Advanced Search’ in the dropdown menu

2. Find Word - An advanced search input field will appear at the top of the page. Enter the word or phrase (string of text) to be searched and click the ‘Search’ button to perform the search

3. Search Page - To the right of the search input field, results will appear in the form {Page # dd/mm/year …sentence fragment containing search string… }
→ Click on the ‘Page #’ to find all instances of the word/phrase. If there is more than one batch (page) in your workspace, a dialog window will prompt you to search other pages (batches)
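
Once a text has been saved as a plain text file, a similar search can be reproduced outside the OCR software. The following is a minimal Python sketch, not ABBYY's own search: the function name `find_in_context` and the sample text are made up for illustration. It lists each match with a fragment of the surrounding text, much like the sentence fragments shown in the search results.

```python
import re


def find_in_context(text, phrase, width=30):
    """Return (position, fragment) pairs for each case-insensitive match of phrase."""
    results = []
    # re.escape treats the phrase as literal text, not as a regular expression.
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Collapse line breaks so each fragment prints on one line.
        fragment = " ".join(text[start:end].split())
        results.append((match.start(), fragment))
    return results


sample = "The scanner reads the page.\nThe software then reads the image of the page."
for position, fragment in find_in_context(sample, "reads", width=12):
    print(position, "..." + fragment + "...")
```

The `width` parameter controls how many characters of context are kept on each side of a match.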

Replacing

1. Open ‘Replace’ dialog box - To replace one or more instances of a word or string of text, select ‘Replace’ in the dropdown menu under ‘Edit’ on the main menu bar

2. Input Text - Two input fields will appear in the ‘Replace’ dialog box; in the ‘Find what’ field, enter the text to be searched, and in the ‘Replace with’ field, enter the text that will replace the original string

• To scroll through and replace instances individually, click the ‘Replace’ button for each instance
• To replace all instances of the string, click ‘Replace all’
NOTE: If a pronoun is searched, for example ‘me’, every pronoun appearing within the document (i.e. me, you, he, she, etc.) will be picked up and highlighted; however, when instructed to ‘Replace all’, the program selects only the original term entered in the search
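
For a text file saved from the OCR software, a batch replacement can also be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ABBYY FineReader implements ‘Replace all’: the function name `replace_whole_word` is hypothetical, and the sketch matches whole words only (and is case-sensitive), so that replacing ‘me’ does not also alter words such as ‘some’ or ‘time’.

```python
import re


def replace_whole_word(text, find_what, replace_with):
    """Replace every whole-word occurrence of find_what; return (new_text, count)."""
    # \b marks a word boundary, so 'me' will not match inside 'some' or 'time'.
    pattern = r"\b" + re.escape(find_what) + r"\b"
    # A function replacement inserts replace_with literally (no backslash processing).
    return re.subn(pattern, lambda match: replace_with, text)


new_text, count = replace_whole_word("Give me some time, said he to me.", "me", "us")
print(new_text)
print(count)
```

Returning the count alongside the new text makes it easy to confirm how many instances were changed before saving the result.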

What you can do with an electronic text

Creating a PDF library of electronic texts for your research.

Creating text files so that you can search and analyze texts

Using TAPoRware and HyperPo to analyze a text

Tagging texts

Other types of support you can get
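
As one small illustration of what searching and analyzing a text file can mean in practice, the sketch below counts the most frequent words in a text using only the Python standard library. It is not TAPoRware or HyperPo and makes no claim about how those tools work; the function name `word_frequencies` and the sample sentence are invented for the example.

```python
import re
from collections import Counter


def word_frequencies(text, top=5):
    """Return the `top` most common words in text as (word, count) pairs."""
    # Lower-case the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)


sample = "The scanner makes an image. The software reads the image and makes a text."
for word, count in word_frequencies(sample, top=3):
    print(word, count)
```

To run the same count on a saved OCR text, read the file into a string first and pass that string to the function.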

-- GeoffreyRockwell - 06 May 2005

