Add a French language Text to TAPoR
This recipe takes a French-language text and adds it to the TAPoR workspace for textual analysis. It ensures that the fundamental task of loading text into a text analysis environment is accomplished correctly. For proper analysis, the computer must interpret the text exactly as you entered it, including accented characters. Operating systems and applications can encode text in a variety of ways during entry and storage. This recipe will ensure that your text has been entered and encoded properly for analysis and that you can enter search terms and parameters from your browser to complete analytical tasks.
Ingredients
This recipe is applied to a sample text in Exercise to Add a French language Text to TAPoR
Steps
Discussion
Text Editors
You may need a text editor to save your text as UTF-8 or Latin-1 so that the accents and special characters of the language are preserved. On a Windows system, this can be done with Notepad, and on Mac OS X with TextEdit. On Unix-based systems, a text editor is installed as part of the standard system install. Word processors typically provide a much deeper tool set for formatting text and generally save documents in their native format, which is not appropriate for importing into a text analysis environment. However, they too can save a plain text file with the appropriate encoding if you choose that option when saving.
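If you would rather convert a file in a script than in an editor, the same re-encoding can be sketched in a few lines of Python. This is a minimal sketch and not part of TAPoR: the function name `reencode_text` is hypothetical, and it assumes you already know the source file's encoding (Latin-1 here). Decoding with the wrong source encoding will not raise an error for Latin-1, but it will silently produce wrong characters.

```python
from pathlib import Path


def reencode_text(src, dst, src_encoding="latin-1", dst_encoding="utf-8"):
    """Decode the bytes of `src` as `src_encoding` and write them to `dst`
    encoded as `dst_encoding`. Returns the decoded text.

    Assumption: the caller knows the real encoding of `src`.
    """
    # Work on raw bytes so that line endings are left untouched.
    raw = Path(src).read_bytes()
    text = raw.decode(src_encoding)
    Path(dst).write_bytes(text.encode(dst_encoding))
    return text
```

For example, `reencode_text("texte_latin1.txt", "texte_utf8.txt")` (both file names are placeholders) would read a Latin-1 file and write a UTF-8 copy, leaving the original unchanged.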
Web Page Encoding
To verify that the web page you wish to import into TAPoR is encoded in either UTF-8 or Latin-1, check how your browser is interpreting it. In Internet Explorer, go to the View menu and select the Encoding option. This should read Unicode (UTF-8). In Firefox, the option is Character Encoding under the View menu. This should also read Unicode (UTF-8). If this is not the case, you can manually select the encoding you wish to use from this menu. In other web browsers the process should be similar; please consult their help files for specific instructions on character encoding. If you view the page source for your web page, it may contain the HTML line:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
This line declares ISO-8859-1 (Latin-1) as the page's encoding, which indicates that it is encoded properly for text analysis.
Glossary
Next Steps/Further Information