List Words to Identify Themes
This exercise uses this Recipe to identify simple themes within a sample text.
It applies the recipe to a real text that is freely available on the Internet, so you can follow the steps yourself and see the results.
Exercise Steps
- This exercise uses Volume 2 of Thomas Macaulay's History of England, which can be downloaded from Project Gutenberg.
- Run the TAPoR List Words Tool to generate a word list sorted by frequency. The result should resemble the following:
| Word | Count |
|---|---|
| The | 3591 |
| Of | 2057 |
| And | 1360 |
| To | 1234 |
| A | 850 |
| Was | 848 |
| In | 758 |
| Had | 686 |
| Been | 265 |
| Be | 255 |
| Not | 246 |
| At | 240 |
| On | 213 |
| From | 212 |
| Who | 201 |
| They | 187 |
| Their | 174 |
| All | 153 |
| King | 139 |

The most frequently used words are function words such as 'The' and 'A'. They occur in almost any English text and say little about this one, so the next step is to exclude common function words.
- Run the TAPoR List Words Tool again, applying a list of words to exclude from the list. One useful stop list, the Glasgow stop words list, is available here. The result should be similar to:
| Word | Count |
|---|---|
| King | 139 |
| Great | 115 |
| Parliament | 92 |
| England | 86 |
| House | 83 |
| Men | 81 |
| Time | 75 |
| Government | 74 |
| Charles | 73 |
| Power | 68 |
| Party | 66 |
| Public | 59 |
| Years | 57 |
| France | 56 |
| Long | 56 |
| English | 55 |
| Court | 54 |
| Commons | 53 |
| State | 52 |
| Church | 51 |
| New | 46 |
| Man | 46 |
| Country | 46 |

The list of frequent words is now more revealing. Words such as King, Great, Parliament, England, House, Men, Time, Government, Charles, Power, Party, Public, Years and Just immediately stand out.
- Thus, with one simple list words tool you can easily identify the themes of power, monarchy, the common man and time in Macaulay's History of England.
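For readers who want to see the idea behind these steps outside the tool, here is a minimal Python sketch of counting words by frequency and then excluding a stop list. It is not how the TAPoR List Words Tool is implemented. The function name `list_words`, the tiny `SAMPLE_STOP_WORDS` set and the sample sentence are all made up for illustration; in practice you would read the downloaded text from a file and load a full stop list such as the Glasgow one.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list for illustration only.
# It is NOT the Glasgow stop words list.
SAMPLE_STOP_WORDS = {"the", "of", "and", "to", "a", "was", "in", "had"}


def list_words(text, stop_words=frozenset()):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Lowercase the text and pull out runs of letters,
    # allowing one internal apostrophe (e.g. "king's").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Count every word that is not on the stop list.
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common()


# Invented sample sentence, not a quotation from Macaulay.
sample = (
    "The king was in the house of the parliament, "
    "and the parliament had power to check the king."
)

all_words = list_words(sample)                         # first run: no stop list
content_words = list_words(sample, SAMPLE_STOP_WORDS)  # second run: stop list applied

print(all_words[:3])
print(content_words[:3])
```

Without the stop list, 'the' heads the list of this sample. With it, 'king' and 'parliament' come first, which mirrors the change between the two tables above.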
Next Steps/Further Information
--
ShawnDay - 3 November 2006