TAToo Technical Details
As TAToo is developed, code that may be found useful will be posted here. Note that it isn't the most elegant code; consider it a base to improve upon. Any improvements or suggestions? Email Peter at organisciak@gmail.com.
Regular Expression for Splitting Words
This is the regular expression used in version one of TAToo to split text into words for its local word list.
/\b([\w]+)([-]?)([\'\w]*)\b/g
Note that Actionscript's definition of "\w" (word characters) is limited and does not include accented characters. TAToo therefore hardcodes the special characters it needs, and other Actionscript applications would have to do the same. Below is an example with é and ç.
/\b([\w|é|ç]+)([-]?)([\'\w|é|ç]*)\b/g
The full list of special characters that TAToo counts as being within words is the following:
à|á|â|ã|ä|å|æ|ç|è|é|ê|ë|ì|í|î|ï|ð|ñ|ò|ó|ô|õ|ö|ø|ù|ú|û|ü|ý|ÿ
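As a rough illustration of how the full list might be folded into the expression, here is a sketch in TypeScript rather than Actionscript (both use ECMAScript-style regular expressions for the features involved). The names EXTRA_LETTERS, WORD_CHARS, WORD_PATTERN and splitWords are hypothetical and not taken from TAToo. Two details are assumptions of this sketch: the letters are listed inside the square brackets without "|" separators, because a "|" inside a character class is a literal pipe rather than an alternation, and the "\b" anchors are left out, because in JavaScript engines "\b" is defined in terms of the ASCII-only "\w" and would cut a match short at a trailing accented letter. Whether Actionscript's "\b" behaves the same way is not verified here. The list is lowercase only, so the input is assumed to be lowercased beforehand.

```typescript
// Hypothetical names; the letters are copied from the list above.
const EXTRA_LETTERS = "àáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ";

// Body of the character class: \w plus the hardcoded letters, with no "|".
const WORD_CHARS = "\\w" + EXTRA_LETTERS;

// Same three-part shape as the expression above: word characters,
// an optional hyphen, then further word characters or apostrophes.
const WORD_PATTERN = new RegExp(
  "[" + WORD_CHARS + "]+-?['" + WORD_CHARS + "]*",
  "g"
);

function splitWords(text: string): string[] {
  // String.match with a global pattern returns every match, or null if none.
  const found = text.match(WORD_PATTERN);
  return found === null ? [] : found.slice();
}
```

Called as splitWords("le café-crème d'été coûte cher"), this keeps "café-crème" and "d'été" as single words.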
Regular Expression for Cleaning HTML
TAToo takes a webpage, removes the tags and comments, and analyzes the text that's left over.
//Note: because the regular expression is complex, it is built up piece by piece, with a comment on each piece
var cleaningExpression:String = "<head\\b[^>]*>(((?!<\/?head).)*)<\/head>" //remove the head tag and its content
+"|" +"<(script|style)\\b[^>]*>(((?!<\/?(script|style)).)*)<\/(script|style)>" //remove script/style tags and their content
+"|" +"<\/?\\w+((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)\/?>" //remove any other HTML tags
+"|" +"(\\<|<)!--\\s*.*?\\s*--(\\>|>)" //remove HTML comments
+"|" +"&(nbsp);"; //remove non-breaking spaces
var removeTagsExp:RegExp=new RegExp(cleaningExpression,"gis"); //g: all matches, i: ignore case, s: dot also matches newlines
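To show the idea on a small input, here is a simplified sketch in TypeScript (not Actionscript, and not TAToo's actual expression). The names CLEANING_PARTS, CLEANING_PATTERN and stripHtml are hypothetical. It makes a few assumptions that differ from the code above: the generic tag pattern does not handle a ">" inside a quoted attribute value, "[\s\S]" stands in for the "s" flag, and each match is replaced with a space so that neighbouring words do not run together.

```typescript
// Each alternative removes one kind of markup, in the same spirit as above.
const CLEANING_PARTS: string[] = [
  "<head\\b[^>]*>[\\s\\S]*?</head\\s*>",       // head element and its content
  "<(script|style)\\b[^>]*>[\\s\\S]*?</\\1\\s*>", // script/style elements and their content
  "<!--[\\s\\S]*?-->",                          // HTML comments
  "</?\\w+[^>]*>",                              // any remaining tag (simplified)
  "&nbsp;",                                     // non-breaking space entities
];

const CLEANING_PATTERN = new RegExp(CLEANING_PARTS.join("|"), "gi");

function stripHtml(html: string): string {
  // Swap every match for a space, then collapse runs of whitespace.
  return html.replace(CLEANING_PATTERN, " ").replace(/\s+/g, " ").trim();
}
```

Because the script/style alternative comes before the generic tag alternative, markup that appears inside a script (such as a "<p>" in a string literal) is removed together with the script instead of leaving stray code in the text.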
Calculating Difference Between Hex Colours (or "Hex to Bits to Decimal to Hex")
TAToo's word cloud calculates the difference between the background color and white, and colors words on a scale between these two colors. Since alpha transparency doesn't work for HTML text (which is what the word cloud is rendered as), the colors are computed with the functions below.
Converting HTML 'Hex' String to AS Hex Format
function cssToHex (input) {
var pattern:RegExp=/\#/g;
input=input.replace(pattern,"0x");
return input;
}
Converting An Integer to Hex
By way of this website.
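function d2h( d:int ) : String {
var c:Array = [ '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' ];
if( d > 255 ) d = 255; //clamp to the range of a single colour channel
if( d < 0 ) d = 0; //a negative value would otherwise index outside the array
var l:int = d / 16; //high hex digit
var r:int = d % 16; //low hex digit
return c[l]+c[r];
}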
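The same conversion can also be written with the built-in number-to-string conversion instead of a lookup table. Below is a minimal sketch in TypeScript; Actionscript's int and Number types have a comparable toString(radix) method. The name byteToHex is hypothetical, and the function clamps at both ends of the 0-255 range.

```typescript
function byteToHex(value: number): string {
  // Clamp to the 0-255 range of one colour channel and drop any fraction.
  const clamped = Math.min(255, Math.max(0, Math.round(value)));
  // toString(16) gives one or two lowercase hex digits.
  const hex = clamped.toString(16).toUpperCase();
  // Pad single digits so every channel is exactly two characters wide.
  return hex.length < 2 ? "0" + hex : hex;
}
```

The padding step matters: without it, a channel value of 10 would come out as "A" and the assembled colour string would be one character short.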
Bitwise Functions to Average Colours
function diffColors (startColor, endColor, increments, currentIncrement) {
startColor = cssToHex (startColor); //converts #123456 format to the 0x123456 hex format used by AS
endColor = cssToHex (endColor);
var rDifference = (endColor >> 16 & 0xFF) - (startColor >> 16 & 0xFF);
var gDifference = (endColor >> 8 & 0xFF) - (startColor >> 8 & 0xFF);
var bDifference = (endColor & 0xFF) - (startColor & 0xFF);
var rIncrement = Math.round(rDifference / increments);
var gIncrement = Math.round(gDifference / increments);
var bIncrement = Math.round(bDifference / increments);
var currentR = (startColor >> 16 & 0xFF)+(currentIncrement * rIncrement);
var currentG = (startColor >> 8 & 0xFF)+(currentIncrement * gIncrement);
var currentB = (startColor & 0xFF)+(currentIncrement * bIncrement);
return ("#" + d2h(currentR) + d2h(currentG) + d2h(currentB)) //spits out an HTML-like #123456 format
}
TAToo passes the count of the highest occurring word as increments, and the count of the word currently being colored as currentIncrement. Thus, the highest occurring word becomes the end colour (in TAToo's case, white).
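One thing to be aware of in diffColors is that the per-step increment is rounded before it is multiplied. For example, with a start colour of #336699, white as the end colour and 40 increments, the red step is Math.round(204 / 40) = 5, so the most frequent word gets a red value of 51 + 40 × 5 = 251 rather than 255. A possible variant, sketched here in TypeScript rather than Actionscript, interpolates each channel first and rounds last. The names parseCssColor, channelToHex and blendColors are hypothetical and not part of TAToo.

```typescript
// Split "#RRGGBB" into its red, green and blue channel values.
function parseCssColor(css: string): [number, number, number] {
  const value = parseInt(css.replace("#", ""), 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Two-digit uppercase hex for one channel, clamped to 0-255.
function channelToHex(channel: number): string {
  const hex = Math.min(255, Math.max(0, channel)).toString(16).toUpperCase();
  return hex.length < 2 ? "0" + hex : hex;
}

// Hypothetical variant of diffColors: interpolate first, round last.
function blendColors(startColor: string, endColor: string,
                     increments: number, currentIncrement: number): string {
  const [r0, g0, b0] = parseCssColor(startColor);
  const [r1, g1, b1] = parseCssColor(endColor);
  // Position of the current word on the scale, from 0 (start) to 1 (end).
  const fraction = increments > 0 ? currentIncrement / increments : 0;
  const mix = (from: number, to: number): number =>
    Math.round(from + (to - from) * fraction);
  return "#" + channelToHex(mix(r0, r1)) + channelToHex(mix(g0, g1))
             + channelToHex(mix(b0, b1));
}
```

With the same inputs as above, blendColors("#336699", "#FFFFFF", 40, 40) gives exactly "#FFFFFF", and a currentIncrement of 0 gives back the start colour.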
CSS Styling
TAToo used to support CSS styling. This has been deprecated in favour of simpler PARAM-based styling. However, for posterity, this is how it worked.
Defining Stylesheet location in HTML
<param name="fstylesheet" value="style.css" />
Finding Stylesheet location in Actionscript 3
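import flash.external.ExternalInterface;
var styleSheetLocation:String; //stays null if the containing page cannot be queried
if (ExternalInterface.available) {
//asks the browser for the value of the "fstylesheet" param element
styleSheetLocation = ExternalInterface.call('document.getElementsByName("fstylesheet")[0].value.toString');
}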
Loading Stylesheet
The defined CSS was loaded like any other web page, with cssLoaded as the listener that is called once it has loaded.
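import flash.events.Event;
import flash.net.URLLoader;
import flash.net.URLRequest;
if (styleSheetLocation!=null) {
var cssUrl:URLRequest=new URLRequest(styleSheetLocation);
var cssLoader:URLLoader = new URLLoader();
cssLoader.addEventListener(Event.COMPLETE,cssLoaded); //register the listener before starting the load
cssLoader.load(cssUrl);
}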
Interpreting CSS Definitions
In the partial snippet below, the "background-color" definition from the TAPORcategoryHeader class is parsed from "#000000" format to the hexadecimal "0x000000" integer preferred by Actionscript.
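function cssLoaded(event:Event):void {
var css:StyleSheet = new StyleSheet();
css.parseCSS(URLLoader(event.target).data);
var pattern:RegExp=/\#/g;
var categoryHeaderCSS:Object=css.getStyle(".TAPORcategoryHeader");
//stop here if the class or its background-color is missing from the stylesheet
if (categoryHeaderCSS==null || categoryHeaderCSS.backgroundColor==null) return;
var str:String=categoryHeaderCSS.backgroundColor; //e.g. "#000000"
str=str.replace(pattern,"0x"); //"#000000" becomes "0x000000"
catColor=parseInt(str,16); //catColor is assumed to be declared elsewhere
}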
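To make the behaviour the snippet relies on more concrete, here is a tiny stand-in for the parseCSS/getStyle pair, sketched in TypeScript. It is illustrative only and is not how Flash's StyleSheet class is implemented; the function names and the simple one-rule parsing are assumptions of this sketch. It shows the two steps that matter: the hyphenated "background-color" property is exposed as backgroundColor, and the "#RRGGBB" string is turned into an integer.

```typescript
// Illustrative only: look up one rule and return its declarations,
// with hyphenated property names converted to camelCase.
function getStyle(cssText: string, selector: string): Record<string, string> | null {
  // Escape regex metacharacters in the selector (the leading "." in particular).
  const escaped = selector.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const rule = new RegExp(escaped + "\\s*\\{([^}]*)\\}").exec(cssText);
  if (rule === null) return null; // no such rule in the stylesheet
  const style: Record<string, string> = {};
  for (const declaration of (rule[1] ?? "").split(";")) {
    const colon = declaration.indexOf(":");
    if (colon < 0) continue; // skip empty or malformed declarations
    // "background-color" becomes "backgroundColor".
    const name = declaration.slice(0, colon).trim()
      .replace(/-([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
    style[name] = declaration.slice(colon + 1).trim();
  }
  return style;
}

// "#000000" -> 0x000000; parseInt with radix 16 reads the hex digits.
function cssColorToInt(color: string): number {
  return parseInt(color.replace("#", ""), 16);
}
```

Returning null for a missing rule keeps the caller honest: the lookup has to be checked before the colour is read.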
Notes
For more information, or to try this feature, check out r16 or earlier of TAToo from the code repository.
--
PeterOrganisciak - 23 Sep 2009