
WordPress, nginx, and Dreamhost

So I found the blog to be slow, really slow.

I checked my memory and CPU usage and it seemed all normal. The command ps -eF was useful for listing processes, but my host also has a graph of used resources and it all looked well within my limits.

I learned about Apache Bench and ran some tests such as ab -n 100 -c 5 http://www.mythiclogos.com/ and found that it would take between 1 and 2 seconds for the initial page to come back. From the server itself, the benchmark dropped to much less, but still high, I thought.

I tried some caching plugins. I eventually settled on WP Super Cache, which is a plugin I understand.

But it was still slow. I suspected Apache was being slow for some reason. I know that it can get crippled under heavy loads, but I have no heavy loads. Still, my host offers a drop-down box for switching the web server. So I tried nginx. It took me a day to figure out how it worked, how to get my WordPress sites to work with it using WP Super Cache redirects, and all that. And now I find the performance of my sites to be exceedingly fast.

The benchmark tests seem to suggest it is twice as fast, but I think that is an underestimate. Before when loading pages I would count to 6 before I had a page. Now, many of them are in a count of 2 or 3. But that is when I use the ones with cookies on them. When I clear the cookies, so that the blogs think I am just some guy visiting, there is no count. It seems instantaneous. That’s awesome.

The difference is that WP Super Cache serves cached pages to unknown visitors, but not to the known ones. As the maintainer, I can stomach a second delay to make sure that I have the most recent stuff and nothing weird happens. But to others, having a zero second delay is incredible. So WP Super Cache makes compressed html pages of the semi-dynamic pages. Some of the comments could get out-of-date, but whatever. It makes the primary page loads wicked fast which is what I care about. I imagine one could add some ajaxy thing to check for new comments and updates. I am sure there are already plugins for that.

I should note that nginx helps immensely with static html pages. I think it can work under tremendous loads without breaking while Apache crawls to a halt. But it does not help too much with PHP. I think it does keep the processes persistent and maybe the useful scripts in memory (I am not sure), but for the most part, PHP is a separate process using FastCGI and nginx has nothing to do with it. It just does the pre-screening and serves static html pages.

One thing that I think is broken is the stats trackers. I believe they work by throwing in some PHP processing code that logs the users that visit. Since PHP is never touched for cached pages, the visit never gets logged. One fix could be to do some sort of fake download of an image or an ajax call to track the visitor. Not sure. It is a shame, though, as I have enjoyed seeing stats of visitors from around the world.
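A hedged sketch of that image-beacon idea (the /track.php endpoint and the beaconUrl helper are entirely hypothetical, not part of any existing plugin):

```javascript
// Build a cache-busting beacon URL (pure string work). The hypothetical
// /track.php endpoint would log the hit server-side and return a 1x1 gif,
// so PHP still runs once per pageview even when nginx serves cached html.
function beaconUrl(path) {
    return "/track.php?page=" + encodeURIComponent(path) +
           "&t=" + new Date().getTime(); // timestamp defeats browser caching
}

// In the page itself, something like:
//   (new Image()).src = beaconUrl(window.location.pathname);
```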

Anyway, the main code to get nginx working I got from a post about nginx+WP Super Cache. I added an extra line to deal with possibly a missing slash at the end:

location /blog/ {
# enable search for precompressed files ending in .gz
# nginx needs to be compiled using --with-http_gzip_static_module
# for this to work, comment out if using nginx from aptitude
gzip_static on;

# if the requested file exists, return it immediately
if (-f $request_filename) {
break;
}

set $supercache_file '';
set $supercache_uri $request_uri;

if ($request_method = POST) {
set $supercache_uri '';
}

# Using pretty permalinks, so bypass the cache for any query string
if ($query_string) {
set $supercache_uri '';
}

if ($http_cookie ~* "comment_author_|wordpress|wp-postpass_" ) {
set $supercache_uri '';
}

# if we haven't bypassed the cache, specify our supercache file
if ($supercache_uri ~ ^(.+/)$) {
        set $supercache_file /blog/wp-content/cache/supercache/$http_host/$1index.html;
}

if ($supercache_uri ~ ^(.+[^/])$) {
        set $supercache_file /blog/wp-content/cache/supercache/$http_host/$1/index.html;
}


# only rewrite to the supercache file if it actually exists
if (-f $document_root$supercache_file) {
rewrite ^(.*)$ $supercache_file break;
}

# all other requests go to WordPress
if (!-e $request_filename) {
rewrite . /blog/index.php last;
}
}

location / {
#### fill with exactly the same as above
#### this is for those who have WordPress installed in one directory (blog/ above) but are serving it from, say, the root directory.
#### seems to work fine.
}

Now, I find this to work out very well. The nginx wiki suggests that if statements are evil because they are not actual programming if statements. One cannot nest them, and apparently their behavior is unpredictable from a novice point of view. Still, the above worked out well. It might be improved upon, but if something is working, well, leave it alone.

I also very much enjoyed the reloading of nginx. It was a single command that restarted it instantly and my changes were live. I also enjoyed that I could read and almost understand the configuration script. Compare that with some crazy apache rewrite rules that make me want to run away in terror.

There is much more to configuring nginx, but my host, Dreamhost, already had it all done and had it setup so that I can configure the domain-specific bits for each domain in my own directory. No need to mess around with the full config. It is a pretty sweet setup. I should mention that I have the VPS (Virtual Private Server) for $15 a month. It may be a little expensive for basic hosting, but man I enjoy having command lines and the ability to blow things up at will. They have a nice “Reboot” and a “Reset” option for the private servers that I have used often when experimenting.

In conclusion, caching plus nginx leads to super fast performance, with the promised ability to withstand the deluge of a day of crazy hits in case a post becomes wildly popular.


Pratt parsing 2; same level associativity

Version 4 has an error in it.

Try parsing 4^3^2. Parser 4 gives (4^3)^2 = 4096. Google gives 262144 as does Parser 5alpha. They parse it as 4^(3^2).

So how does Pratt parsing handle this? First, note that it is only an issue for operators with the same binding power. Different binding powers are there exactly to deal with implied associativity.

Parser 4 has this behavior because it keeps chugging along until the next token's binding power is smaller than or equal to the previous one's. Thus, when it encounters an operator of its own level, it stops and deals with what it has already before moving on.

This is what we want for something like 7-3-4 = (7-3)-4, not 7-(3-4). But as noted above, the power operator is not of that kind. So what do we do?

Well, when dealing with infix operators, we pass a binding power to the expression function to tell the stuff on the right when to stop. That stuff stops when the passed-in power is greater than or equal to the next operator's power. So we need to pass in a smaller power so that the process keeps going. That is, instead of passing a binding power of 20, we pass one of 19. Thus, when it encounters an operator of its own kind, it compares 19 to 20 and decides to keep going.
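Sketched in code, over a pre-split token array (a toy, with made-up names, just to show the 19-versus-20 trick):

```javascript
// ^ has binding power 20. The loop continues while rbp < 20, so passing
// 19 on the right side makes an equal-power ^ keep going (right
// associative); passing 20 instead would stop it (left associative).
function parsePow(tokens) {
    var pos = 0;
    function expression(rbp) {
        var left = Number(tokens[pos]);
        pos += 1;
        while (tokens[pos] === "^" && rbp < 20) {
            pos += 1;                              // consume the ^
            left = Math.pow(left, expression(19)); // pass 19, not 20
        }
        return left;
    }
    return expression(0);
}
// parsePow(["4", "^", "3", "^", "2"]) → 262144, i.e. 4^(3^2)
```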

See Wiki article on operator associativity.


Parsing Version 4

Skipping version 3 entirely (let’s just not go there, shall we?), I have now a version 4 on the web. It currently does arithmetic without parentheses.

But what it does do is take the arithmetic expression(s) and parse them into trees, bottom-up with the computed values in the nodes and operations above. The graphics are not pretty just yet, but it works and it is completely in the browser–no server-side shenanigans. This is all thanks to jsxgraph; they just saved me months of work, work which, unlike parser writing, I had no desire to do.

Below the graph is a bunch of strange looking code–this is text that can be fed into graphviz for a different graph view.

The really nice thing about this version is that I define the grammar and behavior right in the index.html file. I have achieved separation of the general parser stuff from the behavior of the language I want to deal with.

This is the definition of arithmetic:

    var simpleParSet = {symbolDefinitions:{
                "+": {types:{infix:20}},
                "-": {types:{infix:20}},
                "*": {types:{infix:30}},
                "/": {types:{infix:30}},
                "^": {types:{infix:40}},
                "\u00AC": {types:{symbol:0}}
                }
    };
    
    var par = prattCrockford.makeTree(simpleParSet);

And to make a tree, the call is

tree = new par({str:"3+7*5^2"});

And that’s it. Tree defined. It has no behavior.

To add behavior, I have a seeder method that can install behavior into the nodes. This can be done globally across all parsers, across all trees for a language, or for a single tree. With prototype chaining this is easy; I call seeder at the level I want even though it is defined at the bottom of the chain. Awesome.

Once seeder is done, you can then call walker to walk the tree or shooter to walk the parse order. And that’s it.

Here is an example of the use of the seeder function to seed all of this language’s trees:

    var makeCompute = function (fun) {
        return function () {
            this.computedValue = fun (this.nodes.left.compute(), this.nodes.right.compute()); 
            return this.computedValue;
        }
    };
    par.prototype.seeder({
        "(number)" : function(sym) {
             sym.prototype.compute = function () {
                  this.computedValue = this.value; 
                 return this.value;};
        },
        "+" : function (sym) {
                sym.prototype.compute = makeCompute(function (l,r) {return l+r;});
        },
        "-" : function (sym) {
                sym.prototype.compute = makeCompute(function (l,r) {return l-r;});
        },
        "*" : function (sym) {
                sym.prototype.compute = makeCompute(function (l,r) {return l*r;});
        },
        "/" : function (sym) {
                sym.prototype.compute = makeCompute(function (l,r) {return l/r;});
        },
        "^" : function (sym) {
                sym.prototype.compute = makeCompute(function (l,r) {return Math.pow(l,r);});
        },

        "\u00AC" : function(sym) {
                sym.prototype.compute = $.noop;
        },
        "(end)" : function(sym) {
             sym.prototype.compute = $.noop;
        }
        },
        function (sym) {
            sym.prototype.compute = function () {this.computedValue = NaN; return this.computedValue;}
        }
    );

Notice how I created a helper function makeCompute that essentially reduces boilerplate code and allows one to specify just the differing behavior. All sorts of tricks like this can be used to make this more efficient. I also have a couple of $.noop’s for symbols that should not be processed (end of lines, for example). I will probably make a common convention to deal with them so that that boilerplate can be removed. I may remove them from the trunk of the tree altogether and only leave them in the source token stream.

Anyway, now that we have the seeder function, now what?

        tree = new par({str:"3+4\n7*5^2+1"});
        $.log(tree.walker("compute"));

The return value will be an array of the computed values of the different arithmetic expressions in the input: [7, 176]

And that is how to write a parser. I can continue to add behaviors of all kind, using the behavior data I already generated, and even remove behaviors later if I want or overwrite. I have complete flexibility and power on how to make my trees behave.

Now I need to add parsing behaviors.


Pratt Parsing 1

So I thought I would start a series on how Pratt parsing works. This is actually to help me understand it 3 months from now. This time, we will look at parsing -7+3*4 + 9.

We first tokenize, getting -, 7, +, 3, *, 4, +, 9.

One can use a regex for tokenizing such as

/(?:([a-zA-Z][a-zA-Z0-9]*)|((?:[0-9]*\.[0-9]+|[0-9]+)(?:[E][\-+]?[0-9]+)?)|([^a-zA-Z0-9\s]))/g

and a word match is in the first, a number in the second, and other symbols in the third (one character taken). You can use this in a stream like fashion where each token is taken one at a time. A whole list of tokens is not needed prior to parsing and there is good reason not to (foreign code parsing or comment parsing, for example).
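Assuming that regex, the stream-like use can be sketched with exec and the g flag, whose lastIndex keeps the position between calls (nextToken and the token shapes here are just for illustration):

```javascript
var tokRe = /(?:([a-zA-Z][a-zA-Z0-9]*)|((?:[0-9]*\.[0-9]+|[0-9]+)(?:[E][\-+]?[0-9]+)?)|([^a-zA-Z0-9\s]))/g;

// Pull one token at a time: group 1 is a word, group 2 a number, group 3 a symbol.
function nextToken(str) {
    var m = tokRe.exec(str);
    if (!m) { return null; }  // stream exhausted
    if (m[1]) { return {type: "word", value: m[1]}; }
    if (m[2]) { return {type: "number", value: Number(m[2])}; }
    return {type: "symbol", value: m[3]};
}

var t, tokens = [];
tokRe.lastIndex = 0;          // reset the stream position
while ((t = nextToken("-7+3*4 + 9"))) { tokens.push(t.value); }
// tokens is now [ "-", 7, "+", 3, "*", 4, "+", 9 ]
```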

The expression code is something like:

function expression (rightBindingPower) {
   var left;
   var t = node; //node is a global variable containing the current node; we are about to get a new one.
   advance();    //load a new node from the token stream
   left = t.nud();
   while (rightBindingPower < node.leftBindingPower) {
      t = node;
      advance(); //move past the operator before its led parses the right side
      left = t.led(left);
   }
   return left;
}

nud basically creates a representation of the object as a standalone. For a number, it creates a number token. For a prefix operator, such as negation, it will actually parse an expression that sucks up the next one and returns a suitable node for further processing.

led takes what is to the left, created by the nud, and then does something; for an infix operator such as addition, say, it gets a right node to combine with.

Once we have the token stream, we start processing.

  1. First up: -. This is a prefix. It has a nud that says: parse an expression with bp = 30 (a high bp).
  2. The 7 is seen next, with a nud that returns it.
  3. + is parsed. Its bp is 10. So the loop does not commence and 7 is returned up to the nud of negation.
  4. Negation is now done. It takes 7 and negates it. + is still the next node to deal with.
  5. + has a bp of 10 and the rbp is 0. So the loop commences.
  6. The next symbol parsed is node = 3 and + is stored in t.
  7. We process +'s led with the left of (-7) being passed in. In that led, it calls expression with an rbp of 10.
  8. 3 is processed at the start of +'s right expression parsing. It returns itself and * is up next.
  9. * has a binding power of 20. This is greater than 10, so we enter the loop.
  10. node is 4; we do *'s led, passing a bp of 20 to the expression.
  11. 4 is now nud'd. The next node is +, whose bp = 10 < 20 of the *. We do not process the loop and we return 4.
  12. 4 is now passed to * as the right node.
  13. * returns a node structure like *(3, 4) and this goes back to the + level.
  14. Does the + loop iterate? The next token is a +. So it will not iterate with strict inequality. With equality, it would, and this choice determines whether the node structure is +(+(-7, *(3,4)), 9) or +(-7, +(*(3,4), 9)). Both are equivalent because of the associativity of addition, but the first is what comes with strict inequality, I believe.
  15. So at this point we have everything done to the left of the second plus. It is now contained in the left variable in the first, outer expression loop (bp = 0).
  16. We process the + and grab the 9 into the right.
So ends Pratt parsing 1.


Stop infinity with the power of $.maxStop()

Tired of infinite loops? Infinite recursion got you stuck? Tired of introducing meaningless count variables just to beat the scourge of an unresponsive script?

Then I’ve got the tool for you. Hot off the assembly lines, tested thoroughly in rugged environments across the landscape, $.maxStop() will stop the horizontal eight in its tracks!

To use:

if ($.maxStop() ) {return;}  //in a loop replace return with break

Kills recursion and infinite loops.

Here is the code for the plugin:


$.maxStop = $.maxStop || (function () {
    var defaultMax = 1000;
    var count = -1, max = defaultMax;
    var ret = function (numMsg) {
        if (typeof numMsg === "number") {
            if (numMsg > 0) {
                if (numMsg !== max) { //new max, so reset the count
                    count = -1; 
                    max = numMsg;
                }
            } else if (numMsg < 0) {
                count += numMsg;
            } else {
                if (max !== defaultMax) {
                    count = -1; 
                    max = defaultMax;  //0 resets to default max. 1 is minimal count
                }
            }
        } else if (typeof numMsg === "string") {
            $.log(numMsg, ":", count, "/", max); 
        } else if (numMsg) {
            $.log(count, "/", max, numMsg);
        }
        count += 1;
        if (count <= max) {
            return false;
        } else {
            return true; 
        }
    };
    return ret;
})();

So the function takes a number, a string, some other truthy value, or nothing. A number will set the maximum and, if it actually changes, reset the count. What's great is that you can use this one statement, as demonstrated below, in a repeating situation and it won't reset. So in a single statement, you can set how much iteration you can tolerate and check whether it has been done.

The string will put it in debug mode. It will print current count and a message. You can output any object state by stringifying; see example.

Anything else will also put in debug and prints to console the object for clicking through and examining.

Each call adds 1 to the counter (hence the counter resets to -1, so the first call counts as 0) and then returns true if stoppage is needed, false otherwise.

Here are some examples:

$.stringify = $.stringify || function (obj) {  //encapsulate so older browsers without a JSON.stringify method do not choke
    if (typeof JSON !== "undefined" && JSON.stringify) {
        return JSON.stringify(obj);
    } else {
        return "no JSON; upgrade browser!";
    }
};

$.tests.load("$.maxStop", [
    function () {
        $.maxStop(10); 
        var localCount = 0; 
        while (1) {
            if ($.maxStop($.stringify({localCount:localCount})+"loop going")) {$.maxStop("loop will terminate"); break;}
            localCount += 1;
        }
        return ["while", localCount, 10];
    }, 
    function () {
        var count = 0;
        var f = function me () {
            if ($.maxStop(0)) {$.maxStop("recursion stopping"); return;} 
            count+= 1; 
            me(); 
        } 
        f(); 
        return ["recursion", count, 1001];
    }, 
    function () {
        $.maxStop(30); 
        var count = 0;
        var f = function me () {
            if ($.maxStop()) {$.maxStop("recursion"); return;} 
            count+= 1; 
            me(); 
        } 
        while (1) {
            if ($.maxStop()) {$.maxStop("loop"); break;}
            f(); 
            count += 1;
        }
        $.maxStop(0); //restores it for actual use later. 
        return ["while and recursion", count, 30, "this shows the same maxStop is being used"];
    }
]);

So enjoy and be careless! $.maxStop() has got your back.***

***not good to leave this in actual code, particularly for loops/recursions that could be huge. It should be an unnecessary function call in production code.


tokenizer prototyped

So I refactored my tokenizer into a prototype construct. I am not sure if I did it completely optimally, but I think it is reasonable.

Almost all of the properties are generic to all tokenizers in my vision, so that is encapsulated in the MasterTokenizer function. The prototype has all the methods that tokenizers need. The MasterTokenizer creates functions that serve as tokenizer templates for a given parser: a ParserTokenizer. Its prototype is defined by

ParserTokenizer.prototype = new MasterTokenizer(longMatch);

This keeps the prototype chain alive. So when we create a token stream from a string, the prototype will be that of MasterTokenizer. But the initialization is done by ParserTokenizer.

The longMatch is the bit that differentiates different ParserTokenizers. This is the same set of symbols for the same parser, but varies for different parsers. It represents symbols that have more than one character such as // or +=. Long symbols not in longMatch will be broken up into one character symbols.

The odd bit is that I needed to introduce another function that actually makes the ParserTokenizer. I couldn't figure out how to get the prototypes to work out with the different initializations without it. Perhaps there isn't a way. So the bit of code I find interesting is:

   pcf.MasterTokenizer = function (longMatch) {
         longMatch = longMatch || {};
         this.longMatch = longMatch; 
    }; //this is the basic prototype object constructor for a parser type. Note that the "this" is the prototype for ParserToken
    
    pcf.MasterTokenizer.prototype = { 
          //lots of properties for tokenizing
    };

     pcf.makeParserTokenizer = function(longMatch) {
        var ParserTokenizer = function (stringToParse,startIndex) {
            if (typeof stringToParse !== "string") {stringToParse = "";}
            this.str = stringToParse; 
            this.ws(); 
            this.match = {};
            this.startIndex = startIndex || 0; //startIndex tells reg where to start parsing. 
            //this is the token generating object and is now ready to parse out. 
        }; 
        ParserTokenizer.prototype = new pcf.MasterTokenizer(longMatch);
        return ParserTokenizer;
     };
    
    pcf.DefaultTokenizer = pcf.makeParserTokenizer({});
    var TestTokenizer = pcf.makeParserTokenizer({"+": ["+", "+-"]});
    tokenStream = new TestTokenizer("3+4*!2+-4");  //result: [3, "+", 4, "*", "!", 2, "+-", 4]

The last lines show how to use it. The third-to-last line is how to make a default tokenizer with no long symbols. The next line is one with a long symbol, +-, and the last line shows the creation of a new stream of tokens. Note that longMatch is part of the prototype of ParserTokenizer.

So in summary, creating a new object as the prototype of something else gives that the chaining prototype behavior. This allows a hierarchy of dynamic inheritance. My mind is taking its time to understand this. And it is hard to come up with good names with this stuff.
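In miniature, the pattern looks like this (toy names, not the actual tokenizer code):

```javascript
// An instance of Parent becomes Child's prototype, so Child instances see
// both the instance properties set by Parent's constructor (one link up)
// and Parent.prototype's methods (two links up).
function Parent() { this.shared = "set by Parent's constructor"; }
Parent.prototype.greet = function () { return "hi"; };

function Child(name) { this.name = name; } // Child-specific initialization
Child.prototype = new Parent();            // the key line

var c = new Child("tok");
c.name;    // "tok" -- own property, set by Child
c.shared;  // "set by Parent's constructor" -- one link up the chain
c.greet(); // "hi" -- two links up, on Parent.prototype
```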


prototyping code

So I am yet again refactoring my code. I hope this is the last time and then I can get into adding behavior.

I did manage to separate all the behaviors and now have seeders, walkers, and shooters working in version 3. But I broke my index page; it is all in Firebug debugging. So waiting until version 4 for the next release.

But as I was doing this, I noticed that I kept passing the parser or the tree around and wondering whether they were in scope when I needed them. And getting confused as to which one I needed. To help highlight what I needed, I decided to make these methods of the objects. And now with my insight of prototype, I decided to attach them to the underlying constructor. The only problem was, well, there was no constructor.

And then it became clear to me that a lot of the fog of my code seems to be coming from that. What prototyping allows one to do is separate the process from the details. And I failed to do that consistently. And so it becomes a mess.

Take as example the tokenizer object. I have struggled with how to make it customizable within the code. So I was making a tokenizer for each parser and passing in options. I think it is fine to still pass in options for initialization, but it might also be nice to switch out relevant components on the fly.

So by putting generic default behavior in the prototype, we can do exactly that. And furthermore, we can overwrite it locally for a short time and then remove it with the delete keyword.

As an example:

var f = function () {};
var g = new f();  //g now uses f.prototype for inheritance. g is an empty object except for that. 
f.prototype.say = "hi"; 
console.log(g.say); //hi
g.say = "bye"; 
console.log(g.say); //bye
g.say = undefined; 
console.log(g.say); // undefined in gray box
g.say = "back";
console.log(g.say); //back
delete g.say; 
console.log(g.say); //hi

So in our tokenizer, we can have a property for applying the regex as a prototype. This could chunk words, numbers, special characters. But then we hit a comment such as //whatever !!$#!@#$

This comment should be slurped up without pause until the end of the line. Now one can just replace the regex and the next token will be the comment string. No fuss, no muss. And when done, we delete the local override and the tokenizer goes back to its usual parsing. One can envision much more complicated scenarios in which a whole slew of behaviors suddenly morph. That is the power of prototyping.
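A minimal sketch of that move (all names here are illustrative, not the actual tokenizer's):

```javascript
// The matching regex lives on the prototype; an instance can shadow it
// temporarily to slurp a comment, and delete restores the default.
function Tokenizer(str) { this.str = str; this.pos = 0; }
Tokenizer.prototype.pattern = /[a-zA-Z0-9]+|[^a-zA-Z0-9\s]/g; // default chunking
Tokenizer.prototype.next = function () {
    this.pattern.lastIndex = this.pos;  // resume where we left off
    var m = this.pattern.exec(this.str);
    if (!m) { return null; }
    this.pos = this.pattern.lastIndex;
    return m[0];
};

var tok = new Tokenizer("a + 1 //whatever !!$#!@#$");
tok.next();                 // "a"
tok.next();                 // "+"
tok.next();                 // "1"
tok.next();                 // "/"  -- we spot the start of a comment
tok.next();                 // "/"
tok.pattern = /[^\n]+/g;    // local override: slurp to end of line
var comment = tok.next();   // "whatever !!$#!@#$"
delete tok.pattern;         // back to the prototype's regex
```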

Of course, this is incredibly dangerous. But when trying to be generic, you either need this or a 10,000 line program. I will stick with the 1000 line kind of program.


Matryoshka, you $.russianDolls

So as I have been rewriting my code, I found that I wanted to add more and more behavior on top of each other. For example, as I create symbols, I want to have some of them process a value on startup. Or I want to have checking behavior added to the usual processing. So without a prior knowledge as to what I need and how it will fit, and trying to avoid a cascade of if statements, how do we do this?

Key to our success is the Function method .apply. What it does is allow us to call a function, give it the "this" context, and pass an array of arguments. (.call passes the second, third, ... arguments as individual arguments.) So if we have a function fun, then fun.apply(cool, [3, "two"]) is essentially the same as cool.fun(3, "two"). That is, it looks to fun like it is a method of the object cool, hence this = cool, and the arguments passed are [3, "two"].

So let’s say we have an object obj with function behave. But we want to add new behavior to behave. We still want the old behavior, but we also want the new. So we can do the following:

obj = {behave: function () {console.log("Please type");}};
obj.behave();  //log: "Please type"
var newBehave = function () {console.log("Here's Johnny");};
var oldBehave = obj.behave;
obj.behave = function () {
  oldBehave.apply(this, arguments);
  newBehave.apply(this, arguments);
};
obj.behave();  //log: "Please type",  log: "Here's Johnny"

Now when we call obj.behave, first oldBehave gets called with the arguments passed to obj.behave and with the this context being obj (unless obj.behave is called differently; in any event, whatever its this context is passed along), and then newBehave gets called also with the this and arguments being passed along.

By the way, arguments is a special array-like object which contains all passed arguments. Iterating over it is how we can deal with variable argument functions. It also has this nifty feature we are using here of just being able to pass along the arguments.
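For instance, iterating over arguments gives a variadic function (a toy example):

```javascript
// A variadic sum: arguments collects however many values were passed.
function sum() {
    var total = 0, i;
    for (i = 0; i < arguments.length; i += 1) {
        total += arguments[i];
    }
    return total;
}
// sum(1, 2, 3) → 6;  sum() → 0
```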

Also notice the use of a closure. I need to use oldBehave because if I called obj.behave in that function, it looks up obj.behave at the time of running in that scope. So presumably it finds itself and calls itself. This leads to the original never being called and, more annoyingly, an infinite recursion, I believe. Ouch. But by using oldBehave, we store the old function in a closure which will always be around to the new function and not lead to any recursion.

Okay, now we can write a little plugin that takes care of the details for us. Largely, we want one for the usual ideals of clarity, tedious error-checking, and the feeling of accomplishing something without doing so (building code to help us build code that helps us build programs to help us parse our documents to help us make pages that get interpreted in …)

$.russianDolls = function (obj, property, fun, funfirst) {
        var oldfun; 
        if (obj && typeof property === "string" && typeof fun === "function") {
            oldfun = obj[property];
            if (typeof oldfun === "function") {
                if (funfirst) { //old fun goes last
                    obj[property] = function () {
                        fun.apply(this, arguments);
                        oldfun.apply(this, arguments);
                    };
                } else { //old fun goes first
                    obj[property] = function () {
                        oldfun.apply(this, arguments);
                        fun.apply(this, arguments);
                    };
                }
            } else { //no old fun so new property
                //if (oldfun) {$.log("$.russianDolls: old property not a function!", arguments, oldfun)}
                obj[property] = fun;
            }
        } else {
            //$.log("russianDolls: some argument is not right type: ", obj, property, fun, funfirst);
        }
};

So we first test that nothing is going to give us an error; if something would, we quit and do nothing. This has the benefit that if we pass nothing for fun (such as when passing along a parameter from another function call that is not always supplied), we add nothing to our object. Then we save the old function into oldfun.

If oldfun exists and is a function, we decide whether to use it first (the default, the else branch) or second, and then we assemble the new behavior into a separate function. Note that we do not want to put any of these ifs in the new function; otherwise it would need to evaluate them every time we call the new function. With this ordering, the code is told once what needs to be done and can then proceed without checking conditions each time.

If no oldfun exists, or it is not a function, then we store the new function in that property. Note that one might decide on other behavior for when oldfun is not a function; in particular, issuing a warning such as the commented-out line.

Here is a sample test:

        var a = {hi:"Waiting", addText:function(n){this.hi = this.hi + "Original"+n;}}; 
        $.russianDolls(a, "addText", function(n,s){this.hi = this.hi + "After"+n+s;});
        $.russianDolls(a, "addText", function(){this.hi =  this.hi + "Before";}, true);  //new function goes first
        a.addText("n","s"); 
        $.russianDolls(a, "whoAmI", function(){this.hi = this.hi+"New";}); //adding a new function
        a.whoAmI();
        if (a.hi === "WaitingBeforeOriginalnAfternsNew") {rejoice();}

Notice how the parameters get passed even if they are not used.

One use for this: I find that I sometimes iterate over an empty object with $.each when I am debugging. This throws an error, but I have no idea where in the code it is. So one could add

if (debug) {$.russianDolls($, "each", function(obj, fun) {if (!(obj)) {console.log("each not given object", obj, fun)} }, true)}

obj is useless of course (undefined), but fun should be there, and clicking it in Firebug leads you to the code where the function is defined.

One could also have another plugin which could capture the error for misbehaving functions. I think this might be dangerous so I will not implement it quite yet.

One thing I have been thinking about is that the plugins I am writing encapsulate useful ideas. By naming them, by abstracting the behavior out to a single name, I can then express it with clarity and precision. And this is really why I want to learn parsing techniques (the refactoring is coming along nicely). Once one can define the expression of an idea in language, it becomes easy to say it, to use it, to think it. And with that is power. Give me your name, and I will have power over you.

P.S. Happy Halloween! This seems an appropriate post–a doppelganger creator function.


The rise of the .prototype

Today I have gotten to a point for another version in stone to go back to. It features a sequence of evaluations of an expression with boxes showing what will be evaluated, bolded numbers showing what was evaluated, and a grey last number showing that that number is where uncertainty lies. It also uses engineering notation (more or less scientific notation for massive/minuscule numbers and normal numbers for ordinary numbers). Or to say it another way, engineering notation tries to have as few symbols on the page as possible. That’s actually how I implemented it.

After tonight, I will do a massive refactoring of the code. I am now seeing the trees for the branches, so to speak.

One of my big stumbling blocks in understanding parsing was how to separate each of the pieces. I think I now understand how to do that separation. Hence the major refactorization. All of this is based on Crockford’s article on Pratt parsing. Great article.

So here is my ordering:

  1. Tokenizer. This takes a string and parses it into tokens. This is fairly straightforward. We are taking a one-dimensional object and breaking it into tiny pieces.

    For me, I have a regex that separates the string into words (with non-leading numbers allowed, but no underscores or other symbols), numbers with possible E notation, and one-character symbols. This tokenizer does not tokenize all at once; it tokenizes as the parsing happens. It also has the facility to recognize strings of symbols that form a sensible pattern. That is loaded up on a per-language basis. Otherwise, it is all generic.

    For example, if the language has the symbols “=”, “+=”, and “+”, then the tokenizer will tokenize “3 + as += as6a -=8” into 3, +, as, +=, as6a, -, =, 8, since “-=” is not in the language.

    Information on where the match happened in the raw string and the raw match itself is put into the generated node.
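A minimal sketch of such a tokenizer (the regex and the multi-symbol handling here are my own illustration, not the actual implementation):

```javascript
// Hypothetical sketch of the tokenizer described above.
// One regex alternation: words (a letter, then letters/digits),
// numbers with optional E notation, or any single non-space character.
var tokenRegex = /[A-Za-z][A-Za-z0-9]*|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|\S/g;

// Per-language list of multi-character symbols, longest first.
var multiSymbols = ["+=", "="];

function tokenize(str) {
    var tokens = [], match;
    while ((match = tokenRegex.exec(str)) !== null) {
        var raw = match[0], pos = match.index;
        // Try to extend a one-character symbol into a known multi-symbol.
        if (raw.length === 1 && !/[A-Za-z0-9]/.test(raw)) {
            for (var i = 0; i < multiSymbols.length; i++) {
                var sym = multiSymbols[i];
                if (str.substr(pos, sym.length) === sym) {
                    raw = sym;
                    tokenRegex.lastIndex = pos + sym.length;
                    break;
                }
            }
        }
        // Keep the raw match and where it happened, per the note above.
        tokens.push({ raw: raw, position: pos });
    }
    return tokens;
}
```

Running it on the example string above yields 3, +, as, +=, as6a, -, =, 8, since “-=” is not in the symbol list.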

  2. Parser. This takes tokens and converts each token into a node. The node is prototyped on a symbol. This prototyping is what I was not appreciating and merits further explanation below.

    Each token is parsed by looking up its type in a symbol table and then instantiated. Through the prototyping it inherits the methods that tell it what to do if it is sitting by itself (like a number or a prefix operator) or if it has company (like a + in between; focus on there being somebody to the left).

    The algorithm then goes through sucking up tokens and pairing them based on a notion of binding powers. I am still wrapping my head around this, and Crockford discusses it well. But essentially, we start at the beginning of an expression with the presumption that the first token is happy with no one to the left. Then we proceed to the next token and see whether the two tokens can make a connection (4 + 5 would lead to 4 as a first token, with + saying that it looks to the left and we’re good; a prefix such as -3 would have the - suck up the 3 right away, since the - knows what to do). And then we continue, trying to figure out what binds more strongly than what. That is the tricky part, but it comes out pretty nicely.

    Crockford hard-coded the binding power levels. What I did was set up objects in an array, where each level of the array corresponds to a binding power level and each object holds the symbols and their types.

    Anyway, we have that tokens get matched up with each other. Let’s say we are looking at 4+5. The node t(+) has a property called this.sub = [t(4), t(5)]. So I have an array with the related nodes to the left and right for the +. This node may itself be in another node and so this is how we make a tree of an expression. And then we can assemble these expressions into a tree itself, perhaps a linear array mimicking the vertical line of descent when we write lines of code.

    So a node might have these properties: sub = [list of nodes], a type, the raw string, and the other separators it used (for example, “[” will have a matching “]”, but at most one node will be created, so one could store info about where and what the end match was). One could have a container node attached to the primary expression node, with nodes created for each separator that live in the tree only through the container.

    As we parse, we can also put each token that we come across into a flatList so that we can get a sense of what was written. This deals with the trouble, say, of reassembling what parentheses were used in writing down an arithmetic expression or, more usefully, replacing bits of an expression with its value and having the rest be the original. Note that we cannot evaluate along a flatList, but we can do stuff after the evaluation through the tree has been done.

    So this tree structure, which I have built already, is what I want to refactor the noise out of. My tree has a variety of extra “features” like termites eating out a tree.
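Here is a rough, stripped-down sketch of that pairing loop (all the names are mine; Crockford’s article is the authoritative version):

```javascript
// Rough sketch of Pratt-style expression parsing over a token stream.
// nud = what a token does with nothing to its left (numbers, prefix -);
// led = what it does with a finished expression to its left (infix + and *).
// Binding powers: higher binds tighter.
function makeParser(tokens) {
    var i = 0;
    function peek() { return tokens[i]; }
    function next() { return tokens[i++]; }

    var bp = { "+": 10, "-": 10, "*": 20 };

    function nud(token) {
        if (token.type === "number") return { type: "number", value: token.value };
        if (token.raw === "-")  // prefix minus binds its operand tightly
            return { type: "-", sub: [expression(30)] };
        throw new Error("unexpected token " + token.raw);
    }

    function led(token, left) {
        // the node keeps its related nodes in sub, as described above
        return { type: token.raw, sub: [left, expression(bp[token.raw])] };
    }

    function expression(rbp) {
        var left = nud(next());
        // keep pairing while the next operator binds more strongly
        while (peek() && bp[peek().raw] > rbp) {
            left = led(next(), left);
        }
        return left;
    }

    return expression(0);
}
```

For 4 + 5 * 2 this produces a “+” node whose sub holds the 4 node and a “*” node, since * binds more tightly than +.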

  3. Seeder. To actually do something with the tree, we need to give the symbols something to do. Crockford’s article stopped at that point. He built the trees with an “arity” feature and printed out the tree of the code. He was writing a short article to people who probably understood what to do with a tree. I did not. To me, much of the difficulty is still present in that tree. But at least we now have the 2-D structure.

    To give behavior is where the prototyping really shines. We need access to the symbol table. Remember, each node is an instance of a symbol. In other words, SymbolTable[“+”] is a function used with the new command: new SymbolTable[“+”]() creates a node of type “+”. That means anything in SymbolTable[“+”].prototype will become a property of the node. So we can attach actions to this prototype and presto, we have active nodes across our tree. As long as these functions keep the domino effect going, we can start at the top of the tree and evaluations will trickle down.

    For example, let’s say we define evaluate as a method on each of the nodes, i.e.,

    SymbolTable["+"].prototype.evaluate = function () {
        var left = this.sub[0].evaluate();
        var right = this.sub[1].evaluate();
        return left + right;
    };
    SymbolTable["(number)"].prototype.evaluate = function () {
        // (number) is the type for a number; each one gets an innate
        // value in the tokenizing phase
        return this.value;
    };
    

    I haven’t tried this, but I think I just defined addition. And we could define these prototypes within a closure if we, say, wanted to create a tree of values or a list of instructions as I did in version 2 above. All of it can be self-enclosed. And notice that we are populating the behavior of the nodes after the tree is in existence, but we are not walking the tree to add this functionality. Awesome.
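To convince myself that this works, here is a self-contained toy version of the idea (defineSymbol and the node shapes are my own illustration, not the actual code):

```javascript
// Toy demonstration: nodes instantiated from a symbol table inherit
// behavior attached to the prototypes *after* the tree already exists.
var SymbolTable = {};
function defineSymbol(type) {
    SymbolTable[type] = function (props) {
        for (var k in props) this[k] = props[k];
    };
}
defineSymbol("+");
defineSymbol("(number)");

// Build the tree for 4 + 5 first...
var tree = new SymbolTable["+"]({
    sub: [
        new SymbolTable["(number)"]({ value: 4 }),
        new SymbolTable["(number)"]({ value: 5 })
    ]
});

// ...then seed behavior onto the prototypes. Existing nodes pick it up.
SymbolTable["+"].prototype.evaluate = function () {
    return this.sub[0].evaluate() + this.sub[1].evaluate();
};
SymbolTable["(number)"].prototype.evaluate = function () {
    return this.value;
};

tree.evaluate();  // 9
```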

  4. Walker. The seeders above do nothing unless one starts walking.

    If we have a list of boughs (expressions) of our tree, we can walk along each bough by calling the function of the top node in each bough and allowing it to call its sub nodes’ functions. When done, we do the next bough and so on down the list until we are done.
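In this scheme the walker itself is almost nothing (a sketch, assuming the seeding step gave every root node an evaluate method; walk is my own name for it):

```javascript
// Walk each bough (top-level expression) in order; each root node's
// evaluate() recursively calls its sub-nodes' evaluate() methods.
function walk(boughs) {
    return boughs.map(function (root) { return root.evaluate(); });
}
```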

  5. Shooter. Walk in a straight line.

    If we want to and if it is appropriate (no loops, no function definitions), we can take the evaluated walked tree and go over our flatList and get an impression of how that original string gets evaluated as the tree is walked. More for pedagogy than function, I think.

So that is my current understanding. The breakthrough tonight was understanding the power and role of prototyping. Wow. That stuff is awesome.

As I do more, I feel closer and closer to realizing my long-time dream (from high school, really) of writing a math parser environment for the exploration of math. I see now how to emulate a calculator, a programming language, and I think markup languages. Even in this version 2, it is already of use to students in highlighting and seeing how things are evaluated.

My idea for the programming side is to have two versions. One version has no non-linear leaps, no loops, no function definitions. Just straight evaluation. I call this safe code (calculator code?) and it allows instructors, say, to load student code and run it in a batch to evaluate results. Or even a server-side auto-checker, say on node.js (I want a server that understands JavaScript).

And then a second version which has the non-linear stuff that can cause trouble. Still no access to the DOM and no security issues, but the potential for crashing, I suppose. This would be for defining new functions. For example, if my default does not have an integrator function, then it could be defined in this more capable language, which would be Good Mathematicized JavaScript (cos would be a global function, for example!).

This is such fun.


DSLs, SCSS, htmPlus

I was reading a book on Domain Specific Languages on my Safari Online subscription on my iPad. I was trying to learn more about them and how to parse.

I read a big chunk of it and I have to say it is a quick and enjoyable read, but I am not sure I got much out of it. He suggests building a semantic model as the result of parsing; that way, one separates parsing from action. This sounds good. For some of my languages of interest, I think this will make a lot of sense. But for others, I am generating documents or code and, as the author mentions, in that case the semantic model is generally the parse tree itself.

The other big thing I got from the book is that a lot of the standard parsing techniques have issues and can be complicated. This suggests to me that the Pratt parsing I am pursuing might be a good idea. Apparently regular languages (regexes) can’t handle nesting; context-free languages (which do handle nesting), the kind most parsers deal with, seem to have issues with variable assignments (bizarre to me); and context-sensitive languages do not have generalized parsing. Also, mixing different languages together, as I want to do, is not supported in general. So all of that suggests that what I am learning and doing is not totally stupid.

One little side note from my reading was a pair of languages: Graphviz and SCSS. Graphviz is a simple language for constructing visualizations of graphs. That sounds like a nice tool to have. There are also PSTricks packages that can do this too. It might be fun to compare.

Of much more interest to me is SCSS, part of the Sass project. This is a language which is essentially CSS, but with a few extras to make it more programmatic: for example, being able to assign variable names to the colors one will use, or being able to compute length units with arithmetic. Also, the nesting and extensions minimize the need to repeat initial selectors or styling. The idea is to write a .scss file and then have it translated into a .css file. It looks pretty useful to me, and I think I might code up a JavaScript parser for it; the parser I see for it is in Ruby.
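For example, a small fragment illustrating the features mentioned (my own toy example; see the Sass documentation for the real syntax):

```scss
$brand: #36c;              // a named color, reused below

.post {
  border: 1px solid $brand;
  width: (600px / 2);      // arithmetic on length units: 300px
  h2 { color: $brand; }    // nesting: compiles to ".post h2 { ... }"
}
```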

A similar project that I am inspired to pursue is an html markup variant, which I am currently calling htmPlus; I leave off the L in html to indicate that it is like html but with a little less noise. My current idea is to create a syntax with as little extra as necessary while still having all the tags of html. I do not like the wiki syntax I see, since it doesn’t really teach html at all. Using # for a numbered item or *bolded* does not get at what element is being used, which prevents being able to read html, write css, or do jQuery selectors. So I want a language which is very close. But the idea of typing out

<div class="contthis" id="me">some stuff</div>

is painful (how do GUIs deal with attributes?). Instead, my idea is \div.contthis#me some stuff // for the same content, but with less noise. The closing “bracket” could also be //div or //.contthis or //#me, and it would close up all tags in between. So that can minimize useless closures.

(Why //? Well, it gives the feeling of closing html tags and it matches with the escape \ to escape the word into a tag. The only bummer is that it conflicts with comments in JavaScript).
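Just to see whether the syntax is easy to machine-translate, here is a toy first pass (entirely hypothetical; it handles only a single non-nested tag and none of the fancier closers):

```javascript
// Toy translation of a single htmPlus tag: \div.cls#id text //
// becomes <div class="cls" id="id">text</div>. Nesting, the //div
// and //#id closers, etc., are not handled here.
function translateTag(src) {
    var m = /^\\([a-z]+)(?:\.([A-Za-z0-9-]+))?(?:#([A-Za-z0-9-]+))?\s+([\s\S]*?)\s*\/\/$/.exec(src);
    if (!m) return src;  // not an htmPlus tag; leave untouched
    var tag = m[1], cls = m[2], id = m[3], body = m[4];
    var attrs = "";
    if (cls) attrs += ' class="' + cls + '"';
    if (id)  attrs += ' id="' + id + '"';
    return "<" + tag + attrs + ">" + body + "</" + tag + ">";
}
```

With this, \div.contthis#me some stuff // comes out as the noisier html version above.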

I am torn about paragraphs. They should be \p sentences… but I really like using two newlines to create a paragraph, even though that leads to a misleading experience in coding html. I guess for classes and ids, \p would be needed, such as in \p#nifty for

<p id="nifty">blah</p>

Also in there would be a language for doing some useful programmatic replacement. I noticed, when trying to write up a page on a recent diet that I have had, that I wanted to write up recipes with ingredients, directions, calories, fiber, maybe even cost of ingredients, etc. And you can well imagine that you might want the ingredient nutrients to bubble up from their own level and that you will also be formatting them in a certain way. So you have to decide on the markup structure and then put in the data. Well, if you decide to change the structure, that is hard. And there is less noise when reading simple data structures. Some would solve this with a server-side script serving up data and using templates there, but I think it would be nice to be able to write up pages as stand-alone items. Thus, htmPlus fills the gap. Write out the data and templates in the markup and then translate it into a proper html page. And, of course, if one decides to grab the data and put it elsewhere, then it should be easy to refactor since the data will be in a data structure (JSON, to be specific).

The tags of htmPlus will look superficially like TeX with all of the slashes, but the similarity would end there, and I am fine with that. To mix in other languages, I would use something like this: \m math stuff // or \js script code //, but the script code would not be in the html page; it would be used in generating it via my own custom js parsing (for security). To do scripting in the page itself, one would use \script code //, though again // would be a problem with comments. I am thinking of a comment structure of “comments” that I could use in these various languages. Not sure, but I prefer keys that do not require shift.

Just some random thoughts from today.
