The rise of the .prototype

Today I have gotten to a point for another version in stone to go back to. It features a sequence of evaluations of an expression with boxes showing what will be evaluated, bolded numbers showing what was evaluated, and a grey last number showing that that number is where uncertainty lies. It also uses engineering notation (more or less scientific notation for massive/minuscule numbers and normal numbers for ordinary numbers). Or to say it another way, engineering notation tries to have as few symbols on the page as possible. That’s actually how I implemented it.

After tonight, I will do a massive refactoring of the code. I am now seeing the trees for the branches, so to speak.

One of my big stumbling blocks in understanding parsing was how to separate each of the pieces. I think I now understand how to do that separation. Hence the major refactorization. All of this is based on Crockford’s article on Pratt parsing. Great article.

So here is my ordering:

Tokenizer. This takes a string and parses it into tokens. This is fairly straightforward. We are taking a one-dimensional object and breaking it into tiny pieces.
For me, I have a reg ex that separates the string into words with non-leading numbers allowed, but not underscores or other symbols; into numbers with possible E notation; and one character symbols. This tokenizer does not tokenize all at once. It tokenizes as the parsing happens. It also has the facility to recognize stings of symbols that form a sensible pattern. That is loaded up on a per-language basis. Otherwise, it is all generic.

For example if the language has the symbols “=” and “+=” and “+”, then Tokenizer will tokenize “3 + as += as6a -=8″ into 3, +, as, +=, as6a, -, =, 8 assuming “-=” was not in the language.

Information on where the match happened in the raw string and the raw match itself is put into the generated node.
Parser. This takes tokens and converts each token into a node. The node is prototyped on a symbol. This prototyping is what I was not appreciating and merits further explanation below.
Each token is parsed by looking up its type in a symbol table and then instantiated. It inherits through the prototyping the methods for telling it what it to do if it is sitting by itself (like a number or a prefix operator) or if it is with company (like a + in between, focus on there being somebody to the left).

The algorithm then goes through sucking up tokens and pairing them up based on a notion of binding powers. I am still wrapping my head around this and Crockford discusses it well. But essentially, we start at the beginning of an expression with the presumption that that token is happy with no one to the left. Then we proceed to the next token and we see whether the two tokens can make a connection (4 + 5 would lead to 4 for a first token, + saying that it looks to the left and we’re good; a prefix such as -3 would have – start sucking the 3 up earlier and the – says it knows what to do). And then we continue that, trying to figure out what binds more strongly than what. That is the tricky part, but it comes out pretty nicely.

Crockford hard-coded the binding power levels. What I did was setup objects in an array where each level of the array corresponds to another binding power level and each object holds the symbols and their types.

Anyway, we have that tokens get matched up with each other. Let’s say we are looking at 4+5. The node t(+) has a property called this.sub = [t(4), t(5)]. So I have an array with the related nodes to the left and right for the +. This node may itself be in another node and so this is how we make a tree of an expression. And then we can assemble these expressions into a tree itself, perhaps a linear array mimicking the vertical line of descent when we write lines of code.

So a node might have the properties sub =[list of nodes], type, raw string, other separators it used (for example “[" will have a matching "]” but at most one node will be created so one could have info about where and what the end match was. One could have a container node attached to the primary expression node and have nodes created for each separator that lives in the tree only through the container.

As we parse, we can also put each token that we come across into a flatList so that we can get a sense of what was written. This deals with the trouble, say, of reassembling what parentheses were used in writing down an arithmetic expression or, more usefully, replacing bits of an expression with its value and having the rest be the original. Note that we cannot evaluate along a flatList, but we can do stuff after the evaluation through the tree has been done.

So this tree structure, which I have built already, is what I want to refactor the noise out of. My tree has a variety of extra “features” like termites eating out a tree.
Seeder. To actually do something with the tree, we need to give the symbols something to do. Crockford’s article stopped at that point. He built the trees with an “arity” feature and printed out the tree of the code. He was writing a short article to people who probably understood what to do with a tree. I did not. To me, much of the difficulty is still present in that tree. But at least we now have the 2-D structure.
To give behavior, this is where the prototyping really shines. We need access to the symbol table. Remember, each node is an instance of a symbol. In other words, symbolTable["+"] is a function used with the new command: new symbolTable["+"] creates a node of type “+”. That means any objects in the new SymbolTable["+"].prototype will become properties of the node. So we can attach actions to this prototype and presto, we have active nodes across our tree. As long as these functions keep the domino effect going, we can start at the top of the tree and evaluations will trickled done.

For example, let’s say we define evaluate as a method on each of the nodes, i.e.,
```
 SymbolTable["+"].prototype.evaluate = function () {
      var left = this.sub[0].evaluate();
      var right = this.sub[1].evaluate();
     return left + right;
};
SymbolTable["(number)"].prototype.evaluate = function () {
      return this.value;  // (number) is the type for a number and they all have an innate value given in the tokenizing phase
}
```
I haven’t tried this, but I think I just defined addition for this. And we could define these prototypes within a closure if we say wanted to create a tree of values or a list of instructions as I did in version 2 above. All of it can be self-enclosed. And notice that we are populating the behavior of the nodes after the tree is in existence, but we are not walking the tree to add this functionality. Awesome.
Walker. The seeders above do nothing unless one starts walking.
If we have a list of boughs (expressions) of our tree, we can walk along each bough by calling the function of the top node in each bough and allowing it to call its sub node’s functions. When done, we do the next bough and so on down the list until we are done.
Shooter. Walk in a straight line.
If we want to and if it is appropriate (no loops, no function definitions), we can take the evaluated walked tree and go over our flatList and get an impression of how that original string gets evaluated as the tree is walked. More for pedagogy than function, I think.

So that is my current understanding. The breakthrough tonight was understanding the power and role of prototyping. Wow. That stuff is awesome.

As I do more, I feel closer and closer to realizing my long-time dream (from high school, really), of writing a math parser environment for the exploration of math. I see now how to emulate a calculator, a programming language, and I think markup languages. Even in this version 2, it is already of use to students in highlighting and seeing how things are evaluated.

My idea for the programming side is to have two versions. One version has no non-linear leaps, no loops, no function definitions. Just straight evaluation. I call this safe code (calculator code?) and it allows instructors, say, to load the student code and run it in a batch to evaluate results. Or even a server-side auto-checker such as node.js (I want a server that understands javascript).

And then a second version which has the non-linear stuff that can cause troubles. Still no access to DOM and security issues, but the potential for crashing, I suppose. This would be for defining new functions. For example if my default does not have an integrator function, then it could be defined in this more capable language which would be Good Mathematicized Javascript (cos would be a global function, for example!).

This is such fun.

Mythic Logos