
tokenizer prototyped

So I refactored my tokenizer into a prototype-based construct. I am not sure I did it optimally, but I think it is reasonable.

In my design, almost all of the properties are common to every tokenizer, so they are encapsulated in the MasterTokenizer function, and MasterTokenizer's prototype holds all the methods that tokenizers need. Each parser then gets its own tokenizer template, a constructor I call a ParserTokenizer, built on top of MasterTokenizer. Its prototype is defined by

ParserTokenizer.prototype = new MasterTokenizer(longMatch);

This keeps the prototype chain alive. So when we create a token stream from a string, the new object inherits from that MasterTokenizer instance and, through it, from MasterTokenizer.prototype. But the initialization is done by ParserTokenizer.
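To make the chain concrete, here is a minimal sketch with stand-in constructors. The `describe` method and the sample input are hypothetical, added only so there is a shared method to look up. It checks each link of the chain with `Object.getPrototypeOf`.

```javascript
// Stand-in for pcf.MasterTokenizer: holds the per-parser longMatch table.
function MasterTokenizer(longMatch) {
  this.longMatch = longMatch || {};
}
// A shared method; describe() is hypothetical, for illustration only.
MasterTokenizer.prototype.describe = function () {
  return "tokenizer over: " + this.str;
};

// Stand-in for a ParserTokenizer: it only does per-stream initialization.
function ParserTokenizer(str) {
  this.str = str;
}
ParserTokenizer.prototype = new MasterTokenizer({ "+": ["+", "+-"] });

const t = new ParserTokenizer("3+4");

// Walk the chain: t -> ParserTokenizer.prototype -> MasterTokenizer.prototype
console.log(Object.getPrototypeOf(t) === ParserTokenizer.prototype); // true
console.log(Object.getPrototypeOf(ParserTokenizer.prototype) === MasterTokenizer.prototype); // true
console.log(t instanceof MasterTokenizer); // true

// str is set on the instance by ParserTokenizer; longMatch is inherited.
console.log(Object.prototype.hasOwnProperty.call(t, "str")); // true
console.log(Object.prototype.hasOwnProperty.call(t, "longMatch")); // false
console.log(t.describe()); // tokenizer over: 3+4
```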

longMatch is what distinguishes one ParserTokenizer from another. It is the same set of symbols for every tokenizer of a given parser, but it varies between parsers. It lists the symbols that are longer than one character, such as // or +=. Long symbols that are not in longMatch are broken up into one-character symbols.
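The post doesn't show how longMatch is consulted. Judging from the example `{"+": ["+", "+-"]}` used with TestTokenizer, it appears to map a first character to the symbols that start with it. Under that assumption, a longest-match lookup could look like the sketch below; `matchSymbol` is a hypothetical helper, not one of the tokenizer's actual methods.

```javascript
// Hypothetical helper: returns the longest symbol listed in longMatch that
// starts at str[index], falling back to the single character there.
function matchSymbol(str, index, longMatch) {
  const ch = str[index];
  const candidates = longMatch[ch] || [];
  let best = ch; // default: a one-character symbol
  for (const sym of candidates) {
    if (sym.length > best.length && str.startsWith(sym, index)) {
      best = sym;
    }
  }
  return best;
}

const longMatch = { "+": ["+", "+-"], "/": ["//"] };
console.log(matchSymbol("2+-4", 1, longMatch)); // +-
console.log(matchSymbol("2+4", 1, longMatch)); // +
console.log(matchSymbol("a//b", 1, longMatch)); // //
console.log(matchSymbol("a+=b", 1, longMatch)); // + ("+=" is not listed, so it is split)
console.log(matchSymbol("2*4", 1, longMatch)); // *
```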

The odd bit is that I needed to introduce another function that actually makes the ParserTokenizer. I couldn't figure out how to get the prototypes to work out with the different initializations without it. Perhaps there isn't a way. So the bit of code I find interesting is:
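One way to see why a factory helps: a single constructor has exactly one `prototype` object, so every parser would share one longMatch. Calling a factory once per parser creates a fresh constructor and a fresh prototype each time. A minimal sketch, with a hypothetical `symbols` method standing in for the real shared methods:

```javascript
function MasterTokenizer(longMatch) {
  this.longMatch = longMatch || {};
}
// Hypothetical shared method: lists the first characters that have long symbols.
MasterTokenizer.prototype.symbols = function () {
  return Object.keys(this.longMatch);
};

// Each call returns a new constructor with its own prototype object.
function makeParserTokenizer(longMatch) {
  function ParserTokenizer(str) {
    this.str = str;
  }
  ParserTokenizer.prototype = new MasterTokenizer(longMatch);
  return ParserTokenizer;
}

const ArithTokenizer = makeParserTokenizer({ "+": ["+", "+-"] });
const CommentTokenizer = makeParserTokenizer({ "/": ["//"] });

const a = new ArithTokenizer("1+-2");
const c = new CommentTokenizer("x // y");

console.log(a.longMatch === c.longMatch); // false: separate per-parser prototypes
console.log(a.symbols === c.symbols); // true: one method on MasterTokenizer.prototype
console.log(a.symbols().join(), c.symbols().join()); // + /
```

For comparison, `Object.create(MasterTokenizer.prototype)` with `longMatch` assigned afterwards builds an equivalent per-parser prototype without running the MasterTokenizer constructor, but it still needs one prototype object per parser, so some factory-like step remains.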

    var pcf = pcf || {}; // namespace object the snippets below assume already exists

    pcf.MasterTokenizer = function (longMatch) {
        this.longMatch = longMatch || {};
    }; // the basic prototype object constructor for a parser type; note that "this" becomes the prototype for a ParserTokenizer

    pcf.MasterTokenizer.prototype = {
        // lots of properties for tokenizing, including ws()
    };

    pcf.makeParserTokenizer = function (longMatch) {
        var ParserTokenizer = function (stringToParse, startIndex) {
            if (typeof stringToParse !== "string") { stringToParse = ""; }
            this.str = stringToParse;
            this.ws(); // ws() is one of the shared methods on MasterTokenizer.prototype
            this.match = {};
            this.startIndex = startIndex || 0; // startIndex tells reg where to start parsing
            // this is the token-generating object and is now ready to parse
        };
        ParserTokenizer.prototype = new pcf.MasterTokenizer(longMatch);
        return ParserTokenizer;
    };

    pcf.DefaultTokenizer = pcf.makeParserTokenizer({});
    var TestTokenizer = pcf.makeParserTokenizer({"+": ["+", "+-"]});
    var tokenStream = new TestTokenizer("3+4*!2+-4"); // result: [3, "+", 4, "*", "!", 2, "+-", 4]

The last lines show how to use it. The third-to-last line makes a default tokenizer with no long symbols. The next line makes one with the long symbol +-, and the last line creates a new stream of tokens. Note that longMatch lives on ParserTokenizer.prototype, so every token stream created by the same ParserTokenizer shares it.
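Because the shared methods are elided in the post, the usage lines above can't run as shown. Here is a self-contained sketch of the same pattern that does run. The `ws`, `next` and `all` methods, the `index` field, and the number-parsing rule are all hypothetical stand-ins for the elided tokenizing code, chosen so that TestTokenizer produces the result listed above.

```javascript
const pcf = {};

pcf.MasterTokenizer = function (longMatch) {
  this.longMatch = longMatch || {};
};

// Hypothetical shared methods; the post's real ones are elided.
pcf.MasterTokenizer.prototype = {
  constructor: pcf.MasterTokenizer,
  // Skip whitespace starting at this.index.
  ws: function () {
    while (this.index < this.str.length && /\s/.test(this.str[this.index])) {
      this.index += 1;
    }
  },
  // Return the next token (a number or a symbol string), or null at the end.
  next: function () {
    this.ws();
    if (this.index >= this.str.length) return null;
    const num = /^\d+(\.\d+)?/.exec(this.str.slice(this.index));
    if (num) {
      this.index += num[0].length;
      return Number(num[0]);
    }
    // Longest listed symbol starting here, else a one-character symbol.
    const ch = this.str[this.index];
    let sym = ch;
    for (const cand of this.longMatch[ch] || []) {
      if (cand.length > sym.length && this.str.startsWith(cand, this.index)) sym = cand;
    }
    this.index += sym.length;
    return sym;
  },
  // Collect all remaining tokens into an array.
  all: function () {
    const out = [];
    let tok;
    while ((tok = this.next()) !== null) out.push(tok);
    return out;
  }
};

pcf.makeParserTokenizer = function (longMatch) {
  const ParserTokenizer = function (stringToParse, startIndex) {
    this.str = typeof stringToParse === "string" ? stringToParse : "";
    this.index = startIndex || 0; // set before ws() so it knows where to start
    this.ws();
  };
  ParserTokenizer.prototype = new pcf.MasterTokenizer(longMatch);
  return ParserTokenizer;
};

pcf.DefaultTokenizer = pcf.makeParserTokenizer({});
const TestTokenizer = pcf.makeParserTokenizer({ "+": ["+", "+-"] });
console.log(new TestTokenizer("3+4*!2+-4").all());
```

With the default tokenizer, the same input would yield "+" and "-" as separate tokens, since it has no long symbols.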

So in summary, using a newly created object as the prototype of another constructor is what produces the chained prototype behavior. This allows a hierarchy of dynamic inheritance. My mind is taking its time to understand this, and it is hard to come up with good names for this stuff.
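The "dynamic" part can be seen directly: property lookups walk the chain at call time, so a method added to MasterTokenizer.prototype after a token stream exists is still found, and a property on a per-parser prototype shadows one further up. A minimal sketch; `size` is a hypothetical method used only for this demonstration.

```javascript
function MasterTokenizer(longMatch) {
  this.longMatch = longMatch || {};
}

function makeParserTokenizer(longMatch) {
  function ParserTokenizer(str) {
    this.str = str;
  }
  ParserTokenizer.prototype = new MasterTokenizer(longMatch);
  return ParserTokenizer;
}

const DefaultTokenizer = makeParserTokenizer({});
const t = new DefaultTokenizer("a b c");
console.log(typeof t.size); // undefined

// Added after t was created; t still finds it through the chain.
MasterTokenizer.prototype.size = function () {
  return this.str.length;
};
console.log(t.size()); // 5

// A property on the per-parser prototype shadows the one further up.
DefaultTokenizer.prototype.size = function () {
  return -1;
};
console.log(t.size()); // -1

// Other parser types still see the MasterTokenizer.prototype version.
const Other = makeParserTokenizer({ "/": ["//"] });
console.log(new Other("x//y").size()); // 4
```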
