Parsing Notes

Posted Friday, October 16, 2020 by Sri. Tagged MEMO
EDITING PHASE:outlining ideas...

What is a Recursive Descent Algorithm

class Parser index = 0 tokens = [] expr = [] advance():index++ peep()=>tokens[index+1] current()=>tokens[index] parse(str) gets a tokener gets tokens { type, value? } uses this.current() to mark the current place in the parsing if not EOF starts with least precedent grabber: const expr = add(), which will then call the next higher if (expr) EXPR.push(expr) ones in this order: add < sub < mul < logical < primary primary is what starts the actual advancement .. is NUM return new Literal(current value) .. is LPAREN get this.add() again, advance, return new Grouping(expr)

tracing add: add() left = this.sub() -> go down left side of tree recursively if current()=='+' advance() return Binary(left,'ADD',this.sub()) else return left (could be any kind of node) sub()
left = this.mul() if current()=='-' advance() return Binary(left,'SUB',this.mul()) else return left (could be any kind of node) mul() left = this.all() if current()=='*' advance() return Binary(left,'MUL',this.all()) else return left (could be any kind of node) all() left = this.primary() case '>=' advance() return Binary(left,'GREATER_EQUAL', this.add()) case '<=' advance() return Binary(left,'LESS_EQUAL', this.add()) case '==' advance() return Binary(left,'EQUAL_EQUAL', this.add()) case '>' advance() return Binary(left,'GREATER_THAN', this.add()) case '<' advance() return Binary(left,'LESS_THAN', this.add()) case '!=' advance() return Binary(left,'BANG_EQUAL', this.add()) default: return left primary() advance() if current()=='NUM' return Literal(current().value) if current()=='LPAREN' expr=this.add advance() return Grouping(expr)

WHAT THIS ALL DOES 1 + (2 * 3) NUM{1} ADD LPAREN NUM{2} MUL NUM{3} RPAREN start with add(NUM{1})...construct tree that has order of precedence return ASTS

class Evaluator asats = asts visitor = new Visitor() evaluate() visitor.visitExpressions(this.asts) this walks an array of expressions and visits each one

The way this works is that AST inplmenets nodes that are either Binary, Literal, or Grouping. They all implement visit(), which The Visitor class has equivalent visitBinary, visitLiteral, visitGrouping, plus visitExpression visitor instances receive the AST node (Binary, Literal, or Grouping)

class Visitor visitExpressions(exprs) for expr of exprs expr.visit(this)

utils isOp ['=', '+', '-', '*', '/', '>', '<', '>=', '<=', '==', '!='] isNum !isNaN(parseFloat(v)) && isFinite(v) isAlpha RegExp(/[1]+$/i)

class Token inst = null tokens = [] { type, value } tokenize(str) s contains built tokens, one at a time. it's cleared every time a token is emitted loop over string length s+=str[index] peek = str[index+1] * look for transitions between character types *

if "number transition" push { type:'NUM', value:s } clear s if "open or close paren" push { type:'LPAREN'} or { type:'RPAREN' } clear s if "operator transition" push { type:'OP', value:s } clear s if "line terminator" push { type:'EOL' } clear s if "end of string" push {type:'EOF'} clear s


AST Types

Binary(left, operator, right) left, operator, right visit: return visitor.visitBinary(this) Literal(value) value visit: return visitor.visitLiteral(this) Grouping(expr) expr visit: visitor.visitGrouping(this)

token types


  1. a-z0-9 ↩︎