Parsing
Table of Contents
Definition
Parsing is the transformation of a character stream (surface syntax) into a structured internal representation (typically an AST). It is the first phase of an interpreter pipeline.
Parsing is a large and complex field (ambiguity, operator precedence, error recovery). PLAI deliberately minimises its treatment of parsing by exploiting Lisp's s-expression syntax.
The Bicameral Design (PLAI Ch.2)
PLAI's approach splits parsing into two levels:
- Lower level (reader): Racket's
read/ quotation converts character streams into s-expressions (nested lists of symbols, numbers, strings). This is unambiguous because input is fully parenthesised. - Upper level (parser function): A
parsefunction converts s-expressions into typed AST nodes, performing well-formedness checking.
In Common Lisp, the built-in reader plays the same role as Racket's read.
The upper-level parser uses ematch with quasiquote patterns.
Tolk's Parser
(defmethod parse (sexp)
"Upper-level parser: s-expression → AST"
(ematch sexp
((type number) (make-instance 'num :n sexp))
(`(+ ,l ,r) (make-instance 'plus :l (parse l) :r (parse r)))
(`(* ,l ,r) (make-instance 'mult :l (parse l) :r (parse r)))))
(defmethod parse ((p pathname))
"Pathname method: reads a .rkt file, returns list of parsed ASTs"
(mapcar #'parse (read-racket-file p)))
The quasiquote patterns (+ ,l ,r) match a three-element list whose first
element is the symbol +. This is the fare-quasiquote extension enabling
destructuring in trivia patterns, activated by the named-readtables
declaration in-readtable :fare-quasiquote at the top of each source file.
PLAI's Parser (plai-typed)
(define (parse [s : s-expression]) : ArithC
(cond [(s-exp-number? s) (numC (s-exp->number s))]
[(s-exp-list? s)
(let ([sl (s-exp->list s)])
(case (s-exp->symbol (first sl))
[(+) (plusC (parse (second sl)) (parse (third sl)))]
[(*) (multC (parse (second sl)) (parse (third sl)))]
[else (error 'parse "invalid list input")]))]
[else (error 'parse "invalid input")]))
Racket's plai-typed requires explicit casts (s-exp->number, s-exp->symbol
etc.) because s-expressions have their own distinct type. Common Lisp requires
no such casts.
Comparison
| Aspect | PLAI (plai-typed) | Tolk (Common Lisp) |
|---|---|---|
| First phase | Racket read / quotation |
CL reader / quotation |
| Second phase | parse with cond + predicates |
parse with ematch + patterns |
| Type casting | Required (s-exp->number etc.) |
Not needed |
| Pathname support | No built-in | Separate defmethod on pathname |
Related
- Interpreter Architecture ← cluster hub
- Abstract Syntax Tree
- Interpreter
- Tolk Arith
- Trivia —
ematchfor pattern matching - Fare-quasiquote — quasiquote patterns
- Named-readtables — activates the fare-quasiquote readtable
- Wikipedia: Parsing