Tolk Python Language Fragment
Table of Contents
- Definition
- Parsing Pipeline
- What the Parser Covers
- Literals
- Arithmetic Operators
- Unary Operators
- Comparison Operators
- Boolean Operators
- Identifiers and Variable Names
- Assignment
- Subscript Assignment
- Ternary Conditional
- Lambda Expressions
- Function Calls
- Function Definitions
- Return Statements
- If Statements (with elif/else)
- Lists and Subscripts
- Multi-Line Programs
- What the Parser Does NOT Cover
- Built-in Functions
- Scoping and Execution Model
- Sources
- Related
Definition
tolk/python implements a parser and interpreter for a delimited fragment
of the Python programming language. It uses parser combinators (via smug)
rather than s-expression reading, and handles Python's indentation-based
syntax via a preprocessor. This page catalogues exactly what the parser
covers and what it does not.
Parsing Pipeline
- Indentation preprocessor (
preprocessor.lisp): Converts leading whitespace into in-band control tokens (STXfor INDENT,ETXfor DEDENT,FSfor NEWLINE), making the token stream whitespace-insensitive. - Parser combinators (
parser.lisp): A recursive-descent grammar usingsmugcombinators. Operator precedence is encoded through grammar layering (.sum>.term>.factor>.primary>.atom). - AST construction: Parse results are CLOS instances from
ast.lisp.
What the Parser Covers
Literals
- Integer literals:
42,0,7(only integers; no floats, hex, octal, or binary). - Boolean literals:
True(t),False(nil). - None literal:
None(represented as:none).
No string literals. No float literals. No complex or imaginary numbers.
Arithmetic Operators
Supported, with standard precedence (PEMDAS):
+- addition (
:add) -- subtraction (
:sub) *- multiplication (
:mult) /- division (
:div)
Operator precedence is encoded grammatically: * and / bind tighter than + and -
Associativity is left-to-right. Parentheses override precedence: (2 + 3) * 4.
Unary Operators
+- unary plus (
:uadd) -- unary minus (
:usub) not- boolean negation (
:not)
Unary minus has higher precedence than binary operators: -2 + 3 parses as ((-2) + 3).
Comparison Operators
==- equality (
:eq) !=- inequality (
:noteq) <- less than or equal (
:lte) >=- greater than or equal (
:gte) <- less than (
:lt) >- greater than (
:gt)
Comparisons are a single compare node with one operator/comparator pair
(chained comparisons like 0 < x < 10 are not supported).
Boolean Operators
and- conjunction (
:and) or- disjunction (
:or)
Short-circuit semantics are standard. and and or form boolop nodes
with a list of operands. Precedence: not > and > or.
Identifiers and Variable Names
Identifiers follow Python naming rules: a letter or underscore followed
by letters, digits, or underscores. Examples: x, foo_bar, _private.
Keywords (True, False, None, and, or, not, lambda, def,
return, if, else, elif) are reserved and cannot be used as names.
Names carry a context slot :ctx set to :load (reading) or :store (target
of assignment). This mirrors Python's ast.Name.ctx.
Assignment
x = 5
Simple assignment to a single identifier. The target is a name node with
:ctx :store. No tuple unpacking, no multiple assignment (a = b = 1),
no augmented assignment (+, -=, etc.).
Subscript Assignment
arr[i] = value
Indexed assignment. Parsed as a subscript-assign node carrying the array
name, index expression, and value expression. This is a statement, distinct
from subscript expressions.
Ternary Conditional
x if True else y
The body (then-branch) comes first, matching Python's syntax. Parsed as an
ifexpr node with :test, :body, and :orelse slots.
Lambda Expressions
lambda x: x + 1
lambda x, y: x + y
lambda: 42
Parameter list (zero or more comma-separated identifiers), colon,
single-expression body. Parsed as a lambda-expr node.
Function Calls
f(1, 2)
add(3, 4)
Comma-separated argument list in parentheses. Zero-argument calls are
supported. Calls can be chained: f(1)(2). Parsed as call nodes.
Function Definitions
def f(x): x + 1
def add(x, y): x + y
def hello(): 42
A name, optional parameter list, colon, and a body. The body is parsed as a
*=block= (INDENT … DEDENT), a return-stmt, or a single expr. Functions
are first-class; they create closures at runtime.
Return Statements
return 42
Returns a value from a function. Parsed as a return-stmt node.
If Statements (with elif/else)
if x > 0:
y = 1
elif x < 0:
y = -1
else:
y = 0
if, optional elif (desugared to nested if-stmt nodes), optional else.
Each branch contains a block of statements. The test expression appears before
the colon (if test:).
Lists and Subscripts
[1, 2, 3]
arr[i]
arr[1:5]
arr[::2]
- List literals: Comma-separated expressions in square brackets. Empty
lists (
[]) are supported. Parsed aslist-expr. - Subscripts: Single index (
arr[i]) parsed assubscript. Slices (arr[start:stop:step]) parsed asslicenodes; all three parts are optional.
No dictionary or set literals. No tuple syntax (parsing
1, 2, 3is not supported). No comprehension syntax.
Multi-Line Programs
The parse-program function handles multi-statement programs. Statements
are separated by NEWLINE tokens (inserted by the preprocessor). The result
is a list of statement ASTs (not wrapped in a module node).
Statements at the top level may be assignments, function definitions, if statements, or expressions. Execution returns the value of the last statement.
What the Parser Does NOT Cover
Some notable omissions. This list is not exhaustive:
| Feature | Status |
|---|---|
| String literals | Not supported |
| Float literals | Not supported |
| Hex/octal/binary literals | Not supported |
while/for loops |
Not supported |
class definitions |
Not supported |
import/from statements |
Not supported |
try/except/finally |
Not supported |
with statements |
Not supported |
pass/break/continue |
Not supported |
Decorators (@decorator) |
Not supported |
yield/yield from |
Not supported |
global/nonlocal statements |
Not supported |
| Augmented assignment (+= etc.) | Not supported |
| Tuple unpacking (a, b = 1, 2) | Not supported |
| Multiple assignment (a = b = 1) | Not supported |
Chained comparisons (0 < x < 5) |
Not supported |
| Dictionary/set literals | Not supported |
| Comprehensions (list/dict/set) | Not supported |
Star args (*args, **kwargs) |
Not supported |
| F-strings / format strings | Not supported |
| Type annotations | Not supported |
del statements |
Not supported |
assert statements |
Not supported |
Built-in Functions
The default environment provides four built-in functions:
| Builtin | Signature | Description |
|---|---|---|
len |
(obj) |
Length of a vector (list). Fails otherwise |
range |
(stop), (start,stop), (start,stop,step) |
Sequence of integers via make-array |
print |
(&rest args) |
Writes to standard-output, returns :none |
append |
(list, element) |
vector-push-extend the element into the list |
All builtins are builtin CLOS instances wrapping Lisp lambdas.
Scoping and Execution Model
- LEGB-like scoping (Local, Enclosing, Global, Builtin): Environment is an alist mapping names to store locations. Store is an alist mapping locations to values.
- Immutable env/store: Extended by consing, never mutated in place.
- Pre-allocation (
collect-names/pre-allocate): Before execution, all top-level names are collected and pre-allocated store slots. This enables forward references for recursion and closures (letrecpattern). - Closures: Created for
lambdaanddefforms. Carry their ownparams,body, and capturedenv.
Sources
sources/advprog/tolk/src/python/parser.lisp— parser combinators and grammarsources/advprog/tolk/src/python/preprocessor.lisp— indentation preprocessingsources/advprog/tolk/src/python/environment.lisp— env/store and builtinssources/advprog/tolk/src/python/execute.lisp— top-level executionsources/advprog/tolk/src/python/ast.lisp— AST and runtime classessources/advprog/tolk/tests/python/parsing.lisp— parser tests
Related
- Tolk ← cluster hub
- Tolk Arith — the arithmetic interpreter (Racket subset)
- Tolk Python AST — AST node classes
- Abstract Syntax Tree
- Parsing
- Interpreter
- Wikipedia: Python