UP | HOME

Tolk Python Language Fragment

Table of Contents

Definition

tolk/python implements a parser and interpreter for a delimited fragment of the Python programming language. It uses parser combinators (via smug) rather than s-expression reading, and handles Python's indentation-based syntax via a preprocessor. This page catalogues exactly what the parser covers and what it does not.

Parsing Pipeline

  1. Indentation preprocessor (preprocessor.lisp): Converts leading whitespace into in-band control tokens (STX for INDENT, ETX for DEDENT, FS for NEWLINE), making the token stream whitespace-insensitive.
  2. Parser combinators (parser.lisp): A recursive-descent grammar using smug combinators. Operator precedence is encoded through grammar layering (.sum > .term > .factor > .primary > .atom).
  3. AST construction: Parse results are CLOS instances from ast.lisp.

What the Parser Covers

Literals

  • Integer literals: 42, 0, 7 (only integers; no floats, hex, octal, or binary).
  • Boolean literals: True (t), False (nil).
  • None literal: None (represented as :none).

No string literals. No float literals. No complex or imaginary numbers.

Arithmetic Operators

Supported, with standard precedence (PEMDAS):

+
addition (:add)
-
subtraction (:sub)
*
multiplication (:mult)
/
division (:div)

Operator precedence is encoded grammatically: * and / bind tighter than + and - Associativity is left-to-right. Parentheses override precedence: (2 + 3) * 4.

Unary Operators

+
unary plus (:uadd)
-
unary minus (:usub)
not
boolean negation (:not)

Unary minus has higher precedence than binary operators: -2 + 3 parses as ((-2) + 3).

Comparison Operators

==
equality (:eq)
!=
inequality (:noteq)
<
less than or equal (:lte)
>=
greater than or equal (:gte)
<
less than (:lt)
>
greater than (:gt)

Comparisons are a single compare node with one operator/comparator pair (chained comparisons like 0 < x < 10 are not supported).

Boolean Operators

and
conjunction (:and)
or
disjunction (:or)

Short-circuit semantics are standard. and and or form boolop nodes with a list of operands. Precedence: not > and > or.

Identifiers and Variable Names

Identifiers follow Python naming rules: a letter or underscore followed by letters, digits, or underscores. Examples: x, foo_bar, _private.

Keywords (True, False, None, and, or, not, lambda, def, return, if, else, elif) are reserved and cannot be used as names.

Names carry a context slot :ctx set to :load (reading) or :store (target of assignment). This mirrors Python's ast.Name.ctx.

Assignment

x = 5

Simple assignment to a single identifier. The target is a name node with :ctx :store. No tuple unpacking, no multiple assignment (a = b = 1), no augmented assignment (+, -=, etc.).

Subscript Assignment

arr[i] = value

Indexed assignment. Parsed as a subscript-assign node carrying the array name, index expression, and value expression. This is a statement, distinct from subscript expressions.

Ternary Conditional

x if True else y

The body (then-branch) comes first, matching Python's syntax. Parsed as an ifexpr node with :test, :body, and :orelse slots.

Lambda Expressions

lambda x: x + 1
lambda x, y: x + y
lambda: 42

Parameter list (zero or more comma-separated identifiers), colon, single-expression body. Parsed as a lambda-expr node.

Function Calls

f(1, 2)
add(3, 4)

Comma-separated argument list in parentheses. Zero-argument calls are supported. Calls can be chained: f(1)(2). Parsed as call nodes.

Function Definitions

def f(x): x + 1
def add(x, y): x + y
def hello(): 42

A name, optional parameter list, colon, and a body. The body is parsed as a *=block= (INDENT … DEDENT), a return-stmt, or a single expr. Functions are first-class; they create closures at runtime.

Return Statements

return 42

Returns a value from a function. Parsed as a return-stmt node.

If Statements (with elif/else)

if x > 0:
    y = 1
elif x < 0:
    y = -1
else:
    y = 0

if, optional elif (desugared to nested if-stmt nodes), optional else. Each branch contains a block of statements. The test expression appears before the colon (if test:).

Lists and Subscripts

[1, 2, 3]
arr[i]
arr[1:5]
arr[::2]
  • List literals: Comma-separated expressions in square brackets. Empty lists ([]) are supported. Parsed as list-expr.
  • Subscripts: Single index (arr[i]) parsed as subscript. Slices (arr[start:stop:step]) parsed as slice nodes; all three parts are optional.

No dictionary or set literals. No tuple syntax (parsing 1, 2, 3 is not supported). No comprehension syntax.

Multi-Line Programs

The parse-program function handles multi-statement programs. Statements are separated by NEWLINE tokens (inserted by the preprocessor). The result is a list of statement ASTs (not wrapped in a module node).

Statements at the top level may be assignments, function definitions, if statements, or expressions. Execution returns the value of the last statement.

What the Parser Does NOT Cover

Some notable omissions. This list is not exhaustive:

Feature Status
String literals Not supported
Float literals Not supported
Hex/octal/binary literals Not supported
while/for loops Not supported
class definitions Not supported
import/from statements Not supported
try/except/finally Not supported
with statements Not supported
pass/break/continue Not supported
Decorators (@decorator) Not supported
yield/yield from Not supported
global/nonlocal statements Not supported
Augmented assignment (+= etc.) Not supported
Tuple unpacking (a, b = 1, 2) Not supported
Multiple assignment (a = b = 1) Not supported
Chained comparisons (0 < x < 5) Not supported
Dictionary/set literals Not supported
Comprehensions (list/dict/set) Not supported
Star args (*args, **kwargs) Not supported
F-strings / format strings Not supported
Type annotations Not supported
del statements Not supported
assert statements Not supported

Built-in Functions

The default environment provides four built-in functions:

Builtin Signature Description
len (obj) Length of a vector (list). Fails otherwise
range (stop), (start,stop), (start,stop,step) Sequence of integers via make-array
print (&rest args) Writes to standard-output, returns :none
append (list, element) vector-push-extend the element into the list

All builtins are builtin CLOS instances wrapping Lisp lambdas.

Scoping and Execution Model

  • LEGB-like scoping (Local, Enclosing, Global, Builtin): Environment is an alist mapping names to store locations. Store is an alist mapping locations to values.
  • Immutable env/store: Extended by consing, never mutated in place.
  • Pre-allocation (collect-names / pre-allocate): Before execution, all top-level names are collected and pre-allocated store slots. This enables forward references for recursion and closures (letrec pattern).
  • Closures: Created for lambda and def forms. Carry their own params, body, and captured env.

Sources

  • sources/advprog/tolk/src/python/parser.lisp — parser combinators and grammar
  • sources/advprog/tolk/src/python/preprocessor.lisp — indentation preprocessing
  • sources/advprog/tolk/src/python/environment.lisp — env/store and builtins
  • sources/advprog/tolk/src/python/execute.lisp — top-level execution
  • sources/advprog/tolk/src/python/ast.lisp — AST and runtime classes
  • sources/advprog/tolk/tests/python/parsing.lisp — parser tests

Related