2 Syntax Model

8.17.0.6

The syntax of a Rhombus program is defined by two passes:

A read pass converts characters into an intermediate, abstract representation in the form of a syntax object. This surface syntax is shrubbery notation as defined in Shrubbery Notation.
An expand pass processes a syntax object to produce one that is fully parsed and ready for evaluation. The expand pass is extensible within Rhombus itself, so the syntax of a Rhombus program can be customized. Binding information in a syntax object drives the expansion process. When the expansion process encounters a binding form, it extends the syntax objects for subexpressions with new binding information.

2.1 Identifiers, Binding, and Scopes

An identifier is a source-program entity. Parsing (i.e., expanding) a Rhombus program reveals that some identifiers correspond to variables, some refer to syntactic forms (such as fun, which is the syntactic form for functions), some refer to transformers for macro expansion, and some are quoted to produce symbols or syntax objects. An identifier binds another (i.e., it is a binding) when the former is parsed as a variable or syntactic form and the latter is parsed as a reference to the former; the latter is bound.

For example, as a fragment of source, the text

def x = 5
x

includes two identifiers: def and x (which appears twice). When this source is parsed in a context where def has its usual meaning, the first x binds the second x.

Bindings and references are determined through scope sets. A scope corresponds to a region of the program that is either part of the source or synthesized through elaboration of the source. Nested binding contexts (such as nested functions) create nested scopes, while macro expansion creates scopes that overlap in more complex ways. Conceptually, each scope is represented by a unique token, but the token is not directly accessible. Instead, each scope is represented by a value that is internal to the representation of a program.

A form is a fragment of a program, such as an identifier or a function call. A form is represented as a syntax object, and each syntax object has an associated set of scopes (i.e., a scope set). In the above example, the representations of the xs include the scope that corresponds to the def form.

When a form parses as the binding of a particular identifier, parsing updates a global table that maps a combination of an identifier’s symbol and scope set to its meaning: a variable, a syntactic form, or a transformer. An identifier refers to a particular binding when the reference’s symbol and the identifier’s symbol are the same, and when the reference’s scope set is a superset of the binding’s scope set. For a given identifier, multiple bindings may have scope sets that are subsets of the identifier’s; in that case, the identifier refers to the binding whose set is a superset of all others; if no such binding exists, the reference is ambiguous (and triggers a syntax error if it is parsed as an expression). A binding shadows any binding (i.e., it is shadowing any binding) with the same symbol but a subset of scopes.
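The resolution rule can be modeled in a few lines of Python. This is a language-neutral sketch, not the Rhombus expander. Scopes are plain strings, and the names `Binding`, `resolve`, and `AmbiguousReference` are hypothetical.

```python
# Language-neutral model of scope-set binding resolution (not the Rhombus expander).
# Scopes are plain strings here; in Rhombus they are opaque internal values.
from dataclasses import dataclass


class AmbiguousReference(Exception):
    """No candidate binding's scope set contains all the other candidates' sets."""


@dataclass(frozen=True)
class Binding:
    symbol: str
    scopes: frozenset
    meaning: str  # free-form label: a variable, syntactic form, or transformer


def resolve(symbol, scopes, table):
    """Return the binding that an identifier with `symbol` and `scopes` refers to."""
    # Candidates have the same symbol and a scope set that is a subset of the reference's.
    candidates = [b for b in table if b.symbol == symbol and b.scopes <= scopes]
    if not candidates:
        return None  # the identifier is unbound
    # The winner must be a superset of every other candidate. If such a set
    # exists, it is necessarily one of the largest candidate sets.
    best = max(candidates, key=lambda b: len(b.scopes))
    if all(other.scopes <= best.scopes for other in candidates):
        return best
    raise AmbiguousReference(symbol)


# fun (x): fun (x): x  with a module scope "m" and one scope per fun.
table = [
    Binding("x", frozenset({"m", "fun1"}), "outer x"),
    Binding("x", frozenset({"m", "fun1", "fun2"}), "inner x"),
]
print(resolve("x", frozenset({"m", "fun1", "fun2"}), table).meaning)  # inner x
```

Consider two bindings of `z` with scope sets {"m", "a"} and {"m", "b"}, and a reference with {"m", "a", "b"}. Resolving that reference raises `AmbiguousReference`, which mirrors the ambiguous case described above.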

For example, in

fun (x):
  x

in a context where fun corresponds to the usual syntactic form, the parsing of fun introduces a new scope for the binding of x. Since the second x receives that scope as part of the fun body, the first x binds the second x. In the more complex case

fun (x):
  fun (x):
    x

the inner fun creates a second scope for the second x, so its scope set is a superset of the first x’s scope set. As a result, the binding for the second x shadows the one for the first x, and the third x refers to the binding created by the second one.

A top-level binding is a binding from a definition at the top-level; a module binding is a binding from a definition in a module; all other bindings are local bindings. Within a module, references to top-level bindings are disallowed. An identifier without a binding is unbound.

Throughout the documentation, identifiers are typeset to suggest the way that they are parsed. A hyperlinked identifier like run indicates a reference to a syntactic form or variable. A plain identifier like x is a variable or a reference to an unspecified top-level variable.

2.1.1 Binding Spaces

A binding space, or just space, represents a distinct syntactic category that has its own set of bindings. A binding space is implemented by a specific scope for the space; an identifier is bound in a space if its binding includes the space’s scope in its scope set. As a special case, the expression space has no corresponding scope; bindings in that space correspond to the absence of other spaces’ scopes.
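The following minimal sketch treats each non-expression space as an extra scope and reads the expression space as the absence of any space scope. The space names, the `space:` scope strings, and the helpers `bind_in_space` and `space_of` are hypothetical.

```python
# Model of binding spaces as scopes. The space names and the "space:" scope
# strings are hypothetical; the expression space has no scope of its own.
SPACE_SCOPES = {
    "static_info": "space:static_info",
    "annotation": "space:annotation",
}


def bind_in_space(space, scopes):
    """Scopes a binding form would give an identifier bound in `space`."""
    if space == "expr":
        return frozenset(scopes)  # expression space: no extra scope
    return frozenset(scopes) | {SPACE_SCOPES[space]}


def space_of(binding_scopes):
    """Recover a binding's space from its scope set."""
    present = [name for name, scope in SPACE_SCOPES.items() if scope in binding_scopes]
    if not present:
        return "expr"  # absence of every other space's scope
    if len(present) > 1:
        raise ValueError("expected at most one space scope per binding")
    return present[0]


# A def-like form binding x in the expression space and, separately,
# recording static information for x in the static-information space.
x_expr = bind_in_space("expr", {"m"})
x_info = bind_in_space("static_info", {"m"})
print(space_of(x_expr), space_of(x_info))  # expr static_info
```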

Binding forms bind identifiers in specific spaces. The def, let, and fun forms, for example, bind in the expression space. They may also bind in the static-information space to record static information about the binding.

The import and export forms include support for binding spaces through subforms like only_space and except_space.

2.1.2 Binding Phases

Every binding has a phase level in which it can be referenced, where a phase level normally corresponds to an integer (but the special label phase level does not correspond to an integer). Phase level 0 corresponds to the run time of the enclosing module (or the run time of top-level expressions). Bindings in phase level 0 constitute the base environment. Phase level 1 corresponds to the time during which the enclosing module (or top-level expression) is expanded; bindings in phase level 1 constitute the transformer environment. Phase level -1 corresponds to the run time of a different module for which the enclosing module is imported for use at phase level 1 (relative to the importing module); bindings in phase level -1 constitute the template environment. The label phase level does not correspond to any execution time; it is used to track bindings (e.g., to identifiers within documentation) without implying an execution dependency.

An identifier can have different bindings in different phase levels. More precisely, the scope set associated with a form can be different at different phase levels; a top-level or module context implies a distinct scope at every phase level, while scopes from macro expansion or other syntactic forms are added to a form’s scope sets at all phases. The context of each binding and reference determines the phase level whose scope set is relevant.
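The model below shows why one identifier can have different scope sets, and therefore different bindings, at different phases. Module-context scopes are tracked per phase, while macro scopes are shared by all phases. The class name `LexicalInfo` and the scope names are hypothetical. The model also lists phases explicitly instead of covering every phase level.

```python
# Model of phase-specific scope sets (scope names are illustrative).
from dataclasses import dataclass, field


@dataclass
class LexicalInfo:
    shared: frozenset = frozenset()                # scopes present at every phase
    per_phase: dict = field(default_factory=dict)  # phase -> phase-specific scopes

    def scopes_at(self, phase):
        """The scope set that matters when the identifier is used at `phase`."""
        return self.shared | self.per_phase.get(phase, frozenset())

    def with_module_context(self, module, phases):
        """A module context adds a distinct scope at each phase (finite here)."""
        per_phase = dict(self.per_phase)
        for p in phases:
            per_phase[p] = per_phase.get(p, frozenset()) | {f"{module}@{p}"}
        return LexicalInfo(self.shared, per_phase)

    def with_scope(self, scope):
        """A macro-expansion scope is added at all phases at once."""
        return LexicalInfo(self.shared | {scope}, dict(self.per_phase))


x = LexicalInfo().with_module_context("m", phases=[0, 1]).with_scope("macro1")
print(sorted(x.scopes_at(0)))  # ['m@0', 'macro1']
print(sorted(x.scopes_at(1)))  # ['m@1', 'macro1']
```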

2.2 Syntax Objects

A syntax object combines a simpler Rhombus value, such as a symbol or list, with lexical information, source-location information, and syntax properties. The lexical information of a syntax object comprises a set of scope sets, one for each phase level. In particular, an identifier is represented as a syntax object containing a symbol, and its lexical information can be combined with the global table of bindings to determine its binding (if any) at each phase level.

For example, a List identifier might have lexical information that designates it as the List from the rhombus language (i.e., the built-in List). Similarly, a fun identifier’s lexical information may indicate that it represents a function form. Some other identifier’s lexical information may indicate that it references a top-level variable.

When a syntax object represents a more complex expression than an identifier or simple constant, its internal components can be extracted. Even for extracted identifiers, detailed information about binding is available mostly indirectly; two identifiers can be compared to determine whether they refer to the same binding (i.e., syntax_meta.equal_binding), or whether the identifiers have the same scope set so that each identifier would bind the other if one were in a binding position and the other in an expression position (i.e., syntax_meta.equal_name_and_scopes).

For example, when the program written as

fun (x):
  x + 6

is represented as a syntax object, then two syntax objects can be extracted for the two xs. Both the syntax_meta.equal_binding and syntax_meta.equal_name_and_scopes predicates will indicate that the xs are the same. In contrast, the fun identifier is not syntax_meta.equal_binding or syntax_meta.equal_name_and_scopes to either x.

The lexical information in a syntax object is independent of the rest of the syntax object, and it can be copied to a new syntax object in combination with an arbitrary other Rhombus value. Thus, identifier-binding information in a syntax object is predicated on the symbolic name of the identifier as well as the identifier’s lexical information; the same question with the same lexical information but different base value can produce a different answer.

For example, combining the lexical information from fun in the program above with #'x would not produce an identifier that is syntax_meta.equal_binding to either x, since fun does not appear within the scope of the x binding. Combining the lexical information of the 6 with #'x, in contrast, would produce an identifier that is syntax_meta.equal_name_and_scopes to both xs.
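The two comparisons and the copying of lexical information can be sketched with a single-phase model. The function names echo syntax_meta.equal_binding and syntax_meta.equal_name_and_scopes, but this is not their implementation. `Syntax`, `resolve`, and `with_lexical_info` are hypothetical, and the scope names "m" and "f" are assumptions standing for the module and the fun body.

```python
# Model of identifier comparison; function names echo syntax_meta.equal_binding
# and syntax_meta.equal_name_and_scopes but this is not their implementation.
# Lexical information is reduced to one scope set (a single phase).
from dataclasses import dataclass


@dataclass(frozen=True)
class Syntax:
    datum: object     # the plain value: a symbol (a str here), a number, ...
    scopes: frozenset


def resolve(ident, table):
    """Most specific (symbol, scopes) binding visible to `ident`; ambiguity ignored."""
    visible = [b for b in table if b[0] == ident.datum and b[1] <= ident.scopes]
    return max(visible, key=lambda b: len(b[1]), default=None)


def equal_binding(a, b, table):
    ra, rb = resolve(a, table), resolve(b, table)
    if ra is None and rb is None:
        return a.datum == b.datum  # two unbound identifiers: compare names
    return ra == rb


def equal_name_and_scopes(a, b):
    return a.datum == b.datum and a.scopes == b.scopes


def with_lexical_info(context, datum):
    """Copy `context`'s lexical information onto an arbitrary new datum."""
    return Syntax(datum, context.scopes)


# fun (x): x + 6  with module scope "m" and body scope "f" from fun.
fun_id = Syntax("fun", frozenset({"m"}))
x_bind = Syntax("x", frozenset({"m", "f"}))
x_ref = Syntax("x", frozenset({"m", "f"}))
six = Syntax(6, frozenset({"m", "f"}))
table = [("fun", frozenset({"m"})), ("x", frozenset({"m", "f"}))]

print(equal_binding(x_bind, x_ref, table))                          # True
print(equal_binding(with_lexical_info(fun_id, "x"), x_ref, table))  # False
print(equal_name_and_scopes(with_lexical_info(six, "x"), x_bind))   # True
```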

The Syntax.literal_local form bridges the evaluation of a program and the representation of a program. Specifically, Syntax.literal_local'datum' produces a syntax object that preserves all of the lexical information that datum had when it was parsed as part of the Syntax.literal_local form. Note that the Syntax.literal form is similar, but it removes certain scopes from the datum’s scope sets. Just using quotes, as in 'datum', is similar to using Syntax.literal, except that an escaping $ is recognized within datum.

2.3 Expansion

Expansion recursively processes a syntax object in a particular phase level, starting with phase level 0. Bindings from the syntax object’s lexical information drive the expansion process, and cause new bindings to be introduced for the lexical information of sub-expressions. In some cases, a sub-expression is expanded at a deeper phase (one with a larger phase level number) than the enclosing expression.

2.3.1 Internal Definitions

An internal-definition context supports local definitions mixed with expressions. Forms that allow internal definitions document such positions using the body meta-variable.

Expansion relies on partial expansion of each body in an internal-definition sequence. Partial expansion of each body produces a form matching one of the following cases:

A definition form like def: The binding table is immediately enriched with bindings for the definition form. Further expansion of the definition is deferred, and partial expansion continues with the rest of the body.
A transformer definition form like expr_meta.macro: The right-hand side is expanded and evaluated, and a transformer binding is installed for the body sequence before partial expansion continues with the rest of the body.

After all body forms are partially expanded, if no definitions were encountered, then the expressions are collected into a sequence as the internal-definition context’s expansion. Otherwise, at least one expression must appear after the last definition.
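The partial-expansion loop can be sketched as follows. Definitions are bound immediately and their right-hand sides deferred. Transformer definitions are installed before the rest of the body is examined, and macro uses are expanded one step and re-examined. The tuple-based form encoding, `partially_expand_body`, and `BodySyntaxError` are hypothetical.

```python
# Model of partial expansion for an internal-definition context.
# Forms are tuples whose first element names the form; every form kind and
# helper here is hypothetical.

class BodySyntaxError(Exception):
    pass


def partially_expand_body(forms):
    bound = set()        # names added to the binding table so far
    transformers = {}    # macro name -> Python function standing in for a transformer
    deferred = []        # partially expanded forms, fully expanded later
    saw_definition = False
    ends_with_definition = False
    pending = list(forms)
    while pending:
        form = pending.pop(0)
        kind = form[0]
        if kind in transformers:
            # A macro use: expand it one step and re-examine the result.
            pending.insert(0, transformers[kind](form))
            continue
        if kind == "def":
            bound.add(form[1])               # bind immediately...
            deferred.append(form)            # ...but defer the right-hand side
            saw_definition = ends_with_definition = True
        elif kind == "macro":
            transformers[form[1]] = form[2]  # evaluate now; install for the rest of the body
            saw_definition = ends_with_definition = True
        else:
            deferred.append(form)            # an expression: defer its expansion
            ends_with_definition = False
    if saw_definition and ends_with_definition:
        raise BodySyntaxError("expected an expression after the last definition")
    return bound, deferred


body = [
    ("macro", "twice", lambda form: ("expr", f"{form[1]} + {form[1]}")),
    ("def", "x", "5"),
    ("twice", "x"),
]
bound, deferred = partially_expand_body(body)
print(bound)     # {'x'}
print(deferred)  # [('def', 'x', '5'), ('expr', 'x + x')]
```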

Before partial expansion begins, a fresh outside-edge scope is introduced on the content of the internal-definition context. This outside-edge scope effectively identifies syntax objects that are present in the original form. An inside-edge scope is also created and added to the original content; furthermore, the inside-edge scope is added to the result of any partial expansion. This inside-edge scope ensures that all bindings introduced by the internal-definition context have a particular scope in common.
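A small model shows how the two edge scopes differ. Original forms carry both scopes. A form introduced by a macro carries only the inside-edge scope, which is added to every partial-expansion result. The (datum, scopes) encoding, `expand_with_edges`, and the stand-in macro `make_y` are hypothetical.

```python
# Model of outside-edge and inside-edge scopes. Forms are (datum, scopes)
# pairs; the scope names and the expand_step hook are hypothetical.
import itertools

_counter = itertools.count()


def fresh_scope(kind):
    return f"{kind}-{next(_counter)}"


def expand_with_edges(original_forms, expand_step):
    outside = fresh_scope("outside-edge")
    inside = fresh_scope("inside-edge")
    # Both edge scopes are added to the original content.
    forms = [(datum, scopes | {outside, inside}) for datum, scopes in original_forms]
    # The inside-edge scope is added again to each partial-expansion result, so
    # forms introduced by a macro carry it even without the outside-edge scope.
    expanded = []
    for form in forms:
        datum, scopes = expand_step(form)
        expanded.append((datum, scopes | {inside}))
    return outside, inside, expanded


def expand_step(form):
    datum, scopes = form
    if datum == "make_y":
        # Stand-in for a macro that introduces `def y` with its own scope.
        return ("def y", frozenset({"m", "macro-intro"}))
    return form


original = [("def x", frozenset({"m"})), ("make_y", frozenset({"m"})), ("x", frozenset({"m"}))]
outside, inside, expanded = expand_with_edges(original, expand_step)
print([outside in scopes for _, scopes in expanded])    # [True, False, True]
print(all(inside in scopes for _, scopes in expanded))  # True
```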

2.3.2 Module Expansion, Phases, and Visits

Expansion of a module form proceeds in a similar way to expansion of an internal-definition context: an outside-edge scope is created for the original module content, and an inside-edge scope is added to both the original module and any form that appears during a partial expansion of the module’s top-level forms to uncover definitions and imports.

An import form not only introduces bindings at expansion time, but also visits the referenced module when the expander encounters it. That is, the expander instantiates any variables defined in the module within meta, and it also evaluates all expressions for transformer bindings via meta.bridge, expr_meta.macro, and similar forms.

Module visits propagate through imports in the same way as module instantiation. Moreover, when a module is visited at phase 0, any module that it imports with import meta is instantiated at phase 1. Further import meta -1 imports that lead back to phase 0 cause the imported module to be visited at phase 0 (i.e., not instantiated).
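The propagation rule can be checked with a small model. Each module's code is grouped by relative phase. Instantiating at phase p runs relative-phase-0 code with base phase p, and visiting at phase p runs relative-phase-1 code with base phase p. An import with shift s then needs the imported module's relative phase r - s, with the base phase shifted by s. This encoding, `propagate`, and the module names are assumptions. Phases beyond 1 are not modeled.

```python
# Simplified model of how visits and instantiations propagate through imports.
# `imports` maps a module name to (imported module, phase shift) pairs: a plain
# import has shift 0, import meta has shift 1, and import meta -1 has shift -1.

def propagate(imports, module, base, relative, seen=None):
    """Run `module`'s relative-phase-`relative` code with base phase `base`.

    relative == 0 means "instantiate at phase base";
    relative == 1 means "visit at phase base" (its phase-1 code runs at base + 1).
    Returns the set of (action, module, phase) events triggered.
    """
    if seen is None:
        seen = set()
    action = {0: "instantiate", 1: "visit"}[relative]
    event = (action, module, base)
    if event in seen:
        return seen
    seen.add(event)
    for imported, shift in imports.get(module, []):
        # Code at relative phase r uses the imported module's relative phase
        # r - shift, and that module's base phase is shifted by `shift`.
        needed = relative - shift
        if needed in (0, 1):  # deeper phases are handled on demand; not modeled
            propagate(imports, imported, base + shift, needed, seen)
    return seen


imports = {
    "M": [("N", 1), ("P", 0)],  # M: import meta N, plain import P
    "N": [("K", -1)],           # N: import meta -1 K
}
print(sorted(propagate(imports, "M", base=0, relative=1)))
# [('instantiate', 'N', 1), ('visit', 'K', 0), ('visit', 'M', 0), ('visit', 'P', 0)]
```

Instantiating M at phase 0 (`relative=0`) triggers only `('instantiate', 'P', 0)` besides M itself. The import meta of N is not needed at run time.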

During compilation, the top level of a module context is itself implicitly visited. Thus, when the expander encounters import meta, it immediately instantiates the imported module at phase 1, in addition to adding bindings at phase level 1 (i.e., the transformer environment). Similarly, the expander immediately evaluates any form that it encounters within meta.

Phases beyond 0 are visited on demand. For example, when the right-hand side of a phase-0 expr_meta.macro is to be expanded, then modules that are available at phase 1 are visited. More generally, initiating expansion at phase n visits modules at phase n, which in turn instantiates modules at phase n+1. These visits and instantiations apply to available modules in the enclosing namespace’s module registry; a per-registry lock prevents multiple threads from concurrently instantiating and visiting available modules.
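The per-registry lock can be illustrated with a hypothetical `Registry` class. Only the locking idea comes from the text above. Without the lock, two threads could both see a module as unvisited and visit it twice. Holding the lock across the check and the visit makes each (module, phase) visit happen exactly once.

```python
# Sketch of on-demand visits guarded by a per-registry lock. The Registry
# class and its methods are hypothetical; only the locking idea is from the text.
import threading


class Registry:
    def __init__(self, available):
        self.available = list(available)  # modules available in this registry
        self.visited = {}                  # (module, phase) -> number of visits run
        self._lock = threading.Lock()      # one lock per registry

    def _visit(self, module, phase):
        # Stand-in for running the module's phase-(phase + 1) code.
        key = (module, phase)
        self.visited[key] = self.visited.get(key, 0) + 1

    def start_expansion(self, phase):
        """Visit each available module at `phase` the first time expansion needs it."""
        with self._lock:  # only one thread visits or instantiates at a time
            for module in self.available:
                if (module, phase) not in self.visited:
                    self._visit(module, phase)


registry = Registry(["a", "b"])
threads = [threading.Thread(target=registry.start_expansion, args=(1,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(registry.visited)  # {('a', 1): 1, ('b', 1): 1}
```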

When the expander encounters import and import meta within a module context, the resulting visits and instantiations are specific to the expansion of the enclosing module, and are kept separate from visits and instantiations triggered from a top-level context or from the expansion of a different module.

2.4 More to Come

Terms still needing definition: transformer, base phase, syntactic form, expression, channel, inspector, source location, syntax property, will, module context, module registry, top level context, partial expansion, namespace.