11.1 Syntax Objects
A syntax object encapsulates a shrubbery term, group, or multi-group sequence with binding scopes and other metadata on individual terms, and metadata potentially on individual syntax objects. See Shrubbery Notation for information on shrubbery notation, and specifically Parsed Representation for information on representing shrubbery terms as Rhombus values. The Syntax.make function takes such a value and wraps it as a syntax object, so that it can accumulate binding scopes or hold other metadata, and functions like Syntax.unwrap expose that structure.
In addition to normal shrubbery structure, a syntax object can contain parsed terms, which are opaque. The meaning and internal structure of a parsed term depends on the parser that produced it. In the case of parsing a syntax object as a Rhombus expression via expr_meta.Parsed, a parsed term encapsulates a Racket expression. Pattern matching and functions like Syntax.unwrap treat parsed terms as opaque.
A quoted sequence of terms using '…' is parsed as an implicit use of the #%quotes form, which is normally bound to create a syntax object. For example, '1.000' is a syntax object that wraps the number 1.0.
Metadata for a syntax object can include a source location and the raw source text for a term, such as "1.000" for a 1.0 that was written originally as 1.000. Raw-source metadata is used when printing a syntax error for a syntax object. Besides the main text of a term, metadata can include a prefix string and/or suffix string, which are used when printing a sequence of terms to reflect the original layout. A group syntax object internally starts with a group tag that normally contains only prefix and suffix text, leaving the group elements to supply their own text forms. Finally, a syntax object can contain a tail string and/or a tail suffix; those normally appear only on a tag at the start of a syntax object that represents a pair of parentheses, brackets, braces or quotes, where the tail string corresponds to the closer, and the tail suffix corresponds to text after the closer.
A syntax object that results from a match using a syntax class annotation has fields in addition to the methods of all syntax objects. If a field from a syntax class has the same name as a Syntax method, the field takes precedence for dynamic access and for static access using Syntax.matched_of with the syntax class’s name.
Constructs a syntax object. When a single term is present, the result is a single-term syntax object. When a single term ... group is present with multiple terms, the result is a group syntax object. The general case is a multi-group syntax object.
The #%quotes form is implicitly used when '…' is used in an expression position. See also Implicit Forms.
> '1'
'1'
> 'pi'
'pi'
> '1 + 2'
'1 + 2'
> '1 + 2
3 + 4'
'1 + 2
3 + 4'
A $ as a term unquotes (i.e., escapes) the expression afterward; the value of that expression replaces the $ term and expression. The value is normally a syntax object, but except for lists, other kinds of values are coerced to a syntax object. Nested '…' forms are allowed around $ and do not change whether the $ escapes.
'x y z'
'x 3 z'
'«x '3' z»'
The result of the expression after $ can be a list, in which case the elements of the list are spliced as terms in place of the $ term and expression within the enclosing group. If the result is a syntax object, it can be a single-term syntax object or a group syntax object; in the latter case, the group terms are spliced in place of the escape.
> 'x $[1, 2, 3] z'
'x 1 2 3 z'
> 'x $('1 2 3') z'
'x 1 2 3 z'
Similarly, when an $ escape is alone within its enclosing group, then the result of the expression after $ can be a multi-group syntax object, in which case the group sequence is spliced in place of the escape.
> 'x; $('1; 2 3; 4'); z'
'x
1
2 3
4
z'
A ... as a term must follow a term that includes at least one escape, and each of those escapes must contain a repetition instead of an expression. The preceding term is replaced as many times as the repetition supplies values, where each value is inserted or spliced into the enclosing sequence.
'(1 + 1) (1 + 2) (1 + 3)'
'0 + 1 + 2 + 3'
'0 + 1 + 2 + 3'
Multiple escapes can appear in the term before ..., in which case the repetitions are drawn in parallel (assuming that they are at the same repetition depth). Repetition ... can be nested around escapes, consecutive ... splice deeper repetitions, and so on, following the normal rules of repetitions.
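As a minimal sketch of drawing repetitions in parallel, the following uses two repetitions of the same depth in one term before `...`. The names `x`, `y`, and `pairs` are illustrative and do not come from the examples above.

```rhombus
#lang rhombus

// x and y are both repetitions of depth 1 with three elements each
def [x, ...] = [1, 2, 3]
def [y, ...] = ["a", "b", "c"]

// Both escapes appear in the parenthesized term before `...`,
// so one element is drawn from each repetition per copy of the term
def pairs = '($x, $y) ...'
```

Each copy of the parenthesized term pairs one element of `x` with the corresponding element of `y`.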
Quotes work as a repetition to construct multiple syntax objects within another kind of repetition context, such as forming a list. All escapes must then be repetitions, instead of just expressions, and the depth of the repetition is the amount of repetition depth left over from the deepest escape.
> ['[$x, ...]', ...]
['[1, 2, 3]', '[4]', '[5, 6]']
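The interaction above does not show how `x` was bound. A sketch of one binding that would fit, assuming `x` is a repetition of depth 2 (the name `lists` is hypothetical):

```rhombus
#lang rhombus

// x has repetition depth 2: a repetition of repetitions
def [[x, ...], ...] = [[1, 2, 3], [4], [5, 6]]

// The quoted form uses up one level of depth with its inner `...`;
// the remaining level makes the quotes themselves a repetition,
// which the list's `...` collects into one syntax object per row
def lists = ['[$x, ...]', ...]
```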
binding operator
Matches a syntax object consistent with terms. Identifiers and operators are matched symbolically (unrelated to binding), and other atomic terms are matched using == on unwrapped syntax objects.
A $ as a term escapes to a subsequent unquoted binding that is matched against the corresponding portion of a candidate syntax object. A ... in terms that follows a subpattern matches any number of instances of the preceding subpattern, and escapes in the pattern are bound as repetitions. A ... ~nonempty following a subpattern matches one or more instances, instead of zero or more instances. A ... ~once following a subpattern matches zero instances or one instance. Multiple ... can appear within a sequence; when matching is ambiguous, matching prefers earlier ... repetitions to later ones.
['1', '2']
| '($x/1) ...': [x, ...]
['1', '2', '3']
| '$x ... * 3': [x, ...]
['1', '+', '2']
['1', '+', '2']
['3']
| '$x ... ~nonempty $y ... ~nonempty': values([x, ...], [y, ...])
['1', '+', '2', '*']
['3']
['1', ['!'], '3']
['1', [], '3']
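As a self-contained sketch of the matching rules described above, the following wraps two patterns in functions. The function names `terms_of` and `split_at_star` are hypothetical helpers introduced only for illustration.

```rhombus
#lang rhombus

// `$x ...` matches any number of terms and binds x as a repetition
fun terms_of(stx):
  match stx
  | '$x ...': [x, ...]

// The operator `*` is matched symbolically; `$x ...` takes the terms
// before it, and `$y` takes the single term after it
fun split_at_star(stx):
  match stx
  | '$x ... * $y': [[x, ...], y]
  | ~else: #false
```

With these definitions, `terms_of('1 2 3')` would produce a list of three single-term syntax objects, and `split_at_star('1 + 2')` would fall through to the `~else` clause because no `*` is present.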
Each $ escape is in either a term, group, or multi-group context. A $ escape is in a term context if it is followed by another escape within the same group. A $ escape is in a multi-group context when it is alone within its group and when the group is alone within its enclosing group sequence. All other escapes are in a group context. An escape may impose constraints more limiting than its context, such as using Term within an escape in a group context. Escaping to a group pattern in a term context is a syntax error, as is using a multi-group pattern in a group or term context. A sequence escape (such as a use of a syntax class of kind ~sequence) can be used in a term context.
Group: syntax class incompatible with this context
'1 + 2 + 3'
'2 + 3'
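A sketch of how the context rules could play out, assuming the predefined Term and Group syntax classes (the names `x`, `rest`, and `parts` are illustrative):

```rhombus
#lang rhombus

// x is followed by another escape in the same group, so it is in a
// term context. rest is the last escape, so it is in a group context
// and may use the Group syntax class to take the remaining terms.
def parts:
  match '1 + 2 + 3'
  | '$(x :: Term) + $(rest :: Group)': [x, rest]
```

Swapping the two classes, so that Group is used for `x`, would be expected to trigger the "incompatible with this context" error shown above.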
A $ or ... as the only term matches each of those literally. To match $ or ... literally within a larger sequence of terms, use $ to escape to a nested pattern, such as $('$'). Similarly, to match a literal ~nonempty or ~once after a ... repetition, use $('~nonempty') or $('~once').
> match Syntax.literal '1 $ 2'
['1', '2']
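The pattern clause of the interaction above is not shown. A sketch of one clause that would fit, using the nested-pattern escape to match the literal `$` (the name `parts` is hypothetical):

```rhombus
#lang rhombus

// Syntax.literal keeps the `$` in the input from acting as an escape.
// In the pattern, $('$') escapes to a nested pattern whose only term
// is `$`, which therefore matches `$` literally.
def parts:
  match Syntax.literal '1 $ 2'
  | '$x $('$') $y': [x, y]
```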
To match identifiers or operators based on binding instead of symbolically, use $ to escape, and then use bound_as within the escape.
The #%quotes form is implicitly used when '…' is used in a binding position. See also Implicit Forms.
annotation
Term matches only a single-term syntax object.
Group matches only a single-group syntax object.
Block matches only a block (which is a single-term syntax object).
TermSequence matches only a single-group syntax object or a multi-group sequence with zero groups.
Identifier matches only an identifier (which is a single-term syntax object).
Operator matches only an operator (which is a single-term syntax object).
Name matches a syntax object that is an identifier, operator, or dotted multi-term group that fits the shape of an op_or_id_name.
IdentifierName matches a syntax object that is an identifier or dotted multi-term group that fits the shape of an id_name.
annotation
syntax_class ManyThenOne
fun describe(mto :: Syntax.matched_of(ManyThenOne)):
> [mto.a, ...]
['1', '2', '3']
> describe(mto)
"matched ['1', '2', '3'] followed by 4"
a: field is a repetition;
use requires static mode
'4'
Only allowed within a '…' expression form, escapes so that the value of expr is used in place of the $ form.
The expr must be either a single term or a sequence of .-separated identifiers. To escape only an identifier (or .-separated identifier sequence) with an unescaped . afterward, use parentheses around the identifier (or sequence).
binding operator
Only allowed within a '…' binding pattern, escapes to an unquoted binding pattern. Typically, the unquoted pattern has an id that is not bound as an unquote binding operator; the id is then bound to the corresponding portion of the syntax object that matches the '…' form.
['1', '2', '3']
A $id escape matches a single term, a group, or a multi-group sequence, depending on its context. It matches a multi-group sequence only when the $id escape is alone within its group and the group is alone within a block or '…' form. Otherwise, the escape matches a group only when it is alone within its group. In all other contexts, a $id escape matches a single term. Beware that syntax patterns in macro and similar forms treat certain $id escapes specially.
A _ as a syntax pattern binding matches any input, like an identifier does, but without binding an identifier.
'2'
A parenthesized escape is the same as the escape itself. Parentheses are needed to use the :: operator, since a $ escape must be followed by a single term, and a use of the :: operator consists of three terms.
'2'
'2'
Empty parentheses as an escape, $(), serve as a group pattern that is only useful as a group tail, where it matches an empty tail. This escape is primarily intended for use with macro-definition forms like macro.
An escape that contains a '…'-quoted term matches the term as a nested syntax-object pattern. In a term context, a multi-term escape is spliced into the enclosing group. One use of a quoted escape is to match a literal $ or ... so that it is not treated as an escape or binding repetition in an enclosing pattern.
'3'
'3'
['1', '2', '3']
The :: operator is used to associate a syntax class with an identifier. See :: for more information.
The &&, ||, and ! operators combine matches. See &&, ||, and ! for more information.
The pattern form is a shorthand for using :: with an inline syntax_class form. See pattern for more information.
Other syntax pattern binding forms can be defined with unquote_bind.macro.
For use within a $ escape within a syntax pattern. See $.
unquote binding
For use within a $ escape for a nested binding pattern. See $.
A && binds all variables from its arguments, while || and ! bind none of them.
Independent matching for && means that in a term context, combining a variable binding with a splicing multi-term binding will not enable a multi-term splicing match for the variable; instead, the pattern will fail to match a multi-term splice.
A ! can only be used in a term context, negating a term binding.
> a
'(1 2 3)'
> b
'1 2 3'
def: value does not satisfy annotation
value: ’1 2 3 done’
annotation: ’$(b && ’$_ $_ $_’) $end’
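As a sketch of how && and || might be used, the following binds through && and tests alternatives with ||. The function name `one_or_two` is a hypothetical helper, and the inputs are chosen only for illustration.

```rhombus
#lang rhombus

// && matches the same term against both sides and binds a: the term
// must be a parenthesized group of exactly three terms
def '$(a && '($_ $_ $_)') done' = '(1 2 3) done'

// || succeeds if either side matches, and it binds no variables
fun one_or_two(stx):
  match stx
  | '$('1' || '2')': #true
  | ~else: #false
```

Here `a` refers to the whole parenthesized term, while the `_` patterns inside the right-hand side of && only constrain its shape.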
unquote binding
The match.cut form prevents backtracking in the case that the pattern after the cut fails to match, and instead leads to an immediate match failure, which typically implies an immediate error.
The match.delimit form delimits cuts within stx_bind, causing match failure there to backtrack as allowed outside the match.delimit form.
The match.commit form causes the first found match to stx_bind to be the only considered match, meaning that backtracking will not try alternative matches to stx_bind.
The match.cut form can only appear within a term sequence pattern. When match.cut is used within a pattern in a syntax class, then the syntax class delimits the cut; that is, failure implies a non-match of the syntax class, and not necessarily a failure of a match context using the syntax class. When match.cut appears within !, the ! operator delimits the cut, so that failure counts as success for the ! form.
> match '1 2'
"ok"
> match '1 3'
| '1 3': "does not get here"
match: expected the literal 2
> match '1 3'
| '$(match.delimit '1 $match.cut 2')': "ok"
| '1 3': "else"
"else"
> match '1 1 3'
| '$(match.commit '1 ...') $x': x
'3'
> match '1 1 1'
| '$(match.commit '1 ...') $x': x
match: expected more terms starting with any term
Unquote binding operator for use with $ that binds id for a match to syntax_class.
The syntax_class_ref can be a predefined class such as Term, Identifier, or Group, among others; it can be a class defined with syntax_class; or it can be a parenthesized inline syntax_class form that omits the class name. A class defined with syntax_class may expect arguments, which must be supplied after the syntax class name.
The id before :: refers to the matched input, and it is a repetition if the syntax class has classification ~sequence. The identifier can be combined with . to access fields (if any) of the syntax class. If id is _, then it is not bound.
A block supplied after syntax_class_ref exposes fields of the match as directly bound pattern identifiers. For each field_id as pattern_id that is supplied, pattern_id is bound directly to the named field’s value. Supplying just a field_id binds using the same identifier. Supplying open is a shorthand for listing every field to bind using its own name, and it cannot appear multiple times or be combined with expose clauses for individual fields.
syntax_class Wrapped:
kind: ~term
| '($content)'
['(2)', '2']
'2'
'2'
'2'
> match '(hello there)'
| '$(whole :: (syntax_class:
kind: ~term
| '($content)'))':
[whole, whole.content]
['(hello there)', 'hello there']
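The different ways of reaching a field can be sketched side by side. This reuses the Wrapped class from the example above; the input `'1 + (2) + 3'` and the names `via_dot`, `via_field`, `via_rename`, and `via_open` are assumptions made for illustration.

```rhombus
#lang rhombus

syntax_class Wrapped:
  kind: ~term
| '($content)'

// Field access through the bound identifier with `.`
def via_dot:
  match '1 + (2) + 3'
  | '1 + $(x :: Wrapped) + 3': [x, x.content]

// Expose the field directly under its own name
def via_field:
  match '1 + (2) + 3'
  | '1 + $(x :: Wrapped: content) + 3': content

// Expose the field under a different name
def via_rename:
  match '1 + (2) + 3'
  | '1 + $(x :: Wrapped: content as c) + 3': c

// Expose every field under its own name
def via_open:
  match '1 + (2) + 3'
  | '1 + $(x :: Wrapped: open) + 3': content
```

The last three forms avoid the `x.` prefix, which can be convenient when a pattern uses many fields of one class.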
binding operator
unquote binding
When directly used in a binding context, pattern acts as a shorthand for a syntax pattern with the pattern form as the only term.
fun simplify(e):
match e
| '($e)': simplify(e)
| '0 + $e': simplify(e)
| '$e + 0': simplify(e)
| (pattern
match_when same(simplify(b), simplify(c))):
simplify(a)
| (pattern
match_when same(simplify(b), simplify(c))):
simplify(a)
| ~else: e
unquote binding
In a match that does not use a particular pattern_case, the pattern variables of that case are bound to either #false or [] by default, the latter when the pattern variable is a repetition. A default clause within a pattern_case can specify a different default; each id_maybe_rep names a variable with its depth in the same way as for field. The expr or body sequence within a default clause has the scope of the enclosing group_option_sequence or term_option_sequence form; it is not in the scope of definitions within the pattern_case body, because it is used for non-matches instead of matches.
A description clause can provide a string that is used when multiple matches are found for the enclosing pattern_case. The string is expected to be a plural noun suitable to replace a generic “options” in an error message.
> def f_2:
list_proc:
~min_args: 2
> f_2(1, 2)
[1, 2]
> f_2(1, 2, 3)
fun: wrong number of arguments in function call
expected: 2
given: 3
> def f_2_3:
list_proc:
~min_args: 2
~max_args: 3
> f_2_3(1, 2, 3)
[1, 2, 3]
> list_proc:
~min_args: 2
~min_args: 3
list_proc: mulitple uses of option not allowed
expression
There’s no difference in result between using '…' or () after Syntax.literal.
Metadata, such as raw source text, is preserved for the term sequence. Most scopes are also preserved, but the syntax object’s scope sets are pruned to omit the scope for any binding form that appears between the Syntax.literal form and the enclosing top-level context, module body, or phase level crossing, whichever is closer.
> Syntax.literal 'x'
'x'
> Syntax.literal (x)
'x'
> Syntax.literal '1 ... 2'
'1 ... 2'
> Syntax.literal '$ $ $'
'$ $ $'
expression
> Syntax.make(1.0)
'1.0'
> Syntax.make([#'parens, '1.0', '2', '"c"'])
'(1.0, 2, "c")'
> Syntax.make([#'alts, ': result1', ': result2'])
'| result1
| result2'
> Syntax.make(['1.0', '2', '"c"'])
Syntax.make: invalid as a shrubbery term representation
value: [’1.0’, ’2’, ’"c"’]
function
> Syntax.make_group([1.0, 2, "c"])
'1.0 2 "c"'
> Syntax.make_group(['if', 'test', [#'alts, ': result1', ': result2']])
'if test
| result1
| result2'
> Syntax.make_group(['1 2'])
Syntax.make_group: invalid as a shrubbery term representation
value: ’1 2’
> Syntax.make_sequence(['1 2 3', 'a b'])
'1 2 3
a b'
> Syntax.make_op(#'#{+})
'+'
function
> Syntax.make_id("hello" +& 7, 'here')
'hello7'
function
Unless keep_name is true, the name argument can be any value, and the name of the generated identifier may be derived from name for debugging purposes (especially if it is a string, symbol, or identifier). If keep_name is true, the name argument must be an identifier, symbol, or (readable) string, and the result identifier has exactly the given name.
> Syntax.make_temp_id("hello")
'hello12'
> Syntax.make_temp_id("hello", ~keep_name: #true)
'hello'
> Syntax.unwrap('1.0')
1.0
> Syntax.unwrap('(a, "b", ~c)')
[#'parens, 'a', '"b"', '~c']
> Syntax.unwrap(': b; c')
[#'block, 'b', 'c']
> Syntax.unwrap('| a | b')
[#'alts, ': a', ': b']
> Syntax.unwrap('1 2 3')
Syntax.unwrap: multi-term syntax not allowed in term context
syntax: ’1 2 3’
> Syntax.unwrap_op('+')
#'#{+}
Following the usual coercion conventions, a term syntax object for stx is acceptable as a group syntax object.
> Syntax.unwrap_group('1.0')
['1.0']
> Syntax.unwrap_group('1 2 3')
['1', '2', '3']
> Syntax.unwrap_group('a: b; c')
[
'a',
':
b
c'
]
> Syntax.unwrap_group('1; 2; 3')
Syntax.unwrap_group: multi-group syntax not allowed in group context
syntax:
’1
2
3’
Following the usual coercion conventions, a term or group syntax object for stx is acceptable as a multi-group syntax object.
> Syntax.unwrap_sequence('1.0')
['1.0']
> Syntax.unwrap_sequence('1 2 3')
['1 2 3']
> Syntax.unwrap_sequence('1; 2; 3')
['1', '2', '3']
> Syntax.unwrap_all('(1 + 2)')
[#'parens, [#'group, 1, [#'op, #'#{+}], 2]]
> Syntax.name_to_symbol('apple')
#'apple
> Syntax.name_to_symbol('+')
#'#{+}
> Syntax.name_to_symbol('fruit.apple')
#'#{fruit.apple}
> Syntax.name_to_symbol('fruit.(++)')
#'#{|fruit.(++)|}
method
Syntax-object metadata exists at both term and group layers, and it exists separately at each layer for a group that contains a single term. The Syntax.relocate method uses and adjusts term-level metadata, while the Syntax.relocate_group method uses and adjusts group-level metadata. A group does not have a source location independent of its content, so Syntax.relocate_group does not accept a Srcloc as to.
When a term is a parentheses, brackets, braces, quotes, block, or alternatives form, metadata is specifically associated with the leading tag in the underlying representation of the form. In the case of a single-term operator, metadata is taken from the operator token, not the op tag. For a group syntax object, metadata is associated with the group tag in its underlying representation.
See also Syntax.property and Syntax.group_property for accessing or updating specific properties within metadata.
method
The Syntax.relocate_ephemeral_span function accepts any syntax object, which can be a term, group, or multi-group sequence. It attaches metadata to the syntax object in a way that may get lost if the syntax object is deconstructed or adjusted in any way. This mode is intended for communicating source information from a macro expansion in the case that it cannot be inferred automatically.
All three functions add an immediate, ephemeral #'relocated syntax property to the result syntax object, which overrides any default automatic relocation, such as by expr.macro.
function
method
A raw-text prefix or suffix is preserved in the result only when keep_prefix or keep_suffix is true, respectively. If as_inner is true, then an “inner” prefix or suffix is preserved independent of keep_prefix or keep_suffix; typically, an inner prefix corresponds to @ to start at-expression notation before a term.
method
Source text is represented as a tree built of Pairs, PairList.empty, and strings, where the in-order concatenation of the strings forms the source text.