#lang rhombus/scribble/manual
@(import:
    "common.rhm" open
    "../macro.rhm")

@(def bind_eval = macro.make_macro_eval())

@title(~tag: "bind-macro-protocol"){Binding Low-Level Protocol}

Binding macros are typically written by expansion to existing forms as
described in @secref("bind-macro"), but the @rhombus(bind.macro) form also
supports a low-level protocol. A macro opts into the low-level protocol
by returning a result built with @rhombus(bind_meta.pack). To support
patterns, repetitions, annotations with nested expressions, static
information, @rhombus(let) versus @rhombus(def), and more, the low-level
binding protocol is considerably more complicated than other Rhombus
extension contexts.

A binding form using the low-level protocol has five parts:

@itemlist(

 @item{A compile-time function to report ``upward'' @tech{static
  information} about the variables that it binds. The function receives
 ``downward'' information provided by the context, such as static
 information inferred for the right-hand side of a @rhombus(def) binding
 or imposed by enclosing binding forms. If a binding form has subforms,
 it can query those subforms, pushing information ``downward'' to each
 subform and receiving the subform's information ``upward''.}

 @item{A compile-time ``oncer'' function that generates definitions to
 be evaluated once each time the binding form is reached. Typically,
 one-time definitions wrap an expression within a binding form, such as
 expressions in @rhombus(Any.of, ~annot) associated with a binding by the
 @rhombus(::, ~bind) operator; see @secref("annotation-satisfying") for
 more discussion. The matcher step (described next) can directly refer to
 definitions generated by the oncer step.}

 @item{A compile-time ``matcher'' function that generates code that
 checks the input value for a match. The generated block may include
 additional definitions before branching or after branching toward
 success, but normally no bindings visible to code around the binding
 should be created at this step. Also, any action that commits to a
 binding match should be generated by the committer or binder
 functions (in the next bullets). A matcher communicates to the committer
 and binder functions through identifiers that are explicitly designated as
 evidence for the match.}

 @item{A compile-time ``committer'' function that generates intermediate
 definitions, possibly using evidence recorded by the matcher. These
 definitions happen only after the match is
 successful, if at all, and the bindings are visible only after the
 matching part of the expansion. These bindings are not affected by
 @rhombus(let), however, and so they should not include user-supplied
 names. Often, no intermediate definitions are needed, but they can be
 useful if some value needs to be computed once and referenced by
 multiple user-visible bindings (which does not work for definitions
 generated by the binder function if the bindings are scoped to only
 later definitions by @rhombus(let)). When a binding form is used in a
 match-only position, such as within a @rhombus(matching) annotation on
 the right-hand side of @rhombus(is_a), then the committer function is
 not used.}

 @item{A compile-time ``binder'' function that generates definitions for
 the bound variables (i.e., the ones described by the function in the
 first bullet above), possibly using evidence values from the matcher,
 and possibly using bindings created by the committer. These definitions
 happen only after the match is
 successful and after intermediate bindings that need a successful match.
 These bindings are the ones that are affected by @rhombus(let). The
 generated definitions do not need to attach static information reported
 by the first bullet's function; that information will be attached by the
 definition form that drives the expansion of binding forms. When a
 binding form is used in a match-only position, then the binder function
 is not used.}

)

The first of these functions, which produces static information, is
always called first, and its results might be useful to the last
four. Operationally, parsing a binding form gets the first function,
and then that function reports the other four along with the static
information that it computes.

To make binding work both in definition contexts and @rhombus(match)
search contexts, the check-generating function (third bullet above)
must be parameterized over the handling of branches. Toward that end, it
receives three extra arguments: the name of an @rhombus(if)-like form
that we'll call @rhombus(IF), a @rhombus(success) form, and a
@rhombus(failure) form. The transformer uses the given @rhombus(IF) to
branch to a block that includes @rhombus(success) or just
@rhombus(failure). The @rhombus(IF) form must be used in tail position
with respect to the generated code, where the ``then'' part of an
@rhombus(IF) is still in tail position for nesting. The transformer must
use @rhombus(failure) and only @rhombus(failure) in the ``else'' part of
each @rhombus(IF), and it must use @rhombus(success) exactly once within
a ``then'' branch of one or more nested @rhombus(IF)s.
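
As a rough sketch of these rules, a matcher that needs two checks can nest
one @rhombus(IF) inside the ``then'' branch of another. The name
@rhombus(two_check_matcher) and the helper predicate @rhombus(is_ripe) are
hypothetical and used only for illustration; the argument order follows the
matcher protocol described below.

@rhombusblock(
  bind.matcher 'two_check_matcher($arg, $data, $IF, $success, $failure)':
    '$IF $arg is_a String
     | $IF is_ripe($arg)
       | $success
       | $failure
     | $failure'
)

Here @rhombus(success) appears exactly once, in the innermost ``then''
branch, and every ``else'' branch is just @rhombus(failure).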

Unfortunately, there's one more complication. The result of a macro
must be represented as syntax, even for a binding macro, and a function as
a first-class compile-time value should not be used as syntax. (Such
representations are sometimes called ``3-D syntax,'' and they're best
avoided.) So, a low-level binding macro must use a defunctionalized
representation of functions. That is, a parsed binding reports a
function name for its static-information compile-time function, plus
data to be passed to that function, and those two parts form a
closure. Among the static-information function's returns are names for
a once-generator, match-generator, commit-generator, and binding-generator function,
plus data to be passed to those functions (effectively: the fused
closure for those four functions).

In full detail, a low-level parsed binding result from a @rhombus(bind.macro)
transformer is represented as a syntax object with two parts:

@itemlist(

 @item{The name of a compile-time function that is bound with
  @rhombus(bind.infoer).}

 @item{Data for the @rhombus(bind.infoer)-defined function, packaged as
  a single syntax object. This data might contain parsed versions of other
  binding forms, for example.}

)

These two pieces are assembled into a parenthesized-tuple syntax object,
and then packed with the @rhombus(bind_meta.pack) function to turn it into
a valid binding expansion (to distinguish the result from a macro
expansion in the sense of producing another binding form).
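
To illustrate the distinction, the following sketch contrasts the two kinds
of result. The names @rhombus(text_like), @rhombus(text_like_low), and
@rhombus(text_like_infoer) are hypothetical, and the infoer itself is
omitted here.

@rhombusblock(
  // high-level: the result is another binding form, which is expanded further
  bind.macro 'text_like($id)':
    '$id :: String'

  // low-level: the result is a packed tuple of an infoer name and its data
  bind.macro 'text_like_low($id)':
    bind_meta.pack('(text_like_infoer, $id)')
)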

The function bound with @rhombus(bind.infoer) will receive two syntax
objects: a representation of ``downward'' static information and the
parsed binding's data. The result must be a single syntax object in
tuple form with the following ten parts:

@itemlist(

 @item{A string that is used for reporting a failed match. The string is
  used as an annotation, and it should omit information that is local to
  the binding. For example, when @rhombus(List.cons(x, y), ~bind) is used as a binding
  pattern, a suitable annotation string might be
  @rhombus("matching(List.cons(_, _))") to phrase the binding constraint as an
  annotation and omit local variable names being bound (which should not
  be reported to the caller of a function, for example, when an argument
  value in a call of the function fails to match).}

 @item{An identifier that is used as a name for the input value, at least
   to the degree that the input value uses an inferred name. For
   example, @rhombus(proc) as a binding form should cause its right-hand value
   to use the inferred name @rhombus(proc), if it can make any use of an
   inferred name.}

 @item{``Upward'' static information associated with the overall value for a
   successful match with the binding. This information is used by the
   @rhombus(matching) annotation operator, for example, as well as propagated
   outward by binding forms that correspond to composite data types.
   The information is independent of static information for individual
   names within the binding, but it should be the same as information
   for any binding that corresponds to the full matched value. For
   example, @rhombus(Posn(x, y)) binds @rhombus(x) and @rhombus(y), and it may not have any
   particular static information for @rhombus(x) and @rhombus(y), but a matching value
   has static information suitable for @rhombus(Posn), anyway; so, using
   @rhombus(p :: matching(Posn(_, _)), ~bind) makes @rhombus(p) have @rhombus(Posn)
   static information.}

 @item{A list of individual names that are bound by the overall binding,
   a description of valid uses for each name (e.g., as an expression, as
   a repetition of some depth), and ``upward'' static information for each name.
   For example, @rhombus(Posn(x, y)) as a binding pattern binds @rhombus(x) and @rhombus(y)
   as identifiers that can be used as expressions, and static information
   about @rhombus(x) and @rhombus(y) might come from annotations in the
   definition of @rhombus(Posn).
   The binder function (the fifth part of a binding form, described above)
   is responsible for actually binding each name and associating
   static information with it, but this summary of bindings enables cooperation
   with composite binding forms, such as those that create repetitions.}

 @item{The name of a compile-time oncer function that is bound with
   @rhombus(bind.oncer).}

 @item{The name of a compile-time matcher function that is bound with
   @rhombus(bind.matcher).}

 @item{A tree of evidence identifiers that are defined by the matcher
   and whose values should be accessible to the committer and binder
   functions. In common cases, these identifiers are provided back to the
   committer and binder functions unchanged; in other cases, the committer
   and binder are not used in the scope of matcher definitions, but the
   values of the evidence variables are transferred to other variables that
   are reported to the committer and binder for their use, instead.}

 @item{The name of a compile-time committer function that is bound with
   @rhombus(bind.committer).}

 @item{The name of a compile-time binder function that is bound with
   @rhombus(bind.binder).}

 @item{Data for the @rhombus(bind.oncer)-, @rhombus(bind.matcher)-,
  @rhombus(bind.committer)-, and @rhombus(bind.binder)-defined functions,
  packaged as a single syntax object.}

)

The functions bound with @rhombus(bind.oncer), @rhombus(bind.matcher),
@rhombus(bind.committer), and @rhombus(bind.binder) are called
with the syntax-object data from the last (tenth) tuple slot.
Before that argument, except for @rhombus(bind.oncer), the functions
also receive an identifier for the matcher's input.
The matcher additionally receives, after the data,
the @rhombus(IF) form name, a @rhombus(success) form, and a @rhombus(failure) form.
The committer and binder receive, between the input identifier and the data,
a tree of evidence identifiers,
possibly the ones directly bound by the matcher, or possibly replacement
identifiers (in the same shape as the original tree).

Here's a use of the low-level protocol to implement a @rhombus(fruit) pattern,
which matches only things that are fruits according to @rhombus(is_fruit):

@examples(
  ~eval: bind_eval
  ~defn:
    import:
      rhombus/meta open

    bind.macro 'fruit($id)':
      bind_meta.pack('(fruit_infoer,
                       // remember the id:
                       $id)')

    bind.infoer 'fruit_infoer($static_info, $id)':
      '("matching(fruit(_))",
        $id,
        // no overall static info:
        (),
        // `id` is bound,
        //  `~repet ()` means usable as repetition, and
        //  `()` means no static info:
        (($id, [~repet ()], ())),
        fruit_oncer,
        fruit_matcher,
        (), // no evidence needed
        fruit_committer,
        fruit_binder,
        // binder needs id:
        $id)'

    bind.oncer 'fruit_oncer($id)':
      ''

    bind.matcher 'fruit_matcher($arg, $id, $IF, $success, $failure)':
      '$IF is_fruit($arg)
       | $success
       | $failure'

    bind.committer 'fruit_committer($arg, (), $id)':
      ''

    bind.binder 'fruit_binder($arg, (), $id)':
      'def $id: $arg'

    fun is_fruit(v):
      v == "apple" || v == "banana"

  ~repl:
    def fruit(snack) = "apple"
    snack
    ~error:
      def fruit(dessert) = "cookie"
)
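
Since the matcher is parameterized over @rhombus(IF), @rhombus(success),
and @rhombus(failure), the same @rhombus(fruit) form should also be usable
where a failed match falls through to another case instead of raising an
error. A minimal sketch, assuming the definitions above (the function name
@rhombus(describe) is made up for illustration):

@rhombusblock(
  fun describe(v):
    match v
    | fruit(f): "a fruit"
    | ~else: "something else"
)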

The @rhombus(fruit) binding form assumes (without directly checking)
that its argument is an identifier, and its infoer discards static
information. Binding forms normally need to accommodate other, nested
binding forms, instead. A @rhombus(bind.macro) transformer
can receive already-parsed sub-bindings as
arguments, and the infoer function can use @rhombus(bind_meta.get_info) on
a parsed binding form to call its internal infoer function. The result
is packed static information, which can be unpacked into a tuple syntax
object with @rhombus(bind_meta.unpack_info). Normally,
@rhombus(bind_meta.get_info) should be called only once to avoid
exponential work with nested bindings, but @rhombus(bind_meta.unpack_info)
can be used any number of times.

As an example, here's an infix @rhombus(<&>) operator that is similar
to @rhombus(&&, ~bind). It takes two
bindings and makes sure a value can be matched to both. The binding
forms on either side of @rhombus(<&>) can bind variables. The
@rhombus(<&>) matcher is responsible for binding the input name that
each sub-binding expects before it deploys the corresponding matcher.
The only way to find out if a sub-binding matches is to call its
matcher, providing the same @rhombus(IF) and @rhombus(failure) that the
original matcher was given, and possibly extending the @rhombus(success)
form. A matcher must be used in tail position, and its
@rhombus(success) position is a tail position.

@examples(
  ~eval: bind_eval
  ~defn:
    bind.macro '$a <&> $b':
      bind_meta.pack('(anding_infoer,
                       ($a, $b))')

    bind.infoer 'anding_infoer($static_info, ($a, $b))':
      let a_info = bind_meta.get_info(a, static_info)
      let b_info = bind_meta.get_info(b, static_info)
      def '($a_ann, $a_name, ($a_s_info, ...), ($a_var_info, ...),
            $_, $_, $a_evidence, $_, $_, $_)':
        bind_meta.unpack_info(a_info)
      let '($b_ann, $b_name, ($b_s_info, ...), ($b_var_info, ...),
            $_, $_, $b_evidence, $_, $_, $_)':
        bind_meta.unpack_info(b_info)
      let ann:
        "and("
          +& Syntax.unwrap(a_ann) +& ", " +& Syntax.unwrap(b_ann)
          +& ")"
      '($ann,
        $a_name,
        ($a_s_info, ..., $b_s_info, ...),
        ($a_var_info, ..., $b_var_info, ...),
        anding_oncer,
        anding_matcher,
        ($a_evidence, $b_evidence),
        anding_committer,
        anding_binder,
        ($a_info, $b_info))'

    bind.oncer 'anding_oncer(($a_info, $b_info))':
      let '($_, $_, $_, $_,
            $a_oncer, $_, $_, $_, $_, $a_data)':
        bind_meta.unpack_info(a_info)
      let '($_, $_, $_, $_,
            $b_oncer, $_, $_, $_, $_, $b_data)':
        bind_meta.unpack_info(b_info)
      '$a_oncer($a_data)
       $b_oncer($b_data)'

    bind.matcher 'anding_matcher($in_id, ($a_info, $b_info),
                                 $IF, $success, $failure)':
      let '($_, $_, $_, $_,
            $_, $a_matcher, $_, $_, $_, $a_data)':
        bind_meta.unpack_info(a_info)
      let '($_, $_, $_, $_,
            $_, $b_matcher, $_, $_, $_, $b_data)':
        bind_meta.unpack_info(b_info)
      '$a_matcher($in_id, $a_data, $IF,
                  $b_matcher($in_id, $b_data, $IF, $success, $failure),
                  $failure)'

    bind.committer 'anding_committer($in_id,
                                     ($a_evidence, $b_evidence),
                                     ($a_info, $b_info))':
      let '($_, $_, $_, $_,
            $_, $_, $_, $a_committer, $_, $a_data)':
        bind_meta.unpack_info(a_info)
      let '($_, $_, $_, $_,
            $_, $_, $_, $b_committer, $_, $b_data)':
        bind_meta.unpack_info(b_info)
      '$a_committer($in_id, $a_evidence, $a_data)
       $b_committer($in_id, $b_evidence, $b_data)'

    bind.binder 'anding_binder($in_id,
                               ($a_evidence, $b_evidence),
                               ($a_info, $b_info))':
      let '($_, $_, $_, $_,
            $_, $_, $_, $_, $a_binder, $a_data)':
        bind_meta.unpack_info(a_info)
      let '($_, $_, $_, $_,
            $_, $_, $_, $_, $b_binder, $b_data)':
        bind_meta.unpack_info(b_info)
      '$a_binder($in_id, $a_evidence, $a_data)
       $b_binder($in_id, $b_evidence, $b_data)'
  ~repl:
    def one <&> 1 = 1
    one
    ~error:
      def two <&> 1 = 2
  ~defn:
    class Posn(x, y)
  ~repl:
    def Posn(0, y) <&> Posn(x, 1) = Posn(0, 1)
    x
    y
)

One subtlety here is the syntactic category of @rhombus(IF) for a matcher
call. The @rhombus(IF) form might be a definition form, or it might be
an expression form, and a matcher is expected to work in either case, so
a matcher call's category is the same as @rhombus(IF). An @rhombus(IF)
alternative is written as a block, as is a @rhombus(success) form, but
the block may be inlined into a definition context.

The @rhombus(<&>) infoer is able to just combine any names and
``upward'' static information that it receives from its argument bindings,
and it can simply propagate ``downward'' static information. When a
binding operator reflects a composite value with separate binding forms
for component values, then upward and downward information need to be
adjusted accordingly.


@(close_eval(bind_eval))
