posted by Ben Greenman
Type tailoring is a technique for adding domain-specific type checkers to a typed host language. Using the Type Systems as Macros approach to building typed languages, implementing type tailoring in Typed Racket is straightforward. Any library can apply the core idea, and you can try programming with type tailorings by downloading the
trivial package (requires Racket v6.4 or later).
Background: Programming the Type Elaborator
Many typed languages include both a type checker and a type elaborator. The type elaborator translates source code to an explicitly typed representation for the type checker to validate. Normally, programmers cannot extend the behavior of the type elaborator without modifying the compiler.
In Typed Racket, the Racket macro expander serves as a type elaborator. A fully-expanded Typed Racket program is valid input to the type checker. This design makes Typed Racket macro-extensible; programmers can write their own macros to grow the syntax of the language. More on this in a minute.
Recently, Stephen Chang observed that individual macros can implement the same pipeline as Typed Racket — macros can expand their sub-terms and perform static checks on the results. Such macros are essentially type rules, and a well-designed library of such macros is essentially a type system. Hence Type Systems as Macros: a library for building typed, domain-specific languages (DSLs) using the Racket macro system.
Type tailoring is programming a language’s type elaborator to implement domain-specific typing rules on top of an existing type system. Any language that lets programmers extend the behavior of its type elaborator supports type tailoring. In particular, Typed Racket programmers can use the Type Systems as Macros technique to implement tailored APIs as macros. These macros can statically check properties of source code, communicate the results of their checks to other macros, and expand to type annotated code for the Typed Racket type checker.
As we will see, type tailoring can even extend a “conventional” typed language with static checks that would otherwise require dependent types.
Case Study: Regular Expressions
regexp-match in Racket
Racket’s regular expression library will match a string, path, byte string, or input port against a given pattern.
The result of matching a pattern against an input source is
#f if no part of the input matched the pattern. Otherwise, the result is a pair containing:
- the subsequence of the input that matched the pattern, and
- a list of subsequences that were captured by parenthesized sub-patterns in the pattern (capture groups).
Note that a capture group may fail to capture any subsequence. If this happens, the list of subsequences contains
#f in the corresponding position.
regexp-match in Typed Racket
Typed Racket includes a type signature for
regexp-match that accepts the same variety of input sources, but conservatively rejects some “correct” uses of the output.
$ racket -I typed/racket > (regexp-match "(a*)b" "aaabbb") ; - : (U False (Pairof String (Listof (U False String)))) ; '("aaab" "aaa") > (let ([m (regexp-match "(a*)b" "aaabbb")]) (when m (string-length (second m)))) ; Type Checker: Polymorphic function `second' could not be applied to arguments: ; .... ; Arguments: (Pairof String (Listof (U False String))) ; Expected result: String ; in: (second m)
The issue is that
string-length expects a string, but Typed Racket reasons that the call to
second might return a string or false. One solution is to guard the value
(second m) with a dynamic check. Another is to apply type tailoring to
We can implement a type-tailored interface to
regexp-match with a macro that:
- checks whether its first argument is a string literal
- counts the number of matched parentheses in the string literal
- expands to code designed to convince Typed Racket that the result is a list with the proper number of arguments.
The third point is crucial, and demonstrates how type tailoring can translate program properties that are implicit in the source code to facts that the type checker can understand.
Here is the macro:
(define-syntax (smart-match stx) (syntax-parse stx [(_ pattern:str other-args ...) (define num-groups (count-groups (syntax-e #`pattern))) #`(let ([m (regexp-match pattern other-args ...)]) (if m (list (car m) #,@(for/list ([i (in-range num-groups)]) #`(or (list-ref m (+ 1 #,i)) (error 'smart-match)))) m))] [(_ args ...) #`(regexp-match args ...)] [_ #`regexp-match]))
Scroll to the bottom of this post for an implementation of
Quick guide to reading the macro:
syntax-parseis a pattern-matcher, the three clauses match possible uses of
- #` is a constructor for syntax objects
#,@(for/list ....)term escapes to Racket’s
for/listcombinator to build a list of syntax objects, then splices the generated syntax into the program — as if the programmer had written out a list with
And here is a “client side” use of the macro:
It’s exactly the result we want, and one renaming away from the code we want to write.
smart-match has a few obvious limitations. First, it reports an error if a capture group fails to capture a string. Second, it fails to tailor calls where the pattern is a byte string or regular expression literal. In the cases,
smart-match defaults to Typed Racket’s
regexp-match. Third, it fails to tailor calls where the pattern is a variable.
The first problem is easy to fix by changing
count-groups to return a list of
num-groups booleans encoding whether each capture group definitely or maybe captures a string when the overall match is successful. The second problem is also just a matter of engineering.
For the third problem, we can use the Type Systems as Macros technique of using the macro expander to replace bound occurrences of a variable with type-annotated variables. The domain-specific type we attach describes the capture groups; for example, given the variable declaration:
(define x #rx"([a-z]+)@gmail\\.com")
the type annotation is
'(#t) because it contains one capture group that is certain to capture a string when used in a successful
regexp-match. This annotation can be attached to occurrences of
x at expansion-time as a syntax property for
smart-match to extract.
trivialpackage includes a tailored
regexp-matchthat addresses all three issues. Note: the fix for the third issue changes the expansion-time meaning of
defineto infer and propagate domain-specific types.
Now that we’ve thoroughly criticized
smart-match, it’s worth pointing out its strengths.
smart-matchdoes not re-implement regular expression matching. It re-uses Racket’s existing, correct, and performant
smart-matchsupports the existing syntax for regular expressions.
smart-matchvalidates some programs that Typed Racket conservatively rejects.
smart-matchrejects some programs that Typed Racket would accept, yet evaluate to runtime errors — for instance,
(second (regexp-match "r" "racket")).
Lastly, the use of
list-ref exploits its conservative type to improve the conservative type of
regexp-match. Whether you find this use clever or brittle, it is a proof to Typed Racket that does not resort to type casts.
Implementing general type checker plugins that can convince the host type system of their correctness is a hard problem. I know of three high-quality type checker plugins for Haskell, and all three assert their correctness to GHC (
Comparison: Dependently Typed Haskell
At this year’s POPL, Stephanie Weirich gave a keynote about The Influence of Dependent Types on the Haskell type system. After a brief introduction, her fifth slide gave motivation for dependently typed Haskell: domain-specific type checkers.
The rest of the talk was a code walk of a dependently typed regular expression matcher! Stephanie’s regular expression syntax supports Python-style named capture groups, and a successful match returns a record with group names as keys. These keys are reflected in the return type. Of course the point of the talk was not regular expressions, but rather how some impressive GHC plugins compose to solve a practical problem.
What I want to point out, however, is that if the goal is “domain-specific type checkers”, type tailoring is a more direct solution. So if you are a programmer using Haskell or OCaml or Racket or Scala or Clojure or Java or any other language with a decent syntax extension system, you don’t need to wait for “dependently typed
X” to add more static checking to your library. Write a macro!
More Type Tailoring
There is a full implementation of type-tailored
regexp-match on the Racket package server. Try it out by installing the trivial-pkg package:
$ raco pkg install trivial
The package also includes tailorings for:
as well as syntax for defining new tailorings.
For further reading on type tailoring, and a sketch of how to prove the soundness of a tailoring, we have a draft paper:
The draft also reports on a small evaluation of our tailored
regexp-match on existing code. We searched the Racket distribution and libraries for uses of
regexp-match and found over 300 in untyped code. We then converted these uses to Typed Racket. After converting, most uses did not compile using Typed Racket’s
regexp-match, but swapping in the tailored
regexp-match (a 1-line change) removed the type errors in over 250 cases.
Here is a simple implementation for
count-groups. This file contains a more robust implementation (look for
(define-for-syntax (count-groups str) (define last-open-paren (box #f)) (define num-groups (for/fold ([num-groups 0]) ([c (in-string str)] [i (in-naturals)]) (case c [(#\() (set-box! last-open-paren i) num-groups] [(#\)) (if (unbox last-open-paren) (begin (set-box! last-open-paren #f) (+ 1 num-groups)) (error 'smart-match "unmatched ')' at position ~a in '~a'" i str))] [else num-groups]))) (if (unbox last-open-paren) (error 'smart-match "unmatched '(' at position ~a in '~a'" (unbox last-open-paren) str) num-groups))