Type Tailoring
posted by Ben Greenman
Type tailoring is a technique for adding domain-specific type checkers to a typed host language. Using the Type Systems as Macros approach to building typed languages, implementing type tailoring in Typed Racket is straightforward. Any library can apply the core idea, and you can try programming with type tailorings by downloading the trivial
package (requires Racket v6.4 or later).
Background: Programming the Type Elaborator
Many typed languages include both a type checker and a type elaborator. The type elaborator translates source code to an explicitly typed representation for the type checker to validate. Normally, programmers cannot extend the behavior of the type elaborator without modifying the compiler.
In Typed Racket, the Racket macro expander serves as a type elaborator. A fully-expanded Typed Racket program is valid input to the type checker. This design makes Typed Racket macro-extensible; programmers can write their own macros to grow the syntax of the language. More on this in a minute.
Recently, Stephen Chang observed that individual macros can implement the same pipeline as Typed Racket — macros can expand their sub-terms and perform static checks on the results. Such macros are essentially type rules, and a well-designed library of such macros is essentially a type system. Hence Type Systems as Macros: a library for building typed, domain-specific languages (DSLs) using the Racket macro system.
Type tailoring is programming a language’s type elaborator to implement domain-specific typing rules on top of an existing type system. Any language that lets programmers extend the behavior of its type elaborator supports type tailoring. In particular, Typed Racket programmers can use the Type Systems as Macros technique to implement tailored APIs as macros. These macros can statically check properties of source code, communicate the results of their checks to other macros, and expand to type annotated code for the Typed Racket type checker.
As we will see, type tailoring can even extend a “conventional” typed language with static checks that would otherwise require dependent types.
Case Study: Regular Expressions
regexp-match
in Racket
Racket’s regular expression library will match a string, path, byte string, or input port against a given pattern.
$ racket > (regexp-match "r" "racket") ; '("r") > (regexp-match "z" "racket") ; #f > (regexp-match "r|z" "racket") ; '("r") > (with-input-from-string "hello world" (λ () (regexp-match "hello (.*)$" (current-input-port)))) ; '(#"hello world" #"world")
The result of matching a pattern against an input source is #f
if no part of the input matched the pattern. Otherwise, the result is a pair containing:
- the subsequence of the input that matched the pattern, and
- a list of subsequences that were captured by parenthesized sub-patterns in the pattern (capture groups).
Note that a capture group may fail to capture any subsequence. If this happens, the list of subsequences contains #f
in the corresponding position.
$ racket > (regexp-match "(r)|(z)" "racket") ; '("r" "r" #f)
regexp-match
in Typed Racket
Typed Racket includes a type signature for regexp-match
that accepts the same variety of input sources, but conservatively rejects some “correct” uses of the output.
$ racket -I typed/racket > (regexp-match "(a*)b" "aaabbb") ; - : (U False (Pairof String (Listof (U False String)))) ; '("aaab" "aaa") > (let ([m (regexp-match "(a*)b" "aaabbb")]) (when m (string-length (second m)))) ; Type Checker: Polymorphic function `second' could not be applied to arguments: ; .... ; Arguments: (Pairof String (Listof (U False String))) ; Expected result: String ; in: (second m)
The issue is that string-length
expects a string, but Typed Racket reasons that the call to second
might return a string or false. One solution is to guard the value (second m)
with a dynamic check. Another is to apply type tailoring to regexp-match
!
Type Tailored regexp-match
We can implement a type-tailored interface to regexp-match
with a macro that:
- checks whether its first argument is a string literal
- counts the number of matched parentheses in the string literal
- expands to code designed to convince Typed Racket that the result is a list with the proper number of arguments.
The third point is crucial, and demonstrates how type tailoring can translate program properties that are implicit in the source code to facts that the type checker can understand.
Here is the macro:
(define-syntax (smart-match stx) (syntax-parse stx [(_ pattern:str other-args ...) (define num-groups (count-groups (syntax-e #`pattern))) #`(let ([m (regexp-match pattern other-args ...)]) (if m (list (car m) #,@(for/list ([i (in-range num-groups)]) #`(or (list-ref m (+ 1 #,i)) (error 'smart-match)))) m))] [(_ args ...) #`(regexp-match args ...)] [_ #`regexp-match]))
Scroll to the bottom of this post for an implementation of
count-groups
.
Quick guide to reading the macro:
syntax-parse
is a pattern-matcher, the three clauses match possible uses ofsmart-match
- #` is a constructor for syntax objects
- the
#,@(for/list ....)
term escapes to Racket’sfor/list
combinator to build a list of syntax objects, then splices the generated syntax into the program — as if the programmer had written out a list withnum-groups
elements
And here is a “client side” use of the macro:
(let ([m (smart-match "(a*)b" "aaabbb")]) (when m (string-length (second m)))) ; 3
It’s exactly the result we want, and one renaming away from the code we want to write.
Improving smart-match
Our smart-match
has a few obvious limitations. First, it reports an error if a capture group fails to capture a string. Second, it fails to tailor calls where the pattern is a byte string or regular expression literal. In the cases, smart-match
defaults to Typed Racket’s regexp-match
. Third, it fails to tailor calls where the pattern is a variable.
The first problem is easy to fix by changing count-groups
to return a list of num-groups
booleans encoding whether each capture group definitely or maybe captures a string when the overall match is successful. The second problem is also just a matter of engineering.
For the third problem, we can use the Type Systems as Macros technique of using the macro expander to replace bound occurrences of a variable with type-annotated variables. The domain-specific type we attach describes the capture groups; for example, given the variable declaration:
(define x #rx"([a-z]+)@gmail\\.com")
the type annotation is '(#t)
because it contains one capture group that is certain to capture a string when used in a successful regexp-match
. This annotation can be attached to occurrences of x
at expansion-time as a syntax property for smart-match
to extract.
The
trivial
package includes a tailoredregexp-match
that addresses all three issues. Note: the fix for the third issue changes the expansion-time meaning ofdefine
to infer and propagate domain-specific types.
Practical Benefits
Now that we’ve thoroughly criticized smart-match
, it’s worth pointing out its strengths.
smart-match
does not re-implement regular expression matching. It re-uses Racket’s existing, correct, and performantregexp-match
.smart-match
supports the existing syntax for regular expressions.smart-match
validates some programs that Typed Racket conservatively rejects.smart-match
rejects some programs that Typed Racket would accept, yet evaluate to runtime errors — for instance,(second (regexp-match "r" "racket"))
.
Lastly, the use of list-ref
exploits its conservative type to improve the conservative type of regexp-match
. Whether you find this use clever or brittle, it is a proof to Typed Racket that does not resort to type casts.
Implementing general type checker plugins that can convince the host type system of their correctness is a hard problem. I know of three high-quality type checker plugins for Haskell, and all three assert their correctness to GHC (
type-nat-solver
,natnormalize
, anduom-plugin
).
Comparison: Dependently Typed Haskell
At this year’s POPL, Stephanie Weirich gave a keynote about The Influence of Dependent Types on the Haskell type system. After a brief introduction, her fifth slide gave motivation for dependently typed Haskell: domain-specific type checkers.
The rest of the talk was a code walk of a dependently typed regular expression matcher! Stephanie’s regular expression syntax supports Python-style named capture groups, and a successful match returns a record with group names as keys. These keys are reflected in the return type. Of course the point of the talk was not regular expressions, but rather how some impressive GHC plugins compose to solve a practical problem.
What I want to point out, however, is that if the goal is “domain-specific type checkers”, type tailoring is a more direct solution. So if you are a programmer using Haskell or OCaml or Racket or Scala or Clojure or Java or any other language with a decent syntax extension system, you don’t need to wait for “dependently typed X
” to add more static checking to your library. Write a macro!
More Type Tailoring
There is a full implementation of type-tailored regexp-match
on the Racket package server. Try it out by installing the trivial-pkg package:
$ raco pkg install trivial
The package also includes tailorings for:
- format strings
- fixed-length lists and vectors
- N-ary functions
- constant-folding arithmetic
as well as syntax for defining new tailorings.
For further reading on type tailoring, and a sketch of how to prove the soundness of a tailoring, we have a draft paper:
The draft also reports on a small evaluation of our tailored regexp-match
on existing code. We searched the Racket distribution and libraries for uses of regexp-match
and found over 300 in untyped code. We then converted these uses to Typed Racket. After converting, most uses did not compile using Typed Racket’s regexp-match
, but swapping in the tailored regexp-match
(a 1-line change) removed the type errors in over 250 cases.
Appendix: count-groups
Here is a simple implementation for count-groups
. This file contains a more robust implementation (look for parse-groups
).
(define-for-syntax (count-groups str) (define last-open-paren (box #f)) (define num-groups (for/fold ([num-groups 0]) ([c (in-string str)] [i (in-naturals)]) (case c [(#\() (set-box! last-open-paren i) num-groups] [(#\)) (if (unbox last-open-paren) (begin (set-box! last-open-paren #f) (+ 1 num-groups)) (error 'smart-match "unmatched ')' at position ~a in '~a'" i str))] [else num-groups]))) (if (unbox last-open-paren) (error 'smart-match "unmatched '(' at position ~a in '~a'" (unbox last-open-paren) str) num-groups))