Ben Greenman's Publications

A Misconception-Driven Adaptive Tutor for Linear Temporal Logic (CAV 2025)

Siddhartha Prasad, Ben Greenman, Tim Nelson, and Shriram Krishnamurthi
[pdf] [bibtex] [slides] [doi] [artifact] [blog post]

(tool paper) We have built an automated, adaptive tutoring system that is grounded in several years of research on misconceptions in Linear Temporal Logic (LTL). The tutor is free to use online:
https://ltl-tutor.xyz/

Lightweight Diagramming for Lightweight Formal Methods (ECOOP 2025)

Siddhartha Prasad, Ben Greenman, Tim Nelson, and Shriram Krishnamurthi
[pdf] [CnD] [bibtex] [doi] [slides] [artifact]

Model finders such as Alloy and Forge discover instances of a datatype specification and counterexamples to conjectures. Instances can help users debug their designs, but only if they are properly understood. This paper introduces a visualizer called CnD that avoids misleading diagrams by default and offers a low-code way to communicate domain-specific constraints. As an interactive tool for incrementally refining default graphs, CnD helps shed light on ``unknown unknowns'' in the early stages of a specification design.

Toward a Corpus Study of the Dynamic Gradual Type (HATRA 2024)

Dibri Nsofor and Ben Greenman
[pdf] [bibtex] [slides] [blog post]

We are seeking the reason behind dynamic (any) types across thousands of Python projects. This paper begins with two hundred sample projects and presents: the design of a corpus study, eight patterns of dynamic use, and plans to identify the patterns automatically.

Type Tailoring (ECOOP 2024)

Ashton Wiersdorf, Stephen Chang, Matthias Felleisen, and Ben Greenman
[pdf] [bibtex] [doi] [artifact] [blog post]

Type systems evolve too slowly to keep up with evolving libraries. Type tailoring offers a lightweight solution: let end users modify the elaboration of surface code into the internal language of the typechecker. This paper introduces type tailoring as a theme that has emerged across programming languages, it analyzes the core technical dimensions that make tailoring work, and it outlines a vision for first-class language support.

Misconceptions in Finite-Trace and Infinite-Trace Linear Temporal Logic (FM 2024)

Ben Greenman, Siddhartha Prasad, Antonio Di Stasio, Shufang Zhu, Giuseppe De Giacomo, Shriram Krishnamurthi, Marco Montali, Tim Nelson, and Milda Zizyte
[pdf] [bibtex] [doi] [artifact] [google drive] [slides] [img] [blog post]
Instruments on Qualtrics: [beginner] [LTL] [LTLf]
Instrument PDFs: [beginner] [LTL] [LTLf]

With the growing use of temporal logics, it is critical that users have a clear understanding of what a specification means. We introduce two test instruments, a dataset of over 3,000 responses, and a refined catalog of misconceptions that cover both infinite-trace LTL and finite-trace LTLf. The catalog is of interest to formula authors and maintainers, and to tool builders, who can use it to identify mistakes. The test instruments are suitable for classroom teaching or self-study.

Forge: A Tool and Language for Teaching Formal Methods (OOPSLA 2024)

Tim Nelson, Ben Greenman, Siddhartha Prasad, Tristan Dyer, Ethan Bove, Qianfan Chen, Charles Cutting, Thomas Del Vecchio, Sidney LeVine, Julianne Rudner, Ben Ryjikov, Alexander Varga, Andrew Wagner, Luke West, and Shriram Krishnamurthi
[pdf] [bibtex] [doi] [blog post]

Forge brings lightweight formal methods to experienced programmers. It is deeply inspired by Alloy and has been refined through four semesters of teaching Forge. The key additions are a gradual progression of languages, a toolkit for domain-specific visualizations, and a form of unit testing for logical predicates.

Privacy-Respecting Type Error Telemetry at Scale (Programming 8.3, 2024)

Ben Greenman, Alan Jeffrey, Mitesh Shah, and Shriram Krishnamurthi
[pdf] [slides] [bibtex] [doi] [blog post]

Lessons on the usability and gradual adoption of the Luau type system derived from months of telemetry data. The telemetry does not reveal any source code or precise error outputs, which makes it useful for other programming systems but difficult to analyze.

Conceptual Mutation Testing for Student Programming Misconceptions (Programming 8.2, 2024)

Siddhartha Prasad, Ben Greenman, Tim Nelson, and Shriram Krishnamurthi
[pdf] [slides] [bibtex] [doi] [blog post]

How can instructors come up with buggy solutions to a problem that match common student misconceptions? This paper introduces a semi-auto, semi-manual process for distilling misconceptions from actual solution attempts by students. A key idea is to cluster attempts by specific problem characteristics.

Generating Programs Trivially: Student Use of Large Language Models (CompEd 2023)

Siddhartha Prasad, Ben Greenman, Tim Nelson, and Shriram Krishnamurthi
[pdf] [bibtex] [doi] [blog post]

We gave students full, in-IDE access to GPT-3 during a course on system specification. Did GPT make the course trivial? No. In fact, students rarely used the extension. They preferred to do their own work (!). Our VSCode extension is available [link]. We hope other educators try similar experiments so that we can all learn how to best use LLMs to improve classroom learning.

How Profilers Can Help Navigate Type Migration (OOPSLA 2023)

Ben Greenman, Matthias Felleisen, and Christos Dimoulas
[pdf] [bibtex] [artifact] [v2 slides] [v1 slides] [code] [youtube]

Can off-the-shelf profiling tools lead the way from a slow gradually-typed program to a faster one, in a 3D type migration lattice? For now, probably not! We explored over 1 million configurations and the best-case strategy succeeded in only 50% of scenarios and only when allowing two performance regressions. On the bright side, our large-scale simulation experiment shows that the rational programmer method is useful beyond contracts and error messages. Special thanks to Cloudlab for free single-user machines to collect data on.

Rhombus: A New Spin on Macros without All the Parentheses (OOPSLA 2023)

Matthew Flatt, Taylor Allred, Nia Angle, Stephen De Gabrielle, Robert Bruce Findler, Jack Firth, Kiran Gopinathan, Ben Greenman, Siddhartha Kasivajhula, Alex Knauth, Jay McCarthy, Sam Phillips, Sorawee Porncharoenwase, Jens Axel Søgaard, and Sam Tobin-Hochstadt,
[pdf] [bibtex] [artifact]

Rhombus brings language-oriented programming out of the Lisp world and into a normal (Python-like) syntax. The key features that make it possible are: Shrubbery notation, multiple spaces for expansion, unified pattern languages, and a protocol for static information.

How to Evaluate Blame for Gradual Types, Part 2 (ICFP 2023)

Lukas Lazarek, Ben Greenman, Matthias Felleisen, and Christos Dimoulas
[pdf] [bibtex]

Is blame helpful when the code is correct but the types are wrong?

FlowFPX: Nimble Tools for Debugging Floating-Point Exceptions (JuliaCon 2023)

Taylor Allred, Xinyi Li, Ashton Wiersdorf, Ben Greenman, and Ganesh Gopalakrishnan
[pdf] [arxiv] [slides] [bibtex] [doi] [youtube] [website]

Reliable numerical computations are central to scientific computing, but the floating-point arithmetic that makes computations run quickly is unreliable. What to do when a NaN appears? Call FlowFPX: a toolkit for tracking, fuzzing, and visualizing exception flows.

Photo from JuliaCon 2023 in Cambridge, MA.

GTP Benchmarks for Gradual Typing Performance (ACM REP 2023)

Ben Greenman
[pdf] [html] [slides] [bibtex] [artifact]

The GTP Benchmark Suite combines practical programs, a domain-specific experimental method, and software for reproducibility. Since 2014 the benchmarks have supported research in gradual typing performance and related topics including type migration and debugging.

Typed--Untyped Interactions: A Comparative Analysis (TOPLAS 2023)

Ben Greenman, Christos Dimoulas, and Matthias Felleisen
[pdf] [bibtex] [supplement]

The literature presents many strategies for enforcing the integrity of types when typed code interacts with untyped code. This paper presents a uniform evaluation framework that characterizes the differences among some major existing semantics for typed--untyped interaction. Type system designers can use this framework to analyze the guarantees of their own dynamic semantics.

Little Tricky Logic: Misconceptions in the Understanding of LTL (Programming 7.2, 2023)

Ben Greenman, (S)am Saarinen, Tim Nelson, and Shriram Krishnamurthi
[pdf] [slides] [bibtex] [artifact] [google sheet] [blog post] [quick survey instruments]

It may be folk knowledge that LTL is tricky, but is it really, in what ways, and what can we do about it? This paper offers initial answers based on four survey rounds; in particular, a code book of misconceptions and three test instruments.

Gradual Soundness: Lessons from Static Python (Programming 7.1, 2023)

Kuang-Chen Lu, Ben Greenman, Carl Meyer, Dino Viehland, Aniket Panse, and Shriram Krishnamurthi
[pdf] [slides] [bibtex] [artifact] [src] [blog post] [SH] [youtube]

Static Python is a sound gradual type system for Python by Instagram that can quickly decide whether to allow an untyped value into typed code. How did they do it? With a carefully-designed type system that incorporates Shallow, Concrete, and Primitive types to allow a tradeoff between strong type guarantees and easy gradual migrations.

This collaboration between Brown and Meta started from an X post by Guido. Big thanks to Guillaume Duboc for rediscovering the post in 2024!

Making Hay from Wheats: A Classsourcing Method to Identify Misconceptions (Koli Calling 2022)

Siddhartha Prasad, Ben Greenman, Tim Nelson, John Wrenn, and Shriram Krishnamurthi
[pdf] [talk] [bibtex] [doi] [blog post] [art]

Idea: collecting input/output examples from students and analyzing the incorrect examples can help us teach better. We use reference implementations called wheats to classify examples as incorrect. The classification is the starting point for our method, hence we're making a good thing from wheats. Hay!

The art is by MidJourney AI. Thanks Koli Calling! Love the 6-fingered dude.

Deep and Shallow Types for Gradual Languages (PLDI 2022)

Ben Greenman
[pdf] [supplement] [slides] [bibtex] [src] [SH]

Deep gradual types offer compositional guarantees but depend on expensive higher-order contracts. Shallow types enforce only local properties, but can be implemented with first-order checks. This paper presents a language design that supports both deep and shallow types to utilize their complementary strengths. Coming soon to Typed Racket: https://github.com/racket/typed-racket/pull/948

Types for Tables: A Language Design Benchmark (Programming 6.2, 2022)

Kuang-Chen Lu, Ben Greenman, and Shriram Krishnamurthi
[pdf] [supplement] [bibtex] [src]

The Brown Benchmark for Table Types (B2T2) provides a focus for research on type systems for tabular programming via five components: (1) a definition of tables; (2) a set of example datasets; (3) an API of tabular operations with detailed type constraints; (4) a set of short programs that use the table operations, each of which raises specific typing questions; and (5) a set of buggy programs with explanations that describe the exact bug in the program.

A Transient Semantics for Typed Racket (Programming 6.2, 2022)

Ben Greenman, Lukas Lazarek, Christos Dimoulas, and Matthias Felleisen
[pdf] [bibtex] [slides] [youtube] [private src]

This paper reports on the surprising difficulties of adapting the Transient semantics of Reticulated Python to the rich migratory type system of Typed Racket. The resulting implementation, Shallow Typed Racket, is faster than the standard Deep Typed Racket but only when the Transient blame assignment strategy is disabled. For language designers, this report provides valuable hints on how to equip an existing compiler to support a Transient semantics. For theoreticians, the negative experience with Transient blame calls for a thorough investigation of this strategy.

How to Evaluate Blame for Gradual Types (ICFP 2021)

Lukas Lazarek, Ben Greenman, Matthias Felleisen, and Christos Dimoulas
[pdf] [bibtex] [youtube]

Is blame information helpful? We generated programs with type boundary errors and simulated a rationally-acting programmer to find out. The results confirm the theory, but only barely.

Deep and Shallow Types (Ph.D. dissertation, December 2020)

Ben Greenman
[pdf] [bibtex] [slides] [youtube] [src] [NEU DRS pdf] [ProQuest]

Presents four contributions: (1) a systematic method to evaluate the performance of a mixed-typed language; (2) formal properties that measure the strength of types; (3) a scaled-up Transient semantics; and (4) the design and implementation of a language that combines Deep types (complete monitoring), Shallow types (type soundness), and untyped code. The combination helps avoid many worst-of-both-worlds situations that arise with Deep types only or Shallow types only.

Complete Monitors for Gradual Types (OOPSLA 2019)

Ben Greenman, Matthias Felleisen, and Christos Dimoulas
[pdf] [supplement] [bibtex] [slides] [youtube] [src]

Adapts complete monitoring into a (syntactic, operational) proof technique for migratory typing. A semantics satisfies complete monitoring iff it protects all channels of communication between typed and untyped code. Introduces blame soundness and blame completeness to test whether systems that fail complete monitoring can identify exactly the codes responsible for a run-time type mismatch.

How to Evaluate the Performance of Gradual Type Systems (JFP 2019)

Ben Greenman, Asumu Takikawa, Max S. New, Daniel Feltey, Robert Bruce Findler, Jan Vitek, and Matthias Felleisen
[pdf] [bibtex] [slides] [video abstract] [mp4 source]

Archival version of Is Sound Gradual Typing Dead?. Evaluates the relative performance of different gradual typing systems for the same language, and introduces a method based on simple random sampling to approximate the number of D-deliverable configurations in a program.

Note: The data for the zordoz benchmark is flawed; in fact, there is little improvement between v6.2 and v6.4 on this benchmark. For details, see the release notes for version 3 of the GTP benchmark suite.

The Behavior of Gradual Types: A User Study (DLS 2018)

Preston Tunnell Wilson, Ben Greenman, Justin Pombrio and Shriram Krishnamurthi
[pdf] [bibtex] [slides] [data]

Presents the results of surveying professional developers, computer science students, and Mechanical Turk workers on their preferences between three gradual typing semantics.

Collapsible Contracts: Fixing a Pathology of Gradual Typing (OOPSLA 2018)

Daniel Feltey, Ben Greenman, Christophe Scholliers, Robert Bruce Findler, and Vincent St-Amour
[pdf] [bibtex]

Design and evaluation of ``space-efficient'' contracts for Racket. Really the contracts are time and space efficient because they are collapsible (i.e. support a merge operation). The implementation is based on Michael Greenberg's model of eidetic contracts.

A Spectrum of Type Soundness and Performance (ICFP 2018)

Ben Greenman and Matthias Felleisen
[pdf] [supplement] [bibtex] [slides] [youtube] [src] [redex models] [artifact]

Compares the theory, performance, and implications (for a programmer) of three approaches to adding types to a dynamically-typed language.

Update 2019-11-13: as it turns out, the paper describes a complete spectrum in ye olde "data processing" sense (Kelly-Bootle, Stan. The Devil's DP Dictionary. McGraw-Hill Inc. 1981).

spectrum n. plural spectrums; mandatory DP usage wide spectrum(s). [Latin spectrum "an apparition."] Any range of one or two items or features.

† A range of from three to seven features (the legal maximum) is known as a complete or total spectrum. Ranges of eight or more are called unfair trading.

On the Cost of Type-Tag Soundness (PEPM 2018)

Ben Greenman and Zeina Migeed
[pdf] [bibtex] [slides] [script] [poster] [artifact]

Systematically evaluates the performance of Reticulated. Introduces a method for approximating the number of D-deliverable configurations in a benchmark through simple random sampling. Validates the method with: (1) a tiny bit of statistics, and (2) by comparing some approximations to the ground truth.

Migratory Typing: 10 Years Later (SNAPL 2017)

Sam Tobin-Hochstadt, Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Ben Greenman, Andrew M. Kent, Vincent St-Amour, T. Stephen Strickland, Asumu Takikawa
[pdf] [bibtex]

Reflections on Typed Racket.

Type Systems as Macros (POPL 2017)

Stephen Chang, Alex Knauth, Ben Greenman.
[pdf] [bibtex] [artifact]

Type systems enforce a syntactic discipline on programmers (language users). Macros systems let programmers (language designers) change the syntax of their language. This paper argues that a macro system is an ideal vehicle for implementing type systems.

Is Sound Gradual Typing Dead? (POPL 2016)

Asumu Takikawa, Daniel Feltey, Ben Greenman, Max S. New, Jan Vitek, Matthias Felleisen.
[pdf] [bibtex] [poster] [artifact]

Proposes a method for evaluating the performance of gradual type systems.

Position Paper: Performance Evaluation for Gradual Type Systems (STOP 2015)

Asumu Takikawa, Daniel Feltey, Ben Greenman, Max S. New, Jan Vitek, Matthias Felleisen.
[pdf] [bibtex] [slides]

Gradual type systems need comprehensive performance evaluation, and this evaluation needs to benchmark all ways of gradually typing example programs. Just checking the fully-untyped and fully-typed versions of a program is not enough!

Getting F-Bounded Polymorphism into Shape (PLDI 2014)

Ben Greenman, Fabian Muehlboeck, Ross Tate (and wisdom from the Ceylon team).
[pdf] [bibtex] [poster] [teaser] [slides]

We show how to achieve decidable subtyping in object-oriented languages and argue that our approach is backwards compatible with existing Java projects.