Tutorial: Zero to Sixty in Racket

:: Racket, tutorial, by Ben Greenman

Racket is excellent for incrementally growing scripts into full-fledged programs. This post steps through the evolution of one small program and highlights the Racket tools that enable incremental advances.

Why should anyone use Racket? There are two reasons:
  1. You have a problem that can only be solved with Racket’s language-building tools

  2. Racket is a nice language to program in. (Has lexical scope, parentheses, active users...)

My favorite part of Racket is how it supports a certain development style of evolving scripts into programs. When I start coding (after design, before debugging), I can focus the problem at hand. Next come examples, unit tests, and types to be sure the solution is correct. Finally, I worry about aesthetics, efficiency, and how the solution can be used as a library in a larger context.

Bottom line: with Racket, my coding is aligned with my priorities. And as I transition from "no code" to "working code" to "robust code" to "re-usable code", the program is almost always runnable.

Problem: A KWIC Index Production System

A KWIC index system reads input from a file, divides each line of the file into whitespace-separated words, and outputs (in alphabetical order) all circular shifts of all lines.

The first circular shift of a line "A B C" is the line "B C A". The second circular shift is "C A B".

Building a KWIC index is a historical problem. According to D.L. Parnas (1972):

Except under extreme circumstances (huge data base, no supporting software) such a system could be implemented by a good programmer within a week or two.

See also: Yannis’s Law.

Today, I bet only Agda and Scratch programmers would need the full two weeks. We’ll be done in 20 minutes.

A Script

To start, open a file and type:

You can name the file anything, like kwic.rkt or rkt.kwic or foo. Racket doesn’t care, but it does need the #lang line to read the contents of the file.

Though, you should use the .rkt extension.

The first part of the solution is a function to read input from a file into a list of strings for further processing. The built-in function file->lines does exactly this, but we’ll for-loop instead.

(define (kwic-read filename)
  (with-input-from-file filename
    (λ ()
      (for/list ([line (in-lines)])
        line))))

When called with a filename like "heart-of-darkness.txt", the function uses for/list to build a list of lines by reading from a port with in-lines. The port is the data from filename, thanks to with-input-from-file.

Next is a function to convert a list of strings into a list of lists of words. Here we’ll just use library functions.

(define (kwic-split lines)
  (map string-split lines))

By default, string-split divides a string into a list of whitespace-separated substrings. You can always supply a different delimiter, or use regexp-split to divide by a regular expression.

Two tasks left! First we generate all circular shifts for a list of strings words by folding up a list with one shift of words for each word.

(define (circular-shift words)
  (append (rest words) (list (first words))))
 
(define (all-circular-shifts words)
  (for/fold ([all-shifts (list words)])
            ([i (in-range 1 (length words))])
    (cons (circular-shift (first all-shifts))
          all-shifts)))

Second, we alphabetize and print the shifts.

(define (alphabetize all-shifts)
  (sort all-shifts shift<?))
 
(define (shift<? shift1 shift2)
  (match* (shift1 shift2) ; destruct multiple values
   [('() _) ; first list empty, don't care about second
    #t]
   [(_ '()) ; first list non-empty, second empty
    #f]
   [((cons s1 shift1-rest) (cons s2 shift2-rest))
    (or (string<? s1 s2)
        (and (string=? s1 s2)
             (shift<? shift1-rest shift2-rest)))]))
 
(define (kwic-display all-sorted-shifts)
  (define (display-words words)
    (display (first words))
    (for ([word (in-list (cdr words))])
      (display " ")
      (display word))
    (newline))
  ; for-each is like map, but returns (void)
  (for-each display-words all-sorted-shifts))

Gluing it all together, here’s the full script (with type annotations in comments).

#lang racket
 
; type Words = (Listof String)
; type Lines = (Listof Words)
 
; Path-String -> (Listof String)
(define (kwic-read filename)
  (with-input-from-file filename
    (λ ()
      (for/list ([line (in-lines)])
        line))))
 
; (Listof String) -> Lines
(define (kwic-split lines)
  (map string-split lines))
 
; Words -> (Listof Words)
(define (all-circular-shifts words)
  (for/fold ([all-shifts (list words)])
            ([i (in-range 1 (length words))])
    (cons (circular-shift (first all-shifts))
          all-shifts)))
 
; Move first element to last position
; Words -> Words
(define (circular-shift words)
  (append (rest words) (list (first words))))
 
; Lines -> Lines
(define (alphabetize all-shifts)
  (sort all-shifts shift<?))
 
; Lexicographic order on equal-length lists of words
; Words Words -> Boolean
(define (shift<? shift1 shift2)
  (match* (shift1 shift2)
   [('() _) ; first list empty, don't care about second
    #t]
   [(_ '()) ; first list non-empty, second empty
    #f]
   [((cons s1 shift1-rest) (cons s2 shift2-rest))
    (or (string<? s1 s2)
        (and (string=? s1 s2)
             (not (null? shift1-rest))
             (shift<? shift1-rest shift2-rest)))]))
 
; Lines -> Void
(define (kwic-display all-sorted-shifts)
  (define (display-words words)
    (display (first words))
    (for ([word (in-list (cdr words))])
      (display " ")
      (display word))
    (newline))
  (for-each display-words all-sorted-shifts))
 
; Lines -> (Listof Lines)
(define (all-circular-shifts* lines)
  (map all-circular-shifts lines))
 
; Path-String -> Void
(define (kwic-index file-name)
  (define all-lines (kwic-split (kwic-read file-name)))
  (define all-shifts (append* (all-circular-shifts* all-lines)))
  (kwic-display (alphabetize all-shifts)))
 
; End-to-end test
; -> Void
(define (run-test)
  (define test-file "test.txt")
  ; Make a file and test
  (unless (file-exists? test-file)
    (with-output-to-file test-file
      (λ ()
        (displayln "imagine if this")
        (displayln "took 2 weeks to write"))))
  (kwic-index test-file))
 
(run-test)

Running the file should print:

2 weeks to write took
if this imagine
imagine if this
this imagine if
to write took 2 weeks
took 2 weeks to write
weeks to write took 2
write took 2 weeks to

Testing and Submodules

Any top-level expressions in a file can work as unit tests. The equal? statement below checks whether the first circular shift of '("A" "B" "C") is '("B" "C" "A").

(define (circular-shift words)
  (append (rest words) (list (first words))))
 
(equal? (circular-shift '("A" "B" "C")) '("B" "C" "A"))

Running the file now prints #t to the console, meaning the test passed. We can use error or raise-user-error to make failures easier to notice. Or we can use the RackUnit testing library.

(define (circular-shift words)
  (append (rest words) (list (first words))))
 
(require rackunit) ; import the testing library
(check-equal?
  (circular-shift '("A" "B" "C"))
  '("B" "C" "A"))

These tests run each time the module does. If you prefer to run tests only in a specific context, and not when the module is run or imported as a library, you can move them to a separate file or into a submodule.

(define (circular-shift words)
  (append (rest words) (list (first words))))
 
(module+ test ; Open a submodule named 'test'
  (require rackunit)
  (check-equal?
    (circular-shift '("A" "B" "C"))
    '("B" "C" "A")))

Running the module normally via racket kwic.rkt will not run code in the submodule. Instead, use raco test to run the tests.

>  raco test kwic.rkt
raco test: (submod "kwic.rkt" test)
1 test passed

The reason we used module+, instead of Racket’s module and module* forms is that module+ inherits the language and namespace of its containing module and can be incrementally extended. This way, we can keep tests near the relevant code.

(module+ test
  (require rackunit))
 
(define (circular-shift words)
  (append (rest words) (list (first words))))
 
(module+ test
  (check-equal?
    (circular-shift '("A" "B" "C"))
    '("B" "C" "A")))
 
(define (alphabetize all-shifts)
  (sort all-shifts shift<?))
 
(module+ test
  (check-equal?
    (alphabetize '(("racket" "is")
                   ("as")
                   ("racket" "does")))
    '(("as")
      ("racket" "does")
      ("racket" "is"))))

RackUnit in a separate file or test submodule is the unofficial standard for testing Racket programs.

Recognizing Patterns, Avoiding Repetition

Every unit test we’ve written uses check-equal?.

(module+ test
  (check-equal?
    (kwic-split '())
    '())
  (check-equal?
    (kwic-split '("hello    world"))
    '(("hello" "world")))
  (check-equal?
    (kwic-split '(" lost " " in " "space"))
    '(("lost") ("in") ("space")))
  (check-equal?
    (kwic-split '("something"))
    '()))

These tests follow a simple pattern that we can express as a syntax rule.

(module+ test
  (define-syntax-rule (check-equal?* [i o] ...)
    (begin
      (check-equal? i o)
      ...))
 
  (check-equal?*
    [(kwic-split '())
     '()]
    [(kwic-split '("hello    world"))
     '(("hello" "world"))]
    [(kwic-split '(" out " " in " "the ozone"))
     '(("out") ("in") ("the" "ozone"))]
    [(kwic-split '("something"))
     '()]))

The ... are not pseudocode! They denote Kleene-star repetition, like a sextile ("*") in a regular expression. In this case, the input pattern is a sequence of lists with two S-expressions, i and o. Uses of i and o in the rule must be followed by one ... to splice the captured S-expressions into the result.

Many languages offer higher-order functions and polymorphism to abstract common behaviors. Syntax extensions are a different way to avoid repeating yourself. After 30 years, we are still discovering what syntax extensions are useful for.

See this recent Racket mailing list post for some applications.

Adding Static Types

Changing the #lang line to typed/racket adds static type-checking to our program. If we only change the language and run the code as-is, there will be type errors. But we can use submodules again to incrementally check our design with types.

Note: typed regions are another way to embed typed code into untyped contexts.

#lang racket
 
 (module t typed/racket
   ; Need to annotate:
   ; - function parameters
   ; - for-loop return types
 
   (: kwic-read : Path-String -> (Listof String))
   (define (kwic-read filename)
     (with-input-from-file filename
       (λ ()
         (for/list ([line (in-lines)])
                   : (Listof String)
           line))))
 
   ; Next migration step: move other untyped functions here
 
   (provide (all-defined-out)))
 (require 't)
 
 (define (kwic-split lines)
   (map string-split lines))
 
 ; <rest of file omitted>

After scooping all functions into the Typed Racket bubble, we can remove the submodule declaration and change #lang racket to #lang typed/racket.

Finally, a Library

Other modules can import our functions if we use a provide statement. By convention, exports belong at the top of a file.

 
#lang typed/racket
 
(provide kwic-index)
 
; <definitions here>

Then any typed or untyped module can use kwic-index by writing (require "kwic.rkt").

As a finishing touch, we can use the racket/cmdline library inside a main submodule to give a basic front-end interface. Similar to module+ test, a module+ main declares code that inherits the file’s bindings and language but is only run when the program is executaed.

Here is the complete typed and tested code listing. The main submodule is at the bottom.

#lang typed/racket
(module+ test
  (require typed/rackunit)
 
  (define-syntax-rule (check-equal?* [i o] ...)
    (begin
      (check-equal? i o)
      ...)))
 
(define-type Words (Listof String))
(define-type Lines (Listof Words))
 
(: kwic-read : Path-String -> (Listof String))
(define (kwic-read filename)
  (with-input-from-file filename
    (λ ()
      (for/list ([line (in-lines)])
                : (Listof String)
        line))))
 
(module+ test
  (let ([tmpfile (make-temporary-file)])
    (with-output-to-file tmpfile #:exists 'replace
      (λ ()
        (displayln "The Nellie,")
        (displayln "a cruising yawl,")
        (displayln "swung to her anchor without a flutter of sails,")
        (displayln "and was at rest.")))
    (define actual (kwic-read tmpfile))
    (define expect (file->lines tmpfile))
    (delete-file tmpfile)
    (check-equal? actual expect)))
 
(: kwic-split : (Listof String) -> Lines)
(define (kwic-split lines)
  (map #{string-split :: (String -> Words)} lines))
 
(module+ test
  (check-equal?*
    [(kwic-split '())
     '()]
    [(kwic-split '("hello    world"))
     '(("hello" "world"))]))
 
; Move first element to last position
(: circular-shift : Words -> Words)
(define (circular-shift words)
  (append (rest words) (list (first words))))
 
(module+ test
  (check-equal?*
    [(circular-shift '("A" "B" "C"))
     '("B" "C" "A")]))
 
(: all-circular-shifts : Words -> (Listof Words))
(define (all-circular-shifts words)
  (for/fold ([all-shifts (list words)])
            ([i (in-range 1 (length words))])
            : (Listof Words)
    (cons (circular-shift (first all-shifts))
          all-shifts)))
 
(module+ test
  (check-equal?*
    [(all-circular-shifts '("A" "B" "C"))
     '(("C" "A" "B") ("B" "C" "A") ("A" "B" "C"))]))
 
(: alphabetize : Lines -> Lines)
(define (alphabetize all-shifts)
  (sort all-shifts shift<?))
 
(module+ test
  (check-equal?*
    [(alphabetize '(("A" "B" "C") ("B" "C") ("A")))
     '(("A") ("A" "B" "C") ("B" "C"))]))
 
; Lexicographic order on equal-length lists of words
(: shift<? : Words Words -> Boolean)
(define (shift<? shift1 shift2)
  (match* (shift1 shift2)
   [('() _) ; first list empty, don't care about second
    #t]
   [(_ '()) ; first list non-empty, second empty
    #f]
   [((cons s1 shift1-rest) (cons s2 shift2-rest))
    (or (string<? s1 s2)
        (and (string=? s1 s2)
             (shift<? shift1-rest shift2-rest)))]))
 
(module+ test
  (check-equal?*
    [(shift<? '() '())
     #t]
    [(shift<? '("A" "B") '("A" "C"))
     #t]))
 
(: kwic-display : Lines -> Void)
(define (kwic-display all-sorted-shifts)
  (: display-words : Words -> Void)
  (define (display-words words)
    (display (first words))
    (for ([word (in-list (cdr words))])
      (display " ")
      (display word))
    (newline))
  (for-each display-words all-sorted-shifts))
 
(module+ test
  (parameterize ([current-output-port (open-output-string)])
    (kwic-display '(("A") ("B" "C")))
    (check-equal?
      (get-output-string (current-output-port))
      "A\nB C\n")))
 
(: all-circular-shifts* : Lines -> (Listof Lines))
(define (all-circular-shifts* lines)
  (map all-circular-shifts lines))
 
(module+ test
  (check-equal?
    (all-circular-shifts* '(("A" "B" "C") ("D")))
    '((("C" "A" "B") ("B" "C" "A") ("A" "B" "C")) (("D")))))
 
(: kwic-index : Path-String -> Void)
(define (kwic-index file-name)
  (define all-lines (kwic-split (kwic-read file-name)))
  (define all-shifts (append* (all-circular-shifts* all-lines)))
  (kwic-display (alphabetize all-shifts)))
 
(module+ test
  (parameterize ([current-output-port (open-output-string)])
    (define tmpfile (make-temporary-file))
    (with-output-to-file tmpfile #:exists 'replace
      (λ ()
        (displayln "imagine if this")
        (displayln "took 2 weeks to write")))
    (kwic-index tmpfile)
    (delete-file tmpfile)
    (check-equal?
      (get-output-string (current-output-port))
      (string-join '(
        "2 weeks to write took"
        "if this imagine"
        "imagine if this"
        "this imagine if"
        "to write took 2 weeks"
        "took 2 weeks to write"
        "weeks to write took 2"
        "write took 2 weeks to\n") "\n"))))
 
(module+ main
  (require racket/cmdline)
  (: *output-to* (Parameterof Any))
  (define *output-to* (make-parameter #f))
  (command-line
    #:program "kwic index"
    #:once-each
    [("-o" "--output")
     output-to ; user-supplied input
     "Write output to file"
     (*output-to* output-to)] ; update the parameter *output-to*
    #:args (file-name)
    (define output-to (*output-to*)) ; read from parameter *output-to*
    (define out-port
      (if (string? output-to)
        (open-output-file output-to #:exists 'replace)
        (current-output-port)))
    (parameterize ([current-output-port out-port])
      (kwic-index (cast file-name Path-String)))
    (when (string? output-to)
      (close-output-port out-port))))

Sample interactions:
> racket kwic.rkt
kwic index: expects 1 <file-name> on the command line, given 0 arguments
 
> echo "It is a truth universally acknowledged" > pride-and-prejudice.txt
> racket kwic.rkt -o index.out pride-and-prejudice.txt
> wc -l index.out
6 index.out

Closing

We started with functions, wrote (and quarantined) unit tests, reinforced our design with types, and added a command-line interface. Going forward we could add Scribble documentation and share our work as a package.

For more on building languages with Racket: