Documenting and Packaging Software

When you produce software, there are several different ways that you might include prose alongside your executable code:

comments within the implementation to explain how it works [Example: comment in the GTK implementation];
descriptions of the software’s public interfaces, such as functions and datatypes [Example: GtkWindow];
build and installation instructions [Example: GTK build instructions]; and
a user guide explaining how to use the software after it is installed [Example: Disk Usage Analyzer manual].

The first kind of documentation applies to all software, but our focus here is on the last three kinds of documentation.

Depending on whether the software is a library or a complete application, usually the second or fourth kind of documentation will dominate. While interface (API) documentation and end-user documentation are often written in very different styles with different tools, many guidelines apply to both kinds of documentation. For example, API documentation usually needs some supporting material that resembles the more holistic and example-driven style of end-user documentation. Meanwhile, end-user documentation can benefit from the careful definition and use of terminology that is typical of API documentation.

The third kind of documentation, instructions for building and installing software, is often an afterthought. Depending on your programming language’s tools for builds and distribution, little may be needed in this category. For C/C++-based libraries, however, some build and installation instructions are usually necessary.

Documentation is not just a chore that programmers must complete to make their work more useful. API and user documentation, especially, works well as a design tool. For an experienced developer, the process of writing documentation for software frequently triggers changes to the software’s implementation or interface. Explaining your work to others has the almost magical effect of helping you understand your own work better and see how it could be improved.

1 General Documentation Rules

No matter what level of documentation you are writing, a few rules always apply:

Have a clear picture of your audience
Decide up front what you will assume that your audience already knows. For library documentation, the most basic reasonable assumption is that a reader is familiar with the language where your library is meant to be used; take advantage of all of the standard terminology of that language and its community, and avoid using non-standard terminology. For end-user documentation, you might reasonably assume that readers know about numbers and arithmetic, but they might not be familiar with, say, the limitations of floating-point precision or the limited range (if any) of integers. Domain-specific concepts may or may not need definition, depending on your intended audience.
After deciding on your audience, keep it in mind whenever you start using a concept in the documentation. If it’s not a concept that you’re assuming as a prerequisite, then define it.
Don’t repeat yourself (DRY), mostly
Just like in code, repeating information in documentation causes problems because the copies are likely to get out of sync. In code, you avoid repetition by defining helper routines and calling them. In documentation, avoid repetition by explaining an idea in one place and linking to it (by hyperlink or section reference) from other places.
When using tools that generate documentation, you may be tempted to use abstraction facilities in the tool to write text once in the document source and have the text duplicated in the rendered documentation. Avoid that temptation. After X has been described, your reader will almost certainly prefer to see “X is like Y, except for ...” rather than see large descriptions of X and Y separately that the reader must compare to find out where X and Y differ.
One big exception to this rule is the guide versus reference strategy, which we describe in Guide versus Reference. The guide mode provides information in an approximate but easy-to-understand way, while the reference mode may repeat that information in a more dense and more precise way.
Examples, examples, examples
Just as examples are important for understanding and testing an implementation, examples are crucial in all levels of documentation. Anything that you define—a term, a process, a function, or a datatype—should be accompanied by relevant examples.
Be consistent, including with typesetting and grammar
Grammar matters; take the time to write clearly and properly. You don’t necessarily have to be formal, of course, but nicely written sentences make documentation easier to read for the same reason that ready-to-compile source code is easier to read than code that doesn’t even compile.
Be consistent with how names are spelled, and use consistent fonts in rendered documents. For example, if your software is called “Smiley,” then always write it as “Smiley” and don’t sometimes write “SMILEY” or “smiley” or “SmIleY.”
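The DRY and examples rules above carry over directly to comments in a C++ header. The following is a minimal sketch using two hypothetical functions, to_int and to_int_or (they are not part of MSDscript or any library), written with `///` comments of the kind that extraction tools such as Doxygen read. The second comment states only how it differs from the first, each comment carries examples, and a small check function runs those examples so that they cannot silently drift away from the code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

/// Converts the leading decimal number in `text` to an int, as std::stoi
/// does, including its errors: throws std::invalid_argument if `text` does
/// not start with a number, and std::out_of_range if the number does not
/// fit in an int.
///
/// Examples: to_int("42") returns 42; to_int(" -7") returns -7.
int to_int(const std::string& text) {
    return std::stoi(text);
}

/// Like to_int(), except that it returns `fallback` instead of throwing.
///
/// Examples: to_int_or("42", 0) returns 42; to_int_or("abc", 0) returns 0.
int to_int_or(const std::string& text, int fallback) {
    try {
        return to_int(text);
    } catch (const std::invalid_argument&) {
        return fallback;
    } catch (const std::out_of_range&) {
        return fallback;
    }
}

// Runs the documented examples, so that a change in behavior shows up as a
// failed assertion instead of a silently outdated comment.
void check_documented_examples() {
    assert(to_int("42") == 42);
    assert(to_int(" -7") == -7);
    assert(to_int_or("42", 0) == 42);
    assert(to_int_or("abc", 0) == 0);
}
```

Calling check_documented_examples() from a test suite keeps each comment honest without repeating its text anywhere else.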

2 Overall Documentation Structure

The bullets at the top of this document are not in the right order for presenting to potential users. Instead, your documentation should have the following structure:

Description: what the software does, the problem that it solves, and how it’s meant to be used
Getting started: installation and basic usage
User guide: complete guide to using an application (may not apply if the software is only a library)
API documentation: classes, functions, etc. (may not apply if the software is only an application)

3 Description

Since you will be preparing an MSDscript distribution to be reviewed by a peer/instructor/TA who already knows about MSDscript, you may be tempted to skip the general description. Instead, write for an audience that has never heard of MSDscript, so they will need at least a few sentences of introduction and explanation.

4 Build and Installation Instructions

Modern languages include package managers and build systems that relieve programmers of specifying how to build their software, at least as long as they stay within the package ecosystem. Maven and npm are two prominent examples. In the case of C/C++, there is no widely adopted package-management system. Indeed, C and C++ tend to expose so many operating-system details that writing portable code is a difficult problem all by itself. Fortunately, MSDscript is simple enough that we can stick to portable features.

The most traditional and common pattern for building C/C++ code is

    ./configure
    make
    make install

The ./configure step runs a script that detects details about the current environment and generates a Makefile, then make uses the Makefile to build within the current directory, and then make install installs the built application and/or libraries.

Another popular approach is to use CMake, which is an alternative to a configure script. Running cmake detects environment details and creates a Makefile or other platform-specific build description, and then cmake --build drives make or whichever other build tool that description targets.

A less common but sometimes-used approach is to provide project files for an IDE such as Xcode. This approach is less common mainly because it is less portable.

Whatever mechanism you choose, your audience will need a step-by-step explanation. If you use CMake to build, for example, then the instructions should be enough to build even for a reader who has never used CMake before.

5 User Manuals

A user manual is in some ways the easiest kind of documentation to write. It describes how software should work from the outside, without details about the internal implementation. There may be concepts that span both the usage and the implementation of a system, and those concepts should be defined at this level. Concepts that are only about the implementation, meanwhile, can be ignored. For example, a user of the msdscript command-line program will have to know about numbers, variables, and functions, but they do not have to know about environments or continuations. A user of MSDscript doesn’t need to know that an AddExpr class exists, but they need to know how to write addition expressions that are accepted in --interp mode. To put it another way, your MSDscript user manual should mostly work for anyone’s implementation, not just yours, at least to the degree that other implementations support the same command-line flags and exactly the same language syntax.

When you’re describing software that works at the command line, you can normally assume that your reader knows about running programs, passing them arguments, and redirecting input and output. If a command-line program reads its input until end-of-file, it’s probably a good idea to remind the user that typing Ctrl-D at a terminal signals end-of-file (on Unix-like systems).

If you find that describing how to use a piece of software is difficult, then consider the possibility that the software should change. Documenting the user interface (including the way that command-line arguments are handled, etc.) is a great opportunity to improve the interface itself.
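One way to let documentation push back on an interface is to keep the usage text right next to the code that parses command-line flags, so that a flag that is awkward to describe is hard to miss. The sketch below assumes a single-flag interface. Only --interp comes from the discussion above; --help, the Mode enum, and the claim that --interp prints the expression’s value are illustrative assumptions, not part of any specified MSDscript interface.

```cpp
#include <optional>
#include <string>

// Hypothetical modes for an msdscript-like tool. Only --interp comes from
// this guide; --help is an illustrative addition.
enum class Mode { Interp, Help };

// The usage text lives next to the parsing code, so adding or changing a
// flag means rewriting its description at the same time.
const std::string kUsage =
    "usage: msdscript --interp | --help\n"
    "  --interp  read an expression from standard input until end-of-file\n"
    "            (Ctrl-D at a terminal) and print its value\n"
    "  --help    print this message\n";

/// Returns the mode for the single command-line flag `flag`, or
/// std::nullopt if `flag` is not one of the flags listed in kUsage.
std::optional<Mode> parse_mode(const std::string& flag) {
    if (flag == "--interp") return Mode::Interp;
    if (flag == "--help") return Mode::Help;
    return std::nullopt;
}
```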

6 API Documentation

The heart of any API documentation is an enumeration of functions, datatypes, classes, global variables, and similar elements that you would find in your language’s interface definition files (e.g., ".h" files). Repeating all of those elements as they are found in the implementation is typically necessary, and tools for many languages can extract that information from source files (e.g., Javadoc). Still, necessary does not imply sufficient. The description of a function argument, for example, normally needs to go beyond the information that is expressed by the argument’s type, and the description of a function or class normally needs to go well beyond just the name of the function or class.
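As a small illustration of going beyond types, consider a hypothetical repeat function (not taken from GTK or MSDscript). Its signature says only that count is an int; the Doxygen-style @param comments record what the type cannot, such as what a negative count does. The comment markup is an assumption; other extraction tools use different syntax.

```cpp
#include <string>

/// Returns `count` copies of `piece` joined end to end.
///
/// @param piece  the text to repeat; may be empty, in which case the result
///               is empty
/// @param count  how many copies to produce; zero or negative counts are
///               allowed and produce an empty string
/// @return       a new string; `piece` itself is not modified
///
/// Example: repeat("ab", 3) returns "ababab".
std::string repeat(const std::string& piece, int count) {
    std::string result;
    // A non-positive count skips the loop, giving the documented empty result.
    for (int i = 0; i < count; ++i) {
        result += piece;
    }
    return result;
}
```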

See the GtkWindow documentation for an example. The documentation page starts with a simple enumeration of all of the functions provided by GTK to operate on GtkWindow objects, but clicking any of the functions jumps to a detailed description. The detailed description shows the argument types and return types, but each argument is also described in prose. Every function has at least a few words of description, and some functions merit more description than others.

The lack of examples is a big limitation of the GTK documentation, but examples are difficult to write for GUI libraries, so the omission is also understandable. The Racket documentation on list operations illustrates a best-case scenario where functions take and return simple values, and the documentation shows many examples.

6.1 Completeness

API documentation is a contract between the API producer and consumer, and as such, it should be as complete as possible. It should describe all of the publicly accessible elements of a library, and it should specify what happens under all possible conditions. True completeness is difficult to achieve, and there are many situations where leaving behavior unspecified is reasonable or necessary. Beware, however, that leaving any exported bindings undocumented is usually the wrong choice; if a binding is accessible, someone will use it, and then you may be stuck maintaining whatever interface it happens to implement.
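For a sense of what “all possible conditions” can mean even for a tiny function, here is a sketch of a hypothetical checked_divide helper whose comment accounts for every pair of int inputs, including the two cases where C++’s built-in int division would be undefined behavior.

```cpp
#include <climits>
#include <stdexcept>

/// Returns a / b, rounded toward zero as C++'s built-in int division does.
///
/// Every input pair is covered:
///  - throws std::domain_error if b == 0;
///  - throws std::overflow_error if a == INT_MIN and b == -1, because the
///    true quotient does not fit in an int;
///  - otherwise returns the quotient.
///
/// Example: checked_divide(7, -2) returns -3.
int checked_divide(int a, int b) {
    if (b == 0) {
        throw std::domain_error("division by zero");
    }
    if (a == INT_MIN && b == -1) {
        throw std::overflow_error("quotient does not fit in an int");
    }
    return a / b;
}
```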

That said, C/C++ makes this problem harder, because with traditional header files there’s no standard way to make definitions visible across the files of a library but invisible outside of that library. For C/C++, developers typically fall back to naming conventions: entities in a certain namespace or with a certain prefix are considered public and are documented, while other names are internal and not documented. Clients of a library may refer to the private names anyway, but there’s an understanding that the client is taking on the risk in that case, not the library provider.
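A sketch of the namespace convention, assuming a hypothetical msdscript namespace: public, documented names sit directly in msdscript, while internal helpers go in a nested detail namespace, a convention used by Boost and many other C++ libraries. Nothing stops a client from calling msdscript::detail::is_digit; the name only signals that doing so is at the client’s own risk.

```cpp
#include <string>

namespace msdscript {

// Internal helpers live in a nested `detail` namespace. By convention they
// are not documented and may change or disappear without notice.
namespace detail {
inline bool is_digit(char c) { return c >= '0' && c <= '9'; }
}  // namespace detail

/// Public and documented: returns true if `text` is non-empty and consists
/// only of the decimal digits 0-9.
inline bool is_number(const std::string& text) {
    if (text.empty()) {
        return false;
    }
    for (char c : text) {
        if (!detail::is_digit(c)) {
            return false;
        }
    }
    return true;
}

}  // namespace msdscript
```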

6.2 Guide versus Reference

There’s a tension in API documentation between explaining everything completely and explaining enough for readers to get the high-level idea. A complete explanation may involve many corner cases or interactions that make the overall idea more difficult to grasp: a “forest for the trees” problem. Describing just the high-level idea, however, likely leaves many behaviors unspecified.

To balance this tension, a good strategy is to think about your documentation in two different modes: guide versus reference. These two modes may be covered by separate documents, as they are for the Racket Guide and Racket Reference; cross-references between the two documents help readers get from relevant information in one to the other. Alternatively, guide-level explanations may appear as a kind of introduction to each section of the documentation just before a denser reference portion, as illustrated in the GTK documentation.
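Within a single C++ header, the introduction-then-reference layout might look like the following sketch. The Stack class is hypothetical and stands in for any documented type: a short guide-level comment with an example comes first, followed by per-member reference comments that pin down details such as what pop() does on an empty stack.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// ---- Guide ----
// A Stack holds strings in last-in, first-out order: push() adds a string
// to the top, and pop() removes and returns the most recently pushed string
// that has not been popped yet.
//
//     Stack s;
//     s.push("a");
//     s.push("b");
//     s.pop();   // returns "b"
//
// ---- Reference ----
class Stack {
public:
    /// Adds `item` to the top of the stack.
    void push(const std::string& item) { items_.push_back(item); }

    /// Removes and returns the top item.
    /// Throws std::out_of_range if the stack is empty.
    std::string pop() {
        if (items_.empty()) {
            throw std::out_of_range("pop() on an empty Stack");
        }
        std::string top = std::move(items_.back());
        items_.pop_back();
        return top;
    }

    /// Returns the number of items currently on the stack.
    std::size_t size() const { return items_.size(); }

private:
    std::vector<std::string> items_;  // back() is the top of the stack
};
```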

6.3 Hyperlinking

Good reference documentation is extensively hyperlinked. For example, code in the Racket Reference has every identifier hyperlinked to its documentation, and technical terms are similarly hyperlinked to definitions. Good hyperlinking requires a good tool, however. For Racket documentation, the tool is Scribble. Javadoc supports a certain amount of automatic hyperlinking, and the GTK documentation uses a similar tool that extracts documentation from source annotations. Not every documentation task will merit this level of investment.
