Chapter 26 The “Tail Modulo Constructor” program transformation

(Introduced in OCaml 4.14)

Note: this feature is considered experimental, and its interface may evolve, with user feedback, in the next few releases of the language.

Consider this natural implementation of the List.map function:

let rec map f l = match l with | [] -> [] | x :: xs -> let y = f x in y :: map f xs

A well-known limitation of this implementation is that the recursive call, map f xs, is not in tail position. The runtime needs to remember to continue with y :: r after the call returns a value r, therefore this function consumes some amount of call-stack space on each recursive call. The stack usage of map f li is proportional to the length of li. This is a correctness issue for large lists on systems configured with limited stack space – the dreaded Stack_overflow exception.

# let with_stack_limit stack_limit f = let old_gc_settings = Gc.get () in Gc.set { old_gc_settings with stack_limit }; Fun.protect ~finally:(fun () -> Gc.set old_gc_settings) f ;;
val with_stack_limit : int -> (unit -> 'a) -> 'a = <fun>
# with_stack_limit 20_000 (fun () -> List.length (map Fun.id (List.init 1_000_000 Fun.id)) );;
Stack overflow during evaluation (looping recursion?).

In this implementation of map, the recursive call happens in a position that is not a tail position in the program, but within a datatype constructor application that is itself in tail position. We say that those positions, that are composed of tail positions and constructor applications, are tail modulo constructor (TMC) positions – we sometimes write tail modulo cons for brevity.

It is possible to rewrite programs such that tail modulo cons positions become tail positions; after this transformation, the implementation of map above becomes tail-recursive, in the sense that it only consumes a constant amount of stack space. The OCaml compiler implements this transformation on demand, using the [@tail_mod_cons] or [@ocaml.tail_mod_cons] attribute on the function to transform.

let[@tail_mod_cons] rec map f l = match l with | [] -> [] | x :: xs -> let y = f x in y :: map f xs
# List.length (map Fun.id (List.init 1_000_000 Fun.id));;
- : int = 1000000

This transformation only improves calls in tail-modulo-cons position, it does not improve recursive calls that do not fit in this fragment:

(* does *not* work: addition is not a data constructor *) let[@tail_mod_cons] rec length l = match l with | [] -> 0 | _ :: xs -> 1 + length xs
Warning 71 [unused-tmc-attribute]: This function is marked @tail_mod_cons but is never applied in TMC position.

It is of course possible to use the [@tail_mod_cons] transformation on functions that contain some recursive calls in tail-modulo-cons position, and some calls in other, arbitrary positions. Only the tail calls and tail-modulo-cons calls will happen in constant stack space.

General design

This feature is provided as an explicit program transformation, not an implicit optimization. It is annotation-driven: the user is expected to express their intent by adding annotations in the program using attributes, and will be asked to do so in any ambiguous situation.

We expect it to be used mostly by advanced OCaml users needing to get some guarantees on the stack-consumption behavior of their programs. Our recommendation is to use the [@tailcall] annotation on all callsites that should not consume any stack space. [@tail_mod_cons] extends the set of functions on which calls can be annotated to be tail calls, helping establish stack-consumption guarantees in more cases.

Performance

A standard approach to get a tail-recursive version of List.map is to use an accumulator to collect output elements, and reverse it at the end of the traversal.

let rec map f l = map_aux f [] l and map_aux f acc l = match l with | [] -> List.rev acc | x :: xs -> let y = f x in map_aux f (y :: acc) xs

This version is tail-recursive, but it is measurably slower than the simple, non-tail-recursive version. In contrast, the tail-mod-cons transformation provides an implementation that has comparable performance to the original version, even on small inputs.

Evaluation order

Beware that the tail-modulo-cons transformation has an effect on evaluation order: the constructor argument that is transformed into tail-position will always be evaluated last. Consider the following example:

type 'a two_headed_list = | Nil | Consnoc of 'a * 'a two_headed_list * 'a let[@tail_mod_cons] rec map f = function | Nil -> Nil | Consnoc (front, body, rear) -> Consnoc (f front, map f body, f rear)

Due to the [@tail_mod_cons] transformation, the calls to f front and f rear will be evaluated before map f body. In particular, this is likely to be different from the evaluation order of the unannotated version. (The evaluation order of constructor arguments is unspecified in OCaml, but many implementations typically use left-to-right or right-to-left.)

This effect on evaluation order is one of the reasons why the tail-modulo-cons transformation has to be explicitly requested by the user, instead of being applied as an automatic optimization.

Why tail-modulo-cons?

Other program transformations, in particular a transformation to continuation-passing style (CPS), can make all functions tail-recursive, instead of targeting only a small fragment. Some reasons to provide builtin support for the less-general tail-mod-cons are as follows:

Note: OCaml call stack size

In OCaml 4.x and earlier, bytecode programs respect the stack_limit runtime parameter configuration (as set using Gc.set in the example above), or the l setting of the OCAMLRUNPARAM variable. Native programs ignore these settings and only respect the operating system native stack limit, as set by ulimit on Unix systems. Most operating systems run with a relatively low stack size limit by default, so stack overflow on non-tail-recursive functions are a common programming bug.

Starting from OCaml 5.0, native code does not use the native system stack for OCaml function calls anymore, so it is not affected by the operating system native stack size; both native and bytecode programs respect the OCaml runtime’s own limit. The runtime limit is set to a much higher default than most operating system native stacks, with a limit of at least 512MiB, so stack overflow should be much less common in practice. There is still a stack limit by default, as it remains useful to quickly catch bugs with looping non-tail-recursive functions. Without a stack limit, one has to wait for the whole memory to be consumed by the stack for the program to crash, which can take a long time and make the system unresponsive.

This means that the tail modulo constructor transformation is less important on OCaml 5: it does improve performance noticeably in some cases, but it is not necessary for basic correctness for most use-cases.

1 Disambiguation

It may happen that several arguments of a constructor are recursive calls to a tail-modulo-cons function. The transformation can only turn one of these calls into a tail call. The compiler will not make an implicit choice, but ask the user to provide an explicit disambiguation.

Consider this type of syntactic expressions (assuming some pre-existing type var of expression variables):

type var (* some pre-existing type of variables *) type exp = | Var of var | Let of binding * exp and binding = var * exp

Consider a map function on variables. The direct definition has two recursive calls inside arguments of the Let constructor, so it gets rejected as ambiguous.

let[@tail_mod_cons] rec map_vars f exp = match exp with | Var v -> Var (f v) | Let ((v, def), body) -> Let ((f v, map_vars f def), map_vars f body)
Error: [@tail_mod_cons]: this constructor application may be TMC-transformed in several different ways. Please disambiguate by adding an explicit [@tailcall] attribute to the call that should be made tail-recursive, or a [@tailcall false] attribute on calls that should not be transformed. This call could be annotated. This call could be annotated.

To disambiguate, the user should add a [@tailcall] attribute to the recursive call that should be transformed to tail position:

let[@tail_mod_cons] rec map_vars f exp = match exp with | Var v -> Var (f v) | Let ((v, def), body) -> Let ((f v, map_vars f def), (map_vars[@tailcall]) f body)

Be aware that the resulting function is not tail-recursive, the recursive call on def will consume stack space. However, expression trees tend to be right-leaning (lots of Let in sequence, rather than nested inside each other), so putting the call on body in tail position is an interesting improvement over the naive definition: it gives bounded stack space consumption if we assume a bound on the nesting depth of Let constructs.

One would also get an error when using conflicting annotations, asking for two of the constructor arguments to be put in tail position:

let[@tail_mod_cons] rec map_vars f exp = match exp with | Var v -> Var (f v) | Let ((v, def), body) -> Let ((f v, (map_vars[@tailcall]) f def), (map_vars[@tailcall]) f body)
Error: [@tail_mod_cons]: this constructor application may be TMC-transformed in several different ways. Only one of the arguments may become a TMC call, but several arguments contain calls that are explicitly marked as tail-recursive. Please fix the conflict by reviewing and fixing the conflicting annotations. This call is explicitly annotated. This call is explicitly annotated.

2 Danger: getting out of tail-mod-cons

Due to the nature of the tail-mod-cons transformation (see Section 26.3 for a presentation of transformation):

The fact that calls in tail position in the source program may become non-tail calls if they go from a tail-mod-cons to a non-tail-mod-cons function is surprising, and the transformation will warn about them.

For example:

let[@tail_mod_cons] rec flatten = function | [] -> [] | xs :: xss -> let rec append_flatten xs xss = match xs with | [] -> flatten xss | x :: xs -> x :: append_flatten xs xss in append_flatten xs xss
Warning 71 [unused-tmc-attribute]: This function is marked @tail_mod_cons but is never applied in TMC position. Warning 72 [tmc-breaks-tailcall]: This call is in tail-modulo-cons position in a TMC function, but the function called is not itself specialized for TMC, so the call will not be transformed into a tail call. Please either mark the called function with the [@tail_mod_cons] attribute, or mark this call with the [@tailcall false] attribute to make its non-tailness explicit.

Here the append_flatten helper is not annotated with [@tail_mod_cons], so the calls append_flatten xs xss and flatten xss will not be tail calls. The correct fix here is to annotate append_flatten to be tail-mod-cons.

let[@tail_mod_cons] rec flatten = function | [] -> [] | xs :: xss -> let[@tail_mod_cons] rec append_flatten xs xss = match xs with | [] -> flatten xss | x :: xs -> x :: append_flatten xs xss in append_flatten xs xss

The same warning occurs when append_flatten is a non-tail-mod-cons function of the same recursive group; using the tail-mod-cons transformation is a property of individual functions, not whole recursive groups.

let[@tail_mod_cons] rec flatten = function | [] -> [] | xs :: xss -> append_flatten xs xss and append_flatten xs xss = match xs with | [] -> flatten xss | x :: xs -> x :: append_flatten xs xss
Warning 71 [unused-tmc-attribute]: This function is marked @tail_mod_cons but is never applied in TMC position. Warning 72 [tmc-breaks-tailcall]: This call is in tail-modulo-cons position in a TMC function, but the function called is not itself specialized for TMC, so the call will not be transformed into a tail call. Please either mark the called function with the [@tail_mod_cons] attribute, or mark this call with the [@tailcall false] attribute to make its non-tailness explicit.

Again, the fix is to specialize append_flatten as well:

let[@tail_mod_cons] rec flatten = function | [] -> [] | xs :: xss -> append_flatten xs xss and[@tail_mod_cons] append_flatten xs xss = match xs with | [] -> flatten xss | x :: xs -> x :: append_flatten xs xss

Non-recursive functions can also be annotated [@tail_mod_cons]; this is typically useful for local bindings to recursive functions.

Incorrect version:

let[@tail_mod_cons] rec map_vars f exp = let self exp = map_vars f exp in match exp with | Var v -> Var (f v) | Let ((v, def), body) -> Let ((f v, self def), (self[@tailcall]) body)
Warning 51 [wrong-tailcall-expectation]: expected tailcall Warning 51 [wrong-tailcall-expectation]: expected tailcall Warning 71 [unused-tmc-attribute]: This function is marked @tail_mod_cons but is never applied in TMC position.

Recommended fix:

let[@tail_mod_cons] rec map_vars f exp = let[@tail_mod_cons] self exp = map_vars f exp in match exp with | Var v -> Var (f v) | Let ((v, def), body) -> Let ((f v, self def), (self[@tailcall]) body)

In other cases, there is either no benefit in making the called function tail-mod-cons, or it is not possible: for example, it is a function parameter (the transformation only works with direct calls to known functions).

For example, consider a substitution function on binary trees:

type 'a tree = Leaf of 'a | Node of 'a tree * 'a tree let[@tail_mod_cons] rec bind (f : 'a -> 'a tree) (t : 'a tree) : 'a tree = match t with | Leaf v -> f v | Node (left, right) -> Node (bind f left, (bind[@tailcall]) f right)
Warning 72 [tmc-breaks-tailcall]: This call is in tail-modulo-cons position in a TMC function, but the function called is not itself specialized for TMC, so the call will not be transformed into a tail call. Please either mark the called function with the [@tail_mod_cons] attribute, or mark this call with the [@tailcall false] attribute to make its non-tailness explicit.

Here f is a function parameter, not a direct call, and the current implementation is strictly first-order, it does not support tail-mod-cons arguments. In this case, the user should indicate that they realize this call to f v is not, in fact, in tail position, by using (f[@tailcall false]) v.

type 'a tree = Leaf of 'a | Node of 'a tree * 'a tree let[@tail_mod_cons] rec bind (f : 'a -> 'a tree) (t : 'a tree) : 'a tree = match t with | Leaf v -> (f[@tailcall false]) v | Node (left, right) -> Node (bind f left, (bind[@tailcall]) f right)

3 Details on the transformation

To use this advanced feature, it helps to be aware that the function transformation produces a specialized function in destination-passing-style.

Recall our map example:

let rec map f l = match l with | [] -> [] | x :: xs -> let y = f x in y :: map f xs

Below is a description of the transformed program in pseudo-OCaml notation: some operations are not expressible in OCaml source code. (The transformation in fact happens on the Lambda intermediate representation of the OCaml compiler.)

let rec map f l =
  match l with
  | [] -> []
  | x :: xs ->
    let y = f x in
    let dst = y ::{mutable} Hole in
    map_dps f xs dst 1;
    dst

and map_dps f l dst idx =
  match l with
  | [] -> dst.idx <- []
  | x :: xs ->
    let y = f x in
    let dst' = y ::{mutable} Hole in
    dst.idx <- dst';
    map_dps f xs dst' 1

The source version of map gets transformed into two functions, a direct-style version that is also called map, and a destination-passing-style version (DPS) called map_dps. The destination-passing-style version does not return a result directly, instead it writes it into a memory location specified by two additional function parameters, dst (a memory block) and i (a position within the memory block).

The source call y :: map f xs gets transformed into the creation of a mutable block y ::{mutable} Hole, whose second parameter is an un-initialized hole. The block is then passed to map_dps as a destination parameter (with offset 1).

Notice that map does not call itself recursively, it calls map_dps. Then, map_dps calls itself recursively, in a tail-recursive way.

The call from map to map_dps is not a tail call (this is something that we could improve in the future); but this call happens only once when invoking map f l, with all list elements after the first one processed in constant stack by map_dps.

This explains the “getting out of tail-mod-cons” subtleties. Consider our previous example involving mutual recursion between flatten and append_flatten.

let[@tail_mod_cons] rec flatten l =
  match l with
  | [] -> []
  | xs :: xss ->
    append_flatten xs xss

The call to append_flatten, which syntactically appears in tail position, gets transformed differently depending on whether the function has a destination-passing-style version available, that is, whether it is itself annotated [@tail_mod_cons]:

(* if append_flatten_dps exists *)
and flatten_dps l dst i =
  match l with
  | [] -> dst.i <- []
  | xs :: xss ->
    append_flatten_dps xs xss dst i

(* if append_flatten_dps does not exist *)
and rec flatten_dps l dst i =
  match l with
  | [] -> dst.i <- []
  | xs :: xss ->
    dst.i <- append_flatten xs xss

If append_flatten does not have a destination-passing-style version, the call gets transformed to a non-tail call.

4 Current limitations

Purely syntactic criterion

Just like tail calls in general, the notion of tail-modulo-constructor position is purely syntactic; some simple refactoring will move calls out of tail-modulo-constructor position.

(* works as expected *) let[@tail_mod_cons] rec map f li = match li with | [] -> [] | x :: xs -> let y = f x in y :: (* this call is in TMC position *) map f xs
(* not optimizable anymore *) let[@tail_mod_cons] rec map f li = match li with | [] -> [] | x :: xs -> let y = f x in let ys = (* this call is not in TMC position anymore *) map f xs in y :: ys
Warning 71 [unused-tmc-attribute]: This function is marked @tail_mod_cons but is never applied in TMC position.
Local, first-order transformation

When a function gets transformed with tail-mod-cons, two definitions are generated, one providing a direct-style interface and one providing the destination-passing-style version. However, not all calls to this function in tail-modulo-cons position will use the destination-passing-style version and become tail calls:

Consider the call Option.map foo x for example: even if foo is called in tail-mod-cons position within the definition of Option.map, that call will never become a tail call. (This would be the case even if the call to Option.map was inside the Option module.)

In general this limitation is not a problem for recursive functions: the first call from an outside module or a higher-order function will consume stack space, but further recursive calls in tail-mod-cons position will get optimized. For example, if List.map is defined as a tail-mod-cons function, calls from outside the List module will not become tail calls when in tail positions, but the recursive calls within the definition of List.map are in tail-modulo-cons positions and do become tail calls: processing the first element of the list will consume stack space, but all further elements are handled in constant space.

These limitations may be an issue in more complex situations where mutual recursion happens between functions, with some functions not annotated tail-mod-cons, or defined across different modules, or called indirectly, for example through function parameters.

Non-exact calls to tupled functions

OCaml performs an implicit optimization for “tupled” functions, which take a single parameter that is a tuple: let f (x, y, z) = .... Direct calls to these functions with a tuple literal argument (like f (a, b, c)) will call the “tupled” function by passing the parameters directly, instead of building a tuple of them. Other calls, either indirect calls or calls passing a more complex tuple value (like let t = (a, b, c) in f t) are compiled as “inexact” calls that go through a wrapper.

The [@tail_mod_cons] transformation supports tupled functions, but will only optimize “exact” calls in tail position; direct calls to something other than a tuple literal will not become tail calls. The user can manually unpack a tuple to force a call to be “exact”: let (x, y, z) = t in f (x, y, z). If there is any doubt as to whether a call can be tail-mod-cons-optimized or not, one can use the [@tailcall] attribute on the called function, which will warn if the transformation is not possible.

let rec map (f, l) = match l with | [] -> [] | x :: xs -> let y = f x in let args = (f, xs) in (* this inexact call cannot be tail-optimized, so a warning will be raised *) y :: (map[@tailcall]) args
Warning 51 [wrong-tailcall-expectation]: expected tailcall