Introduction to Tagless Final

Vasiliy Kevroletin
Friday, December 7th, 2018

I’m Vasiliy Kevroletin, and I work at Serokell with a lot of different people. Teams consist not only of Haskell experts (who contribute to GHC and Haskell libraries), but also of people like me who have less Haskell experience, but strive hard to learn and expand their Haskell knowledge.

Recently, my team decided to implement an eDSL using the tagless final style for one of the new projects. Although it’s a fairly known technique, I had zero experience with tagless final and there were some difficulties associated with terms tagless, final, and eDSL.

In preparation for the task, I’ve done some research and organized the learned material into a small HowTo. Now I want to share it with others.

Prerequisites

I assume that the reader is fairly comfortable with MTL because I will use a lot of analogies with MTL.

Gist

Recall your everyday MTL-style programming. Forget about concrete monad transformers and concentrate on type classes. Without transformers, there are only two things that are left:

  1. Functions are declared using type constraints instead of concrete types:

    getUser :: (MonadReader r m, Has DatabaseConfig r, MonadIO m) => Name -> m User
    
  2. Instantiation of a polymorphic function to a concrete type (aka interpreter) happens somewhere “in the end”:

    liftIO $ runReader (getUser (Name "Pedro")) env
    

That’s all. We’ve just covered the tagless final style:

  1. Write code using overloaded functions.
  2. Run code using any of the suitable implementations (aka interpreters).

All good (and bad) in tagless final comes from ad-hoc polymorphism and type classes (i.e. overloading). Your output depends directly on your commitment to overload things.

A distinct feature of tagless final is extensibility in two dimensions, which is, in fact, a significant achievement (see The Expression Problem Revisited for an explanation for why it’s hard to achieve this property).

Let’s discuss extensibility while keeping this function signature in mind:

wimble :: (MonadReader Env m, MonadState State m) => m ()

We can run wimble using a custom new implementation of MonadReader and MonadState (by defining a new data type and defining instances for it). This is an extension in the first dimension: a new interpreter. Furthermore, we can use a new set of operations, say MonadWriter, and use wimble in a new function which uses all 3 classes: MonadReader, MonadState and MonadWriter (i.e. old and new operations). This is an extension in the second dimension: a new set of operations.

From my point of view, available learning resources show two different approaches to using tagless final:

  1. Define operations abstracted over a monad

    In that case, we can use do notation.

  2. Define an Abstract Syntax Tree using overloaded functions

    In that case, potentially, we can pretty print, inspect and optimize AST.

People who have experience with the technique might say that the two approaches are exactly the same. After learning about tagless final, this opinion makes sense for me. But earlier, when I had just started searching through available learning resources, I was confused by the difference in the look and feel of the resulting code. Also, some people say that having do notation is enough to call something an eDSL, others say that eDSL should define an AST. So, by saying tagless final, different people might assume slightly different approaches which might look as completely different techniques to a novice Haskell programmer. We will briefly explore programming with Monads and defining ASTs using tagless final, and also will touch a few other relevant topics.

Application Monad

It’s common among Haskell programmers to organize effectful application code using monads. Details vary between implementations but the basic idea is to define a monad together with a set of operations available in this monad. Similarly to this blog post, I’ll call a monad for organizing effectful application code an application monad.

Tagless final is a suitable technique for defining application monads. In facts, thanks to MTL, it is one of the most widely used tools for that task.

Let’s take a simplified problem of fetching/deleting a user from a database as an example to demonstrate how tagless final can be used to define operations available in do notation. Our application monad will provide two operations: getUser and deleteUser. By applying tagless final approach, we will define a set of overloaded functions and later will provide their implementation. Right from the start there is a design decision to make: which operations to overload. We can define a new typeclass with getUser/deleteUser operations, or we can use more generic functions from MTL and build on top of them. Although in practice I often will choose the 2nd option, here I’ll show the 1st one because in our particular case it leads to shorter code:

data Name = Name String
data User = User { name :: Name, age :: Int }

class Monad m => MonadDatabase m where
    getUser    :: Name -> m User
    deleteUser :: User -> m ()

Using operations given above we can define some logic like this:

test :: MonadDatabase m => m ()
test = do user <- getUser (Name "Pedro")
          when (age user < 18) (deleteUser user)

Note that the test function is abstract: it can be executed using a different implementation of MonadDatabase. Now, let’s define a MonadDatabase instance suitable to run that test function. One way to do it is to build on top of MTL transformers. I’ve assumed that getUser/deleteUser functions can be implemented on top of Reader and IO monads and I’ve also omitted some implementation details (marked by ...):

data DatabaseConfig = DatabaseConfig { ... }

newtype AppM a =
    AppM { unAppM :: ReaderT DatabaseConfig IO a }
    deriving (Functor, Applicative, Monad, MonadIO, MonadReader DatabaseConfig)

instance MonadDatabase AppM where
    getUser name = do cfg <- ask
                         ...

    deleteUser user = do cfg <- ask
                         ...

runAppM :: AppM a -> DatabaseConfig -> IO a
runAppM app config = runReaderT (unAppM app) config

Now, we can execute the abstract test function using a particular AppM implementation:

main = do cfg <- ...
          runAppM test cfg

By using tagless final style, we have separated the definition of abstract operations from their implementation which gives us extensibility. With such an approach it is possible to define a new set of operations and use it together with MonadDatabase. It is also possible to add a new interpretation (for example, for testing purposes).

Even with such a small example, there were many possibilities to organize code. The first question: how to choose the set of overloaded operations? Is it better to define a new typeclass such as MonadDatabase with application-specific functions, or is it better to stick to MTL typeclasses and define operations on top of more generic functions? The second question is: how to write the implementation? Are there practical alternatives to MTL transformers? Although it’s very tempting to discuss those and several other questions here, I don’t know all answers and the topic of proper application architecture is too broad. For more in-depth resources on application architecture, you can visit other blogs: (1, 2, 3, 4, 5, 6).

Mini Q&A

Q. I heard you were describing tagless final using MTL. What about the famous n^2 problem?

A. n^2 problem appears in transformers implementation (because transformers need to propagate methods of their child monads). Transformers have nothing to do with tagless final. We were only talking about type constraints and freedom to switch between implementations.

If you are still wondering about the n^2 problem, here is a small trick to mitigate it (export method implementations as separate functions with a hope that other instances will use your implementation).

If you create many similar implementations, tagless final causes some effort duplication. In that case, you might want to use transformers, which leads to the n^2 problem.

Q. You were talking about an “Application” monad and MTL style. Is it really an eDSL?

A. Even if there is a canonical scientific definition for the term eDSL, people use the term to talk about different things. Opinions range from “eDSL is a completely distinct language with its own semantics” to “eDSL is a library with a nice, consistent interface”. Here are the answers I got in several public Haskell-related channels to the question “what are good examples of eDSLs implemented in Haskell?”: SBV, diagrams, accelerate, blaze, esqueleto, shake, lens, gloss, streaming libraries (pipes, streamly, etc.), Servant, opaleye, frp-arduino, HHDL, ivory, pandoc. As you can see, those answers clearly show that the term eDSL is vague. But anyway, tagless final can be used to create both “true” eDSLs and nice library interfaces (probably with monads).

eDSLs

The most complete discussion of the tagless final approach was done by Oleg Kiselyov and his colleagues. He talks mostly about the embedding of different versions of typed lambda calculus using the tagless final encoding. He achieves very motivating results, such as embedding lambda calculus with linear types and transforming ASTs.

Let’s pick a simple language as an example and explore two ways to encode an AST: using Initial and Final encodings. The chosen language has integer constants, an addition operation, and lambda functions. From one hand it’s quite simple to put most of the implementation in this blog post. From the other hand, it’s complicated enough to discuss two versions of Initial encodings and to demonstrate extensibility of tagless final approach.

Initial encoding

Initial encoding means that we are representing AST using values of a given algebraic data type. The term “Initial encoding” was inspired by the category theory and it follows from the observation that inductive data type can be viewed as an “initial algebra”. Bartosz Milewski gives a gentle description of what an initial algebra is and why inductive data structure can be viewed as an initial algebra.

Tagged initial encoding

Here is one way to represent an Abstract Syntax Tree for our simple language (we are re-using Haskell lambda functions in the definition of Lambda for simplicity so that we don’t need to implement identifiers assignment/lookup by ourselves, this approach is called higher-order abstract syntax):

data Expr = IntConst Int
          | Lambda   (Expr -> Expr)
          | Apply    Expr Expr
          | Add      Expr Expr

This representation allows us to define well-formed eDSL expressions like these:

-- 10 + 20
t1 = IntConst 10 `Add` IntConst 20

-- (\x -> 10 + x) 20
t2 = Apply (Lambda $ \x -> IntConst 10 `Add` x) (IntConst 20)

Unfortunately, it also allows us to define malformed expressions like these:

-- Trying to call integer constant as a function
e1 = Apply (IntConst 10) (IntConst 10)

-- Trying to add lambda functions
e2 = Add f f where f = Lambda (\x -> x)

Evaluation of Expr values can produce errors because the representation of our eDSL allows encoding malformed expressions. Consequently, interpreter eval should check for type errors during its work. To be more precise, it should pattern-match on resulting values to find out which concrete values came out from eval function in runtime to ensure that the Add operation was applied to integer constants and the Apply operation was used with a lambda function. We define the Result data type to represent possible resulting values and use Maybe Result to represent possible errors:

data Result = IntResult Int
            | LambdaResult (Expr -> Expr)

eval :: Expr -> Maybe Result
eval e@(IntConst x) = Just (IntResult x)
eval e@(Lambda   f) = Just (LambdaResult f)
eval (Apply f0 arg) = do
    f1  <- eval f0
    case f1 of
        LambdaResult f -> eval (f arg)
        _              -> Nothing
eval (Add l0 r0) = do
    l1 <- eval l0
    r1 <- eval r0
    case (l1, r1) of
        (IntResult l, IntResult r) -> Just $ IntResult (l + r)
        _                          -> Nothing

The technique is called “tagged” because sum types in Haskell are tagged sum types. At runtime such values are represented as a pair (tag, payload) and tag is used to perform pattern-matches. The eval function uses pattern matching on IntResult and LambdaResult to perform type checking and errors checking, or in other words, it uses tags in runtime. Hence the name.

Tagless initial encoding

The idea is that we can use GADT to add information about values into Expr type and use it to make malformed eDSL expressions unrepresentable. We no longer need a Result data type and there is no more runtime type checking in eval function. In the Finally Tagless, Partially Evaluated paper authors refer to their versions of data constructors IntResult and LambdaResult as “tags”. And because the GADTs-based approach has no tags, they call it “tagless initial” encoding.

The GADTs-based AST definition and corresponding interpreter eval are given below. New AST is capable of representing examples t1, t2 from the previous section while making e1, e2 unrepresentable. The idea of Expr a data type is that a parameter holds a type to which a given expression should evaluate. IntConst and Lambda just duplicate its field types in a parameter because evaluating a value just means unwrapping it. In the case of Add constructor, a parameter is equal to Int which means that Add evaluates to an integer. Apply evaluates to a result of a passed lambda function.

data Expr a where
    IntConst :: Int                     -> Expr Int
    Lambda   :: (Expr a -> Expr b)      -> Expr (Expr a -> Expr b)
    Apply    :: Expr (Expr a -> Expr b) -> Expr a -> Expr b
    Add      :: Expr Int -> Expr Int    -> Expr Int

eval :: Expr a -> a
eval (IntConst x) = x
eval (Lambda f)   = f
eval (Apply f x)  = eval (eval f x)
eval (Add l r)    = (eval l) + (eval r)

Final encoding

Although the term “Initial” came from category theory, the term “Final” didn’t. Oleg shows that “the final and initial typed tagless representations are related by bijection” which means that these approaches are equivalent in some sense and both are “Initial” from the category theorist’s point of view. The Finally Tagless paper states “We call this approach final (in contrast to initial) because we represent each object term not by its abstract syntax, but by its denotation in a semantic algebra”. My best guess is that the name “Final” was chosen to differentiate from the term Initial as much as possible.

With tagless final, we build expressions using overloaded functions instead of data constructors. The expression from the previous section will look like this:

test = lambda (\x -> add x (intConst 20))

Machinery to make it work consists of two parts.

  1. Combinators definition:

    class LambdaSYM repr where
        intConst :: Int -> repr Int
        lambda   :: (repr a -> repr b) -> repr (a -> b)
        apply    :: repr (a -> b) -> repr a -> repr b
    
  2. Interpreter implementation:

    data R a = R { unR :: a }
    
    instance LambdaSYM R where
        intConst x = R x
        lambda f   = R $ \x -> unR (f (R x))
        apply f a  = R $ (unR f) (unR a)
    
    eval :: R a -> a
    eval x = unR x
    

Applying interpreter:

testSmall :: LambdaSYM repr => repr Int
testSmall = apply (lambda (\x -> x)) (intConst 10)

main = print (eval testSmall) -- 10

Interesting points:

  1. eval function instantiates testSmall expression to a concrete type R Int (aka interpreter).

  2. It’s easy to define other interpreters. For example, a pretty printer. There is a little twist, though: a pretty printer needs to allocate names for free variables and keep track of allocated names, so the printing interpreter will pass an environment and it will look very similar to a Reader monad.

  3. It’s extremely easy to extend the language with new operations.

Adding a new add operation to our previous example requires only defining a new type class and implementing a new instance for each interpreter. Functions which use new add operations should add additional AddSYM repr constrain to its type.

class AddSYM repr where
    add :: repr Int -> repr Int -> repr Int

instance AddSYM R where
    add a b = R $ (unR a) + (unR b)

test :: (LambdaSYM repr, AddSYM repr) => repr Int
test = apply (apply (lambda (\y -> lambda (\x -> x `add` y))) (intConst 10)) (intConst 20)

Please note that in this particular case we are lucky because it’s possible to write instance AddSYM R. Or, in other words, it’s possible to implement the new operation on top of an existing interpreter. Sometimes we will need to extend existing interpreters or write new ones.

Introspection. Host vs target language

In Oleg’s papers, he pretty prints and transforms tagless final AST. It’s very counterintuitive to expect the existence of such utilities because combinators are functions, and we are used to manipulating values of Algebraic Data Types. Yet, it is possible to extract facts about the structure of Final Tagless ASTs (i.e. introspection is possible) and to transform them. For the proof of that claim, check section 3.4 (page 28) of Oleg’s course where he presents pretty-printer and transformer.

The pretty-printer and the transformer of ASTs are just tagless final interpreters which keep track of some additional information and propagate it during interpreting from parents to children. Both are extensible in the same ways as other tagless final interpreters.

However, if we return to our first section and think about applying a tagless final approach to a simple case of defining Application monad, then we will quickly find out that we can’t inspect and transform resulting monads. Consider our simple example:

class Monad m => HasDatabaseConfig m where
    getDatabaseConfig :: m DatabaseConfig

getUser :: (HasDatabaseConfig m, MonadIO m) => Name -> m User
getUser = ...

test :: (HasDatabaseConfig m, MonadIO m) => m String
test = do user <- getUser (Name "Pedro")
          if age user > 3 then pure "Fuzz"
                          else pure "Buzz"

Although the getDatabaseConfig function is overloaded, a lot of logic is expressed using functions and other constructions which are not overloaded. Therefore, there is no way to statically inspect the resulting monadic value. This is an important point: if you want introspection and transformation of ASTs, then you need to keep track of what’s overloaded and what’s not. Oleg obtained his great results because he overloaded everything and expressed embedded lambda calculus by using only overloaded functions. In other words, the power of tagless final depends on how far you want to go with overloading.

Relation to Free monads

People often compare tagless final and free monads. Both approaches give you machinery to define overloaded operations inside a monadic context. I am not an expert in free monads, but tagless final:

  • is faster;
  • extensible (one can easily add new operations);
  • requires less boilerplate.

One argument for free monads is that it’s possible to statically introspect free monads. That is not completely true. Yes, you can easily execute actions one by one, and it helps to combine monadic values by interleaving actions (we can achieve a similar thing by interpreting into continuation with tagless final). But here is a blog post which describes the difficulties of free monad introspection (we’ve already covered the gist of the problem in the previous section). Also, see this blog post where the author describes difficulties associated with free monads and suggests using tagless final instead.

Here is a very good overview of free monad performance challenges. Here Edward Kmett gives his perspective on the same problem.

In few words:

  1. A simple implementation of free monads causes O(n^2) asymptotic for left-associated monadic binds. It always adds one element to a “list” like this [1, 2, 3] ++ [4].
  2. Using continuations (similarly to the DList package) gives O(n) binds, but makes some operations slow (for example combining two sequences of commands by interleaving them).
  3. Using a technology similar to the Seq data structure leads to a good asymptotic behaviour for all operations, but also gives significant constant overhead.

Performance

General techniques regarding optimizing Haskell code with polymorphic functions apply here. In few words, sometimes using overloaded functions cause compiler to generate code which uses methods dictionaries to dispatch calls. The compiler often knows how to specialize functions and get rid of dictionaries, but module boundaries prevent that from happening. To help the compiler, we need to read the “Specializing” section of this document and then use an INLINEABLE pragma like this

getUser :: (MonadReader r m, Has DatabaseConfig r, MonadIO m) => Name -> m User
...
{-# INLINEABLE  getUser #-}

Limitations

Haskell lacks first-class polymorphism (aka impredicative polymorphism), which means that we can’t specialize an existing data type to hold a polymorphic value like this:

Maybe (LambdaSym repr => repr Int)

It follows that we can’t interpret such polymorphic value twice (but this situation doesn’t appear very frequent in cases where we just want an Application monad with some overloaded operations). This is an issue when, for example, we parse some text file, obtain a tagless final AST, and want to interpret it twice: to evaluate and to pretty print. There is a limited workaround: define a newtype wrapper around a polymorphic value. The wrapper specifies concrete type constraints and hence kills one extensibility dimension of tagless final.

Oleg’s paper also presents another workaround: a special “duplicating” interpreter. Unfortunately, It is presented using a simple eDSL without lambda functions and I failed to apply the same technique to a more complicated AST with lambdas. I mention it here just for the sake of completeness.

Also, note that sometimes people want to change implementations (aka interpreters) in the runtime, not in the compile time, or even change only a part of the existing behaviour. For example, change the data source but leave all other application-specific logic intact. Tagless final can support it by implementing an interpreter configurable in the runtime which uses some sort of a method dictionary (see the handle pattern).

Conclusion

Thanks to MTL, tagless final style of programming was battle-tested and has wide adoption. In my opinion, it’s quite a natural way to write Haskell code because it utilizes a very basic Haskell feature: type classes. It also goes far beyond MTL — it can be used both for writing application-specific logic with and without monad transformers and “true” eDSLs with their own semantics.

I also found that it’s not a hard concept to grasp, so it can safely be used in a large team of developers with different backgrounds.

That’s all, I hope my post will help others to grasp the main idea of tagless final and to use it in their projects.

Acknowledgement

Many thanks to Gints Dreimanis, Vlad Zavialov and others from Serokell for their help in writing this article. Without their reviews and suggestions, this post would not have happened.

Literature